From Eqtnminer




The eqmr-hmdb3 program builds the initial database required to fit a Bayesian hierarchical model.


 Short  Long            Description
        --xml           With the --print flag: output the model parameters in XML (the output can be reused)
        --annot-update  With the --print flag: update the expected number of eQTLs per annotation
 -d     --data          The database (.db file)
 -f     --feature       The feature database (.fdb file)
 -c     --cisdb         The cis regression database (.fcis file)
 -h     --hmodb         The HM database (use with --print or --xml-param)
 -p     --xml-param     Update the parameters of the HM database (use with -h)
 -s     --chrom         Names of the chromosomes to focus on, separated by commas

Configuration File


The <CisWindow> tag allows the user to define the size of the window from both feature ends.

The following example

 <CisWindow size="10000"/>

will tell eqmr-hmdb3 to build a model with a 10 kb window around both feature ends.
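To make the window semantics concrete, here is a minimal Python sketch. It assumes, since this page does not spell it out, that a variant is in cis when it lies within `size` bp of either feature end, i.e. between `start - size` and `end + size`. The helper names are hypothetical and not part of eqtnminer.

```python
from typing import Tuple


def cis_window(start: int, end: int, size: int) -> Tuple[int, int]:
    """Return (lo, hi) bounds of a window extending `size` bp beyond both
    ends of a feature spanning [start, end].

    Hypothetical helper: assumes the window is [start - size, end + size]."""
    if start > end:
        start, end = end, start  # tolerate features given end-first
    return start - size, end + size


def in_cis_window(pos: int, start: int, end: int, size: int) -> bool:
    """True if a variant at `pos` falls inside the feature's cis window."""
    lo, hi = cis_window(start, end, size)
    return lo <= pos <= hi


# With <CisWindow size="10000"/>, a gene spanning 50,000-60,000:
print(cis_window(50_000, 60_000, 10_000))             # (40000, 70000)
print(in_cis_window(69_999, 50_000, 60_000, 10_000))  # True
print(in_cis_window(70_001, 50_000, 60_000, 10_000))  # False
```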


The <FeatureTable> tag allows the user to provide a list of features to focus on.

The following example

 <FeatureTable file="/path/to/feature.txt"/>

will tell eqmr-hmdb3 to build a model using only the features whose names are listed in the first column of the file feature.txt.
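The filtering rule can be sketched in a few lines of Python. This is an illustration, not eqtnminer's own reader: it assumes columns are whitespace-separated and that blank lines may be skipped; the function names and gene names are hypothetical.

```python
import io
from typing import Iterable, List, Set, TextIO


def read_feature_table(handle: TextIO) -> Set[str]:
    """Collect feature names from the first whitespace-separated column,
    skipping blank lines; any further columns are ignored."""
    names: Set[str] = set()
    for line in handle:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def keep_listed(features: Iterable[str], allowed: Set[str]) -> List[str]:
    """Keep only the features whose names appear in the feature table."""
    return [f for f in features if f in allowed]


# Stand-in for /path/to/feature.txt (hypothetical gene names)
table = io.StringIO("GENE1\tfirst gene\nGENE3\n\n")
allowed = read_feature_table(table)
print(sorted(allowed))                                    # ['GENE1', 'GENE3']
print(keep_listed(["GENE1", "GENE2", "GENE3"], allowed))  # ['GENE1', 'GENE3']
```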


The <TranscriptWeight> tag allows the user to provide a priori weights for each transcript of each feature, provided that the feature is made up of a set of transcripts.

The following example

 <TranscriptWeight file="/path/to/weights.txt"/>

will tell eqmr-hmdb3 to build a model using the transcript weights from the file weights.txt. Without this tag, all transcripts are assumed to be equally likely.
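The sketch below shows one way a priori weights could be turned into per-transcript probabilities, falling back to equal weights when no weight file is given. The layout of weights.txt is not documented here, so the sketch takes a plain dictionary; normalizing to sum to one and giving unlisted transcripts a weight of zero are assumptions, and `transcript_priors` is a hypothetical name.

```python
from typing import Dict, List, Optional


def transcript_priors(transcripts: List[str],
                      weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return a prior probability for each transcript of one feature.

    Hypothetical helper: user weights are normalized to sum to 1 (transcripts
    missing from `weights` get 0); without weights, all are equally likely."""
    if not transcripts:
        return {}
    if weights is None:
        p = 1.0 / len(transcripts)
        return {t: p for t in transcripts}
    raw = {t: float(weights.get(t, 0.0)) for t in transcripts}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("weights must contain at least one positive value")
    return {t: w / total for t, w in raw.items()}


print(transcript_priors(["T1", "T2", "T3", "T4"]))
# {'T1': 0.25, 'T2': 0.25, 'T3': 0.25, 'T4': 0.25}
print(transcript_priors(["T1", "T2"], {"T1": 3, "T2": 1}))
# {'T1': 0.75, 'T2': 0.25}
```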


The <InitThreshold> tag allows the user to provide an initial cut-off value for the Bayes Factors in order to compute the initial $\pi_0$ parameter of the model.

The following example

 <InitThreshold value="1000"/>

will tell eqmr-hmdb3 to count a feature as positive, for the purpose of initializing the $\pi_0$ parameter, only if at least one of its variants has a Bayes Factor greater than 1000.
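As an illustration of how such a cut-off can seed $\pi_0$, here is a minimal Python sketch. It assumes $\pi_0$ denotes the proportion of features without a causal variant (a common convention that this page does not confirm); under the opposite convention, the initial value would be the fraction of positive features instead. The function name and gene data are hypothetical.

```python
from typing import Dict, List


def init_pi0(bfs_by_feature: Dict[str, List[float]], threshold: float) -> float:
    """Initial pi0, taken here as the fraction of features that are NOT positive.

    A feature counts as positive when at least one of its cis variants has a
    Bayes factor strictly greater than `threshold` (cf. <InitThreshold>).
    Assumption: pi0 is the proportion of features without an eQTL."""
    if not bfs_by_feature:
        raise ValueError("no features")
    positives = sum(1 for bfs in bfs_by_feature.values()
                    if any(bf > threshold for bf in bfs))
    return 1.0 - positives / len(bfs_by_feature)


bfs = {
    "GENE1": [2.5, 4300.0, 12.0],  # positive: 4300 > 1000
    "GENE2": [0.8, 15.0],          # not positive
    "GENE3": [1000.0],             # not positive: the comparison is strict
    "GENE4": [1.2e6],              # positive
}
print(init_pi0(bfs, threshold=1000))  # 0.5
```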


The <GeneModel> tag allows the user to provide a logistic model for the Template:Math component that depends on a user-defined table of feature-level covariates.

The following example

 <GeneModel model_file="/path/to/the/model.txt"/>

will tell eqmr-hmdb3 to use a logistic model for the Template:Math component, as described in the file model.txt. This file must be organized as follows:

 DataFrame /the/path/to/the/feature/data/frame
 Formula <model>
 LinkTable /the/path/to/the/link/table

where the last line is optional.
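A file with this layout can be read with a few lines of Python. This is a hypothetical reader, not eqtnminer's own parser: it assumes one `Keyword value` pair per line and only checks that the DataFrame and Formula lines are present. The formula is left as the `<model>` placeholder because its syntax is not documented on this page.

```python
import io
from typing import Dict, TextIO

# Keywords from the layout above; LinkTable is the optional last line.
REQUIRED = ("DataFrame", "Formula")
OPTIONAL = ("LinkTable",)


def read_gene_model(handle: TextIO) -> Dict[str, str]:
    """Parse a <GeneModel> model file into a {keyword: value} dict.

    Hypothetical reader: the value is everything after the first run of
    whitespace on the line."""
    spec: Dict[str, str] = {}
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split(None, 1)
        key = parts[0]
        if key not in REQUIRED + OPTIONAL:
            raise ValueError(f"unknown keyword: {key}")
        spec[key] = parts[1] if len(parts) > 1 else ""
    missing = [k for k in REQUIRED if k not in spec]
    if missing:
        raise ValueError(f"missing required line(s): {', '.join(missing)}")
    return spec


example = io.StringIO(
    "DataFrame /the/path/to/the/feature/data/frame\n"
    "Formula <model>\n"  # placeholder: formula syntax not documented here
)
spec = read_gene_model(example)
print(spec["Formula"])      # <model>
print("LinkTable" in spec)  # False
```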


The <PriorModel> tag specifies the model that eqmr-hmdb3 will fit. The model can handle different variant types (see <VarModel>) and can include several components (see <PriorComponent>).



name= type=snp,ins,del file=


A <PriorComponent> tag defines a model for the probability that a variant is causal. Several components can be used within the model, and each component then has a weight that is estimated directly from the data. In other words, when multiple components are used, the overall probability that a variant is causal is a mixture of the probabilities from distinct, mutually exclusive components.
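Concretely, with component weights $w_k$ and per-component probabilities $p_k$, the mixture gives $P(\text{causal}) = \sum_k w_k p_k$. The Python sketch below evaluates that sum; in eqmr the weights are estimated from the data, whereas here they are fixed, hypothetical numbers, and `mixture_probability` is an illustrative name.

```python
from typing import Sequence


def mixture_probability(weights: Sequence[float], probs: Sequence[float]) -> float:
    """P(variant is causal) as a weighted mixture of per-component probabilities.

    weights: one non-negative weight per <PriorComponent>, summing to 1;
    probs:   the probability each component assigns to the variant."""
    if len(weights) != len(probs):
        raise ValueError("one probability per component is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("component weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, probs))


# Two hypothetical components with hypothetical weights and probabilities
print(mixture_probability([0.7, 0.3], [0.02, 0.10]))
```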



Let's assume we have performed the following steps:

$# Create the global database (genotypes and expression levels)
$eqmr-db -c conf/eqmrDB.conf 
$# Create the feature database (gene level)
$eqmr-fdb -d eqmrDB.db -f gene -o eqmrGeneDB
$# Create the cis regression database (all chromosomes, 8 CPUs per chromosome)
$eqmr-fcrth -d eqmrDB.db -f eqmrGeneDB.fdb -o eqmrGeneCisDB -w 100000 -m BayesianRegConf.txt -n 8 

and we now want to fit a hierarchical model with both a distance model and some additional annotations. We start by creating the configuration file that details the model we want to fit. Let's create the main configuration file, hm_config.xml, as follows:

$cat hm_config.xml
   <CisWindow size="100000"/>
         <VarType name="snp" type="*"/>
      <PriorComponent name="global" not_only_annot="yes">
         <DistanceModel var_type="*" model_file="hm_snp_dist.txt" step_file="hm_step_model.txt"/>
         <AnnotationModel var_type="*" model_file="hm_annot_model.txt"/>
         <VarType name="snp" type="*"/>
      </PriorComponent>
      <EffectModel name="snp" component="*" var_type="snp"/>

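This configuration file tells eqtnminer to build a hierarchical model with a 100 kb cis window and a single prior component, named global, that combines a distance model (defined in hm_snp_dist.txt, with breakpoints in hm_step_model.txt) and an annotation model (defined in hm_annot_model.txt), together with an effect model for SNPs.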

For the distance model, we want to consider both the distance from the TSS and the distance from the TES, so hm_snp_dist.txt contains:

$cat hm_snp_dist.txt
TSS FSS step
TES FES step

Here the first column gives the name of the distance model, the second the reference anchor (FSS, for feature start site, is the TSS since we are using gene features here; likewise FES, for feature end site, is the TES), and the third indicates that the model is a so-called step model, i.e. a discrete model whose breakpoints are given in the file hm_step_model.txt, as specified in the configuration file. This file can look like this:

$cat hm_step_model.txt
TSS -1 -100000,-50000,-10000,-5000,-1000,-500,0,500,1000,5000,10000,50000,100000
TES -1 -100000,-50000,-10000,-5000,-1000,-500,0,500,1000,5000,10000,50000,100000

where the first column is the distance model name, the second is ignored but must be present, and the third gives the "distance bin" breakpoints, separated by commas.
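To show how the breakpoints partition distances, here is a small Python sketch that parses one line of hm_step_model.txt and assigns a signed distance to a bin. It assumes the distance to the anchor has already been computed (negative upstream, positive downstream) and that bins are half-open intervals between consecutive breakpoints, with the last bin closed; both are assumptions, and the function names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def parse_step_line(line: str) -> Tuple[str, List[int]]:
    """Split one hm_step_model.txt line into (model name, breakpoints).

    The second column is read but ignored, as the documentation requires."""
    name, _ignored, breaks = line.split()
    points = [int(b) for b in breaks.split(",")]
    if points != sorted(points):
        raise ValueError(f"breakpoints for {name} must be increasing")
    return name, points


def distance_bin(distance: int, points: List[int]) -> Optional[int]:
    """Return the index of the distance bin containing `distance`, or None.

    Assumption: bin i is [points[i], points[i+1]), except the last bin,
    which also includes points[-1]."""
    if distance < points[0] or distance > points[-1]:
        return None
    return min(bisect.bisect_right(points, distance) - 1, len(points) - 2)


name, points = parse_step_line(
    "TSS -1 -100000,-50000,-10000,-5000,-1000,-500,0,500,1000,5000,10000,50000,100000"
)
print(name, len(points) - 1)         # TSS 12
print(distance_bin(-750, points))    # 4, i.e. [-1000, -500)
print(distance_bin(0, points))       # 6, i.e. [0, 500)
print(distance_bin(100000, points))  # 11, the last bin
print(distance_bin(150000, points))  # None
```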

TODO: annotation file with the different kind of annotation files that can be used and the different parametrization options.

Now let's build the database for our hierarchical model and save it to the file hm.db:

$eqmr-hmdb3 -d eqmrDB.db -f eqmrGeneDB.fdb -r eqmrGeneCisDB.fcis -c hm_config.xml -o hm.db

See Also
