Introduction

The eqmr-hmdb3 program builds the initial database required to fit a Bayesian hierarchical model.

Options

Short  Long            Description
       --xml           With the --print flag: output the model parameters in XML (can then be reused)
       --annot-update  With the --print flag: update the expected number of eQTLs per annotation
-d     --data          The database (.db file)
-f     --feature       The feature database (.fdb file)
-c     --cisdb         The cis regression database (.fcis file)
-h     --hmodb         The HM database (use with --print or --xml-param)
-p     --xml-param     Update the parameters of the HM database (use with -h)
-s     --chrom         The names of the chromosomes to focus on (comma-separated)

Configuration File

CisWindow

The <CisWindow> tag allows the user to define the size of the window from both feature ends.

• Status: mandatory
• Attributes:
• size: a positive integer giving the size of the window in base pairs (bp).

The following example

 <CisWindow size="10000"/>


tells eqmr-hmdb3 to build a model with a window of 10 kb around both feature ends.
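As noted in the Examples section, the window declared here cannot be greater than the one used to run eqmr-fcr or eqmr-fcrth. The following is a minimal sketch of a check that could be run before building the database. It is not part of the eqmr tools: the function names `cis_window_size` and `check_window` are hypothetical, and the script only assumes that the configuration is well-formed XML containing a `<CisWindow size="..."/>` tag.

```python
import xml.etree.ElementTree as ET


def cis_window_size(config_xml: str) -> int:
    """Return the size (in bp) declared by the <CisWindow> tag."""
    root = ET.fromstring(config_xml)
    # Accept <CisWindow> as the root element or anywhere below it.
    node = root if root.tag == "CisWindow" else root.find(".//CisWindow")
    if node is None:
        raise ValueError("mandatory <CisWindow> tag is missing")
    size = int(node.get("size", "0"))
    if size <= 0:
        raise ValueError("CisWindow size must be a positive integer")
    return size


def check_window(config_xml: str, regression_window: int) -> int:
    """Raise ValueError if the model window exceeds the regression window."""
    size = cis_window_size(config_xml)
    if size > regression_window:
        raise ValueError(
            f"CisWindow size {size} exceeds the regression window {regression_window}"
        )
    return size


# Hypothetical usage: 100000 is the -w value from the eqmr-fcrth example.
config = '<CisModel><CisWindow size="10000"/></CisModel>'
print(check_window(config, regression_window=100000))
```

To check a real configuration, the string can be replaced with the contents of the configuration file and `regression_window` with the value that was passed to the regression step.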

FeatureTable

The <FeatureTable> tag allows the user to provide a list of features to focus on.

• Status: optional
• Attributes:
• file: the absolute path to the file providing the list of features

The following example

 <FeatureTable file="/path/to/feature.txt"/>


tells eqmr-hmdb3 to build a model that considers only the features whose names are listed in the first column of the file feature.txt.

TranscriptWeight

The <TranscriptWeight> tag allows the user to provide a priori weights for each transcript of each feature, provided that the feature is a set of transcripts.

• Status: optional
• Attributes:
• file: the absolute path to the file providing the weights

The following example

 <TranscriptWeight file="/path/to/weights.txt"/>


tells eqmr-hmdb3 to build a model using the transcript weights from the file weights.txt. Note that otherwise all transcripts are assumed to be equally likely.

InitThreshold

The <InitThreshold> tag allows the user to provide an initial cut-off value for the Bayes Factors in order to compute the initial $\pi_0$ parameter of the model.

• Status: optional
• Attributes:
• value: a positive double value

The following example

 <InitThreshold value="1000"/>


tells eqmr-hmdb3 to treat a feature as positive, when initializing the $\pi_0$ parameter, only if at least one of its variants has a Bayes Factor greater than 1000.
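The thresholding rule can be illustrated with a short sketch. It is only an illustration under explicit assumptions, not the program's actual code: it assumes that $\pi_0$ is the proportion of features that are not positive, and it takes as input a hypothetical mapping from feature names to the largest Bayes Factor observed among the variants of each feature. The function name `initial_pi0` and the gene names are made up for the example.

```python
from typing import Mapping


def initial_pi0(max_bf_per_feature: Mapping[str, float], threshold: float) -> float:
    """Estimate an initial pi_0 from the largest Bayes Factor of each feature.

    Assumption (not confirmed by the manual): pi_0 is the fraction of
    features that have no variant with a Bayes Factor above the threshold.
    """
    if not max_bf_per_feature:
        raise ValueError("at least one feature is required")
    # A feature is positive when its best variant exceeds the cut-off.
    positives = sum(1 for bf in max_bf_per_feature.values() if bf > threshold)
    return 1.0 - positives / len(max_bf_per_feature)


# Hypothetical data: two of the four features exceed the cut-off of 1000.
max_bf = {"geneA": 2500.0, "geneB": 12.0, "geneC": 999.0, "geneD": 1000.5}
print(initial_pi0(max_bf, threshold=1000.0))
```

A higher threshold marks fewer features as positive and therefore yields a larger initial value under this assumption.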

GeneModel

The <GeneModel> tag allows the user to provide a logistic model for the Template:Math component that depends on a user-defined table of feature-level covariates.

• Status: optional
• Attributes:
• model_file: the absolute path to the file detailing the model to fit

The following example

 <GeneModel model_file="/path/to/the/model.txt"/>


tells eqmr-hmdb3 to use a logistic model for the Template:Math component, where the model is described in the file model.txt. The latter file has to be organized as follows:

DataFrame /the/path/to/the/feature/data/frame
Formula <model>


where the last line is optional.

PriorModel

The <PriorModel> tag is used to give eqmr-hmdb3 the model to fit. The model can deal with different variant types (see <VarModel>) and it can include several components (see <PriorComponent>).

VarModel

VarType

name= type=snp,ins,del file=

PriorComponent

A <PriorComponent> tag defines a model for the probability that a variant is a causal variant. Several components can be used within the model, and each component then has a weight that is estimated directly from the data. In other words, when multiple components are used, the overall probability that a variant is causal is seen as a mixture of probabilities from distinct and exclusive components.
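The mixture can be written out as follows. The notation is introduced here for illustration only and does not come from the program: $K$ is the number of components, $w_k$ is the weight of component $k$, and $p_k(v)$ is the probability that variant $v$ is causal under component $k$. The weights are assumed to be non-negative and to sum to one, as in a standard mixture.

$$
P(v \text{ is causal}) = \sum_{k=1}^{K} w_k \, p_k(v), \qquad w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1
$$

For instance, with two components of weights $w_1 = 0.7$ and $w_2 = 0.3$, and component probabilities $p_1(v) = 0.01$ and $p_2(v) = 0.2$, the overall probability would be $0.7 \times 0.01 + 0.3 \times 0.2 = 0.067$.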

Examples

Let's assume we have performed the following steps:

 # Create the global database (genotypes and expression levels)
 $ eqmr-db -c conf/eqmrDB.conf
 # Create the feature database (gene level)
 $ eqmr-fdb -d eqmrDB.db -f gene -o eqmrGeneDB
 # Create the cis regression database (all chromosomes onto 8 CPUs per chromosome)
 $ eqmr-fcrth -d eqmrDB.db -f eqmrGeneDB.fdb -o eqmrGeneCisDB -w 100000 -m BayesianRegConf.txt -n 8


and we now want to fit a hierarchical model with both a distance model and some additional annotations. We start by creating the configuration file that details the model we want to fit. Let's create the main configuration file, hm_config.xml, as follows:

 $ cat hm_config.xml
 <CisModel>
   <CisWindow size="100000"/>
   <PriorModel>
     <VarModel>
       <VarType name="snp" type="*"/>
     </VarModel>
     <PriorComponent name="global" not_only_annot="yes">
       <DistanceModel var_type="*" model_file="hm_snp_dist.txt" step_file="hm_step_model.txt"/>
       <AnnotationModel var_type="*" model_file="hm_annot_model.txt"/>
     </PriorComponent>
   </PriorModel>
   <EffectModels>
     <VarModel>
       <VarType name="snp" type="*"/>
     </VarModel>
     <EffectModel name="snp" component="*" var_type="snp"/>
   </EffectModels>
 </CisModel>

This configuration file indicates to eqtnminer that we want to build a hierarchical model with the following components:

• a window size of 100kb from both feature ends (here it will be the gene ends, so the TSSs and TESs of the gene) - note that the window size cannot be greater than the one used to run eqmr-fcr or eqmr-fcrth,
• a single variant model focusing on all variants (here called snp by default),
• a distance model which is a discrete distance model (since the 'step_file' attribute is used),
• an annotation model as defined in the file hm_annot_model.txt,
• finally a single effect model for all variants of the model.

Regarding the distance model, we want to consider both the distance from the TSS and the distance from the TES. So we will have the following:

 $ cat hm_snp_dist.txt
 TSS FSS step
 TES FES step


where, as detailed previously, the first column provides the name of the distance model, the second one the reference anchor (so FSS for feature start site, which will be the TSS since we are using gene features here), and the third column indicates that the model is a so-called step model, i.e. a discrete model whose breakpoints are provided in the file hm_step_model.txt, as indicated in the configuration file. This file can look like this:

 $ cat hm_step_model.txt
 TSS -1 -100000,-50000,-10000,-5000,-1000,-500,0,500,1000,5000,10000,50000,100000
 TES -1 -100000,-50000,-10000,-5000,-1000,-500,0,500,1000,5000,10000,50000,100000

where the first column stands for the distance model name, the second one is ignored but has to be included, and the third one provides the "distance bin" breakpoints separated by commas.

TODO: annotation file with the different kinds of annotation files that can be used and the different parametrization options.

So, let's now build the database for our hierarchical model and save it into the file hm.db:

 $ eqmr-hmdb3 -d eqmrDB.db -f eqmrGeneDB.fdb -r eqmrGeneCisDB.fcis -c hm_config.xml -o hm.db
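The step model file used in this example does not have to be typed by hand. The sketch below generates its lines from a list of positive breakpoints mirrored around 0. It is a hypothetical helper, not part of the eqmr tools: the function name `step_model_lines` is made up, the columns are assumed to be separated by a single space as in the listing, and discarding breakpoints beyond the window is a choice made for this example only.

```python
from typing import Iterable, List


def step_model_lines(names: Iterable[str],
                     half_breakpoints: Iterable[int],
                     window: int) -> List[str]:
    """Build one step model line per distance model name.

    Each line has three columns: the model name, a placeholder column
    (ignored by the program but required), and the comma-separated breakpoints.
    """
    # Keep distinct positive breakpoints that fall inside the window.
    positive = sorted(b for b in set(half_breakpoints) if 0 < b <= window)
    # Mirror them around 0 so that both sides of the anchor are covered.
    breakpoints = [-b for b in reversed(positive)] + [0] + positive
    joined = ",".join(str(b) for b in breakpoints)
    return [f"{name} -1 {joined}" for name in names]


# Reproduce the lines of hm_step_model.txt used in this example.
for line in step_model_lines(["TSS", "TES"],
                             [500, 1000, 5000, 10000, 50000, 100000],
                             window=100000):
    print(line)
```

The printed lines can be redirected into hm_step_model.txt. The names passed to the helper should be the same as those in the first column of hm_snp_dist.txt.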