The eqmr-dbget program queries a database created by the eqmr-db program. The result of a query is stored in a plain-text file. Before running the first query, an index must be created: this index is stored in a binary file and speeds up all subsequent queries.
|--create-index||create an index for fast retrieval|
|--qnorm||Gaussian quantile normalize expression levels within each population|
|--allelic-dose||provide genotype data as allelic dose|
|-d||--data||a previously created database (binary file with the .db extension)|
|-n||--sample||a single sample name or a sample file to extract|
|-p||--pop||the name of a population|
|-s||--snp||a single SNP name or a SNP file to extract|
|-g||--gene||a single gene name or a gene name file to extract|
|-p||--probe||a single probe name or a probe name file to extract|
|-i||--index||the index (created via --create-index)|
|-o||--output||the output stem|
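To make the --qnorm description more concrete, here is a minimal sketch of a rank-based Gaussian quantile normalization applied within each population. It only illustrates the general technique: the exact procedure used by eqmr-dbget (rank offset, handling of ties or missing values) is not documented here, so the function names, the (rank - 0.5) / n offset and the tie-breaking rule are all assumptions.

```python
from statistics import NormalDist


def quantile_normalize(levels):
    """Map values to standard normal quantiles according to their rank.

    Ties are broken by input order (an assumption, not eqmr-dbget behaviour).
    """
    n = len(levels)
    order = sorted(range(n), key=lambda i: levels[i])  # stable sort
    normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # The 0.5 offset keeps every quantile strictly inside (0, 1).
        out[i] = normal.inv_cdf((rank - 0.5) / n)
    return out


def quantile_normalize_by_population(levels, populations):
    """Apply quantile_normalize separately to each population label."""
    groups = {}
    for i, pop in enumerate(populations):
        groups.setdefault(pop, []).append(i)
    result = [0.0] * len(levels)
    for indexes in groups.values():
        normalized = quantile_normalize([levels[i] for i in indexes])
        for i, value in zip(indexes, normalized):
            result[i] = value
    return result
```

Normalizing within each population means that two samples with the same rank in their respective populations receive the same normalized value, whatever the original scale of the measurements.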
For each kind of feature that can be queried (snp, sample, gene and probe), the program creates a file with the suffix _<feature>.txt. For example, with the --snp option, the program creates a file output_snp.txt, where output is the value of the --output option. These files are described below.
- With the --snp option, the first five columns of the output file are the name of the chromosome on which the given SNP resides, the name of that SNP, the position of that SNP, the "0" allele of that SNP, and the "1" allele of that SNP. The genotypes for that SNP follow, in the same order as they are stored in the database, unless the --sample option is used. In that case, only the genotypes of the given samples are displayed, and their order matches the order provided via --sample.
- With the --sample option, the output file contains the sample information for the subset of samples given on the command line.
- With the --gene option, the output file details the structure of each gene. Each record starts with ’>’ followed by the name of the gene, its strand, its minimal starting position, its maximal ending position, its number of transcripts, its number of distinct exons, its number of probes, and finally its number of distinct probe x transcript sets. This line is followed by:
- the list of the distinct exons. Each exon is printed on a new line, starting with the uppercase keyword ’EXON’, followed by the index of the exon within that gene, its starting position, its ending position and the list of transcript indexes (separated by a comma ’,’) to which this exon belongs.
- the list of the distinct transcripts. Each transcript is printed on a new line starting with the uppercase keyword ’TRANSCRIPT’, followed by the index of the transcript within that gene, the name of the transcript, its starting site position, its ending site position, its CDS start position, its CDS end position, its number of exons and the list of the corresponding exon indexes (separated by a comma ’,’) that define this transcript.
- the list of the distinct probes that fall into that gene. Each probe is printed on a new line starting with the uppercase keyword ’PROBE’, followed by the index of the probe within that gene, the name of the probe, its (mapping) strand, the number of exons the probe spans (1 if within an exon or 2 if the probe lies on an exon-exon boundary), its starting positions separated by a comma, its ending positions separated by a comma, and the indexes of the exons the probe targets. The expression levels of the probe follow, in the same order as they are stored in the database, unless the --sample option is used. In that case, only the expression levels of the given samples are displayed, and their order matches the order provided via --sample.
- With the --probe option, the output file provides the details of the given probes. The format is the same as for the probe record above, except that the exon indexes are not provided.
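The gene record layout described above can be read back with a few lines of Python. The sketch below is only an illustration under stated assumptions: fields are taken to be whitespace-separated, expression levels are taken to be plain numbers, and exon indexes on a PROBE line are taken to be comma-separated. The function names and the sample record (GENE_A, TX_A, PROBE_A and all coordinates) are invented for the example and do not come from a real database.

```python
def int_list(text):
    """Turn a comma-separated field such as '0,1' into [0, 1]."""
    return [int(item) for item in text.split(",") if item]


def parse_gene_records(lines):
    """Parse gene records into a list of dictionaries.

    Assumes whitespace-separated fields (not confirmed by the manual).
    """
    genes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line; accepts both ">NAME ..." and "> NAME ...".
            f = line[1:].split()
            genes.append({
                "name": f[0], "strand": f[1],
                "start": int(f[2]), "end": int(f[3]),
                "n_transcripts": int(f[4]), "n_exons": int(f[5]),
                "n_probes": int(f[6]), "n_probe_transcript_sets": int(f[7]),
                "exons": [], "transcripts": [], "probes": [],
            })
            continue
        if not genes:
            raise ValueError("feature line found before any '>' header")
        f = line.split()
        gene = genes[-1]
        if f[0] == "EXON":
            gene["exons"].append({
                "index": int(f[1]), "start": int(f[2]), "end": int(f[3]),
                "transcripts": int_list(f[4]),
            })
        elif f[0] == "TRANSCRIPT":
            gene["transcripts"].append({
                "index": int(f[1]), "name": f[2],
                "start": int(f[3]), "end": int(f[4]),
                "cds_start": int(f[5]), "cds_end": int(f[6]),
                "n_exons": int(f[7]), "exons": int_list(f[8]),
            })
        elif f[0] == "PROBE":
            gene["probes"].append({
                "index": int(f[1]), "name": f[2], "strand": f[3],
                "n_exons": int(f[4]),
                "starts": int_list(f[5]), "ends": int_list(f[6]),
                "exons": int_list(f[7]),
                # Everything after the exon indexes is an expression level.
                "expression": [float(x) for x in f[8:]],
            })
    return genes


# Invented record, used only to exercise the parser.
SAMPLE = """\
>GENE_A + 1000 5000 1 2 1 1
EXON 0 1000 1200 0
EXON 1 4800 5000 0
TRANSCRIPT 0 TX_A 1000 5000 1100 4900 2 0,1
PROBE 0 PROBE_A + 2 1190,4800 1200,4810 0,1 7.5 8.25
"""

genes = parse_gene_records(SAMPLE.splitlines())
print(genes[0]["name"], len(genes[0]["exons"]), genes[0]["probes"][0]["expression"])
```

For the file produced by --probe, the same PROBE branch would need adjusting, since the exon indexes are absent from that format.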
A short help message is available with the following command:
$ eqmr-dbget --help
eqmr-dbget - version 2.1
Copyright (C) 2008,2009 Jean-Baptiste Veyrieras (University of Chicago)
eqmr-dbget comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions.
--help Display a brief help on program usage
--verbose Output message on standard output to see what the program is doing
--create-index Create an index for fast retrieval
--qnorm Gaussian quantile normalize expression levels within each population
--allelic-dose Provides genotype data as allelic dose
--data or -d The input database file
--sample or -n A single sample name or a sample file to extract
--pop or -p The population name
--snp or -s A single SNP name or a SNP file to extract
--gene or -g A single gene name or a gene name file to extract
--probe or -p A single probe name or a probe name file to extract
--index or -i The fast index (created via --create-index)
--output or -i The output stem
Before querying the database, we need to build the index:
$ eqmr-dbget -d eqmr.db --create-index -o eqmr.dbx --verbose
Then, for instance, we may want to get all the information on a given gene (output in file myquery_gene.txt):
$ eqmr-dbget -d eqmr.db -i eqmr.dbx -g ENSG00000212875 -o myquery
Alternatively, we may want information on 3 particular SNPs (output in file myquery_snp.txt):
$ cat subset_of_SNPs.txt
rs17160620
rs2905037
rs12124819
$ eqmr-dbget -d eqmr.db -i eqmr.dbx -s subset_of_SNPs.txt -o myquery
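A file such as myquery_snp.txt could then be loaded along the following lines. This is a sketch, assuming one SNP per line, whitespace-separated columns, an integer position and no header line; none of these details is confirmed by the format description. Genotypes are kept as strings because their encoding (which may change with --allelic-dose) is not specified. The function names and the test line are hypothetical.

```python
def parse_snp_line(line):
    """Split one SNP line into its five descriptive columns and the genotypes."""
    fields = line.split()
    return {
        "chromosome": fields[0],
        "name": fields[1],
        "position": int(fields[2]),  # assumed to be an integer coordinate
        "allele0": fields[3],        # the "0" allele
        "allele1": fields[4],        # the "1" allele
        "genotypes": fields[5:],     # kept as strings: encoding not specified
    }


def read_snp_file(path):
    """Read every non-empty line of an eqmr-dbget SNP output file."""
    with open(path) as handle:
        return [parse_snp_line(line) for line in handle if line.strip()]


# Invented line, used only to show the resulting structure.
record = parse_snp_line("chr1 SNP_A 12345 A G 0 1 2")
print(record["name"], record["genotypes"])
```

When --sample is used, the genotypes list follows the order of the samples given to --sample, so it can be zipped directly with that list of names.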
And of course, we can combine several options:
$ eqmr-dbget -d eqmr.db -i eqmr.dbx -g ENSG00000212875 -s subset_of_SNPs.txt -o myquery
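When such queries are launched from a script, the command line and the names of the expected output files can be assembled programmatically. The helper below is a hypothetical sketch, not part of eqmr-dbget: build_query and run_query are invented names, and the code uses only the long options listed in the table above together with the _<feature>.txt naming rule.

```python
import subprocess

# Long options from the option table, keyed by feature name.
FEATURE_FLAGS = {
    "snp": "--snp",
    "sample": "--sample",
    "gene": "--gene",
    "probe": "--probe",
}


def build_query(database, index, stem, **features):
    """Return the eqmr-dbget command and the output files it should create."""
    command = ["eqmr-dbget", "--data", database, "--index", index, "--output", stem]
    outputs = []
    for feature, value in features.items():
        command += [FEATURE_FLAGS[feature], value]
        # One output file per queried feature: <stem>_<feature>.txt
        outputs.append(f"{stem}_{feature}.txt")
    return command, outputs


def run_query(command):
    """Run the query; requires eqmr-dbget to be installed and on the PATH."""
    subprocess.run(command, check=True)


command, outputs = build_query(
    "eqmr.db", "eqmr.dbx", "myquery",
    gene="ENSG00000212875", snp="subset_of_SNPs.txt",
)
print(" ".join(command))
print(outputs)
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names, and check=True raises an exception if the program exits with a non-zero status.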