<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Thanks, this is helpful. We will look into making the resulting file an annotation file for GEMINI.
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Nov 10, 2015, at 2:06 AM, Will McLaren <<a href="mailto:wm2@ebi.ac.uk" class="">wm2@ebi.ac.uk</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">Hi Jessica,
<div class=""><br class="">
</div>
<div class="">Here's a way you can get the phenotype names in VEP without me writing any new code!</div>
<div class=""><br class="">
</div>
<div class="">1) Create a BED file from Ensembl's database of phenotype annotations. This will take a few minutes:</div>
<div class=""><br class="">
</div>
<div class="">mysql -hensembldb.ensembl.org -uanonymous -Dhomo_sapiens_variation_82_38 -e'select
sr.name, pf.seq_region_start-1,pf.seq_region_end, concat("\"", concat_ws(":", pf.type,
s.name, pf.object_id, p.description), "\"") from seq_region sr, source s, phenotype p, phenotype_feature pf where pf.seq_region_id = sr.seq_region_id and pf.source_id = s.source_id and pf.phenotype_id = p.phenotype_id'
| grep -v concat_ws | sort -k1,1 -k2,2n -k3,3n | bgzip -c > phenotypes.bed.gz<br class="">
</div>
<div class=""><br class="">
</div>
<div class="">2) Index it with tabix:</div>
<div class=""><br class="">
</div>
<div class="">tabix -p bed phenotypes.bed.gz</div>
<div class=""><br class="">
</div>
<div class="">3) Use it as a custom annotation in VEP (see <a href="http://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html" class="">
http://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html</a>):</div>
<div class=""><br class="">
</div>
<div class="">echo "rs533747784" | perl variant_effect_predictor.pl -force -cache -custom phenotypes.bed.gz,phenotypes,bed,overlap<br class="">
</div>
<div class=""><br class="">
</div>
<div class="">Then have fun parsing the output :-)</div>
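<div class=""><br class="">
</div>
<div class="">For the parsing itself, something along these lines might be a starting point. It is only a sketch, and it rests on assumptions that should be checked against real output: that the annotation appears in the Extra column under the key "phenotypes" (the short name given to -custom above), that overlapping entries are comma-separated, and that each entry keeps the double quotes and the type:source:object_id:description layout produced by the MySQL query. The function names are made up for illustration. Note that concat_ws skips NULL values, so an entry with a missing description will have fewer than four parts; the sketch simply skips those. A description that itself contains a double quote would also confuse the splitter.</div>
<pre class="">
```python
def split_outside_quotes(text, sep):
    """Split text on sep, ignoring any sep found inside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def phenotypes_from_extra(extra, key="phenotypes"):
    """Return one dict per phenotype entry stored under key in an Extra string."""
    entries = []
    for pair in split_outside_quotes(extra, ";"):
        name, _, value = pair.partition("=")
        if name != key:
            continue
        for item in split_outside_quotes(value, ","):
            # Each BED name was built as "type:source:object_id:description";
            # maxsplit=3 keeps colons inside the description intact.
            fields = item.strip().strip('"').split(":", 3)
            if len(fields) != 4:
                continue  # skip anything that does not match the expected shape
            entries.append(dict(zip(("type", "source", "object_id", "description"), fields)))
    return entries


# Hypothetical Extra value, made up for illustration only.
extra = 'IMPACT=MODIFIER;phenotypes="Gene:ExampleSource:ENSG00000001626:Example phenotype; with, punctuation"'
for entry in phenotypes_from_extra(extra):
    print(entry["source"], entry["object_id"], entry["description"])
```
</pre>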
<div class=""><br class="">
</div>
<div class="">You could change the MySQL query above to limit it to one source of data, or remove the structural variants for example (there are a lot of them and they tend to overlap a significant portion of the genome).</div>
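<div class=""><br class="">
</div>
<div class="">As a rough illustration of those two changes, the query could be assembled like this. It is a sketch under assumptions: build_phenotype_query is a made-up helper, the structural variant rows are assumed to carry the pf.type values "StructuralVariation" and "SupportingStructuralVariation" (worth confirming with a "select distinct type from phenotype_feature"), and the source name must match s.name exactly. The function only builds the SQL and its parameters; it is meant to be passed to a DB-API driver that uses %s placeholders, and the rows would still need sorting and bgzip compression as in step 1.</div>
<pre class="">
```python
# Same columns and joins as the query in step 1; start is shifted by one
# because BED uses 0-based start coordinates.
BASE_QUERY = """
SELECT sr.name,
       pf.seq_region_start - 1,
       pf.seq_region_end,
       CONCAT('"', CONCAT_WS(':', pf.type, s.name, pf.object_id, p.description), '"')
FROM seq_region sr, source s, phenotype p, phenotype_feature pf
WHERE pf.seq_region_id = sr.seq_region_id
  AND pf.source_id = s.source_id
  AND pf.phenotype_id = p.phenotype_id
""".strip()


def build_phenotype_query(source=None, exclude_types=()):
    """Return (sql, params) with optional source and feature-type filters."""
    clauses, params = [], []
    if source is not None:
        clauses.append("AND s.name = %s")
        params.append(source)
    if exclude_types:
        placeholders = ", ".join(["%s"] * len(exclude_types))
        clauses.append(f"AND pf.type NOT IN ({placeholders})")
        params.extend(exclude_types)
    sql = BASE_QUERY
    if clauses:
        sql = sql + "\n  " + "\n  ".join(clauses)
    return sql, tuple(params)


# Assumed type names; verify them against the database before relying on this.
sql, params = build_phenotype_query(
    source="ClinVar",
    exclude_types=("StructuralVariation", "SupportingStructuralVariation"),
)
print(sql)
print(params)
```
</pre>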
<div class=""><br class="">
</div>
<div class="">Regards</div>
<div class=""><br class="">
</div>
<div class="">Will</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
</div>
<div class="gmail_extra"><br class="">
<div class="gmail_quote">On 9 November 2015 at 19:45, JESSICA X. CHONG <span dir="ltr" class="">
<<a href="mailto:jxchong@uw.edu" target="_blank" class="">jxchong@uw.edu</a>></span> wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word" class="">Could you consider adding an option to output the phenotype names? When I’m dealing with a small number of samples/variants, which is most of the time, it’s much more useful to see the phenotype names in the output file than to see a “1” and then search each of the individual phenotype sources (OMIM, ClinVar, DDG2P, etc.) to figure out which database has a phenotype for the gene, only to find that the phenotype might not even be relevant.
<div class=""><br class="">
</div>
<div class="">The GEMINI developers were considering adding this as a feature (<a href="https://github.com/arq5x/gemini/issues/571" target="_blank" class="">https://github.com/arq5x/gemini/issues/571</a>) but obviously it’s much better to rely on VEP to do
this when VEP is so close to already having this implemented.</div>
<div class=""><br class="">
</div>
<div class="">Thanks again,</div>
<div class="">Jessica
<div class="">
<div class="h5"><br class="">
<div class=""><br class="">
</div>
<div class=""><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Nov 9, 2015, at 9:41 AM, JESSICA X. CHONG <<a href="mailto:jxchong@uw.edu" target="_blank" class="">jxchong@uw.edu</a>> wrote:</div>
<br class="">
<div class=""><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important" class="">Hi
Will,</span>
<div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class="">
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Nov 9, 2015, at 1:44 AM, Will McLaren <<a href="mailto:wm2@ebi.ac.uk" target="_blank" class="">wm2@ebi.ac.uk</a>> wrote:</div>
<br class="">
<div class="">
<div dir="ltr" class="">Hi Jessica,
<div class="">
<div class="gmail_extra"><br class="">
<div class="gmail_quote">On 9 November 2015 at 00:20, Jessica Chong<span class=""> </span><span dir="ltr" class=""><<a href="mailto:jxchong@uw.edu" target="_blank" class="">jxchong@uw.edu</a>></span><span class=""> </span>wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
I am trying to use the --gene_phenotype option in VEP v82 but I am having problems.<br class="">
<br class="">
1) if a variant is in a gene that is associated with a particular phenotype (e.g. CFTR and cystic fibrosis), where does this information get stored in the resulting annotated vcf? I don’t see a corresponding field name listed as a possible extras column output
on this page<span class=""> </span><a href="http://uswest.ensembl.org/info/docs/tools/vep/vep_formats.html#output" rel="noreferrer" target="_blank" class="">http://uswest.ensembl.org/info/docs/tools/vep/vep_formats.html#output</a></blockquote>
<div class=""><br class="">
</div>
<div class="">The phenotype name does not get stored in the output. Firstly, many phenotype names are long and contain odd characters that can break file format encoding. Secondly, many genes and/or variants have several phenotypes associated with them, so
to list all of them (and multiple times in the case where you are reporting e.g. the same gene multiple times) would cause the output file size to increase hugely.</div>
<div class=""><br class="">
</div>
<div class="">Example for CFTR: <a href="http://www.ensembl.org/Homo_sapiens/Gene/Phenotype?g=ENSG00000001626" target="_blank" class="">http://www.ensembl.org/Homo_sapiens/Gene/Phenotype?g=ENSG00000001626</a></div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Ok, I see. Thanks.</div>
<br class="">
<blockquote type="cite" class="">
<div dir="ltr" class="">
<div class="">
<div class="gmail_extra">
<div class="gmail_quote">
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br class="">
<br class="">
2) if a variant itself is associated with a particular “phenotype, disease, or trait” then my understanding from the VEP output documentation is that I should expect this information to show up under the PHENO field?<br class="">
</blockquote>
<div class=""><br class="">
</div>
<div class="">I don't think this is stated anywhere in the documentation. <a href="http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#output" target="_blank" class="">http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#output</a><span class=""> </span>shows
the output fields and their descriptions.</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">
<div class="">Ahhh, ok. Because GENE_PHENO isn’t listed in the VEP documentation as a possible output field, the closest I could find was “PHENO,” so I was just guessing.</div>
<div class=""><br class="">
</div>
</div>
<div class=""><br class="">
</div>
<br class="">
<blockquote type="cite" class="">
<div dir="ltr" class="">
<div class="">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
I tried annotating a tiny vcf that just includes a variant in CFTR (and the variant is dF508, so it is definitely pathogenic and CFTR should certainly be associated with a phenotype as well on the gene level) but PHENO is always blank (and I don’t see any field
mentioning cystic fibrosis as a phenotype/disease name).<br class="">
</blockquote>
<div class=""><br class="">
</div>
<div class="">Apologies, there seems to be an issue with the v82 caches in that they are missing the necessary information to populate the fields for --gene_phenotype. However, the field you want is GENE_PHENO, which should show a binary flag indicating whether
the gene is associated with a phenotype.</div>
<div class=""><br class="">
</div>
<div class="">I will look into why this information is missing.</div>
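<div class=""><br class="">
</div>
<div class="">Once the flag is populated, reading it back out of the VCF could look roughly like the sketch below. It assumes that GENE_PHENO has been added to the --fields list so that it appears in the CSQ Format string, and that a value of "1" marks a gene with an associated phenotype. The function names are made up for illustration. The field order is taken from the CSQ header line rather than hard-coded, so the same code works whatever --fields was used.</div>
<pre class="">
```python
def csq_field_names(header_line):
    """Extract the ordered CSQ field names from the CSQ INFO header line."""
    marker = "Format: "
    start = header_line.index(marker) + len(marker)
    fmt = header_line[start:]
    # The names run up to the closing quote of the Description; also accept a
    # curly quote in case the header has passed through an email client.
    for quote in ('"', "\u201d"):
        fmt = fmt.split(quote)[0]
    return fmt.strip().split("|")


def csq_records(info, names):
    """Return one dict per CSQ entry (one per overlapping feature) in an INFO string."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "CSQ":
            return [dict(zip(names, entry.split("|"))) for entry in value.split(",")]
    return []


def genes_with_phenotype(records):
    """List the distinct gene symbols whose GENE_PHENO flag is set to '1'."""
    symbols = []
    for record in records:
        if record.get("GENE_PHENO") == "1" and record.get("SYMBOL") not in symbols:
            symbols.append(record.get("SYMBOL"))
    return symbols
```
</pre>
<div class="">Typical use would be to call csq_field_names once on the header line, then csq_records on the INFO column of each variant line.</div>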
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Thanks!</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="">
<div class="gmail_extra">
<div class="gmail_quote">
<div class=""><br class="">
</div>
<div class="">Regards</div>
<div class=""><br class="">
</div>
<div class="">Will McLaren</div>
<div class="">Ensembl Variation</div>
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br class="">
<br class="">
Here is what I ran:<br class="">
perl variant_effect_predictor/variant_effect_predictor.pl \<br class="">
-i CFTR.VT.vcf \<br class="">
-o CFTR.VT.VEP.vcf \<br class="">
--vcf --offline --cache \<br class="">
--dir_cache variant_effect_predictor/cache/ \<br class="">
--species homo_sapiens --assembly GRCh37 \<br class="">
--fasta Homo_sapiens_assembly19.fasta \<br class="">
--fork 8 --force_overwrite \<br class="">
--compress 'gunzip -c' \<br class="">
--sift b --polyphen b --symbol --numbers --biotype \<br class="">
--total_length --canonical --ccds --hgvs --shift_hgvs 1 --gene_phenotype \<br class="">
--fields Consequence,Codons,Amino_acids,Gene,SYMBOL,Feature,EXON,PolyPhen,SIFT,Protein_position,BIOTYPE,CANONICAL,CCDS,HGVSc,HGVSp,PHENO<br class="">
<br class="">
<br class="">
The resulting vcf contains these lines:<br class="">
##VEP=v82 cache=/ensembl-tools/82/Linux/RHEL6/x86_64/variant_effect_predictor/cache//homo_sapiens/82_GRCh37 db=. polyphen=2.2.2 sift=sift5.2.2 COSMIC=71 ESP=20141103 gencode=GENCODE 19 HGMD-PUBLIC=20152 genebuild=2011-04 regbuild=13 assembly=GRCh37.p13 dbSNP=144
ClinVar=201507<br class="">
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE|CANONICAL|CCDS|HGVSc|HGVSp|PHENO"><br class="">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1A3 sample1A4 sample1A5 sample1A6 sample1A7 sample1A8 sample1A3 sample1A4 sample1A5 sample1A6 sample1A7 sample1A8<br class="">
7 117199644 rs199826652 ATCT A 3996.41 PASS AC=4;AF=0.023;AN=24;BaseQRankSum=-0.941;ClippingRankSum=-0.033;DB;DP=7203;FS=2.556;InbreedingCoeff=-0.0257;MLEAC=5;MLEAF=0.023;MQ0=0;MQ=60.36;MQRankSum=0.129;QD=21.37;ReadPosRankSum=0.673;SOR=0.921;VQSLOD=1.51;culprit=SOR;CSQ=downstream_gene_variant|||ENSG00000232661|AC000111.3|ENST00000441019|||||antisense|YES||||,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000426809|10/26|||477-478/1438|protein_coding|||ENST00000426809.1:c.1431_1433delCTT|ENSP00000389119.1:p.Phe478del|,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000454343|10/26|||446-447/1419|protein_coding|||ENST00000454343.1:c.1338_1340delCTT|ENSP00000403677.1:p.Phe447del|,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000003084|11/27|||507-508/1480|protein_coding|YES|CCDS5773.1|ENST00000003084.6:c.1521_1523delCTT|ENSP00000003084.6:p.Phe508del|,upstream_gene_variant|||ENSG00000001626|CFTR|ENST00000472848|||||processed_transcript|||||
GT:AD:DP:GQ:PL 0/0:4,0:4:9:0,9,135 0/0:10,0:10:24:0,24,360 0/0:3,0:3:5:0,5,86 0/1:10,12:22:99:441,0,474 0/1:9,18:27:99:729,0,424 0/0:23,0:23:63:0,63,945 0/0:22,0:22:66:0,66,714 0/0:33,0:33:84:0,84,1096 0/0:22,0:22:62:0,62,736
0/1:17,14:31:99:537,0,881 0/1:13,16:29:99:620,0,632 0/0:24,0:24:60:0,60,791<br class="">
<br class="">
<br class="">
Thanks!<br class="">
_______________________________________________<br class="">
Dev mailing list <span class=""> </span><a href="mailto:Dev@ensembl.org" target="_blank" class="">Dev@ensembl.org</a><br class="">
Posting guidelines and subscribe/unsubscribe info:<span class=""> </span><a href="http://lists.ensembl.org/mailman/listinfo/dev" rel="noreferrer" target="_blank" class="">http://lists.ensembl.org/mailman/listinfo/dev</a><br class="">
Ensembl Blog:<span class=""> </span><a href="http://www.ensembl.info/" rel="noreferrer" target="_blank" class="">http://www.ensembl.info/</a><br class="">
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
</div>
<br class="">
<br class="">
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>