<div dir="ltr">Hi Nicolas,<div><br></div><div>Apologies for the delay in replying to this.</div><div><br></div><div>The <a href="http://gtf2vep.pl">gtf2vep.pl</a> parser at the moment is somewhat limited, and it has very strict expectations on the format of the data coming in.</div><div><br></div><div>One of these is that the exon and CDS lines appear in _transcript_ order (i.e. 5' -> 3'), even if this means the first exon listed for a transcript in a gene has a genomic position larger (more 3' on the genome) than the last.</div><div><br></div><div>If I change the order of your input to the following (the start_codon and stop_codon lines are actually ignored by the parser):</div><div><br></div><div><div>3<span class="" style="white-space:pre"> </span>protein_coding<span class="" style="white-space:pre"> </span>exon<span class="" style="white-space:pre"> </span>113077591<span class="" style="white-space:pre"> </span>113077946<span class="" style="white-space:pre"> </span>.<span class="" style="white-space:pre"> </span>-<span class="" style="white-space:pre"> </span>.<span class="" style="white-space:pre"> </span>gene_id toto; transcript_id toto_a; exon_number 1;</div><div>3<span class="" style="white-space:pre"> </span>protein_coding<span class="" style="white-space:pre"> </span>CDS<span class="" style="white-space:pre"> </span>113077591<span class="" style="white-space:pre"> </span>113077746<span class="" style="white-space:pre"> </span>.<span class="" style="white-space:pre"> </span>-<span class="" style="white-space:pre"> </span>0<span class="" style="white-space:pre"> </span>gene_id toto; transcript_id toto_a; product_id toto_a; exon_number 1;</div><div>3<span class="" style="white-space:pre"> </span>protein_coding<span class="" style="white-space:pre"> </span>exon<span class="" style="white-space:pre"> </span>113063155<span class="" style="white-space:pre"> </span>113063561<span class="" style="white-space:pre"> </span>.<span class="" style="white-space:pre"> 
</span>-<span class="" style="white-space:pre"> </span>.<span class="" style="white-space:pre"> </span>gene_id toto; transcript_id toto_a; exon_number 2;</div><div>3<span class="" style="white-space:pre"> </span>protein_coding<span class="" style="white-space:pre"> </span>CDS<span class="" style="white-space:pre"> </span>113063355<span class="" style="white-space:pre"> </span>113063561<span class="" style="white-space:pre"> </span>.<span class="" style="white-space:pre"> </span>-<span class="" style="white-space:pre"> </span>0<span class="" style="white-space:pre"> </span>gene_id toto; transcript_id toto_a; product_id toto_a; exon_number 2;</div></div><div><br></div><div>it works OK, and I get the expected stop_gained consequence for your input variant:</div><div><br></div><div><div>#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra</div><div>3_113063450_G/A 3:113063450 A toto toto_a Transcript stop_gained 468 268 90 R/* Cga/Tga - STRAND=-1</div></div><div><br></div><div>The parser is something we intend to revisit and improve sometime soon, so thanks for your patience in the meantime!</div><div><br></div><div>Regards</div><div><br></div><div>Will McLaren</div><div>Ensembl Variation</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 1 October 2014 12:08, Nicolas Thierry-Mieg <span dir="ltr"><<a href="mailto:Nicolas.Thierry-Mieg@imag.fr" target="_blank">Nicolas.Thierry-Mieg@imag.fr</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello developers,<br>
<br>
we would like to use VEP with custom transcript definitions. To this end we are building a VEP cache, following instructions found here:<br>
<a href="http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html" target="_blank">http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html</a><br>
<br>
For some transcripts things seem to work fine, but for others the SNP effects reported by VEP are incorrect.<br>
<br>
We have constructed a small example that exhibits the problem. I am attaching the GFF but also copying it inline, in case attachments don't work on this ML:<br>
3 protein_coding exon 113063155 113063561 . - . gene_id toto; transcript_id toto_a; exon_number 2;<br>
3 protein_coding stop_codon 113063352 113063354 . - . gene_id toto; transcript_id toto_a; product_id toto_a;<br>
3 protein_coding CDS 113063355 113063561 . - 0 gene_id toto; transcript_id toto_a; product_id toto_a; exon_number 2;<br>
3 protein_coding exon 113077591 113077946 . - . gene_id toto; transcript_id toto_a; exon_number 1;<br>
3 protein_coding CDS 113077591 113077746 . - 0 gene_id toto; transcript_id toto_a; product_id toto_a; exon_number 1;<br>
3 protein_coding start_codon 113077744 113077746 . - . gene_id toto; transcript_id toto_a; product_id toto_a;<br>
<br>
This is a (fabricated) two-exon transcript on the reverse strand of chromosome 3, using hg19 coordinates.<br>
I have checked and re-checked: the two CDS features do represent a valid CDS (no in-frame stop codons).<br>
For reference, the CDS sequences from exons 1 and 2 (hg19, chrom3, reverse strand) are respectively:<br>
atgaacagatcttcttatctgcaggattttgatgtagattcccaaatccgtgctgagatgcacagaaaaacagctttcaaaattcaacaagtggaaaaggaattagcttgggaaaaagagaaacatgaactcggcctaatgaagctaaagaatcgg<br>
agatttcgagatccactggaaagtgatactattgtggttcatgccatactgagtgaccacaagatatcctcttacaggctggtgcagccctctaagtactccaaattcaaacgagctagtcaatcagagagaaaaccaagcaaattggacaggtttgaaaaagagggacctggaagaaaggacagccagagagatgcaggtagccta<br>
<br>
<br>
Using this GFF, I build the cache with:<br>
bgzip -c transcript.gff > transcript.gff.gz<br>
tabix -p gff transcript.gff.gz<br>
perl <a href="http://gtf2vep.pl" target="_blank">gtf2vep.pl</a> -i transcript.gff -f hs.chrom_3.fasta --dir cache/ -d 1 -s homo_sapiens<br>
<br>
<br>
I then have a single SNV in a VCF file:<br>
#CHROM POS ID REF ALT QUAL FILTER INFO<br>
3 113063450 . G A . PASS<br>
<br>
This SNV falls within the CDS and is a nonsense mutation (stop_gained). However, when I run VEP, it is reported as a "3_prime_UTR_variant".<br>
<br>
<br>
Specifically, I run VEP with:<br>
perl <a href="http://variant_effect_predictor.pl" target="_blank">variant_effect_predictor.pl</a> --custom transcript.gff.gz,MyGene,gff,overlap,0 --offline -i snp.vcf --vcf --dir_cache cache/ --cache_version 1<br>
<br>
I obtain:<br>
##VEP=v76 cache=cache//homo_sapiens/1 db=.<br>
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND"><br>
##INFO=<ID=MyGene,Number=.,Type=String,Description="transcript.gff.gz (overlap)"><br>
#CHROM POS ID REF ALT QUAL FILTER INFO<br>
3 113063450 . G A . PASS CSQ=A|toto|toto_a|Transcript|3_prime_UTR_variant|468|||||||-1;MyGene=CDS_3:113063355-113063561,exon_3:113063155-113063561<br>
<br>
<br>
<br>
This is with the latest release of VEP (76).<br>
<br>
<br>
Please let me know if more info is needed to debug this. It would be great if VEP could be used with custom annotations!<br>
<br>
<br>
Regards,<br>
Nicolas Thierry-Mieg<span class="HOEnZb"><font color="#888888"><br>
<br>
<br>
<br>
-- <br>
-----------------------------------------------------------<br>
Nicolas Thierry-Mieg<br>
Laboratoire TIMC-IMAG/BCM, CNRS UMR 5525<br>
Pavillon Taillefer, Faculte de Medecine<br>
38700 La Tronche, France<br>
tel: <a href="tel:%28%2B33%29456.520.067" value="+33456520067" target="_blank">(+33)456.520.067</a>, fax: <a href="tel:%28%2B33%29456.520.055" value="+33456520055" target="_blank">(+33)456.520.055</a><br>
------------------------------------------------------------<br>
<br>
</font></span><br>_______________________________________________<br>
Dev mailing list <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>
Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br>
<br></blockquote></div><br></div>
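<div>For anyone hitting the same ordering requirement, here is a minimal sketch of how the exon and CDS lines could be put into transcript (5' to 3') order before running the parser. It assumes a tab-separated GTF with one feature per line and a transcript_id attribute on every line. The names reorder_gtf, transcript_id and sort_key are hypothetical helpers written for this illustration; they are not part of gtf2vep.pl or VEP.</div>
<pre>
```python
import re
from collections import OrderedDict

# Within one exon, list the exon line before its CDS line, as in the
# reordered example in the reply. Other feature types sort after both.
FEATURE_RANK = {"exon": 0, "CDS": 1}


def transcript_id(attributes):
    """Extract the transcript_id value from a GTF attribute column."""
    match = re.search(r'transcript_id\s+"?([^";]+)"?', attributes)
    return match.group(1) if match else None


def sort_key(fields):
    """Order features 5' to 3' along the transcript.

    Forward strand: ascending start. Reverse strand: descending end,
    so the exon with the largest genomic coordinates comes first.
    """
    start, end, strand = int(fields[3]), int(fields[4]), fields[6]
    rank = FEATURE_RANK.get(fields[2], 2)
    position = -end if strand == "-" else start
    return (position, rank)


def reorder_gtf(lines):
    """Return GTF lines grouped by transcript, each in transcript order."""
    groups = OrderedDict()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        groups.setdefault(transcript_id(fields[8]), []).append(fields)
    ordered = []
    for records in groups.values():
        for fields in sorted(records, key=sort_key):
            ordered.append("\t".join(fields))
    return ordered
```
</pre>
<div>Applied to the six lines of the original toto_a example, this places exon 1 and its CDS (113077591 onwards) ahead of exon 2 and its CDS, matching the order shown in the reply. The start_codon and stop_codon lines are kept, since the parser ignores them.</div>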