<div dir="ltr">Hello Miguel,<div><br></div><div>Using a GTF file as a custom annotation will not affect the consequence types that VEP assigns. All VEP can do with a custom annotation file is report whether or not your input variants overlap any of the features in it. If there is an overlap, it should appear in the Extra column and look something like this:</div>
<div><br></div><div>myFeatures=[feature]_[chr]:<span style="font-family:arial,sans-serif;font-size:12.727272033691406px">[start]-[end]</span></div>
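<div><br></div><div>If you want to pull those hits out of the output afterwards, a minimal sketch along the following lines may help. It assumes default tab-delimited VEP output in which the Extra column is the last field and holds KEY=VALUE pairs separated by semicolons. The function name extract_custom_hits is hypothetical, and the key defaults to myFeatures, the short name used in your --custom option.</div>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: print the variant ID (column 1) and the value of one
# custom-annotation key found in the Extra column (the last tab-separated field).
# Assumes Extra is a ";"-separated list of KEY=VALUE pairs.
extract_custom_hits() {
  awk -v key="${1:-myFeatures}" 'BEGIN { FS = "\t" }
    /^#/ { next }                      # skip "##" header and "#Uploaded_variation" lines
    {
      n = split($NF, pairs, ";")
      for (i = 1; i <= n; i++)
        if (index(pairs[i], key "=") == 1)
          print $1 "\t" substr(pairs[i], length(key) + 2)
    }'
}
```
</pre>
<div>For example, extract_custom_hits myFeatures &lt; in.vcf.annot would list each variant that overlaps one of your GTF features, together with the reported feature.</div>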
<div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">It looks like you are doing the right thing; however, I think there is an issue with the way tabix indexes GTF files. I presume the example data you give is for a gene on the reverse strand, since start > end. Tabix doesn't seem to handle this (even though it indexes the file without complaint) and doesn't return any overlapping features. In GTF, start should never be greater than end, whatever the strand; the strand is given separately in column 7. You can get around this by swapping start and end for the reverse-strand features in your GTF file.</span></div>
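<div><br></div><div>A minimal sketch of that workaround, assuming your file is tab-delimited as GTF requires (the lines in your message appear space-separated, probably only because of the email). The function name swap_reversed_coords is hypothetical; the file names are the ones from your commands. After swapping, the file is re-sorted, compressed and indexed exactly as you did before.</div>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: for every GTF feature line whose start (column 4) is
# greater than its end (column 5), swap the two values so start comes first.
# Lines starting with "#" are passed through unchanged. Input must be tab-delimited.
swap_reversed_coords() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }
    ($4 + 0) > ($5 + 0) { tmp = $4; $4 = $5; $5 = tmp }
    { print }'
}

# Re-sort, compress and index as in the original commands (skipped if the
# input file is not present in the current directory).
if [ -f GFF_NDJ_TR.gtf ]; then
  swap_reversed_coords < GFF_NDJ_TR.gtf | sort -k1,1 -k4,4n -k5,5n | bgzip > GFF_NDJ_TR.gtf.gz
  tabix -p gff GFF_NDJ_TR.gtf.gz
fi
```
</pre>
<div>Before rerunning VEP, you can check that tabix now returns the features for a region, e.g. tabix GFF_NDJ_TR.gtf.gz 1:21476000-21478000.</div>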
<div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">It is also possible to create a VEP cache from a GTF file; that way you can annotate your variants with consequences based on the gene structures defined in the GTF file. See </span><font face="arial, sans-serif"><a href="http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gtf" target="_blank">http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gtf</a>. Note the formatting restrictions on the GTF described there; judging by the lines you pasted, you may need to reformat yours first.</font></div>
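<div><br></div><div>As a rough pre-check before building a cache, a sketch like the one below flags lines that break general GTF 2.2 conventions: nine tab-separated columns, integer coordinates with start not greater than end, a strand of +, - or ., and quoted gene_id and transcript_id attributes. These checks are assumptions based on the GTF format itself, not the exact list of VEP's restrictions, so please check the page linked above for those. The function name check_gtf is hypothetical.</div>
<pre>
```shell
#!/bin/sh
# Hypothetical pre-check for common GTF 2.2 problems. Prints one message per
# offending line and exits with status 1 if any line was flagged, 0 otherwise.
# Comment lines ("#") and empty lines are ignored.
check_gtf() {
  awk 'BEGIN { FS = "\t"; bad = 0 }
    /^#/ || NF == 0 { next }
    {
      msg = ""
      if (NF != 9) msg = msg " expected 9 tab-separated columns, found " NF ";"
      if ($4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/) msg = msg " non-integer coordinates;"
      else if (($4 + 0) > ($5 + 0)) msg = msg " start greater than end;"
      if ($7 !~ /^[-+.]$/) msg = msg " unexpected strand;"
      if ($9 !~ /gene_id "[^"]*";/) msg = msg " missing quoted gene_id;"
      if ($9 !~ /transcript_id "[^"]*";/) msg = msg " missing quoted transcript_id;"
      if (msg != "") { print "line " NR ":" msg; bad++ }
    }
    END { exit (bad > 0) }'
}
```
</pre>
<div>Run as check_gtf &lt; GFF_NDJ_TR.gtf on the lines you pasted (assuming they are tab-delimited in the real file), it should flag each of them for start greater than end and for missing quoted gene_id and transcript_id attributes.</div>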
<div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">Hope this helps</font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">Will McLaren</font></div>
<div><font face="arial, sans-serif">Ensembl Variation</font></div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On 17 September 2013 21:28, Miguel Perez-Enciso <span dir="ltr"><<a href="mailto:Miguel.Perez@uab.cat" target="_blank">Miguel.Perez@uab.cat</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
Dear developers,<br>
I am trying to use variant_effect_predictor.pl with a custom GTF file. I ran the commands below, but it seems that, although VEP reads my GTF file, it does not recognize the genes and reports every variant as intergenic. I have also included the first rows of my GTF file. I have installed the latest version (73). Could you help? Thanks a lot!<br>
Miguel Perez Enciso<br>
<br>
sort -k1,1 -k4,4n -k5,5n GFF_NDJ_TR.gtf | bgzip > GFF_NDJ_TR.gtf.gz<br>
tabix -p gff GFF_NDJ_TR.gtf.gz<br>
<br>
file=in.vcf<br>
<br>
perl variant_effect_predictor.pl --custom GFF_NDJ_TR.gtf.gz,myFeatures,gff,overlap,0 -i $file --species sus_scrofa -o $file.annot --format vcf --offline --force_overwrite --cache<br>
<br>
# gtf file<br>
1 protein_coding stop_codon 21476516 21476514 . - . gene_id GRM1-201<br>
1 protein_coding exon 21477448 21476379 . - . gene_id GRM1-201<br>
1 protein_coding CDS 21477448 21476517 . - . gene_id GRM1-201<br>
<br>
...<br>
<br>
The first lines of the output file are below (so VEP does read my annotation file):<br>
<br>
## ENSEMBL VARIANT EFFECT PREDICTOR v73<br>
## Output produced at 2013-09-17 22:04:13<br>
## Connected to<br>
## Using cache in /home/miguel/.vep/sus_scrofa/73<br>
## Using API version 73, DB version ?<br>
## Extra column keys:<br>
## DISTANCE : Shortest distance from variant to transcript<br>
## myFeatures : /home/miguel/Documents/ngs/GFF_NDJ_TR.gtf.gz (overlap)<br>
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra<br>
5_63894199_C/T 5:63894199 T - - - intergenic_variant - - - - - -<br>
<br>
_______________________________________________<br>
Dev mailing list <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br>
</blockquote></div><br></div>