<div dir="ltr">Hi Matteo,<div><br></div><div>I don't think this solution meets all of your requirements, but it may come close enough.</div><div><br></div><div>You can use the filter_vep.pl script [1] to filter on a list of transcripts (provided either in a file or as a comma-separated list on the command line) and to remove non-matching "blocks" of annotation from each VCF line.</div><div><br></div><div>After producing normal VEP output in my_vep_output.vcf (don't use --pick), the command would look something like:</div><div><br></div><div>perl filter_vep.pl -i my_vep_output.vcf -f "Feature in transcript_list.txt" -only_matched > my_vep_output_filtered.vcf</div><div><br></div><div>This will work to an extent, but it will still produce some VCF lines with more than one transcript annotated if your transcript_list.txt contains transcripts that overlap (e.g. transcripts from the same gene).</div><div><br></div><div>If you really want a VCF with one line per transcript, you could write a simple perl or awk script to post-process the output. Something like the following passes the header lines through unchanged and prints one line per annotation block:</div><div><br></div><div>perl -ne 'if (/^#/) { print; next } chomp; if (my ($pre, $key, $csq, $post) = /^(.*?[;\t])(CSQ|ANN)=([^;\s]+)(.*)$/) { print "$pre$key=$_$post\n" for split /,/, $csq } else { print "$_\n" }' my_vep_output.vcf > my_vep_output_split.vcf<br></div><div><br></div><div>You could then use the filter script as above to reduce the output to your transcripts of interest, or even simply use "grep -f transcript_list.txt" (though grep will also drop the VCF header lines).</div><div><br></div><div>Regards</div><div><br></div><div>Will McLaren</div><div>Ensembl Variation</div><div><br></div><div>[1] <a href="http://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html">http://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html</a></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 5 April 2016 at 09:26, Matteo <span dir="ltr"><<a href="mailto:matteodeg@gmail.com" target="_blank">matteodeg@gmail.com</a>></span> 
wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi there,<br>

<br>

I'm running the Variant Effect Predictor locally (release 83) and have a question for you.<br>

This is my command line:<br>

<br>

perl variant_effect_predictor.pl -i $path/20160317_Cardio_GATK_Filter.vcf -o $path/20160317_ANN_Cardio.vcf --stats_file $path/20160317_ANN_Cardio.html --cache --assembly GRCh37 --offline --force_overwrite -v --variant_class --sift b --poly b --vcf_info_field ANN --hgvs --protein --canonical --check_existing --gmaf --pubmed --species homo_sapiens --failed 1 --vcf --plugin LoFtool<br>

<br>

I'm trying to get VCF output with one line per transcript, because I'd like to filter the annotation afterwards using a list of transcripts like this:<br>

<br>

ENST00000299421<br>

ENST00000372980<br>

ENST00000310706<br>

ENST00000252321<br>

ENST00000315987<br>

<br>

With the --pick option I get only one transcript per position, but I'm looking for all transcripts split onto separate lines, like the .txt output but in VCF format, as in this example:<br>

<br>

chr1    25890050    .    A    G    20396.77    PASS AC=15;AF=0.625;AN=24;BaseQRankSum=3.48;ClippingRankSum=0.973;DP=909;ExcessHet=5.5287;FS=0.000;InbreedingCoeff=-0.2456;MLEAC=15;MLEAF=0.625;MQ=60.00;MQRankSum=0.167;QD=23.05;ReadPosRankSum=0.394;SOR=0.709;ANN=G|upstream_gene_variant|MODIFIER|LDLRAP1|ENSG00000157978|Transcript|ENST00000470950|processed_transcript||||||||||rs6661159|1|1032|1|SNV|HGNC|18640||||||G:0.4393|||| GT:AD:DP:GQ:PL    0/1:34,54:88:99:1817,0,946<br>

<br>

<br>

chr1    25890050    .    A    G    20396.77    PASS AC=15;AF=0.625;AN=24;BaseQRankSum=3.48;ClippingRankSum=0.973;DP=909;ExcessHet=5.5287;FS=0.000;InbreedingCoeff=-0.2456;MLEAC=15;MLEAF=0.625;MQ=60.00;MQRankSum=0.167;QD=23.05;ReadPosRankSum=0.394;SOR=0.709;ANN=G|intron_variant&non_coding_transcript_variant|MODIFIER|LDLRAP1|ENSG00000157978|Transcript|ENST00000484476|processed_transcript||1/3|ENST00000484476.1:n.339-102A>G|||||||rs6661159|1||1|SNV|HGNC|18640||||||G:0.4393|||| GT:AD:DP:GQ:PL    0/1:34,54:88:99:1817,0,946<br>

<br>

One last question: is it also possible to split the annotation of multiallelic sites, or would you suggest normalizing them first?<br>

<br>

Thank you for your help,<br>

<br>

Matteo<br>

<br>

_______________________________________________<br>

Dev mailing list    <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>

Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" rel="noreferrer" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>

Ensembl Blog: <a href="http://www.ensembl.info/" rel="noreferrer" target="_blank">http://www.ensembl.info/</a><br>

</blockquote></div><br></div>
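<div>As a rough sketch of the awk post-processing suggested in the reply (not an Ensembl tool), the hypothetical shell function split_vep_vcf below prints one VCF record per annotation block, keeps the header lines, and can optionally keep only the transcripts listed in a file. It assumes the annotations are in INFO/ANN (as with --vcf_info_field ANN; pass CSQ otherwise), that the file is tab-delimited, and that Feature is the 7th "|"-separated field, as in the default field order visible in the example lines above; that position changes if the annotation fields are customised.</div>

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch, not part of VEP):
#   split_vep_vcf [KEY] [TRANSCRIPT_LIST] < in.vcf > out.vcf
# Prints one record per comma-separated annotation block in INFO/KEY
# (default ANN, as with --vcf_info_field ANN). Header lines pass through.
# With TRANSCRIPT_LIST (one ID per line), only blocks whose 7th
# "|"-separated field (Feature, assumed default field order) is listed
# are printed, and records without annotation are dropped.
split_vep_vcf() {
  local key="${1:-ANN}" list="${2:-}"
  awk -v key="$key" -v list="$list" '
    BEGIN {
      FS = OFS = "\t"
      if (list != "") {
        while ((getline id < list) > 0) {
          gsub(/[ \r]/, "", id)            # tolerate stray spaces / CRLF
          if (id != "") keep[id] = 1
        }
        close(list)
      }
    }
    /^#/ { print; next }                   # meta-information and header lines
    {
      n = split($8, info, ";")             # INFO column, one entry per key
      hit = 0
      for (i = 1; i <= n; i++)
        if (index(info[i], key "=") == 1) { hit = i; break }
      if (!hit) { if (list == "") print; next }
      m = split(substr(info[hit], length(key) + 2), blocks, ",")
      for (j = 1; j <= m; j++) {
        if (list != "") {
          split(blocks[j], f, "[|]")
          if (!(f[7] in keep)) continue    # transcript not in the list
        }
        out = ""
        for (i = 1; i <= n; i++) {         # rebuild INFO with a single block
          val = (i == hit) ? key "=" blocks[j] : info[i]
          out = (i == 1) ? val : out ";" val
        }
        $8 = out                           # rebuilds the record with tabs
        print
      }
    }'
}
```

<div>For example: split_vep_vcf ANN transcript_list.txt < my_vep_output.vcf > my_vep_output_split.vcf. Unlike "grep -f", this matches the Feature field exactly and keeps the VCF header. For multiallelic records it only splits the annotation blocks: the ALT column still lists every allele, and the first (Allele) field of each block shows which one it refers to, so splitting such sites into biallelic records beforehand (for example with "bcftools norm -m -any") is one way to get a single ALT per line.</div>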