<div dir="ltr">Hi Cyriac,<div><br></div><div>--minimal does not reformat your VCF; instead, it uses the minimised alleles and positions when calculating consequences. Apologies if this is not clear in the documentation.</div><div><br></div><div>The VEP output for the first line (on GRCh37, using --minimal) is:</div><div><br></div><div>1 198498326 . ATATAT ATATATAT . . CSQ=AT|intron_variant|MODIFIER.....<br></div><div><br></div><div>compared to the output without --minimal:</div><div><br></div><div>1 198498326 . ATATAT ATATATAT . . CSQ=TATATAT|intron_variant|MODIFIER.....<br></div><div><br></div><div>so the variant has been reformatted internally to an insertion of AT (Ensembl treats this as REF="-", ALT="AT", see <a href="http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf">http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf</a>).</div><div><br></div><div>The reason we don't reformat the VCF is that each ALT is treated separately, which can result in a different effective POS for each ALT. This is quite common in the ExAC project data, which is in fact why we introduced the flag in its current form.</div><div><br></div><div>You can track which allele is which from the VCF input using --allele_number.</div><div><br></div><div>Regards</div><div><br></div><div>Will McLaren</div><div>Ensembl Variation</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 18 September 2015 at 23:34, Cyriac Kandoth <span dir="ltr">&lt;<a href="mailto:kandoth@cbio.mskcc.org" target="_blank">kandoth@cbio.mskcc.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">This might be limited to VCFs. I haven't tested it on other input formats. 
Here are 2 sample variants that I ran through VEP in offline cached mode:<div><br></div><div><div><font face="monospace, monospace">#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL</font></div><div><font face="monospace, monospace">1 198498326 . ATATAT ATATATAT . . . GT:AD:DP 0/1:10,10:20 0/0:30,5:35</font></div><div><font face="monospace, monospace">1 198498325 . AATATAT A,AATATATAT . . . GT:AD:DP 0/2:10,0,10:20 0/0:30,0,5:35</font><br></div></div><div><br></div><div>Here is the full command used:<br></div><div><font face="monospace, monospace">perl variant_effect_predictor.pl --species homo_sapiens --assembly GRCh37 --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --regulatory --canonical --protein --biotype --uniprot --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --check_alleles --check_ref --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length --dir &lt;vep_data&gt; --fasta &lt;ref_fasta&gt; --input_file &lt;input_vcf&gt; --output_file &lt;output_vcf&gt;</font><br></div><div><br></div><div>This is how I would expect the output VCF to be reformatted:</div><div><br></div><div><div><font face="monospace, monospace">#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL</font></div><div><font face="monospace, monospace">1 198498331 . T TAT . . . GT:AD:DP 0/1:10,10:20 0/0:30,5:35</font></div><div><font face="monospace, monospace">1 198498325 . AATATAT A,AATATATAT . . . GT:AD:DP 0/2:10,0,10:20 0/0:30,0,5:35</font></div></div><div><br></div><div>Thanks,</div><div>Cyriac</div></div>
<br>_______________________________________________<br>
Dev mailing list <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>
Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" rel="noreferrer" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
Ensembl Blog: <a href="http://www.ensembl.info/" rel="noreferrer" target="_blank">http://www.ensembl.info/</a><br>
<br></blockquote></div><br></div>
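<div><br></div><div>The per-ALT minimisation described in the reply above can be sketched in a few lines of Python. This is only an illustration, not VEP's actual code: the function names minimise and minimise_record are hypothetical, and the trimming order (shared leading bases first, then shared trailing bases) is an assumption of this sketch. An empty allele is written as "-", following the Ensembl convention mentioned in the reply. The returned position is that of the first remaining REF base; for a pure insertion it is the base immediately after the inserted sequence.</div><div><br></div>

```python
def minimise(pos, ref, alt):
    """Trim bases shared by REF and one ALT; return (pos, ref, alt).

    Assumption of this sketch: leading bases are trimmed first, then
    trailing bases. This is not taken from the VEP source code.
    """
    # Trim identical leading bases, shifting the position right each time.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    # Trim identical trailing bases; the start position is unchanged.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Write an empty allele as "-".
    return pos, ref or "-", alt or "-"


def minimise_record(pos, ref, alts):
    """Minimise each comma-separated ALT independently of the others."""
    return [minimise(pos, ref, alt) for alt in alts.split(",")]


if __name__ == "__main__":
    # The two records from the original post.
    print(minimise_record(198498326, "ATATAT", "ATATATAT"))
    print(minimise_record(198498325, "AATATAT", "A,AATATATAT"))
```

<div><br></div><div>Under these assumptions the first record reduces to an insertion of AT, and the two ALTs of the second record end up at different effective positions (198498326 for the deletion, 198498332 for the insertion). That is why a single rewritten POS/REF/ALT line cannot represent both minimised alleles.</div>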