<div dir="ltr">Hi Stuart,<div><br></div><div>"-" is not a valid character in the VCF ALT field, see section 1.4.1 in <a href="https://samtools.github.io/hts-specs/VCFv4.2.pdf">https://samtools.github.io/hts-specs/VCFv4.2.pdf</a></div><div><br></div><div>I'll add in a check to future versions, but unfortunately there will always be situations where some weird/wrong input will kill VEP.</div><div><br></div><div>You could pass your VCF through a validator before sending it to VEP?</div><div><br></div><div>Regards</div><div><br></div><div>Will McLaren</div><div>Ensembl Variation</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 24 May 2016 at 19:26, Stuart Watt <span dir="ltr"><<a href="mailto:morungos@gmail.com" target="_blank">morungos@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi all<div><br></div><div>I’ve hit an issue with some invalid VarScan2 VCF files crashing VEP extremely fatally. 
A VCF that triggers this is:</div><div><br></div><div><blockquote type="cite"><div>##fileformat=VCFv4.1</div><div>##source=VarScan2</div><div>##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases"></div><div>##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation"></div><div>##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)"></div><div>##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value"></div><div>##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls"></div><div>##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls"></div><div>##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand"></div><div>##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position"></div><div>##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"></div><div>##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"></div><div>##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth"></div><div>##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)"></div><div>##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)"></div><div>##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency"></div><div>##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev"></div><div>#CHROM  POS     ID<span style="white-space:pre-wrap">     </span>REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  498_tissue</div><div>chr2    242814072<span style="white-space:pre-wrap"> </span>.<span style="white-space:pre-wrap">       
</span>TG<span style="white-space:pre-wrap">      </span>T<span style="white-space:pre-wrap">       </span>.<span style="white-space:pre-wrap">       </span>PASS    .<span style="white-space:pre-wrap">     </span>GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:34:34:0:0%:18,16,0,0<span style="white-space:pre-wrap">      </span>0/1:.:77:73:2:2.67%:35,38,1,1</div><div>chr3    239555  .<span style="white-space:pre-wrap">        </span>C<span style="white-space:pre-wrap">       </span>CT/-T   .<span style="white-space:pre-wrap">      </span>PASS    .<span style="white-space:pre-wrap">     </span>GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:77:29:19:39.58%:10,19,4,15        0/1:.:72:43:15:25.86%:19,24,4,11</div><div><br></div></blockquote><br></div><div>It’s the last line (the chr3 record) that triggers this. If the chr2 entry is missing, the file isn’t even detected as a VCF.</div><div><br></div><div>The error is:</div><div><br></div><div><blockquote type="cite"><div>MSG: start arg must be less than or equal to end arg + 1</div><div>STACK Bio::EnsEMBL::TranscriptMapper::genomic2cds /mnt/work1/software/vep/83/Bio/EnsEMBL/TranscriptMapper.pm:397</div><div>STACK Bio::EnsEMBL::Variation::BaseTranscriptVariation::cds_coords /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/BaseTranscriptVariation.pm:325</div><div>STACK Bio::EnsEMBL::Variation::BaseVariationFeatureOverlapAllele::_pre_consequence_predicates /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/BaseVariationFeatureOverlapAllele.pm:393</div><div>STACK Bio::EnsEMBL::Variation::BaseVariationFeatureOverlapAllele::get_all_OverlapConsequences /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/BaseVariationFeatureOverlapAllele.pm:237</div><div>STACK Bio::EnsEMBL::Variation::Utils::VEP::tva_to_line /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:2568</div><div>STACK Bio::EnsEMBL::Variation::Utils::VEP::vfoa_to_line /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:2504</div><div>STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences 
/mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:2191</div><div>STACK Bio::EnsEMBL::Variation::Utils::VEP::rejoin_variants /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:1777</div><div>STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:1485</div><div>STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:1205</div><div>STACK main::main /mnt/work1/software/vep/83/variant_effect_predictor.pl:321</div><div>STACK toplevel /mnt/work1/software/vep/83/variant_effect_predictor.pl:148</div><div>Date (localtime)    = Tue May 24 13:59:39 2016</div><div>Ensembl API version = 83</div><div>---------------------------------------------------</div><div>ERROR: Forked process(es) died</div></blockquote><br></div><div>We’re still trying to figure out the VarScan issue, but this shouldn’t really take out an entire VEP run. Even the fact that this one line stops VEP from recognizing the input as VCF is, I’d say, less than ideal, as the file contains about 7000 other valid records.</div><div><br></div><div>All the best</div><div>Stuart</div><div><br><div>
<div><div>—</div><div><font color="#4f0500"><b>Stuart Watt, PhD</b></font></div><div><font color="#4f0500">Scientific Research Associate, Princess Margaret Cancer Centre</font></div><div><font color="#4f0500">MaRS Centre, 101 College Street<br>Toronto Medical Discovery Tower, Room 9-302<br>Toronto, Ontario, Canada M5G 1L7</font></div><div><font color="#4f0500"><a href="mailto:stuart.watt@uhnresearch.ca" target="_blank">stuart.watt@uhnresearch.ca</a><br><a href="tel:416-634-8816" value="+14166348816" target="_blank">416-634-8816</a></font></div></div>
</div>
<br></div></div><br>_______________________________________________<br>
Dev mailing list    <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>
Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" rel="noreferrer" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
Ensembl Blog: <a href="http://www.ensembl.info/" rel="noreferrer" target="_blank">http://www.ensembl.info/</a><br>
<br></blockquote></div><br></div>
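<div>A minimal sketch of the kind of pre-check suggested above, for anyone hitting the same crash. It is not part of VEP and not a full VCF validator: it assumes a tab-separated file and a simplified reading of the ALT rules in section 1.4.1 of the VCFv4.2 spec (a base string of A, C, G, T, N, a lone <code>*</code>, a symbolic allele such as <code>&lt;DEL&gt;</code>, a loosely matched breakend string, or <code>.</code> for missing). The function names and the script name are hypothetical. Records with an ALT such as <code>CT/-T</code> are set aside so the remaining records can still be annotated.</div>

```python
import re
import sys
from typing import Iterable, List, Tuple

# Simplified reading of the ALT rules in VCFv4.2 section 1.4.1 (an assumption,
# not a complete validator). Each comma-separated allele must be one of:
BASES = re.compile(r"^(?:[ACGTN]+|\*)$", re.IGNORECASE)  # base string or lone *
SYMBOLIC = re.compile(r"^<[^<>]+>$")  # symbolic allele such as <DEL>
# Loose approximation of mate breakends such as G]17:198982] or [17:198983[A;
# single breakends (e.g. ".A") are not handled and will be reported as bad.
BREAKEND = re.compile(r"^[ACGTN]*[\[\]][^\[\]]+[\[\]][ACGTN]*$", re.IGNORECASE)


def bad_alt_alleles(alt: str) -> List[str]:
    """Return the alleles in an ALT value that fail the simplified rules."""
    if alt == ".":  # missing value: no alternate allele
        return []
    return [
        allele
        for allele in alt.split(",")
        if not (BASES.match(allele) or SYMBOLIC.match(allele) or BREAKEND.match(allele))
    ]


def split_vcf(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split VCF lines into (kept, rejected) based on the ALT column."""
    kept: List[str] = []
    rejected: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            kept.append(line)  # meta-information and header lines pass through
            continue
        fields = line.rstrip("\n").split("\t")
        # A record needs the 8 fixed columns; ALT is the 5th (index 4).
        if len(fields) < 8 or bad_alt_alleles(fields[4]):
            rejected.append(line)
        else:
            kept.append(line)
    return kept, rejected


# Only runs as a command-line filter when given exactly one file path.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        kept, rejected = split_vcf(handle)
    sys.stdout.writelines(kept)
    for line in rejected:
        print("skipped:", line.rstrip("\n"), file=sys.stderr)
```

<div>For example, <code>python check_alt.py input.vcf &gt; clean.vcf</code> (hypothetical script name) writes header lines and passing records to stdout and lists skipped records on stderr. Setting bad records aside, rather than stopping at the first one, keeps a single malformed line from blocking the other records in the file.</div>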