<div> </div>
<div>Hi Reece,</div>
<div> </div>
<div>Thanks for the detailed bug report. The fix you suggest has no unwanted side effects, so it has been added to CVS for branch 71.</div>
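<div> </div>
<div>For anyone following the thread, here is a minimal sketch of what the change amounts to. The values are made up so that the output matches the reported string. In particular, it is an assumption that $ref_pep_first already carries the residue position and that {end} is empty for these variants. This is an illustration, not the committed code:</div>

```perl
use strict;
use warnings;

# Hypothetical values, chosen only so the output matches the report.
# Assumption: $ref_pep_first already includes the residue position,
# and {end} is empty for this single-residue delins.
my $ref_pep_first = 'Lys159';
my %hgvs_notation = (
    start => 159,
    end   => '',
    type  => 'delins',
    alt   => 'IleX',
);

# Concatenation as on the reported line: the position appears twice.
my $before = $ref_pep_first . $hgvs_notation{start} . $hgvs_notation{end}
           . $hgvs_notation{type} . $hgvs_notation{alt};

# Same expression with {start} dropped, as suggested in the report.
my $after = $ref_pep_first . $hgvs_notation{end}
          . $hgvs_notation{type} . $hgvs_notation{alt};

print "before: p.$before\n";   # p.Lys159159delinsIleX
print "after:  p.$after\n";    # p.Lys159delinsIleX
```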
<div> </div>
<div>Best wishes,<br></div>
<div>Sarah<br></div>
<div class="gmail_quote">On Tue, May 14, 2013 at 5:48 AM, Reece Hart <span dir="ltr">&lt;<a href="mailto:reece@harts.net" target="_blank">reece@harts.net</a>&gt;</span> wrote:<br>
<blockquote style="BORDER-LEFT:#ccc 1px solid;MARGIN:0px 0px 0px 0.8ex;PADDING-LEFT:1ex" class="gmail_quote">
<div dir="ltr">Hi- 
<div><br></div>
<div>VEP 2.8 and VEP 71 appear to have a bug in which the start coordinate in protein-level HGVSp output is repeated for certain delins variants.</div>
<div><br></div>
<div>
<div>For example, with this VCF line as input:</div>
<div>  3<span style="WHITE-SPACE:pre-wrap"> </span>10191482<span style="WHITE-SPACE:pre-wrap"> </span>CVID1003553<span style="WHITE-SPACE:pre-wrap"> </span>A<span style="WHITE-SPACE:pre-wrap"> </span>ATTT<span style="WHITE-SPACE:pre-wrap"> </span>60<span style="WHITE-SPACE:pre-wrap"> </span>PASS</div>

<div>VEP 2.3 correctly returns <br></div>
<div>
<div>
<div>  HGVSp=ENSP00000256474.2:p.Lys159delinsIleX</div>
<div>  HGVSp=ENSP00000344757.2:p.Lys118delinsIleX</div>
<div>for two transcripts, whereas VEP 2.8 and VEP 71 each return</div></div></div></div>
<div>  HGVSp=ENSP00000256474.2:p.Lys159159delinsIleX<br></div>
<div>
<div>  HGVSp=ENSP00000344757.2:p.Lys118118delinsIleX</div>
<div>(two coding transcripts)</div>
<div><br></div></div>
<div>Notice that the start coordinates, 159 and 118, are repeated in VEP 2.8 and 71.</div>
<div><br></div>
<div><br></div>
<div>More examples:</div>
<div>
<div>5<span style="WHITE-SPACE:pre-wrap"> </span>112175315<span style="WHITE-SPACE:pre-wrap"> </span>CVID1010109<span style="WHITE-SPACE:pre-wrap"> </span>T<span style="WHITE-SPACE:pre-wrap"> </span>TAAA<span style="WHITE-SPACE:pre-wrap"> </span>60<span style="WHITE-SPACE:pre-wrap"> </span>PASS<br>
</div>
<div>9<span style="WHITE-SPACE:pre-wrap"> </span>98238379<span style="WHITE-SPACE:pre-wrap"> </span>CVID6007616<span style="WHITE-SPACE:pre-wrap"> </span>AT<span style="WHITE-SPACE:pre-wrap"> </span>ACTGCTGC<span style="WHITE-SPACE:pre-wrap"> </span>60<span style="WHITE-SPACE:pre-wrap"> </span>PASS</div>

<div>10<span style="WHITE-SPACE:pre-wrap"> </span>43610045<span style="WHITE-SPACE:pre-wrap"> </span>CVID4000412<span style="WHITE-SPACE:pre-wrap"> </span>AG<span style="WHITE-SPACE:pre-wrap"> </span>ATTCT<span style="WHITE-SPACE:pre-wrap"> </span>60<span style="WHITE-SPACE:pre-wrap"> </span>PASS</div>

<div>10<span style="WHITE-SPACE:pre-wrap"> </span>89711990<span style="WHITE-SPACE:pre-wrap"> </span>CVID4000640<span style="WHITE-SPACE:pre-wrap"> </span>TTCC<span style="WHITE-SPACE:pre-wrap"> </span>TATAAAT<span style="WHITE-SPACE:pre-wrap"> </span>60<span style="WHITE-SPACE:pre-wrap"> </span>PASS</div>

<div>17<span style="WHITE-SPACE:pre-wrap"> </span>7578202<span style="WHITE-SPACE:pre-wrap"> </span>CVID6007473<span style="WHITE-SPACE:pre-wrap"> </span>AC<span style="WHITE-SPACE:pre-wrap"> </span>AACCA<span style="WHITE-SPACE:pre-wrap"> </span>60<span style="WHITE-SPACE:pre-wrap"> </span>PASS</div>

<div>17<span style="WHITE-SPACE:pre-wrap"> </span>78063675<span style="WHITE-SPACE:pre-wrap"> </span>CVID6004403<span style="WHITE-SPACE:pre-wrap"> </span>A<span style="WHITE-SPACE:pre-wrap"> </span>ATGT<span style="WHITE-SPACE:pre-wrap"> </span>60<span style="WHITE-SPACE:pre-wrap"> </span>PASS</div>

<div><br></div></div>
<div><br></div>
<div>The error appears to be in TranscriptVariationAllele.pm:794:<br></div>
<div><br></div>
<div>$hgvs_notation->{'hgvs'} .= $ref_pep_first . $hgvs_notation->{start} . $hgvs_notation->{end} . $hgvs_notation->{type} . $hgvs_notation->{alt} ;<br></div>
<div><br></div>
<div>Having both $ref_pep_first and $hgvs_notation->{start} has the effect of repeating the starting coordinate. Removing $hgvs_notation->{start} from the above line solves this problem for these cases, but I'm unsure that I fully understand the logic that is implemented in _get_hgvs_protein_format or the impact of this change on other cases.<span class="HOEnZb"><font color="#888888"><br>
</font></span></div><span class="HOEnZb"><font color="#888888">
<div><br></div>
<div><br></div>
<div>-Reece</div>
<div><br></div></font></span></div><br>_______________________________________________<br>Dev mailing list    <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br><br></blockquote></div><br>