<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body style='font-size: 10pt; font-family: Verdana,Geneva,sans-serif'>

<p>Hi ensembl-dev,</p>

<p>Sometimes the annotation I try to load has transcripts that don't translate to valid proteins. I then look at them in a genome viewer to get an idea of what's wrong, and it's helpful to know where to look.</p>

<p>I tried to work from the positions reported in the ProteinTranslation healthcheck log, until I realised they don't point at the stop codons. I think this code is wrong:</p>

<p><a href="https://github.com/Ensembl/ensj-healthcheck/blob/release/92/perl/Bio/EnsEMBL/Healthcheck/Translation.pm#L306">https://github.com/Ensembl/ensj-healthcheck/blob/release/92/perl/Bio/EnsEMBL/Healthcheck/Translation.pm#L306</a></p>

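<p>It takes the protein sequence (in the peptide alphabet), finds the indexes of '*', adds these to the transcript start (a coordinate in the DNA alphabet), and reports the results as the locations of stop codons. A peptide index counts amino acids, not bases, so it would have to be multiplied by three and mapped through the exon structure before it means anything on the genome.</p>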
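<p>To illustrate what I mean, here is a minimal Python sketch. It is not Ensembl code: the function names, the toy exon coordinates and the toy peptide are all made up, and it assumes 1-based inclusive exon coordinates listed in transcript order. It only contrasts the "transcript start plus peptide index" arithmetic, as I read the healthcheck, with the mapping I would expect.</p>

```python
def cdna_to_genomic(cdna_pos, exons, strand):
    """Map a 1-based cDNA position to a genomic coordinate.

    exons: (start, end) pairs, 1-based inclusive, in transcript order
    strand: 1 for forward, -1 for reverse
    """
    remaining = cdna_pos
    for start, end in exons:
        length = end - start + 1
        if length >= remaining:
            # The position falls inside this exon.
            if strand == 1:
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("cDNA position lies beyond the last exon")


def stop_codon_genomic_starts(peptide, exons, strand, cds_start_cdna):
    """Genomic coordinate of the first base of every '*' codon.

    cds_start_cdna is the (assumed) 1-based cDNA position of the first
    coding base. Only the first base is reported; a codon can still be
    split across an exon boundary.
    """
    positions = []
    for aa_index, residue in enumerate(peptide):
        if residue != "*":
            continue
        # Each amino acid consumes three bases of the coding sequence.
        cdna_pos = cds_start_cdna + 3 * aa_index
        positions.append(cdna_to_genomic(cdna_pos, exons, strand))
    return positions


def naive_positions(peptide, transcript_start):
    """What I believe the log reports: transcript start plus peptide index."""
    return [transcript_start + i for i, r in enumerate(peptide) if r == "*"]


# Toy transcript: two exons, coding sequence starting at cDNA base 4.
toy_exons = [(1000, 1011), (2000, 2020)]
toy_peptide = "MK*AL*"

naive = naive_positions(toy_peptide, 1000)
mapped = stop_codon_genomic_starts(toy_peptide, toy_exons, 1, 4)
print("naive: ", naive)
print("mapped:", mapped)
```

<p>On this toy transcript the two approaches disagree for both stops, and the second stop lands in the second exon, which the naive arithmetic can never reach.</p>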

<p>I currently have no good way of finding where the stop codons actually are. I have been translating the exons in all three phases, saving the translations to a file, and then text-searching for bits of the sequence around the '*'. Does Ensembl offer a better way that I couldn't find, or can you think of one?</p>

<p>Thanks,<br />Wojtek</p>


</body></html>