<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Reece,<div class=""><br class=""></div><div class="">Apologies for the long wait for a reply. I don’t believe this is as much of an issue as you may think. Ensembl &amp; GENCODE annotation is performed directly against the assembly in question; transcript models are not aligned to the genome afterwards. In the example you gave, NEFL was a processed transcript in GENCODE 19 [1]. In all GRCh38 annotations it became a protein-coding gene [2], thanks to the GRC's injection of KF495714.1 [3] into the assembly contig AC107373.4, which allowed us to annotate NEFL correctly.</div><div class=""><br class=""></div><div class="">The upshot is that GENCODE annotation reflects what occurs in the genome, so an alignment file would simply show the transcript models matching the genome exactly.</div><div class=""><br class=""></div><div class="">As for using accessions versus names, we have considered this previously. Whilst we understand the confusion that can occur when accessions are not used, a large number of users rely on these long-standing names. We plan to develop a tool that can convert the FTP dumps for a number of analyses, and changing names to accessions, where applicable, is a possible feature. 
I’m happy to keep you in the loop on further developments if you’d like.</div><div class=""><br class=""></div><div class="">Cheers</div><div class=""><br class=""></div><div class="">Andy</div><div class=""><br class=""></div><div class="">1 - <a href="http://grch37.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000104725;r=8:24808468-24814624;t=ENST00000221169" class="">http://grch37.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000104725;r=8:24808468-24814624;t=ENST00000221169</a></div><div class="">2 - <a href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000277586;r=8:24950955-24957110" class="">http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000277586;r=8:24950955-24957110</a></div><div class="">3 - <a href="http://www.ebi.ac.uk/ena/data/view/KF495714.1" class="">http://www.ebi.ac.uk/ena/data/view/KF495714.1</a></div><div class=""><br class=""><div apple-content-edited="true" class="">

<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">------------<br class="">Andrew Yates - Genomics Technology Infrastructure Team Leader<br class="">The European Bioinformatics Institute (EMBL-EBI)<br class="">Wellcome Trust Campus<br class="">Hinxton, Cambridge<br class="">CB10 1SD, United Kingdom<br class="">Tel: +44-(0)1223-492538<br class="">Fax: +44-(0)1223-494468<br class="">Skype: andrewyatz</div><div style="orphans: auto; text-align: start; text-indent: 0px; widows: auto; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><a href="http://www.ebi.ac.uk/" class="">http://www.ebi.ac.uk/</a><br class="">http://www.ensembl.org/</div></div></div>

</div>

<br class=""><div><blockquote type="cite" class=""><div class="">On 3 Aug 2015, at 07:12, Reece Hart <<a href="mailto:reece@harts.net" class="">reece@harts.net</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_extra"><br class=""><div class="gmail_quote">On Fri, Jul 31, 2015 at 2:38 AM, mag <span dir="ltr" class=""><<a href="mailto:mr6@ebi.ac.uk" target="_blank" class="">mr6@ebi.ac.uk</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000" class="">And a similar endpoint can provide you with transcript coordinates

translated to genomic coordinates, although the reverse is not available<br class="">
<a href="http://rest.ensembl.org/map/cdna/ENST00000288602/100..300?content-type=application/json" target="_blank" class="">http://rest.ensembl.org/map/cdna/ENST00000288602/100..300?content-type=application/json</a><br class="">
<br class="">
I am not sure if that covers your use case, so please don't hesitate to give us some feedback.<br class="">
The REST API is still in active development and we are always happy to get some feature requests that allow us to prioritise future
    endpoints.</div></blockquote></div><br class="">Hi Magali-</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">My primary goal is to build a table of the correspondence between exons in genomic and transcript coordinates so that we can map between g., c., and p. sequence coordinates. The mapping is trivial in most cases, of course, but indels and sequence errors are common enough that "most cases" isn't good enough for diagnostic purposes. (For example, NEFL contains an insertion of a single G in a poly-G tract in GRCh37 that leads to a frameshift; the RefSeq transcript is correct and doesn't have this pathology.) </div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">NCBI started providing files with exon-level correspondence in April. These are extremely useful. See <a href="ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/alignments/" class="">ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/alignments/</a> for examples.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">So, the request/suggestion is to provide a similar file or REST interface for ENST transcripts.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">While I'm at it, I would encourage Ensembl (everyone, actually) to use sequence accessions (like NC_000001.10) rather than chromosome names. Chromosome names are imprecise and unnecessarily create opportunities for errors. (It also makes it hard to even consider a world in which multiple assemblies are in one database schema.)</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">Thanks,</div><div class="gmail_extra">Reece</div><div class="gmail_extra"><br class=""></div></div>
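<div class="">A minimal Python sketch of the exon-level lookup described above, assuming the correspondence table is stored as ungapped alignment blocks with 1-based, inclusive coordinates. The names <code>AlignedBlock</code> and <code>tx_to_genomic</code> and the example coordinates are hypothetical, not an Ensembl or NCBI file format. The example mimics a genome that carries one extra G relative to the transcript (as in the NEFL/GRCh37 case), which splits a single exon into two blocks.</div>
<pre class="">
```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class AlignedBlock(NamedTuple):
    """One ungapped transcript-to-genome block (hypothetical layout, 1-based, inclusive)."""
    tx_start: int
    tx_end: int
    g_start: int
    g_end: int


def tx_to_genomic(blocks: List[AlignedBlock], strand: int, tx_pos: int) -> Optional[int]:
    """Map a transcript position to a genomic position.

    blocks must be sorted by tx_start; strand is 1 (plus) or -1 (minus).
    Returns None if tx_pos lies outside the alignment or in transcript
    sequence with no genomic counterpart (an insertion in the transcript).
    """
    starts = [b.tx_start for b in blocks]
    # Index of the last block starting at or before tx_pos.
    i = bisect_right(starts, tx_pos) - 1
    if i < 0 or tx_pos > blocks[i].tx_end:
        return None
    offset = tx_pos - blocks[i].tx_start
    if strand == 1:
        return blocks[i].g_start + offset
    # Minus strand: the first transcript base sits at the highest genomic coordinate.
    return blocks[i].g_end - offset


# Hypothetical example: genomic base 2050 is an extra G absent from the transcript.
blocks = [
    AlignedBlock(1, 100, 1000, 1099),
    AlignedBlock(101, 150, 2000, 2049),
    AlignedBlock(151, 300, 2051, 2200),
]
print(tx_to_genomic(blocks, 1, 151))  # 2051: the extra genomic G at 2050 is skipped
```
</pre>
<div class="">For HGVS c. positions inside the coding region, convert first with tx_pos = cds_start + c_pos - 1, where cds_start is the transcript position of the first base of the start codon; UTR (c.-N, c.*N) and intronic offsets need extra handling that is not shown here.</div>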
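<div class="">To illustrate the point about chromosome names versus sequence accessions, a minimal Python sketch that keys a lookup on (assembly, chromosome name). The table holds only four entries for illustration, and the helper name <code>to_accession</code> is hypothetical. The same name "8" resolves to different sequences in GRCh37 and GRCh38, which is exactly the ambiguity a versioned accession removes.</div>
<pre class="">
```python
from typing import Dict, Tuple

# RefSeq accessions for two human chromosomes in two assemblies (illustrative subset).
CHROM_TO_ACCESSION: Dict[Tuple[str, str], str] = {
    ("GRCh37", "1"): "NC_000001.10",
    ("GRCh38", "1"): "NC_000001.11",
    ("GRCh37", "8"): "NC_000008.10",
    ("GRCh38", "8"): "NC_000008.11",
}


def to_accession(assembly: str, chrom: str) -> str:
    """Resolve a chromosome name to a versioned accession; the name alone is ambiguous."""
    # Accept both "8" and "chr8" naming styles.
    name = chrom[3:] if chrom.startswith("chr") else chrom
    # Raises KeyError for an assembly/name pair not in the table.
    return CHROM_TO_ACCESSION[(assembly, name)]


# The NEFL start coordinate on GRCh37, written against an unambiguous sequence.
print(f"{to_accession('GRCh37', '8')}:24808468")
```
</pre>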

_______________________________________________<br class="">Dev mailing list    <a href="mailto:Dev@ensembl.org" class="">Dev@ensembl.org</a><br class="">Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" class="">http://lists.ensembl.org/mailman/listinfo/dev</a><br class="">Ensembl Blog: <a href="http://www.ensembl.info/" class="">http://www.ensembl.info/</a><br class=""></div></blockquote></div><br class=""></div></body></html>