<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Wolf,<br>
    <br>
    Just to confirm: this is now fixed in Ensembl release 92
    (GENCODE 28), which was released yesterday.<br>
    <br>
    Thank you for reporting that.<br>
    <br>
    Cheers,<br>
    Carlos<br>
    <br>
    <div class="moz-cite-prefix">On 01/12/17 16:22, Fergal wrote:<br>
    </div>
    <blockquote
      cite="mid:12C53CE4-8EF2-4C52-A90A-38D5218F3823@ebi.ac.uk"
      type="cite">
      Hi Wolf,
      <div><br>
      </div>
      <div>Yes, we should be able to apply the code change to
        stop ENST00000614349.4 being added to the readthrough for e92.</div>
      <div><br>
      </div>
      <div>Fergal.</div>
      <div><br>
        <div>
          <div>On 1 Dec 2017, at 15:56, Wolf Beat <<a
              moz-do-not-send="true" href="mailto:Beat.Wolf@hefr.ch">Beat.Wolf@hefr.ch</a>>
            wrote:</div>
          <br class="Apple-interchange-newline">
          <blockquote type="cite">
            <div style="font-size: 12px; font-style: normal;
              font-variant: normal; font-weight: normal; letter-spacing:
              normal; line-height: normal; orphans: auto; text-align:
              start; text-indent: 0px; text-transform: none;
              white-space: normal; widows: auto; word-spacing: 0px;
              -webkit-text-stroke-width: 0px;">Thank you very much for
              the detailed answer. Will this be fixed in a future
              release (specifically the ENST00000614349.4 transcript)?<br>
              <br>
              <br>
              In the meantime, I will go with the idea of filtering out
              readthrough genes (although I will have to read up a little
              more about what exactly it means, even if your explanation
              is already quite good). I will check whether I can do this
              through the BioMart interface.<br>
              <br>
              <br>
              Kind regards<br>
              <br>
              <br>
              Beat Wolf<br>
              <br>
              ________________________________<br>
              From: Dev <<a moz-do-not-send="true"
                href="mailto:dev-bounces@ensembl.org">dev-bounces@ensembl.org</a>>
              on behalf of Fergal <<a moz-do-not-send="true"
                href="mailto:fergal@ebi.ac.uk">fergal@ebi.ac.uk</a>><br>
              Sent: Friday, December 1, 2017 4:52:56 PM<br>
              To: Ensembl developers list<br>
              Subject: Re: [ensembl-dev] Probably duplicated human gene
              in latest release<br>
              <br>
              Hi Wolf,<br>
              <br>
              This is a rather complicated scenario. ENSG00000255292 is
              a readthrough gene. Readthrough genes are manually
              annotated by the Havana team and are made when there is
              some biological evidence of transcription of a single
              molecule that spans two distinct loci (in this case
              ENSG00000204370 and ENSG00000197580). This information can
              be seen on the gene summary via the annotation attribute
              “overlapping locus”, though admittedly this is not very
              obvious.<br>
              <br>
              While there is experimental evidence for this occurring,
              it is unclear if such events have any true biological
              meaning. In these instances the assigned biotype is often
              nonsense-mediated decay, to signify that the product is
              not viable.<br>
              <br>
              As ENSG00000204370 and ENSG00000197580 are two distinct
              genes and ENSG00000255292 represents a readthrough event
              between them, the records should not be merged. However,
              as you’ve noted, this scenario does pose challenges
              for mapping pipelines when it comes to naming and
              cross-referencing.<br>
              <br>
              One thing that does appear to have gone wrong is the
              inclusion of ENST00000614349.4 in the readthrough gene.
              This was added to the gene by our merge code, which
              decides whether to merge automatically annotated
              transcripts into manually curated genes based on exon
              overlap. ENST00000614349.4 had the most exon overlap with
              one of the transcripts in the readthrough gene and thus
              was merged in. We are going to add a rule to avoid merging
              protein-coding transcripts into readthrough genes, which
              should hopefully solve the issue in future releases.<br>
              <br>
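              To make the merge rule above concrete, here is a minimal
              sketch. It is not the actual Ensembl merge code: the
              function names, the dictionary layout and the toy
              coordinates are all hypothetical, and overlap is simply
              counted as shared exonic bases.<br>
              <br>
<pre>
```python
def overlap_bases(exons_a, exons_b):
    """Count bases shared by two exon lists of inclusive (start, end) pairs."""
    shared = 0
    for a_start, a_end in exons_a:
        for b_start, b_end in exons_b:
            shared += max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    return shared


def choose_gene(transcript, genes):
    """Return the id of the curated gene with the most exon overlap, or None."""
    scored = []
    for gene in genes:
        # Proposed rule: never merge a protein-coding transcript
        # into a readthrough gene.
        if gene["readthrough"] and transcript["biotype"] == "protein_coding":
            continue
        # Score a gene by its best-overlapping transcript.
        score = max(
            (overlap_bases(transcript["exons"], exons)
             for exons in gene["transcripts"]),
            default=0,
        )
        if score:
            scored.append((score, gene["id"]))
    return max(scored)[1] if scored else None


# Toy data (hypothetical coordinates, not real annotation).
transcript = {"biotype": "protein_coding", "exons": [(100, 200), (300, 400)]}
genes = [
    {"id": "readthrough_gene", "readthrough": True,
     "transcripts": [[(100, 200), (300, 400), (900, 1000)]]},
    {"id": "single_locus_gene", "readthrough": False,
     "transcripts": [[(100, 200), (350, 400)]]},
]
chosen = choose_gene(transcript, genes)
print(chosen)
```
</pre>
              In this toy case the readthrough gene has the larger raw
              overlap, so without the extra rule it would absorb the
              protein-coding transcript; with the rule it is skipped.<br>
              <br>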
              A workaround (depending on what you’re doing) is to just
              filter readthrough genes out of your analysis. You can
              generate a list of readthroughs via the following SQL:<br>
              <br>
              mysql -uanonymous -h ensembldb.ensembl.org
              homo_sapiens_core_90_38 -NB -e "select
              distinct(concat(gene.stable_id,'.',gene.version)) from
              gene join transcript using(gene_id) join transcript_attrib
              using(transcript_id) where value='readthrough'"<br>
              <br>
              This can also be done through the API by looking at the
              transcript attributes.<br>
              <br>
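              Once you have that list, the filtering itself can be
              sketched as follows. This is only an illustration: the
              function names are hypothetical, the list is assumed to
              be one "stable_id.version" per line (as the query above
              prints it), and the ".1" version suffix in the example is
              a placeholder.<br>
              <br>
<pre>
```python
def strip_version(stable_id):
    """Drop the '.version' suffix: 'ENSG00000255292.1' becomes 'ENSG00000255292'."""
    return stable_id.strip().split(".")[0]


def load_readthrough_ids(lines):
    """Build a set of unversioned gene IDs from lines of 'stable_id.version'."""
    return {strip_version(line) for line in lines if line.strip()}


def drop_readthrough(gene_ids, readthrough_ids):
    """Keep the input order, removing any gene whose ID is in readthrough_ids."""
    return [g for g in gene_ids if strip_version(g) not in readthrough_ids]


# Stand-in for the mysql output; the '.1' version suffix is a placeholder.
mysql_output = ["ENSG00000255292.1\n", "\n"]
readthrough_ids = load_readthrough_ids(mysql_output)

# The three gene IDs discussed in this thread.
genes = ["ENSG00000204370", "ENSG00000255292", "ENSG00000197580"]
kept = drop_readthrough(genes, readthrough_ids)
print(kept)
```
</pre>
              Comparing unversioned IDs means the filter keeps working
              when gene versions differ between the list and your own
              data.<br>
              <br>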
              Hope this helps,<br>
              <br>
              Fergal.<br>
              <br>
              <br>
              On 1 Dec 2017, at 14:56, Wolf Beat <<a
                moz-do-not-send="true" href="mailto:Beat.Wolf@hefr.ch">Beat.Wolf@hefr.ch</a>>
              wrote:<br>
              <br>
              Sorry, this is my fault. I was comparing all possible
              Ensembl versions and copied the link from the wrong tab.<br>
              <br>
              <br>
              So the correct links are:<br>
              <br>
              <br>
              <a moz-do-not-send="true"
href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013">http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013</a><br>
              <br>
<a class="moz-txt-link-freetext" href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000255292;r=11:112086824-112193805">http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000255292;r=11:112086824-112193805</a><br>
              <br>
              <br>
              Sorry for the incorrect links in my last email.<br>
              <br>
              ________________________________<br>
              From: Matthew Laird <a class="moz-txt-link-rfc2396E" href="mailto:lairdm@ebi.ac.uk"><lairdm@ebi.ac.uk></a><br>
              Sent: Friday, December 1, 2017 3:54:18 PM<br>
              To: Ensembl developers list; Wolf Beat<br>
              Subject: Re: [ensembl-dev] Probably duplicated human gene
              in latest release<br>
              <br>
              Hello Wolf,<br>
              <br>
              The latter link is for Ensembl release 75, which was the
              final release in which GRCh37 was used. So the records
              represented by those two links are for two different
              assemblies of the genome. Between that and the periodic
              updates that do happen to annotations between releases,
              it's not surprising the transcripts for the gene would be
              different. If you look in the gene history [1] page on the
              current release you can see the gene was updated and the
              version number incremented in Ensembl release 81.<br>
              <br>
              If I'm misunderstanding your question, please let me know
              and we can try to resolve it. Cheers.<br>
              <br>
              [1]<br>
<a class="moz-txt-link-freetext" href="http://www.ensembl.org/Homo_sapiens/Gene/Idhistory?db=core;g=ENSG00000204370;r=11:112086773-112120013">http://www.ensembl.org/Homo_sapiens/Gene/Idhistory?db=core;g=ENSG00000204370;r=11:112086773-112120013</a><br>
              <br>
              On 01/12/17 13:27, Wolf Beat wrote:<br>
              Hello,<br>
              <br>
              <br>
              I just noticed that the human gene SDHD appears to have
              two entries. It does not have the same transcripts in both
              entries, but at least one protein-coding transcript is
              present in both. The description, including the HGNC
              symbol, is also the same, which makes me think that this
              is some kind of error. Both entries should probably be
              merged. Here are the two entries for the same gene:<br>
              <br>
              <br>
<a class="moz-txt-link-freetext" href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013">http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013</a><br>
              <br>
              <br>
<a class="moz-txt-link-freetext" href="http://feb2014.archive.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:111957497-111990353">http://feb2014.archive.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:111957497-111990353</a><br>
              <br>
              <br>
              Kind regards<br>
              <br>
              <br>
              Beat Wolf<br>
              _______________________________________________<br>
              Dev mailing list    <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>
              Posting guidelines and subscribe/unsubscribe info:
              <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
              Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a><br>
              </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>