<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Wolf,<br>
<br>
Just to confirm that this is now fixed in Ensembl release 92
(GENCODE 28), which was released yesterday.<br>
<br>
Thank you for reporting that.<br>
<br>
Cheers,<br>
Carlos<br>
<br>
<div class="moz-cite-prefix">On 01/12/17 16:22, Fergal wrote:<br>
</div>
<blockquote
cite="mid:12C53CE4-8EF2-4C52-A90A-38D5218F3823@ebi.ac.uk"
type="cite">
Hi Wolf,
<div><br>
</div>
<div>Yes, we should be able to apply the code change to
stop ENST00000614349.4 from being added to the readthrough gene for e92.</div>
<div><br>
</div>
<div>Fergal.</div>
<div><br>
<div>
<div>On 1 Dec 2017, at 15:56, Wolf Beat &lt;<a
moz-do-not-send="true" href="mailto:Beat.Wolf@hefr.ch">Beat.Wolf@hefr.ch</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div style="font-size: 12px; font-style: normal;
font-variant: normal; font-weight: normal; letter-spacing:
normal; line-height: normal; orphans: auto; text-align:
start; text-indent: 0px; text-transform: none;
white-space: normal; widows: auto; word-spacing: 0px;
-webkit-text-stroke-width: 0px;">Thank you very much for
the detailed answer. Will this (specifically the
ENST00000614349.4 transcript) be fixed in a future release?<br>
<br>
<br>
In the meantime, I will go with the idea of filtering out
readthrough genes (although I will have to read up a little
more on what exactly the term means, even though your
explanation is already quite good). I will check whether I can
do this through the BioMart interface.<br>
<br>
<br>
Kind regards<br>
<br>
<br>
Beat Wolf<br>
<br>
________________________________<br>
From: Dev &lt;<a moz-do-not-send="true"
href="mailto:dev-bounces@ensembl.org">dev-bounces@ensembl.org</a>&gt;
on behalf of Fergal &lt;<a moz-do-not-send="true"
href="mailto:fergal@ebi.ac.uk">fergal@ebi.ac.uk</a>&gt;<br>
Sent: Friday, December 1, 2017 4:52:56 PM<br>
To: Ensembl developers list<br>
Subject: Re: [ensembl-dev] Probably duplicated human gene
in latest release<br>
<br>
Hi Wolf,<br>
<br>
This is a rather complicated scenario. ENSG00000255292 is
a readthrough gene. Readthrough genes are manually
annotated by the Havana team and are made when there is
some biological evidence of transcription of a single
molecule that spans two distinct loci (in this case
ENSG00000204370 and ENSG00000197580). This information can
be seen on the gene summary via the annotation attribute
“overlapping locus”, though admittedly this is not very
obvious.<br>
<br>
While there is experimental evidence for this occurring,
it is unclear if such events have any true biological
meaning. The biotype assigned in these instances is often
nonsense-mediated decay, to signify that the product is
not viable.<br>
<br>
As ENSG00000204370 and ENSG00000197580 are two distinct
genes and ENSG00000255292 represents a readthrough event
between them, the records should not be merged. However,
as you’ve noted, this scenario does pose challenges
for mapping pipelines when it comes to naming and
cross-referencing.<br>
<br>
One thing that does appear to have gone wrong is the
inclusion of ENST00000614349.4 in the readthrough gene.
This was added into the gene via our merge code, which
decides whether to merge automatically annotated
transcripts into manually curated genes based on exon
overlap. ENST00000614349.4 had the most exon overlap with
one of the transcripts in the readthrough gene and thus
was merged in. We are going to add a rule to avoid merging
protein-coding transcripts into readthrough genes, which
should hopefully solve the issue in future releases.<br>
<br>
A workaround (depending on what you’re doing) is to just
filter readthrough genes out of your analysis. You can
generate a list of readthroughs via the following SQL query:<br>
<br>
mysql -uanonymous -h ensembldb.ensembl.org
homo_sapiens_core_90_38 -NB -e "select
distinct(concat(gene.stable_id,'.',gene.version)) from
gene join transcript using(gene_id) join transcript_attrib
using(transcript_id) where value='readthrough'"<br>
<br>
This can also be done through the API by looking at the
transcript attributes.<br>
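<br>
As a minimal sketch of the filtering step, assuming the output of
the SQL above has been saved as one versioned stable ID per line,
something like the following Python could drop the readthrough
genes from a list of gene IDs. The function names and the example
version number are hypothetical and only for illustration:<br>
<pre>
```python
def strip_version(stable_id):
    """Return the stable ID without its trailing '.version' suffix, if any."""
    return stable_id.split(".", 1)[0]


def filter_readthrough(gene_ids, readthrough_ids):
    """Drop every gene whose unversioned stable ID is in the readthrough list."""
    blocked = {strip_version(rid.strip()) for rid in readthrough_ids if rid.strip()}
    return [gid for gid in gene_ids if strip_version(gid) not in blocked]


# Hypothetical input: the gene IDs discussed in this thread.
# The ".1" version suffix is a placeholder, not the real version.
readthrough = ["ENSG00000255292.1\n"]
genes = ["ENSG00000204370", "ENSG00000255292", "ENSG00000197580"]
kept = filter_readthrough(genes, readthrough)
print(kept)
```
</pre>
Versions are stripped before comparing, so versioned and
unversioned IDs still match each other.<br>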
<br>
Hope this helps,<br>
<br>
Fergal.<br>
<br>
<br>
On 1 Dec 2017, at 14:56, Wolf Beat &lt;<a
moz-do-not-send="true" href="mailto:Beat.Wolf@hefr.ch">Beat.Wolf@hefr.ch</a>&gt;
wrote:<br>
<br>
Sorry, this is my fault. I was comparing all possible
Ensembl versions and copied the link from the wrong tab.<br>
<br>
<br>
So the correct links are:<br>
<br>
<br>
<a moz-do-not-send="true"
href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013">http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013</a><br>
<br>
<a class="moz-txt-link-freetext" href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000255292;r=11:112086824-112193805">http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000255292;r=11:112086824-112193805</a><br>
<br>
<br>
Sorry for the incorrect links in my last email.<br>
<br>
________________________________<br>
From: Matthew Laird <a class="moz-txt-link-rfc2396E" href="mailto:lairdm@ebi.ac.uk">&lt;lairdm@ebi.ac.uk&gt;</a><br>
Sent: Friday, December 1, 2017 3:54:18 PM<br>
To: Ensembl developers list; Wolf Beat<br>
Subject: Re: [ensembl-dev] Probably duplicated human gene
in latest release<br>
<br>
Hello Wolf,<br>
<br>
The latter link is for Ensembl release 75, which was the
final release in which GRCh37 was used. So the records
represented by those two links are for two different
assemblies of the genome. Between that and the periodic
updates that do happen to annotations between releases, it's
not surprising that the transcripts for the gene would be
different. If you look at the gene history [1] page on the
current release, you can see that the gene was updated and the
version number incremented in Ensembl release 81.<br>
<br>
If I'm misunderstanding your question, please let me know
and we can try to resolve it. Cheers.<br>
<br>
[1]<br>
<a class="moz-txt-link-freetext" href="http://www.ensembl.org/Homo_sapiens/Gene/Idhistory?db=core;g=ENSG00000204370;r=11:112086773-112120013">http://www.ensembl.org/Homo_sapiens/Gene/Idhistory?db=core;g=ENSG00000204370;r=11:112086773-112120013</a><br>
<br>
On 01/12/17 13:27, Wolf Beat wrote:<br>
Hello,<br>
<br>
<br>
I just noticed that the human gene SDHD appears to have two
entries. The two entries do not have the same transcripts, but
at least one protein-coding transcript is present in both. The
description, including the HGNC symbol, is also the same, which
makes me think that this is some kind of error. Both
entries should probably be merged. Here are the two
entries for the same gene:<br>
<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013">http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013</a><br>
<br>
<br>
<a class="moz-txt-link-freetext" href="http://feb2014.archive.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:111957497-111990353">http://feb2014.archive.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:111957497-111990353</a><br>
<br>
<br>
Kind regards<br>
<br>
<br>
Beat Wolf<br>
_______________________________________________<br>
Dev mailing list <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>
Posting guidelines and subscribe/unsubscribe info:
<a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a><br>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>