<div dir="ltr">Sorry for the crazily formatted text... ;)<div><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><font style="background-color:rgb(255,255,255)" color="#999999">=========================<br> Duarte Miguel Paulo Molha <br></font><div><font style="background-color:rgb(255,255,255)" color="#999999"> <a href="http://about.me/duarte" target="_blank">http://about.me/duarte</a> <br>=========================</font></div></div></div>
<br><div class="gmail_quote">On 11 March 2015 at 16:32, Duarte Molha <span dir="ltr"><<a href="mailto:duartemolha@gmail.com" target="_blank">duartemolha@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks Magali <div><br></div><div>Are there any plans to correct the naming structure of your RefSeq imports?<div><br></div><div>Take as an example the transcript NM_001164603 associated with gene ASXL1</div><div><br></div><div>On your main database you have associated it with transcript ID <h1 style="color:rgb(51,51,51);margin:0px 0px 1em;font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif"><span style="color:rgb(85,85,85)"><font>ENST00000497249 <span style="font-weight:normal">that has a transcript length of 296 bp and 5 exons</span></font></span></h1><div><a href="http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000171456;r=20:32358811-32372242;t=ENST00000497249" target="_blank">ENST00000497249</a><br></div><div><br></div><h1 style="color:rgb(51,51,51);margin:0px 0px 1em;font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif"><span style="font-family:arial,sans-serif;font-weight:normal;color:rgb(34,34,34)"><font>However, the true RefSeq transcript is stored in your otherfeatures database as a novel transcript with the correct identifier</font></span></h1><div><a href="http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=otherfeatures;g=171023;r=20:32358344-32372549;t=NM_001164603.1" target="_blank">NM_001164603.1</a><br></div><div><br></div><div>It also contains 5 exons but has a transcript length of 1070 bp</div><div><br></div><div>However, </div><div><br></div><div>in that database you label the transcript with the correct RefSeq ID, but then at the exon level you use different identifiers for exons 2, 3 and 4 of this transcript!</div><div><br></div><div><a
href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32358294-32358882;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">NM_001164603.1.1</a><br></div><div><a href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32366334-32366516;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">XM_006723729.1.2</a><br></div><div><a href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32367677-32367779;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">XM_006723729.1.3</a><br></div><div><a href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32368965-32369173;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">XM_006723729.1.4</a><br></div><div><a href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32372114-32372599;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">NM_001164603.1.5</a><br></div><div><br></div><div><br></div><div><br></div><div>Many thanks</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Duarte</div><div><br></div><div><br></div></font></span></div></div><div class="gmail_extra"><span class=""><br clear="all"><div><div><font style="background-color:rgb(255,255,255)" color="#999999">=========================<br> Duarte Miguel Paulo Molha <br></font><div><font style="background-color:rgb(255,255,255)" color="#999999"> <a href="http://about.me/duarte" 
target="_blank">http://about.me/duarte</a> <br>=========================</font></div></div></div>
<br></span><div><div class="h5"><div class="gmail_quote">On 11 March 2015 at 09:35, mag <span dir="ltr"><<a href="mailto:mr6@ebi.ac.uk" target="_blank">mr6@ebi.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Duarte,<br>
<br>
I am not convinced all genes in Ensembl will have at least one
mapping to RefSeq, but your snippet of code should work regardless.<br>
<br>
<br>
Regards,<br>
Magali<span><br>
<br>
<div>On 10/03/2015 17:05, Duarte Molha
wrote:<br>
</div>
</span><blockquote type="cite">
<div dir="ltr"><span>Thanks ... I think I have understood
<div><br>
</div>
                <div>Just confirm one thing for me ... </div>
<div><br>
</div>
                <div>If I get all Ensembl transcripts of any given gene, at least
                  one of those transcripts will have a database mapping to
                  RefSeq, correct?</div>
<div><br>
</div>
<div>for example ... consider the code:</div>
<div><br>
</div>
<div>
<div><span style="white-space:pre-wrap">my $transcripts = $gene->get_all_Transcripts();
while ( my $transcript = shift @{$transcripts} ) {
    # RefSeq mRNA accessions mapped to this Ensembl transcript
    my %transcripts_refseq_ids = ();
    foreach my $dbe ( @{ $transcript->get_all_DBEntries() } ) {
        if ( $dbe->dbname() eq "RefSeq_mRNA" ) {
            $transcripts_refseq_ids{ $dbe->display_id() } = 1;
        }
    }
}</span></div>
</div>
<div><span style="white-space:pre-wrap"><br>
</span></div>
                <div><span style="white-space:pre-wrap">So, by cycling through all Ensembl transcripts of a gene and checking each one for a RefSeq mRNA entry, I should be able to pull out all transcripts that map to RefSeq. Correct?</span></div>
<div><span style="white-space:pre-wrap"><br>
</span></div>
</span><div><span style="white-space:pre-wrap">Thanks</span></div>
<div><span style="white-space:pre-wrap"><br>
</span></div>
<div><span style="white-space:pre-wrap">Duarte</span></div>
<div> </div>
</div>
<div class="gmail_extra"><span><br clear="all">
<div>
</div>
<br>
</span><div class="gmail_quote"><span>On 10 March 2015 at 16:20, mag <span dir="ltr"><<a href="mailto:mr6@ebi.ac.uk" target="_blank">mr6@ebi.ac.uk</a>></span>
wrote:<br>
</span><div><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hi Duarte,<br>
<br>
It is important to bear in mind that Ensembl and RefSeq
transcripts are different objects.<br>
<br>
There is a large overlap between the two resources, but
small differences in coding sequence and UTRs mean that
there is not always a one-to-one mapping between an
Ensembl transcript and a RefSeq transcript.<br>
This also means that an Ensembl transcript might overlap
some RefSeq exons, but not all.<br>
<br>
In your use-case however, you should be able to get the
information you want by replacing the following call:<br>
$gene->get_all_DBLinks( 'RefSeq_mRNA')<br>
with $transcript->get_all_DBEntries('RefSeq_mRNA')<br>
<br>
RefSeq_mRNA corresponds to RefSeq transcripts, which we
consequently map to Ensembl transcripts.<br>
With your current script, you are fetching all genes where
at least one transcript is mapped to a RefSeq transcript.<br>
Instead, you can directly fetch only the transcripts which
have a mapping to RefSeq.<br>
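                        <br>
                        A rough, untested sketch of that per-transcript lookup follows. The helper name refseq_mrna_by_transcript and the main driver are made up for illustration; the host and user are those of the public Ensembl MySQL server, and the driver assumes the Ensembl Perl API is installed.<br>
<pre>
use strict;
use warnings;

# Return { Ensembl transcript stable ID => [RefSeq mRNA accessions] } for one
# gene, keeping only transcripts with at least one RefSeq_mRNA xref.
sub refseq_mrna_by_transcript {
    my ($gene) = @_;
    my %mapped;
    foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
        # Passing the external database name restricts the xrefs returned.
        my @ids = map { $_->display_id() }
            @{ $transcript->get_all_DBEntries('RefSeq_mRNA') };
        $mapped{ $transcript->stable_id() } = [ sort @ids ] if @ids;
    }
    return \%mapped;
}

# Hypothetical driver: needs the Ensembl Perl API and network access.
sub main {
    my ($symbol) = @_;
    require Bio::EnsEMBL::Registry;
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $adaptor =
        Bio::EnsEMBL::Registry->get_adaptor( 'Human', 'Core', 'Gene' );
    foreach my $gene ( @{ $adaptor->fetch_all_by_external_name($symbol) } ) {
        my $mapped = refseq_mrna_by_transcript($gene);
        foreach my $enst ( sort keys %{$mapped} ) {
            print join( "\t", $symbol, $enst, @{ $mapped->{$enst} } ), "\n";
        }
    }
    return;
}

# Only contact the server when a gene symbol is given on the command line.
main( $ARGV[0] ) if @ARGV;
</pre>
                        Run with a gene symbol such as ASXL1 as the only argument, this should print one line per Ensembl transcript that has a RefSeq mRNA mapping; transcripts without one are simply skipped.<br>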
<br>
<br>
Hope that helps,<br>
Magali<span><br>
<br>
<div>On 10/03/2015 15:30, Duarte Molha wrote:<br>
</div>
</span>
<blockquote type="cite"><span>
                              <div dir="ltr">Thanks Kieron
<div><br>
</div>
<div>But this still leaves me with a question.</div>
<div><br>
</div>
                                <div>Say that I have a gene, and I retrieve the
                                  correct gene object from the Ensembl database. How
                                  can I output only the transcripts that are
                                  referenced in RefSeq, if not the way I have done
                                  it?</div>
<div><br>
</div>
<div>If I go the normal way, the
$gene->get_all_Transcripts(); method will
retrieve all ensembl transcripts. How can I limit
it to only get transcripts that are refseq?</div>
<div><br>
</div>
<div>Thanks</div>
<div><br>
</div>
<div>Duarte</div>
</div>
</span>
<div class="gmail_extra"><span><br clear="all">
<div>
</div>
<br>
</span>
<div class="gmail_quote"><span>On 10 March
2015 at 15:22, Kieron Taylor <span dir="ltr"><<a href="mailto:ktaylor@ebi.ac.uk" target="_blank">ktaylor@ebi.ac.uk</a>></span>
wrote:<br>
</span>
<div>
<div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Duarte,<br>
<br>
The issue you have exposed is subtle. You seem
to be printing “exon stable IDs” but expecting
them to be RefSeq accessions. Our mistake was
to use the RefSeq IDs as arbitrary identifiers
                                      for internal use, but I must stress that what
Ensembl calls a Stable ID must never be
assumed to have any meaning outside of an
Ensembl database. What you want are display
labels. The exon labels were generated by
picking only the first of any possible RefSeq
IDs, hence you cannot get everything you want
in this way.<br>
<br>
The correct way to handle this in your code is
to fetch the transcript name and print that in
each exon, as RefSeq IDs refer to transcripts
and not exons.<br>
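                                      <br>
                                      As a rough, untested sketch of that approach: the helper name exon_rows_by_transcript and the main driver are hypothetical, and it is an assumption that external_name() or, failing that, stable_id() carries the RefSeq accession for transcripts in the otherfeatures import, and that the gene symbol lookup resolves there.<br>
<pre>
use strict;
use warnings;

# Build one tab-separated row per exon of every transcript of a gene.
# Each row is labelled with its own transcript's name plus the exon rank,
# so an exon shared by two transcripts is listed once for each of them.
sub exon_rows_by_transcript {
    my ($gene) = @_;
    my @rows;
    foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
        # Assumption: one of these accessors holds the RefSeq accession.
        my $label = $transcript->external_name() || $transcript->stable_id();
        my $rank  = 0;
        foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
            $rank++;
            push @rows, join( "\t",
                "$label.$rank",
                $exon->seq_region_name(),
                $exon->seq_region_start(),
                $exon->seq_region_end(),
                $exon->seq_region_strand() );
        }
    }
    return \@rows;
}

# Hypothetical driver: needs the Ensembl Perl API and network access.
sub main {
    my ($symbol) = @_;
    require Bio::EnsEMBL::Registry;
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $adaptor = Bio::EnsEMBL::Registry->get_adaptor(
        'Human', 'otherfeatures', 'Gene' );
    foreach my $gene ( @{ $adaptor->fetch_all_by_external_name($symbol) } ) {
        print "$_\n" for @{ exon_rows_by_transcript($gene) };
    }
    return;
}

# Only contact the server when a gene symbol is given on the command line.
main( $ARGV[0] ) if @ARGV;
</pre>
                                      Because the label is taken from the transcript rather than from the exon stable ID, the exons that NM_015338.5 shares with NM_001164603.1 would be reported under both transcripts.<br>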
<br>
<br>
Regards,<br>
<br>
Kieron<br>
<br>
<br>
Kieron Taylor PhD.<br>
Ensembl Core senior software developer<br>
<br>
EMBL, European Bioinformatics Institute<br>
<div>
<div><br>
<br>
<br>
<br>
<br>
> On 10 Mar 2015, at 11:57, Duarte
Molha <<a href="mailto:duartemolha@gmail.com" target="_blank">duartemolha@gmail.com</a>>
wrote:<br>
><br>
> Dear developers<br>
><br>
> I have a script that I wrote (in
attachment) that gets me the refseq exons
                                          for a given input gene<br>
><br>
> However when I use this code using
                                          the gene ASXL1 as an example, the output is:<br>
><br>
> <a href="http://test_query.pl" target="_blank">test_query.pl</a> ASXL1<br>
><br>
> QueryName feature_type
common_name Biotype id chr
start end strand<br>
> ASXL1 Exon ASXL1 protein_coding
NM_001164603.1.1 chr20 30946147
30946635 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_001164603.1.2 chr20 30954187
30954269 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_001164603.1.3 chr20 30955530
30955532 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_001164603.1.4 chr20 30956818
30956926 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.5 chr20 31015931
31016051 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.6 chr20 31016128
31016225 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.7 chr20 31017141
31017234 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.8 chr20 31017704
31017856 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.9 chr20 31019124
31019287 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.10 chr20 31019386
31019482 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.11 chr20 31020683
31020788 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.12 chr20 31021087
31021720 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.13 chr20 31022235
31027122 +<br>
><br>
><br>
> As you can see, I am missing some of
the exons for transcript NM_015338.5<br>
> In this case, the 1st 3 exons of
transcript NM_015338.5 are identical to
NM_001164603.1, but I would expect to have
them listed as :<br>
><br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.1 chr20 30946147
30946635 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.2 chr20 30954187
30954269 +<br>
> ASXL1 Exon ASXL1 protein_coding
NM_015338.5.3 chr20 30955530
30955532 +<br>
><br>
> Can you tell me what is wrong with my
approach and how I can retrieve the
missing data?<br>
><br>
> Best regards<br>
><br>
> Duarte<br>
</div>
</div>
> <<a href="http://test_query.pl" target="_blank">test_query.pl</a>>_______________________________________________<br>
> Dev mailing list <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
> Posting guidelines and
subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
> Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br>
<br>
<br>
</blockquote>
</div>
</div>
</div>
<br>
</div>
<div>
<div> <br>
</div>
</div>
</blockquote>
<br>
</div>
<br>
<br>
</blockquote>
</div></div></div>
<br>
</div><div><div>
<br>
</div></div></blockquote>
<br>
</div>
<br></blockquote></div><br></div></div></div></div>
</blockquote></div><br></div>