<div dir="ltr">Thanks, I think I understand now.<div><br></div><div>Could you just confirm one thing for me?</div><div><br></div><div>If I get all Ensembl transcripts of any given gene, will at least one of those transcripts have a database mapping to RefSeq?</div><div><br></div><div>For example, consider the code:</div><div><br></div><div><div><span style="white-space:pre-wrap">my $transcripts = $gene->get_all_Transcripts();
my %transcripts_refseq_ids = ();   # declared outside the loop so the IDs are kept
foreach my $transcript ( @{$transcripts} ) {
    foreach my $dbe ( @{ $transcript->get_all_DBEntries() } ) {
        # keep only cross-references to RefSeq mRNA records
        if ( $dbe->dbname() eq "RefSeq_mRNA" ) {
            $transcripts_refseq_ids{ $dbe->display_id() } = 1;
        }
    }
}</span></div></div><div><span style="white-space:pre-wrap"><br></span></div><div><span style="white-space:pre-wrap">Can I be confident that, by cycling through all Ensembl transcripts of a gene and checking each one for a RefSeq mRNA entry, I will pull out every transcript that has a mapping?</span></div><div><span style="white-space:pre-wrap"><br></span></div><div><span style="white-space:pre-wrap">Thanks</span></div><div><span style="white-space:pre-wrap"><br></span></div><div><span style="white-space:pre-wrap">Duarte</span></div><div> </div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><font style="background-color:rgb(255,255,255)" color="#999999">=========================<br> Duarte Miguel Paulo Molha <br></font><div><font style="background-color:rgb(255,255,255)" color="#999999"> <a href="http://about.me/duarte" target="_blank">http://about.me/duarte</a> <br>=========================</font></div></div></div>
<br><div class="gmail_quote">On 10 March 2015 at 16:20, mag <span dir="ltr"><<a href="mailto:mr6@ebi.ac.uk" target="_blank">mr6@ebi.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Duarte,<br>
<br>
It is important to bear in mind that Ensembl and RefSeq transcripts
are different objects.<br>
<br>
There is a large overlap between the two resources, but small
differences in coding sequence and UTRs mean that there is not
always a one-to-one mapping between an Ensembl transcript and a
RefSeq transcript.<br>
This also means that an Ensembl transcript might overlap some RefSeq
exons, but not all.<br>
<br>
In your use-case however, you should be able to get the information
you want by replacing the following call:<br>
$gene->get_all_DBLinks( 'RefSeq_mRNA')<br>
with $transcript->get_all_DBEntries('RefSeq_mRNA')<br>
<br>
RefSeq_mRNA corresponds to RefSeq transcripts, which we consequently
map to Ensembl transcripts.<br>
With your current script, you are fetching all genes where at least
one transcript is mapped to a RefSeq transcript.<br>
Instead, you can directly fetch only the transcripts which have a
mapping to RefSeq.<br>
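<br>
A minimal sketch of that suggestion, assuming $gene is a Bio::EnsEMBL::Gene object you have already fetched. The helper name refseq_mrna_by_transcript is made up for illustration and is not part of the Ensembl API:<br>
<pre>use strict;
use warnings;

# Hypothetical helper (not an Ensembl API method): returns a hash ref mapping
# each Ensembl transcript stable ID to the RefSeq_mRNA display IDs attached to
# that transcript. Transcripts without a RefSeq_mRNA xref are left out.
sub refseq_mrna_by_transcript {
    my ($gene) = @_;
    my %refseq_for;
    foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
        # Passing the external database name restricts the xrefs returned
        my @ids = map { $_->display_id() }
                  @{ $transcript->get_all_DBEntries('RefSeq_mRNA') };
        next unless @ids;
        $refseq_for{ $transcript->stable_id() } = [ sort @ids ];
    }
    return \%refseq_for;
}
</pre>
An empty result for a gene would simply mean that none of its transcripts carries a RefSeq_mRNA cross-reference.<br>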
<br>
<br>
Hope that helps,<br>
Magali<span class=""><br>
<br>
<div>On 10/03/2015 15:30, Duarte Molha
wrote:<br>
</div>
</span><blockquote type="cite"><span class="">
<div dir="ltr">Thanks Kieron
<div><br>
</div>
<div>But this still leaves me with a question.</div>
<div><br>
</div>
<div>Say that I have a gene, and I retrieve the correct gene
object from the Ensembl database. How can I output only the
transcripts that are referenced in RefSeq, if not the way I
have done it?</div>
<div><br>
</div>
<div>If I go the normal way, the
$gene->get_all_Transcripts() method will retrieve all
Ensembl transcripts. How can I limit it to only the
transcripts that are in RefSeq?</div>
<div><br>
</div>
<div>Thanks</div>
<div><br>
</div>
<div>Duarte</div>
</div>
</span><div class="gmail_extra"><span class=""><br clear="all">
<br>
</span><div class="gmail_quote"><span class="">On 10 March 2015 at 15:22, Kieron
Taylor <span dir="ltr"><<a href="mailto:ktaylor@ebi.ac.uk" target="_blank">ktaylor@ebi.ac.uk</a>></span>
wrote:<br>
</span><div><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear
Duarte,<br>
<br>
The issue you have exposed is subtle. You seem to be
printing “exon stable IDs” but expecting them to be RefSeq
accessions. Our mistake was to use the RefSeq IDs as
arbitrary identifiers for internal use, but I must stress
that what Ensembl calls a Stable ID must never be assumed to
have any meaning outside of an Ensembl database. What you
want are display labels. The exon labels were generated by
picking only the first of any possible RefSeq IDs, hence you
cannot get everything you want in this way.<br>
<br>
The correct way to handle this in your code is to fetch the
transcript name and print that in each exon, as RefSeq IDs
refer to transcripts and not exons.<br>
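<br>
As a rough sketch of that approach, assuming $gene is a Bio::EnsEMBL::Gene already fetched from the database that holds the RefSeq models. The helper name exon_rows_by_transcript and the "label.rank" row format are made up for illustration, and whether external_name() or display_id() carries the accession you want depends on the database being queried:<br>
<pre>use strict;
use warnings;

# Hypothetical helper (not an Ensembl API method): builds one output row per
# (transcript, exon) pair, so an exon shared by two transcripts is listed once
# under each of them.
sub exon_rows_by_transcript {
    my ($gene) = @_;
    my @rows;
    foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
        # Assumption: the transcript's display label, or failing that its
        # display_id, is the accession you want to report.
        my $label = $transcript->external_name() || $transcript->display_id();
        my $rank  = 0;
        # get_all_Exons() returns the exons in transcript order
        foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
            $rank++;
            push @rows, join "\t",
                "$label.$rank",
                'chr' . $exon->seq_region_name(),    # 'chr' prefix is cosmetic
                $exon->seq_region_start(),
                $exon->seq_region_end(),
                ( $exon->strand() > 0 ? '+' : '-' );
        }
    }
    return \@rows;
}
</pre>
Because the label comes from the transcript rather than from the exon, the first three exons would appear under both transcripts that contain them.<br>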
<br>
<br>
Regards,<br>
<br>
Kieron<br>
<br>
<br>
Kieron Taylor PhD.<br>
Ensembl Core senior software developer<br>
<br>
EMBL, European Bioinformatics Institute<br>
<div>
<div><br>
<br>
<br>
<br>
<br>
> On 10 Mar 2015, at 11:57, Duarte Molha <<a href="mailto:duartemolha@gmail.com" target="_blank">duartemolha@gmail.com</a>>
wrote:<br>
><br>
> Dear developers<br>
><br>
> I have written a script (attached) that
gets me the RefSeq exons for a given input gene.<br>
><br>
> However, when I run it with the gene ASXL1
as an example, the output is:<br>
><br>
> test_query.pl ASXL1<br>
><br>
> QueryName feature_type common_name Biotype id chr start end strand<br>
> ASXL1 Exon ASXL1 protein_coding NM_001164603.1.1 chr20 30946147 30946635 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_001164603.1.2 chr20 30954187 30954269 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_001164603.1.3 chr20 30955530 30955532 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_001164603.1.4 chr20 30956818 30956926 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.5 chr20 31015931 31016051 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.6 chr20 31016128 31016225 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.7 chr20 31017141 31017234 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.8 chr20 31017704 31017856 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.9 chr20 31019124 31019287 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.10 chr20 31019386 31019482 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.11 chr20 31020683 31020788 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.12 chr20 31021087 31021720 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.13 chr20 31022235 31027122 +<br>
><br>
><br>
> As you can see, I am missing some of the exons for
transcript NM_015338.5.<br>
> In this case, the first 3 exons of transcript
NM_015338.5 are identical to those of NM_001164603.1, but I would
expect them to be listed as:<br>
><br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.1 chr20 30946147 30946635 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.2 chr20 30954187 30954269 +<br>
> ASXL1 Exon ASXL1 protein_coding NM_015338.5.3 chr20 30955530 30955532 +<br>
><br>
> Can you tell me what is wrong with my approach and
how I can retrieve the missing data?<br>
><br>
> Best regards<br>
><br>
> Duarte<br>
</div>
</div>
> <<a href="http://test_query.pl" target="_blank">test_query.pl</a>>_______________________________________________<br>
> Dev mailing list <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
> Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
> Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br>
<br>
<br>
</blockquote>
</div></div></div>
<br>
</div><div><div class="h5">
</div></div></blockquote>
<br>
</div>
<br></blockquote></div><br></div>