<div dir="ltr">Sorry for the crazy formatted text... ;)<div><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><font style="background-color:rgb(255,255,255)" color="#999999">=========================<br>     Duarte Miguel Paulo Molha      <br></font><div><font style="background-color:rgb(255,255,255)" color="#999999">         <a href="http://about.me/duarte" target="_blank">http://about.me/duarte</a>         <br>=========================</font></div></div></div>
<br><div class="gmail_quote">On 11 March 2015 at 16:32, Duarte Molha <span dir="ltr"><<a href="mailto:duartemolha@gmail.com" target="_blank">duartemolha@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks Magali <div><br></div><div>Are there any plans to correct the naming structure of your RefSeq imports?<div><br></div><div>Take as an example the transcript NM_001164603, associated with gene ASXL1.</div><div><br></div><div>In your main database you have associated it with transcript ID <h1 style="color:rgb(51,51,51);margin:0px 0px 1em;font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif"><span style="color:rgb(85,85,85)"><font>ENST00000497249, <span style="font-weight:normal">which has a transcript length of 296 bp and 5 exons:</span></font></span></h1><div><a href="http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000171456;r=20:32358811-32372242;t=ENST00000497249" target="_blank">ENST00000497249</a><br></div><div><br></div><h1 style="color:rgb(51,51,51);margin:0px 0px 1em;font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif"><span style="font-family:arial,sans-serif;font-weight:normal;color:rgb(34,34,34)"><font>However, the true RefSeq transcript is stored in your otherfeatures database as a novel transcript with the correct identifier:</font></span></h1><div><a href="http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=otherfeatures;g=171023;r=20:32358344-32372549;t=NM_001164603.1" target="_blank">NM_001164603.1</a><br></div><div><br></div><div>It also contains 5 exons, but has a transcript length of 1070 bp.</div><div><br></div><div>However, in that database you label the transcript with the correct RefSeq ID, but at the exon level you use different identifiers for exons 2, 3 and 4 of this transcript!</div><div><br></div><div><a 
href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32358294-32358882;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">NM_001164603.1.1</a><br></div><div><a href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32366334-32366516;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">XM_006723729.1.2</a><br></div><div><a href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32367677-32367779;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">XM_006723729.1.3</a><br></div><div><a href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32368965-32369173;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">XM_006723729.1.4</a><br></div><div><a href="http://www.ensembl.org/Homo_sapiens/Location/View?db=otherfeatures;g=171023;r=20:32372114-32372599;t=NM_001164603.1" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;background-color:rgb(240,240,240)" target="_blank">NM_001164603.1.5</a><br></div><div><br></div><div><br></div><div><br></div><div>Many thanks</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Duarte</div><div><br></div><div><br></div></font></span></div></div><div class="gmail_extra"><span class=""><br clear="all">
<br></span><div><div class="h5"><div class="gmail_quote">On 11 March 2015 at 09:35, mag <span dir="ltr"><<a href="mailto:mr6@ebi.ac.uk" target="_blank">mr6@ebi.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Hi Duarte,<br>
    <br>
    I am not convinced all genes in Ensembl will have at least one
    mapping to RefSeq, but your snippet of code should work regardless.<br>
    <br>
    <br>
    Regards,<br>
    Magali<span><br>
    <br>
    <div>On 10/03/2015 17:05, Duarte Molha
      wrote:<br>
    </div>
    </span><blockquote type="cite">
      <div dir="ltr"><span>Thanks ... I think I have understood
        <div><br>
        </div>
        <div>Please confirm one thing for me:</div>
        <div><br>
        </div>
        <div>If I get all Ensembl transcripts of any given gene, at least
          one of those transcripts will have a database mapping to
          RefSeq, correct?</div>
        <div><br>
        </div>
        <div>For example, consider this code:</div>
        <div><br>
        </div>
        <div>
          <div><span style="white-space:pre-wrap">my $transcripts = $gene->get_all_Transcripts();
# Declared outside the loop so the IDs accumulate across all transcripts
my %transcripts_refseq_ids = ();
while ( my $transcript = shift @{$transcripts} ) {
    foreach my $dbe ( @{ $transcript->get_all_DBEntries() } ) {
        # Keep only xrefs that point at RefSeq mRNA records
        if ( $dbe->dbname() eq "RefSeq_mRNA" ) {
            $transcripts_refseq_ids{ $dbe->display_id() } = 1;
        }
    }
}</span></div>
        </div>
        <div><span style="white-space:pre-wrap"><br>
          </span></div>
        <div><span style="white-space:pre-wrap">By cycling through all Ensembl transcripts of a gene and checking for a RefSeq mRNA entry, I should be able to pull out all transcripts that map. Correct?</span></div>
        <div><span style="white-space:pre-wrap"><br>
          </span></div>
        </span><div><span style="white-space:pre-wrap">Thanks</span></div>
        <div><span style="white-space:pre-wrap"><br>
          </span></div>
        <div><span style="white-space:pre-wrap">Duarte</span></div>
        <div>             </div>
      </div>
      <div class="gmail_extra"><span><br clear="all">
        <br>
        </span><div class="gmail_quote"><span>On 10 March 2015 at 16:20, mag <span dir="ltr"><<a href="mailto:mr6@ebi.ac.uk" target="_blank">mr6@ebi.ac.uk</a>></span>
          wrote:<br>
          </span><div><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Hi Duarte,<br>
              <br>
              It is important to bear in mind that Ensembl and RefSeq
              transcripts are different objects.<br>
              <br>
              There is a large overlap between the two resources, but
              small differences in coding sequence and UTRs mean that
              there is not always a one-to-one mapping between an
              Ensembl transcript and a RefSeq transcript.<br>
              This also means that an Ensembl transcript might overlap
              some RefSeq exons, but not all.<br>
              <br>
              In your use-case however, you should be able to get the
              information you want by replacing the following call:<br>
              $gene->get_all_DBLinks( 'RefSeq_mRNA')<br>
              with $transcript->get_all_DBEntries('RefSeq_mRNA')<br>
              <br>
              RefSeq_mRNA corresponds to RefSeq transcripts, which we
              consequently map to Ensembl transcripts.<br>
              With your current script, you are fetching all genes where
              at least one transcript is mapped to a RefSeq transcript.<br>
              Instead, you can directly fetch only the transcripts which
              have a mapping to RefSeq.<br>
              <br>
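              As a rough sketch of that transcript-level lookup (untested here; the public server settings and the choice of ASXL1 as the example gene are assumptions, not something taken from your script):<br>
              <pre>
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Assumption: the public Ensembl MySQL server with anonymous access
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

my $gene_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Gene' );

# Assumption: ASXL1 is used as the example gene
foreach my $gene ( @{ $gene_adaptor->fetch_all_by_external_name('ASXL1') } ) {
    foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
        # Ask only for xrefs from the RefSeq_mRNA external database
        my @refseq = @{ $transcript->get_all_DBEntries('RefSeq_mRNA') };
        next unless @refseq;    # skip transcripts with no RefSeq mapping
        my @ids = map { $_->display_id() } @refseq;
        print join( "\t", $transcript->stable_id(), join( ',', @ids ) ), "\n";
    }
}
</pre>
              Transcripts without a RefSeq_mRNA entry are simply skipped, so a gene with no mapping at all prints nothing.<br>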
              <br>
              Hope that helps,<br>
              Magali<span><br>
                <br>
                <div>On 10/03/2015 15:30, Duarte Molha wrote:<br>
                </div>
              </span>
              <blockquote type="cite"><span>
                  <div dir="ltr">Thanks Kieron
                    <div><br>
                    </div>
                    <div>But this still leaves me with a question.</div>
                    <div><br>
                    </div>
                    <div>Say that I have a gene, and I retrieve the
                      correct gene object from the Ensembl database. How
                      can I output only the transcripts that are
                      referenced in RefSeq, if not by the way I have done
                      it?</div>
                    <div><br>
                    </div>
                    <div>If I go the normal way, the
                       $gene->get_all_Transcripts(); method will
                      retrieve all Ensembl transcripts. How can I limit
                      it to only the transcripts that are in RefSeq?</div>
                    <div><br>
                    </div>
                    <div>Thanks</div>
                    <div><br>
                    </div>
                    <div>Duarte</div>
                  </div>
                </span>
                <div class="gmail_extra"><span><br clear="all">
                    <br>
                  </span>
                  <div class="gmail_quote"><span>On 10 March
                      2015 at 15:22, Kieron Taylor <span dir="ltr"><<a href="mailto:ktaylor@ebi.ac.uk" target="_blank">ktaylor@ebi.ac.uk</a>></span>
                      wrote:<br>
                    </span>
                    <div>
                      <div>
                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Duarte,<br>
                          <br>
                          The issue you have exposed is subtle. You seem
                          to be printing “exon stable IDs” but expecting
                          them to be RefSeq accessions. Our mistake was
                          to use the RefSeq IDs as arbitrary identifiers
                          for internal use, but I must stress that what
                          Ensembl calls a Stable ID must never be
                          assumed to have any meaning outside of an
                          Ensembl database. What you want are display
                          labels. The exon labels were generated by
                          picking only the first of any possible RefSeq
                          IDs, hence you cannot get everything you want
                          in this way.<br>
                          <br>
                          The correct way to handle this in your code is
                          to fetch the transcript name and print that in
                          each exon, as RefSeq IDs refer to transcripts
                          and not exons.<br>
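                          <br>
                          A minimal sketch of that labelling approach, assuming a gene object fetched elsewhere; the helper name is hypothetical, and using the transcript's external name with a fallback to its stable ID is an assumption about where the RefSeq accession is held in a given database:<br>
                          <pre>
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): print one row per
# exon, labelled by the parent transcript and the exon's position in it,
# instead of by the exon's own stable ID.
sub print_exons_by_transcript {
    my ($gene) = @_;    # assumed to be a Bio::EnsEMBL::Gene
    foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
        # Assumption: fall back to the stable ID when no name is set
        my $label = $transcript->external_name() || $transcript->stable_id();
        my $rank  = 0;
        foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
            $rank++;
            print join( "\t",
                "$label.$rank",
                'chr' . $exon->seq_region_name(),
                $exon->seq_region_start(),
                $exon->seq_region_end(),
                $exon->seq_region_strand() > 0 ? '+' : '-',
            ), "\n";
        }
    }
}
</pre>
                          Because the exons are walked per transcript, an exon shared by two transcripts is printed once under each of them.<br>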
                          <br>
                          <br>
                          Regards,<br>
                          <br>
                          Kieron<br>
                          <br>
                          <br>
                          Kieron Taylor PhD.<br>
                          Ensembl Core senior software developer<br>
                          <br>
                          EMBL, European Bioinformatics Institute<br>
                          <div>
                            <div><br>
                              <br>
                              <br>
                              <br>
                              <br>
                              > On 10 Mar 2015, at 11:57, Duarte
                              Molha <<a href="mailto:duartemolha@gmail.com" target="_blank">duartemolha@gmail.com</a>>

                              wrote:<br>
                              ><br>
                              > Dear developers<br>
                              ><br>
                              > I wrote a script (attached) that gets
                              the RefSeq exons for a given input gene.<br>
                              ><br>
                              > However, when I run this code with
                              the gene ASXL1 as an example, the output is:<br>
                              ><br>
                              > <a href="http://test_query.pl" target="_blank">test_query.pl</a> ASXL1<br>
                              ><br>
                              > QueryName  feature_type  common_name  Biotype         id                chr    start     end       strand<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.1  chr20  30946147  30946635  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.2  chr20  30954187  30954269  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.3  chr20  30955530  30955532  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.4  chr20  30956818  30956926  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.5     chr20  31015931  31016051  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.6     chr20  31016128  31016225  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.7     chr20  31017141  31017234  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.8     chr20  31017704  31017856  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.9     chr20  31019124  31019287  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.10    chr20  31019386  31019482  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.11    chr20  31020683  31020788  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.12    chr20  31021087  31021720  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.13    chr20  31022235  31027122  +<br>
                              ><br>
                              ><br>
                              > As you can see, I am missing some of
                              the exons for transcript NM_015338.5.<br>
                              > In this case, the first three exons of
                              transcript NM_015338.5 are identical to those of
                              NM_001164603.1, but I would expect to have
                              them listed as:<br>
                              ><br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.1     chr20  30946147  30946635  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.2     chr20  30954187  30954269  +<br>
                              > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.3     chr20  30955530  30955532  +<br>
                              ><br>
                              > Can you tell me what is wrong with my
                              approach and how I can retrieve the
                              missing data?<br>
                              ><br>
                              > Best regards<br>
                              ><br>
                              > Duarte<br>
                            </div>
                          </div>
                          > <<a href="http://test_query.pl" target="_blank">test_query.pl</a>>_______________________________________________<br>
                          > Dev mailing list    <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
                          > Posting guidelines and
                          subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
                          > Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br>
                          <br>
                          <br>
                        </blockquote>
                      </div>
                    </div>
                  </div>
                  <br>
                </div>
              </blockquote>
              <br>
            </div>
            <br>
          </blockquote>
        </div></div></div>
        <br>
      </div><div><div>
    </div></div></blockquote>
    <br>
  </div>

<br></blockquote></div><br></div></div></div></div>
</blockquote></div><br></div>