<div dir="ltr">Hi Matthieu,<div><br></div><div>The script you provided helped a lot!</div><div>For a given human genomic region, I need its alignment to another species, so I need to know the species names that the Compara API uses. Below is a short script I wrote, but it prints no names. How can I find the species names used by the Compara API? My second question is about the genome assembly: I believe the Perl API uses the human GRCh38 assembly by default, so how can I use GRCh37 with the API instead?</div><div>Many thanks!</div><div>Jinrui</div><div>





<pre style="margin:0px;font-size:10px;font-family:Menlo">use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_all("configuration_file");

my @species_names = @{ $registry->get_all_species() };

print "@species_names\n";</pre></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 20, 2019 at 1:58 PM Matthieu Muffato <<a href="mailto:muffato@ebi.ac.uk">muffato@ebi.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    <p>Hi Jinrui</p>
    <p>Have a look back at my first email: it includes an example URL
      for our REST API server that retrieves the alignment of a human region.</p>
    <p>If you want to use the Perl API, you can adapt this example script:
<a class="gmail-m_8876899558341469702moz-txt-link-freetext" href="https://github.com/Ensembl/ensembl-presentation/blob/master/API/Compara/exercises/gab1.pl" target="_blank">https://github.com/Ensembl/ensembl-presentation/blob/master/API/Compara/exercises/gab1.pl</a><br>
    </p>
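    <p>A minimal sketch of such an adaptation, assuming the public MySQL server ensembldb.ensembl.org with the anonymous user, and assuming (as Ensembl documents for its GRCh37 archive) that GRCh37 databases are served on port 3337; whether the Compara alignment you need exists for GRCh37 should be checked. It also prints the species names the registry knows about once databases are actually loaded. The helpers registry_args and print_region_alignments are hypothetical names; the Ensembl core and Compara Perl APIs are loaded at run time with require, and by default load_registry_from_db only picks up databases whose release matches the installed API version.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: connection arguments for the public Ensembl MySQL server.
# Assumption: GRCh37 databases live on port 3337; otherwise the API default port is used.
sub registry_args {
    my ($assembly) = @_;
    my %args = (-host => 'ensembldb.ensembl.org', -user => 'anonymous');
    $args{-port} = 3337 if defined $assembly && $assembly eq 'GRCh37';
    return %args;
}

# Hypothetical helper: print every LASTZ_NET block overlapping a human region.
# Pass 'homo_sapiens' as $other to query the human self-alignment.
sub print_region_alignments {
    my ($other, $chr, $start, $end, $assembly) = @_;
    require Bio::EnsEMBL::Registry;    # needs the Ensembl core + Compara Perl APIs
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(registry_args($assembly));

    # Names known to the registry, e.g. 'homo_sapiens', 'mus_musculus'
    print join(' ', @{ $registry->get_all_species() }), "\n";

    # The self-alignment's species set contains human only
    my @aliases = $other eq 'homo_sapiens' ? ($other) : ('homo_sapiens', $other);
    my $mlss = $registry->get_adaptor('Multi', 'compara', 'MethodLinkSpeciesSet')
        ->fetch_by_method_link_type_registry_aliases('LASTZ_NET', \@aliases);
    my $slice = $registry->get_adaptor('homo_sapiens', 'core', 'Slice')
        ->fetch_by_region('chromosome', $chr, $start, $end);
    my $blocks = $registry->get_adaptor('Multi', 'compara', 'GenomicAlignBlock')
        ->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss, $slice);

    for my $block (@$blocks) {
        # One line per aligned segment: species, then region:start-end:strand
        for my $ga (@{ $block->get_all_GenomicAligns() }) {
            printf "%s\t%s:%d-%d:%d\n", $ga->dnafrag->genome_db->name,
                $ga->dnafrag->name, $ga->dnafrag_start, $ga->dnafrag_end,
                $ga->dnafrag_strand;
        }
        print "//\n";    # block separator
    }
}
```

    <p>For example, print_region_alignments('homo_sapiens', '17', 63997797, 64000390) queries the self-alignment for the region used in the REST example; passing 'GRCh37' as a fifth argument switches the connection to port 3337, in which case the coordinates must be GRCh37 ones.</p>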
    <p>Regards,<br>
      Matthieu<br>
    </p>
    <div class="gmail-m_8876899558341469702moz-cite-prefix">On 18/03/2019 16:53, Jin-Rui Xu wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">Hi Matthieu,
        <div><br>
        </div>
        <div>I am going to use the human self-alignment to detect
          paralogous genomic regions (particularly non-coding regions),
          but I cannot find any API examples for this purpose. Could you
          send me some scripts or examples to start from? Say I have
          a human genomic coordinate and want to find its paralogous
          regions and their alignments.</div>
        <div>Many thanks.</div>
        <div>Jinrui</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, Mar 13, 2019 at 8:37
          AM Matthieu Muffato <<a href="mailto:muffato@ebi.ac.uk" target="_blank">muffato@ebi.ac.uk</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div bgcolor="#FFFFFF">
            <p>Hi Jinrui</p>
            <p>In all our pairwise alignments, we refine the LastZ
              alignment blocks with two steps called "chaining" and
              "netting" (see <a class="gmail-m_8876899558341469702gmail-m_-1501946497467315318moz-txt-link-freetext" href="http://europepmc.org/articles/PMC4852398" target="_blank">http://europepmc.org/articles/PMC4852398</a>
              and <a class="gmail-m_8876899558341469702gmail-m_-1501946497467315318moz-txt-link-freetext" href="http://genomewiki.ucsc.edu/index.php/Chains_Nets" target="_blank">http://genomewiki.ucsc.edu/index.php/Chains_Nets</a>
              for more information). What you get in our database is the
              product of these two steps.<br>
              The netting phase is done on the reference species only;
              we don't do bidirectional netting. This means that there
              is very little overlap / nesting on the reference species
              (human in the case of the human vs * alignments). Overlap
              / nesting is allowed on the non-reference species, though.
              For instance, in the human-mouse alignments, there are
              20,000 pairs of blocks that overlap on human, and
              1,900,000 pairs of blocks that overlap on mouse.<br>
            </p>
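            <p>To see this asymmetry in your own data, you could count overlapping block pairs separately on each species' coordinates. A minimal sketch, assuming each block's coordinates on one species have already been extracted (for example from GenomicAlign objects or MAF lines) as hypothetical [seq_region, start, end] triples with 1-based, inclusive coordinates; count_overlapping_pairs is an illustrative name, not a Compara function.</p>

```perl
use strict;
use warnings;

# Count pairs of blocks that overlap on one species.
# Each block is a hypothetical [seq_region, start, end] triple (1-based, inclusive).
sub count_overlapping_pairs {
    my (@blocks) = @_;

    # Group intervals by sequence region (chromosome, scaffold, ...)
    my %by_region;
    push @{ $by_region{ $_->[0] } }, [ $_->[1], $_->[2] ] for @blocks;

    my $pairs = 0;
    for my $intervals (values %by_region) {
        my @sorted = sort { $a->[0] <=> $b->[0] } @$intervals;
        my @active_ends;    # ends of earlier intervals that may still overlap
        for my $iv (@sorted) {
            # Drop intervals that ended before this one starts; starts only grow,
            # so a dropped interval cannot overlap any later one either
            @active_ends = grep { $_ >= $iv->[0] } @active_ends;
            $pairs += @active_ends;    # each survivor overlaps the current interval
            push @active_ends, $iv->[1];
        }
    }
    return $pairs;
}
```

            <p>Sorting by start lets the sweep discard blocks that ended before the current one begins, so each overlapping (or nested) pair is counted exactly once.</p>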
            <p>So, in this case, yes: you can identify human paralogous
              regions 1) through the self-alignment, and 2) through the
              human-mouse alignment (or any pairwise alignment that
              involves human), by finding human regions that align to the
              same region in the other species.</p>
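            <p>One way to implement the second approach: cluster the human-mouse hits by overlapping mouse coordinates, then keep the clusters that contain two or more distinct human regions. A minimal sketch, assuming each hit is a hypothetical hashref with a human label (e.g. '1:100-200') and the aligned mouse other_region, other_start and other_end; candidate_paralog_groups is an illustrative name.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: group hits whose other-species intervals overlap,
# and return the groups that involve at least two distinct human regions.
# Each hit: { human => 'chr:start-end', other_region, other_start, other_end }
sub candidate_paralog_groups {
    my (@hits) = @_;
    my @sorted = sort {
        $a->{other_region} cmp $b->{other_region}
            || $a->{other_start} <=> $b->{other_start}
    } @hits;

    my @groups;
    for my $hit (@sorted) {
        my $last = $groups[-1];
        if ($last
            && $last->{other_region} eq $hit->{other_region}
            && $hit->{other_start} <= $last->{other_end}) {
            # Overlaps the current group on the other species: extend it
            push @{ $last->{human} }, $hit->{human};
            $last->{other_end} = $hit->{other_end} if $hit->{other_end} > $last->{other_end};
        }
        else {
            # Start a new group on the other species
            push @groups, {
                other_region => $hit->{other_region},
                other_start  => $hit->{other_start},
                other_end    => $hit->{other_end},
                human        => [ $hit->{human} ],
            };
        }
    }

    # Keep groups hit by two or more distinct human regions
    my @candidates;
    for my $group (@groups) {
        my %seen;
        my @human = grep { !$seen{$_}++ } @{ $group->{human} };
        push @candidates, { %$group, human => \@human } if @human > 1;
    }
    return @candidates;
}
```

            <p>Because netting is only done on the human side, overlapping blocks on mouse are expected, which is what makes this grouping informative; each group is a candidate set of paralogous human regions rather than proof of paralogy.</p>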
            <p>Hope this helps,</p>
            <p>Matthieu<br>
            </p>
            <div class="gmail-m_8876899558341469702gmail-m_-1501946497467315318moz-cite-prefix">On
              11/03/2019 19:45, Jin-Rui Xu wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div>Hi Matthieu,</div>
                <div><br>
                </div>
                <div>Thank you very much for your email. </div>
                <div><br>
                </div>
                <div>I am wondering about the human self-alignment: one
                  genomic region may map to multiple other
                  regions. Such multiple hits also exist in, e.g., the human
                  vs mouse genome alignment.</div>
                <div>Does Ensembl provide all of these regions or
                  just the best one? Can these multiple hits be retrieved
                  with the Compara Perl API?</div>
                <div><br>
                </div>
                <div>Thanks!</div>
                <div>Jinrui</div>
                <br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">On Mon, Mar 11, 2019
                    at 3:05 PM Matthieu Muffato <<a href="mailto:muffato@ebi.ac.uk" target="_blank">muffato@ebi.ac.uk</a>>
                    wrote:<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear Jinrui,<br>
                    <br>
                    We have a human self-alignment that has been
                    computed with LastZ and <br>
                    identifies paralogous regions within the genome. You
                    can find the whole <br>
                    alignment on the FTP site <br>
                    <a href="ftp://ftp.ensembl.org/pub/current_maf/ensembl-compara/pairwise_alignments/" rel="noreferrer" target="_blank">ftp://ftp.ensembl.org/pub/current_maf/ensembl-compara/pairwise_alignments/</a>
                    <br>
                    but you can also query specific regions: <br>
                    <a href="http://rest.ensembl.org/alignment/region/homo_sapiens/17:63997797-64000390:1?species_set=homo_sapiens;content-type=application/json;method=LASTZ_NET" rel="noreferrer" target="_blank">http://rest.ensembl.org/alignment/region/homo_sapiens/17:63997797-64000390:1?species_set=homo_sapiens;content-type=application/json;method=LASTZ_NET</a><br>
                    <br>
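                    <p>The same query can be scripted with Perl's core HTTP::Tiny and JSON::PP modules. A minimal sketch: the helper names are hypothetical, and the JSON layout assumed here (an array of alignments, each with an "alignments" list of segments carrying species, seq_region, start, end and strand) should be checked against the REST documentation for this endpoint. Fetching over https with HTTP::Tiny also needs IO::Socket::SSL installed.</p>

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Hypothetical helper: build an /alignment/region URL for a human region.
sub alignment_region_url {
    my (%opt) = @_;
    my @set    = @{ $opt{species_set} // ['homo_sapiens'] };    # self-alignment by default
    my $region = sprintf '%s:%d-%d:%d', $opt{chr}, $opt{start}, $opt{end}, $opt{strand} // 1;
    my $query  = join ';', (map { "species_set=$_" } @set),
        'method=' . ($opt{method} // 'LASTZ_NET'), 'content-type=application/json';
    return "https://rest.ensembl.org/alignment/region/homo_sapiens/$region?$query";
}

# Turn the JSON text into "species seq_region:start-end:strand" lines.
sub summarise_alignments {
    my ($json_text) = @_;
    my @rows;
    for my $aln (@{ decode_json($json_text) }) {
        for my $seg (@{ $aln->{alignments} }) {
            push @rows, sprintf '%s %s:%d-%d:%d',
                @{$seg}{qw(species seq_region start end strand)};
        }
    }
    return @rows;
}

# Hypothetical helper: perform the request and summarise the response.
sub fetch_alignments {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return summarise_alignments($res->{content});
}
```

                    <p>For example: print "$_\n" for fetch_alignments(alignment_region_url(chr => 17, start => 63997797, end => 64000390)); for a pairwise alignment, pass species_set => ['homo_sapiens', 'mus_musculus'].</p>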
                    Human is the only species for which we have a
                    self-alignment.<br>
                    <br>
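                    <p>If you download the whole alignment from the FTP site instead, each block in the MAF files lists one "s" line per aligned sequence, so a self-alignment block with two human "s" lines gives a pair of candidate paralogous regions. A minimal sketch of a reader, assuming the standard UCSC MAF layout (s src start size strand srcSize text, zero-based starts, and starts on the "-" strand counted on the reverse complement); it converts each segment to 1-based, inclusive, forward-strand coordinates. How the src field is named in the Ensembl files should be checked in the files themselves, and parse_maf_blocks is a hypothetical name.</p>

```perl
use strict;
use warnings;

# Hypothetical reader: returns an arrayref of blocks, each an arrayref of
# { src, start, end, strand } with 1-based, inclusive, forward-strand coordinates.
sub parse_maf_blocks {
    my ($fh) = @_;
    my (@blocks, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^a\s/) {
            # An "a" line opens a new alignment block
            $current = [];
            push @blocks, $current;
        }
        elsif ($current && $line =~ /^s\s/) {
            my (undef, $src, $start, $size, $strand, $src_size) = split ' ', $line;
            # MAF starts are zero-based; on "-" they count from the reverse-complement end
            my ($from, $to) = $strand eq '+'
                ? ($start + 1, $start + $size)
                : ($src_size - $start - $size + 1, $src_size - $start);
            push @$current, { src => $src, start => $from, end => $to, strand => $strand };
        }
    }
    return \@blocks;
}
```

                    <p>Open the decompressed MAF file and pass the filehandle; the function does not care whether the handle reads from a file, a pipe or an in-memory string.</p>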
                    Kind regards,<br>
                    Matthieu<br>
                    <br>
                    On 09/03/2019 03:10, Jin-Rui Xu wrote:<br>
                    > Hello,<br>
                    ><br>
                    > I have just started learning the Compara API.
                    However, I am still not sure <br>
                    > whether it can address my questions. I am
                    wondering if someone could <br>
                    > give me some guidance and example scripts. Here
                    are my questions: (1) I <br>
                    > want to identify all paralogous DNA fragments
                    (not necessarily genes) <br>
                    > in a genome. One genomic region may have more
                    than one duplicate. (2) <br>
                    > Then, I want to find in which of the other
                    species the two paralogous <br>
                    > DNA fragments have a common ancestor.<br>
                    > Alternatively, I can focus on two genomic
                    regions in a genome to test <br>
                    > whether they are paralogous, and then find which species
                    has their common <br>
                    > ancestral DNA.<br>
                    > How could I do this using the Compara API
                    (version 95)?<br>
                    ><br>
                    > Many thanks!<br>
                    ><br>
                    > Jinrui<br>
                    <br>
                    -- <br>
                    Matthieu Muffato, Ph.D.<br>
                    Ensembl Compara and TreeFam Project Leader<br>
                    European Bioinformatics Institute (EMBL-EBI)<br>
                    European Molecular Biology Laboratory<br>
                    Wellcome Trust Genome Campus, Hinxton<br>
                    Cambridge, CB10 1SD, United Kingdom<br>
                    Room  A3-145<br>
                    Phone + 44 (0) 1223 49 4631<br>
                    Fax   + 44 (0) 1223 49 4468<br>
                    <br>
                  </blockquote>
                </div>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div>