<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hi Sabrina<br>

    <br>

    It is certainly possible to get proteins from several species.<br>

    <br>

    If you are interested in getting alignments for all possible
    isoforms (each possible protein from each gene), you would have to
    use the Ensembl families. These are groups of similar proteins, but
    you should not assume that they are all orthologues. To infer
    orthology, you need a phylogenetic tree. The trees we provide are
    built using a single representative protein per gene, so they do
    not cover every isoform.<br>

    <br>

    In your case, I would recommend using the Ensembl families: query
    the families with each cow protein (cow is your query species, isn't
    it?) and dump the alignments. There are several options for this.
    You may want to use all possible species (the families are built
    using Ensembl and non-Ensembl proteins) or limit the alignment to a
    subset of species. Also, in some cases you will find that more than
    one cow protein is in the same family, so you will get duplicated
    alignments. Is this OK?<br>

    <br>
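    As a rough illustration only, the dump step described above could be
    sketched as follows. This is plain Python with made-up data, not the
    Compara Perl API and not the attached script: the
    <code>families</code> dictionary, the protein identifiers, the
    sequences and all function names are hypothetical. It assumes the
    family members and their aligned sequences have already been
    fetched, and shows one output per cow protein, the query placed
    first, and an optional restriction to a subset of species.<br>
    <pre>
```python
from collections import defaultdict

# Hypothetical, already-fetched family data: one family, aligned sequences included.
families = {
    "FAM_A": [
        {"id": "cow_p1", "species": "cow", "aligned_seq": "MDAL-RAS"},
        {"id": "cow_p2", "species": "cow", "aligned_seq": "MDALKRAS"},
        {"id": "human_p1", "species": "human", "aligned_seq": "MDAP-RAS"},
        {"id": "mouse_p1", "species": "mouse", "aligned_seq": "MDAP-KAS"},
    ],
}


def group_queries_by_family(families, query_species):
    """Map each family ID to the query-species proteins it contains."""
    hits = defaultdict(list)
    for family_id, members in families.items():
        for member in members:
            if member["species"] == query_species:
                hits[family_id].append(member["id"])
    return dict(hits)


def alignment_for_query(members, query_id, target_species=None):
    """Return the rows with the query first, optionally limited to some species."""
    query_row = None
    other_rows = []
    for member in members:
        if member["id"] == query_id:
            query_row = member
        elif target_species is None or member["species"] in target_species:
            other_rows.append(member)
    if query_row is None:
        raise ValueError(f"{query_id} is not in this family")
    return [query_row] + other_rows


def format_alignment(rows):
    """Render rows as 'ID  aligned_sequence' lines, one per protein."""
    width = max(len(row["id"]) for row in rows)
    return "\n".join(f"{row['id']:{width}}  {row['aligned_seq']}" for row in rows)


def build_files(families, query_species, target_species=None):
    """Return {file name: alignment text}, one entry per query protein."""
    files = {}
    grouped = group_queries_by_family(families, query_species)
    for family_id, query_ids in grouped.items():
        for query_id in query_ids:
            rows = alignment_for_query(families[family_id], query_id, target_species)
            files[f"{query_id}.aln"] = format_alignment(rows)
    return files


files = build_files(families, "cow", target_species={"human", "mouse"})
for name in sorted(files):
    print(name)
    print(files[name])
```
</pre>
    In this toy case both cow proteins sit in the same family, so the
    two outputs share the same human and mouse rows and differ only in
    the cow protein placed first. This is the duplication mentioned
    above.<br>
    <br>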

    Kind regards<br>

    <br>

    Javier<br>

    <br>

    <div class="moz-cite-prefix">On 05/11/12 13:47, srodriguez wrote:<br>

    </div>

    <blockquote

      cite="mid:20121105144701.114227a0o1ysiiqs@www2.jouy.inra.fr"

      type="cite">Hi Javier,

      <br>

      <br>

      Thank you for your answer.

      <br>

      <br>

      Actually, I would like to obtain 1 file per query protein, aligned
      to the orthologous proteins from all other species (and not 1
      sequence to 1 sequence).

      <br>

      <br>

      ex:

      <br>

      for protein ENSBTAP00000032594, the file containing:

      <br>

      ENSBTAP00000032594/1-397

      MDALRASAAKPPTGRKMKARAPPPPGKPATPNLHSGQRSPRRASPGPPQNQLSR

      <br>

      ENSP00000265136/1-1261  

      MDAPRASAAKPPTGRKMKARAPPPPGKAATLHVHSDQKPPHDGALGSQQNLVRMK

      <br>

      ENSSPECIE2...

      <br>

      ENSSPECIE3...

      <br>

                               *** ***********************.** ::**.*:.*:

      .: *. ** :

      <br>

      <br>

      Also, I would like 1 file per query protein; if a gene has
      several proteins, each of them should get its own file with the
      alignment as above.

      <br>

      <br>

      Do you know if it is feasible to obtain such an output with
      Ensembl Compara?

      <br>

      <br>

      In that case, could you please modify the script to obtain it?

      <br>

      <br>

      Thank you very much in advance.

      <br>

      <br>

      Best regards,

      <br>

      <br>

      Sabrina.

      <br>

      <br>

      <br>

      <br>

      <br>

      <br>

      <br>

      Javier Herrero <a class="moz-txt-link-rfc2396E" href="mailto:jherrero@ebi.ac.uk">&lt;jherrero@ebi.ac.uk&gt;</a> wrote:

      <br>

      <br>

      <blockquote type="cite">Dear Sabrina

        <br>

        <br>

        I have only modified the script slightly. Essentially, I have
        removed some bits that were not required and cleaned up the code
        a little. I have also added the option of specifying the
        query and the target species on the command line. Lastly, I have
        changed the script to write the alignments to separate
        files.

        <br>

        <br>

        Your strategy using the ENSEMBLGENE was correct. Indeed, you get

        two proteins aligned. I believe this is what you want, isn't it?

        <br>

        <br>

        I have added a few comments. Let me know if there is something
        that is not clear.

        <br>

        <br>

        Javier

        <br>

        <br>

        On 22/10/12 15:58, srodriguez wrote:

        <br>

        <blockquote type="cite">Dear all,

          <br>

          <br>

          I would like to use compara EnsEMBL API to get the aligned

          protein sequences of a query animal with homologous protein

          sequences from other species.

          <br>

          <br>

          The script would take as input the query species name (and, if
          possible, the hit species names). It would get the
          proteins of the query organism, then the homologous protein
          sequences, and then write 1 file per query protein
          containing the alignment, with the query placed as the
          first sequence, followed by the aligned protein sequences
          from the other species.

          <br>

          <br>

          I was thinking about using a "homology adaptor" with
          ENSEMBLPEP, so I started a script that way, but I do not
          obtain any results with ENSEMBLPEP, and the results with
          ENSEMBLGENE contain only 2 sequences per alignment (see script
          attached).

          <br>

          <br>

          I also tried with "families", but sometimes I do not get the
          protein sequence for my query species in the sequence alignment,
          even though I searched using my taxon id (script no. 2
          attached).

          <br>

          <br>

          Would you have a script that already performs my goal?

          <br>

          <br>

          If not, could you please help me reach my goal?

          <br>

          <br>

          Thank you very much in advance.

          <br>

          <br>

          Best regards,

          <br>

          <br>

          Sabrina.

          <br>

          <br>

          <br>

          *******************************************

          <br>

          Sabrina Rodriguez

          <br>

          Bioinformatics

          <br>

          Département de Génétique animale

          <br>

          Unité GABI

          <br>

          Domaine de Vilvert

          <br>

          78532 Jouy en josas

          <br>

          <br>

          +33 (0) 1 34 65 29 53

          <br>

          <br>

          <br>

          _______________________________________________

          <br>

          Dev mailing list    <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>

          <br>

          Posting guidelines and subscribe/unsubscribe info:

          <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>

          <br>

          Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>

          <br>

        </blockquote>

        <br>

        -- <br>

        Javier Herrero, PhD

        <br>

        Ensembl Coordinator and Ensembl Compara Project Leader

        <br>

        European Bioinformatics Institute (EMBL-EBI)

        <br>

        Wellcome Trust Genome Campus, Hinxton

        <br>

        Cambridge - CB10 1SD - UK

        <br>

        <br>

        <br>

      </blockquote>

      <br>

      <br>

      <br>

      <br>


      <br>


    </blockquote>

    <br>


  </body>

</html>