<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hi Zhang<br>

    <br>

    In the PAIR_ALIGNER module, the reference_collection_name

    corresponds to the target_collection_name. In the other two modules

    (CHAIN_CONFIG and NET_CONFIG), the reference_collection_name

    corresponds to the query_collection_name. Part of the reason to

    change the name was to be able to use the same name for all 3

    modules.<br>

    <br>

    In short, just use the same genome as the reference_collection_name

    for all 3.<br>

    <br>
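    A minimal sketch of what that could look like in compara_2x.conf,
    assuming the same three blocks as in your message below. The two
    genome names are placeholders rather than real collection names, and
    any other keys your configuration needs are omitted:<br>
    <pre>{
  TYPE =&gt; PAIR_ALIGNER,
  reference_collection_name     =&gt; 'REFERENCE_GENOME',  # placeholder name
  non_reference_collection_name =&gt; 'OTHER_GENOME',      # placeholder name
}
{
  TYPE =&gt; CHAIN_CONFIG,
  reference_collection_name     =&gt; 'REFERENCE_GENOME',  # same genome as above
  non_reference_collection_name =&gt; 'OTHER_GENOME',
}
{
  TYPE =&gt; NET_CONFIG,
  reference_collection_name     =&gt; 'REFERENCE_GENOME',  # same genome as above
  non_reference_collection_name =&gt; 'OTHER_GENOME',
}</pre>
    <br>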

    Note that we have changed the way we run this pipeline and this

    configuration file is no longer supported. The documentation for

    the new pipeline is available under

    ensembl-compara/docs/README-pairaligner. Obviously, you can still use

    the ensembl_64 API and pipeline for as long as you want.<br>

    <br>

    I hope this helps<br>

    <br>

    Javier<br>

    <br>

    On 18/04/12 13:33, Zhang Di wrote:

    <blockquote

cite="mid:CAMHeD-cCz9zoQ9CHKMi799xWxR-2b1WxWEHpeWpW=NEDUNhCNw@mail.gmail.com"

      type="cite">Thank you, Javier

      <div><br>

      </div>

      <div>Do you mean I should set something like the following in

        compara_2x.conf to prepare the compara_db for gene projection?</div>

      <div><br>

      </div>

      <blockquote style="margin:0 0 0 40px;border:none;padding:0px">

        <pre>{
  TYPE =&gt; PAIR_ALIGNER,
  reference_collection_name =&gt; 'human',
  non_reference_collection_name =&gt; 'my_genome',
}
{
  TYPE =&gt; CHAIN_CONFIG,
  non_reference_collection_name =&gt; 'human',
  reference_collection_name =&gt; 'my_genome',
}
{
  TYPE =&gt; NET_CONFIG,
  non_reference_collection_name =&gt; 'human',
  reference_collection_name =&gt; 'my_genome',
}</pre>

      </blockquote>

      <div>By the way, I'm using ensembl-compara version 64.</div>

      <div><br>

        <div class="gmail_quote">On Wed, Apr 18, 2012 at 4:57 PM, Javier

          Herrero <span dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:jherrero@ebi.ac.uk">jherrero@ebi.ac.uk</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000"> Dear Zhang<br>

              <br>

              The query/target naming has always been quite confusing in

              the pairwise alignment pipeline. When running an

              alignment, you use a sequence (query) to look for similar

              regions in another sequence (target genome). The pairwise

              alignment pipeline has three major steps: (a) raw

              alignments; (b) chaining; and (c) netting. The chaining

              step tries to link all the raw alignments that are in the

              same order and orientation to create a longer structure

              called a chain. The netting step requires you to define a

              target genome, such that the so-called nets are the subset

              of chains that form the best-in-genome alignment. In other

              words, for each bp of the target genome, the final set

              gives you the best match on the other genome.<br>

              <br>

              As I said earlier, query and target are often confusing

              terms. To make the situation worse, we used to run the

              pipeline such that the query for the raw alignment step

              was the target for the chaining and netting steps and

              vice versa. We have now changed the way we refer to both

              sequences and call them reference and non-reference

              genomes. We find that nomenclature less confusing.<br>

              <br>

              Kind regards<br>

              <br>

              Javier

              <div>

                <div class="h5"><br>

                  <br>

                  On 18/04/12 09:43, Dan Barrell wrote:

                  <blockquote type="cite"> Hi,<br>

                    <br>

                    The document low_coverage_gene_build.txt is quite

                    old and possibly very out of date as we no longer

                    build on low coverage genomes in Ensembl. As far as

                    I know, the reason that the semantics of the

                    reference and target terms got swapped is to do with

                    the importance of directionality in a Net. When

                    dealing with the low coverage genomes the idea was

                    that they wanted the species they were projecting

                    onto as the reference because it is important that

                    each bp in the target species aligns to at most one

                    location in the reference species. <br>

                    <br>

                    I would suggest you also look at the Ensembl Compara

                    documentation which is maintained here:<br>

                    <br>

ensembl-compara/docs/README-low-coverage-genome-aligner<br>

                    <br>

                    Dan<br>

                    <br>


                    On 17/04/12 12:47, Zhang Di wrote:

                    <blockquote type="cite">Hi, 

                      <div><br>

                      </div>

                      <div>I'm using the Ensembl pipeline for a projection

                        genebuild.</div>

                      <div><br>

                      </div>

                      <div>When I read the doc

                        low_coverage_gene_build.txt, I was confused by

                        the target/query genome terms.</div>

                      <div><br>

                      </div>

                      <div>It calls our newly sequenced genome the

                        target and calls the reference genome the query.</div>

                      <div><br>

                      </div>

                      <div>This is contrary to the lastz terms, where target

                        means the reference and query means our sequences.</div>

                      <div><br>

                      </div>

                      <div>That is OK if I just stick to this convention.</div>

                      <div><br>

                      </div>

                      <div>However,</div>

                      <div><br>

                      </div>

                      <div>In the whole genome alignment section in the

                        same doc,</div>

                      <div><br>

                      </div>

                      <div>it says that:</div>

                      <div><br>

                      </div>

                      <div>    "each bp in the target genome should be

                        represented at most once."</div>

                      <div><br>

                      </div>

                      <div>What does it mean by saying "target"?</div>

                      <div><br>

                      </div>

                      <div>lastz-chain-net produces output in which the

                        lastz-termed "target genome" has this property.</div>

                      <div><br>

                      </div>

                      <div>Does it mean that I should set my genome as

                        the reference genome, and the genome from

                        Ensembl, such as "human", as the non-reference in

                        the compara/hive pipeline?</div>

                      <div><br>

                      </div>

                      <div>Can I then project human genes onto my genome

                        with this somewhat odd setting in the next

                        wga2genes step?</div>

                      <div><br>

                      </div>

                      <div>Some slices of the human genome containing

                        genes may exist several times in the compara_db;

                        how can it produce the right gene projection here?</div>

                      <div><br>

                      </div>

                      <div><br>

                      </div>

                      <div>Thanks</div>

                      <div><br>

                      </div>

                      <div>Best Regards</div>

                      <div><br>

                      </div>

                      <div>-- <br>

                        Zhang Di<br>

                      </div>

                      <br>

                      <fieldset></fieldset>

                      <br>

                      <pre>_______________________________________________

Dev mailing list    <a moz-do-not-send="true" href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a>

List admin (including subscribe/unsubscribe): <a moz-do-not-send="true" href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a>

Ensembl Blog: <a moz-do-not-send="true" href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a>

</pre>

                    </blockquote>

                    <br>

                    <br>

                    <br>

                    <fieldset></fieldset>

                    <br>


                  </blockquote>

                  <br>

                </div>

              </div>

              <span class="HOEnZb"><font color="#888888">

                  <pre cols="72">-- 

Javier Herrero, PhD

Ensembl Coordinator and Ensembl Compara Project Leader

European Bioinformatics Institute (EMBL-EBI)

Wellcome Trust Genome Campus, Hinxton

Cambridge - CB10 1SD - UK</pre>

                </font></span></div>

            <br>


            <br>

          </blockquote>

        </div>

        <br>

        <br clear="all">

        <div><br>

        </div>

        -- <br>

        Zhang Di<br>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

Dev mailing list    <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>

List admin (including subscribe/unsubscribe): <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>

Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>

</pre>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Javier Herrero, PhD

Ensembl Coordinator and Ensembl Compara Project Leader

European Bioinformatics Institute (EMBL-EBI)

Wellcome Trust Genome Campus, Hinxton

Cambridge - CB10 1SD - UK</pre>

  </body>

</html>