<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Dear Zhang<br>

    <br>

    The query/target naming has always been quite confusing in the
    pairwise alignment pipeline. When running an alignment, you use one
    sequence (the query) to look for similar regions in another sequence
    (the target genome). The pairwise alignment pipeline has three major
    steps: (a) raw alignments; (b) chaining; and (c) netting. The
    chaining step tries to link all the raw alignments that are in the
    same order and orientation into a longer structure called a chain.
    The netting step requires you to define a target genome, such that
    the so-called nets are the subset of chains that form the
    best-in-genome alignment. In other words, the final set provides,
    for each bp of the target genome, the best match in the other
    genome.<br>

    <br>
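    The chaining and netting steps above can be illustrated with a
    deliberately simplified Python sketch. It is not the Ensembl Compara
    or UCSC chain/net code: the greedy chaining rule, the max_gap
    parameter and the Block, Chain, chain_blocks and net_chains names are
    hypothetical, coordinates are assumed to be 0-based and half-open,
    and query coordinates are assumed to be already on the aligned
    strand. The netting function keeps, for each target bp, only the
    best-scoring chain that covers it.<br>
    <br>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    """One gapless raw alignment (0-based, half-open coordinates).

    Assumption: q_start/q_end are already on the aligned strand, so a
    colinear run increases in both genomes whatever the orientation.
    """
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    strand: str  # "+" or "-"
    score: int


@dataclass
class Chain:
    blocks: list = field(default_factory=list)

    @property
    def score(self) -> int:
        return sum(b.score for b in self.blocks)


def chain_blocks(blocks, max_gap=1000):
    """Greedy chaining (hypothetical rule): append a block to the first open
    chain of the same strand whose last block it follows in both genomes,
    with gaps of at most max_gap; otherwise start a new chain."""
    chains = []
    for strand in ("+", "-"):
        same_strand = [b for b in blocks if b.strand == strand]
        open_chains = []
        for blk in sorted(same_strand, key=lambda b: (b.t_start, b.q_start)):
            for ch in open_chains:
                last = ch.blocks[-1]
                t_gap = blk.t_start - last.t_end
                q_gap = blk.q_start - last.q_end
                # same order in both genomes, and gaps not too large
                if t_gap >= 0 and q_gap >= 0 and max_gap >= max(t_gap, q_gap):
                    ch.blocks.append(blk)
                    break
            else:  # no open chain could be extended
                open_chains.append(Chain([blk]))
        chains.extend(open_chains)
    return chains


def net_chains(chains):
    """Best-in-target netting: each target bp is claimed by the
    highest-scoring chain covering it. Returns {target_bp: query_bp}."""
    net = {}
    for ch in sorted(chains, key=lambda c: c.score, reverse=True):
        for blk in ch.blocks:
            for offset in range(blk.t_end - blk.t_start):
                # setdefault keeps the claim of a better chain seen earlier
                net.setdefault(blk.t_start + offset, blk.q_start + offset)
    return net


# Toy input: two colinear blocks, a weaker paralogous hit of the same target
# region, and another target region matching the same query region.
blocks = [
    Block(0, 100, 5000, 5100, "+", 90),
    Block(150, 250, 5150, 5250, "+", 80),
    Block(0, 100, 9000, 9100, "+", 40),
    Block(700, 800, 5000, 5100, "+", 70),
]
chains = chain_blocks(blocks)
net = net_chains(chains)
print(len(chains), net[0], net[700])  # prints: 3 5000 5000
```

    In this toy input, target bp 0 and target bp 700 both map to query
    bp 5000: the net guarantees at most one match per target bp, but a
    region of the other genome can still be used several times.<br>
    <br>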

    As I said earlier, query and target are often confusing terms.
    To make the situation worse, we used to run the pipeline in such a
    way that the query in the raw alignment step was the target in the
    chaining and netting steps, and vice versa. We have now changed the
    way we refer to both sequences and call them the reference and
    non-reference genomes. We find that nomenclature less confusing.<br>

    <br>

    Kind regards<br>

    <br>

    Javier<br>

    <br>

    On 18/04/12 09:43, Dan Barrell wrote:

    <blockquote cite="mid:4F8E7EA9.1080701@sanger.ac.uk" type="cite">


      Hi,<br>

      <br>

      The document low_coverage_gene_build.txt is quite old and possibly
      very out of date, as we no longer build on low-coverage genomes in
      Ensembl. As far as I know, the reason that the semantics of the
      reference and target terms got swapped is to do with the
      importance of directionality in a net. When dealing with the
      low-coverage genomes, the idea was that they wanted the species
      they were projecting onto as the reference, because it is important
      that each bp in the target species aligns to at most one location
      in the reference species.<br>

      <br>

      I would suggest you also look at the Ensembl Compara documentation

      which is maintained here:<br>

      <br>

      ensembl-compara/docs/README-low-coverage-genome-aligner<br>

      <br>

      Dan<br>

      <br>


      On 17/04/12 12:47, Zhang Di wrote:

      <blockquote

cite="mid:CAMHeD-eXYuTGRu46M2Hb=VhYufJWPiUmgNev=m7vtVn6ZhjDaA@mail.gmail.com"

        type="cite">Hi, 

        <div><br>

        </div>

        <div>I'm using the Ensembl pipeline for a projection genebuild.</div>

        <div><br>

        </div>

        <div>When I read the doc low_coverage_gene_build.txt, I was
          confused by the target/query genome terms.</div>

        <div><br>

        </div>

        <div>It calls our newly sequenced genome the target and the
          reference genome the query.</div>

        <div><br>

        </div>

        <div>This is the opposite of the lastz terms, where the target is
          the reference and the query is our sequence.</div>

        <div><br>

        </div>

        <div>That would be OK if I just stuck to this convention.</div>

        <div><br>

        </div>

        <div>However, in the whole-genome alignment section of the same
          doc, it says:</div>

        <div><br>

        </div>

        <div>    "each bp in the target genome should be represented at

          most once."</div>

        <div><br>

        </div>

        <div>What does "target" mean here?</div>

        <div><br>

        </div>

        <div>lastz-chain-net ensures this property for what lastz calls
          the "target genome".</div>

        <div><br>

        </div>

        <div>Does it mean that I should set my genome as the reference
          genome, and the Ensembl genome (such as "human") as the
          non-reference, in the compara/hive pipeline?</div>

        <div><br>

        </div>

        <div>Can I project human genes onto my genome with this somewhat
          weird setting in the next wga2genes step?</div>

        <div><br>

        </div>

        <div>A slice of the human genome containing genes may appear
          several times in the compara_db; how can the gene projection
          be right in that case?</div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>Thanks</div>

        <div><br>

        </div>

        <div>Best regards</div>

        <div><br>

        </div>

        <div>-- <br>

          Zhang Di<br>

        </div>

        <br>

        <fieldset class="mimeAttachmentHeader"></fieldset>

        <br>

        <pre wrap="">_______________________________________________

Dev mailing list    <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>

List admin (including subscribe/unsubscribe): <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>

Ensembl Blog: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>

</pre>

      </blockquote>

      <br>


    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Javier Herrero, PhD

Ensembl Coordinator and Ensembl Compara Project Leader

European Bioinformatics Institute (EMBL-EBI)

Wellcome Trust Genome Campus, Hinxton

Cambridge - CB10 1SD - UK</pre>

  </body>

</html>