Thank you for your kind help.<div><br></div><div>I'll try your suggestions.</div><div><br></div><div>Best regards</div><div><br></div><div><br><br><div class="gmail_quote">On Thu, Apr 19, 2012 at 1:09 PM, Javier Herrero <span dir="ltr">&lt;<a href="mailto:jherrero@ebi.ac.uk">jherrero@ebi.ac.uk</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Dear Zhang<br>
    <br>
    InnoDB is much faster for updates in large tables, but inserts are a
    different matter.<br>
    <br>
    It might be a configuration error. Make sure you can connect to all
    the databases you have configured in your pipeline.<br>
    <br>
    If you believe your MySQL server cannot cope with the load, you can
    reduce the hive_capacity for the process. The hive_capacity tells
    eHive roughly how many jobs for that particular analysis you are
    happy to run concurrently. The value is in the analysis_stats
    table. You can either modify the value manually in the database or
    change it in the PairAligner_conf file if you are re-starting the
    pipeline.<br>
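    <br>
    As a rough illustration of the manual route, here is a minimal
    Python sketch. It uses an in-memory SQLite database as a stand-in
    for the MySQL server. Only the analysis_stats table and its
    hive_capacity value come from this message; the analysis_id column,
    the helper name and the numbers 200 and 20 are assumptions made for
    the example, so check them against your own schema first.<br>
    <pre>
```python
import sqlite3


def set_hive_capacity(conn, analysis_id, capacity):
    """Set hive_capacity for one analysis and return the number of rows changed.

    The analysis_id key column is an assumption for this sketch.
    """
    cursor = conn.execute(
        "UPDATE analysis_stats SET hive_capacity = ? WHERE analysis_id = ?",
        (capacity, analysis_id),
    )
    conn.commit()
    return cursor.rowcount


# Stand-in database: a real pipeline database lives on the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analysis_stats ("
    "analysis_id INTEGER PRIMARY KEY, hive_capacity INTEGER)"
)
conn.execute("INSERT INTO analysis_stats VALUES (1, 200)")

# Lower the number of concurrent jobs allowed for analysis 1.
updated = set_hive_capacity(conn, 1, 20)
new_value = conn.execute(
    "SELECT hive_capacity FROM analysis_stats WHERE analysis_id = 1"
).fetchone()[0]
print(updated, new_value)
```
</pre>
    Against the real server the same UPDATE statement would be sent
    through a MySQL client instead of sqlite3.<br>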
    <br>
    Kind regards<span class="HOEnZb"><font color="#888888"><br>
    <br>
    Javier</font></span><div><div class="h5"><br>
    <br>
    On 19/04/12 05:18, Zhang Di wrote:
    <blockquote type="cite">Hi Javier
      <div><br>
      </div>
      <div>Thank you for the explanation.</div>
      <div><br>
      </div>
      <div>I've tried the new ensembl_66 API. </div>
      <div><br>
      </div>
      <div>Compared to 64, it has changed a lot, both in the compara
        database structure (for example, the MySQL storage engine
        changed from MyISAM to InnoDB) and in the analysis workflow.</div>
      <div><br>
      </div>
      <div>When I ran it according to
        ensembl-compara/docs/README-pairaliger, some workers failed in
        the lastz step, complaining: "cannot connect to mysql server."</div>
      <div><br>
      </div>
      <div>It sounds like a MySQL timeout error, although concurrent
        writes to InnoDB tables should be faster than to MyISAM
        tables.</div>
      <div><br>
      </div>
      <div>
        <div class="gmail_quote">On Wed, Apr 18, 2012 at 9:32 PM, Javier
          Herrero <span dir="ltr">&lt;<a href="mailto:jherrero@ebi.ac.uk" target="_blank">jherrero@ebi.ac.uk</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Hi Zhang<br>
              <br>
              In the PAIR_ALIGNER module, the reference_collection_name
              corresponds to the target_collection_name. In the other
              two modules (CHAIN_CONFIG and NET_CONFIG), the
              reference_collection_name corresponds to the
              query_collection_name. Part of the reason to change the
              name was to be able to use the same name for all 3
              modules.<br>
              <br>
              In short, just use the same genome as the
              reference_collection_name for all 3.<br>
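              <br>
              To make the rule concrete, here is a small Python sketch
              that checks whether all three blocks name the same
              reference. The real configuration file is Perl, so the
              dictionaries below are only a hypothetical mirror of it,
              and 'human' is a placeholder for whichever genome you
              choose as the reference.<br>
              <pre>
```python
def uses_single_reference(blocks):
    """Return True when every block names the same reference collection."""
    references = {block["reference_collection_name"] for block in blocks}
    return len(references) == 1


# Hypothetical mirror of a config where the reference differs between modules.
mixed = [
    {"TYPE": "PAIR_ALIGNER",
     "reference_collection_name": "human",
     "non_reference_collection_name": "my_genome"},
    {"TYPE": "CHAIN_CONFIG",
     "reference_collection_name": "my_genome",
     "non_reference_collection_name": "human"},
    {"TYPE": "NET_CONFIG",
     "reference_collection_name": "my_genome",
     "non_reference_collection_name": "human"},
]

# Same three blocks, with one genome used as the reference throughout.
consistent = [
    dict(block,
         reference_collection_name="human",
         non_reference_collection_name="my_genome")
    for block in mixed
]

print(uses_single_reference(mixed))
print(uses_single_reference(consistent))
```
</pre>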
              <br>
              Note that we have changed the way we run this pipeline and
              this configuration file is no longer supported. The
              documentation for the new pipeline is available under
              ensembl-compara/docs/README-pairaliger. Obviously, you can
              still use ensembl_64 API and pipeline for as long as you
              want.<br>
              <br>
              I hope this helps<span><font color="#888888"><br>
                  <br>
                  Javier</font></span>
              <div>
                <div><br>
                  <br>
                  On 18/04/12 13:33, Zhang Di wrote:
                  <blockquote type="cite">Thank you, Javier
                    <div><br>
                    </div>
                    <div>Do you mean I should set something like the
                      following in compara_2x.conf to prepare the
                      compara_db for gene projection?</div>
                    <div><br>
                    </div>
                    <blockquote style="margin:0 0 0 40px;border:none;padding:0px">
                      <div>{</div>
                      <div>TYPE => PAIR_ALIGNER,</div>
                      <div>reference_collection_name => 'human',</div>
                      <div>non_reference_collection_name =>
                        'my_genome',</div>
                      <div>}</div>
                      <div>
                        <div>{</div>
                      </div>
                      <div>
                        <div>TYPE => CHAIN_CONFIG,</div>
                      </div>
                      <div>
                        <div>non_reference_collection_name =>
                          'human',</div>
                      </div>
                      <div>
                        <div>reference_collection_name =>
                          'my_genome',</div>
                      </div>
                      <div>
                        <div>}</div>
                      </div>
                      <div>
                        <div>{</div>
                      </div>
                      <div>
                        <div>TYPE => NET_CONFIG,</div>
                      </div>
                      <div>
                        <div>non_reference_collection_name =>
                          'human',</div>
                      </div>
                      <div>
                        <div>reference_collection_name =>
                          'my_genome',</div>
                      </div>
                      <div>
                        <div>}</div>
                      </div>
                      <div><br>
                      </div>
                    </blockquote>
                    <div>By the way, I'm using ensembl-compara version
                      64.</div>
                    <div><br>
                      <div class="gmail_quote">On Wed, Apr 18, 2012 at
                        4:57 PM, Javier Herrero <span dir="ltr">&lt;<a href="mailto:jherrero@ebi.ac.uk" target="_blank">jherrero@ebi.ac.uk</a>&gt;</span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                          <div bgcolor="#FFFFFF" text="#000000"> Dear
                            Zhang<br>
                            <br>
                            The query/target naming has always been
                            quite confusing in the pairwise alignment
                            pipeline. When running an alignment, you use
                            a sequence (query) to look for similar
                            regions in another sequence (target genome).
                            The pairwise alignment pipeline has three
                            major steps: (a) raw alignments; (b)
                            chaining; and (c) netting. The chaining step
                            tries to link all the raw alignments that
                            are in the same order and orientation to
                            create a longer structure called a chain.
                            The netting step requires you to define a
                            target genome, such that the so-called nets
                            are the subset of chains that form the
                            best-in-genome alignment. In other words,
                            for each bp of the target genome, the final
                            set will provide you with the best match on
                            the other genome.<br>
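                            <br>
                            The "best match for each bp of the target"
                            idea can be pictured with a toy Python
                            sketch. This is not the netting program the
                            pipeline runs, only a simplified greedy
                            illustration; the chain names, coordinates
                            and scores are invented for the example.<br>
                            <pre>
```python
def best_chain_per_base(chains, target_length):
    """Assign each target position to the highest-scoring chain covering it.

    chains: list of (name, target_start, target_end, score) tuples, with
    half-open target intervals. Positions covered by no chain stay None.
    """
    assignment = [None] * target_length
    # Visit chains from best to worst; a base keeps the first chain it gets.
    for name, start, end, _score in sorted(
        chains, key=lambda chain: chain[3], reverse=True
    ):
        for pos in range(max(start, 0), min(end, target_length)):
            if assignment[pos] is None:
                assignment[pos] = name
    return assignment


# Invented chains on a 12 bp target.
chains = [
    ("chainA", 0, 6, 900),
    ("chainB", 4, 10, 500),
    ("chainC", 2, 5, 100),
]
net = best_chain_per_base(chains, 12)
print(net)
```
</pre>
                            Each target position ends up with at most
                            one chain, while nothing stops the same
                            region of the other genome from being used
                            by several chains.<br>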
                            <br>
                            As I said earlier, query and target are
                            often confusing terms. To make the
                            situation worse, we used to run the pipeline
                            such that the query for the raw alignment
                            step was the target for the chaining and
                            netting steps and vice versa. We have now
                            changed the way we refer to both sequences
                            and call them reference and non-reference
                            genomes. We find that nomenclature less
                            confusing.<br>
                            <br>
                            Kind regards<br>
                            <br>
                            Javier
                            <div>
                              <div><br>
                                <br>
                                On 18/04/12 09:43, Dan Barrell wrote:
                                <blockquote type="cite"> Hi,<br>
                                  <br>
                                  The document
                                  low_coverage_gene_build.txt is quite
                                  old and possibly very out of date as
                                  we no longer build on low coverage
                                  genomes in Ensembl. As far as I know,
                                  the reason that the semantics of the
                                  reference and target terms got swapped
                                  is to do with the importance of
                                  directionality in a Net. When dealing
                                  with the low coverage genomes the idea
                                  was that they wanted the species they
                                  were projecting onto as the reference
                                  because it is important that each bp
                                  in the target species aligns to at
                                  most one location in the reference
                                  species. <br>
                                  <br>
                                  I would suggest you also look at the
                                  Ensembl Compara documentation which is
                                  maintained here:<br>
                                  <br>
ensembl-compara/docs/README-low-coverage-genome-aligner<br>
                                  <br>
                                  Dan<br>
                                  <br>
                                  <br>
                                  On 17/04/12 12:47, Zhang Di wrote:
                                  <blockquote type="cite">Hi, 
                                    <div><br>
                                    </div>
                                    <div>I'm using the Ensembl pipeline
                                      for a projection genebuild.</div>
                                    <div><br>
                                    </div>
                                    <div>When I read the doc
                                      low_coverage_gene_build.txt, I was
                                      confused by the target/query
                                      genome terms.</div>
                                    <div><br>
                                    </div>
                                    <div>It calls our newly sequenced
                                      genome the target and the
                                      reference genome the query.</div>
                                    <div><br>
                                    </div>
                                    <div>This is contrary to lastz
                                      terminology, where target means
                                      reference and query means our
                                      sequences.</div>
                                    <div><br>
                                    </div>
                                    <div>That is OK if I just stick to
                                      this convention.</div>
                                    <div><br>
                                    </div>
                                    <div>However,</div>
                                    <div><br>
                                    </div>
                                    <div>In the whole genome alignment
                                      section in the same doc,</div>
                                    <div><br>
                                    </div>
                                    <div>It says that:</div>
                                    <div><br>
                                    </div>
                                    <div>    "each bp in the target
                                      genome should be represented at
                                      most once."</div>
                                    <div><br>
                                    </div>
                                    <div>What does it mean by saying
                                      "target"?</div>
                                    <div><br>
                                    </div>
                                    <div>lastz-chain-net gives this
                                      property to what lastz terms the
                                      "target genome".</div>
                                    <div><br>
                                    </div>
                                    <div>Does it mean that I should set
                                      my genome as the reference genome,
                                      and the genome from Ensembl, such
                                      as "human", as the non-reference in
                                      the compara/hive pipeline?</div>
                                    <div><br>
                                    </div>
                                    <div>Can I then project human genes
                                      onto my genome with this somewhat
                                      odd setting in the next wga2genes
                                      step?</div>
                                    <div><br>
                                    </div>
                                    <div>Some slices of the human
                                      genome containing genes may appear
                                      several times in the compara_db.
                                      How can gene projection work
                                      correctly in that case?</div>
                                    <div><br>
                                    </div>
                                    <div><br>
                                    </div>
                                    <div>Thanks</div>
                                    <div><br>
                                    </div>
                                    <div>Best regards</div>
                                    <div><br>
                                    </div>
                                    <div>-- <br>
                                      Zhang Di<br>
                                    </div>
                                    <br>
                                    <fieldset></fieldset>
                                    <br>
                                    <pre>_______________________________________________
Dev mailing list    <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a>
List admin (including subscribe/unsubscribe): <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a>
</pre>
                                  </blockquote>
                                </blockquote>
                                <br>
                              </div>
                            </div>
                            <span><font color="#888888">
                                <pre cols="72">-- 
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK</pre>
                              </font></span></div>
                        </blockquote>
                      </div>
                      <br>
                      <br clear="all">
                      <div><br>
                      </div>
                      -- <br>
                      Zhang Di<br>
                    </div>
                  </blockquote>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        Zhang Di<br>
      </div>
    </blockquote>
  </div></div></div>

<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>Zhang Di<br>
</div>