Thank you for your kind help.<div><br></div><div>I'll try your suggestions.</div><div><br></div><div>Best Regards</div><div><br></div><div><br><br><div class="gmail_quote">On Thu, Apr 19, 2012 at 1:09 PM, Javier Herrero <span dir="ltr"><<a href="mailto:jherrero@ebi.ac.uk">jherrero@ebi.ac.uk</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Dear Zhang<br>

    <br>

    InnoDB is much faster for updates in large tables, but inserts are a

    different matter.<br>

    <br>

    It might be a configuration error; make sure you can connect to all

    the databases you have configured in your pipeline.<br>

    <br>

    If you believe your MySQL server cannot cope with the load, you can

    reduce the hive_capacity for the process. The hive_capacity tells

    eHive roughly how many jobs for that particular analysis

    you are happy to run concurrently. The value is in the

    analysis_stats table. You can either modify the value manually in

    the database or change it in the PairAligner_conf file if you are

    re-starting the pipeline.<br>

    <br>
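    <div>A minimal sketch of the manual change, not the pipeline's own tooling. It uses an in-memory SQLite table as a stand-in for the MySQL hive database, and it assumes the analysis_stats table has analysis_id and hive_capacity columns. The analysis id (1) and the capacities (200 and 50) are made-up values for illustration.</div>
    <pre>
```python
import sqlite3


def set_hive_capacity(conn, analysis_id, capacity):
    """Set hive_capacity for one analysis and return the number of rows changed."""
    cur = conn.execute(
        "UPDATE analysis_stats SET hive_capacity = ? WHERE analysis_id = ?",
        (capacity, analysis_id),
    )
    conn.commit()
    return cur.rowcount


# Stand-in database: only the two columns this sketch needs (assumed names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analysis_stats (analysis_id INTEGER PRIMARY KEY, hive_capacity INTEGER)"
)
conn.execute("INSERT INTO analysis_stats VALUES (1, 200)")

# Lower the capacity so fewer jobs of this analysis run at the same time.
changed = set_hive_capacity(conn, 1, 50)
new_value = conn.execute(
    "SELECT hive_capacity FROM analysis_stats WHERE analysis_id = 1"
).fetchone()[0]
print(changed, new_value)
```
</pre>
    <div>Against a real server the same UPDATE statement would be sent through a MySQL client; check the actual column names in your schema first.</div>
    <br>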

    Kind regards<span class="HOEnZb"><font color="#888888"><br>

    <br>

    Javier</font></span><div><div class="h5"><br>

    <br>

    On 19/04/12 05:18, Zhang Di wrote:

    <blockquote type="cite">Hi Javier

      <div><br>

      </div>

      <div>Thank you for the explanation.</div>

      <div><br>

      </div>

      <div>I've tried the new ensembl_66 API. </div>

      <div><br>

      </div>

      <div>Compared to 64, it has changed a lot, both in the compara

        database structure (for example, the MySQL storage engine changed from MyISAM to

        InnoDB) and in the analysis workflow.</div>

      <div><br>

      </div>

      <div>When I ran it according to

        ensembl-compara/docs/README-pairaliger, some workers failed in

        the lastz step, complaining: "cannot connect to mysql server."</div>

      <div><br>

      </div>

      <div>It sounds like a MySQL timeout error, although concurrent

        writes to an InnoDB table should be faster than to a MyISAM

        table.</div>

      <div><br>

      </div>

      <div>

        <div class="gmail_quote">On Wed, Apr 18, 2012 at 9:32 PM, Javier

          Herrero <span dir="ltr"><<a href="mailto:jherrero@ebi.ac.uk" target="_blank">jherrero@ebi.ac.uk</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000"> Hi Zhang<br>

              <br>

              In the PAIR_ALIGNER module, the reference_collection_name

              corresponds to the target_collection_name. In the other

              two modules (CHAIN_CONFIG and NET_CONFIG), the

              reference_collection_name corresponds to the

              query_collection_name. Part of the reason to change the

              name was to be able to use the same name for all 3

              modules.<br>

              <br>

              In short, just use the same genome as the

              reference_collection_name for all 3.<br>

              <br>
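              <div>As a rough illustration of that rule, here is a small Python check that all three modules name the same reference collection. It is only a sketch: the real configuration file is not Python, the dictionaries below are hypothetical stand-ins for its entries, and the genome names are placeholders.</div>
              <pre>
```python
def reference_names(modules):
    """Map each module TYPE to its reference_collection_name."""
    return {m["TYPE"]: m["reference_collection_name"] for m in modules}


def uses_single_reference(modules):
    """True when every module names the same reference collection."""
    return len(set(reference_names(modules).values())) == 1


# Hypothetical settings: the same genome is the reference in all three modules.
consistent = [
    {"TYPE": "PAIR_ALIGNER", "reference_collection_name": "human",
     "non_reference_collection_name": "my_genome"},
    {"TYPE": "CHAIN_CONFIG", "reference_collection_name": "human",
     "non_reference_collection_name": "my_genome"},
    {"TYPE": "NET_CONFIG", "reference_collection_name": "human",
     "non_reference_collection_name": "my_genome"},
]

# Hypothetical settings: the names are swapped after the first module.
swapped = [
    {"TYPE": "PAIR_ALIGNER", "reference_collection_name": "human",
     "non_reference_collection_name": "my_genome"},
    {"TYPE": "CHAIN_CONFIG", "reference_collection_name": "my_genome",
     "non_reference_collection_name": "human"},
    {"TYPE": "NET_CONFIG", "reference_collection_name": "my_genome",
     "non_reference_collection_name": "human"},
]

print(uses_single_reference(consistent), uses_single_reference(swapped))
```
</pre>
              <br>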

              Note that we have changed the way we run this pipeline and

              this configuration file is no longer supported. The

              documentation for the new pipeline is available under

              ensembl-compara/docs/README-pairaliger. Obviously, you can

              still use ensembl_64 API and pipeline for as long as you

              want.<br>

              <br>

              I hope this helps<span><font color="#888888"><br>

                  <br>

                  Javier</font></span>

              <div>

                <div><br>

                  <br>

                  On 18/04/12 13:33, Zhang Di wrote:

                  <blockquote type="cite">Thank you, Javier

                    <div><br>

                    </div>

                    <div>Do you mean I should set something like the following in

                      compara_2x.conf to prepare the compara_db for

                      gene projection?</div>

                    <div><br>

                    </div>

                    <blockquote style="margin:0 0 0 40px;border:none;padding:0px">

                      <div>{</div>

                      <div>TYPE => PAIR_ALIGNER,</div>

                      <div>reference_collection_name => 'human',</div>

                      <div>non_reference_collection_name =>

                        'my_genome',</div>

                      <div>}</div>

                      <div>

                        <div>{</div>

                      </div>

                      <div>

                        <div>TYPE => CHAIN_CONFIG,</div>

                      </div>

                      <div>

                        <div>non_reference_collection_name =>

                          'human',</div>

                      </div>

                      <div>

                        <div>reference_collection_name =>

                          'my_genome',</div>

                      </div>

                      <div>

                        <div>}</div>

                      </div>

                      <div>

                        <div>{</div>

                      </div>

                      <div>

                        <div>TYPE => NET_CONFIG,</div>

                      </div>

                      <div>

                        <div>non_reference_collection_name =>

                          'human',</div>

                      </div>

                      <div>

                        <div>reference_collection_name =>

                          'my_genome',</div>

                      </div>

                      <div>

                        <div>}</div>

                      </div>

                      <div><br>

                      </div>

                    </blockquote>

                    <div>By the way, I'm using ensembl-compara version

                      64.</div>

                    <div><br>

                      <div class="gmail_quote">On Wed, Apr 18, 2012 at

                        4:57 PM, Javier Herrero <span dir="ltr"><<a href="mailto:jherrero@ebi.ac.uk" target="_blank">jherrero@ebi.ac.uk</a>></span>

                        wrote:<br>

                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                          <div bgcolor="#FFFFFF" text="#000000"> Dear

                            Zhang<br>

                            <br>

                            The query/target naming has always been

                            quite confusing in the pairwise alignment

                            pipeline. When running an alignment, you use

                            a sequence (query) to look for similar

                            regions in another sequence (target genome).

                            The pairwise alignment pipeline has three

                            major steps: (a) raw alignments; (b)

                            chaining; and (c) netting. The chaining step

                            tries to link all the raw alignments that

                            are in the same order and orientation to

                            create a longer structure called a chain. The

                            netting step requires you to define a target

                            genome, such that the so-called nets are the

                            subset of chains that form the

                            best-in-genome alignment. In other words,

                            the final set will provide, for each bp

                            of the target genome, the best match on

                            the other genome.<br>

                            <br>
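                            <div>To make the "best match for each bp of the target genome" idea concrete, here is a toy Python sketch. It is a deliberate simplification under stated assumptions, not the real netting algorithm: chains are reduced to a half-open interval on the target plus a score, and the ids, coordinates and scores are invented. Each target base is claimed by the highest-scoring chain that covers it.</div>
                            <pre>
```python
def best_chain_per_base(chains, target_length):
    """For each target position, keep the id of the highest-scoring chain covering it."""
    best = [None] * target_length
    # Visit chains from highest to lowest score; earlier chains claim bases first.
    for chain in sorted(chains, key=lambda c: c["score"], reverse=True):
        for pos in range(chain["start"], chain["end"]):
            if best[pos] is None:
                best[pos] = chain["id"]
    return best


# Invented chains on a 12 bp target; "end" is exclusive.
chains = [
    {"id": "A", "start": 0, "end": 6, "score": 900},
    {"id": "B", "start": 4, "end": 10, "score": 500},
    {"id": "C", "start": 2, "end": 5, "score": 100},
]

net = best_chain_per_base(chains, 12)
print(net)
```
</pre>
                            <div>In this toy case every target base ends up with at most one chain, chain C is dropped entirely because A outscores it everywhere, and the last two bases stay unaligned. Nothing here limits how often a region of the other genome is used, which is why the choice of target matters.</div>
                            <br>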

                            As I said earlier, query and target are

                            often confusing terms. To make the

                            situation worse, we used to run the pipeline

                            such that the query for the raw alignment step

                            was the target for the chaining and netting

                            steps and vice versa. We have now changed

                            the way we refer to both sequences and call

                            them reference and non-reference genomes. We

                            find that nomenclature less confusing.<br>

                            <br>

                            Kind regards<br>

                            <br>

                            Javier

                            <div>

                              <div><br>

                                <br>

                                On 18/04/12 09:43, Dan Barrell wrote:

                                <blockquote type="cite"> Hi,<br>

                                  <br>

                                  The document

                                  low_coverage_gene_build.txt is quite

                                  old and possibly very out of date as

                                  we no longer build on low coverage

                                  genomes in Ensembl. As far as I know,

                                  the reason that the semantics of the

                                  reference and target terms got swapped

                                  is to do with the importance of

                                  directionality in a Net. When dealing

                                  with the low coverage genomes the idea

                                  was that they wanted the species they

                                  were projecting onto as the reference

                                  because it is important that each bp

                                  in the target species aligns to at

                                  most one location in the reference

                                  species. <br>

                                  <br>

                                  I would suggest you also look at the

                                  Ensembl Compara documentation which is

                                  maintained here:<br>

                                  <br>

ensembl-compara/docs/README-low-coverage-genome-aligner<br>

                                  <br>

                                  Dan<br>

                                  <br>


                                  On 17/04/12 12:47, Zhang Di wrote:

                                  <blockquote type="cite">Hi, 

                                    <div><br>

                                    </div>

                                    <div>I'm using ensembl pipeline for

                                      projection genebuild.</div>

                                    <div><br>

                                    </div>

                                    <div>When I read the doc

                                      low_coverage_gene_build.txt, I was

                                      confused by the target/query

                                      genome terms.</div>

                                    <div><br>

                                    </div>

                                    <div>It calls our newly sequenced

                                      genome the target, and calls the

                                      reference genome the query.</div>

                                    <div><br>

                                    </div>

                                    <div>This is contrary to lastz terminology,

                                      where target means the reference and

                                      query means our sequences.</div>

                                    <div><br>

                                    </div>

                                    <div>It is OK if I just stick to this

                                      convention.</div>

                                    <div><br>

                                    </div>

                                    <div>However,</div>

                                    <div><br>

                                    </div>

                                    <div>In the whole genome alignment

                                      section in the same doc,</div>

                                    <div><br>

                                    </div>

                                    <div>It says that :</div>

                                    <div><br>

                                    </div>

                                    <div>    "each bp in the target

                                      genome should be represented at

                                      most once."</div>

                                    <div><br>

                                    </div>

                                    <div>What does "target" mean

                                      here?</div>

                                    <div><br>

                                    </div>

                                    <div>lastz-chain-net produces output in which

                                      the lastz-termed "target genome" has

                                      this property.</div>

                                    <div><br>

                                    </div>

                                    <div>Does it mean that I should set

                                      my genome as the reference genome,

                                      and a genome from Ensembl such

                                      as "human" as the non-reference, in

                                      the compara/hive pipeline?</div>

                                    <div><br>

                                    </div>

                                    <div>Can I then project human genes to my

                                      genome with this somewhat odd

                                      setting in the next wga2genes

                                      step?</div>

                                    <div><br>

                                    </div>

                                    <div>Some slices of the human

                                      genome containing genes may appear

                                      several times in the compara_db;

                                      how can the gene projection be

                                      produced correctly in that case?</div>

                                    <div><br>

                                    </div>

                                    <div><br>

                                    </div>

                                    <div>Thanks</div>

                                    <div><br>

                                    </div>

                                    <div>Best Regards</div>

                                    <div><br>

                                    </div>

                                    <div>-- <br>

                                      Zhang Di<br>

                                    </div>

                                    <br>

                                    <fieldset></fieldset>

                                    <br>

                                    <pre>_______________________________________________

Dev mailing list    <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a>

List admin (including subscribe/unsubscribe): <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a>

Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a>

</pre>

                                  </blockquote>

                                  <br>

                                  <br>

                                  <br>

                                  <fieldset></fieldset>

                                  <br>


                                </blockquote>

                                <br>

                              </div>

                            </div>

                            <span><font color="#888888">

                                <pre cols="72">-- 

Javier Herrero, PhD

Ensembl Coordinator and Ensembl Compara Project Leader

European Bioinformatics Institute (EMBL-EBI)

Wellcome Trust Genome Campus, Hinxton

Cambridge - CB10 1SD - UK</pre>

                              </font></span></div>

                          <br>


                          <br>

                        </blockquote>

                      </div>

                      <br>

                      <br clear="all">

                      <div><br>

                      </div>

                      -- <br>

                      Zhang Di<br>

                    </div>

                    <br>

                    <fieldset></fieldset>

                    <br>


                  </blockquote>

                  <br>


                </div>

              </div>

            </div>

            <br>


            <br>

          </blockquote>

        </div>

        <br>

        <br clear="all">

        <div><br>

        </div>

        -- <br>

        Zhang Di<br>

      </div>

      <br>

      <fieldset></fieldset>

      <br>


    </blockquote>

    <br>


  </div></div></div>


<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>Zhang Di<br>

</div>