Thank you for your kind help.<div><br></div><div>I'll try your suggestions.</div><div><br></div><div>Best Regards</div><div><br></div><div><br><br><div class="gmail_quote">On Thu, Apr 19, 2012 at 1:09 PM, Javier Herrero <span dir="ltr"><<a href="mailto:jherrero@ebi.ac.uk">jherrero@ebi.ac.uk</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Dear Zhang<br>
<br>
InnoDB is much faster for updates in large tables, but inserts are a
different matter.<br>
<br>
    It might be a configuration error. Make sure you can connect to all
    the databases you have configured in your pipeline.<br>
<br>
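    As a rough first check of the point above, the sketch below tests
    whether each database server accepts TCP connections. The host name,
    the port and the helper name can_connect are assumptions for
    illustration only; substitute the servers listed in your own pipeline
    configuration.<br>
<pre>
```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


# Hypothetical server list: replace with the hosts and ports from your own
# pipeline configuration. 3306 is the default MySQL port.
servers = [("localhost", 3306)]

for host, port in servers:
    status = "reachable" if can_connect(host, port, timeout=2.0) else "NOT reachable"
    print(f"{host}:{port} is {status}")
```
</pre>
    A successful TCP connection only shows that the server is listening;
    it does not verify the user name, password or database name, so a
    login with the mysql client is still worth doing.<br>
    <br>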
If you believe your MySQL server cannot cope with the load, you can
    reduce the hive_capacity for the process. The hive_capacity tells
    eHive roughly how many jobs for that particular analysis
    you are happy to run concurrently. The value is in the
analysis_stats table. You can either modify the value manually in
the database or change it in the PairAligner_conf file if you are
re-starting the pipeline.<br>
<br>
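    A minimal sketch of the manual route follows. It lowers
    hive_capacity for one analysis, using an in-memory SQLite table as a
    stand-in for the pipeline's MySQL database so that it runs anywhere.
    The analysis_id column, the example values and the helper name
    set_hive_capacity are assumptions for illustration; check them
    against your own analysis_stats table before running the equivalent
    UPDATE on the real database.<br>
<pre>
```python
import sqlite3


def set_hive_capacity(conn, analysis_id, capacity):
    """Set hive_capacity for one analysis and return the number of rows changed."""
    cur = conn.execute(
        "UPDATE analysis_stats SET hive_capacity = ? WHERE analysis_id = ?",
        (capacity, analysis_id),
    )
    conn.commit()
    return cur.rowcount


# Stand-in database: the real table lives in the pipeline's MySQL database
# and has more columns than the two assumed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analysis_stats (analysis_id INTEGER PRIMARY KEY, hive_capacity INTEGER)"
)
conn.executemany("INSERT INTO analysis_stats VALUES (?, ?)", [(1, 10), (2, 200)])

# Halve the allowed concurrency for the (hypothetical) analysis with id 2.
changed = set_hive_capacity(conn, 2, 100)
new_value = conn.execute(
    "SELECT hive_capacity FROM analysis_stats WHERE analysis_id = 2"
).fetchone()[0]
print(changed, new_value)
```
</pre>
    Checking the returned row count is a cheap way to notice that the
    analysis_id did not match anything.<br>
    <br>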
Kind regards<span class="HOEnZb"><font color="#888888"><br>
<br>
Javier</font></span><div><div class="h5"><br>
<br>
On 19/04/12 05:18, Zhang Di wrote:
<blockquote type="cite">Hi Javier
<div><br>
</div>
<div>Thank you for the explanation.</div>
<div><br>
</div>
<div>I've tried the new ensembl_66 API. </div>
<div><br>
</div>
        <div>Compared to 64, a lot has changed, both in the compara
          database structure (for example, the MySQL storage engine
          changed from MyISAM to InnoDB) and in the analysis workflow.</div>
<div><br>
</div>
        <div>When I ran it according to
          ensembl-compara/docs/README-pairaliger, some workers failed in
          the lastz step, complaining: "cannot connect to mysql server."</div>
<div><br>
</div>
        <div>It sounds like a MySQL timeout error, although concurrent
          writes to an InnoDB table should be faster than to a MyISAM
          table.</div>
<div><br>
</div>
<div>
<div class="gmail_quote">On Wed, Apr 18, 2012 at 9:32 PM, Javier
Herrero <span dir="ltr"><<a href="mailto:jherrero@ebi.ac.uk" target="_blank">jherrero@ebi.ac.uk</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hi Zhang<br>
<br>
In the PAIR_ALIGNER module, the reference_collection_name
corresponds to the target_collection_name. In the other
two modules (CHAIN_CONFIG and NET_CONFIG), the
reference_collection_name corresponds to the
query_collection_name. Part of the reason to change the
name was to be able to use the same name for all 3
modules.<br>
<br>
In short, just use the same genome as the
reference_collection_name for all 3.<br>
<br>
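              The rule above can be sketched as a small consistency
              check. The Python dictionaries below only mirror the shape
              of the three Perl configuration blocks; the helper name
              shared_reference and the collection names 'human' and
              'my_genome' are placeholders, not part of the pipeline.<br>
<pre>
```python
def shared_reference(modules):
    """Return the reference_collection_name if all modules agree, else raise."""
    names = {m["reference_collection_name"] for m in modules}
    if len(names) != 1:
        raise ValueError(f"modules disagree on the reference: {sorted(names)}")
    return names.pop()


# Python mirror of the three configuration blocks, all using the same
# genome as the reference (placeholder names).
modules = [
    {"TYPE": "PAIR_ALIGNER",
     "reference_collection_name": "human",
     "non_reference_collection_name": "my_genome"},
    {"TYPE": "CHAIN_CONFIG",
     "reference_collection_name": "human",
     "non_reference_collection_name": "my_genome"},
    {"TYPE": "NET_CONFIG",
     "reference_collection_name": "human",
     "non_reference_collection_name": "my_genome"},
]

print(shared_reference(modules))
```
</pre>
              <br>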
              Note that we have changed the way we run this pipeline and
              this configuration file is no longer supported. The
              documentation for the new pipeline is available under
              ensembl-compara/docs/README-pairaliger. Obviously, you can
              still use the ensembl_64 API and pipeline for as long as you
              want.<br>
<br>
I hope this helps<span><font color="#888888"><br>
<br>
Javier</font></span>
<div>
<div><br>
<br>
On 18/04/12 13:33, Zhang Di wrote:
<blockquote type="cite">Thank you, Javier
<div><br>
</div>
                    <div>Do you mean I should set something like the
                      following in compara_2x.conf to prepare the
                      compara_db for gene projection?</div>
<div><br>
</div>
<blockquote style="margin:0 0 0 40px;border:none;padding:0px">
<div>{</div>
<div>TYPE => PAIR_ALIGNER,</div>
<div>reference_collection_name => 'human',</div>
<div>non_reference_collection_name =>
'my_genome',</div>
<div>}</div>
<div>
<div>{</div>
</div>
<div>
<div>TYPE => CHAIN_CONFIG,</div>
</div>
<div>
<div>non_reference_collection_name =>
'human',</div>
</div>
<div>
<div>reference_collection_name =>
'my_genome',</div>
</div>
<div>
<div>}</div>
</div>
<div>
<div>{</div>
</div>
<div>
<div>TYPE => NET_CONFIG,</div>
</div>
<div>
<div>non_reference_collection_name =>
'human',</div>
</div>
<div>
<div>reference_collection_name =>
'my_genome',</div>
</div>
<div>
<div>}</div>
</div>
<div><br>
</div>
</blockquote>
<div>By the way, I'm using ensembl-compara version
64.</div>
<div><br>
<div class="gmail_quote">On Wed, Apr 18, 2012 at
4:57 PM, Javier Herrero <span dir="ltr"><<a href="mailto:jherrero@ebi.ac.uk" target="_blank">jherrero@ebi.ac.uk</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Dear
Zhang<br>
<br>
The query/target naming has always been
quite confusing in the pairwise alignment
pipeline. When running an alignment, you use
a sequence (query) to look for similar
regions in another sequence (target genome).
The pairwise alignment pipeline has three
major steps: (a) raw alignments; (b)
chaining; and (c) netting. The chaining step
tries to link all the raw alignments that
are in the same order and orientation to
                          create a longer structure called a chain. The
                          netting step requires you to define a target
                          genome, so that the so-called nets are the
                          subset of chains that form the
                          best-in-genome alignment. In other words,
                          the final set will provide you, for each bp
                          of the target genome, with the best match on
                          the other genome.<br>
<br>
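                          The "best match for each bp of the target"
                          idea can be illustrated with a toy example.
                          This is not the netting algorithm used by the
                          pipeline, only a simplified sketch of the
                          property it aims for: where chains overlap on
                          the target, the higher-scoring one wins. The
                          function name, the dictionary keys, the
                          half-open coordinates and the scores are all
                          made up for illustration.<br>
<pre>
```python
def best_chain_per_base(chains, target_length):
    """For each target position, keep the name of the best-scoring chain covering it."""
    best = [None] * target_length
    # Visit chains from lowest to highest score so better chains overwrite worse ones.
    for chain in sorted(chains, key=lambda c: c["score"]):
        for pos in range(chain["start"], chain["end"]):
            best[pos] = chain["name"]
    return best


# Two overlapping chains on a 10 bp target, with half-open [start, end) intervals.
chains = [
    {"name": "A", "start": 0, "end": 6, "score": 50},
    {"name": "B", "start": 4, "end": 9, "score": 80},
]

assignment = best_chain_per_base(chains, target_length=10)
print(assignment)
```
</pre>
                          Positions 4 and 5 are covered by both chains
                          and go to B, the higher-scoring one; position
                          9 is covered by neither and stays unassigned.
                          Every target base therefore ends up with at
                          most one match.<br>
                          <br>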
                          As I said earlier, query and target are
                          often confusing terms. To make the
                          situation worse, we used to run the pipeline
                          such that the query for the raw alignment step
                          was the target for the chaining and netting
steps and vice versa. We have now changed
the way we refer to both sequences and call
them reference and non-reference genomes. We
find that nomenclature less confusing.<br>
<br>
Kind regards<br>
<br>
Javier
<div>
<div><br>
<br>
On 18/04/12 09:43, Dan Barrell wrote:
<blockquote type="cite"> Hi,<br>
<br>
The document
low_coverage_gene_build.txt is quite
old and possibly very out of date as
we no longer build on low coverage
genomes in Ensembl. As far as I know,
the reason that the semantics of the
reference and target terms got swapped
is to do with the importance of
directionality in a Net. When dealing
with the low coverage genomes the idea
was that they wanted the species they
were projecting onto as the reference
because it is important that each bp
in the target species aligns to at
most one location in the reference
species. <br>
<br>
I would suggest you also look at the
Ensembl Compara documentation which is
maintained here:<br>
<br>
ensembl-compara/docs/README-low-coverage-genome-aligner<br>
<br>
Dan<br>
<br>
<br>
On 17/04/12 12:47, Zhang Di wrote:
<blockquote type="cite">Hi,
<div><br>
</div>
                                  <div>I'm using the Ensembl pipeline for a
                                    projection genebuild.</div>
<div><br>
</div>
                                  <div>When I read the doc
                                    low_coverage_gene_build.txt, I was
                                    confused by the target/query
                                    genome terms.</div>
<div><br>
</div>
                                  <div>It calls our newly sequenced
                                    genome the target, and calls the
                                    reference genome the query.</div>
<div><br>
</div>
                                  <div>This is contrary to lastz terminology,
                                    where target means the reference and
                                    query means our sequences.</div>
<div><br>
</div>
                                  <div>That is OK as long as I stick to this
                                    convention.</div>
<div><br>
</div>
<div>However,</div>
<div><br>
</div>
<div>In the whole genome alignment
section in the same doc,</div>
<div><br>
</div>
                                  <div>It says that:</div>
<div><br>
</div>
<div> "each bp in the target
genome should be represented at
most once."</div>
<div><br>
</div>
                                  <div>What does "target" mean
                                    here?</div>
<div><br>
</div>
                                  <div>With lastz-chain-net, it is what
                                    lastz calls the "target genome" that
                                    has this property.</div>
<div><br>
</div>
                                  <div>Does it mean that I should set
                                    my genome as the reference genome,
                                    and the genome from Ensembl, such
                                    as "human", as the non-reference in
                                    the compara/hive pipeline?</div>
<div><br>
</div>
                                  <div>Can I then project human genes onto
                                    my genome with this somewhat odd
                                    setting in the next wga2genes
                                    step?</div>
<div><br>
</div>
                                  <div>Some slices of the human
                                    genome containing genes may occur
                                    several times in the compara_db, so
                                    how can it produce the correct gene
                                    projection in that case?</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks</div>
<div><br>
</div>
                                  <div>Best Regards</div>
<div><br>
</div>
<div>-- <br>
Zhang Di<br>
</div>
<br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
Dev mailing list <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a>
List admin (including subscribe/unsubscribe): <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a>
</pre>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</div>
</div>
<span><font color="#888888">
<pre cols="72">--
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK</pre>
</font></span></div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Zhang Di<br>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Zhang Di<br>
</div>
<br>
</blockquote>
<br>
</div></div></div>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>Zhang Di<br>
</div>