And what about the unscaffolded contigs? They might be incorporated into genes by the so-called low-coverage projection.<div><div><br></div><div>Really curious to know.</div><div><br><div class="gmail_quote">On Wed, Apr 18, 2012 at 11:42 PM, Zhang Di <span dir="ltr"><<a href="mailto:aureliano.jz@gmail.com">aureliano.jz@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Hi,</div><div><br></div><div>So how do you now deal with scaffolding errors in high-coverage genomes during the gene build process?<br>
<br>It seems that contigs built from next-generation sequencing data can be quite reliable, while scaffolds may contain a non-trivial amount of incorrect linking information.</div><div class="HOEnZb"><div class="h5">
<div><br><div class="gmail_quote">On Wed, Apr 18, 2012 at 9:43 PM, Dan Barrell <span dir="ltr"><<a href="mailto:db8@sanger.ac.uk" target="_blank">db8@sanger.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi,<br>
<br>
That's right, the Atlantic Cod was the last low-coverage
genome that we projected genes onto. For the time being, though, we
are too busy with the high-coverage genomes, including the
low-coverage ones that are now arriving as high coverage.<span><font color="#888888"><br>
<br>
Dan</font></span><div><div><br>
<br>
<br>
On 18/04/12 13:16, Zhang Di wrote:
<blockquote type="cite">Thank you, Dan.
<div><br>
</div>
<div>You mentioned that you no longer build on low coverage
genomes in Ensembl. What do you do with genomes assembled from
short reads (such as Illumina GA II/HiSeq)? As far as I
know, the Atlantic Cod genome, which was published last summer in
Nature, was built using the projection genebuild.<br>
<br>
Best Regards<br>
<br>
<div class="gmail_quote">On Wed, Apr 18, 2012 at 4:43 PM, Dan
Barrell <span dir="ltr"><<a href="mailto:db8@sanger.ac.uk" target="_blank">db8@sanger.ac.uk</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hi,<br>
<br>
The document low_coverage_gene_build.txt is quite old and
possibly very out of date as we no longer build on low
coverage genomes in Ensembl. As far as I know, the reason
that the semantics of the reference and target terms got
swapped is to do with the importance of directionality in
a Net. When dealing with the low coverage genomes the idea
was that they wanted the species they were projecting onto
as the reference because it is important that each bp in
the target species aligns to at most one location in the
reference species. <br>
<br>
I would suggest you also look at the Ensembl Compara
documentation which is maintained here:<br>
<br>
ensembl-compara/docs/README-low-coverage-genome-aligner<br>
<br>
Dan
<div>
<div><br>
On 17/04/12 12:47, Zhang Di wrote: </div>
</div>
<blockquote type="cite">
<div>
<div>Hi,
<div><br>
</div>
<div>I'm using the Ensembl pipeline for a projection
genebuild.</div>
<div><br>
</div>
<div>When I read the doc
low_coverage_gene_build.txt, I was confused by the
target/query genome terms.</div>
<div><br>
</div>
<div>It calls our newly sequenced genome the target
and the reference genome the query.</div>
<div><br>
</div>
<div>This is contrary to lastz terminology, where target
means the reference and query means our sequences.</div>
<div><br>
</div>
<div>That is fine as long as I stick to this convention.</div>
<div><br>
</div>
<div>However,</div>
<div><br>
</div>
<div>In the whole genome alignment section in the
same doc,</div>
<div><br>
</div>
<div>it says that:</div>
<div><br>
</div>
<div> "each bp in the target genome should be
represented at most once."</div>
<div><br>
</div>
<div>What does "target" mean here?</div>
<div><br>
</div>
<div>lastz-chain-net produces a "target genome" (in the lastz
sense of the term) with this property.</div>
<div><br>
</div>
<div>Does this mean that I should set my genome as the
reference genome, and a genome from Ensembl
such as "human" as the non-reference, in the
compara/hive pipeline?</div>
<div><br>
</div>
<div>Can I then project human genes onto my genome with
this somewhat odd setting in the next step,
wga2genes?</div>
<div><br>
</div>
<div>Some slices of the human genome containing genes may
appear several times in the compara_db; how can
gene projection work correctly in that case?</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks</div>
<div><br>
</div>
<div>Best Regards</div>
<div><br>
</div>
<div>-- <br>
Zhang Di<br>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
Dev mailing list <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a>
List admin (including subscribe/unsubscribe): <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a>
</pre>
</blockquote>
<br>
<br>
</div>
<br>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Zhang Di<br>
</div>
<br>
<fieldset></fieldset>
<br>
</blockquote>
<br>
</div></div></div>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>Zhang Di<br>
</div>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Zhang Di<br>
</div></div>