<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif">Hi John,</div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif">There may be other sequences in the assembly that have not been assigned to a chromosome, and your script only fetches 'chromosome' slices. You can check this via the API or in BioMart. I expect you'll find a number of small sequences that harbour some genes; including them may get your totals to balance.</div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif">cheers</div><div class="gmail_default" style="font-family:verdana,sans-serif">Dan</div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 7 December 2015 at 01:59, john samuel <span dir="ltr">&lt;<a href="mailto:john.samuel@senecacollege.ca" target="_blank">john.samuel@senecacollege.ca</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
Hi,<br>
I am trying to get an accurate count of all the ENSDARG genes from
the latest zebrafish data (<span>GRCz10) in Ensembl.<br>
If I use the Perl API to get all the genes on all the chromosomes,
I get a total of 31,388:<br>
<br>
my $slice_adaptor = $registry->get_adaptor( 'danio_rerio',
'Core', 'Slice' );<br>
my @slices = @{ $slice_adaptor->fetch_all('chromosome') };<br>
my $total = 0;<br>
my %all;<br>
foreach my $slice (@slices) {<br>
my @genes = @{ $slice->get_all_Genes() };<br>
my $count = scalar @genes;<br>
$all{$slice->seq_region_name()}=$count;<br>
$total += $count;<br>
}<br>
foreach my $sorted (sort { ($a =~ /^\d+$/ ? $a : 0) &lt;=&gt; ($b =~ /^\d+$/ ? $b : 0) } keys %all) {<br>
print "chromosome: $sorted\t$all{$sorted}\n";<br>
}<br>
print "gene total is\t$total\n";<br>
<br>
chromosome: MT 37<br>
chromosome: 1 1386<br>
chromosome: 2 1587<br>
chromosome: 3 1611<br>
chromosome: 4 3103<br>
chromosome: 5 1704<br>
chromosome: 6 1280<br>
chromosome: 7 1507<br>
chromosome: 8 1216<br>
chromosome: 9 1108<br>
chromosome: 10 1108<br>
chromosome: 11 1039<br>
chromosome: 12 952<br>
chromosome: 13 1013<br>
chromosome: 14 953<br>
chromosome: 15 1146<br>
chromosome: 16 1241<br>
chromosome: 17 1048<br>
chromosome: 18 942<br>
chromosome: 19 1123<br>
chromosome: 20 1253<br>
chromosome: 21 1092<br>
chromosome: 22 1174<br>
chromosome: 23 1031<br>
chromosome: 24 800<br>
chromosome: 25 934<br>
gene total is 31388<br>
<br>
Does anyone see anything wrong with how I get the total? I don't, but
when I go to BioMart (see below), I get a total of 31,953.<br>
<br>
</span><img src="cid:part1.09060504.00080305@senecacollege.ca" alt=""><br>
<span><br>
and if I go to the info page for the genome at <a href="http://useast.ensembl.org/Danio_rerio/Info/Annotation" target="_blank">http://useast.ensembl.org/Danio_rerio/Info/Annotation</a>
I see a </span><span>different</span><span> total there too
(31,650 not counting pseudogenes).<br>
<br>
</span><img src="cid:part3.08020603.03070106@senecacollege.ca" alt=""><br>
<span><br>
Does anyone know why the totals differ, which one to believe, and
whether there is anything wrong with treating the one that my code
calculated as definitive? I need to compare the total number of
genes with the number that we are finding under certain conditions,
to do some stats.<span class="HOEnZb"><font color="#888888"><br>
John<br>
</font></span></span><br>
<br>
<br>
</div>
<br>_______________________________________________<br>
Dev mailing list <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div>VectorBase | i5K insect genome initiative</div></div></div>
</div>
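<div>
<div>A minimal sketch of the API check suggested in the reply above. It assumes that fetching 'toplevel' slices instead of 'chromosome' slices picks up the unplaced sequences, and that the public server ensembldb.ensembl.org with the anonymous user is reachable; neither assumption comes from the original thread. The hash name %genes_by_coord_system is illustrative only.</div>
<pre>
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Assumption: connect to the public Ensembl database server.
my $registry = 'Bio::EnsEMBL::Registry';
$registry-&gt;load_registry_from_db(
    -host =&gt; 'ensembldb.ensembl.org',
    -user =&gt; 'anonymous',
);

my $slice_adaptor = $registry-&gt;get_adaptor( 'danio_rerio', 'Core', 'Slice' );

# 'toplevel' returns the chromosomes plus any sequences that are not
# assigned to a chromosome, rather than the chromosomes alone.
my %genes_by_coord_system;
my $total = 0;
foreach my $slice ( @{ $slice_adaptor-&gt;fetch_all('toplevel') } ) {
    my $count = scalar @{ $slice-&gt;get_all_Genes() };
    $genes_by_coord_system{ $slice-&gt;coord_system_name() } += $count;
    $total += $count;
}

# One subtotal per coordinate system, so the genes that are not on
# chromosomes are visible separately.
foreach my $name ( sort keys %genes_by_coord_system ) {
    print "$name\t$genes_by_coord_system{$name}\n";
}
print "gene total is\t$total\n";
</pre>
<div>If the non-chromosome subtotal accounts for the gap between the script's total and the other totals, that would support the explanation given in the reply. The remaining difference between the BioMart and annotation-page totals may come from how each source treats particular biotypes, which this sketch does not examine.</div>
</div>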