<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
    Great, that explains BioMart, but I still don't understand why the
    Perl code isn't returning the same count as the SQL. I'm using SQL queries
    but was curious about the cause of the difference.<br>
    Thanks<br>
<br>
On 09/05/11 13:08, Rhoda Kinsella wrote:
<blockquote
cite="mid:F5A1C833-AF96-46FC-9F5C-1067D7A7D0AE@ebi.ac.uk"
type="cite">Hi Andrea,
<div>When you hit the count button, this just gives a count of the
number of genes as the Ensembl mart datasets you mention are
gene centric. It will not give a count of the number of exons.
To get this count, you would have to download the result file
and further process the file to calculate the number of exons.</div>
<div>Hope that helps</div>
<div>Regards</div>
<div>Rhoda</div>
<div><br>
<div>
<div>On 9 May 2011, at 12:57, Andrea Edwards wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div text="#000000" bgcolor="#ffffff"> Hi<br>
<br>
                  I wasn't using BioMart per se; I used it as a test to try
                  and work out why the Perl code wasn't returning the
                  same numbers as the SQL.<br>
<br>
                  In both cow and human (Ensembl Genes 62) I didn't set any filters
                  and selected the exon ID, gene ID and transcript ID
                  attributes. For cow the count was 25670 and for human it
                  was 53334.<br>
<br>
regards<br>
<br>
On 09/05/11 09:05, Rhoda Kinsella wrote:
<blockquote
cite="mid:814DFCD1-2D01-45E2-B0FC-58CDD7A3CB76@ebi.ac.uk"
type="cite">Hi Andrea
<div>Can you send me the details of your biomart query
so that I can look into the issue?</div>
<div>Regards</div>
<div>Rhoda</div>
<div><br>
<div>
<div>On 6 May 2011, at 17:04, Andrea Edwards wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite"><span
class="Apple-style-span" style="border-collapse:
separate; color: rgb(0, 0, 0); font-family:
Helvetica; font-style: normal; font-variant:
normal; font-weight: normal; letter-spacing:
normal; line-height: normal; orphans: 2;
text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing:
0px; font-size: medium;">
<div text="#000000" bgcolor="#ffffff">Hi Bert,<br>
<br>
<br>
                                I agree with your code and I agree with the
                                SQL I originally posted. I don't understand
                                why BioMart and this Perl code (below)
                                are only returning 25k, and it seems too
                                coincidental that they are returning the same
                                number.<br>
                                <br>
                                The difference in results is huge. Which exons
                                is this code missing? All I could think of was
                                predicted exons, but it seems
                                unlikely that there are 25k known exons and
                                (250k-25k = 225k) predicted exons not assigned
                                to genes. I don't even know if Ensembl deals
                                with predicted exons.<br>
                                I got the same 'discrepancy' in the figures when
                                I tested human too.<br>
<br>
===============================================<br>
<br>
my $gene_adaptor = $registry->get_adaptor(
'bos_taurus', 'Core', 'Gene' );<br>
my $genes = $gene_adaptor->fetch_all();<br>
<br>
<br>
$total_genes=0;<br>
$exon_count = 0;<br>
foreach $gene(@{$genes}) {<br>
$total_genes++;<br>
<span class="Apple-converted-space"> </span><br>
foreach $exon ($gene->get_all_Exons())
{<br>
$exon_count++;<br>
} <span class="Apple-converted-space"> </span><br>
} #end for each gene<br>
<br>
<br>
=============================================<br>
<br>
<br>
Thank you very much<br>
<br>
On 06/05/11 16:44, Bert Overduin wrote:
<blockquote
cite="mid:BANLkTik1KCWZFDa38bqQGfJhVVfb2HDF8w@mail.gmail.com"
type="cite">Hi,
<div><br>
</div>
<div>When I use the following code:</div>
<div><br>
</div>
<div>
<div style="margin: 0px; font: 12px
Helvetica;">#!/usr/bin/perl</div>
<div style="margin: 0px; font: 12px
Helvetica; min-height: 14px;"><br>
</div>
<div style="margin: 0px; font: 12px
Helvetica;">use strict;</div>
<div style="margin: 0px; font: 12px
Helvetica;">use Bio::EnsEMBL::Registry;</div>
<div style="margin: 0px; font: 12px
Helvetica; min-height: 14px;"><br>
</div>
<div style="margin: 0px; font: 12px
Helvetica;">my $reg =
"Bio::EnsEMBL::Registry";</div>
<div style="margin: 0px; font: 12px
Helvetica; min-height: 14px;"><br>
</div>
                                    <div style="margin: 0px; font: 12px
                                      Helvetica;">$reg->load_registry_from_db(
                                      -host => 'ensembldb.ensembl.org',
                                      -user => 'anonymous' );</div>
<div style="margin: 0px; font: 12px
Helvetica; min-height: 14px;"><br>
</div>
<div style="margin: 0px; font: 12px
Helvetica;">my $exon_adaptor =
$reg->get_adaptor( 'Bos taurus',
'Core', 'Exon' );</div>
<div style="margin: 0px; font: 12px
Helvetica; min-height: 14px;"><br>
</div>
<div style="margin: 0px; font: 12px
Helvetica;">my $exons =
$exon_adaptor->fetch_all;</div>
<div style="margin: 0px; font: 12px
Helvetica; min-height: 14px;"><br>
</div>
<div style="margin: 0px; font: 12px
Helvetica;">print scalar( @{$exons} ),
"\n";</div>
</div>
<div><br>
</div>
<div>I get:</div>
<div><br>
</div>
<div>
                                    <div>farm2-head2[bert]2: perl test.pl</div>
<div>225837</div>
<div><br>
</div>
<div>Which is the same number I get with a
MySQL query:</div>
<div><br>
</div>
<div>
                                      <div>mysql -u anonymous -h ensembldb.ensembl.org -P
                                        5306</div>
<div>Welcome to the MySQL monitor.
Commands end with ; or \g.</div>
<div>Your MySQL connection id is 8610 to
server version: 5.1.34-log</div>
<div><br>
</div>
<div>Type 'help;' or '\h' for help. Type
'\c' to clear the buffer.</div>
</div>
<div><br>
</div>
<div>
<div>mysql> use
bos_taurus_core_62_4k </div>
<div>Reading table information for
completion of table and column names</div>
<div>You can turn off this feature to
get a quicker startup with -A</div>
</div>
<div>
<div><br>
</div>
<div>Database changed</div>
</div>
<div>
<div>mysql> SELECT COUNT(*) FROM
exon;</div>
<div>+----------+</div>
<div>| COUNT(*) |</div>
<div>+----------+</div>
<div>| 225837 |</div>
<div>+----------+</div>
<div>1 row in set (0.01 sec)</div>
</div>
<div><br>
</div>
<div>Cheers,</div>
<div>Bert</div>
<div><br>
</div>
<br>
<div class="gmail_quote">On Fri, May 6,
2011 at 4:22 PM, Andrea Edwards<span
class="Apple-converted-space"> </span><span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:edwardsa@cs.man.ac.uk">edwardsa@cs.man.ac.uk</a>></span><span
class="Apple-converted-space"> </span>wrote:<br>
<blockquote class="gmail_quote"
style="margin: 0pt 0pt 0pt 0.8ex;
border-left: 1px solid rgb(204, 204,
204); padding-left: 1ex;">
                                        <div text="#000000" bgcolor="#ffffff">I
                                          tried two ways:<br>
<br>
===============================================<br>
<br>
my $gene_adaptor =
$registry->get_adaptor(
'bos_taurus', 'Core', 'Gene' );<br>
my $genes =
$gene_adaptor->fetch_all();<br>
<br>
my $exon_adaptor =
$registry->get_adaptor(
'bos_taurus', 'Core', 'Exon' );<br>
$total_genes=0;<br>
$exon_count = 0;<br>
foreach $gene(@{$genes}) {<br>
$total_genes++;<br>
<span
class="Apple-converted-space"> </span><br>
foreach $exon
($gene->get_all_Exons()) {<br>
$exon_count++;<br>
} <span
class="Apple-converted-space"> </span><br>
} #end for each gene<br>
<br>
<br>
=============================================<br>
<br>
                                          This way gave even fewer (23k), but
                                          I'm being stricter here about the
                                          chromosomes.<br>
<br>
@slices = @{
$slice_adaptor->fetch_all('chromosome',
undef, 0, 1) };<br>
<br>
$total_genes=0;<br>
$exon_count = 0;<br>
foreach $slice (@slices) {<br>
unless
($slice->seq_region_name() =~
/Un/) {<br>
print
$slice->seq_region_name."\n";<br>
my $genes =
$gene_adaptor->fetch_all_by_Slice($slice);<br>
                                          <br>
foreach my $gene(@{$genes})
{<br>
$total_genes++;<br>
<span
class="Apple-converted-space"> </span><br>
foreach my $exon
($gene->get_all_Exons()) {<br>
$exon_count++;<br>
print
"$exon_count\n";<br>
                                                      }<br>
} #end for each gene<br>
}<br>
}<br>
<br>
==============================================<br>
<br>
                                          But neither gives anything like the
                                          SQL results.<br>
<br>
Why does the sql give so many more?
Which should I use?<br>
<br>
thank you
<div>
<div class="h5"><br>
<br>
<br>
On 06/05/11 15:50, Bert Overduin
wrote:
<blockquote type="cite">Hi
Andrea,
<div><br>
</div>
<div>I suspect that your
BioMart results are
truncated because the query
is too large.
<div><br>
</div>
                                                    <div>However, that doesn't
                                                      explain your API results
                                                      ... What does your API
                                                      code look like?<br>
<div><br>
</div>
<div>Cheers,</div>
<div>Bert<br>
<br>
<div class="gmail_quote">On
Fri, May 6, 2011 at
3:45 PM, Andrea
Edwards<span
class="Apple-converted-space"> </span><span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:edwardsa@cs.man.ac.uk" target="_blank">edwardsa@cs.man.ac.uk</a>></span><span
class="Apple-converted-space"> </span>wrote:<br>
<blockquote
class="gmail_quote"
style="margin: 0pt
0pt 0pt 0.8ex;
border-left: 1px
solid rgb(204, 204,
204); padding-left:
1ex;">Hello<br>
<br>
                                                          I'm sorry for the
                                                          basic question, but I
                                                          was looking at the
                                                          Ensembl core schema
                                                          and trying to
                                                          retrieve just the
                                                          exons on chromosomes,
                                                          and couldn't work
                                                          out why I am getting
                                                          such different
                                                          figures from those given by
                                                          BioMart and the Perl
                                                          API.<br>
<br>
                                                          For example, for cow
                                                          there are 25670
                                                          exons in genes according to
                                                          BioMart and the API,
                                                          but this SQL gives
                                                          ~210k exons. The
                                                          query just looks
                                                          for exons on
                                                          chromosomes 1-30 and
                                                          X.<br>
<br>
                                                          select
                                                          count(distinct
                                                          stable_id) from exon
                                                          e inner join
                                                          exon_stable_id es
                                                          using(exon_id) inner
                                                          join seq_region sr
                                                          using(seq_region_id)
                                                          where
                                                          sr.coord_system_id =
                                                          2 and sr.name REGEXP
                                                          '^[1-9]|^X' and
                                                          e.is_current=1<br>
<br>
I get 8k just on
chromosome 1<br>
<br>
                                                          I'm sure this is
                                                          simple, and perhaps
                                                          it's because it's
                                                          Friday afternoon, but
                                                          I'm just not seeing
                                                          it!<br>
<br>
_______________________________________________<br>
Dev mailing list <a
moz-do-not-send="true" href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
List admin
(including
subscribe/unsubscribe):<span
class="Apple-converted-space"> </span><a moz-do-not-send="true"
href="http://lists.ensembl.org/mailman/listinfo/dev"
target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
Ensembl Blog:<span
class="Apple-converted-space"> </span><a
moz-do-not-send="true" href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br>
</blockquote>
</div>
<br>
<br clear="all">
<br>
--<span
class="Apple-converted-space"> </span><br>
Bert Overduin, Ph.D.<br>
Vertebrate Genomics Team<br>
<br>
EMBL - European
Bioinformatics Institute<br>
Wellcome Trust Genome
Campus<br>
Hinxton, Cambridge CB10
1SD<br>
United Kingdom<br>
<br>
<a
moz-do-not-send="true"
href="http://www.ebi.ac.uk/%7Ebert" target="_blank">http://www.ebi.ac.uk/~bert</a>
<div>
<div
style="margin-bottom:
0.0001pt;"><br
class="webkit-block-placeholder">
</div>
<div style="margin:
0.1pt 0in 0.0001pt;"><font
face="Arial"><span
style="font-size:
10pt;
font-family:
Arial; color:
black;">Ensembl
browser: <a
moz-do-not-send="true"
href="http://www.ensembl.org/" target="_blank"><span style="color:
blue;">http://www.ensembl.org</span></a></span><span
style="font-size:
10pt;
font-family:
Arial;"></span></font></div>
<font face="Arial">
<div style="margin:
0.1pt 0in
0.0001pt;"><span
style="font-size:
10pt;
font-family:
Arial; color:
black;">Mailing
lists: </span><span
style="font-size:
10pt;
font-family:
Arial; color:
blue;"><a
moz-do-not-send="true"
href="http://www.ensembl.org/info/about/contact/mailing.html"
target="_blank"><span
style="color:
blue;">http://www.ensembl.org/info/about/contact/mailing.html</span></a></span><span
style="font-size:
10pt;
font-family:
Arial; color:
black;"></span></div>
<div style="margin:
0.1pt 0in
0.0001pt;"><span
style="font-size:
10pt;
font-family:
Arial; color:
black;">Blog: </span><span
style="font-size:
10pt;
font-family:
Arial; color:
blue;"><a
moz-do-not-send="true"
href="http://www.ensembl.info/" target="_blank"><span style="color:
blue;">http://www.ensembl.info</span></a></span><span
style="font-size:
10pt;
font-family:
Arial; color:
black;"></span></div>
<div style="margin:
0.1pt 0in
0.0001pt;"><span
style="font-size:
10pt;
font-family:
Arial; color:
black;">YouTube: </span><span
style="font-size:
10pt;
font-family:
Arial; color:
rgb(0, 0, 153);"><a
moz-do-not-send="true"
href="http://www.youtube.com/user/EnsemblHelpdesk"
target="_blank"><span style="color: blue;">http://www.youtube.com/user/EnsemblHelpdesk</span></a></span><span
style="font-size:
10pt;
font-family:
Arial; color:
black;"><br>
Facebook: </span><span
style="font-size:
10pt;
font-family:
Arial; color:
blue;"><a
moz-do-not-send="true"
href="http://www.facebook.com/Ensembl.org" target="_blank"><span
style="color:
blue;">http://www.facebook.com/Ensembl.org</span></a></span><span
style="font-size:
10pt;
font-family:
Arial; color:
black;"><br>
Twitter: </span><span
style="font-size:
10pt;
font-family:
Arial; color:
blue;"><a
moz-do-not-send="true"
href="http://twitter.com/Ensembl" target="_blank"><span style="color:
blue;">http://twitter.com/Ensembl</span></a> </span><span
style="font-size:
10pt;
font-family:
Arial; color:
black;"></span></div>
</font></div>
<br>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br>
</div>
</blockquote>
<br>
</div>
</span></blockquote>
</div>
<br>
<div> <span class="Apple-style-span"
style="border-collapse: separate; color: rgb(0, 0,
0); font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant: normal;
font-weight: normal; letter-spacing: normal;
line-height: normal; orphans: 2; text-indent: 0px;
text-transform: none; white-space: normal; widows:
2; word-spacing: 0px;">
<div style="word-wrap: break-word;"><span
class="Apple-style-span"
style="border-collapse: separate; color:
rgb(0, 0, 0); font-family: Helvetica;
font-size: 12px; font-style: normal;
font-variant: normal; font-weight: normal;
letter-spacing: normal; line-height: normal;
orphans: 2; text-indent: 0px; text-transform:
none; white-space: normal; widows: 2;
word-spacing: 0px;">
<div style="word-wrap: break-word;">
<div>Rhoda Kinsella Ph.D.</div>
<div>
<div style="margin: 0px; font: 12px
Helvetica;">Ensembl Bioinformatician,</div>
</div>
<div>European Bioinformatics Institute
(EMBL-EBI),<br>
Wellcome Trust Genome Campus, </div>
<div>Hinxton<br>
Cambridge CB10 1SD,</div>
<div>UK.</div>
</div>
</span></div>
</span> </div>
<br>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
<br>
</div>
</blockquote>
<br>
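    One possible explanation for the API count, offered as a sketch rather
    than a confirmed diagnosis: the Ensembl Perl API documents
    <code>get_all_Exons()</code> as returning a reference to an array, so a
    <code>foreach</code> over its return value without dereferencing would
    run once per gene and count genes instead of exons. That would also fit
    the BioMart count being a gene count. The snippet below illustrates the
    effect without connecting to Ensembl; <code>MockGene</code> and the two
    <code>count_*</code> subroutines are hypothetical stand-ins, not part of
    the API.<br>
<pre>
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an Ensembl Gene object. Only the return
# convention of get_all_Exons (an array reference) is modelled here.
package MockGene;

sub new {
    my ( $class, @exons ) = @_;
    return bless { exons => [@exons] }, $class;
}

# Returns a reference to an array of exons, not a list.
sub get_all_Exons {
    my ($self) = @_;
    return $self->{exons};
}

package main;

# Loop shape used in the thread: the array reference is a single scalar,
# so the inner loop body runs exactly once per gene.
sub count_without_deref {
    my ($genes) = @_;
    my $count = 0;
    foreach my $gene ( @{$genes} ) {
        foreach my $exon ( $gene->get_all_Exons() ) {
            $count++;
        }
    }
    return $count;
}

# Same loop with the reference dereferenced: one iteration per exon.
sub count_with_deref {
    my ($genes) = @_;
    my $count = 0;
    foreach my $gene ( @{$genes} ) {
        foreach my $exon ( @{ $gene->get_all_Exons() } ) {
            $count++;
        }
    }
    return $count;
}

# Two mock genes with three and two exons respectively.
my $genes = [
    MockGene->new(qw(exonA exonB exonC)),
    MockGene->new(qw(exonD exonE)),
];

print count_without_deref($genes), "\n";    # prints 2 (one iteration per gene)
print count_with_deref($genes),    "\n";    # prints 5 (one iteration per exon)
```
</pre>
    Under that assumption, writing the inner loop of the quoted code as
    <code>foreach my $exon ( @{ $gene->get_all_Exons() } )</code> should
    count exons rather than genes.<br>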
</body>
</html>