<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Kiran,<br>
<br>
If you are familiar with the Ensembl healthchecks, you are probably
aware that they mostly consist of SQL queries.<br>
Hence, MySQL queries or API calls should be able to give you all the
numbers you need.<br>
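<br>
As a rough illustration only (not an existing Ensembl tool), here is a
minimal Python sketch of one such query. It assumes a DB-API 2.0
cursor, for instance from the third-party PyMySQL package, connected
to a core database, and counts genes per biotype using the biotype
column of the core gene table. The function name is hypothetical.<br>
<pre>
```python
from typing import Dict

# Each row of the core "gene" table carries a biotype (e.g. protein_coding),
# so grouping on it gives the number of genes of each type.
BIOTYPE_SQL = "SELECT biotype, COUNT(*) FROM gene GROUP BY biotype"


def count_genes_by_biotype(cursor) -> Dict[str, int]:
    """Return {biotype: number of genes} for the database behind `cursor`.

    `cursor` is any DB-API 2.0 cursor; which database it points at
    (e.g. a homo_sapiens_core_* database) is up to the caller.
    """
    cursor.execute(BIOTYPE_SQL)
    return {biotype: int(n) for biotype, n in cursor.fetchall()}
```
</pre>
For example, a cursor from pymysql.connect(host="ensembldb.ensembl.org",
user="anonymous", database="homo_sapiens_core_74_37") should work,
assuming that database is still served there; summing the returned
values gives the total number of genes.<br>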
<br>
The key requirement here is access to two sets of databases: those
for the current release and those for the previous one.<br>
Once you have that, you should be able to run any comparison you
want.<br>
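<br>
Once the same numbers have been collected from both releases, the
comparison itself is simple bookkeeping. A minimal sketch, assuming
each release's results are held as a Python dict mapping a label (such
as a biotype) to a count; the function name is hypothetical.<br>
<pre>
```python
from typing import Dict, Tuple


def compare_counts(previous: Dict[str, int],
                   current: Dict[str, int]) -> Dict[str, Tuple[int, int, int]]:
    """Return {label: (previous, current, change)} for every label whose
    count differs between the two releases.

    A label missing from one release is treated as a count of 0, so newly
    introduced or dropped categories are reported as well.
    """
    diff = {}
    for label in sorted(set(previous) | set(current)):
        old = previous.get(label, 0)
        new = current.get(label, 0)
        if old != new:
            diff[label] = (old, new, new - old)
    return diff
```
</pre>
For example, compare_counts({"lincRNA": 5, "protein_coding": 20},
{"lincRNA": 5, "protein_coding": 21}) returns
{"protein_coding": (20, 21, 1)}; unchanged labels are left out.<br>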
<br>
Another solution would be to process the databases at each release
and store the results somewhere.<br>
You would then be able to compare any release with any earlier one.<br>
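<br>
A minimal sketch of that approach, assuming the per-release results
are small dicts of counts and that a single JSON file is an acceptable
"somewhere"; the file layout and the function names are hypothetical.<br>
<pre>
```python
import json
from pathlib import Path
from typing import Dict


def save_snapshot(path: Path, release: int, counts: Dict[str, int]) -> None:
    """Store (or overwrite) the counts for one release in a JSON file."""
    snapshots = json.loads(path.read_text()) if path.exists() else {}
    # JSON object keys are strings, so releases are stored as "73", "74", ...
    snapshots[str(release)] = counts
    path.write_text(json.dumps(snapshots, indent=2, sort_keys=True))


def load_snapshot(path: Path, release: int) -> Dict[str, int]:
    """Return the counts saved for `release`; raises KeyError if absent."""
    return json.loads(path.read_text())[str(release)]


def diff_releases(path: Path, old: int, new: int) -> Dict[str, int]:
    """Return {label: change in count} between two stored releases,
    leaving out labels whose count did not change."""
    before, after = load_snapshot(path, old), load_snapshot(path, new)
    changes = {}
    for label in set(before) | set(after):
        delta = after.get(label, 0) - before.get(label, 0)
        if delta:
            changes[label] = delta
    return changes
```
</pre>
Because every release is kept, diff_releases(path, 70, 74) works just
as well as comparing two adjacent releases.<br>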
<br>
Keeping track of how many gene models have changed, especially for
human, is a relatively tricky task.<br>
The stable ID mapping is probably the best way to go.<br>
The gene_archive table lists all genes that have changed since the
previous release, whether through a version change or complete
retirement.<br>
For example, the query<br>
select count(distinct gene_stable_id) from gene_archive where
mapping_session_id = 395;<br>
shows that 693 genes changed between releases 73 and 74.<br>
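<br>
The same count can be scripted. Below is a minimal Python sketch,
assuming a DB-API 2.0 cursor connected to the relevant core database.
As far as I understand the schema, gene_archive holds one row per
archived transcript, which is why the query counts distinct gene
stable IDs. The function name is hypothetical, and mapping session IDs
may differ between species and releases, so 395 only applies to the
example above.<br>
<pre>
```python
def count_changed_genes(cursor, mapping_session_id: int) -> int:
    """Count distinct gene stable IDs archived in one mapping session.

    This is the gene_archive query from the message above. The session ID
    is forced to int and formatted into the SQL, so the same string works
    whatever placeholder style the DB-API driver uses.
    """
    sql = ("SELECT COUNT(DISTINCT gene_stable_id) FROM gene_archive "
           "WHERE mapping_session_id = %d" % int(mapping_session_id))
    cursor.execute(sql)
    (count,) = cursor.fetchone()
    return int(count)
```
</pre>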
<br>
We try to include other statistics on our annotation page:<br>
<a class="moz-txt-link-freetext" href="http://www.ensembl.org/Homo_sapiens/Info/Annotation#assembly">http://www.ensembl.org/Homo_sapiens/Info/Annotation#assembly</a><br>
This page displays the number of genes by biotype group, the total
number of variations and the assembly version.<br>
If this does not cover all the numbers you are looking for, we will
happily consider suggestions.<br>
<br>
Also, from release 75 onwards, these statistics will be available
directly from the database, in the genome_statistics table.<br>
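<br>
Once that table is available, reading it should take a single query.
A minimal Python sketch, assuming genome_statistics exposes statistic
and value columns (please check the schema of the release you use) and
a DB-API 2.0 cursor on a release 75 or later core database; the
function name is hypothetical.<br>
<pre>
```python
from typing import Dict


def read_genome_statistics(cursor) -> Dict[str, int]:
    """Return {statistic: value} from the genome_statistics table.

    Assumes `statistic` and `value` columns. If the same statistic name
    appears on more than one row, only the last row read is kept.
    """
    cursor.execute("SELECT statistic, value FROM genome_statistics")
    return {statistic: int(value) for statistic, value in cursor.fetchall()}
```
</pre>
Reading the table from two releases gives two dicts that can be
compared label by label.<br>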
<br>
<br>
Regards,<br>
Magali<br>
<br>
<div class="moz-cite-prefix">On 23/01/2014 22:51, Kiran Mukhyala
wrote:<br>
</div>
<blockquote
cite="mid:CACqrx13fS64a7ORXJgjCegt0Gh4YzZSVK=cBusqjv-=WTPY_nA@mail.gmail.com"
type="cite">
<div dir="ltr">Hello,
<div><br>
</div>
<div>I am looking for a way to summarize the differences between
two versions of Ensembl databases for a given species. </div>
<div>Specifically, things like the total number of genes, how
many gene models have changed, the number of genes with PFAM
domains, the number of protein-coding genes, the number of
variations from various sources, the number of homologs in
species X, etc.<br>
</div>
<div><br>
</div>
<div>I am aware of two ways to do this:</div>
<div><br>
</div>
<div>1. By reading the release details page for each version
that I am interested in, which doesn't really give me the
numbers I am looking for.<br>
</div>
<div>2. Using the Ensembl healthchecks, which I assume are hard
to customize.</div>
<div><br>
</div>
<div>Are there any other tools for accomplishing this? If not,
would a tool like that be useful to anyone else?</div>
<div><br>
</div>
<div>Thanks,</div>
<div>-Kiran</div>
</div>
<pre wrap="">_______________________________________________
Dev mailing list <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>
</pre>
</blockquote>
<br>
</body>
</html>