<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Kiran,<br>
<br>
You are correct. We have two sources of RefSeq data for human
(and mouse):<br>
<br>
1) The logic_name 'refseq_human_import' is the version of RefSeq
that we use, in collaboration with the CCDS
(<a class="moz-txt-link-freetext" href="http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi">http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi</a>) project, to
create each new CCDS release. It is therefore only a subset of the
RefSeq annotation set. It is supplied to us in GTF format by the
CCDS team at NCBI.<br>
<br>
2) The logic_name 'refseq_import' is our relatively new import of
annotation from the publicly available RefSeq GFF3 files (e.g.
<a class="moz-txt-link-freetext" href="ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/ref_GRCh38_top_level.gff3.gz">ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/ref_GRCh38_top_level.gff3.gz</a>).
This includes _all_ annotation from RefSeq, including annotation
on the alternate reference loci. So far we have imported the
GFF3 annotation for all merge species (human, mouse, rat, pig,
zebrafish), and we will import all other species in the next release.
<br>
<br>
To exclude the alternate loci you can use
attrib_type.code='non_ref', for example:<br>
<br>
<pre>select g.analysis_id,a.logic_name,g.source,count(*)
from gene g,analysis a
where g.analysis_id=a.analysis_id
and a.logic_name ='refseq_import'
and g.seq_region_id NOT IN (select sra.seq_region_id
from seq_region_attrib sra, attrib_type at
where sra.attrib_type_id=at.attrib_type_id
and at.code='non_ref')
group by g.source,a.logic_name, g.analysis_id;
+-------------+---------------+-----------------+----------+
| analysis_id | logic_name    | source          | count(*) |
+-------------+---------------+-----------------+----------+
|        8360 | refseq_import | BestRefSeq      |    24792 |
|        8360 | refseq_import | Curated Genomic |       22 |
|        8360 | refseq_import | Gnomon          |     4945 |
+-------------+---------------+-----------------+----------+
</pre>
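<br>
As a further sketch (not run against e77, and assuming the same
core-schema tables and columns used in the query above), the same
filter can be written with NOT EXISTS so that both RefSeq
logic_names are compared on the primary assembly only. The alias
'atp' and the column name 'gene_count' are just illustrative:<br>
<br>
<pre>select a.logic_name, g.source, count(*) as gene_count
from gene g
join analysis a on a.analysis_id = g.analysis_id
where a.logic_name in ('refseq_import', 'refseq_human_import')
  -- skip genes on seq_regions flagged as alternate loci
  and not exists (select 1
                  from seq_region_attrib sra
                  join attrib_type atp
                    on atp.attrib_type_id = sra.attrib_type_id
                  where sra.seq_region_id = g.seq_region_id
                    and atp.code = 'non_ref')
group by a.logic_name, g.source;
</pre>
Grouping by both logic_name and source keeps the two imports on
separate rows, so their counts can be compared directly.<br>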
<br>
Regards<br>
<br>
Dan<br>
<br>
<br>
On 31/10/14 06:00, Kiran Mukhyala wrote:<br>
</div>
<blockquote
cite="mid:20141031060008.16318131FF2_4532568B@hx-mx2.ebi.ac.uk"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>Hello Ensembl!<br>
<br>
</div>
There are multiple gene sets imported from RefSeq in the
human other_features database in e77.<br>
<br>
</div>
</div>
<div>&gt; <i>select analysis_id,logic_name,source,count(*) from
gene join analysis using (analysis_id) group by source</i><br>
<pre><b>analysis_id  logic_name           source           count(*)</b>
8359         ccds_import          ccds             30493
8358         estgene              ensembl          30287
8166         refseq_human_import  refseq           26652
8360         refseq_import        BestRefSeq       27398
8360         refseq_import        Curated Genomic  26
8360         refseq_import        Gnomon           5363
</pre>
</div>
<div><br>
</div>
<div>
<div>I am interested in the gene models imported from RefSeq
using the GRCh38 reference assembly, excluding the alternate
loci. It looks like genes from 'BestRefSeq' include
alternate loci while genes from 'refseq' don't. </div>
<div><br>
</div>
<div>Could you please explain what the difference
between them is?<br>
</div>
</div>
<div><br>
</div>
Thanks,<br>
</div>
-Kiran<br>
<div>
<div><br>
<br>
</div>
</div>
</div>
<br>
<pre wrap="">_______________________________________________
Dev mailing list <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>
</pre>
</blockquote>
<br>
</body>
</html>