<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Hi Kiran,<br>
      <br>
      You are correct. We have two sources of RefSeq data for human
      (and mouse):<br>
      <br>
      1) The logic_name 'refseq_human_import' is a version of RefSeq
      that we use in collaboration with the CCDS project
      (<a class="moz-txt-link-freetext" href="http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi">http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi</a>) to
      create a new CCDS release. It is therefore only a subset of the
      RefSeq annotation set. It is supplied to us in GTF format by the
      CCDS team at NCBI.<br>
      <br>
      2) The logic_name 'refseq_import' is our relatively new import of
      annotation from the publicly available RefSeq GFF3 files (e.g.
      <a class="moz-txt-link-freetext" href="ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/ref_GRCh38_top_level.gff3.gz">ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/ref_GRCh38_top_level.gff3.gz</a>).
      This includes _all_ annotation from RefSeq, including annotation
      on the alternate reference loci. So far we have imported the
      GFF3 annotation for all merge species (human, mouse, rat, pig,
      zebrafish), and we will import all other species in the next release.
      <br>
      <br>
      To exclude the alternate loci, you can filter out the seq_regions
      that carry the attribute attrib_type.code='non_ref', for example:<br>
      <br>
<pre>-- Count refseq_import genes per source, skipping the alternate loci
select g.analysis_id, a.logic_name, g.source, count(*)
  from gene g
  join analysis a on g.analysis_id = a.analysis_id
 where a.logic_name = 'refseq_import'
   -- seq_regions flagged with the 'non_ref' attribute are the alternate loci
   and g.seq_region_id not in (select sra.seq_region_id
                                 from seq_region_attrib sra
                                 join attrib_type at on sra.attrib_type_id = at.attrib_type_id
                                where at.code = 'non_ref')
 group by g.source, a.logic_name, g.analysis_id;
+-------------+---------------+-----------------+----------+
| analysis_id | logic_name    | source          | count(*) |
+-------------+---------------+-----------------+----------+
|        8360 | refseq_import | BestRefSeq      |    24792 |
|        8360 | refseq_import | Curated Genomic |       22 |
|        8360 | refseq_import | Gnomon          |     4945 |
+-------------+---------------+-----------------+----------+

</pre>
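      <br>
      For anyone who wants to see what the 'non_ref' filter does without
      connecting to the database, here is a minimal Python sketch. It is
      only an illustration under assumptions: it uses an in-memory SQLite
      database rather than MySQL, the tables are cut down to the columns
      the query touches, and every row (ids, regions, genes) is invented.<br>
      <br>
<pre>
```python
import sqlite3

# Toy stand-in for four Ensembl tables. Column sets are trimmed and
# all rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table analysis (analysis_id integer primary key, logic_name text);
create table gene (gene_id integer primary key, analysis_id integer,
                   seq_region_id integer, source text);
create table attrib_type (attrib_type_id integer primary key, code text);
create table seq_region_attrib (seq_region_id integer, attrib_type_id integer);

insert into analysis values (1, 'refseq_import'), (2, 'refseq_human_import');
insert into attrib_type values (10, 'non_ref'), (11, 'toplevel');
-- seq_region 100 plays a primary chromosome, 200 an alternate locus
insert into seq_region_attrib values (100, 11), (200, 11), (200, 10);
insert into gene values
  (1, 1, 100, 'BestRefSeq'),
  (2, 1, 200, 'BestRefSeq'),
  (3, 1, 100, 'Gnomon'),
  (4, 2, 100, 'refseq');
""")

# Same shape as the query above: keep refseq_import genes whose
# seq_region does not carry the 'non_ref' attribute.
QUERY = """
select g.source, count(*)
  from gene g
  join analysis a on g.analysis_id = a.analysis_id
 where a.logic_name = 'refseq_import'
   and g.seq_region_id not in (select sra.seq_region_id
                                 from seq_region_attrib sra
                                 join attrib_type at
                                   on sra.attrib_type_id = at.attrib_type_id
                                where at.code = 'non_ref')
 group by g.source
 order by g.source
"""

counts = dict(conn.execute(QUERY).fetchall())
print(counts)
```
</pre>
      On this toy data the BestRefSeq gene placed on the alternate locus
      is dropped, so BestRefSeq and Gnomon each keep one gene, and the
      'refseq_human_import' gene is ignored by the logic_name condition.<br>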
      <br>
      Regards<br>
      <br>
      Dan<br>
      <br>
      <br>
      On 31/10/14 06:00, Kiran Mukhyala wrote:<br>
    </div>
    <blockquote
      cite="mid:20141031060008.16318131FF2_4532568B@hx-mx2.ebi.ac.uk"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>Hello Ensembl!<br>
                <br>
              </div>
              There are multiple gene sets imported from RefSeq in the
              human other_features database in e77.<br>
              <br>
            </div>
          </div>
          <div>&gt;<i>select analysis_id,logic_name,source,count(*) from
              gene join analysis using (analysis_id) group by source</i><br>
            <br>
            <pre><b>analysis_id  logic_name           source           count(*)</b>
8359         ccds_import          ccds                30493
8358         estgene              ensembl             30287
8166         refseq_human_import  refseq              26652
8360         refseq_import        BestRefSeq          27398
8360         refseq_import        Curated Genomic        26
8360         refseq_import        Gnomon               5363
</pre></div>
          <div><br>
          </div>
          <div>
            <div>I am interested in the gene models imported from RefSeq
              using GRCh38 reference assembly excluding the alternate
              loci. It looks like genes from 'BestRefSeq' include
              alternate loci while genes from 'refseq' don't. </div>
            <div><br>
            </div>
            <div>Could you please explain what the difference
              between them is?<br>
            </div>
          </div>
          <div><br>
          </div>
          Thanks,<br>
        </div>
        -Kiran<br>
        <div>
          <div><br>
            <br>
          </div>
        </div>
      </div>
      <br>
      <br>
      <pre wrap="">_______________________________________________
Dev mailing list    <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>