<div class="gmail_quote"><meta http-equiv="content-type" content="text/html; charset=utf-8"><div class="gmail_quote">On Mon, May 23, 2011 at 2:00 PM, Paul Flicek <span dir="ltr">&lt;<a href="mailto:flicek@ebi.ac.uk" target="_blank">flicek@ebi.ac.uk</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">

Once the Ensembl gene set is created we create "external references" to the RefSeq identifiers to identify those objects that are "biologically the same".  Note however, that a RefSeq that corresponds to an Ensembl gene does not mean that these have identical placements on the genome assembly.</blockquote>

</div><div><br></div><br><div>Paul-</div><div><br></div><div>Thanks for your explanation. I understand that transcripts may be different/similar/identical in many dimensions and that Ensembl made a particular and reasonable choice about grouping transcripts by some criteria.</div>

<div><br></div><div>For whatever it's worth, my expectations were 1) fetch_all_by_external_name would return a transcript that was fully consistent with the primary source (rather than similar by some criteria to it); and 2) all transcript grouping happened at the level of Ensembl Genes (which also groups transcripts).</div>

<div><br></div><div>Because it's so convenient to code for Ensembl, I'd still like to see if there's a way to accomplish what I want with Ensembl. The goal is to convert HGVS variants specified using NCBI accessions between genomic, raw transcript (i.e., 'r.' variants), CDS, and protein coordinate systems. To achieve accurate conversion in the general case, it is necessary to have a single, shared understanding of the exon structure, accurate to the nucleotide level, as implied by the named transcript. Exon-level similarity, even when the CDS is unchanged, doesn't cut it in this case.</div>
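<div><br></div><div>To illustrate why exon-level similarity isn't enough, here is a minimal Python sketch. The exon coordinates are made up (not real RefSeq or Ensembl data), and transcript_to_genomic is a hypothetical helper, not part of any Ensembl API; it assumes 1-based inclusive coordinates and handles the plus strand only. Two transcripts whose first exon differs by just 3 nt map the same transcript position to different genomic positions:</div>

```python
from typing import List, Tuple

# (genomic start, genomic end), 1-based inclusive, plus strand only
Exon = Tuple[int, int]


def transcript_to_genomic(exons: List[Exon], tpos: int) -> int:
    """Map a 1-based transcript position to a genomic position.

    Hypothetical helper for illustration: walks the exons in genomic
    order and subtracts each exon's length until the position falls
    inside one of them.
    """
    if tpos < 1:
        raise ValueError("transcript positions are 1-based")
    remaining = tpos
    for start, end in sorted(exons):
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Made-up exon structures: the second differs only in where exon 1 ends.
refseq_exons = [(1000, 1099), (2000, 2149)]
similar_exons = [(1000, 1096), (2000, 2149)]  # exon 1 ends 3 nt earlier

pos_refseq = transcript_to_genomic(refseq_exons, 120)
pos_similar = transcript_to_genomic(similar_exons, 120)
print(pos_refseq, pos_similar)  # the two results differ by 3 nt
```

<div>Every transcript position downstream of the differing boundary is shifted, so a variant converted through the "similar" transcript would land on the wrong genomic base.</div>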

<div><br></div><div>Does anyone know whether it would work to load NCBI exons directly into Ensembl? I'm hoping that populating the transcript, transcript_stable_id, exon, and exon_transcript tables with original NCBI data would suffice. Is that too naive?</div>

<div><br></div><div>Thanks,</div><div>Reece</div><div><br></div></div>