<div dir="ltr">Hi Guys<div><br></div><div>I have a script that, given a gene symbol, connects to Ensembl and retrieves the canonical transcript, then uses the external database adaptor to get the canonical RefSeq transcript.</div><div><br></div><div>However, this does not seem to give me the correct result.</div><div><br></div><div>Take, for example, the gene SKI (I am using the GRCh37 assembly, by the way).</div><div><br></div><div>If you open this gene in the Ensembl browser:</div><div><br></div><div><a href="http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343">http://grch37.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000157933;r=1:2159997-2161343</a><br></div><div><br></div><div><br></div><div>For SKI, Ensembl annotates the following as the canonical transcript: <a href="http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000157933;r=1:2159997-2161343;t=ENST00000378536" style="color:rgb(0,0,102);font-family:'Luxi Sans',Helvetica,Arial,Geneva,sans-serif;font-size:10px;font-weight:bold;background-color:rgb(234,238,255)" target="_blank">ENST00000378536</a></div><div><br></div><div>However, using my script, the external database adaptor returns the RefSeq entry <a href="http://www.ncbi.nlm.nih.gov/protein/XP_005244832.1" target="_blank">XP_005244832.1</a> as the RefSeq canonical transcript, even though the correct canonical transcript is NM_003036.3:</div><div><br></div><div><a href="http://www.ncbi.nlm.nih.gov/gene/6497">http://www.ncbi.nlm.nih.gov/gene/6497</a></div><div><br></div><div>Unless I am understanding this incorrectly, if the coding region is the same length in two transcripts, the longer transcript should be the canonical one.</div><div><br></div><div>The longer RefSeq transcript is NM_003036.3 (it has a longer 5' UTR).</div><div><br></div><div>Can you help me understand this?</div><div><br></div><div>Many thanks</div><div><br></div><div>Duarte<br><div><div data-smartmail="gmail_signature"><div dir="ltr"></div></div></div>
</div></div>
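<div>P.S. To make the rule I am assuming concrete, here is a minimal Python sketch of the tie-break as I understand it: prefer the longest coding sequence, and when the coding lengths are equal, prefer the longer transcript overall. This is only my reading of the rule and may not be what Ensembl or RefSeq actually apply. The class, the function name, the placeholder IDs and the lengths are all made up for illustration and are not the real values for SKI.</div>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    accession: str      # placeholder ID, not a real Ensembl or RefSeq accession
    cds_length: int     # coding sequence length in bases (illustrative value)
    total_length: int   # full transcript length including UTRs (illustrative value)


def pick_canonical(transcripts):
    """Pick the transcript with the longest CDS; break ties by total length.

    This encodes the assumed rule only, not a confirmed Ensembl algorithm.
    """
    if not transcripts:
        raise ValueError("no transcripts supplied")
    # Tuples compare element by element, so total_length only matters
    # when two transcripts have the same cds_length.
    return max(transcripts, key=lambda t: (t.cds_length, t.total_length))


# Two hypothetical transcripts with identical CDS length; one has a longer UTR.
candidates = [
    Transcript("tx_short_utr", cds_length=2000, total_length=3000),
    Transcript("tx_long_utr", cds_length=2000, total_length=3500),
]

print(pick_canonical(candidates).accession)
```

<div>Under this rule, the transcript with the longer UTR wins whenever the coding lengths tie, which is why I expected NM_003036.3 to be reported.</div>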