<div dir="ltr">Dear Andy,<div><br></div><div>Thanks for the swift reply. I realised when looking through the assembly table that there were a lot of (~13000) records matching the gene seq_region_id, but wasn't sure exactly how to trace my way back through to the actual gene sequence.</div><div><br></div><div>The reason I wanted to do this was to look at the difference in resource utilisation (memory, CPU, runtime etc.) between the different methods of accessing EnsEMBL data. I had already coded some stuff up in Perl and Python (I prefer Python for general development work, but have used Perl, and others, for a lot longer), and used curl via the REST API too. It was also out of personal interest, to gain greater insight into the internal mechanisms of the EnsEMBL API and databases.</div><div><br></div><div>Cheers,</div><div><br></div><div>Steve<br><div class="gmail_extra"><br><div class="gmail_quote">On 16 December 2014 at 16:48,  <span dir="ltr"><<a href="mailto:dev-request@ensembl.org" target="_blank">dev-request@ensembl.org</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Message: 1<br>
Date: Tue, 16 Dec 2014 16:28:58 +0000<br>
From: Andrew Yates <<a href="mailto:ayates@ebi.ac.uk">ayates@ebi.ac.uk</a>><br>
Subject: Re: [ensembl-dev] SQL query to retrieve gene sequence...<br>
To: Ensembl developers list <<a href="mailto:dev@ensembl.org">dev@ensembl.org</a>><br>
Message-ID: <<a href="mailto:469F4D6A-436C-46C2-9340-535201787788@ebi.ac.uk">469F4D6A-436C-46C2-9340-535201787788@ebi.ac.uk</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Hey Steve,<br>
<br>
The problem with using the database is that sequence is not stored against the top-level sequence regions that annotation is held against. Instead, sequence is held against the contig sequence regions, which requires descending through the assembly table an unspecified number of times (once for each mapping, e.g. chromosome -> supercontig -> contig).<br>
<br>
I would seriously *not* recommend doing this. Not only do you have to deal with descending down the assembly, but you also have to think about concatenating the sequence and paying attention to the orientation of the assembly. Instead you could use the Perl API (probably not an option considering you're a Python guy), BioMart (you can access unspliced gene sequence quite easily), the REST API, or download the full genome sequence from FTP and take subslices. The faidx indexing tool from htslib/samtools is pretty good at extracting arbitrary sequence from very large FASTA files.<br>
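<br>
As a rough illustration of the FASTA subslice route, here is a minimal Python sketch (not part of any Ensembl API). The helper names read_fasta and subslice are hypothetical, coordinates are assumed to be 1-based and inclusive as in Ensembl, and a strand of -1 is assumed to mean the reverse complement is wanted. It loads the whole file into memory, so for a full genome an indexed tool such as samtools faidx is likely the better fit.<br>

```python
# Hypothetical helpers for illustration only; not an Ensembl or samtools API.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(path):
    """Return a dict mapping each record name (first word of the header) to its sequence."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def subslice(sequence, start, end, strand=1):
    """Extract start..end (1-based, inclusive); reverse-complement when strand is -1."""
    region = sequence[start - 1:end]
    if strand == -1:
        region = region.translate(COMPLEMENT)[::-1]
    return region
```

Given a gene's seq_region name, start, end and strand, subslice(read_fasta(path)[name], start, end, strand) would return the unspliced sequence under the assumptions above.<br>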
<br>
Andy<br>
<br>
------------<br>
Andrew Yates - Ensembl Support Coordinator<br>
European Molecular Biology Laboratory<br>
European Bioinformatics Institute<br>
Wellcome Trust Genome Campus<br>
Hinxton, Cambridge<br>
CB10 1SD, United Kingdom<br>
Tel: <a href="tel:%2B44-%280%291223-492538" value="+441223492538">+44-(0)1223-492538</a><br>
Fax: <a href="tel:%2B44-%280%291223-494468" value="+441223494468">+44-(0)1223-494468</a><br>
Skype: andrewyatz<br>
<a href="http://www.ensembl.org/" target="_blank">http://www.ensembl.org/</a><br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><a href="http://about.me/gawbul" target="_blank">Steve Moss</a><br>about.me/gawbul</div></div>
</div></div></div>