<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
Hello<br>
<br>
For general questions you can try biostar.stackexchange.com and
SEQanswers (I've never used that forum, but I've heard of it).<br>
<br>
For issues relating to Ensembl, you can email their mailing lists:<br>
<a class="moz-txt-link-freetext" href="http://www.ensembl.org/info/about/contact/mailing.html">http://www.ensembl.org/info/about/contact/mailing.html</a><br>
<br>
Other people might know of some others.<br>
<br>
andrea<br>
<br>
On 12/01/2011 16:10, shaabana abdo wrote:
<blockquote
cite="mid:AANLkTi=FA2=wXchAVPzEmGn3ZTA17fVbkgZ2ddB4-rTF@mail.gmail.com"
type="cite">Dear colleagues,<br>
<br>
I am a PhD student in the biomedical science program at the
Université de Montréal, Canada.<br>
I would like to know of some scientific Internet sites for discussing
techniques and experimental problems.<br>
For example, if I have a technical problem, I could post it there and
discuss it with others who are more expert than I am.<br>
<br>
I hope you can help me.<br>
<br>
With best wishes,<br>
<br>
shoby <br>
<br>
<div class="gmail_quote">On Wed, Jan 12, 2011 at 9:50 AM, Andrea
Edwards <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:edwardsa@cs.man.ac.uk">edwardsa@cs.man.ac.uk</a>&gt;</span>
wrote:<br>
<blockquote style="border-left: 1px solid rgb(204, 204, 204);
margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"
class="gmail_quote">Hello<br>
<br>
When I was looking at just exons, I used exactly the
same approach as Alison. Now that I want to annotate my SNPs and
store their relationships to the exons / genes / transcripts
they affect, I have decided to approach the problem from the
other side, as it were.<br>
<br>
Pablo, the idea of flattening the data once per release is
brilliant. I shall definitely adopt that approach in the
long term. Would you be willing to post your script to the
group? I'm glad I asked now. I bet flattening the data takes
hours off the run time.<br>
<br>
Many thanks :)
<div>
<div class="h5"><br>
<br>
<br>
<br>
<br>
On 12/01/2011 10:36, Pablo Marin-Garcia wrote:<br>
<blockquote style="border-left: 1px solid rgb(204, 204,
204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"
class="gmail_quote">On Wed, 12 Jan 2011, Alison Meynert
wrote:<br>
<br>
<blockquote style="border-left: 1px solid rgb(204, 204,
204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"
class="gmail_quote">Hi Andrea,<br>
<br>
I have the same issue with mapping lots of SNPs to
Ensembl features. My solution, at least for exons or
other reasonably small sets, is to invert the problem
by iterating over all features and asking which of my
SNPs overlap a given feature. Also, I have a local
installation of some Ensembl databases, which really
helps.<br>
<br>
To do the comparison, I load all my SNPs into a MySQL
database table indexed on the sequence
region/chromosome and position fields, and query that
table with given feature coordinates. Example Perl
code below.<br>
<br>
I'd be interested to hear if anyone has more
efficient/faster solutions as this is something that
seems good enough for my purposes at the moment but
may not scale well.<br>
</blockquote>
<br>
<br>
Hello Alison,<br>
<br>
I follow a similar approach when I want to describe all
SNPs in a gene. But when my starting point, like
Andrea's, is SNPs (usually a sparse set in my case) rather than
genes, I do the inverse of what you do: a SNP could lie
in several exons or genes, so going the other way around
is more natural for me.<br>
<br>
For each Ensembl release I run a script that loops over all
genes, transcripts and exons, flattens the data by exon,
and then loads the resulting tab-delimited file into a MySQL table
[exon_id exon_start exon_end transcript_id gene_id
gene_name], with the exon_id-transcript_id pair as the unique key
and exon_start, exon_end and gene_id indexed.<br>
<br>
That way I can quickly find which genes, transcripts and
exons overlap my SNP. I can also find
exons/genes within n bases of my SNP, so I can look for
other SNPs in LD in those genes.<br>
<br>
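A minimal sketch of the flatten-then-look-up idea described above, in
plain Perl. The nested data, the two subroutine names and the in-memory
list standing in for the MySQL table are all assumptions made for
illustration; this is not the actual script, and a real run would fill
the rows from the Ensembl API and query them with SQL:<br>
<pre>
```perl
use strict;
use warnings;

# Hypothetical stand-in for data fetched from Ensembl:
# genes -> transcripts -> exons, all on one chromosome for brevity.
# Each exon is [ exon_id, exon_start, exon_end ].
my @genes = (
    {   gene_id     => 'G1',
        gene_name   => 'geneA',
        transcripts => [
            { transcript_id => 'T1',
              exons => [ [ 'E1', 100, 200 ], [ 'E2', 300, 400 ] ] },
            { transcript_id => 'T2',
              exons => [ [ 'E1', 100, 200 ] ] },
        ],
    },
    {   gene_id     => 'G2',
        gene_name   => 'geneB',
        transcripts => [
            { transcript_id => 'T3',
              exons => [ [ 'E3', 150, 250 ] ] },
        ],
    },
);

# Flatten to one row per (exon, transcript) pair, in the column order
# exon_id exon_start exon_end transcript_id gene_id gene_name.
sub flatten_genes {
    my ($genes) = @_;
    my @rows;
    for my $gene (@$genes) {
        for my $transcript ( @{ $gene->{transcripts} } ) {
            for my $exon ( @{ $transcript->{exons} } ) {
                push @rows, [ @$exon, $transcript->{transcript_id},
                              $gene->{gene_id}, $gene->{gene_name} ];
            }
        }
    }
    return \@rows;
}

# Rows whose exon lies within $n bases of position $pos ($n = 0 means
# plain overlap). The same test written as a SQL condition would be:
#   exon_start <= pos + n AND exon_end >= pos - n
sub exons_near {
    my ( $rows, $pos, $n ) = @_;
    return [ grep { $_->[1] <= $pos + $n and $_->[2] >= $pos - $n } @$rows ];
}

# Print the flattened rows that overlap a SNP at position 180.
my $rows = flatten_genes( \@genes );
print join( "\t", @$_ ), "\n" for @{ exons_near( $rows, 180, 0 ) };
```
</pre>
Because each row already carries the transcript and gene identifiers,
one lookup gives every exon, transcript and gene touching the SNP
without building any Ensembl objects.<br>
<br>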
So my scripts and SQL queries are basically like yours,
but approach the problem from the other end (looking
for which exons overlap the SNP).<br>
<br>
I think that for a whole-genome approach this is faster
than connecting directly to Ensembl each time and
instantiating Ensembl objects all the time. The drawback
is that this is a specific solution, so you lose
flexibility in order to gain efficiency, but this is how
life is. Finally, as you said, hearing about other
approaches is welcome.<br>
<br>
<br>
-Pablo<br>
<br>
<br>
<blockquote style="border-left: 1px solid rgb(204, 204,
204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"
class="gmail_quote"><br>
Cheers,<br>
Alison<br>
<br>
# connect to the database<br>
my $dbh =
DBI->connect("DBI:mysql:database=$dbname;host=$hostname",
$user, $pass) or die "Cannot connect to database\n$DBI::errstr";<br>
<br>
# set up the select statement<br>
my $sth = $dbh->prepare("SELECT * FROM snp WHERE
seq_region = ? AND pos >= ? AND pos <= ?");<br>
<br>
# iterate over exons or other features<br>
my $exons = $exon_adaptor->fetch_all ... ;<br>
while (my $exon = shift @{ $exons })<br>
{<br>
# select SNPs<br>
$sth->execute($exon->seq_region_name,
$exon->start, $exon->end);<br>
while (my $ref = $sth->fetchrow_hashref())<br>
{<br>
# do something with SNPs<br>
my $snp_id = $ref->{'snp_id'};<br>
...<br>
}<br>
}<br>
<br>
On 11/01/2011 21:14, Andrea Edwards wrote:<br>
<blockquote style="border-left: 1px solid rgb(204,
204, 204); margin: 0px 0px 0px 0.8ex; padding-left:
1ex;" class="gmail_quote">Sorry, I missed the last bit
off my last message. This is what I did, but<br>
I was wondering if there is a more efficient way, as I
have such a lot of data.<br>
I have just given an example for one locus; as I'm
sure you can imagine,<br>
this code will be looped millions of times.<br>
<br>
$slice = $slice_adaptor->fetch_by_region(
'chromosome', '9', 21816758,<br>
21816758 );<br>
$exons = $slice->get_all_Exons;<br>
while (my $exon = shift @{$exons}) {<br>
<br>
my $gene =
$gene_adaptor->fetch_by_exon_stable_id($exon->stable_id);<br>
#process gene<br>
my $transcripts =
$transcript_adaptor->fetch_all_by_exon_stable_id<br>
($exon->stable_id);<br>
<br>
while (my $transcript = shift @{$transcripts}) {<br>
#process transcript<br>
}<br>
<br>
}<br>
<br>
<br>
<br>
<br>
On 11/01/2011 19:38, Andrea Edwards wrote:<br>
<blockquote style="border-left: 1px solid rgb(204,
204, 204); margin: 0px 0px 0px 0.8ex;
padding-left: 1ex;" class="gmail_quote">Hello<br>
<br>
I have this code below, taken from the core API
tutorial, which gets me<br>
all the exons and transcripts for the gene(s) that
overlap a slice.<br>
<br>
I was hoping for an easy way to get only those features
of the gene that<br>
overlap the original one-bp slice; this code
gets all exons and<br>
transcripts<br>
associated with the gene.<br>
<br>
I thought you might be able to call
'get_all_Object' methods with a<br>
parameter which represents a region of sequence
overlap but it seems not.<br>
I also thought they might be filtered
automatically based on the<br>
underlying slice but it seems not.<br>
<br>
Naturally I can filter the features in the list
based on their start<br>
and end positions, but for speed it would be better
not to retrieve<br>
them at all.<br>
I have a lot of data, so speed is important. Please
can you advise on the<br>
best way to do this?<br>
<br>
$slice = $slice_adaptor->fetch_by_region(
'chromosome', '9', 21816758,<br>
21816758 );<br>
<br>
my $genes = $slice->get_all_Genes();<br>
while ( my $gene = shift @{$genes} ) {<br>
my $gstring = feature2string($gene);<br>
print "$gstring\n";<br>
<br>
my $transcripts = $gene->get_all_Transcripts();<br>
while ( my $transcript = shift @{$transcripts} ) {<br>
my $tstring = feature2string($transcript);<br>
print "\t$tstring\n";<br>
<br>
foreach my $exon ( @{
$transcript->get_all_Exons() } ) {<br>
my $estring = feature2string($exon);<br>
print "\t\t$estring\n";<br>
}<br>
}<br>
}<br>
<br>
print "done\n";<br>
<br>
Many thanks<br>
<br>
_______________________________________________<br>
Dev mailing list<br>
<a moz-do-not-send="true"
href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
<a moz-do-not-send="true"
href="http://lists.ensembl.org/mailman/listinfo/dev"
target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
</blockquote>
<br>
<br>
</blockquote>
<br>
-- <br>
Alison Meynert<br>
MRC Human Genetics Unit, Edinburgh<br>
<a moz-do-not-send="true"
href="mailto:alison.meynert@hgu.mrc.ac.uk"
target="_blank">alison.meynert@hgu.mrc.ac.uk</a><br>
<br>
<br>
</blockquote>
<br>
<br>
-----<br>
<br>
Pablo Marin-Garcia<br>
<br>
<br>
</blockquote>
<br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</body>
</html>