<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body text="#000000" bgcolor="#ffffff">

    Hello<br>

    <br>

    For general questions you can try biostar.stackexchange.com and

    seqanswers (I've never used that forum myself, but I've heard of it)<br>

    <br>

    For issues relating to Ensembl you can email their mailing lists:<br>

    <a class="moz-txt-link-freetext" href="http://www.ensembl.org/info/about/contact/mailing.html">http://www.ensembl.org/info/about/contact/mailing.html</a><br>

    <br>

    Other people might know of some others.<br>

    <br>

    andrea<br>

    <br>

    On 12/01/2011 16:10, shaabana abdo wrote:

    <blockquote

      cite="mid:AANLkTi=FA2=wXchAVPzEmGn3ZTA17fVbkgZ2ddB4-rTF@mail.gmail.com"

      type="cite">Dear colleague, <br>

       <br>

      I am a PhD student in the biomedical science program at the

      University of Montreal, Canada.<br>

       I would like to know of some scientific Internet sites for

      discussing techniques and problems with experiments, <br>

      where, if I have a technical problem, I can describe it and

      then discuss it with others who are more expert

      than I am. <br>

       <br>

      I hope you can help me,<br>

       <br>

      with my best wishes <br>

       <br>

      shoby <br>

      <br>

      <div class="gmail_quote">On Wed, Jan 12, 2011 at 9:50 AM, Andrea

        Edwards <span dir="ltr"><<a moz-do-not-send="true"

            href="mailto:edwardsa@cs.man.ac.uk">edwardsa@cs.man.ac.uk</a>></span>

        wrote:<br>

        <blockquote style="border-left: 1px solid rgb(204, 204, 204);

          margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"

          class="gmail_quote">Hello<br>

          <br>

          When I was looking at just exons I used exactly the

          same approach as Alison. Now that I want to annotate my SNPs and

          store their relationships to the exons / genes / transcripts

          they affect, I have decided to approach the problem from the

          other side, as it were.<br>

          <br>

          Pablo, the idea about flattening the data once per release is

          brilliant. I shall definitely adopt that approach in the

          long term. Would you be willing to post your script to the

          group? I'm glad I asked now. I bet flattening the data takes

          hours off the run time?<br>

          <br>

          Many thanks :)

          <div>

            <div class="h5"><br>

              <br>

              <br>

              <br>

              <br>

              On 12/01/2011 10:36, Pablo Marin-Garcia wrote:<br>

              <blockquote style="border-left: 1px solid rgb(204, 204,

                204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"

                class="gmail_quote">On Wed, 12 Jan 2011, Alison Meynert

                wrote:<br>

                <br>

                <blockquote style="border-left: 1px solid rgb(204, 204,

                  204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"

                  class="gmail_quote">Hi Andrea,<br>

                  <br>

                  I have the same issue with mapping lots of SNPs to

                  Ensembl features. My solution, at least for exons or

                  other reasonably small sets, is to invert the problem

                  by iterating over all features and asking which of my

                  SNPs overlap a given feature. Also, I have a local

                  installation of some Ensembl databases, which really

                  helps.<br>

                  <br>

                  To do the comparison, I load all my SNPs into a MySQL

                  database table indexed on the sequence

                  region/chromosome and position fields, and query that

                  table with given feature coordinates. Example Perl

                  code below.<br>

                  <br>

                  I'd be interested to hear if anyone has more

                  efficient/faster solutions as this is something that

                  seems good enough for my purposes at the moment but

                  may not scale well.<br>

                </blockquote>

                <br>

                <br>

                Hello Alison,<br>

                <br>

                I follow a similar approach when I want to describe all

                SNPs in a gene. But when my starting point is, as in

                Andrea's case, a set of SNPs (usually a sparse set in my case)

                rather than genes, I do the inverse of what you do: a SNP can lie

                in several exons or genes, so going the other way around

                is more natural for me.<br>

                <br>

                For each Ensembl release I run a script that loops over all

                genes, transcripts and exons, flattens the data by exon,

                and then loads the resulting tab file into a MySQL table

                [exon_id exon_start exon_end transcript_id gene_id

                gene_name], with the exon_id-transcript_id pair as the unique key

                and exon_start, exon_end and gene_id indexed.<br>

                <br>

                That way I can quickly find which genes, transcripts and

                exons overlap my SNP. I can also find

                exons/genes within n bases of my SNP, so I can look for

                other SNPs in LD in those genes.<br>

                <br>

                So my scripts and SQL queries are basically like yours

                but start the problem from the other end (looking for

                which exons overlap the SNP).<br>
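                <br>
                The overlap and n-bases-nearby lookups can be sketched in plain Perl as follows. The sample rows, the exons_near_snp helper and the seq_region column are assumptions made for illustration (the column list given above does not mention a chromosome column); the comment shows the equivalent SQL condition:<br>
                <pre>
```perl
use strict;
use warnings;

# Hypothetical flattened rows, one per exon/transcript pair. A seq_region
# column is assumed so that coordinates on different chromosomes stay apart.
my @exon_flat = (
    { exon_id => 'E1', seq_region => '9', exon_start => 100, exon_end => 200,
      transcript_id => 'T1', gene_id => 'G1' },
    { exon_id => 'E1', seq_region => '9', exon_start => 100, exon_end => 200,
      transcript_id => 'T2', gene_id => 'G1' },
    { exon_id => 'E2', seq_region => '9', exon_start => 500, exon_end => 650,
      transcript_id => 'T1', gene_id => 'G1' },
    { exon_id => 'E3', seq_region => '1', exon_start => 150, exon_end => 180,
      transcript_id => 'T3', gene_id => 'G2' },
);

# Return the rows whose exon overlaps [pos - flank, pos + flank] on the given
# sequence region. With flank = 0 this is a plain overlap test.
sub exons_near_snp {
    my ($rows, $seq_region, $pos, $flank) = @_;
    $flank //= 0;
    return grep {
        $_->{seq_region} eq $seq_region
            && $_->{exon_start} <= $pos + $flank
            && $_->{exon_end}   >= $pos - $flank
    } @$rows;
}

# The same condition as a parameterised SQL query (illustrative names):
#   SELECT * FROM exon_flat
#   WHERE seq_region = ? AND exon_start <= ? AND exon_end >= ?
# bound to ($seq_region, $pos + $flank, $pos - $flank).

my @hits = exons_near_snp(\@exon_flat, '9', 150);
print join("\t", @{$_}{qw(exon_id transcript_id gene_id)}), "\n" for @hits;
```
                </pre>
                Passing a non-zero flank widens the window, so the same helper covers both the direct overlap and the nearby-exon case.<br>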

                <br>

                I think that for a whole-genome approach this is faster

                than connecting directly to Ensembl each time and

                instantiating Ensembl objects all the time. The drawback

                is that this is a specific solution, so you lose

                flexibility in order to gain efficiency, but this is how

                life is. Finally, as you said, hearing about other

                approaches is welcome.<br>

                <br>

                <br>

                  -Pablo<br>

                <br>

                <br>

                <blockquote style="border-left: 1px solid rgb(204, 204,

                  204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"

                  class="gmail_quote"><br>

                  Cheers,<br>

                  Alison<br>

                  <br>

                  # connect to the database<br>

                  my $dbh =

                  DBI->connect("DBI:mysql:database=$dbname;host=$hostname",

                  $user, $pass) or die "Cannot connect to database\n$DBI::errstr";<br>

                  <br>

                  # set up the select statement<br>

                  my $sth = $dbh->prepare("SELECT * FROM snp WHERE

                  seq_region = ? AND pos >= ? AND pos <= ?");<br>

                  <br>

                  # iterate over exons or other features<br>

                  my $exons = $exon_adaptor->fetch_all ... ;<br>

                  while (my $exon = shift @{ $exons })<br>

                  {<br>

                    # select SNPs<br>

                    $sth->execute($exon->seq_region_name,

                  $exon->start, $exon->end);<br>

                    while (my $ref = $sth->fetchrow_hashref())<br>

                    {<br>

                        # do something with SNPs<br>

                        my $snp_id = $ref->{'snp_id'};<br>

                        ...<br>

                    }<br>

                  }<br>

                  <br>

                  On 11/01/2011 21:14, Andrea Edwards wrote:<br>

                  <blockquote style="border-left: 1px solid rgb(204,

                    204, 204); margin: 0px 0px 0px 0.8ex; padding-left:

                    1ex;" class="gmail_quote">Sorry, I missed the last bit

                    off my last message. This is what I did, but<br>

                    I was wondering if there is a more efficient way, as I

                    have such a lot of data.<br>

                    I have just given an example for one locus; as I'm

                    sure you can imagine,<br>

                    this code will be looped millions of times.<br>

                    <br>

                    $slice = $slice_adaptor->fetch_by_region(

                    'chromosome', '9', 21816758,<br>

                    21816758 );<br>

                    $exons = $slice->get_all_Exons;<br>

                    while (my $exon = shift @{$exons}) {<br>

                    <br>

                    my $gene =

                    $gene_adaptor->fetch_by_exon_stable_id($exon->stable_id);<br>

                    #process gene<br>

                    my $transcripts =

                    $transcript_adaptor->fetch_all_by_exon_stable_id<br>

                    ($exon->stable_id);<br>

                    <br>

                    while (my $transcript = shift @{$transcripts}) {<br>

                    #process transcript<br>

                    }<br>

                    <br>

                    }<br>

                    <br>

                    <br>

                    <br>

                    <br>

                    On 11/01/2011 19:38, Andrea Edwards wrote:<br>

                    <blockquote style="border-left: 1px solid rgb(204,

                      204, 204); margin: 0px 0px 0px 0.8ex;

                      padding-left: 1ex;" class="gmail_quote">Hello<br>

                      <br>

                      I have the code below, taken from the core API

                      tutorial, which gets me<br>

                      all the exons and transcripts for the gene(s) that

                      overlap a slice.<br>

                      <br>

                      I was hoping for an easy way to get only those features

                      of the gene that<br>

                      overlap the original one-bp slice; this code

                      gets all exons and<br>

                      transcripts<br>

                      associated with the gene.<br>

                      <br>

                      I thought you might be able to call

                      'get_all_Object' methods with a<br>

                      parameter which represents a region of sequence

                      overlap but it seems not.<br>

                      I also thought they might be filtered

                      automatically based on the<br>

                      underlying slice but it seems not.<br>

                      <br>

                      Naturally I can filter the features in the list

                      based on their start<br>

                      and end positions, but for speed it would be better

                      not to retrieve<br>

                      them all in the first place.<br>

                      I have a lot of data so speed is important. Please

                      can you advise the<br>

                      best way to do this.<br>

                      <br>

                      $slice = $slice_adaptor->fetch_by_region(

                      'chromosome', '9', 21816758,<br>

                      21816758 );<br>

                      <br>

                      my $genes = $slice->get_all_Genes();<br>

                      while ( my $gene = shift @{$genes} ) {<br>

                      my $gstring = feature2string($gene);<br>

                      print "$gstring\n";<br>

                      <br>

                      my $transcripts = $gene->get_all_Transcripts();<br>

                      while ( my $transcript = shift @{$transcripts} ) {<br>

                      my $tstring = feature2string($transcript);<br>

                      print "\t$tstring\n";<br>

                      <br>

                      foreach my $exon ( @{

                      $transcript->get_all_Exons() } ) {<br>

                      my $estring = feature2string($exon);<br>

                      print "\t\t$estring\n";<br>

                      }<br>

                      }<br>

                      }<br>

                      <br>

                      print "done\n";<br>

                      <br>

                      Many thanks<br>

                      <br>

                      _______________________________________________<br>

                      Dev mailing list<br>

                      <a moz-do-not-send="true"

                        href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>

                      <a moz-do-not-send="true"

                        href="http://lists.ensembl.org/mailman/listinfo/dev"

                        target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>

                    </blockquote>

                    <br>

                    <br>


                  </blockquote>

                  <br>

                  -- <br>

                  Alison Meynert<br>

                  MRC Human Genetics Unit, Edinburgh<br>

                  <a moz-do-not-send="true"

                    href="mailto:alison.meynert@hgu.mrc.ac.uk"

                    target="_blank">alison.meynert@hgu.mrc.ac.uk</a><br>

                  <br>


                  <br>

                </blockquote>

                <br>

                <br>

                -----<br>

                <br>

                 Pablo Marin-Garcia<br>

                <br>

                <br>


              </blockquote>

              <br>

              <br>


            </div>

          </div>

        </blockquote>

      </div>

      <br>

    </blockquote>

    <br>

  </body>

</html>