<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
    <title></title>
  </head>
  <body text="#000000" bgcolor="#ffffff">
    Hello<br>
    <br>
    For general questions you can try biostar.stackexchange.com and
    seqanswers (I've never used this forum but I've heard of it)<br>
    <br>
    For issues relating to ensembl you can email their mailing lists<br>
    <a class="moz-txt-link-freetext" href="http://www.ensembl.org/info/about/contact/mailing.html">http://www.ensembl.org/info/about/contact/mailing.html</a><br>
    <br>
    Other people might know some others?<br>
    <br>
    andrea<br>
    <br>
    On 12/01/2011 16:10, shaabana abdo wrote:
    <blockquote
      cite="mid:AANLkTi=FA2=wXchAVPzEmGn3ZTA17fVbkgZ2ddB4-rTF@mail.gmail.com"
      type="cite">Dear colleague <br>
       <br>
      I am a PhD student at the University of Montreal, Canada, in the
      biomedical science program.<br>
       I would like to know of some scientific Internet sites for discussing
      techniques and experimental problems, <br>
      where, if I have a technical problem, I can describe it and
      then discuss it with others who are more expert
      than me. <br>
       <br>
      i wish you help me,<br>
       <br>
      with my best wishes <br>
       <br>
      shoby <br>
      <br>
      <div class="gmail_quote">On Wed, Jan 12, 2011 at 9:50 AM, Andrea
        Edwards <span dir="ltr">&lt;<a moz-do-not-send="true"
            href="mailto:edwardsa@cs.man.ac.uk">edwardsa@cs.man.ac.uk</a>&gt;</span>
        wrote:<br>
        <blockquote style="border-left: 1px solid rgb(204, 204, 204);
          margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"
          class="gmail_quote">Hello<br>
          <br>
          When I was looking at just exons I used exactly the
          same approach as Alison. Now that I want to annotate my SNPs and
          store their relationships to the exons / genes / transcripts
          they affect, I have decided to approach the problem from the
          other side, as it were.<br>
          <br>
          Pablo, the idea of flattening the data once per release is
          brilliant. I shall definitely adopt that approach in the
          long term. Would you be willing to post your script to the
          group? I'm glad I asked now. I bet flattening the data takes
          hours off the run time?<br>
          <br>
          Many thanks :)
          <div>
            <div class="h5"><br>
              <br>
              <br>
              <br>
              <br>
              On 12/01/2011 10:36, Pablo Marin-Garcia wrote:<br>
              <blockquote style="border-left: 1px solid rgb(204, 204,
                204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"
                class="gmail_quote">On Wed, 12 Jan 2011, Alison Meynert
                wrote:<br>
                <br>
                <blockquote style="border-left: 1px solid rgb(204, 204,
                  204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"
                  class="gmail_quote">Hi Andrea,<br>
                  <br>
                  I have the same issue with mapping lots of SNPs to
                  Ensembl features. My solution, at least for exons or
                  other reasonably small sets, is to invert the problem
                  by iterating over all features and asking which of my
                  SNPs overlap a given feature. Also, I have a local
                  installation of some Ensembl databases, which really
                  helps.<br>
                  <br>
                  To do the comparison, I load all my SNPs into a MySQL
                  database table indexed on the sequence
                  region/chromosome and position fields, and query that
                  table with given feature coordinates. Example Perl
                  code below.<br>
                  <br>
                  I'd be interested to hear if anyone has more
                  efficient/faster solutions as this is something that
                  seems good enough for my purposes at the moment but
                  may not scale well.<br>
                </blockquote>
                <br>
                <br>
                Hello Alison,<br>
                <br>
                I follow a similar approach when I want to describe all
                SNPs in a gene. But when my starting point, like
                Andrea's, is SNPs (usually a sparse set in my case) rather
                than genes, I do the inverse of what you do: a SNP can lie
                in several exons or genes, so going the other way around
                is more natural for me.<br>
                <br>
                For each Ensembl release I run a script that loops over all
                genes, transcripts and exons, flattens the data by exon,
                and then loads the tab file into a MySQL table
                [exon_id exon_start exon_end transcript_id gen_id
                gen_name], with exon_id-transcript_id as the unique key
                and exon_start, exon_end, and gene_id indexed.<br>
                <br>
                That way I can quickly find which genes, transcripts and
                exons overlap my SNP. I can also find
                exons/genes within n bases of my SNP, so I can look for
                other LD SNPs in these genes.<br>
                <br>
                So my scripts and SQL queries are basically like yours
                but approach the problem from the other end (looking
                for which exons overlap the SNP).<br>
                <br>
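                A minimal sketch of the flattened-table lookup described
                above, not the actual script: it uses Python with an
                in-memory SQLite database instead of Perl and MySQL, and
                the table name exon_flat, the added seq_region column, the
                function names and all rows are assumptions made up for
                illustration.<br>

```python
import sqlite3


def build_exon_table(conn, rows):
    """Create and fill the flattened table (one row per exon/transcript pair)."""
    conn.execute(
        """CREATE TABLE exon_flat (
               exon_id TEXT, seq_region TEXT,
               exon_start INTEGER, exon_end INTEGER,
               transcript_id TEXT, gene_id TEXT, gene_name TEXT,
               UNIQUE (exon_id, transcript_id))"""
    )
    # Index the columns used for coordinate and gene lookups.
    conn.execute("CREATE INDEX idx_start ON exon_flat (seq_region, exon_start)")
    conn.execute("CREATE INDEX idx_end ON exon_flat (seq_region, exon_end)")
    conn.execute("CREATE INDEX idx_gene ON exon_flat (gene_id)")
    conn.executemany("INSERT INTO exon_flat VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def exons_near_snp(conn, seq_region, pos, flank=0):
    """Return rows whose exon lies within `flank` bases of the SNP position."""
    cur = conn.execute(
        """SELECT exon_id, transcript_id, gene_id, gene_name
           FROM exon_flat
           WHERE seq_region = ? AND exon_start <= ? AND exon_end >= ?
           ORDER BY exon_id, transcript_id""",
        (seq_region, pos + flank, pos - flank),
    )
    return cur.fetchall()


# Made-up example rows (not real Ensembl identifiers or coordinates).
ROWS = [
    ("EXON_A", "9", 100, 200, "TX_1", "GENE_X", "geneX"),
    ("EXON_A", "9", 100, 200, "TX_2", "GENE_X", "geneX"),
    ("EXON_B", "9", 500, 600, "TX_1", "GENE_X", "geneX"),
    ("EXON_C", "10", 100, 200, "TX_3", "GENE_Y", "geneY"),
]

conn = sqlite3.connect(":memory:")
build_exon_table(conn, ROWS)
direct_hits = exons_near_snp(conn, "9", 150)            # SNP inside an exon
nearby_hits = exons_near_snp(conn, "9", 450, flank=50)  # SNP 50 bases upstream
```

                Here direct_hits holds the two EXON_A rows (one per
                transcript) and nearby_hits holds the EXON_B row, which is
                found only because of the 50-base flank.<br>
                <br>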
                I think that for a whole-genome approach this is faster
                than connecting directly to Ensembl each time and
                repeatedly instantiating Ensembl objects. The drawback
                is that this is a specific solution, so you lose
                flexibility in order to gain efficiency, but that is how
                life is. Finally, as you said, I would welcome hearing
                about other approaches.<br>
                <br>
                <br>
                  -Pablo<br>
                <br>
                <br>
                <blockquote style="border-left: 1px solid rgb(204, 204,
                  204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;"
                  class="gmail_quote"><br>
                  Cheers,<br>
                  Alison<br>
                  <br>
                  # connect to the database<br>
                  my $dbh =
                  DBI->connect("DBI:mysql:database=$dbname;host=$hostname",
                  $user, $pass) or die "Cannot connect to database\n$DBI::errstr";<br>
                  <br>
                  # set up the select statement<br>
                  my $sth = $dbh->prepare("SELECT * FROM snp WHERE
                  seq_region = ? AND pos >= ? AND pos <= ?");<br>
                  <br>
                  # iterate over exons or other features<br>
                  my $exons = $exon_adaptor->fetch_all ... ;<br>
                  while (my $exon = shift @{ $exons })<br>
                  {<br>
                    # select SNPs<br>
                    $sth->execute($exon->seq_region_name,
                  $exon->start, $exon->end);<br>
                    while (my $ref = $sth->fetchrow_hashref())<br>
                    {<br>
                        # do something with SNPs<br>
                        my $snp_id = $ref->{'snp_id'};<br>
                        ...<br>
                    }<br>
                  }<br>
                  <br>
                  On 11/01/2011 21:14, Andrea Edwards wrote:<br>
                  <blockquote style="border-left: 1px solid rgb(204,
                    204, 204); margin: 0px 0px 0px 0.8ex; padding-left:
                    1ex;" class="gmail_quote">Sorry, missed the last bit
                    off my last message. This was what I did, but I<br>
                    was wondering if there was a more efficient way as I
                    have such a lot of data.<br>
                    I have just given an example for one locus; as I'm
                    sure you can imagine,<br>
                    this code will be looped millions of times.<br>
                    <br>
                    $slice = $slice_adaptor->fetch_by_region(
                    'chromosome', '9', 21816758,<br>
                    21816758 );<br>
                    $exons = $slice->get_all_Exons;<br>
                    while (my $exon = shift @{$exons}) {<br>
                    <br>
                    my $gene =
                    $gene_adaptor->fetch_by_exon_stable_id($exon->stable_id);<br>
                    #process gene<br>
                    my $transcripts =
                    $transcript_adaptor->fetch_all_by_exon_stable_id<br>
                    ($exon->stable_id);<br>
                    <br>
                    while (my $transcript = shift @{$transcripts}) {<br>
                    #process transcript<br>
                    }<br>
                    <br>
                    }<br>
                    <br>
                    <br>
                    <br>
                    <br>
                    On 11/01/2011 19:38, Andrea Edwards wrote:<br>
                    <blockquote style="border-left: 1px solid rgb(204,
                      204, 204); margin: 0px 0px 0px 0.8ex;
                      padding-left: 1ex;" class="gmail_quote">Hello<br>
                      <br>
                      I have this code below, taken from the core API
                      tutorial, which gets me<br>
                      all the exons and transcripts for the gene(s) that
                      overlap a slice.<br>
                      <br>
                      I was hoping for an easy way to get only those features
                      of the gene that<br>
                      overlap the original one-bp slice; this code
                      gets all exons and<br>
                      transcripts<br>
                      associated with the gene.<br>
                      <br>
                      I thought you might be able to call
                      'get_all_Object' methods with a<br>
                      parameter which represents a region of sequence
                      overlap but it seems not.<br>
                      I also thought they might be filtered
                      automatically based on the<br>
                      underlying slice but it seems not.<br>
                      <br>
                      Naturally I can filter the features in the list
                      based on their start<br>
                      and end positions but for speed it would be better
                      not to retrieve<br>
                      them at all.<br>
                      I have a lot of data so speed is important. Please
                      can you advise on the<br>
                      best way to do this.<br>
                      <br>
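                      Filtering already-fetched features by their start
                      and end positions, as mentioned above, might look
                      like the sketch below. It is written in Python
                      rather than Perl; the Feature type, the overlapping
                      function and the coordinates are hypothetical, and
                      it assumes all coordinates are in the same
                      coordinate system.<br>

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical stand-in for an exon or transcript (1-based, inclusive)."""
    name: str
    start: int
    end: int


def overlapping(features, start, end):
    """Keep only features sharing at least one base with [start, end]."""
    return [f for f in features if f.start <= end and f.end >= start]


# Made-up coordinates for illustration only.
exons = [
    Feature("exon_1", 21816700, 21816800),
    Feature("exon_2", 21817000, 21817100),
    Feature("exon_3", 21816758, 21816758),
]

# A one-bp region, like the slice in the question.
hits = overlapping(exons, 21816758, 21816758)
hit_names = [f.name for f in hits]
```

                      With these made-up values, hit_names contains exon_1
                      and exon_3 but not exon_2. This still retrieves every
                      feature first, so it does not address the speed
                      concern on its own.<br>
                      <br>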
                      $slice = $slice_adaptor->fetch_by_region(
                      'chromosome', '9', 21816758,<br>
                      21816758 );<br>
                      <br>
                      my $genes = $slice->get_all_Genes();<br>
                      while ( my $gene = shift @{$genes} ) {<br>
                      my $gstring = feature2string($gene);<br>
                      print "$gstring\n";<br>
                      <br>
                      my $transcripts = $gene->get_all_Transcripts();<br>
                      while ( my $transcript = shift @{$transcripts} ) {<br>
                      my $tstring = feature2string($transcript);<br>
                      print "\t$tstring\n";<br>
                      <br>
                      foreach my $exon ( @{
                      $transcript->get_all_Exons() } ) {<br>
                      my $estring = feature2string($exon);<br>
                      print "\t\t$estring\n";<br>
                      }<br>
                      }<br>
                      }<br>
                      <br>
                      print "done\n";<br>
                      <br>
                      Many thanks<br>
                      <br>
                      _______________________________________________<br>
                      Dev mailing list<br>
                      <a moz-do-not-send="true"
                        href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
                      <a moz-do-not-send="true"
                        href="http://lists.ensembl.org/mailman/listinfo/dev"
                        target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
                    </blockquote>
                    <br>
                    <br>
                  </blockquote>
                  <br>
                  -- <br>
                  Alison Meynert<br>
                  MRC Human Genetics Unit, Edinburgh<br>
                  <a moz-do-not-send="true"
                    href="mailto:alison.meynert@hgu.mrc.ac.uk"
                    target="_blank">alison.meynert@hgu.mrc.ac.uk</a><br>
                  <br>
                  <br>
                </blockquote>
                <br>
                <br>
                -----<br>
                <br>
                 Pablo Marin-Garcia<br>
                <br>
                <br>
              </blockquote>
              <br>
              <br>
            </div>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>