<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Hi all,</p>
    <p>I am bringing this old issue up again because I have found
      another behaviour of the function that I cannot understand.
      Variant <a moz-do-not-send="true"
href="https://www.ensembl.org/Homo_sapiens/Variation/Explore?r=9:7873518-7874518;v=rs112156520;vdb=variation;vf=693077069">rs112156520</a>
      overlaps one of the three transcripts of DMAC (ENSG00000137038),
      but calling fetch_all_by_outward_search with the following parameters
      does not return ENSG00000137038:</p>
    <pre style="background-color:#ffffff;color:#000000;font-family:'Menlo';font-size:9.0pt;"><span style="color:#000080;font-weight:bold;">my </span>@gene_list_for_feature  = @{$gene_adaptor->fetch_all_by_outward_search( 
                                                   -FEATURE => $var_feature,
                                                   -RANGE => <span style="color:#0000ff;">10000</span>,
                                                   -MAX_RANGE => <span style="color:#0000ff;">500000</span>,
                                                   -LIMIT => <span style="color:#0000ff;">40</span>,
                                                   -FIVE_PRIME => <span style="color:#0000ff;">1</span>)};</pre>
    <p>The list of genes that it returns is:</p>
    <p>ENSG00000226376<br>
      ENSG00000231902<br>
      ENSG00000273056<br>
      ENSG00000178654<br>
      ENSG00000230207<br>
      ENSG00000153707</p>
    <p>Strangely enough, if I remove the -FIVE_PRIME option, which
      measures the distance to the 5' end, ENSG00000137038 is returned:</p>
    <pre style="background-color:#ffffff;color:#000000;font-family:'Menlo';font-size:9.0pt;">my @gene_list_for_feature  = @{$gene_adaptor->fetch_all_by_outward_search(
                                                   -FEATURE => $var_feature,
                                                   -RANGE => 10000,
                                                   -MAX_RANGE => 500000,
                                                   -LIMIT => 40)};</pre>
    <p>I have looked into the code, namely the <a
        moz-do-not-send="true"
href="https://github.com/Ensembl/ensembl/blob/release/99/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1507-L1571">fetch_all_nearest_by_Feature</a>
      method called by fetch_all_by_outward_search, but I have not been
      able to identify why "FIVE_PRIME" would affect whether the
      overlapping gene is returned or not.</p>
    <p>In addition, ENSG00000153707 is another case of a gene whose 5'
      end is outside the search range but which is still returned. I think
      that when the "FIVE_PRIME" option is set to true this gene should
      not be returned because, although the 3' end is within the 500 kb
      search window (440228 bp away), the 5' end is 2738705 bp away from
      the feature. Do you have any thoughts about this?</p>
    <p>Thank you,<br>
      Asier<br>
    </p>
    <div class="moz-cite-prefix">On 09/08/2019 13:46, Asier Gonzalez
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:fcc3ff32-96a3-2433-f0ac-32bd1d42395e@ebi.ac.uk">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <p>Hi Brandon,</p>
      <p>Apologies for the delay but I have finally had time to make the
        changes and open a PR (<a class="moz-txt-link-freetext"
          href="https://github.com/Ensembl/ensembl/pull/407"
          moz-do-not-send="true">https://github.com/Ensembl/ensembl/pull/407</a>).</p>
      <p>Please let me know should there be any issues. I can confirm
        that I have used my version of the code and it now works as
        expected.</p>
      <p>Best wishes,<br>
        Asier<br>
      </p>
      <div class="moz-cite-prefix">On 26/06/2019 11:35, Brandon Walts
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:6c456e1e-a812-ea04-3041-c50ae22fe1fc@ebi.ac.uk">
        <meta http-equiv="Content-Type" content="text/html;
          charset=UTF-8">
        <p>Hi Asier</p>
        <p>It's great to hear that you've devised a fix. We would
          welcome a PR against master, or to see your changes, whichever
          you'd prefer. We'll be able to look at it in the next few
          days.<br>
        </p>
        <p>Best<br>
          -Brandon<br>
        </p>
        <div class="moz-cite-prefix">On 26/06/2019 11:03, Asier Gonzalez
          wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:af0f2e81-0841-9f50-f5c0-01d9388d8c29@ebi.ac.uk">
          <meta http-equiv="Content-Type" content="text/html;
            charset=UTF-8">
          <p>Hi Brandon,</p>
          <p>Thank you for your response. Do you have an idea of when it
            could be fixed? I mean, are we talking about weeks or
            months? I use a tool that calls this function at least every
            two months so I have amended the code to do what I believe
            it is supposed to do. I could share it with you if it would
            help you, or I could open a PR if you accept them. I
            understand that you may have other priorities, but at least
            I want to make sure that the future version will do what
            mine already does.</p>
          <p>Best wishes,<br>
            Asier<br>
          </p>
          <div class="moz-cite-prefix">On 26/06/2019 10:56, Brandon
            Walts wrote:<br>
          </div>
          <blockquote type="cite"
            cite="mid:622c5207-60a9-136a-f420-c66c1e960068@ebi.ac.uk">
            <meta http-equiv="Content-Type" content="text/html;
              charset=UTF-8">
            <p>Hi Asier</p>
            <p>We've had a chance to look into it and you are correct,
              this function is not working as described. As currently
              implemented, it will return more results than expected.
              It's on our list to fix, and we plan to get to it in the
              near future.</p>
            <p>Best<br>
              -Brandon<br>
            </p>
            <div class="moz-cite-prefix">On 26/06/2019 09:51, Asier
              Gonzalez wrote:<br>
            </div>
            <blockquote type="cite"
              cite="mid:25469e48-d7ce-4e0c-69cb-db527f7c5331@ebi.ac.uk">
              <meta http-equiv="Content-Type" content="text/html;
                charset=UTF-8">
              <p>Hi Brandon,</p>
              <p>Do you have any updates about this?</p>
              <p>Thanks,<br>
                Asier<br>
              </p>
              <div class="moz-cite-prefix">On 07/06/2019 16:42, Brandon
                Walts wrote:<br>
              </div>
              <blockquote type="cite"
                cite="mid:66af20b2-39e9-701b-d208-f45fbcdc7e23@ebi.ac.uk">
                <meta http-equiv="Content-Type" content="text/html;
                  charset=UTF-8">
                <p>Hi Asier</p>
                <p>Thanks for bringing this up. We will look into what's
                  going on and see if there is a bug, if the
                  documentation needs improvement, or both.</p>
                <p>Best<br>
                  -Brandon<br>
                </p>
                <div class="moz-cite-prefix">On 07/06/2019 13:47, Asier
                  Gonzalez wrote:<br>
                </div>
                <blockquote type="cite"
                  cite="mid:a7228afe-5355-bd36-8a93-7617f98e4eff@ebi.ac.uk">
                  <meta http-equiv="content-type" content="text/html;
                    charset=UTF-8">
                  <p>Hi all,</p>
                  <p>I'm troubleshooting a Perl tool that calls the
                    Ensembl API with a variant id and tries to find the
                    gene with the closest 5' end within a 500 kb window.
                    The tool was written by a colleague and it uses
                    Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search()
                    like this:</p>
                  <pre style="background-color:#ffffff;color:#000000;font-family:'Menlo';font-size:9.0pt;"><span style="color:#000080;font-weight:bold;">my </span>@gene_list_for_feature  = @{$gene_adaptor->fetch_all_by_outward_search( 
                                                   -FEATURE => $var_feature,
                                                   -RANGE => <span style="color:#0000ff;">10000</span>,
                                                   -MAX_RANGE => <span style="color:#0000ff;">500000</span>,
                                                   -LIMIT => <span style="color:#0000ff;">40</span>,
                                                   -FIVE_PRIME => <span style="color:#0000ff;">1</span>)};</pre>
                  <p>According to the documentation of this function (<a
                      class="moz-txt-link-freetext"
href="http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a"
                      moz-do-not-send="true">http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a</a>),
                    it "Searches for features within the suggested
                    -RANGE, and if it finds none, expands the search
                    area until it satisfies -LIMIT or hits -MAX_RANGE".
                    My understanding is that in my case it should search
                    first in a 10 kb window and, if there are no genes,
                    progressively expand it to up to 500 kb unless it
                    finds 40 features first. However, this is not the
                    behaviour I am seeing. The search range grows as
                    follows: 10k, 20k, 60k, 240k and 1.20M. Is this a bug
                    or have I misunderstood what it does?</p>
                  <p>I have looked into the code of this subroutine (<a
                      moz-do-not-send="true"
href="https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469">https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469</a>)
                    and the search window grows exponentially because
                    it multiplies the previous value instead of the
                    initial value:</p>
                  <p><span class="pl-smi">[L1452] $search_range</span> =
                    <span class="pl-smi">$search_range</span> * <span
                      class="pl-smi">$factor</span>;</p>
                  <p>In addition, it is not true that it only expands
                    the range if it does not find any features in the
                    initial window. As the while statement shows, it keeps
                    expanding until -LIMIT features have been found or
                    -MAX_RANGE is exceeded:</p>
                  <p>[L1451] <span class="pl-k">while</span> (<span
                      class="pl-c1">scalar</span> <span class="pl-smi">@results</span>
                    < <span class="pl-smi">$limit</span> &&
                    <span class="pl-smi">$search_range</span> <= <span
                      class="pl-smi">$max_range</span>) {</p>
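                  <p>The growth pattern can be reproduced outside the API
                    with a minimal standalone sketch. It assumes that
                    $factor starts at 1 and is incremented on every pass,
                    which is inferred from the observed ranges rather than
                    copied from the source. search_ranges and its $compound
                    flag are hypothetical names, and no database query is
                    made: the sketch behaves as if fewer than -LIMIT
                    features were always found.</p>
<pre>
use strict;
use warnings;

# Hypothetical helper, not part of the Ensembl API: lists the window sizes
# that would be searched if -LIMIT were never satisfied.
sub search_ranges {
    my ($range, $max_range, $compound) = @_;
    my @searched;
    my $search_range = $range;
    my $factor       = 1;    # assumption: starts at 1, grows by 1 per pass
    while ($max_range >= $search_range) {
        # compound mode multiplies the previous window (the observed growth);
        # otherwise the initial -RANGE is multiplied (linear growth)
        $search_range = $compound ? $search_range * $factor
                                  : $range * $factor;
        push @searched, $search_range;
        $factor++;
    }
    return @searched;
}

# prints: 10000, 20000, 60000, 240000, 1200000
print join(', ', search_ranges(10_000, 500_000, 1)), "\n";
</pre>
                  <p>With -RANGE 10000 and -MAX_RANGE 500000 the compound
                    mode prints the sequence reported above. The last
                    window is 1.2 Mb because the range is compared with
                    -MAX_RANGE before it is multiplied, so the final search
                    can exceed the maximum. In this sketch the linear mode
                    still overshoots by one step for the same reason.</p>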
                  <p>I am also confused by the fact that, apparently,
                    the found features only need to be partially within
                    the range. For instance, ENSG00000150394 (CDH8) is
                    found with the above parameters although its 5'
                    end is 1,338,771 bp away from the variant
                    according to the distance reported by the function.
                    So, it seems that the feature is found because its
                    3' end is within the range although the 5'
                    end, which is what I am interested in, is not. This
                    somehow contradicts what the documentation says (<a
                      moz-do-not-send="true"
href="https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491">https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491</a>):
                    "When looking beyond the boundaries of the source
                    Feature, the distance is measured to the nearest end
                    of that Feature to the nearby Feature's nearest
                    end."</p>
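                  <p>The difference between the two ways of measuring can
                    be shown with a small standalone sketch. The helper
                    names, the hash layout and the coordinates are all made
                    up for illustration and are not taken from the Ensembl
                    API or from real annotation. The only assumption about
                    biology is that a gene's 5' end is its start on the
                    forward strand and its end on the reverse strand.</p>
<pre>
use strict;
use warnings;

# Hypothetical helper: distance in bp from a single-base variant to the
# 5' end of a gene (start on strand 1, end on strand -1).
sub five_prime_distance {
    my ($pos, $gene) = @_;
    my $five_prime = $gene->{strand} == 1 ? $gene->{start} : $gene->{end};
    return abs($five_prime - $pos);
}

# Hypothetical helper: distance to the nearest end of the gene,
# or 0 when the variant lies inside the gene.
sub nearest_end_distance {
    my ($pos, $gene) = @_;
    return 0 if $pos >= $gene->{start} and $gene->{end} >= $pos;
    return $gene->{start} > $pos ? $gene->{start} - $pos
                                 : $pos - $gene->{end};
}

# Made-up coordinates: a reverse-strand gene whose 3' end is near the variant
my %gene        = (start => 1_100_000, end => 2_400_000, strand => -1);
my $variant_pos = 1_000_000;

# prints: nearest end: 100000 bp, 5' end: 1400000 bp
printf "nearest end: %d bp, 5' end: %d bp\n",
    nearest_end_distance($variant_pos, \%gene),
    five_prime_distance($variant_pos, \%gene);
</pre>
                  <p>In this made-up case the gene falls inside a 500 kb
                    window when measured to its nearest end but outside it
                    when measured to its 5' end, which is the same kind of
                    discrepancy as the one described for CDH8.</p>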
                  <p>Any help will be much appreciated. I am happy to
                    share code if you think it would be useful.</p>
                  <p>Thanks,<br>
                    Asier<br>
                  </p>
                  <br>
                  <fieldset class="mimeAttachmentHeader"></fieldset>
                  <pre class="moz-quote-pre" wrap="">_______________________________________________
Dev mailing list    <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org" moz-do-not-send="true">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a class="moz-txt-link-freetext" href="https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org" moz-do-not-send="true">https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org</a>
Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/" moz-do-not-send="true">http://www.ensembl.info/</a>
</pre>
                </blockquote>
              </blockquote>
            </blockquote>
          </blockquote>
        </blockquote>
      </blockquote>
    </blockquote>
  </body>
</html>