<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Asier</p>
    <p>Thanks for bringing this up. We will look into what's going on
      and see if there is a bug, if the documentation needs improvement,
      or both.</p>
    <p>Best<br>
      -Brandon<br>
    </p>
    <div class="moz-cite-prefix">On 07/06/2019 13:47, Asier Gonzalez
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:a7228afe-5355-bd36-8a93-7617f98e4eff@ebi.ac.uk">
      <p>Hi all,</p>
      <p>I'm troubleshooting a Perl tool that calls the Ensembl API with
        a variant id and tries to find the gene with the closest 5' end
        within a 500 kb window. The tool was written by a colleague and
        it uses
        Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search()
        like this:</p>
      <pre style="background-color:#ffffff;color:#000000;font-family:'Menlo';font-size:9.0pt;"><span style="color:#000080;font-weight:bold;">my </span>@gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
    -FEATURE    => $var_feature,
    -RANGE      => <span style="color:#0000ff;">10000</span>,    # initial search window (bp)
    -MAX_RANGE  => <span style="color:#0000ff;">500000</span>,   # intended upper bound: 500 kb
    -LIMIT      => <span style="color:#0000ff;">40</span>,       # stop once this many features are found
    -FIVE_PRIME => <span style="color:#0000ff;">1</span>,        # measure distance to the 5' end
) };</pre>
      <p>According to the documentation of this function (<a
          class="moz-txt-link-freetext"
href="http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a"
          moz-do-not-send="true">http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a</a>),
        it "Searches for features within the suggested -RANGE, and if it
        finds none, expands the search area until it satisfies -LIMIT or
        hits -MAX_RANGE". My understanding is that, in my case, it should
        first search a 10 kb window and, only if it finds no genes there,
        progressively expand the window up to 500 kb, stopping early if
        it finds 40 features. However, this is not the behaviour I am
        seeing: the search range grows as 10 kb, 20 kb, 60 kb, 240 kb and
        1.2 Mb. Is this a bug, or have I misunderstood what the function
        does?</p>
      <p>I have looked into the code of this subroutine (<a
          moz-do-not-send="true"
href="https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469">https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469</a>)
        and the search window grows multiplicatively because each pass
        multiplies the previous value, rather than the initial value, by
        the factor:</p>
      <p><span class="pl-smi">[L1452] $search_range</span> = <span
          class="pl-smi">$search_range</span> * <span class="pl-smi">$factor</span>;</p>
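      <p>The arithmetic can be reproduced with a short standalone
        sketch. It assumes that $factor starts at 1 and is incremented
        by one on each pass, which is an inference from the observed
        sequence rather than something confirmed here. Both sub names
        are hypothetical, and nothing below calls the Ensembl API.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: window sizes when each pass multiplies the PREVIOUS
# range by a factor that starts at 1 and grows by one per pass (assumed).
sub ranges_multiplying_previous {
    my ($initial, $passes) = @_;
    my ($range, @seen) = ($initial);
    for my $factor (1 .. $passes) {
        $range = $range * $factor;
        push @seen, $range;
    }
    return @seen;
}

# Hypothetical alternative: multiply the INITIAL range by the factor instead.
sub ranges_multiplying_initial {
    my ($initial, $passes) = @_;
    return map { $initial * $_ } 1 .. $passes;
}

print join(', ', ranges_multiplying_previous(10_000, 5)), "\n";
print join(', ', ranges_multiplying_initial(10_000, 5)), "\n";
```

      <p>Under that assumption the first line printed matches the
        sequence I observe (10 kb, 20 kb, 60 kb, 240 kb, 1.2 Mb), whereas
        multiplying the initial value would grow the window by 10 kb per
        pass.</p>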
      <p>In addition, the range is not expanded only when the initial
        window contains no features: the loop keeps expanding it for as
        long as fewer than -LIMIT features have been found, as the while
        condition shows:</p>
      <p>[L1451] <span class="pl-k">while</span> (<span class="pl-c1">scalar</span>
        <span class="pl-smi">@results</span> &lt; <span class="pl-smi">$limit</span>
        &amp;&amp; <span class="pl-smi">$search_range</span> &lt;= <span
          class="pl-smi">$max_range</span>) {</p>
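      <p>To make the control flow concrete, here is a minimal simulation
        of the loop with a stub in place of the real search. The stub,
        the sub name and the handling of $factor (start at 1, increment
        once per pass) are assumptions for illustration, not the actual
        Ensembl implementation.</p>

```perl
use strict;
use warnings;

# Simulate the loop's control flow. $search is a stand-in code ref that
# returns the hits for a given range; it is NOT the Ensembl API.
# Assumption: $factor starts at 1 and is incremented once per pass.
sub simulate_outward_search {
    my ($range, $max_range, $limit, $search) = @_;
    my $factor = 1;
    my (@results, @ranges_searched);
    while (scalar @results < $limit && $range <= $max_range) {
        $range = $range * $factor;    # enlarged after the condition was checked
        push @ranges_searched, $range;
        @results = $search->($range);
        $factor++;
    }
    return (\@results, \@ranges_searched);
}

# Stub: one hit per full 50 kb searched, so hits first appear at 60 kb.
my $stub = sub { my ($range) = @_; return (1 .. int($range / 50_000)) };

my ($hits, $ranges) = simulate_outward_search(10_000, 500_000, 40, $stub);
print "ranges searched: @$ranges\n";
print "hits returned:   ", scalar @$hits, "\n";
```

      <p>In this simulation the loop keeps going after the first hits
        appear at 60 kb because there are still fewer than 40 of them,
        and the last pass searches 1.2 Mb because the range is only
        compared with the maximum before it is multiplied.</p>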
      <p>I am also confused by the fact that found features apparently
        only need to lie partially within the range. For instance,
        ENSG00000150394 (CDH8) is found with the above parameters
        although, according to the distance reported by the function,
        its 5' end is 1,338,771 bp away from the variant. It seems that
        the feature is found because its 3' end is within the range,
        even though the 5' end, which is what I am interested in, is
        not. This seems to contradict what the
        documentation says (<a moz-do-not-send="true"
href="https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491">https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491</a>):
        "When looking beyond the boundaries of the source Feature, the
        distance is measured to the nearest end of that Feature to the
        nearby Feature's nearest end."</p>
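      <p>As a stopgap, the hits could be filtered by the reported
        distance after the call. The sketch below assumes each hit is a
        two-element array reference of [feature, distance] and uses
        plain strings in place of feature objects. The sub name is
        hypothetical, and the second entry is a made-up placeholder
        rather than real data.</p>

```perl
use strict;
use warnings;

# Keep only hits whose reported distance lies within $max_distance.
# Assumption: each hit is [ $feature, $distance ]; abs() is used in case
# distances are signed (e.g. upstream vs downstream).
sub filter_by_distance {
    my ($hits, $max_distance) = @_;
    return [ grep { abs($_->[1]) <= $max_distance } @$hits ];
}

my $hits = [
    [ 'ENSG00000150394', 1_338_771 ],   # CDH8, distance quoted above
    [ 'PLACEHOLDER_GENE',   42_000 ],   # hypothetical value for illustration
];

my $kept = filter_by_distance($hits, 500_000);
print join(', ', map { $_->[0] } @$kept), "\n";
```

      <p>With a 500 kb cut-off only the placeholder entry survives; CDH8
        is dropped because its reported 5' distance exceeds the window.</p>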
      <p>Any help will be much appreciated. I am happy to share code if
        you think it would be useful.</p>
      <p>Thanks,<br>
        Asier<br>
      </p>
    </blockquote>
  </body>
</html>