<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hello</p>
    <p><br>
    </p>
    <p>Please use the link at the bottom of any dev email to
      unsubscribe.</p>
    <p><br>
    </p>
    <p>All the best</p>
    <p><br>
    </p>
    <p>Emily<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 07/06/2016 15:09, Stamboulian,
      Mouses Hrag wrote:<br>
    </div>
    <blockquote cite="mid:1465308595467.77232@indiana.edu" type="cite">
      <style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; } @font-face { font-family: Calibri; } p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; } a:link, span.MsoHyperlink { color: blue; text-decoration: underline; } a:visited, span.MsoHyperlinkFollowed { color: purple; text-decoration: underline; } span.EmailStyle17 { font-family: Calibri, sans-serif; color: windowtext; } .MsoChpDefault { font-size: 10pt; } @page WordSection1 { margin: 1in; } div.WordSection1 { }--></style>
      <p>unsubscribe<br>
      </p>
      <div style="color: rgb(33, 33, 33);">
        <hr tabindex="-1" style="display:inline-block; width:98%">
        <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
            face="Calibri, sans-serif" color="#000000"><b>From:</b>
            <a class="moz-txt-link-abbreviated" href="mailto:dev-bounces@ensembl.org">dev-bounces@ensembl.org</a> <a class="moz-txt-link-rfc2396E" href="mailto:dev-bounces@ensembl.org">&lt;dev-bounces@ensembl.org&gt;</a> on
            behalf of Taylor, Sean
            <a class="moz-txt-link-rfc2396E" href="mailto:Sean.Taylor@seattlechildrens.org">&lt;Sean.Taylor@seattlechildrens.org&gt;</a><br>
            <b>Sent:</b> Tuesday, June 7, 2016 10:00 AM<br>
            <b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:dev@ensembl.org">dev@ensembl.org</a><br>
            <b>Subject:</b> [ensembl-dev] VEP upper limit on number of
            variations?</font>
          <div> </div>
        </div>
        <div>
          <div class="WordSection1">
            <p class="MsoNormal">Hello,</p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal">I have a library of about 93 million
              variants that I want to run VEP on. I have downloaded VEP
              version 84 onto a local Linux machine running CentOS Linux
              release 7.1.1503. The machine has 125G of memory and 40
              cores @ 2.3GHz.</p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal">My variant library is stored in an
              impala table in hdfs. To generate my input into VEP, I run
              a simple impala query as follows:</p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">impala -B -o ~/vep/knownvariants.tsv
                --output_delimiter="\\t" -q "select contig, pos,
                pos+length, concat(ref,'\/',alt), '+' from
                ingest.central_variant_store where variant_type !=
                'COMPLEX'"</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;</span></p>
            <p class="MsoNormal">This writes the variants to a
              tab-delimited file in Ensembl format:</p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">10      57676885        57676886        C/G     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">7       34456697        34456698        A/G     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">8       62679252        62679253        G/A     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">7       9184853         9184854         C/A     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">3       29205854        29205855        C/T     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">10      42815272        42815273        C/T     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">8       117963405       117963406       C/T     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">12      53054550        53054551        C/T     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">6       105515195       105515196       T/C     +</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">20      665650          665651          G/C     +</span></p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal">The entire library is about 93 million
              variants. I filtered out the complex variants (i.e. CNVs)
              because, in my experience, VEP only threw warnings for them
              and left them unannotated. My plan was to break this output
              into smaller chunks for processing, ideally around 25M
              variants each.</p>
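            <p class="MsoNormal">The chunking step described above could
              be sketched roughly as follows in Python. This is only an
              illustration, not part of the original workflow: the
              function name split_into_chunks, the chunk_NNNN.tsv naming
              scheme and the output directory are all hypothetical.</p>
<pre>
```python
import itertools
from pathlib import Path


def split_into_chunks(src, out_dir, lines_per_chunk):
    """Split a line-oriented file into numbered chunk files and return their paths.

    The chunk file names (chunk_0000.tsv, ...) are an illustrative choice.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(src, "r") as handle:
        for index in itertools.count():
            # islice lazily yields at most lines_per_chunk lines from the file.
            batch = itertools.islice(handle, lines_per_chunk)
            first = next(batch, None)
            if first is None:
                break  # input exhausted, no empty trailing chunk is written
            chunk_path = out_dir / f"chunk_{index:04d}.tsv"
            with open(chunk_path, "w") as out:
                out.write(first)
                out.writelines(batch)  # streams the rest of this chunk
            chunk_paths.append(chunk_path)
    return chunk_paths
```
</pre>
            <p class="MsoNormal">For example, calling
              split_into_chunks("knownvariants.tsv", "chunks", 2500000)
              would write files of at most 2.5M variants each, and each
              file could then be passed to VEP with -i in turn.</p>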
            <p class="MsoNormal"> </p>
            <p class="MsoNormal">I then feed each chunk into VEP using
              the following arguments:</p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">perlbrew switch 5.16.3</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">perlbrew exec perl
                ./variant_effect_predictor.pl --cache --offline
                --everything --json --buffer_size 25000000
                --force_overwrite --verbose -i knownvariants.tsv -o
                knownvariants.json 2&gt;&amp;1 | tee knownvariants.log</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;</span></p>
            <p class="MsoNormal">VEP will crunch on this for a while. It
              will usually clear the following steps:</p>
            <p class="MsoNormal">-load variants into buffer</p>
            <p class="MsoNormal">-check for existing variations</p>
            <p class="MsoNormal">-read transcript data from cache</p>
            <p class="MsoNormal">-read regulatory data from cache</p>
            <p class="MsoNormal">-begin analyzing chromosomes</p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal">Somewhere in the last step, VEP will
              crash with the following error message:</p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">Command [perl ./variant_effect_predictor.pl
                --cache --offline --everything --json --buffer_size
                25000000 --force_overwrite --verbose -i
                knownvariants.tsv -o knownvariants.json] terminated with
                exit code 0 ($? = 9) under the following perl
                environment:</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">Command terminated with non-zero status.</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">Current perl:</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;Name: perl-5.16.3</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;Path: /opt/perl5/perls/perl-5.16.3/bin/perl</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;Config: -de
                -Dprefix=/opt/perl5/perls/perl-5.16.3
                -Aeval:scriptdir=/opt/perl5/perls/perl-5.16.3/bin</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;Compiled at: Apr 15 2016 06:06:22</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">perlbrew:</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;version: 0.75</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;ENV:</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;&nbsp;&nbsp;PERLBREW_ROOT: /opt/perl5</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;&nbsp;&nbsp;PERLBREW_HOME: </span>
            </p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;&nbsp;&nbsp;PERLBREW_PATH:
                /opt/perl5/bin:/opt/perl5/perls/perl-5.16.3/bin</span></p>
            <p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">&nbsp;&nbsp;&nbsp;&nbsp;PERLBREW_MANPATH:
                /opt/perl5/perls/perl-5.16.3/man</span></p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal">This exit code is not very specific, so
              it is hard to know what is going on. Is it running out of
              memory? I lean away from that because I believe I have
              seen specific memory-related error messages when that
              happens. I have tried smaller inputs of 12.5M, 10M, 5M,
              2.5M, 1M, 100K, and 10K variants. So far, runs of up to
              2.5M variants generally complete fine; anything bigger
              fails with this same error message. I also got the error
              once on a 2.5M run, but that time the program crashed
              after analyzing all the chromosomes, while it was actively
              writing the JSON output.
            </p>
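            <p class="MsoNormal">On the "($? = 9)" in the message above:
              Perl's $? holds the raw wait status, in which the low seven
              bits carry a terminating signal number and the next byte
              carries the exit code. A value of 9 therefore decodes as
              exit code 0 with signal 9, which is SIGKILL on Linux. A
              minimal sketch of that decoding in Python follows; the
              helper name describe_wait_status is hypothetical, and
              whether the kill came from the kernel's out-of-memory
              handling is only an assumption that the system logs would
              have to confirm.</p>
<pre>
```python
import os
import signal


def describe_wait_status(status):
    """Return (exit_code, signal_name) for a raw POSIX wait status such as Perl's $?."""
    if os.WIFSIGNALED(status):
        # The process was killed by a signal, so it never returned an exit code.
        return None, signal.Signals(os.WTERMSIG(status)).name
    if os.WIFEXITED(status):
        # Normal termination: the exit code sits in the high byte of the status.
        return os.WEXITSTATUS(status), None
    return None, None


print(describe_wait_status(9))
```
</pre>
            <p class="MsoNormal">On Linux this reports SIGKILL for a
              status of 9, which means the process was terminated from
              outside rather than exiting with an error of its own.</p>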
            <p class="MsoNormal"> </p>
            <p class="MsoNormal">This leads me to ask whether there is a
              known upper limit on how many variants one can practically
              push through at a time, or perhaps a timeout limit.
              Processing 93M variants in 2.5M chunks is a bit tedious.
              Any thoughts on how to improve or optimize this would be
              appreciated. I have attached the log files from several
              runs for reference.
            </p>
            <p class="MsoNormal"> </p>
            <p class="MsoNormal">Thanks,</p>
            <p class="MsoNormal">Sean Taylor</p>
          </div>
        </div>
      </div>
      <pre wrap="">_______________________________________________
Dev mailing list    <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Dr Emily Perry (Pritchard)
Ensembl Outreach Project Leader

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK </pre>
  </body>
</html>