<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Dear Alice,</p>

    <p>I've seen that you have sent this question to the lastz-users

      list and Bob replied with some tips.</p>

    <p>We don't get this error because we have engineered a

      workflow around LastZ that does what Bob suggests, i.e. the

      masking, the splitting with overlap and its post-processing, and

      some chaining.</p>
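    <p>As a rough illustration of the "splitting with overlap" step only,
      here is a minimal Python sketch. It is not the Ensembl pipeline
      code: the function names and the chunk and overlap sizes are made
      up for the example, and masking, deduplication of hits in the
      overlaps and chaining are left out.</p>

```python
def overlapping_chunks(seq_len, chunk_size, overlap):
    """Return half-open (start, end) intervals that cover [0, seq_len).

    Neighbouring chunks share `overlap` bases, so a hit that crosses a
    chunk boundary still lies entirely inside one chunk, provided the
    hit is no longer than `overlap`.
    """
    # Also rejects chunk_size of zero or less, because the range is then empty.
    if overlap not in range(chunk_size):
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, seq_len, step):
        end = min(start + chunk_size, seq_len)
        chunks.append((start, end))
        if end == seq_len:  # the sequence is fully covered
            break
    return chunks


def to_global(chunk_start, local_pos):
    """Shift a chunk-local position back to whole-sequence coordinates."""
    return chunk_start + local_pos


# Hypothetical sizes, for illustration only.
for start, end in overlapping_chunks(seq_len=100, chunk_size=40, overlap=10):
    print(start, end)
```

    <p>Each chunk would then be aligned separately, and the hits shifted
      back with something like <code>to_global</code> before merging
      whatever is found twice in the overlaps.</p>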

    <p>It's not easily usable by external people, though, because it

      heavily relies on an infrastructure that's built in Ensembl. But

      Bob gave some options to run those steps by yourself, which should

      address the issue.<br>

    </p>

    <p>Best,<br>

      Matthieu<br>

    </p>

    <div class="moz-cite-prefix">On 25/03/2020 11:36, Alice Iob wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:15D945A3598FF9419093FE22AE1DE435059D833C@mrna1.crag.local">


      <div style="direction: ltr;font-family: Tahoma;color:

        #000000;font-size: 10pt;">

        <div>Good morning,</div>

        <div><br>

        </div>

        <div>I am a PhD student working on plant genomics and I am

          struggling with an issue regarding LastZ.</div>

        <div>I chose to use LastZ because it was used for plant genome

          alignments in Ensembl Plants, so I hope that someone who

          successfully used it before can help me with this. </div>

        <div> </div>

        <div>I am trying to align two reference genomes from very closely

          related species: I have two FASTA files, </div>

        <div>representing the same chromosome in the two species, each

          around 800Mb long, with at least one long repetitive region. </div>

        <div><br>

        </div>

        <div>The command I am using:</div>

        <div><br>

        </div>

        <div>lastz target.fasta query.fasta --notransition --step=20

          --maxwordcount=70 --exact=20 --chain --gapped

          --ambiguous=iupac --rdotplot=plot --format=differences >

          alignment.differences</div>

        <div><br>

        </div>

        <div>I always get the same error:</div>

        <div><br>

        </div>

        <div>FAILURE: in add_segment()</div>

        <div>table size (4,869,542,152 for 101,448,794 segments) exceeds

          allocation limit of 4,294,967,279;</div>

        <div>consider raising scoring threshold (--hspthresh or --exact)

          or breaking your target sequence into smaller pieces.</div>

        <div><br>

        </div>

        <div>I tried several strategies to overcome this issue:</div>

        <div>increasing values of exact up to 100</div>

        <div>using values of hspthresh up to 10 000</div>

        <div>adding --seed=match12</div>

        <div>dividing my target sequence in two (one multiFASTA file)</div>

        <div>working with just half chromosome (400Mb)</div>

        <div>set the parameters as they were set to align <i>T.

            aestivum</i> and <i>A. tauschii</i> (<a

            href="https://plants.ensembl.org/mlss.html?mlss=9814"

            style="font-size: 10pt;" moz-do-not-send="true">https://plants.ensembl.org/mlss.html?mlss=9814</a><span

            style="font-size: 10pt;">)</span></div>

        <div><br>

        </div>

        <div>Still, I get the same error.</div>

        <div><br>

        </div>

        <div>Only a few times was I able to get an output (e.g. with

          exact=100), but it is always larger than 700 GB, so even

          when the file is generated, I run out of memory and cannot

          work on it. </div>

        <div><br>

        </div>

        <div>I also used LastZ_32 but the process gets killed without

          giving me any info.</div>

        <div><br>

        </div>

        <div>I was wondering if you could help me with this issue (maybe I

          am not using some of the options properly) or give me some

          advice on how to properly deal with this alignment. </div>

        <div><br>

        </div>

        <div>Thank you.</div>

        <div><br>

          <div style="font-family:Tahoma; font-size:13px">

            <div style="font-family:Tahoma; font-size:13px">

              <div><span>Alice Iob<br>

                </span></div>

              <div><span>PhD student <br>

                </span></div>

              <div><span>Plant and Animal Genomics Program<br>

                </span></div>

              CRAG, Centre for Research in Agricultural Genomics<br>

              <div>

                <div><span></span><span>Campus UAB - CRAG Building |

                    08193 Cerdanyola | BARCELONA</span>

                  <span><br>

                  </span></div>

                <div><span>

                    <div>Office: 3.01</div>

                  </span>

                  <div><span>Tel.

                      +34 935636600 ext 3351

                    </span></div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <pre class="moz-quote-pre" wrap="">_______________________________________________

Dev mailing list    <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>

Posting guidelines and subscribe/unsubscribe info: <a class="moz-txt-link-freetext" href="https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org">https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org</a>

Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>

</pre>

    </blockquote>

    <pre class="moz-signature" cols="72">-- 

Matthieu Muffato, Ph.D.

Ensembl Compara Principal Developer

European Bioinformatics Institute (EMBL-EBI)

European Molecular Biology Laboratory

Wellcome Trust Genome Campus, Hinxton

Cambridge, CB10 1SD, United Kingdom

Room  A3-123

Phone + 44 (0) 1223 49 4631

Fax   + 44 (0) 1223 49 4468</pre>

  </body>

</html>