<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Dear Alice,</p>
    <p>This post-processing is required to handle alignment blocks that
      are present twice in the output file because the input regions
      were split with some overlap. It simply consists of getting the
      coordinates of every block and removing the extra copies of blocks
      that have the same coordinates.</p>
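    <p>As a rough illustration only (this is not the Ensembl code), the
      de-duplication could be sketched in Python as below. It assumes
      that every block has already been parsed into a record and that
      its coordinates have been converted back to whole-chromosome
      coordinates. The <code>Block</code> fields and the function name
      are hypothetical.</p>
<pre>
```python
from typing import NamedTuple


class Block(NamedTuple):
    """One alignment block; the field names are illustrative assumptions."""
    target_name: str
    target_start: int
    target_end: int
    query_name: str
    query_start: int
    query_end: int
    strand: str


def remove_duplicate_blocks(blocks):
    """Keep the first copy of every block, preserving the input order.

    Two blocks count as duplicates when all their coordinates are equal,
    which is what happens when the same block is found in two
    overlapping chunks.
    """
    seen = set()
    unique = []
    for block in blocks:
        if block in seen:
            continue  # extra copy coming from an overlapping chunk
        seen.add(block)
        unique.append(block)
    return unique
```
</pre>
    <p>Blocks that only partially overlap are left untouched by this
      sketch; it only removes exact copies.</p>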
    <p>We have computed an alignment of <i>Triticum turgidum subsp. durum</i> vs <i>Triticum dicoccoides</i> and it will be released in the next version of Ensembl (due in a couple of weeks, I believe). I have attached a screenshot of the alignment statistics.</p>
    <p>Best regards,<br>
      Matthieu<br>
    </p>
    <div class="moz-cite-prefix">On 27/03/2020 10:29, Alice Iob wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:15D945A3598FF9419093FE22AE1DE435059DF147@mrna1.crag.local">
      <div style="direction: ltr;font-family: Tahoma;color:
        #000000;font-size: 10pt;">Dear Matthieu, 
        <div><br>
        </div>
        <div>Thank you very much for your swift reply. </div>
        <div><br>
        </div>
        <div>By post-processing, do you mean generating a consensus
          alignment from SAM files (if I split the query, align it and
          then set the output format to SAM)? Or something else? </div>
        <div><br>
        </div>
        <div>Also, I was wondering if you are planning to release a
          whole genome alignment of
          <i>Triticum durum</i> and <i>T. dicoccoides</i> in the
          future. </div>
        <div><br>
        </div>
        <div>Thank you for your help.<br>
          <div><br>
            <div style="font-family:Tahoma; font-size:13px">
              <div style="font-family:Tahoma; font-size:13px">
                <div><span>Alice Iob<br>
                  </span></div>
                <div><span>PhD student <br>
                  </span></div>
                <div><span>Plant and Animal Genomics Program<br>
                  </span></div>
                CRAG, Centre for Research in Agricultural Genomics<br>
                <div>
                  <div><span></span><span>Campus UAB - CRAG Building |
                      08193 Cerdanyola | BARCELONA</span>
                    <span><br>
                    </span></div>
                  <div><span>
                      <div>Office: 3.01</div>
                    </span>
                    <div><span></span><span></span><span></span><span></span><span>Tel.
                        +34 935636600 ext 3351
                      </span></div>
                  </div>
                </div>
              </div>
            </div>
          </div>
          <div style="font-family: Times New Roman; color: #000000;
            font-size: 16px">
            <hr tabindex="-1">
            <div id="divRpF733480" style="direction: ltr;"><font
                size="2" face="Tahoma" color="#000000"><b>From:</b>
                Matthieu Muffato [<a class="moz-txt-link-abbreviated" href="mailto:muffato@ebi.ac.uk">muffato@ebi.ac.uk</a>]<br>
                <b>Sent:</b> Thursday 26 March 2020 10:10<br>
                <b>To:</b> Ensembl developers list; Alice Iob<br>
                <b>Subject:</b> Re: [ensembl-dev] whole chromosome
                alignment with LastZ<br>
              </font><br>
            </div>
            <div>
              <p>Dear Alice,</p>
              <p>I've seen that you have sent this question to the
                lastz-users list and Bob replied with some tips.</p>
              <p>We don't get this error because we have
                engineered a workflow around LastZ that does what Bob
                suggests, i.e. the masking, the splitting with overlap
                and its post-processing, and some chaining.</p>
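              <p>To give an idea of the "splitting with overlap" step,
                here is a minimal Python sketch that computes chunk
                coordinates for one sequence. The function name, the
                0-based half-open coordinates and the sizes used in
                the example are assumptions made for illustration,
                not the values used in the Ensembl workflow.</p>
<pre>
```python
def overlapping_chunks(seq_length, chunk_size, overlap):
    """Return (start, end) pairs, 0-based and half-open, covering a sequence.

    Consecutive chunks share `overlap` bases, so an alignment block that
    sits in the shared part can be reported once per chunk and has to be
    de-duplicated afterwards.
    """
    step = chunk_size - overlap
    # The overlap must be non-negative and smaller than the chunk size
    if step not in range(1, chunk_size + 1):
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    chunks = []
    for start in range(0, seq_length, step):
        end = min(start + chunk_size, seq_length)
        chunks.append((start, end))
        if end == seq_length:
            break  # the end of the sequence has been reached
    return chunks


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to keep the example readable
    for start, end in overlapping_chunks(100, 40, 10):
        print(start, end)
```
</pre>
              <p>Each chunk would then be aligned separately, and the
                resulting coordinates shifted by the chunk start before
                removing the duplicated blocks.</p>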
              <p>It's not easily usable by external people, though,
                because it heavily relies on an infrastructure that's
                built in Ensembl. But Bob gave some options to run those
                steps by yourself, which should address the issue.<br>
              </p>
              <p>Best,<br>
                Matthieu<br>
              </p>
              <div class="moz-cite-prefix">On 25/03/2020 11:36, Alice
                Iob wrote:<br>
              </div>
              <blockquote type="cite">
                <div style="direction:ltr; font-family:Tahoma;
                  color:#000000; font-size:10pt">
                  <div>Good morning,</div>
                  <div><br>
                  </div>
                  <div>I am a PhD student working on plant genomics and I
                    am struggling with an issue regarding LastZ.</div>
                  <div>I chose to use LastZ because it was used for
                    plant genome alignments in Ensembl Plants, so I hope
                    that someone who has successfully used it before can help
                    me with this. </div>
                  <div> </div>
                  <div>I am trying to align two reference genomes from
                    very closely related species: I have two FASTA files, </div>
                  <div>representing the same chromosome in the two
                    species, each around 800Mb long, with at least one
                    long repetitive region. </div>
                  <div><br>
                  </div>
                  <div>The command I am using:  </div>
                  <div><br>
                  </div>
                  <div>lastz target.fasta query.fasta --notransition
                    --step=20 --maxwordcount=70 --exact=20 --chain
                    --gapped --ambiguous=iupac --rdotplot=plot
                    --format=differences > alignment.differences</div>
                  <div><br>
                  </div>
                  <div>I always get the same error:</div>
                  <div><br>
                  </div>
                  <div>FAILURE: in add_segment()</div>
                  <div>table size (4,869,542,152 for 101,448,794
                    segments) exceeds allocation limit of 4,294,967,279;</div>
                  <div>consider raising scoring threshold (--hspthresh
                    or --exact) or breaking your target sequence into
                    smaller pieces.</div>
                  <div><br>
                  </div>
                  <div>I tried several strategies to overcome this
                    issue:</div>
                  <div>increasing values of exact up to 100</div>
                  <div>using values of hspthresh up to 10 000</div>
                  <div>adding --seed=match12</div>
                  <div>dividing my target sequence in two (one
                    multiFASTA file)</div>
                  <div>working with just half of the chromosome (400Mb)</div>
                  <div>setting the parameters as they were set to align <i>T.
                      aestivum</i> and <i>A. tauschii</i> (<a
                      href="https://plants.ensembl.org/mlss.html?mlss=9814"
                      style="font-size:10pt" target="_blank"
                      moz-do-not-send="true">https://plants.ensembl.org/mlss.html?mlss=9814</a><span
                      style="font-size:10pt">)</span></div>
                  <div><br>
                  </div>
                  <div>Still, I get the same error.</div>
                  <div><br>
                  </div>
                  <div>Only a few times was I able to get an output
                    (e.g. with --exact=100), but it is always larger than
                    700Gb, so even when the file is generated I
                    run out of memory and cannot work with it. </div>
                  <div><br>
                  </div>
                  <div>I also used LastZ_32 but the process gets killed
                    without giving me any info.</div>
                  <div><br>
                  </div>
                  <div>I was wondering if you could help me with this
                    issue (maybe I am not using some of the options
                    properly) or give me some advice on how to properly
                    deal with this alignment. </div>
                  <div><br>
                  </div>
                  <div>Thank you.</div>
                  <div><br>
                  </div>
                </div>
                <br>
              </blockquote>
              <pre class="moz-signature" cols="72">-- 
Matthieu Muffato, Ph.D.
Ensembl Compara Principal Developer
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-123
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468</pre>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>