<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi Paul,<br>
    <br>
    Thank you for reporting this.<br>
    We run genometools on all the generated GFF3 files, but unfortunately
    it did not spot this error.<br>
    <br>
    The differences I can see between the GTF and GFF3 files are the
    following:<br>
    - the transcript is listed with source ensembl_havana in the GFF3
    file, but with source havana in the GTF file.<br>
    All child features have source havana.<br>
    This looks like a bug in our pipeline and we will aim to fix it
    for the next release.<br>
    <br>
    - the transcript has type 'transcript' in the GTF file and type
    'NMD_transcript_variant' in the GFF3 file.<br>
    The GFF3 specifications allow us to use more fine-grained SO terms,
    as is the case here.<br>
    This should hopefully not break any downstream analyses, but some
    software may not handle the more specific SO terms.<br>
    NMD_transcript_variant is a valid SO term:
    <a class="moz-txt-link-freetext" href="http://www.sequenceontology.org/browser/current_svn/term/SO:0001621">http://www.sequenceontology.org/browser/current_svn/term/SO:0001621</a><br>
    <br>
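    As a rough illustration, the two differences above can be checked
    with a few lines of Python. This is only a sketch: the two example
    lines are made up (placeholder coordinates and attributes, not
    copied from the release files), and parse_columns and
    column_differences are hypothetical helpers, not part of any
    Ensembl tool.<br>
<pre>
```python
# Sketch: compare the source and type columns of one transcript record
# in GFF3 and GTF. The example lines are hypothetical (placeholder
# coordinates and attributes), not real Ensembl data.

COLUMN_NAMES = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_columns(line):
    """Split a GFF3 or GTF line into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMN_NAMES):
        raise ValueError("expected 9 tab-separated columns")
    return dict(zip(COLUMN_NAMES, fields))


def column_differences(gff3_line, gtf_line, columns=("source", "type")):
    """Return {column: (gff3_value, gtf_value)} for the columns that differ."""
    gff3 = parse_columns(gff3_line)
    gtf = parse_columns(gtf_line)
    return {c: (gff3[c], gtf[c]) for c in columns if gff3[c] != gtf[c]}


# Hypothetical records for the transcript discussed in this thread.
GFF3_LINE = ("1\tensembl_havana\tNMD_transcript_variant\t100\t900\t.\t+\t.\t"
             "ID=transcript:ENSMUST00000045689")
GTF_LINE = ("1\thavana\ttranscript\t100\t900\t.\t+\t.\t"
            'transcript_id "ENSMUST00000045689";')

differences = column_differences(GFF3_LINE, GTF_LINE)
for column, (gff3_value, gtf_value) in differences.items():
    print(f"{column}: GFF3={gff3_value} GTF={gtf_value}")
```
</pre>
    For these example lines both the source and the type columns are
    reported as differing. On the real files the records would first
    need to be matched on the transcript ID.<br>
    <br>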
    As mentioned, we will aim to fix the source discrepancy for the next
    release; hopefully this will solve the problem for you.<br>
    <br>
    From the link you mentioned, it also seems that the cufflinks
    developers have patched in a workaround for these data issues.<br>
    <br>
    <br>
    Regards,<br>
    Magali<br>
    <br>
    <div class="moz-cite-prefix">On 26/10/2016 14:31, Paul Klemm wrote:<br>
    </div>
    <blockquote
cite="mid:CAM2E-uErXA-MQF5hMo+0wACi_vL2Yc39jh0uLdJu8jgTTAJnyg@mail.gmail.com"
      type="cite">
      <style>body{font-family:Helvetica,Arial;font-size:13px}</style>
      <div id="bloop_customfont"
style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">Hi dev@ensembl members,</p>
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">I am investigating a problem
          with cuffmerge in combination with ensembl GFF3 files and
          am seeking help in understanding the difference between the
          GFF3 and GTF files in ensembl.</p>
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">I align RNA-Seq reads with
          HISAT2 to the reference genome and then derive the
          transcriptome by running cufflinks with the Mus.Musculus e.86
          release <em>GFF3</em> file. When I run cuffmerge on these
          files it fails with an error <code
            style="font-family:Menlo,Consolas,'Liberation
Mono',Courier,monospace;font-size:10pt;border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;background-color:rgb(248,248,248);color:inherit;border:1px
            solid rgb(234,234,234);margin:0px 2px;padding:0px
            5px;word-break:normal;word-wrap:normal">GFF Error:
            duplicate/invalid 'transcript' feature
            ID=transcript:ENSMUST00000045689</code>.</p>
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">When I do the very same
          analysis with the Mus.Musculus e.86 release <em>GTF</em> file,
          everything runs fine. I investigated the <code
            style="font-family:Menlo,Consolas,'Liberation
Mono',Courier,monospace;font-size:10pt;border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;background-color:rgb(248,248,248);color:inherit;border:1px
            solid rgb(234,234,234);margin:0px 2px;padding:0px
            5px;word-break:normal;word-wrap:normal">ENSMUST00000045689</code> transcript
          and indeed found differences between the GTF and GFF3 files!
          This is potentially causing the problem in cufflinks.</p>
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">The description of the
          difference and a fully functional minimal example can be found
          in this repository: <a moz-do-not-send="true"
            href="https://github.com/paulklemm/cuffmerge_bug"
style="color:rgb(65,131,196);background-color:inherit;text-decoration:none">https://github.com/paulklemm/cuffmerge_bug</a>.</p>
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">My question is: Why is there a
          difference in the annotation between the GFF3 and GTF files? I
          thought that it is the same information just stored in
          different formats. That seems not to be the case.</p>
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">I contributed the code to a
          recent bug report in the cufflinks repository, which describes
          this problem: <a moz-do-not-send="true"
            href="https://github.com/cole-trapnell-lab/cufflinks/issues/77"
style="color:rgb(65,131,196);background-color:inherit;text-decoration:none">https://github.com/cole-trapnell-lab/cufflinks/issues/77</a>.</p>
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">Thanks for the help.</p>
        <p style="margin:15px 0px;font-family:'helvetica
          Neue',helvetica;font-size:14px">Paul</p>
      </div>
    </blockquote>
    <br>
  </body>
</html>