<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Paul,<br>
<br>
Thank you for reporting this.<br>
We run genometools on all the generated GFF3 files, but unfortunately
it did not spot this error.<br>
<br>
The differences I can see between the GTF and GFF3 files are the
following:<br>
- the transcript is listed with source ensembl_havana in the GFF3
file, but with source havana in the GTF file.<br>
All children features have source havana.<br>
This looks like a bug in our pipeline, and we will aim to fix it
for the next release.<br>
<br>
- the transcript has type 'transcript' in the GTF file and type
'NMD_transcript_variant' in the GFF3 file.<br>
The GFF3 specification allows us to use more fine-grained SO terms,
as is the case here.<br>
This should hopefully not break any downstream analyses, but it
could be that some software does not resolve the full Sequence
Ontology and only recognises the generic terms.<br>
NMD_transcript_variant is a valid SO term:
<a class="moz-txt-link-freetext" href="http://www.sequenceontology.org/browser/current_svn/term/SO:0001621">http://www.sequenceontology.org/browser/current_svn/term/SO:0001621</a><br>
<br>
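For tools that only accept the generic type, one possible workaround
is to collapse the fine-grained terms before running them. Below is a
minimal Python sketch, not part of our pipeline. It assumes that
transcript features can be recognised by an ID attribute starting with
"transcript:" (as in the ID from your error message). The function
name generic_type and the example record (coordinates, strand and
parent ID) are made up for illustration.<br>
<pre wrap="">
```python
def generic_type(gff3_line):
    """Return the line with any transcript-level type replaced by 'transcript'."""
    line = gff3_line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) != 9:
        # Comment, directive or malformed line: leave untouched.
        return line
    # Column 9 holds key=value pairs separated by semicolons.
    attrs = dict(
        field.split("=", 1) for field in cols[8].split(";") if "=" in field
    )
    # Assumption: transcript features carry an ID of the form "transcript:...".
    if attrs.get("ID", "").startswith("transcript:"):
        cols[2] = "transcript"
    return "\t".join(cols)


# Illustrative record only: coordinates, strand and parent ID are invented.
example = "\t".join([
    "1", "ensembl_havana", "NMD_transcript_variant", "100", "900",
    ".", "+", ".",
    "ID=transcript:ENSMUST00000045689;Parent=gene:EXAMPLE",
])

converted = generic_type(example)
print(converted.split("\t")[2])
```
</pre>
Only column 3 is rewritten. All other columns and the attributes are
passed through unchanged.<br>
<br>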
As mentioned, we will aim to fix the source discrepancy for the next
release. Hopefully this will solve the problem for you.<br>
<br>
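In the meantime, a check along the following lines could list the
affected transcripts in a GFF3 file. This is only a rough Python
sketch under stated assumptions: the function names parse_attrs and
source_mismatches and the prefix parameter are hypothetical, the
example records use invented coordinates and an invented parent ID,
and only features whose ID starts with "transcript:" are reported,
since a gene and its transcripts may legitimately carry different
sources.<br>
<pre wrap="">
```python
from collections import defaultdict


def parse_attrs(column9):
    """Split a GFF3 attribute column into a dict of key/value pairs."""
    return dict(f.split("=", 1) for f in column9.split(";") if "=" in f)


def source_mismatches(lines, prefix="transcript:"):
    """Map each parent ID to the child sources that differ from its own source."""
    source_of = {}               # feature ID to its source (column 2)
    children = defaultdict(set)  # parent ID to the set of child sources
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        if "ID" in attrs:
            source_of[attrs["ID"]] = cols[1]
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].add(cols[1])

    report = {}
    for parent_id, sources in children.items():
        if not parent_id.startswith(prefix) or parent_id not in source_of:
            continue
        differing = sources - {source_of[parent_id]}
        if differing:
            report[parent_id] = sorted(differing)
    return report


# Illustrative records only: coordinates and the gene ID are invented.
example = [
    "##gff-version 3",
    "\t".join(["1", "ensembl_havana", "NMD_transcript_variant", "100", "900",
               ".", "+", ".",
               "ID=transcript:ENSMUST00000045689;Parent=gene:EXAMPLE"]),
    "\t".join(["1", "havana", "exon", "100", "300",
               ".", "+", ".",
               "Parent=transcript:ENSMUST00000045689"]),
]

print(source_mismatches(example))
```
</pre>
<br>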
From the link you mentioned, it also seems that the cufflinks
developers have patched in a workaround for these data issues.<br>
<br>
<br>
Regards,<br>
Magali<br>
<br>
<div class="moz-cite-prefix">On 26/10/2016 14:31, Paul Klemm wrote:<br>
</div>
<blockquote
cite="mid:CAM2E-uErXA-MQF5hMo+0wACi_vL2Yc39jh0uLdJu8jgTTAJnyg@mail.gmail.com"
type="cite">
<style>body{font-family:Helvetica,Arial;font-size:13px}</style>
<div id="bloop_customfont"
style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">Hi dev@ensembl members,</p>
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">I investigate a problem
regarding cuffmerge in combination with ensembl GFF3 files and
seek help in understanding the difference between the GFF3 and
GTF files in ensembl.</p>
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">I align RNA-Seq reads with
HISAT2 to the reference genome and then derive the
transcriptome by running cufflinks with the Mus.Musculus e.86
release <em>GFF3</em> file. When I run cuffmerge on these
files it fails with an error <code
style="font-family:Menlo,Consolas,'Liberation
Mono',Courier,monospace;font-size:10pt;border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;background-color:rgb(248,248,248);color:inherit;border:1px
solid rgb(234,234,234);margin:0px 2px;padding:0px
5px;word-break:normal;word-wrap:normal">GFF Error:
duplicate/invalid 'transcript' feature
ID=transcript:ENSMUST00000045689</code>.</p>
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">When I do the very same
analysis with the Mus.Musculus e.86 release <em>GTF</em> file,
          everything runs fine. I investigated the <code
style="font-family:Menlo,Consolas,'Liberation
Mono',Courier,monospace;font-size:10pt;border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;background-color:rgb(248,248,248);color:inherit;border:1px
solid rgb(234,234,234);margin:0px 2px;padding:0px
5px;word-break:normal;word-wrap:normal">ENSMUST00000045689</code> transcript
and indeed found differences between the GTF and GFF3 file!
This is potentially causing the problem in cufflinks.</p>
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">The description of the
difference and a fully functional minimal example can be found
in this repository: <a moz-do-not-send="true"
href="https://github.com/paulklemm/cuffmerge_bug"
style="color:rgb(65,131,196);background-color:inherit;text-decoration:none">https://github.com/paulklemm/cuffmerge_bug</a>.</p>
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">My question is: Why is there a
difference in the annotation between the GFF3 and GTF files? I
thought that it is the same information just stored in
different formats. That seems not to be the case.</p>
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">I contributed the code to a
recent bug report in the cufflinks repository, which describes
this problem: <a moz-do-not-send="true"
href="https://github.com/cole-trapnell-lab/cufflinks/issues/77"
style="color:rgb(65,131,196);background-color:inherit;text-decoration:none">https://github.com/cole-trapnell-lab/cufflinks/issues/77</a>.</p>
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">Thanks for the help.</p>
<p style="margin:15px 0px;font-family:'helvetica
Neue',helvetica;font-size:14px">Paul</p>
</div>
<br>
<br>
<br>
<pre wrap="">_______________________________________________
Dev mailing list <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>
</pre>
</blockquote>
<br>
</body>
</html>