<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-forward-container">
<div class="moz-cite-prefix">Hello Hans,<br>
<br>
Sorry about that; it's something we missed. This will be
corrected in the coming release of Ensembl Genomes, which is
due out around mid-March, so feel free to correct it on your side
in the meantime.<br>
Note that the next release of bread wheat will include an
updated gene set.<br>
<br>
Best regards,<br>
Arnaud<br>
<br>
On 22/02/2014 01:34, Hans Vasquez-Gross wrote:<br>
</div>
<blockquote
cite="mid:CAPvh4w3dCKVuj_2vFFUgF4Ga7SMocb92vqFebyoMYOCT8hDp3A@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>Hello,</div>
<div><br>
</div>
<div>I recently downloaded the MIPS GFF3 annotation for
Triticum_aestivum provided on your FTP site.</div>
<div><br>
</div>
<div><a moz-do-not-send="true"
href="ftp://ftp.ensemblgenomes.org/pub/plants/release-21/gff3/triticum_aestivum/Triticum_aestivum.IWGSP1.21.gff3.gz">ftp://ftp.ensemblgenomes.org/pub/plants/release-21/gff3/triticum_aestivum/Triticum_aestivum.IWGSP1.21.gff3.gz</a><br>
</div>
<div><br>
</div>
<div dir="ltr">
<div>I tried loading this file into a genome browser for
visualization, but it does not validate. The problem seems to
be in how the ID= attribute in the 9th column is
set up. According to the Sequence Ontology (SO) GFF3 specification (<a moz-do-not-send="true"
href="http://www.sequenceontology.org/gff3.shtml">http://www.sequenceontology.org/gff3.shtml</a>),
the ID= in the 9th column MUST be unique. Currently, however,
every transcript/CDS/exon group has the same ID
collision, which I'll explain below using the first
occurrence as an example.</div>
<div><br>
</div>
<div>If you take a look at lines 133-138 in the gff3 file,
you should see this:</div>
<div>
<div>##sequence-region IWGSC_CSS_3AS_scaff_369935 1 200</div>
<div>IWGSC_CSS_3AS_scaff_369935 ensembl
protein_coding_gene 1 200 . -
.
ID=Traes_3AS_775C097A2;biotype=protein_coding;logic_name=mips_taestivum</div>
<div>IWGSC_CSS_3AS_scaff_369935 ensembl transcript
1 200 . - .
ID=Traes_3AS_775C097A2.1;Parent=Traes_3AS_775C097A2;biotype=protein_coding;logic_name=mips_taestivum</div>
<div>IWGSC_CSS_3AS_scaff_369935 . CDS 1
198 . - 0
ID=Traes_3AS_775C097A2.1;Parent=Traes_3AS_775C097A2.1;rank=1</div>
<div>IWGSC_CSS_3AS_scaff_369935 . exon 1
200 . - .
ID=Traes_3AS_775C097A2.E1;Parent=Traes_3AS_775C097A2.1;constitutive=1;ensembl_phase=-1;rank=1</div>
<div>IWGSC_CSS_3AS_scaff_369935 .
five_prime_UTR 199 200 . - .
Parent=Traes_3AS_775C097A2.1;</div>
</div>
<div><br>
</div>
<div>The transcript and CDS lines define exactly the same
ID, "Traes_3AS_775C097A2.1", which causes the
naming collision. You will also notice that on the CDS
line the ID= and Parent= values are identical.
The Parent is meant to refer to the
transcript ID, but the CDS itself has that same ID.</div>
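<div><br>
</div>
<div>A minimal sketch of how such collisions could be listed,
assuming the file is tab-separated with nine columns and the
feature type in column 3. The function names parse_attributes and
find_id_collisions are hypothetical helpers, not part of any
Ensembl tool. It only reports IDs shared by different feature
types, because GFF3 allows one feature (for example a CDS split
over several lines) to repeat its own ID.</div>
<pre>
```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attributes = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def find_id_collisions(lines):
    """Return {ID: [feature types]} for every ID used by 2+ feature types."""
    types_by_id = defaultdict(list)
    for line in lines:
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # assumption: anything else is not a feature line
        feature_id = parse_attributes(columns[8]).get("ID")
        if feature_id is None:
            continue  # features without an ID (e.g. the UTR line) cannot collide
        if columns[2] not in types_by_id[feature_id]:
            types_by_id[feature_id].append(columns[2])
    return {fid: types for fid, types in types_by_id.items() if len(types) != 1}
```
</pre>
<div>The function takes any iterable of lines, so the downloaded
file could be passed in as gzip.open(path, "rt") from the Python
standard library without unpacking it first.</div>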
<div><br>
</div>
<div>Proposed solution:</div>
<div>Each CDS ID could have a "C" inserted after the period.
For example, Traes_3AS_775C097A2.1 would become
Traes_3AS_775C097A2.C1. This is similar to what you already
do for the exons. The Parent= attribute on the exon line would
then have to be updated to this new ID. The
new GFF3 block for this transcript would then
be:</div>
<div><br>
<div>##sequence-region IWGSC_CSS_3AS_scaff_369935 1 200</div>
<div>IWGSC_CSS_3AS_scaff_369935 ensembl
protein_coding_gene 1 200 . -
.
ID=Traes_3AS_775C097A2;biotype=protein_coding;logic_name=mips_taestivum</div>
<div>IWGSC_CSS_3AS_scaff_369935 ensembl transcript
1 200 . - .
ID=Traes_3AS_775C097A2.1;Parent=Traes_3AS_775C097A2;biotype=protein_coding;logic_name=mips_taestivum</div>
<div>IWGSC_CSS_3AS_scaff_369935 . CDS 1
198 . - 0
ID=Traes_3AS_775C097A2.C1;Parent=Traes_3AS_775C097A2.1;rank=1</div>
<div>IWGSC_CSS_3AS_scaff_369935 . exon 1
200 . - .
ID=Traes_3AS_775C097A2.E1;Parent=Traes_3AS_775C097A2.C1;constitutive=1;ensembl_phase=-1;rank=1</div>
<div>IWGSC_CSS_3AS_scaff_369935 .
five_prime_UTR 199 200 . - .
Parent=Traes_3AS_775C097A2.1;</div>
</div>
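<div><br>
</div>
<div>The renaming above could be scripted roughly as follows. This
is a sketch under assumptions: tab-separated columns, IDs of the
form "gene.N", and hypothetical helper names (cds_id_for,
rewrite_field, fix_cds_ids). The reparent_exons flag defaults to
the block shown above, where the exon Parent= points at the new
CDS ID; passing False leaves exons attached to the transcript,
which is how the examples in the GFF3 specification arrange
them.</div>
<pre>
```python
def cds_id_for(transcript_id):
    """Turn 'Traes_3AS_775C097A2.1' into 'Traes_3AS_775C097A2.C1'."""
    prefix, separator, suffix = transcript_id.rpartition(".")
    if not separator:
        return transcript_id  # assumption: IDs without a period are left alone
    return prefix + ".C" + suffix


def rewrite_field(column9, key, transform):
    """Apply transform to each comma-separated value of one attribute key."""
    fields = column9.split(";")
    for index, field in enumerate(fields):
        if field.startswith(key + "="):
            values = field[len(key) + 1:].split(",")
            fields[index] = key + "=" + ",".join(transform(v) for v in values)
    return ";".join(fields)


def fix_cds_ids(lines, reparent_exons=True):
    """Yield GFF3 lines with CDS IDs renamed; other lines pass through."""
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(columns) != 9:
            yield line
            continue
        if columns[2] == "CDS":
            # The CDS keeps its Parent (the transcript) and gets a new ID.
            columns[8] = rewrite_field(columns[8], "ID", cds_id_for)
        elif columns[2] == "exon" and reparent_exons:
            # Follow the proposal: exons point at the renamed CDS.
            columns[8] = rewrite_field(columns[8], "Parent", cds_id_for)
        yield "\t".join(columns) + "\n"
```
</pre>
<div>Because fix_cds_ids is a generator, its output can be written
straight to a new file with writelines() while the original
download stays untouched.</div>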
<div><br>
</div>
<div>Would this be a quick fix on your side, so that the data
can be regenerated as valid GFF3? If not, I'll write my own script next
week to fix the errors in the GFF3 file.</div>
<div><br>
</div>
<div>Cheers,</div>
<div>-Hans</div>
<div><br>
</div>
</div>
</div>
<br>
<br>
<pre wrap="">_______________________________________________
Dev mailing list <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.ensembl.info/">http://www.ensembl.info/</a>
</pre>
</blockquote>
<br>
<br>
</div>
<br>
</body>
</html>