<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Jacques,<div class="">The GFF files reflect the data in our databases. The human gene set is a merge of the Ensembl set and the Havana set. Because the Havana set is manually curated, it takes precedence over the Ensembl set, and we do not modify it. The data in the GRCh37 databases will not be updated, as GRCh38 is an updated and improved version of GRCh37.</div><div class=""><br class=""></div><div class="">In your third example, the gene is annotated by Havana: the gene biotype is pseudogene and the transcript biotype is unprocessed_pseudogene. If you look at our latest release, which uses GRCh38, you will see that the discrepancy has been corrected and the gene has the biotype unprocessed_pseudogene.</div><div class=""><br class=""></div><div class="">We use SO (Sequence Ontology) terms to assign the 3rd column. In your fifth example, we use the misc_RNA biotype, which is not an SO term. The correct SO term in this case would be ncRNA. This is something that our new exporters will be able to fix.</div><div class=""><br class=""></div><div class="">Thanks</div><div class="">Thibaut</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 31 Oct 2016, at 10:33, Anne Lyle <<a href="mailto:annelyle@ebi.ac.uk" class="">annelyle@ebi.ac.uk</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Jacques<div class=""><br class=""></div><div class="">Thanks for your input. 
We’re currently reworking our own parsers and exporters for all the common formats, including GFF3, so we’ll take your comments into consideration.</div><div class=""><br class=""></div><div class="">Cheers</div><div class=""><br class=""></div><div class="">Anne</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On 31 Oct 2016, at 10:01, Jacques Dainat <<a href="mailto:jacques.dainat@bils.se" class="">jacques.dainat@bils.se</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hi,</div><div class=""><br class=""></div><div class="">Dear all,</div><div class=""><br class=""></div><div class="">I'm working in an annotation service and we use the gff3 format as a central format for everything we are doing. 
So we have scripts that check gff3 files and correct them if needed.</div><div class="">When I was working with the file Homo_sapiens.GRCh37.82.gff3 that you provide, some peculiarities that I had never seen before popped up.</div><div class=""><br class=""></div><div class="">First of all, I would like to explain briefly how we parse our data:</div><div class=""><br class=""></div><div class="">We usually parse our data paying close attention to the “<b class="">type</b>” (3rd column) and sorting the features into a three-level structure:</div><div class="">Level1 => Features that do not have <b class="">Parent:</b> gene, pseudogene, lincrna_gene, mirna_gene, etc.</div><div class="">Level2 => Features that have <b class="">Parent</b> and <b class="">Children:</b> mrna, trna, snorna, transcript, processed_pseudogene, etc.</div><div class="">Level3 => Features that have <b class="">Parent</b> but <b class="">no Children:</b> cds, exon, utr, tts, stop_codon, sig_peptide, etc.</div><div class=""><br class=""></div><div class="">This works quite well with gff3 files from many different sources, but it does not work properly with your data.</div><div class="">Indeed, there is some inconsistency within the 3rd column, and we cannot use the biotype attribute of the 9th column either, since it seems to vary within the same feature.</div><div class="">I would like to know whether you could make an effort to homogenise these things in future data releases.</div><div class=""><br class=""></div><div class="">To explain and support this more clearly, here are some examples:</div><div class="">========================================================</div><div class=""><br class=""></div><div class="">************************************************</div><div class=""><b class="">    <u class="">FIRST</u></b>, examples where things are fine:</div><div class="">************************************************</div><div class=""><div class=""><div 
class="">###</div></div></div><div class=""><div class=""><div class="">1<span class="Apple-tab-span" style="white-space: pre;">     </span>ensembl<span class="Apple-tab-span" style="white-space: pre;">   </span>snRNA_gene<span class="Apple-tab-span" style="white-space: pre;">        </span>13384735<span class="Apple-tab-span" style="white-space: pre;">  </span>13384841<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=gene:ENSG00000207511;Name=RNU6-771P;biotype=snRNA;description=RNA%2C U6 small nuclear 771%2C pseudogene [Source:HGNC Symbol%3BAcc:47734];gene_id=ENSG00000207511;logic_name=ncrna;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;"> </span>ensembl<span class="Apple-tab-span" style="white-space: pre;">   </span>snRNA<span class="Apple-tab-span" style="white-space: pre;">     </span>13384735<span class="Apple-tab-span" style="white-space: pre;">  </span>13384841<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=transcript:ENST00000384780;Parent=gene:ENSG00000207511;Name=RNU6-771P-201;biotype=snRNA;tag=basic;transcript_id=ENST00000384780;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">   </span>ensembl<span class="Apple-tab-span" style="white-space: pre;">   </span>exon<span class="Apple-tab-span" style="white-space: pre;">      </span>13384735<span class="Apple-tab-span" style="white-space: pre;">  </span>13384841<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" 
style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>Parent=transcript:ENST00000384780;Name=ENSE00001807562;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001807562;rank=1;version=1</div><div class="">###</div></div><div class=""><div class="">OK:<span class="Apple-tab-span" style="white-space:pre">    </span>Based on 3rd column: snRNA_gene <= snRNA <= exon</div><div class="">OK:<span class="Apple-tab-span" style="white-space:pre">        </span>biotype attribute: <span class="Apple-tab-span" style="white-space:pre">         </span>      snRNA <= snRNA</div></div><div class=""><br class=""></div><div class=""><div class="">###</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;"> </span>havana<span class="Apple-tab-span" style="white-space: pre;">    </span>lincRNA_gene<span class="Apple-tab-span" style="white-space: pre;">      </span>32814795<span class="Apple-tab-span" style="white-space: pre;">  </span>32816264<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=gene:ENSG00000233775;Name=RP4-811H24.9;biotype=lincRNA;gene_id=ENSG00000233775;logic_name=havana;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">  </span>havana<span class="Apple-tab-span" style="white-space: pre;">    </span>lincRNA<span class="Apple-tab-span" style="white-space: pre;">   </span>32814795<span class="Apple-tab-span" style="white-space: pre;">  </span>32816264<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> 
</span>ID=transcript:ENST00000448134;Parent=gene:ENSG00000233775;Name=RP4-811H24.9-001;biotype=lincRNA;havana_transcript=OTTHUMT00000020212;havana_version=3;tag=basic;transcript_id=ENST00000448134;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">        </span>havana<span class="Apple-tab-span" style="white-space: pre;">    </span>exon<span class="Apple-tab-span" style="white-space: pre;">      </span>32814795<span class="Apple-tab-span" style="white-space: pre;">  </span>32815422<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>Parent=transcript:ENST00000448134;Name=ENSE00001776478;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001776478;rank=2;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">   </span>havana<span class="Apple-tab-span" style="white-space: pre;">    </span>exon<span class="Apple-tab-span" style="white-space: pre;">      </span>32816206<span class="Apple-tab-span" style="white-space: pre;">  </span>32816264<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>Parent=transcript:ENST00000448134;Name=ENSE00001624254;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001624254;rank=1;version=1</div><div class="">###</div></div><div class=""><div class="">OK:<span class="Apple-tab-span" style="white-space:pre">    </span>Based on 3rd column:  lincRNA_gene <= lincRNA <= exon</div><div class="">OK:<span class="Apple-tab-span" style="white-space:pre">        </span>biotype attribute: <span class="Apple-tab-span" style="white-space:pre">         
</span>       lincRNA <= lincRNA</div></div><div class=""><br class=""></div><div class=""><div class="">###</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">    </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>gene<span class="Apple-tab-span" style="white-space: pre;">      </span>13474689<span class="Apple-tab-span" style="white-space: pre;">  </span>13477522<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=gene:ENSG00000204491;Name=PRAMEF18;biotype=protein_coding;description=PRAME family member 18 [Source:HGNC Symbol%3BAcc:30693];gene_id=ENSG00000204491;logic_name=ensembl_havana_gene;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">      </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>transcript<span class="Apple-tab-span" style="white-space: pre;">        </span>13474689<span class="Apple-tab-span" style="white-space: pre;">  </span>13477522<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=transcript:ENST00000376126;Parent=gene:ENSG00000204491;Name=PRAMEF18-001;biotype=protein_coding;ccdsid=CCDS41258.1;havana_transcript=OTTHUMT00000008177;havana_version=2;tag=basic;transcript_id=ENST00000376126;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">  </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>exon<span class="Apple-tab-span" style="white-space: pre;">      </span>13474689<span class="Apple-tab-span" style="white-space: pre;">  
</span>13475262<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>Parent=transcript:ENST00000376126;Name=ENSE00001592884;constitutive=1;ensembl_end_phase=-1;ensembl_phase=2;exon_id=ENSE00001592884;rank=3;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">    </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>CDS<span class="Apple-tab-span" style="white-space: pre;">       </span>13474689<span class="Apple-tab-span" style="white-space: pre;">  </span>13475262<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>1<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=CDS:ENSP00000365294;Parent=transcript:ENST00000376126;protein_id=ENSP00000365294</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">    </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>exon<span class="Apple-tab-span" style="white-space: pre;">      </span>13476271<span class="Apple-tab-span" style="white-space: pre;">  </span>13476849<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>Parent=transcript:ENST00000376126;Name=ENSE00001620306;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=ENSE00001620306;rank=2;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">     </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>CDS<span class="Apple-tab-span" 
style="white-space: pre;">       </span>13476271<span class="Apple-tab-span" style="white-space: pre;">  </span>13476849<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>1<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=CDS:ENSP00000365294;Parent=transcript:ENST00000376126;protein_id=ENSP00000365294</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">    </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>exon<span class="Apple-tab-span" style="white-space: pre;">      </span>13477236<span class="Apple-tab-span" style="white-space: pre;">  </span>13477522<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>Parent=transcript:ENST00000376126;Name=ENSE00003445415;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=ENSE00003445415;rank=1;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space: pre;">     </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>CDS<span class="Apple-tab-span" style="white-space: pre;">       </span>13477236<span class="Apple-tab-span" style="white-space: pre;">  </span>13477522<span class="Apple-tab-span" style="white-space: pre;">  </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=CDS:ENSP00000365294;Parent=transcript:ENST00000376126;protein_id=ENSP00000365294</div><div class="">###</div></div></div><div class=""><div class="">OK:<span class="Apple-tab-span" style="white-space:pre"> </span>Based on 3rd column:        
gene <= transcript <= exon,CDS</div><div class="">OK:<span class="Apple-tab-span" style="white-space:pre">       </span>biotype attribute: protein_coding <= protein_coding</div></div><div class=""><br class=""></div><div class=""><div class="">************************************************</div><div class=""><b class="">    <u class="">SECOND</u></b>, 3rd column OK but biotype changes:</div><div class="">************************************************</div></div><div class=""><div class=""><br class=""></div><div class=""><div class="">###</div><div class="">10<span class="Apple-tab-span" style="white-space: pre;">  </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>pseudogene<span class="Apple-tab-span" style="white-space: pre;">        </span>112696380<span class="Apple-tab-span" style="white-space: pre;"> </span>112696991<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>ID=gene:ENSG00000234118;Name=RPL13AP6;biotype=pseudogene;description=ribosomal protein L13a pseudogene 6 [Source:HGNC Symbol%3BAcc:23737];gene_id=ENSG00000234118;logic_name=ensembl_havana_gene;version=1</div><div class="">10<span class="Apple-tab-span" style="white-space: pre;">    </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>processed_pseudogene<span class="Apple-tab-span" style="white-space: pre;">      </span>112696380<span class="Apple-tab-span" style="white-space: pre;"> </span>112696991<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> 
</span>ID=transcript:ENST00000430133;Parent=gene:ENSG00000234118;Name=RPL13AP6-001;biotype=processed_pseudogene;havana_transcript=OTTHUMT00000050371;havana_version=1;tag=basic;transcript_id=ENST00000430133;version=1</div><div class="">10<span class="Apple-tab-span" style="white-space: pre;">      </span>ensembl_havana<span class="Apple-tab-span" style="white-space: pre;">    </span>exon<span class="Apple-tab-span" style="white-space: pre;">      </span>112696380<span class="Apple-tab-span" style="white-space: pre;"> </span>112696991<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>.<span class="Apple-tab-span" style="white-space: pre;"> </span>Parent=transcript:ENST00000430133;Name=ENSE00002511651;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00002511651;rank=1;version=1</div><div class="">###</div></div><div class=""></div></div><div class=""><div class=""><div class="">OK:<span class="Apple-tab-span" style="white-space:pre">        </span>Based on 3rd column : pseudogene <= processed_pseudogene <= exon</div><div class=""><b class="">??:<span class="Apple-tab-span" style="white-space:pre"> </span></b>biotype attribute : <span class="Apple-tab-span" style="white-space:pre">      </span>     pseudogene <= processed_pseudogene</div></div></div><div class=""><br class=""></div><div class=""><b class="">Question1:</b> Why does the biotype change? 
Would it not be more coherent for the pseudogene feature to have the processed_pseudogene biotype too?</div><div class=""><br class=""></div><div class=""><div class="">************************************************</div><div class=""><b class="">    <u class="">THIRD</u></b>, 3rd column does not change but biotype changes:</div><div class="">************************************************</div></div><div class=""><br class=""></div><div class=""><div class="">###</div><div class="">1<span class="Apple-tab-span" style="white-space:pre"> </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>pseudogene<span class="Apple-tab-span" style="white-space:pre">  </span>176241619<span class="Apple-tab-span" style="white-space:pre">   </span>176242538<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=gene:ENSG00000227815;Name=RP11-195C7.3;biotype=pseudogene;gene_id=ENSG00000227815;logic_name=havana;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space:pre"> </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>pseudogene<span class="Apple-tab-span" style="white-space:pre">  </span>176241619<span class="Apple-tab-span" style="white-space:pre">   </span>176242538<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=transcript:ENST00000440296;Parent=gene:ENSG00000227815;Name=RP11-195C7.3-001;biotype=unprocessed_pseudogene;havana_transcript=OTTHUMT00000084685;havana_version=2;tag=basic;transcript_id=ENST00000440296;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">   
</span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>176241619<span class="Apple-tab-span" style="white-space:pre">   </span>176241675<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000440296;Name=ENSE00001660785;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001660785;rank=1;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>176241743<span class="Apple-tab-span" style="white-space:pre">   </span>176242168<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000440296;Name=ENSE00001739151;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001739151;rank=2;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>176242227<span class="Apple-tab-span" style="white-space:pre">   </span>176242538<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   
</span>Parent=transcript:ENST00000440296;Name=ENSE00001773509;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001773509;rank=3;version=2</div><div class="">###</div></div><div class=""><div class=""><b class="">??:<span class="Apple-tab-span" style="white-space:pre">        </span></b>Based on 3rd column : <b class="">pseudogene <= pseudogene</b> <= exon</div><div class=""><b class="">??:<span class="Apple-tab-span" style="white-space:pre">   </span></b>biotype attribute :<span class="Apple-tab-span" style="white-space:pre">       </span>      pseudogene <= unprocessed_pseudogene</div></div><div class=""><br class=""></div><div class=""><b class="">Question2:</b> Why is the second feature also typed as pseudogene? Would it not be more coherent to use a subclass of pseudogene, such as unprocessed_pseudogene, in the 3rd column, as you already do for the biotype in that case?</div><div class=""><span class="Apple-tab-span" style="white-space:pre">           </span>    As in Question1, why does the biotype change? 
Would it not be better to have unprocessed_pseudogene for the top feature too?</div><div class=""><br class=""></div><div class=""><div class="">###</div><div class="">1<span class="Apple-tab-span" style="white-space:pre"> </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>pseudogene<span class="Apple-tab-span" style="white-space:pre">  </span>13411551<span class="Apple-tab-span" style="white-space:pre">    </span>13414482<span class="Apple-tab-span" style="white-space:pre">    </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=gene:ENSG00000237700;Name=RP11-219C24.6;biotype=pseudogene;gene_id=ENSG00000237700;logic_name=havana;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">        </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>pseudogene<span class="Apple-tab-span" style="white-space:pre">  </span>13411551<span class="Apple-tab-span" style="white-space:pre">    </span>13414482<span class="Apple-tab-span" style="white-space:pre">    </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=transcript:ENST00000437300;Parent=gene:ENSG00000237700;Name=RP11-219C24.6-001;biotype=unitary_pseudogene;havana_transcript=OTTHUMT00000022042;havana_version=1;tag=basic;transcript_id=ENST00000437300;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">      </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>13411551<span class="Apple-tab-span" style="white-space:pre">    </span>13411837<span class="Apple-tab-span" style="white-space:pre">    </span>.<span 
class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000437300;Name=ENSE00001677077;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001677077;rank=1;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>13412234<span class="Apple-tab-span" style="white-space:pre">    </span>13412812<span class="Apple-tab-span" style="white-space:pre">    </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000437300;Name=ENSE00001715540;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001715540;rank=2;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>13413924<span class="Apple-tab-span" style="white-space:pre">    </span>13414482<span class="Apple-tab-span" style="white-space:pre">    </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000437300;Name=ENSE00001784031;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001784031;rank=3;version=1</div><div class="">###</div></div><div class=""><div class=""><b class="">??:<span class="Apple-tab-span" style="white-space:pre">        </span></b>Based on 3rd column : <b class="">pseudogene <= pseudogene</b> <= exon</div><div 
class=""><b class="">??:<span class="Apple-tab-span" style="white-space:pre">   </span></b>biotype attribute :           pseudogene <= unitary_pseudogene</div></div><div class=""><br class=""></div><div class="">The same as question2 ...</div><div class=""><br class=""></div><div class=""><div class="">*************************************************</div><div class=""><b class="">    <u class="">FOURTH</u></b>, same feature used differently :</div><div class="">*************************************************</div></div><div class=""><br class=""></div><div class=""><div class="">### Locus1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">       </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>processed_transcript<span class="Apple-tab-span" style="white-space:pre">        </span>141474<span class="Apple-tab-span" style="white-space:pre">      </span>173862<span class="Apple-tab-span" style="white-space:pre">      </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>-<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=gene:ENSG00000241860;Name=RP11-34P13.13;biotype=processed_transcript;gene_id=ENSG00000241860;logic_name=havana;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">      </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>transcript<span class="Apple-tab-span" style="white-space:pre">  </span>141474<span class="Apple-tab-span" style="white-space:pre">      </span>149707<span class="Apple-tab-span" style="white-space:pre">      </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>-<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   
</span>ID=transcript:ENST00000484859;Parent=gene:ENSG00000241860;Name=RP11-34P13.13-004;biotype=antisense;havana_transcript=OTTHUMT00000007035;havana_version=1;tag=basic;transcript_id=ENST00000484859;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">       </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>141474<span class="Apple-tab-span" style="white-space:pre">      </span>143011<span class="Apple-tab-span" style="white-space:pre">      </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>-<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000484859;Name=ENSE00001911218;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001911218;rank=2;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>146386<span class="Apple-tab-span" style="white-space:pre">      </span>149707<span class="Apple-tab-span" style="white-space:pre">      </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>-<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000484859;Name=ENSE00001860404;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00001860404;rank=1;version=1</div></div><div class=""><div class="">### Locus2</div></div><div class=""><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>ensembl_havana<span class="Apple-tab-span" style="white-space:pre">      </span>pseudogene<span class="Apple-tab-span" style="white-space:pre">  </span>11869<span class="Apple-tab-span" 
style="white-space:pre">       </span>14412<span class="Apple-tab-span" style="white-space:pre">       </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=gene:ENSG00000223972;Name=DDX11L1;biotype=pseudogene;description=DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1 [Source:HGNC Symbol%3BAcc:37102];gene_id=ENSG00000223972;logic_name=ensembl_havana_gene;version=4</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">        </span>ensembl_havana<span class="Apple-tab-span" style="white-space:pre">      </span>processed_transcript<span class="Apple-tab-span" style="white-space:pre">        </span>11869<span class="Apple-tab-span" style="white-space:pre">       </span>14409<span class="Apple-tab-span" style="white-space:pre">       </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=transcript:ENST00000456328;Parent=gene:ENSG00000223972;Name=DDX11L1-002;biotype=processed_transcript;havana_transcript=OTTHUMT00000362751;havana_version=1;tag=basic;transcript_id=ENST00000456328;version=2</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">  </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>11869<span class="Apple-tab-span" style="white-space:pre">       </span>12227<span class="Apple-tab-span" style="white-space:pre">       </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   
</span>Parent=transcript:ENST00000456328;Name=ENSE00002234944;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00002234944;rank=1;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>12613<span class="Apple-tab-span" style="white-space:pre">       </span>12721<span class="Apple-tab-span" style="white-space:pre">       </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000456328;Name=ENSE00003582793;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00003582793;rank=2;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>havana<span class="Apple-tab-span" style="white-space:pre">      </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>13221<span class="Apple-tab-span" style="white-space:pre">       </span>14409<span class="Apple-tab-span" style="white-space:pre">       </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>+<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000456328;Name=ENSE00002312635;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00002312635;rank=3;version=1</div></div><div class="">###</div><div class=""><div class=""><b class="">??</b> => For the first locus, the processed_transcript is a <b class="">top level</b> feature (it has a transcript child), whereas for the second locus the processed_transcript is a child of the pseudogene feature.</div></div><div class=""><b class=""><br class=""></b></div><div class=""><b class="">Question3: </b>Why does the same feature type 
(processed_transcript) not follow the same schema? In my view, it should always be either a <b class="">top level </b>feature or a child of a top level feature.</div><div class=""><br class=""></div><div class=""><div class="">********************************</div><div class=""><b class="">    <u class="">FIFTH</u></b>, semantic choice:</div><div class="">********************************</div></div><div class=""><br class=""></div><div class=""><div class="">###</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">     </span>ensembl<span class="Apple-tab-span" style="white-space:pre">     </span>RNA<span class="Apple-tab-span" style="white-space:pre"> </span>1340841<span class="Apple-tab-span" style="white-space:pre">     </span>1341132<span class="Apple-tab-span" style="white-space:pre">     </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>-<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=gene:ENSG00000264293;Name=RN7SL657P;biotype=misc_RNA;description=RNA%2C 7SL%2C cytoplasmic 657%2C pseudogene [Source:HGNC Symbol%3BAcc:46673];gene_id=ENSG00000264293;logic_name=ncrna;version=1</div><div class="">1<span class="Apple-tab-span" style="white-space:pre">      </span>ensembl<span class="Apple-tab-span" style="white-space:pre">     </span>transcript<span class="Apple-tab-span" style="white-space:pre">  </span>1340841<span class="Apple-tab-span" style="white-space:pre">     </span>1341132<span class="Apple-tab-span" style="white-space:pre">     </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>-<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>ID=transcript:ENST00000582431;Parent=gene:ENSG00000264293;Name=RN7SL657P-201;biotype=misc_RNA;tag=basic;transcript_id=ENST00000582431;version=1</div><div class="">1<span class="Apple-tab-span" 
style="white-space:pre">  </span>ensembl<span class="Apple-tab-span" style="white-space:pre">     </span>exon<span class="Apple-tab-span" style="white-space:pre">        </span>1340841<span class="Apple-tab-span" style="white-space:pre">     </span>1341132<span class="Apple-tab-span" style="white-space:pre">     </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>-<span class="Apple-tab-span" style="white-space:pre">   </span>.<span class="Apple-tab-span" style="white-space:pre">   </span>Parent=transcript:ENST00000582431;Name=ENSE00002720632;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSE00002720632;rank=1;version=1</div><div class="">###</div></div><div class="">??: This is just a semantic choice, but an RNA feature that has a transcript as its child is a bit strange. </div><div class=""><br class=""></div><div class=""><b class="">Remark:</b> It would be great to replace the transcript feature with RNA and to use something like RNA<b class="">_gene </b>as the top level feature, as you do for all other RNA feature types.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">So I would like some clarification about the choices you made when creating your GFF3 annotation file. I would also like to know whether it is possible for you to make the file more consistent; I think that would make things easier for everybody.</div><div class="">Thanks in advance,</div><div class=""><br class=""></div><div class=""><div class="">Best regards,</div></div><div class=""><br class=""></div><div apple-content-edited="true" class="">
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-position: normal; font-variant-caps: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: 
space; -webkit-line-break: after-white-space;" class=""><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-position: normal; font-variant-caps: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Jacques Dainat, PhD</div><div class="">NBIS (National Bioinformatics Infrastructure Sweden)</div><div class="">Genome Annotation Service</div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""></div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><u class="">Address:</u><span class="Apple-converted-space"> </span>(room E10:4204 - last floor)</div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Uppsala University, BMC<br class="">Department of Medical Biochemistry Microbiology, Genomics</div>Husargatan 3, box 582</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-position: normal; font-variant-caps: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 
auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">S-75123 Uppsala Sweden</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-position: normal; font-variant-caps: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Phone: 01 84 71 46 25</div></div></div></div></div></div></div></div></div></div>_______________________________________________<br class="">Dev mailing list    <a href="mailto:Dev@ensembl.org" class="">Dev@ensembl.org</a><br class="">Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" class="">http://lists.ensembl.org/mailman/listinfo/dev</a><br class="">Ensembl Blog: <a href="http://www.ensembl.info/" class="">http://www.ensembl.info/</a><br class=""></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></body></html>