<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;
      charset=windows-1252">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Thanks Julien,</p>
    <p>To get more information about the biotypes, use our REST
      endpoints and adjust the parameters to filter the results.<br>
    </p>
    <p><a class="moz-txt-link-freetext" href="http://rest.ensembl.org/info/biotypes/groups/?content-type=application/json">http://rest.ensembl.org/info/biotypes/groups/?content-type=application/json</a> 
      (list of available biotype groups)</p>
    <p><a class="moz-txt-link-freetext" href="http://rest.ensembl.org/info/biotypes/groups/coding?content-type=application/json">http://rest.ensembl.org/info/biotypes/groups/coding?content-type=application/json</a>
      (list within the coding group)</p>
    <p><a class="moz-txt-link-freetext" href="http://rest.ensembl.org/info/biotypes/groups/coding/gene?content-type=application/json">http://rest.ensembl.org/info/biotypes/groups/coding/gene?content-type=application/json</a>
      (list within the coding group for gene object type)</p>
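    <p>As a rough sketch, the three links above can be built from one
      small Python helper. The names <code>biotype_groups_url</code>
      and <code>fetch_json</code> are only illustrative and are not part
      of any Ensembl client library. The helper assumes the group and
      object type are passed as path segments, as in the links above.</p>
<pre>
```python
import json
from urllib.parse import quote
from urllib.request import urlopen

REST_SERVER = "http://rest.ensembl.org"  # server used in the links above


def biotype_groups_url(group=None, object_type=None):
    """Build an info/biotypes/groups URL, optionally narrowed down.

    group and object_type are appended as path segments, mirroring the
    example links (for instance "coding" and "gene").
    """
    if object_type is not None and group is None:
        raise ValueError("object_type needs a group as well")
    parts = ["info", "biotypes", "groups"]
    if group is not None:
        parts.append(quote(group, safe=""))
    if object_type is not None:
        parts.append(quote(object_type, safe=""))
    return REST_SERVER + "/" + "/".join(parts) + "?content-type=application/json"


def fetch_json(url):
    """Download a URL and decode its JSON body (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```
</pre>
    <p>For instance, <code>fetch_json(biotype_groups_url("coding",
      "gene"))</code> would request the third link above.</p>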
    <p>To get more info about the transcripts, use our lookup endpoint.</p>
    <p>For example:</p>
    <p><a class="moz-txt-link-freetext" href="http://rest.ensembl.org/lookup/id/FBtr0307760?content-type=application/json">http://rest.ensembl.org/lookup/id/<b>FBtr0307760</b>?content-type=application/json</a></p>
    <pre>{
  "Parent": "FBgn0002781",
  "display_name": "mod(mdg4)-RAE",
  "db_type": "core",
  "id": "FBtr0307760",
  "is_canonical": 0,
  "assembly_name": "BDGP6",
  "end": 21377399,
  "object_type": "Transcript",
  "species": "drosophila_melanogaster",
  "biotype": "protein_coding",
  "strand": -1,
  "seq_region_name": "3R",
  "start": 21375060,
  "source": "FlyBase",
  "logic_name": "flybase"
}</pre>
    <p>You can see that the source of this transcript is 'FlyBase'.</p>
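    <p>If you want to check several transcripts this way, something
      like the following Python sketch may help. The function names
      <code>lookup_url</code> and <code>origin_summary</code> are made
      up for illustration, and <code>EXAMPLE</code> is a trimmed copy of
      the response shown above, not a fresh query.</p>
<pre>
```python
REST_SERVER = "http://rest.ensembl.org"


def lookup_url(stable_id):
    """Return the lookup/id URL for one stable ID, asking for JSON."""
    return f"{REST_SERVER}/lookup/id/{stable_id}?content-type=application/json"


def origin_summary(record):
    """Summarise where a looked-up transcript comes from.

    record is the decoded JSON object; missing keys are shown as "?".
    """
    keys = ("id", "Parent", "biotype", "source", "logic_name")
    values = {key: record.get(key, "?") for key in keys}
    return ("{id} (gene {Parent}, {biotype}): "
            "source={source}, logic_name={logic_name}").format(**values)


# Trimmed copy of the response shown above for FBtr0307760.
EXAMPLE = {
    "Parent": "FBgn0002781",
    "id": "FBtr0307760",
    "biotype": "protein_coding",
    "source": "FlyBase",
    "logic_name": "flybase",
}

print(origin_summary(EXAMPLE))
```
</pre>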
    <p>Please note that our REST service is currently running on Ensembl
      release 94.
      (<a class="moz-txt-link-freetext" href="http://rest.ensembl.org/info/software?content-type=application/json">http://rest.ensembl.org/info/software?content-type=application/json</a>)<br>
    </p>
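    <p>Because the files you compared are from release 93, it may be
      worth checking the REST release before comparing results. A
      minimal Python sketch, assuming the info/software response is a
      JSON object with a <code>release</code> key (please verify this
      against the live response); the function names are
      illustrative:</p>
<pre>
```python
import json


def rest_release(payload):
    """Extract the release number from an info/software response.

    payload may be the raw JSON text or an already decoded dict.
    Assumption: the object carries the number under the key "release".
    """
    data = json.loads(payload) if isinstance(payload, (str, bytes)) else payload
    return int(data["release"])


def same_release(payload, file_release):
    """True when the REST server and the downloaded files share a release."""
    return rest_release(payload) == int(file_release)
```
</pre>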
    <p>Hope it helps.</p>
    Thanks<br>
    Prem<br>
    <br>
    <div class="moz-cite-prefix">On 04/10/2018 11:54, Julien Wollbrett
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:5ec2a3c426414ab19f6d8dcc9b1d5d47@unil.ch">
      <div class="moz-cite-prefix">Hello,<br>
        <br>
        Thank you Premanand for this extremely useful answer.<br>
        <br>
        If I understand correctly, one fasta file is created using the <b>_patch_hapl_scaff.gtf.gz
        </b>file for mouse and human or the <b>gtf.gz</b> file for
        other species. This file is then split by gene biotype to
        create the cdna, cds, ncrna, and pep fasta files.<br>
        Do you know where I can find a list of all the biotypes used to
        assign sequences to each fasta file?<br>
        <br>
        In my previous email I described that I found 9 transcripts
        (FBtr0307759, FBtr0084079, FBtr0307760, FBtr0084081,
        FBtr0084085, FBtr0084083, FBtr0084084, FBtr0084080, FBtr0084082)
        that are present in the
        <b>cdna.all.fa.gz</b> file but not in the <b>gtf.gz</b> file
        for D. melanogaster (release 93). All these transcripts are from
        the same gene (FBgn0002781).<br>
        Do you know where the annotation of these transcripts comes
        from?<br>
        <br>
        Thanks,<br>
        <br>
        Julien<br>
        <br>
        On 03.10.18 at 16:39, Premanand Achuthan wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:20181003143900.9C35F230047_BB4D484B@hx-mx1.ebi.ac.uk">
        <p>Hi Julien</p>
        Apologies for the delay. This might possibly answer your
        previous and current questions.<br>
        <p>The split in the gtf file is based on the chromosomal
          regions.  Also please note that the “<b>Homo_sapiens.GRCh38.84.gtf</b>”
          file includes all the top level chromosomal regions (1..22,
          X,Y, MT) and also the scaffolds, as you can see below.<br>
        </p>
        <code>grep -v '^#' Homo_sapiens.GRCh38.84.gtf | cut -f1 | sort -u</code><br>
        <p>1<br>
          10<br>
          11<br>
          12<br>
          13<br>
          14<br>
          15<br>
          16<br>
          17<br>
          18<br>
          19<br>
          2<br>
          20<br>
          21<br>
          22<br>
          3<br>
          4<br>
          5<br>
          6<br>
          7<br>
          8<br>
          9<br>
          GL000008.2<br>
          GL000009.2<br>
          GL000194.1<br>
          GL000195.1<br>
          GL000205.2<br>
          GL000213.1<br>
          GL000216.2<br>
          GL000218.1<br>
          GL000219.1<br>
          GL000220.1<br>
          GL000224.1<br>
          GL000225.1<br>
          KI270442.1<br>
          KI270706.1<br>
          KI270707.1<br>
          KI270708.1<br>
          KI270711.1<br>
          KI270713.1<br>
          KI270714.1<br>
          KI270721.1<br>
          KI270722.1<br>
          KI270723.1<br>
          KI270724.1<br>
          KI270726.1<br>
          KI270727.1<br>
          KI270728.1<br>
          KI270731.1<br>
          KI270733.1<br>
          KI270734.1<br>
          KI270741.1<br>
          KI270743.1<br>
          KI270744.1<br>
          KI270750.1<br>
          KI270752.1<br>
          MT<br>
          X<br>
          Y</p>
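        <p>The same listing can be produced in Python, which is handy
          when the GTF is still gzipped. This is only a sketch: the
          helper names <code>open_text</code> and <code>seqnames</code>
          are illustrative, and lines starting with <code>#</code> are
          assumed to be header comments and are skipped.</p>
<pre>
```python
import gzip


def open_text(path):
    """Open a plain or gzipped text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def seqnames(lines):
    """Return the sorted distinct values of GTF column 1 (the region name)."""
    names = set()
    for line in lines:
        # Skip blank lines and header comments.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return sorted(names)
```
</pre>
        <p>Usage would look like
          <code>with open_text("Homo_sapiens.GRCh38.84.gtf.gz") as handle:
          print(seqnames(handle))</code>.</p>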
        <p>Also, please note that '<b>Homo_sapiens.GRCh38.84.chr.gtf.gz</b>'
          contains features only from the primary assembly.<br>
        </p>
        <p>The '<b>Homo_sapiens.GRCh38.84.chr_patch_hapl_scaff.gtf.gz</b>'
          also contains the features from haplotypes and patches (for
          human and mouse only).
          <br>
        </p>
        <p>More info about haplotypes and patches:<br>
          <a class="moz-txt-link-freetext"
href="https://www.ensembl.org/info/genome/genebuild/haplotypes_patches.html"
            moz-do-not-send="true">https://www.ensembl.org/info/genome/genebuild/haplotypes_patches.html</a></p>
        <p>- The split in the fasta file is based on the <b>biotype</b>.
          So we have cdna, cds, ncrna, pep etc., (<a
            class="moz-txt-link-freetext"
href="http://ftp.ensemblorg.ebi.ac.uk/pub/release-84/fasta/homo_sapiens/"
            moz-do-not-send="true">http://ftp.ensemblorg.ebi.ac.uk/pub/release-84/fasta/homo_sapiens/</a>)<br>
          If you are interested only in the primary assembly, then you
          should use the chr.gtf file and ignore any accessions in the
          cdna fasta that are not in the gtf, as the cdna fasta includes
          haplotypes.<br>
        </p>
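        <p>One possible way to ignore those accessions is to filter the
          cdna fasta against the set of transcript IDs taken from the
          chr.gtf file. The Python sketch below assumes the ID is the
          first token of each fasta header and that a trailing
          <code>.N</code> version suffix, if present, should be dropped
          before matching. All function names are illustrative.</p>
<pre>
```python
def fasta_records(lines):
    """Yield (header, sequence) pairs from the lines of a fasta file."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header, strip_version=True):
    """Return the first token of a header, optionally without ".N"."""
    token = header.split(None, 1)[0] if header.strip() else ""
    if strip_version:
        token = token.split(".", 1)[0]
    return token


def keep_known(lines, allowed_ids, strip_version=True):
    """Yield only the records whose ID is in allowed_ids."""
    allowed = set(allowed_ids)
    for header, sequence in fasta_records(lines):
        if record_id(header, strip_version) in allowed:
            yield header, sequence
```
</pre>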
        <p>Hope it helps. <br>
        </p>
        Thanks<br>
        Prem<br>
        <br>
        <p><br>
        </p>
        <br>
        <div class="moz-cite-prefix">On 03/10/2018 15:07, Julien
          Wollbrett wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:7dfe3bbd3326412396f32010538eb771@unil.ch">
          <p>Hello,</p>
          <p>As I did not receive an answer to my previous email, I will
            try to describe my goals and questions in more detail.</p>
          <p>I generated my own transcriptome fasta file using gtf from
            ensembl, genome fasta file from ensembl (same release) and a
            tool like gtf_to_fasta (TopHat) or gffread (Cufflinks). Once
            I did that I compared the generated file with the cdna
            transcriptome fasta file available at ensembl ftp.
            Unfortunately I found some differences between my
            transcriptome fasta file and the one provided by ensembl.
            That is why I tried to determine the origin of these
            differences.<br>
            All my tests have been run on different species (human, D.
            melanogaster, ...) and different releases (84 and 93)<br>
            <br>
            I used the approach described below to define differences :<br>
            - take all transcript_ids from transcriptome fasta file of
            ensembl<br>
            - take all transcript_ids from gtf file of ensembl<br>
            - count the transcripts common to both files and the
            transcripts specific to each file<br>
            - map each transcript_id to the gtf annotation and determine
            the gene_biotype associated with the transcripts of each file.<br>
            <br>
            <b>results for human release 84:</b></p>
          <p>gtf file : <a class="moz-txt-link-freetext"
href="http://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.gtf.gz"
              moz-do-not-send="true">
http://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.gtf.gz</a><br>
            fasta file : <a class="moz-txt-link-freetext"
href="http://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz"
              moz-do-not-send="true">
http://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz</a><br>
          </p>
          <p>transcripts common in both files : 161150<br>
            transcripts present only in gtf : 38034<br>
            transcripts present only in fasta file : 15091<br>
            number of different gene biotypes for transcripts present in
            gtf: 44<br>
            number of different gene biotypes for transcripts in fasta
            file : 23<br>
            list of biotypes present only in gtf and their count :<br>
            <br>
            gene_biotype  freq<br>
            3prime_overlapping_ncrna    32<br>
            antisense 10183<br>
            bidirectional_promoter_lncrna     5<br>
            lincRNA 12648<br>
            macro_lncRNA     1<br>
            miRNA  4198<br>
            misc_RNA  2306<br>
            Mt_rRNA     2<br>
            Mt_tRNA    22<br>
            non_coding     3<br>
            processed_transcript  2760<br>
            ribozyme     8<br>
            rRNA   549<br>
            scaRNA    49<br>
            sense_intronic   978<br>
            sense_overlapping   334<br>
            snoRNA   961<br>
            snRNA  1905<br>
            sRNA    20<br>
            TEC  1069<br>
            vaultRNA     1<br>
            <br>
            All 38034 transcripts present only in the gtf have a
            gene_biotype that does not appear in the Ensembl transcriptome.
            <br>
          </p>
          <p><b>results for D. melanogaster release 93:</b></p>
          <p>gtf file : <a class="moz-txt-link-freetext"
href="http://ftp.ensembl.org/pub/release-93/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.93.gtf.gz"
              moz-do-not-send="true">
http://ftp.ensembl.org/pub/release-93/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.93.gtf.gz</a><br>
            fasta file : <a class="moz-txt-link-freetext"
href="http://ftp.ensembl.org/pub/release-93/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.cdna.all.fa.gz"
              moz-do-not-send="true">
http://ftp.ensembl.org/pub/release-93/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.cdna.all.fa.gz</a><br>
          </p>
          <p>transcripts in both files : 30819<br>
            transcripts present only in gtf : 3948<br>
            transcripts present only in fasta : 9<br>
            number of different gene biotypes for transcripts present in
            gtf: 8<br>
            number of different gene biotypes for transcripts present in
            fasta file : 2<br>
            list of biotypes present only in gtf and their count :</p>
          <p>gene_biotype    freq<br>
            ncRNA    2941<br>
            pre_miRNA    259<br>
            rRNA    115<br>
            snoRNA    289<br>
            snRNA    32<br>
            tRNA    312<br>
          </p>
          <p>All 3948 transcripts present only in the gtf have a
            gene_biotype that does not appear in the Ensembl transcriptome.
          </p>
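          <p>For reference, the comparison described above can be
            sketched in Python roughly as follows. This is not the exact
            script used for the numbers in this email. It assumes that
            transcript rows in the gtf carry <code>transcript_id</code>
            and <code>gene_biotype</code> attributes, and that the fasta
            transcript IDs have already been collected into a set. All
            function names are illustrative.</p>
<pre>
```python
import re
from collections import Counter

# Matches key "value" pairs in the ninth gtf column.
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def transcript_biotypes(gtf_lines):
    """Map each transcript_id in a gtf to its gene_biotype."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if "transcript_id" in attrs:
            biotypes[attrs["transcript_id"]] = attrs.get("gene_biotype", "unknown")
    return biotypes


def compare(fasta_ids, gtf_biotypes):
    """Split the IDs into common, gtf-only and fasta-only sets."""
    fasta_ids = set(fasta_ids)
    gtf_ids = set(gtf_biotypes)
    return {
        "common": fasta_ids.intersection(gtf_ids),
        "only_gtf": gtf_ids - fasta_ids,
        "only_fasta": fasta_ids - gtf_ids,
    }


def biotype_counts(ids, gtf_biotypes):
    """Count gene biotypes for the given transcript IDs (all must be in the gtf)."""
    return Counter(gtf_biotypes[transcript_id] for transcript_id in ids)
```
</pre>
          <p>Calling <code>biotype_counts(result["only_gtf"],
            biotypes)</code> on the output of <code>compare</code> gives
            a frequency table of the same form as the ones above.</p>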
          <p><br>
          </p>
          <p>Could someone please explain to me :</p>
          <p>1. Why are all the transcripts with these gene biotypes
            removed during the creation of the transcriptome?<br>
            2. Where do the transcripts present in the transcriptome
            fasta file but not in the gtf file (15091 in human, 9 in D.
            melanogaster) come from?<br>
            3. How is the cdna transcriptome fasta file generated?<br>
            4. Should I generate my own transcriptome fasta file or
            use the Ensembl cdna fasta file?</p>
          <p>Sorry for such a long email.... and thank you for your
            answers.</p>
          <p>Best Regards,</p>
          <p><br>
          </p>
          <p>Julien Wollbrett<br>
          </p>
          <pre wrap="">_______________________________________________
Dev mailing list    <a class="moz-txt-link-abbreviated" href="mailto:Dev@ensembl.org" moz-do-not-send="true">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a class="moz-txt-link-freetext" href="http://lists.ensembl.org/mailman/listinfo/dev" moz-do-not-send="true">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a class="moz-txt-link-freetext" href="http://www.ensembl.info/" moz-do-not-send="true">http://www.ensembl.info/</a>
</pre>
        </blockquote>
        <br>
      </blockquote>
      <p><br>
      </p>
    </blockquote>
    <br>
  </body>
</html>