<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p class="MsoNormal">Dear all,</p>

    <p class="MsoNormal">Ensembl is reviewing how we provide access to
      the pseudoautosomal regions (PARs) in human, and we'd like your input.</p>

    <p class="MsoNormal">Please let us know how our current
      PAR strategy works for you. What do you like, and what is
      difficult? If our current strategy (explained below) is not
      working for you, please let us know why. (If it does work for you, we’d
      also like to know why!)</p>

    <p class="MsoNormal">The pseudoautosomal regions (PARs),
      where chromosomes X and Y share homologous sequence, are defined
      for human by the Genome Reference Consortium (GRC). In Ensembl, the
      full-length Y chromosome is displayed in our browser. However, within
      our core human database, the Y chromosome is divided into five regions:</p>

    <p class="MsoNormal">chromosome:GRCh38:Y:1:10000:1 unique to Y<br>
      chromosome:GRCh38:Y:10001:2781479:1 PAR1<br>
      chromosome:GRCh38:Y:2781480:56887902:1 unique to Y<br>
      chromosome:GRCh38:Y:56887903:57217415:1 PAR2<br>
      chromosome:GRCh38:Y:57217416:57227415:1 unique to Y</p>

    <p class="MsoNormal">We store sequence for only the three
      unique regions of Y in our database. The DNA for PAR1 and PAR2 is
      stored with the sequence for chromosome X. The full-length
      chromosome Y can be generated on the fly by our API, which
      stitches in the shared PAR sequence from X.</p>
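    <p class="MsoNormal">As a rough illustration of the stitching idea, the
      Python sketch below maps a 1-based chromosome Y position to the place
      its sequence would be read from. This is not the Ensembl API: the
      function name <code>y_source</code> and the table layout are
      hypothetical, and the chromosome X start coordinates for PAR1 and PAR2
      are assumed GRCh38 values that are not quoted in this message.</p>

<pre>
```python
# Hypothetical table: (Y start, Y end, label, X start or None).
# The Y coordinates are the five regions listed in this message; the X start
# coordinates for PAR1 and PAR2 are assumptions (GRCh38).
Y_REGIONS = [
    (1, 10000, "unique to Y", None),
    (10001, 2781479, "PAR1", 10001),
    (2781480, 56887902, "unique to Y", None),
    (56887903, 57217415, "PAR2", 155701383),
    (57217416, 57227415, "unique to Y", None),
]


def y_source(pos):
    """Return (chromosome, position, label) where the base at Y:pos is stored."""
    for y_start, y_end, label, x_start in Y_REGIONS:
        if pos in range(y_start, y_end + 1):
            if x_start is None:
                # Unique region: the sequence is stored with Y itself.
                return ("Y", pos, label)
            # PAR: the sequence is stored with X, at the same offset.
            return ("X", x_start + (pos - y_start), label)
    raise ValueError(f"position {pos} is outside chromosome Y")
```
</pre>

    <p class="MsoNormal">For example, <code>y_source(10001)</code> points at
      chromosome X because that position falls in PAR1, while
      <code>y_source(1)</code> stays on Y.</p>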

    <p class="MsoNormal">What does this mean practically?</p>

    <p class="MsoNormal">- The PARs are identical (i.e. the same
      set of contigs in the same order) on human X and Y. When aligning
      data such as RNA-seq to the genome, masking out the PARs on Y
      avoids duplicate mappings, which would otherwise lower the mapping
      scores of reads in these regions.</p>

    <p class="MsoNormal">- Genes are built only on the X PARs.
      (The genes you see in the Y PARs are
      identical to the genes annotated on X.)</p>

    <p class="MsoNormal">Accessing PARs on the Y chromosome:</p>

    <p class="MsoNormal">- Our Perl API provides a range of
      methods that allow you to fetch either only the unique regions of Y or
      the whole of Y.</p>

    <p class="MsoNormal">- Our websites (e.g. <a
        class="moz-txt-link-abbreviated" href="http://www.ensembl.org">www.ensembl.org</a>)
      show the genomic sequence for the chr Y PARs.</p>

    <p class="MsoNormal">- Our FTP site provides a FASTA file
      for the whole length of Y, with the PARs replaced by Ns.</p>
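    <p class="MsoNormal">To make the "PARs replaced by Ns" idea concrete,
      here is a minimal Python sketch of masking regions in a sequence
      string. It only illustrates the concept and is not how the FTP files
      are produced; the function name <code>mask_regions</code> is
      hypothetical, and the regions are assumed to be 1-based, inclusive and
      non-overlapping.</p>

<pre>
```python
def mask_regions(seq, regions):
    """Replace each 1-based inclusive (start, end) region of seq with Ns.

    Assumes the regions do not overlap and lie within the sequence.
    """
    parts = []
    cursor = 0  # 0-based index of the next unmasked base to copy
    for start, end in sorted(regions):
        parts.append(seq[cursor:start - 1])    # sequence before the region
        parts.append("N" * (end - start + 1))  # the masked region itself
        cursor = end
    parts.append(seq[cursor:])                 # sequence after the last region
    return "".join(parts)


# Toy example: a 20-base "chromosome" with two masked regions.
masked = mask_regions("ACGTACGTACGTACGTACGT", [(3, 6), (15, 18)])
```
</pre>

    <p class="MsoNormal">For a full-length chromosome Y sequence, the same
      call would take the PAR1 and PAR2 coordinates listed earlier in this
      message: (10001, 2781479) and (56887903, 57217415).</p>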

    <p class="MsoNormal">Please send us any comments by
      replying to this post, or let us know what you think at helpdesk
      (at) ensembl.org<br>
    </p>

    <p class="MsoNormal">Best wishes,<br>

      Giulietta (Ensembl Outreach)<br>

    </p>

  </body>

</html>