<div dir="ltr"><div class="gmail_quote"><div dir="ltr"><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">Dear Ensembl developers,</p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">I access Ensembl data via the Perl API and retrieve information on genes, transcripts, etc. I have noticed that when I retrieve all genes chromosome by chromosome, some genes occur twice: once on the X and once on the Y chromosome. This affects 45 human genes; for 34 of these 45 genes, the start and end positions on X and Y are identical.</p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">Two examples:</p><table style="box-sizing:border-box;border-collapse:collapse;max-width:100%;color:rgb(51,51,51);font-family:sans-serif;font-size:13px;width:797px"><tbody style="box-sizing:border-box"><tr style="box-sizing:border-box"><th style="box-sizing:border-box;text-align:left">geneID</th><th style="box-sizing:border-box;text-align:left">biotype</th><th style="box-sizing:border-box;text-align:left">chromosome</th><th style="box-sizing:border-box;text-align:left">start</th><th style="box-sizing:border-box;text-align:left">end</th></tr><tr style="box-sizing:border-box"><td style="box-sizing:border-box">ENSG00000002586</td><td style="box-sizing:border-box">protein_coding</td><td style="box-sizing:border-box">X</td><td style="box-sizing:border-box">2691179</td><td style="box-sizing:border-box">2741309</td></tr><tr style="box-sizing:border-box"><td style="box-sizing:border-box">ENSG00000002586</td><td style="box-sizing:border-box">protein_coding</td><td style="box-sizing:border-box">Y</td><td style="box-sizing:border-box">2691179</td><td style="box-sizing:border-box">2741309</td></tr><tr style="box-sizing:border-box"><td style="box-sizing:border-box">ENSG00000124333</td><td 
style="box-sizing:border-box">protein_coding</td><td style="box-sizing:border-box">X</td><td style="box-sizing:border-box">155881293</td><td style="box-sizing:border-box">155943769</td></tr><tr style="box-sizing:border-box"><td style="box-sizing:border-box">ENSG00000124333</td><td style="box-sizing:border-box">protein_coding</td><td style="box-sizing:border-box">Y</td><td style="box-sizing:border-box">57067813</td><td style="box-sizing:border-box">57130289</td></tr></tbody></table><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px"></p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">When I looked up some of these genes on the Ensembl website, it turned out that they map to the pseudoautosomal regions (PARs), whose sequence is identical on X and Y.</p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px"></p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px"><strong style="box-sizing:border-box">Some more information on how I retrieve the data:</strong></p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px"><span style="box-sizing:border-box">I use version 86 of the API.</span></p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">To speed things up, I process the chromosomes in parallel and retrieve all genes on each chromosome as follows:</p><pre class="m_8170721737082301188gmail-pre m_8170721737082301188gmail-language-bash" style="box-sizing:border-box;font-family:consolas,monaco,'andale mono','ubuntu mono',monospace;font-size:13px;padding:1em;margin-top:0.5em;margin-bottom:0.5em;line-height:1.5;color:black;word-break:normal;word-wrap:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgb(245,242,240);border:1px solid rgb(204,204,204);border-radius:4px;overflow:auto"><code class="m_8170721737082301188gmail-language-bash" style="box-sizing:border-box;font-family:consolas,monaco,'andale mono','ubuntu mono',monospace;font-size:inherit;padding:0px;background-image:none;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:initial;border-radius:0px;word-spacing:normal;word-break:normal;word-wrap:normal;line-height:1.5">my <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$slice</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">=</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$slice_adaptor</span> -<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span> fetch_by_region<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span><span 
class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'chromosome'</span>, <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$chr_name</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span>
my @genes <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">=</span> @<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">{</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$slice</span> -<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span> get_all_Genes<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">}</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span>
</code></pre><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">So <span style="font-family:arial,sans-serif">ENSG00000124333 </span>ends up in @genes both when I query chromosome X and when I query chromosome Y. If, however, I fetch the gene directly by its stable ID, I only get chromosome X:</p><pre class="m_8170721737082301188gmail-pre m_8170721737082301188gmail-language-bash" style="box-sizing:border-box;font-family:consolas,monaco,'andale mono','ubuntu mono',monospace;font-size:13px;padding:1em;margin-top:0.5em;margin-bottom:0.5em;line-height:1.5;color:black;word-break:normal;word-wrap:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgb(245,242,240);border:1px solid rgb(204,204,204);border-radius:4px;overflow:auto"><code class="m_8170721737082301188gmail-language-bash" style="box-sizing:border-box;font-family:consolas,monaco,'andale mono','ubuntu mono',monospace;font-size:inherit;padding:0px;background-image:none;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:initial;border-radius:0px;word-spacing:normal;word-break:normal;word-wrap:normal;line-height:1.5">use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database server
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous');

my <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$gene_adaptor</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">=</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" 
style="box-sizing:border-box;color:rgb(238,153,0)">$registry</span>-<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span>get_adaptor<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'Human'</span>, <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'Core'</span>, <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'Gene'</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span>

my <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$gene</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">=</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$gene_adaptor</span>-<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span>fetch_by_stable_id<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'ENSG00000124333'</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span>

print <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$gene</span>-<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span>seq_region_name<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-comment" style="box-sizing:border-box;clear:left;color:slategray"># => X</span>
</code></pre><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">A post at <a rel="nofollow" href="http://lists.ensembl.org/pipermail/dev/2010-October/000214.html" style="box-sizing:border-box;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:transparent;color:rgb(66,139,202);text-decoration:none" target="_blank">http://lists.ensembl.org/<wbr>pipermail/dev/2010-October/<wbr>000214.html</a> says that a gene might extend beyond a pseudoautosomal region into a region unique to the Y chromosome, which could explain why such a gene shows up on both X and Y. However, I checked this using the PAR coordinates from <a href="http://www.ensembl.org/info/genome/genebuild/assembly.html" target="_blank">http://www.ensembl.org/<wbr>info/genome/genebuild/<wbr>assembly.html</a>, and none of the gene coordinates overlap the regions unique to Y.</p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px"><strong style="box-sizing:border-box">Questions</strong></p><ul style="box-sizing:border-box;margin-top:0px;margin-bottom:10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px"><li style="box-sizing:border-box">Why are the X and Y positions identical for some of these genes?</li><li style="box-sizing:border-box">Why do I get these duplicate gene entries?</li><li style="box-sizing:border-box">How can I prevent them?</li></ul><div><font color="#333333" face="sans-serif"><br></font></div><div><font color="#333333" face="sans-serif">Thanks in advance and kind regards,</font></div><div><font color="#333333" face="sans-serif">Julia </font></div></div>
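<p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">The overlap check described above can be sketched in plain Perl, without any Ensembl API calls. This is a minimal sketch under stated assumptions: the PAR coordinates are the GRCh38 values as I understand them (PAR1 at 10,001-2,781,479 on both X and Y; PAR2 at X:155,701,383-156,030,895 and Y:56,887,903-57,217,415) and should be checked against the assembly page linked above, and the helper <code>y_par_offset</code> is hypothetical. If those coordinates are right, PAR1 genes get an X/Y offset of zero while PAR2 genes are shifted by a fixed amount, which would match the pattern in the two examples.</p><pre style="box-sizing:border-box;font-family:consolas,monaco,'andale mono','ubuntu mono',monospace;font-size:13px;padding:1em;margin-top:0.5em;margin-bottom:0.5em;line-height:1.5;color:black;background-color:rgb(245,242,240);border:1px solid rgb(204,204,204);border-radius:4px;overflow:auto"><code class="language-perl">#!/usr/bin/perl
use strict;
use warnings;

# ASSUMED GRCh38 pseudoautosomal region (PAR) coordinates, 1-based and
# inclusive; verify them against the Ensembl assembly page before use.
my %PAR = (
    PAR1 =&gt; { x_start =&gt; 10_001,      x_end =&gt; 2_781_479,
              y_start =&gt; 10_001,      y_end =&gt; 2_781_479 },
    PAR2 =&gt; { x_start =&gt; 155_701_383, x_end =&gt; 156_030_895,
              y_start =&gt; 56_887_903,  y_end =&gt; 57_217_415 },
);

# Hypothetical helper: if the Y interval [$start, $end] lies entirely inside
# one PAR, return that PAR's name and the offset to add to get X coordinates;
# otherwise (the interval touches Y-unique sequence) return an empty list.
sub y_par_offset {
    my ($start, $end) = @_;
    for my $name (sort keys %PAR) {
        my $p = $PAR{$name};
        if ($start &gt;= $p-&gt;{y_start} and $p-&gt;{y_end} &gt;= $end) {
            return ($name, $p-&gt;{x_start} - $p-&gt;{y_start});
        }
    }
    return;
}

# The Y rows of the two example genes from the table.
for my $gene (['ENSG00000002586', 2_691_179, 2_741_309],
              ['ENSG00000124333', 57_067_813, 57_130_289]) {
    my ($id, $start, $end) = @$gene;
    my ($par, $offset) = y_par_offset($start, $end);
    if (defined $par) {
        printf "%s: %s, X = %d-%d\n", $id, $par, $start + $offset, $end + $offset;
    }
    else {
        printf "%s: overlaps Y-unique sequence\n", $id;
    }
}
</code></pre><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">For the two example genes this reports PAR1 with unchanged coordinates and PAR2 with X coordinates 155881293-155943769, i.e. the X rows of the table.</p>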
</div><br></div>
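<p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">As a stopgap for the duplicates themselves, they could be collapsed after the per-chromosome results are merged. The sketch below is plain Perl and assumes each worker reduces its Gene objects to simple hashes; the field names and the <code>dedupe_genes</code> helper are hypothetical, not part of the Ensembl API. It keeps one record per stable ID and prefers the X copy, which matches what <code>fetch_by_stable_id</code> returns.</p><pre style="box-sizing:border-box;font-family:consolas,monaco,'andale mono','ubuntu mono',monospace;font-size:13px;padding:1em;margin-top:0.5em;margin-bottom:0.5em;line-height:1.5;color:black;background-color:rgb(245,242,240);border:1px solid rgb(204,204,204);border-radius:4px;overflow:auto"><code class="language-perl">#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical merge step (not an Ensembl API function): each record is a
# plain hash reduced from a Gene object by one of the per-chromosome workers.
# Genes reported on both X and Y collapse to one record, preferring the X copy.
sub dedupe_genes {
    my @records = @_;
    my (%by_id, @order);
    for my $r (@records) {
        my $id = $r-&gt;{stable_id};
        if (!exists $by_id{$id}) {
            push @order, $id;    # remember first-seen order
            $by_id{$id} = $r;
        }
        elsif ($r-&gt;{chr} eq 'X') {
            $by_id{$id} = $r;    # an X copy wins over a copy seen earlier
        }
    }
    return map { $by_id{$_} } @order;
}

# Example input: the rows from the table, as they might arrive from the
# X and Y workers (the last two rows are deliberately in Y-then-X order).
my @merged = (
    { stable_id =&gt; 'ENSG00000002586', chr =&gt; 'X', start =&gt; 2_691_179,   end =&gt; 2_741_309 },
    { stable_id =&gt; 'ENSG00000002586', chr =&gt; 'Y', start =&gt; 2_691_179,   end =&gt; 2_741_309 },
    { stable_id =&gt; 'ENSG00000124333', chr =&gt; 'Y', start =&gt; 57_067_813,  end =&gt; 57_130_289 },
    { stable_id =&gt; 'ENSG00000124333', chr =&gt; 'X', start =&gt; 155_881_293, end =&gt; 155_943_769 },
);

for my $g (dedupe_genes(@merged)) {
    print join("\t", @{$g}{qw(stable_id chr start end)}), "\n";
}
</code></pre><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px">Because the chromosomes are processed in parallel, this step has to run once on the combined output; deduplicating inside each per-chromosome job would never see the X and Y copies together.</p>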