<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Dear Julia,<div class=""><br class=""></div><div class="">I have just responded to your similar query on Biostars: <a href="https://www.biostars.org/p/225802/#226037" class="">https://www.biostars.org/p/225802/#226037</a></div><div class=""><br class=""></div><div class="">But to answer your three questions directly:</div><div class=""><br class=""></div><div class=""><div class="">-How come the positions are identical for some of the genes? </div><div class="">The coordinates are identical in some cases because of how the pseudoautosomal regions (PARs) are mapped between X and Y:</div><div class="">Y:10001-2781479 to X:10001-2781479</div><div class="">and</div><div class="">Y:56887903-57217415 to X:155701383-156030895</div><div class="">The first region has the same coordinates on both chromosomes, so genes within it have identical start and end positions on X and Y. Genes in the second region have different coordinates on each chromosome.</div><div class=""><br class=""></div><div class="">-Why do I get these duplicate gene entries?</div><div class="">You are seeing duplicates because these genes lie in the PARs, which are present on both the X and Y chromosomes. </div><div class=""><br class=""></div><div class="">-How can I prevent this? </div><div class="">If you would rather see only one copy of each of these genes, for example the one on the X chromosome, you can restrict your query on the Y chromosome to the regions outside the Y coordinates listed above, which map to the X.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Let me know if you have further questions.</div><div class=""><br class=""></div><div class="">Best wishes,</div><div class=""><br class=""></div><div class="">Helen</div><div class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">_ _ </div><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""></div><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Helen Sparrow<br class="">Ensembl Outreach Officer<br class="">European Bioinformatics Institute (EMBL-EBI) <br class="">Wellcome Trust Genome Campus, Hinxton, Cambridge<br class="">CB10 1SD<br class="">UK<br class=""><br class=""><br class=""><br class=""></div></div>
</div>
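<div class=""><br class=""></div><div class="">The filtering suggested above could be sketched roughly as follows. This is a minimal, illustrative sketch rather than an official recipe: it works on plain hash records shaped like the rows of the table in your message (the keys id, chromosome, start and end are assumptions made for this example), and the third record is made up to show a Y gene outside the PARs. With gene objects from the API, the same check would presumably read the equivalent values through accessors such as seq_region_name, seq_region_start and seq_region_end.</div>
<pre class="">
```perl
use strict;
use warnings;

# Y-chromosome PAR intervals, taken from the coordinates quoted in the reply.
my @Y_PARS = (
    [ 10001,    2781479 ],
    [ 56887903, 57217415 ],
);

# True if a gene record on Y overlaps one of the PAR intervals.
# The comparison is inclusive at both ends.
sub overlaps_y_par {
    my ($gene) = @_;
    return 0 unless $gene->{chromosome} eq 'Y';
    for my $par (@Y_PARS) {
        my ($par_start, $par_end) = @$par;
        return 1 if $gene->{end} >= $par_start and $par_end >= $gene->{start};
    }
    return 0;
}

# Keep X genes and Y genes outside the PARs, so each PAR gene appears once (on X).
sub drop_y_par_copies {
    my @genes = @_;
    return grep { !overlaps_y_par($_) } @genes;
}

# The two genes from the original message, plus one made-up Y-only record.
my @genes = (
    { id => 'ENSG00000002586',     chromosome => 'X', start => 2691179,   end => 2741309 },
    { id => 'ENSG00000002586',     chromosome => 'Y', start => 2691179,   end => 2741309 },
    { id => 'ENSG00000124333',     chromosome => 'X', start => 155881293, end => 155943769 },
    { id => 'ENSG00000124333',     chromosome => 'Y', start => 57067813,  end => 57130289 },
    { id => 'HYPOTHETICAL_Y_ONLY', chromosome => 'Y', start => 3000000,   end => 3001000 },
);

# Print the surviving records as tab-separated lines.
print join("\t", @{$_}{qw(id chromosome start end)}), "\n"
    for drop_y_par_copies(@genes);
```
</pre>
<div class="">With these records, the two Y copies that fall inside the PAR intervals are dropped, while both X entries and the made-up Y-only record are kept. Filtering after retrieval, rather than changing the query itself, is only one way to apply the suggestion.</div>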
<br class=""><div><blockquote type="cite" class=""><div class="">On 7 Dec 2016, at 13:33, Julia Söllner &lt;<a href="mailto:julia.f.soellner@gmail.com" class="">julia.f.soellner@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_quote"><div dir="ltr" class=""><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class="">Dear Ensembl developers,</p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class="">I access Ensembl data via the Perl API and retrieve information on genes, transcripts etc. I have noticed that when I retrieve data from the database's gene table, some genes occur twice, once on the X and once on the Y chromosome. This affects 45 human genes; for 34 of the 45, the start and end positions on X and Y are identical.</p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class="">Two examples:</p><table style="box-sizing:border-box;border-collapse:collapse;max-width:100%;color:rgb(51,51,51);font-family:sans-serif;font-size:13px;width:797px" class=""><tbody style="box-sizing:border-box" class=""><tr style="box-sizing:border-box" class=""><th style="box-sizing:border-box;text-align:left" class="">geneID</th><th style="box-sizing:border-box;text-align:left" class="">biotype</th><th style="box-sizing:border-box;text-align:left" class="">chromosome</th><th style="box-sizing:border-box;text-align:left" class="">start</th><th style="box-sizing:border-box;text-align:left" class="">end</th></tr><tr style="box-sizing:border-box" class=""><td style="box-sizing:border-box" class="">ENSG00000002586</td><td style="box-sizing:border-box" class="">protein_coding</td><td style="box-sizing:border-box" class="">X</td><td style="box-sizing:border-box" class="">2691179</td><td style="box-sizing:border-box" 
class="">2741309</td></tr><tr style="box-sizing:border-box" class=""><td style="box-sizing:border-box" class="">ENSG00000002586</td><td style="box-sizing:border-box" class="">protein_coding</td><td style="box-sizing:border-box" class="">Y</td><td style="box-sizing:border-box" class="">2691179</td><td style="box-sizing:border-box" class="">2741309</td></tr><tr style="box-sizing:border-box" class=""><td style="box-sizing:border-box" class="">ENSG00000124333</td><td style="box-sizing:border-box" class="">protein_coding</td><td style="box-sizing:border-box" class="">X</td><td style="box-sizing:border-box" class="">155881293</td><td style="box-sizing:border-box" class="">155943769</td></tr><tr style="box-sizing:border-box" class=""><td style="box-sizing:border-box" class="">ENSG00000124333</td><td style="box-sizing:border-box" class="">protein_coding</td><td style="box-sizing:border-box" class="">Y</td><td style="box-sizing:border-box" class="">57067813</td><td style="box-sizing:border-box" class="">57130289</td></tr></tbody></table><div style="box-sizing: border-box; margin: 0px 0px 10px; color: rgb(51, 51, 51); font-family: sans-serif; font-size: 13px;" class=""><br class="webkit-block-placeholder"></div><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class="">When querying some of these genes via the Ensembl website it turned out that they are mapped to pseudoautosomal regions (identical sequence on X and Y).</p><div style="box-sizing: border-box; margin: 0px 0px 10px; color: rgb(51, 51, 51); font-family: sans-serif; font-size: 13px;" class=""><br class="webkit-block-placeholder"></div><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class=""><strong style="box-sizing:border-box" class="">Some more information on how I retrieve the data:</strong></p><p style="box-sizing:border-box;margin:0px 0px 
10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class=""><span style="box-sizing:border-box" class="">I use API version 86.</span></p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class="">To speed things up, I iterate over chromosomes in parallel and retrieve all genes as follows:</p><pre class="m_8170721737082301188gmail-pre m_8170721737082301188gmail-language-bash" style="box-sizing: border-box; font-family: consolas, monaco, 'andale mono', 'ubuntu mono', monospace; font-size: 13px; padding: 1em; margin-top: 0.5em; margin-bottom: 0.5em; line-height: 1.5; word-break: normal; word-wrap: normal; background-color: rgb(245, 242, 240); border: 1px solid rgb(204, 204, 204); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow: auto; background-position: initial initial; background-repeat: initial initial;"><code class="m_8170721737082301188gmail-language-bash" style="box-sizing:border-box;font-family:consolas,monaco,'andale mono','ubuntu mono',monospace;font-size:inherit;padding:0px;background-image:none;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:initial;border-radius:0px;word-spacing:normal;word-break:normal;word-wrap:normal;line-height:1.5"><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$slice</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">=</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" 
style="box-sizing:border-box;color:rgb(238,153,0)">$slice_adaptor</span> -<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span> fetch_by_region<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'chromosome'</span>, <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$chr_name</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span>
my @genes <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">=</span> @<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">{</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$slice</span> -<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span> get_all_Genes<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">}</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span>
</code></pre><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class="">So <span style="font-family:arial,sans-serif" class="">ENSG00000124333 </span>is in @genes both when querying X and when querying Y. However, if I fetch the gene directly by its stable ID, I only get the X chromosome:</p><pre class="m_8170721737082301188gmail-pre m_8170721737082301188gmail-language-bash" style="box-sizing: border-box; font-family: consolas, monaco, 'andale mono', 'ubuntu mono', monospace; font-size: 13px; padding: 1em; margin-top: 0.5em; margin-bottom: 0.5em; line-height: 1.5; word-break: normal; word-wrap: normal; background-color: rgb(245, 242, 240); border: 1px solid rgb(204, 204, 204); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow: auto; background-position: initial initial; background-repeat: initial initial;"><code class="m_8170721737082301188gmail-language-bash" style="box-sizing:border-box;font-family:consolas,monaco,'andale mono','ubuntu mono',monospace;font-size:inherit;padding:0px;background-image:none;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:initial;border-radius:0px;word-spacing:normal;word-break:normal;word-wrap:normal;line-height:1.5">my <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$gene_adaptor</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">=</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" 
style="box-sizing:border-box;color:rgb(238,153,0)">$registry</span>-<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span>get_adaptor<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'Human'</span>, <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'Core'</span>, <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'Gene'</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span>

my <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$gene</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">=</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$gene_adaptor</span>-<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span>fetch_by_stable_id<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-string" style="box-sizing:border-box;color:rgb(102,153,0)">'ENSG00000124333'</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span>

print <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-variable" style="box-sizing:border-box;color:rgb(238,153,0)">$gene</span>-<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-operator" style="box-sizing:border-box;color:rgb(166,127,89);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:rgba(255,255,255,0.498039)">></span>seq_region_name<span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">(</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">)</span><span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-punctuation" style="box-sizing:border-box;color:rgb(153,153,153)">;</span> <span class="m_8170721737082301188gmail-token m_8170721737082301188gmail-comment" style="box-sizing:border-box;clear:left;color:slategray"># => X</span>
</code></pre><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class="">According to <a rel="nofollow" href="http://lists.ensembl.org/pipermail/dev/2010-October/000214.html" style="box-sizing:border-box;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;background-color:transparent;color:rgb(66,139,202);text-decoration:none" target="_blank" class="">http://lists.ensembl.org/<wbr class="">pipermail/dev/2010-October/<wbr class="">000214.html</a>, a gene might extend beyond a pseudoautosomal region into a region unique to the Y chromosome. This could explain why a gene shows up on both X and Y. However, I checked this, and there is no overlap between the unique regions of Y and the gene coordinates. I used the PAR coordinates from <a href="http://www.ensembl.org/info/genome/genebuild/assembly.html" target="_blank" class="">http://www.ensembl.org/<wbr class="">info/genome/genebuild/<wbr class="">assembly.html</a>.</p><p style="box-sizing:border-box;margin:0px 0px 10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class=""><strong style="box-sizing:border-box" class="">Questions</strong></p><ul style="box-sizing:border-box;margin-top:0px;margin-bottom:10px;color:rgb(51,51,51);font-family:sans-serif;font-size:13px" class=""><li style="box-sizing:border-box" class="">How come the positions are identical for some of the genes?</li><li style="box-sizing:border-box" class="">Why do I get these duplicate gene entries?</li><li style="box-sizing:border-box" class="">How can I prevent this?</li></ul><div class=""><font color="#333333" face="sans-serif" class=""><br class=""></font></div><div class=""><font color="#333333" face="sans-serif" class="">Thanks in advance and kind regards,</font></div><div class=""><font color="#333333" face="sans-serif" class="">Julia </font></div></div>
</div><br class=""></div>
_______________________________________________<br class="">Dev mailing list    <a href="mailto:Dev@ensembl.org" class="">Dev@ensembl.org</a><br class="">Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" class="">http://lists.ensembl.org/mailman/listinfo/dev</a><br class="">Ensembl Blog: <a href="http://www.ensembl.info/" class="">http://www.ensembl.info/</a><br class=""></div></blockquote></div><br class=""></div></body></html>