<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi Genomeo,<br>
    <br>
    <div class="moz-cite-prefix">On 20/02/2014 16:48, Genomeo Dev wrote:<br>
    </div>
    <blockquote
cite="mid:CAKry3c2Wcure2=sWmyvaC-jZRKP9C-QGGNB2Z4RynbfgdESM=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">Thanks, Magali, for the response.
        <div><br>
        </div>
        <div>1) I don't quite understand these two points:</div>
        <div><br>
        </div>
        <div>'..<span
            style="font-family:arial,sans-serif;font-size:13px">As each
            entry is manually curated, not all ensembl mappings are
            necessarily available..'</span></div>
        <div><br>
        </div>
        <div>Is this because it is impractical to curate a large number
          of Ensembl IDs manually?</div>
      </div>
    </blockquote>
    Typically, an HGNC entry is created once and then updated when new
    information becomes available.<br>
    If no Ensembl locus existed when the entry was created, the entry
    may lack the Ensembl link you are looking for.<br>
    Given how often Ensembl releases, HGNC would have to review every
    entry at every single release to keep those links complete; this is
    where a manual approach is limited compared with an automatic one.<br>
    <blockquote
cite="mid:CAKry3c2Wcure2=sWmyvaC-jZRKP9C-QGGNB2Z4RynbfgdESM=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div><span style="font-family:arial,sans-serif;font-size:13px">'..
            What can also happen is that our ensembl stable ID changes
            between releases due to massive changes in the underlying
            sequence..</span><span
            style="font-family:arial,sans-serif;font-size:13px">We then
            feed those cases back to HGNC for them to update their
            records if they agree with the replacement.'</span><br>
        </div>
        <div><span style="font-family:arial,sans-serif;font-size:13px"><br>
          </span></div>
        <div>But if the sequence changes, it will change for both
          Ensembl and HGNC because they both refer to the same reference
          genome, right? So why does HGNC need to agree?</div>
      </div>
    </blockquote>
    Sometimes the sequence change is simply the addition of a long 3'
    UTR, so it is fairly obvious that the locus is still the same.<br>
    In other cases, a gene has been split in two, so there can be
    disagreement about which of the new genes should keep the locus name.<br>
    <br>
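    To illustrate why a human decision is needed, here is a minimal Python
    sketch of how a pipeline might triage a retired stable ID. It is an
    illustration under assumptions, not how Ensembl or HGNC actually
    process these cases: the function propose_carry_over and the
    ENSG_OLD_/ENSG_NEW_ identifiers are hypothetical, and the list of
    successor genes is assumed to come from some ID-history lookup.<br>
<pre>
```python
# Hypothetical triage of a retired Ensembl stable ID (illustration only).

def propose_carry_over(successors):
    """Suggest what to do with an HGNC link when its Ensembl gene is retired.

    successors: list of new Ensembl gene IDs replacing the retired one
    (assumed to come from some ID-history lookup).
    Returns a (status, new_gene_id) tuple.
    """
    if len(successors) == 1:
        # One-to-one replacement, e.g. the locus only gained a longer 3' UTR:
        # carrying the same HGNC entry over is very likely safe.
        return ("propose", successors[0])
    if not successors:
        # Retired with no replacement: there is nothing to carry the name to.
        return ("retired", None)
    # Split into several genes: which one keeps the name is a curation call.
    return ("needs_curator", None)


cases = {
    "ENSG_OLD_A": ["ENSG_NEW_A"],                  # extended 3' UTR
    "ENSG_OLD_B": ["ENSG_NEW_B1", "ENSG_NEW_B2"],  # gene split in two
}
for old_id, new_ids in cases.items():
    print(old_id, propose_carry_over(new_ids))
```
</pre>
    A one-to-one replacement can reasonably be proposed automatically,
    whereas a split is left to the curators, which is why HGNC is asked
    to confirm the change.<br>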
    <blockquote
cite="mid:CAKry3c2Wcure2=sWmyvaC-jZRKP9C-QGGNB2Z4RynbfgdESM=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>2) Based on the rest of your email, let me restate my
          understanding; please let me know if it is correct:
          <div><br>
          </div>
          <div>- Ensembl does not map any of its IDs to HGNC IDs on
            its own judgement; rather, it uses the manual assignments in
            the HGNC database and the assignments made by UniProt and
            RefSeq.<br>
          </div>
        </div>
      </div>
    </blockquote>
    Correct.<br>
    <br>
    <blockquote
cite="mid:CAKry3c2Wcure2=sWmyvaC-jZRKP9C-QGGNB2Z4RynbfgdESM=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
          </div>
          <div><br>
          </div>
          <div>- Because A) Ensembl updates these assignments at each
            release and not more frequently, B) HGNC keeps making new
            manual assignments continuously, and C) Ensembl IDs can
            change between releases, all these factors lead to
            differences between the mappings in Ensembl and those in
            HGNC.</div>
        </div>
      </div>
    </blockquote>
    Correct as well.<br>
    <br>
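    If you want to see where your ~6000 IDs fall, a small comparison of
    the two exports can help. Below is a minimal Python sketch, assuming
    you have already loaded the Ensembl-side and HGNC-side results into
    plain dictionaries keyed by Ensembl gene ID, with at most one HGNC ID
    per gene; the function compare_mappings and the sample IDs are
    hypothetical placeholders, not real records.<br>
<pre>
```python
# Compare HGNC IDs obtained for the same Ensembl gene IDs from two sources.
# Inputs are hypothetical: plain dicts mapping Ensembl gene ID to HGNC ID.

def compare_mappings(query_ids, from_ensembl, from_hgnc):
    """Group each query Ensembl ID by how the two sources agree."""
    report = {"both_agree": [], "both_disagree": [],
              "ensembl_only": [], "hgnc_only": [], "neither": []}
    for ens_id in query_ids:
        a = from_ensembl.get(ens_id)
        b = from_hgnc.get(ens_id)
        if a is not None and b is not None:
            key = "both_agree" if a == b else "both_disagree"
        elif a is not None:
            key = "ensembl_only"
        elif b is not None:
            key = "hgnc_only"
        else:
            key = "neither"
        report[key].append(ens_id)
    return report


query = ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"]
ensembl_side = {"ENSG_A": "HGNC:1", "ENSG_B": "HGNC:2", "ENSG_C": "HGNC:3"}
hgnc_side = {"ENSG_A": "HGNC:1", "ENSG_B": "HGNC:9"}
for category, ids in compare_mappings(query, ensembl_side, hgnc_side).items():
    print(category, len(ids), ids)
```
</pre>
    Following the explanation above, the ensembl_only group would be
    expected to hold the indirect (UniProt or RefSeq) assignments, while
    the both_disagree entries are the ones worth checking for release
    timing or stable ID changes.<br>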
    <blockquote
cite="mid:CAKry3c2Wcure2=sWmyvaC-jZRKP9C-QGGNB2Z4RynbfgdESM=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div><br>
          </div>
          <div>If this understanding is correct, I don't see how this
            will lead to Ensembl having greater coverage than HGNC.</div>
        </div>
      </div>
    </blockquote>
    Let's say we have an Ensembl gene ENSG1 and an HGNC entry HGNC1.<br>
    HGNC1 has a mapping to the UniProt entry Uniprot1, but none to an
    Ensembl entry.<br>
    We are able to align Uniprot1 against ENSG1 and, by proxy, assign
    HGNC1 to that locus.<br>
    Looking up ENSG1 in HGNC will not return anything.<br>
    Looking up HGNC1 in Ensembl will return ENSG1.<br>
    <br>
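    The same bookkeeping can be sketched in a few lines of Python. This is
    a simplified illustration under stated assumptions: the three
    dictionaries stand in for the HGNC-to-Ensembl, HGNC-to-UniProt and
    UniProt-to-Ensembl links, and assign_via_uniprot is a hypothetical
    helper; the real pipeline works from sequence alignments and also
    uses RefSeq.<br>
<pre>
```python
# Simplified sketch of the two-step (indirect) assignment in the example.
# All identifiers are the hypothetical ones used above.

hgnc_to_ensembl = {}                        # HGNC1 has no direct Ensembl link
hgnc_to_uniprot = {"HGNC1": "Uniprot1"}     # ...but it does have a UniProt link
uniprot_to_ensembl = {"Uniprot1": "ENSG1"}  # from aligning Uniprot1 to ENSG1


def assign_via_uniprot(direct, hgnc_uniprot, uniprot_ensembl):
    """Return HGNC-to-Ensembl links: direct ones first, then UniProt proxies."""
    links = dict(direct)  # direct mappings are kept as they are
    for hgnc_id, uniprot_id in hgnc_uniprot.items():
        ensg = uniprot_ensembl.get(uniprot_id)
        # Only fill gaps: a proxy link never overrides a direct one.
        if hgnc_id not in links and ensg is not None:
            links[hgnc_id] = ensg
    return links


ensembl_view = assign_via_uniprot(hgnc_to_ensembl, hgnc_to_uniprot,
                                  uniprot_to_ensembl)

# The lookups mirror the asymmetry: Ensembl knows the link, HGNC does not.
print(ensembl_view.get("HGNC1"))            # ENSG1
print("ENSG1" in hgnc_to_ensembl.values())  # False
```
</pre>
    Letting direct links take precedence is an assumption of the sketch,
    chosen to mirror the idea that direct mappings are complemented,
    rather than replaced, by indirect ones.<br>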
    <blockquote
cite="mid:CAKry3c2Wcure2=sWmyvaC-jZRKP9C-QGGNB2Z4RynbfgdESM=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div><br>
          </div>
          <div>G.</div>
          <div><br>
            <div><br>
            </div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On 20 February 2014 16:18, mag <span
            dir="ltr"><<a moz-do-not-send="true"
              href="mailto:mr6@ebi.ac.uk" target="_blank">mr6@ebi.ac.uk</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Hi Genomeo,<br>
              <br>
              HGNC data is manually curated: HGNC curators check a
              locus and assign the corresponding Ensembl entry.<br>
              As each entry is manually curated, not all Ensembl
              mappings are necessarily available.<br>
              It does mean, though, that HGNC can be updated
              continuously.<br>
              <br>
              In Ensembl, we typically update those mappings at every
              release, as the human gene set is updated at every release.<br>
              We assign HGNC IDs using direct mappings from HGNC.<br>
              These are complemented by indirect mappings via UniProt
              or RefSeq.<br>
              If a UniProt entry is mapped to an Ensembl entry and that
              same UniProt entry is mapped in HGNC to an HGNC symbol,
              the HGNC symbol is assigned to the Ensembl entry.<br>
              <br>
              So there are more HGNC-Ensembl ID links in Ensembl than
              there are in HGNC.<br>
              <br>
              What can also happen is that the Ensembl stable ID of a
              gene changes between releases due to massive changes in
              the underlying sequence.<br>
              In those cases, we cannot get the direct mapping from
              HGNC.<br>
              We might still be able to keep the same name for the gene
              thanks to the two-step mappings via RefSeq or UniProt.<br>
              We then feed those cases back to HGNC for them to update
              their records if they agree with the replacement.<br>
              <br>
              I am unsure how NCBI assigns mappings to Ensembl; they
              could be importing the mappings from us directly or
              generating their own.<br>
              <br>
              I hope this answers most of your questions.<br>
              <br>
              <br>
              Regards,<br>
              Magali
              <div>
                <div class="h5"><br>
                  <br>
                  <div>On 20/02/2014 11:43, Genomeo Dev wrote:<br>
                  </div>
                </div>
              </div>
              <blockquote type="cite">
                <div>
                  <div class="h5">
                    <div dir="ltr">Hi,
                      <div><br>
                      </div>
                      <div>I have a set of ~ 6000 Ensembl IDs which I
                        want to map to HGNC IDs. I am faced with the
                        following situation:</div>
                      <div><br>
                      </div>
                      <div>Based on Ensembl BioMart or the Ensembl REST
                        API, there are ~ 4000 of these that have HGNC
                        IDs.</div>
                      <div><br>
                      </div>
                      <div>Based on HGNC BioMart, there are ~ 3000 that
                        have HGNC IDs. The HGNC database states that it
                        uses mappings supplied by Ensembl.</div>
                      <div><br>
                      </div>
                      <div>The IDs mapped from each of these sources are
                        not always the same.</div>
                      <div><br>
                      </div>
                      <div>Questions:</div>
                      <div><br>
                      </div>
                      <div>- What is causing the different level of
                        coverage?</div>
                      <div>- What is causing the differences in specific
                        mappings if all of it is done by Ensembl?</div>
                      <div>- How often does this mapping change at any
                        of these sources?</div>
                      <div>- How do other sources like NCBI assign
                        Ensembl IDs to their Entrez IDs?</div>
                      <div>- What is the best way of getting HGNC IDs
                        for Ensembl IDs: from Ensembl or from the HGNC
                        database?</div>
                      <div><br>
                      </div>
                      <div>Thanks!<br clear="all">
                        <div><br>
                        </div>
                        -- <br>
                        <div dir="ltr">G.</div>
                      </div>
                    </div>
                  </div>
                </div>
                <pre>_______________________________________________
Dev mailing list    <a moz-do-not-send="true" href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a>
Posting guidelines and subscribe/unsubscribe info: <a moz-do-not-send="true" href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a>
Ensembl Blog: <a moz-do-not-send="true" href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a>
</pre>
              </blockquote>
              <br>
            </div>
            <br>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        <div dir="ltr">G.</div>
      </div>
    </blockquote>
    <br>
  </body>
</html>