<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Dear Hervé,<br><br>I'm writing to you as a member of the Genome Reference Consortium who also happens to be on ensembl-dev.<br><br>I think there's a misunderstanding here, since you are comparing the coordinates of a genome issue report (HG-79) with the coordinates of the patch that was applied to fix it (HG79_PATCH). These are two closely related but still distinct entities.<br><br>Genome issue HG-79 was reported on two neighbouring clones that didn't represent an existing haplotype, and their coordinates are documented at <a href="http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-79">http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-79</a>. The link on this page to Ensembl takes you to the exact location of the reported issue.<br><br>This issue was then investigated, and a stretch of genomic sequence (HG79_PATCH) was released to fix it. The coordinates at which this fix patch aligns to the GRCh37 reference genome are reported in <a href="ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p5/PATCHES/alt_scaffolds/alt_scaffold_placement.txt">ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p5/PATCHES/alt_scaffolds/alt_scaffold_placement.txt</a>. <br><br>I take your point, though, that it would make sense to also report the details of the applied fix on the genome issue page. 
This is currently being implemented.<br><br>I hope this helps,<br><br>Kerstin<br><div><div><br></div><div><br></div><div><br></div><div>On 15 Nov 2011, at 20:02, Hervé Pagès wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>Thanks Susan for your very detailed answer!<br><br>I guess my confusion came from the fact that this page at NCBI<br><br><a href="http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-79">http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-79</a> <br><br>suggests that HG79_PATCH is mapped to 136,049,443-136,317,858<br>on chr9, which doesn't seem to be correct.<br><br>So assuming that the correct information is provided by<br><br><a href="ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p5/PATCHES/alt_scaffolds/alt_scaffold_placement.txt">ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p5/PATCHES/alt_scaffolds/alt_scaffold_placement.txt</a><br><br>then everything makes sense!<br><br>Also thanks for providing the URL for displaying HG79_PATCH in the<br>Ensembl browser. Your URL seems to be more appropriate than the URL<br>provided on<br><br>http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-79<br><br>The URL provided here is misleading.<br><br>Cheers,<br>H.<br><br><br>On 11-11-15 02:33 AM, Susan Fairley wrote:<br><blockquote type="cite">Hi,<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">As you mentioned, for each of the GRC assembly patches in Ensembl there<br></blockquote><blockquote type="cite">are two entries in the seq_region table. As you also noted, one of these<br></blockquote><blockquote type="cite">corresponds to a supercontig and the other a chromosome.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The supercontig is the patch scaffold/supercontig released by GRC. 
The<br></blockquote><blockquote type="cite">length corresponds to the length of the patch sequence. For HG79_PATCH,<br></blockquote><blockquote type="cite">this is 330,164 bases.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The chromosome is the patched version of the chromosome, i.e. what you<br></blockquote><blockquote type="cite">would get if you incorporated the patch into the chromosome it is<br></blockquote><blockquote type="cite">located on.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">As you say, HG79_PATCH is located on chromosome 9 (this can be seen in<br></blockquote><blockquote type="cite">the assembly_exception table as well).<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The original chromosome 9 has a length of 141,213,431 bases.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Looking in the assembly exception table or this file, provided by GRC,<br></blockquote><blockquote type="cite">ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p5/PATCHES/alt_scaffolds/alt_scaffold_placement.txt<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">ensro@ens-livemirror : homo_sapiens_core_64_37 >select seq_region.name,<br></blockquote><blockquote type="cite">seq_region_start, seq_region_end, exc_seq_region_start,<br></blockquote><blockquote type="cite">exc_seq_region_end from seq_region, assembly_exception where<br></blockquote><blockquote type="cite">name='HG79_PATCH' and<br></blockquote><blockquote type="cite">seq_region.seq_region_id=assembly_exception.seq_region_id;<br></blockquote><blockquote type="cite">+------------+------------------+----------------+----------------------+--------------------+<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">| name | seq_region_start | 
seq_region_end | exc_seq_region_start |<br></blockquote><blockquote type="cite">exc_seq_region_end |<br></blockquote><blockquote type="cite">+------------+------------------+----------------+----------------------+--------------------+<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">| HG79_PATCH | 136049442 | 136379605 | 136049442 | 136369192 |<br></blockquote><blockquote type="cite">+------------+------------------+----------------+----------------------+--------------------+<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">1 row in set (0.00 sec)<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">we can see that the patch replaces the region of chromosome 9 from<br></blockquote><blockquote type="cite">136,049,442 to 136,369,192.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Consequently, a region of length 319,751 bases is removed from<br></blockquote><blockquote type="cite">chromosome 9 (136,369,192 - 136,049,441 = 319,751) and replaced with the<br></blockquote><blockquote type="cite">patch.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">So, chromosome 9 (141,213,431) has 319,751 bases removed (giving<br></blockquote><blockquote type="cite">140,893,680 bases) and then 330,164 bases added, giving 141,223,844 as<br></blockquote><blockquote type="cite">the length of the patched chromosome. To differentiate the patched<br></blockquote><blockquote type="cite">chromosome from the primary version of chromosome 9 it is given the same<br></blockquote><blockquote type="cite">name as the patch.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">In this case, the patched version of the chromosome is 10,413 bases<br></blockquote><blockquote type="cite">longer because the patch is 10,413 bases longer than the region it<br></blockquote><blockquote type="cite">replaces. 
Patches can, however, also be shorter or the same length as<br></blockquote><blockquote type="cite">the regions they replace.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Also, it is worth noting that not all patches have names ending in<br></blockquote><blockquote type="cite">'_PATCH'. As you will see in the GRC file above, patches exist with<br></blockquote><blockquote type="cite">names like HSCHR3_1_CTG2_1.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">To identify the patches in the database, you could use the following query:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">homo_sapiens_core_64_37 >select count(*) from attrib_type,<br></blockquote><blockquote type="cite">seq_region_attrib, seq_region where attrib_type.code in<br></blockquote><blockquote type="cite">('patch_fix','patch_novel') and attrib_type.attrib_type_id =<br></blockquote><blockquote type="cite">seq_region_attrib.attrib_type_id and seq_region_attrib.seq_region_id =<br></blockquote><blockquote type="cite">seq_region.seq_region_id ;<br></blockquote><blockquote type="cite">+----------+<br></blockquote><blockquote type="cite">| count(*) |<br></blockquote><blockquote type="cite">+----------+<br></blockquote><blockquote type="cite">| 105 |<br></blockquote><blockquote type="cite">+----------+<br></blockquote><blockquote type="cite">1 row in set (0.00 sec)<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">This identifies both the novel and the fix patches, indicated by the<br></blockquote><blockquote type="cite">attrib_types with codes 'patch_fix' and 'patch_novel'.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Gene annotation is stored on the patch regions of the patched<br></blockquote><blockquote type="cite">chromosomes. 
Currently, the annotation strategy for the patches combines<br></blockquote><blockquote type="cite">three sets of annotation. Manual annotation from the Havana group is<br></blockquote><blockquote type="cite">given precedence and can be found in the core database. It is<br></blockquote><blockquote type="cite">supplemented, however, first with annotation projected from primary<br></blockquote><blockquote type="cite">assembly and subsequently with annotation built based on the alignment<br></blockquote><blockquote type="cite">of evidence to the patches. These second and third categories have<br></blockquote><blockquote type="cite">stable_ids starting with ASMPATCH and are stored in the otherfeatures<br></blockquote><blockquote type="cite">database.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The annotation for HG79_PATCH is visible in the browser here:<br></blockquote><blockquote type="cite">http://www.ensembl.org/Homo_sapiens/Location/View?db=core;h=Q4SWC8.1%20%28802-855%29;r=HG79_PATCH:136049442-136379605<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">I hope this is of help.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Kind regards,<br></blockquote><blockquote type="cite">Susan.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Hervé Pagès wrote:<br></blockquote><blockquote type="cite"><blockquote type="cite">Hi,<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Related to this thread from Oct 2010:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">http://lists.ensembl.org/pipermail/dev/2010-October/000304.html<br></blockquote></blockquote><blockquote type="cite"><blockquote 
type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">In Ensembl release 64 (and maybe in previous releases, I didn't<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">check), the 'seq_region' table for homo sapiens<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">ftp://ftp.ensembl.org/pub/release-64/mysql/homo_sapiens_core_64_37/seq_region.txt.gz<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">contains entries for some of the "patch" sequences that belong to<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">GRCh37.p5. Those "patch" sequences are named with the _PATCH suffix<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">(e.g. HG7_PATCH, HG79_PATCH, HG506_HG1000_1_PATCH, etc...),<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">and each "patch" has 2 entries in the table. 
For example, here are<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">the 2 rows for HG79_PATCH:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">seq_region_id name coord_system_id length<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">100965615 HG79_PATCH 2 141223844<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">1000157396 HG79_PATCH 3 330164<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">According to the 'coord_system' table, coord_system_id 2 and 3<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">correspond to "chromosome" and "supercontig", respectively.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">So one possible interpretation could be that the 2 lengths<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">reported for HG79_PATCH are (1) the length of the chromosome<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">that this patch belongs to, and (2) the length of the patched<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">region.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">However, according to this page<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-79<br></blockquote></blockquote><blockquote type="cite"><blockquote 
type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">HG79_PATCH belongs to chr9 and is mapped to region<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">136049443 - 136317858. So it's mapped to a region of length<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">268416, but the 2nd length reported for the patch is 330164.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">That seems to confirm what the OP reported in the above thread<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">i.e. that the patch is replacing a region in the reference genome<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">by a larger region.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Also, according to this page<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/index.shtml<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">the length of chr9 is 141213431. 
But the first length reported<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">for HG79_PATCH in the 'seq_region' table is 141223844, which is<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">the length of chr9 + 10413.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Where are those 10413 extra nucleotides coming from? Could it be<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">that this first length reported for HG79_PATCH is the length of<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">chr9 *after* its alteration by the patch? But that doesn't seem<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">to be the case either since this alteration would add 61748 bases<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">to chr9 (330164 - 268416).<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">So my question is: what are those 2 lengths reported for HG79_PATCH,<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">and for the "patch" sequences in general?<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Thanks in advance for any clarification.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Cheers,<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">H.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote 
type="cite"><br></blockquote><br><br>-- <br>Hervé Pagès<br><br>Program in Computational Biology<br>Division of Public Health Sciences<br>Fred Hutchinson Cancer Research Center<br>1100 Fairview Ave. N, M1-B514<br>P.O. Box 19024<br>Seattle, WA 98109-1024<br><br>E-mail: hpages@fhcrc.org<br>Phone:  (206) 667-5791<br>Fax:    (206) 667-1319<br><br>_______________________________________________<br>Dev mailing list    Dev@ensembl.org<br>List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev<br>Ensembl Blog: http://www.ensembl.info/<br></div></blockquote></div><br><div>
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>--</div><div>Dr. Kerstin Howe</div><div>Senior Scientific Manager</div><div>Genome Reference Informatics (Team 135)</div><div><a href="mailto:kerstin@sanger.ac.uk">kerstin@sanger.ac.uk</a></div><div><br></div><div>Wellcome Trust Sanger Institute</div><div>Hinxton, Cambridge CB10 1SA, UK</div><div><br></div></div></span><br class="Apple-interchange-newline"></span><br class="Apple-interchange-newline">
</div>
<br></body></html>