<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Ah, I forgot there was another little problem with the <span style="font-family: Menlo; "><b>sequence_cds</b> </span> table - attached the SQL i ran. <div><br></div><div>Maybe it's worth editing the patch files … </div><div><br></div><div><br></div><div>Regards,</div><div> Jan </div><div><br><div><br></div><div><div><font face="Menlo">After fixing this, run the patches from the patch_67_68.sql file by hand. Next patch which fails:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> ALTER TABLE sequence_cds</font></div><div><font face="Menlo"> DROP PRIMARY KEY,</font></div><div><font face="Menlo"> DROP KEY member_id,</font></div><div><font face="Menlo"> DROP COLUMN sequence_cds_id,</font></div><div><font face="Menlo"> ADD PRIMARY KEY (member_id); </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> ERROR 1062 (23000): Duplicate entry '2012449' for key 'PRIMARY'</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Fix the sequence_cds table:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Check first if all is really duplicated:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> igisdev -s -D ensembl_compara_67 -e'select member_id from sequence_cds GROUP BY member_id HAVING COUNT(*) > 1' > member_ids_duplicated.txt</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> gawk '{ print "select member_id, length, sequence_cds from sequence_cds where member_id ="$1";" }' member_ids_duplicated.txt > member_ids.sql </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> igisdev -s -D ensembl_compara_67 < member_ids.sql > tmp.txt </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> wc -l tmp.txt </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> sort -u tmp.txt | wc -l </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Result : 5622 lines in the duplicated file, 2811 lines in the unique file => all entries are duplicates, so we can delete the</font></div><div><font face="Menlo">duplicated items.</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> select member_id , count(*) from sequence_cds GROUP BY member_id HAVING COUNT(*) > 1</font></div><div><font face="Menlo"> order by count(*) desc limit 3;</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> create table tmp_s_cds (sequence_cds_id int) ; </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> INSERT into tmp_s_cds </font></div><div><font face="Menlo"> SELECT sequence_cds_id </font></div><div><font face="Menlo"> FROM sequence_cds GROUP BY member_id HAVING COUNT(*) > 1;</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> select count(*) from tmp_s_cds ; </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Check: count(*) = 2811 , half of 5622 = OK !</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> DELETE FROM sequence_cds </font></div><div><font face="Menlo"> USING tmp_s_cds , sequence_cds </font></div><div><font face="Menlo"> WHERE tmp_s_cds.sequence_cds_id = sequence_cds.sequence_cds_id;</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Check:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> SELECT * from tmp_s_cds , sequence_cds</font></div><div><font face="Menlo"> WHERE tmp_s_cds.sequence_cds_id = sequence_cds.sequence_cds_id ; </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> DROP table tmp_seb_id; </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Now, try to apply the patch again:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> ALTER TABLE sequence_cds</font></div><div><font face="Menlo"> DROP PRIMARY KEY,</font></div><div><font face="Menlo"> DROP KEY member_id,</font></div><div><font face="Menlo"> DROP COLUMN sequence_cds_id,</font></div><div><font face="Menlo"> ADD PRIMARY KEY (member_id); </font></div><div><br></div><div><br></div><div><br></div><div><div>On Jan 23, 2014, at 4:56 PM, Jan Vogel <<a href="mailto:jan.vogel@gmail.com">jan.vogel@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html charset=windows-1252"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hello Matthieu et al, <div><br></div><div>thanks for the comments - I managed to patch the database, but used a slightly different SQL. </div><div><br></div><div>Also thanks to the compara team and ensembl to supply these patch files - we're computing our own gene sets, however we're patching all in-house databases up to the most recent schema so we can take advantage of all the bug fixes and improvements. </div><div><br></div><div>Regards, </div><div><br></div><div> Jan </div><div><br></div><div><br></div><div><br></div><div>Here's my log, if someone is interested : </div><div><br></div><div><div><font face="Menlo">Schema patch fails due to duplicated member_ids in the sequence_exon_bounded table:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> ALTER TABLE sequence_exon_bounded </font></div><div><font face="Menlo"> DROP PRIMARY KEY, DROP KEY member_id, </font></div><div><font face="Menlo"> DROP COLUMN sequence_exon_bounded_id, </font></div><div><font face="Menlo"> ADD PRIMARY KEY (member_id)</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">There are some member_id duplicates in the table ... let's fix this - details in IGIS</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Identify the max. number of duplications in the member_id group:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> SELECT sequence_exon_bounded_id , count(*)</font></div><div><font face="Menlo"> FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1</font></div><div><font face="Menlo"> order by count(*) desc limit 3;</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Create tmp table to store the sequence_exon_bounded_id identifiers for deletion:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> create table tmp_seb_id (sequence_exon_bounded_id int) ; </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> INSERT into tmp_seb_id </font></div><div><font face="Menlo"> SELECT sequence_exon_bounded_id </font></div><div><font face="Menlo"> FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1;</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> DELETE FROM seb </font></div><div><font face="Menlo"> USING tmp_seb_id tmp_seb, sequence_exon_bounded seb </font></div><div><font face="Menlo"> WHERE tmp_seb.sequence_exon_bounded_id = seb.sequence_exon_bounded_id;</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo">Check:</font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> SELECT * from tmp_seb_id tmp_seb, sequence_exon_bounded seb </font></div><div><font face="Menlo"> WHERE tmp_seb.sequence_exon_bounded_id = seb.sequence_exon_bounded_id ; </font></div><div><font face="Menlo"><br></font></div><div><font face="Menlo"> DROP table tmp_seb_id; </font></div></div><div><br></div><div><br></div><div><br><div><div>On Jan 17, 2014, at 9:49 AM, Matthieu Muffato <<a href="mailto:muffato@ebi.ac.uk">muffato@ebi.ac.uk</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Dear Jan,<br><br>There should be a single "exon bounded" sequence for each member_id and the API would anyway select only one of them.<br>You can remove the duplicated entries with this query:<br><br>DELETE FROM sequence_exon_bounded WHERE sequence_exon_bounded_id IN (SELECT sequence_exon_bounded_id FROM sequence_exon_bounded GROUP BY member_id HAVING COUNT(*) > 1);<br><br>You may have to run it several times if some member_ids have 3 or more sequences<br><br><br>Regarding Will's comment: Compara indeed only started distributing the patches as individual patches with the according meta keys a few releases ago.<br><br>Best,<br>Matthieu<br><br>On 17/01/14 10:08, Will Chow wrote:<br><blockquote type="cite">Hi Jan<br><br>I noticed warnings as well patching from 66->73 for compara for a few<br>tables ( I don't quite remember which ones ).<br><br>Of course this doesn't really help you or solve your issues, but since<br>our group doesn't use those tables I just ignored the patch warnings and<br>for our purposes, it seems to work. I do notice in the meta table the<br>patch keys for the updates are not there below e70 even though the<br>patches are in the sql directory.<br><br>again doesn't really help you I guess.<br><br>Will<br><br>On 17 Jan 2014, at 06:49, Jan Vogel <<a href="mailto:jan.vogel@gmail.com">jan.vogel@gmail.com</a><br><<a href="mailto:jan.vogel@gmail.com">mailto:jan.vogel@gmail.com</a>>> wrote:<br><br><blockquote type="cite"><br>Hi there,<br><br>i'm trying to patch our ensembl 67 compara database up to schema 74.<br><br>It fails :-( when applying the<br>*ensembl-compara/sql/patch_67_68.sql* patch as in the 67 schema there<br>are duplicate member ids in the sequence_exon_bounded table. Any idea<br>how to fix this and patch my e67 up to e68 ?<br><br>Thanks,<br><br> Jan Vogel<br><br><br>ERROR 1062 (23000) at line 28 in file:<br>'/gnet/resfiles3/resapps/apps/ensembl/ensembl-74/ensembl-compara/sql/patch_67_68.sql':<br>Duplicate entry '2012449' for key 'PRIMARY' :<br><br>--------------<br>SET session sql_mode='TRADITIONAL'<br>--------------<br>……<br>…..<br><br><br>--------------<br>ALTER TABLE sequence_exon_bounded<br>DROP PRIMARY KEY, DROP KEY member_id,<br>DROP COLUMN sequence_exon_bounded_id,<br>ADD PRIMARY KEY (member_id)<br>--------------<br><br>Failed to apply patch 'patch_67_68.sql' to database 'ensembl_compara_67'!<br>In non-interative mode; aborting current run at schema_patcher.pl line<br>643.<br><br><br>Looking into the sequence_exon_bounded table, it looks that some<br>member ids are duplicated:<br><br>mysql> select count(distinct(member_id)) from sequence_exon_bounded<br>limit 10 ;<br>+----------------------------+<br>| count(distinct(member_id)) |<br>+----------------------------+<br>| 1371152 |<br>+----------------------------+<br>1 row in set (10.55 sec)<br><br>mysql> select count(member_id) from sequence_exon_bounded limit 10 ;<br>+------------------+<br>| count(member_id) |<br>+------------------+<br>| 1373719 |<br>+------------------+<br><br><br><br>_______________________________________________<br>Dev mailing list <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a> <<a href="mailto:Dev@ensembl.org">mailto:Dev@ensembl.org</a>><br>Posting guidelines and subscribe/unsubscribe info:<br><a href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a><br>Ensembl Blog: <a href="http://www.ensembl.info/">http://www.ensembl.info/</a><br></blockquote><br><br><br>_______________________________________________<br>Dev mailing list <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a><br>Ensembl Blog: <a href="http://www.ensembl.info/">http://www.ensembl.info/</a><br><br></blockquote><br><br>-- <br>Matthieu Muffato, Ph.D.<br>Ensembl Developer and Ensembl Compara Manager<br>European Bioinformatics Institute (EMBL-EBI)<br>European Molecular Biology Laboratory<br>Wellcome Trust Genome Campus, Hinxton<br>Cambridge, CB10 1SD, United Kingdom<br>Room A3-145<br>Phone + 44 (0) 1223 49 4631<br>Fax + 44 (0) 1223 49 4468<br><br>_______________________________________________<br>Dev mailing list <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a><br>Ensembl Blog: <a href="http://www.ensembl.info/">http://www.ensembl.info/</a><br></blockquote></div><br></div></div></blockquote></div><br></div></div></body></html>