[06:30:20] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3499871 (Marostegui)
[06:31:14] Blocked-on-schema-change, DBA: Convert unique keys into primary keys for some wiki tables on s1, s2, s4, s5 and s7 (eqiad) - https://phabricator.wikimedia.org/T164185#3499874 (Marostegui)
[06:31:16] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3411255 (Marostegui) Open>Resolved The last host, labsdb1010 is done: ``` root@labsdb1010:/home/marostegui# for i in `cat s1_tables`;do...
[06:31:28] DBA, Epic, Patch-For-Review, codfw-rollout: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099#3499877 (Marostegui)
[06:31:30] Blocked-on-schema-change, DBA: Convert unique keys into primary keys for some wiki tables on s1, s2, s4, s5 and s7 (eqiad) - https://phabricator.wikimedia.org/T164185#3224848 (Marostegui) Open>Resolved All done
[06:31:33] DBA, MediaWiki-Database, MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review, and 2 others: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3499878 (Marostegui)
[06:32:21] DBA, MediaWiki-Database, MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review, and 2 others: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3009878 (Marostegui)
[06:32:27] DBA: Convert unique keys into primary keys for some wiki tables on all s* shards (codfw) - https://phabricator.wikimedia.org/T164399#3499879 (Marostegui) Open>Resolved a:Marostegui
[06:34:12] DBA, MediaWiki-Database: Give user_properties a primary key - https://phabricator.wikimedia.org/T146570#3499882 (Marostegui) Open>Resolved a:Marostegui
[06:34:15] DBA, MediaWiki-Database, MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review, and 2 others: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3499886 (Marostegui)
[07:14:07] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3499893 (jcrespo)
[07:15:27] Blocked-on-schema-change, DBA: Convert unique keys into primary keys for some wiki tables on s1, s2, s4, s5 and s7 (eqiad) - https://phabricator.wikimedia.org/T164185#3499909 (jcrespo) Thank you, see T172485.
[07:15:51] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3499893 (jcrespo)
[07:19:27] DBA, MediaWiki-Database: Give user_properties a primary key - https://phabricator.wikimedia.org/T146570#3499912 (jcrespo) Resolved>Open Reopening, this has been done on WMF, but not on tables.sql (+mediawiki patch): https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tabl...
[07:19:36] DBA, MediaWiki-Database: Give user_properties a primary key - https://phabricator.wikimedia.org/T146570#3499917 (jcrespo) a:Marostegui>None
[07:28:37] Blocked-on-schema-change, DBA: Make user_newtalk.user_id an unsigned int on wmf databases - https://phabricator.wikimedia.org/T89737#1044134 (jcrespo) This is not a schema change to be discussed, as it has already been merged, retiring project.
[08:03:15] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500001 (Marostegui) a:Marostegui The new databases without the PKs are: ``` maiwikimedia kbpwiki dinwiki atjwiki ```
[08:04:35] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500007 (jcrespo) Didn't they add wikimania2018 wiki and other 2 yesterday?
[08:05:48] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500010 (Marostegui) >>! In T172485#3500007, @jcrespo wrote: > Didn't they add wikimania2018 wiki and other 2 yesterday? Yeah, I was double checking that too. Looks like I...
[08:11:28] grep eqiad s1.hosts | while read host port; do echo -n "$host: "; check_mariadb.py -h $host -P $port --shard=s1 --primary-dc=eqiad --icinga; done
[08:11:34] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500018 (Marostegui)
[08:11:50] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3499893 (Marostegui)
[08:14:00] nice! :)
[08:15:21] also, s4 has been getting worse, not better since 4am https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?panelId=4&fullscreen&orgId=1&from=now-12h&to=now
[08:15:28] (codfw, not eqiad)
[08:16:22] db1095 suffers similar patterns, so most likely due to topology
[08:18:13] I don't think it is related to HW no :(
[08:18:27] I mean, if we put a super great server, it will be gone of course
[08:18:31] (i guess)
[08:19:16] not necessarily
[08:20:24] at least the switchover for s4 was needed, we had to depool db2019 anyways
[08:20:37] what should we do with: https://phabricator.wikimedia.org/T170351 ?
[08:21:13] close it as it will be decommissioned
[08:21:18] it had BBU issues and it was old
[08:21:30] I think the ticket was right, it was just not the core issue
[08:21:43] but it has to be done anyway
[08:21:46] indeed
[08:21:47] *had
[08:21:59] I hoped that with a larger server
[08:22:06] the issues would be at least minimized
[08:22:13] if not fixed
[08:22:36] DBA, Patch-For-Review: db2019 has performance issues, replace disk or switchover s4 master elsewhere - https://phabricator.wikimedia.org/T170351#3500028 (Marostegui) Open>Resolved Resolving this, this host will be decommissioned once we have finished with: T162593
[08:30:17] DBA, Analytics-EventLogging, Analytics-Kanban, Community-Tech, User-Elukey: Drop CookieBlock* tables from EventLogging DB - https://phabricator.wikimedia.org/T171883#3500047 (Marostegui) Anything pending here?
[08:30:28] DBA, monitoring: Monitor read_only variable and/or uptime on database masters, make it page - https://phabricator.wikimedia.org/T172489#3500048 (jcrespo)
[08:31:14] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3500062 (elukey)
[08:31:17] DBA, Analytics-EventLogging, Analytics-Kanban, Community-Tech, User-Elukey: Drop CookieBlock* tables from EventLogging DB - https://phabricator.wikimedia.org/T171883#3500060 (elukey) Open>Resolved a:elukey
[08:33:10] DBA, monitoring: Monitor read_only variable and/or uptime on database masters, make it page - https://phabricator.wikimedia.org/T172489#3500048 (jcrespo)
[08:34:55] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500078 (Marostegui)
[08:36:45] DBA, monitoring: Monitor swap/memory usage on databases - https://phabricator.wikimedia.org/T172490#3500081 (jcrespo)
[08:37:02] DBA, monitoring: Monitor swap/memory usage on databases - https://phabricator.wikimedia.org/T172490#3500093 (jcrespo)
[08:41:28] DBA, monitoring: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492#3500113 (jcrespo)
[08:41:50] DBA, monitoring: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492#3500113 (jcrespo) p:Triage>Normal
[08:43:50] DBA, monitoring: Monitor swap/memory usage on databases - https://phabricator.wikimedia.org/T172490#3500142 (jcrespo)
[08:43:52] DBA, monitoring: Monitor read_only variable and/or uptime on database masters, make it page - https://phabricator.wikimedia.org/T172489#3500143 (jcrespo)
[08:43:55] DBA, Operations, Patch-For-Review: Better mysql monitoring for number of connections and processlist strange patterns - https://phabricator.wikimedia.org/T112473#3500144 (jcrespo)
[08:43:57] DBA, monitoring: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492#3500141 (jcrespo)
[08:47:50] DBA, MediaWiki-extensions-ClickTracking, Operations: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3500151 (jcrespo) I made a comment somewhere (cannot find where) saying this may aff...
[08:53:30] DBA: Fix m1 replication icinga checks - https://phabricator.wikimedia.org/T133062#3500166 (jcrespo) Open>declined I am going to decline this- something has to be fixed, but most likely T172492 will happen first, or those hosts will be decommissioned first.
[08:55:29] DBA, DC-Ops, Packaging: Change oom_adj for dedicated mysql server processes - https://phabricator.wikimedia.org/T172494#3500171 (jcrespo)
[08:55:56] DBA, DC-Ops, Packaging: Change oom_adj for dedicated mysql server processes - https://phabricator.wikimedia.org/T172494#3500184 (jcrespo) p:Triage>Low Low, but it will probably be done at some point (on the next mariadb package)
[09:01:53] DBA, Cloud-Services: Prepare and check storage layer for wikimania2018wiki - https://phabricator.wikimedia.org/T155041#3500190 (Marostegui) a:Marostegui I have sanitized all the hosts and ran a check_private data there. I have also registered myself and checked that on the labs hosts my user has been...
[09:07:45] DBA: Change pt-heartbeat model to not use super-user, avoid SPOF and switch automatically to the real master without puppet dependency - https://phabricator.wikimedia.org/T172497#3500230 (jcrespo)
[09:07:50] DBA, MediaWiki-extensions-ClickTracking, Operations: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#3500242 (Marostegui) >>! In T115982#3500151, @jcrespo wrote: > I made a comment some...
[09:16:49] DBA, Patch-For-Review: Finish dbstore2002 migration to multi-instance - https://phabricator.wikimedia.org/T171321#3500250 (Marostegui) s4 is now replicating with gtid
[09:17:07] DBA, Patch-For-Review: Finish dbstore2002 migration to multi-instance - https://phabricator.wikimedia.org/T171321#3500251 (Marostegui)
[09:17:43] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500252 (Marostegui)
[09:20:21] marostegui: I am killing your screen for CREATE INDEX page_idx ON monthly_wp10_enwiki on labsdbs
[09:20:40] jynus: good, thanks for leaving so many screens behind :)
[09:20:55] i actually did a massive clean up on neodymium the other day
[09:21:06] killed like 10 screens I had there :)
[09:22:22] "thanks for leaving so many screens behind"?
[09:22:33] you mean for killing them?
[09:22:34] sorry
[09:22:35] XDDD
[09:22:40] "sorry for leaving..."
[09:22:41] XDDD
[09:22:42] or sorry
[09:22:44] :-)
[09:23:02] I wanted to THANK you and apologise myself
[09:23:18] can I blame Friday? :)
[09:26:30] marostegui: https://imgur.com/r/4chan/3YdJs
[09:27:07] hahahahahaha
[09:27:12] where do you find those things XD
[09:32:17] DBA, Operations, Puppet: Switch databases to the future parser - https://phabricator.wikimedia.org/T172498#3500273 (jcrespo)
[09:34:39] I think you may need to update the password for nagios, or just make it unix_socket
[09:34:55] for s4 on dbstore2002
[09:35:13] I didn't see the puppet patch, BTW
[09:35:58] it must have been lost in the 200 other mails I got
[09:36:05] thank you
[09:39:34] I can do it, but I will wait for your ok
[09:39:40] Sure
[09:39:43] I can do it too :)
[09:39:48] ok, doing it
[09:39:51] myself
[09:39:51] thanks!
[09:39:59] when I ask you things
[09:40:04] it is not to tell you to do it
[09:40:07] :-)
[09:40:22] I just want your ok to not break things
[09:40:29] by doing it at the same time
[09:41:21] hehe yeah, better
[09:41:34] so you are updating its password to the same one the other running shards have?
[09:41:39] (so I can know for the future imports)
[09:41:54] In most cases I am using the socket, if that is possible
[09:42:06] I plan to do that for all hosts
[09:42:19] makes total sense
[09:42:22] it just happens that 10.1 are broken with this specific issue
[09:42:32] I hope you know why
[09:42:41] without being said in public
[09:43:04] I think no pass will be much safer
[09:43:18] so both prometheus and nagios should use that model
[09:43:41] i was checking 3312 on dbstore2002 and i see it is using password, am I looking at the wrong thing?
[09:44:03] maybe I only changed it for prometheus :-)
[09:44:07] aaah
[09:44:21] but at least you know my intentions
[09:44:29] for other things, check the actual servers
[09:45:03] the important thing is that 10.1 may break some accounts
[09:45:11] until they are updated
[09:45:33] I saw that on db2062 and db2072
[09:45:48] that is why I wanted you to see that for yourself, and the systemd stuff
[09:46:12] yeah, it took me 3 minutes to start mysql on dbstore2002 just making sure I was not doing something stupid :p
[09:48:25] was s4 compressed or as is?
[09:48:43] asking mainly if you got duplicate key problems
[09:48:57] compressed, and I didn't get any :)
[09:49:10] did you compress it, or just copy from somewhere else
[09:49:25] I got it from one of the new hosts (db2073)
[09:49:40] ok, and where did db2073 get the compression :-)
[09:49:46] :-P
[09:49:51] db2073 got copied from...
[09:49:52] let me check
[09:50:07] db2065
[09:50:09] I don't care about the source, just whether you recently compressed s4
[09:50:19] yeah, like 2 days ago :)
[09:50:27] maybe only s1 got duplicate key errors
[09:50:41] because its files date to a long time ago
[09:51:05] older file format, date columns or internal innodb structure
[09:51:20] I am compressing s5 now
[09:51:24] let me check how is it going so far
[09:51:32] yeah, double check for errors
[09:51:56] no errors on dewiki
[09:52:01] cool
[09:52:04] and it is now doing wikidata
[09:52:05] so far so good
[09:52:11] we will see how it finishes
[09:52:57] are you sure your compression of s4 was successful?
[09:53:05] linter | InnoDB | 10 | Compact
[09:53:13] which was one of the ones that failed for me
[09:53:22] page | InnoDB | 10 | Compact
[09:53:35] i didn't see anything on the logs :|
[09:53:37] watchlist | InnoDB | 10 | Compact
[09:53:57] did those fail for you too?
[09:53:58] I am going to bet the same tables
[09:54:04] failed for you as for me
[09:54:18] did they fail in s4 too?
[09:54:29] yep, the above is s4 on dbstore2002
[09:54:34] commonswiki
[09:54:51] I feel bad as I am always pointing out errors to you
[09:55:08] no no, it is good, i missed those then
[09:55:18] let me try a re-import
[09:55:22] of those tables
[09:55:23] I care about the results, I made the same mistakes
[09:55:38] so I just point them to you because I suffered from those before
[09:56:03] can i stop replication on dbstore2002?
[09:56:08] sure
[09:56:12] ok, give me a sec
[09:56:23] i wonder why i missed those in the logs
[09:56:27] I will double check why
[09:56:34] note I do not care much about the tables themselves
[09:56:48] the difference in size is not going to be too large
[09:56:58] but more about a) detecting things failing
[09:57:17] b) carrying potential internal corruption or something
[09:57:30] I can give you the one liner I used
[09:57:39] let me search
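A minimal sketch of the "detecting things failing" point above, not taken from the log: it filters rows shaped like the ones pasted earlier (name, engine, version, row format, separated by `|`) and prints every table whose row format is not `Compressed`. The function name `uncompressed_tables` and the assumption that the rows arrive as pipe-delimited text on stdin are illustrative only.

```shell
# Hypothetical helper: list tables whose row format is not "Compressed".
# Input (stdin): lines like "linter | InnoDB | 10 | Compact".
uncompressed_tables() {
    awk -F'|' '
        NF >= 4 {
            # Trim surrounding whitespace from the table name and row format.
            gsub(/^[ \t]+|[ \t]+$/, "", $1)
            gsub(/^[ \t]+|[ \t]+$/, "", $4)
            if ($4 != "Compressed") print $1
        }
    '
}

# Example with the three rows pasted in the conversation.
printf '%s\n' \
    'linter | InnoDB | 10 | Compact' \
    'page | InnoDB | 10 | Compact' \
    'watchlist | InnoDB | 10 | Compact' | uncompressed_tables
```

Running such a filter after each compression pass would flag tables that silently stayed in their old format, which is the situation described for s4 here.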
[10:00:12] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500412 (Marostegui)
[10:00:40] I don't have them, it got lost in the screen
[10:00:45] haha no worries
[10:00:56] I am going to reimport linter table (the smallest one) to see what happens
[10:01:13] but it was just mysqldump | pigz and then pv | mysql
[10:01:22] yeah
[10:01:26] and I import it as compressed
[10:01:48] but you said it was maybe faster another way, so do as you see best
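The "mysqldump | pigz and then pv | mysql" flow mentioned above can be sketched as a dry run. This is only an illustration: the function name `print_reimport_plan`, the dump file naming and the `--single-transaction` flag are assumptions, and the step that makes the import "compressed" is not described in the conversation, so it is left out. The helper only prints the two pipelines; nothing is executed.

```shell
# Hypothetical dry-run helper: prints the export and import pipelines
# for one table instead of running them. Names are placeholders.
print_reimport_plan() {
    plan_db=$1
    plan_table=$2
    plan_dump="${plan_db}.${plan_table}.sql.gz"
    # Export: dump the table and compress it with pigz.
    echo "mysqldump --single-transaction ${plan_db} ${plan_table} | pigz -c > ${plan_dump}"
    # Import: show progress with pv, decompress, feed to the mysql client.
    echo "pv ${plan_dump} | pigz -dc | mysql ${plan_db}"
}

print_reimport_plan commonswiki linter
```

Printing the commands first makes it easy to review them before replication is stopped on the target host.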
[10:02:06] there is definitely a pattern
[10:02:11] on the tables
[10:02:16] Yeah, if that works I will fix db2073 and its source
[10:02:18] same ones failing
[10:02:27] s5 hasn't failed
[10:02:32] interesting
[10:02:36] well, dewiki
[10:02:40] wikidata still running
[10:02:51] so it could be some column type or something
[10:02:54] and on s3 I didn't touch those tables, only the most important ones
[10:03:23] but for example, linter has been already done on dewiki and wikidatawiki
[10:03:26] with no issues
[10:14:49] GRANT USAGE ON *.* TO 'nagios'@'localhost' IDENTIFIED VIA 'unix_socket'; done on dbstore2002:s4
[10:14:56] \o/
[10:15:07] that is the plan everywhere, but I will go slowly
[10:15:20] s2 and s1 might need it too
[10:15:22] in dbstore2002
[10:15:33] I can do that
[10:15:46] and then delete all other accounts
[10:15:48] sure
[10:15:54] thanks
[11:08:17] linter table had no issues after reimporting, so I am fixing page and watchlist
[11:14:22] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500531 (Marostegui)
[11:30:20] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500562 (Marostegui)
[11:46:06] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500590 (Marostegui)
[12:07:46] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500655 (Marostegui)
[12:11:37] jynus: https://phabricator.wikimedia.org/T171027#3451391
[12:11:44] When did the query limiters start being imposed?
[12:25:49] "Read timeout is reached" seems that it hit hhvm limits first
[12:26:26] server side query killer would have gotten a "connection was lost", maybe?
[12:26:44] in any case, yes, that is possible (>60s query)
[12:37:02] Reedy: I have commented with hopefully useful information
[12:37:31] heh, thanks
[12:39:01] It took 22.57 seconds cold, 3 seconds hot, so that probably means it is timing out in HTTP, and not my query killer
[12:39:05] not that it helps
[12:39:15] but to understand what fails
[12:49:16] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500732 (Marostegui) Next step would be to review and merge: https://gerrit.wikimedia.org/r/#/c/370190/ I have never done any mediawiki core deployment so I would appreciat...
[12:50:35] DBA, MediaWiki-Database, MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review, and 2 others: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3500739 (Marostegui) I would like someone to help getting t...
[12:59:25] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3500757 (Reedy) >>! In T172485#3500732, @Marostegui wrote: > Next step would be to review and merge: https://gerrit.wikimedia.org/r/#/c/370190/ > I have never done any medi...
[13:20:43] DBA, Epic, Patch-For-Review, codfw-rollout: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099#3500842 (Marostegui)
[13:20:46] DBA: Convert unique keys into primary keys for some wiki tables on s3 (both eqiad and codfw) - https://phabricator.wikimedia.org/T172485#3499893 (Marostegui) Open>Resolved I am going to close this task as the DB part are done and we are going to discuss the tables.sql merge at: T172514
[15:23:35] DBA, monitoring: Monitor read_only variable and/or uptime on database masters, make it page - https://phabricator.wikimedia.org/T172489#3501437 (jcrespo) Copied from T171928: > ``` > $ check_mariadb.py -h db1052 --slave-status --primary-dc=eqiad > {"datetime": 1501777331.898183, "ssl_expiration": 161...
[17:01:33] https://gerrit.wikimedia.org/r/#/c/370190/4 is complete for mysql at least
[17:02:52] thanks
[17:03:12] I would like later to delete the duplicate indexes , too
[17:03:24] on a following patch
[17:04:15] I'm just looking for the USE INDEX/IGNORE INDEX conflicts...
[17:04:33] I think we may need to change use into ignore
[17:04:50] because there will be a time in which only one exists
[17:05:14] so if we do USE(X) -> USE(PRIMARY), something will break
[17:06:20] ApiQueryLinks.php (1 usage found)
[17:06:20] 140 $this->addOption( 'USE INDEX', $this->prefix . '_from' );
[17:06:24] or are you thinking of the rename index or index exists?
[17:06:25] I bet that's the only one we need to fix here
[17:06:50] there was a log of errors we get when we deployed to prod as a test
[17:06:55] *got
[17:06:57] somewhere
[17:07:04] on the ticket of PKs
[17:07:38] will give it a look next week
[17:09:15] It would be nice if we can fix the USE/FORCE in the same patch, and actually drop the two extra indexes too
[17:09:19] Rather than a followup
[17:09:48] Anyway. I've got the farce of making SQLite pass so the unit tests don't break :)
[17:09:49] yeah, but it is ok if it at least matches production
[17:10:13] yeah, indeed
[17:10:22] It's not the end of the world for sure
[17:10:31] those *links duplicate indexes are very large
[17:10:37] We can modify the patches, and add extra drop patches
[17:10:45] So anyone in future doesn't need to do two alters
[17:10:47] or add another patch
[17:10:53] true
[17:10:53] And anyone with partial drops, can just get them dropped
[17:11:07] I don't think much in terms of external users
[17:11:13] you terrible person you :)