Mon, Dec 3

I can reproduce it as well. Received sizes and execution times are not consistent, ranging from a few hundred bytes to a couple of megabytes and a few seconds respectively. This, and more importantly the test done above by @fgiunchedi, indicates something going awry in the communication between varnish and swift.
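For anyone else trying to reproduce: a quick client-side check is to fetch the same object repeatedly and compare the reported size and timing. A rough sketch (the URL is a placeholder, not the actual affected object):

```shell
# Fetch the same object a few times and print downloaded size and total time.
# URL is a hypothetical placeholder; substitute the object served via varnish.
URL="https://upload.example.org/path/to/object"
for i in 1 2 3 4 5; do
  curl -s -o /dev/null -w '%{size_download} bytes in %{time_total}s\n' "$URL" || true
done
```

If the size column varies between runs for the same object, the truncation is happening somewhere between varnish and swift rather than on the client side.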

I am afraid we can't really change it. It's been at 06:25am (UTC in our case) forever and people expect that; changing it would break their current expectations. Note that this is true for all services and software, and it hasn't really caused an issue for a long time. So we should do a better job of surfacing and fixing the issues, not change the logrotate schedule.
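For reference, the 06:25 time comes from the stock Debian /etc/crontab entry that drives cron.daily (which in turn runs logrotate); the default entry looks roughly like this:

```
25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
```

That single line is why every daily job, logrotate included, fires at 06:25 local time (UTC on our hosts).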

Does this mean we have a hard deadline of 2019-04-01 for completing the migrations? Or, per the "I can backport security fixes for a while", do we have a couple more months? The current goal is that by July 2019 all scb services, restbase (and probably aqs as well), proton, and parsoid will be in kubernetes. That will leave turnilo and aphlict, I guess.

I've helped with the debugging. Starting from apertium, it was clear that something automated was POSTing a lot of requests to it. It turned out they were mostly for the rus|bel langpair, but that was a red herring, as it was just the snapshot in time I looked at. Moving from apertium to cxserver, it became clear that something was POSTing to the /v2/translate endpoint. The requests I noted were mostly for another language pair, ca|oc, but again that was a snapshot in time. Then a VM IP caught my eye, one belonging to wcdo.wcdo.eqiad.wmflabs. I jumped into said VM and stopped a process that was clearly heavily hitting the cxserver API.

Tue, Nov 20

I think we should support multiple tags per image (docker does support that anyway, and extra tags cost next to nothing at the registry level AFAIK)

Keep the ${timestamp}-production tag (+1 to @LarsWirzenius about splitting date and time, btw), as it's nice and monotonically increasing

+1

Add a tag based on the zuul.commit SHA1, but only if we can be reasonably sure that it's immutable. My memory fails me, but I remember some objection to this in the last meeting; does anyone remember the specifics?

I don't recall any specific objections, but I may be mis-remembering. Since we have this running as a postmerge job it should be fine, I think.

Possibly allow the developer to influence part of the process by supporting an extra image tag for git commits that are themselves tagged, allowing developers to implement SemVer (or any other kind of versioning scheme) if they so wish. That might or might not be the wisest decision on their part, but I think we should allow people to make that decision.

For now we could do something like git tag --points-at HEAD and just add a tag based on that. In the future we may want something fancier.
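A minimal sketch of that, assuming it runs in the repo checkout as part of the post-merge job (the image name and registry are hypothetical placeholders, and the docker commands are shown commented out since they need a built image to act on):

```shell
# If any git tag points at HEAD, derive an extra image tag from it.
IMAGE="docker-registry.example.org/myservice"            # hypothetical image name
GIT_TAG="$(git tag --points-at HEAD | head -n1)"          # first tag on HEAD, if any
if [ -n "$GIT_TAG" ]; then
  echo "would also tag ${IMAGE}:${GIT_TAG}"
  # docker tag "${IMAGE}:latest" "${IMAGE}:${GIT_TAG}"
  # docker push "${IMAGE}:${GIT_TAG}"
fi
```

Since a docker tag is just another reference to the same manifest, this would coexist with the ${timestamp}-production and SHA1 tags at essentially no extra registry cost.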

FWIW, I'll echo @Ladsgroup and @fgiunchedi. Having the data is obviously useful. Representing it in grafana, on the other hand, is probably not so practical. I also have my doubts as to whether a graph would help identify the culprits of load spikes, mostly due to the nature of the service, but I may be at fault here.

Nov 13 2018

After looking into it a little bit, packaging harbor would be challenging. Harbor is a set of microservices published as containers. The installation and dev guides refer to docker-compose as a hard requirement for running harbor components; in order to run it via docker-compose we would need to build the container images and host them in our own docker registry, which is something of a catch-22 (other people could instead rely on downloading them from DockerHub).

@Papaul I'd say ignore it. That system+disk shelf/array is scheduled for decommission, to be replaced with backup2001 (T196477). The data on it is a copy of the data from helium, so we aren't going to lose anything if more disks fail. There is no point in maintaining it. After talking with @MoritzMuehlenhoff on IRC, it seems we can do a fresh reinstall of backup2001/backup1001 next week with the new stretch point release, set up the service on them, and then decommission this one.

Xenon occupies a mostly constant amount of space, but that amount may double from 25G up to 50G. This will essentially chip away at the space reserved for XHGui, which I estimated (in the task description) as being able to fit the current and next 5 years of data. With this change in the estimate for Xenon, it would instead accommodate (100G / 2G per month ≈ 50 months) about 4 years.
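Back-of-the-envelope for the revised figure, using the numbers above (roughly 100G left for XHGui once Xenon doubles to 50G, and the ~2G/month growth rate from the task description):

```shell
# Assumed inputs: ~100G remaining for XHGui, growing at ~2G/month.
remaining_gb=100
growth_gb_per_month=2
months=$(( remaining_gb / growth_gb_per_month ))
echo "${months} months ~= $(( months / 12 )) years"   # 50 months ~= 4 years
```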

Sorry, reopening this one because one instance was missed (on account of being inaccessible when Cumin was run to find all trusty instances): compiler.puppet.eqiad.wmflabs - it's still not responding to ping or SSH. @akosiaris, apparently you set this up 4 years ago; any chance you can shed some light on why it's not responding despite having status "active" instead of "shutoff"?