User Story

Now that HTML has moved to Recommendation status, every place where our version info macros still indicate that it's a PR is out of date. Please trigger a rebuild of the subtree starting at https://developer.mozilla.org/en-US/docs/Web/HTML as soon as is practicable, to update all of these references.
Thanks!

Ok, I've talked to :robhudson in detail and have formed a few ideas, depending on how much work we want to invest in this.
Currently the management command schedules all rendering tasks immediately when run with the --defer command line option. That means those rendering tasks execute in parallel on all three workers, which guarantees a DDoS via the kumascript/kuma request/response cycle.
There are a few options we can implement to make sure that scheduling those tasks doesn't lead to a DDoS:
1. Extend the management command to use a Celery chain so that the rendering tasks are executed sequentially, wrapped in a chord with a callback that reports when the rendering is done
Pro: simple to develop
Con: will take as long as running the management command without the --defer CLI option, so probably very long
2. Use Celery's rate limiting to cap the task at a sensible number of executions per time unit per worker instance, and use a Celery task group to run the rendering tasks in parallel but under control, reducing the risk of overwhelming kuma/kumascript
Pro: improved scalability and potentially speed
Con: hard to guess how many tasks kuma/kumascript can take, so it's not clear what the rate limit should be; also hard to test outside of the prod environment due to missing dev/stage/prod parity
3. Use a separate Celery queue to isolate these tasks from other Celery tasks, with a hard limit on the number of tasks and/or processing time, a.k.a. a "global" rate limit
Pro: best isolation from other important Celery tasks such as email sending. Other tasks could profit from that as well, e.g. we could have a separate "render" queue just for rendering docs
Con: ops intensive, as it requires a separate Celery worker process setup, including in the dev environment
Obviously the main factor in choosing one of the options above (and they could easily be combined as well) is the constraint on developer time. That's outside my jurisdiction though ;)
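To get a feel for the trade-off in option 2, here is a rough back-of-the-envelope sketch in plain Python. It assumes rate strings in the Celery style ("10/m", "100/h", or a bare number meaning per second) and that the limit applies per worker instance, as described above. The document count and the rate are made-up numbers for illustration; only the three workers come from this bug.

```python
# Rough estimate of how long a rate-limited rebuild would take.
# The function names and all numeric inputs are hypothetical.

SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600}


def parse_rate(rate):
    """Convert a rate string such as '10/m' into tasks per second."""
    count, _, unit = rate.partition("/")
    # A bare number (no '/unit' suffix) is treated as tasks per second.
    return float(count) / SECONDS_PER_UNIT[unit or "s"]


def estimated_seconds(num_docs, rate, workers):
    """Best-case wall-clock time if every worker runs at its full rate."""
    return num_docs / (parse_rate(rate) * workers)


if __name__ == "__main__":
    seconds = estimated_seconds(3000, "10/m", workers=3)
    print("about %.0f minutes" % (seconds / 60))
```

With these made-up inputs (3000 documents at "10/m" on three workers) the best case comes to roughly 100 minutes. The real question from the Con above remains: nobody knows yet what rate kuma/kumascript can actually sustain.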

In a meeting this morning we decided to go with option 1 for now. Option 3 is attractive but we will save that for later if/when we add a self-service option for rebuilding subtrees.
In addition to option 1 we will add tasks in the chain that send emails to mdn-dev@ after each 20% of the tasks are processed and once again when everything is complete.
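As a minimal sketch of what that sequence might look like, the plain-Python function below builds the ordered list of steps: one render step per document, with a progress report inserted after each 20% and a final report at the end. The function name, the step tuples, and the example paths are all hypothetical, and it assumes the last report doubles as the 100% notice. In Celery, each step could then be turned into an immutable task signature (`.si()`) and linked with `chain`, but that wiring is not shown here.

```python
def build_plan(doc_paths, step_percent=20):
    """Return the ordered steps for a sequential rebuild with progress reports.

    Each step is either ("render", path) or ("notify", percent_done).
    """
    total = len(doc_paths)
    # 1-based positions after which a progress report is due (20%, 40%, ...).
    # Integer ceiling division avoids floating-point rounding surprises.
    checkpoints = {
        (total * pct + 99) // 100: pct
        for pct in range(step_percent, 100, step_percent)
    }
    plan = []
    for position, path in enumerate(doc_paths, start=1):
        plan.append(("render", path))
        # Skip a checkpoint that lands on the last document; the final
        # report below covers it.
        if position in checkpoints and position < total:
            plan.append(("notify", checkpoints[position]))
    if total:
        plan.append(("notify", 100))  # the "everything is complete" report
    return plan


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    docs = ["Web/HTML/doc-%d" % n for n in range(1, 11)]
    for step in build_plan(docs):
        print(step)
```

Because the steps form a single ordered list, only one render runs at a time, which is the point of option 1.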