Created attachment 573034
bad bad code
My local (unreviewed) code to migrate from one release to the next is a piece of shite. Notably, the output looks like:
migrating 46 40
action count 736
migrating 37 14
action count 918
migrating 21 13
action count 994
migrating 14 10
action count 1042
... meaning that, once I understood why the code said what it said, I probably have thousands of Action entries in the db that have absolutely no right to exist. The fix, in the attached file for reference, is likely
somap = {}
for app_code, app in app4code.iteritems():
vs
for app_code, app in app4code.iteritems():
somap = {}
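For illustration, here is a minimal sketch of why the placement of that one line matters. The names and data shapes (`migrate_actions`, a dict of apps with action lists, a `clone` callable) are hypothetical stand-ins, not the actual migration code: `somap` caches source objects that have already been migrated, so resetting it inside the loop forgets the cache on every app and the same Action gets cloned over and over.

```python
def migrate_actions(app4code, clone):
    """Hypothetical sketch of the migration loop.

    somap maps a source action id to its migrated clone. Declared
    once, before the loop -- the whole fix -- so an action already
    migrated for one app is never cloned again for another.
    """
    somap = {}  # the fix: one shared cache for the entire run
    created = 0
    for app_code, app in app4code.items():  # .iteritems() on Python 2
        # buggy version reset somap = {} here, losing the cache per app
        for action in app["actions"]:
            if action["id"] not in somap:
                somap[action["id"]] = clone(action)
                created += 1
    return somap, created
```

With `somap = {}` inside the loop, every app that shares an action re-creates it, which is exactly how the "action count" climbs by hundreds per migration step.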
I don't deserve any help in cleaning this up, but I'd take it anyway. How god-freakin awful, and how shameful that I haven't looked at what this code does every time I ran it so far. I did complain to myself about it taking long, of course. Bad Axel, corner, stupid hat.
It'd be really good to have the data cleaned up before I try to merge all that bad data into data1.1, too. Duplicate data is one thing, but up to three copies of the same entry is bad.

I'm still not grasping what's going on due to lack of goals and context.
However, clearly whatever code we're talking about needs to be folded in nicely and in a structured way so we can document it and unit test it.
I don't know how to actually execute this, but I'd love to see it as a management command where the meat of the command is distinctly separated from the handling of argument options. Then we can write tests that pin down what's going on, and we can assert that dupes aren't created.
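To make the idea concrete, here is a sketch of that separation using plain `argparse` so it stands alone; in Django the same shape maps onto `BaseCommand.add_arguments()` and `handle()`. All names (`dedupe_actions`, the tuple representation of an Action) are hypothetical. The point is that the meat takes plain values and is unit-testable without going through option parsing.

```python
import argparse

def dedupe_actions(actions, dry_run=False):
    """The 'meat': return the actions with duplicates dropped.

    `actions` is a list of (id, when) tuples here, a stand-in for
    real Action rows; a test can call this directly and assert that
    no dupes survive."""
    seen = set()
    kept = []
    for action in actions:
        if action not in seen:
            seen.add(action)
            kept.append(action)
    return kept

def make_parser():
    """Argument handling, kept apart from the logic above."""
    parser = argparse.ArgumentParser(description="delete duplicate actions")
    parser.add_argument("--dry-run", action="store_true")
    return parser

def main(argv=None):
    options = make_parser().parse_args(argv)
    sample = [(1, "2012-01-01"), (1, "2012-01-01"), (2, "2012-01-02")]
    return dedupe_actions(sample, dry_run=options.dry_run)
```

A test then only needs `dedupe_actions(...)`, never the parser, which is exactly the split that makes "assert that dupes aren't created" cheap to write.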
Honestly don't know where to begin in terms of helping out here.

Created attachment 587290 (diff, review)
add a command to go through the duplicate actions, and delete all but one
This command goes through all Actions in chunks of roughly 100 and deletes all obsolete/duplicate actions.
I tested this locally against a db dump.
A word on the chunking:
The cloning copies over the 'when', so all duplicates of an action are guaranteed to share the same timestamp. So, to avoid having duplicate actions land in different chunks, I chunk by 'when': I first run a query to find a cut timestamp that yields roughly equal-sized chunks, and then take all actions within that timeframe.
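The invariant above (duplicates share a 'when', so chunk boundaries must fall on timestamps, never between rows with the same timestamp) can be sketched without a database. Everything here is hypothetical illustration, not the patch itself; actions are (id, when) tuples sorted by 'when'.

```python
from itertools import groupby

def chunks_by_when(actions, target=100):
    """Yield chunks of ~target actions, never splitting a timestamp.

    `actions` must be sorted by their 'when' (second tuple element).
    Because whole timestamp-groups are appended at once, two rows
    with the same 'when' -- i.e. potential duplicates -- always end
    up in the same chunk."""
    chunk = []
    for when, group in groupby(actions, key=lambda a: a[1]):
        chunk.extend(group)          # take the full timestamp group
        if len(chunk) >= target:     # chunk may overshoot target a bit
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

Chunks can run somewhat over the target since a timestamp group is taken whole, which is the price of keeping all duplicates of an action inside one chunk.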

Gotta reopen. There's actually a race condition in the code, but that's easy enough to fix. Basically, if there are more than 100 actions at the cut time, things run in loops. And on the db state on dev, that triggers. It didn't happen with an earlier clone for me.

Created attachment 589025 (diff, review)
make next_cut be earlier, and slice more inclusive
Bustage fix, this patch does two things:
The slice that we walk over now takes both start and end time. That makes us iterate over some actions more than once, but that's OK; it does make sure we're not stalling.
The next_cut is moved a tad further into the past by excluding the current cutoff.
Together, the two make the chunk of actions a good deal larger than 100; I've seen as many as 300 in a dry run.
Gonna request review once I've smoke-tested this on dev.