Add the center shift argument (cshift) to nnedi3_rpow2, and no separate chroma shift correction will be needed:

Code:

nnedi3_rpow2(2, cshift="Spline16Resize")

EDIT: Notes
- I tested your script sans dither calls, for simplicity
- I tested at 4x enlargement to exaggerate the artifacts. The output looks great; this is a good idea!
- nnedi3_rpow2 does not accept Y8, only YV12, YUY2 and RGB24 - at least for me.
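
For reference, a minimal sketch of the 4x test setup described in the notes above. The ColorBars source and its dimensions are placeholders I'm assuming for illustration, not the clip from your script; substitute your own source. The commented-out conversion is only an assumed workaround for the Y8 limitation.

Code:

# Hypothetical stand-in source; replace with your own clip.
ColorBars(width=320, height=240, pixel_type="YV12")
# If the clip is Y8, convert first (assumed workaround, since Y8 was rejected for me):
# ConvertToYV12()
# 4x enlargement; cshift corrects the center shift with the named resizer.
nnedi3_rpow2(rfactor=4, cshift="Spline16Resize")

With the placeholder source above, the result should be 1280x960.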

I forgot that nnedi3_resize16 does colorspace conversion. For learning purposes I compared nnedi3_resize16(1280,720, output="YV24") against a manual conversion. The following seems to produce the closest match between the two.