Oddly enough, those seem to contain a typo. In the line "QUAD_REAL g = YUV.x - 0.344*u - 1.403*v" the number 1.403 seems to have been copied from the previous line; it should be somewhere around 0.714. Also, the coefficients have been rounded to 3 decimal places, which is less accurate than what I was using.
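For reference, here is what a full-range BT.601 YCbCr-to-RGB conversion looks like with the full-precision coefficients (a plain Python sketch, not the shader itself; the shader works on QUAD_REAL vectors, but the arithmetic is the same):

```python
def ycbcr_to_rgb_bt601(y, cb, cr):
    """Full-range BT.601 YCbCr (Cb/Cr centred on 0) to RGB.

    Full-precision coefficients: 0.344136 = 0.114*1.772/0.587 and
    0.714136 = 0.299*1.402/0.587, so rounding to 0.344/0.714
    (let alone copying 1.403 into the G line) loses accuracy.
    """
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return r, g, b
```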

As far as I can tell I'm just using the wrong colour space somehow. I suspect it has something to do with the video using different primaries, which madVR corrects but the YCbCrtoRGB shader does not. I'll see if I can find a quick fix, but I'd rather not spend too much time trying to copy the entire madVR colour processing chain.

Well, it doesn't seem to be the primaries, but I can't figure out what does cause the problem. There seems to be something weird with the source levels; the "Y" channel isn't 0 when the source is black. Presumably madVR does something to fix that, but I can't figure out what.

I'm not sure exactly what you're troubleshooting right now, but are you taking into consideration that madVR has a TV-range workflow which preserves BTB & WTW information during conversions rather than clipping it off?

I was trying to see if there was a quick fix such that I could at least make the YCbCrtoRGB shader display the correct colours when you used it with the YCbCr hack on your clip. Since just using the BT.601 matrices didn't seem to work, I figured I might need to convert the primaries. I thought I more or less understood how to go from one set of primaries to another, but something goes wrong. One of the things I can't figure out is why changing the primaries in madVR also changes the black level when you use the YCbCr hack. This may have something to do with the TV-range workflow you mentioned.

Edit: I did find a quick workaround for the sample you sent. If you set the primaries to EBU/PAL in madVR then at least the chroma upsampling will work, even if you don't use the YCbCr hack; unfortunately the colours will be slightly incorrect.

It took some time, but I think I've finally found a way to improve NEDI enough to make it competitive with NNEDI3. It took a while since all algorithms related to NEDI (including aQua, SAI, and adaptations thereof) seem to suffer from the same flaw: there seems to be no way to make them simultaneously fast, numerically stable, and sharp (especially on vertical/horizontal edges).

So I needed a different approach. And I found one, in the article "Image Interpolation by Super-Resolution" by Alexey Lukin, Andrey S. Krylov, and Andrey Nasonov. The approach they took is to treat upscaling as a sort of inverse downscaling. They also show that this works nicely together with NEDI. From the kind of images I was able to create using this approach, I'm also reasonably sure it was part of the inspiration behind the SmartEdge algorithm that Alexey Lukin showcases on his website.

Anyway here is an example of the images I got using NEDI combined with the "SuperRes" method:

Could you add a comparison image with your original NEDI algorithm (without SuperRes), so we can easily see the improvement from NEDI to NEDI+SuperRes? And maybe Jinc3 AR as another point of comparison? Thanks!!

At a quick glance SuperRes is doing much better on the windows of the castle.. If you can take care of that aliasing I think we're on to a winner. I already want to use this for all my upscaling; incredible find there, Shiandow.

The speed should be close to NEDI. It still needs to use NEDI so it will be slower, but since SuperRes "refines" the image (removing artefacts etc.) it's possible to take some shortcuts in the NEDI algorithm.

This looks like quite a big improvement over NEDI to me. Fewer artifacts in the castle image, and it has a generally more "in focus" look. And on a quick check it might show fewer "fractal like" artifacts in image areas like grass/trees compared to NNEDI3. But, as you say, your current SuperRes algorithm adds quite a bit of aliasing compared to the original NEDI algorithm. This is quite noticeable in the circular hand rail at the top of the lighthouse. If you can fix that we probably have a winner!!

Btw, when I tested the original SmartEdge test application, I found that the SuperRes post processing made everything look identical, regardless of whether you started with NEDI or Bicubic. So once you've fixed (if possible) the aliasing problem, it might be worth a try to test Bicubic or Lanczos + SuperRes, just to see how it compares. At least Bicubic/Lanczos + SuperRes might be another option with a good speed/performance ratio, if SuperRes can improve on Bicubic/Lanczos.

I think I've lessened the aliasing a bit. It's still not quite as good as NEDI, but then again I suspect that NEDI (and NNEDI3) deform the rail in order to improve aliasing. It also seems to have improved sharpness, without my actually intending it to.

Using Lanczos does result in a very similar image but using NEDI seems to improve aliasing a bit. But since Lanczos is faster you can have more iterations of the SuperRes algorithm. I think that in most cases more iterations will be better than less aliasing, but this doesn't seem to apply to the lighthouse image.

As you say, it's very close.. upon closer inspection you can see NEDI doing some of its magic on the bottom rail along the top of the lighthouse.
So it looks like a new set of resizing possibilities has opened up for us to compare.. fun.

The lighthouse image is a bit "mean" because that rail already has some ringing around it in the original image, which makes it really hard to handle for scaling algorithms. For some reason NNEDI3 handles this situation especially well. But some other parts of the image actually look better in your latest NEDI + SuperRes image compared to NNEDI3, e.g. the fence. I wonder what happens if you run NNEDI3 + SuperRes? Maybe it could be an option to use NNEDI3 with 16 neurons + SuperRes instead of using NNEDI3 with more neurons.

From what I remember, though, the original SmartEdge 2 algorithm didn't have *any* aliasing at image edges. So maybe there's still room for improvement? But the original SmartEdge 2 algo was also slow as hell...

At the moment using SuperRes with NNEDI3 gives an identical result to using it with NEDI; it's currently just too aggressive to leave any difference between them intact. But I have noticed that less aggressive versions of SuperRes tend to have less aliasing, so NNEDI3 with a less aggressive version of SuperRes could look quite nice. Anyway, I think it's best if I write a short explanation of the algorithm and clean up the code a bit so you can try it and see for yourselves what options and trade-offs there are.

Oh, whoops, while trying to explain the algorithm I discovered a mistake. I was trying to be clever by combining a few of the steps of the algorithm, but it turns out that this wasn't possible after all. This is what I got after fixing that (and some other changes). Apparently that was (part of) the cause behind the aliasing.

It'll probably take some time before I get the explanation ready, I'd rather not rush it. Especially since there are apparently still parts of the algorithm that aren't entirely correct/clear.

I've often found that trying to explain something complicated can help me see things more clearly myself. FWIW, I don't see that much difference between the latest result and the previous "NEDI + SuperRes (4 iterations) (v0.2)" result. Maybe a touch less aliasing. But maybe also a tiny bit less sharpness. NNEDI3 still reproduces the rail a bit better. Anyway, please keep going. I'm really looking forward to trying to understand the algorithm once you get around to explaining it.

Btw, another great test image is the clown image. Those 3 (castle, lighthouse, clown) are my favorites for testing resampling algorithms. Well, that, and the park meter image to a lesser degree.

I find the "broken" version slightly more pleasant thanks to the extra sharpness. The aliasing had already been reduced enough that the further reduction in the fixed version didn't quite balance the sharpness loss. Either way, it's a very subtle difference, and it might be a worthwhile trade if it speeds things up. It would be interesting to make the sharpness/aliasing trade-off tunable, if that's possible.

I can't wait to play with the shader.

Okay, so I'll now try to explain how the super resolution method works and how I've implemented it. If you only want to know how to configure it just skip to the end.

The general idea behind the super resolution method that Alexey Lukin et al. explained in their paper is to treat upscaling as inverse downscaling. So the aim is to find a high resolution image which, after downscaling, is equal to the low resolution image.

The problem is that this is usually not well defined. For instance, let's take the simplest possible example, i.e. an image consisting of just 1 pixel, with value X. And say we want to find an image consisting of 2 pixels, with values which we'll denote A and B. Now we need to decide on a downsampling algorithm from the image of 2 pixels to the image of 1 pixel. The obvious choice is to just average the two pixels, i.e. X = (A+B)/2. So if we had the values of A and B then we could find X, but we're working backwards: we know the value of X and we want to find the values of A and B. However, for any value of A there is a value of B such that (A+B)/2 = X, so there's no way to decide upon a solution. This can be solved by requiring the values of A and B to be close to each other, in which case the unique(!) solution is to make both A and B equal to X.
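The two-pixel example can be worked out numerically. Here's a small sketch that minimises a "faithfulness" term ((A+B)/2 - X)² plus a "regularity" term λ(A-B)² by gradient descent; whatever λ > 0 you pick, the unique minimum is A = B = X (the starting values, step size, and λ below are arbitrary choices of mine):

```python
def solve_two_pixel(x, lam=0.1, step=0.5, iters=200):
    """Gradient descent on E = ((a+b)/2 - x)^2 + lam*(a-b)^2.

    'Faithfulness' pulls the downscaled average (a+b)/2 toward x;
    'regularity' pulls a and b together.  The minimiser is a = b = x.
    """
    a, b = 0.0, 1.0  # arbitrary starting guess
    for _ in range(iters):
        resid = (a + b) / 2 - x          # faithfulness residual
        ga = resid + 2 * lam * (a - b)   # dE/da (up to a constant factor)
        gb = resid - 2 * lam * (a - b)   # dE/db
        a -= step * ga
        b -= step * gb
    return a, b
```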

The SuperRes algorithm works similarly, except with more pixels: it requires the image to be "regular" (pixels close to each other should have "similar" values), and instead of requiring the result to be exactly equal to the original image after downscaling we simply require it to be "faithful". This method has an enormous amount of flexibility, since we can choose which downscaling method to use, how to measure "regularity", and how to measure "faithfulness".

My implementation uses a very simple downscaling method which just averages the pixel values over a disk-shaped region (with radius sqrt(2)). For measuring "faithfulness" I just square the difference between the downscaled result and the original. The method of measuring "regularity" is somewhat more complicated, since you only want the pixel values to be close when there is no edge, but not when there is. But it basically consists of looking at all pixels that are close together and trying to minimise some "distance" between the pixel values; I'll explain how I chose this distance later, since it's related to how the algorithm works.
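To make the downscaling step concrete, here is a CPU sketch of a 2x disk-average downscale. Note the sampling grid (low-res samples co-sited with every second high-res pixel) and the inclusive boundary handling are my assumptions, not taken from the shader:

```python
import numpy as np

def disk_downscale_2x(hi):
    """2x downscale: each low-res sample is the unweighted average of
    all high-res pixels within radius sqrt(2) of it.
    Illustration only; the shader's exact grid may differ."""
    h, w = hi.shape
    lo = np.zeros((h // 2, w // 2))
    r2 = 2.0 + 1e-9                      # radius^2, with a small tolerance
    for i in range(h // 2):
        for j in range(w // 2):
            cy, cx = 2 * i, 2 * j        # low-res sample position
            acc, n = 0.0, 0
            for y in range(max(0, cy - 1), min(h, cy + 2)):
                for x in range(max(0, cx - 1), min(w, cx + 2)):
                    if (y - cy) ** 2 + (x - cx) ** 2 <= r2:
                        acc += hi[y, x]
                        n += 1
            lo[i, j] = acc / n
    return lo
```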

To find an image which is both "regular" and "faithful" to the original image it is simplest to use the gradient descent method, which basically means that we check whether it's better to lower or raise a pixel value and change it accordingly. This is similar to viewing the "regularity" and "faithfulness" as some kind of energy and calculating the resulting forces that act on the pixel values, where "regularity" pulls them closer together and "faithfulness" pulls them closer to the original values.

This brings us to how I've chosen to measure regularity: instead of defining "regularity" directly and trying to calculate the forces, I just define the forces directly. The force I've chosen looks like this; in the plot, "x" is the difference between two adjacent pixel values. From the plot you can see that the force increases rapidly when values that were close move away from each other, but remains fairly constant when the difference was already large (since that likely means there's an edge). The corresponding distance is a bit more complicated, but looks like this. This distance behaves like the square of the difference close to 0 but behaves more like an absolute value for large differences.
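Since I can't reproduce the plots here: one well-known function family with exactly those properties is the pseudo-Huber distance (this is my stand-in, not necessarily the exact curve the shader uses). The distance sqrt(1 + (acuity*x)²) - 1 behaves like (acuity*x)²/2 near 0 and like acuity*|x| for large differences, and its derivative (the force) rises steeply near 0 and then saturates:

```python
import math

def distance(x, acuity=10.0):
    """Pseudo-Huber: ~ (acuity*x)^2/2 near 0, ~ acuity*|x| for large |x|.
    A stand-in with the described shape; the shader's curve may differ."""
    return math.sqrt(1.0 + (acuity * x) ** 2) - 1.0

def force(x, acuity=10.0):
    """Derivative of the distance: rises steeply for small differences,
    then stays roughly constant across a likely edge."""
    return acuity ** 2 * x / math.sqrt(1.0 + (acuity * x) ** 2)
```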

Now that we've defined all of the required parts, the algorithm consists of the following steps:

1. Calculate an initial guess.
2. Downscale it and calculate the differences with the original image.
3. Calculate the forces resulting from "regularity" and "faithfulness".
4. Apply the forces.
5. Repeat steps 2-4 several times.
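The steps above can be sketched in NumPy. This is an illustration only: `upscale`/`downscale` are passed in, the soft-saturating regularity force is a stand-in with the right shape, neighbours wrap around at the edges for simplicity, and the parameter values are guesses rather than the shader's calibrated ones:

```python
import numpy as np

def superres(lo, upscale, downscale, iters=3, strength=0.5, acuity=10.0):
    """Sketch of the SuperRes loop: start from an initial upscale, then
    repeatedly downscale, compare with the original, and nudge the
    high-res pixels by 'faithfulness' and 'regularity' forces."""
    hi = upscale(lo)                               # step 1: initial guess
    for _ in range(iters):                         # step 5: repeat 2-4
        diff = downscale(hi) - lo                  # step 2: residual vs original
        # step 3a: faithfulness force, spread back onto the high-res grid
        faith = -np.kron(diff, np.ones((2, 2)))
        # step 3b: regularity force from the 4 direct neighbours,
        # saturating across likely edges (wraps at borders, for brevity)
        reg = np.zeros_like(hi)
        for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            d = np.roll(hi, (dy, dx), axis=(0, 1)) - hi
            reg += d / np.sqrt(1.0 + (acuity * d) ** 2)
        hi += strength * (faith + reg)             # step 4: apply forces
    return hi
```

Plugging in a nearest-neighbour `upscale` and a box `downscale` already runs; replacing them with NEDI/Lanczos and the disk average is where the real quality comes from.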

The mistake I discovered earlier was that I was trying to combine steps 2 and 3, but you can't calculate the forces if you haven't calculated the differences yet. In the end I had to split off step 2 into (yet another) shader. This made the algorithm somewhat slower, but I think this could be avoided by first doing steps 2 and 3 for 1/4 of the pixels and then doing step 3 for the other 3/4 of the pixels; this should at least prevent any unnecessary texture calls. Splitting steps 2 and 3 also means that you now need even more shaders to get it all working, which doesn't seem to improve the stability of madVR (it sometimes crashes when it's using a lot of shaders, usually during start-up).

Now there are still some small parts of the algorithm left that I haven't mentioned:

The first is the way I normalise the forces acting on a pixel. I did this by reinterpreting part of each force as a "weight" and reinterpreting "2x" as the actual force, and then using the weighted average instead of just adding them up. The idea behind this is that the weight is going to be small close to an edge, in which case you want the values to pull together faster to prevent ringing. To make this effect even more pronounced I actually divide by the square of the total weight. I'm still not 100% sure that this part of the algorithm is actually that beneficial, but changing it would mean that I have to recalibrate everything again, so I'll just leave it in for now.
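In code, the normalisation described above might look something like this. The split of the force into a weight w times "2x" and the exact form of w are my assumptions (here w shrinks across likely edges, matching the saturating force); only the "divide by the squared total weight" part is taken from the description:

```python
import math

def normalised_force(diffs, acuity=10.0):
    """Combine pairwise forces w*(2x) by reinterpreting w as a weight
    (small across edges) and dividing the weighted sum by the *square*
    of the total weight, so near-edge pixels pull together faster."""
    num, wsum = 0.0, 0.0
    for x in diffs:                       # x: difference to each neighbour
        w = 1.0 / math.sqrt(1.0 + (acuity * x) ** 2)
        num += w * (2.0 * x)
        wsum += w
    return num / (wsum ** 2)
```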

The second is the way I store the original values and the differences between the downscaled result and the original. I store these in the alpha channel of the pixels, where for every 2x2 block the difference is stored in the top-left pixel and the original value in the bottom-right. This results in the lowest possible number of texture calls.

That concludes the description of the algorithm, which leaves the way to use the shaders. First, here is a list of the parameters and what they do:

strength: total strength of the force. If it's larger the algorithm will converge faster but it might go too far, resulting in artefacts.

radius: effective radius of regularizing force (it uses a Gaussian with that radius as weights). A larger radius should make the image smoother, but only slightly.

acuity: controls the threshold for what is considered an edge. If the difference between two pixel values is larger than 1/acuity then it will assume that there's an edge between them.

baseline: the baseline of the regularizing force, ensures that even pixels across edges don't get too far from each other. Makes edges softer (putting it to 0 makes edges amazingly crisp but also very rough).

If you want to you can also control the downscaling weights (called "weights"), just make sure that you pick the same weights for each of the 3 shaders (SuperRes, SuperRes-pre, SuperRes-inf).

Secondly, you need to put the shaders in the right order. The chain of shaders needed is getting a bit complicated, but in general it looks like this:

The part between brackets can be repeated as many times as you want. I'd recommend using 3 iterations of the SuperRes shader, but perhaps you can use fewer if you raise the "strength" parameter. You can replace "Upscale" by "fNEDI", "NEDI" or "Lanczos" (I'd recommend fNEDI), or just skip those shaders and use NNEDI3 for image upscaling. If you're not using NNEDI3 then you should set image upscaling to nearest neighbour. You should also make sure that you're resizing 2x (in both directions).

Finally you can download the needed shaders here. If you want to use them for MPC-BE then just put them in the shader folder, for MPC-HC you need to change the extension to ".hlsl" or add them manually using the shader editor, depending on which version of MPC-HC you have (they recently removed the shader editor, no idea why). For PotPlayer you should change the extension to ".txt" and put them in the shader folder. I'll also change the NEDI shaders in the first post, but I think it no longer makes sense to add all of the code to the first post.