I have some very exciting news! Well, it's exciting to me at least. I have successfully ported the HQ2X pixel scaling algorithm to Verilog HDL. Here are some of my test image results directly out of my emulator (captured from my FPGA emulator using my VGA capture card). I also provide links to zoomed-in/magnified versions of the test images, because the actual images are so small.

But I have a 6KB C++ implementation of HQ2x that relies on a 256-byte condensed pattern table and rotation of the pattern tree for the four corners. blargg helped with much of this, too. Compiles down to maybe 1KB of binary code. Might be more suitable for saving transistors.

It absolutely is more resource-efficient for both software and hardware implementations. To be totally honest... I'm slightly bummed by your post, because that was going to be the "upgrade" that I implemented today, and I was going to post about it! (Less the rotation of the pattern tree for the corners - I don't know what that means yet, but I'll definitely check it out.) But you and blargg beat me to it!

I will post here again once I complete the upgrade. Thanks byuu!

infiniteneslives wrote:

So do you have an application of this planned for the NES or your NOAC? Or just playing around with the stuff?

Yes, I will be integrating it into my VeriNES and will post some videos on Youtube. But I want to make some optimizations to the code first (e.g. what was stated above).

Then there is a 256-entry switch on the 8-diff pattern for each pixel of input that produces four unique outputs:

Code:

case 177: { PIXEL00_20 PIXEL01_22 PIXEL10_20 PIXEL11_12 break; }

Case 177 = rule 4: blend2(E, D, B);

In HQ2x official, the pixel grid around the current pixel is used:
123
456
789

So for rule 1, PIXEL00_20 = 542. Blend what is above and to the left only.

(side-note: blend rules only ever work with five adjacent pixels, for the top-left, that's 12468, IIRC.)
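As a concrete sketch of that numbering, here is a hypothetical "_20"-style blend over the center pixel and two neighbors. The 2:1:1 weighting, the blend3 name, and the 0x00RRGGBB packing are my own illustrative assumptions, not the original code:

```cpp
#include <cstdint>

// Sketch of a "_20"-style blend: weight the center pixel twice and the
// two chosen neighbors once each (2:1:1), computed per channel.
// Assumes 0x00RRGGBB packing; names and layout are illustrative.
uint32_t blend3(uint32_t c1, uint32_t c2, uint32_t c3) {
    uint32_t r = (2 * ((c1 >> 16) & 0xFF) + ((c2 >> 16) & 0xFF) + ((c3 >> 16) & 0xFF)) / 4;
    uint32_t g = (2 * ((c1 >>  8) & 0xFF) + ((c2 >>  8) & 0xFF) + ((c3 >>  8) & 0xFF)) / 4;
    uint32_t b = (2 * ( c1        & 0xFF) + ( c2        & 0xFF) + ( c3        & 0xFF)) / 4;
    return (r << 16) | (g << 8) | b;
}
```

Under these assumptions, PIXEL00_20 = 542 would read as blend3(w5, w4, w2): the center weighted against its left and top neighbors.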

Yet PIXEL01_20 = 526. Blend with what is above and to the right only. This gives way to a symmetry: we don't even need to look at PIXEL10_20 to know that it will be 584.

So rather than encode four rule sets per output pixel, we only really need to store the first rule set, that is, PIXEL00.

So we ignore 5, since we don't "diff" (compare to see if two pixels are similar or not) 5 to 5. So our pattern is actually "12346789" in one byte.
123
4X6
789
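A sketch of how that pattern byte might be assembled: one bit per surrounding pixel in neighbor order 1,2,3,4,6,7,8,9 from the low bit, skipping the center. The diff() placeholder (true when two pixels are *not* similar) and the exact bit ordering are assumptions for illustration:

```cpp
#include <cstdint>

// Placeholder similarity test for the sketch; the real algorithm
// compares pixels in YUV space against thresholds.
static bool diff(uint32_t a, uint32_t b) { return a != b; }

// Build the 8-bit diff pattern for the center pixel. w[1]..w[9] hold
// the 3x3 neighborhood (w[5] = center). Neighbors 1,2,3,4,6,7,8,9 map
// to bits 0..7; w[5] is skipped since it never diffs against itself.
static uint8_t buildPattern(const uint32_t w[10]) {
    uint8_t pattern = 0, bit = 1;
    for (int n = 1; n <= 9; n++) {
        if (n == 5) continue;              // center consumes no bit
        if (diff(w[5], w[n])) pattern |= bit;
        bit <<= 1;
    }
    return pattern;
}
```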

So after rendering the top-left, we can rotate the pattern matches by two steps: each step moves every pixel one position counter-clockwise around the ring of eight neighbors, so two steps make a 90° turn. Let's do this:

369
2X8
147

Notice how 01_20 = 526 in the same spot that 00_20 had 542?

The 256-byte rotate table stores this calculation for us, since there's no easy bit-twiddling way to do it. So our new pattern byte after rotation is 74182963. We can do this again for the bottom right, and one last time for the bottom left pixel.
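Under the same bit-ordering assumption (neighbors 1,2,3,4,6,7,8,9 in bits 0..7), the rotate table could be generated like this; the srcBit mapping encodes "3 lands in slot 1, 6 in slot 2, 9 in slot 3, ..." for one 90° counter-clockwise turn, and is my reconstruction rather than the original code:

```cpp
#include <cstdint>

// For each destination bit (slot order 1,2,3,4,6,7,8,9), srcBit gives
// the bit of the neighbor that rotates into that slot:
// slot1<-3, slot2<-6, slot3<-9, slot4<-2, slot6<-8, slot7<-1, slot8<-4, slot9<-7.
static const int srcBit[8] = { 2, 4, 7, 1, 6, 0, 3, 5 };

// Fill the 256-byte table: table[p] is pattern p rotated 90 degrees
// counter-clockwise. Apply it once per remaining corner.
static void buildRotateTable(uint8_t table[256]) {
    for (int p = 0; p < 256; p++) {
        uint8_t r = 0;
        for (int dst = 0; dst < 8; dst++)
            if (p & (1 << srcBit[dst])) r |= (uint8_t)(1 << dst);
        table[p] = r;
    }
}
```

A handy sanity check: applying the table four times is a full 360° turn, so it must bring every pattern back to itself.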

The PIXEL01/10/11 rules are only different because they're all considering 1-9 patterns to be the same for each corner.

I built my symmetry table off of PIXEL00 only. And as it turns out, there are actually six (I believe) cases where the symmetry does not hold up in MaxSt's original tables.

I believe them to be unintentional bugs as a result of MaxSt presumably building this table by hand. I did a blind test on my forum of my rotation HQ2x versus MaxSt's HQ2x table, and everyone favored mine, as it eliminated a couple extra jaggy points. But I admit that is subjective.

The difference was something like 6 pixels per video frame, because the differences were in very uncommon pattern matches.

I do wish I knew how MaxSt generated his initial switch table, or if he really just did it painstakingly by hand. I'd like to think there is some sort of computer algorithm that can build that comparison table, and that we can improve upon that algorithm to consider more than five surrounding pixels, to make a super-HQNx algorithm. But ... I haven't been able to figure it out.

> But you and blargg beat me to it!

Let's see if I remember ...

I came up with the pattern table + rotation, and used a script to build the pattern[] array by parsing the original hq2x.c file.

The 16->32->16 bit-packing for simple blending rules without having to split things out per channel came from Allegro's alpha-blending code, that I remembered from the '90s.
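That packing trick, sketched for RGB565: spread the three channels across a 32-bit word with gap bits between the fields, so a single add (or multiply) blends all channels at once instead of splitting R, G, and B out individually. The 50/50 blend below is just the simplest case; the real blend rules use other ratios:

```cpp
#include <cstdint>

// 16 -> 32 -> 16 blend for RGB565 pixels. 0x07E0F81F places G in the
// high half and R/B in the low half, with gap bits that absorb the
// per-channel carries of the add.
uint16_t blend(uint16_t a, uint16_t b) {
    uint32_t ea = (a | (a << 16)) & 0x07E0F81F;  // spread channels apart
    uint32_t eb = (b | (b << 16)) & 0x07E0F81F;
    uint32_t e  = ((ea + eb) >> 1) & 0x07E0F81F; // per-channel average
    return (uint16_t)(e | (e >> 16));            // fold back to 16 bits
}
```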

blargg took the original pixel differencing code, and implemented a much faster version using a simplified YUV table and some magic constants. He also came up with a distinction between same/diff that is used in there at some point. This differencing is also not a 100% match to the original algorithm, but is 99.9% the same, and way faster.
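In the spirit of what's described, here is a sketch of a single-operation similarity test: pack Y, U, and V into one 32-bit word with guard bits, then one subtract, add, and mask applies all three thresholds at once. The RGB-to-YUV approximation, field layout, and threshold constants below are illustrative guesses, not blargg's actual table or magic constants:

```cpp
#include <cstdint>

// Crude integer RGB -> YUV; the real code uses a precomputed table.
// Fields: Y in bits 21-31, U in bits 11-20, V in bits 0-10.
static uint32_t packYUV(int r, int g, int b) {
    int y = (r + g + b) / 3;            // brightness, roughly
    int u = 128 + (r - b) / 4;          // red/blue axis, kept in 0..255
    int v = 128 + (2 * g - r - b) / 8;  // green axis, kept in 0..255
    return ((uint32_t)y << 21) | ((uint32_t)u << 11) | (uint32_t)v;
}

// The offset biases each field so the 32-bit subtract can never borrow
// across field boundaries; the mask bits go nonzero exactly when a
// field's difference leaves its window (~±64 for Y, ~±16 for U and V).
static const uint32_t kOffset = (0x440u << 21) + (0x210u << 11) + 0x410u;
static const uint32_t kMask   = (0x380u << 21) + (0x1E0u << 11) + 0x3E0u;

static bool samePixel(uint32_t yuvA, uint32_t yuvB) {
    return ((yuvA - yuvB + kOffset) & kMask) == 0;
}
```

The win is that one subtract/add/mask replaces three separate per-channel threshold comparisons.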

I came up with a tiny optimization where the pattern match for 6 can be moved to 4 for the next pixel, but it didn't seem to help performance and required extra code.

blargg had some amazing non-grow/pack variants for the most common blend cases using his clamped-RGB add/sub code, but it turned out to be slower on more modern x86 processors for some crazy reason (I guess multiplies have gotten a lot faster lately.)

blargg also has a lot of optimizations that help on the PowerPC more than on the x86. I kept them out to keep the code simpler.

Hopefully that's everything. I'd be really appreciative if anyone can find a way to reduce my code even more. The algorithm becomes somewhat beautiful with the symmetry, and the smaller it is, the more impressive it is.

byuu, that - is - freakin - awesome! I'm still deciphering your notes and matching them up to your C code, but just from skimming it, it's really cool! It's amazing that you guys figured all this out. I will post a more in-depth reply once I implement these upgrades myself, but I can't thank you enough for sharing this.

If I figure out any more optimizations I will def be happy to tell you what they were (if they relate to something that could be done in software and not just hardware).

Okay, so I've finally figured out how (or more so, why) this works. And just for everyone's benefit, I'm going to go ahead and re-post what byuu posted with some corrections, because the errors threw me off for a while (no offense _whatsoever_ intended toward byuu - I am very thankful for his help. The errors, like he said, are just because he was doing everything from memory.).

Anyway, the errors result from the incorrect assumed pattern of "12346789". There were actually just a couple of errors as a result of that first error, but I figured it would make more sense to re-post without snipping just so people don't have to look back and forth between the two posts. Here goes...

-------------------------------------------------------
So we ignore 5, since we don't "diff" (compare to see if two pixels are similar or not) 5 to 5. So our pattern is actually "98764321" in one byte.
123
4X6
789

So after rendering the top-left, we can rotate the pattern matches by two steps: each step moves every pixel one position counter-clockwise around the ring of eight neighbors, so two steps make a 90° turn. Let's do this:

369
2X8
147

Notice how 01_20 = 526 in the same spot that 00_20 had 542?

The 256-byte rotate table stores this calculation for us, since there's no easy bit-twiddling way to do it. So our new pattern byte after rotation is 74182963. We can do this again for the bottom right, and one last time for the bottom left pixel.

The PIXEL01/10/11 rules are only different because they're all considering 1-9 patterns to be the same for each corner.

The significance of this (i.e. the "surprise") is that PIXEL01_12 blends the (now rotated) pixels at locations 5 & 6. Which are the exact same pixels (i.e. pixels 5 & 8) that PIXEL11_12 would have blended had we stuck with rule 177 using the non-rotated pattern.
-------------------------------------------------------

The optimization that byuu created is extremely elegant, which consequently makes it difficult to explain really well with just words. I plan on adding some hand-drawn graphs to this post to visually describe this process. Fixing the errors and then drawing all this out is the only thing that allowed me to finally understand it. :-P I just need to clean up what I've already created and scan it into my machine. I'll put the files on Dropbox and post a link.

> The significance of this (i.e. the "surprise") is that PIXEL01_12 blends the (now rotated) pixels at locations 5 & 6. Which are the exact same pixels (i.e. pixels 5 & 8) that PIXEL11_12 would have blended had we stuck with rule 177 using the non-rotated pattern.

Typo in bold should be PIXEL11_12.

Yeah, sounds like you have it now, you're reminding me how it works again :D
I unfortunately had my forum in purge mode to conserve space, and lost the old post where I explained this when I first figured it out.

Thanks for the corrections.

Always felt some sort of animation showing the rotation in real-time would be kinda neat to explain this.

You can technically forgo the need to rotate if you store a 1024-byte table, but I like the smaller version. Would be really cool if someone could come up with a better 'rotate' bit-twiddling algorithm so that we don't need the second 256-byte table.

Oh, I also don't think I mentioned it, but I load in all eight surrounding pixels, but since there are only five "points" that HQ2x ever blends against, I rearrange their order when calling the blend functions, which is based around the same rotation, but I saw no reason to shuffle eight 16-bit values.

Not as I understand the algorithm. 01_12 is correct as I have it. Maybe it's just that I worded it funny or something.

01_12 is blending 5 & 6. Pixel 6 _was_ the pixel that was in location 8 prior to the matrix rotation. And prior to the rotation you would have had to use 11_12 to blend 5 & 8. So, 01_12 (5,6) in rule 166 has taken the place of 11_12 (5,8) in rule 177.

Maybe the new wording will help. I drew this all out on paper and it does work. Maybe I should scan that one too. We may just be thinking about it differently. I'm guessing there is more than one way to look at this optimization - maybe that's what's happening.
