Note: Due to the nature of censored content, I’m showcasing synthetic data in this post.

Japan has its crazy censorship laws, so most hentai anime uses mosaic censoring to hide the important bits. Of course with my work on Overmix I’m interested in camera pans that can be stitched, and those could look like the following:
What is interesting however is that, while it might not be too apparent at first, the mosaic censor changes from frame to frame. Instead of censoring the image which is panned, a specific area is marked as “dirty” and censoring is applied on each frame separately. So looking even closer, we can see that the mosaics are arranged in a grid which is relative to the viewpoint (i.e. fixed) and not the panned image (which is moving).
So depending on how the panned image is offset from the viewpoint, each tile in the mosaic will cover a different area in different frames. If we take all our frames and find the average for each pixel, we can see a vast improvement already:
It is however quite blurry… However, super-resolution has taught us that if we have several images of the same motif, where each image carries slightly different information, we can exploit that to increase the resolution. So let’s try to do better.

Before we can start, we need to know how the mosaic censoring works to finish our degradation model. Contrary to what I thought, it is not done by averaging all pixels inside each tile. Instead it is done by picking a single pixel (which I assume is the center one) and using that as the color of the entire tile.
The approach is therefore really simple. We create a high-resolution (HR) grid, and for each frame we plot each tile’s color into the HR grid pixel located at the center of that tile. When we have done that for all the frames, we end up with the following:
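The plotting step can be sketched in a few lines of Python. This is only a minimal illustration on synthetic data, not the Overmix code: the function names, the integer viewport offsets and the assumption that each tile shows exactly its center pixel are all mine.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]
Offset = Tuple[int, int]  # (x, y) position of the viewport in the panned image


def censor_frame(source: Image, offset: Offset, size: Tuple[int, int], tile: int) -> Image:
    """Simulate one censored frame (hypothetical degradation model).

    The viewport is cropped at `offset`, and every pixel is replaced by the
    center pixel of the viewport-fixed tile it belongs to.
    """
    ox, oy = offset
    width, height = size  # assumed to be multiples of `tile`
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            cx = (x // tile) * tile + tile // 2
            cy = (y // tile) * tile + tile // 2
            row.append(source[oy + cy][ox + cx])
        frame.append(row)
    return frame


def plot_tile_centers(
    frames: Sequence[Image], offsets: Sequence[Offset], tile: int, hr_size: Tuple[int, int]
) -> List[List[Optional[int]]]:
    """Plot each tile color into the HR grid at the tile center.

    Pixels never hit by any tile center stay None (the "black" pixels).
    """
    hr_width, hr_height = hr_size
    grid: List[List[Optional[int]]] = [[None] * hr_width for _ in range(hr_height)]
    for frame, (ox, oy) in zip(frames, offsets):
        for cy in range(tile // 2, len(frame), tile):
            for cx in range(tile // 2, len(frame[0]), tile):
                grid[oy + cy][ox + cx] = frame[cy][cx]
    return grid


# Synthetic example: a 6x6 image seen through a 4x4 viewport with 2x2 tiles.
source = [[y * 6 + x for x in range(6)] for y in range(6)]
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
frames = [censor_frame(source, o, (4, 4), 2) for o in offsets]
grid = plot_tile_centers(frames, offsets, 2, (6, 6))
```

Because the viewport moves in both directions here, the tile centers together cover a solid block of the grid; with only vertical offsets the known pixels would form vertical strips instead.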
There are a lot of black pixels; these are pixels which were not covered by any of our frames, pixels we know nothing about. There is no way of extracting these pixels from our source images, so we need to fake them by using interpolation. I used the “Inpaint – diffusion” filter in G’MIC and the result isn’t too bad:
Those “missing pixels” fully control how well the decensoring will work. If the viewpoint only pans in a vertical direction we will only have vertical strips of known pixels, and thus we cannot increase the resolution in the horizontal dimension. If we have movement in both dimensions it tends to work pretty well, as you can see in the overview image below:

I’m getting close to having a Super-Resolution implementation, but I took a small detour to apply what I have learned to a slightly easier problem: extracting overlaid images. Here is an example:
The character is looking into a glass cage and the reflection is added by adding the face as a semi-transparent layer. We also have the background image, i.e. the image without the overlay:
A simple way of trying to extract the overlaid image is to subtract the background image, and indeed we can somewhat improve the situation:
Another approach is to try to estimate the overlaid image which, when overlaid on top of the background image, will produce the merged image.
This is done by starting with a semi-random estimate and then iteratively improving it. (The initial estimate is just the merged image in this case.)
For each iteration we take our estimate and overlay it on the background. If our estimate is off, the result will obviously differ from our merged image, so we find the difference between the two and use that difference to improve our estimate. After doing enough iterations, we end up with the following:
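As a rough sketch of that loop, here is a version working on flat lists of pixel values in the range 0 to 1. It assumes the overlay is a plain alpha blend with a known, constant opacity `alpha`, which the post does not state; the function names and the `step` parameter are also just illustrative choices.

```python
from typing import List


def overlay(background: List[float], layer: List[float], alpha: float) -> List[float]:
    """Blend `layer` on top of `background` with constant opacity `alpha` (assumed model)."""
    return [(1 - alpha) * b + alpha * l for b, l in zip(background, layer)]


def estimate_overlay(
    merged: List[float],
    background: List[float],
    alpha: float,
    iterations: int = 50,
    step: float = 1.0,
) -> List[float]:
    """Iteratively estimate the layer that produces `merged` when overlaid."""
    estimate = list(merged)  # initial estimate: the merged image itself
    for _ in range(iterations):
        simulated = overlay(background, estimate, alpha)
        # Push the estimate in the direction of the remaining difference.
        estimate = [e + step * (m - s) for e, m, s in zip(estimate, merged, simulated)]
    return estimate


background = [0.2, 0.8, 0.5, 0.1]
face = [0.9, 0.1, 0.4, 0.7]
merged = overlay(background, face, 0.5)
recovered = estimate_overlay(merged, background, 0.5)
```

With this simplified blend model the error shrinks by a factor of `1 - step * alpha` per iteration, and the result could just as well be computed directly. The iterative form is shown because it carries over to models that cannot be inverted so easily, such as the downscaling used in Super-Resolution.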
While not perfect, this is quite a bit better. I still haven’t added regularization which stabilizes the image and thus could improve it, but I’m not quite sure how it affects the image in this context.

Super-Resolution works in a similar fashion, just instead of overlaying an image, it downscales a high-resolution estimate and compares it against the low-resolution images. It just so happens that I never implemented downscaling…

I have a lot of images on my computer, random fanart from the web, screenshots of movies I have seen, etc. I recently saw that one of my image folders was 80 GB big, so it is no wonder I care much about image compression.

I was looking through a visual novel CG collection when I thought: shouldn’t this be able to compress well? After all, VN CGs tend to have a lot of similar images with minor modifications like different facial expressions. So I did a quick test, how well does it compress using different lossless compression algorithms:

As expected, PNG is quite a bit better than simply zipping BMP images, and WebP fares even better. However what is this, compressing BMP images with 7z literally kills the competition!

The giant gap from ZIP to 7z does not come from LZMA being superior to Deflate, but from the fact that ZIP only allows files to be compressed individually while 7z can treat all files as one big blob of data. This is also why a general purpose compression algorithm can beat the ones optimized for images, as PNG and WebP also compress images individually.
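The effect is easy to reproduce with Python’s standard `lzma` module. The sketch below uses random bytes as stand-ins for image data (an assumption, real BMP files behave less extremely) and compresses the same set of near-identical “files” once individually and once as a single blob, with the same algorithm both times.

```python
import lzma
import random
from typing import List


def make_variants(count: int, size: int, seed: int = 0) -> List[bytes]:
    """Create `count` synthetic files that share a base and differ in a small region."""
    rng = random.Random(seed)
    base = bytearray(rng.getrandbits(8) for _ in range(size))
    variants = []
    for _ in range(count):
        data = bytearray(base)
        start = rng.randrange(size - 64)
        for i in range(start, start + 64):  # the "different facial expression"
            data[i] = rng.getrandbits(8)
        variants.append(bytes(data))
    return variants


def individual_size(files: List[bytes]) -> int:
    """Total size when every file is compressed on its own (like ZIP, PNG or WebP)."""
    return sum(len(lzma.compress(f)) for f in files)


def solid_size(files: List[bytes]) -> int:
    """Size when all files are compressed as one stream (like a solid 7z archive)."""
    return len(lzma.compress(b"".join(files)))


files = make_variants(13, 20000)
print("individual:", individual_size(files), "bytes")
print("solid:     ", solid_size(files), "bytes")
```

Since the compressor is identical in both cases, any difference in the two totals comes purely from letting it see the repeated data across file boundaries.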

Note to the comparison: Usually CG collections have an average of 2-3 versions of each image, here we checked on an extreme case with 13 versions. This obviously exaggerates the results, but the trend still stands.

Doing it better

BMP is the superior solution? There is no way I can accept that, we need to do something about that!

If you have ever worked with GIF animations you probably know that you can reduce the size if you only store the differences between each frame. That is exactly what we want to do, but using PNG and WebP to compress those differences. The problem is that we need to store the differences and information on how those should interact to recreate all the images, and there isn’t a good file format to do that.

So I have created a format based on OpenRaster, which is a layered image format to compete with PSD (Photoshop) and XCF (GIMP). I wanted to use it without modifications, but having multiple images in one file, while planned, appears to be far into the future. (I want it now!) It is basically a ZIP file which contains ordinary image files and an XML document describing layers, blend modes, etc.

The next part is automatically creating such a file from a series of images. For this I have written cgCompress (GitHub page) and while there is still a lot of work to be done, it has proven that we can do it better. Fundamentally this is done by creating all the differences and then, with a greedy algorithm, selecting the ones which will add the least to the total file size. This continues frame by frame until we have recreated all the original images. I have also worked with an optimal solver, but I have not been able to get it to work with more than 3-5 images (because of time complexity).
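To give an idea of what such a greedy selection could look like, here is a small sketch. It is not the cgCompress implementation: images are plain byte strings of equal length, the difference is a byte-wise XOR, and the compressed size from `zlib` stands in for the PNG/WebP file size. All names are made up for the example.

```python
import zlib
from typing import List, Tuple

Step = Tuple[int, int, int]  # (parent index or -1, image index, cost in bytes)


def diff_cost(a: bytes, b: bytes) -> int:
    """Compressed size of the difference between two equally sized images."""
    delta = bytes(x ^ y for x, y in zip(a, b))
    return len(zlib.compress(delta))


def greedy_plan(images: List[bytes]) -> List[Step]:
    """Pick, step by step, the cheapest difference that recreates a new image.

    A parent of -1 means the image is stored in full (the start image).
    """
    full_costs = [len(zlib.compress(img)) for img in images]
    start = min(range(len(images)), key=full_costs.__getitem__)
    plan: List[Step] = [(-1, start, full_costs[start])]
    done = {start}
    while len(done) < len(images):
        candidates = (
            (parent, child, diff_cost(images[parent], images[child]))
            for parent in done
            for child in range(len(images))
            if child not in done
        )
        best = min(candidates, key=lambda step: step[2])
        plan.append(best)
        done.add(best[1])
    return plan
```

Each step only looks at the cheapest next difference, so the result is not guaranteed to be the smallest possible file, which is exactly the gap an optimal solver would try to close.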

This is a compression rate of a whopping 88.7%! Of course, this is only because we are dealing with 13 very similar images. 67.2% of the file size is the start image and without a better image compression algorithm, we can do very little to improve that. That means the 12 remaining images each use about 2.7% (an even 1/13 split would be 7.7%), not much to work with but I believe I can still make improvements.

This is just one case though; while uncommon, some images still need further optimization to get near-perfect results. I have tried compressing an entire CG collection of 154 images and my results were as follows:

Compared to 7z compressed BMP, there was an improvement of 24.0% and compared to WebP it is 61.1%. On average, the set contained 3.92 variations per image; cgCompress fits 2.57 times as many images in the same space as ordinary WebP. The difference between those two numbers is the overhead cgCompress requires to recreate all 3.92 variations per image and it depends on how different the variations are. While I don’t know how low it can get, I do believe there is room for improvement here.

I included lossy WebP here as well (done at quality 95) to give a sense of the difference between lossless and lossy compression. cgCompress definitely closes the gap, but if you don’t care about your images lossy WebP is still the way to go. (It should be possible to use lossy compression together with the ideas used in cgCompress though.)

Conclusion

cgCompress can significantly reduce the space needed to store visual novel CG collections. While it is only a moderate improvement of ~25% over 7z compressed BMP, compressed archives only work well for archiving or transferring over networks. cgCompress, being based on OpenRaster, has proper thumbnailing, viewer support and potentially meta-data. With PNG and WebP being the direct contenders, cgCompress provides a big leap in compression ratio.

On a personal note, going from concept to something I can use in 8 days is quite an achievement for me. While the cgCompress code isn’t too great, I’m still quite happy with how this turned out.

Well, Overmix is here with a dehumidifier to solve your problem. Too damp? Run it once and watch as your surroundings become clearer.

Your local hot spring before:

and after:

Can’t get enough of Singing in the rain? Don’t worry, just put it in reverse and experience the downpour.

Normal rainy day:

The real deal:

This is another multi-frame approach, and really just as simple as using the average. Since the steam lightens the image, all you have to do is to take the darkest pixel at that position. (In other words, the lighter the pixel is, the more likely it is to be steam.) Since the steam is moving, this way you use the least steamy parts of each frame to gain a stitched image with the smallest amount of steam.
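In code this boils down to a per-pixel minimum (or maximum, for the reverse effect). The sketch below assumes the frames are already aligned, have the same size and hold a single brightness value per pixel; the function names are only for illustration.

```python
from typing import List

Image = List[List[int]]


def darkest(frames: List[Image]) -> Image:
    """Keep the darkest value seen at each pixel position (removes steam)."""
    return [[min(pixels) for pixels in zip(*rows)] for rows in zip(*frames)]


def brightest(frames: List[Image]) -> Image:
    """Keep the brightest value seen at each pixel position (adds steam or rain)."""
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*frames)]


frames = [
    [[10, 200], [30, 40]],
    [[50, 20], [30, 250]],
]
print(darkest(frames))    # [[10, 20], [30, 40]]
print(brightest(frames))  # [[50, 200], [30, 250]]
```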

If we do the opposite, take the brightest pixel, we can increase the amount of steam. That is not really that interesting, but the second example shows how we can use this to bring out features that would otherwise be treated as noise. We could also combine it with the average approach using a range, to deal with the real noise, but I did this for fun so I didn’t go that far.

While this is a fairly simple method, it highlights that we can use multiple frames not just to improve quality, but also to analyze and manipulate the image. I have several neat ideas I want to try out, but more about those when I have something working.

As I was researching on digital signal processing I found an interesting term: Super Resolution. Super Resolution is a field which attempts to improve the resolution of an image, by using the information in one or more images. This is exactly what I was doing with Overmix, using multiple images to reduce noise.

However another aspect of Super Resolution uses sub-pixel shifts in the images to improve the sharpness of the image. This could not only solve the issue with the imperfect alignment I was having, it could straight out improve the quality further than I had thought possible.

(I had actually tried to use sub-pixel alignment when I ran into the issue and I speculated it might increase sharpness. But after much work I only managed to make it align properly, without reducing the blur I was having even without it, so I didn’t press it further.)

Limits

Super Resolution has its limits however. First of all, as it tries to estimate the original image, it cannot magically surpass it and give unlimited precision. If the image was created in “480p”, even a 1080p BD upscale will still only give the “480p” image. If the original was blurry by nature, Super Resolution will result in a blurry image as well, unlike a sharpness filter.

And that raises the question, why is anime blurry and why does it not align on the pixel grid? With one sample, I got the same misalignment with both the 720p TV version and the 1080p BD version. If this was caused by downscaling the issue would be smaller at 1080p, however it isn’t. Most anime does not appear to push the boundaries of 1080p, but since there are misalignment issues I suspect their rendering pipeline isn’t optimal.

The other limit is the available images used for the estimation. If the images we have do not contain any hints on what the original image looks like, we can’t guess it. Thus if there are no sub-pixel shifts in an image, Super Resolution can’t do much. And that is actually an issue because most slides only move vertically, which means we only have vertical sub-pixel shifts. In those cases we can only hope to improve detail in the vertical direction.

Using all available information

Since Super resolution uses the information in the images, the more we can get the better.

First of all, the closer we can get to the source the better, as we don’t have to estimate the defects that happen on each conversion. A PNG screenshot is better than a JPEG, and the TV MPEG2 transport stream is better than a 10-bit re-encode.

One thing to notice here is that the PNG screenshot is (with all players I have tried) an 8-bit image, not 10-bit (16-bit*) for Hi10p h264. So using PNG screenshots would lose us 2 bits.

However more importantly, PNG cannot represent an image from a MPEG stream directly. The issue is that PNG only supports RGB and MPEG uses Y’CbCr. Y’CbCr is a different color space invented to reduce the required bandwidth of image/video. The human eye is most sensitive to luminance and not so much to color, which Y’CbCr takes advantage of. MPEG then (normally) uses Chroma subsampling which is the practice of reducing the resolution of the planes containing color information. A 1280×720 encode will normally have one plane at 1280×720 and two at 640×360.

So to save as a PNG, the video player upscales the chroma planes and converts to RGB, losing valuable information.
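To put numbers on the plane sizes mentioned above, here is a small helper. It assumes the common case where chroma is halved in both directions (often written 4:2:0) and rounds odd sizes up; the function names are hypothetical.

```python
from typing import Dict, Tuple

Size = Tuple[int, int]


def plane_sizes(width: int, height: int, sub_x: int = 2, sub_y: int = 2) -> Dict[str, Size]:
    """Sizes of the Y', Cb and Cr planes for a given chroma subsampling factor."""
    chroma = ((width + sub_x - 1) // sub_x, (height + sub_y - 1) // sub_y)
    return {"Y": (width, height), "Cb": chroma, "Cr": chroma}


def sample_count(width: int, height: int, sub_x: int = 2, sub_y: int = 2) -> int:
    """Total number of stored samples across all three planes."""
    return sum(w * h for w, h in plane_sizes(width, height, sub_x, sub_y).values())


print(plane_sizes(1280, 720))
# {'Y': (1280, 720), 'Cb': (640, 360), 'Cr': (640, 360)}
print(sample_count(1280, 720) / (1280 * 720 * 3))  # 0.5
```

So a subsampled frame stores half as many samples as the RGB image the player writes to PNG, and the upscaled chroma in that PNG contains no extra information, only interpolation.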

Going even further, video is compressed using a combination of key- and delta-frames. Key-frames store a whole image while delta-frames only store how to get from one frame to another. The specifics about how those frames were compressed are again valuable information. (But I don’t know much about how this is done.)

Status of Overmix

Overmix now accepts a custom file format which can store 8- and 10-bit chroma subsampled Y’CbCr images. I created an application using libVLC that takes the output with minimal preprocessing and stores it in this format. (It also makes it easier to save every frame in the slide.)

Overmix now only uses the Y’ plane to align on, instead of all 3 in RGB. My next goal is to redo the alignment algorithm. Currently it renders an average of all previously added images to align on, as otherwise the slight misalignment would propagate with each added frame. However I will try to use a multi-pass method now, where it will roughly align all images and then do a sub-pixel alignment on the images afterwards. Sub-pixel alignment will, at least at the start, be done by upscaling, as optical flow makes no sense to me yet.

Then I need to redo the render system, as it is currently optimized for aligned images, and this will clearly not be the case anymore.

I haven’t worked on Overmix for quite some time due to University stuff, but the next three months I should have plenty of time, so hopefully I will get it done before that is over.

I have been developing a new application named Overmix, which attempts to improve the quality of anime screenshot stitching. This article will briefly explain what stitching is, what issues affect the quality and how Overmix tries to fix those. At the end a short summary of the results for the current progress is given.

Background

One common animation technique is panning where the camera moves/pans over the image, showing only a part of it at a given time:

Very little movement actually happens during the shot, in fact only the mouth is moving (presumably to reduce animation costs). This makes it possible to combine the frames together to one large image, which is known as “stitching”.

Source quality

The issue is however that more often than not, the video quality isn’t that great. The video has been compressed and especially if the source is a TV-transmission or webcast, visual artifacts can be quite noticeable:

Reducing artifacts

A stitch is normally done by taking two frames, finding the offset between the two images and then softening the edges between the images to make the transition less apparent (which is usually done by applying a gradient on the alpha channel).

Since this is a time consuming process, as few frames as possible are used. The idea here is to do the opposite: use as many frames as possible. The reason is that the artifacts are not static; for every frame they differ slightly. As a result, every frame carries a slightly different set of information. The goal is then to derive the original information, based on this set of inconsistent information.

Just by using the average, we can get quite decent results:

(Right is a single frame, left is the average of all unique frames.)
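A bare-bones version of such an averaging stitch could look like the following. It is only a sketch: the offsets are assumed to be known already, to be non-negative whole pixels, and each pixel is a single value rather than a color.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]
Offset = Tuple[int, int]  # (x, y) position of a frame on the stitched canvas


def stitch_average(frames: Sequence[Image], offsets: Sequence[Offset]) -> List[List[Optional[float]]]:
    """Place every frame at its offset and average all values that overlap."""
    width = max(ox + len(frame[0]) for frame, (ox, _) in zip(frames, offsets))
    height = max(oy + len(frame) for frame, (_, oy) in zip(frames, offsets))
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for frame, (ox, oy) in zip(frames, offsets):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                total[oy + y][ox + x] += value
                count[oy + y][ox + x] += 1
    # Pixels not covered by any frame are left as None.
    return [
        [t / c if c else None for t, c in zip(total_row, count_row)]
        for total_row, count_row in zip(total, count)
    ]


print(stitch_average([[[10, 20]], [[30, 40]]], [(0, 0), (1, 0)]))
# [[10.0, 25.0, 40.0]]
```

The more frames that overlap a pixel, the more the frame-specific artifacts cancel out in the average.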

Results

Noise artifacts have been shown to nearly disappear completely when simply averaging all frames, even when the source has a significant amount of noise artifacts. Color banding is also reduced, but by much more varying amounts.

Even with modern TV-encodes, stitches see a significant improvement from using this technique, and the difference can be told apart visually at normal magnification. Surprisingly, even when using good BD-encodes there is usually a slight improvement, but it normally requires 2-4 times magnification to be noticeable.

It has turned out that it often is not possible to make a perfect alignment when sticking to the pixel grid. This causes the images to be slightly more blurry than the original. It is an area which still requires work.

Using the average to derive the result is not always desirable, as the encode might contain information not related to the image. Such information could be subtitles, TV logos or simply errors in the source. See the following image as an example: the rightmost column of pixels was completely black and shows up as lines in the averaged image.

However, the currently devised algorithms have a tendency to choke on the slight misalignment mentioned previously and cause unwanted artifacts. Whether this is best solved by fixing the misalignment or by improving the algorithm is up for discussion.

I try to keep anime off this blog, but I still wanted to bring this series up.

There isn’t really that much interesting anime this season, actually there is only one series that I’m looking forward to and that is “Another”. It is a horror series which started airing this month and the fourth episode is to air the 31st.

I’m not really a horror fan, mainly because they usually are too predictable and therefore not really that scary, and I do not expect this series to be much different in that regard.

Plot

The plot isn’t really that unique at this point. A young boy moves to his grandparents because of family reasons and therefore transfers into class 9-3 at the local school. However that class happens to have a dark history… 26 years ago one of the students died in an accident, but as she was popular and loved by the class they started denying her death and pretended that she was still alive. However their actions prevented her from resting in peace, and at the time of graduation she shows up in the group photo.

Returning to the present, this particular boy tries to fit into his new class only to learn that they are keeping something hidden from him, which appears to be the cause of the uneasiness that spreads throughout the class. In addition there is this girl in the class that just feels out of place, which catches his interest.

That is pretty much the main point of the plot without spoiling too much. The outline is set by the end of episode 3 and I suspect that “accidents” will play a major role in the story.

Animation

The main reason for my interest in this series is how well it is executed. The visuals are in my opinion on a level rarely seen in weekly series. Not only is it well drawn, it manages to create a mood which matches the series well. This is where many other animations fall short, however Another still manages to excel in this area.

The series director has also done a decent job on this, which combined with the level of the visuals really creates something special.

If the animation wasn’t good enough, the soundtrack is well done too. The soundtrack has some of the same qualities as in Ga-Rei Zero: the actual music isn’t great by itself, but when used together with the video and sound effects it manages to bring out emotions that it can’t by itself. While it is not on the same level as in Ga-Rei Zero, it is still fairly good.

Conclusion

The plot doesn’t appear to be anything special but it has potential to be interesting if it does not spoil too much too soon. The main strength of this series so far is the execution of the plot. The direction is decent, the visuals are great and the sound is decent too which when combined really makes this an experience to watch.

I have, and it has started to become annoying. ‘[’ getting replaced by ‘%5B’ happens rather rarely, however I hate how spaces are replaced by underscores all the time. Renaming the files manually is even more annoying so I made a program to do this.

Drag and drop files into the program and click “Rename!”. (If a file is open or otherwise locked, it will currently just silently fail.) If you drop a folder, it will rename the folder, not its contents.

Use the check-boxes to turn conversions on and off. Conversions are done in the same order as the check-boxes, so if you turn “Convert spaces to underscores” on, remember to turn “Convert underscores to spaces” off. ; )
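The ordering matters because each conversion works on the output of the previous one. Here is a tiny sketch of that behaviour; it is not the program’s actual code, and apart from the two check-box labels quoted above, the labels and function names are made up. It only transforms the name and does not touch any files.

```python
from typing import Callable, List, Set, Tuple
from urllib.parse import unquote

Conversion = Tuple[str, Callable[[str], str]]

# Applied top to bottom, like the check-boxes.
CONVERSIONS: List[Conversion] = [
    ("Decode %XX escapes", unquote),  # hypothetical label, turns %5B into [
    ("Convert spaces to underscores", lambda name: name.replace(" ", "_")),
    ("Convert underscores to spaces", lambda name: name.replace("_", " ")),
]


def convert(name: str, enabled: Set[str]) -> str:
    """Apply every enabled conversion in list order."""
    for label, func in CONVERSIONS:
        if label in enabled:
            name = func(name)
    return name


print(convert("%5BGroup%5D_My_Show_01.mkv",
              {"Decode %XX escapes", "Convert underscores to spaces"}))
# [Group] My Show 01.mkv
```

With both space/underscore conversions enabled, the second one simply undoes the first and also converts any underscores that were there to begin with.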

When I read/played Fate/Stay Night my favorite chapter was undoubtedly Unlimited Blade Works, so I was excited to hear that it was going to be animated into a movie. However, when I finally watched it, it quickly became clear that there was something very wrong. It tried to follow the story down to the smallest detail; it even kept consistent with the event CG graphics.

This is not necessarily a bad thing, however Fate/Stay Night is insanely long (and boring at times) and the movie was less than two hours; it is simply not realistic to include everything in the movie. But they tried anyway and failed horribly…

The whole plot was so hurried that there rarely was time to actually explain what was going on. There was only room for the action scenes, and almost nothing of the plot (and absolutely nothing of the romance aspects) was explained. Well, if you like mindless action then it might not be that bad…

There was one scene where they deviated from the original storyline, the climax of the romance aspect in the plot, and well, for good reason. However they decided to dub it over with some lame story about Shirou seeing Rin’s memories just to show the event CG graphics from the game? For God’s sake…

The visual aspect of the movie is pretty good, I always hated the graphics from the game but they did it well while keeping the original drawing style pretty well. The sound is also okay, but in the end what does it mean if the storyline is terrible?