Saturday, July 30, 2016

I'm still learning about audio in PlayStation games (Nocash Playstation Specifications is amazing!). The XA ADPCM audio format, frequently seen with STR videos, is what I'm most familiar with. But the PlayStation also has a Sound Processing Unit (SPU) that is used to play all other audio you hear in the game.

XA audio is easy to identify and convert. SPU audio isn't so easy. The easiest clips to identify are simple sound effects that are played and then end. It gets a little more difficult with audio clips that need to loop. It becomes impossible to identify them when multiple audio clips are combined to form unique sounds in real time. I believe this is how a lot of background music is done in games: each instrument is actually a set of little sound clips played at different frequencies. This brings up another challenge with all SPU audio: clips can be played at any frequency, and there isn't any way to know what that frequency is.

One case where a game used instrumental audio along with STR video is the Valkyrie Profile opening FMV. jPSXdec can only identify the video clip, and has no way to recreate the music. Thankfully, diligent people have put a lot of effort into extracting these instrumental kinds of audio. These are stored in what are known as PSF files. Lo and behold, someone has taken the time to extract the Valkyrie Profile instrumental music into PSF files.

With my growing knowledge around PlayStation audio, I thought it would be fun to create a very high quality conversion of the Valkyrie Profile opening. I assume using a PSF converter could produce better quality audio than what you can get on hardware or emulators. jPSXdec can extract the video with the best possible quality which can be made even better with other tools.

Extracted Valkyrie Profile opening video with jPSXdec in avi:jyuv format. This is YUV using the [0-255] component range.

Downloaded Valkyrie Profile psf audio clips.

Used Audio Overload to convert opening video PSF to wav.

Used VirtualDub to mux the video and audio into a single avi, with a 1 second audio delay to sync them up correctly.

Created an Avisynth script to upscale the video to HD quality and convert to RGB. DGMPGDec plugin was used for deblocking and nnedi3 plugin for scaling.

# VALKYRIE.BIN[0]HD.avs

# For deblocking
LoadPlugin("DGDecode.dll")
# For scaling
LoadPlugin("nnedi3.dll")

AviSource("VALKYRIE.BIN[0]jyuv+audio.avi",pixel_type="YV12")

# Deblocking
# quant is the strength between 1 and 31
# The quant=31 removed the maximum blocking issues
# Didn't seem to blur anything else
BlindPP(quant=31)

# Scale up by 4x for a final resolution of 1280x900
# The results of nnedi3 appeared slightly better than Spline64Resize
nnedi3_rpow2(rfactor=4)

# Avisynth has the unique ability to choose the matrix
# and ChromaInPlacement when converting to RGB (available since Avisynth 2.6)
# matrix="pc.601" indicates input is in [0-255] component range
# ChromaInPlacement="MPEG1" is the chroma placement used by PSX
ConvertToRGB32(matrix="pc.601",ChromaInPlacement="MPEG1")

Used ffmpeg and this script to convert to an uncompressed/RGB/DIB AVI (5 GB file!):

Tuesday, March 13, 2012

It might be a time for celebration because I've finally bumped jPSXdec into 'beta' status. It's pretty much feature complete now. At worst there may be a redesign of a couple modules as I've recently been hit with the infamous Judge Dredd which breaks some assumptions. I also think the GUI leaves much to be desired (thank you user-testing!).

During the life of jPSXdec I've always been really interested in what people were using jPSXdec for.

I was going to link to a recently posted, and possibly related, HD version of the Blood Omen videos, but to my surprise, Boulotaur2024's YouTube account has been terminated. He's been posting HD versions of several game videos, utilizing jPSXdec for most of the PlayStation ones. It was great because he found a few games that jPSXdec had problems with. Guess Media Interactive Inc. and the Recording Industry Association of Japan didn't appreciate his work.

The great ScummVM project has made use of my awesome documentation to add a PlayStation video player so it could utilize the PSX videos from Broken Sword 1 and 2. They've written up a few instructions on how to get your videos ready to play in the emulator.

Saturday, August 13, 2011

On a whim, I ran one of the unique identifiers in the Lain game through Google which led me to a couple interesting sites.

A very impressive Russian site is trying to recreate most of the game's content for browsing on the web. What impressed me even more is the creator managed to reverse-engineer some of the game's data types before I did. He kindly gave jPSXdec a shout out since it was used heavily to extract nearly everything on the site.

Wednesday, June 1, 2011

While developing jPSXdec for the last 4 years, I've run across three different methods of decoding bitstreams. Of course, in the honored tradition of multimedia hacking, none of these approaches are documented anywhere. So I thought I'd break that unwritten rule and actually write about them.

If you'd like to learn more about what part this plays in MPEG and PlayStation .STR decoding, check out my thorough document on the subject: PlayStation_STR_format.txt

Approach 1: Brute force

This is the most obvious approach. For each code, peek at the next n bits and check whether they match that code, moving on to the next code until something matches.

In the worst case, this approach requires 111 conditional checks to identify a bit code. To be honest, I've never actually seen this implemented anywhere besides by me years ago when first learning about bitstream parsing.
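To make the brute-force idea concrete, here is a minimal Python sketch. It is not the jPSXdec code: the names `CODES` and `decode_brute_force` are made up for illustration, the bitstream is a string of '0'/'1' characters to keep things readable, and the table holds only a handful of the shortest codes (with run/level values as I understand the MPEG-1 AC table, so treat them as illustrative).

```python
# A tiny, illustrative subset of the variable length codes.
# Each entry is (bits before the sign bit, meaning).
CODES = [
    ("10", "EOB"),      # end of block, has no sign bit
    ("11", (0, 1)),     # '11s'   -> run 0, level 1
    ("011", (1, 1)),    # '011s'  -> run 1, level 1
    ("0100", (0, 2)),   # '0100s' -> run 0, level 2
    ("0101", (2, 1)),   # '0101s' -> run 2, level 1
]

def decode_brute_force(bits, pos):
    """Try every code in turn until one matches the bits at pos.

    Returns (value, new_pos) where value is "EOB" or (run, signed_level).
    """
    for pattern, value in CODES:
        # One conditional check per candidate code.
        if bits.startswith(pattern, pos):
            pos += len(pattern)
            if value == "EOB":
                return value, pos
            if pos >= len(bits):
                raise ValueError("bitstream ended before the sign bit")
            run, level = value
            sign = -1 if bits[pos] == "1" else 1
            return (run, sign * level), pos + 1
    raise ValueError("no code matches at bit %d" % pos)

def decode_all(bits):
    """Decode codes until end of block is reached."""
    pos, out = 0, []
    while True:
        value, pos = decode_brute_force(bits, pos)
        out.append(value)
        if value == "EOB":
            return out
```

Calling `decode_all("110011010")` walks the list from the top for every code, which is exactly why the worst case needs so many checks once the full table is in place.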

Approach 2: Binary tree

I actually ran across this approach implemented in the Serial Experiments Lain PlayStation game. You have a tree of conditionals testing the value of each bit until a match is found.

If ReadNextBit() == '1'
    If ReadNextBit() == '0'
        return END_OF_BLOCK
    Else
        If ReadNextBit() == '0'
            return ('11+0'.ZeroRun, '11+0'.AC)
        Else
            return ('11+1'.ZeroRun, '11+1'.AC)
        End If
    End If
Else
    If ReadNextBit() == '1'
        If ReadNextBit() == '1'
            // '011s'
            ...
        Else
            // '010... and so on
        End If
    Else
        // '00... and so on
    End If
End If

The branching can be optimized a bit for most leaves: once the length of the bit code is clear, the remaining bits can be used as the index in several small lookup tables. The jPSXdec implementation only requires (in the worst case) 12 branches to determine the longest bit codes.
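For comparison, here is a rough Python sketch of the tree idea. Note that it swaps the hand-written conditionals for a tree built from data (nested dictionaries), and it does not include the small-lookup-table optimization, so it is only an illustration of the bit-by-bit walk. `build_tree`, `decode_tree` and the short `SIGNED_CODES` list are hypothetical names, and the bitstream is again a string of '0'/'1' characters.

```python
# Illustrative subset of codes with the sign bit already expanded.
SIGNED_CODES = [
    ("10", "EOB"),
    ("110", (0, 1)),
    ("111", (0, -1)),
    ("0110", (1, 1)),
    ("0111", (1, -1)),
]

def build_tree(codes):
    """Build a binary tree where each dict node branches on '0' or '1'."""
    root = {}
    for pattern, value in codes:
        node = root
        for bit in pattern[:-1]:
            node = node.setdefault(bit, {})
        node[pattern[-1]] = value   # the last bit leads to a leaf
    return root

def decode_tree(tree, bits, pos):
    """Follow one branch per bit until a leaf (non-dict) is reached."""
    node = tree
    while isinstance(node, dict):
        node = node[bits[pos]]      # KeyError means an unknown code
        pos += 1
    return node, pos
```

Each decoded code costs one branch per bit, so the cost is bounded by the length of the longest code rather than by the number of codes in the table.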

Approach 3: Array lookup

I believe this type of approach is used in ffmpeg and the Q-gears decoder. Thanks to the unspoken tradition of never documenting anything, I was unable to understand what it was doing. It wasn't until I reverse-engineered the .iki bitstream parsing that I finally saw how this approach works.

At least for MPEG-1 (and PSX STR), you can take advantage of its particular set of variable length bit codes. Only the first code ('11s') and the end-of-block code ('10') need special parsing. The rest of the codes fall under one of three groups. The group a code belongs to can be determined by looking at how many initial zeros it has.

Group one starts with between 1 and 4 zeros (this also includes the escape code 000001).

Each group has its own lookup table of 256 entries, and each code will be associated with one or more entries in the lookup table. After stripping off the minimum number of zeros in the group, no code in the group will have more than 8 bits remaining. For codes that have 8 bits remaining, the value of those bits is the associated table index. For codes that have fewer than 8 bits remaining, you have to walk through every combination of the bits needed to pad the code out to 8 bits to find all associated indexes.

Example:

Group 1 code: 00110s
Use 0 for sign bit for now: 001100
Strip off first leading 0: 01100
Find all combinations of remaining bits:

Now each table entry needs three values: the run of zero-value alternating current (AC) coefficients that feeds the inverse discrete cosine transform (IDCT), the non-zero AC coefficient value, and the number of bitstream bits that should be skipped.
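The table filling described above can be sketched in Python as follows. This is only my illustration of the idea, not the jPSXdec or ffmpeg code: `table_indexes` and `add_code` are hypothetical helpers, the sketch only handles codes that end in a sign bit 's', and the run/level passed in for '00110s' (run 4, level 1) is my reading of the MPEG-1 table, so double check it before relying on it.

```python
def table_indexes(code_bits, strip, index_bits=8):
    """Return every table index that a fully specified code maps to.

    code_bits: the code as a '0'/'1' string with the sign bit filled in
    strip:     how many leading zeros the group strips off
    """
    remaining = code_bits[strip:]
    free = index_bits - len(remaining)   # bits the code does not care about
    if free < 0:
        raise ValueError("code is too long for this table")
    base = int(remaining, 2) << free
    # Walk through every combination of the don't-care bits.
    return [base | low for low in range(1 << free)]

def add_code(table, code, strip, run, level):
    """Fill in the entries for a code like '00110s', for both signs."""
    for sign_bit, sign in (("0", 1), ("1", -1)):
        bits = code.replace("s", sign_bit)
        for index in table_indexes(bits, strip):
            # (zero run, signed AC value, number of bits to skip)
            table[index] = (run, sign * level, len(bits))

group1 = [None] * 256
add_code(group1, "00110s", strip=1, run=4, level=1)
```

With the sign bit set to 0, the stripped code '01100' leaves three don't-care bits, so it lands on 8 consecutive entries; the negative version fills the next 8.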

Once all three tables are constructed, the following pseudo code will parse your bitstream.

Of course the implementation details can vary, but this gives the idea. The Approach 3 I implemented for jPSXdec requires about 8 conditionals to identify a bit code in the worst case. I've found it to be about 10%-15% faster than the Approach 2 I've been using.

Thursday, August 19, 2010

Just writing a straight-forward PlayStation 1 video decoder has been a lot of work. However, for the absolute most impeccable quality, there is so much more that can be considered in the process.

Upsampling

When PlayStation videos are created, the pixels are broken up into luma (brightness) components and chroma (color) components. Like with JPEG and MPEG formats, 3/4 of the chroma information is thrown away because the human eye can't really tell (this is an example of lossy compression).

When decoding, that lost chroma information needs to be recreated somehow to convert the pixels back into RGB. Unfortunately there is no one 'right' way to do it, because there's really no way to get that lost information back. All you can do is 'guess' by filling in the blanks based on the information around the pixels using some kind of interpolation. Some of the most well known kinds of interpolation are: nearest neighbor, bilinear, bicubic, and lanczos. I've read about more advanced chroma upsampling approaches that also take into account the luma component. This works because there is often a lot of correlation between the luma and chroma components--when the luma changes, the chroma probably will also. I'd like to try to find the best one, but I haven't had much luck on finding many good resources about them all.
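To give a feel for the difference between two of the simpler methods, here is a small Python sketch that doubles the size of a chroma plane with nearest neighbor and with bilinear interpolation. It is a toy under stated assumptions: planes are plain lists of rows, the function names are made up, edges are handled by clamping, and the bilinear version assumes each chroma sample sits in the center of its 2x2 block of luma samples.

```python
import math

def upsample_nearest(plane):
    """Double width and height by repeating every sample."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def _clamped(plane, x, y):
    """Read a sample, clamping coordinates to the plane edges."""
    h, w = len(plane), len(plane[0])
    return plane[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def upsample_bilinear(plane):
    """Double width and height by blending the 4 nearest samples."""
    h, w = len(plane), len(plane[0])
    out = []
    for oy in range(2 * h):
        fy = (oy + 0.5) / 2 - 0.5      # output row in chroma coordinates
        y0 = math.floor(fy)
        ty = fy - y0
        row = []
        for ox in range(2 * w):
            fx = (ox + 0.5) / 2 - 0.5  # output column in chroma coordinates
            x0 = math.floor(fx)
            tx = fx - x0
            top = _clamped(plane, x0, y0) * (1 - tx) + _clamped(plane, x0 + 1, y0) * tx
            bottom = _clamped(plane, x0, y0 + 1) * (1 - tx) + _clamped(plane, x0 + 1, y0 + 1) * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out
```

On a hard edge such as `[[0, 100]]`, nearest neighbor keeps the step as is, while bilinear spreads it over the neighboring pixels. Neither one is 'right'; they are just different guesses.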

Now, because this is essentially just scaling of a 2D image, I've been worried about this article that points out a nasty little gremlin called gamma correction. It seems nearly everyone has been doing image scaling wrong since the popularization of the sRGB gamma corrected color space. I'm assuming video isn't immune to the same problem, yet I've never seen anyone mention it.
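As a quick illustration of the gamma problem, the Python sketch below averages two pixel values the naive way and then in linear light. It uses the standard sRGB transfer curve purely as an example; video normally uses a different (Rec.601/709 style) curve, so this shows the kind of error involved rather than what a PSX decoder should do. The function names are mine.

```python
def srgb_to_linear(value):
    """Convert an sRGB-encoded value in [0, 255] to linear light in [0, 1]."""
    c = value / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(linear):
    """Convert linear light in [0, 1] back to an sRGB value in [0, 255]."""
    if linear <= 0.0031308:
        c = linear * 12.92
    else:
        c = 1.055 * linear ** (1 / 2.4) - 0.055
    return c * 255.0

def average_naive(a, b):
    """Average the encoded values directly (what most scalers do)."""
    return (a + b) / 2

def average_linear(a, b):
    """Decode to linear light, average, then encode again."""
    return linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)
```

Averaging black (0) and white (255) the naive way gives 127.5, while the linear-light average comes out noticeably brighter. That gap is the darkening the article complains about when images are scaled in gamma-encoded space.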

Deblocking

Assuming we find the upsampling method of choice, there are still ways the image can be improved. Most video codecs break the frames down into 'blocks', then encode each block separately--again losing information along the way. When everything is reconstructed, that lost information can often be seen as visible distortions between blocks. This problem has been addressed in more recent video codecs such as h.264, but is still a problem with the older MPEG2. I believe nearly all DVD players do some deblocking before showing the final frame.

Even though MPEG2 has been around a long time and deblocking is so common, I've had the darndest time trying to find much mention of what deblocking algorithms are in use today. UnBlock, and this page on JPEG Post-Processing are the best I've come by. I think I've read somewhere that some advanced deblockers can even make use of the original MPEG2 data to improve the deblocking.

I still consider myself a multimedia novice, so there are probably more post-processing methods that would really make the output shine. A big bummer among all research in that area is that if you can think it, you can pretty much count on it having been patented.

Given how difficult all this stuff is, I really really wish I could just pass that problem off to the big players in the field, such as ffmpeg (i.e. libavcodec). I've even considered writing a PSX video to MPEG2 video translator so the MPEG2 video can be fed into ffmpeg. Unfortunately there are some big reasons why doing this still makes me uneasy.

Differences in YCbCr

Another worry is that a real good quality MPEG2 decoder will spatially position the chroma components in the proper location (vertically aligned with every other luma component) as opposed to how I believe PSX does it (the MPEG1 way: in-between luma components).

To make things a bit more complicated, MPEG2 uses the proper Rec.601 Y'CbCr color space with [16-235] luma, and [16-240] chroma range. PSX on the other hand, uses the full [0-255] range for color information. Many video converters don't handle that discrepancy very well. Related to that, pretty much all converters store the data as integers, so any fractional information is lost after every conversion. In contrast, jPSXdec maintains all that fractional information until the very end.
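The range and rounding issue can be shown with a few lines of Python. This is a sketch under assumptions: it uses the common full-range Rec.601 (JFIF style) constants, which may not match the PSX hardware exactly, and the function names are hypothetical.

```python
def ycbcr_full_to_rgb(y, cb, cr):
    """Full-range [0-255] Y'CbCr to RGB, kept as floats (no rounding)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

def luma_full_to_limited(y):
    """Squeeze [0-255] luma into the [16-235] range, rounded to an integer."""
    return round(16 + y * 219 / 255)

def luma_limited_to_full(y):
    """Stretch [16-235] luma back out to [0-255], rounded to an integer."""
    return round((y - 16) * 255 / 219)

# Luma values that do not survive an integer round trip through [16-235].
lost = [y for y in range(256)
        if luma_limited_to_full(luma_full_to_limited(y)) != y]
```

Since 256 input values have to share 220 output codes, `lost` can never be empty: every integer trip through the limited range throws a little information away, which is the reason for holding on to the fractional values until the very end.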

One advantage that comes when incorporating all these enhancements in jPSXdec is it provides a much nicer user experience. No need to be hopping between multiple tools to get the best results.

So if I were to actually implement all these features, where would I get the information I lack? Perhaps the doom9.org forums could help. If any multimedia gurus happen to pass by this post, please, if you could, toss some wisdom my way.