So, the samples actually decode to 8-bit samples, which is a bit bizarre. They just get an LSR before being written to $4011. The reason it's so strange is that the compressed format is a fixed 5-bits-per-sample stream, where each 5-bit value indexes a 32-byte lookup table for the output. At the beginning of the stream, and every 20 bytes after that, the lookup table is reselected with a 4-bit selection code; there are 16 different lookup tables, which map to different ranges of output samples. So... they're decompressing 5-bit samples into 8-bit samples and then throwing away a bit, so really they're only getting a compression ratio a bit worse than 5/7. A 5-bit value of 0 halts the sample and returns from playback.
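To make the lookup step concrete, here is a minimal Python 3 sketch of it, not the game's actual routine. It assumes the 5-bit codes have already been unpacked from the byte stream, and it leaves out the periodic table switching. The name decode_codes and the contents of example_table are made up for illustration; the real tables live in the ROM.

```python
def decode_codes(codes, table):
    """Map 5-bit codes to 7-bit values suitable for $4011.

    `codes` is an iterable of already-unpacked 5-bit values and `table`
    is a 32-entry list of 8-bit samples (both hypothetical inputs).
    """
    out = []
    for code in codes:
        if code == 0:
            # A 5-bit value of 0 halts the sample.
            break
        # The 8-bit table entry gets an LSR before it is written out.
        out.append(table[code] >> 1)
    return out


# Hypothetical table: centered on 128, 4 units per step, entry 0 unused.
example_table = [128] + [128 + (i - 16) * 4 for i in range(1, 32)]

print(decode_codes([16, 31, 1, 0, 16], example_table))  # [64, 94, 34]
```

The trailing 16 in the example is never reached, because the 0 before it ends playback.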

I figured out the locations of the samples by hand; there's a bunch of contiguous blocks. (Not sure if I missed any.) There's a mechanism to load and play a sample by a 3-byte pointer (basically bank select + pointer + a few extra bits of data). All of these pointers are stored in bank 2, but only some of them are contiguous; there's no convenient table here. Where the pointers are contiguous, they tend to splice words/sentences together out of the consecutive sounds.

As Tepples mentioned, the consonant sounds are often separated (and I think they are shared by many words). Some samples are broken into parts, I think so that playback can return for an instant to do animation or something else quickly. The alphabet gets really weird; the letters tend to be stored in strings of ~30 sample blocks, or other strange combinations.

The following code is the decoding loop. It reads and outputs 8 5-bit samples from 5 bytes of memory, selecting a new lookup table every 20 bytes. It also bankswitches if the end of a bank is reached. Note that the code is interspersed with time wasting NOPs and JSRs to keep the samplerate consistent.
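As a rough illustration of the unpacking step only (this is not the game's 6502 code, and it ignores timing, table selection and bankswitching), eight 5-bit values can be pulled out of a 5-byte group like this in Python 3. The bit order is an assumption: the sketch treats the group as a 40-bit big-endian integer with the first sample in the top bits, which may not match the ROM. The name unpack_group is hypothetical.

```python
def unpack_group(five_bytes):
    """Split 5 bytes (40 bits) into eight 5-bit values.

    Assumes MSB-first packing, which is a guess rather than something
    confirmed from the disassembly.
    """
    if len(five_bytes) != 5:
        raise ValueError("expected exactly 5 bytes")
    bits = int.from_bytes(five_bytes, "big")
    # Shifts 35, 30, ..., 0 pick out the eight 5-bit fields in order.
    return [(bits >> shift) & 0x1F for shift in range(35, -1, -5)]


print(unpack_group(bytes([0xFF] * 5)))  # [31, 31, 31, 31, 31, 31, 31, 31]
print(unpack_group(bytes([0x08, 0x42, 0x10, 0x84, 0x21])))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

On the 6502 there is no 40-bit integer, so each of the eight positions needs its own shift-and-mask sequence, which is why the real loop is so long.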

Of course, we homebrewers don't need to implement this codec directly in our projects. We just need to use it as a sort of "pace car" to see if our own codecs are better or worse than the state of the art was during the NES's commercial era. (The M.C. Kids post-mortems are the same way.)

I still use Ubuntu 11.10, which has Python 2.7 by default, which has different semantics for str and bytes from the Python 3.x series. I had to make a small change to load_rom() to get the program to work:

Some samples are stored in CHR ROM. How does the playback work for those? I seem to remember that a lot of older emulators used to show garbage tiles for Big Bird's sprites when some samples were playing and/or freeze when Big Bird says "Go". Perhaps they were screwing up the MMC1 bankswitching.

I investigated the format of the sixteen tables. Apart from entry 0, which appears to duplicate entry 16 (both are always 128 in every table), they appear to just be linear PCM at 16 different scale factors: [30, 34, 40, 46, 54, 62, 70, 82, 94, 108, 124, 144, 166, 192, 220, 254]
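For anyone who wants to play with these numbers, here is a Python 3 sketch that rebuilds tables of that general shape. The formula is a guess: it assumes each scale factor is the peak-to-peak span between entry 1 and entry 31, so it may not reproduce the ROM's tables byte for byte. The names SCALES and build_table are hypothetical.

```python
SCALES = [30, 34, 40, 46, 54, 62, 70, 82, 94, 108, 124, 144, 166, 192, 220, 254]


def build_table(scale):
    """Build a 32-entry linear table centered on 128.

    Assumption: `scale` is the span from entry 1 to entry 31, so one
    step is scale / 30. Entry 0 duplicates the center value at entry 16.
    """
    table = [128]
    for i in range(1, 32):
        # int() truncates toward zero, keeping the table symmetric.
        table.append(128 + int((i - 16) * scale / 30))
    return table


# Ratio between each scale factor and the previous one.
ratios = [b / a for a, b in zip(SCALES, SCALES[1:])]

print(build_table(30)[1], build_table(30)[31])    # 113 143
print(build_table(254)[1], build_table(254)[31])  # 1 255
print(min(ratios), max(ratios))
```

Under that assumption the largest table just fits in 8 bits. The ratios between consecutive scale factors all fall between about 1.13 and 1.18, so the scales themselves look roughly geometric.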

Mistake #1 was starting with 8-bit samples in the first place. I suspect, as you do, that a modified 4-bit VOX decoder would be almost as good for this purpose, would save more space, and would reduce the complexity of the code down to just two bit-block decoding stages rather than 8 (far fewer branches to try and time correctly too).

If you check the game's credit screen, the voice stuff was written by another company; perhaps it was a solution for a different 6502 platform (with 8-bit playback) that they purchased and adapted.

Though, despite this criticism, it works, and it saved I think 80kb or so vs 8-bit samples, so it did its job somewhat. The game shipped, and plays just fine for what it is. It's a very simple and robust game. I think it's a bit more playable and engaging than the other Sesame Street NES games I've seen (i.e. the same company's Sesame Street Countdown, and Rare's Sesame Street ABC).

I don't know enough about how the iNES format handles banks or how the MMC1 should be implemented to know how the hell they used samples in CHR ROM, but they seem to be there, and the relevant sounds (e.g. "Grover") are heard in game and don't seem to be duplicates. There could be a duplicate decoder hiding in the code somewhere, but I didn't find it if there is one. In FCEUX's PPU viewer I don't see any "noise" data or flickering in the CHR pages while the samples are playing, so I think it manages to bankswitch the data into $8000 somehow.

Edit: there is a duplicate decompressor, and it loads in blocks of data from CHR ROM just after NMI when playing these samples. See below for more specific information.

Last edited by rainwarrior on Sun Mar 04, 2012 2:14 pm, edited 2 times in total.

And all this sound data fits inside an NES ROM? This is surprising, to say the least.
Even using some kind of 4-bit-per-sample ADPCM-ish compression (tepples' algorithm, not mine), I could only get several seconds of sound before ending up with a ridiculously large amount of data.


Ah, found the code that does it. It reads a block from CHR ROM to RAM, and then runs a second implementation of the decompressor that works on that RAM address.

The CHR reading code starts at around $F592 in the code bank, and it gets executed just after an NMI (it reads a block of data to $0280). I guess playback for these samples halts briefly after NMI to do this. The code at $F8E1-$F9B5 then looks really similar to the 8-sample decoder cycle at $F3B3-$F480 that I annotated above.

It's "nothing special" compared to Forrest Mozer's codecs for ESS. Those are ridiculously efficient for the time, producing understandable speech around 4 kbps, comparable to an LPC vocoder but algorithmically much simpler to decode.

The fidelity loss from using the MX codec was unacceptable for an early childhood edutainment game in which clear diction and Spinney's iconic voice are of paramount importance. In the sample page, hear how warbly the "gbusters scream" and "imission scream" samples are.

ESS was willing to sell an LPCM solution for a much cheaper royalty than a solution based on MX, and the royalty difference outweighed the difference in replication cost between the smaller cart that used MX and the bigger cart that used LPCM.

In the sample page, hear how warbly the "gbusters scream" and "imission scream" samples are

Hmm, this makes me wonder whether the samples there are quantized to 4-bit (as they were in those C=64 games) or not. That could potentially make quite a difference. Maybe not in the "warbliness", but at least in general perceptive quality.

My first guess would be that no such quantization is done by this player, and that this is how good a sample with this codec could possibly sound without those 4-bit quantization limitations. But I can't tell for sure from the playback.
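For a sense of how coarse 4-bit output is, here is a small Python 3 sketch of quantizing an 8-bit sample down to 16 levels and expanding it again. It only illustrates the quantization step being wondered about; it is not taken from the sample page's player or from any C=64 game, and both function names are made up.

```python
def quantize_4bit(sample8):
    """Keep only the top 4 bits of an unsigned 8-bit sample."""
    return sample8 >> 4


def expand_4bit(level):
    """Map a 4-bit level back to the middle of its 8-bit range."""
    return (level << 4) | 8


# Worst-case round-trip error over every possible 8-bit input.
worst = max(abs(s - expand_4bit(quantize_4bit(s))) for s in range(256))
print(worst)  # 8
```

An error of up to 8 steps out of 256 on every sample is audible as extra noise, which is separate from whatever artifacts the codec itself introduces.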

The ghostbusters thing doesn't sound like fixed bit compression to me. If I had to guess what's going on there, I'd say it cuts the sound into short segments and replaces each segment with a single cycling waveform that is a close match.
