Zlib is the asm-optimized version included in cc65; the others were compiled C, and it shows in the speed. Speed is decompression in kb/s. The test data was the first 768 bytes of the GPL, i.e. English text, comparable in compressibility to typical level data.

I started with the assumption that LZO would be the best of the no-RAM algos, and indeed it was. Its speed was a surprise, however: on desktops it is extremely fast, but here it runs at less than half the speed of zlib. Being GPL it is not usable for many closed projects, though commercial licenses are available.

Might try others if I find some; only LZO advertises being low-RAM.

edit: Added lz5hc.

Last edited by calima on Wed Jun 07, 2017 10:09 am, edited 4 times in total.

Forgive my ignorance, but are these different methods of compressing in-game graphics?

Due to the small amount of RAM on the NES, graphics are usually the only thing compressed with traditional techniques. Some games do use an extra 8KB of WRAM to store level maps, and those could also benefit from traditional compression algorithms.

Quote:

I'm only familiar with RLE, not included here.

RLE is, like, the most basic compression scheme ever, but even it can be implemented in a lot of different ways that will slightly affect the final compression ratio. Still, all of the other compression schemes mentioned here should outperform RLE by a reasonable margin, especially when compressing text.
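To make the "many ways to implement it" point concrete, here is one minimal byte-oriented RLE variant as a Python sketch. This is purely illustrative and not taken from any post in the thread; real schemes like PackBits add a literal-run mode so incompressible data doesn't double in size.

```python
def rle_encode(data: bytes) -> bytes:
    """Naive RLE: each run is stored as a (count, value) pair, count <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(packed: bytes) -> bytes:
    """Expand (count, value) pairs back into the original byte string."""
    out = bytearray()
    for count, value in zip(packed[::2], packed[1::2]):
        out += bytes([value]) * count
    return bytes(out)
```

Note the worst case: a byte string with no repeats encodes to twice its size under this scheme, which is exactly the kind of trade-off that varies between RLE flavors.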

Quote:

Do you have any links to NES code which actually use the compression methods discussed here?

I believe there are 6502 implementations of decompression routines for some of these schemes, but if you want to use them for graphics you must modify all the code that deals with the output stream to use $2006/$2007 for reads and writes.

I'm not sure what data you are showing, but it almost certainly doesn't generalize. A good compression algorithm exploits properties of the data it compresses, so the results really depend on the data. This is why I wrote my own tool that does statistical analysis using many different low-resource algorithms.

The NES has 2kb RAM so an algorithm that requires 0.7kb of RAM is certainly usable without extra RAM.

The ROM usage is certainly more important than RAM usage. If the decompression algorithm is large, it kills off the savings you get by compressing data.

Quote:

Forgive my ignorance, but are these different methods of compressing in-game graphics?

Quote:

Do you have any links to NES code which actually use the compression methods discussed here?

They can be used for graphics with CHR RAM, but I intend to use them on level data and such. No public examples that I know of; for the decompression I used the latest release of each.

43110 wrote:

lz4hc seems to be the best of the four tested (considering ram first then time), but still would be painfully slow. 5.8 kb/s is only about 12 bytes per ntsc field, assuming 100% cpu usage.

When using it to compress the level data, which must be under 2kb, that means sub-0.5s level loading time, which is quite good. Loading can be on a black screen, so all cpu can be used.

Bregalad wrote:

The NES has 2kb RAM so an algorithm that requires 0.7kb of RAM is certainly usable without extra RAM.

There usually isn't that much free. In a previous game of mine, there was less than 200 bytes of free BSS, so zlib couldn't be used. To keep that much RAM free, you pretty much have to design for it from the start.

Quote:

The ROM usage is certainly more important than RAM usage. If the decompression algorithm is large, it kills off the savings you get by compressing data.

Yes. However, level data may total 10kb or more; a decompressor that cuts that in half can itself take a kb or two of ROM and still be a big win.

The Legend of Zelda: Oracle of * also had the Game Boy Color's 32K of RAM (4K fixed, 28K in switchable 4K banks) and 16K of VRAM, which is easily randomly accessible while rendering is off, to work with. For random access with little RAM, such as on an NES without extra RAM on the cartridge, you need to use a static dictionary in ROM.

For map data, there's either multi-layer metatiles (Mega Man; Blaster Master; Sonic the Hedgehog) or object coding (Super Mario Bros. series). The Legend of Zelda for NES is known for using metatiles that are a column of the screen.

Another trick I like to use that generalizes RLE in a static-dictionary-ish way is based on a Markov chain. For each symbol, store the most common symbol that follows it, and then treat a "run" as a set of symbols each followed by its most common next symbol. (RLE is the special case of this where each symbol's most common next symbol is itself.) Background maps in Haunted: Halloween '85 for NES are stored this way with vertical runs. Level maps in Wrecking Ball Boy for Pygame are horizontal rows of tiles that are then Markov-expanded vertically.
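As a sketch of the Markov-chain trick described above (not code from any of the games mentioned), the idea can be expressed in a few lines of Python. Here `nxt` stands in for the static most-common-successor table, and a stored pair `(symbol, run)` means "this symbol, followed by `run` symbols that all match the prediction chain":

```python
def markov_rle_encode(data, nxt):
    """Encode using a static successor table: only the first symbol of each
    correctly-predicted chain is stored, plus the chain length."""
    out = []
    i = 0
    while i < len(data):
        run = 0
        # Extend the run while each next symbol matches the table's prediction.
        while i + run + 1 < len(data) and data[i + run + 1] == nxt[data[i + run]]:
            run += 1
        out.append((data[i], run))
        i += run + 1
    return out

def markov_rle_decode(pairs, nxt):
    """Replay each (symbol, run) pair by walking the successor table."""
    out = []
    for s, run in pairs:
        out.append(s)
        for _ in range(run):
            s = nxt[s]
            out.append(s)
    return out
```

With `nxt[s] == s` for every symbol this degenerates into ordinary RLE, which is exactly the special case noted above; with other tables it also compresses alternating or stepped patterns.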

It's also useful to consider how you are encoding run lengths. The best method rate-wise depends on the distribution of run lengths and literal lengths; the distribution implied by popular RLE schemes such as PackBits may not be optimal. And on the NES, you often have to break VRAM uploads into fixed-size packets. I like to use a unary code, in which each bit of a control word can be 0 for a single literal symbol or 1 for part of previous run, to be fairly efficient on real data and easy to decode to a multiple of 8 bytes. LZSS in the Allegro 4 library and Game Boy Advance and Nintendo DS BIOS uses a variant of this unary code, where each bit of the control word is one way for a literal and the other way for a reference to previous data.
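A decoder for the control-word scheme described above might look like the following Python sketch (an illustration of the idea, not the Allegro or BIOS implementation). Each control byte contributes 8 flag bits, where a 0 bit pulls one literal byte from the stream and a 1 bit repeats the previously output byte, so every control byte decodes to exactly 8 output bytes, which is what makes fixed-size VRAM packets easy:

```python
def unary_rle_decode(stream: bytes) -> bytes:
    """Decode a control-word RLE stream: control bytes and literal bytes
    are interleaved; bit 0 = one literal, bit 1 = extend the previous run."""
    out = bytearray()
    it = iter(stream)
    for ctrl in it:              # next byte after the literals is a control byte
        for bit in range(8):     # MSB first
            if (ctrl >> (7 - bit)) & 1:
                out.append(out[-1])    # part of the previous run
            else:
                out.append(next(it))   # literal byte follows in the stream
    return bytes(out)
```

For example, the control byte `0b01110000` followed by the literals `a b c d e` decodes to `aaaabcde`: one literal, three run bits, then four more literals.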

It is true that RLE is another format, and there is also huffed RLE (which I have used for map data in one game). As tepples mentions, there are many other good ones too. Another thing you can do is decompress data from CHR ROM into RAM; if everything is read sequentially, possibly with parts of variable length, this may speed it up even more. You would have to disable rendering to use it properly, though, so it is best suited to loading a level or similar (if your levels are small enough).

Quote:

I like to use a unary code, in which each bit of a control word can be 0 for a single literal symbol or 1 for part of previous run, to be fairly efficient on real data and easy to decode to a multiple of 8 bytes. LZSS in the Allegro 4 library and Game Boy Advance and Nintendo DS BIOS uses a variant of this unary code, where each bit of the control word is one way for a literal and the other way for a reference to previous data.

I don't think that qualifies under the definition of unary code you linked, though... (the tokens are always 0 or 1, not a string of bits). Also, what you've described applies to nearly every LZ77-based scheme out there. (LZSS's particularity is the use of a ring buffer instead of the already-decoded data, allowing more room to reuse strings, since repeated strings don't go into the buffer.)
