LZSA is a new data compressor written by Emmanuel Marty that targets small, primarily 8-bit platforms and is heavily optimized for high decompression speed. It takes its heritage from the LZ4 school of LZ compression, but modifies and extends the format to improve both the compression ratio and the decompression speed on not particularly capable computers.

You can download the latest sources and binaries for LZSA from: https://github.com/emmanuel-marty/lzsa

In fact, LZSA comprises two compressors inside a single executable. The first of them, called "LZSA1", is easier to explain, as it is very similar to LZ4. The changes include more efficient extended length codes, as well as added support for 8-bit offsets. Together, these changes increase the compression ratio by about 6-7%, which is probably not particularly surprising to anyone with experience of working with this type of compressor. However, what hopefully makes this specific format interesting is the process we went through to arrive at it.

Essentially, we knew from the start, from our experience of developing a speed-optimized Z80 decompressor for LZ4, that certain decisions in the compression format are awkward to implement on an 8-bit platform. For example, the format half-assumes that the decompressor will use the number of decompressed bytes as the stopping condition. This is clearly good practice from the point of view of dealing with potentially malicious data, but it is not very practical on 8-bit platforms, where 16-bit operations, even when available, are fairly expensive. Therefore, every practical LZ4 decompressor for the Z80 does not check the size of the decompressed data, but instead looks for a zero offset as the end-of-data marker. (This works because 8-bit computers only use LZ4 data with a single frame, and at the end of the last compressed block there is an EndMark, a 32-bit zero, sitting exactly where the last offset would sit.) Even testing that the offset is zero is relatively expensive, because no 8-bit CPU has a single instruction capable of checking whether a 16-bit number equals zero.

So, to address these kinds of issues, we developed the format in several iterations. After the first version of the compressor was written, we implemented the decompressor on the Z80 and looked into ways of simplifying the format so as to simplify and speed up the decompressor. We went through several iterations of this process, with some ideas tried and discarded and others kept. The result is a format that is fairly decompressor-friendly, at least from the point of view of an 8-bit machine.

We also spent quite a lot of time working out ways to help the decompressor during compression. Our main line of attack was to resolve ties (situations where several combinations of codes are possible) strategically. We attempted several approaches. At some point we had a version of the compressor that literally took into account how long each decompressor branch executes, so that the number of machine cycles to decompress was used to resolve ties. However, we found this very impractical: every decompressor would need a separate function specifying the cost of each combination of codes, and every change to a decompressor would entail changing the compressor and recompressing the data. So we came up with an extremely useful heuristic: minimize the number of compression tokens (this is exactly why we tried to apply it to LZ4 data too; see the LZ4Ultra thread here). The resulting speed-up was within one or two percent of the result obtained using the exact machine-cycle costs of decompression.

Even with such extreme optimizations, the added complexity of "LZSA1" was such that we struggled to match the decompression speed of LZ4. Luckily, we quickly noticed that the relative loss in speed scaled as about 4/3. Incidentally, this is also the ratio of the minimum match sizes in LZ4 and LZSA. Fundamentally, this happens because the decompression time per byte scales as C1/N + C2, where N is the length of the current match, C1 is the roughly constant decoding time per token, and C2 is the time it takes to copy a byte. C2 is pretty much a CPU-specific constant (assuming a well-optimized decompressor). Our work on the compression format was aimed at reducing C1. However, even if we managed to match LZ4's C1 using various clever tricks in the data format, we could not match LZ4's worst-case per-byte token cost of C1/4, because our shortest match length was 3, which makes our worst case C1/3.

This led us to the decision to let the user choose the minimum match length with "-m". Raising it costs some compression ratio, but provides a fairly substantial speed-up in the decompressor. The user can decide that a particular set of data needs to be compressed as well as possible and choose the minimum match length of 3 ("-m3"); alternatively, the user may prefer higher decompression speed and choose a minimum match length of 4 or even 5 ("-m4" or "-m5"). The resulting combination of the format tuning and of the compressor helping the decompressor, by reducing the number of tokens and, possibly, increasing the minimum match length, produces results that are clearly on the Pareto frontier for Z80-based compression and decompression:

Of course, our combination of skills meant that the Z80 was particularly well catered for. That is good for the Z80, but not completely representative of everything else. Since our compressor was published, new highly optimized 8086/8088 decompressors have been contributed by Jim Leonard and Peter Ferrie. Although the decompression speed of "LZSA1 -m5" with these decompressors is merely similar to LZ4 (whereas on the Z80 "LZSA1 -m5" overtakes LZ4 by about 9%), it still shows that the fundamental design decisions are not completely off.

The second compression format, "LZSA2", is in many ways a logical extension of the first. To the simple byte-based format of "LZSA1" we added nibbles, to allow finer-grained control of the match lengths and offsets. The token was modified a little to optimize compression on our corpus. Last, but not least, a command for repeated offsets was added. Overall, the resulting compressor is probably best described as a further exploration of LZ5-style compression. In our tests we achieved compression ratios comparable with "ZX7", which was ultimately our target in terms of the compression ratio. Compared to "ZX7", "LZSA2" did not fare very well when compressing graphics, probably due to the relatively crude variable-length codes used in the compressor. However, on mixed data, less dominated by a large number of small matches, we outperform "ZX7" by quite a big margin, mostly thanks to a larger compression window. These are the results on our corpus of about 2.5MB of files mostly relevant to the ZX Spectrum:

Code:
                     corp    demos    games      gfx    music     text       total
OOZ v7.1 (Kraken)   34211   219830   337485   438657   138516   204963   1,373,662
LZSA (f2)           38043   229544   352172   497857   138086   225430   1,481,132
ZX7                 39484   238903   362443   489624   143457   226092   1,500,003
OOZ v7.1 (Mermaid)  37709   235601   363995   486306   157553   220841   1,502,005
LZOP v1.3 -9        40347   238797   374751   527285   154278   236488   1,571,946
LZSA (f1)           41839   238331   375340   530595   152838   247901   1,586,844
LZ4 v1.9.1 -19      45135   244707   398287   559407   168746   266545   1,682,827
OOZ v7.1 (Selkie)   45122   246574   398377   559023   167500   271168   1,687,764
Uncompressed        90445   303002   563234   819609   352130   518401   2,646,821

At this stage of the project, the decompressors for the Z80 and for 8086/8088 (contributed by Jim Leonard and Peter Ferrie) are quite mature. The decompressors for 6502 have been partly optimized, but more can probably be done to them. We are hoping to have decompressors for several other small CPUs added soon.

P.S. We realized very recently that there exists another compression scheme that is also named LZSA: https://cbloomrants.blogspot.com/201...ndex-post.html

We were not trying to refer to Charles Bloom's work, and in any case his compressor is unpublished and very different from what we did. We hope that this won't cause any unnecessary confusion.