PyLZMA

Impressed by the spectacular compression ratios of the Inno Setup compiler, I wanted to use the great compression algorithm LZMA in my own Python programs. As the LZMA SDK by Igor Pavlov is Open Source, it was no problem writing some Python wrappers for the C library. They currently run fine both on Windows and Linux, so hopefully, I can provide a tool that enables the user to read and create 7-zip compatible archives on Linux (as this is not supported by the original 7-zip).

Comparison

Here are the compression results of different data files with the zlib, bz2 and pylzma modules:

Description                    Original             zlib                 bz2                  pylzma
SVN export of version 0.1.0    542.720 (100%)       97.923 (18.04%)      79.660 (14.68%)      74.009 (13.64%)
20 JPEG wallpapers             7.178.240 (100%)     6.989.049 (97.36%)   7.022.040 (97.82%)   6.980.443 (97.24%)
libxml2-2.6.22.tar             34.232.320 (100%)    4.567.489 (13.34%)   3.408.457 (9.96%)    2.475.885 (7.23%)

Depending on your input data, the differences between zlib/bz2 and pylzma may be even bigger!
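To run a similar comparison on your own data, a minimal sketch follows. It assumes Python 3 and uses the standard-library lzma module as a stand-in for pylzma, so the numbers will not match the table above exactly; the function name compression_report and the sample input are made up for illustration.

```python
import bz2
import lzma  # standard-library stand-in for pylzma (assumption)
import zlib


def compression_report(data):
    """Return {codec name: (compressed size, percent of original size)}."""
    codecs = {
        "zlib": lambda d: zlib.compress(d, 9),
        "bz2": lambda d: bz2.compress(d, 9),
        "lzma": lambda d: lzma.compress(d, preset=9),
    }
    report = {}
    for name, compress in codecs.items():
        size = len(compress(data))
        report[name] = (size, 100.0 * size / len(data))
    return report


if __name__ == "__main__":
    # Hypothetical sample input; replace it with the bytes of a real file.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000
    for name, (size, percent) in compression_report(sample).items():
        print(f"{name:5s} {size:8d} bytes  {percent:6.2f}%")
```

Already compressed input such as JPEG files will show almost no gain with any of the three codecs, as in the second row of the table.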

Features

Compression / decompression of a single block of data

Compression from a file-like object (must provide a read method)

Streaming decompression through multiple calls to decompress

An initial library that supports reading of 7-zip archives (both solid and non-solid)

Compiles and runs on Windows, Linux and OSX

Multithreaded compression on Windows

Built with LZMA SDK 4.65
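The second and third features (compressing from a file-like object and decompressing through repeated calls to decompress) can be sketched as follows. This is only an illustration of the pattern: it uses Python 3's standard-library lzma module as a stand-in, and the helper names compress_from_file and decompress_in_chunks are hypothetical. The corresponding pylzma calls may be named and parameterised differently.

```python
import io
import lzma  # standard-library stand-in; pylzma's own names may differ


def compress_from_file(fileobj, chunk_size=4096):
    """Compress everything readable from a file-like object (needs .read)."""
    compressor = lzma.LZMACompressor()
    pieces = []
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        pieces.append(compressor.compress(chunk))
    pieces.append(compressor.flush())  # emit whatever is still buffered
    return b"".join(pieces)


def decompress_in_chunks(compressed, chunk_size=64):
    """Feed the compressed stream to one decompressor piece by piece."""
    decompressor = lzma.LZMADecompressor()
    pieces = []
    for start in range(0, len(compressed), chunk_size):
        piece = compressed[start:start + chunk_size]
        pieces.append(decompressor.decompress(piece))
    return b"".join(pieces)


if __name__ == "__main__":
    original = b"Hello world! " * 500
    packed = compress_from_file(io.BytesIO(original))
    assert decompress_in_chunks(packed) == original
```

Feeding small chunks is useful when the compressed data arrives over a network connection and should be processed before the transfer has finished.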

Download

Afterwards, you will find a file pylzma.pyd in the directory build/lib.win32-<PythonVersion> that can be imported by Python. On Linux, the file will be called pylzma.so and can be found in a directory called build/lib.linux-<arch>-<PythonVersion>.

Compilation has been tested with Microsoft Visual Studio 2003, GCC 3 (Linux, Cygwin), GCC 4 (Linux) but should work with any ANSI C compiler. Please let me know if you encounter any problems.

Installation using Python eggs

If you installed the EasyInstall package, you can install the latest version of pylzma using the following command:

easy_install pylzma

Refer to the EasyInstall documentation for further details about installing Python eggs. EasyInstall queries the Python Package Index and automatically fetches the latest release.

– Already had the MSVC compiler for AMD64 installed with Visual Studio 2008 (v9.0)
– Cloned the Git repo
– Tried setup.py install, but got a bunch of compiler errors
– Tried changing all .c file extensions to .cpp and modified setup.py accordingly
– Now they compile fine (after a few tweaks in CpuArch and another file that had a switch/case block), but the linker chokes with these errors (they might look really bad in this little text field):

I’ve just successfully built and installed pylzma-0.4.3 on Windows XP Pro with Python 2.7.1 using mingw. No problems. (Python installed with the Windows x86 MSI installer from python.org).

pylzma is being used on (pickle) files that are transferred over slightly unreliable 9600 baud dialup connections. Half the size of zip compression. Reduced transfer time = fewer redials = lower costs and much less frustration. Thank you.

Help!
If I try to install with easy_install, it exits with
File “C:\Python26\lib\site-packages\setuptools-0.6c11-py2.6.egg\setuptools\package_index.py”, line 475, in fetch_distribution
AttributeError: ‘NoneType’ object has no attribute ‘clone’
You don’t happen to have a simple, pre-built distribution, do you? xxx

It says “You can download the binaries and the source code for the wrappers below,” but there is no link anywhere on this page that I can find.

It looks like the best thing to do is go to http://www.joachim-bauch.de/
From there it says “You can get the source tarball from the Python Package Index or github.”

But the current page (projects/pylzma) is still the #1 hit on Google for “python lzma”. Although there is a link to github, I think the Python Package Index is easier for most people to download from, and it certainly feels less scary than trying to fetch what may or may not be an experimental bleeding-edge version from the source repository.

Well, by using “easy_install” as described above, the Python Package Index is queried for the latest released version. The text above was copied from my old page and surely is a bit unclear. I’ll update it to refer to the Python Package Index.

I can’t seem to find any documentation hosted online. I’ve tried running help(pylzma), but many of the doc strings are marked as “todo” or do not explain any of the parameters. Some of the doc strings invite you to run “help(type(x))” instead, where I assume x is the class/module/function. This is not helpful either.

[…] for quite a while and couldn’t find a python implementation of ppmz, but I did find another method ported to python with lzma, the compressor behind 7zip. Lzma uses a different implementation of Lempel-Ziv, […]

Hey Joachim – you might never read this but thanks for writing up this post and showing off these libraries. I was trying to find a solution to use in Windows so I’ll download that package from PyPI and see what I can do.

I’m quite new to handling binary files. I’m looking for a Python implementation to decompress a 7zip archive and save the result as a new file.
Any chance of pointing me to a small example illustrating this? (I understand this goes beyond the scope of pylzma, but it would still be very helpful for understanding how to achieve it.)

How in the world does one create a 7z with multiple files in it? My Google-fu is failing me and I just can’t get it to work. Below is the method I’m using to 7z one file, and it works fine. Do I write header/data/header/data, header/header/data/data, or something else entirely? I’ve tried them both.

PyLZMA doesn’t work with regular 7z files, only single-file LZMA archives. (Also not for LZMA2/XZ.) So quick answer: Pipe through tar before you compress. If you need to work with regular 7z files, you need to use 7z.dll or 7z.so.
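The “pipe through tar” suggestion can be sketched like this. It is a minimal illustration, not pylzma code: it assumes Python 3 and uses the standard-library lzma module (legacy .lzma container, FORMAT_ALONE) in place of pylzma, and the helper names pack_files and unpack_files are made up.

```python
import io
import lzma  # standard-library stand-in for pylzma (assumption)
import tarfile


def pack_files(files):
    """Bundle {name: bytes} into one tar stream, then LZMA-compress it."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return lzma.compress(buffer.getvalue(), format=lzma.FORMAT_ALONE)


def unpack_files(blob):
    """Reverse pack_files: decompress, then read each tar member back."""
    raw = lzma.decompress(blob, format=lzma.FORMAT_ALONE)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        return {
            member.name: tar.extractfile(member).read()
            for member in tar.getmembers()
            if member.isfile()
        }


if __name__ == "__main__":
    files = {"a.txt": b"alpha" * 100, "dir/b.bin": bytes(range(256))}
    assert unpack_files(pack_files(files)) == files
```

Because tar carries the file names and sizes, the compressor only ever sees a single stream, which is the case the answer above says is supported.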

Can anyone help me write two little Python scripts based on pylzma?

1. 7zcompress input.ext

Compress with 7z WITHOUT the 7z HEADER to input.ext.7z
It does not need to be 7z compatible! The 7z format contains a header and file information, and I don’t need them. I want a pure stream/string compressor based on 7z.
I will use it from the Linux CLI and need the SMALLEST possible file size.

Thanks very much for your port of LZMA. I find it very useful for filtering archives of network data. However, I have a problem decompressing archives that were generated incrementally by adding two or more files simultaneously. I have found that the assumption in Archive7z.__init__ that “every file has it’s own folder” does not hold. I appreciate this may be a case of “Don’t Do That!”. I am sure I should raise this on your bugzilla, but you do not seem to have a category for py7zlib bugs?

In order to work around the problem I found that (1) SubstreamsInfo.__init__ contained a bug:
Where id == PROPERTY_SIZE, the sum must be set to zero before the inner loop; otherwise the total is carried across folders and incorrect sizes (often negative!) result.
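The effect of point (1) can be illustrated with a small stand-alone sketch. It is not the py7zlib code: the function and parameter names are hypothetical, and it assumes, in line with the description above, that only the first n-1 sizes of a folder with n files are stored and the last one is derived from the folder total.

```python
def substream_sizes(folder_sizes, streams_per_folder, stored_sizes):
    """Rebuild per-file sizes when only n-1 sizes per folder are stored.

    All names are hypothetical; this only illustrates the reset of the sum.
    """
    stored = iter(stored_sizes)
    result = []
    for folder_size, count in zip(folder_sizes, streams_per_folder):
        if count == 0:
            continue
        running = 0  # reset for every folder; carrying it over is the bug
        for _ in range(count - 1):
            size = next(stored)
            result.append(size)
            running += size
        # The last size in a folder is implied: folder total minus the rest.
        result.append(folder_size - running)
    return result


if __name__ == "__main__":
    # Two folders of 100 and 50 bytes holding 3 and 2 files respectively.
    print(substream_sizes([100, 50], [3, 2], [30, 20, 10]))
```

With the reset in place the example yields the sizes 30, 20, 50, 10 and 40. If the running sum were carried over from the first folder, the last size would be computed as 50 - 60 = -10, which matches the negative sizes mentioned above.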

(2) In order to more easily process the unpacking info, I changed the loop and its preamble in Archive7z.__init__ to:

My apologies for the length of this post, and any poor code formatting. I should also add that I have not conducted exhaustive testing on my workaround; it just works for the kind of archives I am encountering.