Thursday, January 26, 2012

Lzma Vs Bzip2 – Better Compression than bzip2 on UNIX / Linux


Lzma stands for Lempel-Ziv-Markov chain Algorithm. lzma is a compression tool, like bzip2 and gzip, used to compress and decompress files. In the tests below it ran faster than bzip2 and produced a smaller output file. The compression ratio of gzip is generally worse than that of bzip2 and lzma.
In this article, let us see how to use lzma, a compression utility that gave a better compression ratio and faster operation than bzip2 in our tests.

Compress the input text file using lzma -c

$ lzma  -c --stdout  sample.txt  >sample.lzma

Decompress the lzma file using -d option

$ lzma -d --stdout sample.lzma >sample.txt
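
The two commands above can be combined into a quick round-trip check. The following is only a minimal sketch: the temporary directory, the generated sample data and the restored.txt file name are assumptions made for illustration, not part of the original test.

```shell
#!/bin/sh
# Round-trip check: compress, decompress to a new name, compare with the original.
workdir=$(mktemp -d)

# Hypothetical sample data: 1000 identical lines of text.
yes "the quick brown fox" | head -n 1000 > "$workdir/sample.txt"

# -c writes to stdout, so the original sample.txt is left in place.
lzma -c "$workdir/sample.txt" > "$workdir/sample.lzma"

# -d decompresses; -c again sends the result to stdout.
lzma -d -c "$workdir/sample.lzma" > "$workdir/restored.txt"

# cmp -s is silent and only sets the exit status.
if cmp -s "$workdir/sample.txt" "$workdir/restored.txt"; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
```

Writing the decompressed data to a different file name keeps the original available for the comparison.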

Comparison between bzip2 and lzma compression tools

To understand the effectiveness of lzma, let us compress and decompress a 1MB sample.txt with both lzma and bzip2 and compare the results. The tests were run on a machine with 1GB of RAM and a Pentium 4 processor.
Size of the sample.txt input file:
$ ls -l sample.txt
-rw-r--r-- 1 bala bala   1048576 2010-05-14 19:43 sample.txt
Note: We put the time command in front of every compression and decompression command to measure the elapsed time and CPU time it used.

Compress the sample.txt using bzip2

Compress the input file with the bzip2 command. It does not require any option for compression.
$ time bzip2  sample.txt

real    0m27.874s
user    0m13.981s
sys     0m0.148s

$ ls -l sample.txt.bz2
-rw-r--r-- 1 bala bala      1750 2010-05-14 19:43 sample.txt.bz2
After bzip2 compression, the output file size is 1750 bytes.

Decompress the sample.txt using bunzip2

Decompress the compressed file with the bunzip2 utility. It also does not need any option.
$ time bunzip2  sample.txt.bz2

real    0m0.232s
user    0m0.128s
sys     0m0.020s

Compress the sample.txt using lzma

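Now, let us compress sample.txt using the lzma command with the following options:
  • -c to write the compressed output to stdout instead of replacing the input file (compression is the default mode of lzma)
  • --stdout is the long form of -c, so giving both is redundant but harmless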
$ time lzma  -c --stdout  sample.txt >sample.lzma

real    0m2.035s
user    0m1.544s
sys     0m0.132s

$ ls -l sample.lzma
-rw-r--r-- 1 bala bala       543 2010-05-14 19:48 sample.lzma
After compression, lzma produces an output file of 543 bytes, less than a third of the 1750 bytes produced by bzip2. Also, as seen above, lzma used far less CPU time than bzip2 (1.544s versus 13.981s of user time).

Decompress the sample.txt using lzma

Decompress the *.lzma file using the lzma command with the following options:
  • -d to decompress
  • --stdout to write the decompressed output to stdout
$ time lzma -d --stdout sample.lzma >sample.txt

real    0m0.043s
user    0m0.016s
sys     0m0.004s
As seen above, decompression with lzma (0.043s real) is about five times quicker than with bunzip2 (0.232s real).
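
The size comparison can be repeated on any machine with a small script. This is a rough sketch under stated assumptions: the input is 1 MiB of zero bytes generated on the spot (the content of the original sample.txt is not known), so the absolute sizes will differ from the numbers above, and a tool that is not installed is simply skipped.

```shell
#!/bin/sh
# Compare compressed sizes for whichever of bzip2 and lzma are installed.
workdir=$(mktemp -d)
input="$workdir/sample.txt"

# Hypothetical test data: 1 MiB of zero bytes (extremely compressible).
head -c 1048576 /dev/zero > "$input"
orig_size=$(wc -c < "$input")
echo "original: $orig_size bytes"

for tool in bzip2 lzma; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c writes to stdout for both tools, leaving the input file untouched.
        "$tool" -c "$input" > "$workdir/sample.$tool"
        size=$(wc -c < "$workdir/sample.$tool")
        echo "$tool: $size bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done

rm -rf "$workdir"
```

To compare speed as well, the same commands can be prefixed with time, as in the tests above.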

Different Levels of Lzma Compression

  • Lzma provides compression levels from -1 to -9.
  • -9 gives the highest compression ratio, but needs more time and system resources. The level applies only to compression and is not given when decompressing.
  • -1 gives the lowest compression ratio and runs much quicker.
Do the following for a quick lzma compression at the lowest level:
$ lzma -1 -c --stdout  sample.txt >sample.lzma

$ ls -l sample.lzma

-rw-r--r-- 1 bala bala       548 2010-05-14 20:47 sample.lzma
Note: --fast is an alias for -1.
-9 is the highest compression level and takes longer to compress than the lower levels. Do the following for an intensive compression at the highest level:
$ lzma -9 -c --stdout  sample.txt >sample.lzma

$ ls -l sample.lzma
-rw-r--r-- 1 bala bala       543 2010-05-14 20:55 sample.lzma
Note: --best is an alias for -9.
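
To see how the level affects the output size on your own data, the levels can be tried one after another. The sketch below assumes a generated input file (a list of numbers produced by seq) in a temporary directory; both are placeholders for illustration, so the sizes it prints will not match the ones shown above.

```shell
#!/bin/sh
# Compress the same input at every level from -1 to -9 and print the sizes.
workdir=$(mktemp -d)
input="$workdir/sample.txt"

# Hypothetical input: the numbers 1 to 200000, one per line.
seq 1 200000 > "$input"

levels_run=0
for level in 1 2 3 4 5 6 7 8 9; do
    out="$workdir/sample.$level.lzma"
    # -c keeps the input file and writes the compressed data to stdout.
    lzma "-$level" -c "$input" > "$out"
    echo "-$level: $(wc -c < "$out") bytes"
    levels_run=$((levels_run + 1))
done
```

On highly repetitive input the difference between levels can be very small, as with the 548 and 543 byte results above.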