Compression and Archival Tools for Linux Administrators

File Compressing and Decompressing Utilities

Several tools are available for compressing and decompressing files on Unix and Linux systems. The most commonly used ones are described below.

gzip Compression

The gzip tool is typically categorized as the “classic” method of compressing data on a Linux machine. It has been around since 1992, is still in development, and still has many things going for it.

One of its main advantages is speed. It can both compress and decompress data at a much higher rate than some competing technologies, especially when comparing each utility’s most compact compression formats. It is also very resource efficient in terms of memory usage during compression and decompression and does not seem to require more memory when optimizing for best compression.

Its biggest disadvantage is that it compresses data less thoroughly than some other options. If you are doing a lot of quick compressions and decompressions, this might be a good format for you, but if you plan to compress once and store the file, then other options might have advantages.

gzip Command Examples

To compress a file in place, run gzip on it. The original file is replaced by a compressed file named sourcefile.gz.

# gzip sourcefile

To compress a file and save the result under a different name, send the output to standard out with the -c flag and redirect it:

# gzip -c test > test.gz
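The practical difference between the two forms above is whether the original file survives. The following is a minimal sketch of that behavior, run in a scratch directory; the file contents are made up for illustration.

```shell
set -eu
workdir=$(mktemp -d)       # scratch directory so no real files are touched
cd "$workdir"

printf 'hello gzip\n'  > sourcefile
printf 'hello again\n' > test

gzip sourcefile            # replaces sourcefile with sourcefile.gz
gzip -c test > test.gz     # writes to standard out; the original test stays

ls -1                      # lists sourcefile.gz, test and test.gz
```

If the original must be kept, the -c form is the safer habit.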

If you would like to recursively compress an entire directory, you can pass the -r flag like this:

# gzip -r directory1

This will move down through a directory and compress each file individually. This is usually not preferred; a better result can be achieved by archiving the directory and compressing the resulting file as a whole, which we’ll show how to do shortly.

To find out more about a gzip-compressed file, pass the -l flag (for example, gzip -l test.gz). It prints the compressed size, the uncompressed size, the compression ratio, and the uncompressed file name.
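As a small sketch of the -l flag in use, the commands below build a sample file, compress it, and list the statistics. The sample data produced with seq is an assumption; any file works.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 10000 > test          # sample data, assumed for illustration
gzip -c test > test.gz

# Prints a header line and one line per file with the compressed size,
# uncompressed size, compression ratio and uncompressed name.
gzip -l test.gz
```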

You can adjust the compression optimization by passing a numbered flag between 1 and 9. The -1 flag (and its alias --fast) represents the fastest but least thorough compression. The -9 flag (and its alias --best) represents the slowest and most thorough compression. The default is -6, which is a good middle ground.

# gzip -9 compressme
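To see what the numbered flags do on your own data, a rough comparison like the one below can help. It is only a sketch: the input generated with seq is an assumption, and real files may behave differently.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 200000 > compressme   # assumed sample input

for level in 1 6 9; do
    gzip "-$level" -c compressme > "compressme.$level.gz"
    # wc -c reports the size in bytes
    echo "level $level: $(wc -c < "compressme.$level.gz") bytes"
done

# Whatever the level, decompression must give back the original content.
gzip -dc compressme.9.gz | cmp - compressme && echo "round trip OK"
```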

To decompress a file, pass the -d flag to gzip (there are also aliases such as gunzip, but they do the same thing):

# gzip -d test.gz

Using gzip over SSH

gzip can also compress a file on a remote server and stream it over SSH, so that it is decompressed on the local machine.
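The article does not show the command for this, so the following is only one common pattern, not necessarily the one the author had in mind: compress to standard out on the remote side and decompress from standard in locally. The run_remote function is a hypothetical local stand-in so the sketch runs without a server; user@remotehost is a placeholder.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 5000 > remote_file.txt     # pretend this file lives on the server

# Hypothetical stand-in for the remote side. For a real transfer,
# replace the function body with: ssh user@remotehost "$1"
run_remote() { sh -c "$1"; }

# Compress on the "remote" end, decompress on the local end.
run_remote "gzip -c '$workdir/remote_file.txt'" | gzip -dc > local_copy.txt

cmp remote_file.txt local_copy.txt && echo "transfer OK"
```

Only compressed data crosses the pipe, which is the point of doing this over a slow link.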

bzip2 Compression

Another common compression format and tool is bzip2. While somewhat more modern than gzip, being first introduced in 1996, bzip2 is widely used as the traditional alternative to gzip.

The most important trade-off for most users is greater compression at the cost of longer compression time. The bzip2 tools can create significantly more compact files than gzip, but take much longer to achieve those results due to a more complex algorithm.

Another thing to keep in mind is that the memory requirements are greater than those of gzip. This won’t have an effect on most machines, but on small embedded devices it may affect your choice. You can optionally pass the -s flag, which will cut the memory requirements roughly in half, but will also lead to a lower compression ratio.

bzip2 Command Examples

To create a bzip2 compressed file, use the below format. This will compress the file and give it the name “afile.bz2”

# bzip2 afile

To compress a file in reduced-memory mode, use the -s option:

# bzip2 -s afile

bzip2 also accepts numbered flags from -1 to -9, but they set the block size the utility works on (from 100 kB to 900 kB), so they trade memory usage against compressed size rather than time against compressed size. The default is -9, which means relatively high memory usage but greater compression. To use the smallest block size:

# bzip2 -1 file
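The effect of the block-size flags and of -s can be checked with a quick experiment such as the sketch below. The seq-generated input is an assumption, and the script simply stops if bzip2 is not installed.

```shell
set -eu
command -v bzip2 >/dev/null || { echo "bzip2 is not installed; skipping"; exit 0; }
workdir=$(mktemp -d)
cd "$workdir"

seq 1 200000 > file             # assumed sample input

bzip2 -1 -c file > file.1.bz2   # 100 kB blocks, least memory
bzip2 -9 -c file > file.9.bz2   # 900 kB blocks, the default
bzip2 -s -c file > file.s.bz2   # reduced-memory mode

wc -c file.1.bz2 file.9.bz2 file.s.bz2   # compare sizes in bytes

bzip2 -dc file.9.bz2 | cmp - file && echo "round trip OK"
```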

To decompress a bzip2-compressed file, use the -d flag:

# bzip2 -d file.bz2

xz Compression

A relative newcomer in the space is the xz compression mechanism. This compression tool was first released in 2009, and has gained a steady following ever since.

The xz compression utilities leverage a compression algorithm known as LZMA2. This algorithm has a greater compression ratio than the previous two examples, making it a great format when you need to store data on limited disk space.

While the compressed files that xz produces are smaller than those of the other utilities, it takes significantly longer to do the compression.

The xz compression tools also take a hit in the memory requirements, sometimes up to an order of magnitude over the other methods. If you are on a system with abundant memory, this might not be a problem, but this is a consideration to keep in mind.

xz Command Examples

To compress a file, simply call the utility without any options:

# xz file

This will process the file and produce a file called “file.xz”.

To list statistics about the compression of a file, pass the -l flag on the compressed file, for example xz -l file.xz.
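A short sketch of listing statistics with xz follows. The -k flag, which keeps the original file next to the compressed one, is not mentioned in the article and is used here only so the scratch input remains available; the seq-generated data is an assumption.

```shell
set -eu
command -v xz >/dev/null || { echo "xz is not installed; skipping"; exit 0; }
workdir=$(mktemp -d)
cd "$workdir"

seq 1 10000 > file   # assumed sample input
xz -k file           # produces file.xz and keeps file

# Shows stream and block counts, compressed and uncompressed sizes,
# the ratio, the integrity check type and the file name.
xz -l file.xz
```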

If you need to send the compressed output to standard out, you can signal that to the utility with the -c flag. Here we can again direct it straight back into a file:

# xz -c test > test.xz

xz uses numbered flags from -0 to -9 to indicate the speed of the compression, with lower numbers indicating faster compression. The -6 flag is the default and is a good middle ground for most use cases. If you need even more compression and don’t care about time, memory requirements, etc., you can use the -e flag, which uses an alternate “extreme” compression variant. It can be combined with the numeric flags:

# xz -e -9 large_file

This will take a long time and, in the end, may not show very significant gains, but if you need that functionality, the option is available.

To decompress files, you pass the -d flag again:

# xz -d large_file.xz

This will decompress the data into a file called “large_file”.
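Since the trade-offs between gzip, bzip2 and xz depend heavily on the data, it can be worth running all three on a representative file. The sketch below does that with default settings and verifies each round trip. The input is an assumed sample, and any tool that is not installed is skipped.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

seq 1 200000 > sample.txt   # assumed sample input
echo "original: $(wc -c < sample.txt) bytes"

for tool in gzip bzip2 xz; do
    # Skip any tool that is not installed on this machine.
    command -v "$tool" >/dev/null || continue
    "$tool" -c sample.txt > "sample.$tool"             # compress to stdout
    "$tool" -dc "sample.$tool" | cmp - sample.txt      # verify round trip
    echo "$tool: $(wc -c < "sample.$tool") bytes"
done
```

All three tools accept the same -c and -d flags, which is what makes the loop possible.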

tar Archiving

The tar utility is often paired with a compression utility. tar bundles files into a single archive while preserving the directory structure, permissions, and other metadata of the files we wrap up, and the compressor then shrinks that archive as a whole.

To create a tar archive that is then compressed with the gzip utility, you can pass the -z flag, which indicates that you wish to use gzip compression on top of the archive. Note that tar flags don’t require the leading “-” that most tools expect.

To use tar archiving with bzip2, replace the -z flag, which is gzip-specific, with the -j flag.

To use tar archiving with xz compression, follow the exact same format using the -J flag.

tar Archiving Examples

tar archives with gzip compression

The command below will create an archive (-c) from a directory called “directory1”.

# tar czvf compressed.tar.gz directory1

It will create verbose output (-v), compress the resulting archive with gzip (-z), and write it to a file (-f) called “compressed.tar.gz” (a tar file that has been gzipped).

To verify the contents of the archive without extracting it, use the -t option in place of -c, for example tar tzvf compressed.tar.gz.
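Creating an archive and then listing it can be sketched as follows. The directory layout is invented for the example, and -v is left out to keep the output short.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p directory1/sub
echo "alpha" > directory1/a.txt
echo "beta"  > directory1/sub/b.txt

tar czf compressed.tar.gz directory1   # create (c), gzip (z), to file (f)
tar tzf compressed.tar.gz              # list (t) members without extracting
```

The listing shows paths relative to where the archive was created, which is also how they will be laid out on extraction.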

To extract the contents of a tar archive that was created with the -J (xz) flag, use -x in place of -c:

# tar xJvf xzcompressed.tar.xz
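A full create-and-extract cycle with xz might look like the sketch below. The directory names are made up, and the -C flag is used to extract into a separate directory so the result can be compared against the source.

```shell
set -eu
command -v xz >/dev/null || { echo "xz is not installed; skipping"; exit 0; }
workdir=$(mktemp -d)
cd "$workdir"

mkdir directory1
echo "alpha" > directory1/a.txt

tar cJf xzcompressed.tar.xz directory1    # create an xz-compressed archive

mkdir restore
tar xJf xzcompressed.tar.xz -C restore    # extract into restore/

diff -r directory1 restore/directory1 && echo "extracted copy matches"
```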

Using TAR over SSH

The command below copies the /mydata directory from remotehost into the local directory /newdata, using tar with gzip. The /newdata directory must already exist, and because tar strips the leading “/” from member names, the files end up under /newdata/mydata.

# ssh user@remotehost "tar czpf - /mydata" | tar xzpf - -C /newdata

The same copy using tar with bzip2:

# ssh user@remotehost "tar cjpf - /mydata" | tar xjpf - -C /newdata

The same copy using tar with xz:

# ssh user@remotehost "tar cJpf - /mydata" | tar xJpf - -C /newdata
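To try the pipeline without a second machine, the remote side can be simulated locally. In the sketch below, run_remote is a hypothetical stand-in for ssh user@remotehost, and the sending side uses -C with a relative member name instead of an absolute path, which is a small variation on the commands above.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p remote/mydata newdata
echo "payload" > remote/mydata/file.txt

# Hypothetical stand-in for the remote side. For a real transfer,
# replace the function body with: ssh user@remotehost "$1"
run_remote() { sh -c "$1"; }

# Archive on the "remote" end, unpack on the local end.
run_remote "tar czpf - -C '$workdir/remote' mydata" | tar xzpf - -C newdata

ls newdata/mydata    # the directory arrives as newdata/mydata
```

The "f -" pair tells tar to write the archive to standard out on one side and read it from standard in on the other.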

Tip: Prefix each of the above commands with the time command to measure how long each option takes.
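A rough timing comparison can also be scripted. The sketch below uses date +%s, which only has one-second resolution, instead of the time command so that it behaves the same in any POSIX shell; the sample directory is an assumption, and missing compressors are skipped.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

mkdir mydata
seq 1 300000 > mydata/numbers.txt   # assumed sample data

for pair in z:gzip j:bzip2 J:xz; do
    flag=${pair%%:*}    # tar flag
    tool=${pair##*:}    # program tar runs for that flag
    command -v "$tool" >/dev/null || continue
    start=$(date +%s)
    tar "c${flag}f" "mydata.$tool.tar" mydata
    end=$(date +%s)
    echo "$tool: $((end - start)) s, $(wc -c < "mydata.$tool.tar") bytes"
done
```

On small inputs every run may report 0 s; use a larger directory, or the time command, for meaningful numbers.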

cpio Archiving

cpio is a tool for creating and extracting archives, or copying files from one place to another. It handles a number of cpio formats as well as reading and writing tar files. By default, cpio creates binary format archives, for compatibility with older cpio programs. When extracting from archives, cpio automatically recognizes which kind of archive it is reading and can read archives created on machines with a different byte order.

In copy-out mode, cpio copies files into an archive. It reads the list of file names from the standard input and writes the archive to the standard output.

In copy-in mode, cpio copies files out of an archive or lists the archive contents. It reads the archive from the standard input.

In copy-pass mode, cpio copies files from one directory tree to another, combining the copy-out and copy-in steps without actually using an archive. It reads the list of files to copy from the standard input; the directory into which it will copy them is given as a non-option argument.
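The cpio modes can be sketched in a few lines. The directory names are made up, and the -H newc format is an assumption chosen for portability; the article itself only mentions the default binary format.

```shell
set -eu
command -v cpio >/dev/null || { echo "cpio is not installed; skipping"; exit 0; }
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p src/sub restore dest
echo "alpha" > src/a.txt
echo "beta"  > src/sub/b.txt

# Copy-out (-o): read file names on stdin, write an archive to stdout.
find src | cpio -o -H newc > archive.cpio

# Copy-in with -t: list the archive contents instead of extracting.
cpio -it < archive.cpio

# Copy-in (-i): extract; -d creates leading directories as needed.
(cd restore && cpio -id < ../archive.cpio)

# Copy-pass (-p): copy the listed files straight into dest, no archive.
find src | cpio -pd dest

diff -r src restore/src && diff -r src dest/src && echo "copies match"
```

In every mode, find supplies the file list on standard input, which is why cpio pairs so naturally with it.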
