The solution proposed by Xiong Chiamiov above works beautifully. I had just backed up my laptop with .tar.bz2 and it took 132 minutes using only one cpu thread. Then I compiled and installed tar from source: gnu.org/software/tar I included the options mentioned in the configure step: ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip I ran the backup again and it took only 32 minutes. That's better than 4X improvement! I watched the system monitor and it kept all 4 cpus (8 threads) flatlined at 100% the whole time. THAT is the best solution.
– Warren SeverinNov 13 '17 at 4:37

5 Answers
5

You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:

tar cf - paths-to-archive | pigz > archive.tar.gz

By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.

How do you use pigz to decompress in the same fashion? Or does it only work for compression?
– user788171Feb 20 '13 at 12:43

37

pigz does use multiple cores for decompression, but only with limited improvement over a single core. The deflate format does not lend itself to parallel decompression. The decompression portion must be done serially. The other cores for pigz decompression are used for reading, writing, and calculating the CRC. When compressing on the other hand, pigz gets close to a factor of n improvement with n cores.
– Mark AdlerFeb 20 '13 at 16:18

The problem with this answer is that you don't have any control over pigz arguments. For instance, how to enable --best compression or set the number of processes (--processes) to use. This is why I prefer the accepted answer from @mark-adler where he uses pigz on the pipe.
– Nathan S. Watson-HaighMay 9 '18 at 3:02

xz

Regarding multithreaded XZ support. If you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environmental variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0").

This is a fragment of man for 5.1.0alpha version:

Multithreaded compression and decompression are not implemented yet, so this
option has no effect for now.

However this will not work for decompression of files that haven't also
been compressed with threading enabled. From man for version 5.2.2:

Threaded decompression hasn't been implemented yet. It will only work
on files that contain multiple blocks with size information in
block headers. All files compressed in multi-threaded mode meet this
condition, but files compressed in single-threaded mode don't even if
--block-size=size is used.

Recompiling with replacement

If you build tar from sources, then you can recompile with parameters

--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip

After recompiling tar with these options you can check the output of tar's help:

I just found pbzip2 and mpibzip2. mpibzip2 looks very promising for clusters or if you have a laptop and a multicore desktop computer for instance.
– user1985657Apr 28 '15 at 20:57

This is a great and elaborate answer. It may be good to mention that multithreaded compression (e.g. with pigz) is only enabled when it reads from the file. Processing STDIN may in fact be slower.
– oᴉɹǝɥɔJun 10 '15 at 17:39

Step 1: find

This command will look for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log. Add as many -o -name "pattern" as you want.

-exec will execute the next command using the results of find: tar

Step 2: tar

tar -P --transform='s@/my/path/@@g' -cf - {} +

--transform is a simple string replacement parameter. It will strip the path of the files from the archive so the tarball's root becomes the current directory when extracting. Note that you can't use -C option to change directory as you'll lose benefits of find: all files of the directory would be included.

-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". Leading '/' with be removed by --transform anyway.

-cf - tells tar to use the tarball name we'll specify later

{} + uses everyfiles that find found previously

Step 3: pigz

pigz -9 -p 4

Use as many parameters as you want.
In this case -9 is the compression level and -p 4 is the number of cores dedicated to compression.
If you run this on a heavy loaded webserver, you probably don't want to use all available cores.