How to Split Large ‘tar’ Archive into Multiple Files of Certain Size

Are you worried about transferring or uploading large files over a network? Then worry no more, because you can deal with slow network speeds by splitting your files into blocks of a given size and moving them bit by bit.

In this how-to guide, we shall briefly explore the creation of archive files and how to split them into blocks of a chosen size. We shall use tar, one of the most popular archiving utilities on Linux, and also take advantage of the split utility to help us break our archive files into small blocks.

Create and Split tar into Multiple Files or Parts in Linux

Before we move further, let us take note of how these utilities are used. The general syntax of the tar and split commands is as follows:

# tar options archive-name files
# split options file "prefix"

Let us now delve into a few examples to illustrate the main concept of this article.

Example 1: We can first of all create an archive file as follows:

$ tar -cvjf home.tar.bz2 /home/aaronkilik/Documents/*

Create Tar Archive File

To confirm that our archive file has been created and to check its size, we can use the ls command:

$ ls -lh home.tar.bz2

Example 2: Then, using the split utility, we can break the home.tar.bz2 archive file into small blocks, each of size 10MB, as follows:
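$ split -b 10M home.tar.bz2 "home.tar.bz2.part"

split appends a suffix to the given prefix, so the blocks come out as home.tar.bz2.partaa, home.tar.bz2.partab, and so on. We can confirm them with ls:

$ ls -lh home.tar.bz2.parta*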

Example 3: In this instance, we can use a pipe to connect the output of the tar command to split as follows:

$ tar -cvzf - wget/* | split -b 150M - "downloads-part"

Create and Split Tar Archive File into Parts

Confirm the files:

$ ls -lh downloads-parta*

Check Parts of Tar Files

In this last example, as you may have noticed, we do not have to specify an archive file name; we simply use a - sign, which tells tar to write the archive to standard output and split to read its input from standard input.
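The same - trick works in reverse: if we only need to extract the files, we can stream the parts straight into tar without first creating a joined archive. A quick sketch, using the parts from the example above:

$ cat downloads-parta* | tar -xzvf -

Here cat feeds the blocks to tar in order, and tar reads the gzip-compressed archive from its standard input.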

How to Join Tar Files After Splitting

After successfully splitting tar files or any large file in Linux, you can join the files back together using the cat command. cat simply concatenates the blocks in order, byte for byte, which makes it a simple and reliable way to rebuild the original archive.

To join back all the blocks or tar files, we issue the command below:

# cat home.tar.bz2.parta* > backup.tar.bz2.joined

We can see that after running the cat command, all the small blocks we created earlier are combined back into a tar archive file of the same size as the original.
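As a quick sanity check, assuming the original home.tar.bz2 is still on disk, the checksums of the original and joined files should match, and tar should be able to list the joined archive without errors:

# md5sum home.tar.bz2 backup.tar.bz2.joined
# tar -tjf backup.tar.bz2.joined > /dev/null && echo "archive OK"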

Conclusion

The whole idea is simple: as we have illustrated above, you only need to know and understand how to use the various options of the tar and split utilities.

You can refer to their man pages to learn about more options and perform more complex operations, or you can go through the following article to learn more about the tar command.

Aaron Kili is a Linux and F.O.S.S enthusiast, an upcoming Linux SysAdmin, web developer, and currently a content creator for TecMint who loves working with computers and strongly believes in sharing knowledge.


I have more than 100,000 files in a directory (60GB of data), and I want to tar them into parts of 128MB each.

What will happen to a file that falls on the boundary between two parts?

I mean, if the 500th file starts being tarred at around the 116MB mark of a 128MB part and the file itself is 15MB, will that file get corrupted, or does the system understand that the file cannot be accommodated in that part and put the rest in the next part?
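For what it is worth, split pays no attention to file boundaries inside the archive: it cuts the raw byte stream at exactly the requested size, so a file that straddles two parts is restored intact once the parts are rejoined with cat. The individual parts, however, are not valid archives on their own. A minimal sketch to demonstrate this with throwaway files (the paths and sizes here are arbitrary):

$ mkdir -p /tmp/splittest && cd /tmp/splittest
$ dd if=/dev/urandom of=file1 bs=1K count=100
$ dd if=/dev/urandom of=file2 bs=1K count=100
$ tar -cjf test.tar.bz2 file1 file2
$ split -b 64K test.tar.bz2 "test.tar.bz2.part"
$ cat test.tar.bz2.part* > rejoined.tar.bz2
$ mkdir restored && tar -xjf rejoined.tar.bz2 -C restored
$ cmp file1 restored/file1 && cmp file2 restored/file2 && echo "files intact"

The two random 100KB files do not compress, so the 64KB blocks are guaranteed to cut through file contents, yet both files come back byte-identical.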

If you are using tar without compression, the operation will be a little faster, while enabling compression slows down the whole process. For instance, if you are using gzip, include the --fast flag to speed it up (or --best for better compression at the cost of speed); read the man page for more info.

Secondly, also ensure that there are not a lot of I/O operations running on the system, or any other CPU-intensive processes.
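As a rough sketch of that tip, the compression step can be pulled out of tar into an explicit pipe, which is where a flag like --fast goes (the file names here simply mirror Example 3 above):

$ tar -cf - wget/* | gzip --fast | split -b 150M - "downloads-part"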

This is an effective explanation of a solution to a common problem. There is another side to these issues that involves initial creation of the large tar archive. The processing to create the archive is likely to take a long time (wall clock). Various events might interrupt that processing before it is complete. Example interruptions might include loss of power, battery depletion, loss of connection to the target disk(s), and so on.

The tar command does not have the ability to remember which input files and folders have already been processed and resume with the remaining to-be-done files and folders. Instead, the list of files submitted to tar gets determined by the input selection GLOB at the tar command line. I’m certain that readers would be interested in any technique that will enable (1) creation of a list of files, (2) gathering of checkpoint details while files are processed, (3) restart from the most recent checkpoint following an interruption.
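One rough way to approximate this today, sketched below under the assumption of GNU tar (for its -T/--files-from option) and GNU split: build the file list up front, cut it into fixed-size chunks, archive each chunk separately, and treat each completed part archive as a checkpoint, so that a rerun after an interruption skips the finished chunks. All paths and sizes here are hypothetical.

#!/bin/bash
# Sketch of resumable, chunked archiving (paths and sizes hypothetical).
SRC="/home/aaronkilik/Documents"   # tree to archive
OUT="/backup/parts"                # where file lists and part archives live
CHUNK=5000                         # files per part archive

mkdir -p "$OUT"

# (1) Build the file list once and cut it into fixed-size chunks.
if [ ! -f "$OUT/filelist.done" ]; then
    find "$SRC" -type f > "$OUT/filelist"
    split -l "$CHUNK" -d -a 4 "$OUT/filelist" "$OUT/chunk."
    touch "$OUT/filelist.done"
fi

# (2)+(3) Each finished part archive is the checkpoint: on a rerun,
# chunks that already have an archive are skipped, and a chunk that
# was interrupted mid-write only exists as a .tmp file, so it is redone.
for list in "$OUT"/chunk.[0-9]*; do
    part="$OUT/part-$(basename "$list").tar.bz2"
    [ -f "$part" ] && continue
    tar -cjf "$part.tmp" -T "$list" && mv "$part.tmp" "$part"
done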

Thanks for the appreciation. As you have expressed your concern well, I set out to search for any other powerful Unix/Linux archiving utilities apart from tar that can handle the issues you are trying to bring to light.

tar is the Unix/Linux standard for this particular purpose, and as far as I know (and I stand to be corrected), there is no other utility that solves the shortcomings of tar you have mentioned above. Probably in the future someone will develop a utility that resolves these issues, but at the moment, that is just the way it works.

But I am still on the lookout for a technique or method like the one you have described, and if you find one, please let us know. Many thanks for your feedback.

The only workaround that might work (I have not tested it personally) is snapshot recovery of a virtual machine running the task. This will not completely recover from the point of blackout though, only from the last successful snapshot; furthermore, it might involve more resource overhead (disk space, hosting, etc.) than initially desired.

Thanks for the tip, I didn't know about this handy trick. We've now included the instructions to combine or join files back together after splitting a large tar archive in the write-up. Hope you like it.