I want to combine thousands of little text files into one big text file. I have them in directories with the structure: timestamp1/status.txt. For example: 20130430133144/status.txt.
So far, I know that

cat */* > bigtextfile.txt

works for small numbers of files. But will it work for higher numbers? I wonder if cat is going to gather the content of all files and then try to save to the bigtextfile. Otherwise, I suppose there must another way to do it, like fetch one file, append it to bigtextfile, then fetch another and so on.

3 Answers

cat itself will cope fine: it streams each file in turn rather than gathering everything in memory first. However, if you have a large number of files you can run into a limit on the arguments passed to cat. By default the Linux kernel only allows a fixed amount of space for the arguments (and environment) passed to any program; you can check the limit with getconf ARG_MAX, and it is commonly a few hundred kilobytes to a couple of megabytes.
To solve this issue you can do something like this instead:
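One way to sketch the idea (the glob is expanded inside the shell, and each cat invocation receives only a single path, so the argument-size limit is never hit — at the cost of one process per file):

```shell
# Append each file individually; the glob expansion happens in the
# shell itself, and every cat call gets exactly one argument.
for f in */*; do
  cat -- "$f" >> bigtextfile.txt
done
```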

The shell will expand */* into the sorted list of (non-hidden) matching files, and will execute cat with those file paths as arguments.

cat will open each file in turn and write to its standard output whatever it reads from the file. cat never holds more than one buffer full of data (a few kilobytes) in memory at a time.

A problem you may encounter though is that the list of arguments to cat becomes so big that it exceeds the limit on the size of arguments to the execve() system call. So you may need to split that list of files and run cat several times.

You could use xargs for that (here with GNU or BSD xargs for the non-standard -r and -0 options):

printf '%s\0' */* | xargs -r0 cat > big-file.txt

(Because printf is built into the shell, it doesn't go through the execve() system call, so it is not subject to that limit.)

Or have find make the list of files and run as many cat commands as necessary:

find . -mindepth 2 -maxdepth 2 -type f -exec cat {} + > big-file.txt

Or portably:

find . -path './*/*' -prune -type f -exec cat {} + > big-file.txt

(Beware though that, contrary to */*, it will include hidden files, and the list of files will not be sorted.)

If on a recent version of Linux, you can lift the limit of the size of arguments by doing:

ulimit -s unlimited
cat -- */* > big-file.txt

With zsh, you could also use zargs:

autoload zargs
zargs -- */* -- cat > big-file.txt

With ksh93, you can use command -x:

command -x cat -- */* > big-file.txt

All of those do the same thing: split the list of files and run as many cat commands as necessary.

With ksh93 again, you can get around the execve() limit entirely by loading cat as a shell builtin (builtin cat): since no external command is executed, no argument list is passed to execve() at all.

If the number of files is too large, */* will give a too-long argument list. If so, something along the following lines will do:

find . -name "*.txt" | xargs cat > outfile

(The idea is to use find to pick up the file names and turn them into a stream; xargs chops this stream into manageable chunks to hand to cat, which concatenates them onto its output stream, and that goes into outfile.)
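Note that a plain pipe into xargs splits on whitespace, so it breaks if any path contains spaces. If that can happen, a NUL-delimited variant of the same pipeline (using the non-standard but widely supported -print0 and -0 options of GNU/BSD find and xargs) is safer:

```shell
# NUL-delimit the file names so paths containing spaces (or even
# newlines) reach cat intact
find . -name "*.txt" -print0 | xargs -0 cat > outfile
```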