Just a side remark: you do know that those two commands are not equivalent, don't you? At least on Unix/Linux, echo >> $file appends a newline to $file and thus modifies its contents. I assume it will be the same on OS X. If you do not want that, use echo -n >> $file.
– Dubu, Apr 9 '14 at 9:03
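To make the point in the comment above concrete, here is a minimal sketch that measures the file size before and after appending. The temporary directory and file name are made up for the demonstration, and printf '' is used as a portable stand-in for echo -n (an assumption on my part, since echo -n is not handled identically by every shell).

```shell
# Work in a throwaway directory so nothing real is modified.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.xml"          # hypothetical file name
printf '<a/>' > "$file"            # 4 bytes, no trailing newline

size_before=$(($(wc -c < "$file")))
echo >> "$file"                    # appends exactly one newline byte
size_after_echo=$(($(wc -c < "$file")))
printf '' >> "$file"               # appends nothing, like echo -n
size_after_printf=$(($(wc -c < "$file")))

echo "before: $size_before, after echo: $size_after_echo, after printf: $size_after_printf"
```

So a loop built on echo >> grows every file by one byte per run, which may or may not matter for your XML files.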

Also wouldn't touch `find . -name "*.xml"` be even faster than both of the above?
– elmo, Apr 9 '14 at 10:27

Not an answer to the explicit question, but why invoke touch so many times at all? find . -name '*.xml' -print0 | xargs -0 touch invokes touch far fewer times (possibly only once). It works on Linux and should work on OS X.
– Mike Renfro, Apr 10 '14 at 15:37
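A small sketch of the find/xargs pipeline from the comment above, run against a scratch directory. The file names, the old timestamps and the marker file are all invented for the demonstration; they only exist so that the effect of touch can be checked afterwards.

```shell
tmpdir=$(mktemp -d)

# Give the XML files an old timestamp, and a marker file a newer one.
touch -t 200001010000 "$tmpdir/a.xml" "$tmpdir/c d.xml"
touch -t 201001010000 "$tmpdir/marker"

# NUL-separated names survive spaces; xargs packs them into as few
# touch invocations as the argument-length limit allows.
find "$tmpdir" -name '*.xml' -print0 | xargs -0 touch

# Count the XML files whose mtime is now newer than the marker.
updated=$(find "$tmpdir" -name '*.xml' -newer "$tmpdir/marker" | wc -l)
updated=$((updated))
echo "files updated: $updated"
```

Both files should be reported as updated, including the one with a space in its name.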

3 Answers
Since touch is an external binary, and you invoke touch once per file, the shell must create 300,000 instances of touch, which takes a long time.

echo, however, is a shell builtin, and the execution of shell builtins does not require forking at all. Instead, the current shell does all of the operations and no external processes are created; this is the reason why it is so much faster.

Here are two profiles of the shell's operations. You can see that a lot of time is spent cloning new processes when using touch. Using /bin/echo instead of the shell builtin should show a much more comparable result.
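If you want to try the comparison yourself, something along these lines should do. This is only a sketch: the three function names, the file-name prefixes and the count of 50 are made up, and it assumes /bin/echo exists on your system.

```shell
# Hypothetical helpers: $1 = number of files, $2 = target directory.
with_builtin_echo() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo >> "$2/builtin_$i.xml"        # handled inside the shell, no new process
        i=$((i + 1))
    done
}

with_external_echo() {
    i=0
    while [ "$i" -lt "$1" ]; do
        /bin/echo >> "$2/external_$i.xml"  # one fork + exec per iteration
        i=$((i + 1))
    done
}

with_touch() {
    i=0
    while [ "$i" -lt "$1" ]; do
        touch "$2/touch_$i.xml"            # one fork + exec per iteration
        i=$((i + 1))
    done
}

demo_dir=$(mktemp -d)
with_builtin_echo 50 "$demo_dir"
with_external_echo 50 "$demo_dir"
with_touch 50 "$demo_dir"

created=$(find "$demo_dir" -name '*.xml' | wc -l)
created=$((created))
echo "created $created files"
```

In bash you can prefix each call with the time keyword and raise the count to get a rough comparison. I am not quoting numbers here because they depend heavily on the machine.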

Did you compile strace on OS X or run your test on another OS?
– bmike, Apr 9 '14 at 19:49

@bmike My test is on Linux, but the principle is identical.
– Chris Down, Apr 10 '14 at 2:24

I totally agree. See my comment on the main question about how /bin/echo is as slow as /bin/touch, so the reasoning is sound. I just wanted to reproduce the strace timing and failed using dtruss/dtrace; the bash -c syntax doesn't work as expected on OS X either.
– bmike, Apr 10 '14 at 2:56

As others have answered, using echo will be faster than touch because echo is commonly (though not required to be) built in to the shell. Using it avoids the kernel overhead of starting a new process for each file, which is what you get with touch.

However, note that the fastest way to achieve this effect is still to use touch. Rather than running the program once for each file, you can use the -exec option with find to ensure that it is only run a few times. This approach will usually be faster since it also avoids the overhead associated with a shell loop:

find . -name "*.xml" -exec touch {} +

Using the + (as opposed to \;) with find ... -exec runs the command only once if possible with each file as an argument. If the argument list is very long (as is the case with 300,000 files) multiple runs will be made with an argument list which has a length close to the limit (ARG_MAX on most systems).
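To see the difference between the two terminators, you can replace touch with a tiny inline script that prints one line per invocation. This is just a sketch on four made-up files in a scratch directory; the variable names are mine.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.xml" "$tmpdir/b.xml" "$tmpdir/c.xml" "$tmpdir/d.xml"

# The inner sh prints its argument count once per invocation,
# so the number of output lines equals the number of invocations.
runs_plus=$(find "$tmpdir" -name '*.xml' -exec sh -c 'echo "$#"' sh {} + | wc -l)
runs_semicolon=$(find "$tmpdir" -name '*.xml' -exec sh -c 'echo "$#"' sh {} \; | wc -l)
runs_plus=$((runs_plus))
runs_semicolon=$((runs_semicolon))

echo "plus form:      $runs_plus invocation(s)"
echo "semicolon form: $runs_semicolon invocation(s)"

# The system limit that decides when find has to start another batch.
getconf ARG_MAX
```

With only four files the + form needs a single invocation, while the \; form needs one per file.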

Another advantage of this approach is that it behaves robustly with filenames containing any whitespace characters (spaces, tabs, even newlines), which is not the case with the original loop.
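Here is a sketch of what goes wrong. I am assuming the original loop iterated over the unquoted output of find, roughly for f in $(find ...); the two file names are invented for the demonstration.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.xml" "$tmpdir/c d.xml"

# Assumed shape of the original loop: the unquoted command
# substitution is split on whitespace, so "c d.xml" becomes two words.
loop_count=0
for f in $(find "$tmpdir" -name '*.xml'); do
    loop_count=$((loop_count + 1))
done

# find -exec passes each path as exactly one argument.
exec_count=$(find "$tmpdir" -name '*.xml' -exec sh -c 'echo "$#"' sh {} +)

echo "loop saw $loop_count names, -exec saw $exec_count"
```

The loop sees three "names" for two files, so touch would end up creating stray files instead of updating the one with a space in its name.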

Shell builtins are much faster as there is no overhead involved in loading the program, i.e. there is no fork/exec involved. As such, you'd observe a significant time difference when executing a builtin vs an external command a large number of times.

This is one reason that some commonly used utilities are also provided by the shell itself. In bash, for example, echo is a builtin, and time is a reserved word.

You can get the complete list of shell builtins by saying:

enable -p
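enable is specific to bash. If you only want to check a single command, a more portable sketch is to ask command -v how the shell would resolve it (the variable names here are just for illustration):

```shell
# A builtin is reported by its bare name, an external utility by its path.
echo_kind=$(command -v echo)
touch_kind=$(command -v touch)

echo "echo  resolves to: $echo_kind"
echo "touch resolves to: $touch_kind"
```

In bash, type echo and type touch give the same information in a more readable form.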

As mentioned above, using the utility as opposed to the builtin results in a significant performance degradation. Following are the statistics of the time taken to create ~9000 files using the builtin echo and the utility echo: