Posts Tagged ‘maxdepth’

Listing hidden files is one of the common thing to do as sys admin. Doing manipulations with hidden files like copy / delete / move is very rare but still sometimes necessary here is how to do all this.

1. Find and show (only) all hidden files in current directory

find . -iname '.*' -maxdepth 1

maxdepth – makes files show only in 1 directory depth (only in current directory), for instance to list files in 2 subdirectories use -maxdepth 3 etc.

echo .*;

Yeah if you're Linux newbie it is useful to know echo command can be used instead of ls.echo * command is very useful on systems with missing ls (for example if you mistakenly deleted it 🙂 )

2. Find and show (only) all hidden directories, sub-directories in current directory

In this article, I will explain shortly few approaches to delete few million of obsolete files to clean some space on your server. Here are 3 methods to use to clean your tons of junk files.

1. Using Linux find command to wipe out millions of files

a.) Finding and deleting files using find's -exec switch:

# find . -type f -exec rm -fv {} \;

This method works fine but it has 1 downside, file deletion is too slow as for each found file external rm command is invoked.

For half a million of files or more, using this method will take "long". However from a server hard disk stressing point of view it is not so bad as, the files deletion is not putting too much strain on the server hard disk.b.) Finding and deleting big number of files with find's -delete argument:

Luckily, there is a better way to delete the files, by using find's command embedded -delete argument:

# find . -type f -print -delete

c.) Deleting and printing out deleted files with find's -print arg

If you would like to output on your terminal, what files find is deleting in "real time" add -print:

# find . -type f -print -delete

To prevent your server hard disk from being stressed and hence save your self from server normal operation "outages", it is good to combine find command with ionice, e.g.:

# ionice -c 3 find . -type f -print -delete

Just note, that ionice cannot guarantee find's opeartions will not affect severely hard disk i/o requests. On heavily busy servers with high amounts of disk i/o writes still applying the ionice will not prevent the server from being hanged! Be sure to always keep an eye on the server, while deleting the files nomatter with or without ionice. if throughout find execution, the server gets lagged in serving its ordinary client requests or whatever, stop the execution of the cmd immediately by killing it from another ssh session or tty (if physically on the server).

2. Using a simple bash loop with rm command to delete "tons" of files

An alternative way is to use a bash loop, to print each of the files in the directory and issue /bin/rm on each of the loop elements (files) like so:

for i in *; do rm -f $i; done

If you'd like to print what you will be deleting add an echo to the loop:

# for i in $(echo *); do \ echo "Deleting : $i"; rm -f $i; \

The bash loop, worked like a charm in my case so I really warmly recommend this method, whenever you need to delete more than 500 000+ files in a directory.

3. Deleting multiple files with perl

Deleting multiple files with perl is not a bad idea at all. Here is a perl one liner, to delete all files contained within a directory:

Using perl interpreter to delete thousand of files is quick, really, really quick. I did not benchmark it on the server, how quick exactly is it, but I guess the delete rate should be similar to find command. Its possible even in some cases the perl loop is quicker …

4. Using PHP script to delete a multiple files

Using a short php script to delete files file by file in a loop similar to above bash script is another option. To do deletion with PHP, use this little PHP script:

As you see the script reads the $dir defined directory and loops through it, opening file by file and doing a delete for each of its loop elements. You should already know PHP is slow, so this method is only useful if you have to delete many thousands of files on a shared hosting server with no (ssh) shell access.

This php script is taken from Steve Kamerman's blog . I would like also to express my big gratitude to Steve for writting such a wonderful post. His post actually become inspiration for this article to become reality.

To use it rename delete_millioon_of_files_in_a_dir.php.txt to delete_millioon_of_files_in_a_dir.php and run it through a browser .

Note that you might need to run it multiple times, cause many shared hosting servers are configured to exit a php script which keeps running for too long. Alternatively the script can be run through shell with PHP cli:

php -l delete_millioon_of_files_in_a_dir.php.txt.

5. So What is the "best" way to delete million of files on Linux?

In order to find out which method is quicker in terms of execution time I did a home brew benchmarking on my thinkpad notebook.

a) Creating 509072 of sample files.

Again, I used bash loop to create many thousands of files in order to benchmark. I didn't wanted to put this load on a productive server and hence I used my own notebook to conduct the benchmarks. As my notebook is not a server the benchmarks might be partially incorrect, however I believe still .they're pretty good indicator on which deletion method would be better.

Deleting my sample 509072 took 6 mins and 24 secs. This is about 3 times faster than find! GO-GO perl 🙂 As you can see from the results, perl is a great and time saving, way to delete 500 000 files.

– The approximate speed deletion rate of of for + rm bash loop

hipo@noah:/tmp/test$ time for i in *; do rm -f $i; done

real 206m15.081s user 2m38.954s sys 195m38.182s

You see the execution took 195m en 38 secs = 3 HOURS and 43 MINUTES!!!! This is extremely slow ! But works like a charm as the running of deletion didn't impacted my normal laptop browsing. While the script was running I was mostly browsing through few not so heavy (non flash) websites and doing some other stuff in gnome-terminal) 🙂

As you can imagine running a bash loop is a bit CPU intensive, but puts less stress on the hard disk read/write operations. Therefore its clear using it is always a good practice when deletion of many files on a dedi servers is required.b) my production server file deleting experience

On a production server I only tested two of all the listed methods to delete my files. The production server, where I tested is running Debian GNU / Linux Squeeze 6.0.3. There I had a task to delete few million of files. The tested methods tried on the server were:

– The find . type -f -delete method.

– for i in *; do rm -f $i; done

The results from using find -delete method was quite sad, as the server almost hanged under the heavy hard disk load the command produced.

With the for script all went smoothly. The files were deleted for a long long time (like few hours), but while it was running, the server continued with no interruptions..

While the bash loop was running, the server load avarage kept at steady 4 Taking my experience in mind, If you're running a production, server and you're still wondering which delete method to use to wipe some multitude of files, I would recommend you go the bash for loop + /bin/rm way. Yes, it is extremely slow, expect it run for some half an hour or so but puts not too much extra load on the server..

Using the PHP script will probably be slow and inefficient, if compared to both find and the a bash loop.. I didn't give it a try yet, but suppose it will be either equal in time or at least few times slower than bash.

If you have tried the php script and you have some observations, please drop some comment to tell me how it performs.

To sum it up;

Even though there are "hacks" to clean up some messy parsing directory full of few million of junk files, having such a directory should never exist on the first place.

Frankly, keeping millions of files within the same directory is very stupid idea. Doing so will have a severe negative impact on a directory listing performance of your filesystem in the long term.

If you know better (more efficient) ways to delete a multitude of files in a dir please share in comments.

Did you ever needed to count, how many files in a directory are there? Having the concrete number of files in a directory is not a seldom task but still very useful especially for scripts or simply for the sake of learning

The quickest and maybe the easiest way to count all files in a directory in Linux is with a combination of find and wc commands:

This will find and list all matched files in any directory and subdirectories, print them out and count them with wc command. The -type f argument instructs find to look only for files.

Other helpful variance of finding and listing all files in a directory and subdirectories is to list and count all the files with a certain file extension under a directory. For example, lets list all text files (.txt) contained in a directory and all level sub-directories:

linux:~/ascii# find . -type f -iname '*.txt' -print |wc -l 401

If you need to check the number of files in a directory for multiple directories on a server and you're aiming at doing it efficienly, issung above find .. | wc code will definitely be not a good choice. If used it will generate heavy load for the system and along with that will complete the execution in ages if issued on a large number of files containing dirs.

Thanksfully if efficiency is targetted, there is a command written in C called tree which is more efficient than find. To count the number of files in dir but using tree :

ls can be used to calculate the number of 1-st level sub-directories under certain directory for instance:

linux:~/ascii# ls -l |grep ^d|wc -l 25

You see the ascii directory has 25 subdirectories in its 1st level.

To check symlinks under a directory with ls the command would be:

linux:~/ascii# ls -l | grep ^l | wc -l 0

Note above 3 ls | grep … examples, will not work properly if the directory contains files with SUID or some special properties set. Hence to get the same 3 results for active files, directories and symbolic links, a one liner similar to the one below can be used instead:

This will show statistics about all files, links and directories for all directory sub-levels. Just in case if there is need to only count files, links and directories without directory recursion enabled, use: