We have an application that generates a huge number of files (3-4 lakh, i.e. 300,000-400,000, including subdirectories) under multiple directories. To archive, move, or delete these files based on their age, I have written a simple Perl script, but sometimes it causes high CPU spikes (%CPU in the Linux top output reaches 60-100%). They are just spikes; it never uses high CPU continuously.

I noticed that when the script switches from one directory to another, it causes CPU spikes. The strange thing is that if more files are matched, there are no high CPU spikes (10-15%), but if it only finds around 30-60 files in each directory, I can see spikes. I have put a sleep after each directory to bring the CPU down.

What is the best way to match files based on age? Any idea what is causing the high CPU spikes? %CPU sometimes shows a high value, but at the same time the overall CPU usage reported by top and mpstat is hardly 4-5%, which is confusing me.
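For reference, a minimal sketch of matching files by age in Perl, using the built-in -M file test (age in days since last modification); the path and the 7-day threshold are assumptions for illustration, not the actual script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical threshold: files older than 7 days are candidates.
my $max_age_days = 7;

# Illustrative path/pattern; substitute the real directories.
for my $file (glob "/var/app/logs/ABC_*.log") {
    next unless -f $file;
    my $age = -M $file;    # fractional days since last modification
    if ($age > $max_age_days) {
        print "old: $file ($age days)\n";
        # archive/move/delete would happen here
    }
}
```

Each -M test is one stat() call per file, which is cheap individually but adds up over lakhs of files.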

I was trying to find out what is causing the high CPU. In one directory there are 2-3 lac (200,000-300,000) files, so the while loop below results in high CPU usage.

Now, suppose there are 2 lac (200,000) files in a directory matching the pattern, but out of those only 10,000 are 7 days old, and I only want to action those files. The remaining files were modified within the last 7 days, so the loop below just goes to the next iteration for them. Because the while loop runs for a long time, it causes high CPU usage.
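The loop shape being described could be sketched like this (the paths and variable names are assumptions, not the actual script): even when almost every file is skipped via next, the loop still performs a stat() per file, so iterating over 200,000 entries is itself a tight CPU-bound pass:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative path/pattern; substitute the real directories.
my @files = glob "/data/app/dir1/ABC_????????.log";

my $actioned = 0;
for my $file (@files) {
    # -M does a stat() on every file; over 200,000 files this
    # busy loop burns CPU even if nearly all files are skipped.
    next if -M $file <= 7;    # modified within 7 days: skip
    # mv/zip/unlink would happen here for the ~10,000 old files
    $actioned++;
}
print "actioned $actioned of ", scalar(@files), " files\n";
```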

Please post a short but complete script that demonstrates the problem and use the code tags so that the formatting/indentation is retained.

I would just ask the same three things.

Just an additional comment: please give an idea of the data volume (number of files, number of megabytes).

Without such information, my first uneducated guess would be that it is not so much your script itself that is taking a lot of CPU, but the zip/gzip system commands that you issue in this script.

Thanks for the update. I could not use the module specified because it is not available on the system, and unfortunately I am not allowed to install anything new on the system.

To be more clear, in a single day the system generates 5-6 lakh (500,000-600,000) files in just 3-4 directories, consuming around 10-14 GB of space per day. Individual files are not huge: sizes are a 5-6 digit number of bytes (e.g. around 12,345 bytes).

There are files with 4-5 different name patterns, but the count is huge. There is no date timestamp in the file names (e.g. ABC_YYYYMMDD.log), just a random 8-digit number (i.e. ABC_????????.log).

As per my observation, glob is not consuming high CPU. As soon as we enter the while loop, I can see CPU spikes. Even after commenting out mv, zip, and unlink and keeping only the print statements, I noticed CPU around 80%.

After adding the line below in the "next" section, the CPU is under control at 10-12%, sometimes even as low as 4%.

select (undef, undef, undef, 0.250);

But the process is already running slow because of the huge number of files, and I can't afford the sleep.
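One possible compromise, sketched below under the same assumed loop shape (paths and batch size are illustrative): instead of a select() pause on every skipped file, pause only once per fixed-size batch, so the total added delay stays bounded even over lakhs of files while still giving the scheduler breathing room:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative path/pattern; substitute the real directories.
my @files = glob "/data/app/dir1/ABC_????????.log";

my $seen = 0;
for my $file (@files) {
    next if -M $file <= 7;                # skip recently modified files
    # mv/zip/unlink would happen here
    select(undef, undef, undef, 0.050)    # 50 ms sub-second pause...
        if ++$seen % 500 == 0;            # ...only after every 500th actioned file
}
```

With 10,000 actioned files this adds only about one second of total sleep, versus many minutes when sleeping per file.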