FS_scan: Getting Detailed with Your Data

The basic elements from the previous section have been built into a tool called FS_scan. It scans a directory tree and reports useful information about the files in it. The tool adds several features beyond the code snippet presented previously:

You can specify the root directory for the scan (see ./FS_scan.py -help)

It prints the full path to each file, its size, its atime, mtime, and ctime, and its uid and gid, along with the password file entries that correspond to those IDs

It computes the access, modify, and change ages relative to the current date (the output is in months)

For each directory, the average age for each of the three times is computed. It is printed in two forms: (1) years, months, days, weeks, hours, seconds, and (2) months.

At the end, it prints a summary for the entire directory tree with the averages of the three times. It also prints the oldest age for each of the three times and which file is the oldest.

The application also produces a csv file (comma separated values) that is suitable for a spreadsheet. It contains a long list of information that can be used for sorting, plotting, or for generating trend data.
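To make the list above concrete, here is a minimal sketch of the kind of scan it describes. It is not the actual FS_scan source: the function names, the column names, and the definition of a "month" as 30.44 days are all assumptions made for illustration, and the `pwd` lookup only works on Unix-like systems.

```python
import csv
import os
import pwd   # Unix-only; used to map a uid to a password file entry
import time

# Assumption: one "month" is the average month length of 30.44 days.
SECONDS_PER_MONTH = 30.44 * 24 * 3600

# Hypothetical column names; the real FS_scan csv layout may differ.
FIELDS = ["path", "size", "uid", "user", "gid",
          "access_age_months", "modify_age_months", "change_age_months"]


def user_name(uid):
    """Return the login name for uid, or the uid as text if it has no entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def scan_tree(root, now=None):
    """Yield one dict of stat-derived information per file under root."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)   # lstat so symlinks are not followed
            except OSError:
                continue              # file vanished or is unreadable
            yield {
                "path": path,
                "size": st.st_size,
                "uid": st.st_uid,
                "user": user_name(st.st_uid),
                "gid": st.st_gid,
                "access_age_months": (now - st.st_atime) / SECONDS_PER_MONTH,
                "modify_age_months": (now - st.st_mtime) / SECONDS_PER_MONTH,
                "change_age_months": (now - st.st_ctime) / SECONDS_PER_MONTH,
            }


def write_csv(rows, csv_path):
    """Write the scan rows to a csv file with a header line."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

With these hypothetical helpers, `write_csv(scan_tree("/some/dir"), "scan.csv")` would produce a file that a spreadsheet can open directly.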

If you scan a fairly large directory tree, you can generate a large amount of output. If you just want a quick directory-level view, you can choose the “short” option (e.g., FS_scan.py -short .). This option won’t print any per-file information to stdout (standard output), but it does print the three average ages for the files in each directory. It’s not a bad idea to start with this option to get a quick overview of the directory tree and then get a more detailed view of certain parts of the tree.
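A directory-level summary of this kind can be sketched as follows. This is an illustration under assumptions, not the FS_scan implementation: the function name and the returned tuple layout are hypothetical, and ages are left in seconds so the caller can convert them to whatever unit is wanted.

```python
import os
import time


def directory_average_ages(root, now=None):
    """Map each directory under root to
    (file_count, avg_access_age, avg_modify_age, avg_change_age).

    Ages are in seconds. Directories that hold no files directly get
    None for the three averages.
    """
    now = time.time() if now is None else now
    summary = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        totals = [0.0, 0.0, 0.0]   # running sums: access, modify, change
        count = 0
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue           # skip files that cannot be stat'ed
            totals[0] += now - st.st_atime
            totals[1] += now - st.st_mtime
            totals[2] += now - st.st_ctime
            count += 1
        if count:
            averages = tuple(total / count for total in totals)
        else:
            averages = (None, None, None)
        summary[dirpath] = (count, *averages)
    return summary
```

Only files that sit directly in a directory count toward its averages here, so a directory that contains nothing but subdirectories reports a file count of 0.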

Here is a sample of the “short” output. It is a little long but it is intended to show what FS_scan is capable of producing as well as illustrate how much information can be generated.

Notice that there is a single file in the tree that is the oldest for modify and change age, but there is a different file for the oldest access age. Also, notice that there are some listings that are for directories that don’t have any files in them (Number of Files: 0). Either they are just other subdirectories or they literally have no files in them.

Recall that FS_scan also produces a csv file as part of the scan. You can load the csv file into a spreadsheet and then sort the results, filter them, or perform any other operation you want. The sheet has a large number of columns, so to give you an idea of what it looks like, the screenshots below show the left-hand and right-hand parts of the sheet that correspond to the above output.

Figure 1: FS_scan csv Output in OpenOffice – Left-Hand Part of Sheet

Figure 2: FS_scan csv Output in OpenOffice – Right-Hand Part of Sheet

The sheet has more rows than can be shown, but these screenshots at least give you a feel for what kind of data is contained in the csv.

FS_scan Next Generation

Many people are probably thinking about how to use such detailed information from FS_scan to spot trends or learn more about their data. Ideally, you would run FS_scan once every day or so and keep the output. Then you could periodically run a report tool across the scans and get information on aspects such as:

The progression of the averages of file age

The rate of the creation of new files

The rate of deletion of existing files and approximately how old they were when deleted

How often files are accessed

How often files are modified

The growth of files and/or directories

You can even take this information and do the analysis on a per-user basis to find the “power” users. However, creating this information is not easy.
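As a rough illustration of what comparing two scans involves, the sketch below diffs two snapshots to find created, deleted, and modified files. The snapshot format (a dict mapping each path to a dict with an `mtime` key) and the function name are assumptions for this example only. The age at deletion is approximate because it is measured from the last modification to the time of the older scan, which is the last moment the file was known to exist.

```python
def compare_scans(old, new, old_scan_time):
    """Compare two snapshots of the form {path: {"mtime": seconds}}.

    Returns the created, deleted, and modified paths, plus an approximate
    age in seconds for each deleted file, measured at the older scan.
    """
    old_paths = set(old)
    new_paths = set(new)

    created = sorted(new_paths - old_paths)
    deleted = sorted(old_paths - new_paths)
    # A file counts as modified if its mtime moved forward between scans.
    modified = sorted(
        path for path in old_paths & new_paths
        if new[path]["mtime"] > old[path]["mtime"]
    )
    deleted_ages = {path: old_scan_time - old[path]["mtime"] for path in deleted}

    return {
        "created": created,
        "deleted": deleted,
        "modified": modified,
        "deleted_ages": deleted_ages,
    }
```

Dividing the number of created or deleted paths by the time between the two scans gives a simple creation or deletion rate.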

The next generation of FS_scan is being developed so that, in addition to creating a csv file, it will create a database file using sqlite. You can then take the database snapshots and perform operations on them to generate information such as that previously mentioned.
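Since the database layout of the next-generation tool is not described here, the following is only a sketch of how scan results could be stored and queried with Python's standard `sqlite3` module. The table name, the columns, and both function names are hypothetical.

```python
import sqlite3


def store_scan(conn, scan_date, rows):
    """Insert one scan's rows into a hypothetical 'files' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "scan_date TEXT, path TEXT, size INTEGER, uid INTEGER, mtime REAL)"
    )
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
        [(scan_date, r["path"], r["size"], r["uid"], r["mtime"]) for r in rows],
    )
    conn.commit()


def usage_by_user(conn, scan_date):
    """Return (uid, file_count, total_bytes) per user, largest total first."""
    cursor = conn.execute(
        "SELECT uid, COUNT(*), SUM(size) FROM files "
        "WHERE scan_date = ? GROUP BY uid ORDER BY SUM(size) DESC",
        (scan_date,),
    )
    return cursor.fetchall()
```

Keeping the scan date in every row lets a single database file hold many snapshots, so a per-user or per-day report becomes a matter of one SQL query.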

Next Steps – What Do You Want to Know?

Currently, FS_scan creates a csv file suitable for spreadsheets that you can use for generating reports or even plotting data. The next generation FS_scan is being developed to create a sqlite database and scripting to generate useful information. However, the tool is only as good as how you use it. So if you have any ideas or comments about how you would use this information please post them to the site.
