KS – an open source bash script for indexing data

ABSTRACT: KS is a keyword search tool that works on allocated data, unallocated data and slackspace, using indexing software and a database for storage.

During a computer forensics analysis we often need all the keywords indexed in a database, so that we can run many searches on it quickly.

We could use strings and grep to search for the keywords, but that gives us neither a database nor a search engine, and it cannot look inside compressed formats such as ODT, DOCX and XLSX.
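As a point of comparison, the strings and grep approach can be sketched as below. The function name search_raw and its two file arguments are only illustrative, and this shows the baseline being discussed, not how KS works.

```shell
#!/usr/bin/env bash
# search_raw IMAGE KEYWORDS_FILE   (hypothetical helper name)
# Prints "<decimal byte offset> <matching string>" for every hit.
search_raw() {
    local image=$1 keywords=$2
    # strings -a: scan the whole file; -t d: prefix each string with its decimal offset
    # grep -i -F -f: case-insensitive, fixed-string match against each keyword line
    strings -a -t d "$image" | grep -i -F -f "$keywords"
}

# Example (paths are placeholders):
#   search_raw image.dd keywords.txt
```

Every new keyword means reading the whole image again, and text stored inside a compressed container is never seen, which is the limit described above.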

So I tried to solve this problem. First of all we need to extract what I call “spaces”:

1) Allocated space;

2) Unallocated space;

3) Slackspace;

Then we can run the indexer against these three spaces and we can extract all the keywords inside them.

We must remember that there are two kinds of unallocated space: the first is the set of deleted files, and the second is all the files that are not in the deleted set but are still present on the storage device (hard disk, pen drive, etc.).

To extract these files we need data carving, a technique that searches for file types by their “magic numbers” (headers and footers). Carving does not rely on the filesystem, so it gathers all files, allocated and unallocated (including the deleted ones), and we then need to eliminate the duplicates it generates.
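To give an idea of what searching by “magic numbers” means, here is a minimal sketch that only lists the byte offsets where a JPEG header (bytes FF D8 FF) starts in a raw image. It is not a carver: a real tool such as Photorec also deals with footers, file sizes and many other formats. The function name is hypothetical and GNU grep is assumed.

```shell
#!/usr/bin/env bash
# find_jpeg_headers IMAGE   (hypothetical helper name)
# Prints one decimal byte offset per line for each JPEG signature found.
find_jpeg_headers() {
    local image=$1 sig
    # JPEG signature FF D8 FF written as octal escapes
    sig=$(printf '\377\330\377')
    # LC_ALL=C: match raw bytes; -a: treat binary as text
    # -o -b: print the byte offset of each match; -F: fixed string, not a regex
    LC_ALL=C grep -a -o -b -F -e "$sig" "$image" | LC_ALL=C cut -d: -f1
}

# Example (path is a placeholder):
#   find_jpeg_headers image.dd
```

Since the search never consults the filesystem, it reports allocated and deleted pictures alike, which is why duplicates appear.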

The slackspace can be extracted with the TSK (The Sleuth Kit) tools and put into one big text file. Remember that the slackspace is made of all the file fragments present in the unused cluster space.
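One possible way to do this with TSK is sketched below. It assumes that blkls with its -s option (slackspace only) supports the filesystem in the image, and it filters the output through strings to obtain a text file. The function name and the slack.txt file name are assumptions, not necessarily what ks.sh does.

```shell
#!/usr/bin/env bash
# extract_slack IMAGE OUTDIR   (hypothetical helper name)
extract_slack() {
    local image=$1 outdir=$2
    mkdir -p "$outdir"
    # blkls -s: output only the slack space of the file system
    # strings -a: keep only the printable runs, so the indexer gets plain text
    blkls -s "$image" | strings -a > "$outdir/slack.txt"
}

# Example (paths are placeholders):
#   extract_slack image.dd /diskspace/slack
```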

Inception

We have to create a directory named, for instance, “diskspace”.

We can mount our disk image file (bitstream, EWF, etc.) on a sub-directory of diskspace, e.g. /diskspace/disk, which gives us all the allocated space.
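For a raw (dd) image, the mount step could look like the sketch below. It assumes we already know the start sector of the partition (for instance from the output of mmls) and that sectors are 512 bytes unless stated otherwise. The function names are illustrative. An EWF image would first have to be exposed as a raw file by another tool, which is not shown here.

```shell
#!/usr/bin/env bash
# byte_offset START_SECTOR [SECTOR_SIZE]   (hypothetical helper name)
# Converts a partition start sector into the byte offset mount expects.
byte_offset() {
    local start=$1 size=${2:-512}
    echo $(( start * size ))
}

# mount_image IMAGE START_SECTOR MOUNTPOINT   (needs root)
mount_image() {
    local image=$1 start=$2 mnt=$3
    mkdir -p "$mnt"
    # ro: never write to the evidence; loop: use the file as a block device
    mount -o "ro,loop,offset=$(byte_offset "$start")" "$image" "$mnt"
}

# Example (values are placeholders):
#   mount_image image.dd 2048 /diskspace/disk
```

With a start sector of 2048, byte_offset prints 1048576.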

Now, we have to extract all the deleted files including their paths and put them into “/diskspace/deleted”.
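A minimal sketch of this step, assuming TSK's tsk_recover is used (by default it exports only the unallocated files and recreates their directory paths). The function name and the sector offset argument are illustrative, and ks.sh may do this differently.

```shell
#!/usr/bin/env bash
# recover_deleted IMAGE START_SECTOR OUTDIR   (hypothetical helper name)
recover_deleted() {
    local image=$1 start=$2 outdir=$3
    mkdir -p "$outdir"
    # -o: offset of the file system inside the image, in sectors
    # With no other option tsk_recover exports the unallocated (deleted) files only
    tsk_recover -o "$start" "$image" "$outdir"
}

# Example (values are placeholders):
#   recover_deleted image.dd 2048 /diskspace/deleted
```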

We have to run the data carving and put all the results into “/diskspace/carved”. We can carve only the free space of the disk, and then we must remove the carved files that duplicate the deleted files.
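The two ideas in this step can be sketched as follows. The first function assumes TSK's blkls, which by default copies only the unallocated blocks, so that the carver can be pointed at that file instead of the whole image. The second removes carved files whose MD5 is already present among the deleted files. It uses md5sum from coreutils as a stand-in for MD5Deep, only to keep the example dependency-free. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# extract_freespace IMAGE OUTFILE   (hypothetical helper name)
# blkls with no option writes only the unallocated blocks of the file system.
extract_freespace() {
    blkls "$1" > "$2"
}

# remove_duplicates KNOWN_DIR CARVED_DIR   (hypothetical helper name)
# Deletes every file under CARVED_DIR whose MD5 matches a file under KNOWN_DIR.
remove_duplicates() {
    local known_dir=$1 carved_dir=$2 known
    known=$(mktemp)
    # One MD5 per line for every recovered deleted file.
    # sed drops the leading backslash md5sum prints for unusual file names.
    find "$known_dir" -type f -exec md5sum {} + |
        sed 's/^\\//' | cut -c1-32 | sort -u > "$known"
    # Hash each carved file and delete it when the hash is already known.
    find "$carved_dir" -type f -exec sh -c '
        known=$1; shift
        for f in "$@"; do
            h=$(md5sum < "$f" | cut -c1-32)
            if grep -q -x -F "$h" "$known"; then rm -f -- "$f"; fi
        done' sh "$known" {} +
    rm -f "$known"
}

# Example (paths are placeholders):
#   extract_freespace image.dd /diskspace/unalloc.blkls
#   ... run the carver on /diskspace/unalloc.blkls into /diskspace/carved ...
#   remove_duplicates /diskspace/deleted /diskspace/carved
```

Comparing by hash rather than by name matters here, because carved files lose their original names and paths.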

Finally we can extract all the slackspace, if we need it, and put it into “/diskspace/slack”.

Now we have:

/diskspace
|_disk
|_deleted
|_carved
|_slack
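The layout above can be prepared in advance with a few lines of shell. This is only a sketch: the function name is hypothetical and the base directory is whatever we chose for “diskspace”.

```shell
#!/usr/bin/env bash
# make_layout BASE   (hypothetical helper name)
# Creates the four "spaces" under BASE; existing directories are left alone.
make_layout() {
    local base=$1 space
    for space in disk deleted carved slack; do
        mkdir -p "$base/$space"
    done
}

# Example:
#   make_layout /diskspace
```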

We only need a “spider” to index all these spaces and collect all the keywords into a database.
For this purpose there is a program in the open source world: RECOLL, which indexes the content of a directory and allows various queries (http://www.lesbonscomptes.com/recoll/).
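As a rough sketch of the indexing step, the functions below write a minimal recoll.conf whose topdirs parameter points at the directory holding the spaces, then run recollindex with -c so that the configuration (and, by default, the index stored beside it) lives in a directory of our choice. The function names and paths are assumptions. The real ks.sh may write more settings.

```shell
#!/usr/bin/env bash
# write_recoll_conf CONFDIR BASE   (hypothetical helper name)
# Creates CONFDIR/recoll.conf telling RECOLL which directory tree to index.
write_recoll_conf() {
    local confdir=$1 base=$2
    mkdir -p "$confdir"
    # topdirs: the directory tree(s) RECOLL will walk and index
    printf 'topdirs = %s\n' "$base" > "$confdir/recoll.conf"
}

# index_spaces CONFDIR   (hypothetical helper name)
index_spaces() {
    # -c: use this configuration directory instead of the default ~/.recoll
    recollindex -c "$1"
}

# Example (paths are placeholders):
#   write_recoll_conf /cases/case01/recoll /diskspace
#   index_spaces /cases/case01/recoll
```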

KS is a keyword search tool. To run it: sudo bash ks.sh
It mounts a DD image file, extracts all the deleted files and the slackspace, carves the free space only, and indexes everything with RECOLL.
You need:
The Sleuth Kit (latest release)
Photorec
MD5Deep
RECOLL
It stores the index DB and the recoll.conf in the chosen output directory.
NEW: more file formats added, plus a README.txt that explains how to expand the search range.
Website: http://scripts4cf.sourceforge.net/tools.html

RECOLL can also search for keywords inside compressed files and email attachments. In short, once all the content is indexed you should be able to search for keywords or phrases, just as you would with Google.
As with all open source projects, I have to thank some friends and developers for their collaboration.
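Besides the graphical interface, the index can also be queried from a terminal. The sketch below runs one phrase query per line of a keyword list, using recoll in its command-line mode (-t). The function name, the keyword file and the configuration directory are assumptions made for the example.

```shell
#!/usr/bin/env bash
# search_keywords CONFDIR KEYWORDS_FILE   (hypothetical helper name)
# Runs one quoted (exact phrase) query for each non-empty line of the list.
search_keywords() {
    local confdir=$1 keywords=$2 kw
    while IFS= read -r kw || [ -n "$kw" ]; do
        [ -n "$kw" ] || continue
        printf '== %s ==\n' "$kw"
        # -t: command-line mode; -c: configuration directory holding the index
        # -q: the argument is a query-language string (the default mode)
        # stdin is redirected so recoll cannot consume the keyword list
        recoll -t -c "$confdir" -q "\"$kw\"" < /dev/null
    done < "$keywords"
}

# Example (paths are placeholders):
#   search_keywords /cases/case01/recoll keywords.txt
```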