Download Crab

Crab is free for evaluation and non-commercial use. The unlicensed version sends a no-data ping to allow us to count active users. Licensing deactivates this and gets you Next Business Day support. Use is covered by the End User License Agreement.

Crab for macOS is compatible with OS X Mavericks (10.9) and higher.

If you earn money while you're using Crab, you need a license. If you don't, you don't.

Depending on which browser you use, you may now need to unzip the download by double-clicking it in Finder.

From your download directory, copy/paste or drag the 'CrabHome' folder into your home directory. For user johnsmith that would be '/Users/johnsmith'.

Now you can run Crab from Terminal by typing its path, ~/CrabHome/CrabExe/crab. But to save typing and get full functionality, it's strongly recommended that you use a text editor to copy/paste one of the aliases below into the file called '.bash_profile' in your home directory.

N.B. '.bash_profile' is normally hidden in Open/Save dialogs, as are all filenames that start with a dot. Press Cmd + Shift + . to reveal them.

At the command line, give Crab the paths of one or more directories you want scanned; their subdirectories will be scanned too. It's best to start by scanning a project directory rather than your whole disk, to get the hang of things. E.g. to scan a directory called myproject:

$ crab ~/myproject

If the path contains spaces, wrap it in single quotes:

$ crab '/Users/johnsmith/my project'

N.B. Ctrl + C will quit Crab or cancel a scan.

When the scan is finished you'll get a count of files and directories scanned, and a CRAB> prompt where you can type SQL to report on files, and to process them.

First query

Crab uses the SQLite flavor of SQL, which is mostly ANSI standard. If you come from SQL Server, remember to end each query with a semicolon.

Crab stores every directory path with a slash on the end, so you can tell by looking at a path whether it is a file or a directory. E.g. '/Users/johnsmith/something' is a file, and '/Users/johnsmith/somethingElse/' is a directory.

To match the path of a directory you need to include the trailing slash. E.g. to list everything in the 'Backups' directory:
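Here is a minimal sketch of such a query. It runs with Python's built-in sqlite3 module against a tiny, hypothetical stand-in for Crab's files table, with only the fullpath and parentpath columns and made-up paths. Following the convention described above, it assumes each entry's parentpath is its directory's path including the trailing slash. At the CRAB> prompt you would type only the SQL.

```python
import sqlite3

# Hypothetical stand-in for Crab's files table: two columns, made-up paths.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fullpath TEXT, parentpath TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("/Users/johnsmith/Backups/", "/Users/johnsmith/"),  # the directory itself
        ("/Users/johnsmith/Backups/mon.tar", "/Users/johnsmith/Backups/"),
        ("/Users/johnsmith/Backups/old/", "/Users/johnsmith/Backups/"),  # a subdirectory
        ("/Users/johnsmith/Backups/old/jan.tar", "/Users/johnsmith/Backups/old/"),
    ],
)

# Everything directly inside Backups: the trailing slash is part of the text.
with_slash = [row[0] for row in conn.execute(
    "SELECT fullpath FROM files WHERE parentpath = '/Users/johnsmith/Backups/' ORDER BY fullpath;"
)]

# Without the slash, no stored parentpath is equal to the string.
without_slash = conn.execute(
    "SELECT fullpath FROM files WHERE parentpath = '/Users/johnsmith/Backups';"
).fetchall()

print(with_slash)     # ['/Users/johnsmith/Backups/mon.tar', '/Users/johnsmith/Backups/old/']
print(without_slash)  # []
```

Note that the file inside the 'old' subdirectory is not listed. Equality on parentpath only reaches one level down.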

To search recursively in a directory, use LIKE with a wildcard pattern. E.g. this query reports all .docx files anywhere in the 'Documentation' directory or its subdirectories:

SELECT fullpath FROM files WHERE parentpath like '/Users/johnsmith/MyProject/Documentation/%' and extension = '.docx';
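This sketch shows why the pattern ends in '/%'. It uses the same kind of hypothetical stand-in table in Python's sqlite3, with made-up paths and assumed columns. The % wildcard also matches an empty string, so files directly inside 'Documentation/' are included. The slash keeps a sibling directory such as 'Documentation Old/' out, and dropping it lets that sibling in.

```python
import sqlite3

# Hypothetical stand-in for Crab's files table, with made-up paths.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fullpath TEXT, parentpath TEXT, extension TEXT)")
doc = "/Users/johnsmith/MyProject/Documentation/"
old = "/Users/johnsmith/MyProject/Documentation Old/"
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        (doc + "guide.docx", doc, ".docx"),             # directly inside Documentation
        (doc + "api/ref.docx", doc + "api/", ".docx"),  # one level further down
        (doc + "api/notes.txt", doc + "api/", ".txt"),  # wrong extension
        (old + "draft.docx", old, ".docx"),             # sibling directory, not a child
    ],
)

recursive = [row[0] for row in conn.execute(
    """SELECT fullpath FROM files
       WHERE parentpath like '/Users/johnsmith/MyProject/Documentation/%'
         and extension = '.docx'
       ORDER BY fullpath;"""
)]

# Without the slash before %, the sibling 'Documentation Old/' matches too.
loose_count = conn.execute(
    """SELECT count(*) FROM files
       WHERE parentpath like '/Users/johnsmith/MyProject/Documentation%'
         and extension = '.docx';"""
).fetchone()[0]

print(recursive)
print(loose_count)  # 3
```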

In the next section you'll learn about editing multi-line queries, and see some basic query examples.

Longer queries

Typing long commands at the command line can be awkward, whether it's Crab SQL, Bash, or another shell language.

If you type each query on one line, you can press up-arrow on the keyboard to step through your query history. But it's usually clearer to lay out each query over multiple lines, like the examples on this website. To do this, write your query in a text editor such as TextEdit, then copy and paste it to the Crab command line. You can paste a multi-line query in one go; you don't need to do it line by line.

This query lists all files with a .zip extension that have "proposal" or "rfp" in their name:

SELECT fullpath FROM files WHERE extension = '.zip' and (name like '%proposal%' or name like '%rfp%') ;
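The parentheses matter because AND binds more tightly than OR. The sketch below runs the query with and without them against a hypothetical stand-in table in Python's sqlite3. For illustration it assumes that name includes the extension. Without the parentheses, a .docx file whose name contains 'rfp' slips into the results.

```python
import sqlite3

# Hypothetical stand-in table; name is assumed to include the extension.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, extension TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("proposal2023.zip", ".zip"),
        ("acme-rfp.zip", ".zip"),
        ("photos.zip", ".zip"),
        ("rfp-draft.docx", ".docx"),
    ],
)

def names(where):
    """Return the sorted names matching a WHERE clause."""
    return [row[0] for row in conn.execute(f"SELECT name FROM files WHERE {where} ORDER BY name;")]

# As on this page: the parentheses group the two name tests.
grouped = names("extension = '.zip' and (name like '%proposal%' or name like '%rfp%')")

# Without them this reads as (zip and proposal) or rfp.
ungrouped = names("extension = '.zip' and name like '%proposal%' or name like '%rfp%'")

print(grouped)    # ['acme-rfp.zip', 'proposal2023.zip']
print(ungrouped)  # ['acme-rfp.zip', 'proposal2023.zip', 'rfp-draft.docx']
```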

If you get too many results add a LIMIT clause to your query, to restrict results to a specified number of rows.

This query lists the five biggest files scanned:

SELECT fullpath, bytes FROM files ORDER BY bytes DESC LIMIT 5;

This query sums file sizes and groups the total by the directory they're in, to give the five biggest directories:

SELECT parentpath, SUM(bytes) FROM files GROUP BY parentpath ORDER BY SUM(bytes) DESC LIMIT 5;
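One thing to keep in mind: grouping by parentpath totals only the files directly inside each directory, not those in its subdirectories. This sketch uses a hypothetical stand-in table in Python's sqlite3 with made-up sizes. It shows a subdirectory outranking its own parent, alongside the biggest-files query above.

```python
import sqlite3

# Hypothetical stand-in table with made-up paths and sizes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fullpath TEXT, parentpath TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("/data/a/x.bin", "/data/a/", 500),
        ("/data/a/y.bin", "/data/a/", 300),
        ("/data/a/sub/v.bin", "/data/a/sub/", 1000),  # counted under /data/a/sub/ only
        ("/data/b/z.bin", "/data/b/", 700),
        ("/data/c/w.bin", "/data/c/", 100),
    ],
)

biggest_files = conn.execute(
    "SELECT fullpath, bytes FROM files ORDER BY bytes DESC LIMIT 2;"
).fetchall()

biggest_dirs = conn.execute(
    "SELECT parentpath, SUM(bytes) FROM files GROUP BY parentpath ORDER BY SUM(bytes) DESC LIMIT 3;"
).fetchall()

print(biggest_files)  # [('/data/a/sub/v.bin', 1000), ('/data/b/z.bin', 700)]
print(biggest_dirs)   # [('/data/a/sub/', 1000), ('/data/a/', 800), ('/data/b/', 700)]
```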

This query counts files by the year and month they were modified:

SELECT strftime('%Y-%m', modified) yrmon, count(*) FROM files WHERE type = 'f' GROUP BY yrmon ORDER BY yrmon DESC;
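Here is a sketch of how that grouping behaves, using Python's sqlite3 and a hypothetical stand-in table. For illustration only, it assumes that modified holds text timestamps like '2024-03-05 10:00:00' and that type is 'f' for files and 'd' for directories. Crab's table reference has the real formats.

```python
import sqlite3

# Hypothetical stand-in table. Assumed formats: modified as 'YYYY-MM-DD HH:MM:SS'
# text, type 'f' for files and 'd' for directories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fullpath TEXT, modified TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("/data/a.txt", "2024-03-05 10:00:00", "f"),
        ("/data/b.txt", "2024-03-28 09:15:00", "f"),
        ("/data/c.txt", "2024-04-01 00:00:01", "f"),
        ("/data/archive/", "2024-04-02 12:00:00", "d"),  # a directory: filtered out
    ],
)

per_month = conn.execute(
    """SELECT strftime('%Y-%m', modified) yrmon, count(*) FROM files
       WHERE type = 'f'
       GROUP BY yrmon ORDER BY yrmon DESC;"""
).fetchall()

print(per_month)  # [('2024-04', 1), ('2024-03', 2)]
```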

Continue down this page to get a taste of the different ways you can use Crab, or

To get an introduction to the SQLite flavor of SQL see SQLite SELECT (offsite)

To see example Crab queries for lots of different scenarios, browse the Use Cases

Query tips

Ctrl + C will interrupt a running query.

Path abbreviations such as . and .. aren't useful for matching paths, because queries simply match text. Instead, use SQL string pattern matching with the LIKE keyword and the wildcards % (any run of characters, including none) and _ (exactly one character).

Use the %mode command to change the layout of query results. The default display is dict format, but it's often convenient to use %mode line (each output field on a separate line) or %mode list (comma-separated output, with field names on a header row).

If you want to pattern match with a regex, use the match operator in place of like.

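SQL keywords aren't case sensitive in SQLite, so queries here have some keywords written in upper case just to make them more readable. String comparisons are a different matter: like ignores case for ASCII letters, but = does not.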
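The sketch below tries these matching rules in Python's sqlite3 with a hypothetical one-column table. Plain SQLite has no working match() function, so the sketch registers a stand-in built on Python's re.search. Whether it behaves like Crab's own regex matching is an assumption. SQLite evaluates "X MATCH Y" as match(Y, X), so the stand-in receives the pattern first.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

def match(pattern, text):
    """Stand-in for Crab's match operator (assumed to be a regex search)."""
    return text is not None and re.search(pattern, text) is not None

# SQLite evaluates "X MATCH Y" as match(Y, X): pattern first, then the value.
conn.create_function("match", 2, match)

conn.execute("CREATE TABLE files (name TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("report1.txt",), ("report12.txt",), ("Report2.TXT",), ("notes.txt",)],
)

def names(where):
    """Return the names matching a WHERE clause, in SQLite's default (binary) order."""
    return [row[0] for row in conn.execute(f"SELECT name FROM files WHERE {where} ORDER BY name;")]

one_char = names("name like 'report_.txt'")            # _ is exactly one character; like ignores case
by_regex = names(r"name match '^report[0-9]+\.txt$'")  # regex via the stand-in (case sensitive)
exact = names("name = 'report2.txt'")                  # = is case sensitive

print(one_char)  # ['Report2.TXT', 'report1.txt']
print(by_regex)  # ['report1.txt', 'report12.txt']
print(exact)     # []
```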

Exit Crab

At the CRAB> prompt, just type:

%quit

General tips

When editing text at the macOS command line, or at the CRAB> prompt, you can reuse earlier commands by pressing up-arrow on the keyboard. Undo is Ctrl + _ (underscore). To move the cursor one word left or right, use Alt + left and Alt + right.

If you start Crab without giving it any path to scan, you can continue querying the previous scan results.

You can press Tab after the first two letters of most keywords to complete the keyword.

You can write a query in a text editor like TextEdit and paste it into the Crab command line in one chunk, even if it's a multi-line query.

Check out the "Documentation" menu at the top of this page. It has thorough reference info on Crab tables, launch options, commands and functions.

To avoid your machine syncing to the cloud every time you scan, turn off syncing of the CrabHome directory to Dropbox or other cloud services.

Use the %show command at the CRAB> prompt to see current Crab settings, including the date and path of the current scan.

Use the %help command at the CRAB> prompt to get reference information on Crab tables and functions.

Instant SQL and Bash scripts

Use the -batch option to run Crab without an interactive CRAB> prompt, e.g. to scan directories or execute SQL from the macOS command line or from inside a Bash script.

In a Bash script or a scheduled job you'll have to use the full path for Crab, or add the directory 'CrabHome/CrabExe/' to your PATH, because aliases only work at the interactive macOS command line.

Unattended jobs

Use the -batch option for unattended jobs, such as a scheduled scan.

E.g. here's a command that will scan your whole drive into a database file called 'wholedrive.crdb', then exit Crab:

crab -db wholedrive.crdb -batch /

To open an interactive query session on this scan data, start Crab with the same database file, but without a scan path:

crab -db wholedrive.crdb

Crab SQL at the macOS Command Line

You can use the -batch option to execute a query at the macOS command line; we call this "Instant SQL". Results are returned to the macOS command line, so you can pipe the output to other programs. Write the SQL inside double quotes, after any Crab options and the scan path, so that string literals within the SQL can use single quotes.

E.g. this example will recursively scan the current directory (denoted by the dot) and return a list of all text files. The output is piped to the 'more' command, which shows it one screenful at a time.

You can also use the -batch option to execute Crab SQL in a Bash script. Typically you'd do this to include a query as part of a larger job, such as a data import process, or to save the SQL queries you use frequently at the macOS command line as script files.

E.g. the following line in a Bash script is part of a data import: it scans the 'Import' directory and copies any .csv files, removing the header row from each.

Here is an example of a simple query that is useful at the macOS command line. It scans the current directory, and returns the fullpath of every object in it (it doesn't scan subdirectories because of -maxdepth 1).

crab -batch -maxdepth 1 . "SELECT fullpath FROM files"

If you use this frequently you can save it in a script file somewhere on your path. Then you can just type the name of the script whenever you want a list of full paths for the contents of your current directory.

Instant SQL tips

Crab's -maxdepth parameter specifies how deep you want to scan: -maxdepth 1 means the specified directory only, and -maxdepth 2 includes its children one level down. The default is to scan the whole tree.

In batch mode Crab defaults to list mode (comma-separated output). You can change this with startup options such as -dict and -column; see the "Documentation" menu, "Launching Crab", for details.

Querying file contents

You can search the contents of files, by querying the fileslines table. The fileslines table has the same fields as the files table, plus two extra fields: data and linenumber. The data field represents the text of all files scanned, one row per line. The linenumber field has the value 1 for the first line of each file.

fileslines is a 'virtual table': it doesn't index file contents, it reads the content of files at the time you query them. Filters on name, fullpath, extension and so on are applied before reading file contents, so files will only be read if they match the criteria.

E.g. this query shows all lines containing 'TODO' or 'FIXME' in .c files anywhere below the 'myproject' directory:

SELECT fullpath, linenumber, data FROM fileslines WHERE parentpath like '/Users/johnsmith/myproject/%' and extension = '.c' and (data like '%TODO%' or data like '%FIXME%');
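Here is a rough sketch of the idea behind fileslines, in Python with sqlite3. It writes a few made-up files to a temporary directory and loads them into a hypothetical stand-in table, one row per line, with linenumber counting from 1. It then runs the same style of query. Unlike Crab's virtual table, this stand-in reads the files up front. Note that like ignores case for ASCII letters, so '%TODO%' would also match 'todo'.

```python
import os
import sqlite3
import tempfile

def load_fileslines(conn, paths):
    """Hypothetical stand-in for Crab's fileslines: one row per line, numbered from 1."""
    conn.execute(
        "CREATE TABLE fileslines (fullpath TEXT, parentpath TEXT, extension TEXT,"
        " linenumber INTEGER, data TEXT)"
    )
    for path in paths:
        parent = os.path.dirname(path) + "/"  # directory paths end with a slash
        extension = os.path.splitext(path)[1]
        with open(path, encoding="utf-8") as f:
            for number, line in enumerate(f, start=1):
                conn.execute(
                    "INSERT INTO fileslines VALUES (?, ?, ?, ?, ?)",
                    (path, parent, extension, number, line.rstrip("\n")),
                )

# Made-up project files in a temporary directory.
project = os.path.join(tempfile.mkdtemp(), "myproject")
os.makedirs(os.path.join(project, "src"))
contents = {
    "src/main.c": "int main(void) {\n    // TODO: parse args\n    return 0;\n}\n",
    "src/util.c": "// helpers\n/* FIXME: free buffers */\n",
    "notes.txt": "TODO: not a .c file\n",
}
paths = []
for relative, text in contents.items():
    path = os.path.join(project, relative)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    paths.append(path)

conn = sqlite3.connect(":memory:")
load_fileslines(conn, paths)

hits = conn.execute(
    """SELECT fullpath, linenumber, data FROM fileslines
       WHERE parentpath like ? || '%' and extension = '.c'
         and (data like '%TODO%' or data like '%FIXME%')
       ORDER BY fullpath, linenumber;""",
    (project + "/",),
).fetchall()

for fullpath, linenumber, data in hits:
    print(os.path.basename(fullpath), linenumber, data.strip())
```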

Crab's default settings are configured for UTF-8 and ASCII text files: any non-UTF-8 character causes your query to skip the rest of that file, which keeps the contents of binary files out of query results.

To search files that contain occasional exotic characters, change the %encoding setting from utf8:skipfile to utf8:ignore (filters out invalid characters) or utf8:replace (replaces invalid characters with ⃞ ).

Processing files

Use Crab's exec() function to run an operating system command once for each query result row. It's typically used to run a command on the files returned by a query, using fullpath as one of the arguments.

The first argument to exec() should be the command name or the full path of a program, followed by the command's arguments. Arguments are automatically wrapped in quotes, so you don't need to worry about paths that contain spaces.

E.g. this query copies files to the 'python backups' directory by running the 'cp' command with the -f option on every Python file modified since midnight yesterday.
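Here is a sketch of what such a query could look like, tried out in Python's sqlite3. It uses a dry-run stand-in for exec() that records each argument list instead of running it. The destination path is hypothetical, and so is the storage format of modified. The sketch assumes local-time text such as '2024-03-05 10:00:00', which compares correctly against date('now', 'localtime', '-1 day'): yesterday's date, read as midnight.

```python
import sqlite3
from datetime import datetime, timedelta

calls = []

def exec_dry_run(*args):
    """Dry-run stand-in for Crab's exec(): records the argument list, runs nothing."""
    calls.append(list(args))
    return 0

conn = sqlite3.connect(":memory:")
conn.create_function("exec", -1, exec_dry_run)  # -1: any number of arguments

# Hypothetical stand-in table; modified assumed to be local-time text.
conn.execute("CREATE TABLE files (fullpath TEXT, extension TEXT, modified TEXT)")
stamp = "%Y-%m-%d %H:%M:%S"
now = datetime.now()
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("/Users/johnsmith/code/today.py", ".py", now.strftime(stamp)),
        ("/Users/johnsmith/code/old.py", ".py", (now - timedelta(days=3)).strftime(stamp)),
        ("/Users/johnsmith/code/today.txt", ".txt", now.strftime(stamp)),
    ],
)

conn.execute(
    """SELECT exec('cp', '-f', fullpath, '/Users/johnsmith/python backups/')
       FROM files
       WHERE extension = '.py'
         and modified >= date('now', 'localtime', '-1 day');"""
).fetchall()  # fetching all rows makes exec run once per row

print(calls)
# [['cp', '-f', '/Users/johnsmith/code/today.py', '/Users/johnsmith/python backups/']]
```

Because exec() is called once per result row, the WHERE clause alone decides which files get copied. A dry run like this is a cheap way to check it.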

Remember that after you move or delete files, Crab's scan data will be out of date; you'll need to scan again before processing the file in its new location.

If files may have been moved or deleted since the last scan, you can avoid error messages by using the pathexists() function to check that the file or directory is still there at the time the query is run. E.g. to move a bunch of files when some have been deleted since the last scan:
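Here is a sketch of that pattern in Python's sqlite3. It again uses a dry-run stand-in for exec(), plus a stand-in pathexists() built on os.path.exists; the real Crab functions may differ in detail. One file is deleted after the stand-in "scan", so only the surviving file produces an mv command. The 'Archive' destination is hypothetical.

```python
import os
import sqlite3
import tempfile

# Two made-up files; the second is deleted after the stand-in "scan".
folder = tempfile.mkdtemp()
kept = os.path.join(folder, "a.log")
gone = os.path.join(folder, "b.log")
for path in (kept, gone):
    open(path, "w").close()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fullpath TEXT, extension TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [(kept, ".log"), (gone, ".log")])
os.remove(gone)  # the scan data is now out of date

calls = []

def exec_dry_run(*args):
    """Dry-run stand-in for Crab's exec(): records the argument list, runs nothing."""
    calls.append(list(args))
    return 0

def pathexists(path):
    """Stand-in for Crab's pathexists(): checks the path at query time."""
    return os.path.exists(path)

conn.create_function("exec", -1, exec_dry_run)
conn.create_function("pathexists", 1, pathexists)

conn.execute(
    """SELECT exec('mv', fullpath, '/Users/johnsmith/Archive/')
       FROM files
       WHERE extension = '.log' and pathexists(fullpath);"""
).fetchall()

print(calls)  # only a.log: b.log no longer exists
```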

exec() writes every command that is executed to your screen, together with its output. If you are running many commands this will be slow, and the screen will be cluttered.

You can avoid this by throwing away the output; use the following command before running the query:

%output /dev/null

Any error messages will still go to the screen, as will subsequent CRAB> prompts.

To restore output to the screen do this:

%output stdout

Whole disk scans

Each time you scan with Crab, the previous scan data is overwritten. To keep a scan of the whole disk, give it its own name so that it won't be overwritten by the next quick scan of some project directory.

You do this with Crab's -db option, which stores scan results in a Crab database file you specify (by default, scan results are stored in 'CrabHome/CrabData/default.crdb').

This command will scan your whole disk and store the scan data in file 'wholedisk.crdb'

crab -batch -db ~/CrabHome/CrabData/wholedisk.crdb /

The -batch option scans without giving a CRAB> prompt afterwards; use it if you want to run the scan as a scheduled job.

To query this scan data from an interactive CRAB> prompt, specify the database file with -db without giving any path to scan:

crab -db ~/CrabHome/CrabData/wholedisk.crdb

Whole disk query tips

By default Crab won't scan mounted drives it meets during a scan; use the -mount option when launching Crab to change this.

If you have millions of entries in the scan data, you can make queries faster by avoiding wildcards at the start of search patterns, so that Crab can use indexes.

E.g. this is slow:

SELECT fullpath FROM files WHERE fullpath like '%/myproject/%Budget%.xls';

This is fast:

SELECT fullpath FROM files WHERE fullpath like '/Users/johnsmith/myproject/%Budget%.xls';
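You can watch the same effect in plain SQLite with EXPLAIN QUERY PLAN. This sketch assumes an index on fullpath, which may not match how Crab indexes its own tables. Because SQLite's like is case-insensitive by default, its prefix optimization needs an index built with COLLATE NOCASE, as below. The leading-wildcard pattern forces a full SCAN, while the fixed prefix allows a SEARCH over just the matching range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fullpath TEXT)")
# Assumed index. like ignores case by default, so SQLite's prefix
# optimization only uses an index built with COLLATE NOCASE.
conn.execute("CREATE INDEX files_fullpath ON files (fullpath COLLATE NOCASE)")

def plan(sql):
    """Join the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

slow = plan("SELECT fullpath FROM files WHERE fullpath like '%/myproject/%Budget%.xls'")
fast = plan("SELECT fullpath FROM files WHERE fullpath like '/Users/johnsmith/myproject/%Budget%.xls'")

print(slow)  # a SCAN: every entry has to be checked
print(fast)  # a SEARCH: only entries in the range starting with the fixed prefix
```

The exact wording of the plan text varies between SQLite versions, but the SCAN/SEARCH distinction is what to look for.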

Comparing contents

SQL's set based logic is well suited to comparing contents of directories and files.

E.g. here is a query that lists any files in directory A that aren't in directory B:

SELECT a.name FROM files a WHERE a.parentpath = '/Users/johnsmith/A/' and a.name not in (SELECT b.name FROM files b WHERE b.parentpath = '/Users/johnsmith/B/');
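Here is a sketch of that comparison with a hypothetical stand-in table and made-up names in Python's sqlite3. Because the query compares parentpath exactly, files in subdirectories of A or B are not considered.

```python
import sqlite3

# Hypothetical stand-in table with made-up names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, parentpath TEXT)")
dir_a, dir_b = "/Users/johnsmith/A/", "/Users/johnsmith/B/"
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("a.txt", dir_a), ("b.txt", dir_a), ("c.txt", dir_a),
        ("e.txt", dir_a + "sub/"),  # in a subdirectory of A: not compared
        ("b.txt", dir_b), ("d.txt", dir_b),
    ],
)

only_in_a = [row[0] for row in conn.execute(
    """SELECT a.name FROM files a
       WHERE a.parentpath = '/Users/johnsmith/A/'
         and a.name not in (SELECT b.name FROM files b WHERE b.parentpath = '/Users/johnsmith/B/')
       ORDER BY a.name;"""
)]

print(only_in_a)  # ['a.txt', 'c.txt']
```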

For more examples like this, see the "Compare Directories" page on this website, under the "Use Case" menu.

This query lists any lines in fileA.txt that aren't in fileB.txt:

SELECT a.data FROM fileslines a WHERE a.fullpath = '/Users/johnsmith/fileA.txt' and a.data not in (SELECT b.data FROM fileslines b WHERE b.fullpath = '/Users/johnsmith/fileB.txt');
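Here is a sketch of the line comparison, using a hypothetical stand-in for fileslines in Python's sqlite3 built from made-up file contents. The match is on whole-line text, so line order doesn't matter. Adding linenumber to the select list tells you where each unmatched line is.

```python
import sqlite3

# Hypothetical stand-in for fileslines, built from made-up file contents.
contents = {
    "/Users/johnsmith/fileA.txt": "apple\nbanana\ncherry\n",
    "/Users/johnsmith/fileB.txt": "cherry\napple\n",
}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fileslines (fullpath TEXT, linenumber INTEGER, data TEXT)")
for path, text in contents.items():
    conn.executemany(
        "INSERT INTO fileslines VALUES (?, ?, ?)",
        [(path, n, line) for n, line in enumerate(text.splitlines(), start=1)],
    )

missing = conn.execute(
    """SELECT a.linenumber, a.data FROM fileslines a
       WHERE a.fullpath = '/Users/johnsmith/fileA.txt'
         and a.data not in (SELECT b.data FROM fileslines b
                            WHERE b.fullpath = '/Users/johnsmith/fileB.txt')
       ORDER BY a.linenumber;"""
).fetchall()

print(missing)  # [(2, 'banana')]
```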