Read file contents

To query file contents select from the fileslines table. This has one row for each line of text in every file scanned. It has the same fields as the files table, but also a data field which contains the text, and a linenumber field.

fileslines is a virtual table, it reads contents of files at query run time and does not replicate file contents in the Crab database. File details in the WHERE clause determine which files are opened and read.

Set the right encoding

Crab's default settings are configured for UTF-8 and ASCII text files, any non UTF-8 character will cause your query to skip the rest of the file, so as to exclude contents of binary files from query results.

To search files that have occasional exotic characters change the %encoding setting from utf8:skipfile to utf8:ignore or utf8:replace, e.g.

%encoding utf8:ignore

This will include contents of binary files in queries against fileslines, unless you exclude them with WHERE clause criteria.

To search files that are not ASCII or UTF-8, specify the appropriate encoding. E.g. for Korean

E.g. Copy every file listed in 'files.txt' to directory 'PublishMe'

'files.txt' must contain a valid file name or full path on each line, because the query is executing 'copy' on each one, and will halt if an error is returned.

If 'files.txt' lines include whitespace, remove it using the trim function: in the query change data to trim(data)

If 'files.txt' contains file names without path prefixes, they will be assumed to be in the directory where you invoked Crab.

Crab needs to have scanned the file 'files.txt' in order for it to be known to fileslines, but does not need to have scanned the files listed there, nor the target directory 'PublishMe'.

The '/y' option means that copied files will overwrite anything with the same name. Use pathexists() if you don't want to overwrite.

Write to files

writeln() and write()

Use Crab functions writeln() and write() to write to one or more files from a query.

SELECT writeln(FileName, DataToWrite) FROM ...

or

SELECT write(FileName, DataToWrite) FROM ...

Specify the file to write to, FileName, with a string giving the file name or full path. This can be calculated at query run time, possibly different for each result row. The string DataToWrite will be appended to the file.

Use the writeln() function for creating line-oriented files, it writes a newline at the end of each string whereas write() does not write any delimiter.

Like exec(), write() and writeln() are only allowed in the SELECT clause at the top level of a query. This is because they are delayed evaluation functions: The rest of the query must have finished evalution before strings can be passed to them.

Use the %encoding command to specify the behaviour you require when byte sequences are read that are not valid characters in your chosen encoding, e.g.

%encoding utf8:ignore

Valid error handlers are:ignore - the invalid character is ignoredreplace - the invalid character is replaced by a placeholder symbolskipfile - the invalid character is treated as end of file marker

Tip: Suppress writeln() or write() echo to screen

By default write() and writeln() output every write command executed to your Command Line window. If you are running a write function over hundreds of thousands or millions of rows this will be slow, and the screen will be a mess.

To discard the output, use the following command before running the query:

%output NUL

Any error messages will still go to the Command Line window, as will subsequent CRAB> prompts

To switch output back to the screen do this:

%output stdout

If you want to keep a log of the output rather than discard it, use a filename in place of NUL, e.g.

%output 'C:\somepath\MyLogFile.txt'

Tip: Memory constraints

The total number of rows that can be written to file by a query is limited by memory constraints.

On macOS we've tested with hundreds of millions of rows without any problems. On Windows the limit is currently much lower, a few million rows, because on that platform Crab is currently a 32bit app.

Tip: Alternative output modes

When you query file contents you might prefer a different output format for query results.

Each field on a line of its own

%mode line

One column per field truncated to 10 chars, use the %width command to change number of chars.

%mode column

Comma delimited output fields

%mode list

To return to the default:

Dictionary output mode

%mode dict

column and list modes have a header row that can be toggled on or off with the %header command