formatYou can supply a second argument, a format specifier, which dictates how the byte is printed when it's not a printable ASCII character. This follows the python integer formatting conventions. By default it is '\x%02X' which prints \xhh. But for example you can use '.'

chopstr(s, n)Chops n characters off the beginning of string s, returning the part of s that remains. If n is negative, chops characters from the end of s.

E.g. to trim the last character off every line of a CSV file to remove trailing commas

SELECT writeln('trim.csv',chopstr(trim(data),-1)) FROM fileslines WHERE name = 'csvc.csv';

codestr(s [, format])Returns the string or blob s as a sequence of unicode code points.

Where the code point is a printable ASCII character the character is returned, otherwise \uhhhh is returned, where hhhh are four hexadecimal digits specifying the number of the unicode character.

Use this to discover which code points make up a glyph.

formatYou can supply a second argument, a format specifier, which dictates how the code is printed when it's not a printable ascii character. This follows the python integer formatting conventions. By default it is "\u%04X" which prints \uhhhh. But for example you can use '.' (see Crab's bytestr()function entry for an example)

Example: Select files with non ASCII characters in the nameTo select filenames with non ASCII characters you can type the characters directly

SELECT fullpath FROM files WHERE name LIKE '%café%'or you can use Crab's u() function to turn unicode escape strings into text

SELECT fullpath FROM files WHERE name LIKE u('%cafe\u0301%');However there is often more than one way to enter the same glyph, e.g. 'é' can be 'e\u0301' or '\u00e9'. The one you type at the keyboard may not match the unicode sequence in the filename, even if they look the same. To see the code sequence inside a string, use the codestr() function.

SELECT codestr(name) FROM files WHERE name LIKE '%caf%';To match 'é' whichever sequence has been used, Crab provides the normstr() function, which normalizes a code sequence to the Windows filename standard

CRAB> SELECT name, codestr(name) FROM files WHERE name LIKE normstr('%café%');{name: café, codestr(name): cafe\u0301}CRAB> SELECT name FROM files WHERE name LIKE u('%caf\u00e9%');CRAB> SELECT name FROM files WHERE name LIKE normstr(u('%caf\u00e9%'));{name: café}

eval(cmd [,arg1] [,arg2] ...)Runs and returns the output of operating system commands and other programs.

Use eval when you want to test the output of some command in a query's WHERE clause, or to return command output as part of a query's result set.

eval may execute multiple times for each row, and even for rows that don’t appear in the final results, depending on the whim of the query optimizer, so eval should only be used with commands that don't change filesystem data.

Parameters

The first argument should be the command name and later ones the command's arguments. Arguments are automatically escaped, so you don't need to worry about paths that contain spaces.

ExampleE.g. Use eval with the 'dir' command's '/q' option to find files with a particular owner

SELECT fullpath, eval('dir', '/q', fullpath) owner FROM files WHERE fullpath LIKE 'C:\Users\johnsmith\%' AND pathexists(fullpath) AND owner LIKE '%peterdalloz%';

Tips

eval returns a string.

Trailing newlines are stripped, other newlines appear as '\x0A'

See Crab's exec functionTo execute a command once for each query result row

exec(cmd [,arg1] [,arg2] ...)Executes operating system commands and other programs using data returned in query results.

Use exec when you want to run a command once for each query result row.

Typically used to run commands on files, using fullpath as one of the arguments.

ParametersThe first argument should be the command name and later ones the command's arguments. Arguments are automatically escaped, so you don't need to worry about paths that contain spaces.

Examples

E.g. Use exec with the COPY command to back up any file under the documents directory that was modified today

SELECT exec('copy', fullpath, 'C:\Users\johnsmith\backups\'||name||'.'||fileid) FROM files WHERE fullpath LIKE 'C:\Users\johnsmith\documents\%' AND modified >= date('now','start of day')

Explanation: The exec function is evaluated for every file meeting the criteria of the WHERE clause - every recently modified file. For each of these files the copy command is executed, with the first argument (copy from) being the fullpath of the file, and the second argument (copy to) being a string concatenated from the target directory plus the filename plus a dot plus its unique fileid number. We use the fileid number to avoid name collisions in case any files have the same name.

E.g. Copy files that contain specific text

SELECT exec('copy', fullpath, 'C:\Users\petergibbons\tpsreports\'||name||'-'||fileid) FROM fileslines WHERE parentpath = 'C:\Users\petergibbons\LR427\' AND data LIKE '%cents%' GROUP BY fullpath;

Explanation: Here the query is against fileslines, so the fullpath is returned for every matching line in each file. But we only want to copy each file once: GROUP BY fullpath achieves this.

Restrictions

exec is a delayed evaluation function. It sends output to the screen, but does not return results you can use in other parts of your query because the rest of the query has already finished evaluation when it executes.

For this reason it is only supported in the SELECT clause of a top level query, you can't use exec in subqueries, WHERE clauses, ORDER BY clauses, inside other functions and so on.

If you want to process the output of operating system commands, for example to test in your query's WHERE clause, see Crab's eval function.

Tips

Safety first: before using exec to change data it's good practice to try your query with the 'echo' command so you can check that the commands that will be executed are what you expect, e.g.

Explanation: The pattern "(?u)(\s|'|^)+(\w+)" has two groups: because the first bracket starts with ? it isn't a capturing group. The number 2 tells groupn to return matches for the second group. The (?u) element tells the regex engine to include unicode letters in \w matches

sepList separator. If stringList is a path use '\'. If stringList is a list of regular expression match results use '|', unless the default separator has been changed with the %matchsep commandMulti character separators are allowed.Blob separators are supported, such as X'09' to represent tab.

See Crab's other list-processing functions: itemcount, itemn, sublist

itemcount(stringList, sep)Returns length of stringList, counted as number of separators + 1.

Use when processing a string value as a list, e.g. to treat a path as a '\'-delimited list, to process a list of regular expression match results, or a row of csv data from fileslines.

Blank list elements are included in the count.

ParametersstringListThe string to be searched, interpreted as a list

sepList separator. If stringList is a path use '\'. If stringList is a list of regular expression match results use '|', unless the default separator has been changed with the %matchsep commandMulti character separators are allowed.Blob separators are supported, such as X'09' to represent tab.

ParametersstringListThe string to be sliced, interpreted as a list. iIndex of the required element. The first item of stringList has position 1. If startPosition is negative, count backwards from the end of the list, e.g. last list element has position -1.If you select a list element that is beyond the end of the list, NULL is returnedsepList separator. If stringList is a path use '\'. If stringList is a list of regular expression match results use '|', unless the default separator has been changed with the %matchsep commandMulti character separators are allowed.Blob separators are supported, such as X'09' to represent tab.Tips

itemn(X,i,sep) is equivalent to sublist(X,i,1,sep)

Use itemcount(X,sep) to find list length

Use Crab's iterator table to process multiple list items from a row of data

Paths that end in a directory have an empty string last element. E.g. 'C:\a\b\c\'is a list of length 5 where item 1 = 'C:', item 2 = 'a' and item 5 = ''

[] matches any character inside the brackets, e.g. [A-Z0-9] matches uppercase A thru Z and 0 thru 9 If ^ is the first character inside the bracket it makes the bracket match anything except the characters listed So e.g. [^A-Z] matches anything except upper case A thru Z

See Crab's other regex matching functions: groupn, matchn, matchcount

See How to RegexA short tutorial on regular expressions with Crab

matchcount(string, pattern) or matchcount(groupn(string, pattern, j))Returns how many matches were found when matching regex pattern to string, or were returned by groupn

Pattern matching isn't anchored, so can be matched anywhere in string, not just at the beginning.

By default regex matching is case sensitive.

Often used with the iterator table to return all the matches on separate rows

SELECT matchn(data,"\w+\'*\w*",i) wrd, count(*) FROM fileslines JOIN iterator ON i <= matchcount(data,"\w+'*\w*") WHERE fullpath LIKE '%sqlsys\qqtest\76.txt'group by wrd ORDER BY count(*) DESC LIMIT 5;

CRAB> select matchn(data,"\w+'*\w*",i) word, count(*) from fileslines join iterator on i <= matchcount(data,"\w+'*\w*") where fullpath like '%sqlsys\qqtest\76.txt' group by word order by count(*) desc limit 5;and 6209 the 4736 I 3277 a 3177 to 3010

See How to RegexA short tutorial on regular expressions with CrabSee Crab's other regex matching functions: groupn, match, matchn

matchn(string, pattern, i) or matchn(groupn(string, pattern, j), i)Returns the i'th match of regex pattern in string, or the i'th element from a list of groupn match results

Pattern matching isn't anchored, so can be matched anywhere in string, not just at the beginning.

By default regex matching is case sensitive.

Examples

E.g. Count word frequencies

SELECT matchn(data,"\w+'*\w*",i) word, count(*) FROM fileslines JOIN iterator ON i <= matchcount(data,"\w+'*\w*") WHERE fullpath LIKE '%sqlsys\qqtest\76.txt' GROUP BY word ORDER BY count(*) DESC LIMIT 50;

See How to RegexA short tutorial on regular expressions with CrabSee Crab's other regex matching functions: groupn, match, matchcount

metadata(path, metaDataFieldName)Returns the value of some metadata attribute for the filesystem object

Currently macOS only

normstr(string[,norm])Converts string to normal form, by expressing each unicode glyph as code points in a standard order.

Use it to match non ASCII characters in filenames or file contents

Example

CRAB> SELECT name, codestr(name) FROM files WHERE name LIKE '%café%';CRAB> SELECT name, codestr(name) FROM files WHERE name LIKE normstr('%café%');{name: Café de Flore, codestr(name): Cafe\u0301 de Flore}CRAB> select codestr('café');{codestr('café'): caf\u00E9}

norm

If you want to specify the normalization, normstr can take a second argument . This can be 'NFC', 'NFD', 'NFKC' or 'NFKD'.On Windows the normalization of filenames typed at the keyboard is usually 'NFC', on Mac the normalization of filenames is always 'NFD'. On Unix there is no standard. The default value of norm is 'NFD'

See Crab's codestr function To look at the code points which make up a string

pathexists(path)Tests whether a path - file or directory - exists at query run time. Returns 1 (True) if the path exists, 0 (False) if it does not.

Use it to avoid errors when running commands or fileslines queries against files that might have been moved or deleted since the scan.

Use it to check that you're not going to clobber anything before moving or renaming a file.

Examples

-- avoid error due to sha1 function on non existent file SELECT f1.bytes/1e9 as GB, f1.fullpath, f2.fullpath FROM files f1 JOIN files f2 ON f1.bytes>1e8 AND f1.fileid>f2.fileid AND f1.name=f2.name AND f1.bytes=f2.bytes AND sha1(f1.fullpath) = sha1(f2.fullpath) AND pathexists(f1.fullpath) AND pathexists(f2.fullpath);

-- avoid error due to query against fileslines for file that no longer exists SELECT fullpath, count(*) FROM fileslines WHERE parentpath LIKE 'C:\Users\johnsmith\tempfiles\%' AND pathexists(fullpath) GROUP BY fullpath;

-- avoid name collision when renaming dictories by removing spaces SELECT exec('rename', fullpath, replace(name,' ','')) FROM files WHERE fullpath LIKE 'C:\Users\OneDirection\%' AND type = 'd' AND NOT pathexists(parentpath||replace(name,' ',''));

short(s, n)Restricts restricts string s to n characters by removing characters from the middle of the string, and replacing them with '~'

Use it to shorten wide strings in column output mode, when the most interesting information is at the start and the end of the string, e.g. paths and filenames

stringListThe string to be sliced, interpreted as a list. startPositionStarting position for the sublist. This is the first element of the returned list unless len is negative.The first item of stringList has position 1. If startPosition is negative, count backwards from the end of the list, e.g. last list element has position -1.lenIf optional argument len is given, then sublist returned is len items long. If len is positive, returns len items from stringList, starting from and including startPositionIf len is negative, returns the abs(len) items from stringList, preceeding and not including startPositionIf len is omitted then sublist(stringList, startPosition, sep) returns all items through the end of stringList beginning with startPosition. sepList separator. If stringList is a path use '\'. If stringList is a list of regular expression match results use '|', unless the default separator has been changed with the %matchsep commandMulti character separators are allowed.Blob separators are supported, such as X'09' to represent tab.Tips

The convenience function itemn(X,i,sep) is equivalent to sublist(X,i,1,sep)

Use itemcount(X,sep) to find list length

Use Crab's iterator table to process multiple list items from a row of data

Paths that end in a directory have an empty string last element. E.g. 'C:\a\b\c\'is a list of length 5 where item 1 = 'C:', item 2 = 'a' and item 5 = ''

See Crab's other list-processing functions: inlist, itemcount, itemn

See Crab's above and below functionsAlternative function for truncating and trimming paths

To make byte strings use the SQLite blob prefix X with a string of hex digits

CRAB> select cast(X'c3a1' as text);á

Tips

Crab displays newline characters in query results as \x0A, so as not to mess up the row-oriented output.

See also Crab's bytestr and codestr functions, and the %encoding command

write(filename, [,data1] [,data2] ...)Writes query results to file

Data is appended to the file, one write operation for each result row. If the file doesn't exist, it is created. If you want a newline at the end of each row of data, use the writeln function instead.

Data is writtenusing the encoding specified with the %encoding command. The default is utf8.

filenameThe first argument is the name or full path of the file to write to

data1, ...The following arguments, are the data you want written. These can be strings, numbers or blobs. Numbers and blobs will be converted to strings,. If you use binary encoding blob data will be written unchanged.

ExamplesE.g. To copy a binary file, 40 bytes at a time

%encoding binary:40

SELECT write('C:\Users\johnsmith\myproject\cp001.tif',data) FROM fileslines WHERE fullpath LIKE '%\myproject\001.tif' "

Restrictions

write is a delayed evaluation function like exec. It sends output to the screen, but does not return results you can use in other parts of your query because the rest of the query has already finished evaluation when it executes.

For this reason it is only supported in the SELECT clause of a top level query, not in subqueries, WHERE clauses, ORDER BY clauses, inside other functions and so on.

Tips

Remember that any changes you make to the filesystem - files you delete or add - won't be reflected in the files table until you scan the changed directories.

See Crab's writeln functionTo write data one line at a time

See %encoding commandTo write binary files, or different character sets

Data is appended to the file, one line for each result row. If the file doesn't exist, it is created. If you don't want a newline at the end of each row of data, use the write function instead.

Data is writtenusing the encoding specified with the %encoding command. The default is utf8.

filenameThe first argument is the name or full path of the file to write to

data1, ...The following arguments, are the data you want written. These can be strings, numbers or blobs. Numbers and blobs will be converted to strings,. If you use binary encoding blob data will be written unchanged.

ExamplesE.g. To search and replace text in a set of files: This example reads data from every .c file in directory \HL2\, and writes the substituted text to a file that is given the suffix .003

SELECT writeln(fullpath||'.003', replace(data,'Kanal','Canal')) FROM fileslines WHERE extension = '.c' AND parentpath LIKE '%\HL2\' ORDER BY fullpath, linenumber;

writeln is a delayed evaluation function like exec. It sends output to the screen, but does not return results you can use in other parts of your query because the rest of your query has already finished evaluation when it executes.

For this reason it is only supported in the SELECT clause of a top level query, not in subqueries, WHERE clauses, ORDER BY clauses, inside other functions and so on.

Tips

Remember that any changes you make to the filesystem - files you delete or add - won't be reflected in the files table until you scan the changed directories.

See Crab's write functionTo write data without the newlines

See %encoding commandTo write binary files, or different character sets