dat is essentially git for data, so the data repository needs to be initialised before it can be used:

dat init

Next, start the dat server to listen for incoming connections:

dat listen

Data can be piped into dat as line-delimited JSON (i.e. one JSON object per line - the same idea as CSV rows but with support for nested data). Happily, this is the format in which Twitter’s streaming API delivers tweets, so its output can be fed straight in.
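For example, two records in line-delimited JSON look like this (the field values here are invented for illustration, but the field names match Twitter's tweet payloads):

    {"id_str":"1","text":"First tweet","user":{"screen_name":"alice"}}
    {"id_str":"2","text":"Second tweet","user":{"screen_name":"bob"}}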

I used Phirehose, a PHP client for Twitter’s streaming API, as I was interested in seeing how it handles the connection (a streaming client needs to monitor the connection and reconnect if no data is received within a certain time frame). There may be a command-line client that makes this even easier, but I haven’t found one yet…

The streaming API uses OAuth 1.0a for authentication, so you have to register a Twitter application to get an OAuth consumer key and secret, then generate an access token and secret for your account. Add these to a small PHP script that initialises Phirehose, starts listening for filtered tweets and writes each tweet to STDOUT as it arrives:
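Something along these lines works - a minimal sketch based on Phirehose’s filter-track example, where the include paths, the placeholder credentials, the class name and the tracked keyword (‘php’) are assumptions you’d replace with your own:

    <?php
    // stream.php - minimal sketch: stream filtered tweets to STDOUT as
    // line-delimited JSON. Adjust the require paths to wherever Phirehose lives.
    require_once 'lib/Phirehose.php';
    require_once 'lib/OauthPhirehose.php';

    // The consumer key/secret come from your registered Twitter application;
    // the access token/secret are generated for your own account.
    define('TWITTER_CONSUMER_KEY', 'your-consumer-key');
    define('TWITTER_CONSUMER_SECRET', 'your-consumer-secret');
    define('OAUTH_TOKEN', 'your-access-token');
    define('OAUTH_SECRET', 'your-access-token-secret');

    class TweetStreamer extends OauthPhirehose
    {
        // Phirehose calls this once per message with the raw JSON string
        public function enqueueStatus($status)
        {
            $data = json_decode($status, true);
            // Skip keep-alives and deletion notices; only pass real tweets through
            if (is_array($data) && isset($data['id_str'])) {
                fwrite(STDOUT, trim($status) . "\n");
            }
        }
    }

    // Connect to the filter endpoint and track a keyword ('php' is just an example)
    $stream = new TweetStreamer(OAUTH_TOKEN, OAUTH_SECRET, Phirehose::METHOD_FILTER);
    $stream->setTrack(array('php'));
    $stream->consume();

Because each tweet is written out as a single line of JSON, the script’s output is already in the shape dat expects.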

Run the script to connect to the streaming API and start importing data: