I have csv files and would like to treat them as tables of a database. Of course I can transform these files into tables. But it would be nice to have a possibility to do it directly in the command line (in a way like grep, head, tail, sort and awk are used).

For example I would like to select a particular column of a file (given by its name), or select rows where certain columns have certain values, or order by one of the columns.

2 Answers
2

Since you tagged this with python and ipython, I assume you'd like to see what it would be like to do this from an ipython prompt. So, here's a trivial CSV file people.csv:

first,last,age
John,Smith,20
Jane,Smith,19
Frank,Jones,30

Now, here's an ipython session using it:

In [1]: import csv
In [2]: from operator import *
In [3]: with open('foo.csv') as f: people = list(csv.DictReader(f))
In [4]: [p['age'] for p in sorted(people, key=itemgetter('first')) if p['last'] == 'Smith']
Out[4]: ['19', '20']

It takes one line to read a CSV file into memory as a list of dicts.

Given that, you can run list comprehensions on it.

So, the p['age'] selects a column by name; the sorted(people, itemgetter('first')) orders by another column, and the if p['last'] == 'Smith' is a where clause.

You can even do group by clauses with a little help from itertools, although here you'll definitely want to define helper functions both for groupby and for the aggregates to apply to groups, and I think it still might be pushing the limits a bit…

There are no "indexes". With a database, selecting the 20 Smiths out of 100000 people only needs log(100000)+20 steps; with a list, it needs 100000 steps.

You have to order the operations appropriately. When you want to order, then filter rows, then filter columns (as in the example above), everything is easy; if you want a different order (especially if you want to order or filter by columns you aren't selecting), you may need to write more complex comprehensions, while with a database there's no problem at all.

Keep in mind that it's only about 5 lines of code to convert a CSV file to a sqlite table. So, I think you'd be better off with a script that just runs your 5-line Python program and dumps you into a sqlite command line.