Using peewee to explore CSV files

I recently heard a talk from a coworker in which he discussed automatically
converting CSV data for use with a SQLite database. I thought this would be a
great thing to add to peewee, especially since I've found myself working with
CSV on several occasions lately and battling with it in a spreadsheet. It would
be much easier to load it into a database and then query it using a tool I'm
familiar with.

Which brings me to playhouse.csv_loader, a new module I've added to the
playhouse package of extras. It's hopefully really easy to use. Here is an
example of how you might use it:

>>> from playhouse.csv_loader import *
>>> db = SqliteDatabase(':memory:')  # Create an in-memory sqlite database

# Load the CSV file into the in-memory database and return a Model suitable
# for querying the data.
>>> ZipToTZ = load_csv(db, 'zipcode_to_timezone.csv')

# Get the timezone for a zipcode.
>>> ZipToTZ.get(ZipToTZ.zip == 66047).timezone
'US/Central'

# Get all the zipcodes for my town.
>>> [row.zip for row in ZipToTZ.select().where(
...     (ZipToTZ.city == 'Lawrence') & (ZipToTZ.state == 'KS'))]
[66044, 66045, 66046, 66047, 66049]
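For a rough idea of what a loader like this has to do, here is a minimal sketch that uses only the standard library's csv and sqlite3 modules. It is not how playhouse.csv_loader is implemented: the function name load_csv_sketch, the sample rows, and the decision to store every column as TEXT are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def load_csv_sketch(conn, table, fileobj):
    """Hypothetical helper: create `table` from the CSV header, insert all rows."""
    reader = csv.reader(fileobj)
    header = next(reader)
    # Quote identifiers so unusual column names don't break the SQL.
    columns = ', '.join('"%s" TEXT' % name.replace('"', '""') for name in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, columns))
    placeholders = ', '.join('?' for _ in header)
    conn.executemany(
        'INSERT INTO "%s" VALUES (%s)' % (table, placeholders), reader)
    return header


# Made-up sample data standing in for a real CSV file.
sample = io.StringIO(
    'zip,city,state,timezone\n'
    '66044,Lawrence,KS,US/Central\n'
    '66047,Lawrence,KS,US/Central\n'
    '10001,New York,NY,US/Eastern\n')

conn = sqlite3.connect(':memory:')
fields = load_csv_sketch(conn, 'ziptotz', sample)

# Every value is stored as TEXT here, so the zipcodes come back as strings.
lawrence_zips = [row[0] for row in conn.execute(
    'SELECT zip FROM ziptotz WHERE city = ? AND state = ? ORDER BY zip',
    ('Lawrence', 'KS'))]
```

The sketch skips the part that makes a real loader useful, namely guessing a sensible type for each column, which is why the values come back as strings rather than integers.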

Let's load up all the files and run some queries! Together the files come to
about 1.6GB of CSV data (~2.3M rows), so loading takes a few minutes. I'm
passing in the parameter sample_size=0 so that the introspector doesn't try to
"introspect" all 1000 files. I'm also specifying db_table='reddit' so that all
the CSV files are read into the same table (otherwise they would be read into
tables named after their filenames). Immediately after loading the data, I'm
going to print out all the field names so we know what we're looking at:
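The same idea (many CSV files, one table, no type sampling, then a look at the field names) can be sketched with only the standard library. This is an illustration under assumptions, not the csv_loader code path: the two tiny files and their id/title columns are invented stand-ins rather than the real dataset, and storing every column as TEXT only loosely corresponds to skipping introspection.

```python
import csv
import os
import sqlite3
import tempfile

# Write two tiny stand-in CSV files; the columns are invented for this sketch.
tmpdir = tempfile.mkdtemp()
for filename, rows in [('a.csv', [('1', 'first')]),
                       ('b.csv', [('2', 'second'), ('3', 'third')])]:
    with open(os.path.join(tmpdir, filename), 'w', newline='') as fh:
        writer = csv.writer(fh)
        writer.writerow(['id', 'title'])
        writer.writerows(rows)

conn = sqlite3.connect(':memory:')
table_created = False
for filename in sorted(os.listdir(tmpdir)):
    with open(os.path.join(tmpdir, filename), newline='') as fh:
        reader = csv.reader(fh)
        header = next(reader)
        if not table_created:
            # No type sampling: every column is plain TEXT.
            columns = ', '.join('"%s" TEXT' % name for name in header)
            conn.execute('CREATE TABLE reddit (%s)' % columns)
            table_created = True
        placeholders = ', '.join('?' for _ in header)
        conn.executemany(
            'INSERT INTO reddit VALUES (%s)' % placeholders, reader)

# PRAGMA table_info returns one row per column; index 1 is the column name.
field_names = [row[1] for row in conn.execute('PRAGMA table_info(reddit)')]
row_count = conn.execute('SELECT COUNT(*) FROM reddit').fetchone()[0]
print(field_names, row_count)
```

Sharing one table only works if every file has the same header, which this sketch assumes rather than checks.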

Happy hacking, please feel free to leave any questions or comments below!
If you're interested in learning more about peewee check the docs: http://docs.peewee-orm.com/

Lastly, if you're really interested in exploring data with Python, I would highly
suggest looking into the pandas library. I'd also
suggest the IPython notebook, which is a web
frontend for IPython (the improved Python shell) that has integrated support
for matplotlib and markdown-formatted text.

Thanks for reading! As a reward for making it this far, here's a picture of my cat Huey: