This article introduces a Python script I’ve published on PyPI: csv-to-sqlite. It’s a command line tool that takes CSV files and dumps their contents into a SQLite database. There are some options you can set, but the basic gist of it is this: for each file, the script creates a database table with the same name, tries to guess the data type, then copies all the data into the database. The source code is on GitHub under the MIT license.

Preamble: Why bother?

I’ve been looking for a database of Pokémon data for a long time. I’ve even considered writing a scraper that would download them from various webs. Then, quite recently, I’ve stumbled upon the wonderful Pokéapi that exposes just such data via a RESTful API. The data seems quite complete, way better than I would’ve gotten by scraping. And it’s open source. And it even contained all the data in many CSV files.

So naturally I wanted to play with them, preferably by using LINQ, my query language of choice, using LINQPad. In case you don’t know LINQPad, it’s a great tool that lets you play with C# and LINQ snippets, and lets you connect to many different data sources. (In case you’re interested, check out their challenge.)

But I can’t very well connect LINQPad to a folder of CSVs as a data source, so I’ve decided to write something that’ll dump the data in a SQLite database.

How to?

A little warning first: the script assumes the first row of the file contains column names. They are then used as names for the DB columns.

The usage is simple. First, install via pip:

pip install csv-to-sqlite

Have a list of files you want to dump?

csv-to-sqlite -f abilities.csv -f egg_groups.csv

If you don’t specify otherwise, the output file’s name will be current-folder-name.db.

Want to dump a lot of files you don’t want to specify individually? The script accepts file paths from standard input. So in PowerShell:

ls *.csv | % FullName | csv-to-sqlite -o outputDatabase.sqlite -D

This takes all the CSV files in the current folder, dumping them into a database file called outputDatabase.sqlite, dropping any existing tables and re-creating them.

By default, the script tries to guess column’s data types. To turn this behaviour off (and make all the columns of type string), use the --no-types parameter.

You can of course print out help by running

csv-to-sqlite --help

Other notes

Implemented in Python 3. The only external dependency is Click. It’s the first time I’ve used it: it might be good, but its name is nowhere as great as my previous (now deprecated) favourite, aaargh.

This was the first time I’ve published on PyPI and I’m honestly appalled by how horrible the user experience is. Getting the package there was probably the longest, and definitely the most annoying part of this project. There are many confusing tools, most of them don’t really work, and even the one that finally did left things to be desired. (I had to upload some parts of the project description by hand, since the tool/web simply ignored them, for example.) The website is truly horrendous, too.