You don't have to keep writing the same old database access code
over and over again -- instead, it's simple to roll your own database
wrapper and use Python syntax to retrieve, search, and update data.
Implement the classes and methods shown here to speed up your Python
database development process.

About

Have you ever found it tedious to mix SQL and other languages, or been
reluctant to write the same four lines of code again to do a simple
database query? This article can help you eliminate the drudgery
involved in database access, and make your programming time more
efficient, by wrapping simple transactions in friendly native Python
syntax. By making the database emulate regular Python objects, you can
remove a source of friction and frustration from your development
process. Your time spent programming will be more efficient and
productive when you can focus on the task at hand, without being
constantly sidetracked by unimportant details like where the cursor
object is, or whether you need to escape-protect the data in the next
query.

Another benefit of using native syntax is better portability. This
approach makes it easy for you to change databases without having to
rewrite any of your application code. Simply modify a few lines in the
database wrapper, and you can support an entirely new database. Though
this article focuses on MySQL, the code should work easily with
PostgreSQL or even SQLite with only minor modifications.

The approach described here is useful any time you want to write a
quick program to access a database. You may find it convenient if you
want to generate a handful of new reports quickly, or perform basic
modifications to your data.

Selecting Rows

To begin, you'll want to convert the most frequently-used operations.
The most basic operation is reading or selecting data, so I will focus
on that first.

Applications frequently need to do things such as "retrieve record
87", or "print results 30-40." This calls for an array or list
structure, which is easy to do in Python. Normally, to show record 87,
you would need to execute something such as the following:

Four lines is needlessly long for what is essentially an array lookup.
It barely begins to utilize the full power of SQL, yet this type of
thing is quite common. So, what if you could use simpler syntax to do
the same thing? The following would be much easier:

display(books[87])

That sort of easy syntax is supported in Python, using its special
class methods. Any object can emulate an array by implementing special
functions. In this case, you want to create a Table class with a
method called __getitem__().

Using this class, you can easily access a database table as if it were
a native Python
list. It merely provides read-only access to single rows, but it's
a good start. The following short program demonstrates its use:

Condensing the database lookup into a single short expression greatly
increases its expressive power. When you must write code to keep track
of a cursor, execute queries, and fetch rows from the result, the
operation just doesn't fit into as many places. Using a word or two
for the same concept allows you to incorporate such condensed
operations into other ideas more clearly and easily.

Determining Data Length

Something important was missing from the code mentioned so far. Many
programs want to know how much data needs to be processed. Most
search programs return this with the results, such as "results 80 to
100 of 487." That 487 is important to many search engine users.
Luckily, the length operator in Python is easy to
implement. Simply add the following to the Table class:

Iterating Through Rows

The loop in the previous example was better, but it could have been
simpler still. Python supports iterators, which would have been
a little easier. It should be possible to process every item in a
table with syntax such as:

for book in books:
print book

This is simple to do, but first it would be a good idea to refactor a
little. The Table class will be doing a lot of database queries, and
pumping all those queries through one function will help make
debugging easier. First, add a self.debug = 1 line to the
Table's __init__() function, and then add a method for
queries:

To add iteration, simply add two methods to the Table class:
__iter__() and next(). The class does not
need to be derived from a built-in iterator type, it merely needs to
implement the same policies. This "Duck
Typing" technique is part of the "policy over mechanism" theme
common in Python. It provides most of the same benefits as more strict
languages, without losing flexibility. Adding the following code turns
the Table class into an iterable object:

By adding this code, the Table class now acts like an iterator,
allowing you to use the easy syntax at the beginning of this section.

Putting the concepts of an idea, such as database access, into simpler
terms enables still more and better innovations. For example, a car
engine is a very complicated device with countless parts and design
considerations. It takes a lot of work to fully describe an engine,
and to attach it to wheels to make a car. But once you fully
understand the concept of an engine, you no longer need to describe it
in detail. It becomes sufficient to say "an engine with wheels." Then,
after building such a car, you can easily create new concepts by
reusing the same old engine ideas. Perhaps after the car, you decide
to attach a blade and build a lawn mower. And then, attach wings and a
propeller to make a plane. Creating these new things would be vastly
more difficult if you had to re-create the concept of an engine each
time. Or, in this case, if you had to keep writing the same code over
and over, each time you needed to perform an operation you've done
before.

Using Data "Slices"

Databases tend to be large, and accessing an entire data set can be
very slow. If you have 50,000 records in a table, and only need twenty
of them right now, grabbing the whole set would be very inefficient.
However, taking only one record at a time is also inefficient. To
solve this, use slices. Python provides slicing for list-like objects,
which can select a range of records quickly and easily.

The previous examples executed one database query for each row accessed
in the table. Pulling tens or hundreds of records at a time would be
more efficient though. The following demonstrates the syntax for taking
a slice of a list, or a list-like object:

# create a list of all items to go on the third page
page3 = books[20:30]
# or, pull ten records into a list, and loop over them
for book in books[20:30]:
print book

To implement slice support in the Table class, modify the
__getitem__() method. Both individual lookups and slice
lookups are implemented with this method, and determining which type
of result to return is simply a matter of checking whether the
parameter is an integer or a slice. The built-inisinstance() function, along with the types module
can help you decide what to do.

The Slice object contains three useful attributes: start,
stop, and step. The step part
will probably not be useful, because SQL databases don't generally
support step sizes other than one. However, start and
stop are useful, and translate easily into arguments to
MySQL's LIMIT parameter. The start attribute
is used directly as the first parameter. Subtracting
start from stop produces the second
parameter -- length.

Different databases use different syntax for selecting data slices,
which can become confusing if you work with more than one. It can be
difficult to remember if the LIMIT clause takes
first, last as arguments, or if it uses first,
length, or if it expects something else. This tends to cause
silent errors in programs because the syntax is still correct --
accidentally getting too many records, or too few, or even none at
all. Hopefully, using the easy slice syntax will help you avoid
that type of error.

Searching and Sorting

So far, this article has ignored two of the most important aspects of
databases. Searching and sorting, or finding and arranging data, let you
narrow and organize results automatically. If you were writing an
address book application, the user would often ask questions such as
"Who do I know named 'Bob'?" and expect to get a sorted list. The
following code is a simple and feasible way to implement such a query:

Python does not provide operators for searching or sorting, so a class
method must suffice. These methods are very simple, because SQL
implements them as parameters to SELECT instead of
allocating separate commands for them. The only work the search and
sort methods need to do is save data for later use:

Making these functions actually do something requires modifying the
other class methods. The self._search and
self._sort attributes must be defined, and inserted into
queries where appropriate. The following methods should be modified:

Table.__init__(): Set self._search = "" and self._sort = "" to avoid accessing them before they are defined.

Now, at this point, you may have noticed that some SQL slipped into
the interface for the Table class. The parameters for
Table.search() and Table.sort() are exactly
what you would write in a WHERE or ORDER BY
clause in a normal query. The table's name is also placed directly
into the query. This affords some interesting opportunities, or
tricks.

Some simple tricks include sorting by several columns, or searching with
more than one condition. A more complicated trick is to turn the Table
into something like a view, by connecting it to more than one database
table. The following code demonstrates all three techniques:

The preceding code tells Python to get rows from two tables, book
and series. The search method then adds two additional
instructionsójoin the tables by using the series ID, and ignore
anything from 1995 or earlier. Then sort the results first by the name
of the series, and then by the year each book was published. This has
the effect of grouping related items together and showing them in
chronological order. The code then chooses three interesting columns
from the data, and prints the information in a human-readable form.

Inserting Rows

Reading data is useful, especially if you need to build reports or
otherwise sort through data. But you probably will want to write to the
database too. The simplest way to write is to insert new rows, which you
can achieve with the following syntax:

contacts.insert('', "Stanley", "Spudowski", "1965-01-01")

This is similar to a direct SQL query, but even simpler. And, because
of the way the Python database
spec works, you won't even have to worry about escaping data to
make it safe. The spec requires automatic escaping for anything in the
second parameter of the cursor's execute() function. If
you ever plan to insert data supplied by a user, it is a good idea to
use that protection, because it will prevent possible exploits in your
program. Doing this requires a minor modification to the
Table._query() method:

With that modification in place, you can now write an
insert function. All it needs to do is pass the data to
your database in a format compatible with the DB API spec. However,
because it needs to support tables of any size and shape, the format
string should be created dynamically. The following creates a format
string such as "%s,%s,%s,%s" according to the length of
the row it is given, and then executes the query.

As an additional usability feature, it avoids the need to pass a tuple
object, by collecting all the function's arguments into a tuple
automatically. The *row syntax does this. Its effect,
basically, is to remove one set of parenthesis, allowing you to run
insert(a,b,c) instead of insert((a,b,c)).

Deleting Rows

Removing rows is also likely to be useful. And, in most SQL databases,
it is not as simple as just telling the DB to remove row number 87.
The difficulty is a matter of telling the database to identify which
row to remove. One approach is to identify the row by specifying every
value in its cells. For example, DELETE FROM movies WHERE
title='Harvey' AND year='1950' AND genre=37;. The problem with
this approach is that you might have two rows with identical data. In
that case, you would accidentally delete both rows instead of just
one.

A less error-prone approach is to use row IDs. Row IDs are supported
by most databases, though the implementation differs. In MySQL, the
query would look more like this: DELETE FROM movies WHERE
_rowid=58;. That query will remove one and only one row,
regardless of the content of the table. Even if every row is
identical, each one will have a unique row ID. As an added bonus, the
query is smaller, simpler, and quite similar to the array-like syntax
Python uses to delete list items:

del movies[58]

To make your Table class respond to the del command, add
the following method to it:

This function performs two queries. First, it is necessary to obtain the
row ID. Then it deletes the row identified by that ID. You can eliminate
the need for the first query by always including the row ID in your
query results, but for the sake of brevity, that is not demonstrated
here. The provided example code includes row IDs, though, along with
several other enhancements.

What's Next?

The data-access approach described above should help you use your time
more efficiently and help you write data-processing scripts more
quickly. You could extend and refine the approach into a much more
sophisticated library, and make it more appropriate for larger
applications. For example, you could modify the code to use dict-style
cursors, instead of tuples. Or, you could add a Row class, with methods
to modify individual cells. Another useful extension would be to
formally support joined tables instead of using the somewhat tricky
method described earlier. Many other enhancements could be added too,
such as speeding up iteration by fetching and caching many rows, or
making deletion more powerful or efficient. Implementing row
assignment would also make many programs simpler.

This "bottom-up" approach should help you build up the Python language
to better deal with the tasks you commonly perform. By adding a few
special functions, you can turn Python into a richer, more appropriate
language for whatever you need. And hopefully, by letting the
interpreter do more of the work for you, you can build faster, cleaner
solutions.

While the approach described here demonstrates some of the things you can do
with Python's syntax overloading, you may want to look into established
tools for real database use. The
SQLAlchemy
library is very powerful, or I can also recommend
SQLObject.
Both are far more complicated than what is presented here, but they are also
a great deal more powerful and mature.
There is also an add-on to SQLAlchemy which makes it easier to use, called
SqlSoup.
It reads database structure automatically and maps it onto classes for you,
so you don't have to.

Definitions

Duck Typing

Pythonic programming style that determines an object's type by
inspection of its method or attribute signature rather than by
explicit relationship to some type object ("If it looks like a duck
and quacks like a duck, it must be a duck.") By emphasizing
interfaces rather than specific types, well-designed code improves
its flexibility by allowing polymorphic substitution. Duck-typing
avoids tests using type() or isinstance().
Instead, it typically employs hasattr() tests or EAFP
programming.

EAFP

Easier to ask for forgiveness than permission. This common Python
coding style assumes the existence of valid keys or attributes and
catches exceptions if the assumption proves false. This clean and
fast style is characterized by the presence of many try and except
statements. The technique contrasts with the LBYL style that is
common in many other languages such as C.

LBYL

Look before you leap. This coding style explicitly tests for
pre-conditions before making calls or lookups. This style contrasts
with the EAFP approach and is characterized by the presence of many
if statements.

Originally published
at DevX on 2004-10-01.
Updated after re-publishing here.