SQLAlchemy ORM Tutorial for Python Developers

TL;DR: In this article, we will learn how to use SQLAlchemy as the ORM (Object Relational Database) library to communicate with relational database engines. First, we will learn about some core concepts of SQLAlchemy (like engines and connection pools), then we will learn how to map Python classes and its relationships to database tables, and finally we will learn how to retrieve (query) data from these tables. The code snippets used in this article can be found in this GitHub repository.

SQLAlchemy Introduction

SQLAlchemy is a library that facilitates the communication between Python programs and databases. Most of the times, this library is used as an Object Relational Mapper (ORM) tool that translates Python classes to tables on relational databases and automatically converts function calls to SQL statements. SQLAlchemy provides a standard interface that allows developers to create database-agnostic code to communicate with a wide variety of database engines.

As we will see in this article, SQLAlchemy relies on common design patterns (like Object Pools) to allow developers to create and ship enterprise-grade, production-ready applications easily. Besides that, with SQLAlchemy, boilerplate code to handle tasks like database connections is abstracted away to let developers focus on business logic.

Before diving into the ORM features provided by SQLAlchemy, we need to learn how the core works. The following sections will introduce important concepts that every Python developer needs to understand before dealing with SQLAlchemy applications.

Python DBAPI

The Python DBAPI (an acronym for DataBase API) was created to specify how Python modules that integrate with databases should expose their interfaces. Although we won't interact with this API directly—we will use SQLAlchemy as a facade to it—it's good to know that it defines how common functions like connect, close, commit, and rollback must behave. Consequently, whenever we use a Python module that adheres to the specification, we can rest assured that we will find these functions and that they will behave as expected.

In this article, we are going to install and use the most popular PostgreSQL DBAPI implementation available: psycopg. Other Python drivers communicate with PostgreSQL as well, but psycopg is the best candidate since it fully implements the DBAPI specification and has great support from the community.

SQLAlchemy Engines

Whenever we want to use SQLAlchemy to interact with a database, we need to create an Engine. Engines, on SQLAlchemy, are used to manage two crucial factors: Pools and Dialects. The following two sections will explain what these two concepts are, but for now it suffices to say that SQLAlchemy uses them to interact with DBAPI functions.

To create an engine and start interacting with databases, we have to import the create_engine function from the sqlalchemy library and issue a call to it:

This example creates a PostgreSQL engine to communicate with an instance running locally on port 5432 (the default one). It also defines that it will use usr and pass as the credentials to interact with the sqlalchemy database. Note that, creating an engine does not connect to the database instantly. This process is postponed to when it's needed (like when we submit a query, or when create/update a row in a table).

SQLAlchemy Connection Pools

Connection pooling is one of the most traditional implementations of the object pool pattern. Object pools are used as caches of pre-initialized objects ready to use. That is, instead of spending time to create objects that are frequently needed (like connections to databases) the program fetches an existing object from the pool, uses it as desired, and puts back when done.

The main reason why programs take advantage of this design pattern is to improve performance. In the case of database connections, opening and maintaining new ones is expensive, time-consuming, and wastes resources. Besides that, this pattern allows easier management of the number of connections that an application might use simultaneously.

As usual production-ready programs need to override these defaults (to fine-tune pools to their needs), most of the different implementations of connection pools provide a similar set of configuration options. The following list shows the most common options with their descriptions:

pool_size: Sets the number of connections that the pool will handle.

max_overflow: Specifies how many exceeding connections (relative to pool_size) the pool supports.

pool_recycle: Configures the maximum age (in seconds) of connections in the pool.

pool_timeout: Identifies how many seconds the program will wait before giving up on getting a connection from the pool.

SQLAlchemy Dialects

As SQLAlchemy is a facade that enables Python developers to create applications that communicate to different database engines through the same API, we need to make use of Dialects. Most of the popular relational databases available out there adhere to the SQL (Structured Query Language) standard, but they also introduce proprietary variations. These variations are the solely responsible for the existence of dialects.

For example, let's say that we want to fetch the first ten rows of a table called people. If our data was being held by a Microsoft SQL Server database engine, SQLAlchemy would need to issue the following query:

SELECT TOP 10 * FROM people;

But, if our data was persisted on MySQL instance, then SQLAlchemy would need to issue:

SELECT * FROM people LIMIT 10;

Therefore, to know precisely what query to issue, SQLAlchemy needs to be aware of the type of the database that it is dealing with. This is exactly what Dialects do. They make SQLAlchemy aware of the dialect it needs to talk.

SQLAlchemy ORM

ORM, which stands for Object Relational Mapper, is the specialization of the Data Mapper design pattern that addresses relational databases like MySQL, Oracle, and PostgreSQL. As explained by Martin Fowler in the article, Mappers are responsible for moving data between objects and a database while keeping them independent of each other. As object-oriented programming languages and relational databases structure data on different ways, we need specific code to translate from one schema to the other.

For example, in a programming language like Python, we can create a Product class and an Order class to relate as many instances as needed from one class to another (i.e. Product can contain a list of instances of Order and vice-versa). Though, on relational databases, we need three entities (tables), one to persist products, another one to persist orders, and a third one to relate (through foreign key) products and orders.

As we will see in the following sections, SQLAlchemy ORM is an excellent Data Mapper solution to translate Python classes into/from tables and to move data between instances of these classes and rows of these tables.

In the code snippet above, we are defining a class called Product that has six properties. Let's take a look at what these properties do:

The __tablename__ property tells SQLAlchemy that rows of the products table must be mapped to this class.

The id property identifies that this is the primary_key in the table and that its type is Integer.

The title property indicates that a column in the table has the same name of the property and that its type is String.

The in_stock property indicates that a column in the table has the same name of the property and that its type is Boolean.

The quantity property indicates that a column in the table has the same name of the property and that its type is Integer.

The price property indicates that a column in the table has the same name of the property and that its type is Numeric.

Seasoned developers will notice that (usually) relational databases do not have data types with these exact names. SQLAlchemy uses these types as generic representations to what databases support and use the dialect configured to understand what types they translate to. For example, on a PostgreSQL database, the title would be mapped to a varchar column.

SQLAlchemy Relationship Patterns

Now that we know what ORM is and have look into data types, let's learn how to use SQLAlchemy to map relationships between classes to relationships between tables. SQLAlchemy supports four types of relationships: One To Many, Many To One, One To One, and Many To Many.

Note that this section will be an overview of all these types, but in the SQLAlchemy ORM in Practice action we will do a hands-on to practice mapping classes into tables and to learn how to insert, extract, and remove data from these tables.

The first type, One To Many, is used to mark that an instance of a class can be associated with many instances of another class. For example, on a blog engine, an instance of the Article class could be associated with many instances of the Comment class. In this case, we would map the mentioned classes and its relation as follows:

The second type, Many To One, refers to the same relationship described above but from the other perspective. To give a different example, let's say that we want to map the relationship between instances of Tire to an instance of a Car. As many tires belong to one car and this car contains many tires, we would map this relation as follows:

The third type, One To One, refers to relationships where an instance of a particular class may only be associated with one instance of another class, and vice versa. As an example, consider the relationship between a Person and a MobilePhone. Usually, one person possesses one mobile phone and this mobile phone belongs to this person only. To map this relationship on SQLAlchemy, we would create the following code:

In this example, we pass two extra parameters to the relationship function. The first one, uselist=False, makes SQLAlchemy understand that mobile_phone will hold only a single instance and not an array (multiple) of instances. The second one, back_populates, instructs SQLAlchemy to populate the other side of the mapping. The official Relationships API documentation provides a complete explanation of these parameters and also covers other parameters not mentioned here.

The last type supported by SQLAlchemy, Many To Many, is used when instances of a particular class can have zero or more associations to instances of another class. For example, let's say that we are mapping the relationship of instances of Student and instances of Class in a system that manages a school. As many students can participate in many classes, we would map the relationship as follows:

In this case, we had to create a helper table to persist the association between instances of Student and instances of Class, as this wouldn't be possible without an extra table. Note that, to make SQLAlchemy aware of the helper table, we passed it in the secondary parameter of the relationship function.

SQLAlchemy ORM Cascade

Whenever rows in a particular table are updated or deleted, rows in other tables might need to suffer changes as well. These changes can be simple updates, which are called cascade updates, or full deletes, known as cascade deletes. For example, let's say that we have a table called shopping_carts, a table called products, and a third one called shopping_carts_products that connects the first two tables. If, for some reason, we need to delete rows from shopping_carts we will need to delete the related rows from shopping_carts_products as well. Otherwise we will end up with a lot of garbage and unfulfilled references in our database.

To make this kind of operation easy to maintain, SQLAlchemy ORM enables developers to map cascade behavior when using relationship() constructs. Like that, when operations are performed on parent objects, child objects get updated/deleted as well. The following list provides a brief explanation of the most used cascade strategies on SQLAlchemy ORM:

save-update: Indicates that when a parent object is saved/updated, child objects are saved/updated as well.

delete: Indicates that when a parent object is deleted, children of this object will be deleted as well.

delete-orphan: Indicates that when a child object loses reference to a parent, it will get deleted.

merge: Indicates that merge() operations propagate from parent to children.

SQLAlchemy Sessions

Sessions, on SQLAlchemy ORM, are the implementation of the Unit of Work design pattern. As explained by Martin Fowler, a Unit of Work is used to maintain a list of objects affected by a business transaction and to coordinate the writing out of these changes. This means that all modifications tracked by Sessions (Units of Works) will be applied to the underlying database together, or none of them will. In other words, Sessions are used to guarantee the database consistency.

As we can see from the code snippet above, we only need one step to get sessions. We need to create a session factory that is bound to the SQLAlchemy engine. After that, we can just issue calls to this session factory to get our sessions.

SQLAlchemy in Practice

Now that we got a better understanding of the most important pieces of SQLAlchemy, it's time to start practicing it. In the following sections, we will create a small project based on pipenv—a Python dependency manager—and add some classes to it. Then we will map these classes to tables persisted to a PostgreSQL database and learn how to query data.

Starting the Tutorial Project

To create our tutorial project, we have to have Python installed on our machine and pipenv installed as a global Python package. The following commands will install pipenv and set up the project. These commands are dependent on Python, so be sure to have it installed before proceeding:

Running PostgreSQL

To be able to practice our new skills and to learn how to query data on SQLAlchemy, we will need a database to support our examples. As already mentioned, SQLAlchemy provides support for many different databases engines, but the instructions that follow will focus on PostgreSQL. There are many ways to get an instance of PostgreSQL. One of them is to use some cloud provider like Heroku or ElephantSQL (both of them have free tiers). Another possibility is to install PostgreSQL locally on our current environment. A third option is to run a PostgreSQL instance inside a Docker container.

The third option is probably the best choice because it has the performance of an instance running locally, it's free forever, and because it's easy to create and destroy Docker instances. The only (small) disadvantage is that we need to install Docker locally.

After having Docker installed, we can create and destroy dockerized PostgreSQL instances with the following commands:

Installing SQLAlchemy Dependencies

In this tutorial, we will need to install only two packages: sqlalchemy and psycopg2. The first dependency refers to SQLAlchemy itself and the second one, psycopg2, is the PostgreSQL driver that SQLAlchemy will use to communicate with the database. To install these dependencies, we will use pipenv as shown:

# install sqlalchemy and psycopg2
pipenv install sqlalchemy psycopg2

This command will download both libraries and make them available in our Python virtual environment. Note that to run the scripts that we are going to create, we first need to spawn the virtual environment shell. That is, before executing python somescript.py, we need to execute pipenv shell. Otherwise, Python won't be able to find the installed dependencies, as they are just available in our new virtual environment.

Mapping Classes with SQLAlchemy

After starting the dockerized PostgreSQL instance and installing the Python dependencies, we can begin to map Python classes to database tables. In this tutorial, we will map four simple classes that represent movies, actors, stuntmen, and contact details. The following diagram illustrates these entities' characteristics and their relations.

To start, we will create a file called base.py in the main directory of our project and add the following code to it:

The definition of this class is pretty similar to the previous one. The differences are that the Actor has a name instead of a title, a birthday instead of a release_date, and that it points to a table called actors instead of movies.

As many movies can have many actors and vice-versa, we will need to create a Many To Many relationship between these two classes. Let's create this relationship by updating the movie.py file as follows:

we created a movies_actors_association table that connects rows of actors and rows of movies;

and we added the actors property to Movie and configured the movies_actors_association as the intermediary table.

The next class that we will create is Stuntman. In our tutorial, a particular Actor will have only one Stuntman and this Stuntman will work only with this Actor. This means that we need to create the Stuntman class and a One To One relationship between these classes. To accomplish that, let's create a file called stuntman.py and add the following code to it:

In this class, we have defined that the actor property references an instance of Actor and that this actor will get a property called stuntman that is not a list (uselist=False). That is, whenever we load an instance of Stuntman, SQLAlchemy will also load and populate the Actor associated with this stuntman.

The fourth and final class that we will map in our tutorial is ContactDetails. Instances of this class will hold a phone_number and an address of a particular Actor, and one Actor will be able to have many ContactDetails associated. Therefore, we will need to use the Many To One relationship pattern to map this association. To create this class and this association, let's create a file called contact_details.py and add the following source code to it:

As we can see, creating a Many To One association is kinda similar to creating a One To One association. The difference is that in the latter we instructed SQLAlchemy not to use lists. This instruction ends up restricting the association to a single instance instead of a list of instances.

Persisting Data with SQLAlchemy ORM

Now that we have created our classes, let's create a file called inserts.py and generate some instances of these classes to persist to the database. In this file, let's add the following code:

The seventh section creates instances of the ContactDetails class and defines what actors these instances are associated to.

The eighth section defines three stuntmen and also defines what actors these stuntmen are associated to.

The ninth section uses the current session to save the movies, actors, contact details, and stuntmen created. Note that we haven't explicitly saved actors. This is not needed because SQLAlchemy, by default, uses the save-update cascade strategy.

The tenth section commits the current session to the database and closes it.

To run this Python script, we can simply issue the python inserts.py command (let's not forget to run pipenv shell first) in the main directory of our database. Running it will create five tables in the PostgreSQL database and populate these tables with the data that we created. In the next section, we will learn how to query these tables.

Querying Data with SQLAlchemy ORM

As we will see, querying data with SQLAlchemy ORM is quite simple. This library provides an intuitive, fluent API that enables developers to write queries that are easy to read and to maintain. On SQLAlchemy ORM, all queries start with a Query Object that is extracted from the current session and that is associated with a particular mapped class. To see this API in action, let's create a file called queries.py and add to it the following source code:

The code snippet above—that can be run with python queries.py,—shows how easy it is to use SQLAlchemy ORM to query data. To retrieve all movies from the database, we just needed to fetch a session from the session factory, use it to get a query associated with Movie, and then call the all() function on this query object. The Query API provides dozens of useful functions like all(). In the following list, we can see a brief explanation about the most important ones:

The fifth section of the updated script uses the filter() function to fetch only movies that were released after January the first, 2015. The sixth section shows how to use join() to fetch instances of Movie that the Actor Dwayne Johnson participated in. The seventh and last section, shows the usage of join() and ilike() functions to retrieve actors that have houses in Glendale.

Running the new version of the script (python queries.py) now will result in the following output:

### All movies:
The Bourne Identity was released on 2002-10-11
Furious 7 was released on 2015-04-02
No Pain No Gain was released on 2013-08-23
### Recent movies:
Furious 7 was released after 2015
### Dwayne Johnson movies:
The Rock starred in No Pain No Gain
The Rock starred in Furious 7
### Actors that live in Glendale:
Dwayne Johnson has a house in Glendale
Mark Wahlberg has a house in Glendale

As we can see, using the API is straightforward and generates a code that is readable. To see other functions supported by the Query API, and their description, take a look at the official documentation.

Next Steps

We have covered a lot of ground in this article. We've learned about basic SQLAlchemy concepts like Engines, Connection Pools, and Dialects. After that, we've learned about how SQLAlchemy addresses ORM topics like Relationship Patterns, Cascade strategies, and the Query API. In the end, we applied this knowledge in a small exercise. In summary, we had the chance to learn and practice the most important pieces of SQLAlchemy and SQLAlchemy ORM. In the next article, we are going to use these new skills to implement RESTful APIs with Flask—the Python microframework for the web. Stay tuned!