The SQL representation of many data types is often different from their Python
string representation. The typical example is with single quotes in strings:
in SQL single quotes are used as string literal delimiters, so the ones
appearing inside the string itself must be escaped, whereas in Python single
quotes can be left unescaped if the string is delimited by double quotes.

Because of these differences, sometimes subtle, between the data type
representations, a naïve approach to query string composition, such as using
Python string concatenation, is a recipe for terrible problems:

    >>> SQL = "INSERT INTO authors (name) VALUES ('%s');"  # NEVER DO THIS
    >>> data = ("O'Reilly", )
    >>> cur.execute(SQL % data)  # THIS WILL FAIL MISERABLY
    ProgrammingError: syntax error at or near "Reilly"
    LINE 1: INSERT INTO authors (name) VALUES ('O'Reilly')
                                                   ^

If the variables containing the data to send to the database come from an
untrusted source (such as a form published on a web site), an attacker could
easily craft a malformed string, either gaining access to unauthorized data or
performing destructive operations on the database. This form of attack is
called SQL injection and is known to be one of the most widespread forms of
attack against database servers. Before continuing, please print this page as
a memo and hang it over your desk.
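
The safe alternative is to let Psycopg perform the conversion: pass the
variables as the second argument of execute() and use %s placeholders, not
quoted, in the statement. A minimal sketch of the safe version of the example
above:

    >>> SQL = "INSERT INTO authors (name) VALUES (%s);"  # note: no quotes
    >>> data = ("O'Reilly", )
    >>> cur.execute(SQL, data)  # note: no % string interpolation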

Reading from the database, integer types are converted into int, floating
point types are converted into float, numeric/decimal are
converted into Decimal.

Note

Sometimes you may prefer to receive numeric data as float
instead, for performance reasons or ease of manipulation: you can configure
an adapter to cast PostgreSQL numeric to Python float.
This of course may imply a loss of precision.
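
A possible sketch, using the new_type() and register_type() functions from
psycopg2.extensions (the OIDs are taken from the built-in DECIMAL typecaster;
registering the type globally affects every cursor):

    import psycopg2.extensions

    # Cast PostgreSQL numeric/decimal values to float instead of Decimal.
    DEC2FLOAT = psycopg2.extensions.new_type(
        psycopg2.extensions.DECIMAL.values,
        'DEC2FLOAT',
        lambda value, curs: float(value) if value is not None else None)
    psycopg2.extensions.register_type(DEC2FLOAT)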

Python str and unicode are converted into the SQL string syntax.
unicode objects (str in Python 3) are encoded in the connection
encoding before being sent to the backend: trying to send a
character not supported by the encoding will result in an error. Data is
usually received as str (i.e. it is decoded on Python 3, left encoded
on Python 2). However it is possible to receive unicode on Python 2 too:
see Unicode handling.

In Python 3, instead, strings are automatically decoded using the connection
encoding, as the str object can represent Unicode characters.
In Python 2 you must register a typecaster in order to receive unicode objects:
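
For example, using the UNICODE and UNICODEARRAY typecasters shipped in
psycopg2.extensions (registering them globally here; passing a cursor or
connection as second argument would limit the registration's scope):

    >>> psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
    >>> psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
    >>> cur.execute("SELECT %s;", (u'\u00f1',))
    >>> cur.fetchone()[0]
    u'\xf1'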

Python types representing binary objects are converted into
PostgreSQL binary string syntax, suitable for bytea fields. Such
types are buffer (only available in Python 2), memoryview (available
from Python 2.7), bytearray (available from Python 2.6) and bytes
(only from Python 3: the name is available from Python 2.6 but it’s only an
alias for the type str). Any object implementing the Revised Buffer
Protocol should be usable as a binary type where the protocol is supported
(i.e. from Python 2.6). Received data is returned as buffer (in Python 2)
or memoryview (in Python 3).

Changed in version 2.4: only strings were supported before.

Changed in version 2.4.1: can parse the ‘hex’ format from 9.0 servers without relying on the
version of the client library.

Note

In Python 2, if you have binary data in a str object, you can pass it
to a bytea field using the psycopg2.Binary wrapper:
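
A minimal sketch (blobs and its bytea column file are placeholder names):

    mypic = open('picture.png', 'rb').read()
    curs.execute("INSERT INTO blobs (file) VALUES (%s)",
                 (psycopg2.Binary(mypic),))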

Since version 9.0, PostgreSQL by default uses a new "hex" format to
emit bytea fields. Starting from Psycopg 2.4.1 the format is
correctly supported. If you use a previous version you will need some
extra care when receiving bytea from PostgreSQL: you must have at least
libpq 9.0 installed on the client, or alternatively you can set the
bytea_output configuration parameter to escape, either in the
server configuration file or in the client session (using a query such as
SET bytea_output TO escape;) before receiving binary data.

Note that only time zones with an integer number of minutes are supported:
this is a limitation of the Python datetime module. A few historical time
zones had seconds in the UTC offset: these time zones will have the offset
rounded to the nearest minute, with an error of up to 30 seconds.

Changed in version 2.2.2: timezones with seconds are supported (with rounding). Previously such
timezones raised an error. In order to deal with them in previous
versions use psycopg2.extras.register_tstz_w_secs().

PostgreSQL can store the representation of an "infinite" date, timestamp, or
interval. Infinite dates are not available to Python, so these objects are
mapped to date.max, datetime.max, timedelta.max. Unfortunately the
mapping cannot be bidirectional, so these dates will be stored back into the
database with their values, such as 9999-12-31.

It is possible to create an alternative adapter for dates and other objects
to map date.max to infinity, for instance:
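
A sketch along these lines (register_adapter() and DateFromPy() are provided
by psycopg2.extensions; the adapter delegates ordinary dates to the standard
one):

    import datetime
    import psycopg2.extensions

    class InfDateAdapter:
        def __init__(self, wrapped):
            self.wrapped = wrapped
        def getquoted(self):
            if self.wrapped == datetime.date.max:
                return b"'infinity'::date"
            elif self.wrapped == datetime.date.min:
                return b"'-infinity'::date"
            else:
                # Fall back to the default date adapter.
                return psycopg2.extensions.DateFromPy(self.wrapped).getquoted()

    psycopg2.extensions.register_adapter(datetime.date, InfDateAdapter)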

Furthermore, ANY can also work with empty lists, whereas IN ()
is a SQL syntax error.
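
For example, since Psycopg adapts a Python list to a PostgreSQL array, a
query can use ANY with a parameter that may be empty (authors and its id
column are placeholder names):

    >>> cur.execute("SELECT * FROM authors WHERE id = ANY(%s);", ([10, 20, 30],))
    >>> cur.execute("SELECT * FROM authors WHERE id = ANY(%s);", ([],))  # still valid SQL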

Note

Reading back from PostgreSQL, arrays are converted to lists of Python
objects as expected, but only if the items are of a known type.
Arrays of unknown types are returned as represented by the database (e.g.
{a,b,c}). If you want to convert the items into Python objects you can
easily create a typecaster for arrays of unknown types.

In Psycopg transactions are handled by the connection class. By
default, the first time a command is sent to the database (using one of the
cursors created by the connection), a new transaction is created.
The following database commands will be executed in the context of the same
transaction – not only the commands issued by the first cursor, but the ones
issued by all the cursors created by the same connection. Should any command
fail, the transaction will be aborted and no further command will be executed
until a call to the rollback() method.

The connection is responsible for terminating its transaction, calling either
the commit() or rollback() method. Committed
changes are immediately made persistent in the database. Closing the
connection using the close() method or destroying the
connection object (using del or letting it fall out of scope)
will result in an implicit rollback.

It is possible to set the connection in autocommit mode: this way all the
commands executed are immediately committed and no rollback is possible. A
few commands (e.g. CREATE DATABASE, VACUUM ...) must be run
outside any transaction: in order to run these commands from
Psycopg, the connection must be in autocommit mode; you can use the
autocommit property (set_isolation_level() in
older versions).
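
A minimal sketch (the connection string is a placeholder):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    conn.autocommit = True       # no transaction is started implicitly
    cur = conn.cursor()
    cur.execute("VACUUM")        # would fail inside a transaction block
    conn.autocommit = False      # restore the usual transactional behaviour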

Warning

By default even a simple SELECT will start a transaction: in
long-running programs, if no further action is taken, the session will
remain “idle in transaction”, an undesirable condition for several
reasons (locks are held by the session, tables bloat...). For long lived
scripts, either make sure to terminate a transaction as soon as possible or
use an autocommit connection.

A few other transaction properties can be set session-wide by the
connection: for instance it is possible to have read-only transactions or
change the isolation level. See the set_session() method for all
the details.
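
For instance, a sketch of a read-only serializable session (any argument left
out keeps its current value):

    from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

    conn.set_session(isolation_level=ISOLATION_LEVEL_SERIALIZABLE, readonly=True)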

When a connection exits the with block, if no exception has been raised by
the block, the transaction is committed. In case of exception the transaction
is rolled back.

When a cursor exits the with block it is closed, releasing any resource
eventually associated with it. The state of the transaction is not affected.

Note that, unlike file objects or other resources, exiting the connection's
with block doesn't close the connection but only the transaction
associated with it: a connection can be used in more than one with statement,
and each with block is effectively wrapped in a separate transaction:
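
A sketch of the pattern (DSN, SQL1 and SQL2 are placeholders):

    conn = psycopg2.connect(DSN)

    with conn:
        with conn.cursor() as curs:
            curs.execute(SQL1)   # committed when the block exits cleanly

    with conn:
        with conn.cursor() as curs:
            curs.execute(SQL2)   # runs in a new, separate transaction

    conn.close()                 # the connection still must be closed explicitly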

When a database query is executed, the Psycopg cursor usually fetches
all the records returned by the backend, transferring them to the client
process. If the query returned a huge amount of data, a proportionally large
amount of memory will be allocated by the client.

If the dataset is too large to be practically handled on the client side, it is
possible to create a server side cursor. Using this kind of cursor it is
possible to transfer to the client only a controlled amount of data, so that a
large dataset can be examined without keeping it entirely in memory.

Server side cursors are created in PostgreSQL using the DECLARE command and
subsequently handled using the MOVE, FETCH and CLOSE commands.

Psycopg wraps the database server side cursor in named cursors. A named
cursor is created using the cursor() method specifying the
name parameter. Such a cursor will behave mostly like a regular cursor,
allowing the user to move in the dataset using the scroll()
method and to read the data using the fetchone() and
fetchmany() methods. Normally you can only scroll forward in a
cursor: if you need to scroll backwards you should declare your cursor
scrollable.

Named cursors are also iterable like regular cursors.
Note however that before Psycopg 2.4 iteration was performed fetching one
record at a time from the backend, resulting in a large overhead. The
itersize attribute now controls how many records are fetched at a time
during the iteration: the default value of 2000 allows fetching about 100KB
per roundtrip, assuming records of 10-20 columns of mixed numbers and strings;
you may decrease this value if you are dealing with huge records.
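
A sketch of streaming a large result set (huge_table and process() are
placeholders):

    with conn.cursor(name='big_read') as cur:   # named, hence server-side
        cur.itersize = 500                      # records per network roundtrip
        cur.execute("SELECT * FROM huge_table")
        for row in cur:                         # the dataset never sits entirely in memory
            process(row)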

Named cursors are usually created WITHOUT HOLD, meaning they live only
as long as the current transaction. Trying to fetch from a named cursor after
a commit() or to create a named cursor when the connection
transaction isolation level is set to AUTOCOMMIT will result in an exception.
It is possible to create a WITH HOLD cursor by specifying a True
value for the withhold parameter to cursor() or by setting the
withhold attribute to True before calling execute() on
the cursor. It is extremely important to always close() such cursors,
otherwise they will continue to hold server-side resources until the
connection is eventually closed. Also note that while the lifetime of a
WITH HOLD cursor extends well past commit(), calling
rollback() will automatically close the cursor.
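
A sketch (mytable is a placeholder):

    cur = conn.cursor(name='preserve_me', withhold=True)
    cur.execute("SELECT * FROM mytable")
    conn.commit()               # the WITH HOLD cursor survives the commit
    rows = cur.fetchmany(100)   # still readable here
    cur.close()                 # essential: frees the server-side resources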

Note

It is also possible to use a named cursor to consume a cursor created
in some other way than using the DECLARE executed by
execute(). For example, you may have a PL/pgSQL function
returning a cursor:
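
A sketch of the pattern (reffunc, test and curname are placeholder names):

    cur1 = conn.cursor()
    cur1.execute("""
        CREATE FUNCTION reffunc(refcursor) RETURNS refcursor AS $$
        BEGIN
            OPEN $1 FOR SELECT col FROM test;
            RETURN $1;
        END;
        $$ LANGUAGE plpgsql;
    """)
    cur1.callproc('reffunc', ['curname'])   # the function opens the cursor

    cur2 = conn.cursor('curname')           # a named cursor "steals" it
    for record in cur2:                     # or use fetchone(), fetchmany()...
        pass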

The Psycopg module and the connection objects are thread-safe: many
threads can access the same database either using separate sessions and
creating a connection per thread or using the same
connection and creating separate cursors. In DB API 2.0 parlance, Psycopg is
level 2 thread safe.

The difference between the above two approaches is that, using different
connections, the commands will be executed in different sessions and will be
served by different server processes. On the other hand, using many cursors on
the same connection, all the commands will be executed in the same session
(and in the same transaction if the connection is not in autocommit mode), but they will be serialized.

The above observations are only valid for regular threads: they don't apply to
forked processes nor to green threads. libpq connections shouldn't be used by
forked processes, so when using a module such as multiprocessing or a
forking web deployment method such as FastCGI make sure to create the
connections after the fork.

PostgreSQL offers support for large objects, which provide stream-style
access to user data that is stored in a special large-object structure. They
are useful with data values too large to be manipulated conveniently as a
whole.

Psycopg allows access to large objects using the
lobject class. Objects are generated using the
connection.lobject() factory method. Data can be retrieved either as bytes
or as Unicode strings.

Psycopg large objects support efficient import and export to file system
files, using the lo_import() and lo_export() libpq functions.
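
A minimal sketch of writing and reading back a large object (large objects
must be accessed inside a transaction):

    lobj = conn.lobject(0, 'wb')    # oid=0 creates a new large object
    lobj.write(b'some binary data')
    oid = lobj.oid                  # keep the oid to find the object again
    conn.commit()

    lobj = conn.lobject(oid, 'rb')  # reopen it for reading
    data = lobj.read()
    lobj.close()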

Changed in version 2.6: added support for large objects greater than 2GB. Note that the support is
enabled only if all the following conditions are met:

the Python build is 64 bits;

the extension was built against at least libpq 9.3;

the server version is at least PostgreSQL 9.3
(server_version must be >= 90300).

If Psycopg was built with 64 bits large objects support (i.e. the first
two conditions above are verified), the psycopg2.__version__ constant
will contain the lo64 flag. If any of the conditions is not met,
several lobject methods will fail if the arguments exceed 2GB.

PostgreSQL doesn’t follow the XA standard though, and the ID for a PostgreSQL
prepared transaction can be any string up to 200 characters long.
Psycopg's Xid objects can represent both XA-style
transaction IDs (such as the ones created by the xid() method) and
PostgreSQL transaction IDs identified by an unparsed string.

The format in which the Xids are converted into strings passed to the
database is the same employed by the PostgreSQL JDBC driver: this should
allow interoperation between tools written in Python and in Java. For example
a recovery tool written in Python would be able to recognize the components of
transactions produced by a Java program.
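
A sketch of the two-phase commit flow (the DSN and the values passed to
xid() are placeholders):

    conn = psycopg2.connect("dbname=test")
    xid = conn.xid(42, 'my-transaction', 'my-branch')

    conn.tpc_begin(xid)
    cur = conn.cursor()
    cur.execute("INSERT INTO authors (name) VALUES (%s)", ("Turing",))
    conn.tpc_prepare()              # phase one: the transaction is prepared
    conn.tpc_commit()               # phase two: commit it

    # A prepared transaction left pending (e.g. after a crash) can be listed
    # with conn.tpc_recover() and finished with tpc_commit(xid) or
    # tpc_rollback(xid), possibly from a different connection.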