Chapter 2 Guidelines for Python Developers

The following guidelines cover aspects of developing MySQL
applications that might not be immediately obvious to developers
coming from a Python background:

For security, do not hardcode the values needed to connect and
log into the database in your main script. Python has the
convention of a config.py module, where you
can keep such values separate from the rest of your code.

Python scripts often build up and tear down large data
structures in memory, up to the limits of available RAM. Because
MySQL often deals with data sets that are many times larger than
available memory, techniques that optimize storage space and
disk I/O are especially important. For example, in MySQL tables,
you typically use numeric IDs rather than string-based
dictionary keys, so that the key values are compact and have a
predictable length. This is especially important for columns
that make up the primary
key for an InnoDB table, because those
column values are duplicated within each
secondary index.

Any application that accepts input must expect to handle bad
data.

The bad data might be accidental, such as out-of-range values or
misformatted strings. The application can use server-side checks
such as unique
constraints and
NOT NULL
constraints, to keep the bad data from ever reaching the
database. On the client side, use techniques such as exception
handlers to report any problems and take corrective action.

The bad data might also be deliberate, representing an
“SQL injection” attack. For example, input values
might contain quotation marks, semicolons, %
and _ wildcard characters and other
characters significant in SQL statements. Validate input values
to make sure they have only the expected characters. Escape any
special characters that could change the intended behavior when
substituted into an SQL statement. Never concatenate a user
input value into an SQL statement without doing validation and
escaping first. Even when accepting input generated by some
other program, expect that the other program could also have
been compromised and be sending you incorrect or malicious data.

Because the result sets from SQL queries can be very large, use
the appropriate method to retrieve items from the result set as
you loop through them.
fetchone()
retrieves a single item, when you know the result set contains a
single row.
fetchall()
retrieves all the items, when you know the result set contains a
limited number of rows that can fit comfortably into memory.
fetchmany()
is the general-purpose method when you cannot predict the size
of the result set: you keep calling it and looping through the
returned items, until there are no more results to process.

Since Python already has convenient modules such as
pickle and cPickle to read
and write data structures on disk, data that you choose to store
in MySQL instead is likely to have special characteristics:

Too large to all fit in memory at one
time. You use
SELECT statements to query
only the precise items you need, and
aggregate
functions to perform calculations across multiple
items. You configure the
innodb_buffer_pool_size
option within the MySQL server to dedicate a certain amount
of RAM for caching query results.

Too complex to be represented by a
single data structure. You divide the data
between different SQL tables. You can recombine data from
multiple tables by using a
join query. You make sure
that related data is kept in sync between different tables
by setting up foreign
key relationships.

Updated frequently, perhaps by
multiple users simultaneously. The updates might
only affect a small portion of the data, making it wasteful
to write the whole structure each time. You use the SQL
INSERT,
UPDATE, and
DELETE statements to update
different items concurrently, writing only the changed
values to disk. You use InnoDB
tables and
transactions to keep
write operations from conflicting with each other, and to
return consistent query results even as the underlying data
is being updated.

Using MySQL best practices for performance can help your
application to scale without requiring major rewrites and
architectural changes. See Optimization for
best practices for MySQL performance. It offers guidelines and
tips for SQL tuning, database design, and server configuration.

You can avoid reinventing the wheel by learning the MySQL SQL
statements for common operations: operators to use in queries,
techniques for bulk loading data, and so on. Some statements and
clauses are extensions to the basic ones defined by the SQL
standard. See Data Manipulation Statements,
Data Definition Statements, and
SELECT Syntax for the main classes of statements.

Issuing SQL statements from Python typically involves declaring
very long, possibly multi-line string literals. Because string
literals within the SQL statements could be enclosed by single
quotation, double quotation marks, or contain either of those
characters, for simplicity you can use Python's triple-quoting
mechanism to enclose the entire statement. For example:

'''It doesn't matter if this string contains 'single'
or "double" quotes, as long as there aren't 3 in a
row.'''

You can use either of the ' or
" characters for triple-quoting multi-line
string literals.

Many of the secrets to a fast, scalable MySQL application
involve using the right syntax at the very start of your setup
procedure, in the CREATE TABLE
statements. For example, Oracle recommends the
ENGINE=INNODB clause for most tables, and
makes InnoDB the default storage engine in
MySQL 5.5 and up. Using InnoDB tables enables
transactional behavior that helps scalability of read-write
workloads and offers automatic
crash recovery.
Another recommendation is to declare a numeric
primary key for each
table, which offers the fastest way to look up values and can
act as a pointer to associated values in other tables (a
foreign key). Also
within the CREATE TABLE
statement, using the most compact column data types that meet
your application requirements helps performance and scalability
because that enables the database server to move less data back
and forth between memory and disk.