The good news is that recompiling the connector for Python is a lot easier than for PHP. With PHP, the complexity was due to there being one monolithic package to recompile. The bad news is that there is a slight hitch with Python.

Background

Full PAM plugin called auth_pam. This plugin uses dialog.so. It fully supports the PAM protocol with arbitrary communication between client and server.

Oracle-compatible PAM called auth_pam_compat. This plugin uses mysql_clear_password which is a part of Oracle MySQL client. It also has some limitations, such as, it supports only one password input. You must use -p option in order to pass the password to auth_pam_compat.

Percona’s MySQL client supports both plugins natively. That is, you can use auth_pam or auth_pam_compat and use the “mysql” tool (or “mysqldump”, or mysql_upgrade, etc.) and you are good to go. Given the choice, we would all use auth_pam, under which clients DO NOT use mysql_clear_password.

Not all clients support auth_pam, which is the main problem. Workarounds have called for using auth_pam_compat over SSL, which is a perfectly reasonable way to handle the risk of cleartext passwords – encrypt the connection.

However, what if you want to use auth_pam, and avoid cleartext passwords all together?

If you try to connect to MySQL using Python, you will receive this error: “Client does not support authentication protocol requested by server; consider upgrading MySQL client.”

The good news is that if the client uses libmysqlclient library to connect via MySQL, you can recompile the client’s source code to use the libmysqlclient library of Percona Server to make it compatible. This involves installing Percona Server development library, compiler tools, and development libraries followed by compiling and installing the client’s source code.

The Hitch

The hitch with Python is that there are a few different ways to connect to MySQL with Python. In fact, MySQL has written a Python connector, called mysql-connector-python. According to the documentation for mysql-connector-python:

It is written in pure Python and does not have any dependencies except for the Python Standard Library.

This means we cannot recompile mysql-connector-python to use the libmysqlclient library from Percona that supports auth_pam – because it does not use the libmysqlclient library.

It is using the standard library, and we can recompile it. Here is a mirror of the Perl recompilation process for MySQL.

Recompiling MySQLdb to support auth_pam

Step 1

“Install Percona yum repository and Percona Server development library.” This is not a problem, do what you need to do to install Percona-Server-devel for your version.

Step 2

Install a package manager so you can build a package – optional, but useful, if you ever want to have this new client without having to recompile. As in the example, I chose the RPM package manager, so I installed rpm-build.

Step 3

“Download and install the source RPM for the client package.”

I did a web search for “mysqlclient-python source rpm” and found the rpmfind page containing many versions. If you click on the link in the “Package” column you will get to a page that has a link for the Source RPM. I chose the most recent (as of this writing) CentOS package.

So I downloaded the source RPM and installed it into the sources directory:
cd SRPMS
wget http://vault.centos.org/7.4.1708/os/Source/SPackages/MySQL-python-1.2.5-1.el7.src.rpm
cd ../SOURCES
rpm -Uvh MySQL-python-1.2.5-1.el7.src.rpm

This unpacks MySQL-python-1.2.5.zip and a patch file in the SOURCES directory and puts a spec file in the SPECS directory, so this is not as complicated as the PHP version.

Step 4

“Install compilers and dependencies”.
On my host I had no work to get any requirements installed (your mileage may vary – I had installed a lot of dependencies previously in my PHP test, and used the same machine). Make sure to include the Percona Server package for the /usr/lib64/mysql/plugin/dialog.so file:yum install Percona-Server-server-55-5.5.55-rel38.8.el6.x86_64

In the fourth paragraph of this chapter, starting with “Most applications don’t need superuser privileges for day-to-day activities” they give you some reasons why you want to create users without the SUPER privilege. There are better reasons than the book gives, which are at the MySQL Manual page for the SUPER privilege.

In the section “Creating and Using New Users” (p. 300) they say “There’s no limit on password length, but we recommend using eight or fewer characters because this avoids problems with system libraries on some platforms.” You should ignore this, this book was written in 2006 and modern system libraries can handle more than 8 characters in a password. Also ignore it when they say the same thing in the section “Understanding and Changing Passwords” (p. 324).

In the section “Creating a New Remote User” at the very end (p. 214), it talks about using % as a host wildcard character. I want to point out that if there are no ACL’s set for a given host, MySQL will reject ALL connections from that host – even “telnet host 3306” will fail. So if you avoid using %, you are slightly more secure.

In the “Anonymous Users” section (p. 315), one fact that is not mentioned is that for all users, including the anonymous user, any database named “test” or that starts with “test_” can be accessed and manipulated. So an anonymous user can create tables in the “test” database (or even “test_application”) and fill it full of data, causing a denial of service when the disk eventually fills up. This fact is mentioned later in the chapter in the “Default User Configuration” section under “Linux and Mac OS X”, but it should be known earlier.

In the section on “Removing Users” (p. 324), it says that if all the privileges are revoked, and a user only has GRANT USAGE, “This means the user can still connect, but has no privileges when she does.” This is untrue, as mentioned before, everyone can access and manipulate databases starting with “test”.

The section “Managing Privileges with SQL” is deprecated (p. 339-346, up to and including “Activating Privileges”). It used to be, back when this was written, that few people used the GRANT statements and more people directly manipulated the tables. These days, it’s the other way around, and due to problems like SQL injection, there are safeguards in place – for example, if you change the host of a user with an ALTER TABLE on the mysql.user table, the user will have all privileges dropped. Just about the only thing direct querying is used for, is to find who has the Super_priv variable set to ‘Y’ in the user table.

So, recently we had a runaway query eat up all sorts of temporary table space on our machines. Several machines had several terabytes in their ibtmp1 file after this happened. So I set out to find out more about why the InnoDB temporary tablespace is used, why it is better than using regular files, which was what was used prior to MySQL 5.7, and how to make sure that runaway queries do not end up filling up disk space.

Unfortunately, the manual does not go into why ibtmp1 is better than one file per temporary query, which disappears once the query ends. There are a few sections to look into:

Temporary Table Undo Logs – has one paragraph that states that these are the undo logs for temporary tablespaces. Given that these are undo logs, my guess is that this makes MySQL more crash-safe. But that is just a guess.

And finally, the documentation for the innodb_temp_data_file_path system variable sheds a bit of light on the subject – It explains “Metadata about active InnoDB temporary tables is located in INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO.”

There is a manual page on Innodb temp table info table as well, which shows both compressed and uncompressed tables – uncompressed tables live in the ibtmp tablespaces, and compressed temporary tables live in the .ibd tablespace of the compressed table in question – as shown in the “PER_TABLE_SPACE” and “IS_COMPRESSED” fields.

Sadly, the table does not give useful information such as which process ID or user is generating the data. And of course it is only the active temporary space usage at the time – if you have a large temporary tablespace but no active queries using the tablespace, INNODB_TEMP_TABLE_INFO is empty.

I can imagine a scenario with more than one long-running query using a lot of space in the temporary tablespace. But I do not see how the INNODB_TEMP_TABLE_INFO would help me determine any useful information as to which query I should kill. I guess it is useful to see if there is an active query currently using temporary tablespace, but when you have a large file with nothing in it, it is just that much more depressing.

Note: if trying to load the “academics.csv” file on Linux, you may want to set the owner and group to mysql first:sudo chown mysql:mysql /tmp/academics.csv

In the section on “Creating Tables with Queries”, p. 286, it says “The LIKE syntax allows you to create a new table with exactly the same structure as another, including keys.” However, it only preserves UNIQUE, PRIMARY and regular indexes. It does not preserve FOREIGN keys.

In the section on “Replacing Data”, p 292 – 293, it mentions the REPLACE command, but not the INSERT…ON DUPLICATE KEY UPDATE syntax. You can read about that at the MySQL manual page.

GROUP BY and HAVING examples – Supplemental blog post. The example of HAVING in the text shows a use case where HAVING is the same function as WHERE. This blog posts shows examples of HAVING that you cannot do any other way.

In the section called “The GROUP BY clause”, on pages 231-232, the book says:
“you can count any column in a group, and you’ll get the same answer, so COUNT(artist_name) is the same as COUNT(*) or COUNT(artist_id).” This is not 100% true; COUNT does not count NULL values, so if you had 10 rows and 1 artist_name was NULL, COUNT(artist_name) would return 9 instead of 10. COUNT(*) counts the number of rows and would always return 10, so COUNT(*) is preferable when you intend to count the number of rows.

Also in that section, on page 233 when they show you the example:SELECT * FROM track GROUP BY artist_id;
– Note that they explain the result is not meaningful. In most other database systems, this query would not be allowed.

In the “Advanced Joins” section, specifically on page 238 at the bottom where they say “There’s no real advantage or disadvantage in using an ON or a WHERE clause; it’s just a matter of taste.” While that’s true for the MySQL parser, it’s much easier for humans to read, and see if you missed a join condition, if you put the join conditions in an ON clause.

In the section on Nested Queries, on page 251, it says “nested queries are hard to optimize, and so they’re almost always slower to run than the unnested alternative.” MySQL has gotten better and better at optimizing nested queries, so this statement isn’t necessarily true any more.

A “derived table”, is a nested query in the FROM Clause, as described in the section heading with that name (p. 262).

In the “Table Types” subsection (p. 267), it says that MyISAM is a good choice for storage engines, and that “you very rarely need to make any other choice in small-to medium-size applications”. However, it’s recommended to use InnoDB for better concurrency, transaction support and being safer from data corruption in a crash situation. Indeed, the default storage engine in more recent versions of MySQL is InnoDB.

In addition, the lingo has been changed since the book was written; we now use “storage engine” instead of “table type”. The examples that use CREATE TABLE or ALTER TABLE with TYPE may need to be changed to STORAGE ENGINE instead of TYPE.

Finally, you can skip the section on BDB since it has been deprecated (p. 274-5).

Other notes:
At the end of the “Creating Tables” section (p.183-4), it says “We like using the underscore character to separate words, but that’s just a matter of style and taste; you could use underscores or dashes, or omit the word-separating formatting altogether.” While this is true, beware of using a dash, because MySQL will try to interpret “two-words”, thinking – is a minus sign. I recommend avoiding dashes for this reason (even though the book does this on page 215).

At the end of the “Collation and Character Sets” section (p.186), it says “When you’re creating a database, you can set the default character set and sort order for the database and its tables.” Note that the default character set for the server will set the default character set for any new databases created if a default character set is not specified; there is no change in existing databases. In turn, the default character set for the database sets the default character set for any new tables created but does not change any existing tables, and the default character set for a table determines the default character set for each column, which can be overridden by specifying a character set when defining a column.

Under the “Other Features” section it references a section called “Table Types”. This section is in chapter 7, p. 267.

Under the “Other Features” section it shows the SHOW CREATE TABLE example (p. 187). By default, MySQL shows you output in horizontal format – that is, one table row is shown as one row in the output. However, you can have MySQL show you output in vertical format, where one column is shows as one row in the output. Instead of using ; to end a query, use \G

Try it with:
SHOW CREATE TABLE artist;
vs
SHOW CREATE TABLE artist\G

And see the difference.

In the “Column Types” section on page 194, it says that “Only one TIMESTAMP column per table can be automatically set to the current date and time on insert or update.” This is not true as of MySQL version 5.6.5 and higher. As per the documentation at https://dev.mysql.com/doc/refman/5.6/en/timestamp-initialization.html: “For any TIMESTAMP or DATETIME column in a table, you can assign the current timestamp as the default value, the auto-update value, or both.”

In the section called “The AUTO_INCREMENT Feature”, on page 211, it says “If, however, we delete all the data in the table, the counter is reset to 1.” The example shows the use of TRUNCATE TABLE. Note that if you deleted all the data in the table with DELETE, such as “DELETE FROM count WHERE 1=1;”, the counter is NOT reset.

On page 162, it says “In Chapter 6, we’ll discuss how to load data from existing tables or from external data sources.” but this content is in Chapter 8.

On page 169, it has an example of “DELETE FROM played;” which deletes all the rows of a table. This will not work if safe_updates are enabled. I always advise that if you truly want to do that, ALWAYS use a WHERE clause, for example: DELETE FROM played WHERE 1=1;. In this way, your query is self-documenting – it tells the reader that you intended to delete all rows. If you just have “DELETE FROM played” it is possible you meant to put a WHERE clause in but forgot.

Similarly, on page 171, it shows an example where an UPDATE with no WHERE clause is used to update all rows. I prefer the syntax UPDATE tbl SET column=value WHERE 1=1 – this broadcasts your intent.

The task: Find out how many inserts were done to a set of tables over a period of time. Specifically avg inserts/min, avg inserts/hour, avg inserts/day, for the past week. Seems simple, right?

I was surprised that there was no good way to get what was done to a specific table, for data gathering AND forensic purposes!

These tables are be deleted from, so just using rows and timestamps is not valid. We have binary logs, which log when an insert was done. My first thought was to use pt-query-digest, since the tool has a “Tables” section in the output, so it already parses out which tables are in a query.

Sadly, tables is not an attribute I can --filter on, at least not in my version of pt-query-digest. (insert sad trombone here)

My second thought was to see if mysqlbinlog could filter for records only in that table. No such luck.

So, I decided to use awk. With awk I could define a “record” as “separated by the string ‘# at'”, match for table names, and print out the separator and record.

In other words, I could parse binlogs for a specific match, and print the entire record. Because each record is a variable size (maybe the record is 5 lines, maybe it is 7, maybe it is even more!)

'# at' – the record separator string, which is the beginning of each record.

/TABLE1|TABLE2/ – a regular expression matching either TABLE1 or TABLE2 – you may decide you want your string to be something like /FROM TABLE1|FROM TABLE2|JOIN TABLE1|JOIN TABLE2/ – it depends on your queries.

print RS $0 – prints both the record separator (RS) and the record ($0). If you just print the record, the default output record separator (ORS) is the empty string, so you will not get a good file.

| grep -v ^$ – take out blank lines, grep -v means “all lines except” and ^$ is start of line (^) followed by end of line ($). If there are blank lines, pt-query-digest will not work on this file.

From here, filtered_binlog.txt can be parsed and analyzed with tools like pt-query-digest or mysqlbinlog.

Edited to add: Commenters point out that this works for STATEMENT based binlogs. If your binlogging format is ROW, you can use mysqlbinlog –verbose (or -v) to get the reconstructed SQL statement, which includes the table name.

On page 114, it says “For some applications, no combination of attributes can uniquely identify an entity (or it would be too unwieldy to use a large composite key), so we create an artificial attribute that’s defined to be unique and can therefore be used as a key: student numbers, Social Security numbers, driver’s license numbers, and library card numbers are examples of unique attributes created for various applications.” This is known as a “surrogate key”, because the number is a surrogate for the “real” unique way of identification.

On page 131, it says that reverse engineering an ER model from an existing database in MySQL Workbench is in beta testing phase. This function has been stable for a long time, so it’s safe to use.

As you are reading this chapter, if you are wondering how a model relates to reality, know that for the most part, entities become tables and attributes become fields in a table. Relationships may or may not become tables, though for many-to-many relationships, they usually do.

For the homework, DO NOT USE MySQL Workbench to make Entity/Relationship diagrams. It is not a true E/R diagram and you will not be able to show everything you need to (e.g. the symbols for a weak entity do not exist in MySQL Workbench). Use flowcharting software or just draw it on paper and scan it in or take a picture, and submit the scan/picture as the homework

On p. 96 – 97, there is an example of what you will see with SHOW DATABASES; You will likely also see sys, INFORMATION_SCHEMA and PERFORMANCE_SCHEMA databases.

On p. 97, it says “There are some restrictions on what characters and words you can use in your database, table and other names.” These days, it is possible to escape characters and reserved words, but it is still not recommended.

On p. 101-102, there are examples of how to run the MySQL monitor in “batch mode”. It says you can use either “SOURCE” or the redirection operator on commandline (

Using the MySQL monitor, also known as the MySQL command line, including using options

Know what it means to end your statement with the following: ; \c \G \q