Tuesday, December 15, 2009

The latest MySQL 5.5 milestone release offers support for an ANSI/ISO standard feature called the SIGNAL syntax. You can use this syntax inside stored routines (including triggers) to raise an error condition which can be used to invoke specific error handling, or otherwise abort the stored routine. In addition, you can use the SIGNAL syntax to convey information about what went wrong, which may be used by the caller to handle the error.

Triggers

For those of you that are unfamiliar with the subject, MySQL offers support for triggers as of version 5.0. Triggers are stored routines that are executed automatically right before or after data change events like a row being inserted, updated or deleted. Because triggers are executed as part of the SQL statement (and its containing transaction) causing the row change event, and because the trigger code has direct access to the changed row, you could in theory use them to correct or reject invalid data.

Example data validation problem

Let's take a quick look at the following example. Suppose you have a table called person to store data about persons:

We require the first letter of the value for the initials column to match the first letter of the value of the first_name column.

In addition, we require that the values for the initials column consists of uppercase letters separated by periods.

In order to implement this, we design the following algorithm:

If the value for first_name is NULL, we do nothing. The NOT NULL table constraint will prevent the data from being entered anyway, so further attempts at validation or correction are pointless.

If the value for initials is NULL, we correct it by automatically filling in the first character of the value for first_name.

If the values for first_name as well as initials are both not NULL, we require that the first character of the value for first_name equals the first character of the value for initials.

Finally, we use a regular expression to check if the value for initials matches the desired pattern of uppercase letters separated by periods.

A data validation stored procedure

Let's start by creating a stored procedure to perform this algorithm. Here's the code for the p_validate_initials procedure which validates and possibly corrects the initials value based on the value for first_name:

IF p_first_name IS NULL THEN LEAVE _main; -- nothing to validate ELSEIF p_initials IS NULL THEN -- initials are NULL, correct: SET p_initials := CONCAT(LEFT(p_first_name, 1), '.'); -- take the first letter of first_name SIGNAL WARN_CORRECTED_INITIALS -- warn about the corrective measure SET MESSAGE_TEXT = 'Corrected NULL value for initials to match value for first_name.'; ELSEIF BINARY LEFT(p_first_name, 1) != LEFT(p_initials, 1) THEN -- initials don't match first_name SIGNAL ERR_INITIALS_DONT_MATCH_FIRSTNAME -- raise an error SET MESSAGE_TEXT = 'The first letter of the value for initials does not match the first letter of the value for first_name'; END IF; IF NOT p_initials REGEXP '^([A-Z][.])+$' THEN -- if initials don't match the correct pattern SIGNAL ERR_INITIALS_ILLFORMATTED -- raise an error SET MESSAGE_TEXT = 'The value for initials must consist of upper case letters separated by periods.'; END IF;END;go

DELIMITER ;

Let's take a look at how this procedure works.

How to issue warnings

First, let's pass NULL for the initials to see if they are properly corrected:

So, in this case, we used the condition name WARN_CORRECTED_INITIALS as condition_value. This condition is declared in the top of the procedure:

DECLARE WARN_CORRECTED_INITIALS CONDITION FOR SQLSTATE '01000';

Basically, condition declarations like these serve simply to tie a human readable name to otherwise obscure SQLSTATE values. As per the SQL standard, SQLSTATE values are strings of 5 digits. The prefix 01 indicates a warning.

(Condition declarations like these are not only useful to clarify the meaning of your SIGNAL statements, you can also use them to declare error HANDLERs)

An alternative syntax for SIGNAL allows you to directly refer to the SQLSTATE without explicitly declaring a CONDITION. So, if you feel that declaring explicit conditions is too much trouble, you can also omit that and write:

SIGNAL SQLSTATE '01000' ...

(However, I like using explicit condition names better because it does a better job of explaining the intention of the code.)

Conveying SIGNAL context information

The SIGNAL statement also features a SET-clause which is used to convey signal information. In our example the set clause was:

SET MESSAGE_TEXT = 'Corrected NULL value for initials to match value for first_name.';

As you can see, the SET-clause in the example contains an assignment. In the context of the SIGNAL statemntt, such an assignment is referred to as signal_information. The left hand side of the assignments must be one of the predefined condition_information_items. The SET-clause can have multiple of these signal_information items which can be used to capture and communicate program state to the client program.

In the case of the example we can demonstrate how this works using the MySQL command-line client. By issuing a SHOW WARNINGS statement, we can see the message text was conveyed with with the signal_information item:

Unfortunately, MySQL does not currently support a way for any calling stored routines to capture the signal_information items. They are currently only available in the APIs you use to communicate with MySQL, so you can capture them in your application code.

(A bug has been filed to ask for the ability to refer to signal information items in stored routines. This should become available whenever MySQL implements a DIAGNOSTICS feature)

Predefined condition information items

I just mentioned that the left-hand side of the signal_information item assignment must be one of the predefined condition_information_items. These are dictated by the standard, and although MySQL allows all of the standard condition_information_items, only two of them are currently relevant: MESSAGE_TEXT and MYSQL_ERRNO. We already illustrated using MESSAGE_TEXT. The MYSQL_ERRNO is a non-standard condition information item that can be used to convey custom error codes.

This leaves currently three variables to convey information about the context of the SIGNAL statement:

SQLSTATE: available in the C API as mysql_sqlstate()

MYSQL_ERRNO: available in the C API as mysql_errno()

MESSAGE_TEXT: available in the C API as mysql_error()

(You should be able to obtain the information also in PHP through the corresponding mysqli_, pdo_ and mysql_ functions.)

How to issue errors

We just discussed how to cause your stored routine to issue warnings. Issuing errors is exactly the same process, it just relies on a different class of SQLSTATE values (as determined by the code prefix). Let's see the errors in action:

mysql> set @initials := 'r';Query OK, 0 rows affected (0.00 sec)

mysql> call p_validate_initials('Roland', @initials);ERROR 1644 (45000): The first letter of the value for initials does not match the first letter of the value for first_name

In this case, the stored routine ran through the last branch of the first IF statement:

ELSEIF BINARY LEFT(p_first_name, 1) != LEFT(p_initials, 1) THENSIGNAL ERR_INITIALS_DONT_MATCH_FIRSTNAME SET MESSAGE_TEXT = 'The first letter of the value for initials does not match the first letter of the value for first_name'; END IF;

As you can see, the syntax for the actual SIGNAL statement is exactly similar to what we saw in the example illustrating warnings. The most important difference is that in this case, the condition that is being signalled is declared with a SQLSTATE value of 45000:

The SQLSTATE code 45000 is a special and indicates a general user-defined exception.

Using the procedure in a trigger

All we need to do now is create our triggers on the person table that call the procedure to perform the actual validation. We need to apply the validation when data is inserted into the table, but also when data is updated. If it turns out the data is invalid, we need to reject the change. For this reason, we want to create triggers that fire before the data change is applied to the table.

So, to enforce validation, we need two triggers: one that fires BEFORE INSERT events, and one that fires BEFORE UPDATE events. Because the validation process itself is the same regardless of the type of change event, both triggers can call the p_validate_initials procedure to perform the actual validation. This allows us to write (and maintain!) the validation logic only once, and reuse it whenever we need it.

A quick check indicates that data validation is now enforced as intended:

mysql> INSERT INTO person (id, first_name, last_name, initials) -> VALUES (2, 'Roland', 'Bouman', 'r');ERROR 1644 (45000): The first letter of the value for initials does not match the first letter of the value for first_name

This is a great article and I look forward to using SIGNAL/RESIGNAL. One problem I see in relying on these procedures is that since mysqld does not support a comprehensive ALTER PROCEDURE or atomic CREATE OR REPLACE PROCEDURE syntax, maintenance of procedures requires separate DROP and CREATE commands. This then requires that you block writes to these tables while the procedure is being re-made, otherwise they could throw errors.

This old outstanding bug/feature really should get more attention: http://bugs.mysql.com/bug.php?id=9588

Roland,Not sure about routines, but definitely won't work with triggers, as my experience tells. While LOCKing tables, triggers cannot be dropped; being the late hour it is, I don't have the link to the bugs system; but this is reported.

Yea, doing all the validation with a trigger works but also results in a lot of duplication for every table that needs to use the same validation. BTW, #49744 is submitted to address the inconsistent system tables locking behavior.

I suppose a work-around (and probably smarter solition) is to:

create a new procedure with a different name (p_validate_initials_v2), lock the table, modify the trigger to point to the new procedure, unlock tables, drop the old procedure or keep it around if you need to roll back.

I can post a comment on it, and then the support team can figure out if the bug was fixed or if there is another bug thatt accidentally allows me to drop the trigger while the table is locked (I hope that's not the case though!)

" reject this update until i accept it to be added to the table. When accepted it comes in the table"

So you want it to be rejected, but also to be remembered so it can be added later on?

Make a new table similar to the original target table. in the before insert / before update trigger, check the value. If it's valid do nothing, if not, store the row in the reject table, and fire a warning. Note that you cannot fire an error since that would not allow the current transaction to be committed (in which case you can't store the reject).

Create another trigger on the parent table that holds the values you want to check, and use that to move the rejects that are now accepted to the target table.

Search This Blog

About Me

I'm @rolandbouman, a Web- and BI Developer and Information Analyst. I have worked for MySQL AB, Sun Microsystems and I'm currently working as a software engineer for Pentaho (a Hitachi data systems company).

Together with Jos van Dongen I wrote a book called "Pentaho Solutions" (Wiley, ISBN: 978-0-470-48432-6, 630+ pages). This book is intended for people that want to get started with Business Intelligence and provides lots of practical examples to work with the open source Pentaho Business Intelligence Suite.

Together with Matt Casters and Jos van Dongen, I authored another book for Wiley called "Pentaho Kettle Solutions" (750+ pages, Wiley, ISBN: 978-0-470-63517-9). This book is more specialized and focuses on Pentaho data integration (Kettle) and ETL.