Sunday, April 30, 2006

MySQL 5 supports row-level triggers. This means you can have the server execute an SQL (or SQL/PSM) statement just before or after a row is inserted, updated or deleted. Because these triggers are fired in response to an operation on a single row, they are called row-level triggers.

The SQL Standard also defines the syntax and semantics for triggers that are to be fired once for an entire statement, regardless of the number of rows touched by that statement. Unsurprisingly, these are called statement-level triggers.

MySQL does not support statement-level triggers. However, there's a quick hack to emulate BEFORE STATEMENT triggers.

Using the ROW_COUNT function

The hack depends on the ROW_COUNT() function. This function returns the number of rows updated, inserted, or deleted by the preceding statement. An example update statement for the sakila database might illustrate that:
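For instance, something along these lines (the particular table and column are my own choice; any sakila table would do):

```sql
-- Change a single row in sakila, then ask how many rows the
-- immediately preceding statement affected.
UPDATE sakila.film
SET    rental_rate = 4.99
WHERE  film_id = 1;

SELECT ROW_COUNT();  -- number of rows updated by the UPDATE above
```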

Now we can create triggers for each of the three DML events insert, update and delete, and for each of these events, we can write both a BEFORE and an AFTER trigger. So, potentially, we could write up to 3 * 2 = 6 different triggers. Let's start out with the two INSERT triggers and see what happens:
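A sketch of the setup, with a minimal test_row_count table and a log table holding a row_count column (the exact column types and trigger names here are my own assumptions):

```sql
CREATE TABLE test_row_count (
  id INT
);

CREATE TABLE test_row_count_log (
  trigger_time VARCHAR(6)  -- 'BEFORE' or 'AFTER'
, row_count    INT         -- value of ROW_COUNT() at firing time
);

DELIMITER //

CREATE TRIGGER bir_test_row_count
BEFORE INSERT ON test_row_count
FOR EACH ROW
  INSERT INTO test_row_count_log (trigger_time, row_count)
  VALUES ('BEFORE', ROW_COUNT());
//

CREATE TRIGGER air_test_row_count
AFTER INSERT ON test_row_count
FOR EACH ROW
  INSERT INTO test_row_count_log (trigger_time, row_count)
  VALUES ('AFTER', ROW_COUNT());
//

DELIMITER ;
```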

Now let's see what happens when we insert a row into the test_row_count table:

INSERT INTO test_row_count (id) VALUES (1);

Executing this INSERT will fire both the triggers exactly once, so we expect to see two rows inside the test_row_count_log table.

We expect to see one row in test_row_count_log corresponding to the instant immediately before a row was created in test_row_count; we also expect one row corresponding to the moment immediately after the insert on test_row_count occurred. We're most interested in the value stored in the row_count column of the test_row_count_log table, as this would've captured the value returned by the ROW_COUNT() function. So:
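Checking the log (with the column names assumed above):

```sql
SELECT trigger_time
,      row_count
FROM   test_row_count_log;
```

With the single-row INSERT above, the BEFORE row holds -1 and the AFTER row holds 1.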

So, both triggers fired once, and both inserted a row in the test_row_count_log. Actually, it's quite funny seeing the -1 value in the row_count column of the row corresponding to the firing of the BEFORE trigger. I mean, being a count, I'd expect the value 0 rather than -1.

For the row corresponding to the AFTER trigger, the value in the row_count column makes more sense. At this point, we can certainly agree that indeed exactly one row was inserted, and so the value is 1 in this case.

When we execute a statement that inserts multiple rows, we can observe another interesting phenomenon:
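For example, inserting three rows in a single statement and inspecting the log again:

```sql
INSERT INTO test_row_count (id)
VALUES (2), (3), (4);

SELECT trigger_time
,      row_count
FROM   test_row_count_log;
```

Only the first BEFORE row of this statement holds -1; the BEFORE rows for the subsequent rows hold 1.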

Even though the statement fires the BEFORE trigger multiple times, the -1 is returned only for the first row that is touched by the statement. For all subsequent rows, ROW_COUNT() returns a 1 for the row_count column.

Emulating the statement level trigger

This behaviour offers the opportunity to emulate a statement-level trigger. Let's add a level column to the test_row_count_log table and rewrite the BEFORE trigger to demonstrate this:
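A sketch of the rewritten trigger (the level column type and the exact trigger body are my own choices):

```sql
ALTER TABLE test_row_count_log
ADD   level VARCHAR(9);  -- 'STATEMENT' or 'ROW'

DROP TRIGGER bir_test_row_count;

DELIMITER //

CREATE TRIGGER bir_test_row_count
BEFORE INSERT ON test_row_count
FOR EACH ROW
BEGIN
  -- capture ROW_COUNT() once, before any statement inside
  -- this trigger can disturb it
  DECLARE rc INT DEFAULT ROW_COUNT();

  -- ROW_COUNT() is not 1 only for the first row of the statement,
  -- so this branch runs exactly once per statement
  IF rc <> 1 THEN
    INSERT INTO test_row_count_log (trigger_time, row_count, level)
    VALUES ('BEFORE', rc, 'STATEMENT');
  END IF;

  INSERT INTO test_row_count_log (trigger_time, row_count, level)
  VALUES ('BEFORE', rc, 'ROW');
END;
//

DELIMITER ;
```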

Using an IF statement to test the value returned by ROW_COUNT(), we can detect whether the trigger is handling the first row, and if it is, we can do some special work. In this case, we simply insert a row into the test_row_count_log table, marking it by storing the value 'STATEMENT' in the level column.

Some closing notes

The IF statement tests for those cases where the ROW_COUNT() function returns something other than 1, rather than checking explicitly for equality with -1. This is because I suspect that the value of -1 might be a bug, and it wouldn't surprise me if it were changed to 0 in a newer version of the server.

There's another observation that might convince you to be careful with this hack. At least in MySQL versions 5.0.18 and 5.1.7 (the versions I tested), ROW_COUNT() behaves differently when you wrap a procedure around the code that's now inside the triggers. I found that ROW_COUNT() always returns 0 in this case. As far as I can see now, at least three distinct contexts are involved: procedures, triggers and immediate statements. There might be even more of course, but my observations with ROW_COUNT() imply at least these three.

If you plan to use ROW_COUNT() to emulate a statement-level BEFORE trigger, it's probably a good idea to be careful when upgrading, and convince yourself that the code still works in the newer server version.

I also tested this hack for UPDATE and DELETE statements - that worked for me. Of course, if anyone has any new insights, or methods to achieve a statement-level trigger in MySQL, I'd be very interested. Just leave a comment here and tell the world about it.

One of the things that keeps intriguing me is whether it would be possible to emulate an AFTER statement-level trigger. As far as I can see now, this is impossible, as there is no information regarding the total number of rows handled by the statement that fires the triggers. Of course, if you know a hack that does this, I'd be most grateful to hear about it.

Monday, April 24, 2006

As a prelude to the MySQL users conference, Markus and I went to fetch Mike from the airport to do some sightseeing.

Well, it's like this: I'm from a country that's very small in every way. Apart from being small, it's also very flat. Can you imagine what a place like San Francisco does to someone that's been living for over 30 years in such an environment? No?

Check out what happened to us while we were sightseeing San Francisco:

Mike already blogged about us taking the cable car... But he didn't tell you how fast these things go... and what perils it faces when it navigates its way through the other traffic... but lucky for us, they installed a guardian angel to watch over the cable car travellers. After that, we figured that the Golden Gate Bridge just might be a little more relaxing... until you decide to look down!! ...It just makes you wanna reach out... or call for some counseling.

It's probably a no-brainer, but as this is the first time I'm running into this problem, I decided I'd write it down.

I nearly ran out of disk space on the C: disk of my notebook (running Windows there). You know how it goes: I knew I did not have too much space left, but that little... It just couldn't be true. So, I ran a little Windows wizard to clean up temporary files and such, and the next thing that happened:

The good news was that my disk space problems were indeed instantly solved.

Of course, I'm a little bit surprised that it would be possible for Windows to throw away the files associated with the temporary table. I would expect them to be locked by the MySQL server so that it would be impossible for Windows to delete them.

Anyway, I decided that maybe I should not be so lazy and put some effort into not running into this kind of trouble so easily. I've got lots (...well, it's never enough, right?) of space left on my D: disk, so I looked in the manual for some hint on how to specify the location for the MySQL temporary files.

I guess that in my case, MySQL was using the system default, because I do not have a TMPDIR environment variable (I've got a TEMP and a TMP though, neither of which was used). Apparently, C:\WINDOWS\TEMP is the system default on Windows.

I decided that it would be best to set the directory using the --tmpdir option, because I can add it as an entry in the my.ini file, which I can reuse whenever I upgrade the server software. So, I went right ahead, and added this entry to my.ini:

tmpdir=D:\MySQL\temp

I restarted the server, and re-ran the query. Almost instantly, it went:

I like the MySQL reference manual a lot, but especially when I'm looking for some specifics about a topic I already read before, I find that it takes too much time to quickly locate what I need. Knowing which simple server SQL modes are members of which composite mode is one such example. Knowing the order of arguments for a function is another such problem. I hope the quickref section in the wiki will remedy this, and I hope other people will find it useful too.

Sunday, April 16, 2006

One of the features that distinguishes MySQL 5 from its predecessors is the increased ability to control how the server interprets SQL and how strict or relaxed it behaves towards errors. This is done by setting the server SQL mode.

Choosing an appropriate server SQL mode can be an important tool to ease the task of porting from database products like Oracle or Microsoft SQL Server. Despite the terrific documentation in the MySQL Reference Manual, I really felt the need to have some kind of quick reference around to make effective use of the server SQL modes. So, I made one up myself:

It's all as-is of course, but this time I explicitly included a couple of lines saying that the usage is all under the GPL license.

Regarding the information_schema dump, there are some more tools in the works, among which is an XSLT stylesheet that generates human-readable documentation for a database schema in HTML format. As soon as I feel these tools are usable to others, I will make them available too.

There are a lot of articles around that explain that relations (tables) should be normalized (an article by Mike Hillyer will get you started) before you can store them in a relational database. The complete normalization process consists of two steps:

elimination of multi-valued attributes ('at most one value per column')

elimination of redundancy ('no duplication of data')

Although these are two distinct activities that solve different problems, the methods used to perform them have a great deal in common. In both cases, some columns from the original table are moved to a new table, and a foreign key relationship is constructed to be able to relate the two tables.

The goal of the entire normalization process is to make the data as maintainable as possible to ensure integrity and validity of the data. Essentially, that goal is achieved by putting each separate 'thing' in its own table, reducing the amount of effort that is needed to add, remove or modify it.

This usually means that retrieving and presenting data requires more effort than it would if the data were stored in an unnormalized structure. Because data is spread out across multiple tables, answering even simple questions usually involves accessing multiple tables.

More often than not, data presentation involves recreating (but not storing) a redundant structure with repeating groups. This is especially true for human-readable presentations of data (reports etc), but it may also occur when generating flat files from the database for the purpose of data-exchange (for example, XML-documents).

Although it may require considerable effort to query and retrieve a normalized structure to present data in a redundant format with repeating groups, it is doable. It's certainly a lot easier than storing the data in a redundant format and maintaining integrity and validity in case data is added, modified or removed.

GROUP_CONCAT(): repeating groups through aggregation

(Some of you might recall I used this structure in my previous blog entry to support my plea for a new JOIN USING FOREIGN KEY syntax.) This is a normalized structure. There are no multi-valued attributes (no column can have more than one value for a particular row). There is no redundancy either: each column is dependent only upon the primary key of its table.

Suppose you'd want an overview of all the countries and their cities. As described in my previous blog entry, this is easily achieved by JOIN-ing the city table to the country table:

Now, let's rephrase the requirements just a little bit. What if we want an overview of all countries and a list of each country's cities? It resembles the previous overview in that each country is displayed with all of its cities; it's just the presentation that is different. Here, the output is required to use a repeating group: the result itself will be a list of countries, and each country in the list will contain a list of cities itself. That's pretty much the definition of a repeating group:
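Against the country and city tables, such a query could look like this (the semicolon separator is a choice; table aliases are mine):

```sql
SELECT   co.country
,        GROUP_CONCAT(ci.city SEPARATOR ';') AS cities
FROM     country co
JOIN     city    ci
ON       co.country_id = ci.country_id
GROUP BY co.country;
```

This returns one row per country, with all of that country's cities concatenated into a single string.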

This query has two extra elements compared to the previous one: the GROUP BY clause and the GROUP_CONCAT() function.

GROUP BY accepts a comma-separated list of expressions. GROUP BY modifies the returned resultset: all rows from the original resultset that have a distinct combination of values in the list of GROUP BY expressions are bundled (grouped) and returned as one single row in the returned resultset.

Columns from the original resultset that do not appear in the GROUP BY list ('detail' columns, as opposed to 'grouping' or 'summary' columns) should not be used directly in the SELECT list. That's because within one group of the final resultset, multiple distinct values may occur in those columns. Because the expressions in the GROUP BY list are returned as a single row, the resultset simply doesn't have room to return all of them.

(Most rdbms-es consider it an error to have a column in the SELECT list that does not appear in the GROUP BY list. By default, MySQL will accept such a statement. MySQL even returns the correct value if the column has a unique value within the group. Otherwise, the returned value is undefined. This particular behaviour can be overridden using the ONLY_FULL_GROUP_BY server SQL mode. In that particular mode, MySQL too will treat it as an error to have such columns in the SELECT list. See the manual for more info on this subject.)

Although the detail values for each group cannot be accessed in the SELECT list directly, they can be processed by an aggregate function, like MIN, MAX, COUNT, SUM. All these functions operate on a group of objects only to return a single scalar value.

GROUP_CONCAT() is such an aggregate function. The function processes the expression passed to it by concatenating all occurrences within the group together. Apart from the expression value, an ORDER BY clause may be specified to control the order in which the detail expression values are concatenated to each other. Also, a SEPARATOR clause may be passed to specify the string that should be used to separate the multiple entries in the return value.
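For example, to list each country's cities alphabetically, separated by semicolons:

```sql
SELECT   co.country
,        GROUP_CONCAT(
           ci.city
           ORDER BY ci.city      -- order of entries within the group
           SEPARATOR ';'         -- string between the entries
         ) AS cities
FROM     country co
JOIN     city    ci
ON       co.country_id = ci.country_id
GROUP BY co.country;
```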

Adding a level

Ok, we've seen that in MySQL, generating the repeating group in the query result is really very easy. Now, suppose the requirements for the query change again. This time, we would also like to see the list of customers living in each city. First, let's take a look at the relevant structures in the database schema:

If we would not require the repeating groups in the output, the solution is a quite straightforward extension of the first country/city query:
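Something like this, joining city to address and address to customer (column choices for the customer name are mine):

```sql
SELECT co.country
,      ci.city
,      cu.first_name
,      cu.last_name
FROM   country  co
JOIN   city     ci ON co.country_id = ci.country_id
JOIN   address  a  ON ci.city_id    = a.city_id
JOIN   customer cu ON a.address_id  = cu.address_id;
```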

However, we run into problems when we try to think of an appropriate GROUP BY clause and GROUP_CONCAT() expression to generate the desired repeating groups.

If we GROUP BY the country.country column like we did before, we would get one summary row for the country, exactly as required. However, what remains is a set of city/customer combinations. Although we could process those with GROUP_CONCAT(), that will never give us the desired result. The repeating group generated by GROUP_CONCAT() would have an entry for each city/customer combination corresponding to the United Kingdom, and that is not what we want: what we want is a repeating group of cities per country and, per city, a repeating group of customers.

So, we need two different repeating groups occurring at two different groupings or levels. This means that somehow, our query needs two corresponding GROUP_CONCAT() calls as well as two GROUP BY clauses. To achieve that, we need to have a '..query inside a query..' that we can equip with its own GROUP BY clause so it can generate its own repeating group.

A 'query inside a query' is called a subquery. Inside a SELECT expression, a subquery can appear in three different places: in the WHERE clause, in the FROM clause, or in the SELECT list. Let's stick as close as we can to our previous attempt, JOIN-ing all relevant tables, and write a subquery in the FROM clause.

Well, it's not really hard to see what parts we must place in the subquery. We had the repeating group for the city table in place, all we did since then is add the address and customer tables. These two tables really are all that is required to generate the repeating group of customers per city:
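A sketch: the subquery produces one row per city with its repeating group of customers, and the outer query then folds the cities into a repeating group per country (formatting choices and aliases are mine):

```sql
SELECT   co.country
,        GROUP_CONCAT(
           CONCAT(ci.city, ':\n', ci.customers)
           SEPARATOR '\n'
         ) AS cities
FROM     country co
JOIN     (
           -- one row per city, with a nested repeating group
           -- of that city's customers
           SELECT   ci.country_id
           ,        ci.city
           ,        GROUP_CONCAT(
                      CONCAT('  ', cu.first_name, ' ', cu.last_name)
                      ORDER BY cu.last_name
                      SEPARATOR '\n'
                    ) AS customers
           FROM     city     ci
           JOIN     address  a  ON ci.city_id   = a.city_id
           JOIN     customer cu ON a.address_id = cu.address_id
           GROUP BY ci.country_id
           ,        ci.city
         ) ci
ON       co.country_id = ci.country_id
GROUP BY co.country;
```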

(In this query, some adjustments are made to the separators. Specifically, some newlines and indenting were added to make the result more readable. It's still a repeating group, but this time a newline is output between cities instead of the semicolon used in previous examples.) This is the result:

Summary

The MySQL aggregate function GROUP_CONCAT() generates a repeating group for the string expression passed as argument.

The items that appear in a repeating group can themselves be repeating groups. Nesting of repeated groups is achieved by writing distinct queries for each distinct repeating group. These distinct queries are then joined (as subqueries) to relate the repeating groups to each other.

I call these two forms of the join syntax intelligent because they imply a non-trivial join condition. This means that such a join operation will relate the records from the two table expressions based on some criterion without requiring that the criterion be specified in each and every detail. This is unlike the other variants of the JOIN syntax: to be meaningful, these require that the join condition, in the form of a boolean expression, be associated with the JOIN operation using an ON clause.

Relationships, Foreign Keys and Joins

Here, we have a table country that has a one-to-many relationship with city. The relationship conveys the fact that a city is situated inside a particular country (I won't be discussing the validity of that here). The concept of a relationship in itself has nothing to do with databases or computers or whatever. It merely has to do with the way we, as humans, perceive the world around us. 'Relationship' is a conceptual term.

In a relational database, relationships are implemented using foreign keys. Here in the example, the city table (the 'many' end of the relationship) has a country_id column. For a particular record from the city table this column stores one of the values found in the country_id column of the country table (the 'one' end of the relationship).

The country_id column in the country table stores only unique values: this is enforced using a primary key constraint. A foreign key constraint on the city table ensures that the country_id column in the city table may only contain values that are in fact present in the country_id column of the country table.

An arbitrary record from the city table refers to exactly one particular record in the country table by storing a value from the country table's key, country_id. That's why it is called a foreign key: the city table stores values that act as key values (identifying values) in another table, in this case the country table.

So, a foreign key is an implementation of a relationship in a relational database: foreign key is an implementation term. (In fact, it is the most common way to implement a relationship.) Usually, foreign keys are actively enforced by declaring a foreign key constraint. A foreign key constraint declares the exact column mapping involved in the reference, and it makes the database management system prevent any data from being entered, removed or modified that might violate any of the references.

Joins

Now, suppose we want to have a list of all the city names together with the name of the country in which each city is situated. Such a list can be conveniently produced using a JOIN operation. In a join operation, records from different tables are combined to create new (virtual) records. So, joining is a (runtime) processing term. Usually, the join operation combines records that have some sort of correspondence; that is, records that maintain some kind of relationship with one another.

To understand the result of the JOIN operation we can imagine that the database server performs the following actions (Caution: the following 'algorithm' does not illustrate what a database server actually does to obtain the result of the JOIN operation - it only illustrates what the result will look like):

For each record from the city table, all records from the country table are retrieved. A new record is constructed consisting of all the fields from the 'city' record and all the fields from the 'country' record, thus yielding all combinations of city and country records: the cartesian product. These newly constructed records are temporarily stored in a virtual table.

A filter is applied to the virtual table, only retaining those records that have equal values for both country_id fields

For each record in the filtered table, create a new record consisting only of the city and country fields (projection), and return those records as the final resultset

It is in fact possible to write a SELECT expression that corresponds quite literally to this conceptual algorithm:
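Taking the city and country example (aliases are mine):

```sql
SELECT ci.city
,      co.country
FROM   city ci
CROSS JOIN country co              -- step 1: cartesian product
WHERE  ci.country_id = co.country_id;  -- step 2: filter
```

The SELECT list performs the projection of step 3.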

Instead of the CROSS JOIN operator, one can also write a comma (,). However, because the comma denotes a separator when used in the SELECT list, it's better to disambiguate using the keywords.

Now, the WHERE clause contains the condition that forms a criterion for filtering the records. Our condition is made up of only one element: the column comparison that specifies that only corresponding records should be retained. However, the WHERE is a general container for such conditions: we may write any kind of condition there. And, we should if we need to apply extra criteria. Suppose we want to have the list only for those countries of which the name starts with a capital A. Then, our WHERE clause could be extended to look like this:
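For example:

```sql
SELECT ci.city
,      co.country
FROM   city ci
CROSS JOIN country co
WHERE  ci.country_id = co.country_id  -- join condition
AND    co.country LIKE 'A%';          -- extra filter criterion
```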

So, both the join condition - the condition that is supposed to retain the related records - and our extra criterion appear mixed together in one big WHERE condition. This style of joining is called a theta join.

Another way of putting it is to say that this is an unqualified join: the join condition does not seem to be associated with the join operation itself; it just happens to be a criterion that requires columns from both tables in the join operation to share equal values.

Qualified Joins

It's generally considered better to avoid the theta join style in favour of the ANSI join style. The ANSI join style requires the join condition to be directly associated with the join operation: we say that the join is qualified with the condition.

The join condition is separated from the join operation by a single ON keyword only:
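For the city/country list, the qualified form reads:

```sql
SELECT ci.city
,      co.country
FROM   city ci
JOIN   country co
ON     ci.country_id = co.country_id;  -- join condition, qualified with ON
```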

We can modify the 'conceptual algorithm' presented earlier to match this more closely. (Please bear in mind that this 'algorithm' still has nothing to do with how a server might actually implement it. Also, note that this has no bearing at all on our understanding of the final resultset: the final resultset here is indistinguishable from the one we had earlier):

For each record from the city table, loop through all records from the country table. A new record is constructed consisting of all the fields from the 'city' record and all the fields from the 'country' record, but only if the values of the country_id column of both records are exactly equal. These newly constructed records are temporarily stored in a virtual table.

A filter is applied to the virtual table, only retaining those records that match the WHERE clause

For each record in the filtered table, create a new record consisting only of the city and country fields (projection), and return those records as the final resultset

From the developer's point of view, the advantage of the ANSI join syntax is that all the elements that have to do with relating the two tables can be written on consecutive lines. This makes it much easier to maintain the code if the database structure changes, or to remove the join.

Intelligent Join constructs

There are two particular forms of qualified join syntax that do not require the precise join condition to be specified. Rather, this syntax relies on some kind of rule that implies the precise join condition. The actual join condition itself is derived from the rule, and never entered literally.

Natural Join

The natural join implies a join condition that requires that equally named pairs of columns belonging to the table expressions appearing on the left and right side of the JOIN operator have equal values. In our example, two such pairs are present: both tables have a column named country_id and a column named last_update. Therefore, a natural join between the country and the city table implies a join condition that requires that the columns from the country_id and the last_update pairs have equal values. So, this:
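```sql
SELECT city.city
,      country.country
FROM   city
NATURAL JOIN country;
```

...implies the same join condition as this explicit form:

```sql
SELECT city.city
,      country.country
FROM   city
JOIN   country
ON     city.country_id  = country.country_id
AND    city.last_update = country.last_update;  -- implied, unwanted
```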

This example immediately illustrates an important limitation of the NATURAL JOIN construct. We know that the city table refers to the country table by means of the country_id foreign key. So, the implied join condition is 'half-right' in requiring equality for the values in the country_id columns. However, because both tables have a last_update column, equality is required for these columns too. This is of course total nonsense: the last_update column is a TIMESTAMP column, and it will contain the date and time of the moment the record was last updated (or inserted). Any equality found for these columns will be a matter of coincidence, so although the columns can be related to each other, this does not constitute a relationship.

One could of course argue that this natural join does not check out because the table definition is flawed. If the last_update columns had been given a name that is more specific to the table to which they belong, the problem would not have occurred, and we could've used the natural join. However, I feel that is not a very good argument.

If we were to go along with this, we would be forced to rename all columns systematically in order to avoid accidentally occurring identical column names. Although we could limit ourselves to renaming only the columns that are unintentionally included in the join conditions, this would introduce an amount of inconsistency in the identifier set, which is undesirable because it makes it harder to code against the schema. Also, the columns that we don't rename now could become a problem later on when we add new tables and columns to the schema, hampering further development. So we really need some sort of systematic approach towards picking identifiers. For example, all columns that 'truly' belong to a table could be given a prefix that is derived from the table name. Although this seems doable, it is rather impractical. It would yield long column names - for some rdbms-es, a column name can easily become too long (Oracle has a maximum of 30 characters, MaxDB has a maximum of 32).

Named columns join

The other 'intelligent' join construct is the named columns join. Like the natural join, the named columns join implies a join condition based on requiring equal values for pairs of identically named columns. The difference with the natural join is that the column pairs that should be implied in the join condition must be specified explicitly with a USING clause.

Because the column pairs must be explicitly specified, we can avoid unrelated column pairs that happen to have identical names to be implied in the join condition. This solves the problem with the last_update column we just discussed for the natural join between the city and the country table:
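Because only country_id is named in the USING clause, the last_update pair stays out of the implied join condition:

```sql
SELECT city.city
,      country.country
FROM   city
JOIN   country
USING  (country_id);  -- implies city.country_id = country.country_id
```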

Although the named columns join is thus slightly more useful than the natural join, it still poses a problem. No matter how we name our columns, we can never find a satisfactory solution for those cases where a particular table has more than one foreign key referring to one key in another table. In that case, each foreign key will need its own set of columns, and only one of these can use column names identical to the names of the referenced columns. The natural join as well as the named columns join can be used for joining along one relationship only. Joining along the other relationships always involves writing an explicit join condition.

All in all, I think that both the natural join and the named columns join constructs are not very useful. I never use them, because they would introduce inconsistency in the code. I feel that similar tasks should be solved in similar ways to avoid confusion, and because you will potentially need to write explicit join conditions anyway, I prefer to do so right away, even if a particular case would allow for a natural or named columns join.

What about a foreign key join?

Nevertheless, I think that the concept of an implied join condition is really elegant and charming if it could be used generically. Both the natural join and the named columns join cannot be used generically because they infer the join condition based on equality of the column names. Clearly, equality of column names just isn't the right sort of thing to use to infer join conditions.

In a lot of cases, we want to join along the columns of a foreign key, regardless of the actual column names. If we enforce the foreign key with a constraint, the rdbms could use that information to infer a join condition. So I think it's really amazing to discover that neither the SQL Standard nor any of the more renowned rdbms-es supports a syntax that actually takes advantage of this.

I do know a development environment, USoft, that actually supports such a construct. I don't know if they still support it, but when I used it, you could write the example join something like this:

SELECT city.city , country.country FROM city RELATE country WITH city_resides_in_country

So, the foreign key name city_resides_in_country is used to specify the join condition. A variant that resembles the usual named columns join syntax could look like this:

SELECT city.city , country.country FROM city JOIN country USING city_resides_in_country

The nice thing about this syntax is that it solves all the problems described with the natural join and the ordinary named columns join. As a bonus, our code has become resilient to structural changes of the keys: If we were to add a column to the foreign key in the city table and corresponding primary key in the country table, our joins would not need to be modified because they never refer to a column name.

Anyone to point me in the direction of a rdbms that supports this kind of syntax?

Thursday, April 06, 2006

I'm using DBDesigner 4 quite a lot. I think the successor, MySQL Workbench, is very promising, but it does not meet my requirements (yet).

Having DBDesigner Generate SQL

Those that've worked with DBDesigner have probably noticed the particular strategy it uses to generate SQL code from the model. First of all, the user needs to specify what kind of SQL script is to be generated: a DROP, CREATE or optimization script. Then, DBDesigner does its thing. In case of a DROP or CREATE script, exactly one statement is generated for each table in the diagram (or in the current selection if you choose to).

In case of the CREATE script, each CREATE TABLE statement contains not only the column definitions, but also the PRIMARY KEY, UNIQUE and FOREIGN KEY constraint definitions corresponding to that table.

DBDesigner tries to put the statements into a particular sequence in order to end up with a script that can be run immediately without generating errors due to unresolved dependencies. For example, if a particular table T1 has a foreign key referencing another table T2, DBDesigner will place the CREATE TABLE statement for T1 after the CREATE TABLE statement for T2. For the DROP script, the sequence is reversed for the same reasons.

DB Designer Problems

Usually this all works out fine. However, occasionally this approach leads to problems. Consider this example:

Here, we have two tables that are referencing each other. It's not that common, but I think the example I made up here is not that artificial. My real-world case is a little bit more complicated, but the relationships are equivalent.

Well, it does not take long to figure out that a structure like this can never be realized using only CREATE TABLE statements. At the least, two CREATE TABLE statements and one ALTER TABLE ... ADD CONSTRAINT statement are needed: the tables must be created first, and only then can the remaining dependency (either one of the foreign key relationships) be added.
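To illustrate, here's a minimal sketch using two made-up tables t1 and t2 (the table, column and constraint names are my own; the actual model differs):

```sql
-- t1 and t2 reference each other, so neither foreign key can be
-- declared before both tables exist. Create t2 without its foreign
-- key, then t1 with its foreign key inline, then add the remaining
-- constraint with ALTER TABLE.
CREATE TABLE t2 (
  id    INT NOT NULL PRIMARY KEY,
  t1_id INT
) ENGINE=InnoDB;

CREATE TABLE t1 (
  id    INT NOT NULL PRIMARY KEY,
  t2_id INT,
  CONSTRAINT fk_t1_t2 FOREIGN KEY (t2_id) REFERENCES t2 (id)
) ENGINE=InnoDB;

ALTER TABLE t2
  ADD CONSTRAINT fk_t2_t1 FOREIGN KEY (t1_id) REFERENCES t1 (id);
```

No ordering of pure CREATE TABLE statements can avoid the ALTER TABLE step here, which is exactly what trips up DBDesigner's one-statement-per-table approach.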

DBDesigner seems to take a rather principled point of view here: it decides not to generate any SQL at all. This is, BTW, rather nasty, because if you were unlucky enough to specify the hitherto working script file as the target for the SQL export action, you will discover that your script now amounts to 0 bytes...ouch!

There is another reason why DB Designer is not so good anymore for generating code. Actually, it has nothing to do with DB Designer itself; rather, it has to do with MySQL 5.1.7.

MySQL 5.1.7 does not accept the previously deprecated TYPE= option in CREATE TABLE statements. (You must use ENGINE= instead nowadays.)
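For example (a trivial sketch, not DBDesigner's actual output):

```sql
-- Rejected by MySQL 5.1.7 and later (TYPE= was deprecated, now removed):
CREATE TABLE t (id INT) TYPE=InnoDB;

-- Accepted:
CREATE TABLE t (id INT) ENGINE=InnoDB;
```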

Now, don't get me wrong: I'm not complaining - I think DBDesigner is a mighty cool tool, and I like it very much. It's just that these little quirks do take some time to adjust to, and occasionally they cost time too. And of course, DB Designer never could've foreseen that the TYPE= option would be disallowed in the future.

How to cope

All this just means that we need to think of a trick to keep on using the DB Designer GUI while generating the SQL in another way. Well, this is actually not too hard. DB Designer saves its model in an XML format, and XML is easily parsed and translated using XSLT.

So, that's what I did. I will discuss it here briefly. If someone needs it, feel free to use it. Of course, you will need a utility to apply the XSLT stylesheet to the DB Designer XML output, but there are literally hundreds of tools that do that for you - most of them freely obtainable. (I use a little MSXML-based JavaScript utility I - for no particular reason - hacked up myself.)

The entry point for the transformation is formed by a single matching template. The template's contents are controlled by a global parameter value (which is set from the environment when invoking the XSLT processor):

What happens is that the XSLT processor will start off with the node that represents the entire document - I mean the document that contains our DB Designer model. Looking in the XSLT stylesheet, the processor will discover that this template matches the document node: the match attribute on the xsl:template element has the XPath expression / as its value, which matches the document root. So, the contents of our template are processed.

The xsl:choose element selects one of the options according to the value of the PARAM_SCRIPT_TYPE parameter. If the value equals DROP, the named templates drop-foreign-keys and drop-tables are executed. If it equals CREATE, then create-tables and create-foreign-keys are executed. Well, you can imagine what these do, can't you? That's right, these generate the SQL statements to drop foreign keys, drop tables, create tables and create foreign keys. Of course, this sequence of executing the DDL statements can never offend any dependencies: dropping a table can never fail because there are no more foreign keys at that point that might prevent it, and creating a foreign key can never fail because all the tables have already been created.
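The entry template could look something like this sketch (the template names and the PARAM_SCRIPT_TYPE parameter are as described above; the rest of the stylesheet scaffolding is filled in by me):

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Set from the environment when invoking the XSLT processor -->
  <xsl:param name="PARAM_SCRIPT_TYPE" select="'CREATE'"/>

  <!-- Entry point: matches the document node -->
  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="$PARAM_SCRIPT_TYPE = 'DROP'">
        <xsl:call-template name="drop-foreign-keys"/>
        <xsl:call-template name="drop-tables"/>
      </xsl:when>
      <xsl:when test="$PARAM_SCRIPT_TYPE = 'CREATE'">
        <xsl:call-template name="create-tables"/>
        <xsl:call-template name="create-foreign-keys"/>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```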

Well, the rest is just work: putting together little XPath queries and having the data they retrieve drive static template text. For convenience, I declared two global variables: one nodeset containing all tables in the model and one containing all foreign keys:
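Something along these lines (the element names in the select expressions are assumptions about the DBDesigner model format; check them against your own saved model):

```xml
<!-- Global variables; adjust the paths to match the actual
     DBDesigner XML element names in your model file. -->
<xsl:variable name="tables"
  select="//METADATA/TABLES/TABLE"/>
<xsl:variable name="foreign-keys"
  select="//METADATA/RELATIONS/RELATION"/>
```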

The drop-foreign-keys template itself just loops through the nodes in the nodeset stored in the global foreign-keys variable, passing each node through to the drop-foreign-key template. That template does the actual work of generating the ALTER TABLE statement.
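Sketched out, that pair of templates could look like this (the attribute names I use to look up the table and constraint names are assumptions, not taken from the actual model format):

```xml
<xsl:template name="drop-foreign-keys">
  <!-- Loop over the global foreign-keys nodeset; inside xsl:for-each,
       the RELATION node is the context node for the called template. -->
  <xsl:for-each select="$foreign-keys">
    <xsl:call-template name="drop-foreign-key"/>
  </xsl:for-each>
</xsl:template>

<xsl:template name="drop-foreign-key">
  <!-- @DestTableName and @RelationName are assumed attribute names -->
  <xsl:text>ALTER TABLE </xsl:text>
  <xsl:value-of select="@DestTableName"/>
  <xsl:text> DROP FOREIGN KEY </xsl:text>
  <xsl:value-of select="@RelationName"/>
  <xsl:text>;&#10;</xsl:text>
</xsl:template>
```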

Getting to know where all the info is stored in the DBDesigner model is just a matter of common sense and some trial and error. Anyway, it does not take much effort. The entire stylesheet cost me about two hours to write, and that includes getting to know the source format (ignoring, of course, all the stuff in there I don't need at this point).

About Me

My name is Roland Bouman. If you like you can follow my @rolandbouman feed on twitter.
I'm a Web- and BI Developer and Information Analyst. I'm currently working as a Software Engineer for Pentaho, a world-leading Open Source Business Analytics Suite and Big Data Platform. In the past, I have worked for Inter Access, MySQL AB, Sun Microsystems and Strukton Rail.
I authored two books: Pentaho Solutions (Wiley, ISBN: 978-0-470-48432-6) and "Pentaho Kettle Solutions" (Wiley, ISBN: 978-0-470-63517-9).
I'm proud to be elected Oracle Ace for my MySQL expertise.