Wednesday, September 28, 2005

Like all procedural database languages I know, the MySQL stored procedure language supports explicit cursors. I just wrote "explicit cursor" but for I'll refer to those by just saying "cursor".

A cursor allows a programmer to traverse a set of records retrieved by a relational database query in a sequential, one-by-one fashion. As such, a cursor forms an interface between the relationally oriented, declarative SQL language and the procedurally oriented calling context. A database cursor is like a pointer positioned somewhere in a resultset which can be moved programmatically to browse through all the records in the resultset. So, that's quite analogous to the wordprocessor cursor you use to browse through the characters and words on your computer screen.

(For those that are wondering what an implicit cursor is - well, that's just a piece of memory used by the database server to work with resulsets internally. Implicit cursors are not accessible via an externally exposed API, whereas explicit cursors are.)

Lots of php programmers are probably familiar with the cursor concept:

The cursor interface is really convenient when you need to do complex resultset processing. Processing a resultset is quite common in a client program, especially when the resultset needs to be displayed or printed, but somwtimes, it can be quite useful to do resultset processing on the database server side.

In most cases, purely data-oriented problems can usually be solved without the explicit resulset processing provided by cursors. It is often much better to use implicit resulset processing using 'ordinary' SQL statements. In fact, ordinary SQL statements are very powerful, and should be used whenever it's possible. However, in some cases, the cursor interface is easier to write and maintain, and in some cases, there's no ordinary SQL substitute for the cursor interface.

Let's think of an example that is screaming for explicit resultset processing.

Suppose we have a table of employees, emp and a stored procedure p_calculate_salary. Now, assume that this stored procedure contains all the logic and business rules we need to calculate an employee's salary for a given month, including benefits, bonuses, and witheld fines for taking part in health or educational plans. What's more, the procedure does not only calculate all that, but also makes the necessary changes elsewhere in the system, trigger autonomous processes such as sending each employee it's monthly and (when applicable) annual salary reports.

For those that are still not convinced that all this logic should be embedded in a procedure, think of the following scenario. Suppose that the user that needs to calculate the salaries is not authorized to directly access the data that is needed to perform the business logic implemented by the procedure, and that the procedure is created using SQL SECURITY DEFINER. In that case, there really is no way to solve this using ordinary SQL. Resultset processing using a cursor is really the only way in this case.

To conveniently do the salary run for a particular month, we could use a cursor to traverse the list of employees, getting the relevant employee data and plugging that into the procedure. This type of use case is typical for a cursor: We do not need any direct output, so an ordinary SELECT is not necessary, spurious even.

Also, we cannot call a procedure in an ordinary SELECT, and it is not always possible to wrap all the business rules into a function (which could be used inside a SELECT).

This salary run is all back-end stuff. We do not want some external script doing this. We want to keep all those business rules and logic as closely tied to the database as we can, especially when the internals of the procedure need to refer to the database. So, that's why the database procedure is really the solution we want to use to calculate the salaries. Once we admit to that, we really cannot avoid cursor-based resultset processing anymore. So let's dig into some resultset processing...

The pseudocode for this operation would look like this:

for all records in the employee table do: get the current employee record calculate salary for the current employee record until all employee records are processed.

The ingredients are about the same for both examples. Both Oracle and MySQL open the cursor, fetch from it, and close it when it runs out of records.

The difference between MySQL and Oracle boils down to the difference in the way the loop control is handled.

In Oracle you could use a so-called cursor attribute. Cursor attributes are essentially builtin, readonly variables associated with the cursor. That's actually quite like member variables encountered in object oriented languages. To refer to a particular cursor attribute, you use this syntax: cursor_name%attribute_name. Oracle defines the following cursor attributes: FOUND, NOTFOUND, ROWCOUNT and ISOPEN.

In our snippet, we used the NOTFOUND attribute to check wheter the cursor was exhausted in order to exit the loop. We also used the ISOPEN attribute in the general catch-all WHEN OTHERS exception handler wich allows us to explicitly close the cursor when it was left open when an exception occurred in the inner block.

Although MS SQL Server's Transact SQL dialect does not define cursor attributes, there's a device there used to write cursor loops that in practice resembles the Oracle FOUND and NOTFOUND cursor attributes: the @@FETCH_STATUS (session wide) global variable. Theres a similar global resembling the Oracle ROWCOUNT attribute: @@CURSOR_ROWS.

The difference with Oracle cursor attributes is that in MS SQL, there's only one instance of the globals, reflecting the status of the last cursor statement. In practice, this is of no consequence because you usually only reference the cursor attributes in the immediate vincinity of the statements that could change the cursor status. T-SQL has one other device that provides information about the status of an arbitrary cursor (not just the current one), the CURSOR_STATUS(), but usually, the globals are sufficient for most purposes.

MySQL does not define any cursor attributes, not any globals that provide cursor status information. In fact, cursor functionality is limited to opening a cursor, fetching values from it's current record, and closing it. Like in other rdbms products, fetching from an exhausted cursor will raise an error situation in MySQL. Because there's no general way to know on beforehand if the cursor is exhausted, there's nothing left to do but fetching from the cursor and handling the error situation. Always.

So, in MySQL, we must declare a HANDLER for the builtin NOT FOUND condition. Usually, we want to continue our procedure when the cursor is exhausted, so this will usually be a CONTINUE handler. Inside the handler statement, we must place the statements that will update our loop control variable. Like in the Oracle case, we still have to check our loop control variable to actually exit the loop to stop the cursor traversal process.

When writing these loops for MySQL it might seems a bit awkward at first to use these contructs, especially when you are used to cursor attributes or cursor status globals. In my opinion, it's actually quite daft to have to rely on general error handling mechanisms for handling cursor exhaustion situations. After all, there's nothing exceptional or erroneous about the cursor becoming exhausted. But for now, it's no use sulking. This is just how MySQL does these things right now.

No, there is no such statement. Thw only way to manipulate the cursor position it throught the fetch statement, wich retrieves the values for the current position and automatically increases the position.

Also, MySQL cursors can be traversed in a forward manner only: you cannot fetch in the reverse direction.

Do you know that the mysql forum list includes a cursor forum? It's really helpful, lots of people ask their questions there, and...they are answered too! I hang around a lot there, so, post your ques

As a sidenote: I have noticed that this an my other MySQL cursor articles still remain quite popular. Stay tuned: I will be posting a article that explains exactly why and how cursors can be avoided in the majority of cases.

I asked around: the explanation is that the OPEN statement materializes the entire set associated with the cursor. The global variable that back FOUND_ROWS() is updated as a side-effect.

There are plans to alter the materialization behaviour of the OPEN statement. This means that FOUND_ROWS() is likely to change behaviour too when this get's implemented.

So, you are right in assuming that it is not recommended to rely on FOUND_ROWS() in this situation.

I also checked the documentation (http://dev.mysql.com/doc/refman/5.1/en/information-functions.html), this behaviour does not actually violate that:

"The row count available through FOUND_ROWS() is transient and not intended to be available past the statement following the SELECT SQL_CALC_FOUND_ROWS statement. If you need to refer to the value later, save it:"

So, you should *never* rely on FOUND_ROWS() unless it's the first thing you call after executing a SELECT SQL_CALC_FOUND_ROWS

I want to update the salaies of all the employees. 1st emp in the table shud b updated as sal = sal+12nd emp sal = sal+23d emp sal = sal+3etc... till the last record. Could u plz suggest this using the cursors?

sure, you are almost there. Your UPDATE statement is missing a WHERE clause that identifies the current record fetched from the cursor though. So now, you are just updating all records in the table for each iteration of the loop.

That said, I wonder what the purpose is of this update process. Do you realise that the order of the records is not explicitly determined? In other words, each time you run this code, another record might happen to be first, second etc. Also, you can do this all without a cursor. A single UPDATE statement is enough to accomplish this:

So if i execute those procedure will appear error message "Code already exists", and that message can i catch up in my application.So i want to validate the application by the stored procedure routine.

I am trying to write a cursor for loading data into a table after fetching it from other table.My table into which i am loading the data cant allow duplicates, hence I need to write an handler which will skip such insertion and move forward with the next row in the source table. All I can do in the below stored procedure is to exit the loop without throwing error.

just put the SELECT INTO in its own block and trap there NOT FOUND condition there to prevent it from bubbling up. Alternatively, set a flag before runnin the SELECT INTO and have the handler check if the flag was set. Of course you need to unset the flag again right after the SELECT INTO.

Yes, there is a very dirty trick that actually works. However, it is not guaranteed that this will work in future versions of MySQL because it relies on a rather unfortunate detail in the way cursors are implemented.

See, under the covers, MySQL cursors work by evaluating the query, and storing the result in a temporary table. This happens when opening the cursor. As a side-effect, the FOUND_ROWS() function actually works exaclty like it would for executing a regular query. Here's an example that proves it:

However, I should point out that in my opinion, the materialization of the resultset when opening the cursor is a very big mistake. It makes MySQL cursors extremely expensive: a lot of memory may be required to store the resultset, and it makes them slow.

So I actually hope this will be cleaned up ASAP after which this hack will stop working.

Hi Roland,Thanks a lot for your quick response.I am getting the following error during the execution of a stored procedure wherein a cursor is selecting all of the rows of a large table. Basically it iterates over the large table rows, extract some column values and updates those values into another table.

ERROR 3 (HY000): Error writing file '/tmp/MYmbeDU6' (Errcode: 28)

I would like to know is there anything equivalent to Oracle's BULK COLLECT with the LIMIT clause and iterates over the 3 million rows.

The only option that I have explored is using of dynamic SQL query in a loop with the LIMIT clause. Even in this case like "Select * from LargeTable LIMIT 100000, 5000" - here, the offset is too high and this would cause performance issues. Not sure how to tackle this issue efficiently. Please enlighten me on this subject.

That said, to me it sounds like that what you want to achieve - "extract some column values and updates those values into another table" - should not be done with a cursor. You should probably write a simple UPDATE statement to do that.

And when i am looping through this cursor, the cursor does not ever take the order by clause into consideration. It always runs as the order by clause was not there, make what so ever changes to the order by claue.

Search This Blog

About Me

My name is Roland Bouman. If you like you can follow my @rolandbouman feed on twitter.
I'm a Web- and BI Developer and Information Analyst. I'm currently working as a Software Engineer for Pentaho, a world-leading Open Source Business Analytics Suite and Big Data Platform. In the past, I have worked for Inter Access, MySQL AB, Sun Microsystems and Strukton Rail.
I authored two books: Pentaho Solutions (Wiley, ISBN: 978-0-470-48432-6) and "Pentaho Kettle Solutions" (Wiley, ISBN: 978-0-470-63517-9).
I'm proud to be elected Oracle Ace for my MySQL expertise.