Friday, December 31, 2010

How about CompuServe? Does anyone remember CompuServe? Does anyone still have a CompuServe account?

This was first published on June 14, 1996:

When I die I want to go like my grandfatherin his sleep,not screaming like the passengersin his car.

Q: How many Windows programmers doesit take to change a light bulb?

A: 472. One to writeWinGetLightBulbHandle,one to writeWinQueryStatusLightBulb,one to writeWinGetLightSwitchHandle...

Q: How many Customer Support Representativesdoes it take to change a light bulb?

A: We have an exact copy of thelight bulb here, and it seems to beworking fine. Can you tell me whatkind of system you have?OK. Now,exactly how dark is it?OK, there could be four or five thingswrong... have you tried thelight switch?

Q: How many Microsoft technicians doesit take to change a light bulb?

A: Three. Two holding the ladderand one to screw the bulb into afaucet.

Q: How many Microsoft vice presidentsdoes it take to change a light bulb?

A: Eight. One to work the bulb andseven to make sure Microsoft gets$2 for every light bulb everchanged anywhere in the world.

Q: How many testers does it take tochange a light bulb?

A: We just noticed the room wasdark. We don't actually fix theproblems.

Q: How many C++ programmers does ittake to change a light bulb?

A: You're still thinkingprocedurally. A properly designedlight bulb object would inherit achange method from a generic lightbulb class, so all you'd have to dois send a light bulb changemessage.

Q: How long does it take a computerrepairman to change a light bulb?

A: It depends on how many burnt-outlightbulbs he brought with him.

Q: How many Apple Newtons does it take tochange a lightbulb?

A: Foux! There to eat lemons, axegravy soup.

Q: How many Microsoft engineers doesit take to change a light bulb?

A: None, Bill Gates will just redefineDarkness ®as the new industry standard.

Q: How many Microsoft programmersdoes it take to change a lightbulb?

A: None, they just use OLE and pipein light from a bulb in the nextroom.

One of my favorite Foxhound regression tests is the one I call "the hopelessly overloaded database server". That's when I throw so much work at a SQL Anywhere database that ALL the lights on the server come on, ALL the fans start running and almost ALL the load-related Foxhound alerts appear in my inbox.

Yesterday, however, I broke an important programming rule: I looked to see what was happening while walking out the door at the end of the day. NEVER do that, not if you actually want to get out the door in a good mood.

But alas, it was a shoulder-slumping moment: Instead of recording a sample every 10 seconds like it's supposed to, Foxhound was taking OVER TWO MINUTES to record what the database server was doing.

And worse: It was issuing an Alert #1 "Database unresponsive" message, followed by an All Clear, in pairs, every two minutes. A previous test using an earlier build of Foxhound exhibited no such behavior.

Something must be running slower... much slower... inside Foxhound.

A bug which needs fixing... this is the reason we run regression tests, right?

[fretful sighing sound]

Ha HA! NOT a bug!

It's a behavior change!

The difference was not the two minutes to record a sample, the old version of Foxhound took the same amount of time.

The difference was the annoying Alert / All Clear messages appearing with every single sample.

OK, it's a bug...

...to be dealt with later. In the meantime, here's the new Foxhound FAQ with a screen shot showing what "hopelessly overloaded" looks like...

Use the Alerts Criteria Page to increase the threshold for Alert #1 - Database unresponsive from 1 minute to some value larger than the time it takes the Foxhound Monitor to gather a sample (the "Interval" time shown on the Monitor page).

Here's the long answer...

When the target database server is heavily overloaded, Foxhound may take longer than one minute (or whatever the threshold is for Alert #1 - Database unresponsive) to gather a sample. In that case Foxhound will issue Alert #1 while it's waiting for the sample data to be returned, and then immediately issue an All Clear when it does get the data.

This is new behavior for Foxhound. Previously, Alert #1 messages were only issued if Foxhound failed to gather a successful sample. In this case, however, Foxhound isn't actually failing to gather samples, it's just taking a long time.

The change was made because it is important for you to know when your server is hopelessly overloaded as well as when it is completely unavailable. However, the multiple Alert - ALL CLEAR messages are annoying, and something will probably be done about that in the future.

The following image shows what "hopelessly overloaded" looks like. The target database server is using 64% of a four-core CPU, but the server computer is actually running at 100% CPU... it's also running a multi-threaded client application with 1003 database connections performing 7,400 database update transactions per second. The client application wants to do more, but everything is maxed out, and instead of recording a sample every 10 seconds, Foxhound is taking more than two minutes for each one. To make matters worse, Foxhound is also running on the server computer; in this case, the first step should be to move Foxhound and the client application to some other computer(s).

Boy, do I have suggestion for you. You might call it 'undocumented' but it's really something you could have tried on your own. Due to historical reasons, though, it's both better and worse than you would think.

The suggestion is to use a named pipe to carry the data. Supposing that both the old and new databases are on the same database server, open one connection to each database and have each connection be used by a separate thread. On the old database, issue an UNLOAD .. TO '\\\\.\\pipe\\data'. That's just a standard Windows named pipe path. On ANOTHER THREAD and using the connection to the new database, *simultaneously* run LOAD TABLE ... FROM '\\\\.\\pipe\\data'. The data will pass straight between the two statements without needing intermediate data files. It's what dbunload does when rebuilding a database.

Okay, so normally that's just using your OS facilities and not an undocumented feature of SA. It's a pretty common technique for UNIX hacks who are familiar with mkfifo(). Here's the caveat, though. Due to historical reasons, SA recognizes any path that starts with \\.\pipe as a local named pipe and doesn't actually open an OS named pipe. Instead, it bypasses all that and shuffles the data via internal buffers. There can only be one such pair of operations in progress at a time since there is only one internal buffer for this purpose. Everything after \\.\pipe is ignored.

There are ways to use real OS named pipes but it's kludgy since at some time in the past we claimed the OS's pipe paths as our own. That should get cleaned up and documented for users who wouldn't think about using named pipes.

Another caveat: if you do use a real OS named pipe (which I haven't showed you how to do), the server is at the mercy of the producer (which in this case is the UNLOAD but could an arbitrary program) to close the pipe. Otherwise, LOAD will wait indefinitely for more data.

-john.

December 24, 2010 9:53 AMJohn Smirnios said...

One other caveat occurs to me. If a LOAD TABLE refers to a named pipe, it will not be recoverable unless you use CONTENT LOGGING or ROW LOGGING. Otherwise, the LOAD statement that gets logged will refer to the pipe path and that's not good. For a reload, it's not a big problem since the database gets checkpointed and the log is thrown away. I'm not sure what should be done in "production" -- it's so nice to be able to do the load without logging. That issue needs to be addressed too.

December 24, 2010 10:07 AMJohn Smirnios said...

I shouldn't post comments until I've thought everything through, right? The name after \\.\pipe\ does matter somewhat. If the one-and-only internal pipe is in use with another name, the LOAD or UNLOAD will block until it is no longer in use. So, either wait for both the LOAD and UNLOAD to complete before issuing the next pair of statements or use a unique name for each pair. Otherwise, you could end up with two UNLOADs writing into the same pipe and one LOAD might see the data from both UNLOADs (one file concatenated after the other).

December 24, 2010 10:25 AM

Here's proof it works...

First, here are the Windows command files used to create two SQL Anywhere 12 databases ddd1 and ddd2, start them both using one engine ddd, and then fire up one ISQL session on each of the databases:

The UNLOAD process is started by the TRIGGER EVENT statement on line 27; that statement starts the event defined on lines 8 through 24. Events run asynchronously, using their own connections to the database, and they meet John's requirement that "each connection be used by a separate thread".

The LOAD process is started by the LOAD TABLE statement on line 29.

The diagnostic MESSAGE output on lines 34 through 39 show that the LOAD processe started before the UNLOAD process completed ("Step 3: load starting" appears before "Step 2: unload done"). This meets John's requirement to "simultaneously run LOAD TABLE".

The SELECT output on lines 41 through 43 prove that the data made it through the named pipe unscathed.

Awesome! I don't know how much faster it is than files, I'm still geeking out over the fact it works!

What's with \\ all \\\\ the \\\\\\\\ backslashes?

Line 13 puts the following string in @sql, after the four string literal escape sequences '\\' are each reduced to a single backslash:

OPENROWSET is a special SQL Server function that provides a lot of the power of linked servers without requiring you to actually define a linked server. The syntax for OPENROWSET is rather funky, but it does make for concise code as shown in Figure 22.

Line 1 in Figure 22 shows that an OPENROWSET function call may be used as the target of a SQL Server INSERT statement. OPENROWSET calls may also the targets of UPDATE and DELETE statements, and they can appear in the FROM clauses of SELECT statements.

Line 1 also identifies the OLE DB provider to be used to communicate with SQL Anywhere, in this case the same standard SQL Server ODBC provider MSDASQL that was used in the linked server download technique shown earlier. In fact, the OPENROWSET technique is a variation on linked servers as far as the underlying technology is concerned.

Line 2 names the ODBC DSN to be used, plus the user id and password to be passed to SQL Anywhere; again, the same information that's necessary for linked servers. Line 2 is where the funky syntax comes into play: semicolons are used to separate the three parameters on this line rather than commas.

Line 3 names the target table on the SQL Anywhere database. Because there is no linked server involved, the simpler [owner].[table] naming convention is used as opposed to the full [server].[database].[owner].[table] syntax.

Lines 4 through 6 select all the rows from the SQL Server source table for processing by the INSERT.

There is some extra setup required for OPENROWSET, however, as shown in Figure 23.

Line 10 in Figure 23 is the central step in the setup: it makes OPENROWSET possible by turning on the Ad Hoc Distributed Queries feature.

Lines 16 through 20 are a simple test to see if the setup worked: a SELECT to display SQL Anywhere's single-row single-column "dummy" table. If OPENROWSET works, you will see the result on lines 22 through 24. If not, you'll see something like "Msg 15281 SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource'".

SQL Anywhere 11 ships with an alternative OLE DB provider for use with SQL Server, called SAOLEDB.11.

Tip: In theory, also known as "the documentation", you can refer to SAOLEDB.11 by the simpler name "SAOLEDB" but I haven't had any luck with that, finding success only with the full name "SAOLEDB.11".

Caution: The techniques shown in this section will fall over in a heap, also known as a "SQL Server fatal memory leak", if you have any column in your tables that SQL Server regards as a blob. If you're asking the question "Why then use SAOLEDB.11?" the answer is "Because it's faster than MSDASQL"... more on this in the next section called Performance.

Figure 24 shows two versions of the Windows command lines necessary to register the SAOLEDB.11 provider on the computer that's running SQL Server. The two versions are with and without the dialog boxes like the one shown in Figure 25.

Note: If the EXEC on lines 7 through 10 in Figure 26 produces the following message, it probably means that DisallowAdHocAccess is already set to zero for SAOLEDB.11, and you may carry on: "RegDeleteValue() returned error 2, 'The system cannot find the file specified.'".

That is truly Magic Code... the stored procedure sp_MSset_oledb_prop is not mentioned anywhere in the SQL Server 2008 Help, let alone documented, nor does any documentation appear on the web... at least none that I can find. All that's out there are a few examples like the code shown above.

Another Tip: The Script Action to Clipboard menu item won't actually generate any code unless you make some kind of change, but it's easy to change an option and then change it back so you can capture the code.

Figure 28 shows how to create a linked server using the SAOLEDB.11 provider instead of MSDASQL when you're using Technique 4: Download Via Linked Server. The code in Figure 28 is very much the same as the earlier Figure 20, with these variations: The @srvproduct on line 3 needs to be filled in with an actual value, the new value for @provider is necessary on line 4, and the @locallogin parameter must be assigned the NULL value on line 11.

Figure 30 shows the INSERT for Technique 4 using SAOLEDB.11. It's exactly the same code as the INSERT in Figure 20 earlier, except for the fact that a different provider is being used behind the scenes.

Once you've configured SAOLEDB.11 you can immediately start using it for Technique 5: Download Via OPENROWSET. Figure 31 shows a simple test you can run to confirm everything's OK so far; another select from dummy, this time using the funky OPENROWSET call.

The Row Locks column in Figure 34 shows why LOAD TABLE is a winner: every other technique forces SQL Anywhere to gather and release 1.9 million row locks, but LOAD TABLE is essentially non-transactional from a performance point of view. Couple that with the raw performance of BCP on the SQL Server side and it's unlikely that any other technique will ever come close... when rows per second is your main (or only) measure of success.

The "Seconds, Part 1 and Part 2" columns in Figure 34 apply to the first two techniques: separate times are shown for BCP and LOAD TABLE, and for the two distinct stages of a MobiLink download: Part 1 is for the MobiLink server to SELECT the rows from SQL Server and transmit the download stream (all 1.9 million row images), and Part 2 is for the MobiLink client to INSERT the rows on the SQL Anywhere database.

Comparing human performance is a lot harder; here are some subjective opinions:

a) The BCP and LOAD TABLE technique can be difficult to implement because it uses an external file rather than a direct database connection. Think WAN, think security, encryption, etc.

b) BCP is less than perfectly flexible when it comes to dealing with non-SQL-Server targets and sources. In particular, it can be difficult to specify BCP row and column delimiters that are guaranteed to work with (and not collide with) all possible combinations of characters in the data itself. The good news is, if you can get something to work on the BCP side, the syntax of LOAD TABLE is flexible enough to deal with it.

c) MobiLink is by far the most flexible technique. If you have "interesting" data transformations to perform, like big schema differences between source and target, MobiLink may be your best bet. Plus, MobiLink offers end-to-end encryption.

d) Proxy tables are loved, or hated, there is no middle ground. Performance has been a big issue for many years, not just with complex cross-server joins, but with simple straight-through data transmissions like the one shown in this article.

e) The folks who love proxy tables, love the flexibility. Folks who have used both SQL Anywhere proxy tables and the SQL Server linked server and OPENROWSET techniques tend to have, ahem, mixed opinions about the SQL Server side of things. It's not the syntax that's the problem, it's flexible enough, it's the actual user experience that can leave something to be desired.

The next version of SQL Anywhere, code-named Innsbruck, entered beta testing while this article was being written. Initial testing of the beta software reveal enormous performance improvements in the processing of proxy tables, at least as far as straight-through data transmissions are concerned.

(Note: Innsbruck is now SQL Anywhere 12, the current Generally Available (GA) version of SQL Anywhere.)

In fact, the proxy table technique using the Innsbruck software is faster than all other techniques shown in this article, even faster than BCP and LOAD TABLE. The proxy table INSERT - SELECT statement shown in Figure 18 took 4,230 seconds to run using SQL Anywhere 11.0.1 software but only 687 seconds using Innsbruck; the effect is clearly shown in Figure 35.

Figure 35: Innsbruck Proxy Table Improvement

Caveats: Figure 35 doesn't show Innsbruck results for the other techniques so the comparison may not be completely fair. Also, performance of the Innsbruck software may change before it becomes generally available

Having said that, the future looks bright for proxy tables; performance has always been the biggest stumbling block in the path of wider acceptance.

The five techniques presented here don't cover all the ways you can copy data back and forth between SQL Server databases and SQL Anywhere in-memory databases. They don't even cover all the download techniques, let alone covering products like Sybase IQ or Oracle.

Here's a partial list of topics-not-covered in this article, just the SQL Server-related ones, for both download and upload:

Using SQL Anywhere's FROM OPENSTRING ( FILE ... ) clause to treat the contents of a file as a set of rows.

Using SQL Server's FROM OPENROWSET ( BULK ... ) clause to treat the contents of a file as a set of rows.

Using the many XML-aware features in both SQL Anywhere and SQL Server to push and pull data back and forth.

Using SQL Anywhere's UNLOAD TABLE and UNLOAD SELECT to rapidly create a file from a set of rows.

Using SQL Server's bcp.exe utility to rapidly upload data from a file to a table.

Using SQL Server's BULK INSERT command to rapidly upload data from a file to a table.

Using MobiLink to directly upload data to SQL Server, using the script-driven upload feature to cope with the fact that an in-memory SQL Anywhere database doesn't have the transaction log that is usually used to drive the upload process.

Friday, December 24, 2010

One week ago I wrote about how UNLOAD COMPRESSED not only saves space over the regular UNLOAD but also runs faster when you count the LOAD TABLE time.

Now, it's a week later, and I can report a real-world case where it was worth the time and effort to retrofit UNLOAD COMPRESSED in place of INSERT SELECT. Here's the story:

Whenever a new release of the Foxhound database monitor is made available, it is shipped as a complete replacement of the SQL Anywhere database that comprises Foxhound.

If a Foxhound database already exists on a customer's computer, it is not altered or modified in any way. Instead, all of the customer's data is copied across to the new database as part of the upgrade process.

Just the data. None of the schema or executable code (which also exists inside the database) is copied.

This technique has several advantages, the primary one being that a new Foxhound database IS a new database after upgrading, not an altered database whose schema may differ from a fresh new database in some subtle and unpredicted ways.

The primary disadvantage is time... it takes time to copy gigabytes of data, especially using SQL Anywhere 11's proxy table facility to perform INSERT new-table SELECT * FROM proxy-old-table. The INSERT SELECT process also requires a lot of extra disk space during the upgrade process: 100% more to be exact.

Of the twenty-odd tables copied as part of a Foxhound upgrade, only three count as far as performance is concerned because they hold 99% of the rows and use up 99% of the disk space. For these three tables the INSERT SELECT process was replaced with

FORWARD TO statements that run on the new database

to push UNLOAD COMPRESSED statements over to the old database

where they are executed to unload the tables to files

that are then processed by LOAD TABLE statements run on the new database.

The results are impressive: First of all, the Foxhound upgrade process now runs 90% faster for a few hundred megabytes. For larger databases, the savings may be greater.

And second, even though you might think the existence of three new UNLOAD files would increase the disk space requirements, the opposite is true. The reason is that LOAD TABLE statements don't write to the transaction log, and that more than compensates for the UNLOAD files: the "high water mark" for extra disk space is now 66% instead of 100%.

Technique 4: Download Via Linked Server

The Linked Server feature was introduced in SQL Server 2000 to allow ad-hoc queries in SQL Server to operate on a variety of external data sources via OLEDB and ODBC. Linked servers offer similar functionality to SQL Anywhere's proxy tables, but with different syntax: instead of referring to proxy tables, direct references are made to actual tables residing elsewhere from within SQL statements running in SQL Server.

As with proxy tables, all you need to use a linked server is ODBC access to the actual data and a few lines of SQL code. Figure 20 shows the code necessary for SQL Server to define a linked server to connect to SQL Anywhere and run an INSERT SELECT to copy all the data from the local table mss_source to the target table sa_target.

Lines 1 through 6 in Figure 20 call the system stored procedure that defines the linked server in SQL Server.

Line 2 specifies the logical local server name "mem" to be used in later SQL statements.

Line 3 specifies the "product name" of the linked server. Apparently, at least in this case, this parameter isn't important as long as you specify something other than the default NULL value.

Line 4 identifies the OLE DB provider to be used to communicate with SQL Anywhere. In this case MSDASQL is specified, which is the standard Microsoft OLE DB Provider for ODBC. You can also used the SAOLEDB.11 provider that ships with SQL Anywhere 11, but the MSDASQL provider is capable of handling the simple SQL used in this article.

Line 5 names the ODBC System DSN that the OLE DB provider will use to reach SQL Anywhere; the registry entry for this DSN is shown in Figure 21 below.

Lines 8 through 13 in Figure 20 call the system stored procedure that defines the login path that SQL Server will use behind the scenes to connect to SQL Anywhere.

Line 9 names the linked server that was defined earlier, on line 2.

Line 10 says that the local SQL Server login id will not be used to connect to SQL Anywhere; instead, a user id and password unique to SQL Anywhere is provided on lines 11 and 12.

Lines 15 through 18 show how all the rows are retrieved from the local SQL Server table mss_source and inserted into the SQL Anywhere table sa_target.

Line 15 uses the standard four-part SQL Server table naming convention: [server].[database].[owner].[table]. In this case the server is linked server mem defined above. The database name is omitted because, with SQL Anywhere, there's only one database per connection. The last two parts are the owner and table names: dba.sa_target.

Figure 21 shows the system DSN that is stored in the registry on the server computer. This is a straightforward SQL Anywhere DSN, with default values for everything except UID, PWD, DatabaseName, ServerName and CommLinks.

Technique 3: Download Via Proxy Table

Remote data access was introduced in SQL Anywhere version 6 over 10 years ago. This feature allows you to create the schema for a table inside a SQL Anywhere database without actually storing the data inside that database. This is called a "proxy table" because it represents data that is actually stored in some other location: an Excel spreadsheet, an Oracle or DB2 database, a text file, or, in the case of this article, a SQL Server database.

Proxy tables are surprisingly powerful and easy to use; all you need is ODBC access to the actual data, and a few lines of SQL code, and you're all set. Figure 18 shows all the code necessary for SQL Anywhere to connect to SQL Server, create a proxy table that represents the remote table mss_source, and run an INSERT SELECT to copy all the data to the local (real, non-proxy) table sa_target.

Line 1 in Figure 18 creates a remote server schema object in the SQL Anywhere database and gives it the local name mss.

Line 2 specifies the access path or server "class" to be used. Currently there are 12 different classes available in SQL Anywhere, ranging from vanilla ODBC class for Excel, text files, etc., to Oracle-via-ODBC and Sybase ASE-via-JDBC. In this case, the MSS-via-ODBC class MSSODBC is the best choice.

Line 3 specifies the ODBC DSN to be used. It is possible to specify a DSN-less connection by putting ODBC driver information in the USING clause, but in this case the DSN shown in Figure 19 is used.

Lines 5 through 8 define the "external login" path that SQL Anywhere will use behind the scenes to connect to SQL Server: the local user id DBA will be used to connect to the remote server earlier given the name mss, with the SQL Server user id and password given by the REMOTE LOGIN and IDENTIFIED BY clauses.

Lines 10 begins the definition of the proxy table named proxy_mss_source. The EXISTING clause specifies that the real table already exists on SQL Server so it doesn't need to be created over there, just the proxy table here on SQL Anywhere.

The AT clause on line 11 identifies the real table on SQL Server. The first parameter, the remote server name mss, identifies the CREATE SERVER access path and by implication the CREATE EXTERNLOGIN login path. The remaining three parameters name the SQL Server database main, the table owner dbo, and the real table mss_source.

Lines 13 through 15 show how all the rows from mss_source may be retrieved via the proxy_mss_source table and inserted into the real SQL Anywhere table called sa_target. Like the previous two techniques (LOAD TABLE and MobiLink), the INSERT in Figure 18 runs as a single transaction, gathering and then releasing 1.9 million row locks in SQL Anywhere.

Figure 19 shows the user DSN that is stored in the registry on the laptop computer. Note that except for being stored in the registry of a different computer, this DSN is identical to the one used for MobiLink in the previous section; see Figure 14.

Friday, December 17, 2010

Way long ago, Version 11 of SQL Anywhere introduced UNLOAD COMPRESSED... here's an excerpt from the "What's New in Version 11.0.0" section in the Help:

UNLOAD statement enhancements When using the UNLOAD statement, you can now specify whether to compress and/or encrypt the data that is being unloaded by specifying the COMPRESSED or ENCRYPTED clauses, respectively. See UNLOAD statement.

Files compressed or encrypted using these clauses can only be loaded (for example, using LOAD TABLE) by SQL Anywhere 11.0.0 database servers (or newer). Files compressed or encrypted using other tools are not usable by SQL Anywhere.

Whaddaya mean, it didn't make it to the Top 10 list?

Apparently not...

Does UNLOAD COMPRESSED work?

It sure does; here's a test involving three small tables... small, as 25,000 rows and 11M of disk space in the database.

Here's one of the tables and one of the rows in that table... lots of numbers, lots of strings, some NULLs, nothing too exciting, just your average Table From Hell (because in the real world, this table contains millions of rows, gigabytes of data):

Technique 2: Download Via MobiLink

MobiLink is bidirectional synchronization software that ships in the box with SQL Anywhere. MobiLink works with one or more SQL Anywhere "remote" databases and one central "consolidated" database running on SQL Server, Oracle, IBM DB2, Sybase ASE, SQL Anywhere or MySQL.

MobiLink consists of client and server components which connect to the remote and consolidated databases respectively. For the technique described in this section, MobiLink is used in one direction only (download) from a SQL Server consolidated database to a SQL Anywhere remote database. The process works as follows:

a) When the MobiLink server is started, often as a service running on or close to the computer running SQL Server, it establishes an ODBC connection to the SQL Server consolidated database and then waits to be contacted by the MobiLink client.

b) When the MobiLink client is launched to start a synchronization session, it establishes a database connection to the SQL Anywhere remote database, and then it establishes a network connection to the MobiLink server using a proprietary high-level protocol.

c) The MobiLink server then executes a user-written SQL SELECT statement against the SQL Server database and sends the result set to the MobiLink client via the proprietary network protocol.

d) The MobiLink client applies the downloaded result set, row by row, to the SQL Anywhere database using SQL INSERT statements.

Folks familiar with MobiLink will recognize that the above steps do not begin to exploit the features and flexibility built in to MobiLink. When it comes to the subject of downloading data to an in-memory database, the following features may be valuable (but are not shown in this article):

a) The WHERE clause of the download SELECT can include a predicate that selects rows which have been inserted or updated since the previous synchronization, if the rows contain a datetime column that record when the row was inserted or last updated. MobiLink automatically maintains the datetime of the last successful synchronization for this purpose. This technique may be more useful when the data in the in-memory database is saved persistently; i.e., when the -im c option is used instead of -im nw (in-memory with checkpoints instead of no writes).

b) Other predicates may be used to select a subset of rows; e.g., only download this year's data.

c) The download SELECT statements can be a complex as you want, even using stored procedure calls, as long as the result set matches the row layout defined on the remote database. This fact makes it easy to accommodate schema differences between consolidated and remote databases: normalized versus denormalized designs, and so on.

Figure 10 shows how the MobiLink system tables and other MobiLink schema objects are added to the SQL Server database before any synchronizations can be run. MobiLink doesn't require any user-written configuration files; everything necessary to run the server side of a synchronization is stored in the SQL Server database itself.

Line 2 in Figure 10 specifies the input SQL command file which is installed as part of the standard SQL Anywhere setup. Included in this file is the CREATE statement for the ml_script table used by MobiLink to store the download SELECT statement mentioned earlier. When the MobiLink server starts a synchronization session, it reads the SELECT statement from ml_script and then turns around and executes that statement against SQL Server. There's lots of other objects defined in syncmss.sql but ml_script is at the heart of things for the technique shown here.

Figure 11 shows how the download SELECT is added to the MobiLink system table on SQL Server.

Line 4 in Figure 11 launches one of the MobiLink system procedures created by the command shown in Figure 10. In this case, the ml_add_table_script procedure adds a new table-specific MobiLink script to the ml_script table.

Lines 5, 6 and 7 provide the three-part primary key that uniquely identifies the SELECT statement on line 8.

The script version 'v1' on line 5 is used to make it easy to define whole different sets of MobiLink scripts to be used by different versions of your application. In this article, we're only using one version of one single script, so the version isn't important.

Line 6 specifies the remote database table name 'sa_target' to which this script applies: the SQL Server result set produced by the SELECT * FROM mss_source on line 8 is to be sent to sa_target on the SQL Anywhere database.

Line 7 names which "MobiLink event" will launch the script on line 8. The MobiLink server is event driven, with many dozens of connection, synchronization and table-level event. In a complex synchronization application, many hundreds of synchronization scripts are required, for various purposes like authentication, conflict resolution and upload processing as well as download SELECT statements.

Note: The actual schema for the MobiLink system tables is a bit more complex than implied here, so if you go looking at the table layouts you'll find that ml_script has a single-column surrogate primary key, that the relationship table ml_table_script contains the three-part primary key described above, and that integers are used to identify the version and table, via yet more relationship tables... none of which is really important for the discussion here.

In this article exactly one single MobiLink event is used, the table-level 'download_cursor' script. In a perfect world this event would be named 'download_select' because that's what you code: a SELECT, or a procedure CALL which returns a result set, never a cursor definition.

When the MobiLink server reaches the point in the synchronization process when it's time to download data to the sa_target table, it fires the download_cursor event for sa_target and the script on line 8 in Figure 11 is executed.

Figure 12 shows the MobiLink server running on the server computer BRECK-PC, and Figure 13 shows the command that was used to start it. At this point, the MobiLink server is waiting for a synchronization request from the client side.

Line 1 in Figure 13 is the filespec for the MobiLink server mlsrv11, and line 2 provides the ODBC database connection string that the MobiLink server will use to connect to the SQL Server database: the ODBC DSN and the SQL Server user id and password.

Line 3 specifies the text file where the MobiLink server should write diagnostic and progress messages, and the -vscn option on line 4 sets the verbosity level: script names, script contents and row counts. Experience has shown that this diagnostic file is critical to the debugging process during both development and production... because even in production, whenever a network is involved, stuff happens.

The -zu+ option on line 5 tells the MobiLink server not to bother authenticating the MobiLink user name sent to it by the MobiLink client. If you need extra security between the MobiLink client and server, you can predefine the MobiLink user name and password and omit this option to force authentication. You can also specify transport layer and end-to-end encryption for the path between MobiLink client and server, but the details of encryption are not discussed in this article.

Figure 14 shows the user DSN that is stored in the registry on the server computer.

Figure 15 shows the MobiLink setup necessary for the SQL Anywhere remote database on the client side. MobiLink uses a simplified "publish subscribe" model where the tables to be synchronized are named as in a "publication" and a MobiLink "user" is linked to that publication via a "subscription".

Lines 1 and 2 in Figure 15 define a publication called p1, consisting of the single table sa_target. In more complex setups, multiple table names can appear, with column name lists to specify subsets of columns and WHERE clauses to control which rows are uploaded.

Normally, tables are subject to upload as well as download but that's changed by the FOR DOWNLOAD ONLY clause. This is done for two reasons: The technique being described here is download only, and the usual MobiLink transaction-log-driven upload process is impossible because transaction logs don't exist for in-memory databases. MobiLink does offer a non-log-based script-driven form of upload, but again, upload isn't needed here.

Line 4 names the MobiLink user id as "1" and specifies TCP/IP as the network transport layer between the MobiLink client and server. In simple setups like this the MobiLink user id uniquely identifies the remote SQL Anywhere database. Other clients would use 2, 3, etcetera, or any other unique VARCHAR ( 128 ) string. This user id is all that's necessary, and it is how the MobiLink server keeps different synchronization sessions apart.

Note: The MobiLink user id is effectively a globally unique remote database identifier that you manage. It is not a database user id.

Figure 16 shows the MobiLink client running on the laptop computer PAVILION2, and Figure 17 shows the command that was used to start it. At this point, the MobiLink server has responded and a synchronization is in progress; in fact, all the data has been sent to the MobiLink client and 1.3 million rows have been inserted into the SQL Anywhere database so far.

Line 1 in Figure 17 is the full filespec for the dbmlsync program, and line 2 shows the standard SQL Anywhere database connection string: server or engine name, database name, and SQL Anywhere user id and password.

Line 3 provides extra synchronization parameters, in this case the host address of the MobiLink server which is running on the BRECK-PC computer together with SQL Server, plus the script version sv=v1 to tell the MobiLink server which set of MobiLink scripts to use.

Line 4 specifies the text file where the MobiLink client should write diagnostic and progress messages, and line 5 specifies the -vno sets the verbosity of these messages to include row counts and options used. The client-side diagnostic file is not as useful as the server-side file defined earlier on line 3 in Figure 13... but it sometimes helps.

Monday, December 13, 2010

This is Part 1 of an article written in March 2010 with the title "Downloading Data into a SQL Anywhere In-Memory Database from Microsoft SQL Server".

This is the first in what may become a series of articles presenting various techniques for downloading data into a Sybase SQL Anywhere Version 11 in-memory database from a variety of other databases, and then saving or uploading data to those other databases before the in-memory process is shut down.

This article specifically discusses downloading data from a Microsoft SQL Server 2008 database into a SQL Anywhere database that is using the "never write" mode as opposed to the "checkpoint only" mode of in-memory operation. Five techniques are presented, with actual code tested on a non-trivial set of data. The data was chosen to expose some of the quirks that may be encountered when dealing with the interface between two different database products.

This is primarily a how-to article. No conclusions are drawn about "what's best for you" because that depends on your priorities. In fact, one single application may use more than one technique for different purposes; e.g., fast in one place, flexible in another.

For an overview, read the next section Introducing The Techniques Used, and then skip ahead to the sections Performance and Breaking News!

Technique 1 is a two-step process: First, the high-performance SQL Server BCP utility (Bulk Copy Program) is used to unload the SQL Server source table across the LAN to an ASCII text file.

Then, SQL Anywhere's high-performance LOAD TABLE statement is used to load the data from that file into the target table in the in-memory database.

The BCP "queryout" option is used to run the simple query "SELECT * FROM main.dbo.mss_source". Special character strings are used as row and column delimiters in the text file to deal with the fact that the data itself contains special characters like tabs, commas, quotes and line breaks.

The MobiLink synchronization process is launched by running the MobiLink client dbmlsync.exe on the client computer, which then connects via TCP/IP to the MobiLink server running on the server computer. The MobiLink server then connects via ODBC to the SQL Server database and runs a pre-defined SQL script called a "download_cursor" (see below) to select all the rows from the mss_source table. Those rows are then sent down to the MobiLink client which inserts them into the SQL Anywhere database.

Technique 5 uses SQL Server's OPENROWSET syntax to perform the same operation as Technique 4 but without having to define the linked server. The code looks vastly different but under the covers it's the same technology:

Techniques 4 and 5 are shown twice, the second time using a different OLE DB provider: the SAOLEDB.11 provider that ships with SQL Anywhere is a faster alternative to the MSDASQL provider that ships with SQL Server.

This section describes the test setup in some detail. You can skip to Technique 1 if you want, and return to this section when you have questions about the setup.

This is not a benchmark performance test setup, not by any stretch of the definition of "benchmark", nor is it a comparison of SQL Anywhere with SQL Server. Every technique presented involves both products, and the performance figures are presented to help you decide among the various techniques, not decide between two products. Even then, performance is often not the only determinant; you may find one technique better than another based on other criteria, such as flexibility or simplicity, rather than performance.

Here is the hardware and software used to create the test environment.

Server computer

Whitebox desktop with Intel Core 2 Quad Q9450 2.66Ghz 4G RAM

Windows Vista Ultimate 64 bit build 6001 SP1

SQL Server 2008 Enterprise Edition 64 Service Pack 1

Host name "BRECK-PC"

SQL Server name "TSUNAMI"

Database name "main"

SAOLEDB.11 provider from SQL Anywhere 11.0.1.2276

Client computer

HP Pavilion laptop, 4GHz Pentium 4, 2G RAM

Windows XP SP2

SQL Anywhere 11.0.1.2276

Hostname "PAVILION2"

SQL Anywhere server name "mem"

Database name "mem"

Standard setups were used to install Windows, SQL Server 2008 and SQL Anywhere 11, including all the services required to launch SQL Server. These standard setups are not described here, but everything else is, including the commands to launch the SQL Anywhere server.

For the most part text scripts are presented for both SQL statements and Windows commands, as opposed to GUI dialogs that perform equivalent tasks. Text scripts are used for two reasons: First, scripts are easier to explain because they're more concise, and script syntax tends to change less frequently than GUI layouts. Second, scripts are favored by many DBAs for implementing processes that must be repeated. GUIs are great for learning new subjects and performing one-time tasks, but scripts are often much better at automating repetitive tasks. Scripts also allow comments and they lend themselves to change control.

For the record, the equivalent GUI dialogs are available in these installed programs:

Line 1 in Figure 1 shows the filespec for sqlcmd.exe. According to Microsoft this is a modern replacement for the venerable osql.exe, but it works pretty much the same way. The caret "^" is the Windows command line continuation character, and it's used to make the scripts in this article easier to read and to describe.

Line 2 specifies the database to be used when first connecting to SQL Server. Unlike SQL Anywhere, with SQL Server you connect to a server and then specify which database you want to use, and you can switch databases while remaining connected. With SQL Anywhere, connections are made to a database within a server, and if you want to use a different database you must start a new connection even if it's on the same server.

Line 3 lets you use "doublequotes" around identifiers in your SQL commands, which you must do if any of them are reserved words.

Lines 4, 5 and 6 specify the password, server and userid for sqlcmd's connection to SQL Server. BRECK-PC is the host name for the server computer and TSUNAMI is the SQL Server database server name.

Figure 2 shows the SQL Server CREATE DATABASE statement used to create the source database. The data and log files are stored together on the E: drive, separate from the C: drive containing the operating system and SQL Server binaries.

Line 1 in Figure 2 specifies the database name "main". Lines 2 through 6 specify where the data file resides whereas lines 7 through 11 specify where the log file goes. As far as the rest of this article is concerned only the database name "main" is important, the rest is not.

Extra disk I/O on the server computer was avoided by using Control Panel - System - System Protection - Automatic restore points to turn off Windows Vista restore points for the E: drive. Also, the SQL Server VSS Writer was disabled using Control Panel - Administrative Tools - Services for the same reason.

Some SQL Server configuration changes were necessary for some techniques; the code for these changes is shown later in this article.

Figure 3 shows the CREATE TABLE for the source table on SQL Server. This table contained 63 columns, 1,925,469 rows and approximately 1 gigabyte of statistical data gathered by an automatic database monitor process. The actual nature of the data is not as important as the fact that it comes from a real-world application, not an artificial test data generator. No claims are made, however, for the suitability of this data for benchmark purposes.

Tip: When transferring data back and forth between SQL Server and SQL Anywhere, avoid blobs. That means on the SQL Server side avoid data types like TEXT and VARCHAR ( MAX ) and on the SQL Anywhere side avoid LONG VARCHAR. In fact, in SQL Anywhere you should avoid any string defined as VARCHAR ( 8001 ) or larger; those aren't blobs in SQL Anywhere but SQL Server will treat them as such because 8000 is the limit there. The reason for this suggestion? To avoid performance problems on the SQL Server side as well as a fatal memory leak. The original table used for this article contained nine columns defined as TEXT but since the actual data didn't contain any string values longer than 144 characters it was easy to change them all to VARCHAR ( 1000 ). The fact that that helped indicates the problem lies with the data type, not the data.

Line 1 in Figure 4 is the full filespec for the database initialization utility dbinit.exe as installed by a standard SQL Anywhere setup on Windows. Multiple versions of SQL Anywhere often exist on workstation computers so when using command files it is always a good idea to specify full filespecs instead of relying on the system PATH.

Tip: The environment variable SQLANY11 makes it a easier to code filespecs in Windows command files that run SQL Anywhere executables. This environment variable is created by the standard SQL Anywhere 11 setup as follows:

SQLANY11=C:\Program Files\SQL Anywhere 11

Line 2 specifies the file name for the SQL Anywhere database file. The transaction log file name defaults to mem.log, and the page size defaults to 4K which is the best choice for most applications.

Line 4 is optional. It launches the "spawn" program dbspawn.exe, which in turn launches the database server itself (lines 5 through 9) in the background. This is handy when writing command files because without the services of dbspawn.exe, the server program dbsrv11.exe would run in the foreground and not return control to command file until the server was shut down. In other words, without dbspawn, the third command (dbisql.com) wouldn't be executed until after the server was shut down.

The -f parameter on line 4 tells dbspawn to ignore the possibility that other SQL Anywhere servers might be running, and to start this one regardless.

Line 5 is the full filespec for the network version of the SQL Anywhere database server, dbsrv11.exe. Alternatively, you can use "personal server" dbeng11.exe which supports exactly the same features except for these differences:

dbeng11.exe supports a maximum of ten concurrent connections,

it uses at most one CPU for request processing, and

it doesn't support network client/server connections.

Line 6 specifies the "-im nw" option: in-memory never write. This is the most efficient and most extreme version of in-memory operation: Not only is there no transaction log and no temporary file like the other version (-im c or in-memory checkpoint only) but there is no checkpoint log and changes are never written to the database file.

The implications of in-memory operation are quite profound: All the data must fit into the cache, and so must all the temporary data; there is no temporary file to absorb the excess. Plus, it's up to you to save the data if you need it later. You can write SQL code to do this, or use the dbunload.exe utility, but neither the BACKUP statement nor the dbbackup.exe utility will work... well, they'll work, but they won't help: they won't back up the data in memory, just the empty database file.

Line 7 is optional, but recommended for the tests described here. The -c 1200M option specifies an initial database cache size 1200 megabytes. Testing showed that this is a reasonable choice for the data set used in this article, and the whole topic of "picking the initial cache size" is worth some discussion:

SQL Anywhere is renowned for being a self-managing database. Most performance and tuning options have well-chosen defaults, and in most cases SQL Anywhere runs very well out of the box with no configuration changes. The database cache size is no exception: The initial cache size is automatically calculated using a formula that performs well in most circumstances, and the cache is dynamically resized up and down as needs change during execution.

However, launching an empty database using the in-memory mode, and then loading vast quantities of data into the database cache, qualifies as an exception to the above phrase "in most cases". In this case the formula for the initial cache size picks 2 megabytes because the empty database file is very small, and the dynamic resizing process must grow the cache by a factor of 600. This adversely affects performance in two ways: a less-than-optimal internal cache structure is determined by the tiny initial cache size, and the act of growing the cache step-by-step as data is loaded can slow down the loading process itself.

In this particular case, an in-memory database that starts out empty, performance can be improved by taking a guess at the initial cache size, say -c 500M, or even better -c 1G, or as determined by testing, -c 1200M.

Line 8 in Figure 4 is optional but recommended for every production database. The -o parameter specifies the filespec for a text file to receive a copy of all diagnostic messages written by the database server. This output is often called the "console log", not to be confused with the transaction log. Unlike SQL Server, SQL Anywhere does not save these messages anywhere unless you specify -o filespec.

Line 9 specifies the filespec for the database file, in this case mem.db. Unless overridden by other parameters the file name portion determines the runtime "engine name" and "database name" to be used in connection strings: ENG=mem;DBN=mem;

Lines 11 and 12 launch the Interactive SQL utility used to run many of the SQL statements in this article. The -c connection string specifies runtime server or engine name ENG=mem, the runtime database name DBN=mem, the SQL Anywhere user id UID=dba, the password PWD=sql, and an optional connection name CON=mem-1.

Tip: Because SQL Anywhere databases are often isolated and embedded behind other layers of software, the same user id is often used for all database connections and it can be hard to tell one connection from another when debugging problems. Different CON= connection names can be used to alleviate the situation.

Figure 5 shows the SQL Anywhere version of the SQL Server table defined earlier in Figure 3. SQL Anywhere offers a high degree of Transact SQL compatibility so the only syntactic differences are the three computed columns defined on lines 63, 67 and 71 in Figure 5.

This two-step technique uses BCP (Bulk Copy Program) to unload the SQL Server source table to a text file, and SQL Anywhere's LOAD TABLE statement to load the data from that file into the target table in the in-memory database.

The BCP utility is a popular high-performance workhorse often used to load and unload vast quantities of data to and from SQL Server databases.

On the SQL Anywhere side, similar functionality is provided by the LOAD TABLE and UNLOAD SQL statements.

Figure 6 shows the code for the first step, the Windows command line that executes BCP.

Line 1 in Figure 6 is the full filespec for bcp.exe as installed by a standard SQL Server 2008 setup on Windows.

Line 2 selects all the rows and columns the table to be unloaded: mss_source, owned by "dbo", located in the database "main". BCP is a bit restrictive when it comes to coding the SELECT: it must be enclosed in "double quotes" and appear all on one line; it can't be written on multiple lines even if you use continuation characters.

Line 3 specifies "queryout" for output (as opposed to "in" for input), and specifies the full output filespec. This filespec is relative to the computer running SQL Server itself, not the computer running the BCP utility. In this case, BCP is being run on the target laptop computer, not the server computer, and the text file is being written to the laptop... but it doesn't matter where BCP itself is running, it matters where the target file resides relative to the server. That's why the filespec is written using the "\\server\share\..." Universal Naming Convention format, so that SQL Server will write the data across the network to the PAVILION2 computer.

Line 4 specifies that the -c for "character" format is to be used for the data written to the text file. A -n for "native" format is also available but it's meant for target tables stored on other SQL Server databases; it cannot be used with SQL Anywhere.

Line 5 specifies the -t string to placed between each column value in the output text file. The documentation calls it a "field terminator" but it's really a separator because it doesn't appear after the last column value on each line. The default separator string is the single tab character (hexadecimal 09, or \t in SQL-Server-speak), but since the data contains tab characters something different must be chosen as a terminator. The -t string $#$\t means dollar sign, number sign, dollar sign and tab, and it corresponds to the SQL Anywhere DELIMITED BY '$#$\x09' clause in the LOAD TABLE statement shown later.

Line 6 specifies the -r "row terminator" string to be placed at the end of each row of data in the output text file. The default is the "newline" character, which corresponds to the carriage return - line feed pair (CR-LF or hexadecimal 0D0A) in Windows. Once again, the data contains all sorts of special characters including CR-LFs, so a different -r string is specified: $#$\n means dollar sign, number sign, dollar sign and CR-LF. This corresponds to the SQL Anywhere clause ROW DELIMITED BY '$#$\x0D\x0A'.

Figure 7 shows how lines 2 and 3 in Figure 6 may be changed to specify a table name rather than a SELECT when you want all the rows and columns. With this simpler but less flexible table name syntax, you also have to change the "queryout" on line 3 to "out".

Figure 9 shows the SQL Anywhere command that loads the file created by BCP in the earlier Figure 6. This step ran in about 4.6 minutes to load all 1.9 million rows into memory. As discussed in the previous section, if the initial cache size parameter -c 1200M is omitted from the SQL Anywhere server command line in Figure 4 the LOAD TABLE in Figure 9 slows down dramatically, to over 11 minutes from less than five.

Figure 9: SQL Anywhere LOAD TABLEContext: SQL Anywhere

LOAD TABLE sa_target FROM 'c:/data/main/mss_source.txt' DELIMITED BY '$#$\x09' ESCAPES OFF QUOTES OFF ROW DELIMITED BY '$#$\x0D\x0A';

Line 1 in Figure 9 specifies the target table name to be loaded, and line 2 specifies the filespec of the input file.

Tip: The forward slashes "/" in a string literal are interpreted as backslashes "\" when SQL Anywhere processes a filespec on Windows. This usage avoids any confusion with SQL Anywhere's interpretation of the backslash as an escape character in string literals; i.e., \n is usually interpreted as a new line character whereas /n is not. Note that this interpretation of "/" as "\" is a special trick that only applies to filespecs; in other contexts, a forward slash is treated as a forward slash.

Line 3 in Figure 9 is SQL Anywhere's syntax for a column separator string that agrees with the SQL Server syntax shown on line 5 in Figure 6: a dollar sign, number sign, dollar sign and a tab character represented by the hexadecimal escape string \x09.

Line 4 tells SQL Anywhere that SQL Server doesn't transform special characters into escape strings using the backslash character; e.g., if the input string contains a tab character, it will contain the actual single tab character rather than the hexadecimal escape string \x09.

Line 6 is SQL Anywhere's syntax for a row terminator string that agrees with the SQL Server syntax shown on line 6 in Figure 6: a dollar sign, number sign, dollar sign followed by a carriage return and line feed represented by the hexadecimal escape strings \x0D and \x0A.