Mater artium necessitas

Posts filed under 'MSAccess'

In this post, we talk about Unicode, about Strings, about how they sometimes become unexpectedly mangled when you pass them around the system and what we can do about it.
We’ll see some Chinese ideograms, peek into memory and display some nonsense on screen in the process.

VBA likes Unicode

Internally, VBA handles strings in Unicode encoded as UTF-16.
This means that each ‘character’ of the string uses at least a 16 bit WORD (2 bytes) of memory (not entirely true, but let’s keep it simple).

Below is the proof; it shows the memory content of a VBA string variable that contains the following sentence:

Chinese or the Sinitic language(s)
(汉语/漢語, pinyin: Hànyǔ; 中文, pinyin: Zhōngwén)
is the language spoken by the Han Chinese

If you see squares in place of the Chinese ideograms, then you are probably using an old version of Internet Explorer and do not have the Asian fonts installed: simply use the latest versions of Firefox or Chrome if you want to display them correctly (although you can still go through this post without seeing the exact glyphs).

In memory, the layout of this VBA string is:

Each character of our string occupies 2 bytes: our first letter C whose hexadecimal code 0043 is located at address 0EED7974, then, 2 bytes later at address 0EED7976, we have our second character, h whose code is 0068 and so on.
Note that Windows is a Little Endian architecture, so the lowest significant byte 68 appears first in memory, followed by the most significant byte 00 at the next address.

On the right side, the map displays strange characters in place of the Chinese ones because the memory map interprets each byte as a simple ASCII character, but you can see for instance that the ideogram 中 is located at address 0EED79EA and that its code is 4E2D .

The character codes we see here are simply the ones that would be returned by the AscW() function in VBA (more or less).

VBA doesn’t always like Unicode

The thing though, is that VBA considers the outside world to be ANSI, where characters take (generally) 1 byte and strings must be interpreted according to a Code Page that translate the character code into a different visual representation depending on the encoding.

This means that if you are Greek, your system’s code page will be 737 and you will see Ω (the capital letter Omega) if a string contains the hexadecimal code 97 whereas on a system set for western languages, including English, you will see ù: different representations of the exact same character code.

This nightmare of ANSI encoding made it very hard to pass around strings if you did not know which code page was associated with them. Even worse, this makes it very hard to display strings in multiple languages within the same application.

The Office VBE IDE editor is one such old beast: it only support the input and display of characters in your current system’s Code Page and it will simply display garbage or ? placeholders for any characters it can’t understand.

Fortunately, we’re not really living in an ANSI world any more. Since the days of Windows NT and Windows XP SP2, we’ve been able to use Unicode where each possible character, glyph, symbol has its own code point (things can be quite complicated in Unicode world as well, but that’s the topic for another post).

Unfortunately, VBA still inherits some old habits that just refuse to die: there are times when it will convert your nice Unicode strings to ANSI without telling you.

To make the situation worse, most of the VBA code on the Internet was written by English speakers at a time when Unicode was just being implemented (around 2000).
The result is that even Microsoft’s own examples and Knowledge Base articles still get things wrong and copying these examples blindly will probably make you, your customers and users very unhappy at some point.

To understand this unwanted legacy, and the solutions to avoid these problems, we’ll go through a concrete example.

Message in a box

Let’s study a simple case: display our previous message using the MessageBox Windows API.
The result we want to achieve is this:

Note that in order to display Chinese ideograms, your system must have the proper fonts installed. If not, you’ll end up with little boxes where the glyphs should be displayed.

What we are talking about here is not limited to Asian languages though: it concerns all languages, including English if you ever include any symbols that’s outside of extended ASCII, like the Euro €, mathematical symbols, words in other alphabets like Greek, phonetics, accents, even Emoji (emoticon) symbols.

The GetUnicodeString() function simply returns a string containing the text we want to display. It uses a function UnEscStr() from a previous article to convert escaped Unicode sequences into actual characters.

What we obtain is not what we expected:

All Chinese characters have been replaced by ? placeholders.

Ok, so, if we look again at the signature of the function, we’re calling MessageBoxA. What does this A tacked at the end means?
The Windows API has generally 2 version of functions that handle strings: the old ANSI version, ending with a A, and the Unicode version, ending with a W, for Wide-Characters.

Our first tip is: whenever you see a WinAPI declaration for a function ending in A, you should consider the W version instead.
Avoid A WinAPI functions like the plague!

Get it right: MessageBoxW

So, now we know that we must use MessageBoxW for Unicode, the API documentation says so as well, so we must be on the right track.

The only change is the W in the function name and we’ll call the Unicode version.
Let’s try to open that box again:

Dim s As String
s = GetUnicodeString()
MessageBoxW 0, s, "test", 0

And this displays:

Oh dear.
Looks like Chinese, but it certainly isn’t what we expected. Here we’ve just got Mojibake, garbage due to incorrect processing of Unicode.

What could be wrong? We did pass a proper Unicode string, so what happened?

Well, VBA happened: whenever you pass a String as a parameter in a Declare statement, VBA will convert it to ANSI automatically.

Let me repeat this: when you see a Declare statement that uses As String parameters, VBA will try to convert the string to ANSI and will likely irremediably damage the content of your string.

I’ll show you what happens by going back to our memory map. This is the content of memory before we call MessageBoxW:

This is the memory of the same string after the call to MessageBoxW:

The string has been converted to ANSI, then converted back to Unicode to fit into 2-byte per character but in the process all high Unicode code points have been replaced by ? (whose code is 003F).

Note that the memory location of the string before and after the call are different. This is normal since the string was converted, so another one was allocated (sometimes they end-up occupying the same memory though it’s probably an optimisation of the interpreter).

So our string was actually returned modified to us; we can’t even rely on VBA to keep our original string intact…

Get it right, use pointers!

So, it’s clear now, whenever VBA sees that you are passing a String in a Declare statement, it will silently convert that String to ANSI and mangle any characters that doesn’t work in the current Code Page set for the system.

To avoid that unwanted conversion, instead of declaring the parameters as String, we will declare pointers instead so that our final declaration will be:

What we did here is simply define MessageBoxU as an alias for MessageBoxW which we know is the right API function.
Then we replaced the declaration for ByVal lpText As String and ByVal lpCaption As String, by pointers values: ByVal lpText As LongPtr and ByVal lpCaption As LongPtr.

From the point of view of the MessageBoxW function, in both instances it will receive a pointer, except that instead of a String, itwill receive the value of a LongPtr, which is functionally identical.

Now, when using our new MessageBoxU function, we also need to pass the value for the pointers to the strings explicitly:

The StrPtr() function has been part of VBA for a while now but it is still not documented. It simply return the memory address of the first character of a string as a LongPtr (or Long versions of Office older than 2010).

There are other functions to get the address of a variable: VarPtr() returns the memory address of the given variable while ObjPtr() returns the memory address of an object instance. We’ll have to talk more about those in another post.

So, now, the result:

Hurray! We did it! No unexpected conversion!
If you looked at the memory map, there would be nothing really interesting to see: the string would have stayed the same throughout the call and would be untouched: no conversion, no copy.

Not only did we manage to make call to API functions work, they are also faster because VBA doesn’t have the overhead of these silly Unicode-ANSI-Unicode conversions every time we use the API function.

Parting words

The point of all this is simply to remember the following when involving Strings in Win API calls:

Do not use the A version of API calls, always use the W version instead.

This is also relevant for some of the built-in VBA functions:
use ChrW$() over Chr() and AscW() over Asc(), both W version handle Unicode characters and are about 30% faster in my tests (see KB145745 for more information).

VBA will mess-up and convert to ANSI all strings parameters in a Declare statement.

To avoid this mangling, use pointers instead of Strings in your declarations:
never declare As String in an API Declare statement. Always use ByVal As LongPtr instead and then use StrPtr() in your calls to pass the pointer to the actual string data.

Similarly, when passing a User-Defined-Type structure that contains Strings to a Declare statement parameter, declare the parameter as ByVal As LongPtr and pass the UDT pointer with VarPtr().

Remember that the VBE IDE cannot process and display Unicode, so don’t expect your debug.print to display anything other than ? for high code point characters.

Well, the dragon will probably not be visible, even in your browser it’s probably not well supported by your installed fonts, but at least you should see the Euro symbol.

.Net strings

It’s no secret that strings in .Net have a lot more capabilities when it comes to building and formatting.
You can do everything in VBA as well, but things tend to be more difficult to read and compose. In .Net, strings can be formatted in lots of different ways, including creating columns of fixed-length text, padding, aligning text to the left or the right, specifying precision for numbers, formatting dates, etc.

Strings can also contain special character literals to insert newlines or tabs for instance and special Unicode escape sequences meant to represent characters that would otherwise be impossible to include in a string by hand.

Character literals and Unicode escape sequences

Before we go deeper into how things works, let’s go through some of the helper functions you will find in the attached code.

The StringUtils.bas module contains the following function that takes a string that may contain any .Net special character literals and Unicode escape sequences and returns a string that will contain the un-escaped version:

UnEscStr(str As Variant) As String

For instance, to include new lines in our strings, just include the \n literal. Adding a tab is simply a matter of using \t.
Note that if you want a backslash \ in your string, you will have to escape it to \\ otherwise the function will interpret the following characters in a way you may not expect.

You can also encode any Unicode character using its Unicode code-point expressed in hexadecimal (for instance, the code-point U+0041 represents the character A).

VBA uses UTF-16 for its string encoding, meaning that each character in a string is encoded using 16 bits and should therefore be capable of containing any possible glyph in the world.

However, to include Unicode values in strings is difficult in VBA:

You can’t just type the character in the string: the IDE is a remnant of the past and uses Extended ASCII only. Anything that cannot be displayed in the IDE will come out as ?.

You can use ChrW$() to include Unicode in your strings, but it is quite verbose and ends-up an ugly mess as you have to cut your string in bits to insert Unicode characters:

As a more elegant solution, you can use Unicode escape sequences directly in the strings and they will be converted through the UnEscStr() function.

To properly include a Unicode code-point in your string, you must follow some rules:

If the code-point is between U+0 and U+FFFF, use the \uXXXX sequence (lower case u followed by 4 hex digits).
For instance to encode the € sign whose code-point is U+20AC, just include \u20ac in your string.

If the code-point is above 0+FFFF, you must use the \UXXXXXXXX (upper case U followed by 8 hex digits).
For instance, to encode the dragon symbol whose code-point is U+1F409, you will have to use \U0001F409 in your string.

There is a third way to include a code-point sequence by using the \x notation.
This is similar to \u but you do not have to use exactly 4 hex digits, you can use 1, 2, 3 or 4.
For instance, instead of encoding the letter A as \u0041, you can simply use \x41.
However, since this notation accepts a variable number of digits, you have to be careful: the string \x27Effect\x27 will not print 'Effect', because E and f are considered hex digits, you will end up with something that will look like )fect' instead.
Better use the \u encoding, it’s what you’ll need in 99% of the cases anyway.

String formatting

The StringUtils.bas module also contains an implementation of the .Net String.Format() method:

sFormat(frmt As String, ParamArray args() As Variant) As String

In .Net, string formatting uses {x} place-holders inside the string itself. The place-holder {0} refers to the value of the first parameter, {1} to the value of the second, and so on.
Place-holders can also be used to define how the value should be formatted:

Having the most common method as a default makes things a bit cleaner, I think.

The thing to remember, is that in our example, s is an instance of the class, it is not a string, it’s an object, so once you have completed your string composition, you must call ToString() to get the actual VBA string back.

The class is fairly flexible, allowing you to insert values in each formatted string or, as shown above, by adding the parameters in the call to ToString():

You can find out the current length of the string with the Length property.

Be careful that Replace() is case sensitive since all .Net operations are case sensitive.
So Replace("line", "Row") is not the same as Replace("Line", "Row").

Using .Net strings in VBA

Using .Net from VBA can be a challenge depending on what part of .Net you want to use and how you want to use it from VBA.

The .Net String object itself is not accessible from VBA. You can’t create an instance of it directly from VBA and when you use an existing instance of a .Net string, it gets converted to a VBA string immediately.

This .Net class is the workhorse behind the implementation of the StringBuilder class and the sFormat function in VBA.
Thanks to this easily instantiable object, we can use some of the power of .Net in our VBA string at no cost.

Some performance information

Of course, doing all that processing has a cost, although in the vast majority of cases, it should not be noticeable.

I timed the use of UnEscStr above in the string that contains Chinese Unicode characters.
The call takes 0.010ms on my machine (a 4 year old i7)

A call to sFormat similar to the example given in this article took about 0.10ms.

This timing depends a lot of course on the complexity of the string, the type of formatting, the amount of literal characters an Unicode sequences to convert, etc.

So if you are formating 10,000s of strings, or un-escaping 100,000s or strings the processing time will probably add up to be noticeable but in most use cases, even if you have a few 1,000s of strings, processing will not impact performance in any noticeable way.

Code Updates

v1.0: 12SEP2013
Original release.

Code download

The 2 files contained in the archive below are all you need. Actually, if you are only interested in the sFormat() or UnEscStr() function you only need to import the StringUtils.bas module.

However, if you want to use the StringBuilder class, you must import both files.

To import the files, just open your Access application, open the IDE then, from the menu, File > Import File, and select the StringUtils.bas, then repeat for StringBuilder.cls.
That’s all there is to it.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Free for re-use in any application or tutorial providing clear credit is made about the origin of the code and a link to this site is prominently displayed where end-users can easily access it.

I’ve put off writing about x64 MS Office because I was a bit lazy and hadn’t build the necessary environment to test it properly.

In my day-to-day job -as far as Access is concerned-, I have so far targeted the 32 bit (x86) version of Office 2010 but the 64 bit version is bound to become more popular with time and some people already have to use it, even if sometimes their reasons are misguided.

When should I care?

If your code has any chance of being run on Office 2010 and later, and you want to avoid issues when one of your users or customers will have Office x64 (or just one of its components) installed, then you should make sure to use your code uses the new VBA7 constructs.

The changes to be performed only affect the parts of your code that use Win API calls or manipulate Windows control handles and memory pointers.

For a small fraction of applications, actually using Access/Excel x64 could make sense as it allows you to deal with 64 bit integers (as opposed to the standard 32 bit long). While this could be a useful feature, code that use these 64 bit integers will not compile on a 32 bit version of Office.

If you use external libraries and controls

If you are using external libraries and ActiveX controls, it’s unlikely that they will only work in Office x64.
The reason is that most libraries (especially older ones) are only available in 32 bits.

For these to work under Office x64, they would have to be compiled especially for 64 bit as well.

completely control the user environment an ensure it will always run Office x86 (32 bit).

you can compile or get 64 bit versions of these libraries and ActiveX components so you can build a version of your application that will work for all your users.

If your code targets Access 2010 and later

There are 2 changes that may affect your code: external library declarations and usage of the new pointer and very large integer types.

Library declarations:

Any declaration for external libraries (such as Win API or code that resides in DLLs), the declaration statement has changed and must use the new PtrSafe keyword to tell the compiler that the declaration should be interpreted in a 64 bit context when relevant. If you omit the keyword, your declarations will always be interpreted as being in a 32 bit context and the results may be random.

For instance, the old declaration for the SetFocus Win API function was:

You will need to replace declarations that use long with the more appropriate type:

LongPtr
Use the new LongPtr wherever you deal with declarations that relate to Windows control handles or memory locations.

LongLong
Defines a 64 bit integer.
The reason for LongLong is mainly to access some 64 bit versions of the Windows API, in particular, for math functions.
This type only exists if you are running under a 64 bit version of Access. On 32 bit systems, it will cause a compile error. To avoid that problem where your code needs to run on both 32/64 bit platforms, you can use the new conditional constant Win64:

The compiler will only see and compile one code path and will ignore the other, so doing this does not affect the speed of your application.

If your code must also work in Access 2007 and earlier

Older versions of Access will not be able to compile the new PtrSafe, LongPtr and LongLong keywords introduced by VBA7.
If your code must also work in older versions of Access, you must either create 2 versions of your application (not very convenient), or simply use the new conditional constant VBA7 to separate blocks of code that will run on Office 2010 and later from blocks of code that will run on Office 2007 and ealier.

The new VBA7 conditional constant will evaluate to True if it is running under Office 2010 and later.

So now you can create variations of your code that will work everywhere:

Adapting old code

When going through old code, you can’t just add PtrSafe to all API declarations and do a search to replace Long into LongPtr. That would be the surest way to get everything wrong.

Instead, you must be very careful about the intent of the code and understand what your variables are manipulating: is that long really just a number or is it mean to be a memory location or a control handle ?

To help you, at least as far as the most common Win API declarations are concerned, Microsoft has made available a text file with hundreds of such declarations compatible for Office x64: Win32API_PtrSafe with 64-bit Support

If you need to use a WinAPI call that isn’t in the list, you will need to have a look at its actual definition and, based on the type of data being passed around, decide if you need to use a simple Long or a LongPtr.

For instance, I had to use the MonitorFromWindow API function. Its C++ definition is as follow:

HMONITOR MonitorFromWindow(
_In_ HWND hwnd,
_In_ DWORD dwFlags
);

The definition tells me:

That the function is taking a Handle (basically a pointer) to a Window and some flags

and that the function returns a Handle to the display monitor (basically a pointer).

So the API definition that will work on all versions of Access is:

#If VBA7 Then
' Works in 32/64 bits of Office 2010 and later
Declare PtrSafe Function MonitorFromWindow Lib "user32" _
(ByVal hwnd As LongPtr, ByVal dwFlags As Long) As LongPtr
#Else
' Works in Office 2007 and earlier
Declare Function MonitorFromWindow Lib "user32" _
(ByVal hwnd As Long, ByVal dwFlags As Long) As Long
#End If

When using the external function, you also need to be careful in the code and use the VBA7 conditional to separate the parts where you access structure pointers:

Make sure you save, re-compile and test your project as you go along. Also make frequent backups of your application as a wrong declaration could crash and corrupt it.
It may even be possible to crash the whole machine, although I have not experienced that so far.

Conclusion

Updating old code to make it work properly on all versions of Access is not difficult. It is usually fairly easy, although it can be a bit cumbersome at times.

The trick is to know what your code really needs to target:

Office 2010 and later, the present and the future, in which case everything is straightforward and you use the new VBA7 keywords PtrSafe, LongPtr and LongLong where appropriate.

All versions of Access, in which case you will have sometimes to separate and maintain 2 versions of your code using the VBA7 conditional.

(Updated Saturday 10DEC2011.) On 28th of June 2011, Microsoft Office 2010 Service Pack 1 and the Access 2010 Runtime Service Pack 1 were issued.

After upgrading my development machine (Win7 x64) and a few clients (Windows 2008R2 x64) to SP1 (x86), I started to get strange issues:

I use .Net libraries from my Access application and suddenly, even when not instantiating any .Net objects, Access would crash, usually on startup, but sometimes when opening the VBE.
Decompiling and re-compacting the database would be OK, usually once, but the problem would reappear the next time I would restart the application.

In the Runtime, I would get strange errors, such as The setting your entered isn’t valid for this property, or Action Failed Error Number: 2950, or Runtime Error 3270 Property not found.
The strange thing about these errors is that they would occur in places that had not been modified for many releases of our application, parts that have been running without problem until now.

Another weird issue was the systematic reset of our custom ribbon to its first tab. this could happen randomly, but most it could also be reproduced by simply opening a report as a tab page (that fill-in the whole MDI window). When closing that form, the first tab of the ribbon would select itself automatically. This wasn’t happening when closing pop-up windows.

After removing the Office and Runtime Service Pack 1, everything went back to normal.

For a String type, the string can have no content at all: "", vbNullString.

For a Variant type, the string can have any of the following attributes or values:

it can be Missing if the variable is an unused optional parameter,

it can be Empty if it was never assigned,

it can be Null if, for instance it’s bound to a nullable field or unbound with no value,

it can be an empty string "", vbNullString.

When having to check these variables in code, it can be tiresome to have to go through testing some of these possibilities just to find out that your variable does or not not contains something useful, regardless of the type of variable you are using.

To avoid having to do all these tests, make the code a bit more tidy and allow me to move on to more important things, I use this small utility function quite often:

So now I don’t have to worry so much about the type of the variable I’m testing when I want to know if it contains useful data:

...
' Here assume that CustomerReference is a control on a form.
' By using IsBlank() we avoid having to test both for Null and empty string.
If IsBlank(CustomerReference) Then
MsgBox "Customer Reference cannot be left blank."
End If
...

Obviously, IsBlank() doesn’t replace the other tests but I found it to be more straightforward to use in most cases.

I have been spending a lot of time trying to find out why some of the code used to insert new records into a linked SQL Server table would systematically fail with an error:

Run-time Error '3155' ODBC--insert on a linked table failed

It was driving me mad.
I could insert a simple record using SQL Server Management Studio, I could add new records to the table in datasheet mode within Access, but as soon as I tried to insert a record from code, whether using DAO recordset or executing the same SQL INSERT, it would miserably fail.

After a fair bit of investigation and tests, of which you can read the full account on the question I asked on StackOverflow, it turns out that this is a long-standing bug in the ODBC Driver (or Access).

Memo fields in Access are usually translated into nvarchar(MAX) in SQL Server by tools like SSMA.
Unfortunately, when you link tables having these fields using the SQL Server Client driver, these fields get incorrectly interpreted as string, even though they appear ok from the table design view.
It’s only if you try to insert something into the field, either text larger than 255 chars or NULL, that you get the error message.

So, the solution, at least in this case, is to revert to the older SQL Server ODBC driver instead, or use varchar() instead of nvarchar(), but if you’re dealing with Unicode, you have to stick with nvarchar().

In the perspective of upsizing my current Access 2007 application, I have been trying to understand a bit more about the possible performance impact of various choices of Primary Keys.

My problem is that currently, the Access application uses autoincrement numbers as surrogate Primary Keys (PK).
Since I will need to synchronise the data over multiple remote sites, including occasionally disconnected clients, I can’t use the current autoincrement PK and will need to change to GUID.

To see for myself what could be the impact, I made a series of benchmarks.
This first part is fairly simple:

For the table using a GUID, we use the NewSequentialID() instead of NewID() to create new keys. This is supposed to offer much better performance as the generated GUIDs are guaranteed to be sequential rather than random, resulting in better index performance on insertion.

For the Access version of the tables, we basically use the same definition, except that we used 4 tables:

Basically, we perform 1000 transactions each inserting 1000 records into the table ProductGUID or ProductINT.

Access 2007 Test code

To duplicate the same conditions, the following VBA code will perform 1000 transactions each inserting 1000 records.
Note that the recordset is opened in Append mode only.
The importance of this will be discussed in another article.

' Run this to inset 1,000,000 products in batches of 1000
' In the given table
Public Sub Benchmark(TableName as String, InsertSeqGUID as Boolean)
Dim i As Integer
For i = 1 To 1000
Insert1000Products TableName, InsertSeqGUID
Next i
End Sub
' Insert 1000 products in a table
Public Sub Insert1000Products(TableName as String, InsertSeqGUID as boolean)
Dim i As Long
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim ws As DAO.Workspace
Dim starttime As Long
Dim timespan As Long
Set ws = DBEngine.Workspaces(0)
DoEvents
starttime = GetClock ' Get the current time in ms
ws.BeginTrans
Set db = CurrentDb
Set rs = db.OpenRecordset(TableName, dbOpenDynaset, dbAppendOnly)
With rs
For i = 1 To 1000
.AddNew
If InsertSeqGUID Then !ID = "{guid {" & CreateStringUUIDSeq() & "}"
!SKU = "PROD" & i
!Description = "Product number " & i
.Update
Next i
End With
ws.CommitTrans
rs.Close
timespan = GetClock() - CDbl(starttime)
Set rs = Nothing
Set db = Nothing
' Print Elapsed time in milliseconds
Debug.Print timespan
DoEvents
End Sub

ProductGUIDRandom table: we let Access create the Random GUID for the primary key.

ProductGUIDSequential: we use the Windows API to create a sequential ID that we insert ourselves.

Test results

Without further ado, here are the raw results, showing the number of inserted record per second that we achieve for each test over the growing size of the database (here are only shown tests comapring Sequantial GUID and Autoincrement on SQL Server and Access, see next sections for the other results):

What we clearly see here is that performance when using autoincrement and Sequential GUID stays pretty much constant over the whole test.
That’s good new as it means that using Sequential GUIDs do not degrade performance over time.

As a side note, in this particular test, Access offers much better raw performance than SQL Server. In more complex scenarios it’s very likely that Access’ performance would degrade more than SQL Server, but it’s nice to see that Access isn’t a sloth.

Using Sequential GUID vs Autoincrement in Access

The results show that we do take a performance hit of about 30% when inserting Sequential GUID vs just using autonumbers.
We’re still getting good results, but that’s something to keep in mind.

In terms of CPU consumption, here is what we get:

Random PK, whether they are simple integer or GUID do consume substantially more CPU resources.

Using Sequential GUID vs Identity in SQL Server

Out-of-the box, SQL Server performs quite well and there is not much difference whether you’re using Sequential GUIDs or autoincrement PK.

There is however a surprising result: using Sequential GUIDs is actually slightly faster than using autoincrement!

There is obviously an explanation for this but I’m not sure what it is so please enlighten me 🙂

CPU Consumption:

Using Random GUID vs Sequential GUID vs Random Autonumber in Access

So, what is the impact of choosing a Sequential GUID as opposed to letting Access create its own random GUIDs?

It’s clear that random GUIDs have a substantial performance impact: their randomness basically messes up indexing, resulting in the database engine having to do a lot more work to re-index the data on each insertion.
The good thing is that this degradation is pretty logarithmic so while it degrades over time, the overall performance remains pretty decent.
While GUIDs are larger than Random Integers (16 bytes vs 4 bytes) the actual performance of inserting records whose PK is a random integrer is actually worse than random GUID…

Provisional conclusions

Here we’ve check the baseline for our performance tests.
In the next article, we’ll look exclusively at the performance of inserting data from a remote Access 2007 front end using our VBA code.

Having this baseline will allow us to check the performance overhead of using ODBC and letting Jet/ACE manage the dialogue with the SQL Server backend.

Feel free to leave your comments below, especially if you have any resources or information that would be useful.

Updates

When you start building an Access application, it’s tempting to just think about today’s problem and not worry at all about the future.
If your application is successful, people will want more out of it and, over time, you’ll be faced with the task of moving the back-end database to a more robust system like SQL Server.

Naming conventions

Access is pretty liberal about naming conventions and it will let you freely name your tables, columns indexes and queries.
When these get moved to another database you’ll most probably be faced with having to rename them.
In some cases, you could actually create subtle bugs because something that used to work fine in Access may be tolerated in the new database but be interpreted differently.

Do not use spaces or special characters in your data object names.
Stick to characters in the range A through Z, 0 to 9 with maybe underscores _ somewhere in between (but not at the start or the end).
Also try to respect casing wherever you reference this name (especially for databases like MySQL which are case-sensitive if the hosted on a Linux platform for instance).
eg:Customer Order Lines (archive) should be CustomerOrderLines_Archive.Query for last Year's Turnover should be QueryLastYearTurnover.
Index ID+OrderDate should become instead ID_OrderDate.

Do not use keywords that are reserved or might mean something else whether they are SQL keywords or functions names:
A column called Date could be renamed PurchaseDate for instance.
Similarly, OrderBy could be renamed SortBy or PurchaseBy instead, depending on the context of Order.
Failing to do so may not generate errors but could result in weird and difficult to debug behaviour.

Do not prefix tables with Sys, USys, MSys or a tilde ~.
Access has its own internal system tables starting with these prefixes and it’s best to stay away from these.
When a table is deleted, Access will often keep it around temporarily and it will have a tilde as its prefix.

Do not prefix Queries with a tilde ~.
Access use the tilde to prefix the hidden queries kept internally as recordsource for controls and forms.

Database design

Always use Primary keys.
Always have a non-null primary key column in every table.
All my tables have an autonumber column called ID. Using an automatically generated column ID guarantees that each record in a table can be uniquely identified.
It’s a painless way to ensure a minimum level of data integrity.

Do not use complex multivalue columns.
Access 2007 introduced complex columns that can record multiple values.
They are in fact fields that return whole recordset objects instead of simple scalar values. Of course, this being an Access 2007 only feature, it’s not compatible with any other database.
Just don’t use it, however tempting and convenient it might be.
Instead use a table to record Many-To-Many relationships between 2 tables or use a simple lookup to record lists of choices in a text field itself if you’re only dealing with a very limited range of multivalues that do not change.

Do not use the Hyperlink data type.
Another Access exclusive that isn’t available in other databases.

Be careful about field lookups.
When you create Table columns, Access allows you to define lookup values from other tables or lists of values.
If you manually input a list of values to be presented to the user, these won’t get transferred when upsizing to SQL Server.
To avoid having to maintain these lookup lists all over your app, you could create small tables for them and use them as lookup instead; that way you only need to maintain a single list of lookup values.

Be careful about your dates.
Access date range is much larger than SQL Server.
This has 2 side-effects:
1) if your software has to deal with dates outside the range, you’ll end-up with errors.
2) if your users are entering dates manually, they could have made mistakes when entering the year (like 09 instead of 2009).
Ensure that user-entered dates are valid for your application.

VBA

While most of your code will work fine, there are a few traps that will bomb your application or result in weird errors:

Always explicitly specify options when opening recordsets or executing SQL.
With SQL Server, the dbSeeChange is mandatory whenever you open a recordset for update.
I recommend using dbFailOnError as well as it will ensure that the changes are rolled back if an error occurs.

Get the new autonumbered ID after updating the record.
In Access, autonumbered fields are set as soon as the record is added even if it hasn’t been saved yet.
That doesn’t work for SQL Server as autonumbered IDs are only visible after the records have been saved.

' Works for Access tables only
' We can get the new autonumber ID as soon as the record is inserted
rs.AddNew
mynewid = rs!ID
...
rs.Update
' Works for ODBC and Access tables alike
' We get the new autonumber ID after the record has been updated
rs.AddNew
...
rs.Update
rs.Move 0, rs.LastModified
mynewid = rs!ID

Never rely on the type of your primary key.
This is more of a recommendation but if you use an autonumbered ID as your primary key, don’t rely in your code or you queries on the fact that it is a long integer.
This can become important if you ever need to upsize to a replicated database and need to transform your number IDs into GUID.
Just use a Variant instead.

Parting thoughts

These simple rules will not solve all your problems but they will certainly reduce the number of issues you’ll be faced with when upsizing you Access application.
Using a tool like SSMA to upsize will then be fairly painless.

If you have other recommendations, please don’t hesitate to leave them in the comments, I’ll regularly update this article to included them.

I’m currently researching ways to move my main MS Access application from a simple local network client/backend setup to a global, multiple remote sites configuration using SQL Server.

One of the challenges is to upsize the current MS Access accdb backend database to SQL Server 2008.
If you try it from Access itself using the Upsizing Wizard, you may end up getting this error message:

The Upsizing Wizard only works with Microsoft SQL Server (Versions 6.50 SP5 or higher). Please log in to a SQL Server data source.

After spending some time fiddling around with SQL Server settings I couldn’t understand why I was still getting this error.
Turns out that the upsizing wizard is apparently sensitive to the version of SQL Server you’re using and it doesn’t consider SQL Server v10 (2008) as being later than v6.50…

This issue is in fact a blessing.
Microsoft provides a migration tool for upsizing MS Access database to SQL Server 2008 that’s orders of magnitude better than anything the basic wizard can do: the SQL Migration Assistant for Access, or SSMA.

SSMA lets you take a bunch of Access databases and move the tables and queries you choose to SQL Server, automatically linking them in your original database if you want.
It’s not just a one-off thing either: SSMA keeps track of the objects that where transferred and allows you to synchronise both schema and data as often as you need.

So here you are: do not use the basic MS Access Upsizing Wizard, download and use SSMA instead.

Strange COM Error

While SSMA works perfectly fine on my Windows 2008 x64 laptop, on my main Windows XP desktop it throws an exception when trying to load an Access database:

It was a COM error saying that the library for DAO couldn’t be loaded.

I couldn’t find any relevant information on the web.
After a while, I had a look at the DAO driver inC:\Program Files\Common Files\Microsoft Shared\DAO\dao360.dll
and I noticed that the filename was in blue: that reminded me that I had set compression on the filesystem.

I disabled compression for the file and, magically, SSMA worked again…

Moral of the story: be careful about compressing your filesystem, some registered libraries and system files may work in unpredictable ways…

My trusty old gigantic Sony Vaio is about 4 years old. It served me well and still works but it’s about to become my main development machine for the next couple of months and I can’t afford to have it die on me during that time.
It was time to get something as gigantic and more up-to-date in terms of technology.

I use VMware on my main desktop to keep multiple OS setups that match typical configurations of my customer’s machines.
This allows me to test my software before deployment and make sure everything works as expected.
It saved me many times from strange bugs and I would consider these final tests to be a mandatory step before deployment.
My old trusty vaio would be hard pressed to run any of these without slowing down to a crawl.

I looked at some possible replacements. Initially I checked Lenovo’s offerings but they don’t seem to offer anything in large screen size (WUXGA 1920×1200) (Note, actually, they have, but not really for me).
Dito for Dell, not counting their humongous XPS M1730 luggable gaming machine that was wayyy over the top as a work computer, not to mention probably heavier than its volume in pure gold.

On a hint from a friend I checked out Apple’s online store and saw they had a nice Macbook Pro configuration. I went to check it out in the retail store close to my office and they had that exact specification in stock, so, in what must have been the highest rated expense/time-to-think ratio of any decision I ever took, well, I bought it…

The spec, some bragging rights:

Macbook Pro 17″

Core Duo T9500 2.6GHz processor

nVidia 8600M GT 512MB graphics card

200GB 7200rpm drive

Kingston 4GB DDR2 667MHz RAM

Hi Resolution 17″ 1920×1200 glossy screen

It’s a very nice machine, Apple knows how to make nice hardware, there is no question there.
OSX has some cool features, some of them still a bit foreign to me and some minor annoyances are creeping up, like Thunderbird’s not picking up my system date and time settings and displaying the date in the wrong format (a pet peeve of me), probably not Apple’s fault but annoying nonetheless.
So far so good and while I don’t mind using OSX for my browsing, email and creative stuff, that machine is meant to be running Windows Server 2008 x64 as a development platform.

Why Windows Server 2008 x64?

Well, it has some excellent features, a smaller footprint than Vista, all the aero eye candy, is apparently noticeably faster than Vista and has none of the nagging security prompt (you are considered administrator though, so keeping safe is entirely up to you).
The 64 bit version can also address the full 4GB of RAM without limitation and all server features are optionally installable.
By default, the installation is actually pretty minimal and you have to set services and options to get Windows configured as a proper workstation. It is after all, meant to be a server.
Oh, I almost forgot that there is also support for HyperV, although you must make sure you download the right version (if you list all available downloads in your MSDN subscription, you’ll see some that are explicitly without that technology).

Installing Windows Server 2008 x64 is remarkably easy.

You’ll need to repackage the ISO as it won’t work properly (something to do with non-standard file naming options).
It’s fairly easy if you follow the instructions from Jowie’s website(cached version): you can get the ImgBurn software for free as well, which is a good find in itself. It should’t take more than 30 minutes to repackage the DVD.

In OSX, go to Applications > Utilities > Boot camp and follow the instructions on screen.
You will be able to resize the default partition by just moving the slider. I left 60GB for OSX and allocated the rest to Windows. The good thing is that OSX can read Windows partitions, so you can always store data there. Windows however, can’t read the HFS+ mac file system, although there are some third-party tools that can do it [1][2][3].

Insert your repackaged DVD and Bootcamp will have rebooted the machine.
After a few minutes of blank screen (and no HDD activity light to let you know something is happening), windows setup launches.

You will be then prompted with the choice of partition to install to.
Select the one named BOOTCAMP, then click the advanced options button and click format.
From there one, windows will install everything, then reboot, then carry on installing, then reboot one last time.

In the Control Panel > System > Advanced System Settings > Advanced > Settings > Advanced > Processor scheduling, set to Programs instead of Background services.

Activate your copy of Windows using Control Panel > System.
I was getting an error code 0x8007232B DNS name does not exist error. To force activation, just click on the Change Product Key button and re-enter the same key you used during install.
Windows will activate straight away.

When booting your Macbook, press the Option key and you will be presented a list of boot choices.