In the perspective of upsizing my current Access 2007 application, I have been trying to understand a bit more about the possible performance impact of various choices of Primary Keys.

My problem is that currently, the Access application uses autoincrement numbers as surrogate Primary Keys (PK). Since I will need to synchronise the data over multiple remote sites, including occasionally disconnected clients, I can’t use the current autoincrement PK and will need to change to GUID.

To see for myself what could be the impact, I made a series of benchmarks. This first part is fairly simple:

For the table using a GUID, we use the NewSequentialID() instead of NewID() to create new keys. This is supposed to offer much better performance as the generated GUIDs are guaranteed to be sequential rather than random, resulting in better index performance on insertion.

For the Access version of the tables, we basically use the same definition, except that we used 4 tables:

Basically, we perform 1000 transactions each inserting 1000 records into the table ProductGUID or ProductINT.

Access 2007 Test code

To duplicate the same conditions, the following VBA code will perform 1000 transactions each inserting 1000 records. Note that the recordset is opened in Append mode only. The importance of this will be discussed in another article.

' Run this to inset 1,000,000 products in batches of 1000
' In the given table
Public Sub Benchmark(TableName as String, InsertSeqGUID as Boolean)
Dim i As Integer
For i = 1 To 1000
Insert1000Products TableName, InsertSeqGUID
Next i
End Sub
' Insert 1000 products in a table
Public Sub Insert1000Products(TableName as String, InsertSeqGUID as boolean)
Dim i As Long
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim ws As DAO.Workspace
Dim starttime As Long
Dim timespan As Long
Set ws = DBEngine.Workspaces(0)
DoEvents
starttime = GetClock ' Get the current time in ms
ws.BeginTrans
Set db = CurrentDb
Set rs = db.OpenRecordset(TableName, dbOpenDynaset, dbAppendOnly)
With rs
For i = 1 To 1000
.AddNew
If InsertSeqGUID Then !ID = "{guid {" & CreateStringUUIDSeq() & "}"
!SKU = "PROD" & i
!Description = "Product number " & i
.Update
Next i
End With
ws.CommitTrans
rs.Close
timespan = GetClock() - CDbl(starttime)
Set rs = Nothing
Set db = Nothing
' Print Elapsed time in milliseconds
Debug.Print timespan
DoEvents
End Sub

ProductGUIDRandom table: we let Access create the Random GUID for the primary key.

ProductGUIDSequential: we use the Windows API to create a sequential ID that we insert ourselves.

Test results

Without further ado, here are the raw results, showing the number of inserted record per second that we achieve for each test over the growing size of the database (here are only shown tests comapring Sequantial GUID and Autoincrement on SQL Server and Access, see next sections for the other results):

What we clearly see here is that performance when using autoincrement and Sequential GUID stays pretty much constant over the whole test. That’s good new as it means that using Sequential GUIDs do not degrade performance over time.

As a side note, in this particular test, Access offers much better raw performance than SQL Server. In more complex scenarios it’s very likely that Access’ performance would degrade more than SQL Server, but it’s nice to see that Access isn’t a sloth.

Using Sequential GUID vs Autoincrement in Access

The results show that we do take a performance hit of about 30% when inserting Sequential GUID vs just using autonumbers. We’re still getting good results, but that’s something to keep in mind.

In terms of CPU consumption, here is what we get:

Random PK, whether they are simple integer or GUID do consume substantially more CPU resources.

Using Sequential GUID vs Identity in SQL Server

Out-of-the box, SQL Server performs quite well and there is not much difference whether you’re using Sequential GUIDs or autoincrement PK.

There is however a surprising result: using Sequential GUIDs is actually slightly faster than using autoincrement!

There is obviously an explanation for this but I’m not sure what it is so please enlighten me :-)

CPU Consumption:

Using Random GUID vs Sequential GUID vs Random Autonumber in Access

So, what is the impact of choosing a Sequential GUID as opposed to letting Access create its own random GUIDs?

It’s clear that random GUIDs have a substantial performance impact: their randomness basically messes up indexing, resulting in the database engine having to do a lot more work to re-index the data on each insertion. The good thing is that this degradation is pretty logarithmic so while it degrades over time, the overall performance remains pretty decent. While GUIDs are larger than Random Integers (16 bytes vs 4 bytes) the actual performance of inserting records whose PK is a random integrer is actually worse than random GUID…

Provisional conclusions

Here we’ve check the baseline for our performance tests. In the next article, we’ll look exclusively at the performance of inserting data from a remote Access 2007 front end using our VBA code.

Having this baseline will allow us to check the performance overhead of using ODBC and letting Jet/ACE manage the dialogue with the SQL Server backend.

Feel free to leave your comments below, especially if you have any resources or information that would be useful.

Why didn’t you test random autonumbers in Jet? Also, you do realize that if you were using an MDB instead of an ACCDB, you could use Jet replication? Jet 4 can also participate in a SQL Server 2000 replica set, with what is called heterogeneous replication (i.e., with MDB replicas as subscribers to the SQL Server replica set). Unfortunately, Microsoft dropped this from SQL Server 2005 and later.

I’m not sure why you want sequential PKs as it can lead to concurrency problems in a Jet back end. That is, if you have a sequential PK, after a compact, the table is written in PK order (because it’s clustered on the PK). This means that records with sequential PKs will often be stored in the same data page. If editing of the data is related to the ordering of the PK values, i.e., it’s more likely that users will edit recent records than older records, you can increase the chance of contention for those data pages.

With random PK values, the data pages will be distributed more-or-less evenly and there would be less chance of contention for data pages.

Now, that is only relevant if the editing tends to be on close together records. If that’s not the case, this isn’t really relevant. But a random PK actually can improve concurrency in a Jet back end in that kind of situation.

And, BTW, I’ve used random autonumbers with all my replicated apps since 1997, and have never had a PK collision between replicas. If you think about it, it’s unlikely that this would ever happen (less likely than if you had random autonumbers in non-replicated MDBs, in fact) because after each synch, the list of used-up values is known to all replicas. As long as you synch around your replica set on a reasonably frequent schedule, the chances of a random autonumber collisions should be minimal — certainly not enough to justify using GUIDs as PKs.

And, again, let me point you to Michael Kaplan’s article on the problems of GUIDs in Access:

Hi David, sorry your comment was mistakenly marked as spam and I just saw that.

Why ACCDB: because I use features of Access 2007 like the RichEdit textbox, the Ribbon, better integration with Office 2007 look and feel, transparent PNG, etc. In any case, it’s the future of Jet and there was no reason for me not to use it.

Jet replication isn’t really encouraged in Access 2007 so we can probably say that it doesn’t have a bright future and will most likely be replaced by something else, maybe related to Sync services (see Project Huron).

To be honest, I don’t really want to use SQL Server Replication: I’d prefer to use the Sync framework to have better control over what is going on. I also think it offers more possibilities for disconnected clients.

And I’m moving the project to a SQL Server 2008 back-end, the front-end remaining Access 2007 for now (but it will probably change one day).

Using sequential GUIDs has PK a few advantages: for clustered keys, they cause almost index fragmentation and result in better performance than random GUID. Using GUID also gives me the ability to ensure that any single record in any table can be uniquely identified.

Random autonumbers are only guaranteed to be unique for a given table. Once that table gets to rows in the millions you can’t really be absolutely sure that another replica won’t use that exact same number and create a collision when you try to reconcile them. I know that in disconnected environments, I have to assume that clients may be unable to reconnect for a week so standard replication isn’t really going to help me here.

I’d rather play it a bit safer and future-proof the design now. With other projects I may do otherwise but for this one I’d rather not and stick with GUID.

I very well know the link to MichKa’s article and I did consider using my own PK scheme for a while. In the end, replication and sync framework are somewhat easier to deal with with GUID. It has its inconveniences in Access and forces me to adapt some stuff (got a fair amount of VBA code) but that’s part of moving forward.

In SQL Server though, I was surprised to see that using sequential GUIDs as little to no effect on performance.

If I have some time I’ll test random autonumber and add it to the stats above.

Why ACCDB: because I use features of Access 2007 like the RichEdit textbox, the Ribbon, better integration with Office 2007 look and feel, transparent PNG, etc. In any case, it’s the future of Jet and there was no reason for me not to use it.

These are all front end issues, whereas replication is a back-end issue. That is, you could use a replicated MDB for storing your data tables and an ACCDB for your front end.

You continue:

Jet replication isn’t really encouraged in Access 2007 so we can probably say that it doesn’t have a bright future and will most likely be replaced by something else, maybe related to Sync services (see Project Huron).

To say “replication isn’t encouraged in A2007” seems odd. If you open an MDB, the replication menus are still there (albeit accessed from a different location), and the ACE obviously supports Jet replication, since it can read/write reploicated MDBs (as long as they are Jet 4 MDBs, i.e., A2000, A2002, A2003 or A2007 format).

I can’t comment on Microsoft’s future plans. Those of us who saw MS’s plans for Access when it released Access 2000 are pretty skeptical of anything that MS claims for the future. A large portion of the new features in A2000 that MS promoted as the future have been completely dropped or deprecated, or development has been stopped completely. They told us Jet was dead then, but now there’s a brand-new version of Jet (i.e., ACE).

So, I’m pretty wary of anything MS suggests as the future and wouldn’t make plans based on that.

You also write:

Once that table gets to rows in the millions you can’t really be absolutely sure that another replica won’t use that exact same number and create a collision when you try to reconcile them.

How do you know this? Have you tested?

I know that in disconnected environments, I have to assume that clients may be unable to reconnect for a week so standard replication isn’t really going to help me here.

You’re assuming large batches of inserts. My belief is that if you’re using up enough of the numeric space available with a random Autonumber, then Jet is not a proper choice for your back end in the first place. However, it certainly makes replication a helluva lot easier than SQL Server, which is why I’m such a promoter of it.

@David: Thanks for your comments. I think we misunderstood each-other: I’m not considering Jet for my back-end, I want to move away from it and use SQL Server instead. Replication/synchronisation is only one of the reasons.

While I could use MDB as a back-end and use its replication facilities the fact that MS is investing in a new file format that doesn’t support it doesn’t inspire me confidence on the future of Jet. Yes I know, Jet and MBD will still be around but there is a difference between maintaining a legacy application and starting a new one using a technology whose future is compromised. Looking at Access 2007, it’s clear that MS would rather you use Sharepoint for solving offline and data synchronisation than use Jet replication for which no tools for creating replicas are available from the Access 2007 UI.

Regarding random autonumber you may be right, the chances of a collision may be slim but I don’t like the idea of index fragmentation and the performance hit. Maybe if GUIDs become too much to handle I’ll consider random autonumbers but for now I’d prefer to stick with GUID.

You’re right that we should be wary of new projects coming out of MS and not jump too soon. I referenced project Huron but it may be something else. What is sure is that MS is very heavily investing in syncronisation and these tools will be built, in one way or another, into almost everything: the Windows OS, SQL Server, MS Office, etc.

Looking at Access 2007, it’s clear that MS would rather you use Sharepoint for solving offline and data synchronisation than use Jet replication for which no tools for creating replicas are available from the Access 2007 UI.

When you open an MDB in A2007, the UI has replication on the menus, more or less in the same location as it was in previous versions of Access. I don’t have anything but the A2007 runtime installed on any accessible PC, so I can’t tell you exactly where it is, but I know it’s there — support has not been removed from the ACCESS the APPLICATION, only from the new file format.

I think David W Fenton and Renaud should get a room!! Bang it out man on man style yeah

PS: Film it !!

9. Nathan Wilson | November 4th, 2009 at 12:43 am

When my mdb was corrupted, Access 2007 gave me the option to repair it. It was unable. I opened the mdb in Access 2003 and it gave me the option to repair. It was successful. That was a sign to me that I need to move away from mdbs as soon as possible.

10. Ice Polar | November 10th, 2009 at 6:30 am

Hello. Very interesting thread. I agree with the thread-starter: leave the mdb’s as soon as possible. We do this since some times and are now investigating the new benefits of sql-server 2008. But our application is not access but is build onto the mdb’s and the jet-engine, we can’t change that.

The mdb is solely a “ODBC-Link container” but this is the next grave: ODBC was and is not bug-free and as i understand the new SQL Native Client 10.0 Story parts of DAO are deprecated…

And, thanks for the detailed analysis, now i have it black on white: for a single user apllication acccess/jet tops sql-server. But we have a multiuser application and are fighting with the weired mechanics jet does in such a environment like strssing the fileserver and the network.

What is the thing-a-magic with that *.ldb, hum… we are working in read-only mode against the mdb and jet fiddles araound with this ldb, sometimes like that the fileserver locks it for ever! Only a reboot releases this lock and this are new windows 2008 64-bit heavy duty clusters… With this esample you can see what the jet is able to.

DAO is not deprecated at all. For Jet/ACE it is the preferred interface form COM applications. For .NET, it means using COM so that may be where they are deprecating it, but I don’t think Jet/ACE is a very good datastore for a .NET app, in any event.

For Access apps using ODBC, it is also the preferred interface, because Access uses ODBC through Jet/ACE.

If Jet/ACE is making wrong decisions about what to send to the server, then you need to move that logic server-side or within Access use passthrough queries.

As to LDB files, if your back end is not Jet/ACE, this is relevant only to the Access front end and shouldn’t be an issue.

If your back end is Jet/ACE, the LDB file is there as soon as a single user opens any MDB/ACCDB because that’s where the locks are recorded for all the users of the data file. Jet/ACE does not wait until a second user attempts to access the data as this would be really inefficient for that next user (creating the LDB file is one of the main performance bottlenecks, in fact).

If you’re having LDB file problems, you’re likely not closing your connections appropriately. Improper app shutdown could also leave an orphaned LDB (i.e., force quitting Access). This is something you should avoid as it can corrupt your Jet/ACE data file (while an improperly shut-down file will be flagged as suspect, it is almost never the case that the actual data structures are corrupted — what determines that is whether or not there is an active edit in process at the time of the forced quit, and even those are almost never unrecoverable through Jet/ACE’s standard compact/repair operation).

Now, that said, with Access, you should basically open a connection and keep it and use it throughout the Access session and close it when you’re done in Access (as part of the shutdown process). When closing your connection, it is not sufficient in VBA to set the database variable to Nothing — you must close the external database before clearing the pointer.

If using a Jet/ACE file from another development environment, I would suggest that whatever data structures you use to work with your data file, you make sure that you explicitly tell Jet/ACE to close the open database connection, not just depending on the clearing of whatever data structure you’re using to release the connection implicitly.

All that said, this blog, good as it is, is really not a great forum for discussing issues like this. If you’re working with Jet/ACE data, whether from Access or not, you’ll get better information by going where experienced Access developers congregate. The principle place for this is the microsoft.public.access Usenet groups, where these Access developers discuss strategies for address these issues ALL THE TIME. This expertise is going to be found among Access developers, who as a group are quite accustomed to developing Access front ends for both Jet/ACE data stores and a range of database servers, chiefly SQL Server, but also Oracle, MySQL, PostgreSQL and others.

You can search these newsgroups on Google Groups (restrict the newsgroup searched to access), and you can post to the Microsoft newsgroups by using any news reader and connecting to msnews.microsoft.com, which is an open news server carrying only the Microsoft newsgroups. One caveat: if you do that, you’ll be missing out on one of the best newsgroups, comp.databases.ms-access, which is not a microsoft newsgroup. Another alternative is to use a free news server like the one offered at http://www.eternal-september.org/ .

Very interesting conversation here and nice detailed posts David. you’re right that this is not the best place for lengthy discussions and that teh dedicated MS groups are much better.

Regarding issues with a shared MDB/ACCDB file I can assure you that it’s always due to not properly releasing resources when you’re not using them and forgetting to explicitly disconnecting from the database.

So far, the only couple of issues I’ve ever had were because of some users connecting to the back-end through an unstable WiFi. Never lost any data and my main app is being used by about 50 users, without problem, all day long.

I have of course set up some optimisations for reducing the network traffic by caching many queries on the local user machine so the initial fetch is a bit longer but using large reports is as fast as the local machine allows.

These strategies help a lot in making Access keep decent performance over a network with a reasonably large user base. There is no doubt that we are moving to SQL Server, but most people bad-mouthing Access haven’t tried very hard to make the most out of it.

Don’t forget that it’s an extremely cheap application development environment: buy a single license of it and deploy as many free runtime versions of your app to as many users as you need.

You should never under any circumstances use a Jet/ACE back end over WiFi — that way lies madness. There is no workaround, no safe way to do it. If you have to deploy on WiFi, either host the Access app with the Jet/ACE back end on a Windows Terminal server, or upsize the back end to a server back end that will not cause problems with unreliable network connections.

Let’s assume a Citrix-Farm with 20 Server where users work with an app which works with an mdb-file lying on a central fileserver. Finaly there is also a sql-server.

For my understanding of the mechanics of the *.ldb: If the app opens the *.mdb in read-only mode, what is stored in the ldb-file? the existence is like a flag, but inside the ldb are 254 slots – the max. count of concurrent access. Has this in read-only modus an effect?

If there are solely linked tables (odbc to sql-server) in that *.mdb: Does Jet handle concurrent access by let’s say 500 Users like the tables where nativ in the mdb-file?

First off, if you’re using a Windows Terminal Server/Citrix deployment, there’s no issue with WiFi, as the Jet/ACE data file is not being opened across the wireless connection. It’s only in the situation where a workstation is opening a local front end with linked tables to a shared back end data file on a server via a wireless connection that there’s danger, since we all know how flakey WiFi is — Jet/ACE simply does not function in that scenario. You get the dreaded “disk or network error”, which is nontrappable and nonrecoverable and endangers the health of the file opened across the wireless network connection.

But in a WTS/Citrix environment, that is not an issue.

Now, it sounds from the question as if you’re considering having multiple users opening the same MDB/ACCDB file. That is a completely unsatisfactory approach. In all cases, a multi-user Access app should be split into back-end data tables, stored on a file server (shared by all users, max of 255, read or write users), and a front-end application file (forms/reports/etc.) that has links to the back-end data tables. A copy of the front end should be on each user’s PC when the app is running on individual workstations. Likewise, in a WTS/Citrix environment, each user MUST have her own copy of the front end (usually stored in the user profile, or in a folder designated for that purpose, keyed to the Windows logon name).

Jet/ACE has no difficulties accommodating workgroup-size numbers of users connect to data tables (pure Jet objects), but Access does much less well when multiple users try to share a front-end MDB/ACCDB. That situation almost always leads to wonky behavior, corruption of the front end, and in some outside cases, corruption of the back-end data.

So, all Access apps should be split (i.e., apps that store tables and forms/reports/etc. all in a single file should not exist and should be split).

And never ever should more than one user at a time open a front end MDB/ACCDB.

The only file that is ever shared in an Access/Jet/ACE appliction is the one containing the data tables — all other parts of a properly-designed Access app will be opened by one user at a time.

So, getting back to the question:

If you have an Access app with a front end MDB/ACCDB linked to a back-end MDB/ACCDB containing the data tables, there is one back-end data file and a maximum theoretical limit of 255 users of that one back-end data file. Read-only makes no difference, as those users contribute to the maximum user count because read locks are placed on the tables they have open in order that their views of the data can be refreshed at appropriate intervals.

While read-only users add little to the concurrency load, they still count towards the hard 255-user limit.

In the real world, 25 simultaneous write users is about all an Access app that is not super-optimized is going to be able to support (i.e., if Access development has not been your primary professional occupation for a while, you probably don’t know enough to create an Access/Jet/ACE app for more than 25 simultaneous write users).

However, with a good schema and good application design, an experienced Access developer can bump that practical limit to 50-100 simultaneous write users. But it’s all highly dependent on the particular app’s data schema and its front-end implementation.

In regards to ODBC links, those don’t belong in a back end, only in a front end. Thus, any MDB/ACCDB file with ODBC links will be opened by a single user, and the only limit on the number of simultaneous users of the application is whatever is set by the ODBC data source.

For instance, an Access front end linked via ODBC to a SQL Server database has no theoretical limit on the number of simultaneous users (though certainly 1000 users could bring a SQL Server to its knees if it’s not well-provisioned and the application being used by the 1000 users optimized for maximum concurrency; and, of course, if there aren’t sufficient CALs, users would be locked out, but that’s not a theoretical consideration nor a limitation on the actual capability of the database engine — it is only a licensing limitation and resolved by purchasing the appropriate number of CALs or the appropriate SQL Server version to avoid purchasing individual CALs).

To summarize:

each user has an individual copy of the front end so that it is never opened by more than one user.

the maximum of concurrent users is entirely controlled by the limitations of the back-end data store, i.e., in the case of Jet/ACE, 255 theoretically (read or write makes no difference), but theoretically unlimited in the case of server databases (i.e., limited only by hardware, load balancing, efficient app design, and, of course, licensing).

Last of all, this is all really, really basic Access information. Any Access developer who has been in the business a couple of years should know all of this, and anybody who regularly reads the Access newsgroups would already know all of this. If you’re using Access/Jet/ACE and don’t know these things, you either need to get educated really, really fast, or you need to switch to something other than Access/Jet/ACE — it’s just this type of developer ignorance of how to properly design, deploy and maintain Access apps that provides fodder to the Access bigots for their blanket condemnations of all things Access.

about

This is a simple technical weblog where I dump thoughts and experiences from my computer-related world. It is mostly focused on software development but I also have wider interests and dabble in architecture, business and system administration.More About me…