Progressive Core Data Migrations

There are very few certainties in app development, but one is that once your app is released it will change in unexpected ways. And no matter how flexible your architecture is, inevitably one of those changes will be a breaking change. Perhaps the most important breaking changes involve the user's data. If your app loses or corrupts user data you can expect at least some reputational damage, and if the loss is severe enough you can end up doing your competitors' marketing for them by turning your users into their users. If you have any doubt about the impact of data loss, imagine how you would feel if a game you had been playing was updated and deleted your recently hard-earned thingamabob - all that time and effort lost through no fault of yours. And that's just a game; now imagine how your users would feel when your far-more-important app starts losing their data.

In our iOS apps we often store these thingamabobs in Core Data, the structure of which is defined by a model/schema - a set of entities with attributes and relationships. To allow changes to be made to this model to meet your app's changing needs, Core Data has a built-in mechanism to migrate data from one model structure to another.

In this post, we are going to build a simple system that drives the built-in Core Data migration mechanism to make migrations simpler and so reduce the risk of losing your users' data.

This post will gradually build up to a working example. However, if you're on a tight deadline and/or there is a murderous look creeping into your manager's eyes 😡, then head on over to the completed example and take a look at CoreDataManager, CoreDataMigrator, CoreDataMigrationStep and CoreDataMigrationVersion to see how things end up.

The Migration Process

As mentioned above, Core Data allows the model to evolve. Typically a model version's changeable lifecycle (when it can be changed) is from when that version is created until it's released as an app update. Once released, a version is effectively "frozen" - if any further changes were made to it, the app would crash on launch (as Core Data would have two different structures for the same version). Depending on your release cycle, the opportunity to make changes to a model version can be quite short; because of this, Core Data has a robust migration system. Migrations can be handled using one of two techniques:

Lightweight Migration - when Core Data can automatically infer how the migration should happen and creates the mapping model on the fly.

Standard Migration - when Core Data cannot infer how the migration should happen and so we must write a custom migration by providing a mapping model (xcmappingmodel) and/or a migration policy (NSEntityMigrationPolicy).

By default, Core Data will attempt to perform a migration automatically when it detects a mismatch between the model used in the persistent store and the bundle's current model. When this happens, Core Data will first attempt to perform a Standard migration by searching in the app's bundle for a mapping model that maps from the persistent store model to the current bundle model. If a custom mapping model isn't found, Core Data will then attempt to perform a Lightweight migration. If neither form of migration is possible an exception is thrown.

If you are using NSPersistentContainer, Lightweight migrations are enabled by default. However, if you are still setting up the NSPersistentStoreCoordinator directly then you need to enable Lightweight migrations by passing in an options dictionary with both NSMigratePersistentStoresAutomaticallyOption and NSInferMappingModelAutomaticallyOption set to true when loading the persistent store.
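For the NSPersistentStoreCoordinator route, a minimal sketch of passing those two options might look like the following. The makeCoordinator(model:storeURL:) helper and its parameters are hypothetical names used only for illustration.

```swift
import CoreData

// Hypothetical helper: `model` and `storeURL` are supplied by the caller.
func makeCoordinator(model: NSManagedObjectModel, storeURL: URL) throws -> NSPersistentStoreCoordinator {
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)

    // Both options must be true for Core Data to infer a mapping model
    // and run the migration automatically when the store is added.
    let options: [AnyHashable: Any] = [
        NSMigratePersistentStoresAutomaticallyOption: true,
        NSInferMappingModelAutomaticallyOption: true
    ]

    _ = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                           configurationName: nil,
                                           at: storeURL,
                                           options: options)
    return coordinator
}
```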

These automatic migrations are performed as one-step migrations: directly from the source to the destination model. So if we support 4 model versions, mapping models would exist for 1 to 4, 2 to 4 and 3 to 4. While this is the most efficient migration approach from a device performance point-of-view, it can actually be quite wasteful from a development point-of-view. For example, if we added a new model version (5) we would need to create 4 new mapping models (1 to 5, 2 to 5, 3 to 5 and 4 to 5), none of which reuse the mapping models for migrating to version 4. With a one-step migration approach, each newly added model version requires n-1 mapping models (where n is the number of supported model versions, including the new one) to be created.
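To make the n-1 cost concrete, here is a small sketch that counts mapping models in the worst case, where every path needs a custom mapping model and none can be inferred. The function names are made up for this illustration; the progressive count anticipates the approach introduced in the next section.

```swift
/// New mapping models needed when version `n` is added under the one-step
/// approach: one from every earlier version straight to `n`.
func oneStepMappingModels(addingVersion n: Int) -> Int {
    return max(n - 1, 0)
}

/// Under a chained (progressive) approach only the direct predecessor
/// needs a mapping to the new version.
func progressiveMappingModels(addingVersion n: Int) -> Int {
    return n > 1 ? 1 : 0
}

/// Total mapping models created over the project's life for versions 1...n.
func totalMappingModels(upToVersion n: Int, perVersion: (Int) -> Int) -> Int {
    guard n >= 1 else { return 0 }
    return (1...n).reduce(0) { $0 + perVersion($1) }
}

let oneStepTotal = totalMappingModels(upToVersion: 5, perVersion: oneStepMappingModels(addingVersion:))
let progressiveTotal = totalMappingModels(upToVersion: 5, perVersion: progressiveMappingModels(addingVersion:))
print(oneStepTotal)     // 0 + 1 + 2 + 3 + 4 = 10
print(progressiveTotal) // 0 + 1 + 1 + 1 + 1 = 4
```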

It's possible to reduce the amount of work required to perform a Core Data migration by disabling automatic migrations and so breaking the requirement to perform migrations in one step. With a manual migration approach, we can perform the full migration by chaining multiple smaller migrations together. As the full migration is split into smaller migrations, when adding a new model version we only need to handle migrating to the new model version from its direct predecessor rather than all its predecessors, e.g. 4 to 5, because we can reuse the existing 1 to 2, 2 to 3 and 3 to 4 mapping models. Not only do manual migrations reduce the amount of work involved, they also help to reduce the complexity of the migration as the conceptual distance between the source and destination version is reduced when compared to one-step migrations, i.e. version 4 is much nearer to the structure of version 5 than version 1 is - this should make it easier to spot any issues with the migration.

This reduction in work to support migrations allows us to focus on what's important: building better apps for our users.

Progressive migrations

In order to support progressive migrations we'll need to answer a few questions:

Which model version comes after version X?

What is a migration step?

How can we combine the migration steps into a migration path?

How do we trigger a migration?

These questions will be answered with the help of 4 separate types:

CoreDataMigrationVersion

CoreDataMigrationStep

CoreDataMigrator

CoreDataManager

These types will come together in the following class structure:

Along with several helper extensions.

Which model version comes after version X?

Each CoreDataMigrationVersion instance will represent a Core Data model version. As each Core Data model version is unique and known at compile time they can be perfectly represented as enum cases, with the raw value of each case being the Core Data model name:

Migrations are often concerned with what the latest model version is - the static current property allows easy access to this version. Before Swift 4.2 we would probably have had to hardcode this property to one case, which would then lead to bugs if we forgot to update that property when adding a new version. However, in Swift 4.2 we got the CaseIterable protocol, which makes it possible to get an array of the cases in an enum, in the order they were defined, via the allCases property. This means that getting the latest model version is as simple as calling last on the allCases array - no need to hardcode anything.

In CoreDataMigrationVersion the nextVersion() method is where the real work happens as it determines which (if any) version comes after self.
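A minimal sketch of such an enum, assuming a single model version named CoreDataMigration_Example (the model name used later in this post), could look like this. The exact shape of the type in the example project may differ.

```swift
enum CoreDataMigrationVersion: String, CaseIterable {
    // The raw value is the name of the Core Data model version.
    case version1 = "CoreDataMigration_Example"

    // The latest version is the last declared case, so nothing is hardcoded.
    static var current: CoreDataMigrationVersion {
        guard let latest = allCases.last else {
            fatalError("no model versions found")
        }
        return latest
    }

    // Returns the version to migrate to from `self`, or nil when `self`
    // is already the end of the migration path.
    func nextVersion() -> CoreDataMigrationVersion? {
        switch self {
        case .version1:
            return nil
        }
    }
}
```

With only one version, nextVersion() has nowhere to go and returns nil; each new model version adds a case and a new branch in the switch.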

You may be thinking:

"Why bother with nextVersion() when we can just always choose the next enum case?"

If you are reading this post before performing your first migration I congratulate you on your:

Excellent taste in selecting blog posts.

Organisational ability.

However, I'm guessing it's more likely that you've found this post having already performed a number of migrations and been hit by the inherent scaling issue with the default one-step migration approach. If you are in the latter camp then you will have already implemented one-step migrations, having configured various mapping models and maybe even written a migration policy or two. Instead of throwing all that work away we can use it and tie it into the new progressive approach. In a hypothetical project with 6 model versions that used the one-step migration approach up to model version 4 before switching over to the progressive migration approach, nextVersion would look like:
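A sketch of that hybrid configuration, using placeholder model names ("Model", "Model 2", ...) that are assumptions for this hypothetical project:

```swift
enum CoreDataMigrationVersion: String, CaseIterable {
    // Placeholder model names for the hypothetical 6-version project.
    case version1 = "Model"
    case version2 = "Model 2"
    case version3 = "Model 3"
    case version4 = "Model 4"
    case version5 = "Model 5"
    case version6 = "Model 6"

    func nextVersion() -> CoreDataMigrationVersion? {
        switch self {
        case .version1, .version2, .version3:
            // Existing one-step mapping models are reused as-is.
            return .version4
        case .version4:
            return .version5
        case .version5:
            return .version6
        case .version6:
            return nil
        }
    }
}
```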

In the above code snippet, version1, version2 and version3 migrate directly to version4 and then version4 and version5 migrate to their direct successor. As you can see both these migration approaches can co-exist very happily with each other.

Even if you don't have any existing migrations, it's possible that at some point in the future a broken model version is released that corrupts your users' data upon migration. In order to minimise the impact of this mistake, nextVersion could be configured to bypass that broken model version so that any currently unaffected users are never impacted:
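As an illustration, assume a project with 4 model versions where version3 turned out to be the broken one (both the version count and the placeholder model names are assumptions). Users still on version2 skip straight to version4, while users already on version3 can still move forward:

```swift
enum CoreDataMigrationVersion: String, CaseIterable {
    // Placeholder model names for this hypothetical project.
    case version1 = "Model"
    case version2 = "Model 2"
    case version3 = "Model 3" // the broken version
    case version4 = "Model 4"

    func nextVersion() -> CoreDataMigrationVersion? {
        switch self {
        case .version1:
            return .version2
        case .version2:
            // Bypass the broken version3 entirely.
            return .version4
        case .version3:
            // Users already on version3 still need a way forward.
            return .version4
        case .version4:
            return nil
        }
    }
}
```

Note that this only works if a mapping from version2 to version4 exists in the bundle or can be inferred.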

Both these issues are easily bypassed using nextVersion() without adding too much complexity to the overall solution.

What is a migration step?

A migration happens between 2 model versions by having a mapping from the entities, attributes and relationships of the source model to their counterparts in the destination model. As such CoreDataMigrationStep needs to contain 3 properties: the source model, the destination model and the mapping model between them.

It's possible to have multiple mapping models between versions (this can be especially useful when migrating large data sets). In this post, in an attempt to keep things simple, I assume only one mapping model.

CoreDataMigrationStep takes the source model and destination model and attempts to find a way to map between them. As we know, there are two types of migrations: Lightweight and Standard - both of which use an NSMappingModel instance to hold the mapping path between the versions. Because of this shared output type, mappingModel(fromSourceModel:toDestinationModel:) handles searching for a mapping model using either a Lightweight or a Standard migration. First, a search is made for a custom mapping model in the bundle (Standard migration) and then, if no custom mapping model is found, Core Data is asked to try and infer a mapping model (Lightweight migration). If a mapping model can't be found using either approach, a fatal error is thrown as this migration path isn't supported.
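The lookup order described above can be sketched as follows. To keep the example self-contained, this version takes already-loaded NSManagedObjectModel instances in its initialiser, which is an assumption; the example project may construct a step differently.

```swift
import CoreData

struct CoreDataMigrationStep {
    let sourceModel: NSManagedObjectModel
    let destinationModel: NSManagedObjectModel
    let mappingModel: NSMappingModel

    init(sourceModel: NSManagedObjectModel, destinationModel: NSManagedObjectModel) {
        guard let mappingModel = CoreDataMigrationStep.mappingModel(fromSourceModel: sourceModel,
                                                                    toDestinationModel: destinationModel) else {
            fatalError("expected a mapping model between the two model versions")
        }

        self.sourceModel = sourceModel
        self.destinationModel = destinationModel
        self.mappingModel = mappingModel
    }

    private static func mappingModel(fromSourceModel sourceModel: NSManagedObjectModel,
                                     toDestinationModel destinationModel: NSManagedObjectModel) -> NSMappingModel? {
        // Standard migration: look for a custom mapping model in the bundle first.
        if let custom = NSMappingModel(from: [Bundle.main],
                                       forSourceModel: sourceModel,
                                       destinationModel: destinationModel) {
            return custom
        }

        // Lightweight migration: fall back to asking Core Data to infer one.
        return try? NSMappingModel.inferredMappingModel(forSourceModel: sourceModel,
                                                        destinationModel: destinationModel)
    }
}
```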

How can we combine the migration steps into a migration path?

CoreDataMigrator is at the heart of our migration solution and has 3 tasks:

Determining if there needs to be a migration.

Ensuring the persistent store is ready to be migrated.

Performing the migration.

As CoreDataManager holds a reference to CoreDataMigrator (we will see this later) we can make our lives easier by wrapping CoreDataMigrator in a protocol so that it's easier to mock when writing tests for CoreDataManager.

To determine if a migration is needed, the persistent store's metadata is loaded and checked to see if it's compatible with the current bundle model's metadata.
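A minimal sketch of that protocol and the compatibility check is shown below. The protocol name and method signature are assumptions for illustration; in particular, this version takes the destination NSManagedObjectModel directly rather than a CoreDataMigrationVersion so that it stays self-contained.

```swift
import CoreData

// Hypothetical protocol name and signature, used so the migrator can be mocked.
protocol CoreDataMigratorProtocol {
    func requiresMigration(at storeURL: URL, toModel model: NSManagedObjectModel) -> Bool
}

final class CoreDataMigrator: CoreDataMigratorProtocol {
    func requiresMigration(at storeURL: URL, toModel model: NSManagedObjectModel) -> Bool {
        // If the metadata can't be read (e.g. no store exists yet on first
        // launch) there is nothing to migrate.
        guard let metadata = try? NSPersistentStoreCoordinator.metadataForPersistentStore(ofType: NSSQLiteStoreType,
                                                                                          at: storeURL,
                                                                                          options: nil) else {
            return false
        }

        // A migration is needed when the store was written with a model
        // that isn't compatible with the destination model.
        return !model.isConfiguration(withName: nil, compatibleWithStoreMetadata: metadata)
    }
}
```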

If a migration is required, some housekeeping on the persistent store needs to be performed before the migration can begin. Since iOS 7, Core Data has used the Write-Ahead Logging (WAL) option on SQLite stores to provide the ability to recover from crashes. If you have ever had to perform a rollback before, WAL works a little differently from what you may be expecting. Rather than directly writing changes to the sqlite file and having a pre-write copy of the changes to rollback to, in WAL mode the changes are first written to the sqlite-wal file and at some future date those changes are transferred to the sqlite file. The sqlite-wal file is in effect an up-to-date copy of some of the data stored in the main sqlite file. The sqlite-wal and sqlite files store their data using the same structure, which can cause problems after a migration as we only alter the structure in the sqlite file. The resulting mismatch in structure will lead to a crash when Core Data attempts to update/use data stored in the sqlite-wal file 😞. To avoid this crash we need to force any data in the sqlite-wal file into the sqlite file before we perform a migration - a process known as checkpointing:
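One way to force a checkpoint is to open the store with the journal mode switched to DELETE and then close it again. The sketch below assumes the caller already knows which model the store is compatible with and passes it in as the model parameter; that parameter is an addition for this example.

```swift
import CoreData

// `model` must be the model version the store was last saved with
// (an assumption of this sketch; it is passed in by the caller).
func forceWALCheckpointingForStore(at storeURL: URL, model: NSManagedObjectModel) throws {
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)

    // Opening the store with journal_mode DELETE makes SQLite fold the
    // contents of the sqlite-wal file back into the main sqlite file.
    let options: [AnyHashable: Any] = [NSSQLitePragmasOption: ["journal_mode": "DELETE"]]

    let store = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                                   configurationName: nil,
                                                   at: storeURL,
                                                   options: options)
    try coordinator.remove(store)
}
```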

The migration path is built by looping through the appropriate model versions until the destination model version is reached. This migration path will take the user's data from the persistent store's model version to the bundle model version in a progressive migration.
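The path-building loop can be sketched independently of Core Data. Here VersionStep and migrationPath(from:to:) are hypothetical names, and the enum is a three-version example with a strictly linear path:

```swift
enum CoreDataMigrationVersion: String, CaseIterable {
    case version1 = "CoreDataMigration_Example"
    case version2 = "CoreDataMigration_Example 2"
    case version3 = "CoreDataMigration_Example 3"

    func nextVersion() -> CoreDataMigrationVersion? {
        switch self {
        case .version1: return .version2
        case .version2: return .version3
        case .version3: return nil
        }
    }
}

// Hypothetical value type pairing a source version with its successor.
struct VersionStep: Equatable {
    let source: CoreDataMigrationVersion
    let destination: CoreDataMigrationVersion
}

// Walks nextVersion() from `source` until `destination` is reached.
// Assumes `destination` is reachable from `source`; if it isn't, the walk
// simply stops when nextVersion() returns nil.
func migrationPath(from source: CoreDataMigrationVersion,
                   to destination: CoreDataMigrationVersion) -> [VersionStep] {
    var steps: [VersionStep] = []
    var current = source

    while current != destination, let next = current.nextVersion() {
        steps.append(VersionStep(source: current, destination: next))
        current = next
    }

    return steps
}
```

In the real migrator each pair of versions would be turned into a CoreDataMigrationStep by loading the corresponding models from the bundle.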

To perform the migration, we iterate through each migration step and attempt to perform a migration using NSMigrationManager. The result of each completed migration step is saved to a temporary persistent store; only once the full migration is complete is the original persistent store overwritten. If there is a failure during any individual migration step a fatal error is thrown - this is especially useful during the development of a custom migration path.
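A sketch of that loop is below. MigrationStep is a hypothetical stand-in for CoreDataMigrationStep so the example is self-contained, and the temporary store location is an assumption.

```swift
import CoreData

// Hypothetical stand-in for CoreDataMigrationStep.
struct MigrationStep {
    let sourceModel: NSManagedObjectModel
    let destinationModel: NSManagedObjectModel
    let mappingModel: NSMappingModel
}

func migrateStore(at storeURL: URL, using steps: [MigrationStep]) {
    guard !steps.isEmpty else { return }

    // Only used for its store-level file operations (replace/destroy).
    let fileCoordinator = NSPersistentStoreCoordinator(managedObjectModel: NSManagedObjectModel())
    var currentURL = storeURL

    for step in steps {
        let manager = NSMigrationManager(sourceModel: step.sourceModel,
                                         destinationModel: step.destinationModel)
        let temporaryURL = URL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
            .appendingPathComponent(UUID().uuidString)

        do {
            try manager.migrateStore(from: currentURL,
                                     sourceType: NSSQLiteStoreType,
                                     options: nil,
                                     with: step.mappingModel,
                                     toDestinationURL: temporaryURL,
                                     destinationType: NSSQLiteStoreType,
                                     destinationOptions: nil)
        } catch {
            fatalError("failed to migrate store at \(currentURL): \(error)")
        }

        // Clean up the intermediate store from the previous step,
        // but never touch the original store mid-migration.
        if currentURL != storeURL {
            try? fileCoordinator.destroyPersistentStore(at: currentURL, ofType: NSSQLiteStoreType, options: nil)
        }
        currentURL = temporaryURL
    }

    do {
        // Every step succeeded: overwrite the original with the final result.
        try fileCoordinator.replacePersistentStore(at: storeURL,
                                                   destinationOptions: nil,
                                                   withPersistentStoreFrom: currentURL,
                                                   sourceOptions: nil,
                                                   ofType: NSSQLiteStoreType)
        try fileCoordinator.destroyPersistentStore(at: currentURL, ofType: NSSQLiteStoreType, options: nil)
    } catch {
        fatalError("failed to replace store at \(storeURL): \(error)")
    }
}
```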

In the above code snippets, we've seen a number of methods used that are not part of the standard API so I've included the extensions that contain these methods below. As with most extensions, the methods are used to reduce boilerplate code:

How do we trigger a migration?

If you have ever seen a Core Data stack setup before, you will instantly notice how little code CoreDataManager contains. Over the years Core Data has evolved and become more developer friendly, and CoreDataManager takes advantage of a relatively new piece of the Core Data family - NSPersistentContainer, which was introduced in iOS 10.

NSPersistentContainer simplifies the creation of the managed object model, persistent store coordinator and the managed object contexts by making smart assumptions on how we want our persistent store configured. It's still possible to access the NSManagedObjectModel, NSPersistentStoreCoordinator and NSManagedObjectContext instances via this container but we no longer have to handle their set-up code.

Our example project is called CoreDataMigration-Example however as you can see when creating the NSPersistentContainer we give CoreDataMigration_Example as our model's name - see Apple's documentation on why the - became a _.

CoreDataManager is a little odd when it comes to being a singleton in that it has an explicit init implementation. This explicit init method allows for changing the type of persistent store used - by default it's NSSQLiteStoreType however when unit testing we will actually create multiple instances of CoreDataManager using NSInMemoryStoreType to avoid persisting data between tests (and having tests potentially pollute each other). A persistent store type of NSInMemoryStoreType will cause our Core Data stack to only be created in-memory and so be more cheaply torn down and set up than if we used NSSQLiteStoreType. In the accompanying example project, you can see how this is used in the CoreDataManagerTests class.

Loading the persistent store involves interacting with the disk which compared to memory interactions is more expensive ⏲️, as such the loadPersistentStores(completionHandler:) method on NSPersistentContainer is asynchronous. This is mirrored by the setup(), loadPersistentStore(completion:) and migrateStoreIfNeeded(completion:) methods:

Before an attempt is made to load the persistent store, we check if the model needs to be migrated by calling migrateStoreIfNeeded(completion:).

If the answer is yes - the migrator attempts to migrate the user's data. As migrating can be a relatively slow process, the migration happens on a background queue to avoid hanging the UI. Once the migration is completed the completion closure is called on the main queue.

If the answer is no - the completion closure is called straight away.

Once the persistent store is successfully loaded, the setup() method calls its completion closure and the stack finishes setting up.
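The flow described above can be sketched as follows. This is a simplified version under stated assumptions: the StoreMigrating protocol is a hypothetical, trimmed-down migrator interface, and the container is injected rather than created inside the manager.

```swift
import CoreData

// Hypothetical, trimmed-down migrator interface for this sketch.
protocol StoreMigrating {
    func requiresMigration(at storeURL: URL) -> Bool
    func migrateStore(at storeURL: URL)
}

final class CoreDataManager {
    let persistentContainer: NSPersistentContainer
    private let migrator: StoreMigrating

    init(container: NSPersistentContainer, migrator: StoreMigrating) {
        // Automatic migrations are switched off so that the migrator
        // is the only thing that ever migrates the store.
        let description = container.persistentStoreDescriptions.first
        description?.shouldInferMappingModelAutomatically = false
        description?.shouldMigrateStoreAutomatically = false

        self.persistentContainer = container
        self.migrator = migrator
    }

    func setup(completion: @escaping () -> Void) {
        loadPersistentStore(completion: completion)
    }

    private func loadPersistentStore(completion: @escaping () -> Void) {
        migrateStoreIfNeeded {
            self.persistentContainer.loadPersistentStores { _, error in
                if let error = error {
                    fatalError("was unable to load store: \(error)")
                }
                completion()
            }
        }
    }

    private func migrateStoreIfNeeded(completion: @escaping () -> Void) {
        guard let storeURL = persistentContainer.persistentStoreDescriptions.first?.url else {
            fatalError("persistent container was not set up properly")
        }

        if migrator.requiresMigration(at: storeURL) {
            // Migrate off the main queue, then report back on it.
            DispatchQueue.global(qos: .userInitiated).async {
                self.migrator.migrateStore(at: storeURL)
                DispatchQueue.main.async { completion() }
            }
        } else {
            completion()
        }
    }
}
```

Injecting the migrator is what makes it possible to pass a mock in tests and assert that a migration was (or was not) requested.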

In the example project, the user is shown a loading screen while the Core Data stack is being set up. Only once the setup is complete is the user allowed into the app proper. presentMainUI switches out the window's root view controller for a navigation stack that can freely use Core Data. While this is not strictly necessary, by splitting the UI into pre- and post-Core Data stack setup it is possible to avoid race conditions where the app is attempting to use Core Data before it has finished setting up.

💃🥂🎉🕺

Congratulations, that's all there is to the progressive migration approach.

The rest of this post is devoted to putting the above migration approach into practice by migrating an app through 3 Core Data model versions.

Colourful Posts

Colourful Posts is a simple app that allows the user to create posts that are persisted in Core Data. Each post consists of:

A unique ID.

A random associated colour represented as a hex string.

The body/content of the post.

The date the post was created on.

So that the model looks like:

Each post that the user creates is then displayed in a tableview as a brightly coloured cell.

To keep this post to a reasonable length I won't show any code from Colourful Posts that isn't connected to performing a migration.

It's a simple, fun app that we submit to Apple for approval 🤞.

Migrating to version 2

Despite not being able to edit posts, Apple not only approves Colourful Posts, they love it so much that they feature it on the Today tab. Colourful Posts is instantly propelled to the top of the charts and is an overnight success. Hundreds of thousands of downloads later we decide to hire a new developer to help with developing features that will continue the success-train we find ourselves on 🚂. In their first week, the new developer mistakes the information stored in the color property on Post for an RGB representation of a colour, which leads to the app crashing 😞. To avoid this issue happening when we hire more developers we decide to rename color to hexColor. As this is a change to the model we need to create a new model version and handle the migration between the old and new version.

To create a new model version, select the *.xcdatamodel (it may be called *.xcdatamodeld) file in the Project Navigator, open the Editor menu from the top bar and click on the Add Model Version... option. In the wizard that opens, this new model will already be given a name. This typically follows [ModelName] [Number], so CoreDataMigration_Example 2, but it can be changed to whatever you want.

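Lightweight migrations are typically a less intensive form of migration than Standard migrations (both from a developer and performance POV); because of this, I prefer to perform Lightweight migrations whenever possible. Lightweight migrations can handle the following transformations to the model:

Adding an attribute.

Removing an attribute.

Changing a non-optional attribute to be optional.

Changing an optional attribute to be non-optional (by defining a default value).

Renaming an entity, attribute or relationship (by providing a Renaming ID).

Adding a relationship.

Removing a relationship.

Changing the entity hierarchy.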

That's an impressive list of transformations that we get free (or almost free) with Lightweight migrations. The color to hexColor change is covered by "Renaming an entity, attribute or relationship", which has a small caveat: we must provide a Renaming ID. The Renaming ID creates a link between the old attribute and the new attribute. All it requires is adding the old attribute name to the new attribute's metadata:

With this information, Core Data now knows that color and hexColor are the same attribute just with different names and that rather than discarding color during a Lightweight migration the value should be transferred to hexColor.
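In the example project the Renaming ID is set in Xcode's model editor, but the same link can be expressed in code via the renamingIdentifier property on a property description. The sketch below builds the renamed attribute programmatically purely to illustrate what the Renaming ID represents; the makeHexColorAttribute() helper and the choice of an optional string attribute are assumptions.

```swift
import CoreData

// Hypothetical helper building the version 2 attribute in code.
func makeHexColorAttribute() -> NSAttributeDescription {
    let hexColor = NSAttributeDescription()
    hexColor.name = "hexColor"
    hexColor.attributeType = .stringAttributeType
    hexColor.isOptional = true

    // Equivalent of the Renaming ID field in the model editor: tells Core Data
    // that this attribute was called "color" in the previous model version.
    hexColor.renamingIdentifier = "color"

    return hexColor
}
```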

With that change the only thing that's left to do is update CoreDataMigrationVersion to allow migrations from CoreDataMigration_Example to CoreDataMigration_Example 2:
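A sketch of the updated enum, following the same shape as before:

```swift
enum CoreDataMigrationVersion: String, CaseIterable {
    case version1 = "CoreDataMigration_Example"
    case version2 = "CoreDataMigration_Example 2"

    // The latest version is still just the last declared case.
    static var current: CoreDataMigrationVersion {
        guard let latest = allCases.last else {
            fatalError("no model versions found")
        }
        return latest
    }

    func nextVersion() -> CoreDataMigrationVersion? {
        switch self {
        case .version1:
            return .version2
        case .version2:
            return nil
        }
    }
}
```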

A new case was added to CoreDataMigrationVersion - version2. As with version1, this new version has a raw value which maps to the name of its respective model version - CoreDataMigration_Example 2. nextVersion() has also been updated so that there is a migration path from version1 to version2.

Now that we have a migration path, let's look at unit testing it. Unit testing a migration path requires:

Populating a SQLite database using the CoreDataMigration_Example model.

Copying that SQLite database into the test target.

Asserting that the contents of that SQLite database migrated as expected.

Before copying your SQLite database, it's important to ensure it is in fact populated with test data. As we discussed above, Core Data uses Write-Ahead Logging to improve performance so your data could be residing in the sqlite-wal file rather than the sqlite file. The easiest way to force any uncommitted changes is to fake a migration - add a breakpoint just after the forceWALCheckpointingForStore(at:) method, open the Application Support folder, copy the sqlite file and then abort the migration.

There is no need to test every object stored in the persistent store; rather, we just have to assert that each entity has the correct number of objects and then select one object per entity and assert the values on that object.

In the above test, a migration is triggered between the CoreDataMigration_Example and CoreDataMigration_Example 2 models. An interesting point to note is that rather than making use of the Post subclass of NSManagedObject, the above test uses a plain NSManagedObject instance and KVC to determine if the migration was a success. This is to handle the very likely scenario that the Post structure defined in the CoreDataMigration_Example 2 model will not be the final Post structure. If we used Post instances then, as the Post entity changed in later versions of the model, those changes would be mirrored in the Post NSManagedObject subclass, which would result in this test potentially breaking. By using plain NSManagedObject instances and KVC it is possible to ensure that this test is 100% accurate to the structure of the Post entity as defined in the CoreDataMigration_Example 2 model.

As changes are being made to the file system the last thing the test does is tear down the Core Data stack using the tearDownCoreDataStack(context:) method.

Just deleting the migrated SQLite files from the file system would result in a rather serious sounding error BUG IN CLIENT OF libsqlite3.dylib: database integrity compromised by API violation: vnode unlinked while in use:.... being printed to the console. This is because the store would be being deleted from under an active Core Data stack. While the active Core Data stack in question will then be discarded resulting in this error not actually creating any issues, having it clutter the console would make it that much harder to read it and spot any genuine issues printed there so best to tear things down properly.

In the above test class there are a few extensions being used to make things easier:

Migrating to version 3

After another successful release, we decide to expand our posting functionality by allowing the user to add multiple sections to a post. These sections will be stored alongside the post in Core Data. As with any model change we need to create a new model version: CoreDataMigration_Example 3.

Each section consists of:

A title.

A body.

An index.

which in turn reduces a post to:

A unique ID.

A random associated colour represented as a hex string.

The date the post was created on.

A collection of sections.

Such that:

Migrating from CoreDataMigration_Example 2 to CoreDataMigration_Example 3 is slightly trickier than the previous migration as CoreDataMigration_Example 3 splits an existing entity into two entities and creates a relationship between them. This will require implementing both a mapping model and a migration policy.

To create a mapping model, open the File menu on the top bar then click on New -> File..., in the window that opens scroll down to the Core Data section and double-click on Mapping Model. This will open a wizard where you can select your source and destination model versions, so in this case: CoreDataMigration_Example 2 and CoreDataMigration_Example 3. After that you need to give the mapping a name and save it. I tend to follow Migration[sourceVersion]to[destinationVersion]ModelMapping as a naming convention, so Migration2to3ModelMapping.

A mapping model defines the transformations required to migrate from the source model to the destination model. In Xcode, a mapping model is an xcmappingmodel file that when opened has a GUI that's very similar to the Core Data Model GUI. A mapping model handles mapping between entities, attributes and relationships. The mapping model GUI even allows for simple transformations. If the model had a percentage attribute that used to have a value between 0 - 100 but in the new model that value should be between 0 - 1, we could use the Expression field on that attribute to perform this transformation by setting the expression to: $source.percentage/100. Despite the range of transformations possible within the mapping model GUI some changes are just too complex and require a more custom approach - this is handled by creating a migration policy. A migration policy is an NSEntityMigrationPolicy subclass that defines how to map between two entities from two different model versions using the full Core-Data/Swift toolkit.

Migrating from CoreDataMigration_Example 2 to CoreDataMigration_Example 3 will require a custom migration policy as we will need to move the current content attribute's value on Post to both the title and body attributes on a newly created Section instance:
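A sketch of such a policy is shown below. The attribute keys content, title, body and index come from the model descriptions in this post; the relationship keys sections and post are assumptions about how the relationship is named in the model.

```swift
import CoreData

final class Post2ToPost3MigrationPolicy: NSEntityMigrationPolicy {

    // Builds a section title from the first 4 characters of the old content.
    static func sectionTitle(fromContent content: String) -> String {
        return String(content.prefix(4)) + "..."
    }

    override func createDestinationInstances(forSource sInstance: NSManagedObject,
                                             in mapping: NSEntityMapping,
                                             manager: NSMigrationManager) throws {
        // Let the mapping model create the version 3 Post first.
        try super.createDestinationInstances(forSource: sInstance, in: mapping, manager: manager)

        guard let destinationPost = manager.destinationInstances(forEntityMappingName: mapping.name,
                                                                 sourceInstances: [sInstance]).first else {
            fatalError("was expecting a post")
        }

        // KVC is used as the Post subclass reflects the latest model only.
        let content = sInstance.value(forKey: "content") as? String ?? ""

        let section = NSEntityDescription.insertNewObject(forEntityName: "Section",
                                                          into: manager.destinationContext)
        section.setValue(Post2ToPost3MigrationPolicy.sectionTitle(fromContent: content), forKey: "title")
        section.setValue(content, forKey: "body")
        section.setValue(0, forKey: "index")

        // Relationship keys below are assumed names.
        section.setValue(destinationPost, forKey: "post")
        destinationPost.setValue(Set([section]), forKey: "sections")
    }
}
```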

Just like with mapping models I have a naming convention for migration policies: [Entity][Version]To[Entity][Version]MigrationPolicy, this way I can know at a glance exactly what the migration policy is doing.

The above migration policy overrides createDestinationInstances(forSource:in:manager:) to allow for transforming existing CoreDataMigration_Example 2 model Post instances into CoreDataMigration_Example 3 model Post and Section instances. Again, in order to interact with attributes on each Post instance, we need to use KVC. First, a new CoreDataMigration_Example 3 model Post (destinationPost) is created using the mapping rules defined in the mapping model (these rules are set in the mapping model GUI). Then a Section instance is created from the new Section entity. As the old Post didn't have the concept of a title, we take the first 4 characters of that older post's content value and combine them with ... so that the result can be used as the title of the new Section instance. After setting the other properties of the section, a relationship between this section and the new post is created.

In order for this migration policy to be used during the migration we need to add it to the mapping model by setting the Custom Policy on the PostToPost entity mapping:

It's important to note that the migration policy class name is prefixed with the module name.

All that's left to do is to update CoreDataMigrationVersion by introducing a version3 case and updating nextVersion so that version2 migrates to version3.

Migrating to version 4

The success of Colourful Posts knows no bounds and we decide to release our next killer feature: deleting posts. This deletion functionality is actually a soft delete which means that the post will still exist in Core Data but won't be shown to the user. We can achieve this by adding a new attribute to the Post entity - softDelete. Of course, this change will require a new model version and for us to handle the migration to that version. This migration can be handled as a Lightweight migration and in fact requires very little effort on our part. We only need to add a new case to CoreDataMigrationVersion and update nextVersion:
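With all four versions in place, the enum might end up looking like the sketch below. The model name CoreDataMigration_Example 4 is an assumption that follows the naming pattern of the earlier versions.

```swift
enum CoreDataMigrationVersion: String, CaseIterable {
    case version1 = "CoreDataMigration_Example"
    case version2 = "CoreDataMigration_Example 2"
    case version3 = "CoreDataMigration_Example 3"
    case version4 = "CoreDataMigration_Example 4" // assumed model name

    static var current: CoreDataMigrationVersion {
        guard let latest = allCases.last else {
            fatalError("no model versions found")
        }
        return latest
    }

    func nextVersion() -> CoreDataMigrationVersion? {
        switch self {
        case .version1:
            return .version2 // Lightweight (Renaming ID)
        case .version2:
            return .version3 // Standard (mapping model + migration policy)
        case .version3:
            return .version4 // Lightweight (new attribute)
        case .version4:
            return nil
        }
    }
}
```

Nothing in the enum records whether a step is Lightweight or Standard; that is decided when the mapping model for each step is looked up.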

We got there 🏁

Core Data migration can often seem like a tedious and cumbersome process that punishes developers for mutating their models. However, (hopefully) this post shows that by diverging from the default one-step migration approach we can simplify the process and significantly cut down the amount of work required to perform a successful migration. This simplification makes it much easier to treat our users' data with the care that I hope others treat my data with.

I want to acknowledge that I based the above approach on the migration example shown in the excellent Core Data book by Florian Kugler and Daniel Eggert which you can get here. I would highly recommend that you give that book a read as it's a treasure trove of Core Data knowledge.

What do you think? Let me know by getting in touch on Twitter - @wibosco