If it feels like Wave has been offline for updates more often than usual over recent weeks, that’s because it has. With four more upgrades to come in the next couple of months, we asked our Senior Director of Engineering, Ash Christopher, to explain what’s going on behind the scenes, and what this means for customers.

“Big picture — what’s going on with all these upgrades?”

For the past few years, Wave has run in a private cloud environment. We’ve had great service from our infrastructure partner, but newer cloud computing environments offer us additional flexibility to actively manage our service and scale resources to meet increasing demand peaks — whether that’s busy days like the first of each month when our customers run more than half a million recurring invoices, or tax season when everyone needs to get online to update their accounting information.

More than a year ago, we determined that Wave should migrate to Amazon’s AWS platform, which powers many of the world’s largest and most reliable cloud software services. From a technical perspective, this is a very large and complex migration, and the only way to complete it is in multiple steps.

“You say the migration needs multiple steps. How do you plan and manage this?”

It’s important to remember that there are more than 2,500,000 Wave accounts, and some of these have more than 5 years of transaction history. Our first priority in completing this migration to AWS is ensuring that every single transaction for every single customer is transferred 100% accurately. In fact, every upgrade window starts with the team reviewing and confirming they understand our Engineering Values statement, which is:

There is nothing more important than the consistency of our users' data.

We choose to make operations slower and downtimes longer to reduce the risk of data inconsistency.
If we are unable to confirm 100% success of a migration, we must roll back to the secured previous state. We can migrate another day.

There is nothing more important than the consistency of our users' data.

With so much data, and so much priority on data accuracy and consistency, it would be impossible to move everyone’s data all at once, and just as impossible to migrate customers in small groups. Fortunately, Wave is built from lots of “services”, each of which handles a specific task: an invoice service, for example, or a receipts service, or a service that handles sign-ins. This allows us to migrate service-by-service, keeping each migration window as short as possible.

“Most upgrades seem to be scheduled for about 4 hours. Why is that?”

Each time we migrate a service, we need to stop all Wave services, to ensure no inconsistencies are introduced by data changing in other services. We then need to secure the “pre-migration” state of all data, in case we need to roll back the migration. Next, the actual data migration takes place, which for most services takes about an hour. After that, we run consistency and integrity checks to ensure that 100% of data is migrated correctly. Finally, we change a slew of configuration settings and restart the service in AWS, then restart all other Wave services and test to make sure everything is working.
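
For readers who like to see this kind of thing as code, here is a rough Python sketch of the sequence Ash describes. It is only an illustration, not Wave's actual tooling: the `Service` class, the `migrate_service` function and the `copy_data` callable are all hypothetical names, and an in-memory dictionary stands in for a real database.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in for one service (e.g. invoices, receipts)."""
    name: str
    records: dict = field(default_factory=dict)
    running: bool = True


def migrate_service(target, all_services, copy_data):
    """Migrate one service; return the new copy, or None after a roll back.

    `copy_data` is a hypothetical callable that moves the records to the
    new environment and returns what arrived there.
    """
    # 1. Stop every service so no data changes mid-migration.
    for svc in all_services:
        svc.running = False

    # 2. Secure the pre-migration state in case we need to roll back.
    snapshot = dict(target.records)

    # 3. Run the actual data migration (any error counts as a failure).
    try:
        migrated = copy_data(dict(snapshot))
    except Exception:
        migrated = None

    # 4. Consistency check: the migrated data must match the snapshot exactly.
    ok = migrated == snapshot

    if ok:
        # 5a. Switch over: start the service in its new home.
        result = Service(name=target.name, records=migrated, running=True)
    else:
        # 5b. Roll back to the secured state; we can migrate another day.
        target.records = snapshot
        result = None

    # 6. Restart everything else. The old copy of the migrated service
    #    stays stopped when the migration succeeded.
    for svc in all_services:
        svc.running = not (ok and svc is target)

    return result
```

If the check in step 4 fails, the function restores the snapshot and returns `None` rather than a half-migrated service, which mirrors the "roll back and migrate another day" rule in the Engineering Values statement.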

“Wave has customers all around the world. That means whenever you plan your migrations, there’s going to be customers inconvenienced. How do you decide when to schedule these upgrade windows?”

I completely understand how much of a problem it is for customers when they want to get into Wave and we are unavailable. As Director of Engineering, one of my biggest accountabilities is to keep Wave online and available, so I don’t take this lightly!

There’s no way to migrate Wave to AWS and achieve all the benefits that offers without going through this series of upgrades, so the best we can do is schedule updates to minimize impact and risk. The time when we have the fewest customers online is early mornings (Eastern Time) on the weekend.

But usage levels aren’t the only consideration. We also prefer to migrate on Saturdays, which ensures that, should there be any problem with a migration, we are able to roll back the update and also have Sunday available for any follow-up work that might be needed before usage spikes up on Monday. As I mentioned, protecting the integrity of our customers’ data is our highest priority at Wave, so we always plan to avoid risk, as well as to minimize inconvenience.

“OK. I get why it’s complicated, and why it takes time. When all this work is finished, how is it going to be better for Wave customers?”

Great question. We wouldn’t go to all this trouble if it wasn’t going to be better for customers!

The two main benefits are speed and reliability. Once we are fully migrated, we expect Wave to run between 20% and 30% faster than before we started (even though we’ve added a ton of new users in the meantime!).

More important than the raw speed, I think, will be that we can ensure optimal performance even when Wave is under very heavy use. On AWS it will be much easier for us to scale up our resources for busy times of the month or the year.

Reliability is the other thing. AWS is a super-reliable platform. Of course, it’s been inconvenient having Wave offline for these upgrades, but the pay-back should be that you will see more resilience from the Wave systems.

Overall, I see this migration as a huge win for customers, as well as giving Wave itself an awesome platform for growth and the addition of powerful new services and features.

“Alright. Sounds like we’re heading to a better place. How much more work is needed to get there?”

We’re way more than halfway through the process. We’ve already migrated some of Wave’s most important underlying services, like our Identity service, which underpins everything in Wave.

This Saturday (June 24th) will see our largest single migration, as we transfer our core Accounting service over. This is going to be a longer-than-usual migration, simply because the Accounting service has much more data to transfer than any other service within Wave, so we'll be upgrading through the night. Wave will go offline at 11:00 PM Eastern Time, and we expect to complete the migration and be back online 9 hours later, at 8:00 AM Sunday morning.

After Accounting is migrated, we have three more services left to migrate. Our goal is to complete these final migrations during June and the first half of July, and then Wave will be entirely on AWS!

“Thank you, Ash. I’m sure no one who’s looking to update their bookkeeping or get their invoicing out during these Saturday morning migrations is exactly happy to see us offline, but hopefully understanding the reason will help. And I guess everyone will be happy to see a faster Wave!”