
Evernote’s July 1st Server Problem

Evernote experienced a series of hardware failures on one of our servers between July 1st and July 4th, which potentially affected 6,323 users worldwide. As a result of the failure, some of the notes created and edited by these users between July 1st and July 4th were not properly recorded on the Evernote servers. We immediately contacted all affected users via email, and our support team walked them through the recovery process. We automatically upgraded all potentially affected users to Evernote Premium (or added a year of Premium to anyone who had already upgraded), both to make sure that they had access to priority tech support if they needed help recovering their notes and as a partial apology for the inconvenience.

Less than one fifth of one percent of our users were in the potentially affected group, and we were able to identify 100% of them from the server logs. Because of this, we decided not to post a wider announcement, so that our support staff would have time to work with the people actually affected instead of fielding a flood of requests from the more than 99% of users who were not in the affected group but had no way of determining that themselves.

If you did not receive an email from us in early July, then you were not affected.
Because of the redundancy inherent to Evernote (copies of notes saved locally, email and browser history), the majority of the affected users were able to recover all their notes.

We want to assure you that this was a one-time issue. We have significantly improved our reporting and redundancy infrastructure to ensure that it does not happen again. We sincerely apologize for the inconvenience to our affected users. Even though most of them didn’t wind up losing any data, they still had to read through a lengthy and potentially worrying email. For the ones that did lose data, we hope that knowing exactly which notes were affected over a four-day period is enough information to recover or recreate the most important ones.

We received replies from several hundred of the affected users, and we are extremely grateful for their understanding and continued support. We are posting this now because of erroneous information that we’ve seen popping up on the web.

For the technically minded, here’s what happened

Every user’s data is stored on a “shard”. A shard is made up of a server together with a redundant fail-over server. If there is any problem with a server, the system automatically fails over to the second server in the shard. We currently have 37 shards. Shard 22 was the one that had problems last month.

The data in each server is stored on a RAID 1 (fully redundant) array. All data is also backed up on-site and off-site. A full copy of your notes is also stored on the Windows and Mac clients (and the iPhone and iPad clients for Premium users who enable that option). This means that every note in Evernote is stored in at least six redundant locations: the disk on the primary server, its RAID mirror, the fail-over server on the shard and its RAID mirror, the on-site backup and the off-site backup. Most users also have another one or two full copies on their local clients. This makes data loss in Evernote extremely rare.

The problem with shard 22 was a very idiosyncratic, intermittent combination of hardware problems with both the primary server and the fail-over mechanism. Basically, the shard kept failing over back and forth between the two servers during that period, causing some of the data created during that time to get overwritten. Everything created before the failure was easy to recover from backup. The chance of this particular sequence of failures happening again is extremely low, but we’ve modified the fail-over mechanism, just in case, to make sure that it is impossible to overwrite data even in the worst-case scenario.
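The post does not say how the fail-over mechanism was changed. Purely as an illustration of the general idea of refusing to overwrite newer data, here is a minimal sketch in Python. The `NoteStore` class, the `sync` function and the per-note version numbers are all hypothetical and are not taken from Evernote's actual system.

```python
class NoteStore:
    """Toy stand-in for one server in a shard (hypothetical, not Evernote code)."""

    def __init__(self):
        self._notes = {}  # note_id -> (version, content)

    def write(self, note_id, version, content):
        """Store content only if version is newer than what is already held."""
        current = self._notes.get(note_id)
        if current is not None and version <= current[0]:
            return False  # same or older version: refuse rather than overwrite
        self._notes[note_id] = (version, content)
        return True

    def read(self, note_id):
        """Return (version, content) for a note, or None if it is unknown."""
        return self._notes.get(note_id)


def sync(source, target):
    """Copy notes from source to target and return the ids that were refused."""
    skipped = []
    for note_id, (version, content) in source._notes.items():
        if not target.write(note_id, version, content):
            skipped.append(note_id)
    return skipped
```

With a guard like this, a server that comes back after a fail-over holding an older copy of a note cannot replace the newer copy on its partner. Syncing in the other direction brings the older server up to date.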

Thanks for the good explanation of what happened. The problem was still that your website was down and I was not able to access it when I needed to. And there was no information at all from your side. No backup website, no tweet, nothing. This is just unacceptable, especially for Premium customers. Technical problems happen, that’s understandable, but not notifying clients immediately when problems occur is something I don’t want to see again. Otherwise, keep up the good work.

Ahh, how I missed that Twitter link… Was that there from the beginning…?
But thanks for that, now it looks exactly like what it needs to be. Great work, and thanks for the great service. I still find more and more uses every day.

I think it was the right decision to keep it under wraps to avoid a panic, even though you knew you would incur the wrath of those affected until you could identify and contact them. When bad things happen, it is about limiting the damage, not eliminating it. The free upgrade (or extension) should help mollify them.

I agree that Evernote data is stored in multiple locations. But this is no insurance against mistakes like one of the servers missing one or more notes, or a user throwing away some of his/her notes. If a note gets lost somewhere, the ‘deletion’ of this note will propagate through all locations and you will still lose that note. It’s better to make an offline copy once in a while.

We keep multiple levels of backups to allow us to recover from a server-related data loss. We also maintain historic versions of existing notes in case you ever make an accidental change to a note:
http://blog.evernote.com/2010/04/14/new-premium-features-note-history-and-50mb-notes/

However, if you choose to put a note in the Trash, then empty the Trash (and say “yes” to the scary confirmation dialog), then we permanently remove the contents of that note from our servers. This is an intentional decision for user privacy. If someone accidentally put something sensitive into their account and then went through the multiple steps to completely delete it, we felt that it would be inappropriate to keep a copy of it on our servers. That policy is the reason that we make it relatively difficult to go through the whole process of completely deleting notes (including a dialog box that explicitly warns you what is happening).

Your own desktop client for Mac or Windows has a full copy of your notes, so you could always use your own backup solution to maintain archival copies of your database as well.
On the Mac, your database is stored within your home directory under:
Library / Application Support / Evernote
On Windows, you can find your database location via: Tools > Options > General
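
As one possible way to keep such archival copies, here is a minimal Python sketch that zips a database folder into a timestamped archive. The function name `archive_directory`, the archive naming scheme and the `EvernoteBackups` destination folder are made up for illustration. Only the Mac path comes from the location given above. It is an assumption that copying the folder is safest with the client closed.

```python
import shutil
import time
from pathlib import Path

# Mac location as described above; on Windows, use the folder shown under
# Tools > Options > General instead.
MAC_DATABASE_DIR = Path.home() / "Library" / "Application Support" / "Evernote"


def archive_directory(source_dir, backup_dir, stamp=None):
    """Zip source_dir into backup_dir and return the path of the new archive."""
    source_dir = Path(source_dir)
    backup_dir = Path(backup_dir)
    if not source_dir.is_dir():
        raise FileNotFoundError(f"No such directory: {source_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    # A timestamp in the file name keeps older archives from being replaced.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / f"evernote-backup-{stamp}"
    return Path(shutil.make_archive(str(base_name), "zip", root_dir=str(source_dir)))


# Example call (hypothetical destination folder):
# archive_directory(MAC_DATABASE_DIR, Path.home() / "EvernoteBackups")
```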

Yes, RAID on a single box is just our first level of redundancy. Block level replication to a second RAID on another box is our next level of redundancy. Our third level of redundancy is a nightly snapshot (on the secondary box) with an incremental file system backup onto a separate local drive. The fourth level is a nightly network backup to separate backup media. The fifth level is a weekly offsite rotation of backup media. The sixth level is dedicated “cold storage” for file system volumes that are “full” and no longer receiving new data. (We actually use high-capacity hard drives instead of tapes, for faster recovery times.)

In this incident, the first four levels of redundancy failed due to a combination of hardware, software, and operational errors. One of the many changes we’ve made as a result of this incident is to change from keeping a single nightly backup to keeping nightly backups for the last 7 nights. So a single night’s corrupted backup won’t overwrite last night’s backup.
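
To illustrate the idea of keeping the last 7 nightly backups rather than a single one, here is a minimal Python sketch of a pruning step. It is not our actual backup tooling. The function name `prune_backups` and the `backup-YYYY-MM-DD.tar` naming scheme are assumptions chosen so that sorting by file name is the same as sorting by date.

```python
from pathlib import Path


def prune_backups(backup_dir, keep=7):
    """Delete all but the `keep` newest backups and return the names removed."""
    # Assumes names like backup-2010-07-04.tar, so name order equals date order.
    backups = sorted(Path(backup_dir).glob("backup-*.tar"), key=lambda p: p.name)
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return [p.name for p in stale]
```

Each night's backup is written under a new dated name and the prune runs afterwards, so a corrupted backup only ever adds one bad file and leaves the previous nights in place.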

“[…] We automatically upgraded all potentially affected users to Evernote Premium (or added a year of Premium to anyone who had already upgraded) because we wanted to make sure that they had access to priority tech support if they needed help recovering their notes and as a partial apology for the inconvenience.[…]”

made me fall in love with Evernote all over again.
I wasn’t affected by the outage and am glad I didn’t get a worrying email, so good job, too, on keeping it covered.

I also agree with my fellow commenters that your level of transparency is great and it makes me feel even more confident in the product.