Microsoft OneDrive for Business modifies files as it syncs

While we often hear about privacy concerns with storing data in the cloud, such as on Dropbox, one thing we take for granted is data integrity: files are not altered in any way on the cloud unless the user actually modifies them online. For example, if a user syncs a spreadsheet file with Google Docs, the file stored on Google Drive should be an exact byte-for-byte match with the original file until the user modifies either the cloud file in Google Docs or the locally stored file in a spreadsheet application. In fact, many consumers go as far as trusting the cloud as their only backup.

Microsoft OneDrive for Business (formerly SkyDrive Pro) is Microsoft’s workplace equivalent of OneDrive and comes bundled with most Office 365 subscriptions. It is designed to give the business control over the employee’s data stored within the synced folders. However, unlike the consumer version of OneDrive, we found out by accident that what gets synced to the cloud is generally not the same as what gets synced back from the cloud, even when no one has touched the files online or elsewhere.

When OneDrive got stuck in an endless loop of trying to sync a few files, and the issue returned after I cleared its cache as instructed on Microsoft’s discussion forum, I decided to stop syncing the OneDrive folder and backed it up. I then deleted the original synced folder and got OneDrive to start syncing it again, so it would get a fresh copy from the cloud. To check whether any files had been damaged by the earlier syncing issue, I used a utility called MD5summer to create MD5 hashes of the backup’s contents and repeated this process for the freshly synced folder. To my surprise, the vast majority of the files showed ‘Checksum did not match’. Surely most of my files hadn’t gone corrupt?
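The same before-and-after comparison can be scripted. What follows is a minimal Python sketch of the idea, not a description of how MD5summer works internally; the function names (md5_of_file, hash_tree, compare_trees) are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(backup_dir, synced_dir):
    """Return (mismatched, only_in_backup, only_in_synced) as sorted lists."""
    before = hash_tree(backup_dir)
    after = hash_tree(synced_dir)
    mismatched = sorted(
        name for name in before.keys() & after.keys() if before[name] != after[name]
    )
    only_in_backup = sorted(before.keys() - after.keys())
    only_in_synced = sorted(after.keys() - before.keys())
    return mismatched, only_in_backup, only_in_synced
```

Pointing compare_trees at the backup folder and the freshly synced folder would list every file whose content differs between the two, which is the equivalent of the ‘Checksum did not match’ entries.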

I then started opening various files that failed the MD5 check, but could not find any obvious damage to any file. That was until I noticed several PHP files from a website theme that also failed the MD5 check. When I compared them side by side in Notepad++, I noticed straight away a few pieces of code injected into the header that clearly could not have been caused by any form of data corruption. I knew for sure that neither I nor anyone else would have made these changes as the theme files were from a former website CMS package, so I then tried finding out what was modifying these files.

To check if OneDrive for Business was the culprit, I created a handful of mostly empty files of different types I frequently use and handwrote a simple PHP file and HTML file in Notepad++, so any modifications would clearly stand out. I then used MD5summer to create MD5 hashes and then placed these files in a folder for OneDrive for Business to sync. A few hours later, I booted my laptop which also has OneDrive for Business installed and a moment later, this folder appeared. I then ran MD5summer and this is what I got:

The following highlighted in red is what OneDrive for Business injected into the HTML file:

While ‘uuid’ stands for Universally unique identifier, this code “C2F41010-65B3-11d1-A29F-00AA00C14882” remains the same in every PHP and HTML file it modified, including with other users. Even though this modification does not make the file traceable, this is obviously going to be a nuisance for web developers who use OneDrive for Business to sync web files with each other, especially handwritten files where they don’t expect extra code to be added.

As for Word, Excel and Publisher files (‘docx’, ‘xlsx’ and ‘pub’ file extensions), these grew by about 8KB. Unlike the web files, these Microsoft Office files had what appears to be uniquely identifiable code added, potentially making it possible to match them to a company and possibly even to a specific user’s account. To get an idea of what was added, I used 7-Zip to extract the content of the Word file before and after syncing. There were two ‘.rels’ files and one XML file modified and three folders with files added – ‘customXml’ containing 6 XML files, a folder ‘_rels’ inside this containing three ‘.rels’ files and a ‘[trash]’ folder containing a ‘0000.dat’ file. In the ‘docProps’ folder, a file ‘custom.xml’ contains a property with a ‘ContentTypeId’ name attribute with a unique ID.
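The manual 7-Zip comparison can be approximated in a few lines of Python. This is only a sketch: the function names (zip_members, diff_containers) are hypothetical, and it applies only to ZIP-based formats such as ‘docx’ and ‘xlsx’. It compares the CRC-32 and size that the ZIP directory records for each member, so it reports which parts were added or changed without showing what changed inside them.

```python
import zipfile


def zip_members(path):
    """Map each member name in a ZIP container to its (CRC-32, uncompressed size)."""
    with zipfile.ZipFile(path) as archive:
        return {
            info.filename: (info.CRC, info.file_size)
            for info in archive.infolist()
        }


def diff_containers(original_path, synced_path):
    """List the members added, removed, or modified between two ZIP containers."""
    before = zip_members(original_path)
    after = zip_members(synced_path)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(
            name
            for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }
```

Run against the original and synced copies of the Word file, the "added" list would be expected to show the new ‘customXml’ entries and the "modified" list the changed ‘.rels’ and XML files.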

When I used 7-Zip to look inside the two Microsoft Publisher files, the synced file had a ‘MsoDataStore’ folder added, containing 3 folders with gibberish names and 2 XML files inside each. I found the same ContentTypeID code inside as in the Word file, and while the two matched, the code was different to that in files I compared with other users.

Even though OneDrive for Business modified these files, it left the ‘Date Modified’ attribute in every file unchanged, so to an unsuspecting user who just checks when the files were modified, they appear untouched. For example, the Word file shows a modified time of ’16:14:14’ for both the original and synced file, even though the file sizes are clearly different. The only files that remain untouched are those that were placed in the synced folder on the original computer, so even if a user checks the files they place in a synced folder, they would not know anything is being modified unless they physically took those files to another computer with the matching synced folder to compare them.
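A check along these lines would flag exactly that situation: two copies whose ‘Date Modified’ agrees while their bytes do not. It is a minimal sketch with a made-up function name (silently_changed), and it compares timestamps to the whole second only, on the assumption that sub-second precision may not survive a sync.

```python
import filecmp
import os


def silently_changed(original_path, synced_path):
    """Return True when both files share a modified time but differ in content."""
    original = os.stat(original_path)
    synced = os.stat(synced_path)
    # Compare whole seconds only (an assumption, see the text above).
    same_mtime = int(original.st_mtime) == int(synced.st_mtime)
    # shallow=False forces a byte-by-byte comparison instead of trusting os.stat.
    same_bytes = filecmp.cmp(original_path, synced_path, shallow=False)
    return same_mtime and not same_bytes
```

This only works when both copies are reachable from one machine, for example the original on a USB stick and the synced copy in the local OneDrive folder.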

So what this means is that people who use OneDrive for Business or SharePoint need to be very careful with what they sync with it, especially those handling third party data, due to confidentiality issues. For example, if an employee needs to transfer confidential files that absolutely must not be touched between their laptop and PC and decides to do so through a synced folder in OneDrive for Business, those files will end up being modified without the user’s knowledge. This could have severe consequences if, say, a file is used as evidence in a court case. How do you prove that the company did not intentionally modify it?

Based on Myce testing, we found that the consumer version of OneDrive (formerly SkyDrive) does not appear to modify any files, whether synced with the desktop product or through the web interface. We also tested BitTorrent Sync and found that it does not modify any files either, even when testing a 1GB folder with a wide range of file types.

Their media player (the last time I used it) also modified .mp3 files. It meant my files were showing as corrupt after playing them because I kept md5 info on them.
I think it was updating, or clearing some **** data but couldn't see anything. I found nothing to prevent this, other than making the files read only before playing.

Looks like businesses will be forced to only use services which allow them to overlay their own encryption, with locally stored keys, and strong legal protection to guarantee that their data will never leave the EU.

And if your business is required by law to keep verifiable records... You could end up in serious trouble.

Welcome to the world of everything-as-a-service computing. This is just the beginning.

OneDrive for Business _is_ SharePoint - it's the new new name for SharePoint Workspace to be more precise. Syncing with a SharePoint server in the MS cloud. It is thus a completely different thing from OneDrive (consumer) and why MS marketing chose to confuse it this way I guess we'll never understand.

SharePoint is a DMS not a filesystem, and it syncs document management metadata from, and _to_, the documents (depending on file format).

In container file formats this is done in metadata, not content, sections - your document content is not touched (try office documents with digital signatures - the signature will remain valid, because it validates the content areas of the file format, not metadata). In some file formats it will add comments in a way that does not affect the content (as above).

That is all this is. SharePoint on-premise will do exactly the same thing, and it has been a documented SharePoint feature for at least a decade - see e.g. http://weblogs.asp.net/bsimser/archive/2004/11/22/267846.aspx

It's certainly no clickbait. From your post you can at least conclude it's bad marketing from Microsoft as the new name certainly gives different expectations. Also adding meta data to a file that renders it unusable is not really a great 'feature'.

What I find particularly concerning is that it also modifies the contents of password-protected files.
The fact that a password-protected ".xlsx" file is NOT a zip file suggests that perhaps the whole thing is encrypted, not just key files within the ZIP. If so (and I *do* hope I am wrong) it suggests that there could be a password-free back-door into the encrypted files that SharePoint uses.
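One way to see which container a given file actually uses is to look at its first few bytes. The sketch below assumes the usual file signatures: ‘PK\x03\x04’ at the start of a non-empty ZIP archive, and D0 CF 11 E0 A1 B1 1A E1 at the start of an OLE compound file, the older Office container. The function name (container_type) is hypothetical, and the check says nothing about how, or whether, the contents are encrypted.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # local file header of a non-empty ZIP
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")      # OLE compound file signature


def container_type(path):
    """Classify a file as 'zip', 'ole-compound' or 'unknown' from its first 8 bytes."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header == OLE_MAGIC:
        return "ole-compound"
    return "unknown"
```

Comparing the result for an ordinary ".xlsx" and a password-protected one would show whether the two really use different containers.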

This same problem has started occurring this past week on OneDrive for consumers. It makes me wonder if Microsoft has migrated that product to the same platform/technology as OneDrive for Business. The support forums for OneDrive on Microsoft are now teeming with people trying to figure out why everything is broken for their Office files synced by OneDrive.

I'm very surprised to hear they started doing this with consumer files, particularly since online backup seems to be one of their main selling points for their Office 365 Home Personal/Premium subscription service. Modifying files being "backed up" is not really a back up, since modified files are no longer considered originals.

The problem seems to have started sometime around 8/27. The problem manifests itself when creating a file on one machine and then having it synced to another. If one goes to OneDrive on the web, you can download the file just fine, but any office file (particularly Excel files) synced automatically is reported as corrupted.

I'm wondering if perhaps this only affects those with the 1 TB version of OneDrive for consumers, but have no way to test that out.

From my testing, syncing files between PCs in OneDrive (consumer version) appears to be fine with over 10 file types I tried, i.e. between two Windows 7 PCs and between two Windows 8.1 PCs. It meant I had to convert my Windows 8.1 local account into a Microsoft account, as SkyDrive for Windows 8 does not work in a local account.

However, I was able to replicate the reported corruption bug. This is different to the story I reported here where OneDrive for Business adds metadata to each file as this time it doesn't seem to be a metadata issue, i.e. when the file goes corrupt, it simply cannot open.

Edit: I was able to replicate this in Excel only, so I posted an article with a video recording to demonstrate it.

Great work on this Sean. Some new things have surfaced in users testing this over on the Microsoft site. Several people have claimed that when the Excel file syncs there is a brief moment when the full file appears to be present on the secondary machine, then the file size drops dramatically and the file is reported as corrupted. I can't replicate that, but my PCs and Internet connection are pretty fast.

If one elects to password protect the file, everything works just as it should. No file corruption.

As you noted, this issue at least seems Windows 8/8.1 specific. Part of me wonders if this really is a OneDrive problem or if OneDrive is actually a "victim" of some other Windows service.

I'm not suggesting this is specifically what the problem is, but it almost seems like something that might happen if a virus scanner or similar service on a Windows 8/8.1 machine examines Office files and somehow corrupts the Excel files in the process. Perhaps something about the way that OneDrive puts files in the file system triggers some kind of check that runs amuck.

My reason for speculating the above is, as you have noted, the files that are actually on the OneDrive cloud are not corrupted. If they are explicitly downloaded to the user's machine, they work just fine. It is only placement through the automatic file sync process that causes this problem.

I also seem to be a victim of the Windows 8.1 update bug. I don't think I rebooted my laptop since the last Windows Update process did so. When I rebooted yesterday evening, it got stuck in a BSOD loop for a few iterations and then went through an automatic system restore. This in turn caused Office 2013 to require an online repair, which took most of the evening due to my limited DSL speed.

One test I'd like to try is to boot my desktop into Windows 8.1 and modify an Excel file in my OneDrive folder to see if that also results in the Excel file becoming corrupt on my laptop. I'll also try various other tests, such as temporarily disabling MalwareBytes and Windows Defender (Windows 8's built-in Antivirus), doing a fresh sync, etc.

At work, we've already stopped using OneDrive for Business over a month ago. My work colleagues were having files going corrupt (not just recently) as well as complete crashes of OneDrive for Business (i.e. require the folder structure & cache to be removed and a fresh resync.) We're now using Google Drive and so far haven't had a single issue with it and it's also far less resource intensive than OneDrive for Business was. As for the personal version of OneDrive, I just use mine for testing only. I mainly use BitTorrent sync (this only syncs between devices, not to any online storage) and Dropbox (my phone came with 25GB for 2 years.)

Just to update on this discussion, the OneDrive for consumers file corruption issue has been reported fixed.

I have also run PC to PC sync tests with Google Drive and Dropbox and neither of these modifies files as they sync, at least across the 13 file types I tested with.

For curiosity's sake, I did check what happens if I open Word and Excel files online in the OneDrive consumer version to see if that causes any file modifications to be made and indeed it does, at least with Excel files.

Word Online: Opening a file in the online viewer appears to have no effect. Opening the file in the online editor does cause the file to be modified without typing a single keystroke. So don't open Word files in the online editor unless you make a backup.

Excel Online: Excel does not appear to offer a viewer mode online, i.e. opening an Excel file online without typing a single keystroke will cause the Excel file to be modified and thus modify locally stored versions!

So although the OneDrive consumer version does not appear to modify files synced between PCs, it can modify certain file types that are opened online (Excel in particular) without the user making a single modification.