How to avoid data loss in Windows Server 2016

Microsoft released the final version of Windows Server 2016 just a few weeks ago. It comes in three flavours: Windows Server 2016 Standard for physical or minimally virtualised data centre environments, Windows Server 2016 Datacenter for highly virtualised data centre and cloud environments, and Windows Server 2016 Essentials for small businesses with a maximum of 25 users and 50 devices. Two Windows Storage Server 2016 editions, Workgroup and Standard, were also released. Both are only available bundled with manufacturer hardware.

Microsoft claims its new release is the basis for the future, and many experts say it offers several advantages that organisations should consider when migrating to the new system. For example, Windows Server 2016 implements new security layers so that threats to the infrastructure are better recognised and defended against. More flexibility and stability are promised through virtual environments based on Hyper-V. With a new network stack, Windows Server 2016 provides fully integrated basic networking functionality as well as the software-defined networking (SDN) architecture of Microsoft Azure. Additionally, the new server OS is built on the concept of Software Defined Storage (SDS), which allows additional storage capacity to be added easily across the whole server infrastructure.

Windows Server 2016 offers various tools for the dynamic management of compute, networking, storage and security. Last but not least, it promises greater fault tolerance. As Microsoft itself puts it: “When hardware fails, just swap it out; the software heals itself, with no complicated management steps”. So far, so good.

However, from a disaster recovery perspective, it is important to focus on the new technologies that could affect both the risk of data loss and the chances of a successful recovery. Here we'll analyse version 3 of the Resilient File System (ReFS) as well as Storage Spaces Direct, the successor to Storage Spaces, which was first introduced in Windows Server 2012. There's lots of technical goodness to cover in this article, but let's begin with ReFS.

ReFS: Improvement or a challenge?

ReFS version 3, as implemented in Windows Server 2016, presents a new challenge for users and data recovery professionals alike: very few experts around the world have the knowledge to recover lost data from this new file system. ReFS is a proprietary technology, meaning Microsoft has not disclosed its specifications. Data recovery therefore requires extensive reverse engineering to analyse the file system and build proper tools to retrieve lost data.

Remember: ReFS is a file system for storing data, not for running the OS (it cannot be used as a boot volume). It was designed for systems with large data sets, where it offers better scalability and availability than NTFS. Data integrity was one of the main new features, protecting business-critical data from common errors that can cause data loss. If a system error occurs, ReFS can recover from it without risk of data loss and without affecting volume availability. ReFS also addresses media degradation, to prevent data loss as a disk wears out.

One of the main benefits of using Windows Server 2016 with ReFS is that the system automatically creates checksums for the metadata stored on a volume. Any checksum mismatch triggers an automatic repair of the metadata. What makes this feature really awesome, though, is that user data can also be protected with checksums: if a checksum mismatch is found, the file can be repaired, provided a redundant copy is available. This feature is called integrity streams, and it can be enabled for a whole volume, a specific folder or individual files.
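The detection half of integrity streams can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ReFS's on-disk format: data is split into fixed-size blocks (the 4 KB `BLOCK_SIZE` is hypothetical), CRC32 from `zlib` stands in for whatever checksum ReFS actually uses, and the function names are invented for illustration. On each read, every block's checksum is recomputed and compared with the one stored at write time.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size for this toy model


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block: bytes) -> int:
    """CRC32 as a stand-in for the checksum ReFS keeps for each block."""
    return zlib.crc32(block)


def write_with_integrity(data: bytes) -> tuple[list[bytes], list[int]]:
    """'Write' data: keep the blocks plus one checksum per block, stored separately."""
    blocks = split_blocks(data)
    return blocks, [checksum(b) for b in blocks]


def find_corrupt_blocks(blocks: list[bytes], sums: list[int]) -> list[int]:
    """On read, recompute every checksum and return the indices that no longer match."""
    return [i for i, (b, s) in enumerate(zip(blocks, sums)) if checksum(b) != s]


if __name__ == "__main__":
    blocks, sums = write_with_integrity(b"business critical data " * 1000)
    # Simulate silent corruption ("bit rot") by flipping bits in block 2
    damaged = bytearray(blocks[2])
    damaged[0] ^= 0xFF
    blocks[2] = bytes(damaged)
    print(find_corrupt_blocks(blocks, sums))  # [2]
```

On its own, a checksum only tells the system which block is bad; repairing that block requires a second, intact copy.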

As Microsoft explains in its ReFS documentation: “When ReFS is used in conjunction with a mirror space or a parity space, detected corruption (both metadata and user data, when integrity streams are enabled) can be automatically repaired using the alternate copy provided by Storage Spaces.”

Microsoft also adds:

“With ReFS, if corruption occurs, the repair process is both localised to the area of corruption and performed online, requiring no volume downtime. Although rare, if a volume does become corrupted or you choose not to use it with a mirror space or a parity space, ReFS implements salvage, a feature that removes the corrupt data from the namespace on a live volume and ensures that good data is not adversely affected by nonrepairable corrupt data.”
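The repair-or-salvage behaviour described in these quotes can be modelled in the same way. In this hypothetical Python sketch, `Volume`, `write` and `read` are invented names, CRC32 again stands in for the real checksum, and a "mirrored" file simply keeps a second copy of each block, roughly as a mirror space would. A bad block is replaced from the alternate copy if that copy still verifies. If no good copy exists, the file is removed from the namespace while the rest of the volume stays online. Real ReFS salvage operates on its own internal structures, so treat this only as an illustration of the idea.

```python
import zlib
from dataclasses import dataclass, field
from typing import Optional

# One stored block: (primary copy, alternate copy or None, checksum written with the data)
Block = tuple[bytes, Optional[bytes], int]


def checksum(block: bytes) -> int:
    return zlib.crc32(block)  # stand-in checksum, not ReFS's actual algorithm


@dataclass
class Volume:
    files: dict[str, list[Block]] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def write(self, name: str, blocks: list[bytes], mirrored: bool) -> None:
        """Store each block; a mirror space keeps a second copy of it."""
        self.files[name] = [(b, b if mirrored else None, checksum(b)) for b in blocks]

    def read(self, name: str) -> Optional[bytes]:
        """Verify every block and repair it from the alternate copy where possible.
        If a block is unrepairable, 'salvage' by removing only this file from the
        namespace, so the other files on the volume stay available."""
        healed: list[Block] = []
        for i, (primary, alternate, stored) in enumerate(self.files[name]):
            if checksum(primary) == stored:
                healed.append((primary, alternate, stored))
            elif alternate is not None and checksum(alternate) == stored:
                self.log.append(f"{name}: block {i} repaired from alternate copy")
                healed.append((alternate, alternate, stored))
            else:
                self.log.append(f"{name}: block {i} unrecoverable, removed from namespace")
                del self.files[name]
                return None
        self.files[name] = healed  # write the repaired blocks back while online
        return b"".join(primary for primary, _, _ in healed)


if __name__ == "__main__":
    vol = Volume()
    vol.write("mirrored.db", [b"aaaa", b"bbbb"], mirrored=True)
    vol.write("simple.log", [b"cccc"], mirrored=False)
    # Corrupt the primary copy of one block in each file
    _, alt, crc = vol.files["mirrored.db"][1]
    vol.files["mirrored.db"][1] = (b"XXXX", alt, crc)
    _, alt, crc = vol.files["simple.log"][0]
    vol.files["simple.log"][0] = (b"ZZZZ", alt, crc)
    print(vol.read("mirrored.db"))  # b'aaaabbbb'
    print(vol.read("simple.log"))   # None
    print(vol.log)
```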

In other words, Microsoft has implemented handy self-healing features for corrupt data and files.

Challenges with restoring data

As we have already pointed out in our detailed article on ReFS, its structure works like a database, which makes recovery completely different from NTFS, where the metadata is kept in a flat table. To find any lost data, recovery experts have to traverse ReFS like a database, opening tables that contain further tables, and so on.
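A toy comparison shows why the traversal matters for recovery. ReFS stores its metadata in B+ trees, but the nested Python dictionaries below are a deliberately simplified, hypothetical stand-in (no keys, pages or on-disk layout), and the flat list only loosely resembles the NTFS Master File Table. With the flat table, one linear pass finds every record. With nested tables, a tool must open each table to reach the next, so in this model a single damaged table hides everything beneath it.

```python
from typing import Any

# NTFS-like (simplified): one flat table with a record per file
flat_table = [
    {"name": "report.docx", "size": 120},
    {"name": "data.csv", "size": 4_500},
    {"name": "vm.vhdx", "size": 40_000_000},
]

# ReFS-like (simplified): tables whose entries point to further tables
root_table: dict[str, Any] = {
    "Users": {"alice": {"report.docx": 120}},
    "Data": {"data.csv": 4_500, "VMs": {"vm.vhdx": 40_000_000}},
}


def scan_flat(table: list[dict[str, Any]]) -> list[str]:
    """A single linear pass sees every file record."""
    return [record["name"] for record in table]


def walk_nested(table: dict[str, Any], path: str = "") -> list[str]:
    """Open each sub-table recursively; non-table entries are file records (name -> size)."""
    found: list[str] = []
    for name, entry in table.items():
        full_path = f"{path}/{name}"
        if isinstance(entry, dict):
            found.extend(walk_nested(entry, full_path))  # descend into the next table
        else:
            found.append(full_path)
    return found


if __name__ == "__main__":
    print(scan_flat(flat_table))
    print(walk_nested(root_table))
    # Lose the "Data" table: in this model everything beneath it becomes unreachable
    damaged = {k: v for k, v in root_table.items() if k != "Data"}
    print(walk_nested(damaged))  # ['/Users/alice/report.docx']
```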

Here's another challenge for data recovery: the sheer file and volume sizes ReFS supports. A single file on a volume can be as large as 16 exabytes (that's 16 million terabytes!), and a ReFS volume can be as large as one yottabyte (yes, that is one trillion terabytes). With this in mind, it becomes clear that such enormous storage capacity is also a danger when it comes to getting lost data back. Imagine the time and complexity involved in restoring a volume of that size!
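To put these sizes into perspective, the sketch below converts them using decimal units (1 TB = 10^12 bytes), as the figures above do, and estimates how long a single sequential read pass would take. The 200 MB/s throughput is a hypothetical assumption for one disk; real recoveries read many disks in parallel, so the result shows scale rather than an actual recovery time.

```python
# Decimal (SI) units, matching the figures in the text
TB = 10**12
EB = 10**18
YB = 10**24

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def in_terabytes(num_bytes: int) -> float:
    return num_bytes / TB


def single_pass_years(num_bytes: int, bytes_per_second: float) -> float:
    """Years needed to read num_bytes once, sequentially, at a constant rate."""
    return num_bytes / bytes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"16 EB = {in_terabytes(16 * EB):,.0f} TB")  # 16 EB = 16,000,000 TB
    print(f"1 YB = {in_terabytes(YB):,.0f} TB")        # 1 YB = 1,000,000,000,000 TB
    rate = 200 * 10**6  # ASSUMPTION: 200 MB/s sustained read throughput
    print(f"16 EB file:  ~{single_pass_years(16 * EB, rate):,.0f} years")
    print(f"1 YB volume: ~{single_pass_years(YB, rate):,.0f} years")
```

At that assumed rate, one pass over a 16 EB file takes roughly 2,500 years, and one pass over a 1 YB volume well over 100 million years.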

Now that we've explained the impact of ReFS on data structures in Windows Server and on data recovery, let's focus on the second important technological development in this new OS: Storage Spaces Direct.

Microsoft's answer to VMware vSAN

Storage Spaces Direct is the evolution of Microsoft's Storage Spaces technology, which was introduced in Windows Server 2012. In short, with Storage Spaces all data is stored in a virtual storage pool built from physical hard disks or SSDs. One advantage of this concept is that new disks can easily be added to the pool without major adjustments or problems.

With Storage Spaces Direct (available only in the Windows Server 2016 Datacenter edition) the concept is expanded: the local drives of several servers in a cluster can now be combined into one unified pool, rather than only pooling the physical drives of a single server. Unlike the old version, which could only use SAS JBODs, Storage Spaces Direct can use different types of drives, including SATA, SAS and SSD. All of these, except disks attached via multipath, can be combined into a single storage pool. Additionally, Microsoft has integrated the Software Storage Bus, which (to keep it simple) replaces the Fibre Channel or shared SAS cabling used previously, handles networking between the servers and thereby establishes a software-defined storage fabric in which every server can see all connected drives.
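The pooling idea can be illustrated with a small Python model. Everything here is hypothetical (`Drive`, `build_pool`, `add_drive` and the sample cluster are invented names); on a real cluster Windows builds the pool itself, for example via the `Enable-ClusterStorageSpacesDirect` PowerShell cmdlet. The model only captures the rules from the text: local drives from all servers end up in one shared pool that every server sees, multipath-attached drives are left out, and adding a disk simply extends the pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    server: str
    serial: str
    media: str            # e.g. "SATA", "SAS" or "SSD"
    capacity_tb: float
    multipath: bool = False


def build_pool(cluster: dict[str, list[Drive]]) -> list[Drive]:
    """Combine the local drives of every server into one shared pool,
    skipping multipath-attached drives as described in the text."""
    return [d for drives in cluster.values() for d in drives if not d.multipath]


def add_drive(pool: list[Drive], drive: Drive) -> list[Drive]:
    """Extend the pool with a new drive; the rest of the pool is unchanged."""
    if drive.multipath:
        raise ValueError(f"{drive.serial}: multipath-attached drives cannot be pooled")
    return pool + [drive]


def pool_capacity_tb(pool: list[Drive]) -> float:
    """Raw capacity before any mirror or parity overhead."""
    return sum(d.capacity_tb for d in pool)


if __name__ == "__main__":
    cluster = {
        "node1": [Drive("node1", "A1", "SATA", 4.0), Drive("node1", "A2", "SSD", 1.0)],
        "node2": [Drive("node2", "B1", "SAS", 4.0),
                  Drive("node2", "B2", "SAS", 4.0, multipath=True)],
    }
    pool = build_pool(cluster)
    print(len(pool), pool_capacity_tb(pool))  # 3 9.0
    pool = add_drive(pool, Drive("node3", "C1", "SSD", 2.0))
    print(len(pool), pool_capacity_tb(pool))  # 4 11.0
```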

The redundancy concept of Storage Spaces is similar to RAID, but the underlying technology is different and software-based. During setup you can choose between a simple, mirror or parity layout. Simple offers no redundancy, whereas mirror keeps identical copies of the data on separate disks: with two copies (two-way mirror), one disk can fail; with three copies (three-way mirror), two can fail. Finally, parity works similarly to RAID 5: it needs at least three disks in a storage pool and can survive the failure of one disk without data loss.
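The parity idea borrowed from RAID 5 rests on XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block equals the XOR of the parity and the surviving blocks. This minimal Python sketch shows that reconstruction for the three-disk minimum. The function names and the `FAULT_TOLERANCE` table are illustrative, and the real Storage Spaces layout (rotating parity, slabs, columns) is not modelled.

```python
from functools import reduce

# Disk failures each layout can survive, as described in the text
FAULT_TOLERANCE = {"simple": 0, "two-way mirror": 1, "three-way mirror": 2, "parity": 1}


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_block(data_blocks: list[bytes]) -> bytes:
    """Single parity, RAID 5 style: XOR of all data blocks in one stripe."""
    return reduce(xor_blocks, data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """XOR the parity with every surviving data block to recover the lost one."""
    return reduce(xor_blocks, surviving, parity)


if __name__ == "__main__":
    # Minimal parity layout: three disks, two hold data and one holds parity
    disk1, disk2 = b"ABCD", b"EFGH"
    disk3 = parity_block([disk1, disk2])
    # disk2 fails: rebuild its contents from the two remaining disks
    print(rebuild_missing([disk1], disk3))  # b'EFGH'
    # A second failure would exceed what single parity can tolerate
    print(FAULT_TOLERANCE["parity"])  # 1
```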

With this new technology, Windows Server 2016 also makes it possible to implement both a converged storage solution (storage and compute in separate clusters) and a hyper-converged solution (one cluster for both compute and storage). The main difference from VMware vSAN is that, because Storage Spaces Direct is not tied to the hypervisor, Microsoft also lets users build a separate storage tier of their own.

Whether you deploy a converged or a hyper-converged solution with Windows Server 2016, Microsoft claims that its enhanced version of Storage Spaces makes the data stored in the pool more secure than ever before.

Conclusion

Microsoft claims that Windows Server 2016 is the basis for the future of server technology, and it has implemented many interesting and useful (but complex) features. That complexity can also become a threat if you end up losing data. Combining two advanced technologies such as ReFS and Storage Spaces Direct (in effect Microsoft's software-based, converged or hyper-converged storage solution, built on Hyper-V virtualisation and virtual machines) makes recovering lost data a more complex and challenging task.

Imagine a data recovery scenario in which several problems coincide: multiple disks fail at the same time (not so unlikely, since they were bought and installed at the same time), a power outage occurs and your Uninterruptible Power Supply (UPS) also fails, so the system is not shut down properly. This can crash the whole system and cause severe data loss. Problems like this are uncommon, but they do happen!
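A rough calculation shows why simultaneous failures matter. The sketch below uses the binomial formula to estimate the probability that more disks fail within the same time window than a layout can tolerate. It assumes every disk fails independently with a hypothetical probability `p_fail`, and it treats the data as lost as soon as that tolerance is exceeded, which is a simplification. Disks from the same batch break the independence assumption, so real-world risk can be higher than this estimate; power and UPS failures are not modelled at all.

```python
from math import comb


def p_exceeds_tolerance(disks: int, tolerated: int, p_fail: float) -> float:
    """Probability that more than `tolerated` of `disks` fail in the same window,
    ASSUMING every disk fails independently with probability p_fail."""
    return sum(
        comb(disks, k) * p_fail**k * (1 - p_fail) ** (disks - k)
        for k in range(tolerated + 1, disks + 1)
    )


if __name__ == "__main__":
    p = 0.01  # hypothetical chance that one disk fails during the window
    for tolerated in (0, 1, 2):
        risk = p_exceeds_tolerance(8, tolerated, p)
        print(f"8 disks, tolerates {tolerated} failure(s): {risk:.2e}")
```

Each extra tolerated failure lowers the estimate sharply, but only while failures really are independent; a shared cause such as an ageing batch of disks or a power event undermines exactly that assumption.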

Professional data recovery specialists such as Kroll Ontrack have already successfully recovered data from multilayered storage systems and servers such as VMware vSAN or HP EVA 6000, and have developed tools to rebuild both the system and its data structures. Even so, such a recovery is a more challenging, time-consuming and costly project.

So when you start using Windows Server 2016, try to keep your infrastructure design as simple as possible and avoid combining too many features, even when they are available (e.g. deduplication, virtualisation and data compression all in one system). Most importantly, document everything for future reference in case of an emergency. Whether it's data loss, failure of the whole system or both, you'll thank yourself for doing it!
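What "document everything" might look like in practice: a hypothetical, machine-readable record of each server and volume, serialised to JSON so it can be stored away from the server it describes. The field names are assumptions chosen for illustration, not a Microsoft schema. `stacked_features` simply lists which optional features are layered on a volume, as a reminder of the complexity the advice above warns about.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VolumeRecord:
    name: str
    file_system: str                  # e.g. "ReFS" or "NTFS"
    integrity_streams: bool
    resiliency: str                   # "simple", "mirror" or "parity"
    deduplication: bool = False
    compression: bool = False
    hosts_virtual_machines: bool = False


@dataclass
class ServerRecord:
    hostname: str
    edition: str                      # e.g. "Datacenter"
    storage_spaces_direct: bool
    deployment: str                   # "converged" or "hyper-converged"
    volumes: list[VolumeRecord] = field(default_factory=list)


def stacked_features(volume: VolumeRecord) -> list[str]:
    """List the optional features layered on one volume (fewer means a simpler recovery)."""
    flags = ("deduplication", "compression", "hosts_virtual_machines")
    return [flag for flag in flags if getattr(volume, flag)]


def to_json(server: ServerRecord) -> str:
    """Serialise the record so a copy can be kept outside the server it describes."""
    return json.dumps(asdict(server), indent=2)


if __name__ == "__main__":
    server = ServerRecord("fs01", "Datacenter", True, "hyper-converged", [
        VolumeRecord("VMs", "ReFS", True, "mirror", hosts_virtual_machines=True),
        VolumeRecord("Archive", "NTFS", False, "parity", deduplication=True, compression=True),
    ])
    print(to_json(server))
    for vol in server.volumes:
        print(vol.name, stacked_features(vol))
```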

Have you already upgraded to Windows Server 2016? What has been your experience so far? Let us know by commenting below, or you can tweet @DrDataRecovery