Virtualizing storage for scale, resiliency, and efficiency

In this post, we are going to dive into a feature in the Windows 8 Developer Preview. Storage Spaces are going to dramatically improve how you manage large volumes of storage at home (and work). We’ve all tried the gamut of storage solutions—from JBOD arrays, to RAID boxes, or NAS boxes. Many of us have been using Windows Home Server Drive Extender and have been hoping for an approach architected more closely as part of NTFS and integrated with Windows more directly. In building the Windows 8 storage improvements, we set out to do just that and developed Storage Spaces. Of course, the existing solutions you already use will continue to work fine in Windows 8, but we think you will appreciate this new feature and the flexible architecture. As we talk all about consumer electronics next week, thinking about all the media we all have in photos (especially huge digital negatives) and videos, this feature is sure to come in handy. In this post, Rajeev Nagar, a group program manager on our Storage and File System team, details this new feature.

In previous posts we’ve seen folks jump to try to identify edge cases or debug the designs. We’re trying an FAQ approach at the end of this post to see if we can focus the dialog a bit :-) The FAQ also talks about the numerous opportunities to use PowerShell as a management tool for Storage Spaces.

--Steven

By my own admission, I am a digital packrat. My data collection continues to expand and includes some of my most precious memories, including irreplaceable photos and home videos of my children since their birth. For quite some time now, I have sought a dependable, expandable, and easy-to-use solution that maximizes utilization of my ever-growing collection of USB drives. Further, I want guarantees that my data will always be protected despite the occasional hardware failure.

Windows 8 provides a new capability called Storage Spaces enabling just that. In a nutshell, Storage Spaces allow:

Organization of physical disks into storage pools, which can be easily expanded by simply adding disks. These disks can be connected through USB, SATA (Serial ATA), or SAS (Serial Attached SCSI). A storage pool can be composed of heterogeneous physical disks – different-sized physical disks accessible via different storage interconnects.

Usage of virtual disks (also known as spaces), which behave just like physical disks for all purposes. However, spaces also have powerful new capabilities associated with them such as thin provisioning (more about that later), as well as resiliency to failures of underlying physical media.

Before we start exploring Storage Spaces in more detail, I will digress briefly to give you a little more context: some of us have used (or are still using) the Windows Home Server Drive Extender technology, which was deprecated. Storage Spaces is not intended to be a feature-by-feature replacement for that specialized solution, but it does deliver on many of its core requirements. It is also a fundamental enhancement to the Windows storage platform, which starts with NTFS. Storage Spaces delivers on diverse requirements that can span deployments ranging from a single PC in the home, up to a very large-scale enterprise datacenter.

Pools and spaces

The figure below illustrates the concept of a storage pool. As you can see, we have taken a pair of 2TB (note we use byte measurements as you see in marketing) USB disks and “pooled them” (logically speaking) for subsequent usage.

From this storage pool, we are free to create one or multiple spaces. Note that once physical disks have been added to a pool, they are no longer directly usable by the rest of Windows – they have been virtualized, that is, dedicated to the pool in their entirety. And although we call this “virtualized,” the storage and reliability provided is very real. The available storage capacity can be utilized through creation of spaces from this pool. In the illustration below, we have carved out one such space from the “My Home Storage” pool.

This virtual disk is usable just like a regular physical disk – you can partition it, format it, and start copying data to it. You will notice, however, that the space has a couple of interesting properties:

Its logical capacity is listed as 10TB although the underlying physical disks in the pool have only 4TB of total raw capacity. As a result, you no longer need to worry up-front about the size.

Resiliency is built in by associating the mirrored attribute, which means that there are at least two copies of all data contained within the space on at least two different physical disks. Because the space is mirrored, it will continue to work even if one of the physical disks within the pool fails.

The magic that allows us to create a 10TB mirrored space on 4TB of total raw capacity is called thin provisioning. Thin provisioning ensures that actual capacity is reserved for the space only when you decide to use it, for example, when you copy some files to the volume on the space. Previously allocated physical capacity can be reclaimed safely whenever files are deleted, or whenever an application decides that such capacity is no longer needed. This reclaimed capacity is subsequently available for usage by either the same space, or by some other space that is carved out from the same pool. We achieve all of this through architected cooperation between the underlying file-system (NTFS) and Storage Spaces.
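To make the bookkeeping concrete, here is a minimal Python sketch of thin provisioning. It is a toy model only, not how Storage Spaces is implemented: the class names `ThinPool` and `ThinSpace`, their methods, and the use of a fixed 256MB allocation unit are all assumptions made for illustration.

```python
SLAB_MB = 256            # allocation unit assumed for this toy model
TB_MB = 1024 * 1024      # one terabyte expressed in megabytes


class ThinPool:
    """Toy pool that tracks raw capacity and what has actually been reserved."""

    def __init__(self, raw_capacity_mb):
        self.raw_capacity_mb = raw_capacity_mb
        self.allocated_mb = 0

    def free_mb(self):
        return self.raw_capacity_mb - self.allocated_mb

    def add_disk(self, capacity_mb):
        # Adding a disk simply grows the raw capacity available to all spaces.
        self.raw_capacity_mb += capacity_mb


class ThinSpace:
    """Toy space whose logical size may exceed the pool's raw capacity."""

    def __init__(self, pool, logical_size_mb, copies=1):
        self.pool = pool
        self.logical_size_mb = logical_size_mb
        self.copies = copies      # 2 models a two-way mirror
        self.used_slabs = 0       # creating the space reserves nothing

    def write_slabs(self, count):
        """Use `count` more slabs; physical capacity is reserved only now."""
        if (self.used_slabs + count) * SLAB_MB > self.logical_size_mb:
            raise ValueError("write exceeds the logical size of the space")
        needed_mb = count * SLAB_MB * self.copies
        if needed_mb > self.pool.free_mb():
            raise RuntimeError("pool is out of physical capacity; add disks")
        self.pool.allocated_mb += needed_mb
        self.used_slabs += count

    def trim_slabs(self, count):
        """Return capacity to the pool, for example after files are deleted."""
        count = min(count, self.used_slabs)
        self.pool.allocated_mb -= count * SLAB_MB * self.copies
        self.used_slabs -= count


pool = ThinPool(4 * TB_MB)                       # 4TB of raw capacity
space = ThinSpace(pool, 10 * TB_MB, copies=2)    # 10TB mirrored space
space.write_slabs(4)                             # reserves 4 slabs x 2 copies
```

In this model, creating the 10TB space costs nothing up front; capacity is drawn from the pool only on `write_slabs`, and `trim_slabs` hands it back so that any space in the same pool can reuse it.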

With thin provisioning, you can augment physical capacity within the pool on an as-needed basis. As you copy more files and approach the limit of available physical capacity within the pool, Storage Spaces will pop up a notification telling you that you need to add more capacity. You can do so very simply by purchasing additional disks and adding them to your existing pool.

As you see in the illustration above, we have expanded the raw capacity of the “My Home Storage” pool by purchasing and adding four 3TB disks – of course, you could just as well connect SATA and/or SAS storage in conjunction with USB-connected physical disks, and grow your pool capacity that way. Once we have added this physical capacity, we don’t need to do anything more to consume it. We can simply keep copying files or other data to the space within the pool and this space will automatically grow to utilize all available capacity within the containing pool, subject to its maximum logical size of 10TB. If needed, you can certainly also increase the maximum logical size of a space.

You do not need to explicitly inform Storage Spaces which of your USB disks should be used for each of the spaces you have created. Behind the scenes, Storage Spaces optimally manages the capacity of each of the physical disks within the storage pool, for all the spaces carved out from the pool.

Another core (also optional) capability associated with a space is resiliency to failure of the physical disks comprising the storage pool. For example, the space we’ve illustrated above is a mirrored space (in other words, it has the mirrored resiliency attribute associated with it). This mirrored setting ensures that we always store at least two (and optionally three) complete copies of data on different physical disks within the pool. This way, your data remains intact despite the partial or complete failure of a disk. As a matter of fact, the physical disks comprising the pool are typically not even visible to other components within Windows or to applications running on your PC. By extension, the fact that some physical disks within the pool have failed is completely shielded from other Windows components or applications. They continue to operate on the space, completely oblivious to the fact that Storage Spaces is working quietly in the background to maintain data availability. Additionally, upon disk failure, Storage Spaces automatically regenerates data copies for all affected spaces as long as sufficient alternate physical disks are available within the pool.

Resiliency through mirroring

It might be interesting to more closely examine how your data is mirrored on different disks. The illustration below shows how a (two-copy) mirrored space is constructed from a two-disk pool:

In this case, Storage Spaces has allocated physical capacity for the mirrored space in what we call “slabs”, which are multiples of 256MB. Also, for this particular example, half of each slab is mirrored on 2 separate disks. Even if one of the two disks fails, Storage Spaces can continue to deliver your data because at least one copy exists on a non-failed physical disk. When multiple disks are available, Storage Spaces spreads slabs across suitable disks as shown in the six-disk pool below:

When a pool disk fails, Storage Spaces identifies the impacted slabs for all spaces utilizing the failed disk, and reallocates them to any available hot-spare disk or to any other suitable disk within the pool (hot-spares are reserved disks within the pool, only to be used as automatic replacements for failed disks). This self-healing is done automatically and transparently so as to minimize the need for manual intervention. We’ve also optimized this regeneration for speed, which shortens the window during which a further hardware failure could cause data loss.
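The placement and self-healing described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual allocator: the function names `place_mirrored_slabs` and `heal`, the round-robin placement, and the least-loaded choice of replacement disk are all assumptions of mine.

```python
def place_mirrored_slabs(slab_count, disks, copies=2):
    """Assign each slab to `copies` distinct disks, rotating through the pool."""
    if len(disks) < copies:
        raise ValueError("need at least as many disks as copies")
    layout = {}
    for slab in range(slab_count):
        # Consecutive indices modulo the pool size are always distinct
        # as long as copies <= len(disks).
        layout[slab] = [disks[(slab * copies + i) % len(disks)]
                        for i in range(copies)]
    return layout


def heal(layout, failed_disk, healthy_disks):
    """Re-mirror every slab that had a copy on the failed disk."""
    healed = {}
    for slab, holders in layout.items():
        if failed_disk not in holders:
            healed[slab] = list(holders)
            continue
        survivors = [d for d in holders if d != failed_disk]
        # A replacement must be healthy and must not already hold this slab.
        candidates = [d for d in healthy_disks
                      if d != failed_disk and d not in survivors]
        if not candidates:
            raise RuntimeError(f"no suitable disk to re-mirror slab {slab}")
        # Prefer the disk holding the fewest slabs so far, to spread the load.
        target = min(candidates,
                     key=lambda d: sum(d in h for h in healed.values()))
        healed[slab] = survivors + [target]
    return healed


disks = ["A", "B", "C", "D", "E", "F"]
layout = place_mirrored_slabs(6, disks)
healed = heal(layout, "A", [d for d in disks if d != "A"])
```

After `heal`, no slab references disk `A`, and every slab again has two copies on two different disks. In a two-disk pool the same call raises an error, which mirrors the condition above that regeneration needs a suitable alternate disk in the pool.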

Resiliency through parity

There’s another resiliency attribute, called parity, which directs Storage Spaces to store some redundancy information alongside user data contained within the space, thereby enabling automatic data reconstruction in the event of physical disk failure. While conceptually similar to mirroring, parity-based resiliency utilizes capacity more efficiently than mirrored spaces do, but with higher random I/O overhead. Parity spaces are well suited for storing data such as large home videos, which have large capacity requirements, large sequential (predominantly append) write requests, and an infrequent-to-minimal need to update existing content.
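For readers who have not met parity before, the following Python sketch shows the general idea using plain XOR parity, the textbook single-parity scheme. This post does not describe the actual encoding Storage Spaces uses, so treat the functions `xor_blocks`, `make_stripe`, and `reconstruct` as illustrative assumptions rather than a description of the product.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, lost_index):
    """Rebuild the block at `lost_index` from all the other blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


data = [b"home", b"vide", b"os!!"]   # three data blocks, one per disk
stripe = make_stripe(data)           # a fourth disk holds the parity block
rebuilt = reconstruct(stripe, 1)     # pretend the second disk failed
```

Because XOR is its own inverse, any single missing block (data or parity) equals the XOR of the remaining ones. The sketch also hints at the trade-off mentioned above: updating one data block requires updating the parity block as well, which is where the extra random I/O cost comes from.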

Akin to mirrored spaces, slabs for parity spaces are strewn across available disks (with capacity utilized for parity information) as shown below for a parity space contained within a six-disk pool:

When a disk fails, the parity space recovers equally transparently and automatically as does the mirrored space. For parity spaces, Storage Spaces utilizes the parity information to reconstruct affected slabs for all affected spaces, and then automatically reallocates the slab to utilize any available hot-spare disk or any other suitable disk within the pool (just as it does for mirrored spaces).

The illustration below shows two spaces – one with mirrored resiliency and the other with parity resiliency – carved out from the same pool:

Obviously, both spaces above are thinly provisioned and share the same backing pool (physical disks). Slabs for both spaces are intermingled, and optimally spread over all available physical disks, although each space uses different mechanisms to recover from physical disk failure.

You can access spaces contained within a pool as long as a simple majority of the physical disks comprising the pool are healthy and connected to your PC, a concept called quorum. For example, you will need four of the six disks comprising the My Home Storage pool to be healthy and physically connected to the PC in order to access either the Documents or the Multimedia space. Of course, as previously stated, the resiliency attribute associated with the space determines the degree of data availability in the presence of physical disk failure – for example, if the Documents space is three-way mirrored and allowed to use all disks within the pool, you can continue accessing data despite the loss of any two disks.
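The simple-majority rule can be written down as a one-line check. The helper `has_quorum` below is a hypothetical name, and the sketch models only the plain majority rule stated above; the FAQ at the end of this post notes that two-disk pools are handled so that a two-way mirror survives the loss of one disk, which this sketch does not attempt to model.

```python
def has_quorum(total_disks, healthy_connected):
    """True when a strict majority of the pool's disks are healthy and connected."""
    if not 0 <= healthy_connected <= total_disks:
        raise ValueError("healthy_connected must be between 0 and total_disks")
    return healthy_connected > total_disks / 2


six_disk_ok = has_quorum(6, 4)       # four of six is a majority
six_disk_short = has_quorum(6, 3)    # exactly half is not
```

For the six-disk pool in the example, four healthy disks satisfy the check and three do not, which matches the "four of the six" figure above.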

I’ll explain the virtualization capabilities of Storage Spaces by walking you through a common usage scenario. Imagine that you have just purchased a Windows 8 PC and wish to use this machine as a central repository for much of the digital content in your home or small business. A reasonable setup would involve creation of two resilient spaces – one is a mirrored space for your important documents and the like (these are typically modified more often), and the other is a parity space, for your large multi-media content like home videos and family pictures, which you typically update less often, but view more often. By using the appropriate resiliency scheme, you can optimize for both capacity utilization as well as for best performance.
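To get a feel for the capacity side of that trade-off, here is a rough back-of-the-envelope sketch in Python. It assumes equally sized disks and, for parity, one parity block per stripe spread across a given number of disks; neither the stripe width nor the function names `mirror_usable` and `single_parity_usable` come from this post, so read the numbers as illustrative only.

```python
def mirror_usable(raw_tb, copies=2):
    """Usable capacity when every piece of data is stored `copies` times."""
    if copies < 2:
        raise ValueError("a mirror keeps at least two copies")
    return raw_tb / copies


def single_parity_usable(raw_tb, columns):
    """Usable capacity when one block in every `columns`-wide stripe is parity."""
    if columns < 3:
        raise ValueError("single parity needs at least three disks")
    return raw_tb * (columns - 1) / columns


raw = 12                                    # hypothetical pool: six 2TB disks
two_way = mirror_usable(raw)                # 6.0
three_way = mirror_usable(raw, copies=3)    # 4.0
parity = single_parity_usable(raw, 6)       # 10.0
```

Under these assumptions the parity layout leaves noticeably more room for large media files, while the mirrored layout spends more capacity in exchange for cheaper updates to frequently modified documents.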

Logically, your storage configuration would look exactly like the illustration provided above, wherein two spaces with different resiliency attributes are carved out from a single pool. Achieving this is quite simple:

Connect your physical disks to your PC via USB

Create your pool and the two spaces

You can invoke PowerShell to create the pool and spaces, as well as to complete more advanced tasks. In our example, we have purchased and connected six physical disks to our PC. Below are the simple PowerShell commands to set up our pool and two spaces:

Note that the above commands will only work on the forthcoming Windows 8 Beta and subsequent releases. A preliminary version of Storage Spaces is available in the Windows 8 Developer Preview, but the above PowerShell commands will not work in that build. If you are curious to try out Storage Spaces on the Developer Preview build, you can use the below alternative commands:

Also note that, in the Developer Preview, space sizes were limited to less than 2TB. That limitation will be removed in the Beta release. Since the availability of the Developer Preview release, we have also activated many additional features within Storage Spaces.

We now get to take a sneak peek at an alternative easy-to-use tool to configure pools and spaces. Beginning with the forthcoming Windows 8 Beta, you can simply go to Control Panel and walk through the sequence below:

(a) To create our pool and a mirrored space, go to Control Panel, click System and Security, and then Storage Spaces.

Click Create a new pool and storage space.

Select the drives you want to add to the new pool.

Select your resiliency mechanism and other options.

Note that you can assign a drive letter and format the resultant volume as part of creating the space.

(b) To add a couple of disks to an existing pool, select the drives you want to add.

(c) To create an additional parity space, click Create a storage space, and then select Parity from the layout options.

(d) In the event you start running out of capacity, expect a notification like this:

Click the notification to see information about the problem and how to fix it.

That’s all you need to do to start using Storage Spaces. Once the spaces have been created, you can utilize them just like any other “disk.” For example, you can turn on BitLocker for the spaces you have created, as shown below.

There is a lot more to say about the many capabilities of Storage Spaces and how other Windows technologies can also leverage these capabilities – we will continue with this discussion in subsequent write-ups.

I hope you find this new capability intriguing and encourage you to play with it. It will all be available to you as part of the Windows 8 Beta release in addition to the features available in the Developer Preview.

- Rajeev

Storage Spaces FAQ

We know that some of you will still have questions about Storage Spaces, so here is an FAQ that we hope will cover most of them. As we get more questions from you in the Comments, we will try to update this FAQ to be more complete.

Q) I use Windows Home Server with Drive Extender. Is there a tool to help me migrate data from the Drive Extender format to Storage Spaces?

No. You will need to create a pool on a Windows 8 PC with a fresh set of disks. Then, you can simply copy data over from your Drive Extender-based volumes to a space within your pool. The functionality delivered through Storage Spaces is more flexible and better integrated with NTFS, so it will generally be more reliable and useful.

Q) Are Storage Spaces some kind of RAID? If it is, what RAID versions do you implement?

Fundamentally, Storage Spaces virtualizes storage in order to be able to deliver a multitude of capabilities in a cost-effective and easy-to-use manner. Storage Spaces delivers resiliency to physical disk (and other similar) failures by maintaining multiple copies of data. To maximize performance, Storage Spaces always stripes data across multiple physical disks. While the RAID concepts of mirroring and striping are used within Storage Spaces, the implementation is optimized for minimized user complexity, maximized flexibility in physical disk utilization and allocation, and fast recovery from physical disk failures. Given these significant differences in objectives and implementation between Storage Spaces and traditional inflexible RAID implementations, the RAID nomenclature is not used by Storage Spaces.

Q) How does the read performance of a space compare to RAID 0 or RAID 10?

For both mirrored and striped spaces, read performance is very competitive with optimized RAID 0 or RAID 10 implementations.

Q) Can I use a RAID enclosure with Storage Spaces for additional reliability and/or performance? Is that a good idea?

We don’t recommend it. Storage Spaces were designed to work with off-the-shelf commodity disks. This feature delivers easy-to-use resiliency to disk failures, and optimizes concurrent usage of all available disks within the pool. Using a RAID enclosure with Storage Spaces adds complexity and a performance penalty that does not provide any improvement in reliability.

Q) Can I boot from a space?

In Windows 8, you cannot boot from a space. As an alternative, you can continue to use dynamic volumes for booting. At release, we will offer guidance on how you can add appropriately partitioned system/boot disks (with dynamic volumes) to a pool.

Q) What is the minimum number of disks I can use to create a pool? What is the maximum?

You can create a pool with only one disk. However, such a pool cannot contain any resilient spaces (i.e. mirrored or parity spaces). It can only contain a simple space which does not provide resiliency to failures. We do test pools comprising multiple hundreds of disks – such as you might see in a datacenter. There is no architectural limit to the number of disks comprising a pool.

Q) How can I know which physical disk a space is on?

Through PowerShell, you can query the set of physical disks backing a particular space. Since all data is striped across all physical disks backing the space, that set is the answer: the space resides on every disk in it.

Q) How will I know when a physical disk fails? How do I replace a failed disk?

If the physical disk is contained within an enclosure that supports the SCSI Enclosure Services protocol, we will activate a red LED (if present) next to the failed physical disk. A standard notification will pop up in the desktop. You can also see information about the failure in the Storage Spaces applet in Control Panel. Here is what that looks like:

Through PowerShell, you can also query disk health to determine if a disk has failed.

Once you’ve detected the failed disk, you can physically disconnect it at any time. Replacing a failed physical disk is easy – after removing the failed disk you simply connect the replacement disk to the PC, and then add the disk to the pool either via PowerShell or via Control Panel.

Q) How do I replace a working drive with a bigger one (or just cycle drives)? Does it require a “rebuild”?

As long as you have created mirrored or parity spaces, you can always simply remove a physical disk within the pool, and add a different (perhaps larger) one. Within a short period of time, the impacted spaces will automatically be resynchronized (the Storage Spaces design optimizes this operation to be faster than traditional RAID rebuilds). You can determine whether all spaces are healthy – i.e. data has been resynchronized so as to maintain the designated number of copies – either via Control Panel or via PowerShell commands.

Q) Can I trigger resynchronization myself?

Yes. If you don’t want to wait for automatic resynchronization to start, you can choose the Repair command via PowerShell, which will initiate resynchronization so long as suitable replacement disks and/or spare capacity is available.

Q) What kind of disks will Storage Spaces work with? Are there any special requirements? What about custom enclosures housing these disks?

You can use Storage Spaces with any physical disk that otherwise works with Windows, connected via USB, SATA, or SAS. If the physical disks are connected via some custom enclosure (e.g. in JBOD configurations), Storage Spaces will utilize the SES protocol (if supported by the enclosure), to identify physical slots where the disks are located. When needed, Storage Spaces will also use SES to light up failure LEDs associated with physical disks (assuming that the enclosure has such LEDs). For Storage Spaces to use enclosure capabilities, the enclosure must conform to the Windows logo requirements. Enclosure vendors have been made aware of these requirements and we expect increasing conformance over time.

If your disks are housed within an enclosure, and if Storage Spaces either does not provide you with slot information associated with the physical disks or does not light up LEDs on the enclosure, you can assume that the enclosure does not conform to Windows logo requirements.

Q) Is there a defrag or CHKDSK equivalent for pools?

No. Storage Spaces optimally utilizes all physical disks. In the event that Storage Spaces metadata on a physical disk becomes corrupt (which will be obvious since the disk health will indicate a problem with the physical disk), you can treat the disk just as you would any other failed disk – simply remove it from the pool. If the physical disk is healthy, you can subsequently re-add it to the pool.

Q) How do I know how many mirrors a given file has?

If your file resides within an NTFS volume on a two-way mirrored space, two copies of all your file data will be maintained. If you configure a three-way mirrored space, there will be three copies.

Q) Can I pick which drive to use for mirrors? For example, if I know a particular disk is faster/better/newer?

Yes. In typical deployments, Storage Spaces will automatically select physical disks from the pool to back your spaces. However, if you so desire, you can manually specify a specific set of physical disks within your pool to back a particular space and thereby control allocation. You can do this via PowerShell options at the time you create the space.

Q) Can I change the maximum size of a space? Are there advantages or disadvantages to just making every space 50TB?

You can increase the logical size of a space at any time via Control Panel or PowerShell. Decreasing the logical size is not supported (or needed), given thin provisioning. It makes no difference whether you specify the initial logical size to be a smaller number (say 1TB) and grow it as needed, or set it to a very large number (say 50TB) right from the beginning. The latter may save you time and effort later.

Q) Can I change the slab size to something other than a multiple of 256MB?

No. The slab size is automatically determined by Storage Spaces based on a multitude of factors to deliver an optimal experience in terms of performance and availability.

Q) Does the pre-defined slab size result in sub-optimal utilization of capacity? For example, what if most of my files are very small? What if they’re all large video files?

The slab size is an internal unit of capacity that we use for provisioning across multiple spaces within the same pool. Its value has no bearing on optimal storage of files, regardless of file size.

Q) Can I move a storage pool from one PC to another, once created? For example, if I have a cage with 6 removable drives?

Yes. Just connect the physical disks comprising the pool to the new PC.

Q) Say I have 3 external enclosures and I remove them one at a time. I then plug them into another Windows 8 PC in reverse order. Will the new PC think I have a broken pool or will it eventually catch up? What if I never plug in one of the enclosures?

You can plug enclosures back in in any order. When Storage Spaces detects a sufficient number of disks for quorum, it activates the pool and contained spaces. You can plug in more enclosures later. If the data on any disks becomes out of sync, Storage Spaces will automatically sync them. Even if you never plug in some enclosures, as long as Storage Spaces detects the minimum number of disks needed, you can continue working with your data. Both via PowerShell and via Control Panel, Storage Spaces informs you that a few physical disks are missing, thereby encouraging you to plug them back in.

Q) You mentioned that quorum for the pool requires a simple majority of healthy and connected physical disks. Does that mean I always need to have an even number of physical disks in the pool? Or do I need an odd number of physical disks? What about two-disk pools?

There is no requirement for an even or odd number of physical disks. Storage Spaces correctly handles two-disk pools and continues delivering resiliency to failures for a two-way mirrored space contained within such a pool, even if one physical disk fails or is disconnected.

Q) What happens when I plug physical disks comprising a pool into a Windows 7 machine?

Windows 7 does not support Storage Spaces and will treat the physical disks just as it would any disk with an unfamiliar partitioning scheme.