VTL primer and predictions

Posted on November 13, 2009

Upcoming features include support for alternate media, clustered VTL engines, content search capabilities, and support for protocols such as NFS and CIFS.

By Thomas Rivera and Jason Iehl

Virtual tape libraries (VTLs) can play a key role in improving service levels by combining the advantages of backing up to disk with established data protection products and practices. This article explains the current state of VTL technology and explores where it is headed.

A VTL presents a disk storage system as one or more tape libraries, each with a number of tape drives and tape cartridges, for use with existing backup software. The storage systems are typically configured with SATA drives, but some vendors integrate Fibre Channel or SAS drives. Compared with straight disk-based backup, adding a VTL to a backup environment in place of (or in front of) a physical tape library has a number of basic advantages, including:

Faster backup and restore performance (compared to physical tape). The gains do not come from the disk drives alone; a number of other factors come into play. One is the time it takes to move cartridges from tape slots to tape drives, as well as the time it takes for a cartridge to become "ready" once it is inserted into the drive. In a VTL, all of these "physical" processes are virtualized and happen almost instantaneously.

Integration into an existing backup environment, without changing current backup processes. This has been the main driver for the popularity of VTLs. Most environments use physical tape today. Because a VTL "looks" and acts the same as a physical tape library, it is easy to add to an existing backup environment. Once the VTL is attached to the network and appropriately zoned, the rest of the process is unchanged: the same tape drivers, bar code schemes, etc., can be used, and the respective backup policies can be pointed at the new library.

Easy and cost-effective storage scaling to meet increasing data storage requirements, with less tape media handling. Scalability is an issue in most backup environments, where physical tape libraries need room for more tape slots and tape drives. This usually means expanding existing tape libraries or replacing them entirely. In the straight disk-based backup world, adding storage tends to add complexity because file system configuration and additional space utilization issues must be handled manually (unless the storage is virtualized). With a VTL, running out of space simply means adding more disks, configuring additional cartridges and/or tape drives, updating the inventory in the backup software, and resuming backups.

Increased backup and restore reliability, as the disk drives are RAID protected. Physical tape cartridges and drives are prone to failure (detached tape leaders, mechanical failures, etc.), and in larger backup environments it is not uncommon for these failures to affect the ability to meet the backup window. For example, any time a physical tape drive goes "down," less data can be backed up within a given period of time, which lengthens the backup window. With a VTL, since the tape cartridges and tape drives are virtual, the failure issues normally associated with physical tape cartridges and drives do not arise.
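The effect of a failed drive on the backup window can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope illustration only: the nightly data volume, drive count, and per-drive throughput are hypothetical figures, and the model assumes every drive streams at its full rate with the load spread evenly.

```python
def backup_window_hours(data_tb: float, drives: int, mb_per_sec_per_drive: float) -> float:
    """Estimate the hours needed to back up data_tb terabytes across parallel drives.

    Idealized model: all drives stream at full speed and share the load evenly.
    """
    if drives <= 0:
        raise ValueError("at least one working drive is required")
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (drives * mb_per_sec_per_drive)
    return seconds / 3600


# Hypothetical environment: 20 TB per night, 8 drives at 100 MB/s each.
all_drives = backup_window_hours(20, drives=8, mb_per_sec_per_drive=100)
one_down = backup_window_hours(20, drives=7, mb_per_sec_per_drive=100)
print(f"8 drives: {all_drives:.1f} h, 7 drives: {one_down:.1f} h")
```

Under these assumptions, losing one of eight drives stretches the window by roughly 14 percent, which may be enough to push backups past their allotted time.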

There are other advanced features currently available on some VTLs, such as compression, data deduplication, physical tape integration, replication, encryption and data shredding.

Compression

The compression feature usually mimics the compression that is achieved on physical tape, since most VTL products use the same "lossless" compression algorithm (Lempel-Ziv) that is commonly provided with physical tape drives.

Compression can be software-based or hardware-based. Hardware-based compression offloads the compression functions to a dedicated chip, which enables improved performance by reducing the load on the VTL server (also known as a "head" or "engine").
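As a rough illustration of lossless Lempel-Ziv-style compression, the sketch below uses Python's standard zlib module, whose DEFLATE format combines LZ77 with Huffman coding. It is not the specific algorithm used by any tape drive or VTL product, and the sample data is hypothetical. It shows that the data is restored byte for byte and that the ratio achieved depends heavily on how repetitive the data is.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by compressed size, after checking the round trip."""
    compressed = zlib.compress(data, 6)
    # Lossless means decompression restores the input byte for byte.
    if zlib.decompress(compressed) != data:
        raise RuntimeError("round trip did not restore the original data")
    return len(data) / len(compressed)


# Hypothetical sample data: repetitive records compress well, random bytes do not.
repetitive = b"customer_record,status=active,region=west\n" * 5000
random_like = os.urandom(len(repetitive))

print(f"repetitive data: {compression_ratio(repetitive):.1f}:1")
print(f"random data:     {compression_ratio(random_like):.2f}:1")
```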

Deduplication

Data deduplication plays a vital role in helping manage today's prolific data growth. Deduplication is the replacement of multiple copies of data – at variable levels of granularity – with references to a shared copy in order to save storage space and/or bandwidth.

The actual deduplication ratios achieved will vary, depending upon multiple factors, including the deduplication algorithms used and the redundancy of the data from backup to backup. In most cases, data deduplication is combined with compression. Regardless of the implementation specifics, deduplication can dramatically reduce the physical disk space required to store the data, which in turn also provides other benefits such as lower power and cooling requirements.
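The idea of replacing repeated data with references to a shared copy can be sketched in a few lines. The example below is a toy model under simplifying assumptions, not how any particular VTL implements deduplication: it uses fixed-size chunks and SHA-256 digests as references, whereas real products differ in chunking method and granularity. The class name, chunk size, and sample data are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products vary


class DedupStore:
    """Toy chunk store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}
        self.logical_bytes = 0  # bytes the backup application has written

    def write(self, data: bytes) -> list[bytes]:
        """Store data and return its list of chunk references (its 'recipe')."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list[bytes]) -> bytes:
        """Rebuild the original data from its chunk references."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def ratio(self) -> float:
        """Logical bytes written divided by physical bytes actually stored."""
        physical = sum(len(chunk) for chunk in self.chunks.values())
        return self.logical_bytes / physical


# Two hypothetical backups of four chunks each; only the last chunk differs.
store = DedupStore()
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
tuesday = monday[:3 * CHUNK_SIZE] + bytes([9]) * CHUNK_SIZE
r1 = store.write(monday)
r2 = store.write(tuesday)
print(f"unique chunks: {len(store.chunks)}, ratio: {store.ratio():.1f}:1")
```

The second backup adds only one new chunk to the store, which is why the ratio improves as more backups with little changed data are written.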

Physical tape integration

Some VTLs offer the ability to virtualize the physical library to the backup software, which provides a number of benefits. One benefit is to offload the backup environment when tapes need to be created for offsite storage or longer-term retention, with the VTL handling the process of sending the data to physical tape without the data having to traverse the backup server(s). Another benefit is that the combined process of writing backup data to the VTL and writing it to physical tape will, in many cases, take less time to complete. Depending on the backup application being used, there may be additional benefits with this level of integration, such as having different expiration dates for the virtual versus the physical tape cartridges.

Remote replication

Some VTLs offer the ability to install two VTL systems in two different locations, and to replicate the backup data via IP over a WAN. This provides the ability to move or copy backup cartridges to a remote site, which can help with disaster recovery (DR) planning.

From a DR perspective, the remote VTL can be pointed to by the backup application, and then the backup data can be restored from it. In addition, some VTLs allow data to be deduplicated prior to the data being replicated to the remote VTL. This reduces the time and/or bandwidth required to replicate the data to the remote site.
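Why deduplicating before replication saves bandwidth can be shown with a small sketch. It assumes fixed-size chunks identified by SHA-256 digests and models the remote VTL as an in-memory dictionary; the function names, chunk size, and data are hypothetical, and a real implementation would also transmit the ordered list of chunk references needed to rebuild each cartridge.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size


def chunk_digests(data: bytes) -> dict[bytes, bytes]:
    """Map SHA-256 digest to chunk for fixed-size chunks of data."""
    return {
        hashlib.sha256(data[i:i + CHUNK_SIZE]).digest(): data[i:i + CHUNK_SIZE]
        for i in range(0, len(data), CHUNK_SIZE)
    }


def replicate(cartridge: bytes, remote_chunks: dict[bytes, bytes]) -> int:
    """Copy to the remote store only the chunks it lacks; return the bytes sent."""
    sent = 0
    for digest, chunk in chunk_digests(cartridge).items():
        if digest not in remote_chunks:
            remote_chunks[digest] = chunk  # stands in for a transfer over the WAN
            sent += len(chunk)
    return sent


remote: dict[bytes, bytes] = {}
full = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))   # first backup
changed = full[:9 * CHUNK_SIZE] + bytes([99]) * CHUNK_SIZE     # one chunk differs
first = replicate(full, remote)
second = replicate(changed, remote)
print(f"first pass sent {first} bytes, second pass sent {second} bytes")
```

In this example the second replication pass sends one chunk instead of ten, because the remote side already holds the other nine.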

Encryption

Since data security is becoming a more important issue for organizations, encryption is becoming more prevalent in VTLs. Encryption can be integrated directly into the VTL, or can be provided as a "black box" interface before or after the backup data arrives at the VTL. Key management functions are part of the "black box" solution, integrated into the VTL, or provided by an external host system. Once encrypted data is on the VTL, it can also be sent to physical tape. Note: Some physical tape drives support encryption on the drive itself.

Data shredding

Data shredding is a process for deleting data that is intended to make the data unrecoverable. While data shredding is apparently simple, many organizations want to ensure that they follow proper standards. Until recently, the National Industrial Security Program (NISP) Operating Manual (DoD 5220.22-M) gave U.S. government guidelines for "media sanitization," which is the public-sector term for "data destruction." A newer publication from the National Institute of Standards and Technology, "Guidelines for Media Sanitization" (NIST Special Publication 800-88), now lists the recommendations that government agencies should follow. While private organizations are not required to follow these guidelines, the recommendations are logical and straightforward. More VTL products are integrating data shredding (as per the NIST spec), as the U.S. government and other regulatory agencies are requiring it.

Where VTLs are heading

This section will cover some features that are being considered for VTL products in the future, including alternate media support (e.g., WORM, flash), clustered VTL engines, content search capability (outside of backup applications), and alternate protocol support (e.g., NFS and CIFS).

Alternate media support

WORM-compliant media and, in the future, flash storage are two viable alternate media types for VTL solutions. Flash storage is a compelling technology because it has no moving parts, and as prices come down it could become a much more reliable alternative to today's disk drive technology.

Using a VTL to front-end these different technologies makes them much easier to integrate into a backup environment, for the reasons outlined earlier. Different media types can also be separated by bar code label into distinct tape pools. For example, one pool of cartridges might use SATA drives with a particular performance profile, while another pool uses flash or WORM media with completely different characteristics. Existing policies can be assigned to use one or the other as needed.
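One way to picture separate tape pools keyed by bar code label is the sketch below. Everything in it is a hypothetical illustration: the pool names, bar code prefixes, and policy names are invented, and actual backup applications and VTLs manage pools through their own configuration tools rather than code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapePool:
    name: str
    barcode_prefix: str  # hypothetical labeling convention
    media: str


# Hypothetical pools; prefixes and media types are illustrative only.
POOLS = [
    TapePool("daily", "SAT", "SATA disk"),
    TapePool("archive", "WRM", "WORM"),
]

# Hypothetical backup policies, each pointed at a pool by name.
POLICY_TO_POOL = {"nightly-fileserver": "daily", "compliance-mail": "archive"}


def pool_for_barcode(barcode: str) -> TapePool:
    """Find the pool a virtual cartridge belongs to from its bar code prefix."""
    for pool in POOLS:
        if barcode.startswith(pool.barcode_prefix):
            return pool
    raise LookupError(f"no pool matches barcode {barcode!r}")


def cartridges_for_policy(policy: str, barcodes: list[str]) -> list[str]:
    """Return the cartridges a policy may write to, based on its assigned pool."""
    pool_name = POLICY_TO_POOL[policy]
    return [b for b in barcodes if pool_for_barcode(b).name == pool_name]


inventory = ["SAT001", "SAT002", "WRM001"]
print(cartridges_for_policy("nightly-fileserver", inventory))
print(cartridges_for_policy("compliance-mail", inventory))
```

Keeping the policy-to-pool mapping separate from the cartridge inventory mirrors the point above: policies stay the same while the media behind each pool can differ.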

Clustered VTL engines

Another feature that VTL vendors are moving toward is a "clustered" architecture, which allows multiple VTL servers/controllers ("heads" or "engines") to be clustered together for both high availability and scalable performance. A few VTL products on the market today already use this technology.

Content search

Once the backup data is on the VTL, one capability being considered is searching that data for possible extraction (restore) to alternate media such as DVD, CD, or tape. This would be helpful, for example, in responding to legal discovery requests for specific data during litigation. The capability may be tied into existing backup applications (perhaps via an API) or provided as a separate VTL function.

Alternate protocol support

Some VTL vendors already support alternate protocols, such as NFS and CIFS. This allows data to be backed up by writing it to one or more mount points, instead of sending it through a backup application and writing to the VTL via tape emulation. If the implementation supports both tape emulation and alternate protocols such as NFS and CIFS, deduplication can occur across all the data. The overhead of presenting data to the VTL through mount points can be quite high, but many users are interested in this approach.

Conclusion

With many thousands of VTLs deployed globally, it is clear that VTLs are a popular solution to some of today's backup issues. The VTL advantages of faster backups and restores, as well as easy integration and setup with existing environments, are proving worthwhile for many businesses. In the future, VTLs will go beyond tape emulation to provide additional value over physical tape infrastructures. However, for most companies, finding the correct balance between VTL and physical tape to meet their respective data protection needs will yield the best overall benefits over time.

Thomas Rivera and Jason Iehl are members of the Storage Networking Industry Association (SNIA) Data Management Forum (DMF) Data Protection Initiative. Rivera is also a data protection solutions architect at Sepaton, and Iehl is a senior technology associate at NetApp.

The SNIA Data Management Forum (www.snia.org/forums/dmf) is a cooperative initiative of IT professionals, vendors, integrators and service providers working together to conduct market education, develop best practices, and promote standardization activities. Areas of focus include the technologies and services that support information lifecycle management, data protection, and information retention and preservation.