Virtual Tape: Intelligent Storage Management

Posted on April 01, 1998

Faster data access rates, dramatic cost savings, fewer tape cartridges and libraries, and more available floor space are just a few of the potential benefits of implementing virtual tape systems in large data centers. (For an introduction to this emerging technology, see "The Virtues of Virtual Tape," InfoStor, December 1997, p. 38.)

These benefits are fully realized when 128-track tape technology is used rather than the older 36-track 3490 cartridges. The newer technology provides 30GB of compressed capacity per cartridge and a 9MBps transfer rate. In addition, a study of the average volume size of commercial data stored on 36-track cartridges found that 128-track virtual tape technology reduces the total cartridge count by a factor of as much as 50. A 128-track virtual tape system can therefore store the same data as a 100,000-cartridge 3490 system on as few as 2,000 cartridges. Fewer cartridges mean less floor space is needed to store the same amount of data.
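The cartridge arithmetic can be checked with a short calculation. This is a minimal sketch that assumes every stacked cartridge is filled to its full 30GB compressed capacity. The 600MB average volume size is not taken from the study cited above. It is simply the figure a 50:1 ratio implies (30GB / 50). The function names are hypothetical.

```python
import math

# 30GB compressed capacity of a 128-track cartridge (figure from the text).
CARTRIDGE_CAPACITY_MB = 30_000

def stacked_cartridges(num_volumes: int, avg_volume_mb: float,
                       capacity_mb: float = CARTRIDGE_CAPACITY_MB) -> int:
    """Cartridges needed when virtual volumes are stacked onto physical tape.

    Hypothetical helper. Assumes ideal packing (every cartridge filled to
    capacity), so the result is a lower bound.
    """
    total_mb = num_volumes * avg_volume_mb
    return math.ceil(total_mb / capacity_mb)

def reduction_ratio(old_count: int, new_count: int) -> float:
    """How many old cartridges each new cartridge replaces (hypothetical helper)."""
    return old_count / new_count

# Assumed: 100,000 3490 volumes averaging 600MB each (the size a 50:1
# ratio implies, not a measured figure).
new_count = stacked_cartridges(100_000, 600)
print(new_count)                            # 2000
print(reduction_ratio(100_000, new_count))  # 50.0
```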

Fewer cartridges can also yield dramatic savings in automated libraries. Automating a 100,000-cartridge archive would take tens of libraries, yet about 80% of the cartridges in today's tape libraries are rarely or never used. The cost of those libraries makes automating 100% of archived data extremely impractical, so most users automate only the frequently used cartridges (about 20% of the total). When an off-line tape volume is needed, someone must find and mount it by hand, which can be extremely time-consuming. With 128-track technology, however, the entire archive can shrink to as few as 2,000 cartridges, well within the capacity of a single library.
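The library counts follow from simple division. The sketch below assumes a hypothetical library that holds 5,000 cartridges. That slot count is an illustrative assumption rather than a figure from the article, and real library capacities vary by model.

```python
import math

SLOTS_PER_LIBRARY = 5_000  # hypothetical library capacity (assumption)

def libraries_needed(cartridges: int, slots: int = SLOTS_PER_LIBRARY) -> int:
    """Automated libraries required to hold every cartridge (hypothetical helper)."""
    return math.ceil(cartridges / slots)

full_archive = 100_000
active_share = 0.20  # the ~20% of cartridges that are used frequently

print(libraries_needed(full_archive))                        # automate everything
print(libraries_needed(round(full_archive * active_share)))  # automate only the active 20%
print(libraries_needed(2_000))                               # stacked 128-track archive
```

Under this assumption, full automation takes 20 libraries, the active 20% needs 4, and the stacked archive fits in 1.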

Not only can virtual tape lower labor and equipment costs, it can also reduce the total cost of owning tape technology. "Virtual tape puts tape back on a cost-per-megabyte improvement curve similar to what we've experienced with disk in the last five years. Disk prices have improved close to 40% a year since 1994. We see virtual tape delivering similar price improvements," predicts McArthur.
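The quoted figure compounds over time. One reasonable interpretation is that a roughly 40% yearly improvement means the cost per megabyte falls by 40% each year, so each year's price is 60% of the previous year's. The sketch below assumes that reading and a constant annual rate. The function name is hypothetical.

```python
def projected_cost_per_mb(start_cost: float, annual_decline: float, years: int) -> float:
    """Cost per MB after `years` of a constant fractional annual decline.

    Hypothetical helper. annual_decline=0.40 assumes "40% improvement"
    means each year's price is 60% of the previous year's.
    """
    return start_cost * (1.0 - annual_decline) ** years

# Relative cost per MB, with the 1994 price as the 1.0 baseline.
for offset in range(5):
    print(1994 + offset, round(projected_cost_per_mb(1.0, 0.40, offset), 4))
```

Under this reading, four years at 40% leave the price at about 13% of its 1994 level.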

At first glance, virtual tape looks like a simple data-archiving solution: Connect a DASD-based cache storage unit to a tape subsystem, have the two share backup and data-retrieval duties, and reap the combined benefits of disk performance and tape's capacity and low cost. If it is that simple, why has virtual tape been so long in coming?

In reality, the virtual tape concept presents several significant engineering challenges. The promise of virtual tape goes beyond its functionality as a cost-effective data warehouse. Its true value lies in its ability to intelligently manage data from the host to disk to tape and back again--and to minimize human involvement in that data management. In some environments, this may require an intelligent outboard solution that can handle errors, failures, and other problems without human intervention. Such independence can also speed access times and simplify the task of supporting multiple heterogeneous platforms.

A host-based approach, on the other hand, also presents a virtual tape image to the host, but the host itself does the work of migrating those images and stacking them onto physical tapes. When data is created, it is first written to an outboard disk-storage buffer. When the buffer fills, the data is transferred back to the host and then written to tape. The host therefore handles the same data three times: once when it is written to the buffer, once when it is read back, and once when it is written to tape.

Using host cycles to move data across these interfaces wastes host resources and can degrade overall system performance. An outboard architecture frees the host processor for other work while the virtual tape system makes the routine decisions about where, and when, data should reside in the storage hierarchy. The outboard architecture also lowers the host processing (MIPS) cost, since the host writes the data only once and the data moves immediately to the virtual tape system for further processing.
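The difference in host workload can be expressed as a count of how many times the host moves each byte. The model below is a deliberately simplified sketch built only from the two data paths described above. It assumes every byte eventually reaches tape and ignores channel overhead and compression. The names and the 500GB workload are hypothetical.

```python
from enum import Enum

class Architecture(Enum):
    HOST_BASED = "host-based"
    OUTBOARD = "outboard"

# Host-handled passes per byte, taken from the data paths described above.
HOST_PASSES = {
    # write to disk buffer, read back to host, write to tape
    Architecture.HOST_BASED: 3,
    # written once; the outboard system moves the data to tape itself
    Architecture.OUTBOARD: 1,
}

def host_gb_moved(data_gb: float, arch: Architecture) -> float:
    """Gigabytes the host must push through its channels (hypothetical model)."""
    return data_gb * HOST_PASSES[arch]

nightly_backup_gb = 500  # hypothetical workload
for arch in Architecture:
    print(f"{arch.value}: {host_gb_moved(nightly_backup_gb, arch)} GB through the host")
```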

This issue will become even more important as virtual tape technology advances. Functions such as dual copying, and removing data from virtual tape systems for disaster recovery, depend on the system's ability to make intelligent choices about aggregating data. Letting the virtual tape system manage data as virtual volumes (placed in the disk buffer and managed as a cache) can cut batch processing times significantly, because no physical tape has to be mounted or moved. Once the data is "virtualized" on disk, most tape-motion commands become disk seeks, which complete in milliseconds rather than in seconds or minutes. Virtual tape systems manage the disk buffer so that it holds the volumes users are most likely to access, based on their access patterns and frequency.
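The article does not say which cache policy a given product uses. As an illustration only, the sketch below manages the disk buffer with least-recently-used (LRU) eviction. LRU is a common general-purpose policy, used here in place of whatever heuristics real systems apply. A mount that finds the virtual volume in the buffer is served from disk. A miss means the volume must be recalled from a physical cartridge. The class, method, and volume names are hypothetical.

```python
from collections import OrderedDict

class DiskBuffer:
    """Disk cache of virtual tape volumes with LRU eviction (hypothetical sketch)."""

    def __init__(self, capacity_mb: int) -> None:
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.volumes: "OrderedDict[str, int]" = OrderedDict()  # volser -> size in MB

    def mount(self, volser: str, size_mb: int) -> str:
        """Mount a virtual volume; return 'hit' (served from disk) or 'recall' (staged from tape)."""
        if volser in self.volumes:
            self.volumes.move_to_end(volser)  # mark as most recently used
            return "hit"
        if size_mb > self.capacity_mb:
            raise ValueError("volume larger than the whole buffer")
        # Evict least recently used volumes until the recalled one fits.
        # Assumption: evicted volumes already have a copy on physical tape.
        while self.used_mb + size_mb > self.capacity_mb:
            _, evicted_size = self.volumes.popitem(last=False)
            self.used_mb -= evicted_size
        self.volumes[volser] = size_mb
        self.used_mb += size_mb
        return "recall"

buf = DiskBuffer(capacity_mb=1_000)
print(buf.mount("VT0001", 400))  # recall
print(buf.mount("VT0002", 400))  # recall
print(buf.mount("VT0001", 400))  # hit
print(buf.mount("VT0003", 400))  # recall; evicts VT0002, the least recently used
print(buf.mount("VT0002", 400))  # recall
```

With LRU, a volume that is mounted repeatedly stays on disk, and rarely used volumes age out to tape. This matches the behavior described above in spirit, though real products may weigh access frequency or scheduled patterns as well.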

In some respects, managing a hierarchical storage system is relatively easy as long as the system is functioning smoothly. But what happens when things go wrong? If a physical tape drive fails, for example, automatic error recovery is essential. A virtual tape system must keep track of its data so that the data can be recovered automatically after a failure, and it must handle database corruption, media failures, and drive errors without human intervention. The components that hold the data must therefore be robust and reliable, with thorough error recovery and data-integrity protection.
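One way to recover without operator help is to keep a catalog that records every physical copy of each virtual volume, along with a checksum. When a drive or medium fails, the system falls back to another copy. The sketch below is a hypothetical illustration of that idea, not any vendor's design. `read_copy` stands in for a physical tape read, drive and media errors are modeled as `OSError`, and CRC-32 (Python's `zlib.crc32`) serves as the integrity check.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PhysicalCopy:
    cartridge: str  # hypothetical physical cartridge label
    crc32: int      # checksum recorded when the copy was written

@dataclass
class CatalogEntry:
    volser: str
    copies: List[PhysicalCopy] = field(default_factory=list)

class RecallError(Exception):
    """Raised only when every recorded copy is unreadable or corrupt."""

def recall(entry: CatalogEntry, read_copy: Callable[[str], bytes]) -> bytes:
    """Return the first copy that reads cleanly and matches its checksum.

    Drive/media errors and checksum mismatches are recorded and skipped,
    so recovery proceeds without an operator.
    """
    failures = []
    for copy in entry.copies:
        try:
            data = read_copy(copy.cartridge)
        except OSError as exc:  # e.g. drive failure or unreadable medium
            failures.append(f"{copy.cartridge}: {exc}")
            continue
        if zlib.crc32(data) != copy.crc32:
            failures.append(f"{copy.cartridge}: checksum mismatch")
            continue
        return data
    raise RecallError(f"{entry.volser}: no good copy ({'; '.join(failures)})")

payload = b"payroll master file"
entry = CatalogEntry("VT0042", [PhysicalCopy("P00017", zlib.crc32(payload)),
                                PhysicalCopy("P00931", zlib.crc32(payload))])

def flaky_read(cartridge: str) -> bytes:
    if cartridge == "P00017":
        raise OSError("drive error")  # simulated failure on the primary copy
    return payload

print(recall(entry, flaky_read))  # b'payroll master file', read from the second copy
```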

Virtual tape is a rapidly evolving technology, and users can expect major changes and improvements over the next several years. One trend is toward larger caches, yet about 90% of the data migrated to tape is typically never read again, so only around 10% needs to be buffered in DASD cache. Thus, while a larger cache may help, making efficient use of the existing cache is equally important, and compression is one way to do that. Greater cost savings can come from combining compression algorithms with hierarchical management that intelligently decides what to keep in cache.
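The cache-sizing argument reduces to a short calculation. Only the fraction of migrated data that is likely to be reread needs to be buffered, and compression stretches the physical cache further. In the sketch below, the 10% reread fraction comes from the text. The 3:1 compression ratio and the 2,000GB workload are hypothetical values chosen for illustration, and actual ratios depend heavily on the data.

```python
def cache_needed_gb(migrated_gb: float, reread_fraction: float = 0.10,
                    compression_ratio: float = 1.0) -> float:
    """Physical DASD cache needed to hold the data likely to be reread.

    Hypothetical helper. compression_ratio is uncompressed size divided by
    compressed size (e.g. 3.0 for 3:1).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return migrated_gb * reread_fraction / compression_ratio

migrated = 2_000  # hypothetical GB migrated to tape
print(cache_needed_gb(migrated))                         # no compression
print(cache_needed_gb(migrated, compression_ratio=3.0))  # assumed 3:1 compression
```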

Virtual tape is much more than a tape-storage solution. It is a fundamental data-moving architecture behind an enterprise's archives and daily operations. Based on a set of rules that users establish, virtual tape subsystems can help IT managers determine the lowest cost-per-megabyte location to store data. The net result: greater efficiency, increased productivity, and significant cost savings.
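A user-defined rule of the kind described above might be as simple as "keep data on the cheapest tier that still meets its access-time requirement." The sketch below shows such a rule. The tier names, costs, and access times are hypothetical placeholders, not figures from the article or from any product.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_mb: float     # hypothetical relative cost
    access_seconds: float  # hypothetical typical time to first byte

# Hypothetical storage hierarchy.
TIERS = [
    Tier("disk buffer", cost_per_mb=1.00, access_seconds=0.01),
    Tier("tape in library", cost_per_mb=0.05, access_seconds=60),
    Tier("tape on shelf", cost_per_mb=0.02, access_seconds=3_600),
]

def cheapest_tier(max_access_seconds: float, tiers: List[Tier] = TIERS) -> Tier:
    """Lowest cost-per-MB tier whose access time satisfies the user's rule."""
    eligible = [t for t in tiers if t.access_seconds <= max_access_seconds]
    if not eligible:
        raise ValueError("no tier meets the access-time requirement")
    return min(eligible, key=lambda t: t.cost_per_mb)

print(cheapest_tier(1).name)       # disk buffer
print(cheapest_tier(300).name)     # tape in library
print(cheapest_tier(86_400).name)  # tape on shelf
```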