Information life-cycle management: Myths and realities

Posted on April 01, 2004

The Enterprise Storage Group tackles what is arguably the storage industry's hottest (and most hyped) buzzword.

By Nancy Marrone-Hurley, Peter A. Gerr, and Steve Kenniston

One year ago, Enterprise Storage Group (ESG) published an article defining information life-cycle management (ILM). The intent of the article was to educate users about ILM, which we defined then as "a combination of technology and methodology that helps users manage data from the moment it is created to the time at which it is no longer needed." This concept has not changed; however, we believe there is a difference between information and data, and the distinction is important when you are discussing information—or data—life-cycle management.

For starters, ILM is a process, not a product. Implementing the process requires a number of steps, from assessment to classification of information to the actual automation of the processes. Implementations will differ for every organization, and there is no one-size-fits-all solution.

The intent of this article is to provide further detail about what ILM really is, and is not, particularly as it relates to the current messaging from the vendor community. We believe that implementing some form of ILM/DLM processes can help organizations improve resource optimization, address corporate governance and compliance issues, increase performance, and reduce costs. Plenty of solutions exist today to help organizations reach those goals; our intent is to help users understand the myths and realities of ILM.

"The Next Big Thing"

Over the course of the past year, ILM has arguably become the most popular buzzword in the storage industry. Offering "ILM" has been the catalyst for storage companies to step outside their traditional boundaries and start to market their value (and acquire companies) in the application management, compliance, content, and document management markets. The ILM discussion is no longer just about effectively utilizing storage resources and protecting information; instead, it is being positioned as a strategic business practice. Future ILM implementations are a prerequisite for delivering storage as a utility and for realizing the promise of grid/autonomic computing.

All of the hype around ILM has confused the end-user community. ILM looks complicated and, worse yet, sounds expensive and disruptive to users' existing environments and processes.

Yet the reality is that ILM is a combination of technologies and processes, and most users are doing some form of "ILM" today, even if it is very rudimentary. The value of information will inevitably change over time, and as it becomes less important to the business, it should be treated accordingly. This could mean certain information is placed on lower-cost arrays, sent to tape, archived in a warehouse, and/or eventually destroyed. ILM is the process of first making those value determinations and then setting up protection, movement, and retention policies based on those relative valuations. ILM solutions should then help implement and automate those policies.
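As a rough illustration of the value-to-placement idea, the sketch below uses data age as a stand-in for business value and maps it to a destination. It is a minimal sketch under stated assumptions: the Placement names, the day thresholds, and the place() function are hypothetical, and a real valuation would weigh far more than age.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Placement(Enum):
    """Hypothetical life-cycle destinations, most to least expensive."""
    PRIMARY_ARRAY = "primary array"
    LOW_COST_ARRAY = "lower-cost array"
    TAPE = "tape"
    ARCHIVE = "offsite archive"
    DESTROY = "destroy"


@dataclass(frozen=True)
class Rule:
    max_age_days: int      # rule applies while data is younger than this
    placement: Placement


# Illustrative thresholds only; rules must be in ascending order of age.
POLICY: List[Rule] = [
    Rule(30, Placement.PRIMARY_ARRAY),
    Rule(180, Placement.LOW_COST_ARRAY),
    Rule(2 * 365, Placement.TAPE),
    Rule(7 * 365, Placement.ARCHIVE),
]


def place(age_days: int, policy: List[Rule] = POLICY) -> Placement:
    """Return where data of the given age should reside. Data older than
    every rule has reached the end of its life cycle and is destroyed."""
    for rule in policy:
        if age_days < rule.max_age_days:
            return rule.placement
    return Placement.DESTROY


# Example: data that is 200 days old belongs on tape under this policy.
print(place(200))  # Placement.TAPE
```

Keeping the rules as data rather than hard-coded branches means the business owners' valuations can change without touching the logic.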

ILM vs. DLM: Is there a difference?

To make matters a bit more confusing, a number of vendors believe the term ILM is a misnomer. The argument is that although most vendors paint a vision of information life-cycle management, most of the solutions available today primarily address data life-cycle management (DLM). This may seem like an argument over semantics; however, ESG believes it is an important distinction.

Many line-of-business administrators and application vendors do perceive a difference between information and data. Applications leverage information, whereas storage systems store data. Information is the communication or reception of intelligence, while data is the digital representation of information. By using the term ILM so broadly, storage vendors (and analysts and publications) could be confusing users by blurring the line between information and data, making it more difficult for customers to understand where "ILM" solutions fit in their business processes.

Illustrating the relationship between ILM and DLM, and how each addresses business requirements, should help users understand how implementing ILM/DLM can move them toward an automated, on-demand environment.

The long-term goal of the "utility data center" (or autonomic/on-demand/grid computing) is to provide a fully automated IT infrastructure that will be able to provide compute, storage, and network resources to applications on-the-fly. This concept is not some utopian dream; technological innovation is bringing us closer to this realization each day.

However, a great deal of groundwork must be done before the utility concept can become a reality. By effectively implementing and automating ILM processes, users can lay the groundwork for the utility vision.

Nor is this a one-step process. Users should take a layered approach, automating processes at each level of their IT infrastructure (the storage, server, and application layers). ILM vision presentations promise solutions that can automatically understand the relationship between applications and their associated information sets and then automatically assign valuations to that information. Once those valuations are set, policies will be enacted that migrate, protect, retain, and eventually discard that information according to business requirements.

The reality is that there are no solutions today that can understand the application/information relationship and automatically set valuations that determine where the data should reside. Currently, this is primarily a manual process aided by reporting software. Today, most of the solutions that are being pitched as ILM are actually focused on data migration, retention, and protection, which is really data life-cycle management.

Don't get us wrong: ESG believes in the long-term promise and vision of ILM. However, we believe that the majority of solutions available today are truly DLM solutions. As these solutions evolve, they will become more aware of the application/information/data association and be able to automatically set classifications and migrate data according to the real-time requirements of database, content, and retention management applications. This is the future promise of ILM, but the industry is far from that realization.

The ILM process

That said, ILM has in a short period become the common descriptor for the process of managing data throughout its life cycle. While DLM is required to implement ILM, ILM is the umbrella term. The entire process of ILM involves multiple steps, including:

Assessment

Socialization (the process of bringing the assessment results to the appropriate business managers and working with them to classify information/data according to its importance to the business)

Classification

Automation

Review

Currently, the first three steps, while aided by storage resource management solutions, are primarily manual. The remaining steps of automating the processes are implemented using DLM solutions.

Regardless, ILM/DLM is really about changing the way an organization thinks about its information/data assets, and changing the way it stores those assets. End users should look for solutions that enable them to effectively assess and categorize both the storage and information assets within their environment, set values on information sets, set and enforce policies according to those values, and migrate and protect data automatically according to those values.

Value-based life-cycle management

At ESG, we believe the ILM/DLM discussion today is about migrating information/data across the storage infrastructure for three key purposes:

Resource optimization;

Effective data protection; and

Ensuring application performance.

ILM can mean many things to many users, and there is no single definition of an ILM solution. To one user it may simply mean archiving for compliance reasons; to another it may mean migrating aged data from higher-cost Fibre Channel or SCSI disk arrays to ATA-based arrays. A number of business drivers and information characteristics will lead companies to implement different ILM processes.
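A first pass at finding "aged" data to move from Fibre Channel or SCSI arrays to ATA-based arrays could be a simple file-system scan. The sketch below is hypothetical (migration_candidates() and its idle_days threshold are illustrative names, not any vendor's API) and uses modification time as its age signal, since many file systems do not reliably update access time.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def migration_candidates(root: str, idle_days: int,
                         now: Optional[float] = None) -> List[Path]:
    """Return files under `root` not modified for more than `idle_days` days.

    Hypothetical helper: modification time is used as the age signal
    because access time is often disabled or updated lazily.
    """
    now = time.time() if now is None else now
    cutoff = now - idle_days * SECONDS_PER_DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

Actually moving the files, and leaving a stub or link behind so applications can still find them, is the part commercial DLM products automate; it is deliberately left out of this sketch.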

These characteristics and their relative importance vary from business to business, and even within different departments of the same company. A brief list of these characteristics and considerations includes:

Retention cycle—Does the information need to be retained for a specific period for a corporate governance or regulatory purpose?

Disposition cycle—Once the retention cycle is complete, should the information be disposed of completely or archived to lower-cost media? Does the information need to be electronically shredded after the retention cycle has expired?

Archival cycle—Does the information need to be archived for long periods? If so, does the archived copy need to be stored separately from the original?

Access frequency—How frequently is the information accessed once created? Will it be "write once, read many" or "write once, read rarely," or will it have a more active access frequency?

Read/write performance cycle—Based on the access frequency of the data, what performance is required for read and write operations? What technologies are appropriate for these requirements?

Read/write permissions—Does the information need to be stored on non-erasable, non-rewritable media?

Recovery performance cycle—How quickly does the information need to be recovered?

Security issues—How will a compromise of this information at different points in its life cycle affect the business?
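Several of these criteria (retention cycle, disposition, and non-erasable, non-rewritable storage) translate directly into enforcement rules. The following is a hypothetical, in-memory model of write-once storage with per-object retention periods; WormStore and RetentionError are illustrative names, and real WORM media enforce these rules in hardware or firmware rather than in application code. Electronic shredding is not modeled.

```python
from datetime import date, timedelta


class RetentionError(Exception):
    """Raised when an operation would break a write-once or retention rule."""


class WormStore:
    """Hypothetical in-memory model of write-once, read-many storage
    with a retention period on every object."""

    def __init__(self) -> None:
        # name -> (data, date before which the object may not be disposed of)
        self._objects = {}

    def write(self, name: str, data: bytes, stored_on: date, retain_days: int) -> None:
        if name in self._objects:
            raise RetentionError(f"{name!r} already exists; WORM objects cannot be rewritten")
        self._objects[name] = (data, stored_on + timedelta(days=retain_days))

    def read(self, name: str) -> bytes:
        return self._objects[name][0]

    def dispose(self, name: str, today: date) -> None:
        """Remove an object, but only after its retention cycle has expired."""
        _data, retain_until = self._objects[name]
        if today < retain_until:
            raise RetentionError(f"{name!r} is under retention until {retain_until}")
        del self._objects[name]

    def __contains__(self, name: str) -> bool:
        return name in self._objects
```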

Ultimately, by applying certain filters and definitions to an organization's repositories of information, it is possible to assign relative "values" to that information. The value of a given piece of information is not a one-dimensional metric but the product of analyzing a variety of interrelated data points. Assessing information against the above criteria and setting life-cycle policies based on them are crucial aspects of the ILM process. Only when these steps are complete can organizations move ahead to implementing automated data migration, protection, and retention schemes as part of their ILM process.
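To make the idea of a multi-dimensional value concrete, the sketch below combines a few of the criteria above into a single relative score and then a service class. Everything in it is an assumption for illustration: the InfoProfile fields, the weights, the saturation point of 100 reads per month, and the gold/silver/bronze cut-offs are not an ESG or industry standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoProfile:
    """Hypothetical assessment record; fields loosely follow the criteria above."""
    regulated: bool          # must be retained for governance/regulatory reasons
    reads_per_month: float   # access frequency
    recovery_hours: float    # how quickly it must be recovered
    sensitivity: int         # 0 (public) .. 3 (critical if compromised)


def value_score(p: InfoProfile) -> float:
    """Combine several criteria into one relative score in [0, 1].
    The weights are illustrative assumptions, not a standard."""
    access = min(p.reads_per_month / 100.0, 1.0)    # saturate at 100 reads/month
    urgency = 1.0 / (1.0 + p.recovery_hours / 4.0)   # faster recovery -> higher
    return round(
        0.25 * float(p.regulated)
        + 0.30 * access
        + 0.25 * urgency
        + 0.20 * (p.sensitivity / 3.0),
        3,
    )


def classify(score: float) -> str:
    """Map a score to a hypothetical service class."""
    if score >= 0.6:
        return "gold"
    if score >= 0.3:
        return "silver"
    return "bronze"
```

A regulated, heavily read, critical data set that must be recovered immediately scores 1.0 and lands in gold. Tuning the weights is exactly the kind of decision the socialization step puts in front of business managers.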

What "ILM" solutions are available today?

Again, ILM is a process, not a product. One cannot buy ILM solutions, only solutions that enable ILM processes. Vendors may choose to refer to their products as ILM or DLM, but the reality is that these products can only address a portion of the overall process. It is important to realize that an ILM process does not have to be all-encompassing, addressing the life cycle of information associated with every application in the organization. The solutions that organizations will use to implement their ILM processes will differ depending on their environment and data retention and protection requirements. The point is, just as there is no one-size-fits-all ILM process, there is no one-size-fits-all ILM solution.

The five-step process that we outlined above will require various solutions to address the requirements in each phase. For example:

Assessment/classification

Set policies that determine where data should reside throughout its life cycle; and

Set policies that determine how data should be secured and protected throughout its life cycle.
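Placement and protection policies of the kind listed above are often easiest to manage when recorded together, one entry per classification. This sketch assumes three hypothetical classifications and made-up values for tier, backup frequency, replication, and encryption; in practice those values come out of the assessment and socialization steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical policy: where data lives and how it is protected."""
    tier: str                 # placement, e.g. Fibre Channel array vs ATA array
    backup_every_hours: int   # protection frequency
    replicate_offsite: bool   # remote copy for disaster recovery
    encrypt: bool             # encrypt at rest


# Illustrative values only.
POLICIES = {
    "critical": LifecyclePolicy("fibre-channel array", 1, True, True),
    "business": LifecyclePolicy("fibre-channel array", 24, True, False),
    "reference": LifecyclePolicy("ata array", 168, False, False),
}


def policy_for(classification: str) -> LifecyclePolicy:
    """Look up the policy for a classification. Unknown classes fail loudly
    rather than silently receiving the weakest protection."""
    try:
        return POLICIES[classification]
    except KeyError:
        raise ValueError(f"no policy defined for {classification!r}") from None
```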

Again, the actual process of determining which data is most important to the business is a manual task. Users will set the policies that reflect the valuations, and the implementation of those policies can be automated. Solutions do exist today that can aid in this process. A partial list of vendors that provide solutions to address these tasks today includes Arkivio, CommVault, Computer Associates, EMC, Hewlett-Packard, Hitachi Data Systems, IBM, Softek, and Veritas.

Professional services and security considerations are extremely important in the overall ILM process. Professional service organizations can help users assess their environment and, more importantly, understand the relationship between information and applications. (ESG highly recommends using professional and/or consulting services during the initial ILM phases.)

Automation

Multiple processes could be automated to provide resource optimization, data protection, and enhanced application performance, including the data migration, protection, and retention processes discussed above.

Again, security is a major concern when you are assessing and implementing software to automate processes; only authorized personnel should be able to change policies. Of course, if an assessment uncovers that certain information is critical to the business, then that information should be adequately secured. Solutions from companies like Decru and Kasten Chase can address these security concerns by automatically encrypting data.
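The requirement that only authorized personnel change policies can be enforced in the policy store itself. The PolicyRegistry class below is a hypothetical sketch, not a product API: it rejects changes from users outside an authorized set and records each accepted change in an audit log.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


class PolicyRegistry:
    """Hypothetical ILM policy store: only authorized administrators may
    change policies, and every accepted change is written to an audit log."""

    def __init__(self, authorized: Iterable[str]) -> None:
        self._authorized = set(authorized)
        self._policies: Dict[str, str] = {}
        # Entries are (UTC timestamp, user, policy name, new rule).
        self.audit_log: List[Tuple[str, str, str, str]] = []

    def set_policy(self, user: str, name: str, rule: str) -> None:
        if user not in self._authorized:
            raise PermissionError(f"{user} is not authorized to change ILM policies")
        self._policies[name] = rule
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, user, name, rule))

    def get_policy(self, name: str) -> str:
        return self._policies[name]
```

Rejected attempts are not logged here; a production system would likely record those as well.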

Despite all the hype and confusion, ESG believes that implementing ILM processes is both beneficial and necessary. Organizations will lower their overall administrative and operational TCO while ensuring that administrative and operational information is adequately protected and resources are efficiently utilized.

Implementing end-to-end ILM processes is not an easy task, nor can it be accomplished today. However, organizations can begin to build ILM processes into their business practices. As long as business managers and IT administrators work closely together to determine how the effective management of information assets will meet business requirements, organizations can begin to reap benefits from automating ILM processes today. We recommend that users overcome any reluctance they may have to move toward ILM and put ILM-enabling solutions to use.

Nancy Marrone-Hurley, Peter A. Gerr, and Steve Kenniston are senior analysts with the Enterprise Storage Group (www.enterprisestoragegroup.com) in Milford, MA.