Beating Big Data Blues

Data is the lifeblood of a successful organization, and effective management of data resources plays a vital role in its smooth operation. The ever-growing number of processes and regulations results in the accumulation of large amounts of both business and non-business-related content.

According to Gartner, 47% of large enterprises identify data growth as their biggest data center hardware infrastructure challenge, and enterprise data capacity is growing at 40% to 60% per year. Research further shows that more than 52% of an organization's digital content is unstructured data such as files, documents, images and video, while just 31% is structured. Over 70% of this content is generated by end-users within the organization. Employees often store personal data on company resources because they know it will be securely maintained and regularly backed up. The data pool is therefore a mixture of business-critical data and data with little or no business value. Even business-related data can go stale over time, becoming inactive and losing business relevance. Failing to analyze data means that all of it is treated the same way, which leads to ineffective use of company resources.

The real challenge organizations face lies not in data growth itself - that is inevitable - but in managing data effectively and strategically. After all, while data growth is projected at 40-60% per year, growth in IT budgets is estimated at just 2.6%.
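That gap compounds quickly. As a rough illustration using only the growth figures quoted above (no new data), a small calculation shows how far data volume outpaces budget over five years:

```python
def project(initial: float, annual_growth: float, years: int) -> float:
    """Compound annual growth: value after `years` at `annual_growth` per year."""
    return initial * (1 + annual_growth) ** years

# At 40%/yr, data volume grows roughly 5.4x in five years,
# while an IT budget growing at 2.6%/yr grows only about 1.14x.
data_multiple = project(1.0, 0.40, 5)     # ~5.38
budget_multiple = project(1.0, 0.026, 5)  # ~1.14
```

Even at the low end of the projected range, capacity demand pulls away from budget by a factor of nearly five.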

Factors Contributing to Unnecessary Data Growth

Long-term retention complicates the overall data management process. Data may be retained for business or historical reasons, to meet end-user requirements, or to satisfy policies and regulations prescribed by the government or the organization itself. As retention policies, both regulatory and homegrown, accumulate, organizing and storing data becomes complex. Maintaining multiple copies of the same data is both inefficient and expensive. Apart from causing inconsistency and imposing a large overhead, redundancy affects long-running processes such as backup: although the cost of storage devices keeps falling, redundant data increases the time a backup takes, driving up network overhead and bandwidth requirements. Furthermore, most large organizations generate large volumes of data constantly across multiple global locations. Because of this dispersion, backup windows are constantly shrinking, so only critical, business-relevant data should be selected for regular backup. So what techniques do organizations employ to reduce their storage requirements and use resources effectively?

Resource Acquisition - The Quick Fix

The most common, tactical reaction to the data growth problem is simply to buy more storage. Given the falling cost of storage, this knee-jerk reaction serves as a quick fix, but it often reflects an inability to carry out predictive capacity planning. The hoarding of data is further aggravated by open-ended retention policies, under which data is stored without any consideration of its actual content.

Archiving

Data archiving, along with data tiering, is considered an effective data reduction technology. But blind archiving - without first gaining insight into the data landscape or applying any governing policy - simply moves data between tiers and does nothing to reduce the total volume of data being managed.

Deduplication - Beating the Bloat

Finally, deduplication (dedupe for short) is probably the most talked-about data management strategy. It is also perhaps the leading data reduction technology, permitting sizable reductions in data volume. Traditionally, organizations opt for a hardware-based approach to deduplication, which eliminates redundant data on back-end devices. The drawbacks of this approach, however, include higher operational management costs and greater network overhead and bandwidth consumption - factors which contribute significantly to the yearly increase in storage management costs. Applying these data management strategies individually and independently will not permit efficient capacity planning that keeps capacity ahead of demand, nor will it reduce data volume or operating expenses. So if the three most widely used strategies fall short, what really is the best solution?
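To make the underlying idea concrete, here is a minimal sketch of content-hash deduplication: a data stream is split into blocks, each block is identified by its SHA-256 digest, and only one copy of each unique block is stored. The fixed block size is an illustrative assumption; production systems typically use variable-size chunking and far more sophisticated indexes.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative; real systems often use variable-size chunks

def deduplicate(data: bytes):
    """Split data into fixed-size blocks and keep one copy per unique block.

    Returns the unique block store and the recipe (ordered list of
    digests) needed to reconstruct the original stream.
    """
    store = {}   # digest -> block bytes, stored exactly once
    recipe = []  # ordered digests to rebuild the stream
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe

def reconstruct(store, recipe):
    """Rebuild the original stream from the block store and recipe."""
    return b"".join(store[digest] for digest in recipe)
```

A stream containing three identical 4 KB blocks and one distinct block is stored as just two physical blocks plus a small recipe, which is where the capacity savings come from.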

Basics of Integrated Data Reduction

The Integrated Data Reduction approach is the current hot topic in the data management world. By combining the three strategies, organizations can reduce their overall data volume and migrate the retained data to the most appropriate storage tier, thereby achieving significant reductions in storage costs. An Integrated Data Reduction approach is implemented in the following manner:

First, Storage Resource Management (SRM) gives the organization visibility into its data. This visibility is the key to understanding how to reduce content volume: it enables informed decisions based on the business value of the data, which in turn determine what should be deleted, what should be archived, and how data should be tiered.
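A minimal sketch of what this visibility step might look like: walking a file tree and summarizing files by type, size and staleness. The one-year staleness threshold is an illustrative assumption, not a recommendation, and commercial SRM tools collect far richer metadata.

```python
import os
import time

# Illustrative threshold -- real SRM tools apply business-driven policies.
STALE_AGE_DAYS = 365

def scan_inventory(root, now=None):
    """Walk a directory tree and summarize files by extension.

    Returns a mapping: extension -> {"count", "bytes", "stale"}.
    """
    now = time.time() if now is None else now
    summary = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable or vanished files
            ext = os.path.splitext(name)[1].lower() or "<none>"
            entry = summary.setdefault(ext, {"count": 0, "bytes": 0, "stale": 0})
            entry["count"] += 1
            entry["bytes"] += st.st_size
            age_days = (now - st.st_atime) / 86400
            if age_days > STALE_AGE_DAYS:
                entry["stale"] += 1
    return summary
```

Even this crude inventory answers the first questions an Integrated Data Reduction exercise asks: what is stored, how much of it there is, and how much of it nobody has touched in a year.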

SRM thus makes intelligent archiving possible and turns it into a genuine enabler of data reduction. Once the right candidates for archiving have been identified, granular policies can be applied instead of hoarding large volumes of data. This phase also involves deleting inactive data and freeing up critical primary storage resources.
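The archiving phase can be sketched as a simple policy that partitions files by last-access age into primary, archive and delete buckets. The thresholds below are hypothetical; real policies would reflect business value and regulatory retention requirements rather than age alone.

```python
from datetime import datetime, timedelta

# Hypothetical policy thresholds -- actual values come from business
# and regulatory retention requirements, not from this sketch.
ARCHIVE_AFTER = timedelta(days=180)     # inactive: move to archive tier
DELETE_AFTER = timedelta(days=365 * 7)  # beyond retention: eligible for deletion

def classify(files, now):
    """Partition (path, last_access) records into tiering decisions."""
    decisions = {"primary": [], "archive": [], "delete": []}
    for path, last_access in files:
        age = now - last_access
        if age >= DELETE_AFTER:
            decisions["delete"].append(path)
        elif age >= ARCHIVE_AFTER:
            decisions["archive"].append(path)
        else:
            decisions["primary"].append(path)
    return decisions
```

The point of the granularity is that each bucket gets a different treatment: primary data stays on fast storage, archive candidates move to a cheaper tier, and data past its retention window is removed outright.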

Once data has been aligned with the appropriate tier and the primary storage pool has been optimized, deduplication is applied globally across the backup and archive pools to reduce the amount of data on the back end. Deduplication is the real key to an Integrated Data Reduction strategy, as it removes redundancy in the backup and archive pools regardless of the back-end storage devices used. While deduplication ratios of 20:1 and higher are not unusual, even a conservative 5:1 ratio would drastically reduce operational management expenses.
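The effect of a dedupe ratio on back-end capacity is simple arithmetic; the figures below are illustrative, using the ratios quoted above:

```python
def physical_capacity(logical_tb: float, dedupe_ratio: float) -> float:
    """Back-end capacity needed after deduplication at the given ratio."""
    return logical_tb / dedupe_ratio

# A conservative 5:1 ratio cuts 100 TB of backup data to 20 TB of
# physical storage - an 80% reduction; at 20:1 only 5 TB is needed.
conservative = physical_capacity(100, 5)   # 20.0 TB
aggressive = physical_capacity(100, 20)    # 5.0 TB
```

Because less physical data is written, the savings extend beyond capacity to backup windows, network bandwidth and the operational costs tied to all three.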

The disproportion between data growth and budget growth is set to widen, and companies are starting to realise that addressing the problem in an ad-hoc manner is ineffective and carries severe long-term implications. When properly implemented, an Integrated Data Reduction strategy can dramatically cut inefficiency, enhance manageability and drastically reduce operational expenses.