Is Centralization the Right Approach to Tackle Distributed Data?

There was a time, not so long ago, when digitized data was stored on disks managed by a few machines, and the concept of distributed data seemed far-fetched. Very few people were required to manage this data, and backing up this valuable asset was neither difficult nor a long-drawn-out process. After some time, things started to change as end users began moving from place to place and had to be connected to the network in real time. This led to the development of earlier generations of networks, but these networks were painfully slow, and a lot of time was consumed arriving at solutions.

Today, data is no longer confined to a room; it is stored everywhere. With websites, applications, and operations running in the cloud, there is an enormous amount of data that enterprises have to deal with on a daily basis. Business applications such as Salesforce.com and NetSuite store enterprise data of mammoth proportions in the cloud, which has led to the scattering and distribution of data. Open data is also responsible for this distribution, with thousands of open data sources now available for public access. With time, not only has storage become fragmented; data entry has also become scattered, with employees and customers using various mobile devices to enter data.

Need for Data Integration

Streams of data are increasingly becoming available from various autonomous, distributed sources. Typical examples include stock tickers, sensors, and various kinds of monitoring data covering environmental conditions, traffic, and computing resources. The need for data integration has become much more pertinent in today's scenario, and there are many reasons that bring its growing importance to the fore.

Customer care can be improved drastically when sales data is integrated with data from other sources, such as social media. Another example comes from the amalgamation of sales and manufacturing data. These are just two of the many scenarios where the integration of data can play a crucial role. One thing is for sure: the integration of databases is going to have a great impact in the coming decade and change the way operations are run today. But there is another aspect to the integration of data: one should also be able to make sense of the large volumes involved. This further stresses the point that data coming from two or more heterogeneous sources should be integrated in order to get the complete picture.
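To make the idea concrete, here is a minimal sketch of joining two heterogeneous sources on a shared key. The data, field names, and sources (a CRM export and a social-media listening feed) are all invented for illustration; real integration pipelines would add schema mapping, deduplication, and error handling.

```python
# Source 1: sales data, e.g. exported from a CRM (hypothetical fields)
sales = [
    {"customer_id": "C001", "total_spend": 1200.0},
    {"customer_id": "C002", "total_spend": 340.0},
]

# Source 2: social-media mentions, e.g. from a listening tool (hypothetical)
mentions = [
    {"customer_id": "C001", "sentiment": "positive", "mentions": 14},
    {"customer_id": "C003", "sentiment": "negative", "mentions": 2},
]

def integrate(sales, mentions):
    """Merge the two heterogeneous sources on customer_id."""
    by_id = {m["customer_id"]: m for m in mentions}
    combined = []
    for s in sales:
        m = by_id.get(s["customer_id"], {})  # no match -> use defaults
        combined.append({
            "customer_id": s["customer_id"],
            "total_spend": s["total_spend"],
            "sentiment": m.get("sentiment", "unknown"),
            "mentions": m.get("mentions", 0),
        })
    return combined

print(integrate(sales, mentions))
```

Even this toy join shows why the combined view is more useful than either source alone: a support agent now sees both what a customer bought and how that customer talks about the company.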

The need for data integration has become all the more important with huge volumes of data scattered all around. In the recent past, the most obvious place for data integration has been the data warehouse, where data can be moved from various sources and consolidated. This is a viable option when the amount of data you are dealing with is small and manageable. But is data warehousing the answer to all the questions posed by distributed and fragmented data? In this era of 'Big Data', the data warehousing approach seems inadequate. The amount of data one has to deal with is of mammoth proportions, and it becomes practically impossible to move all of it to the warehouse and consolidate it there. The biggest problem lies in storing this data and making sense of such a staggering amount of it. Another obstacle is the escalating cost of moving and storing this big data.
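The classic warehouse approach described above can be sketched in a few lines: extract records from scattered sources, transform them into one shared schema, and load them into a central store. The source names, schemas, and figures below are invented for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two hypothetical scattered sources with different shapes
crm_orders = [("C001", 1200.0), ("C002", 340.0)]           # source A: tuples
web_orders = [{"cust": "C002", "amount": 55.0},            # source B: dicts,
              {"cust": "C003", "amount": 410.0}]           # different schema

# The "warehouse": one central store with one agreed-upon schema
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")

# Transform both sources into the shared schema, then load
rows = list(crm_orders) + [(o["cust"], o["amount"]) for o in web_orders]
warehouse.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# Once consolidated, queries can span what used to be separate silos
total = warehouse.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)
```

The sketch also hints at the scaling problem the paragraph raises: every record must physically travel to the central store before it can be queried, which is exactly the cost that becomes prohibitive at Big Data volumes.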

While centralization seems to be the right choice for integration and consolidation purposes, it might not be the ideal option for dealing with the ever-growing and distributed nature of the data.