Saturday, 6 October 2012

Management of Data Growth in a Data Warehouse

How do you manage a data warehouse? That is what I would like to discuss
in this post!

A data warehouse is used to store historical data. Large data warehouses
grow rapidly, and annual growth rates of over 50% are now commonplace. This
historical data, even though huge, is very important from a business
perspective and hence needs to be managed. Infrastructure cost is another
important factor that we need to minimize. We also need to ensure that data
is maintained in compliance with retention regulations.
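
To get a feel for what 50% annual growth means, here is a small arithmetic sketch. The 10 TB starting size is an assumed figure chosen only for illustration, and the function name is hypothetical.

```python
def projected_size_tb(current_tb, annual_growth, years):
    """Compound the current size by a fixed annual growth rate."""
    return current_tb * (1 + annual_growth) ** years

# Assumed starting point: a 10 TB warehouse growing 50% per year.
projection = [projected_size_tb(10, 0.5, year) for year in range(4)]
print(projection)  # [10.0, 15.0, 22.5, 33.75]
```

At that rate the warehouse more than triples in three years, which is why storage cost becomes a concern so quickly.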

Now we will talk about how to manage data growth in data warehouses!

To manage the data inside a data warehouse, we first need to check which
data is actually used and which is not. This is a really tough and tedious
task. In OLTP systems we can use various methods to check the usefulness of
data, but in a data warehouse it is difficult because business users need
all the data in the data warehouse.

The only way to counter this is constant monitoring of the data warehouse.
Reports generated by BI tools (like OBIEE or BO) give us a good idea of how
the data is being used. We get a picture of the customers and products that
are using the data. We can also deploy a monitoring tool to track data
usage. Using these methods, we can decide the importance of the historical
data in the data warehouse.
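
As a rough sketch of what usage tracking could produce, the snippet below summarizes a query access log per table. The log format, the table names, and the `summarize_usage` helper are all assumptions made for illustration; a real monitoring tool or BI audit log will look different.

```python
from collections import Counter
from datetime import date

# Hypothetical access log: (table name, date of the query that read it).
access_log = [
    ("sales_2012", date(2012, 9, 30)),
    ("sales_2012", date(2012, 10, 1)),
    ("sales_2009", date(2011, 1, 15)),
]

def summarize_usage(log):
    """Return {table: (access_count, last_access_date)}."""
    counts = Counter(table for table, _ in log)
    last_seen = {}
    for table, day in log:
        # Keep the most recent access date seen for each table.
        if table not in last_seen or day > last_seen[table]:
            last_seen[table] = day
    return {table: (counts[table], last_seen[table]) for table in counts}

usage = summarize_usage(access_log)
```

Tables with a low access count and an old last-access date are the natural candidates for a closer look.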

Once we have prioritized the data in the data warehouse, the next step is
to decide how to manage it. Some data is no longer used in the data
warehouse, while other data is still important to the business. Data that
is never used can be purged. Data that is no longer used but needs to be
kept for compliance purposes can be archived. We can deploy an archiving
solution to move this data to an archive and maintain it there.
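
The purge-or-archive decision described above can be sketched as a simple rule. The one-year idle threshold, the `retention_required` flag, and the `classify` function are assumptions for illustration, not part of any particular archiving product.

```python
from datetime import date

def classify(last_access, retention_required, today, idle_days=365):
    """Return 'keep', 'archive', or 'purge' for one table or partition.

    last_access is None when the data has never been read.
    """
    # Data read within the idle window is still in use: leave it alone.
    if last_access is not None and (today - last_access).days <= idle_days:
        return "keep"
    # Unused data is archived only if retention rules require keeping it.
    return "archive" if retention_required else "purge"

today = date(2012, 10, 6)
print(classify(date(2012, 9, 1), False, today))  # keep
print(classify(date(2010, 1, 1), True, today))   # archive
print(classify(None, False, today))              # purge
```

In practice the threshold and the retention flag would come from the business and from the applicable regulations rather than being hard-coded.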

Overall, monitoring the data warehouse and prioritizing its data for
archiving can help in maintaining it. Apart from that, there are
technologies like MapReduce when data is too unstructured