Data Warehousing - Metadata Tool Selection

Buy vs. Build

Only in the rarest of cases does it make sense to build a metadata tool from scratch. This is because doing so requires resources that are intimately familiar with the operational, technical, and business aspects of the data warehouse system, and such resources are difficult to come by. Even when such resources are available, there are often other tasks that can provide more value to the organization than to build a metadata tool from scratch.

In fact, the question is often whether any type of metadata tool is needed at all. Although metadata plays an extremely important role in a successful data warehousing implementation, this does not always mean that a tool is needed to keep all the "data about data." It is possible to, say, keep such information in the repository of other tools used, in text documentation, or even in a presentation or a spreadsheet.

Having said the above, though, it is author's believe that having a solid metadata foundation is one of the keys to the success of a data warehousing project. Therefore, even if a metadata tool is not selected at the beginning of the project, it is essential to have a metadata strategy; that is, how metadata in the data warehousing system will be stored.

Metadata Tool Functionalities

This is the most difficult tool to choose, because there is clearly no standard. In fact, it might be better to call this a selection of the metadata strategy. Traditionally, people have put the data modeling information into a tool such as ERWin and Oracle Designer, but it is difficult to extract information out of such data modeling tools. For example, one of the goals for your metadata selection is to provide information to the end users. Clearly this is a difficult task with a data modeling tool.

So typically what is likely to happen is that additional efforts are spent to create a layer of metadata that is aimed at the end users. While this allows the end users to gain the required insight into what the data and reports they are looking at means, it is clearly inefficient because all that information already resides somewhere in the data warehouse system, whether it be the ETL tool, the data modeling tool, the OLAP tool, or the reporting tool.

There are efforts among data warehousing tool vendors to unify on a metadata model. In June of 2000, the OMG released a metadata standard called CWM (Common Warehouse Metamodel), and some of the vendors such as Oracle have claimed to have implemented it. This standard incorporates the latest technology such as XML, UML, and SOAP, and, if accepted widely, is truly the best thing that can happen to the data warehousing industry. As of right now, though, the author has not really seen that many tools leveraging this standard, so clearly it has not quite caught on yet.

So what does this mean about your metadata efforts? In the absence of everything else, I would recommend that whatever tool you choose for your metadata management supports XML, and that whatever other tool that needs to leverage the metadata also supports XML. Then it is a matter of defining your DTD across your data warehousing system. At the same time, there is no need to worry about criteria that typically is important for the other tools such as performance and support for parallelism because the size of the metadata is typically small relative to the size of the data warehouse.