SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Metadata for BI Solutions in the Microsoft stack

You know, today Microsoft doesn't have a solution to handle metadata of a BI Solution. Really, they don't have nothing to handle a single layer of the architecture like the Data Warehouse.

In a (how much near?) future this gap could be filled by using Oslo. Yes, Oslo is not a solution by itself, it's a platform, but studying it I see that it is the more natural way to produce a very well integrated metadata solution.

What do you think about it? Is someone already working on this? If not, I would like to know the reasons...

Comment Notification

Comments

I’m terribly sorry, perhaps I didn’t understood you correctly, but when you say that Microsoft doesn’t have metadata for BI Solutions, aren’t you overlooking the SQL Server Metadata Toolkit and the Rowset Viewer?

I know, SQL Server Metadata Toolkit is a sample solution, but it is not a "product" and (I think) for this reason there is not an ecosystem around it.

Oslo is an infrastructure with a precise idea about a central repository of informations and dependencies. It is not committed to Data Warehouse and BI processes, but I think that, as a platform, it could be easily adapted to it.

Just to make an example of what is missing in SQL Server Metadata Toolkit: if you use database views to decouple layers in a BI solution architecture (like we did a lot in SQLBI Methodology), you lose a granular view of dependencies at the column level (and maybe also at the table level when joins and subqueries are involved).

What is necessary (to me) is:

- a central repository for metadata, open to third parties products and to customization

- a set of tools that analyze the system and populate the metadata repository, finding dependencies regardless of the technique used (sql views, DSVs, SSIS transformations, and so on)

- a set of standard APIs allowing client tools (like Excel, for example) to query the metadata repository providing dependencies AND documentation (for both end-users and technical people)

- an integration with development environment (read BIDS, but also Database modeling tools) to do impact analysis when you make any changes

What I know of Oslo is that it builds on a very essential link that is currently missing today in almost every platform (even SAP). That is the functional - natural language - fact models and business rules where applications and data models should build upon.

As you mention, this could be a great platform to build IT solutions (including BI) from. The short term problem you have this fundamentally changes the way people build software today. Looking at the past with business rule engines-companies and where they are today, this does indicate that the whole world must build model driven in the future to get most benefit. This is not realistic I think.

As a company we are currently looking into how to build BI systems model driven, but the handicap you have is that a BI systems is always are a snapshot of the metadata of OLTP systems at a certain point in time. When they do not synchronise metadata automatically I am afraid you end up building separate environments with separate lifecycles.

This is a very timely topic for me. In my consulting capacity, I have ended up writing some custom apps to help our clients manage their metadata. There are three issues I am seeing with customers relating to metadata:

1) Ability to mine existing metadata out of the SSAS and relational databases, keeping dependencies between both.

2) Ability to easily manage an online corporate glossary and tie these business terms and definitions to elements of the data warehouse. (This to me is a huge problem on the business side that often gets overlooked by us folks on the IT side).

3) Ability to tie process/job control tables or functions to the metadata in the repository.

4) Ability to tie this in with downstream applications. For instance, if you had a metadata chain from enterprise reports --> data dictionary --> data warehouse elements -->process control/ jobs, you could really add visibility to both the IT and the business sides.

Another question for all of you: Where do you think business metadata (i.ei dictionaries) should be stored and managed? Within BIDS? Within a separate application, like a web front end? Is anyone actually using extended properties or descriptions to house their business metadata?

I am very interested in hearing if this resonates with people or have you already found solutions out there that do all of this?

I completely agree with you about the issues that arises in the real world. Exactly for this reason, I think that an external repository, based on a wide distributed metadata platform open to third-party extensions (and Oslo could be the right one for this) could be the base to found an ecosystem that aggregates and integrates informations coming from different sources.

However, this is just an opportunity and everything has to be written here. Limits of existing solutions are their closeness and/or their relationship with technology-specific choices that make an impact (or at least some constraint) on the BI solution architecture.

On a side note, I would only mention that in SQLBI Methodology (http://www.sqlbi.com/sqlbimethodology.aspx) we just touched the issue of dependencies between SSAS and relational detabases by making a specific use of database VIEWS, beacuse their dependencies are easily trackable with existing tools.

"Master Data Hub that provides central management of master data entities and hierarchies and provides a comprehensive role based security model that ensures fine-grained, secure access to master data."

and

"Flexible Data Model completely defined in a metadata driven data model. Data administrators are free to define the data model in any way that satisfies the needs of their organization. As new business entities appear or current business entities change, the master data hub data model can change to accommodate the new business requirements."

it appears to be the masterdata management tool you were asking for.

What i would like to know is how this data will be made available to the organisation? By means of Dublin with entities described by Oslo? and how the BI get access to this data, Gemini to use the Berlin webservices? It would alter the way we work greatly but i like the SOA way of getting the data to not only BI but other applications in the organisation. Or would this MDM replace our datawarehouse and would we Query it to fill our cubes/reports.

Uhm - I think MDM is another topic by itself. It cannot replace the data warehouse, its scope is more about what we would call the "dimensions" side of the data (qualitative attributes), but quantitative attributes still comes from OLTP systems. Data Warehouse is the central repository for these data. Metadata should be able to track data relationships between source (ie OLTP) and reports (ie Olap Cubes, Data Marts, and so on) covering all the intermediate steps (ETLs and so on).

MDM is about distributing likewise business keys throughout diverse apps easyfying integration efforts.

MS Oslo, for now, provides the foundational layers which will expand over the total MS stack to provide MDA engineering. Somewhat different with current MDA approaches it is able to run Apps as a model instead of codegenerated apps which 'comes from models'. The difference in MDA style , like the UML PIM camp it could serve, can be found here http://loekb.blogspot.com/2006/02/will-real-mda-please-stand-up.html. A MDA fashion also executed by Mendix.com, a 'cool vendor' stipulated by gartner.

Have you seen Microsoft's Project Barcelona yet? It looks really neat. I was impressed by the "Web Crawler" approach to collecting enterprise-wide meta-data. Plus, they plan on letting us write our own crawlers for non-MS products (like Oracle, or Business Objects for example...).

In my experience, existing products are not completely integrated with the development process and lack in automation, and nobody is willing to invest the time required to keep a separate set of metadata information updated.

In an ideal world, the metadata information is automatically generated by analyzing the solution. With Microsoft BI Stack this is teorically possible but in practice it hase been done only by third party tools, until now.

I realize this is an old post but it still ranks well so I'm leaving a note for others that stumble across it. LeapFrogBI is a metadata driven etl development platform designed from the ground up to streamline the creation and maintenance of SQL Server data marts. Check out this SaaS platform at LeapFrogBI.com and start reaping the benefits of metadata driven etl.

Noticed Oracle recently intorduced a product in this relm (Nov-14) wondering if SQL server has done anythiny. As for 3rd party Wherescape RED has an interesting product for SQL Server Meta Data Management.

Leave a Comment

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.