At the time, the company’s MetaCenter Enterprise Metadata Repository product, which provides end-to-end data lineage across multiple data sources, was cited as a leader in the Challengers’ Quadrant. The research firm noted that its strengths included customer support, solution performance, and an easy-to-use interface. Among the advantages it cited were the company’s library of hundreds of virtual machine images, which lets it replicate MetaCenter customer environments and reproduce their issues in order to address them, as well as its simple, GUI-driven customization and built-in workflow capabilities.

CEO Geoff Rayner recently sat down with DATAVERSITY® to discuss the latest happenings at DAG and around Metadata.

DATAVERSITY (DV): Can you tell us about some of the work you’ve been doing over the last few months to expand the capabilities of MetaCenter?

Rayner: We have been working to introduce some additions to MetaCenter, including an update to its history management capabilities, which traditionally have enabled business and IT staff to access the latest copy of database, BI, ETL, or Script Metadata. The version of that feature now in beta uses a Graph Database to enable arbitrary tracking of Metadata changes across any information asset that MetaCenter scans, at any level of detail and from any point in time. We are also introducing a way to tag each change, so users can see which database or other product release it was associated with and which might have caused something to break.
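The graph-based history approach Rayner describes can be sketched roughly as follows. This is a minimal illustrative model, not DAG’s actual MetaCenter implementation — all names (`MetadataGraph`, `record_change`, the release tags) are hypothetical:

```python
# Illustrative sketch of graph-based Metadata version tracking.
# Hypothetical model, not DAG's actual MetaCenter implementation.
from datetime import datetime

class MetadataGraph:
    def __init__(self):
        # asset name -> list of version nodes, oldest first
        self.versions = {}

    def record_change(self, asset, definition, release_tag):
        """Append a new version node, linked to its predecessor and
        tagged with the product release that introduced it."""
        prior = self.versions.get(asset, [])
        node = {
            "definition": definition,
            "release": release_tag,      # e.g. "ERP v2 upgrade"
            "recorded": datetime.now(),
            "previous": prior[-1] if prior else None,
        }
        self.versions.setdefault(asset, []).append(node)

    def changes_since(self, asset, release_tag):
        """Walk an asset's history to find changes made after a given
        release -- 'what changed that might have broken this?'"""
        history = self.versions.get(asset, [])
        tags = [n["release"] for n in history]
        if release_tag not in tags:
            return history
        return history[tags.index(release_tag) + 1:]

g = MetadataGraph()
g.record_change("sales.orders", "cols: id, total", "ERP v1")
g.record_change("sales.orders", "cols: id, total, region", "ERP v2")
recent = g.changes_since("sales.orders", "ERP v1")
# 'recent' holds the single version introduced by "ERP v2"
```

The key property is that each change node carries both its release tag and a link back to its predecessor, so history can be walked from any point in time.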

Other capabilities introduced over the last few months include Metadata Management support for new data tools, including Amazon’s Redshift fully managed Data Warehouse, Relational Database Service, and Simple Storage Service (S3). Tableau and the Progress OpenEdge Database are also among the solutions for which DAG has added Metadata support. We aim to continue adding Metadata Management support at the rate of one or two systems per quarter, based on customer requests.

MetaCenter also has gone through the evaluation process to be formally certified as a Metadata Management solution for the Cloudera Navigator Data Governance solution for Hadoop [which provides lineage within the Hadoop environment itself] and for Hortonworks Apache Atlas [which provides scalable governance for Enterprise Hadoop].

DV: What are your thoughts about supporting Metadata Management in a world of Big Data and Hadoop environments, which are marked by the ability to add and modify data systems as needs change?

Rayner: We are experts in that area, but it’s hard to go after Metadata in a platform that’s not stable. For better or worse, the more flexible a platform is, the harder it is to govern effectively. It’s not like an ETL tool such as PowerCenter, where there’s one way to do things, all in the workflow; instead, most things are driven by scripts and custom programs, and that’s a lot harder.

There are no formal semantics in HDFS either; it’s a very physical way of dealing with storage that gives programmers flexibility but adds complexity for consumers of data. Over time, the Hadoop platform itself will get a little more mature and more stable, too. But it’s always a tricky platform on which to do Metadata Management, because it’s more of a programming model.

People are using solutions like Hadoop, though, because they’re tired of paying [a lot of money for I/O to] companies like IBM and Teradata. So they’re doing all their pushdown processing in Hadoop instead of on those expensive platforms. They can slash costs by 80% a year by taking some workloads away from those big vendors, and they should use some of that budget to think about Metadata from the start.

If you want something to be documentable, you need to make the process itself Metadata-driven. The way you engineer it matters. If you think about that upfront and put it in as a requirement when building solutions on the platform, it’s straightforward. But usually people discover that pain only after they’ve played around for a while. They don’t usually bring in folks like us until they’ve already done a lot and are looking for a band-aid.
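The idea of a Metadata-driven process can be sketched as follows — a hypothetical example, not a MetaCenter feature. When a pipeline is declared as data rather than buried in scripts, its lineage documentation falls out of the declaration itself:

```python
# Illustrative sketch of a Metadata-driven process: steps are declared
# as data, so lineage can be derived from the declaration instead of
# being reverse-engineered from code. All names are hypothetical.

pipeline = [
    {"step": "load",  "source": "s3://landing/orders", "target": "staging.orders"},
    {"step": "clean", "source": "staging.orders",      "target": "curated.orders"},
]

def lineage(target):
    """Walk the declared steps backward from a target to its origins."""
    chain = [target]
    current = target
    while True:
        step = next((s for s in pipeline if s["target"] == current), None)
        if step is None:
            break
        chain.append(step["source"])
        current = step["source"]
    return chain

trail = lineage("curated.orders")
# trail -> ["curated.orders", "staging.orders", "s3://landing/orders"]
```

By contrast, when the same two steps live in ad-hoc scripts, nothing in the system records that `curated.orders` depends on the S3 landing zone — which is the documentability gap Rayner describes.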

DV: Can you talk about how your expertise in Metadata Management comes through on such platform scenarios?

Rayner: The reality is that when you do any kind of analysis, you want trusted datasets that you can combine as needed. We don’t normalize all the Metadata in systems; instead, we create images of it from the systems we scan, so it’s a one-to-one representation of that data. We put a semantic layer on top of it and create different high-level representations of that data through mapping. That way you always have all the detail and can fall back to it. Once you normalize out the detail you can’t get it back, and with Hadoop you always have to have that detail.
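The pattern Rayner describes — verbatim Metadata images with a mapping layer on top — might look something like this in miniature. The systems, tables, and the `trace` helper are all hypothetical, not MetaCenter’s actual structures:

```python
# Illustrative sketch: raw Metadata images kept verbatim per source
# system, with a semantic layer mapped on top. Hypothetical names,
# not MetaCenter's actual API.

# One-to-one "images" of scanned source Metadata, kept untouched:
raw_images = {
    "oracle_dw": {"SALES_FCT": ["ORD_ID", "CUST_KEY", "AMT_USD"]},
    "hive":      {"sales_raw": ["order_id", "cust", "amount"]},
}

# Semantic layer: business terms mapped onto the raw detail, which
# stays intact underneath and can always be consulted.
semantic_layer = {
    "Order Amount": [
        ("oracle_dw", "SALES_FCT", "AMT_USD"),
        ("hive", "sales_raw", "amount"),
    ],
}

def trace(term):
    """Resolve a business term back to every physical column it maps to."""
    return [
        f"{system}.{table}.{column}"
        for system, table, column in semantic_layer.get(term, [])
    ]

paths = trace("Order Amount")
# paths -> ["oracle_dw.SALES_FCT.AMT_USD", "hive.sales_raw.amount"]
```

Because the images are never normalized, the full physical detail survives even after the high-level mapping is built — the property Rayner says Hadoop environments depend on.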

DV: Why does Metadata Management fall into that after-the-fact category?

Rayner: First, it’s crazy not to have a Metadata tool if you’re thinking about soft ROI. It makes life so much easier because it lets the tech guys focus on building new stuff. It reduces the cost of doing things like root-cause and dependency analysis by 95%, freeing up their time for more value-added projects. It drives efficiency, accuracy, and agility, regardless of platform.

But because it doesn’t have hard returns that drive top-line business, people ignore it until it becomes painful. People just keep dumping stuff into HDFS because it’s convenient, and they don’t regularly go back to do housecleaning or organizing in any way unless they have the methodology in place upfront to do it.

Metadata Management is also like brushing your teeth or eating your vegetables: you’d rather do something else, but in the long term you are going to be a lot better off. It’s an investment in being organized, and there’s not a business in the world that won’t benefit from that longer-term. A little forethought adds a lot of value down the road.

The curve is shifting, too: we are seeing more senior business managers who are more technical these days, and they want to do self-service work and feel they know enough to do it. Well, they know enough to be dangerous, anyway, so the real question is how to make them less dangerous and give them tools like Metadata Management to leverage their technical knowledge and savvy.

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.