Announcing the General Availability of Azure Data Catalog

This post is authored by Julie Strauss, Principal Group Program Manager in the Data Group at Microsoft.

We are very excited to announce the upcoming* general availability of Azure Data Catalog, the latest addition to the big data and advanced analytics services included in Cortana Analytics Suite. Data Catalog is an enterprise metadata repository that enables the self-service discovery of data assets.

Data Catalog is a fully managed Azure service that describes and indexes any registered data asset and provides information on how to access it. It provides capabilities that enable any employee – be it a business analyst, data scientist, or developer – to register, discover, understand and consume data that exists across the enterprise.

With the explosive growth of enterprise data and ever more powerful storage capabilities, users end up spending more time trying to find, understand and access the data they need, rather than analyzing that data for insights. Data Catalog addresses this challenge, and is designed to close the gap between users seeking information and users producing it.

IT departments are typically the gatekeepers to enterprise data today. Because the metadata management systems in use today are often locked down, IT can become a bottleneck, inundated with a wide variety of data requests from users across the enterprise. In contrast to these traditional systems, which are typically designed for IT only, Data Catalog aims to engage the broader data ecosystem within an enterprise. Using a crowdsourced approach to metadata, Data Catalog ensures that every user with a perspective on a given data asset is empowered to annotate it. As a result, users enrich the system as they use it, adding value for the community at large. IT is relieved of the burden of handling constant requests while still maintaining control and oversight as the system evolves. Administrators can focus on governing back-end systems and on managing the policies and guidelines for requesting access. For sensitive data, IT and data owners can restrict the visibility of metadata to predefined users.

Watch this video for a short overview of the key capabilities of the Azure Data Catalog:

Since Data Catalog entered public preview, many customers have actively engaged with us to identify and prioritize key scenarios. Your feedback has helped shape the new capabilities that are now being made available. For instance:

Data Catalog supports the most business-critical data sources used across enterprises today, ranging from on-premises assets like SQL Server, Oracle, Teradata and SAP HANA to cloud-based assets like Azure Data Lake, Azure Blobs, Hive and HDFS. A full list of supported data sources can be found here.

Data Catalog now provides built-in support for requesting access to registered data assets. Once users discover an asset, they can easily understand and follow the process that its data source owners have defined for gaining access to the source system and its data.

We have included additional improvements that help data consumers more readily understand the quality and characteristics of registered data assets. These include asset-level documentation and support for data profiling, which is enabled as part of the metadata extraction process.

Newly added support for Power BI Desktop and SQL Server Data Tools has broadened the set of tools that users can launch directly from Data Catalog to connect to the data set behind a given data asset. These tools join the existing support for opening data sets in Excel and Report Manager.

A new home page has been designed to provide an optimized experience for repeat users. New capabilities include the ability to pin individual assets to the home page, to save high-value search results, and to easily filter searches by top experts and by the most frequently used data sources.

Data Catalog offers a simple, easy-to-use central portal that provides a single entry point for all metadata management activities: registering, managing, annotating, discovering and sharing data assets. The value and capabilities of Data Catalog can be extended through open REST APIs. These APIs enable easy integration with third-party data tools, allowing users to publish, discover and consume data assets from within the data tools they already use.
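As a rough illustration of how a third-party tool might call the search side of these REST APIs, the sketch below only builds the request URL and headers; it does not send anything. The base URL `https://api.azuredatacatalog.com`, the `/catalogs/{catalog}/search/search` path, the `searchTerms`/`count`/`startPage` parameters, the `api-version` value `2016-03-30`, and the Azure AD bearer-token header are assumptions modeled on the public Data Catalog REST API reference, and the helper names `build_search_url` and `build_headers` are hypothetical. Check all of them against the current documentation before relying on this.

```python
from urllib.parse import quote, urlencode

# ASSUMPTION: endpoint base and api-version modeled on the Data Catalog REST API docs.
API_BASE = "https://api.azuredatacatalog.com"
API_VERSION = "2016-03-30"


def build_search_url(catalog_name: str, search_terms: str,
                     count: int = 10, start_page: int = 1) -> str:
    """Hypothetical helper: build a GET URL for a catalog search request."""
    # urlencode escapes the query values (spaces become '+').
    query = urlencode({
        "searchTerms": search_terms,
        "count": count,
        "startPage": start_page,
        "api-version": API_VERSION,
    })
    # quote() escapes the catalog name when it is used as a path segment.
    return f"{API_BASE}/catalogs/{quote(catalog_name)}/search/search?{query}"


def build_headers(access_token: str) -> dict:
    """Hypothetical helper: requests are assumed to carry an Azure AD bearer token."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }


if __name__ == "__main__":
    print(build_search_url("DefaultCatalog", "sales orders"))
```

Keeping URL construction separate from the HTTP call lets a tool plug in whichever HTTP client and token-acquisition flow it already uses.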

In addition to being a key component of Cortana Analytics Suite, Data Catalog is also offered standalone in two editions: Free and Standard. The Free edition is an open model, allowing any user to register, enrich, understand, discover and consume data from any supported source. The Standard edition adds governance capabilities that let users take ownership of registered assets and apply asset-level authorization, so that an asset's metadata is visible only to a limited set of users.