Enabling Trusted Data within a Teradata Analytical Ecosystem

With Teradata customers running multiple platforms that make up their broader analytical ecosystems, it is more crucial than ever to have effective mechanisms not only to ingest data into those platforms, but also to provide insight into the quality of the data and self-service access to it. With the introduction of Teradata Vantage, the platform for pervasive data intelligence, the analytic opportunity has become more powerful: business users and data scientists can work across heterogeneous data to get answers that drive business outcomes.

To drive accurate business outcomes within the broader Vantage ecosystem, it is crucial that these users can easily ingest and access the data they need to analyze and, more importantly, trust that data. Teradata Kylo provides the analytical ecosystem software management platform that allows this to happen.

A key to tapping this opportunity with the Vantage ecosystem is an environment where new types of data access and analysis across multiple platforms are not only encouraged but accelerated. This means that companies can no longer afford slow, time-consuming data ingestion processes or a lack of insight into data lineage and quality.

Ingest the Best

Kylo has been designed to enable companies to easily ingest and prepare data for platforms within the Teradata Analytical Ecosystem, such as Teradata Vantage, Amazon S3, Azure Blob Store and Hadoop. The platform allows a user, in a self-service manner, to create data ingestion pipelines within the Kylo GUI. When creating these pipelines, a user can choose from numerous connectors and pipeline templates that make it easy to build the process that ingests data into the Vantage ecosystem. As part of the data ingestion process, cleansing and validation rules can be set up to improve data quality and accuracy. A user can easily choose simple standardization rules, such as credit card masking, and validation rules, such as ‘is email address’ or ‘is IP address’, within their pipelines. Kylo’s plug-in architecture also enables customers to build custom standardization and validation rules to handle their unique, company-specific business rules.
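To make the idea of standardization and validation rules concrete, here is a minimal Python sketch of what such rules do to incoming rows. It is an illustration only and does not use Kylo's plug-in API: the function names, the row fields (`card`, `email`, `ip`) and the simplified email pattern are all assumptions made for this example.

```python
import ipaddress
import re

# Hypothetical rules for illustration; this is not Kylo's API.
# The email pattern is deliberately simple and not a full RFC 5322 check.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def mask_credit_card(value: str) -> str:
    """Standardization rule: keep the last four digits, mask the rest."""
    digits = [c for c in value if c.isdigit()]
    if len(digits) < 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def is_email_address(value: str) -> bool:
    """Validation rule: the whole value must look like an email address."""
    return EMAIL_PATTERN.fullmatch(value) is not None


def is_ip_address(value: str) -> bool:
    """Validation rule: the value must parse as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def ingest(rows):
    """Mask card numbers, then split rows into valid and invalid lists."""
    valid, invalid = [], []
    for row in rows:
        cleaned = dict(row, card=mask_credit_card(row["card"]))
        if is_email_address(row["email"]) and is_ip_address(row["ip"]):
            valid.append(cleaned)
        else:
            invalid.append(cleaned)
    return valid, invalid
```

Keeping invalid rows in a separate list, rather than dropping them, mirrors the goal of the article: users can see which records failed a rule and why the remaining data can be trusted.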

The ease of data access, ingestion, and validation allows business users, and especially data scientists, to quickly get trusted data from new sources into Vantage for analysis. It also enables them to use the power of Vantage and its multiple processing engines to analyze new and changing data within the ecosystem.

What’s in your Data?

Another key component that enables trusted data within Vantage and the broader analytical environment is Kylo's ability to track data lineage and to collect and maintain the metadata around data ingestion processes. Because the data lineage is maintained, users know where the data originated, what data was ingested and what changes were made to that data during the ingestion process. This technical metadata gives data stewards and data scientists the ability to quickly catalog, discover and qualify data for new analytical processes across the broad Vantage ecosystem.
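As a rough illustration of the three lineage questions above (where the data came from, what was ingested, and what was changed), the following Python sketch models a single lineage record. The class and field names are hypothetical and are not taken from Kylo's metadata model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageRecord:
    """Hypothetical lineage entry for one ingestion run (not Kylo's schema)."""

    source: str                 # where the data originated
    destination: str            # where it was ingested
    rows_ingested: int = 0      # how much data was ingested
    changes: List[str] = field(default_factory=list)  # what was changed

    def record_change(self, description: str) -> None:
        """Append a human-readable description of one transformation."""
        self.changes.append(description)

    def summary(self) -> str:
        """Return a one-line description of the run for a data catalog."""
        steps = "; ".join(self.changes) if self.changes else "no changes"
        return (
            f"{self.source} -> {self.destination}: "
            f"{self.rows_ingested} rows ({steps})"
        )
```

A record like this, stored per ingestion run, is the kind of technical metadata a data steward could search when qualifying a data set for a new analysis.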

Angling for some Wrangling

To gain greater insight into the quality and type of data, it is also important to be able to profile the data that exists in the platforms across the Teradata analytical ecosystem. Out of the box, Kylo delivers wrangling capabilities that provide data scientists, business users and data architects with an easy mechanism to investigate and profile data. Providing this self-service profiling capability further promotes the goal of trusted data within the overall analytical ecosystem. Through Kylo, users can analyze data with profiling functions, including the data range of columns, the frequency of values for both numeric and categorical data, and other functions that help users understand the distribution and shape of the data.
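The profiling functions mentioned above, such as the range of a column and the frequency of its values, can be sketched in a few lines of plain Python. This is only an illustration of the idea; the `profile_column` helper and its output keys are assumptions for this example and are not part of Kylo.

```python
from collections import Counter


def profile_column(values, top_n=3):
    """Compute a small profile of one column (hypothetical helper).

    Missing values are represented as None and are excluded from the
    range and frequency calculations.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        # Most frequent values first; works for numeric and categorical data.
        "top_values": Counter(present).most_common(top_n),
    }
```

For example, `profile_column([3, 1, None, 3])` reports four values, one of them missing, a range of 1 to 3, and 3 as the most frequent value.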

Kylo also provides data scientists with a library of over 600 data manipulation functions, derived from Spark. This allows for complex data transformations, feature engineering and aggregation, so that data can move from raw and unstructured to a ‘ready for analytics’ state. Wrangling operations, once finalized by a data scientist, can be saved as a repeatable data feed, so that new data can be automatically transformed and made ready for analytical processing, modeling, and reporting in the Vantage platform. The visual interface makes many of these functions available via column action menus, so that data scientists can quickly impute missing values, perform common transforms such as one-hot encoding, and immediately view the results.
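For readers unfamiliar with the two transforms named above, the following Python sketch shows what imputing missing values and one-hot encoding do to a column. It uses plain Python rather than the Spark-derived functions Kylo exposes, and it assumes mean imputation as the fill strategy; the helper names are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the present values."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to compute a mean from
    fill = mean(present)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]
```

Applied to the column `["red", "blue", "red"]`, `one_hot` produces the indicator fields `is_blue` and `is_red` for each row, which is the form many modeling tools expect for categorical inputs.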

Answers enabled with Trusted Data...

The goal of Teradata is to enable customers to deliver answers while leveraging the power of the Teradata analytical ecosystem. This goal is enabled by the trusted data that Kylo can provide. We continually focus on the end-user experience of running analytics to get answers that enable action. As companies’ analytical environments grow and they look to operationalize the insights gained by data scientists, users can be confident that the data for the new workloads will be of the highest quality.

By providing insight into the quality of the data and self-service access to it, Kylo gives you the tools to enable a strategy that is user- and data-centric. Enable the full power of your Teradata analytical ecosystem with Teradata Vantage and Teradata Kylo.

(Author):

Mark Shainman

Mark Shainman is the Program Manager for Teradata’s ecosystem management software Teradata IntelliSphere, as well as Teradata Vantage and competitive programs. As part of Teradata’s analytical ecosystem team, Mark looks after the marketing, education, promotion and strategy surrounding the uptake and usage of QueryGrid, AppCenter, Data Lab, Ecosystem Manager, Multi-System Viewpoint, Data Stream Architecture, Unity and Data Mover, and Teradata Vantage. Mark also continues to be the global program manager for Teradata’s competitive programs, covering Oracle, IBM, Netezza and SQL Server migrations as well as cloud migrations and data mart consolidations. He has managed the global aspects of technology, strategy, positioning and sales support surrounding numerous products and programs at Teradata. Prior to joining Teradata, he was a senior research analyst for META Group specializing in database management systems for both online transaction processing and decision-support architectures. Shainman has advised clients on a wide spectrum of database issues, including total-cost-of-ownership analysis, data mart consolidation, disaster recovery, replication, and security, while assisting with core database product comparison and selection.