In the previous post, we covered Holistics’ features in the “Data Imports”product that enables the moving of data from production databases and external applications into your data warehouse. In this final post as part of this “Combining Data Sources” series, we will cover the importance of transforming data sourced from different applications.

An ETL product becomes useful under the following scenarios:

Joining information coming from different sources

Standardizing and Cleaning information

Pre-processing of reporting data for performance optimization

Joining Data

The data from different sources might have populated into different schemas in a data warehouse, and a single SQL snippet can create a “Transform” and “Load” job, as shown below. As the processing is based on SQL, making changes to accommodate newer business requirements is as easy as modifying the conditions, filters, and columns returned by the SQL.

Standardizing and Cleaning Information

Data captured by applications and systems can differ based on how each of them handle data; for instance, in applications capturing a user’s location, some applications may require input from the user whereas others will pick up the location based on the user’s GPS. The former could be very noisy because users may input data typos or use short or alternate versions of a city name. Thus, SQL column operations can be leveraged to clean data from different sources, and to enforce a standard for some columns. An example is as below:

Preprocessing Reporting Data

Is this spinning image a common aspect of your reporting solution?

Most BI Reporting solutions start to slow down, and users typically start to complain about the performance. There are many things that affect how quickly data is returned by the underlying database/data warehouse, such as:

Lack of scalability of the underlying data model

Concurrency of multiple users

Data growth — Volume and Variety

Scalability of Data Model

A common problem that companies face is that the data warehouse may not be dimensionally modeled, and that the source data is stored in the original, operational form. Thus, a report will involve a query including multiple joins and CTEs. The image below illustrates the complicated nature of a relational model.

Concurrency

When the number of users accessing a dashboard or a reporting solution starts to increase, the query might be executed each time the report is run. Database administrators usually enforce a maximum query execution limit, thereby forcing some users to be on an invisible queue, causing delays.

Data Growth

Over the course of a business, the volume (Rows) and variety (Tables) of data will increase, making your reporting SQLs both slow and complex. Dashboards and Reports using such queries will suffer from some issues.

With Holistics’ “Data Transforms” product, you can seamlessly move the report’s SQL to an “ETL” process to create a table that pre-populates the information consumed in the report. For instance, in a Hotel e-commerce context, if there were reports summarizing “Reservation Information” or “Daily Traffic Information”, this report will run slower as the volume of data increases. Furthermore, the job can be scheduled to run during off-hours so that the data is available during daytime for your reporting users.

Now, the Data Reporting module rendering the visualization will perform optimally as the “heavy-lifting” has been completed behind the scenes. Furthermore, the ability to create tables and load them with information lets the team to build a well-modeled datamart or data warehouse, thereby addressing issues presented by a lack of a data model.

Conclusion

When your organization’s BI needs mature, you can continue to leverage Holistics to build a strategic datawarehouse that serves multiple stakeholders. Creating tactical datamarts might be the initial solution, and transitioning to a full-fledged datawarehouse will be easy with Holistics’ Data Preparation suite, consisting of Data Imports and Data Transforms.