Data Validation During ETL Process

Data warehousing is no longer new; it is a mature, widely applied technology backed by extensive technical literature and practical experience, and no serious organization or enterprise can afford to overlook it. The processes that run beneath the surface presented to users are rarely a place for clean solutions, because in most cases there are no IDEs equipped with the convenient extensions that are so common in today’s application development. This does not mean that there are no tools for getting the job done; rather, the available ones are mostly expensive, complex, hard-to-use software products, and Data Warehouse developers cannot avoid doing some special scripting by hand every now and then. The ETL process accounts for most of the back-room processes, and the Cleaning phase, which deals with the verification, cleansing and validation of the data, is part of it. The focus of my work was to create a metadata-driven framework that fits tightly into this phase. To make operating the framework, and communicating and reaching an understanding with business users, as painless as possible, I present a web application as part of the solution, which combines a study of the literature with real-life experience.