Data Schemas and Curation – What’s the Deal?

14 Mar 2019

Data schemas are an effective way to describe the structure of your data; you can think of a schema as a data dictionary or shared standard. Schemas define what you want your data to look like and make it possible to automate the curation needed to reach that goal. Having a standard dictionary of what your data should look like, whether internal or widely used in your field, helps you transform your data according to set standards. This comes in handy especially when working with data from multiple sources. With data schemas you can also set organization-wide data standards to harmonize data and provide consistency across teams and organizations.

When the data you work with is transformed into a structured and harmonized format, your data quality increases, which leads to more accurate, reliable and reproducible results. Data quality is particularly important in healthcare and the life sciences, where results derived from that data can affect treatment decisions, for example. We all want it to be correct.

In our data curation solution Accurate, applying data schemas has been a key step in further enhancing the data curation process. Accurate helps you reduce time-consuming manual curation: you can quickly get your data into the desired format. Using schemas, you can predefine the allowed values or ranges for clinical variables, and because schemas are automatically detected, you immediately get insights into the quality of your data. This can be used to guide users through the curation process and, at the same time, to ensure that the data is consistent and of high quality.
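To make the idea of allowed values and ranges concrete, here is a minimal Python sketch of a schema check. It is not how Accurate is implemented; the variable names (`sex`, `age_years`, `smoking_status`), the allowed values, the ranges, and the helper functions are all hypothetical and chosen only for illustration.

```python
# Hypothetical schema: variable names, allowed values, and ranges
# are illustrative assumptions, not taken from any real product.
SCHEMA = {
    "sex": {"allowed": {"female", "male", "unknown"}},
    "age_years": {"range": (0, 120)},
    "smoking_status": {"allowed": {"never", "former", "current"}},
}


def validate_record(record, schema=SCHEMA):
    """Return a list of (variable, problem) tuples for one record."""
    problems = []
    for name, rule in schema.items():
        # A variable that is absent or None is reported as missing.
        if name not in record or record[name] is None:
            problems.append((name, "missing"))
            continue
        value = record[name]
        # Categorical check: the value must be one of the allowed values.
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append((name, f"value {value!r} not allowed"))
        # Numeric check: the value must be a number inside the range.
        if "range" in rule:
            low, high = rule["range"]
            is_number = isinstance(value, (int, float))
            if not is_number or not (low <= value <= high):
                problems.append((name, f"value {value!r} outside {low}-{high}"))
    return problems


def quality_summary(records, schema=SCHEMA):
    """Count how many records have a problem, per schema variable."""
    counts = {name: 0 for name in schema}
    for record in records:
        for name, _problem in validate_record(record, schema):
            counts[name] += 1
    return counts


records = [
    {"sex": "female", "age_years": 34, "smoking_status": "never"},
    {"sex": "F", "age_years": 250, "smoking_status": "former"},
    {"sex": "male", "age_years": None, "smoking_status": "current"},
]
summary = quality_summary(records)
```

A per-variable summary like `summary` is one simple way to surface data quality at a glance: variables with many problems are the ones that need curation first.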

Data schemas show their true power when combined with smart memory: Accurate checks whether the values in your new datasets match your schemas and allows you to apply existing curation rules, automating steps by remembering the user’s previous behavior. The more you use the solution, the faster the curation process becomes, as Accurate remembers your previous curation steps.
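The general idea of remembering curation steps can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not a description of Accurate's smart memory: the `CurationMemory` class, its methods, and the example values are hypothetical, and a "rule" here is just a remembered mapping from a raw value to its curated form.

```python
class CurationMemory:
    """Hypothetical store of earlier corrections, keyed by variable and raw value."""

    def __init__(self):
        # (variable, raw value) -> curated value
        self._rules = {}

    def remember(self, variable, raw_value, curated_value):
        """Record a correction the user made once."""
        self._rules[(variable, raw_value)] = curated_value

    def apply(self, record):
        """Return a curated copy of the record and the variables that changed."""
        curated = dict(record)  # leave the input record untouched
        applied = []
        for variable, value in record.items():
            key = (variable, value)
            if key in self._rules:
                curated[variable] = self._rules[key]
                applied.append(variable)
        return curated, applied


memory = CurationMemory()
memory.remember("sex", "F", "female")  # a correction made on an earlier dataset

new_record = {"sex": "F", "age_years": 34}
curated, applied = memory.apply(new_record)
```

Each remembered correction removes one manual step from every later dataset that contains the same raw value, which is why repeated use makes curation faster.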

Using data schemas in Accurate is easy, and user experience has been the focus from day one. You can design your data schemas project by project, or you can define a set of schemas that all of your data should follow, creating an automated process for the future. You will get the most value from data schemas in Accurate through repeated use, as the system remembers your previous choices, allowing you to focus on the next step: innovation and discovery!