Overview of Data Quality Handling Options in the PI System

Version 2

Created by erakovska on Nov 22, 2016 10:01 AM. Last modified by erakovska on Nov 24, 2016 1:24 PM.

We would like to hear more about your experiences with Data Quality Handling in the PI System. Please share your implementations, solutions, ideas, and challenges. Feel free to extend this summary, and please ask questions.

Data quality is quite a broad term, and the first step is to define what Bad/Good Quality means and how it should be handled. This definition must come from the customer, as it varies widely from company to company, and sometimes even between departments.

In general, we can divide Data Quality checks in multiple groups:

Hardware problems such as communication failures, power outages, or network problems: in general, any case where data are not sent from the interface

Bad-quality data from the data source due to various reasons: in general, data are collected but do not have good quality

Data gaps/missing data/tags not updating for a subset of PI tags, when we do not see new values at the expected times

Data below/over limits

Statistical quality control

Communication failures

One possibility for monitoring communication failures is the PI Performance Monitor Interface, which reads and archives data from Windows performance counters.

Data quality information can be stored in addition to the value in different ways, depending on the data source's options and the PI Interface configuration.

For example, the OPC Data Access standard specifies a set of quality flags. PI Interface for OPC DA translates the quality flags to the closest approximation in the default System digital state set.

PI Interface for OPC DA stores either the quality or the value in a PI point, whereas the OPC server returns value and quality in separate fields. If a value is good, it is stored in the point. If a value is bad, a digital state that describes the quality is stored. For questionable-quality data, you can configure the interface to treat the values as good and store them, or treat them as bad and store a digital state. You cannot configure the interface to store a bad-quality value.
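As an illustration only (not the interface's actual code), the decision described above can be sketched in Python. The OPC DA quality word encodes the quality in its top two bits (00 = Bad, 01 = Uncertain, 11 = Good); the digital-state string used here is a stand-in for the mapped state from the System digital state set:

```python
# Sketch: deciding what to store in a PI point from an OPC DA quality word.
# Quality lives in bits 6-7 of the quality byte: 00 = Bad, 01 = Uncertain
# (questionable), 11 = Good.

BAD, UNCERTAIN, GOOD = 0x00, 0x40, 0xC0

def value_to_store(value, quality, treat_questionable_as_good=False):
    """Return the value itself for good quality, or a digital-state name
    (a stand-in for the mapped System digital state) for bad quality."""
    q = quality & 0xC0
    if q == GOOD:
        return value
    if q == UNCERTAIN and treat_questionable_as_good:
        return value          # stored, but would carry the questionable flag
    return "Bad Quality"      # placeholder for the mapped digital state

value_to_store(42.0, 0xC0)          # good quality -> 42.0 is stored
value_to_store(42.0, 0x40)          # questionable treated as bad -> "Bad Quality"
value_to_store(42.0, 0x40, True)    # questionable treated as good -> 42.0
```

Note that, exactly as in the interface, there is no branch that stores a bad-quality value: bad quality always results in a digital state.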

The System digital state set is the default set for PI Data Archive. It contains over 300 pre-defined digital states that may apply to any point. Examples are Point Created (Pt Created), I/O Timeout, No Data, Archive Offline, Over Range, and Under Range. Use the Digital States tool to verify that the System digital state set is up to date.

Three flags can be set on a PI value in the archive or the snapshot: substituted, annotated and questionable.

Annotated: This flag reflects the fact that an annotation is attached to the value. This flag cannot be directly changed by the user. It changes only when an annotation is added to or removed from an event.

Questionable: This is a quality indication; it can be used according to the standards of each system. The PI System does not use this flag internally to make any decisions about an event. This flag can be set and cleared by the user via any event-editing interface (PI API, PI SDK, etc.).

Substituted: This indicates that the current value in the archive or the snapshot is different from the original value. This flag is set by the system during editing. It cannot be directly set or cleared by the user. Only changes to the value itself are considered substitution. Setting the questionable bit or adding/changing annotation are NOT changes to the value.

Instructions for accessing the Questionable flag in PI DataLink and PI ProcessBook can be found here: https://techsupport.osisoft.com/Troubleshooting/KB/3167OSI8 . Questionable flags are not directly visible in Coresight, but you can use the IsSet function in AF to determine whether a PI value is annotated, substituted, or questionable. The result can then be made visible in PI Coresight.
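A toy model of the three flags and an IsSet-style check can make the behavior concrete. The enumeration names here are illustrative, not the PI SDK's actual types:

```python
# Toy model of the substituted/annotated/questionable flags on a PI value,
# plus an IsSet-style test. Names are illustrative only.
from enum import Flag, auto

class ValueFlags(Flag):
    NONE = 0
    ANNOTATED = auto()
    SUBSTITUTED = auto()
    QUESTIONABLE = auto()

def is_set(flags, which):
    """Mirror of AF's IsSet: True if the given flag is present."""
    return bool(flags & which)

v = ValueFlags.QUESTIONABLE | ValueFlags.ANNOTATED
is_set(v, ValueFlags.QUESTIONABLE)   # True
is_set(v, ValueFlags.SUBSTITUTED)    # False
```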

In general, if the quality information will be used later in calculations, it is easier to store bad-quality data as a System digital state, i.e., to write a state such as "Bad Quality" into numeric points. The PctGood function exists as an optional parameter for most of the aggregate functions, and it treats everything from the System digital state set as "bad". Aggregate values are time-weighted such that "bad" values are not considered. The minimum PctGood parameter ensures that results are not reported based on untrustworthy data: for example, if only 20% of the values for a tag are "good", a performance equation reports "Calc Failed" rather than the result of a calculation based on such scant data. Note that the default for PctGood is 80%, but it can easily be changed when calling the functions. PctGood does not consider questionable values to be "bad", so a separate condition using the IsSet function is needed in that case.
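The PctGood idea can be sketched in a few lines of Python. This is an illustration of the concept, not OSIsoft's implementation; events are (timestamp, value) pairs where bad data appear as a digital-state string such as "Bad Quality":

```python
# Illustrative sketch of a time-weighted aggregate with a minimum PctGood:
# digital-state strings contribute no "good" time, and the calculation
# refuses to report when too little of the range is good.

def timeweighted_avg_with_pctgood(events, start, end, pctgood=80.0):
    good_time = 0.0
    weighted_sum = 0.0
    for (t, v), (t_next, _) in zip(events, events[1:] + [(end, None)]):
        duration = min(t_next, end) - max(t, start)
        if duration <= 0 or isinstance(v, str):
            continue  # digital states are "bad": no good time, no contribution
        good_time += duration
        weighted_sum += v * duration
    if 100.0 * good_time / (end - start) < pctgood:
        return "Calc Failed"
    return weighted_sum / good_time

events = [(0, 10.0), (60, "Bad Quality"), (300, 20.0)]
timeweighted_avg_with_pctgood(events, 0, 600)          # only 60% good -> "Calc Failed"
timeweighted_avg_with_pctgood(events, 0, 600, 50.0)    # threshold relaxed -> average
```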

Data Gaps/Missing Data:

There are multiple useful functions for checking whether the data rate is as expected. The documentation for all functions is linked here: Expression functions reference. A few examples:

EventCount (Find the number of events for an attribute within a specified time interval)

HasChanged (Returns True if an attribute has any event in the specified time period; otherwise returns False.)

HasValueChanged (Determine if the value of an attribute or expression has changed since last evaluated during an analysis.)

NumOfChanges (Return the number of changes in value for an attribute within a specified time range.)

SecSinceChange (Return the time in seconds since an attribute value changed.)

PctGood (Find the time percentage, over a specified range, when values for an attribute are good.)

Badval (Test a value to see if it is bad. For real and integer points, a bad value is a digital state value. For digital points, a bad value is a value outside the point's digital state set.)

IsSet (Determine if a PI value is annotated, substituted, or questionable.)

All calculations can be defined at the template level, ensuring that the same criteria are always applied to the same kind of data.
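Conceptually, the gap checks above boil down to counting events in a window and measuring the time since the last update. A hedged sketch, with illustrative function names and plain epoch-second timestamps:

```python
# Sketch of what EventCount and a SecSinceChange-style staleness test do
# conceptually. Names and thresholds are illustrative only.

def event_count(timestamps, start, end):
    """Number of events with start <= t < end (mirrors EventCount)."""
    return sum(start <= t < end for t in timestamps)

def is_stale(last_event_time, now, expected_interval_s):
    """True when the tag has not updated within the expected interval."""
    return (now - last_event_time) > expected_interval_s

stamps = [0, 30, 60, 90]
event_count(stamps, 0, 100)    # 4 events in the window
is_stale(90, 200, 60)          # no update for 110 s -> stale
```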

Data below/over limits

In AF, we have the option to set specific limit child attributes, which are automatically recognized by clients such as Coresight.

We can then use these limits further in calculations and Notifications as child attributes.

Statistical quality control

Again, there are multiple functions to use for statistical calculations. There is also a new SQC analysis type, which can be used, for example, to keep track of Outside Limits events, Outside Sigma events, etc.

PStDev

Returns the population standard deviation of the arguments. For numerical arguments the result is a number. For arguments that are time expressions (absolute times or time periods), the result is a number indicating a time period expressed in seconds.

The population standard deviation of a population $x_1, \ldots, x_n$ is

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

where $\bar{x}$ is the mean of the arguments; that is,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

SStDev

Returns the sample standard deviation of the arguments. If the arguments are numbers, the result is a number; if they are times or time periods, the result is a time period (number of seconds).

The sample standard deviation of a sample $x_1, \ldots, x_n$ is

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

where $\bar{x}$ is the sample mean; that is,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
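The two definitions above correspond directly to Python's standard-library statistics module, which can be handy for sanity-checking results: pstdev divides by n (population, like PStDev) and stdev by n - 1 (sample, like SStDev).

```python
# Population vs. sample standard deviation with the stdlib statistics module.
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
pop = statistics.pstdev(data)    # divides by n, like PStDev -> 2.0
samp = statistics.stdev(data)    # divides by n - 1, like SStDev
# samp > pop always holds: dividing by n - 1 gives the larger result
```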

StDev

Returns the time-weighted standard deviation for attribute values from the specified time range.

StDev(attname, starttime, endtime [, pctgood])

TagAvg

Returns the time-weighted average of attribute values during a specified time range.

TagAvg(attname, starttime, endtime [, pctgood])

TagMax

Returns the attribute's maximum value during the specified time range.

TagMax(attname, starttime, endtime [, pctgood])

TagMin

Returns the attribute's minimum value during the specified time range.

TagMin(attname, starttime, endtime [, pctgood])

TagMean

Returns the attribute's average value over the given time range. Note that this average is not time-weighted: each event counts equally.
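The difference between the two averages can be sketched as follows. Events are (timestamp, value) pairs and the function names are illustrative; TagAvg weights each value by how long it persisted, while TagMean counts every event equally:

```python
# Sketch contrasting a time-weighted average (like TagAvg) with a plain
# event-weighted mean (like TagMean). Events: (timestamp_seconds, value).

def tag_mean(events):
    """Plain event-weighted mean, like TagMean."""
    return sum(v for _, v in events) / len(events)

def tag_avg(events, end):
    """Time-weighted mean over [first timestamp, end], like TagAvg."""
    total = sum(v * (t_next - t)
                for (t, v), (t_next, _) in zip(events, events[1:] + [(end, None)]))
    return total / (end - events[0][0])

events = [(0, 10.0), (25, 100.0)]   # the high value persists only 5 s of 30
tag_mean(events)       # 55.0: both events count equally
tag_avg(events, 30)    # 25.0: the short-lived spike carries little weight
```

This is why a brief spike can dominate TagMean while barely moving TagAvg.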