Given a series of time-stamped numerical values, the Outlier operator identifies values in a sequence that seem unexpected. Such values can signal an alert or violation, for example, in a scheduled search.

To do this, the Outlier operator tracks the moving average and standard deviation of a numerical field. An outlier is identified based on a specified threshold of standard deviations around the expected value. If a data point is outside the threshold, it is considered to be an outlier.
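The idea described above can be sketched in a few lines of Python. This is only an illustration of a moving-average band, not the operator's actual implementation. The function name flag_outliers, the parameters window and threshold, their default values, and the choice to build the baseline from the preceding points only are all assumptions made for this sketch.

```python
from statistics import mean, pstdev

def flag_outliers(values, window=5, threshold=3.0):
    """Return one dict per point with the expected value and a violation flag.

    Assumption: the baseline for each point is the mean and (population)
    standard deviation of the `window` points that precede it.
    """
    results = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            # Not enough history to form a baseline yet, so never flag.
            results.append({"value": value, "expected": None, "violation": 0})
            continue
        expected = mean(history)
        spread = pstdev(history)
        lower = expected - threshold * spread
        upper = expected + threshold * spread
        # A point outside the band [lower, upper] counts as an outlier.
        violation = int(value < lower or value > upper)
        results.append({"value": value, "expected": expected,
                        "lower": lower, "upper": upper,
                        "violation": violation})
    return results

counts = [10, 11, 9, 10, 12, 10, 50, 11]
flags = [r["violation"] for r in flag_outliers(counts)]
print(flags)  # [0, 0, 0, 0, 0, 0, 1, 0]
```

Only the spike to 50 falls outside three standard deviations of its preceding five values. The point after it is not flagged, because the spike itself has widened the baseline by then.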

Rules

Limitations

Because the most recent time bucket in a query may have incomplete data, the Outlier operator ignores it. Consequently, an alert that is set to trigger when <field>_violation changes to 1 triggers one timeslice later.
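The one-timeslice delay can be illustrated with a small Python sketch. The function violations_to_alert and the sample buckets are hypothetical. The sketch assumes only that the newest bucket is skipped on every run.

```python
def violations_to_alert(buckets):
    """buckets: list of (timeslice, violation) pairs in time order.

    Assumption: the newest bucket may be incomplete, so it is skipped
    and only the earlier buckets are checked for violations.
    """
    complete = buckets[:-1]  # drop the most recent bucket
    return [ts for ts, violation in complete if violation == 1]

# The spike happens in the 10:15 bucket.
run_at_1015 = [("10:00", 0), ("10:15", 1)]
run_at_1030 = [("10:00", 0), ("10:15", 1), ("10:30", 0)]

print(violations_to_alert(run_at_1015))  # [] because 10:15 is still the newest bucket
print(violations_to_alert(run_at_1030))  # ['10:15'], reported one timeslice later
```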

Examples

IIS logs

Run the following query to find outlier values in IIS logs over the last 6 hours.

Use an additional “group by” clause to find outliers for multiple values of _sourceHost.

This way, you can run outlier analysis separately for each value of _sourceHost, as shown.

This example produces only an aggregation table, not a chart, but the indicator and violation fields correctly reflect the processing for each _sourceHost.

Alert on an outlier

This query counts the number of errors over time and sends an alert when an outlier is detected. When an outlier is detected, the value of <field>_violation is set to 1. In the example below, <field> is delta, so the field to watch is delta_violation. By creating a Real Time Alert that sends a notification if more than 0 results are found, you can alert on an outlier.

"error"
| timeslice by 15m
| count as today by _timeslice
| compare timeshift -1d as vs_yesterday
// create a delta field that represents the difference between historical and current data
| (today - today_vs_yesterday) as delta
// use an Outlier to statistically monitor spikes or dips in the delta
| outlier delta

Multidimensional Outlier Detection

The Outlier operator supports multidimensional or multi-time series detection. Multidimensional outlier detection is useful when you want to monitor the behavior of each user, server, application feature, or other single “entity”, rather than some aggregation across all entities.

For example, you could detect failed logins by user. To do so, you want to know whether any individual user account has experienced an unusual number of failed logins, not whether the average or total number of failed logins across all users has spiked. The latter may be useful, but with hundreds or thousands of users (entities), a spike in failed logins for one specific user can get lost in the noise of a “normal” total number of failed logins.

Other examples include:

Detecting anomalies while tracking page faults, disk operation, or CPU utilization for all the nodes in a cluster simultaneously.

Monitoring the performance of every workstation simultaneously, without the need to build an outlier report for each one.

Monitoring failed image uploads for every user of an application (not total failed uploads across all users).

If you have used the outlier operator, it is easy to create a multidimensional outlier operation. Just add by <dimension> to the end of the query.

For example, a query that ends with outlier _count by _sourceHost evaluates many time series, one for each _sourceHost.

You can display the raw results of a multidimensional time series in a table chart, but currently other chart options are not available.

In the following table chart, a value of 1 in the _count_violation column indicates that the data point corresponding to that timeslice is an outlier.
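The per-dimension behavior can be pictured with the following Python sketch, which keeps a separate baseline for each key and emits one violation flag per row. It is an illustration under stated assumptions, not the operator's implementation. The function outlier_by, its parameters, and the host names and counts are all made up for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

def outlier_by(rows, window=3, threshold=3.0):
    """rows: (timeslice, key, count) tuples sorted by timeslice.

    Returns (timeslice, key, count, violation) tuples. Each key is judged
    only against its own preceding `window` counts (an assumption).
    """
    history = defaultdict(list)
    out = []
    for timeslice, key, count in rows:
        past = history[key][-window:]
        violation = 0
        if len(past) == window:
            expected, spread = mean(past), pstdev(past)
            violation = int(abs(count - expected) > threshold * spread)
        out.append((timeslice, key, count, violation))
        history[key].append(count)
    return out

rows = [
    (1, "host-a", 100), (1, "host-b", 5),
    (2, "host-a", 102), (2, "host-b", 6),
    (3, "host-a", 98),  (3, "host-b", 5),
    (4, "host-a", 101), (4, "host-b", 40),
]
flagged = [(t, k) for t, k, _, v in outlier_by(rows) if v == 1]
print(flagged)  # [(4, 'host-b')]
```

The jump on host-b is flagged because it is compared with host-b's own history, even though host-a handles far more traffic in every timeslice.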

Alerts Based on Multidimensional Outlier Results

To create an alert based on the multi-series outlier table above, extract _count_violation.

This way, you won’t need to build an alert for each series of data (each _sourceHost in the previous example), and you can automatically monitor a dynamic series for deviating behavior.

The following example query allows you to monitor when application users experience failures. It monitors all user accounts by unique user ID, and applies outlier to the number of “fail” messages that occur for each user account:

Once you have run the query, you can click Save As to create a Scheduled Search and configure it to send an alert when any user account experiences an unusual number of failures, or any other event that you want to monitor for in each series of data.

To visualize your results, on the Search page, you can create a column chart, then change the stacking property to normal to display alerts by unique user_id (the multidimensional aspect).

Chart Multidimensional Outlier Results

This section provides two examples of how to display multidimensional outlier results in charts.

Example 1: Outlier Distribution Across Time

In this example, we’ll extract _count_violation from the multi-series outlier table and display it. This shows the distribution of outliers among the various time series.

When you select a line chart, this example will display something like the following:

Example 2: Outlier Ranking

This example query uses _count_error (the distance from the expected value for that timeslice) and the standard deviation of the baseline to determine how many standard deviations a data point is from its expected value.

This way, you can display outliers visually in terms of deviation from the expected value.

When you select a line chart, this example will display something like the following:

In the line chart, you can see which series is producing the most “deviating” outliers.

This approach effectively displays the severity of the outlier, because the spikes represent the magnitude (how many standard deviations the value is from the mean) in one time-series compared to another time-series.
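The severity measure used in Example 2 amounts to dividing the distance from the expected value by the baseline standard deviation. The Python sketch below shows that arithmetic on made-up data. The function deviation_score and the host names and values are hypothetical, and the baseline here is simply the mean and population standard deviation of the listed history.

```python
from statistics import mean, pstdev

def deviation_score(history, value):
    """How many baseline standard deviations `value` is from the expected value."""
    expected = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A flat baseline has no spread; any change is infinitely "deviating".
        return 0.0 if value == expected else float("inf")
    return abs(value - expected) / spread

# (history, latest value) per series; the numbers are illustrative only.
series = {
    "host-a": ([100, 102, 98, 100], 110),
    "host-b": ([5, 6, 5, 6], 10),
}
scores = {host: deviation_score(hist, latest)
          for host, (hist, latest) in series.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['host-b', 'host-a']
```

Here host-b ranks first even though its absolute error (4.5) is smaller than host-a's (10), because its baseline is much steadier. Ranking by deviation rather than by raw error makes series of different scales comparable.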
