The challenge

Get the right data for the best predictions

Companies rely on data analysis to identify patterns, make correlations, gain insights and make better predictions. Often, "big data" is seen as a solution for making these predictions. But collecting a vast quantity of data doesn't necessarily yield greater insights.

When companies don't know how much information the data they are collecting contains, they can waste time and money on data that doesn't help their predictions.

Our response

With sparse data, every point counts

We developed Determinant – a software platform that makes accurate predictions by estimating the information content of new data before it is acquired, so that each new data point improves the predictions as much as possible.

Using a technique called active learning, Determinant models its own confidence and how that confidence would improve with more data, allowing it to effectively manage sparse data problems.

Determinant asks users to acquire expensive data iteratively. As it ingests each new point, it tells the user what to measure next to gain the most new information, so that users can decide which data they need for the most accurate outcome.
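The iterative loop described above can be illustrated with a small sketch. This is not Determinant's implementation: it assumes a one-dimensional Gaussian process with a squared-exponential kernel, and the names rbf, posterior_variance, next_measurement and run_demo, the length scale, the noise level and the candidate sites are all hypothetical. At each round it picks the candidate where the model is least certain.

```python
import math

LENGTH_SCALE = 2.0  # assumed kernel length scale (illustrative)
NOISE = 1e-3        # assumed measurement noise variance (illustrative)

def rbf(a, b):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / LENGTH_SCALE) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def posterior_variance(x_new, measured):
    """Predictive variance at x_new given the inputs already measured."""
    if not measured:
        return rbf(x_new, x_new)
    K = [[rbf(a, b) + (NOISE if i == j else 0.0)
          for j, b in enumerate(measured)]
         for i, a in enumerate(measured)]
    k = [rbf(x_new, a) for a in measured]
    w = solve(K, k)
    return rbf(x_new, x_new) - sum(ki * wi for ki, wi in zip(k, w))

def next_measurement(candidates, measured):
    """Return the candidate with the largest predictive variance."""
    return max(candidates, key=lambda x: posterior_variance(x, measured))

def run_demo(rounds=3):
    candidates = [float(i) for i in range(11)]  # hypothetical sites 0..10
    measured = [0.0, 10.0]                      # sites already sampled
    picks = []
    for _ in range(rounds):
        x = next_measurement(candidates, measured)
        picks.append(x)
        measured.append(x)  # in practice the user would take the measurement here
    return picks
```

In this simplified model the predictive variance depends only on where measurements were taken, not on the measured values, so the sketch never needs the values themselves. A fuller system would also refit its model to each new value before choosing the next point.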

Determinant also knows when more data will give diminishing returns. This ensures that every dollar used for data is money well spent.
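One simple way to express a diminishing-returns check is to compare the latest drop in model uncertainty against what the sample cost. The sketch below is an illustration under assumptions, not Determinant's actual rule: the function should_stop, its parameters and the threshold are hypothetical.

```python
def should_stop(uncertainty_history, cost_per_sample, min_gain_per_dollar):
    """Return True when the latest sample bought too little uncertainty reduction.

    uncertainty_history: model uncertainty after each acquisition, oldest first.
    cost_per_sample: assumed cost of acquiring one more data point.
    min_gain_per_dollar: assumed smallest acceptable reduction per dollar spent.
    """
    if len(uncertainty_history) < 2:
        return False  # not enough history to judge the trend
    gain = uncertainty_history[-2] - uncertainty_history[-1]
    return gain / cost_per_sample < min_gain_per_dollar
```

For example, with a cost of 100 per sample and a threshold of 0.001, a drop in uncertainty from 1.0 to 0.5 justifies continuing, while a drop from 0.5 to 0.45 does not.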

Surveys and demographic sampling

Determinant has been used to predict the national sales performance of a product. It did this by learning a relationship between local sales data sets and nationwide demographics acquired from sources such as the Census.

Crucially, Determinant can also suggest where additional market surveys should be acquired. This allows it to improve its accuracy in a manner that complies with a wide range of constraints, such as finite survey budgets, time limitations and risk reduction.

A map of Australian Local Government Areas (LGAs) predicting the outcome of a survey taken only in Victoria. The prediction is a measure of average job satisfaction (purple is higher), and uncertain predictions are represented with transparency.

Determinant can be used to select which local government area to sample next to best reduce uncertainty of the model.


Geographic sampling

Understanding the chemical composition of soil is important to agricultural and mineral industries for crop yield and resource estimation, respectively.

Taking a sample at the brightest points improves the predictive model the most. The input data is courtesy of Geoscience Australia, who collaborated with Data61 on this work.


Using Determinant, we analysed dense coverage satellite imagery to learn a relationship between hyperspectral reflectance and pre-existing soil samples in order to predict the chemical composition across the entire locality.

The traditional method for soil composition data acquisition often involves arbitrary uniform grid surveys and random samples spread across the region. But by using Determinant we were able to identify the optimal locations for future soil sample collection that would be the most informative to the model. This meant that considerably fewer soil samples were needed to construct an accurate predictive map of the region, saving both time and money.

The results

Higher accuracy with smaller data

The technology can be applied across a range of industries, from financial services and mining to government services and agriculture.

Quantifying confidence in predictions through probability is critical to making decisions in data-sparse environments. Determinant uses state-of-the-art Bayesian machine learning algorithms to give robust estimates of its own confidence in predictions. This enables users to make better decisions with explicit risk/reward trade-offs.
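To show why a confidence estimate matters for a risk/reward decision, the sketch below assumes a prediction is summarised as a Gaussian with a mean and a standard deviation. The functions prob_above and expected_payoff, and the reward and loss figures in the usage note, are hypothetical and not part of Determinant.

```python
import math

def prob_above(mean, std, threshold):
    """Probability that a Gaussian prediction exceeds the threshold."""
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 - math.erf(z))

def expected_payoff(mean, std, threshold, reward, loss):
    """Expected payoff of acting: gain reward if the true value is above
    the threshold, pay loss otherwise."""
    p = prob_above(mean, std, threshold)
    return p * reward - (1.0 - p) * loss
```

Two predictions with the same mean of 12 against a threshold of 10 lead to different decisions once uncertainty is included. With a reward of 100 and a loss of 300, a standard deviation of 1 gives a positive expected payoff, while a standard deviation of 5 gives a negative one.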

Determinant is also cloud-hosted, with a well-documented REST API for managing data, models and predictions. With support for a variety of open data formats, Determinant is simple to integrate with existing data pipelines.