Machine Learning Techniques in Reading Tracking (Research Series)

As one of the leading providers of AI and machine learning consulting in our local market, our team partnered with Beehiveor R&D Labs on a joint research project: a reading assistant system (RAS). We were inspired by fitness apps and wrist trackers, which use accelerometers, heart rate sensors, GPS data, and other signals to help people exercise more effectively. By the same logic, if we can track gaze movement, why not use that data to support visual activities? We chose reading as the primary research objective, since it is an activity most people perform daily.

People encounter certain problems while reading, and everyone resolves them in their own way: rereading difficult passages, googling unknown words, writing down details to remember. What if this could be automated? For instance, a system could track a person's reading, evaluate their speed, distinguish reading gaze movements from other activities, and annotate hard-to-read passages. These features can be extremely useful for people who have to work through large amounts of text every day.

The definition

Before we get into the research process and findings, let's define what a Reading Assistant System (RAS) is and why it matters. A RAS is an AI- and gaze-tracking-based system for various kinds of reading analysis. It can be applied in many settings, with well-documented applications in education, medicine, HR, marketing, and other areas.

Here’s how the technology behaves in practice. The RAS automatically tracks the movement of your gaze and matches it with the text on the screen. This makes it possible to process reading patterns in real time and store all the metadata for further analysis.

Thanks to neural networks trained with deep learning, a RAS can identify with up to 96% accuracy whether a person is reading at a given moment.

The hypothesis

The initial purpose of our research was threefold:

To check a high accuracy hypothesis

To create a real-time model

To further research gaze patterns

Our team of three carried out the research over two months as part of DataRoot University activities. The primary task was to create a program for analyzing human visual activity from gaze movement.

Dataset gathering

For tracking human activity we used the GazePoint eye tracker. After calibration, the tool provided gaze coordinates with a 1–1.5° angular error. During each session, the GazePoint Analysis app recorded face and screen video along with tabular data on gaze movement.
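To give a sense of the raw material, here is a minimal sketch of parsing a GazePoint-style tabular export. The sample text and the reduced column set (FPOGX, FPOGY, FPOGV, BPOGV, FPOGID) are illustrative; the real export contains many more fields.

```python
import csv
import io

# Illustrative two-row excerpt of a GazePoint-style export.
# Column names match the fields referenced later in this article;
# the values are made up for demonstration.
SAMPLE = """FPOGX,FPOGY,FPOGV,BPOGV,FPOGID
0.41,0.22,1,1,7
0.43,0.22,1,1,7
"""

def load_gaze(text):
    """Parse the CSV export into a list of dicts with numeric values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({key: float(value) for key, value in record.items()})
    return rows

samples = load_gaze(SAMPLE)
```

From here on, each row is one gaze sample, and FPOGID groups samples that belong to the same fixation.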

The whole dataset consists of two parts: 51 reading and 85 non-reading time series. Each participant performed a fixed series of actions within the scope of our research.

Dataset preprocessing and feature selection

In the course of our research, we found that tracked coordinates are never ideal. Blinking, head movements, variable lighting: all of these factors interrupt or corrupt the data flow.

We figured additional processing could alleviate the situation, and we picked smoothing of the gaze trajectory as the primary option, even though this approach has limits. Smoothing eliminates one important feature: microsaccades. A saccade is a quick, simultaneous movement of both eyes between two fixation points. A microsaccade is a movement within a single fixation that reveals how users hold their gaze. Though smoothing is not ideal where saccades are concerned, it does help with approximate word detection.
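The article does not name the exact filter, so as a minimal sketch, assume a centered moving average over one gaze coordinate channel:

```python
import numpy as np

def smooth_gaze(coords, window=5):
    """Smooth a 1-D gaze coordinate series with a centered moving average.

    Note the trade-off discussed above: any averaging filter of this
    kind also suppresses microsaccades.
    """
    coords = np.asarray(coords, dtype=float)
    kernel = np.ones(window) / window
    # mode="same" keeps the output the same length as the input
    return np.convolve(coords, kernel, mode="same")

noisy_x = [0.10, 0.12, 0.55, 0.13, 0.11, 0.12]  # one blink-like spike
smoothed = smooth_gaze(noisy_x, window=3)
```

The spike at index 2 is pulled down toward its neighbors, which is exactly why word-level position estimates improve while within-fixation detail is lost.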

Here’s what filtering meant for our research purposes:

Filtering by BPOGV, FPOGV

Filtering only screen gaze movements
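The two filtering steps above can be sketched as follows, assuming normalized screen coordinates in [0, 1] and GazePoint-style validity flags (1 = valid, 0 = invalid); the record layout is illustrative, not the exact export schema:

```python
def filter_gaze(records):
    """Keep only samples where both point-of-gaze validity flags are set
    and the fixation actually lands on the screen."""
    kept = []
    for r in records:
        if r["BPOGV"] != 1 or r["FPOGV"] != 1:
            continue  # drop blinks / lost-tracking samples
        if not (0.0 <= r["FPOGX"] <= 1.0 and 0.0 <= r["FPOGY"] <= 1.0):
            continue  # drop off-screen gaze points
        kept.append(r)
    return kept

sample = [
    {"BPOGV": 1, "FPOGV": 1, "FPOGX": 0.50, "FPOGY": 0.40},  # valid
    {"BPOGV": 0, "FPOGV": 1, "FPOGX": 0.50, "FPOGY": 0.40},  # blink
    {"BPOGV": 1, "FPOGV": 1, "FPOGX": 1.20, "FPOGY": 0.40},  # off-screen
]
kept = filter_gaze(sample)
```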

To make the dataset easier to manipulate and to train/test models, we selected a window 100 observations wide (the average time needed to read a single plain line on A4 paper). With a 90% overlap between windows, this split the whole dataset into 24,568 reading and 14,288 non-reading time series of 100 observations each.

Reading/non-reading classification

Our team used three main techniques for time series classification, described in detail below.

Time series analysis and manual feature extraction with an MLP. We created three feature groups:

Linear trend detection for FPOGX

MSE after linear approximation of FPOGX. In other words, the error left over after fitting a line to the x-axis trajectory.
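These two features can be computed with an ordinary least-squares line fit. A minimal sketch: during left-to-right reading, FPOGX drifts with a positive slope and small residual error.

```python
import numpy as np

def linear_trend_features(fpogx):
    """Two of the hand-crafted features: the slope of a fitted line
    (linear trend) and the MSE of the residuals after that fit."""
    fpogx = np.asarray(fpogx, dtype=float)
    t = np.arange(len(fpogx))
    slope, intercept = np.polyfit(t, fpogx, 1)
    mse = float(np.mean((fpogx - (slope * t + intercept)) ** 2))
    return slope, mse

# A clean left-to-right sweep: strong positive trend, near-zero MSE.
slope, mse = linear_trend_features(np.linspace(0.1, 0.9, 100))
```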

Feature extraction took us a week, and a multi-layer perceptron with one hidden layer achieved 85% accuracy on the test subset. Improving this technique further would require additional manual feature extraction; in our opinion, the selected features are not informative enough and do not describe the data well.

Recurrent neural network with LSTM. This approach gave us even less accurate results, around 63%. Adding extra LSTM layers did not help either. A possible reason is the small dimensionality of the input data and the inability of the LSTM to accumulate global information about the time series.

Convolutional neural network. We used the dX, dY features as input. After some tuning we found an optimal architecture with 113,006 trainable parameters.
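The exact layer stack is not reproduced here, so as a minimal sketch of the key ideas, here is the dX/dY input representation and the basic 1-D convolution the classifier stacks and tunes, in plain NumPy (shapes and kernel sizes are illustrative assumptions):

```python
import numpy as np

def to_dxdy(fpogx, fpogy):
    """Convert absolute gaze coordinates into the dX, dY difference
    channels used as CNN input: shape (window_length - 1, 2)."""
    dx = np.diff(np.asarray(fpogx, dtype=float))
    dy = np.diff(np.asarray(fpogy, dtype=float))
    return np.stack([dx, dy], axis=-1)

def conv1d(signal, kernels):
    """Valid 1-D convolution of a (T, C) signal with (K, C, F) kernels:
    the basic building block of the classifier."""
    T, C = signal.shape
    K, _, F = kernels.shape
    out = np.empty((T - K + 1, F))
    for t in range(T - K + 1):
        out[t] = np.tensordot(signal[t:t + K], kernels, axes=([0, 1], [0, 1]))
    return out

window = to_dxdy(np.linspace(0, 1, 100), np.zeros(100))  # (99, 2)
feats = conv1d(window, np.random.randn(5, 2, 8))         # (95, 8)
```

In practice a framework such as Keras or PyTorch would supply the convolution, pooling, and training loop; the sketch only shows why differences of coordinates, rather than absolute positions, make a natural translation-invariant input.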

This model produced a 96% accuracy rate on the test subset and was subsequently chosen as the base model for further research.

Reading patterns clustering

Our main task here was to classify every fixation group (observations grouped by FPOGID) as one of three main patterns: saccade, sweep, or regression.

A major obstacle we encountered at this stage was dataset labeling, since data recorded at 60 Hz is inherently hard to label; this turned out to be a problem for clustering as well. Among the minor issues we tackled were the high similarity between regressions and sweeps, and the fact that fixations during scrolling turned out to be outliers. To exclude fixations during scrolling we used the reading classification algorithm.

The entire dataset was reduced from points to saccades only. To obtain saccade data we grouped points by fixation ID (FPOGID) and took only the last observation from every group. All of the identified saccades were then split into min/max values with a naive algorithm along the horizontal axis, and the minimal saccades were divided into sweep and regression groups. We used K-means clustering on three basic saccade features to achieve the required results.
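As a minimal sketch of the clustering step, here is Lloyd's K-means on a tiny feature matrix; the three features (horizontal displacement, vertical displacement, duration) are an assumption for illustration, and in practice scikit-learn's KMeans would do the same job:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm: assign points to the nearest center,
    then move each center to the mean of its cluster."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=-1)
        labels = np.argmin(dists, axis=1)
        centers = np.array([points[labels == i].mean(axis=0)
                            for i in range(k)])
    return labels, centers

# Hypothetical saccade features: (dX, dY, duration in seconds).
saccades = np.array([
    [0.05, 0.00, 0.03],   # small forward saccade
    [-0.80, 0.10, 0.05],  # sweep back to the start of the next line
    [-0.10, 0.00, 0.04],  # short regression
])
labels, centers = kmeans(saccades, 3)
```

The large negative dX of sweeps versus the small negative dX of regressions is what makes these two patterns separable, and also why they can be "highly similar" when a regression spans most of a line.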

Results and challenges

The outcome of our project is a machine learning model that can predict with 97% accuracy whether the user was reading during a 1.6-second window of recording. Major findings in the course of the research include:

an algorithm that, relying on the predictions of the previous ML model, can count gaze movements (regressions, sweeps, and saccades) and calculate relative reading speed;

an algorithm that surfaces a reader’s points of interest, providing a weight coefficient for each word that may represent its importance to the reader.
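The first of these findings can be sketched very roughly: assuming roughly one forward saccade per word (an assumption for illustration, not the article's calibration), forward saccades per minute approximate words per minute.

```python
from collections import Counter

def relative_reading_speed(pattern_labels, duration_s):
    """Rough sketch: count the per-saccade pattern labels produced by
    the clustering step and turn forward saccades into a words-per-minute
    estimate, assuming about one forward saccade per word."""
    counts = Counter(pattern_labels)
    wpm = counts["saccade"] / duration_s * 60
    return wpm, counts

labels = ["saccade"] * 40 + ["sweep"] * 4 + ["regression"] * 6
wpm, counts = relative_reading_speed(labels, duration_s=12.0)
# 40 forward saccades in 12 s ≈ 200 "words" per minute
```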

Scientists who study human behavior through gaze tracking can unlock a previously closed domain in health tech and business. As our algorithm demonstrates, real-time predictions based on gaze tracking can significantly improve the speed and accuracy of any RAS, and possibly go beyond that toward scientific applications nobody thought possible before.
