Andy Roxburgh

Research Student @ University of Liverpool

My Research

Using machine learning techniques to boost language acquisition

A Quick Summary:

First, I'm trying to see whether it's possible to predict the words a toddler may learn next, by gathering
information about their family, environment, current vocabulary and activities. In fact, we know
it is possible[1], at least for roughly the following month, but by recording a lot of extra
information and using different techniques, I'm hoping to improve the accuracy of this prediction.
The next stage is for the system to recommend that the parent concentrates on certain
words, in the hope that by learning those words the child may pick up language more quickly than they otherwise would.
Ultimately, if it works, this may have potential as a technique for helping children
who have communication delays (such as those with ASD).

I need as much data as I can get, so I need volunteers!
I'm not gathering data yet, but I'd like to hear from anyone who may be interested in taking part when I
do start the process. If you are pregnant or have a child aged
under 4, I'd love you to help me - click here!
You'll only need to record any new words you hear your child use, and answer a few
questionnaires. Phase 2 of the study will ask you to put particular effort into helping your
child learn particular words.
It will all be done via an app or website.

In more detail:

Children’s level of language acquisition from around the age of two years upwards has been
shown to be positively correlated with their later performance at school[2, 3]. It follows that one
way to improve a child’s future school performance would be to encourage him or her to acquire language as
early as possible.
Children prefer to learn words that they can categorise with other words that they already
know [4] – firstly through a similarity of shape (the 'shape bias'[5]) and then through other more complex
associations[6] as the child's mind creates more categories. It follows, then, that if a system used by the
parent – such as a mobile app – could log information about the child's language
development, it could also advise the parent on which words to encourage the child to learn
next, those words having been judged by the system to be the best for expanding the child's vocabulary.
This may lead to the child learning more language at an earlier age. This could have particular application
to children who are already in groups likely to experience a delay in language acquisition (by
virtue of demographic factors[7] or for medical reasons).
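
As a rough illustration of the idea above, the sketch below ranks words a child does not yet know by how many category links they share with words the child already knows. It is only a toy baseline under stated assumptions: the `CATEGORIES` table, the category names and the `recommend_words` function are all hypothetical, and a real system would need a far richer, per-child representation.

```python
from collections import Counter

# Hypothetical word -> category table, invented purely for illustration.
CATEGORIES = {
    "dog": {"animal"},
    "cat": {"animal"},
    "cow": {"animal", "farm"},
    "tractor": {"vehicle", "farm"},
    "bus": {"vehicle"},
    "ball": {"toy", "round"},
    "apple": {"food", "round"},
}


def recommend_words(known_words, top_n=3):
    """Rank unknown words by the category links they share with known words."""
    known = set(known_words)
    # Count how often each category occurs in the child's current vocabulary.
    category_counts = Counter(
        category for word in known for category in CATEGORIES.get(word, ())
    )
    # Score each unknown word by the total weight of its categories.
    scores = {
        word: sum(category_counts[c] for c in cats)
        for word, cats in CATEGORIES.items()
        if word not in known
    }
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


print(recommend_words(["dog", "cat", "tractor"]))  # ['cow', 'bus', 'apple']
```

Here "cow" ranks first because it links to both the animal words and the farm word the child already knows, which mirrors the intuition that well-connected words are the easiest to pick up next.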

However, no two children are the same. A child living on a farm may have different environmental influences
on their vocabulary compared to a child living in an inner-city area. Two words that may be
closely semantically linked in one child's mind may not be linked at all in the mind of another.

Contemporary machine learning techniques combined with cheaper and higher-performance computer
hardware have shown great success in moving the field of pattern recognition forward in recent years,
and are being used to improve many applications of artificial intelligence. Recent advances include
individual patient health prediction based on
collective health record data [9, 10, 11]. In general terms, diagnosing possible health problems in a patient based on
that patient’s health records and history – with access to a large volume of other patient health data – is
analogous to our problem; although instead of predicting the likelihood of a particular health issue, we
are predicting the likelihood of the child learning a particular group of words.
In our proposed system, machine intelligence techniques should suggest the best words to learn next,
based on the child's vocabulary and environmental profile. But it should also be continually learning
and attempting to improve itself. Of particular relevance here is the
Reinforcement Learning[12] technique, and more specifically Reinforcement Learning with Long Short-Term
Memory [13, 14].
So it may be that by using machine learning techniques on our existing child vocabulary
data, the individual’s environmental data, and current personal language acquisition data, subtle
individualised associative connections can be identified and used to inform the 'advisor' part of the system.
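
To give a feel for what "continually learning" could mean for the advisor, here is a minimal sketch of an epsilon-greedy bandit, one of the simplest reinforcement learning set-ups. It is a deliberately simplified stand-in, not Reinforcement Learning with Long Short-Term Memory, and the `WordBandit` class, its reward scheme (1 if the child learned the suggested word, 0 otherwise) and its parameters are assumptions made for illustration only.

```python
import random


class WordBandit:
    """Epsilon-greedy bandit: each candidate word is an arm (hypothetical sketch)."""

    def __init__(self, words, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {w: 0 for w in words}    # times each word was suggested
        self.value = {w: 0.0 for w in words}  # running mean reward per word

    def suggest(self):
        """Mostly pick the best-looking word, occasionally explore another."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(sorted(self.pulls))
        return max(sorted(self.value), key=self.value.get)

    def update(self, word, learned):
        """Fold the outcome of one suggestion into the word's running mean."""
        self.pulls[word] += 1
        reward = 1.0 if learned else 0.0
        self.value[word] += (reward - self.value[word]) / self.pulls[word]


bandit = WordBandit(["cow", "bus"], epsilon=0.0)
bandit.update("bus", learned=True)
bandit.update("cow", learned=False)
print(bandit.suggest())  # bus
```

A bandit like this treats every word independently and ignores the child's profile, which is exactly the limitation that richer, sequence-aware models are meant to address.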

If successful, the system may also have applications for non-typical subjects such as children with Autism
Spectrum Disorder. Ultimately it may even have applications for individuals with brain injuries affecting
speech and language.

Supervisors:

Dr Floriana Grasso
Dr Terry Payne

Research Question:

Using machine learning methods, is it possible to create a computational model that will predict the words
that a given typically developing child – represented by static and time-varying environmental and vocabulary
data – is most likely to learn next? If so, can we use the model to boost the rate of language acquisition?
If the model works on typical children, can the model be trained on atypically developing children such as
those with ASD?

Phase 1

Initial experimental work

Collection of public data, replication of existing experiments to verify method, testing of different techniques.

Phase 2

Data Collection

Data collection will be via a hybrid web app. Volunteers will
answer a questionnaire about their communication environment,
and record each word their child learns along with a timestamp and a general idea of what
the child has been doing. They will subsequently be asked to complete a questionnaire (UK-CDI)
via the app at regular intervals.
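
The kind of record described above might look something like the following sketch. The class and field names (`WordObservation`, `ChildLog`, `activity`, and so on) are hypothetical, chosen only to mirror the description of a word, a timestamp and a rough note of what the child was doing; the real app's data model may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class WordObservation:
    child_id: str
    word: str
    heard_at: datetime
    activity: str = ""  # free-text note on what the child was doing


@dataclass
class ChildLog:
    child_id: str
    observations: list = field(default_factory=list)

    def vocabulary(self):
        """Return the set of distinct words recorded so far."""
        return {obs.word for obs in self.observations}

    def record(self, word, activity="", heard_at=None):
        """Store a word the first time it is heard; return True if it was new."""
        word = word.strip().lower()
        if not word or word in self.vocabulary():
            return False
        when = heard_at or datetime.now(timezone.utc)
        self.observations.append(
            WordObservation(self.child_id, word, when, activity)
        )
        return True


log = ChildLog("child-001")
log.record("Dog", activity="walk in the park")
log.record("dog ")  # ignored: already in the vocabulary
print(log.vocabulary())  # {'dog'}
```

Normalising case and whitespace before storing keeps repeated entries of the same word from inflating the vocabulary count.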

Phase 3

Data Collection and feedback

Data collection will again be via a hybrid web app. Volunteers will
again answer a questionnaire about their communication environment,
and record each word their child learns along with a timestamp and a general idea of what
the child has been doing. This time, however, the app will encourage the volunteer to concentrate
on certain words. They will subsequently be asked to complete a questionnaire (UK-CDI)
via the app at regular intervals.

Phase 4

Analysis

The collected data will be analysed to determine whether the app's word recommendations boost language acquisition.
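
One very simple measure that such an analysis could start from is the average number of new words per week over a given period. The sketch below is an assumption about how this might be computed, not the study's actual analysis plan, and the function name `words_per_week` and the example dates are invented. Because children's vocabulary growth is not constant with age, a raw before-and-after comparison of this rate would not be enough on its own; a proper comparison group would be needed.

```python
from datetime import date


def words_per_week(first_use_dates, start, end):
    """Average number of new words per week between start and end (inclusive)."""
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end must not be before start")
    # Count only the words first heard inside the chosen window.
    count = sum(1 for d in first_use_dates if start <= d <= end)
    return count * 7 / days


# Invented example: three new words over a 14-day window.
dates = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 10)]
print(words_per_week(dates, date(2024, 1, 1), date(2024, 1, 14)))  # 1.5
```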