WEBVTT
00:00:00.000 --> 00:00:08.796 align:middle line:90%
00:00:08.796 --> 00:00:10.170 align:middle line:84%
One of the most
common challenges
00:00:10.170 --> 00:00:12.260 align:middle line:84%
we find when working
with data is simply
00:00:12.260 --> 00:00:14.390 align:middle line:84%
identifying and getting
hold of the right data
00:00:14.390 --> 00:00:16.870 align:middle line:84%
to answer the particular
question at hand.
00:00:16.870 --> 00:00:20.060 align:middle line:84%
And we can be quite creative in
finding other data sources that
00:00:20.060 --> 00:00:22.820 align:middle line:84%
contain information that helps
us to solve whatever problem it
00:00:22.820 --> 00:00:23.930 align:middle line:90%
might be.
00:00:23.930 --> 00:00:25.580 align:middle line:84%
And to pick one
example, we recently
00:00:25.580 --> 00:00:27.440 align:middle line:84%
worked with Highways
England on the problem
00:00:27.440 --> 00:00:30.560 align:middle line:84%
of identifying fog patches
on motorway networks, which
00:00:30.560 --> 00:00:33.080 align:middle line:84%
is a very difficult problem
for various reasons.
00:00:33.080 --> 00:00:35.330 align:middle line:84%
Unfortunately, we don't
have direct measurements
00:00:35.330 --> 00:00:38.300 align:middle line:84%
of fog from sensors
that we can trust
00:00:38.300 --> 00:00:39.980 align:middle line:84%
and that are of sufficiently
good quality.
00:00:39.980 --> 00:00:43.760 align:middle line:84%
But we were able to look at
traffic patterns and movements
00:00:43.760 --> 00:00:46.070 align:middle line:84%
of traffic, speeds of traffic,
in the different lanes
00:00:46.070 --> 00:00:49.070 align:middle line:84%
of the motorway, and use that
as a kind of proxy measurement
00:00:49.070 --> 00:00:51.560 align:middle line:84%
that might be indicative
of the presence of fog.
00:00:51.560 --> 00:00:54.440 align:middle line:84%
Understanding the quality of
the data, by which I really mean
00:00:54.440 --> 00:00:55.940 align:middle line:84%
the fitness for
purpose of the data
00:00:55.940 --> 00:00:57.380 align:middle line:84%
to address a
particular problem,
00:00:57.380 --> 00:00:58.850 align:middle line:90%
is extremely important.
00:00:58.850 --> 00:01:00.920 align:middle line:84%
For example, shadows
in satellite images
00:01:00.920 --> 00:01:03.050 align:middle line:84%
might affect the
quality of the image.
00:01:03.050 --> 00:01:06.260 align:middle line:84%
We might be looking at gaps
in sensor records caused
00:01:06.260 --> 00:01:08.960 align:middle line:84%
by weather conditions
or by interruptions
00:01:08.960 --> 00:01:11.860 align:middle line:84%
in internet connectivity
or something like that.
00:01:11.860 --> 00:01:15.170 align:middle line:84%
And we really need to understand
all those particular nuances
00:01:15.170 --> 00:01:18.330 align:middle line:84%
of the data in order to be
able to use them effectively.
00:01:18.330 --> 00:01:20.330 align:middle line:84%
In many of our projects,
we need to combine data
00:01:20.330 --> 00:01:22.080 align:middle line:90%
from lots of different sources.
00:01:22.080 --> 00:01:24.590 align:middle line:84%
One of the key challenges is
simply that data collected
00:01:24.590 --> 00:01:27.322 align:middle line:84%
by different organisations
may be registered differently.
00:01:27.322 --> 00:01:29.030 align:middle line:84%
For example, they may
use different names
00:01:29.030 --> 00:01:30.260 align:middle line:90%
for the same place.
00:01:30.260 --> 00:01:32.600 align:middle line:84%
They may use different ways
of identifying a position
00:01:32.600 --> 00:01:33.910 align:middle line:90%
on the Earth's surface.
00:01:33.910 --> 00:01:36.360 align:middle line:84%
Or there may be even more
difficult challenges than that.
00:01:36.360 --> 00:01:38.840 align:middle line:84%
So we need to really understand
the nature of the data
00:01:38.840 --> 00:01:41.185 align:middle line:84%
in order to be able to
combine them successfully.
00:01:41.185 --> 00:01:43.310 align:middle line:84%
When getting to grips with
the complexities of data
00:01:43.310 --> 00:01:45.809 align:middle line:84%
from different sources, one of
the most important techniques
00:01:45.809 --> 00:01:47.600 align:middle line:90%
that we use is visualisation.
00:01:47.600 --> 00:01:50.000 align:middle line:84%
To create a picture of the
data that we can understand
00:01:50.000 --> 00:01:51.860 align:middle line:84%
as humans, that gives
us a lot of insight
00:01:51.860 --> 00:01:53.300 align:middle line:84%
into the nature
of the data, what
00:01:53.300 --> 00:01:55.220 align:middle line:84%
it's good for, what
it's not so good for,
00:01:55.220 --> 00:01:57.590 align:middle line:84%
and also for communicating
the results of our work
00:01:57.590 --> 00:02:00.427 align:middle line:84%
to our customers and
to the wider world.
00:02:00.427 --> 00:02:02.510 align:middle line:84%
We have a lot of technical
solutions and computing
00:02:02.510 --> 00:02:05.780 align:middle line:84%
solutions for working with big
data in all its various forms,
00:02:05.780 --> 00:02:07.460 align:middle line:84%
but what's really
important to remember
00:02:07.460 --> 00:02:10.580 align:middle line:84%
is that we really need experts,
human experts, domain experts,
00:02:10.580 --> 00:02:12.170 align:middle line:84%
who understand the
nature of the data
00:02:12.170 --> 00:02:15.490 align:middle line:84%
and how it can be applied
in any given situation.