Research

The research of our lab revolves around three major directions:

High-Performance & Approximate Data Analytics

The amount of data we can use to develop insight and knowledge is vast and growing rapidly. Analysing it requires substantial computational resources, which today can only be found in high-performance computing infrastructure. Scaling out analyses on supercomputers is therefore key!

In this area, we therefore develop new approaches to high-performance data analytics. We optimize existing analytics algorithms and develop new ones for high-performance computing infrastructure. To provide further efficiency and scalability, we develop novel approximate analytics approaches: by sacrificing a little precision (with provable error bounds) we can accelerate analytics substantially. For example, by sacrificing less than 0.01% precision, we accelerate analytics by more than one order of magnitude (see ADvaNCE, MOVE and others).
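To illustrate what trading precision for speed with a provable error bound can look like, here is a minimal sketch. It is not one of our published methods: it estimates a mean from a uniform random sample and uses Hoeffding's inequality to bound the error. The function name `approximate_mean` and all parameters are hypothetical and chosen for illustration only.

```python
import math
import random


def approximate_mean(values, sample_size, lo, hi, delta=0.05, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Assumes every value lies in [lo, hi]. Returns (estimate, error_bound).
    By Hoeffding's inequality, the true mean lies within `error_bound` of
    the estimate with probability at least 1 - delta.
    """
    rng = random.Random(seed)
    # Sample with replacement so the draws are independent.
    sample = [rng.choice(values) for _ in range(sample_size)]
    estimate = sum(sample) / sample_size
    # Solve 2 * exp(-2 * n * t^2 / (hi - lo)^2) = delta for t.
    error_bound = (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * sample_size))
    return estimate, error_bound


if __name__ == "__main__":
    data = [i % 100 for i in range(100_000)]
    est, bound = approximate_mean(data, sample_size=10_000, lo=0, hi=99, delta=0.001)
    print(f"estimate = {est:.2f} +/- {bound:.2f} (exact mean is 49.5)")
```

The sketch touches only 10,000 of the 100,000 values, and the bound tightens with the square root of the sample size, so the cost of extra precision is explicit.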

Spatial data is everywhere and is generated in vast amounts, be it from satellite surveys, smart sensors/IoT, GPS traces, semantically enriched, computational fluid dynamics (CFD), medical imaging and many more. At the same time, applications such as interactive maps, urban planning, medical diagnostics and the simulation of CFD models depend on the efficient analysis and processing of vast amounts of spatial data.

In this line of work, we develop novel methods to efficiently analyze spatial data in the broadest sense. We work with road network and neuroscience data (modelling parts of the brain) to develop spatial analytics that efficiently extract subsets of large datasets, find intersections between objects in vast amounts of spatial data and much more (see FLAT, OCTOPUS, THERMAL-JOIN, TRANSFORMERS, RUBIK and others). With our work, we aim to support the large-scale analysis of spatial data across applications efficiently and effortlessly.

Data Management on Novel Hardware

Hardware, i.e., CPUs, storage and memory technology and others, evolves at a rapid pace. Understanding the characteristics of new hardware in detail, such as the read/write performance of new storage technology, is key to adapting and optimizing data analysis algorithms.

In this line of work, we consequently develop and optimize algorithms that enable the efficient and scalable analysis of large amounts of data on novel hardware. We focus in particular on new ideas around storage, e.g., cold storage devices or shingled magnetic recording (SMR) disks for archiving data and analysing it occasionally, and around computing, e.g., neuromorphic hardware as a scalable and energy-efficient analytics platform.

Applications

Our research is always motivated by real-world applications and use cases. It is currently driven by two major areas, scientific applications and spatial analytics.

Scientific Applications & Neuroscience

Scientists across different disciplines produce vast amounts of data through experimentation and simulation. While the amounts of data produced are already so big that they can barely be managed, the problem is certain to become worse as more and more data is generated and collected. A lot of our research is therefore driven by the needs of scientists in general and neuroscientists in particular.

We address the problems of neuroscientists on their quest to understand and simulate the rat brain. More specifically, we work with neuroscientists in the Human Brain Project (http://humanbrainproject.eu) to manage the vast amounts of data they use and produce. Their research, modeling and simulating a fraction of the rat brain, produces terabytes of data. Current solutions are inadequate to manage this data volume and we are thus investigating new methods to index and store it in order to provide efficient and scalable access. A particular problem we are currently addressing is the retrieval of objects in space, i.e., accessing neurons based on their position. While it is simple to index several thousand neurons, the neuroscientists have to do it for several millions or even billions of neurons. We are developing new spatial indexes to solve this problem.
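To make the retrieval problem concrete, the following minimal sketch indexes points in space with a uniform grid and answers a box-shaped range query. It is a generic textbook baseline under simplifying assumptions (point objects, fixed cell size), not the index we develop for the neuroscience data. The class name `GridIndex` and the identifiers used in it are hypothetical.

```python
from collections import defaultdict
from itertools import product


class GridIndex:
    """Uniform grid over points in any number of dimensions."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # cell coordinates -> [(id, point), ...]

    def _cell(self, point):
        # Floor division also maps negative coordinates to the right cell.
        return tuple(int(c // self.cell_size) for c in point)

    def insert(self, obj_id, point):
        self.cells[self._cell(point)].append((obj_id, point))

    def range_query(self, lo, hi):
        """Return the ids of all points p with lo <= p <= hi in every dimension."""
        lo_cell, hi_cell = self._cell(lo), self._cell(hi)
        ranges = [range(a, b + 1) for a, b in zip(lo_cell, hi_cell)]
        hits = []
        # Visit only the cells overlapping the query box, then filter exactly.
        for cell in product(*ranges):
            for obj_id, p in self.cells.get(cell, ()):
                if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
                    hits.append(obj_id)
        return hits


if __name__ == "__main__":
    index = GridIndex(cell_size=10.0)
    index.insert("n1", (1.0, 2.0, 3.0))
    index.insert("n2", (15.0, 15.0, 15.0))
    print(index.range_query((0.0, 0.0, 0.0), (10.0, 10.0, 10.0)))
```

A single fixed cell size works well only when the points are spread fairly evenly; densely packed and unevenly distributed data is one reason a simple structure like this stops scaling.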

Improving mobility and decreasing congestion are some of the biggest challenges facing cities today. Congestion impacts the daily lives of commuters, as well as businesses and visitors to any city. Sensors, the Internet of Things (IoT), GPS data and other sources of data provide city planners with a wealth of data. The data contains important hints to develop smart transport solutions that reduce congestion as well as to optimise the use of city public transport. Extracting the information and hints in this deluge of data, however, is a challenge due to the size as well as the number of heterogeneous sources.

We work with transport authorities to address these issues. More precisely, we develop the infrastructure to integrate and analyse heterogeneous data sources (data with a spatial aspect, e.g., sensors, GPS, maps, weather radar and others) to enable spatial analytics on it. Spatial analytics is used for applications like city planning and to optimise the use of limited road space (even in real time).

Demos & Visuals

We turn as much of our technology as possible – research projects and student projects – into cool demos! Check out our super whizz bang videos of some of our applications below.

Spatial Analytics on Novel Interfaces

The emergence of tablets has substantially changed the way we interact with data. No longer do we use slow and cumbersome scroll bars to look at query results; instead we use touch interfaces, which enable us to browse and analyse data extremely fast. Clearly, the underlying data infrastructure, which was optimised for scrolling and thus sequential data access, must now support this fast and rather random access efficiently and scalably.

In this video, we show how our implementation of an index on the iPad (FLAT, NEURO) enables the efficient exploration of data via a touch interface. With a pinch, the user can choose a small subset (the query region) of a small model representing several thousand neurons. The app subsequently loads a more detailed representation of the query region so the user can inspect the neurons in more detail.

Virtual Reality Scientific Data Exploration

This video shows our ground-breaking approach to visualising and analysing large-scale scientific models. Using an HTC Vive headset as well as haptic gloves, users can immerse themselves in a detailed model of the brain and analyse it in virtual reality. Walking through the model to look at different parts of the brain, they can use gestures to pan, zoom and select further subsets to study the model in great detail. The video shows a first demo of the visualisation, which can be extended to visualise other models and can also support sophisticated analyses.

This particular video shows the message propagation mode: after a subset of the model is selected, messages are injected into the branches crossing that subset. The messages travel along the branches of the neurons and leap between them. The visualisation helps to understand the connectivity of the brain model.

Energy-efficient Classification on Wearables

Sensors are becoming ever more pervasive and more powerful, meaning that we can perform increasingly complex tasks on them. This video shows our demo of a wearable device which collects data and classifies it using a neural network on the wearable device itself. The neural network is optimised for size (thus accuracy of classification) as well as energy efficiency. This particular demo shows the classification of physical exercises (e.g., push-ups), with feedback shown on a mobile phone. The sensor on the body uses an inertial measurement unit to collect acceleration data. The data is classified on the wearable device using a neural network and the result is sent to a mobile phone via Bluetooth. The phone gives the user feedback on the quality and quantity of the different exercises done.

Publication Highlights

Below is a selection of our publications. Click here for a complete list.

Spatial joins are becoming increasingly ubiquitous in many applications, particularly in the scientific domain. While several approaches have been proposed for joining spatial datasets, no single method can efficiently join two spatial datasets in a robust manner with respect to their data distributions. Some approaches do well for datasets with contrasting densities while others do better with similar densities. None of them does well when the datasets have divergent data distributions. In this paper we develop TRANSFORMERS, an efficient and robust spatial join approach that is indifferent to such variations of distribution among the joined data. TRANSFORMERS achieves this feat by departing from the state-of-the-art through adapting the join strategy and data layout to local density variations among the joined data. We experimentally demonstrate that TRANSFORMERS outperforms state-of-the-art approaches by a factor of between 2 and 8.
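For readers unfamiliar with spatial joins, the sketch below shows a simple grid-partitioned join of two sets of axis-aligned 2D boxes. It is a generic baseline with a fixed cell size, not the TRANSFORMERS algorithm, which adapts its strategy and data layout to local density. The function names and the box encoding `(xmin, ymin, xmax, ymax)` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def intersects(a, b):
    """True if boxes a and b, given as (xmin, ymin, xmax, ymax), overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_of(box, cell_size):
    """Yield every grid cell that the box overlaps."""
    xs = range(int(box[0] // cell_size), int(box[2] // cell_size) + 1)
    ys = range(int(box[1] // cell_size), int(box[3] // cell_size) + 1)
    return product(xs, ys)


def grid_spatial_join(boxes_a, boxes_b, cell_size):
    """Return all index pairs (i, j) where boxes_a[i] intersects boxes_b[j]."""
    grid = defaultdict(list)
    for j, b in enumerate(boxes_b):
        for cell in cells_of(b, cell_size):
            grid[cell].append(j)
    result = set()  # a set removes pairs found in more than one shared cell
    for i, a in enumerate(boxes_a):
        for cell in cells_of(a, cell_size):
            for j in grid.get(cell, ()):
                if intersects(a, boxes_b[j]):
                    result.add((i, j))
    return result


if __name__ == "__main__":
    a = [(0, 0, 2, 2), (10, 10, 30, 30)]
    b = [(1, 1, 3, 3), (100, 100, 101, 101)]
    print(grid_spatial_join(a, b, cell_size=4))
```

With a fixed cell size, large boxes are replicated into many cells and dense regions produce crowded cells, which illustrates why a single strategy struggles when the two datasets have very different distributions.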

Analyzing massive amounts of data and extracting value from it has become key across different disciplines. As the amounts of data grow rapidly, however, current approaches for data analysis struggle. This is particularly true for clustering algorithms, where distance calculations between pairs of points dominate the overall time. Crucially, however, the data analysis and clustering process is rarely straightforward: parameters need to be determined through several iterations. Entirely accurate results are thus rarely needed and we can instead sacrifice precision of the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach to approximating DBSCAN. ADvaNCE uses two measures to reduce distance calculation overhead: (1) locality sensitive hashing to approximate and speed up distance calculations and (2) representative point selection to reduce the number of distance calculations. Our experiments show that our approach is in general one order of magnitude faster (at most 30x in our experiments) than the state of the art.
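The idea behind the first measure can be illustrated with a minimal sketch of locality sensitive hashing for Euclidean distance. This is not the ADvaNCE implementation: it uses a standard random-projection hash so that exact distances are computed only for points that share a bucket. All names, the bucket width of `4 * eps` and the default parameters are hypothetical choices.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, num_projections, bucket_width, rng):
    """Build one hash function h(x) = floor((a . x + b) / w) per projection."""
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
        for _ in range(num_projections)
    ]

    def h(point):
        return tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / bucket_width)
            for a, b in projections
        )

    return h


def lsh_neighbor_pairs(points, eps, num_tables=8, num_projections=2, seed=0):
    """Return pairs (i, j), i < j, that share a bucket and lie within eps.

    Nearby points collide with high probability, so most true neighbour
    pairs are found, but some may be missed: the result is approximate.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    pairs = set()
    for _ in range(num_tables):
        h = make_hash(dim, num_projections, 4.0 * eps, rng)
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            buckets[h(p)].append(i)
        # Exact distances are computed only within a bucket.
        for members in buckets.values():
            for x in range(len(members)):
                for y in range(x + 1, len(members)):
                    i, j = members[x], members[y]
                    if math.dist(points[i], points[j]) <= eps:
                        pairs.add((i, j))
    return pairs
```

Every returned pair is a true neighbour pair because of the final exact check; the approximation lies only in the pairs that never meet in a bucket. More tables reduce the number of missed pairs at the cost of more work.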

Team & Jobs

Teamwork makes the dream work! We are always looking for talented and driven individuals who want to join our team. Get in touch if you are interested. If you are interested in joining as a PhD student, see here. For additional funding opportunities, see here. We also have regular openings for PostDocs (advertised on the usual job portals). See here for additional funding opportunities for postdoctoral research.

STUDENT TESTIMONIALS

I liked his jokes.

The lecturer did a great job on patiently explaining the concepts to us, thank you Dr Heinis.

CO130 Student

Dr Heinis lectured this course very well.

CO130 Student

...I love your self deprecating humour and your down-to-earth attitude and I can wholeheartedly say that I really enjoyed the lectures.

CO130 Student

Thank you very much for an excellent lecture. The work and topics covered has already given me an ’edge’ in my current work environment.

Student

Contact Us

Get in touch with us! Whether you have questions about our research, want to join our team or want to explore opportunities for collaboration, use the form below to contact us or send a message to doc-scalelab@imperial.ac.uk!

Ph.D. Funding Opportunities

We are always looking for driven and talented Ph.D. applicants interested in developing novel data management techniques that are deployed and used across different disciplines. We are particularly looking for students with a strong background in data management and ideally also with a background in a different field (life sciences, natural sciences etc.).

Most funding opportunities are for European Union students, but there are several opportunities for overseas applicants as well:

PostDoc Funding Opportunities

If you are an experienced researcher interested in collaborating with our group on an externally funded scholarship, for example a Marie-Curie post-doctoral fellowship, please get in contact with us. Several other opportunities for fellowships (including partial ones) are listed below. Please send a message to t.heinis@imperial.ac.uk if you are interested.

Applying for a Ph.D. position

We are looking for aspiring researchers who want to pursue a Ph.D. (in 3 to 3.5 years) in the broad area of scientific data management. The group focuses on scientific data management and high-impact interdisciplinary research, i.e., developing ground-breaking and novel data management techniques that are strongly motivated by and used in other disciplines (see examples of past research here and demos here). The research interests of a successful applicant have to overlap considerably with the group's interests:

Big Data, Distributed Indexing & Processing

Scientific Data Management

Spatial Data, Spatial Indexing

Spatio-Temporal Indexing

High-dimensional Indexing/Clustering

In-Memory Indexing

To apply you will need to have a strong background in computer science (M.Sc. or B.Sc. in Computer Science or very closely related) and ideally solid experience with data management. Given the interdisciplinary nature of our group's research, the ideal candidate also has a background in a different discipline.

You must have excellent communication skills and prioritise work to meet deadlines. All applicants must be fluent in spoken and written English. Preference will be given to applicants with publications in the relevant areas.
How to apply: please send a message to t.heinis@imperial.ac.uk. Applications must include the following:

A full CV

Scan of your transcripts of your studies

Contact information for 2 references who have agreed to speak about you, your work, and your potential

Starting date: as soon as possible

Closing date: open

About Imperial College and London

Imperial College is a first-class address for pursuing excellent, high-impact research. Imperial College consistently ranks among the top 5 schools in the world (Times Higher Education & QS rankings). The Department of Computing is also a leading department of Computer Science among UK universities. It has consistently been awarded the highest research rating (5*) in Research Assessment Exercises (RAE), coming 2nd in the 2008 RAE, and was rated as "Excellent" in the previous national assessment of teaching quality.

Noisy, vibrant and truly multicultural, London is a megalopolis of people, ideas and frenetic energy. The capital and largest city of both the United Kingdom and of England, it is also the largest city in Western Europe and the European Union. Situated on the River Thames, London is an international capital of culture, music, education, fashion, politics, finance and trade which offers ample activities (besides research, that is) for every interest, be it culture, sports events, shopping or clubbing.