Student Projects (2010)

NetOS

This page collects together various Part II project suggestions from
the Network and Operating Systems part of the Systems Research Group.
In all cases there is a contact e-mail address given; please get in
touch if you want more information about the project.

Under construction:
please keep checking back, as more ideas will hopefully be added to
this page during the coming weeks.

Note: there are a number of stumbling blocks in Part II project
selection, proposal generation and execution. Some useful guidance
from a CST alumnus (and current NetOS PhD student) is
here.

Graph theory learnt from the textbook is restricted to static, 2D
representation of nodes and edges, however with real traces of human
contacts and massive OSN's such as facebook and twitter, it is clear
that graphs change over time. However, adding another dimension to
this analysis clearly adds complexity in visualising and understanding
such rich information.

This project will develop novel visualisations to convey and
highlight important features of temporal graphs to aid researchers in
their analysis of such data. The tool will be evaluated on real,
human mobility traces and massive online social networks
i.e. facebook, twitter etc. with millions of nodes with timestamped
edges. The ideal candidate will be comfortable with graph theory, 2D
and 3D programming and optimising code in the programming language of
their choice. Familiarity with complex network analysis or social
network analysis would be helpful but not strictly necessary. This
project will suit students interested in a career in online social
networks and related businesses.

In the context of the FRESNEL
project a number of sensors will be deployed around the computer
laboratory. The deployment will consist of sensors with varying
sensing capabilities, including temperature, humidity, ambient light,
Bluetooth scanning, audio sensors and cameras.

The purpose of this project is to build an application that will
utilise the varying sensing capabilities of the network to determine
the occupancy of office spaces. The general idea is to collect a
range of sensor readings and to correlate these readings with the
ground truth in order to identify an algorithm for detecting office
occupancy. Data fusion techniques may need to be employed
(i.e. comparing the ambient temperature in a room with the reading of
a particular sensor to identify if an interesting event is recorded by
that sensor).

Many people would like to have secure off-site backups of their
personal data. Recently, so-called "cloud storage" services
(such as e.g. Amazon S3) have
started providing infrastructure for this. However, there is are many
inherent issues to do with trust and control: users may not trust a
cloud provider with their personal data, or they might feel
uncomfortable about it being stored in a country with a jurisdiction
that might not protect their privacy. Furthermore, cloud storage is
still prohibitively expensive. A better concept might be for a
group of friends to arrange a mututally beneficial backup system
whereby everyone offers some storage space to the others (this could
be on servers, desktop machines, laptops or mobile devices). The
challenge is to do this in a secure way that maintains privacy and
offers compelling performance. Different pairings of people
might have different degrees of trust and this could be reflected in
the strenght of pairwise encryption (though one would have to consider
the case of a friend accidentally giving parties with weaker trust
getting accessing the data!). Alternatively, data could be distributed
across multiple different friends' devices, and only the combination
would be sufficient to reconstruct the old data. This project
has lots of scope for the design and implementation of clever
synchronization algorithms and cryptographical constructs.Difficulty: ★★ to ★★★

Some people (including myself) in the SRG are currently working on
Skywriting, a
Turing-powerful cluster computing framework for cloud
computing. Skywriting is written in Python and has been ported to the
Android operating system, allowing us to run mobile phones as
Skywriting workers. This project would extend Skywriting to
enable delay-tolerant computation, i.e. computation that is
scheduled on a mobile phone and which returns a result at some
(potentially undefined, potentially weakly bounded) point the
future. For example, the phone could compute only while it is being
charged and thus connected to mains power. Furthermore, the bandwidth
on 3G networks is limited, so the phone could wait until it has a fast
network connection before transferring input and output data (an
ambitious student could even look into ad-hoc networks between
multiple Skywriting-enabled phones to epidemically spread input and
result data). A student attempting this project should be
familiar with Python and not afraid of working with research code that
is in active development. The evaluation of this project would
probably entail some simulations and a small practical evaluation with
some phones in the lab.Difficulty:
★★

Many people have recently become increasingly worried about
Facebook taking a rather relaxed attitude towards the privacy of
personal data. However, attempts at building more secure social
networks with technical solutions that ensure data privacy, such as
encryption, have not enjoyed much success because Facebook captialises
on the "network
effect" of everyone else using it. It would be very
useful if we could still use Facebook, but encrypt all the data stored
there, enabling only friends who also use the same browser extension
to see the cleartext (or the unscrambled photos, etc.). There are
various directions one could take with this project: you could make it
so that only messages exchanged with friends who also use the
extension are encrypted, or so that everything is encrypted. To
implement the plugin, you could use a toolkit such as Greasemonkey,
or implement it entirely yourself.Difficulty: ★
to ★★

Homomorphic
encryption is the idea of performing arbitrary computations on an
encrypted piece of data ("ciphertext") without decrypting it
first. In other words, the operations manipulate the ciphertext to
generate another, different, ciphertext, which when decrypted produces
a plaintext that is equivalent to the original plaintext with the
operations applied. For example, operation "add 37" on
plaintext "5" gives "42". Under homomorphic
encryption, we apply "add 37" to E(5) (where
E is an encryption function) and get back E(42),
which when decrypted gives us "42" (the correct answer). The
utility of this is that using homomorphic encryption, secure
computation could be performed in a potentially hostile environment
(e.g. untrusted servers on the "cloud"). An important
observation is that a fully homomorphic scheme only needs to support
two operations: addition and multiplication. These
are sufficient to bootstrap any computation that we can imagine (even
though it would be very, very, very slow!). Homomorphic
encryption has long (since the 1980s) been considered to be an
unsolved "holy grail" of cryptography. However, recently, severalworkingschemes
have been published (the last is easiest to understand and implement
as it does not require understanding of lattices). Most of the work in
this space has been in theory though, and there is implementation of
any of these systems available. Your task in this project would
be to implement one of the published schemes and show that it works
with small examples (due to the exponential increase in computing
power required, you are likely to only be able to try it with small
inputs and keys, but that is enough for a proof-of-concept). As
a possible extension, you could investigate how much of the process
can possibly be parallelised (a question that has received very little
attention so far) to speed it up. Alternatively, you could look into
data structures and algorithms that could be implemented using a
weaker form of homomorphic encryption (e.g. partial schemes) that can
be implemented at better performance.

N.B.: This is
a very challenging project, full of formal maths and theory
despite being an implementation project, and is rooted in cutting-edge
research (the publications in this area are not even a year old). If
you would like to attack this, it will require lots of dedication and
hard work! (However, if you manage, some glory will be yours!
:-)). Difficulty: ★★★★

N.B. (ii): A variation of this project could also be
done as an MPhil project – in this case, the
"extension" would form a mandatory part of the project.

Recent studies have shown a large number of OS failures are due
to faulty hardware drivers and/or transient errors in the
interaction between the OS and devices. In this project, you
will provide functionality so that the OS (and applications) is
able to recover from transient and permanent errors in network
devices, implemented as either kernel support to transfer network
state between physical NICs or a user-level library that
intercepts and reroutes network system calls between different
physical NICs (your choice).

To tackle this project you need to have strong practical skills,
and some working knowledge of Linux. Difficulty: Medium

Files are the basic unit for naming and controlling access to data in
modern OSes. Files may be distinct for various reasons (security,
ease of management, etc). While file-level naming and access control
works most of the times, it is sometimes useful to be able to
construct a "pseudo-file" that is composed of parts of other files
(e.g. consider the case where I want to expose only the dates and
message identifiers of received and sent emails in my private INBOX so
that my boss can verify that I am replying to emails in a timely
fashion). In this project you will construct a system (and associated
tools) to allow users to create pseudo-files in the filesystem.

To tackle this project you need to have strong practical skills,
and some working knowledge of Linux. Difficulty: Medium

Oftentimes, developing countries have quite bad traffic jams at
rush hours due to underdeveloped infrastructure being overutilized
coupled with inadequate traffic reporting and control mechanisms. In
this project you will develop a smartphone (Android) based application
that will utilize the sensors in the phone (GPS and accelerometer) to
automatically provide traffic information and rerouting information to
users. Ideally, the application will determine and its location and
speed automatically, reporting this to a central server. Other
instances of the application running on different devices can then use
this information to suggest alternative routes to avoid traffic jams
thus helping to ease traffic congestion in a distributed
manner. Difficulty: Medium

People often get unsolicited emails, SMSes or IMs. Some of these
are legitimate and some are scams. When this happens to me I hit the
web to find out exactly what is going on before taking further action
(e.g. Did I really win 3750 pounds in that accident I had?, Is it
important for me to find out where this 0800 number originated? etc).
On the web it quickly becomes obvious what the true situation is
(e.g. Ah,a spammer just waiting for me to reply to his premium rate
number). In this project, you will attempt to build an application
that quickly and transparently tries to verify the origin and/or
provide information about unsolicited contacts by using the web (and
other useful data sources) to better inform the user about who is
contacting him or her (and what it's likely to be about). Difficulty:
Medium

This project is about creating technological solutions that make
forgetting a tiny bit easier than remembering in the digital world.
The idea is to build a Twitter client based on human-controlled
forgetting (which have been recently proposed for emails). You will
build a Twitter client that allows users to post tweets along with
corresponding expiration dates (technically, a tweet will be tagged
with an expiration date using the Twitter API). The client
automatically deletes tweets that have reached or exceeded their
expiry dates. Users should also be able to change expiration dates in
case they feel that a tweet has lost its informational value over time
or, on the contrary, remains important beyond the initially set
lifespan. The effectiveness of this approach should be evaluated with
a small user study.
* “Best If Used By”: Expiration Dates for Email. CHI '09
Difficulty: Medium

Approaches for automatically classifying the sentiment (either
positive or negative) of Twitter messages have been recently
introduced. They are used by online shoppers to check the sentiment
associated with products, or by companies to monitor the popularity of
their brands. One limitation of those approaches is that they work for
Twitter messages in English. This project considers the three most
promising existing approaches (Naive Bayes, Maximum Entropy, and SVM)
and evaluates them for Twitter messages in any continental European
language. The three approaches will be made available through an API
(similar to http://twittersentiment.appspot.com/).
Difficulty: Medium

In this project, you will implement standard topic models (e.g. LDA)
and use them to map the content of a Twitter profile into
topics/categories. You will then explore the extent to which the
restricted length of Twitter messages affects these schemes by running
a comparative study on a Twitter dataset.
Difficulty: Medium

The legibility of a city is “the ease with which [a city’s] parts can
be recognised and can be organised into a coherent pattern” and is
associated with the emotional and physical well-being of city
dwellers. In 1972, Stanley Milgram designed a "scene sampling"
procedure to measure the legibility of a city. This project proposes
to build the first web-based tool that automates Milgram's "scene
sampling" procedure. Users of this tools will be presented with scenes
of their city (retrieved from geo-tagged pictures on Flickr) and will
asked whether they could recognise each scene. Based on their answers,
legibility measures of different parts of the city will be computed
and will be shown on a geographical map.
Difficulty: Medium

There are several projects to work under the umbrella of ErdOS. There are opportunities
both for MPhil and Part II students depending on the research required
by them. There are other possibilities in addition to the ones
described below. Please, do not hesitate to contact us for further
details (Jon Crowcroft and Narseo Vallina-Rodriguez).

The resources available in a mobile handset have different power
states and energy demands that might depend on the handset context
(e.g. location impacts on the energy consumption of cellular nets). We
would like to know how much of the energy consumption accounts for
each individual resource and applications. We want to develop an energy
profiler that uses exclusively information available in Android
handsets (e.g. Kernel and Android API), however, the framework
obtained could be also used for other embedded devices such as
sensors. We have indications about the right variables that we should
look at but more research is required. Once the model has been built,
it is necessary to validate the results with the ones obtained with a
high-resolution external measurement tool. As a result, we could
obtain an energy profiler that estimates the relative energy
consumption of each application individually, customised for each
handset and without using inaccurate offline information and external
tools. That project might require some research and working at the
Kernel level for Android OS. Good knowledge of Java and C++ is
recommendable, Previous experience on compiling kernels might be
useful.
Keywords: Mobile computing, resources management, energy allocation,
energy profiling
Difficulty: Medium - High

ErdOS profiles energy at an high-level abstraction called users
activities (e.g. working, going to the pub with friends, ...). The
goal behind this design is being able to predict future energy demands
at those activities by monitoring what kind of applications are
executed by the users at those scenarios and also the way users
interact with their handsets (when and where they charge the handset,
when they set the handset in airplane mode, etc). It will help ErdOS
to efficiently allocate resources both in local and remote handsets
proactively. It requires understanding how applications demand OS
resources (intercepting system calls) and mapping them to the right
activity inferred from contextual information (e.g. location and
time). That project can be split in two sides, the implementation and
the research one (context-awareness). It might require working at the
Kernel level for Android OS. Good knowledge of Java and C++ is
recommendable. Previous experience on compiling kernels might be
useful.
Keywords: Mobile computing, resources management, context-awareness
Difficulty: Medium

ErdOS devices must be able to discover neighbouring resources quickly
and in an energy efficient way. However, current wireless interfaces
such as Bluetooth have a time consuming protocol that can require up
to 60 seconds to perform an enquiry. A potential way of performing
fast discovery is by adding information of the available resources and
the resources demands of the handset on the radio beacons of wireless
interfaces (e.g. Bluetooth and WiFi) and also by audio beacons taking
advantage of the low-energy required by the microphone and the
speakers available in the handset (see
http://anil.recoil.org/papers/audio-networking.pdf). An implementation
of those mechanisms accompanied by an evaluation of the energy
- Hide quoted text -
consumption required by them and the execution time can be easily
done. An extension can be done as identifying the triggers to perform
those advertisements efficiently. It might require Linux knowledge and
C/C++ programming skills to modify the drivers. Previous experience on
compiling kernels might be useful.
Keywords: Mobile computing, resources management, discovery protocols,
energy consumption, pervasive computing
Difficulty: Medium

Naming resources and users in ErdOS is another potential line of work.
An implementation of ErdOS name manager following Haggle-enfant name
graphs structure. It should comprise the handset available resources
in the local handset, potential user names and social contacts from
the users. Efficient algorithms for accessing that information are
also required. An extension can be implementing an interface to define
the access policies to the resources based on the different users'
groups and affiliations (i.e. taking advantage of UNIX permissions for
user, groups and others). It can be implemented on Java at the user
space.
Keywords: Naming systems, Access Control, OS
Difficulty: Low