Discovery group: Data Mining for Pattern and Link Discovery

We develop novel methods and tools for pattern and link discovery.
Our focus is on structured and heterogeneous data, such as graphs, as
well as on sequences. Data mining in heterogeneous and structured data
will only grow in importance, bringing an increasing number of
challenging and important problems, especially in scientific
applications. Our current main applications are in bioinformatics, in
collaboration with applied scientists and companies.

Our research topics are motivated by novel problems arising in
applications. Our current emphasis is on analysis and link discovery in
weighted (biological) graphs (Biomine project): we identify
computational problems in them, develop new algorithms, and apply them.
While we value fielded applications with real impact, we also emphasize
solid, application-independent methods and results.
We recently introduced (variable-length) Markov models to the problem
of reconstructing haplotype strings [BMC Bioinformatics, software].
We have developed novel concepts and methods for gene mapping, for
instance based on the discovery of genetically motivated tree-structured
patterns [IEEE/ACM Transactions on Computational Biology and
Bioinformatics, American Journal of Human Genetics, software].
These methods have proved very useful in the practice of medical
genetics. In context-sensitive computation, the group developed the
ContextPhone software, which is in use at several research institutions
around the world [IEEE Pervasive Computing, software].

Biomine: A biological search engine
We view biological databases of sequences, proteins, genes, etc. as
weighted graphs and develop methods for link discovery and analysis in
such graphs. Try out the prototype search engine at
biomine.cs.helsinki.fi!
(Funding: National Technology Agency (Tekes) and companies.)
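This page does not spell out how Biomine scores links, so the following is only a minimal sketch under stated assumptions: each edge weight is treated as a probability in (0, 1], and the strength of a connection between two nodes is taken to be the probability of the best path, i.e. the largest product of edge weights along any path. Replacing each weight w by -log(w) turns this into an ordinary shortest-path problem that Dijkstra's algorithm solves. The function `best_path` and the node names are hypothetical illustrations, not part of Biomine.

```python
import heapq
import math


def best_path(graph, source, target):
    """Return (probability, path) of the most probable path from source to target.

    Assumption (hypothetical model): graph maps node -> list of (neighbor, weight)
    with directed edges and weights in (0, 1]. Maximizing a product of weights is
    the same as minimizing the sum of -log(weight), and those costs are >= 0,
    so plain Dijkstra applies.
    """
    dist = {source: 0.0}   # best known -log probability to each node
    prev = {}              # predecessor on the best path found so far
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, []):
            nd = d - math.log(w)  # add -log(w) to the path cost
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return 0.0, []     # no path: connection probability 0
    # Walk predecessors back from target to source to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Toy graph with hypothetical node names.
graph = {
    "geneA": [("proteinP", 0.9), ("articleX", 0.5)],
    "proteinP": [("diseaseD", 0.8)],
    "articleX": [("diseaseD", 0.9)],
}
prob, path = best_path(graph, "geneA", "diseaseD")
print(round(prob, 2), path)
```

Here the route through proteinP (0.9 × 0.8 = 0.72) beats the one through articleX (0.5 × 0.9 = 0.45). A weight of 0 should be treated as a missing edge, since log(0) is undefined.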

Context: Context Recognition by User Situation Data Analysis
The Context project studies the characterization and analysis of
information about the user's context and its use in proactive
adaptivity. We have developed data analysis algorithms as well as
ContextPhone, a mobile context-aware prototyping platform, available as
free software.
(Funding: Academy of Finland, PROACT Programme; the project has formally
ended, and work continues with internal funding.)

ContextPhone - Context-aware platform for mobile phones
ContextPhone is an open software platform for context-aware
applications.
It can be used to collect, analyze and transmit information about its
context, as well as to tag and publish contextual media.

Bassist - MCMC simulation for Bayesian statistical models
Bassist is a tool that automates the use of hierarchical Bayesian
models in complex analysis tasks by generating a model-specific MCMC
sampler. Bassist is no longer supported.
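To illustrate what a "model-specific MCMC sampler" does, here is a hand-written random-walk Metropolis sampler for a toy model. It is not code generated by Bassist, and the model is an assumption chosen for brevity: observations are normal with known standard deviation sigma, and the unknown mean mu has a normal prior. The names `log_posterior` and `metropolis` and all parameter values are hypothetical.

```python
import math
import random


def log_posterior(mu, data, prior_mean=0.0, prior_sd=10.0, sigma=1.0):
    """Unnormalized log posterior of mu for the toy model (assumed, not Bassist's).

    Prior: mu ~ Normal(prior_mean, prior_sd^2); likelihood: x ~ Normal(mu, sigma^2).
    Additive constants are dropped because Metropolis only needs differences.
    """
    lp = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    lp += sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return lp


def metropolis(data, n_samples=20000, step=0.5, seed=0):
    """Random-walk Metropolis: propose mu' = mu + N(0, step^2), accept with
    probability min(1, p(mu' | data) / p(mu | data))."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        cand = log_posterior(proposal, data)
        # Symmetric proposal, so the acceptance ratio is just the posterior ratio.
        if cand >= current or rng.random() < math.exp(cand - current):
            mu, current = proposal, cand
        samples.append(mu)  # a rejected proposal repeats the current state
    return samples


data = [4.8, 5.1, 5.3, 4.9, 5.2]
samples = metropolis(data)
kept = samples[2000:]  # discard burn-in while the chain moves away from mu = 0
estimate = sum(kept) / len(kept)
print(round(estimate, 2))
```

For this conjugate toy model the exact posterior mean is about 5.05, so the printed estimate should land close to it. A tool like Bassist targets the harder case of hierarchical models, where such samplers become tedious to write by hand.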

Mobile Communication and Context Dataset,
Mika Raento.
In Proceedings of the Workshop Towards Benchmarks and a Database for
Context Recognition, at the International Conference on Pervasive
Computing. ETH, 2004.

Kill your Personal Data Dead,
Mika Raento.
In Proceedings of the Workshop on Location Systems Privacy and Control,
at the 6th International Conference on Human Computer Interaction with
Mobile Devices and Services (MobileHCI'04). University of Strathclyde,
2004.