For most of these topics you will find a chapter, or part of one, in the textbook.
You can also find articles by using Google or by looking at the
journals/conferences listed at the end of the slides for the first
lecture.

Start by choosing a topic that interests you and finding literature on it.
Come and show and discuss the texts/articles you will use for the presentation
by April 13 at the latest. The actual presentations will be on June 1-2.
A draft of your presentation slides must be handed in to the examiner at least one week before your presentation, i.e. by May 24, so that we can book a discussion session if needed.

The total expected work time for this part is approximately 80 hours per student
(excluding presentations). Note that attendance at all presentations is mandatory.

Project

For the project we expect you to run association analysis/clustering
algorithms on data sets and to analyze and discuss the results. You can implement
the algorithms yourself or use Weka, SAS or another tool. You will be assigned a
supervisor for the project.
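
If you choose to implement an algorithm yourself, the core of an association
analysis can be quite small. The following is only a minimal sketch in Python,
not a required approach: it counts the support of all itemsets up to a given size
by brute force (it is not Apriori and will not scale to large data sets). The
function name frequent_itemsets, its parameters and the toy baskets are
hypothetical and used purely for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets with at most max_size items.

    Support is the fraction of transactions that contain the itemset.
    Brute-force enumeration, intended only for small data sets.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # drop duplicates, fix item order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical toy data: four shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

In a real project you would replace the toy baskets with your own data set and
most likely use a pruning algorithm or an existing tool instead.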

There are a number of ways to define a project.
- Find a topic and data set that is interesting to you.
Use association analysis/clustering to analyze the data in different ways and
derive conclusions.
- Find an article that uses association analysis or clustering.
Use the same strategies on different data sets and discuss differences and
similarities.
- Find an article that uses association analysis or clustering.
Use other algorithms on the same data sets and discuss differences and
similarities.
- Ask a teacher if they have a clustering/association analysis problem connected
to their research.

Send a proposal to the examiner by April 24 at the latest; the examiner will then assign you a supervisor. Once you have been assigned a supervisor, get the project approved by him or her (after discussion and possible changes to the proposal) by May 15 at the latest.

The examination of this part consists of writing a report on the project. We may
ask you to come to the office to answer questions about the content of your
report. DEADLINE: June 15.

The report should be between 5 and 10 pages and contain at least the following (you can also look at this):
- introduction: describe the area/problem, motivate why it is interesting to
look at, and describe what kinds of solutions have been used before
- background: any background knowledge that is needed to read the report (e.g.
the domain of the application)
- algorithms: describe the algorithms that you have used and motivate why you
chose them. Why do you think they should give good results?
(Do not describe the basic algorithms that we have seen in the course, but any
extensions or other algorithms should be described.)
- test: describe the test data and the test set-up
- test results: describe the results
- discussion: analyze the results and compare them with what was done before.
Did you get good or bad results? Were these results expected?
- future: ideas for how to improve the results, and other things that could be done

Make sure to properly reference other people's work.

The total expected work time for this part is approximately 130 hours per student.