Abstract:

In this thesis we document the development of a system that performs speaker diarization, that is, automatically identifying who spoke when in a conversation or any other recording with several speakers. The intended use is to provide this functionality for broadcast news, using data provided by the Finnish broadcasting company YLE under the Next Media programme, financed by TEKES, the Finnish Funding Agency for Technology and Innovation.

Another goal is to produce a system compatible with existing Aalto University speech recognition software, in order to open the door to future improvements and research.

The resulting system, a new implementation of established methods with the parameters we determined to be best for our use case, achieves performance very close to that of current state-of-the-art systems, while remaining compatible with the existing Aalto University speech recognition software and running at a reasonable speed. Further improvements to the system are currently being made, opening the door to more research options.