EvaSy Evaluation Package

Package d’évaluation EvaSy

ID:

ELRA-E0023

The EvaSy Evaluation Package was produced within the French national project EvaSy (Evaluation of speech synthesis systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The EvaSy project made it possible to carry out an evaluation campaign for speech synthesis systems using French text data. It extends the only campaign previously carried out for French in this field, which took place within the AUPELF campaigns (Actions de Recherche Concertées, 1996-1999).

This package includes the material that was used for the EvaSy evaluation campaign: resources, protocols, scoring tools, campaign results and other items used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own systems and compare their results with those obtained during the campaign itself.

The campaign is divided into three actions:
1) Evaluation of grapheme-to-phoneme conversion: evaluating the capacity of speech synthesis systems to phonetise text data.
2) Evaluation of prosody: evaluating the capacity of speech synthesis systems to predict prosody (duration and fundamental frequency of phonemes) from the text itself.
3) Global evaluation of the quality of speech synthesis systems:
- ACR tests (Absolute Category Rating): evaluating the overall quality of speech synthesis voices by asking a number of subjects to rate several general characteristics of the voice, such as its naturalness, fluency and intelligibility.
- SUS tests (Semantically Unpredictable Sentences): evaluating the intelligibility of the speech synthesis voice using sentences that are syntactically correct but semantically unpredictable (they have no meaning).
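As an illustration of how ACR ratings are commonly summarised, the sketch below averages listener ratings per criterion into a mean opinion score. It assumes a 5-point rating scale and invented ratings; neither the scale nor the function name `mean_opinion_scores` comes from the EvaSy protocol, which should be consulted for the actual scoring procedure.

```python
def mean_opinion_scores(ratings):
    """Average the listener ratings collected for each criterion.

    `ratings` maps a criterion name to a list of integer scores
    (assumed here to lie on a 1-5 scale). Criteria with no ratings
    are skipped.
    """
    return {
        criterion: sum(scores) / len(scores)
        for criterion, scores in ratings.items()
        if scores
    }


# Invented ratings from three hypothetical listeners, for illustration only.
example_ratings = {
    "naturalness": [4, 3, 5],
    "fluency": [3, 3, 4],
    "intelligibility": [5, 4, 4],
}

for criterion, score in mean_opinion_scores(example_ratings).items():
    print(f"{criterion}: {score:.2f}")
```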

The EvaSy evaluation package contains the following data and tools:
1) For the evaluation of the grapheme-to-phoneme conversion module:
- About 8,000 proper names (4,115 first name-surname pairs) extracted from the Le Monde newspaper, 1992–2000 (over 200 million words), manually phonetised with variants and annotated with linguistic tags. The reference phonetisation was checked and corrected after the adjudication phase.
- A corpus of emails (about 115,000 words), anonymised, segmented by paragraph and phonetised in SAMPA. The reference phonetisation was not checked, and these data were not evaluated within EvaSy.
- The SCLITE tool (developed by NIST), used to compare the reference phonetisation with that of the evaluated system and to count the phoneme errors (inserted, omitted or substituted phonemes).
- The Post-align tool, used to align the reference phonetisation with that of the evaluated system on a word-by-word basis.
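The kind of error counting performed when comparing a reference phonetisation with a system output can be sketched with a minimum-edit-distance alignment. This is a minimal illustration, not SCLITE's implementation: the function name `phoneme_errors`, the uniform edit costs and the example SAMPA strings are assumptions made for the sketch.

```python
def phoneme_errors(reference, hypothesis):
    """Count (substitutions, deletions, insertions) between two phoneme lists.

    Uses a standard dynamic-programming alignment in which every edit
    costs 1. Each cell stores (total_errors, subs, dels, ins) for the
    best alignment of reference[:i] with hypothesis[:j].
    """
    n, m = len(reference), len(hypothesis)
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)      # only insertions

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Match or substitution (diagonal move).
            t, s, d, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                best = (t, s, d, ins)
            else:
                best = (t + 1, s + 1, d, ins)
            # Deletion: a reference phoneme is missing from the hypothesis.
            t, s, d, ins = cost[i - 1][j]
            if t + 1 < best[0]:
                best = (t + 1, s, d + 1, ins)
            # Insertion: the hypothesis contains an extra phoneme.
            t, s, d, ins = cost[i][j - 1]
            if t + 1 < best[0]:
                best = (t + 1, s, d, ins + 1)
            cost[i][j] = best

    return cost[n][m][1:]


# Illustrative SAMPA-like strings, one phoneme per whitespace-separated token.
ref = "b o~ Z u R".split()
hyp = "b o Z u".split()
subs, dels, ins = phoneme_errors(ref, hyp)
error_rate = (subs + dels + ins) / len(ref)
print(subs, dels, ins, error_rate)
```

Dividing the total number of errors by the number of reference phonemes gives a phoneme error rate that can be compared across systems.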

2) For the evaluation of the prosodic module:
- Text data: 7 phonetically balanced sentences extracted from the BREF corpus (cf. ELRA-S0067), with durations ranging from 4 to 11 seconds.
- Speech data: 7 sentences read by one speaker.
- The Mbroli tool, which converts *.pho prosodic files into *.wav speech files, together with the MBROLA fr1 diphone database.
- The Mbrolign tool, which aligns the phonemes with the signal, extracts the prosodic parameters of the signal and copies them into the MBROLA diphone database.
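To make the role of the *.pho prosodic files concrete, the sketch below writes phoneme durations and pitch targets in the line-oriented layout commonly used for MBROLA input: one phoneme per line, followed by its duration in milliseconds and optional pairs of (position in percent of the duration, F0 in Hz). The helper name `to_pho` and all numeric values are invented for illustration; the files shipped in the package should be taken as the reference for the exact format.

```python
def to_pho(segments):
    """Serialise prosodic segments as .pho-style text.

    Each segment is (phoneme, duration_ms, pitch_points), where
    pitch_points is a list of (position_percent, f0_hz) pairs.
    """
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position_percent, f0_hz in pitch_points:
            fields += [str(position_percent), str(f0_hz)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical prosody for a short utterance; "_" denotes silence.
segments = [
    ("_", 100, []),
    ("b", 70, []),
    ("o~", 120, [(0, 110), (100, 125)]),
    ("_", 100, []),
]
print(to_pho(segments), end="")
```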