Description and terms and conditions of use of BISON simulated contact center data

1. Introduction

The BISON project has been
funded by the EC Horizon 2020 Framework Programme and aims to bring
significant innovations to speech data mining for contact centers (CC). Please
see the project web page http://bison-project.eu
for more information about BISON.

Data is one of
the crucial resources for the innovations planned in the BISON project. It is
also one of the main assets for performing big data speech analytics, which is
one of the main goals of the project. In BISON, we have to take into account
that we operate in a commercial environment, with customer data that carries
both significant commercial value and strict legal limitations and usage
restrictions.

As CC data is very sensitive and the BISON consortium needed data
for public demonstration and dissemination activities, we collected a
limited amount of simulated CC data containing no real personal information.
This data is commonly called “fake data”. This document accompanies
the public release of this data and contains its technical description and
terms of use.

2. Languages and content

The four languages most relevant for the CC partners (Czech, English, French
and Spanish) were chosen. The CC partners EBOS, ComData and Telefonica Móviles
prepared fake campaigns and recruited speakers from among their employees to
perform the calls. The collection followed several prepared scripts that
resemble real calls and stay as close as possible to the business use cases
in BISON.

3. Law-abiding data collection

Like the BISON project as a whole, the recording of the BISON simulated CC
data strictly complied with applicable law.

3.1. Speakers and informed consent

The CC clients were played by known people who signed an informed consent
form allowing the intended use of the speech data: “Recorded data will be
used for the purposes of the BISON project and will contribute to the provision
of better services to CC customers. The recordings may be used for both
academic and commercial research and development, and may be made publicly
available on the Internet in order to support international R&D community
in the area of speech data mining.”

3.2. Personal data

Special attention was paid to personal data, which could not be collected. At
the same time, the data had to remain useful for BISON purposes, e.g. by
keeping the real formats of IDs, telephone numbers and addresses, and by
covering several topics around CC operation and the relevant BISON use cases.
Fake personal data were therefore generated following procedures that ensure
ethical and lawful collection; the most important points are summarized below:

●the speakers' personal data are never
mentioned: fake data are used, combined with the most common data for the
population of the given country (see details below);

●brand names are replaced by
fake ones to avoid problems with real producers;

●phone numbers, although randomly generated, cannot correspond even
potentially to real customers: they are not only unused today, but also
cannot be assigned to a new customer in the future under the current
numbering schemes of the respective countries;

●names are general enough to prevent singling out an
individual, even indirectly;

●addresses do not correspond to real ones, yet are realistic
enough for the purpose; to achieve this, a suitable mix of street addresses,
non-existent street numbers, and street/city pairings was used.

The following table shows the procedures used to generate fake
personal information for the publicly released data.

Partner (country)              | Fake identities
-------------------------------|------------------------------------------------
ComData (Czech Republic)       | Most common names in the Czech Republic, shared
                               | by a large number of people, e.g. Novak,
                               | Prochazka, Novotny
EBOS (United Kingdom, France)  | Most common first and last names in the UK and
                               | France, e.g. James Johnson, Louise Dubois
TME (Spain)                    | A web application was used to generate identities
                               | with the real structure of Spanish identities,
                               | but with numbers that do not exist today and
                               | should not exist in the foreseeable future

5. Terms and conditions of use

5.1. Purposes

The simulated CC data is publicly
available through the BISON public website. It can be used for any
legitimate purpose, including (but not limited to) academic research and
development, industrial research and development, education, CC agent
training, demonstrations, testing of one's own or third-party speech
analytics software, and serving as an example for similar data collections.

5.2. Collection of information

The BISON consortium does, however, collect
information on who downloads the data and for what purposes; the data is
made available only after the required information has been filled in.
Individuals, laboratories and companies interested in this data may be
contacted with questionnaires and, possibly, with business offers, after
their lawful informed consent has been obtained.

5.3. Acknowledgements

If you publish
results obtained on BISON simulated CC data, you are kindly requested to
acknowledge the EC funding and the BISON project by including the following
statement:

“Collection of BISON
simulated CC data was funded by the European Union’s Horizon 2020 research and
innovation programme under grant agreement No 645323. The data is available at http://bison-project.eu/data”.