The purpose of the current investigation is to predict post-editor profiles based on user
behaviour and demographics using machine learning techniques to gain a better understanding of
post-editor styles. Our study extracts process unit features from the CasMaCat LS14 database
in the CRITT Translation Process Research Database (TPR-DB). The analysis has two main
research goals: we create n-gram models based on user activity and part-of-speech sequences
to automatically cluster post-editors, and we use discriminative classifier models to
characterize post-editors based on a diverse range of translation process features. The classification and
clustering of participants resulting from our study suggest this type of exploration could be
used as a tool to develop new translation tool features or customization possibilities.
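
The idea of comparing post-editors through n-gram models of their activity can be illustrated with a minimal sketch. It is not the study's implementation: the activity labels (R, T, P), the three example sequences, the choice of bigrams and the use of cosine similarity are all assumptions made for illustration. A clustering step would then group editors whose profiles are most similar.

```python
from collections import Counter
from math import sqrt

def ngram_profile(sequence, n=2):
    """Count the n-grams in one post-editor's sequence of activity labels."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

def cosine_similarity(p, q):
    """Cosine similarity between two n-gram count profiles (Counters)."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

# Hypothetical activity labels: R = reading, T = typing, P = pause.
editor_a = ["R", "T", "P", "R", "T", "P", "R", "T"]
editor_b = ["R", "T", "P", "R", "T", "P", "R", "P"]
editor_c = ["P", "P", "T", "T", "P", "P", "T", "T"]

profile_a = ngram_profile(editor_a)
profile_b = ngram_profile(editor_b)
profile_c = ngram_profile(editor_c)

# Editors a and b share most bigrams, so they score higher than a and c.
sim_ab = cosine_similarity(profile_a, profile_b)
sim_ac = cosine_similarity(profile_a, profile_c)
```

In this toy data, editors a and b alternate reading, typing and pausing in a similar order, so their bigram profiles are closer to each other than either is to editor c.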

This paper attempts to predict gaze fixation duration on the source text using supervised learning techniques. The machine learning models used in the present work draw on lexical, syntactic and semantic information to predict the gaze fixation duration. Different features are extracted from the data, and models are built by combining the features. Our best setup achieves close to 50% classification accuracy.

These lecture notes present the basic principles of phrase structure that apply in English. We start by presenting in some detail the most complex phrase type in English, the noun phrase. Having done that, we demonstrate that all the other main phrase types, the AP, the PP and the VP, are modelled on the same structural principles as noun phrases.

This WP presents the empirical foundations for the development of the CasMaCat workbench.
A series of experiments are being run to establish basic facts about translator behaviour in
computer-aided translation, focusing on the use of visualization options and input modalities
while post-editing machine translation (sections 1 and 2). Another series of studies deals with
cognitive modelling and individual differences in translation production, in particular translator
types and translation/post-editing styles (sections 3 and 4).
This deliverable, D1.2, is a progress report on user interface studies, cognitive and user
modelling. It reports on post-editing and interactive translation experiments, as well as cognitive
modelling covering Tasks 1.1, 1.2, 1.3 and 1.5. It also addresses the issues that were raised in
the last review report for the project period M1 to M12, in particular:
- the basic facts about translator behaviour in CAT (sections 1 and 4), highlighting
  usage of visualization and input modalities (see also D5.3);
- the individual differences in translator types and translation styles (section 3, see also
  terminology, section A.1);
- the results and conclusions of preliminary studies conducted to investigate post-editing
  and translation styles (sections 2 and 5).
From the experiments and analyses so far, it is clear that the data collected in the CRITT
TPR-DB (Translation Process Research database) is an essential resource for achieving the
CasMaCat project goals. It allows for large-scale, in-depth studies of human translation processes
and thus serves as an information basis for the empirically grounded future development of the
CasMaCat workbench. It attracts an international research community to investigate human
translation processes under various conditions and to arrive at a more advanced level of understanding.
Additional language pairs and more data increase the chances of better underpinning the
conclusions needed, as will be shown in this report, and as concluded in section 5.

Workpackage 7 comprises the dissemination activities of the CasMaCat project. In this report,
we summarize the promotion of project goals, progress and outcomes to the larger academic
research community, the commercial sector targeted by the work, and beyond.

In this paper, we present the newly established Danish speech corpus PiTu. The corpus consists of recordings of 28 native
Danish talkers (14 female and 14 male) each reproducing (i) a series of nonsense syllables, and (ii) a set of authentic
natural language sentences. The speech corpus is tailored for investigating the relationship between early stages of the
speech perceptual process and later stages. We present our considerations involved in preparing the experimental set-up,
producing the anechoic recordings, compiling the data, and exploring the materials in linguistic research. We report on
a small pilot experiment demonstrating how PiTu and similar speech corpora can be used in studies of prosody as a
function of semantic content. The experiment addresses the issue of whether the governing principles of Danish prosody
assignment are mainly talker-specific or mainly content-typical (under the specific experimental conditions). The corpus is
available at http://amtoolbox.sourceforge.net/pitu/.

We present the speech corpus SMALLWorlds (Spoken Multi-lingual Accounts of Logically Limited Worlds), newly established and still
growing. SMALLWorlds contains monologic descriptions of scenes or worlds which are simple enough to be formally describable. The
descriptions are instances of content-controlled monologue: semantically “pre-specified” but still bearing most hallmarks of spontaneous
speech (hesitations and filled pauses, relaxed syntax, repetitions, self-corrections, incomplete constituents, irrelevant or redundant
information, etc.) as well as idiosyncratic speaker traits. In the paper, we discuss the pros and cons of data so elicited. Following that,
we present a typical SMALLWorlds task: the description of a simple drawing with differently coloured circles, squares, and triangles,
with no hints given as to which description strategy or language style to use. We conclude with an example of how SMALLWorlds may
be used: unsupervised lexical learning from phonetic transcription. At the time of writing, SMALLWorlds consists of more than 250
recordings in a wide range of typologically diverse languages from many parts of the world, some unwritten and endangered.

On the basis of a pilot study using speech recognition (SR) software, this paper attempts to illustrate the benefits of adopting an interdisciplinary approach in translation. It shows how the collaboration between phoneticians, translators and interpreters can (1) advance research, (2) have implications for the curriculum, (3) be pedagogically motivating, and (4) prepare students for employing translation technology in their future practice as translators. In a two-phase study in which 14 MA students translated texts in three modalities (sight, written, and oral translation using an SR program), Translog was employed to measure task times. The quality of the products was assessed by three experienced translators, and the number and types of misrecognitions were identified by a phonetician. Results indicate that SR translation provides a potentially useful supplement to or alternative for written translation.

This document contains details about the design of the CasMaCat workbench. It outlines the
major components and their interaction, and also gives implementation guidelines. The deliverable
is a snapshot of the document at the beginning of the CasMaCat project; it will be refined
throughout development and serves as technical documentation.

Digital hearing aids use a variety of advanced digital signal processing methods in order to improve
speech intelligibility. These methods are based on knowledge about the acoustics outside the ear as well
as psychoacoustics. This paper investigates the recent observation that speech elements with a high
degree of information can be robustly identified based on basic acoustic properties, i.e., function words
have greater spectral tilt than content words for each of the 18 Danish talkers investigated. In this paper
we examine these spectral tilt differences as a function of time based on speech material six times the
duration of that used in previous investigations. Our results show that the correlation of spectral tilt with information
content is relatively constant across time, even if averaged across talkers. This indicates that it is possible
to devise a robust method for estimating information density in the speech signal based on
computationally simple short-term band-level differences. The principle described here has the potential
to improve speech transduction in hearing aids and cochlear implants. In addition, the concept of
information-based speech transduction may also be applicable in automatic speech recognition systems.

Effective communication requires texts to be organised into a coherent discourse structure. But
languages vary considerably in how they do this, posing a challenge for effective intercultural
communication. Instead of relying on our own preferred persuasion style to be the most
effective, we need to take into consideration that people with different linguistic and cultural
backgrounds do not necessarily employ the same linguistic means in similar communication
situations. This is of particular importance in a business context, and a profound understanding
of cross-linguistic differences in the organisation of argumentative texts is needed.
In order to address this challenge, this thesis presents a study of structural characteristics in
argumentative texts across three different languages. The aim of the study is to examine some of
the linguistic means that writers of different languages employ when creating persuasive
discourses. The study is based on 150 Danish, English and Italian speeches given by Members of
the European Parliament in their native language.
The linguistic means under investigation are conceptualised as belonging to three different
structural domains which account for different ways of linking discourse units in a text: a
syntactically organised text structure, a rhetorically organised discourse structure and an
information packaging organised information structure. The structural domains are defined from
a cognitive-functional perspective and juxtaposed into a single analytical framework.
The analyses show that writers across the three languages generally use the same rhetorical
relations to build up persuasive discourses. But the analyses also reveal that the Danish, English
and Italian writers textualise relations differently. The Danish writers use almost exclusively
finite verb forms in coordinate and subordinate structures. The English writers tend to avoid
explicating the rhetorical relations between discourse units, and the Italian writers tend to
include more units inside the same sentence than the Danish and English writers.
The analyses also suggest that the cross-linguistic differences in textualisation can be
correlated with certain persuasive strategies. The Danish writers tend to persuade by analogy,
making use of typical features from narratives. The English writers make use of presentational
persuasion style, involving themselves in a more personal way than the Danish and Italian
writers. And lastly, the Italian writers make use of typical features from quasilogical persuasion
style, adopting a formal register and argumentation.
This thesis formulates an analytical framework for a systematic investigation of the structure
of discourse across languages, pairing theories and methods from the two parallel disciplines of
linguistics and rhetoric in order to gain more insights into effective intercultural communication.

This work presents a conceptual framework for
learning an ontological structure of domain knowledge, which
combines the Jaccard similarity coefficient with the Infinite Relational
Model (IRM) by Kemp et al. (2006) and its extended
model, i.e. the normal-Infinite Relational Model
(n-IRM) by Herlau et al. (2012). The proposed approach is applied
to a dataset where legal concepts related to the Japanese
educational system are defined by the Japanese authorities
according to the International Standard Classification of Education
(ISCED). Results indicate that the proposed approach
effectively structures features for defining groups of concepts in
several levels (i.e., concept, category, abstract category levels)
from which an ontological structure is systematically visualized
as a lattice graph based on the Formal Concept Analysis (FCA)
by Ganter and Wille (1997).
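
For readers unfamiliar with the Jaccard similarity coefficient mentioned above, the following minimal sketch computes it for two sets of features: the size of the intersection divided by the size of the union. The feature names are hypothetical, and the sketch does not show how the coefficient is combined with the IRM or n-IRM, which the abstract does not specify.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets have similarity 0.
        return 0.0
    return len(a & b) / len(union)

# Hypothetical feature sets describing two educational concepts.
concept_x = {"compulsory", "primary", "public"}
concept_y = {"compulsory", "secondary", "public"}

# The sets share 2 of 4 distinct features.
score = jaccard(concept_x, concept_y)
```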

This paper reports on the results of a user
satisfaction survey carried out among 16
translators using a new computer-assisted
translation workbench. Participants were
asked to provide feedback after performing
different post-editing tasks on different
configurations of the workbench, using
different features and tools. Based
on the feedback provided, we report on
the utility of each of the features, identifying
new ways of implementing them according
to the users’ suggestions.

This paper outlines work-in-progress research suggesting that domain-specific knowledge in terminological resources can be transferred efficiently to end-users across different levels of expertise and by means of different information modes including articles (written mode) and concept diagrams (graph mode). An experimental approach is applied in an eye-tracking laboratory, where a natural user situation is replicated for Danish professional potential end-users of a terminology and knowledge bank in a chosen pilot domain (taxation).

This paper aims to elaborate on the role of user
modelling for personalization and enhanced attention support.
User modelling is an important element in the management of
personal profiles and identity of users, but also a key element
for providing adaptive features and personalized interaction.
In this paper, we present personalization as the process
consisting of the customization and the adaptation of the
interaction along the structure, the content, the modality, the
presentation and the level of attention required. The paper
surveys personalization techniques and provides concrete
examples of personalized interaction. In particular, the paper
focuses on the role of user modelling for enhanced, personalized
user support within interactive applications. The key
contribution of the paper is to propose a framework of
personalization techniques and to identify new forms of
personalization that aim at taking into account human
cognitive capabilities and emotions.

Foreign language and culture learning suffers from a bad image in Danish Upper Secondary schools, and German is no exception. This means that the majority of Danish Upper Secondary school students are not particularly interested in learning the language. Therefore, intrinsic motivation plays a pivotal role in German language and culture learning in Denmark. One didactic initiative proposed to remedy the lack of intrinsic motivation is the introduction of various ICT (Information and Communication Technology) tools. This is the background for the research described in this article. Our study, which was conducted on the basis of semi-structured focus group interviews with n=50 high school students and n=2 high school teachers, shows that the ICT tools Photostory, MovieMaker and Voki indeed have an influence on students’ perceived intrinsic motivation in connection with German language and culture learning. Depending on the nature of the tool, our thematic analysis indicates that such tools facilitate different aspects of perceived intrinsic motivation. Still, our study shows that the tools have a limited effect on perceived intrinsic motivation unless they are addressed and used strategically in the proper pedagogical context.

Valency deals with the question of how many participants a certain verb logically presupposes in order for the event denoted by the verb to be realizable.
For instance, it takes only one individual to carry out a sleeping event. Each and every one of us can do that without any assistance from others. Therefore, we say that a verb (or verbs) denoting a sleeping event presupposes one argument, namely the individual doing the sleeping.
A full sentence describing a sleeping event, then, typically consists of an appropriate form of the verb plus a phrase, typically an NP denoting the individual who sleeps, as in (1):
(1) John sleeps
Accordingly, the verb sleeps is described as belonging to the class of Mono-valent verbs, which comprises all intransitive verbs: die, wither, walk, run, liquidate, etc.
In this sentence the argument is realized as an NP with the sentential grammatical function of subject, and the subject has the semantic role of AGENT. Note that sleeping is an intentional act since (more often than not) you can decide whether you want to sleep or not.