One of the aims of the Eye-to-IT project (FP6 IST 517590) is to integrate
keyboard logging and eye-tracking data to study and anticipate the behaviour
of human translators. This so-called User-Activity Data (UAD) would
make it possible to empirically ground cognitive models and to validate hypotheses
of human processing concepts in the data. In order to thoroughly ground a
cognitive model of the user in empirical observation, two conditions must be met
as a minimum. Firstly, all UAD must be fully synchronised so that data relate to
a common construct. Secondly, data must be represented in a queryable form so
that large volumes of data can be analysed electronically.
Two programs have evolved in the Eye-to-IT project: TRANSLOG is designed
to register and replay keyboard logging data, while GWM is a tool to record and
replay eye-movement data. This paper reports on an attempt to synchronise and
integrate the representations of both software components so that sequences of
keyboard and eye-movement data can be retrieved and their interaction studied.
This would make it possible to correlate the eye and keyboard activities
of translators (the user model) with properties of the source and target
texts and thus to uncover dependencies in the UAD.
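
To illustrate why the two conditions above matter, here is a minimal sketch in Python of pairing keystrokes with gaze data once both streams share one clock. It is not the TRANSLOG or GWM data format: the `Fixation` and `Keystroke` records, the `word_id` field and the function name are hypothetical, and timestamps are assumed to be already synchronised.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    time_ms: int  # fixation onset on the shared clock (assumed synchronised)
    word_id: int  # hypothetical id of the fixated source-text word


@dataclass(frozen=True)
class Keystroke:
    time_ms: int  # key press time on the same shared clock
    char: str


def last_fixation_before(fixations, keystrokes):
    """Pair each keystroke with the latest fixation that began at or before it."""
    fixations = sorted(fixations, key=lambda f: f.time_ms)
    onsets = [f.time_ms for f in fixations]
    pairs = []
    for key in sorted(keystrokes, key=lambda k: k.time_ms):
        idx = bisect_right(onsets, key.time_ms) - 1
        # idx is -1 when the keystroke precedes every recorded fixation
        pairs.append((key, fixations[idx] if idx >= 0 else None))
    return pairs


fixations = [Fixation(0, 1), Fixation(400, 2), Fixation(900, 3)]
keystrokes = [Keystroke(450, "e"), Keystroke(950, "n")]
pairs = last_fixation_before(fixations, keystrokes)
```

Without a common timeline the lookup is meaningless, and without a structured representation it cannot be run over large volumes of sessions.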

This paper examines some typological differences in the discourse structure of Italian and Danish. The results of the study indicate that there are significant differences in information packaging in the two languages, especially in their use of deverbalisation. Italian sentences tend to include a larger number of Elementary Discourse Units (EDUs), especially propositions, than Danish. A higher percentage of these is rhetorically backgrounded by means of non-finite and nominalised predicates. Danish text structure, on the other hand, is more informationally linear and characterised by a higher number of finite verbs and topic shifts. The study also suggests that a more fine-grained classification of non-finite and nominalised EDUs is needed for a complete in-depth analysis of discourse constraints in different language families.

The paper introduces a new research strategy for the investigation
of human translation behavior. While conventional cognitive research methods
make use of think aloud protocols (TAP), we introduce and investigate User-
Activity Data (UAD). UAD consists of the translator’s recorded keystroke and
eye-movement behavior, which makes it possible to replay a translation session
and to register the subjects’ comments on their own behavior during a retrospective
interview. UAD has the advantage of being objective and reproducible, and,
in contrast to TAP, does not interfere with the translation process. The paper gives
the background of this technique and an example of an English-to-Danish translation.
Our goal is to elaborate and investigate cognitively grounded basic translation
concepts which are materialized and traceable in the UAD and which, in a
later stage, will provide the basis for appropriate and targeted help for the translator
at a given moment.

Reordering has been an important topic in statistical machine translation
(SMT) as long as SMT has been around. State-of-the-art SMT systems such
as Pharaoh (Koehn, 2004a) still employ a simplistic model of the reordering
process to do non-local reordering. This model penalizes any reordering,
regardless of the words involved. A reordering is only selected if it leads to a translation
that looks like a much better sentence than the alternative.
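
A word-independent penalty of this kind is commonly realised as a distance-based distortion cost. The following Python sketch assumes that form (a factor `alpha` raised to the size of each jump in the source sentence); the function name, the span representation and the value of `alpha` are illustrative assumptions, not details given in this abstract.

```python
def distortion_cost(phrase_spans, alpha=0.5):
    """Distance-based reordering cost for one phrase segmentation.

    phrase_spans: 0-based inclusive (start, end) source spans, listed in the
    order in which the phrases are translated. The cost depends only on how
    far the decoder jumps, never on the words inside the phrases.
    """
    cost = 1.0
    prev_end = -1  # position just before the first source word
    for start, end in phrase_spans:
        jump = abs(start - prev_end - 1)  # 0 when translation is monotone
        cost *= alpha ** jump
        prev_end = end
    return cost


monotone = distortion_cost([(0, 1), (2, 2), (3, 4)])
reordered = distortion_cost([(0, 1), (3, 4), (2, 2)])
```

Under this assumption the monotone order incurs no penalty, while any swap is penalised by the same amount whatever words are moved, so a reordering survives only if the other model scores outweigh the cost.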
Recent developments have, however, seen improvements in translation
quality following from syntax-based reordering. One such development
is the pre-translation approach that adjusts the source sentence to resemble
target language word order prior to translation. This is done based on
rules that are either manually created or automatically learned from word
aligned parallel corpora.
We introduce a novel approach to syntactic reordering. This approach
provides better exploitation of the information in the reordering rules and
eliminates problematic biases of previous approaches. Although the approach
is examined within a pre-translation reordering framework, it easily
extends to other frameworks. Our approach significantly outperforms a
state-of-the-art phrase-based SMT system and previous approaches to pre-translation
reordering, including (Li et al., 2007; Zhang et al., 2007b; Crego
& Mariño, 2007). This is consistent both for a very close language pair,
English-Danish, and a very distant language pair, English-Arabic.
We also propose automatic reordering rule learning based on a rich set
of linguistic information. As opposed to most previous approaches that
extract a large set of rules, our approach produces a small set of predominantly
general rules. These provide a good reflection of the main reordering
issues of a given language pair. We examine several parameters that may influence the quality of the rules learned.
Finally, we provide a new approach for improving automatic word alignment.
This word alignment is used in the above task of automatically learning
reordering rules. Our approach learns from hand aligned data how to
combine several automatic word alignments to one superior word alignment.
The automatic word alignments are created from the same data that
has been preprocessed with different tokenization schemes, thus utilizing
the different strengths that different tokenization schemes exhibit in word
alignment. We achieve a 38% error reduction for the automatic word alignment

A Program for Recording User Activity Data for Empirical Reading and Writing Research

Carl, Michael (Frederiksberg, 2012)

Abstract:

This paper presents a novel implementation of Translog-II. Translog-II is a Windows-oriented program to record and
study reading and writing processes on a computer. In our research, it is an instrument to acquire objective, digital data of
human translation processes. Like its predecessors, Translog 2000 and Translog 2006, Translog-II consists of two
main components: Translog-II Supervisor and Translog-II User, which are used to create a project file, to run a text
production experiment (a user reads, writes or translates a text) and to replay the session. Translog produces a log file
which contains all user activity data of the reading, writing, or translation session, and which can be evaluated by external
tools. While there is a large body of translation process research based on Translog, this paper gives an overview of the
Translog-II functions and its data visualization options.

The paper discusses a method to triangulate process and product data. We
suggest converting Translog data into a relational format which contains
both process and product data. We outline how this representation allows
us to retrieve and correlate the various dimensions of the data more easily.
The concept of Alignment Unit (AU) is introduced and contrasted with that
of Translation Unit (TU). While AUs refer to translation equivalences in the
source and target texts of the product data, TUs refer to cognitive entities that
can be observed in the process data. With an (almost) exhaustive fragmentation
of the source and target texts into AUs, we are able to distribute and
allocate the entire set of keystroke data to appropriate AUs. Using the properties
of the keystroke data, AUs are quantified in a novel way which enables
us to visualise and investigate the structure of translation production on a
fine-grained scale.
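
As a rough illustration of the relational idea, the Python sketch below stores Alignment Units and keystrokes in two tables and quantifies each AU by its keystroke count and typing time span. The schema, column names and sample rows are hypothetical and are not taken from the paper; the assignment of each keystroke to an AU is assumed to have been done beforehand.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE alignment_unit (
        au_id  INTEGER PRIMARY KEY,
        source TEXT,   -- source-text side of the translation equivalence
        target TEXT    -- target-text side of the translation equivalence
    );
    CREATE TABLE keystroke (
        time_ms INTEGER,
        char    TEXT,
        au_id   INTEGER REFERENCES alignment_unit(au_id)
    );
""")

# Invented product data: two AUs of an English-to-Danish translation.
con.executemany(
    "INSERT INTO alignment_unit VALUES (?, ?, ?)",
    [(1, "the house", "huset"), (2, "is red", "er rødt")],
)
# Invented process data: each keystroke already allocated to one AU.
con.executemany(
    "INSERT INTO keystroke VALUES (?, ?, ?)",
    [(100, "h", 1), (250, "u", 1), (400, "s", 1), (2000, "e", 2), (2150, "r", 2)],
)

# Quantify each AU by the keystrokes allocated to it.
rows = con.execute("""
    SELECT a.au_id, a.target,
           COUNT(k.char) AS n_keys,
           MAX(k.time_ms) - MIN(k.time_ms) AS span_ms
    FROM alignment_unit AS a
    JOIN keystroke AS k ON k.au_id = a.au_id
    GROUP BY a.au_id, a.target
    ORDER BY a.au_id
""").fetchall()
```

Once process and product data sit in joined tables like these, correlating further dimensions becomes a matter of adding columns and queries rather than writing new parsers.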

This paper argues that translators can greatly benefit from contrastive studies of discourse structure. Cross-linguistic studies of Italian
and Danish point to significant typological differences in information packaging in the two languages, especially in their use of
deverbalisation. Italian sentences tend to include a larger number of Elementary Discourse Units (EDUs), especially propositions,
than Danish. A higher percentage of these is rhetorically backgrounded by means of non-finite and nominalised predicates. Danish
text structure, on the other hand, is more informationally linear and characterised by a higher number of finite verbs and topic shifts.
These typological differences are transferred into three simple translation rules concerning 1) the number of EDUs, 2) the rhetorical
structure, and 3) the textualisation of rhetorical satellites.