Abstract

It is well known that translated texts read differently from texts written without the constraints imposed by a source text in another language. One of the features that can give translations a distinctive feel is the frequency with which certain lexical items occur in them. Previous research has compared the frequency of specific words in translated and non-translated texts and revealed substantial differences in their distributions. Most of these studies adopt a bottom-up approach: their starting point is a given word, whose frequency in translated and non-translated texts is then compared. In this study, I adopt an exploratory, top-down approach instead. I begin with a Portuguese-language corpus of translated and non-translated literary texts and attempt to identify lemmas that are markedly over- or under-represented in the translations. The results not only appear to support existing bottom-up intuitions regarding distinctive lexical distributions, but also reveal a number of unexpected contrasts that would not have been discernible without recourse to corpora.