Design and development of Iberia: a corpus of scientific Spanish

Jordi Porta Zamorano; Emilio del Rosal García; Ignacio Ahumada Lara
Iberia is a synchronic corpus of scientific Spanish designed mainly for terminological studies. In this paper, we describe its design and the infrastructure for its acquisition, processing and exploitation, including mark-up, linguistic annotation, indexing and the user interface. Two preprocessing tasks affecting a large number of words are described in detail: dehyphenation and identification of text fragments in other languages. We also show how some of the reported statistics, namely, dispersion and association, are used for research on lexis.

1. Introduction

The Iberia project³ was launched to bridge the gap between corpus linguistics and linguistic research in scientific Spanish. The aim of the Iberia project was two-fold: (i) the creation of a synchronic representative corpus of scientific texts in Spanish and (ii) the creation of the infrastructure for its linguistic processing and exploitation. Iberia will be part of an observatory of neologisms in science whose purpose is to study terminological usage of words in a variety of fields, to detect term obsolescence and to track recent

¹ Departamento de Ingeniería Informática, Universidad Autónoma de Madrid, Campus de Cantoblanco, c/ Francisco Tomás y Valiente, 11, Madrid 28049, Spain. Correspondence to: Jordi Porta Zamorano, e-mail: jordi.porta@uam.es
² Centro de Ciencias
Corpora, Edinburgh University Press
http://www.deepdyve.com/lp/edinburgh-university-press/design-and-development-of-iberia-a-corpus-of-scientific-spanish-0U02yFzKP0