Terminology data banks as the bodies of knowledge: The model for the systematization of terminologies

At the Faculty of Social Sciences, together with colleagues from five research institutions, we prepared a terminological database named TERMIS. We started in 2007 with a less ambitious goal: KoRP, a corpus of Slovenian texts on public relations, which has been freely available from the start. Since then, the idea has grown beyond its initial boundaries, and today a terminological database that could serve as a model for the systematisation of terminologies in other fields can be accessed at the Termania web portal. The database was built between 2011 and 2013 with the help of state-of-the-art lexicography tools and over 50 experts from public relations and related fields. In 2011, the Slovenian Research Agency recognised the project as an important contribution to scientific development in Slovenia and decided to support it financially. The Chamber of Commerce and Industry of Slovenia, Pristop and the Public Relations Society of Slovenia also contributed their professional knowledge and experience to the project.

Our sponsors were also part of this development story. We thank them for recognising the added value in the project and for co-creating it with us.

Project's code:

Duration:

Finances:

RESEARCH BACKGROUND

Terminology constitutes an important part of all modern national standard languages. The need to name everything, along with new knowledge and the latest discoveries, makes it necessary for all languages to manage this ubiquitous impulse. Our goal is to preserve a language as a nation-wide communication tool that functions fully and reliably in all situations. By satisfying the need to name the world around us, we can help to achieve this goal. The development of terminologies - especially their standardisation by means of dictionaries (standards, encyclopaedias) - calls for the interdisciplinary collaboration of professionals, linguists and computer scientists responsible for building databases.

Since the Bank of English corpus was established in 1980, it has been inconceivable to produce dictionaries without acknowledging that contemporary lexicography requires corpora, if only in part. Corpora provide evidence for better descriptions of language formation and usage, and their machine-readable format has added precise measurements of various aspects of language to these descriptions. Slovenia has yet to see the consistent use of substantial and carefully considered documentary material.

The research project followed the corpus approach, which we believe produces technologically and methodologically more advanced and more user-friendly results. The approach also enables partial automation of the identification of term candidates (word lists, extraction).

Terminological dictionaries are only one of the many types of dictionary. Various people make use of them (professionals, students, translators); future terminological dictionaries and databases should therefore offer both specialist content and linguistic information. In other words, e-dictionaries and terminological databases should incorporate as much valuable information as possible for all their users. In recent decades, foreign lexicographers have placed dictionary users at the centre of their work and have started to conduct empirical studies showing why people use dictionaries, how often they use them, how they find the information they need, etc.

The goal of our project was to produce a terminological database for the field of public relations. We planned for it to be accessible online, free of charge. We expected that the findings could serve as a model for other sciences and disciplines.

RESEARCH RESULTS

The research started from KoRP, a lemmatised and morpho-syntactically tagged corpus of public relations texts. It contains 1.8 million words and is a monolingual, synchronic, written and static corpus of professional texts. Since 2013 the KoRP corpus has been available in the NoSketch Engine and CUWI concordancers.

From the KoRP corpus we extracted the basic headword list by automatic term extraction with the LUIZ term extraction tool. In the next stage we extracted typical contexts: collocations and examples of use. To extract this lexical information for single- and multi-word terms, we used the Word sketch function of the Sketch Engine tool (http://www.sketchengine.co.uk/). The extracted lexical information, along with the headword list, was imported into the dictionary editor of the Termania portal, where the rest of the editing was performed.
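The idea behind automatic extraction of term candidates can be illustrated with a minimal sketch. It is not the method used by LUIZ, which is not described here; it only assumes a common, generic heuristic: a lemma is a good candidate if it is much more frequent in the specialised corpus than in a general reference corpus. The function name, parameters and toy lemma lists are hypothetical.

```python
from collections import Counter


def rank_term_candidates(domain_lemmas, reference_lemmas, min_freq=2, smoothing=1.0):
    """Rank lemmas by how over-represented they are in the domain corpus.

    Generic frequency-ratio heuristic (an assumption for illustration,
    not the LUIZ algorithm).
    """
    if not domain_lemmas or not reference_lemmas:
        raise ValueError("both corpora must contain at least one token")

    domain_counts = Counter(domain_lemmas)
    reference_counts = Counter(reference_lemmas)
    domain_total = len(domain_lemmas)
    reference_total = len(reference_lemmas)

    scored = []
    for lemma, freq in domain_counts.items():
        if freq < min_freq:
            continue  # skip lemmas too rare to judge
        # Frequencies per million tokens; smoothing avoids division by zero
        domain_fpm = freq / domain_total * 1_000_000
        reference_fpm = reference_counts[lemma] / reference_total * 1_000_000
        score = (domain_fpm + smoothing) / (reference_fpm + smoothing)
        scored.append((lemma, score))

    # Highest score first; ties broken alphabetically for a stable order
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Toy lemma lists standing in for a specialised and a general corpus
domain = ["stakeholder", "the", "public", "stakeholder", "the", "public", "relations", "the"]
reference = ["the", "house", "the", "day", "the", "the"]

ranked = rank_term_candidates(domain, reference)
for lemma, score in ranked:
    print(f"{lemma}\t{score:.2f}")
```

In this toy example the function word "the" receives a score below 1 because it is common in both corpora, while the domain-specific lemmas rise to the top. A real list of candidates would still need manual review by subject experts.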

The terminological database for the field of public relations contains 2000 terms with information on accent, norm, explanations, English translations, typical collocations and examples of use. Each entry contains links to related entries, as well as links to the concordances in the KoRP corpus and the Gigafida corpus. The TERMIS database is now freely accessible online at http://www.termania.net.
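The information categories listed above can be pictured as a simple record. The sketch below is only an illustration: the class and field names are hypothetical and do not reflect the actual Termania data model, and the example entry is deliberately left incomplete.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class TermEntry:
    """One terminological entry (hypothetical layout, not Termania's schema)."""
    headword: str
    accent: str = ""                 # accented form of the headword
    norm_note: str = ""              # normative information
    explanation: str = ""
    english_translations: List[str] = field(default_factory=list)
    collocations: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    related_entries: List[str] = field(default_factory=list)
    concordance_links: Dict[str, str] = field(default_factory=dict)  # corpus name -> URL

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative, partially filled entry
entry = TermEntry(
    headword="odnosi z javnostmi",
    english_translations=["public relations"],
)
print(entry.missing_fields())
```

A helper such as `missing_fields` shows how an editor could track which categories of an entry still need to be completed.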

The technological infrastructure of the Termania portal, upgraded during the project, contains tools for producing and processing dictionary entries that help non-specialists to quickly start making a dictionary of their own field. This infrastructure is available online, free of charge.

PUBLICATIONS

LOGAR BERGINC, Nataša: Slovene Terminologies / TERMIS: Preserving Slovene terminology in a globalising world. International Innovation: Disseminating science, research and technology. Bristol: Research Media, August 2012. 79-81. //International Innovation is the leading global dissemination resource for the wider scientific, technology and research communities, dedicated to disseminating the latest science, research and technological innovations on a global level. More information and a complimentary subscription offer to the publication can be found at: www.researchmedia.eu.//