
Tour de CLARIN: Sweden

Since parliamentary speech has a great societal impact through both its language and its content, the creation and availability of large multimodal parliamentary corpora, a topic recently addressed at a CLARIN-PLUS workshop, plays a pivotal role in humanities and social sciences research.

The Riksdag's open data is one such corpus. It is the digitized collection of Swedish parliamentary data and consists of roughly 30,000 documents pertaining to Sweden's national political decision-making processes. It is available for download from the website of the Swedish parliament. In addition, the National Library of Sweden has digitized the public reports of inquiry (Statens offentliga utredningar, SOU) for the period 1922 to 1999 and published them under the CC0 license on the parliamentary website, while newer reports are produced in digital form from the outset.

This parliamentary corpus is available in Korp, Språkbanken's corpus search tool, and consists of 1.25 billion tokens. It can also be downloaded in XML format from Språkbanken's resource page. The annotation was performed with the SWE-CLARIN tool Sparv and comprised tokenisation, lemmatisation, lemgram (inflectional paradigm) identification, word sense identification, and compound splitting.
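To give a feel for how such token-level annotations might be consumed, here is a minimal sketch that reads a small XML sample and extracts the year, word form, part of speech and lemmas of each token. The layout is an assumption modelled on Korp-style exports: one `<w>` element per token, with set-valued attributes such as `lemma` written as `|value1|value2|`. The sample text, the `year` attribute and the helper names `split_set` and `tokens` are hypothetical; check the documentation of the actual download before relying on any attribute name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in an assumed Korp/Sparv-style XML layout:
# one <w> element per token; set-valued attributes such as lemma are
# written as "|value1|value2|" (an empty set is just "|").
SAMPLE = """<corpus>
  <text year="1965">
    <sentence>
      <w pos="NN" lemma="|information|">informationen</w>
      <w pos="VB" lemma="|öka|">ökar</w>
    </sentence>
  </text>
</corpus>"""


def split_set(value):
    """Split a '|'-delimited set attribute into a list of its values."""
    return [v for v in value.split("|") if v]


def tokens(xml_text):
    """Yield (year, word form, part of speech, lemmas) for every <w> token."""
    root = ET.fromstring(xml_text)
    for text_elem in root.iter("text"):
        year = int(text_elem.get("year"))
        for w in text_elem.iter("w"):
            yield year, w.text, w.get("pos"), split_set(w.get("lemma", "|"))


for tok in tokens(SAMPLE):
    print(tok)
```

Keeping every lemma rather than only the first lets later steps decide how to handle ambiguous tokens, which an automatic annotator may assign more than one candidate.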

Norén has also collaborated with Roger Mähler from the Center of Digital Humanities at Umeå University to analyse changes in governmental discourse on the basis of noun distributions. Using topic modelling, they were able to identify how information discourse emerged in the 1960s and made its way into governmental policy. Norén and Pelle Snickars have also used similar methods to analyse policies related to Swedish film in the 20th century, drawing on 4,500 reports in the SOU corpus. All in all, digitized language data such as the Riksdag's open data corpus have made it possible to study the evolution of concepts like information in great detail and, by extension, to trace historical change more precisely and with more nuance than ever before.
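Norén and Mähler used topic modelling, which the sketch below does not reproduce. It shows only the simpler idea underlying a noun-distribution analysis: tracking what share of all noun tokens in each decade a given lemma accounts for. The toy `TOKENS` list and the function name `noun_share_by_decade` are hypothetical, and the `"NN"` noun tag is an assumption about the tagset; the numbers are illustrative, not real corpus counts.

```python
from collections import Counter

# Hypothetical toy data: (year, part of speech, lemma) triples, as might be
# extracted from an annotated corpus. Illustrative values, not real counts.
TOKENS = [
    (1955, "NN", "utredning"), (1958, "NN", "regering"), (1959, "VB", "öka"),
    (1963, "NN", "information"), (1966, "NN", "utredning"), (1968, "NN", "information"),
    (1971, "NN", "information"), (1974, "NN", "information"), (1977, "NN", "information"),
]


def noun_share_by_decade(tokens, lemma):
    """Return {decade: share of that decade's noun tokens whose lemma is `lemma`}."""
    nouns = Counter()  # noun tokens per decade
    hits = Counter()   # tokens of the target lemma per decade
    for year, pos, lem in tokens:
        if pos != "NN":  # assumed noun tag; only nouns are counted
            continue
        decade = year // 10 * 10
        nouns[decade] += 1
        if lem == lemma:
            hits[decade] += 1
    return {d: hits[d] / nouns[d] for d in sorted(nouns)}


print(noun_share_by_decade(TOKENS, "information"))
```

Normalising by the number of nouns per decade, rather than using raw counts, keeps decades with very different amounts of text comparable.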