We have just relaunched KorAP, which now provides a large subset of DeReKo (with most data from the W archive of COSMAS II, comprising more than 11 million documents). The data is annotated with part-of-speech information from CoreNLP, MarMoT, OpenNLP and TreeTagger, additional morphological features from MarMoT, lemma annotations from TreeTagger, constituency annotations from CoreNLP and dependency annotations from Malt.

To grant access to the restricted corpora, we are currently fixing a critical bug in the integration of the COSMAS II user management. KorAP is therefore temporarily not accessible from outside the IDS until we have finished the integration.

We are happy to announce the open source release of Rabbid (“Recherche- und Analyse-Basis für Belegstellen in Diskursen”). Rabbid is a standalone rapid application development environment for KorAP and is used in production for creating and managing collections of textual examples in the areas of discourse analysis and discourse lexicography.

The programme of the 3rd meeting of the workshop on Challenges in the Management of Large Corpora (CMLC-3) has been posted, with the open-content publication of the proceedings volume scheduled for the beginning of July at the latest.

KoralQuery, the general Corpus Query Protocol used for inter-component communication in KorAP, was presented on May 11th at the workshop on Innovative Corpus Query and Visualization Tools (QueryVis). The workshop was part of the 20th Nordic Conference of Computational Linguistics (Nodalida) in Vilnius, Lithuania. The proceedings are already available.

We would like to thank the reviewers and organizers for a great workshop!

We are happy to announce the open source release of Krill, the Lucene-based search backend for KorAP! Krill is the reference implementation for KoralQuery, covering most of the protocol's features, including …

Fulltext search

Token-based annotation search

Span-based annotation search

Distance search

Positional search

Nested queries

… and many more!

You can download Krill on GitHub – feedback and contributions are very welcome!

We are happy to announce the release of Koral, the module which KorAP uses to translate queries from its supported query languages into KoralQuery, a general protocol for queries to corpus analysis systems. Taking a query string as its input, Koral generates a corresponding KoralQuery instance which represents that query independently of the source query language, such that the system may work in a query language-agnostic fashion. Besides the actual linguistic query, KoralQuery also has facilities to represent virtual collection definitions as well as error and warning messages that may arise during query processing.
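The idea of a query-language-agnostic representation can be illustrated with a small sketch. The Python code below is not Koral and does not use the KoralQuery serialization: the two toy query syntaxes ("bracket" and "plain"), the function names and all field names in the resulting dictionaries are hypothetical and chosen only for illustration. It merely shows how two different surface syntaxes can map to one shared structure, and how error messages can travel in the same response.

```python
import json
import re


def term(key, layer="orth"):
    """Build a hypothetical, language-independent token node."""
    return {"type": "token", "layer": layer, "key": key}


def from_bracket_syntax(query):
    """Toy syntax A (hypothetical): [layer=key], e.g. [orth=Baum]."""
    match = re.fullmatch(r"\[(\w+)=(\w+)\]", query.strip())
    if match is None:
        return {"errors": [f"cannot parse bracket query: {query!r}"]}
    layer, key = match.groups()
    return {"query": term(key, layer)}


def from_plain_syntax(query):
    """Toy syntax B (hypothetical): a bare word matches the surface form."""
    word = query.strip()
    if re.fullmatch(r"\w+", word) is None:
        return {"errors": [f"cannot parse plain query: {query!r}"]}
    return {"query": term(word)}


# Registry of the toy source languages this sketch understands.
TRANSLATORS = {
    "bracket": from_bracket_syntax,
    "plain": from_plain_syntax,
}


def translate(query, language):
    """Translate a query string into the shared representation."""
    translator = TRANSLATORS.get(language)
    if translator is None:
        return {"errors": [f"unsupported query language: {language!r}"]}
    return translator(query)


if __name__ == "__main__":
    a = translate("[orth=Baum]", "bracket")
    b = translate("Baum", "plain")
    print(json.dumps(a, indent=2))
    # Both source syntaxes yield the same representation.
    print(a == b)
```

A backend that consumes only the shared structure never needs to know which source syntax the user typed, which is the property the paragraph above describes.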

You can access and download the Koral sources from the KorAP GitHub repository. Please note that the current version 0.1.0 is not final and is still work in progress; further releases will follow in the not-too-distant future.