1 Introduction

#+title: Library for working with CoNLL-U files with CL
=cl-conllu= is a Common Lisp library for working with [[http://universaldependencies.org/format.html][CoNLL-U]] files,
licensed under the [[http://www.apache.org/licenses/LICENSE-2.0][Apache license]].
It is developed and tested with SBCL, but is expected to run on other
implementations as well.
* Install
The =cl-conllu= library is available from the Quicklisp distribution.
If you do not plan to change the code, just use:
#+BEGIN_SRC lisp
(ql:quickload :cl-conllu)
#+END_SRC
If you don't have Quicklisp installed, follow [[https://www.quicklisp.org/beta/#installation][these steps]].
If you plan to contribute, clone this project into your Quicklisp
=local-projects= directory (usually
=~/quicklisp/local-projects/=) and use the same command as above to
load the code.
* Documentation
See the [[https://github.com/own-pt/cl-conllu/wiki][project wiki]].
* How to cite
http://arademaker.github.io/bibliography/tilic-stil-2017.html
#+BEGIN_EXAMPLE
@inproceedings{tilic-stil-2017,
author = {Muniz, Henrique and Chalub, Fabricio and Rademaker, Alexandre},
title = {CL-CONLLU: dependências universais em Common Lisp},
booktitle = {V Workshop de Iniciação Científica em Tecnologia da
Informação e da Linguagem Humana (TILic)},
year = {2017},
address = {Uberlândia, MG, Brazil},
note = {https://sites.google.com/view/tilic2017/}
}
#+END_EXAMPLE

2.1 cl-conllu

This library provides a set of functions for working with CoNLL-U files. See https://universaldependencies.org/format.html for details about the CoNLL-U format adopted by the Universal Dependencies community. The library has functions for reading and writing files, applying sentence-transformation rules in batch mode, visualizing trees, and comparing and evaluating trees, among others. Documentation is available at https://github.com/own-pt/cl-conllu/wiki.

The attachment score is the percentage of words whose arcs to their
heads are correct. The unlabeled attachment score (UAS) checks only
whether the head of each token is correct, while the labeled
attachment score (LAS) checks both the head and the arc label
(dependency label / syntactic class).

To choose between the labeled and unlabeled scores, set the keyword
argument LABELED.
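
As a rough illustration of the definition above, the following sketch
computes the score over two aligned lists of (HEAD . DEPREL) pairs. It
is not the library's implementation: the name ATTACHMENT-SCORE-SKETCH
and the pair representation are assumptions made for this example only.

#+BEGIN_SRC lisp
(defun attachment-score-sketch (predicted gold &key (labeled t))
  "Percentage of words in PREDICTED whose head (and, when LABELED is
non-nil, deprel) matches GOLD. Both arguments are hypothetical
stand-ins for sentences: aligned, non-empty lists of (HEAD . DEPREL)
pairs."
  (let ((correct (loop for (p-head . p-deprel) in predicted
                       for (g-head . g-deprel) in gold
                       count (and (eql p-head g-head)
                                  (or (not labeled)
                                      (string= p-deprel g-deprel))))))
    (* 100 (/ correct (length gold)))))

(let ((predicted '((2 . "nsubj") (0 . "root") (2 . "obj")  (3 . "punct")))
      (gold      '((2 . "nsubj") (0 . "root") (2 . "iobj") (2 . "punct"))))
  (list (attachment-score-sketch predicted gold :labeled nil)  ; UAS
        (attachment-score-sketch predicted gold :labeled t)))  ; LAS
;; => (75 50)
#+END_SRC

Three of the four heads match, but only two words also carry the
correct label, so the labeled score is lower.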

Converts the collection of sentences in CONLL (as generated by
READ-CONLLU), using the function TEXT-FN to extract the text of each
sentence and ID-FN to extract the id of each sentence (both are
needed because there is no standardized way of obtaining them). The
generated Turtle file contains many duplicate triples, so after
importing it into your triple store, make sure you remove all
duplicates.

Converts the list of sentences in SENTENCES (e.g. as generated by
READ-CONLLU), using the function TEXT-FN to extract the text of each
sentence and ID-FN to extract the id of each sentence (both are
needed because there is no standardized way of obtaining them).

Verifies whether a sentence tree is projective. Intuitively, this
means that, keeping the word order, no two dependency arcs cross.
More formally, let i -> j mean that node i is the head of node j,
and let '->*' be the transitive closure of '->'.

A tree is projective when, for all nodes i and j: if i -> j, then
for each node k between i and j (i < k < j or j < k < i), i ->* k.
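
The formal definition can be checked directly, as in the sketch below.
It is only an illustration and does not use the library's sentence
objects: the function names and the encoding of a tree as a vector of
heads (with index 0 reserved for the artificial root) are assumptions
made for this example.

#+BEGIN_SRC lisp
(defun dominates-p (i k heads)
  "True when I ->* K. HEADS is a vector in which (aref HEADS n) is the
head of node n; index 0 stands for the artificial root."
  (loop for node = k then (aref heads node)
        do (cond ((= node i) (return t))
                 ((= node 0) (return nil)))))

(defun projective-sketch-p (heads)
  "True when every node strictly between a head and its dependent is
dominated by that head. Assumes HEADS encodes a well-formed tree."
  (loop for j from 1 below (length heads)
        always (let ((i (aref heads j)))
                 (loop for k from (1+ (min i j)) below (max i j)
                       always (dominates-p i k heads)))))

(list (projective-sketch-p #(0 2 0 2))     ; arcs 2->1, 0->2, 2->3
      (projective-sketch-p #(0 3 0 2 2)))  ; arc 3->1 crosses 0->2
;; => (T NIL)
#+END_SRC

In the second tree, node 2 lies between nodes 1 and 3 but is attached
to the root rather than to node 3, so the arc 3 -> 1 violates the
condition.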

Restricted to words classified as being of syntactic class
(dependency type to head) DEPREL, returns the precision: the number
of true positives divided by the number of words predicted positive
(that is, predicted as being of class DEPREL).

We assume that LIST-SENT1 is the classified (predicted) result
and LIST-SENT2 is the list of gold (correct) sentences.

ERROR-TYPE defines what is considered an error (a false positive).
Some usual values are:
- '(deprel) :: for the deprel tagging task only
- '(head) :: for counting head errors within each syntactic class
- '(deprel head) :: for considering a word correct only when both
  deprel and head are correct.
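
To make the ratio concrete, here is a minimal sketch for the
deprel-only case. It works on two aligned lists of deprel strings
instead of the library's sentence lists; the name PRECISION-SKETCH and
this simplified representation are assumptions for illustration.

#+BEGIN_SRC lisp
(defun precision-sketch (predicted gold deprel)
  "Precision for class DEPREL. PREDICTED and GOLD are aligned lists of
deprel strings (a simplification of real sentences). Returns NIL when
no word is predicted as DEPREL."
  (let ((predicted-positive (count deprel predicted :test #'string=))
        (true-positive (loop for p in predicted
                             for g in gold
                             count (and (string= p deprel)
                                        (string= g deprel)))))
    (unless (zerop predicted-positive)
      (/ true-positive predicted-positive))))

(precision-sketch '("nsubj" "root" "nsubj" "obj")   ; predicted
                  '("nsubj" "root" "obj"   "obj")   ; gold
                  "nsubj")
;; => 1/2
#+END_SRC

Two words are predicted as nsubj, and only the first is nsubj in the
gold list.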

Restricted to words which are originally of syntactic class
(dependency type to head) DEPREL, returns the recall: the number of
true positives divided by the number of words originally positive
(that is, originally of class DEPREL).

We assume that LIST-SENT1 is the classified (predicted) result
and LIST-SENT2 is the list of gold (correct) sentences.

ERROR-TYPE defines what is considered an error (a false negative).
Some usual values are:
- '(deprel) :: for the deprel tagging task only
- '(head) :: for counting head errors within each syntactic class
- '(deprel head) :: for considering a word correct only when both
  deprel and head are correct.
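
The effect of ERROR-TYPE on the result can be sketched as follows.
This is not the library's code: words are modelled as property lists
with :HEAD and :DEPREL keys, ERROR-TYPE is a list of those keys rather
than the symbols shown above, and the name RECALL-SKETCH is
hypothetical.

#+BEGIN_SRC lisp
(defun recall-sketch (predicted gold deprel &key (error-type '(:deprel)))
  "Recall for class DEPREL. PREDICTED and GOLD are aligned lists of
plists with :HEAD and :DEPREL. A predicted word counts as correct when
it agrees with the gold word on every field in ERROR-TYPE. Returns NIL
when no gold word is of class DEPREL."
  (let ((gold-positive 0)
        (true-positive 0))
    (loop for p in predicted
          for g in gold
          when (string= (getf g :deprel) deprel)
            do (incf gold-positive)
               (when (every (lambda (field)
                              (equal (getf p field) (getf g field)))
                            error-type)
                 (incf true-positive)))
    (unless (zerop gold-positive)
      (/ true-positive gold-positive))))

(let ((predicted '((:head 2 :deprel "nsubj") (:head 0 :deprel "root")
                   (:head 1 :deprel "nsubj")))
      (gold      '((:head 2 :deprel "nsubj") (:head 0 :deprel "root")
                   (:head 2 :deprel "nsubj"))))
  (list (recall-sketch predicted gold "nsubj" :error-type '(:deprel))
        (recall-sketch predicted gold "nsubj" :error-type '(:deprel :head))))
;; => (1 1/2)
#+END_SRC

Both gold nsubj words receive the right label, but one of them is
attached to the wrong head, so requiring both fields halves the recall.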

Receives SENTENCE, a sentence object, and returns a string
reconstructed from its tokens and mtokens.

If IGNORE-MTOKENS is non-nil, the tokens' forms are used. Otherwise,
tokens whose ids are covered by an mtoken are skipped, and the
mtoken's form is used instead.

Some tokens can be given special formatting. To do so, pass both
SPECIAL-FORMAT-TEST and SPECIAL-FORMAT-FUNCTION. Then, for each
object (token or mtoken) for which SPECIAL-FORMAT-TEST returns a
non-nil result, its form is modified by SPECIAL-FORMAT-FUNCTION in
the final string.
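
The behaviour described above can be pictured with the following
sketch. It is a simplified stand-in, not the library function: tokens
and mtokens are plain property lists, forms are always joined with a
single space, SPECIAL-FORMAT-FUNCTION is assumed to receive the form
string, and the name SENTENCE-TEXT-SKETCH is hypothetical.

#+BEGIN_SRC lisp
(defun sentence-text-sketch (tokens mtokens &key ignore-mtokens
                                                 special-format-test
                                                 special-format-function)
  "TOKENS are plists with :ID and :FORM; MTOKENS are plists with
:START, :END and :FORM. Returns the forms joined by single spaces."
  (flet ((form-of (object)
           ;; Apply the special format only when both arguments are given.
           (if (and special-format-test special-format-function
                    (funcall special-format-test object))
               (funcall special-format-function (getf object :form))
               (getf object :form))))
    (let ((forms
            (loop for token in tokens
                  for id = (getf token :id)
                  for mtoken = (and (not ignore-mtokens)
                                    (find-if (lambda (m)
                                               (<= (getf m :start) id
                                                   (getf m :end)))
                                             mtokens))
                  if (null mtoken)
                    collect (form-of token)
                  else if (= id (getf mtoken :start))
                    ;; Emit the mtoken once, at its first token.
                    collect (form-of mtoken))))
      (format nil "~{~a~^ ~}" forms))))

(let ((tokens '((:id 1 :form "I") (:id 2 :form "do")
                (:id 3 :form "n't") (:id 4 :form "know")))
      (mtokens '((:start 2 :end 3 :form "don't"))))
  (list (sentence-text-sketch tokens mtokens)
        (sentence-text-sketch tokens mtokens :ignore-mtokens t)
        (sentence-text-sketch tokens mtokens
                              :special-format-test
                              (lambda (object) (getf object :start))
                              :special-format-function #'string-upcase)))
;; => ("I don't know" "I do n't know" "I DON'T know")
#+END_SRC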

Inserts TOKEN into a SENTENCE object. The token is not inserted
exactly as given: its ID stays the same (it is the position where
the token is inserted), but its head is expected to refer to an id
value from before the insertion, so the token is modified. Its
TOKEN-SENTENCE slot is also modified to point to SENTENCE. This
function modifies the SENTENCE object passed.
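
One way to read the renumbering described above is sketched below,
under stated assumptions: every id and head greater than or equal to
the insertion position is shifted by one, including the head of the
new token, which is given in pre-insertion numbering. Tokens are plain
property lists, the sketch returns a fresh list instead of modifying a
sentence object, and the name INSERT-TOKEN-SKETCH is hypothetical.

#+BEGIN_SRC lisp
(defun insert-token-sketch (tokens new-token)
  "TOKENS and NEW-TOKEN are plists with :ID, :FORM and :HEAD. The :ID
of NEW-TOKEN is the position at which it is inserted; its :HEAD uses
the numbering from before the insertion."
  (let ((pos (getf new-token :id)))
    (flet ((shift (n) (if (>= n pos) (1+ n) n)))
      (sort (cons (list :id pos
                        :form (getf new-token :form)
                        :head (shift (getf new-token :head)))
                  (mapcar (lambda (tk)
                            (list :id (shift (getf tk :id))
                                  :form (getf tk :form)
                                  :head (shift (getf tk :head))))
                          tokens))
            #'< :key (lambda (tk) (getf tk :id))))))

(insert-token-sketch '((:id 1 :form "She" :head 2)
                       (:id 2 :form "sings" :head 0))
                     '(:id 2 :form "often" :head 2))
;; => ((:ID 1 :FORM "She" :HEAD 3)
;;     (:ID 2 :FORM "often" :HEAD 3)
;;     (:ID 3 :FORM "sings" :HEAD 0))
#+END_SRC

The new token asked for head 2, which was "sings" before the
insertion; since "sings" moves to id 3, the head is adjusted to 3.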

The original file contains a set of sentences, and the modified file
contains modified versions of some of them. This function replaces,
in the original, the sentences present in the modified file,
matching them by sentence id. If the modified file contains
sentences that are not in the original, the flag 'add-new', if true,
causes these sentences to be added at the end of the original file.
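
The matching logic can be sketched on in-memory data as follows. This
is only an illustration of the described behaviour: sentences are
modelled as (ID . TEXT) pairs rather than read from files, and the name
REPLACE-SENTENCES-SKETCH is hypothetical.

#+BEGIN_SRC lisp
(defun replace-sentences-sketch (original modified &key add-new)
  "ORIGINAL and MODIFIED are lists of (ID . TEXT) pairs with string
ids. Sentences of ORIGINAL that also occur in MODIFIED are replaced;
when ADD-NEW is non-nil, unmatched sentences of MODIFIED are appended."
  (let ((replaced
          (mapcar (lambda (sentence)
                    (or (assoc (car sentence) modified :test #'string=)
                        sentence))
                  original))
        (new (remove-if (lambda (sentence)
                          (assoc (car sentence) original :test #'string=))
                        modified)))
    (if add-new
        (append replaced new)
        replaced)))

(let ((original '(("s1" . "a") ("s2" . "b")))
      (modified '(("s2" . "B") ("s3" . "c"))))
  (list (replace-sentences-sketch original modified)
        (replace-sentences-sketch original modified :add-new t)))
;; => ((("s1" . "a") ("s2" . "B"))
;;     (("s1" . "a") ("s2" . "B") ("s3" . "c")))
#+END_SRC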

Removes the token with the given ID if it is not part of a
multi-word token and has no children. Returns two values: the
sentence (changed or not) and a boolean (nil if the sentence was not
changed, true if it was). If the removed token is the root of the
sentence, a new root must be provided.
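
The two preconditions and the two return values can be sketched as
follows. This is a simplified stand-in, not the library function:
tokens and mtokens are property lists, root replacement is left out,
and renumbering the remaining ids and heads is an assumption of this
example (mtoken ranges are not adjusted). The name REMOVE-TOKEN-SKETCH
is hypothetical.

#+BEGIN_SRC lisp
(defun remove-token-sketch (tokens mtokens id)
  "TOKENS are plists with :ID, :FORM and :HEAD; MTOKENS are plists
with :START and :END. Returns the token list and a flag telling
whether the token was removed."
  (let ((exists (find id tokens :key (lambda (tk) (getf tk :id))))
        (in-mtoken (some (lambda (m)
                           (<= (getf m :start) id (getf m :end)))
                         mtokens))
        (has-children (some (lambda (tk) (= (getf tk :head) id))
                            tokens)))
    (if (or (not exists) in-mtoken has-children)
        (values tokens nil)
        (flet ((shift (n) (if (> n id) (1- n) n)))
          (values (loop for tk in tokens
                        unless (= (getf tk :id) id)
                          collect (list :id (shift (getf tk :id))
                                        :form (getf tk :form)
                                        :head (shift (getf tk :head))))
                  t)))))

(let ((tokens '((:id 1 :form "She" :head 2)
                (:id 2 :form "sings" :head 0)
                (:id 3 :form "loudly" :head 2))))
  (list (multiple-value-list (remove-token-sketch tokens '() 3))
        (nth-value 1 (remove-token-sketch tokens '() 2))))
;; => ((((:ID 1 :FORM "She" :HEAD 2) (:ID 2 :FORM "sings" :HEAD 0)) T)
;;     NIL)
#+END_SRC

Token 3 is a leaf outside any multi-word token, so it is removed;
token 2 still has children, so the sentence is returned unchanged.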