Generating Input Data

This project relies on third-party software components, such as the Stanford Parser, and provides functionality for extracting KeLP data structures from text snippets. As a general-purpose machine learning platform, KeLP is not limited to Natural Language Processing tasks. For the moment, however, we do not provide feature extraction capabilities for other fields.

To keep the main KeLP project lightweight, kelp-input-generator is not included in kelp-full. If you want to use kelp-input-generator in your Maven project, you can include it with the following Maven repository:

Currently, kelp-input-generator makes it easy to generate TreeRepresentations from text snippets. In particular, it can extract the LOCT, LCT and GRCT representations, which are tree views of a dependency graph, as introduced in (Croce et al., 2011).
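To make the idea of a "tree view of a dependency graph" concrete, here is a minimal sketch of a lexical-only tree (in the spirit of LOCT), where each node is a word and its children are its dependents. This is not the kelp-input-generator implementation: the class LoctSketch, the Token type, and the bracketed output notation are hypothetical and chosen only for illustration, and the dependency parse is written by hand instead of coming from a parser.

```java
import java.util.Arrays;
import java.util.List;

public class LoctSketch {
    /** One token of a dependency parse (hypothetical type for this sketch). */
    static final class Token {
        final String word;
        final int head; // index of the head token, or -1 for the root

        Token(String word, int head) {
            this.word = word;
            this.head = head;
        }
    }

    /** Serializes the subtree rooted at token i as (word(child)(child)...). */
    static String toTree(List<Token> tokens, int i) {
        StringBuilder sb = new StringBuilder("(").append(tokens.get(i).word);
        // Children are the dependents of token i, visited in sentence order.
        for (int j = 0; j < tokens.size(); j++) {
            if (tokens.get(j).head == i) {
                sb.append(toTree(tokens, j));
            }
        }
        return sb.append(")").toString();
    }

    /** Finds the root token and serializes the whole lexical-only tree. */
    static String loct(List<Token> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).head == -1) {
                return toTree(tokens, i);
            }
        }
        throw new IllegalArgumentException("no root token");
    }

    public static void main(String[] args) {
        // Hand-written parse of "the cat sleeps": the -> cat, cat -> sleeps.
        List<Token> tokens = Arrays.asList(
                new Token("the", 1),
                new Token("cat", 2),
                new Token("sleeps", -1));
        System.out.println(loct(tokens)); // prints (sleeps(cat(the)))
    }
}
```

The LCT and GRCT views described in (Croce et al., 2011) additionally carry grammatical relation and part-of-speech nodes; they are left out here to keep the sketch short.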

KeLP uses its own format for representing graph data. However, a converter from the popular gSpan format is available in kelp-input-generator: it.uniroma2.sag.kelp.input.graph.GspanFormatConverter. The main method of the class can be invoked by passing the gSpan file as a parameter (and, optionally, a file with the target labels if they are available and not included in the gSpan file).
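For readers unfamiliar with gSpan input, the sketch below reads the usual line-oriented layout: "t # <id>" starts a graph, "v <id> <label>" declares a vertex, and "e <from> <to> <label>" declares an edge. It is only an illustration of the input format and not the GspanFormatConverter itself; the class GspanSketch and its Graph type are hypothetical, and it does not produce KeLP's own graph format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GspanSketch {
    /** In-memory view of one gSpan graph (hypothetical type for this sketch). */
    static final class Graph {
        final String id;
        final Map<Integer, String> vertices = new LinkedHashMap<>(); // vertex id -> label
        final List<String[]> edges = new ArrayList<>();              // {from, to, label}

        Graph(String id) {
            this.id = id;
        }
    }

    /** Parses gSpan text; lines that do not match a known record type are skipped. */
    static List<Graph> parse(String text) {
        List<Graph> graphs = new ArrayList<>();
        Graph current = null;
        for (String line : text.split("\\R")) {
            String[] f = line.trim().split("\\s+");
            if (f[0].equals("t") && f.length >= 3) {
                // "t # <id>" opens a new graph.
                current = new Graph(f[2]);
                graphs.add(current);
            } else if (f[0].equals("v") && f.length >= 3 && current != null) {
                // "v <id> <label>" adds a labeled vertex to the current graph.
                current.vertices.put(Integer.parseInt(f[1]), f[2]);
            } else if (f[0].equals("e") && f.length >= 4 && current != null) {
                // "e <from> <to> <label>" adds a labeled edge to the current graph.
                current.edges.add(new String[] {f[1], f[2], f[3]});
            }
        }
        return graphs;
    }

    public static void main(String[] args) {
        String sample = "t # 0\nv 0 C\nv 1 O\ne 0 1 2\n";
        for (Graph g : parse(sample)) {
            System.out.println("graph " + g.id + ": " + g.vertices.size()
                    + " vertices, " + g.edges.size() + " edges");
        }
    }
}
```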

If your input graphs are in a format supported by Open Babel, the following script converts graphs from any Open Babel format to gSpan. All 111 Open Babel formats are therefore indirectly supported as well.