Not Logged In

pypolibox 1.0.2

pypolibox

pypolibox is a database-to-text generation (NLG) software built
on Python 2.7, NLTK and Nicholas FitzGerald’s pydocplanner.

Using a database of technical books and some user input, pypolibox
generates sentences descriptions. These descriptions are then used by
the OpenCCG surface realiser to generate written sentences in German.

Installation

Prerequisites

In order to generate sentences (instead of abstract sentence
descriptions), you will need to install OpenCCG (tested with version
0.9.5). Make sure that you can call tccg from the command line,
e.g. by adding the openccg/bin directory to your $PATH.

The following example query will generate HLDS XML snippets describing books
about Prolog written in German:

pypolibox --language German --proglang Prolog --output-format hlds

Further usage examples can be found in the pypolibox.database.Query
class documentation.

Library usage

If you’d like to access pypolibox from
within a Python interpreter, you can simply use the same arguments.
Instead of a string like -l German -p Prolog, you will have to
provide your arguments as a list of strings:

Query(["-l", "German", "-p", "Prolog"])

This query would be equivalent to the command line queries above.
pypolibox is built as a pipeline, where each important step is
represented by a class. Each of these classes function as the input
of the next class in the pipeline, e.g.:

If you instanciate a Query with your query arguments, you can use
this Query instance as the input of a Results instance
(which contains the data that the database provided for your query),
which in turn can be used as the input of a Books instance etc.

Of course, you wouldn’t want to chain all those classes just to retrieve
textplans. To do so, simply use one of the functions provided in the
debug module, either by running the debug.py file in
the interpreter or by importing it:

import debug
debug.gen_textplans(["-l", "German", "-p", "Prolog"])

This function call would return the same results as the aforementioned
command line calls. For further testing, try
debug.testqueries and debug.error_testqueries, which
basically are lists of predefined valid and invalid query arguments and which
can be used to query the database (and see how errors are handled).

Documentation

The documentation is available online,
but you can always get an up-to-date local copy using Sphinx.

You can generate an HTML or PDF version by running these commands in
pypolibox’s docs directory:

make latexpdf

to produce a PDF (docs/_build/latex/pypolibox.pdf) and

make html

to produce a set of HTML files (docs/_build/html/index.html).

Package Overview

The pypolibox package contains the following modules:

The pypolibox module is the main module, which is invoked from the
command line.

The database module handles the user input, queries the database and
returns the results.

The textplan module takes those propositions and turns them into
messages. In contrast to propositions, messages do not contain duplicates
and add comparative information. Rules will be used to combine those
message into constituent sets and ultimately into one text plan. The
textplan module also allows exporting those text plans in XML format.

The rules module contains the rules used by be the textplan module
to combine messages into constituent sets and textplans, respectively.

The messages module generates messages from propositions, which will
be used by the textplan module.

The lexicalize_messageblocks is the “main” module of the
lexicalization. For each message block in a textplan, it generates one or
more possible lexicalizations which are then realized by the
realization module.

The lexicalization module generates lexicalizations (in HLDS-XML
format) for each message, which are used by the
lexicalize_messageblocks module to form lexicalizations of complete
message blocks.

A note on terminology: A message block in pypolibox is basically an
instance of the Message class, e.g an “id” message block. This
“id” message block in turn consists of several messages, e.g. an
“authors” message and a “title” message.

The realization module takes a lexicalized phrase or sentence (in
HLDS-XML format) and converts it into a surface realization (with the
help of OpenCCGs tccg executable).

The hlds module allows to convert textplans from a
nltk.featstruct-based format to HLDS-XML and vice versa. In addition, the
module can produce attribute-value matrices of these textplans as
LaTeX/PDF files.

Licence

Author

Arne Neumann

Acknowledgements

This software reimplements parts of the Java-based JPolibox
text-generation software written by Alexandra Strelakova, Felix Dombek,
Mathias Langer and Till Kolter. pypolibox also includes a heavily
modified version of Nicholas FitzGerald’s pydocplanner, which he
released under a Creative Commons license (not specified further).
The German OpenCCG grammar fragment that comes with pypolibox was written by
Martin Oltmann.