The streamcorpus_pipeline Python module provides tools for processing
streamcorpus.StreamItem objects stored in Chunks. It includes
transform functions that generate clean_html and clean_visible and that
create labels from hyperlinks to particular sites (e.g., Wikipedia), as
well as taggers such as LingPipe, Serif, and Factorie, which produce
Tokens and Sentences.
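The following is a minimal sketch of the transform idea, with each transform taking a stream item and returning it enriched. Every name in it (the `StreamItem` dataclass, `clean_visible`, `hyperlink_labels`, `run_pipeline`) is a hypothetical stand-in written for illustration. None of them is the streamcorpus or streamcorpus_pipeline API: real StreamItems are Thrift objects whose body content is stored as bytes. The sketch assumes the clean_visible convention of replacing every HTML tag with the same number of spaces, which keeps text offsets aligned with clean_html.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


# Hypothetical stand-in for streamcorpus.StreamItem (NOT the real Thrift class);
# text is held as str here for simplicity rather than bytes.
@dataclass
class StreamItem:
    stream_id: str
    clean_html: str
    clean_visible: str = ""
    labels: List[str] = field(default_factory=list)


# A transform takes an item and returns it (possibly modified in place).
Transform = Callable[[StreamItem], StreamItem]

TAG_RE = re.compile(r"<[^>]*>")
# Hypothetical label source: links whose target is an English Wikipedia article.
WIKI_LINK_RE = re.compile(
    r'<a\s[^>]*href="(https?://en\.wikipedia\.org/wiki/[^"]+)"', re.IGNORECASE
)


def clean_visible(item: StreamItem) -> StreamItem:
    # Replace each tag with spaces of equal length, so a character offset
    # in clean_visible points at the same text in clean_html.
    item.clean_visible = TAG_RE.sub(lambda m: " " * len(m.group(0)), item.clean_html)
    return item


def hyperlink_labels(item: StreamItem) -> StreamItem:
    # Record the target URL of every Wikipedia hyperlink as a label.
    item.labels.extend(WIKI_LINK_RE.findall(item.clean_html))
    return item


def run_pipeline(
    items: Iterable[StreamItem], transforms: List[Transform]
) -> Iterator[StreamItem]:
    # Apply each transform, in order, to every item, yielding items lazily.
    for item in items:
        for transform in transforms:
            item = transform(item)
        yield item


html = '<p>See <a href="https://en.wikipedia.org/wiki/Chunk">chunks</a>.</p>'
item = next(run_pipeline([StreamItem("doc-1", html)], [clean_visible, hyperlink_labels]))
```

Because tags are blanked rather than deleted, `item.clean_visible` has the same length as `html`, and a word such as "chunks" sits at the same offset in both strings. Offset preservation of this kind is what lets labels and tagger output computed on visible text be mapped back onto the HTML.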