jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, file, or string, and find and extract data using DOM traversal or CSS selectors. HTML elements, attributes, and text can be manipulated, and user-submitted content can be cleaned against a safe whitelist. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup; in each case it creates a sensible parse tree.
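
As a rough illustration of extracting data from tag-soup markup, the sketch below collects link targets from HTML with unclosed tags. It uses Python's standard html.parser module, not jsoup, and it reacts to tags as they stream past instead of building a parse tree or evaluating CSS selectors; the class name LinkCollector and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, tolerating malformed markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical tag soup: <p>, <b>, and both <a> elements are never closed properly.
soup = "<p>Unclosed paragraph <a href='/one'>one<b>bold <a href=\"/two\">two</p>"

collector = LinkCollector()
collector.feed(soup)
collector.close()
links = collector.links
```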

LEPL is a recursive descent parser library written in Python. It is based on parser combinator libraries popular in functional programming, but also exploits Python language features. Operators provide a friendly syntax, and the consistent use of generators supports full backtracking and resource management. Backtracking implies that a wide variety of grammars are supported; appropriate memoisation ensures that even left-recursive grammars terminate.
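
The link between generators and backtracking can be sketched in a few lines. This is not LEPL's API: the names literal, either, seq, and parse_all are hypothetical, and the sketch omits memoisation, so a left-recursive grammar would not terminate here. Each parser is a generator that lazily yields every way it can match; when a later parser fails, the enclosing loop simply asks the earlier generator for its next alternative.

```python
def literal(s):
    """Match the exact string s at the current position."""
    def parse(text, pos):
        if text.startswith(s, pos):
            yield [s], pos + len(s)
    return parse


def either(*parsers):
    """Yield the matches of each alternative in turn."""
    def parse(text, pos):
        for p in parsers:
            yield from p(text, pos)
    return parse


def seq(*parsers):
    """Match the parsers one after another, trying every combination lazily."""
    def parse(text, pos, i=0):
        if i == len(parsers):
            yield [], pos
            return
        for head, mid in parsers[i](text, pos):
            # If the rest fails for this choice, the loop resumes parsers[i]
            # and tries its next alternative: that is the backtracking step.
            for tail, end in parse(text, mid, i + 1):
                yield head + tail, end
    return parse


def parse_all(parser, text):
    """Return the first result that consumes the whole input, else None."""
    for results, end in parser(text, 0):
        if end == len(text):
            return results
    return None


# "a" matches first but leaves "bc", so "c" fails; the parser then backs up
# and retries with the "ab" alternative.
grammar = seq(either(literal("a"), literal("ab")), literal("c"))
result = parse_all(grammar, "abc")
```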

Apache OpenNLP is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.

listparser is a Python library that parses subscription lists (also called reading lists) and returns all of the feeds, subscription lists, and "opportunity" URLs that it finds. It supports OPML, RDF+FOAF, and the iGoogle exported settings format.
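
To give a feel for what extracting feeds from an OPML subscription list involves, here is a minimal sketch using only the standard library's xml.etree.ElementTree. It does not use listparser; the function name feed_urls and the sample document are invented for illustration, and it assumes well-formed XML in which feeds are outline elements carrying an xmlUrl attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical subscription list with one nested group and two feeds.
OPML = """<?xml version="1.0"?>
<opml version="2.0">
  <head><title>Example subscriptions</title></head>
  <body>
    <outline text="News">
      <outline type="rss" text="Example Feed" xmlUrl="http://example.com/feed.xml"/>
    </outline>
    <outline type="rss" text="Another Feed" xmlUrl="http://example.org/rss"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return the xmlUrl of every outline element, in document order."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too, so grouped feeds are included.
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


urls = feed_urls(OPML)
```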

JWPL is a language-independent, database-driven, high-performance Wikipedia API that provides structured access to information nuggets such as redirects, categories, articles, and link structure. It contains three components: a MediaWiki markup parser, which can be used to further analyze the contents of a Wikipedia page or standalone with other text; TimeMachine, which reconstructs a snapshot of Wikipedia from a specific date, or multiple snapshots from a time span; and RevisionMachine, which offers efficient access to the history of articles using a dedicated storage format that reduces storage space by 98%. This enables random access to the whole revision history without requiring several terabytes of storage for a single Wikipedia dump.

cardme is a Java implementation of RFC 2426 (vCard). It gives Java applications a way to read and write the vCard file format. The project's goals are to provide a flexible, easy-to-use library with excellent documentation.
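
For readers unfamiliar with the format, the sketch below shows roughly what reading a vCard involves: unfolding continuation lines, then splitting each content line into a property name, parameters, and a value. It is written in Python and is unrelated to cardme's Java API; the function names unfold and parse_vcard and the sample card are hypothetical, and the sketch ignores quoted parameter values and value escaping.

```python
def unfold(text):
    """Join folded lines: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]  # drop the single folding character
        elif raw:
            lines.append(raw)
    return lines


def parse_vcard(text):
    """Return a list of (NAME, [params], value) tuples, one per content line."""
    props = []
    for line in unfold(text):
        head, _, value = line.partition(":")
        name, *params = head.split(";")
        props.append((name.upper(), params, value))
    return props


# Hypothetical card; the NOTE property is folded across two physical lines.
CARD = (
    "BEGIN:VCARD\r\n"
    "VERSION:3.0\r\n"
    "FN:Jane Doe\r\n"
    "TEL;TYPE=WORK:+1-555-0100\r\n"
    "NOTE:A long note that is\r\n"
    "  folded across lines\r\n"
    "END:VCARD\r\n"
)

card = parse_vcard(CARD)
```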

pyC11 is a grammar to parse programs in the C programming language following ISO/IEC 9899:2011. It is written using pyPEG, a parsing framework for Python. The grammar supports Python 2.7 and 3.x. The test bench requires py.test.

YAJL (Yet Another JSON Library) is a small event-driven (SAX-style) JSON parser written in ANSI C, and a small validating JSON generator. It is highly portable, independent of data representation, fast, and tiny. It generates verbose error messages that include the context of where the error occurs in the input text, and it can parse JSON data incrementally off a stream.
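
The event-driven style can be illustrated with a small Python sketch. This is not YAJL's C API: the function parse_events and the event names are chosen for this example only, and unlike YAJL the sketch expects the whole input as one string instead of accepting it in chunks. What it does show is the core idea: the parser builds no document tree and reports each token to a caller-supplied callback, leaving the data representation to the caller (numbers, for instance, are passed through as raw text).

```python
import json
import re

WS = re.compile(r"[ \t\n\r]*")
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")
LITERALS = {"true": ("boolean", True), "false": ("boolean", False), "null": ("null", None)}


def parse_events(text, emit):
    """Call emit(event, value) for each JSON token; raise ValueError on bad input."""

    def skip(i):
        return WS.match(text, i).end()

    def fail(i, what):
        raise ValueError(f"expected {what} at offset {i}")

    def string(i):
        m = STRING.match(text, i)
        if not m:
            fail(i, "string")
        return json.loads(m.group()), m.end()  # json.loads decodes the escapes

    def container(i, start, end, close, with_keys):
        emit(start, None)
        i = skip(i + 1)
        if text[i:i + 1] == close:
            emit(end, None)
            return i + 1
        while True:
            if with_keys:
                key, i = string(skip(i))
                emit("map_key", key)
                i = skip(i)
                if text[i:i + 1] != ":":
                    fail(i, "':'")
                i += 1
            i = skip(value(i))
            if text[i:i + 1] == ",":
                i += 1
            elif text[i:i + 1] == close:
                emit(end, None)
                return i + 1
            else:
                fail(i, f"',' or '{close}'")

    def value(i):
        i = skip(i)
        ch = text[i:i + 1]
        if ch == "{":
            return container(i, "start_map", "end_map", "}", True)
        if ch == "[":
            return container(i, "start_array", "end_array", "]", False)
        if ch == '"':
            s, i = string(i)
            emit("string", s)
            return i
        for word, (event, val) in LITERALS.items():
            if text.startswith(word, i):
                emit(event, val)
                return i + len(word)
        m = NUMBER.match(text, i)
        if m:
            emit("number", m.group())  # raw text: the caller picks int, float, ...
            return m.end()
        fail(i, "value")

    end = skip(value(0))
    if end != len(text):
        fail(end, "end of input")


events = []
parse_events('{"a": [1, true, null], "b": "x"}', lambda e, v: events.append((e, v)))
```

Because the callback receives tokens one at a time, a caller can count, filter, or forward values without ever holding the whole document in memory; a truly incremental parser would additionally keep its state between chunks of input.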