Project description

Cypunct is designed to solve the problem of quickly splitting a Unicode
string based on a set of characters.

Cypunct is designed to work on Python 2.6, 2.7, and 3.3+. Because
Cypunct is a Cython extension, it will (probably) only work in the CPython
runtime.

For Python versions 2.6 and 2.7, Cypunct will only run if the
CPython runtime was compiled with the flag
--enable-unicode=ucs4. Cypunct will raise an exception
if your Python 2 runtime was not compiled with UCS-4.
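
To check ahead of time whether an interpreter is a UCS-4 (wide) build, one option is to inspect sys.maxunicode, which is 0x10FFFF on wide builds and 0xFFFF on narrow Python 2 builds. The sketch below is only an illustration: the helper name is hypothetical, and this is not necessarily how Cypunct performs its own check.

```python
import sys


def is_wide_unicode_build():
    """Hypothetical helper (not part of Cypunct).

    Returns True when the interpreter can represent every Unicode code
    point as a single character, which is the case for UCS-4 builds of
    Python 2 and for all Python 3.3+ builds.
    """
    return sys.maxunicode > 0xFFFF


print("Wide (UCS-4) build:", is_wide_unicode_build())
```

On Python 3.3 and later this always reports a wide build, so the check only matters on Python 2.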

Installation

Installation is easiest with pip. Just run:

pip install cypunct

Usage

Cypunct takes a Unicode string and a frozenset of delimiter characters,
and splits the string based on that set. Every delimiter character
should be a single Unicode code point – len(char) should be 1.

A simple example, where we provide a small frozenset, is below.

>>> from cypunct import split
>>> split("James Mishra is the... best human ever, or so I think.", frozenset({' ', '.', ','}))
['James', 'Mishra', 'is', 'the', 'best', 'human', 'ever', 'or', 'so', 'I', 'think', '']
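
The output above suggests that runs of consecutive delimiters are treated as a single break and that a trailing delimiter leaves an empty string at the end. As a rough point of comparison, the pure-Python sketch below reproduces that output using the standard re module. The function name split_on_chars is hypothetical, and it is an assumption that Cypunct behaves the same way on other inputs; this is not Cypunct's implementation.

```python
import re


def split_on_chars(text, delimiters):
    """Hypothetical pure-Python stand-in for a set-based split.

    Splits `text` on runs of any character in `delimiters`.
    """
    if not delimiters:
        # An empty character class is not a valid pattern, so there is
        # nothing to split on.
        return [text]
    # Build a character class such as "[\ \.,]+", escaping each
    # delimiter so that characters like "]" or "-" are taken literally.
    pattern = "[" + "".join(re.escape(c) for c in sorted(delimiters)) + "]+"
    return re.split(pattern, text)


sentence = "James Mishra is the... best human ever, or so I think."
parts = split_on_chars(sentence, frozenset({" ", ".", ","}))
print(parts)
```

A regular expression like this is convenient for checking results on small inputs, but it is not expected to match the speed of a compiled extension.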

However, if you only need to split on whitespace characters, str.split() offers
much better performance. If you only need to split on one character,
str.split(char) will also be much faster.
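
For reference, the two built-in alternatives look like the following. The sample strings are made up for illustration; note that str.split(char) does not merge consecutive delimiters, so it can return empty strings in the middle of the result.

```python
# Splitting on any whitespace: runs of whitespace count as one break.
text = u"one two  three"
whitespace_parts = text.split()
print(whitespace_parts)

# Splitting on a single character: every occurrence is a break, so
# adjacent delimiters produce an empty string between them.
csv_line = u"a,b,,c"
comma_parts = csv_line.split(u",")
print(comma_parts)
```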