Fast Japanese Tokenization with a Single Pip Install

I just released a version of fugashi with
support for installing UniDic directly through
pip. With this release you can have a fully functional and fast Japanese
tokenizer running after just one command. Here's how you can install it:

pip install fugashi[unidic-lite]

Note this will take up roughly 250MB on disk after installation. Since wheels
are provided for Linux, macOS, and 64-bit Windows, you shouldn't need a C
compiler or anything else to get this working. Special thanks for this release
go to Aki Ariga for help testing on Windows.

While there are other packages, like Janome, that you can install through pip
to get a working tokenizer in one command, their ease of use comes at the cost
of speed, sometimes by orders of magnitude.

In order to fit a dictionary under PyPI's 60MB package size limit, I had to use
an old version of UniDic from 2013. That said, if you want to use the latest
UniDic, that's an option too; it just takes an extra step:

pip install fugashi[unidic]
python -m unidic download

That will download the latest version of UniDic I've packaged, currently 2.3.0,
which takes up 1GB on disk. This doesn't fit within PyPI's limits, so it's
distributed via GitHub Release artifacts, in a style similar to spacy-models.

If you have an open source machine learning or other project and would like to
add Japanese support, or if you have Japanese support but it's hard to get
working, please feel free to contact me about improving it.

There are a few more convenience features I'd like to add to fugashi, like a
command-line mode for environments where fugashi is installed but MeCab isn't,
but for the most part I think it's ready for a 1.0 release. Going forward, I'm
looking into ways to get away from MeCab entirely. Hopefully there'll be
progress on that front before too long. Ψ