Performance

The author of this package is a little obsessed with efficiency.

The package is efficient because its core is carefully written in C++, which also keeps text2vec memory friendly. Some parts (such as GloVe) are fully parallelized using the excellent RcppParallel package. As a result, word embeddings are computed in parallel on OS X, Linux, Windows, and even Solaris (x86) without any additional tuning or tricks.

Other embarrassingly parallel tasks (such as vectorization) can use any parallel backend that supports the foreach package. They can achieve near-linear scalability with the number of available cores.
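As a rough illustration of the foreach pattern, the sketch below registers a backend and processes independent chunks of a toy corpus in parallel. It does not call text2vec itself: the `docs` and `chunks` objects and the token-counting step are hypothetical stand-ins for a real vectorization job, and doParallel is just one of several backends that could be registered.

```r
# Minimal sketch (not text2vec's own API): run independent chunks
# through foreach with a registered parallel backend.
library(foreach)
library(doParallel)

# Hypothetical toy corpus, split into two chunks of two documents each
docs <- c("the quick brown fox", "jumps over", "the lazy dog", "and runs away")
chunks <- split(docs, rep(1:2, each = 2))

# Register a backend; any foreach-compatible backend could be used instead
registerDoParallel(cores = 2)

# Each worker counts the tokens in its own chunk; partial counts are summed
token_counts <- foreach(chunk = chunks, .combine = `+`) %dopar% {
  sum(lengths(strsplit(chunk, " ", fixed = TRUE)))
}

# Release the workers created implicitly by registerDoParallel()
stopImplicitCluster()
```

Because the chunks share no state, the work per core shrinks as cores are added, which is what makes near-linear scaling plausible for tasks of this shape.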

Finally, a streaming API means that users do not have to load all the data into RAM.
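The idea behind streaming can be sketched in base R, assuming a plain-text file with one document per line. The function `count_tokens_streaming` and its `chunk_size` parameter are hypothetical names used only for illustration and are not part of text2vec; the point is that only one chunk of lines is held in memory at a time.

```r
# Minimal sketch (base R only, not text2vec's API): process a file
# chunk by chunk so the whole corpus never has to sit in RAM.
count_tokens_streaming <- function(path, chunk_size = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0L
  repeat {
    # Read at most chunk_size lines; an empty result means end of file
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    total <- total + sum(lengths(strsplit(lines, " ", fixed = TRUE)))
  }
  total
}
```

Peak memory use then depends on `chunk_size` rather than on the size of the file.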