Blog Archives

Introducing text2vec 0.4
Today I’m pleased to announce a new major release of text2vec: text2vec 0.4, which is already on CRAN.
For readers who are not familiar with text2vec: it is an R package that provides an efficient framework with a concise API for text analysis and natural language processing.
With this release I also launched the project homepage -...

Today I’m pleased to announce a preview of the new version of text2vec. It is located in the 0.3 development branch, but very soon (probably in about a week) it will be merged into master.
To reproduce the examples below, please install text2vec@0.3 from GitHub:
devtools::install_github('dselivanov/text2vec@0.3')
I am also waiting for feedback from text2vec users, so please spend a few minutes:
Which APIs are not clear /...

Before reading this post, I strongly recommend reading:
The original GloVe paper
Jon Gauthier’s post, which provides a detailed explanation of a Python implementation. It helped me a lot with the C++ implementation.
Word embedding
After Tomas Mikolov et al. released the word2vec tool, there was a boom of articles about vector representations of words. One of the best is GloVe,...

Today I will start publishing a series of posts about experiments on the English Wikipedia. As I said before, text2vec is inspired by gensim, a well-designed and quite efficient Python library for topic modeling and related NLP tasks. I also found Radim’s posts very useful, where he tried to evaluate some algorithms on the English Wikipedia dump....

Over the last few weeks I have been actively working on text2vec (formerly tmlite), an R package that provides tools for fast text vectorization and state-of-the-art word embeddings.
This project is an experiment for me: what can a single person do in a particular area? After these hard weeks, I believe one person can do a lot.
There are a lot...

Introduction
In the next series of posts I will try to explain the basic concepts of the Locality Sensitive Hashing technique.
Note that I will try to follow a general functional programming style, so I will use R’s higher-order functions instead of R’s traditional *apply family of functions (I suppose this will make the post more readable for non-R users). I will also use the brilliant...