Project news

28/01/2019 — RusVectōrēs tutorial notebook is updated. It shows how to preprocess Russian words in order to look them up in our models, and how to work with the models themselves. It takes into account the changes introduced in 2019.

18/01/2019 — New models are released, many significant changes, read in details.

30/06/2017 — We added the model trained on Araneum Russicum Maximum, which is one of the largest Russian web corpora (more than 10 billion words). In addition, all the models are re-evaluated using RuSimLex965 semantic similarity test set (it is more consistent than RuSimLex999).

30/06/2017 — Visualizations are substantially upgraded. You can now use several word sets as an input: they will be labeled with different colors in the visualizations. If there is only one set, the colors will correspond to the words’ parts of speech. Additionally, it is now possible to visualize your data in TensorFlow Embedding Projector with one mouse click.

09/03/2017 — We present a separate page for models where you can download both recent and archived models, and compare them against each other. Also, we added the links to Russian test sets and to the conversion table from Mystem to the Universal PoS Tags.

02/02/2017 — Major update of the models: news corpus now covers events up until November 2016, Wikipedia dump is updated to the same date. Additionally, PoS tags are converted to the Universal Parts of Speech standard, and the models’ vocabularies now feature multi-word-entities (bigrams).

22/10/2016 — There are now hints as you type a query. Note that there are words not covered by hints as well (though they are rare and strange)!

01/07/2016 — For security reasons, the option to automatically train models on user-supplied corpora is now disabled. However, if you have an interesting corpus, contact us, and we will be glad to train a model for you.

07/04/2016 — RusVectōrēs source code is released on Github as Webvectors framework.