It’s a true story that Germans love their long words. However, this fact may not be so loved for text processing procedures. The lack of NLP libraries in Python adapted to German makes it difficult to properly analyze this kind of words. Let us share with you our NLP tool to split word compounds. It will transform the AI market.

While a picture may be worth a thousand words – a thousand words may be worth thousands of dollars. Never thought how valuable all your company unstructured data would be? This heterogeneous knowledge can turn out to be quite useful for companies, however, there is still much to be learned.

Did you hear that many people were banned at Twitter just by typing in Cyrillic? The reason was that thousands of Russian bots sent plenty of tweets in the two days preceding the EU referendum. It’s true that Russian language uses letters from the Cyrillic script, but the same is true for more than 20 languages around the world!

A ‘word embeddings’ approach has been widely adopted for machine learning processes. While an extensive research has been carried out during these years to analyze all theoretical underpinnings of algorithms such as word2vec, GloVe or fastText, it is surprising that little has been done, in turn, to solve some of the more complex linguistic issues raised when getting down to business.

Machine learning algorithms require a great amount of numeric data to work properly. Real people, however, do not speak to bots using numbers, they communicate through the natural language. That’s the main reason why chatbot developers need to convert all these words into digits so that those virtual assistants can understand what users are saying. And here is where word embeddings come into play.