Chinese sentiment analysis now available

Chinese sentiment analysis is now part of the Repustate API

We are very proud to announce our new Chinese sentiment analysis engine. It is built on the same engine that powers our world-leading Arabic sentiment analysis, and it is blazingly fast and accurate.

If you're impatient to try it out, you can use our online demo here. If you're interested in how our Chinese sentiment analysis works, read on!

Conditional Random Fields

Unlike English and other languages written in the Latin alphabet, Chinese (simplified) doesn't necessarily separate words with whitespace. For example, the following string of characters is a completely normal sentence in Chinese:

(For those who don't read Chinese, this is a review of a restaurant.) You'll see a few white spaces here and there, but there are actually many more words being expressed than there are whitespace-separated tokens. So how do we know where one word (or idea) ends and the next begins?

We use a technique called conditional random fields, a family of probabilistic models that infer the role of a particular glyph (character) from the glyphs around it, for example whether it starts a new word or continues the previous one. With a large enough pre-tagged corpus of Chinese text, Repustate can achieve close to 100% accuracy in identifying the individual words or ideas being expressed in a long chain of Chinese glyphs.
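To make the idea concrete, here is a minimal sketch of the decoding step of a linear-chain model of this kind. It is not Repustate's implementation: the tag set (B for "begins a word", I for "inside a word"), the function names and every score below are assumptions chosen for illustration. A trained conditional random field would learn its weights from a tagged corpus and would use features drawn from neighbouring characters, whereas this toy version only scores each character on its own plus the transition between adjacent tags.

```python
from typing import Dict, List, Tuple

TAGS = ("B", "I")  # B = character begins a word, I = character continues a word


def viterbi(chars: List[str],
            emit: Dict[Tuple[str, str], float],
            trans: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence for chars (missing scores count as 0)."""
    if not chars:
        return []
    # A sentence cannot start in the middle of a word, so only "B" is allowed first.
    score = {"B": emit.get((chars[0], "B"), 0.0), "I": float("-inf")}
    back = []  # back[i][tag] = best previous tag when position i + 1 carries `tag`
    for ch in chars[1:]:
        new_score, pointers = {}, {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: score[p] + trans.get((p, tag), 0.0))
            new_score[tag] = (score[prev] + trans.get((prev, tag), 0.0)
                              + emit.get((ch, tag), 0.0))
            pointers[tag] = prev
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the full path.
    tag = max(TAGS, key=lambda t: score[t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Group characters into words: every "B" tag opens a new word."""
    words: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


# Hand-set toy scores (assumptions, not learned weights).
emit = {("我", "B"): 1.0, ("喜", "B"): 1.0, ("欢", "I"): 1.0, ("这", "B"): 1.0,
        ("家", "I"): 1.0, ("餐", "B"): 1.0, ("厅", "I"): 1.0}
trans = {("I", "I"): -0.5}  # mild penalty on words longer than two characters

chars = list("我喜欢这家餐厅")  # "I like this restaurant"
tags = viterbi(chars, emit, trans)
words = tags_to_words(chars, tags)
print(words)  # ['我', '喜欢', '这家', '餐厅']
```

The input contains no whitespace at all, yet the decoder recovers four words from seven characters because each tag is chosen jointly with its neighbours rather than one character at a time.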

Part of speech tagging & sentiment

Now that we know which words are being used, we can apply part of speech tagging (nouns, verbs, adjectives, etc.) to build a grammatical overview of a piece of text. This then allows us to perform sentiment analysis using our proprietary engine, the same one that powers our Arabic sentiment analysis. Sentiment analysis combines probabilistic models, a dictionary of terms and phrases that connote sentiment, and hand-tuned, language-specific heuristics. All of this happens in a split second, so you can still analyze hundreds of Chinese documents in one HTTP request using the Repustate API.
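As a rough illustration of how a sentiment dictionary and a language-specific heuristic can work together once the text has been split into words, here is a small sketch. The dictionary entries, the polarity values, the list of negators and the function name are all hypothetical, and the probabilistic models and part of speech information described above are left out entirely. This is not how the Repustate engine is implemented.

```python
from typing import Dict, List

# Hypothetical sentiment dictionary: word -> polarity between -1 and 1.
LEXICON: Dict[str, float] = {"好吃": 1.0, "喜欢": 0.8, "难吃": -1.0, "贵": -0.4}

# Hypothetical heuristic: these words flip the polarity of the word that follows.
NEGATORS = {"不", "没"}


def score_words(words: List[str]) -> float:
    """Sum dictionary polarities over segmented words, flipping after a negator."""
    total = 0.0
    negate = False
    for word in words:
        if word in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(word, 0.0)  # words outside the dictionary are neutral
        total += -polarity if negate else polarity
        negate = False  # a negator only affects the very next word
    return total


positive = score_words(["我", "喜欢", "这家", "餐厅"])        # "I like this restaurant"
negative = score_words(["我", "不", "喜欢", "这家", "餐厅"])  # "I don't like this restaurant"
print(positive, negative)  # 0.8 -0.8
```

Note that the scorer works on words, not raw characters, which is why the segmentation step has to come first.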