Detect Text Language using Azure Cognitive Services in .NET Core

There are many ways to identify the language of a text. In this era of AI, we can also add AI capabilities to our own applications. I will show you how to use the Cognitive Services provided by Microsoft Azure to detect the language of a text.

The only prerequisite is an Azure subscription.

Create Azure Cognitive Services Account

In the Azure portal, click "Create a resource", then search for "Translator" and select "Translator Text". It is one of the Azure Cognitive Services. Its main purpose is translation, but we can also use it to identify the language of a text.

Specify a name in Name; it can be arbitrary and does not affect development. Choose a Pricing tier; here I choose F0, which is free. The Resource group can also be chosen arbitrarily and does not affect development either.

After the creation is complete, copy one of the keys. Either Key1 or Key2 can be used.

Use Cognitive Services in .NET Core

Azure Cognitive Services provides a REST interface, so we can construct requests and parse the returned JSON strings in .NET Core, just as we would with any REST API.

TextLanguageDetector

Create a new class named TextLanguageDetector. It encapsulates the calls to Azure Cognitive Services. Define the properties Host, Route, and SubscriptionKey. The SubscriptionKey is the key that was previously copied from the Azure portal. The caller must be able to set it freely according to their own Azure account, so it is passed in as a constructor parameter. Host and Route are fixed, so they can be hard-coded in the program.
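The class described above could be sketched as follows. This is a minimal sketch, not the original listing: the Host and Route values assume the global endpoint of the Translator Text API v3 and its detect operation, and the argument check in the constructor is my own addition.

```csharp
using System;

// Sketch of the class described above.
public class TextLanguageDetector
{
    // Assumption: global Translator Text API endpoint.
    public string Host { get; } = "https://api.cognitive.microsofttranslator.com";

    // Assumption: language detection route, API version 3.0.
    public string Route { get; } = "/detect?api-version=3.0";

    // Supplied by the caller: Key1 or Key2 copied from the Azure portal.
    public string SubscriptionKey { get; }

    public TextLanguageDetector(string subscriptionKey)
    {
        if (string.IsNullOrWhiteSpace(subscriptionKey))
        {
            throw new ArgumentException("A subscription key is required.", nameof(subscriptionKey));
        }

        SubscriptionKey = subscriptionKey;
    }
}
```

Keeping the key out of the class body means the same code works for any Azure account, and the key itself can come from configuration or a secret store.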

The call itself is very straightforward. A constructed body is submitted to the endpoint address of the Cognitive Service with a POST request, and the Text content is the input parameter of the method, that is, the text to be recognized. The API authenticates the request with the SubscriptionKey. The final JsonResponse is the result, which is converted to the DetectResult type.
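The request described above could be built roughly like this. It is a sketch under assumptions: the DetectRequestSketch class and its method names are hypothetical, System.Text.Json is my choice of serializer, and the Ocp-Apim-Subscription-Key header is the one the Translator Text API v3 uses for key authentication.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class DetectRequestSketch
{
    // Builds the POST request: the body is a JSON array with one { "Text": ... } item,
    // and the subscription key travels in the Ocp-Apim-Subscription-Key header.
    public static HttpRequestMessage Build(string host, string route, string subscriptionKey, string text)
    {
        string body = JsonSerializer.Serialize(new[] { new { Text = text } });

        var request = new HttpRequestMessage(HttpMethod.Post, new Uri(host + route))
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        return request;
    }

    // Sends the request and returns the raw JSON string from the service.
    public static async Task<string> SendAsync(HttpClient client, HttpRequestMessage request)
    {
        using HttpResponseMessage response = await client.SendAsync(request);
        return await response.Content.ReadAsStringAsync();
    }
}
```

Resources created in a specific region may also need the Ocp-Apim-Subscription-Region header. Reuse one HttpClient instance for all calls rather than creating a new one per request.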

Assuming that Simplified Chinese is recognized and no exception occurs, the JSON returned by Azure Cognitive Services is an array whose single entry carries language, score, and alternatives fields.

language is the language code; zh-Hans is Simplified Chinese. score is how confident the AI is that the text is in that language, and 1.0 means very sure. For the text "予力地球上每一人、每一组织，成就不凡", two languages emerge: Simplified Chinese and Japanese. But Japanese appears only under alternatives, so the AI basically concludes that the language is Simplified Chinese. To see how language codes correspond to language names, you can try:
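To make the fields concrete, here is a small sketch that reads them with JsonDocument. The sample JSON is illustrative only: it is shaped like the fields described above, and its values are placeholders rather than a captured response. The DetectResponseSketch class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class DetectResponseSketch
{
    // Illustrative response: values are placeholders, not real service output.
    public const string SampleJson =
        "[{\"language\":\"zh-Hans\",\"score\":1.0," +
        "\"alternatives\":[{\"language\":\"ja\",\"score\":1.0}]}]";

    // Returns the primary language code of each entry, followed by its alternatives.
    public static List<string> ReadLanguages(string json)
    {
        var codes = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(json);

        foreach (JsonElement item in doc.RootElement.EnumerateArray())
        {
            codes.Add(item.GetProperty("language").GetString());

            // alternatives may be absent when the service sees only one candidate.
            if (item.TryGetProperty("alternatives", out JsonElement alternatives)
                && alternatives.ValueKind == JsonValueKind.Array)
            {
                foreach (JsonElement alternative in alternatives.EnumerateArray())
                {
                    codes.Add(alternative.GetProperty("language").GetString());
                }
            }
        }

        return codes;
    }
}
```

Calling DetectResponseSketch.ReadLanguages(DetectResponseSketch.SampleJson) yields the primary code first and the alternative codes after it.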

var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
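Building on that call, a lookup from code to name might look like the sketch below. The CultureLookupSketch class is hypothetical, and the names returned depend on the culture data available on the machine, so treat the result as a convenience rather than an authoritative mapping.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class CultureLookupSketch
{
    // Returns the English display name for a culture code such as "ja",
    // or null when .NET does not know the code on this machine.
    public static string FindEnglishName(string code)
    {
        return CultureInfo.GetCultures(CultureTypes.AllCultures)
            .FirstOrDefault(c => string.Equals(c.Name, code, StringComparison.OrdinalIgnoreCase))
            ?.EnglishName;
    }
}
```

For example, CultureLookupSketch.FindEnglishName("zh-Hans") looks up the code returned for the sample text above.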

Constructing DetectResult

To make our program more user-friendly, we will not return only the raw JSON. I constructed the DetectResult type based on the two outcomes that Azure Cognitive Services might return, success and failure.

RawJson stores the JSON returned by the Cognitive Service, allowing the caller to do more advanced custom parsing. IsSuccess indicates whether the call was successful; if it was not, the caller can check ErrorMessage for the specific error. If it was successful, you can call the ToCogresults() method to parse the result into the TextCogResult type. This method returns a list because the text you enter does not necessarily contain only one language.
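A possible shape for these two types is sketched below. The member names RawJson, IsSuccess, ErrorMessage, ToCogresults, and TextCogResult come from the description above; the properties of TextCogResult, the empty list returned on failure, and the use of System.Text.Json are my assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// One detected language. Property names mirror the JSON fields (assumed shape).
public class TextCogResult
{
    public string Language { get; set; }
    public double Score { get; set; }
    public List<TextCogResult> Alternatives { get; set; }
}

public class DetectResult
{
    // The JSON exactly as returned by the service.
    public string RawJson { get; set; }

    // True when the call succeeded; otherwise see ErrorMessage.
    public bool IsSuccess { get; set; }

    public string ErrorMessage { get; set; }

    // Parses RawJson into typed results.
    // Assumption: an unsuccessful or empty result yields an empty list.
    public List<TextCogResult> ToCogresults()
    {
        if (!IsSuccess || string.IsNullOrWhiteSpace(RawJson))
        {
            return new List<TextCogResult>();
        }

        // JSON field names are lower case, the C# properties are PascalCase.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<List<TextCogResult>>(RawJson, options);
    }
}
```

A caller would typically check IsSuccess first, then read the Language and Score of each entry returned by ToCogresults(), falling back to RawJson for anything the typed result does not expose.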
