Last year was a good year for uClassify. The main theme was to offer classifiers in multiple languages (English, Spanish, French and Swedish). The task was nontrivial, and we decided to keep it in ‘beta’ for a long time to make sure it works and scales as intended. Now we feel confident enough to move out of beta and start promoting the service.

We created a few new classifiers for our users. The most popular are the IAB Taxonomy V2 and Language Detection classifiers (I am particularly proud of the latter’s capability to detect 370 different languages!).

For the second half of 2017 I went on parental leave. During this time I mostly monitored uClassify, answered emails and pushed a few fixes.

As a hobby project I created sequencedb.net, a site with tons of generated number sequences, if you are into that kind of thing.

Thoughts about 2018

In the beginning of 2018 we will add more classifiers in different languages, move out of beta and do some promoting.

As for the next big features, we are not entirely sure. There is a lot of demand for URL batching. For various reasons we’ve been dodging this in the past, but it deserves reconsideration.

During parental leave I played a lot with numeric, image and time series classification (as opposed to text). This is something I think might find its way into the platform, although I’m not sure in what form.

Another thing we should do is publish API clients in different languages (Java, Python, C# etc).

During the coming month (my last month on parental leave) I’ll start with some of the tasks and set a plan for the rest of the year.

During the last half year the Sentiment classifier has been beta enabled for Spanish, French and Swedish. The test period has been very successful and we have decided to expand multi-language support to more popular classifiers such as the Gender Analyzer, Mood and Myers-Briggs classifiers.

Classifiers with multiple languages have flags displayed, like the icons above. From the GUI you can test them by clicking the flag first. From the API you simply add the language code (/es, /fr, /sv) to the request URL. For more information, see the documentation.
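As a rough illustration of adding the language code to the request URL, here is a minimal Python sketch. The base URL, the path layout (user name, classifier name, optional language code, then `classify`) and the helper name `classify_url` are assumptions made for this example, so check the documentation for the exact form.

```python
from typing import Optional

# Assumed base URL and path layout; verify against the uClassify documentation.
BASE_URL = "https://api.uclassify.com/v1"


def classify_url(username: str, classifier: str, language: Optional[str] = None) -> str:
    """Build a classify URL, optionally inserting a language code such as 'es'."""
    parts = [BASE_URL, username, classifier]
    if language:
        # The language code (es, fr, sv) is added as its own path segment.
        parts.append(language)
    parts.append("classify")
    return "/".join(parts)


# English (no language code) and Swedish variants of the same classifier.
print(classify_url("uclassify", "sentiment"))
print(classify_url("uclassify", "sentiment", "sv"))
```

Keeping the language as an optional argument means existing English-only calls stay unchanged.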

The service is still in beta, as we still need to make sure it scales when more users start to use it. The API will probably not change.

The Interactive Advertising Bureau (IAB) released version 2 of their taxonomy on the first of March 2017. The new taxonomy contains more topics than the old one and has gone through a general overhaul to make it clearer.

We have built a new classifier, IAB Taxonomy V2, that conforms to the latest standard.

The new ‘Content’ category has been left out but you can get content language by calling our Language Detector.

Any feedback is appreciated and we may add more training data if necessary.

Class name format

The new taxonomy has up to 4 tiers, and this is reflected in the class names. The format of the class names is level1_leaf_id1_id2_id3_id4, where the ids correspond to the IAB codes and are integers.

You can read more about the taxonomy at their homepage, where you can also find the complete id mapping.
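To make the class name format concrete, here is a small Python sketch that splits a class name into its parts. It assumes that the ids are the trailing integer tokens (at most four), that the first token is the level 1 name and that everything in between is the leaf name. The class name used in the example is made up for illustration and is not taken from the real id mapping.

```python
from typing import List, NamedTuple


class IabClass(NamedTuple):
    level1: str
    leaf: str
    ids: List[int]


def parse_class_name(name: str) -> IabClass:
    """Split 'level1_leaf_id1_..._id4' into names and integer ids (assumed layout)."""
    tokens = name.split("_")
    ids: List[int] = []
    # Peel off up to four trailing integer tokens; these are taken to be the IAB ids.
    while tokens and tokens[-1].isdecimal() and len(ids) < 4:
        ids.insert(0, int(tokens.pop()))
    if not tokens:
        raise ValueError(f"no name part found in {name!r}")
    level1 = tokens[0]
    # Anything between the first token and the ids is treated as the leaf name.
    leaf = "_".join(tokens[1:])
    return IabClass(level1, leaf, ids)


# Hypothetical class name, for illustration only.
print(parse_class_name("sports_soccer_483_533"))
```

Parsing from the right keeps the function tolerant of class names with fewer than four ids.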

Some of the rare languages (about 30) may have insufficient training data. The idea is to improve the classifier as more documents are gathered. Also we may add more languages in the future, so make sure your code can handle that.
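One way to make client code tolerant of newly added languages is to avoid hard failures on unknown codes. The Python sketch below assumes you have already reduced the classifier response to a mapping from ISO 639-3 code to probability. The mapping shape, the `KNOWN_LANGUAGES` table and the function name are assumptions for this example, not part of the API.

```python
from typing import Dict

# A deliberately partial lookup table maintained on the client side (assumption).
KNOWN_LANGUAGES = {
    "eng": "English",
    "spa": "Spanish",
    "fra": "French",
    "swe": "Swedish",
}


def best_language(probabilities: Dict[str, float]) -> str:
    """Return a display name for the most probable language code."""
    if not probabilities:
        raise ValueError("empty classification result")
    code = max(probabilities, key=probabilities.get)
    # Fall back to the raw code instead of raising when a new language shows up.
    return KNOWN_LANGUAGES.get(code, f"Unknown language ({code})")


print(best_language({"eng": 0.91, "swe": 0.09}))
print(best_language({"eng": 0.10, "xyz": 0.90}))
```

The fallback branch is the important part: a language added later shows up as an unknown code rather than crashing the caller.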

Here is the full list of supported languages:

| Language Name | ISO 639-3 | Type |
| --- | --- | --- |
| Abkhazian | abk | living |
| Achinese | ace | living |
| Adyghe | ady | living |
| Afrihili | afh | constructed |
| Afrikaans | afr | living |
| Ainu | ain | living |
| Akan | aka | living |
| Albanian | sqi | living |
| Algerian Arabic | arq | living |
| Amharic | amh | living |
| Ancient Greek | grc | historical |
| Arabic | ara | living |
| Aragonese | arg | living |
| Armenian | hye | living |
| Arpitan | frp | living |
| Assamese | asm | living |
| Assyrian Neo-Aramaic | aii | living |
| Asturian | ast | living |
| Avaric | ava | living |
| Awadhi | awa | living |
| Aymara | aym | living |
| Azerbaijani | aze | living |
| Balinese | ban | living |
| Bambara | bam | living |
| Banjar | bjn | living |
| Bashkir | bak | living |
| Basque | eus | living |
| Bavarian | bar | living |
| Baybayanon | bvy | living |
| Belarusian | bel | living |
| Bengali | ben | living |
| Berber | ber | living |
| Bhojpuri | bho | living |
| Bishnupriya | bpy | living |
| Bislama | bis | living |
| Bodo | brx | living |
| Bosnian | bos | living |
| Breton | bre | living |
| Bulgarian | bul | living |
| Buriat | bua | living |
| Burmese | mya | living |
| Catalan | cat | living |
| Cebuano | ceb | living |
| Central Bikol | bcl | living |
| Central Huasteca Nahuatl | nch | living |
| Central Khmer | khm | living |
| Central Kurdish | ckb | living |
| Central Mnong | cmo | living |
| Chamorro | cha | living |
| Chavacano | cbk | living |
| Chechen | che | living |
| Cherokee | chr | living |
| Chinese | zho | living |
| Choctaw | cho | living |
| Chukot | ckt | living |
| Church Slavic | chu | ancient |
| Chuvash | chv | living |
| Coastal Kadazan | kzj | living |
| Cornish | cor | living |
| Corsican | cos | living |
| Cree | cre | living |
| Crimean Tatar | crh | living |
| Croatian | hrv | living |
| Cuyonon | cyo | living |
| Czech | ces | living |
| Danish | dan | living |
| Dhivehi | div | living |
| Dimli | diq | living |
| Dungan | dng | living |
| Dutch | nld | living |
| Dutton World Speedwords | dws | constructed |
| Dzongkha | dzo | living |
| Eastern Mari | mhr | living |
| Egyptian Arabic | arz | living |
| Emilian | egl | living |
| English | eng | living |
| Erzya | myv | living |
| Esperanto | epo | constructed |
| Estonian | est | living |
| Ewe | ewe | living |
| Extremaduran | ext | living |
| Faroese | fao | living |
| Fiji Hindi | hif | living |
| Finnish | fin | living |
| French | fra | living |
| Friulian | fur | living |
| Fulah | ful | living |
| Gagauz | gag | living |
| Galician | glg | living |
| Gan Chinese | gan | living |
| Ganda | lug | living |
| Garhwali | gbm | living |
| Georgian | kat | living |
| German | deu | living |
| Gilaki | glk | living |
| Gilbertese | gil | living |
| Goan Konkani | gom | living |
| Gothic | got | ancient |
| Guarani | grn | living |
| Guerrero Nahuatl | ngu | living |
| Gujarati | guj | living |
| Gulf Arabic | afb | living |
| Haitian | hat | living |
| Hakka Chinese | hak | living |
| Hausa | hau | living |
| Hawaiian | haw | living |
| Hebrew | heb | living |
| Hiligaynon | hil | living |
| Hindi | hin | living |
| Hmong Daw | mww | living |
| Hmong Njua | hnj | living |
| Ho | hoc | living |
| Hungarian | hun | living |
| Iban | iba | living |
| Icelandic | isl | living |
| Ido | ido | constructed |
| Igbo | ibo | living |
| Iloko | ilo | living |
| Indonesian | ind | living |
| Ingrian | izh | living |
| Interlingua | ina | constructed |
| Interlingue | ile | constructed |
| Iranian Persian | pes | living |
| Irish | gle | living |
| Italian | ita | living |
| Jamaican Creole English | jam | living |
| Japanese | jpn | living |
| Javanese | jav | living |
| Jinyu Chinese | cjy | living |
| Judeo-Tat | jdt | living |
| K’iche’ | quc | living |
| Kabardian | kbd | living |
| Kabyle | kab | living |
| Kadazan Dusun | dtp | living |
| Kalaallisut | kal | living |
| Kalmyk | xal | living |
| Kamba | kam | living |
| Kannada | kan | living |
| Kara-Kalpak | kaa | living |
| Karachay-Balkar | krc | living |
| Karelian | krl | living |
| Kashmiri | kas | living |
| Kashubian | csb | living |
| Kazakh | kaz | living |
| Kekchí | kek | living |
| Keningau Murut | kxi | living |
| Khakas | kjh | living |
| Khasi | kha | living |
| Kinyarwanda | kin | living |
| Kirghiz | kir | living |
| Klingon | tlh | constructed |
| Kölsch | ksh | living |
| Komi | kom | living |
| Komi-Permyak | koi | living |
| Komi-Zyrian | kpv | living |
| Kongo | kon | living |
| Korean | kor | living |
| Kotava | avk | constructed |
| Kumyk | kum | living |
| Kurdish | kur | living |
| Ladin | lld | living |
| Ladino | lad | living |
| Lakota | lkt | living |
| Lao | lao | living |
| Latgalian | ltg | living |
| Latin | lat | ancient |
| Latvian | lav | living |
| Laz | lzz | living |
| Lezghian | lez | living |
| Láadan | ldn | constructed |
| Ligurian | lij | living |
| Lingala | lin | living |
| Lingua Franca Nova | lfn | constructed |
| Literary Chinese | lzh | historical |
| Lithuanian | lit | living |
| Liv | liv | living |
| Livvi | olo | living |
| Lojban | jbo | constructed |
| Lombard | lmo | living |
| Louisiana Creole | lou | living |
| Low German | nds | living |
| Lower Sorbian | dsb | living |
| Luxembourgish | ltz | living |
| Macedonian | mkd | living |
| Madurese | mad | living |
| Maithili | mai | living |
| Malagasy | mlg | living |
| Malay | zlm | living |
| Malay | msa | living |
| Malayalam | mal | living |
| Maltese | mlt | living |
| Mambae | mgm | living |
| Mandarin Chinese | cmn | living |
| Manx | glv | living |
| Maori | mri | living |
| Marathi | mar | living |
| Marshallese | mah | living |
| Mazanderani | mzn | living |
| Mesopotamian Arabic | acm | living |
| Mi’kmaq | mic | living |
| Middle English | enm | historical |
| Middle French | frm | historical |
| Min Nan Chinese | nan | living |
| Minangkabau | min | living |
| Mingrelian | xmf | living |
| Mirandese | mwl | living |
| Modern Greek | ell | living |
| Mohawk | moh | living |
| Moksha | mdf | living |
| Mon | mnw | living |
| Mongolian | mon | living |
| Morisyen | mfe | living |
| Moroccan Arabic | ary | living |
| Na | nbt | living |
| Narom | nrm | living |
| Nauru | nau | living |
| Navajo | nav | living |
| Neapolitan | nap | living |
| Nepali | npi | living |
| Nepali | nep | living |
| Newari | new | living |
| Ngeq | ngt | living |
| Nigerian Fulfulde | fuv | living |
| Niuean | niu | living |
| Nogai | nog | living |
| North Levantine Arabic | apc | living |
| North Moluccan Malay | max | living |
| Northern Frisian | frr | living |
| Northern Luri | lrc | living |
| Northern Sami | sme | living |
| Norwegian | nor | living |
| Norwegian Bokmål | nob | living |
| Norwegian Nynorsk | nno | living |
| Novial | nov | constructed |
| Nyanja | nya | living |
| Occitan | oci | living |
| Official Aramaic | arc | ancient |
| Ojibwa | oji | living |
| Old Aramaic | oar | ancient |
| Old English | ang | historical |
| Old Norse | non | historical |
| Old Russian | orv | historical |
| Old Saxon | osx | historical |
| Oriya | ori | living |
| Orizaba Nahuatl | nlv | living |
| Oromo | orm | living |
| Ossetian | oss | living |
| Ottoman Turkish | ota | historical |
| Palauan | pau | living |
| Pampanga | pam | living |
| Pangasinan | pag | living |
| Panjabi | pan | living |
| Papiamento | pap | living |
| Pedi | nso | living |
| Pennsylvania German | pdc | living |
| Persian | fas | living |
| Pfaelzisch | pfl | living |
| Picard | pcd | living |
| Piemontese | pms | living |
| Pipil | ppl | living |
| Pitcairn-Norfolk | pih | living |
| Polish | pol | living |
| Pontic | pnt | living |
| Portuguese | por | living |
| Prussian | prg | living |
| Pulaar | fuc | living |
| Pushto | pus | living |
| Quechua | que | living |
| Quenya | qya | constructed |
| Romanian | ron | living |
| Romansh | roh | living |
| Romany | rom | living |
| Rundi | run | living |
| Russia Buriat | bxr | living |
| Russian | rus | living |
| Rusyn | rue | living |
| Samoan | smo | living |
| Samogitian | sgs | living |
| Sango | sag | living |
| Sanskrit | san | ancient |
| Sardinian | srd | living |
| Saterfriesisch | stq | living |
| Scots | sco | living |
| Scottish Gaelic | gla | living |
| Serbian | srp | living |
| Serbo-Croatian | hbs | living |
| Seselwa Creole French | crs | living |
| Shona | sna | living |
| Shuswap | shs | living |
| Sicilian | scn | living |
| Silesian | szl | living |
| Sindarin | sjn | constructed |
| Sindhi | snd | living |
| Sinhala | sin | living |
| Slovak | slk | living |
| Slovenian | slv | living |
| Somali | som | living |
| South Azerbaijani | azb | living |
| Southern Sami | sma | living |
| Southern Sotho | sot | living |
| Spanish | spa | living |
| Sranan Tongo | srn | living |
| Standard Latvian | lvs | living |
| Standard Malay | zsm | living |
| Sumerian | sux | ancient |
| Sundanese | sun | living |
| Swabian | swg | living |
| Swahili | swa | living |
| Swahili | swh | living |
| Swati | ssw | living |
| Swedish | swe | living |
| Swiss German | gsw | living |
| Tagal Murut | mvv | living |
| Tagalog | tgl | living |
| Tahitian | tah | living |
| Tajik | tgk | living |
| Talossan | tzl | constructed |
| Talysh | tly | living |
| Tamil | tam | living |
| Tarifit | rif | living |
| Tase Naga | nst | living |
| Tatar | tat | living |
| Telugu | tel | living |
| Temuan | tmw | living |
| Tetum | tet | living |
| Thai | tha | living |
| Tibetan | bod | living |
| Tigrinya | tir | living |
| Tok Pisin | tpi | living |
| Tokelau | tkl | living |
| Tonga | ton | living |
| Tosk Albanian | als | living |
| Tsonga | tso | living |
| Tswana | tsn | living |
| Tulu | tcy | living |
| Tupí | tpw | extinct |
| Turkish | tur | living |
| Turkmen | tuk | living |
| Tuvalu | tvl | living |
| Tuvinian | tyv | living |
| Udmurt | udm | living |
| Uighur | uig | living |
| Ukrainian | ukr | living |
| Umbundu | umb | living |
| Upper Sorbian | hsb | living |
| Urdu | urd | living |
| Urhobo | urh | living |
| Uzbek | uzb | living |
| Venda | ven | living |
| Venetian | vec | living |
| Veps | vep | living |
| Vietnamese | vie | living |
| Vlaams | vls | living |
| Vlax Romani | rmy | living |
| Volapük | vol | constructed |
| Võro | vro | living |
| Walloon | wln | living |
| Waray | war | living |
| Welsh | cym | living |
| Western Frisian | fry | living |
| Western Mari | mrj | living |
| Western Panjabi | pnb | living |
| Wolof | wol | living |
| Wu Chinese | wuu | living |
| Xhosa | xho | living |
| Xiang Chinese | hsn | living |
| Yakut | sah | living |
| Yiddish | yid | living |
| Yoruba | yor | living |
| Yue Chinese | yue | living |
| Zaza | zza | living |
| Zeeuws | zea | living |
| Zhuang | zha | living |
| Zulu | zul | living |

Attribution

The classifier has been trained by reading texts in many different languages. Finding high quality, non noisy texts is really difficult. Many thanks to

Did you know there’s a way to classify texts without having to leave Excel? We have paired up with SeoTools for Excel, a Swiss army knife Excel-plugin, which offers a tailored “Connector” for all uClassify users.

In this blog post, we will show how SeoTools allows you to classify lists of texts or URLs with the classifiers of your choice and have the results ready for analysis in a matter of seconds.

Don’t be worried if your Excel spreadsheet doesn’t look like the example above. The extra ribbon tab “SeoTools” is added when SeoTools for Excel is installed. At the end of this post you will find all the links necessary to set up your uClassify account.

Selecting a classifier

The uClassify Connector is, as the name suggests, connected to the uClassify library. Clicking on “Select” opens a window with all available classifiers. It is also possible to choose the input type (Text or URL) and whether the results include classification and probability.

When you are satisfied with your settings, click “Insert”, and SeoTools will generate the data in columns A and onwards.

Save time and automate the process

Exporting and filtering Excel data from web based platforms takes time, especially if it’s required on a daily or weekly basis. Manually filtering standardized files is also prone to human error. SeoTools solves this by letting you save and load “Configurations”:

Next time, just load a previous configuration and you will get classifications based on the same settings as last time.

Use Formula Mode to supercharge your classification

The beauty of combining uClassify with Excel is the ability to create large numbers of requests automatically. Instead of populating cells with values, select “Formula” before inserting the results:

Next, you can change the formula to reference a cell and the uClassify Connector will generate results based on the value or text in that cell.

In the following example, company A has been mentioned 100 times on Twitter in the last week and we want to determine the text Language and Sentiment for these tweets.

First, select the Text Language Classifier and enter a random character in the Input field (we will change this in the formula to reference the tweets). Also, don’t forget to select “Exclude headers in result” since we only want the values for each row.

When the formula has been inserted in cell C2, change the input “y” to B2, and SeoTools will return the language with the highest probability. Repeat the same steps for the Sentiment classifier, but insert it in cell D2. It should look like this:

To get the results for all rows, select cells C2 and D2 and drag the formulas down, and SeoTools will generate the classifications for all tweets. In the example below, we’ve started on row 16 to illustrate the results:

Do you want to try it with your uClassify account?

⦁ Sign up for a 14-Day Trial and follow the instructions to download and install the latest version of SeoTools.

⦁ Register your access key under “Upgrade to Pro” and access uClassify in the Connectors menu:

⦁ Next, go to API keys in the top menu of your uClassify account and copy the Read key.

⦁ Finally, paste the key in the “Options” menu:

The complete documentation of the uClassify Connector features can be found here.

We get a lot of requests for classifiers in different languages and as a next step we are building a translation API. The idea is to have an affordable in-house machine translation service that can quickly translate requests to the classifier language, classify the request and send back the response. Since the majority of classifiers are in English, the primary focus will be to target English.

Initially we support French, Spanish and Swedish to English translations.

Translation demo

The API is accessible with your ordinary API read key and a GET/POST REST protocol.

Upon popular request I’ve built a new topics classifier based on the IAB taxonomy. EDIT: We also support the IAB Taxonomy V2 now.

The classifier has two levels of depth, a main category (sports, science…) and a sub category (soccer, physics…). In total there are about 360 different classes following the IAB Quality Assurance Guidelines (QAG) Taxonomy specification.

I am very happy to announce a performance update that gives classifications better accuracy than before.

When I was building a new topic classifier based on the IAB taxonomy, I noticed some weird behaviour for classes with much less training data than the others. As I started to investigate this, I came to understand how the overall classification could be improved, not only for classes with little training data. After weeks of testing different implementations I found a few improvements that gave significantly better results on the test datasets.

In short, classifiers are much more robust and less sensitive to imbalanced data.

This update doesn’t affect any API endpoints. It will only give you better probabilities.