Per-country rates of Esperanto speakers

Esperanto

In August 2009 a Danish popular science book made a big impression on me. It was “Det Virkelige Menneske” (The Real Human) by Dennis Nørmark and Lars Andreassen, and it explained many sides of human culture using evolutionary theory. About language they write (my translation):

The fact that languages are the best indicators of who is in and who is out might explain why most languages contain very complicated, and often completely unnecessary, rules. […] If you mess up the grammar, it is immediately revealed that you do not belong to the group, and the more complicated the language, the easier it is to discover misfits.

The quotation addresses one of the secondary purposes of languages, namely to maintain identity by exposing foreigners. Esperanto is a language without that purpose. With easy pronunciation, logical grammar, and a modular word-building mechanism, it is a lot easier than conventional languages. Esperanto was constructed in 1887 by L. L. Zamenhof to unite the world and bring it peace. Peace is not the best description of the following 100 years, but Esperanto is still a very good language. A visionary dream of mine, and of most Esperanto speakers, is to make Esperanto a global second language, which would make the world more optimal and more equal. Therefore, Esperanto is also a missionary language whose size and demographics are of particular interest.

For an explanation of the columns, see the Methods section. I fitted a model to this data and used it to estimate the relative frequency of Esperanto speakers in each country. Using population census data from Lithuania, Estonia, Russia and New Zealand (but not Hungary), I scaled all relative frequencies to get the total number of Esperanto speakers in each country. The total number of Esperanto speakers is estimated at 62983.9, with confidence interval [59077, 68176] / [31460, 183420] (see Confidence intervals for an explanation).

Every country is colored according to its density of Esperanto speakers. The square root transformation is only visual. The pink countries are countries without data.

Densities of Esperanto speakers in the European countries.

The numbers behind the maps are in a table at the end of the post. Even though the maps do not show it, Andorra has the highest density of Esperanto speakers. The countries with the highest number of Esperanto speakers are (in order) Brazil, France, USA, Germany, Russia, Poland and Spain.

The model itself assumes that the number of members is proportional to the number of Esperanto speakers. This is not always true because the organizations are not equally popular in all countries. The model tries to make up for it by allowing some deviation in a few categories without changing the relative estimated frequency. However, if all categories are underrepresented in a country, the relative estimated frequency will be too low. Hungary is such a country: the model estimates the number of Esperanto speakers to be 1997.5, while a recent population census found that number to be 8397. One explanation could be that Esperanto is taught in Hungarian schools, which produces Esperanto speakers who are less reliant on international organisations.

An Esperanto speaker is defined here as someone who would answer “Esperanto” when asked by the authorities which languages they speak.

Methods

Model

I will give an intuitive explanation of the model using Lithuania as an example. It has the observations

| country | pop | UEA | Lernu! | esp.dir | paspo. | edukado | nat. org. |
|---|---|---|---|---|---|---|---|
| Lithuania | 3 million | 43 | 5127 | 8 | 13 | 32 | 960 |

According to the UEA column, 43 of the 5501 UEA members are Lithuanian. That indicates that 0.78% of all Esperanto speakers come from Lithuania. However, according to the Lernu! column, 2.88% of all Esperanto speakers come from Lithuania. We would like a single number that takes all columns into account. One could take the plain average of the per-organization percentages, but then Lernu! is as important as esperantujo.directory despite having more than 100 times as many users. Another option is to pool all the numbers, that is, to divide the total number of Lithuanian members by the total number of members across all organizations, but this number will always be very close to the Lernu! percentage. I would like something in between the two: something that puts extra weight on the many Lernu! users, but does not let Lernu! determine everything.
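To make the two candidate averages concrete, here is a small Python sketch restricted to the two organizations whose Lithuanian shares are quoted above. The Lernu! total is not given in the text; the value used below is an assumption, back-calculated from the 2.88% figure, and is therefore only approximate.

```python
# Lithuanian members and total members per organization.
# The UEA figures come from the text; the Lernu! total is an assumption,
# back-calculated as 5127 / 0.0288 and rounded.
members_lt = {"UEA": 43, "Lernu!": 5127}
members_total = {"UEA": 5501, "Lernu!": 178_021}

shares = {org: members_lt[org] / members_total[org] for org in members_lt}

# Option 1: every organization counts equally.
simple_average = sum(shares.values()) / len(shares)

# Option 2: pool all members, so the largest organization dominates.
pooled_average = sum(members_lt.values()) / sum(members_total.values())

for org, share in shares.items():
    print(f"{org}: {share:.2%}")
print(f"simple average: {simple_average:.2%}")
print(f"pooled average: {pooled_average:.2%}")
```

With only these two organizations, the pooled figure lands within a tenth of a percentage point of the Lernu! share, while the simple average sits halfway between the two shares.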

The number of members of organization $j$ in country $i$ is $Y_{ij} = N_i \cdot r_i \cdot n_j \cdot c_{ij}$, where

$N_i$ is the number of inhabitants in country $i$.

$r_i$ is the relative frequency of Esperanto speakers in country $i$.

$n_j$ is the total number of members of organization $j$.

$c_{ij}$ is the number that makes the equation true. Hopefully it is close to 1.

For each country $i$, $r_i$ is chosen such that the $c_{ij}$'s are not ‘far away’ from 1. Defining ‘far away’ is also a model choice.

‘Far away’ is defined such that it both satisfies some nice mathematical properties and reduces the distance between the $c_{ij}$'s and 1.
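The role of the correction factors can be illustrated with a short Python sketch. It assumes that the equation is a plain product (members = inhabitants × relative frequency × organization total × correction factor) and uses only the UEA and Lernu! shares quoted for Lithuania. The compromise is picked by geometric mean purely for illustration; that is an assumption of this sketch and not the criterion used in the model.

```python
import math

population = 3_000_000     # Lithuania, from the observations above
share_uea = 43 / 5501      # Lithuanian share of UEA members
share_lernu = 0.0288       # Lithuanian share of Lernu! users (from the text)


def correction(share, rel_freq):
    """Factor c that makes share == population * rel_freq * c hold."""
    return share / (population * rel_freq)


# Matching UEA exactly leaves the Lernu! factor far from 1.
r_uea = share_uea / population
print(correction(share_uea, r_uea), correction(share_lernu, r_uea))

# Illustrative compromise (an assumption, not the author's criterion):
# the geometric mean of the two shares balances both factors around 1.
r_mid = math.sqrt(share_uea * share_lernu) / population
c_uea = correction(share_uea, r_mid)
c_lernu = correction(share_lernu, r_mid)
print(c_uea, c_lernu)
```

Under the geometric-mean compromise the two factors multiply to 1 by construction, so one sits below 1 and the other above it.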

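Let $E_i$ be the actual number of Esperanto speakers in country $i$, $r_i$ its estimated relative frequency and $N_i$ its number of inhabitants. We know $E_i$ for Estonia, Lithuania, Russia and New Zealand. Using those numbers we calculate the number of Esperanto speakers per unit of $r_i$ per inhabitant.

Calculate $s_i = E_i / (r_i N_i)$ for the four mentioned countries.

Calculate the average $s = \frac{1}{4} \sum_i s_i$ over those four countries.

The number of Esperanto speakers in country $i$ is $s \cdot r_i \cdot N_i$.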
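The calibration steps can be sketched in Python as follows. All numbers are made up for illustration (they are neither the census figures nor the fitted values), and the names `rel_freq`, `census` and `scale` are hypothetical.

```python
# Hypothetical inputs: population, fitted relative frequency and, for the
# calibration countries only, the census count of Esperanto speakers.
countries = {
    "A": {"pop": 3_000_000, "rel_freq": 2.0e-9, "census": 600},
    "B": {"pop": 1_300_000, "rel_freq": 3.0e-9, "census": 350},
    "C": {"pop": 10_000_000, "rel_freq": 1.0e-9, "census": None},
}

# Step 1: speakers per unit of relative frequency per inhabitant.
ratios = [
    c["census"] / (c["rel_freq"] * c["pop"])
    for c in countries.values()
    if c["census"] is not None
]

# Step 2: average the ratios to get a single scaling factor.
scale = sum(ratios) / len(ratios)

# Step 3: scale every country's relative estimate to an absolute count.
estimates = {
    name: scale * c["rel_freq"] * c["pop"] for name, c in countries.items()
}
print(estimates)
```

Country C has no census figure, yet it still receives an absolute estimate because the scaling factor learned from A and B is applied to its relative frequency.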

Strict mathematical explanation of the model

Let be the number of members of organization in country . Let be the population size of country . Let . Then the model is

with independence within the observed counts (conditioned on the $p_i$’s and $\kappa_j$’s), within the $p_i$’s, and within the $\kappa_j$’s. The rates are ideally estimated with

.

but in practice I make the constraint because of convergence problems.

The confidence intervals were calculated using the Bayesian bootstrap. In the first type of confidence interval, we assume that the scaling factor is known. In the second type, we assume that the census numbers are random and bootstrap those as well.
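As a rough illustration of the Bayesian bootstrap idea, the Python sketch below reweights a handful of calibration ratios with Dirichlet(1, ..., 1) weights and reads off a percentile interval for their weighted mean. The four ratios are hypothetical, and the sketch only covers the averaging of the scaling factor; it makes no claim about how the full model is refitted in each bootstrap draw.

```python
import random


def bayesian_bootstrap_mean(values, draws=2000, seed=1):
    """95% percentile interval for the mean under Dirichlet(1,...,1) weights."""
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        # Normalized Exp(1) draws form one Dirichlet(1,...,1) sample.
        g = [rng.expovariate(1.0) for _ in values]
        total = sum(g)
        means.append(sum(w / total * v for w, v in zip(g, values)))
    means.sort()
    return means[int(0.025 * draws)], means[int(0.975 * draws) - 1]


# Hypothetical calibration ratios for four census countries.
ratios = [52_000.0, 61_000.0, 70_000.0, 95_000.0]
low, high = bayesian_bootstrap_mean(ratios)
print(low, high)
```

Unlike the classical bootstrap, no observation is ever dropped: every draw keeps all four ratios and only changes how much each one counts.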

Confidence intervals

We would not be surprised if the true number of Esperanto speakers in the world was not precisely 62983.9. Rather, we believe that the true number could be somewhere in a range around 62983.9. A confidence interval specifies such a range using statistics. One can believe any value in the confidence interval without disagreeing with the assumptions of the model.

I calculated two types of confidence intervals. The first assumes that the scaling factor is known. The scaling factor is the number of census Esperanto speakers per ‘internet’ Esperanto user (see the Model section for a more precise definition). The second confidence interval does not assume that the scaling factor is known. Instead it includes the randomness of the scaling factor estimate. I have boldfaced the intervals I recommend looking at. When the two intervals are identical, I only write one.

Data

Data was collected from the following websites:

UEA is an international Esperanto interest group whose goal is to spread Esperanto and promote language equality. They have put their member numbers on their website, but some clicking is necessary to retrieve the numbers.

lernu.net is the largest international learning portal for Esperanto. Many profiles are inactive and belong to people who were only briefly interested in Esperanto. Therefore, the number of users from a country could be higher than the actual number of Esperanto speakers in that country.

There are missing data in the dataset. For example, Andorra has a missing value for esperantujo.directory because the country is too small to appear on their map. Angola does not have a value for national organization because, although it does have a national Esperanto organization, the organization's size is not disclosed on the UEA website. The missing values do not cause inference problems in the model.

Resources

The data and my R scripts are available on GitHub. I have used datamaps.co to make the maps.

Results

Below, all countries are listed according to their frequency of Esperanto speakers. Frequency is the number of Esperanto speakers per 1 million inhabitants. Total is the total number of Esperanto speakers. Proportion is each country's share of the total number of Esperanto speakers.