
I have no basis on which to agree or disagree with your assessment of the linguistic situation in China. However, aren’t nearly all Chinese born after 1949 sufficiently conversant with official Mandarin to understand it, read it and also carry on a conversation in it? In that case, China would be a country with diglossia, with all the non-Mandarin languages/dialects spoken in informal settings between locals, and official Mandarin spoken in formal settings and between people of different regions.

Regards. James

The younger people speak, read, and write Putonghua (a version of Mandarin) very well. A lot of the older adults can do the same. I believe there may be some monolinguals of the other tongues out there. And there are also monolinguals under age 5. Some Westerners adopted a 2-3 year old girl, and the girl could only speak some obscure Gan language. It took them a while to figure out what the Hell language she even spoke because it was not obvious and the tongue was not well-known.

A problem is that some varieties have actually developed their own Putonghuas now! So in a sense the experiment is having unexpected consequences. Putonghuas of various regions can hardly be understood by Putonghua speakers of other regions. So even the standard is starting to split! However, getting everyone to speak, read, and write was definitely a good idea.

My father was stationed in China in 1946 after the war for a while. The US occupied China for a while there. He said that when he was in Peking (the old Beijing), there were rickshaw drivers everywhere. If you wanted to get anywhere, you summoned a rickshaw. He said that the rickshaw drivers had the pens and pads and they were always running around offering the pens and pads to passengers and other drivers because the other person spoke some other lect, so they could not understand each other. But most of them could read and write Mandarin! So if worse came to worse and you could not talk to each other, you could always write it down! So actually, China had a Putonghua of sorts even before the Communist victory and the introduction of Putonghua.

And I do not believe that Putonghua was introduced in 1949. I think it took the Communists a little while to come up with it and formulate it properly.

The “Speak Mandarin” campaign has had some unintended consequences because it is not allowed to teach school in any language but Mandarin for Sinitic speakers. I believe that speakers of other tongues such as Tibetans can have home language education, which is considered a progressive thing. I know that teachers were still teaching classes in Shanghainese not so long ago. Also speaking dialects was discouraged and possibly even punished at school. I am not sure if even today you can take courses in other Chinese languages at school. But the Mandarin only campaign went too far and it has led to the destruction of a lot of the less spoken varieties, which in many cases are full languages and not dialects at all. So it has been very controversial.

If you want to know where I have been the past few days, I have been working on this piece. I work on it for hours every day. So far, I have put in over 500 hours on this piece. That’s over three months of full-time work. My haters say I don’t work. The Hell I don’t work! I’d like to see them try to do this sort of work. This piece has been ridiculed by some linguist idiots on the Net. I worked a bit with linguists outside the Web. In fact, one of the top Sinologists (they have a Wikipedia entry) has been mentoring me on this project for some time now. I will not reveal this person’s name.

The number of languages has increased vastly from 365 to 526. Actually there are probably more than that. I have reason to believe that there may be 1,000-2,000 separate Chinese languages using the 90% intelligibility barrier (<90% = separate language, >90% = dialect). Just to contrast, Wikipedia says there are 14 Chinese languages (a grotesque underestimate) and the Chinese government insanely lies that there is only one Chinese language. Despite their superior IQs, a huge number of Chinese people fall for this idiot lie that is obviously based on political BS and not science. The Net is littered with otherwise intelligent Chinese people arguing strenuously that there is only one Chinese language. Just goes to show that you can have a high IQ and still be an idiot if your thought processes are too biased, which is the basic problem with all human thinking anyway.

Just to toot my own horn a bit and in response to my detractors, this is the most elaborate and extensive overview of the Chinese languages written in English in terms of pure classification that I have ever seen. There may well be works of this caliber or even beyond that are written in Chinese. In fact, one of the problems with this work is that so much of the original research is in Chinese. My Chinese is not very good, so it’s hard for me to read that stuff. This is not a finished product at all. This work will undergo revisions for quite some time if I keep working on it. It may not be done when I die. It’s a Herculean project.

I had to paste this in from a Word document, which is why the formatting looks so strange. But the post has received a huge update, in particular the Hokkien, Teochew, Wu and Cantonese sections. I am up to 526 languages. I would like any speakers of any Chinese language to look this over for me and add any corrections, explanations, elaborations, etc. I am especially interested in any mutual intelligibility data you might have as I have a bit of a mutual intelligibility fetish.

Warning: Very long. Runs to 87 pages in a Word document.

A Reworking of Chinese Language Classification

by Robert Lindsay

The Chinese languages have undergone a lot of reclassification lately (Mair 1991), from one Chinese language a couple of decades ago up to 14 Chinese languages today according to the latest Ethnologue.

However, Jerry Norman, one of the world’s top experts on Chinese, has stated that based on mutual intelligibility, there are 350-400 separate languages within Chinese (Mair 1991). According to Gong Xun, a Sichuan Mandarin speaker in Deyang, China, by my criteria of distinguishing between language and dialect, there would be 300-400 separate languages in Fujian alone.

So far, 2,500 dialects of the Chinese language have been identified, and a number of them are separate languages.

Based on the criteria of mutual intelligibility, I have expanded the 14 Chinese languages into 526 separate languages.

There are different ways of calculating mutual intelligibility. Mutual intelligibility is hard to determine. I am not interested in typological studies of varieties involving either lexicon, phonology or tones, unless this can be quantified in terms of mutual intelligibility in a scientific way (Cheng 1991). For the most part, what I am interested in is, “Can they understand each other?”

I decided to put it at 90%, with >90% being dialect and <90% being a separate language. This is based on what appears to be Ethnologue’s criteria for establishing the line between a dialect and a language.
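The cutoff can be written down as a one-line rule. This is only a minimal sketch of the criterion stated above, not a testing method: the function name `classify` and the constant `DIALECT_THRESHOLD` are made up for illustration, and since the rule says nothing about a score of exactly 90%, counting that as a dialect is an assumption.

```python
# Cutoff from the text: above 90% = dialect, below 90% = separate language.
DIALECT_THRESHOLD = 90.0


def classify(intelligibility_pct: float) -> str:
    """Label a pair of varieties from a single intelligibility percentage."""
    if not 0.0 <= intelligibility_pct <= 100.0:
        raise ValueError("intelligibility must be a percentage between 0 and 100")
    # Assumption: a score of exactly 90% is treated as "dialect".
    if intelligibility_pct >= DIALECT_THRESHOLD:
        return "dialect"
    return "separate language"


# 65% is the figure cited from Cheng 1991 in this piece.
print(classify(65))  # separate language
print(classify(95))  # dialect
```

A single number hides a lot, since intelligibility is often better in one direction than the other, so a real test would need a score for each direction.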

In the cases below where I had mutual intelligibility data available, a number of Chinese languages had no more than 65% intelligibility between them (Cheng 1991).

The best way to see this study is as a pilot study. The purpose of the classification below is more to stimulate academic interest and sprout new thinking and theory. It is not intended to be an end-all or be-all statement on the subject; in fact, it is quite the opposite. Pilot studies, which is what this is, are de facto never accurate and precise.

I assume this paper will be controversial. Keep in mind that this work is extremely tentative and should not be taken as the last word on the subject by a long shot.

Interested scholars, observers or speakers of Chinese languages are encouraged to contribute any knowledge that they may have to add to, confirm or criticize this data below. So far as I know, this is the first real attempt to split Chinese beyond the 14 languages elucidated by Ethnologue.

There are many problems with the data below. In many cases, “separate language” just means that the variety is not intelligible with Putonghua. Unfortunately, I currently lack excellent mutual intelligibility data within the major language groups such as Gan, Xiang, Wu, and the branches of Mandarin. There is probably quite a bit of lumping still to be done below. Where varieties are mutually intelligible below, I have tried to lump them into one language with various dialects.

In many cases, we seem to be dealing with dialect chains. This is particularly the case with the Mandarin languages, incorrectly referred to as the Mandarin dialects.

For instance, in Henan each major city can understand the next city over fairly well, but at the second or third city over, you run into serious comprehension difficulties. But even there, the languages are fairly close, with intelligibility at ~70%, and after three weeks of close contact, they can communicate fairly well. In many cases, it is a matter of working out the tone changes, for tone changes are very common even among the Mandarin lects.

Mandarin

Putonghua is Standard Mandarin, based on the Beijing Mandarin dialect as of 1949, but it has since diverged wildly, and many Putonghua speakers today cannot understand Beijing Mandarin. Putonghua is being promoted as the national language of China.

In addition to Putonghua, there are 1,500 other dialects of Mandarin spoken in China. In general, other Mandarin dialects are not intelligible to Putonghua speakers (Campbell 2009). However, the Northeastern Mandarin dialects and the dialects around Beijing are more intelligible with Putonghua than the Mandarin dialects in the rest of the country.

The implication is that there may be over 1,500 Mandarin languages in China. However, many of these Mandarin dialects are intelligible with at least some other Mandarin lects. Hence, despite the lack of intelligibility with Putonghua, there is a lot of potential lumping within Mandarin.

The degree to which Mandarin dialects are intelligible to each other is very much an open question and in general is poorly investigated.

We should also note here that even Putonghua, the language that was meant to tie the nation together, seems to be evolving into regional languages.

Guangdong Putonghua is not fully intelligible to speakers of the Putonghuas of Northern China and hence is probably a separate language.

Shanghai Putonghua is often not intelligible with Putonghua from other regions. It has heavy interference from Shanghaihua, which seriously affects the Putonghua accent. Even after four years of exposure, Standard Putonghua speakers often have problems with it.

Anhui Putonghua has poor intelligibility with Standard Putonghua due to its phonology. Therefore, it is a separate language.

In addition, Jianghuai Putonghua and Zhengcao Putonghua are not intelligible with Putonghua from other areas (Campbell 2009). These varieties of Mandarin cause a particular interference with Putonghua Mandarin that results in a severe dialectal disturbance in their Putonghua.

These Putonghuas are spoken in the regions native to the Jianghuai and Zhengcao branches of Mandarin. Jianghuai Mandarin is spoken in Anhui, Jiangsu, Hubei and to a much lesser extent Zhejiang Provinces. Zhengcao Mandarin is spoken in Anhui, Henan, Shandong, and Jiangsu, with one dialect spoken in Hebei.

There are also varieties of Putonghua that are spoken in Singapore and Taiwan. Claims that Taiwan Mandarin is fully intelligible with Putonghua are incorrect. Taiwanese Mandarin is about 80-85% intelligible with Putonghua. Based on that intelligibility figure, Taiwanese Mandarin is a separate language.

Singapore Mandarin has fewer differences with Putonghua than Taiwanese Mandarin and hence is a dialect of Putonghua.

Malay Mandarin is said to be quite different but nevertheless mutually intelligible with Putonghua. Nevertheless, Malay Mandarin speakers say they have to make speech adjustments with Chinese speakers, otherwise their speech is poorly intelligible. This implies that Malay Mandarin is indeed a separate language.

Yunnan Putonghua is intelligible with Putonghua from other regions (Campbell 2009).

The Beijinger variety of Beijing’s hutongs and taxi drivers is legendary for being hard to understand.

The truth is that Putonghua was never entirely based on Beijinghua. It was in terms of pronunciation but not for vocabulary. Putonghua got only 35% of its vocabulary from Beijinghua. Most of its vocabulary came from Japanese Kanji words. They used a form of Mandarin that was based on Chinese scholars who went to study in Japan at the end of the Qing Era. So Putonghua, like Standard Italian which is based on Florentine Italian of Dante circa 1400, is in a sense frozen in time.

The two lects may also have taken separate trajectories. This has also occurred in Italian, where, though Standard Italian was based on Florentine Tuscan, Standard Italian and Tuscan Italian have taken separate trajectories since. If you see old Tuscan men on TV in Italy, a speaker of Standard Italian from Southern Italy would need subtitles to understand them, but one from Northern Italy would not.

Others say that Putonghua was based on the language of the Beijing suburbs, not the city itself.

For whatever reason, Beijinghua often seems to have less than 90% intelligibility with Putonghua, though the question needs further research. Beijinghua, in its pure and least mutually intelligible form, seems to be spoken mostly in the innermost hutongs and among taxi drivers and other low-income and working class people. The variety of people with more education and money is probably a lot more comprehensible.

I would describe the real, pure, Putonghua as “CCTV speech”, the variety you hear on Chinese state television. Evidence that Beijinghua lacks full intelligibility with Putonghua is here, here, here, here, here, here, here and here.

The question of whether or not Beijinghua is a separate language from Putonghua is sure to be highly controversial. Perhaps intelligibility testing could settle the question.

Cangzhou Jilu Mandarin shares some similarities with Tianjin Jilu Mandarin and Baoding Jilu Mandarin, but it is probably not fully intelligible with either.

Tianjin Mandarin’s tones are quite different from Putonghua’s, its tone sandhi is much more complicated, and it is more closely related to varieties 150-500 miles away, since originally Tianjin Mandarin speakers came from Anhui (Lee 2002). Nevertheless, Tianjin Mandarin is a dialect of Beijing Mandarin.

Baoding Jilu Mandarin appears to be a separate language because there are people from the city who cannot speak it at all.

Beijing is in a group called the Beijing Group of Jilu Mandarin. It contains 43 separate varieties and may contain more than one language.

Jinan is a member of the Liaotai Group of Jilu Mandarin, which has 37 lects.

Shenyang Northeastern Mandarin is the main dialect in this group, and it is intelligible with Harbin Northeastern Mandarin, Liaoning Northeastern Mandarin, Changchun Northeastern Mandarin, and Heilongjiang Northeastern Mandarin. Harbin Northeastern Mandarin is also intelligible with Tianjin Jilu Mandarin and Beijing Jilu Mandarin. Nanjing City Northeastern Mandarin, Hebei Northeastern Mandarin, and much of the rest of NE Mandarin are all mutually intelligible.

Shenyang is a member of the Jishen Group of Northeastern Mandarin, which has 44 lects.

Within Jishen, Shenyang is a member of the Tongxi Group, which has 24 lects.

Harbin is a member of the Hafu Group of Northeastern Mandarin, which has 64 lects.

Within Hafu, Harbin Mandarin is a member of the Zhaofu Group, which has 18 lects.

Nanjing Zhongyuan Mandarin (evidence) is also a separate language – now mostly spoken in the suburbs, as city speech is not a separate language anymore. The city language is intelligible with the general Northeastern China Mandarin spoken in Beijing and Hebei.

Luoyang Zhongyuan Mandarin, Kaifeng Zhongyuan Mandarin, Changyuan Zhongyuan Mandarin, and Zhengzhou Zhongyuan Mandarin, all in Henan Province, are not intelligible with Putonghua. However, all four are mutually intelligible, so they are dialects of a single language, Henan Zhongyuan Mandarin.

Xinyang Zhongyuan Mandarin, also spoken in Henan, is a separate language and cannot be understood by Luoyang Zhongyuan Mandarin speakers.
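The lumping done for these Henan varieties can be sketched as grouping by pairwise judgments. This is a rough illustration only, assuming each pair has already been judged "mutually intelligible" or not. The function name `lump` and the short city labels are hypothetical, and the only inputs are the two judgments just stated: the four cities all understand each other, and Xinyang cannot be understood by Luoyang speakers.

```python
from itertools import combinations


def lump(varieties, intelligible_pairs):
    """Group varieties into languages: connected components (union-find)
    over pairs that have been judged mutually intelligible."""
    parent = {v: v for v in varieties}

    def find(v):
        # Follow parent links to the group representative, shortening the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in intelligible_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for v in varieties:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


henan = ["Luoyang", "Kaifeng", "Changyuan", "Zhengzhou", "Xinyang"]
# All four of the first cities are mutually intelligible; Xinyang gets no pair.
pairs = list(combinations(henan[:4], 2))
languages = lump(henan, pairs)
print(languages)
# [['Changyuan', 'Kaifeng', 'Luoyang', 'Zhengzhou'], ['Xinyang']]
```

One caution on the design: grouping this way would merge an entire dialect chain into one language even when the two ends cannot understand each other, so it only matches the lumping here when every pair inside a group is mutually intelligible, as with the four Henan cities.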

Nanyang Zhongyuan Mandarin has high but not complete intelligibility with Luoyang Zhongyuan Mandarin. Intelligibility between Nanyang Zhongyuan Mandarin and Luoyang Zhongyuan Mandarin is probably ~70%. Nanyang Zhongyuan Mandarin has 15 million speakers.

Gushi Zhongyuan Mandarin is not intelligible with Putonghua. In addition, Gushi Zhongyuan Mandarin is different from Nanyang Zhongyuan Mandarin and is probably not intelligible with it.

Intelligibility between Xinyang Zhongyuan Mandarin and Gushi Zhongyuan Mandarin is not known.

In general, intelligibility between many varieties in Henan is not full, but after a few weeks or so of close contact, they can start to understand each other. Mutual intelligibility between Xinyang Zhongyuan Mandarin, Gushi Zhongyuan Mandarin, and Nanyang Zhongyuan Mandarin may be ~70%.

Bozhou Zhongyuan Mandarin (evidence), Yingshang Zhongyuan Mandarin (evidence), and Fuyang Zhongyuan Mandarin (evidence), spoken in Anhui, are at least unintelligible with Putonghua. Fuyang Zhongyuan Mandarin is very different. The unnamed variety spoken 300 km. south of Jinan around Mengcheng in rural Anhui is said to be completely unintelligible with Putonghua, Tianjin Jilu Mandarin, and Beijinghua. For the time being, we will refer to this as one language, Anhui Zhongyuan Mandarin. Intelligibility between varieties of Anhui Zhongyuan Mandarin is not known.

The Mandarin spoken in Qinghai, Qinghai Zhongyuan Mandarin, is very different from that spoken in Gansu.

Xian, Huxian, and Zhouzhi are members of the Guanzhong Group of Zhongyuan Mandarin, which has 45 lects.

Yanan, Hanzhong, and Xining are members of the Qinlong Group of Zhongyuan Mandarin, which has 67 lects.

Luoyang is a member of the Luoxu Group of Zhongyuan Mandarin, which has 28 lects.

Kaifeng, Nanyang, Zhengzhou, Changyuan, and Bozhou are members of the Zhengcao Group of Zhongyuan Mandarin, which has 93 lects.

Xinyang and Gushi are in the Xinbeng subgroup of Zhongyuan Mandarin, which has 20 lects.

Tongwei and Sale are part of the Longzhong Group of Zhongyuan Mandarin, which has 25 lects.

Yingshang is a member of the Cailu Group of Zhongyuan Mandarin, which has 30 lects.

Speakers of Chengdu Southwestern Mandarin say that Zigong Southwestern Mandarin and Meishan Southwestern Mandarin are not intelligible to them. Chengduhua is still very widely spoken in Chengdu by people of all ages.

Ziyang Southwestern Mandarin is intelligible with the koine but has a heavy accent.

Leshan Southwestern Mandarin is a separate language. It is unintelligible with the koine, but it can be learned in a few weeks of exposure (Xun 2009).

Intelligibility between Leshan Southwestern Mandarin and Sichuan Southwestern Mandarin may be ~70%.

Hankou Southwestern Mandarin is a separate language, with 80% intelligibility between it and Chengdu Southwestern Mandarin (Cheng 1997).

The many small Southwestern Mandarin varieties around Mt. Emei are not intelligible with Sichuan Southwestern Mandarin, appear to be very different and may be one or more separate languages.

Wuhan Southwestern Mandarin is not intelligible to speakers of Southwestern Mandarin from other provinces; for instance, it is only 80% intelligible with Chengdu Southwestern Mandarin. Once you go an hour in any direction from Wuhan, Wuhan Southwestern Mandarin is no longer intelligible.

Dali Southwestern Mandarin is spoken in the city of Dali near Kunming. The variety is still widely spoken.

Dahua Southwestern Mandarin, spoken in and around Dahua village on the Puduhe River near Dongchuan in Yunnan Province, is apparently a separate language.

Another language spoken in Hunan in Zhangjiajie County is called Zhangjiajie Maoxi Southwestern Mandarin. The Maoxi are a tribal group there that speak a strange variety of Southwestern Mandarin.

Tuoyuan Southwestern Mandarin in Hunan is not fully intelligible with other Southwest Mandarin lects, or at least not with Sichuan Southwestern Mandarin.

Gaoping Southwestern Mandarin and Baixi Southwestern Mandarin in Hunan speak mutually intelligible varieties, even though Gaoping is in Longhui County and Baixi is in Xinhua County. Although they are very far from each other, the two towns can communicate with each other in their own varieties without problems. This is because an extended family left Gaoping 150 years ago and moved to Baixi, marrying the two languages. It would be best to call this language Gaoping Southwestern Mandarin.

Xinfeng Southwestern Mandarin is traditionally categorized as Southwestern Mandarin. It is a Southwestern Mandarin dialect island spoken in Ganzhou City in Xinfeng County, Jiangxi, surrounded by Gannan Hakka lects. Over time, it has seen so much Hakka influence that it may now be characterized as a mixed dialect. Given the massive Hakka influence, Xinfeng Southwestern Mandarin is no doubt a separate language.

Gong’an Southwestern Mandarin is a very unusual Southwestern Mandarin variety spoken in Gong’an City in Hubei. Hunan is to the south. It is nearly a mixed language, having features of both Southwestern Mandarin and Xiang. As such, no doubt it is a separate language.

Guilin, Luocheng, Yangshuo, Liuzhou, and Lingui are members of the Guiliu Group of Southwestern Mandarin, which has 57 lects.

Leshan and Longchang are members of the Guanchi Group of Southwestern Mandarin, which has 85 lects.

Within Guanchi, Longchang is a member of the Renfu Group, which has 13 lects.

Yichang, Chengdu, Chongqing, and Yingshan are members of the Chengyu Group of Southwestern Mandarin, which has 113 lects.

Menghai, Kunming, Wenshan, and Guiyang are members of the Kungui Group of Southwestern Mandarin. The Kungui Group itself has an incredible 95 lects.

Lanping is in the Dianxi Group of Southwestern Mandarin, which has 36 lects.

Within Dianxi, it is a member of the Baolu subgroup, which has 21 lects.

Taoyuan is a member of the Changhe Group of Southwestern Mandarin, which has 14 lects.

Wuhan is a member of the Wutian Group of Southwestern Mandarin, which has nine lects.

Dali is a member of the Dianxi Group of Southwestern Mandarin, which has 36 members.

Within Dianxi, Dali is a member of the Yaoli Group, which has 15 members.

Southwestern Mandarin itself has a stunning 519 lects. There are 240 million speakers of Southwestern Mandarin (Olson 1998).

Jianghuai Mandarin is a separate branch of Mandarin that is very different from the rest of Mandarin. It is a separate language and is not fully intelligible with Putonghua. Some say that this is not even part of Mandarin, as it is better seen as in between Mandarin and Wu.

Jianghuai Mandarin, especially the variety spoken around Taizhou, is not intelligible at all with Anhui Zhongyuan Mandarin or Sichuan Southwestern Mandarin. Jianghuai Mandarin speakers cannot even tell that the Anhui Zhongyuan Mandarin or Sichuan Southwestern Mandarin speakers are speaking Mandarin because the language is so foreign.

Yangzhou Jianghuai Mandarin is considered to be a separate language by a 200-word Swadesh test (Ben Hamed 2005). Yangzhou Jianghuai Mandarin has about 52% intelligibility with the other branches of Mandarin (Cheng 1997). Phonetically, it resembles Wu.

Dongtai Jianghuai Mandarin is a separate language (evidence). Dafeng Jianghuai Mandarin, Taizhou Jianghuai Mandarin, Xinghua Jianghuai Mandarin and Haian Jianghuai Mandarin are said to be similar to Dongtai Jianghuai Mandarin, so for the time being, we will list them as dialects of Dongtai Jianghuai Mandarin.

Jianghuai Mandarin is composed of an incredible 120 varieties. It has 65 million speakers (Olson 1998).

Yangzhou, Lianyungang, Yancheng, Huaian, Nanjing, Hefei, Anqing, the Tongchengs, and Chuzhou and Dangtu are in the Hongchao Group of Jianghuai Mandarin, which has 82 lects.

Dongtai, Dafeng, Taizhou, Haian, Xinghua, Jinsha, Nantong, Tongdong, Rudong, and Rugao are in the Tairu Group of Jianghuai Mandarin. Tairu has 11 different lects.

Jiujiang and Xingzi are members of the Huangxiao Group of Jianghuai Mandarin, which has 20 lects.

Lanyin Mandarin in the far northwest is also a separate language (Campbell 2004). Though Lanyin Mandarin is said to be intelligible with Putonghua, that does not appear to be the case. Minqin Lanyin Mandarin (evidence) and Lanzhou Lanyin Mandarin (evidence) in Gansu are not fully intelligible with Putonghua, nor is Yinchuan Lanyin Mandarin (evidence) in Ningxia.

Intelligibility within Lanyin Mandarin is not known, but Jiuquan Lanyin Mandarin at least appears to be a completely separate language inside Lanyin Mandarin.

Jiuquan is a member of the Hexi Group of Lanyin Mandarin, which has 18 lects.

Yinchuan is a member of the Yinwu Group of Lanyin Mandarin, which has 12 lects.

Lanzhou is a member of the Jincheng Group of Lanyin Mandarin, which has four lects.

The Jiaoliao Mandarin spoken in Shandong as Shandong Jiaoliao Mandarin contains varieties such as Qingdao Jiaoliao Mandarin and Weihai Jiaoliao Mandarin which are not fully intelligible with Putonghua. Yantai Jiaoliao Mandarin is a dialect of Weihai Jiaoliao Mandarin. Qingdao Jiaoliao Mandarin, Weihai Jiaoliao Mandarin, Yantai Jiaoliao Mandarin and Yangzheng Jiaoliao Mandarin are all mutually intelligible. Dalian Jiaoliao Mandarin is quite different from Putonghua.

Weihai, Dalian and 21 other varieties are members of the Denglian Group of Jiaoliao Mandarin, which has 23 lects.

Jiaoliao Mandarin is composed of 45 lects. Jiaoliao is not fully intelligible with Putonghua. Intelligibility inside of Jiaoliao Mandarin is not known, but there may be multiple languages inside of it because some Shandong Peninsula varieties sound very strange even to speakers used to hearing Shandong Jiaoliao Mandarin.

Wutun, or Wutunhua, is an unclassified language, a Mandarin-Mongolian-Tibetan creole mixed language spoken by 2,000 Tu or Monguar people in Eastern Qinghai Province. The Monguars speak Bonan, a Mongolic language with heavy Tibetan and Mandarin influence. Although the government regards them as Monguar Mongolians, the group self-identifies as Tibetan.

The source of the Mandarin is not known, but it is thought that the group came from outside the region, either Jilu Mandarin speakers from Tianjin in the northeast or from a group of Southwest Mandarin-speaking Hui Muslims in Sichuan Province who converted to Lamaist Buddhism for unknown reasons. They have been in their present location since at least 1585.

This is best seen as a Mandarin language that came under heavy influence of Bonan and to a lesser extent Tibetan, after which it was changed into an agglutinative language under the influence of these two other languages. The lexicon is 60% Mandarin with the tones lost, 25% Tibetan and 10% Bonan.

The Mandarin spoken around Tiantai in Zhejiang is not intelligible with Putonghua and may be a separate language. It is also unclassified.

Jin

Although it is related to Mandarin, Jin is a completely separate language, with only 57% intelligibility with other forms of Mandarin (Cheng 1997). The differences between Jin and Mandarin are somewhat greater than the differences within Mandarin itself.

Outside of Gan Proper, Leping Gan is very different. It is not at all intelligible with Nanchang Gan, and hence is a separate language.

Nanchang Gan and Anyi Gan are apparently separate languages within Gan based on a 200-word Swadesh test (Ben Hamed 2005). Nanchang Gan has a great deal of dialectal diversity, with several dialects covering different cities and the rural areas. Intelligibility between these dialects is not known. Nanchang Gan is still spoken very heavily in Nanchang.

Boyang Gan is spoken in another part of Jiangxi and is apparently a separate language from Nanchang Gan.

The nine major dialectal splits in Gan are apparently not mutually intelligible. Similarly, they must surely be separate languages, so Yichun Gan, Ji’an Gan, Fuzhou Gan, Yingtan Gan, Leiyang Gan, Huaining Gan, Daye Gan, Wanzai Gan, and Dongkou Gan are all separate languages. There is diversity even among these groups. For instance, Ji’an is divided into Nanxiang Ji’an in the south and Baixiang Ji’an in the north. The two are not intelligible with each other.

In the Yingyi Group, Chaling Dongxian Gan in Hunan near the Jiangxi border is a variety with mixed Gan and Xiang features. The best analysis is that this is a Gan variety. Due to the heavy Xiang mixture, it is no doubt a separate Gan language.

Linchuan Gan, spoken in East-Central Jiangxi, is a very interesting Gan that differs from all others. This seems to be the remains of the old language that was brought into Jiangxi by the ancestors of the Hakka, and it indicates a possible close relationship between Gan and Hakka.

Central Min or Min Zhong

Central Min or Min Zhong is a separate language not intelligible with Northern or Eastern Min. It has three lects, Shaxian Central Min, Sanming Central Min, and Yongan Central Min, but we don’t know if there are languages among them. The tones of the three varieties are quite different. Further, there are many dialects in the interior of Sanming Prefecture, so there may be more than one language there. Central Min has 3.5 million speakers.

Eastern Min or Min Dong

Within Eastern Min, Chengguan Eastern Min, Yangzhong Eastern Min, and Zhongxian Eastern Min are separate languages, all spoken in Youxi County. Zhongxian Eastern Min is spoken in the south of the county, Chengguan is spoken in the middle of the county, and Yangzhong is spoken in the north of the county. The three varieties have markedly poor intelligibility between them (Zheng 2008).

Beyond that, Eastern Min is reported to have several other mutually unintelligible languages inside of it. One of them is Fuqing Eastern Min. Fuzhou speakers can understand Fuqing speakers better than the other way around. Fuzhou and Fuqing are about 65% intelligible in praxis, and it is about the same with the rest of the Hougan Group (Ngù 2009).

Ningde Eastern Min, Fuding Eastern Min and Nanping Eastern Min are other languages in this family (evidence). There are many dialects in the Eastern Min-speaking areas of Nanping, and there may be more than one language here. Of these three, Ningde Eastern Min is definitely a separate language. According to George Ngù, a passionate proponent of Fuzhou Eastern Min, “Fuzhou is not intelligible even within its many varieties.”

Matsu Eastern Min is spoken on Matsu Island off the coast of China. It is similar to but probably not intelligible with Changle Eastern Min. Matsu may well be a separate language like all the rest of Hougan.

There are two other varieties lumped in with Eastern Min: Man, Mango or Taishun Manjiang Eastern Min, spoken in the central part of Taishun County in Southern Zhejiang in the far southern end of the Wu-speaking area, and Manhua, spoken in the eastern part of Cangnan County. Both of these names mean “barbarian speech.”

Both are probably mixtures of Southern Wu (Wenzhou etc.), Eastern Min, Northern Min, and maybe even pre-Sinitic languages. Manhua and Manjiang are not intelligible with Fuzhou Eastern Min. However, Manjiang has affinity with Shouning Eastern Min in phonology, vocabulary, and grammar. Whether or not it is intelligible with Shouning Eastern Min is not known.

Min Nan speakers who have looked at Manjiang data say that it doesn’t even look like a Sinitic language. It is best seen as an Eastern Min language with very strong substratum of a Tai-Kadai or Austroasiatic language.

Manhua is best dealt with as a form of Wu. I discuss it further below under Wu.

Malaysian Eastern Min is spoken in Sibu, Sarawak and in Singapore. The speakers were originally Fuqing and Fuzhou speakers who came in the 1800’s, and the language is spoken in two lects based on those two cities. Malaysian Fuqing Eastern Min and Malaysian Fuzhou Eastern Min only have 12% intelligibility, much less than the 65% of the parent languages in China. The two Malaysian lects are obviously not the same language, but intelligibility of the two lects with the parent languages in China is not known.

Fuding, Fuan, Shouning, Xiapu, Zherong, and Zhouning are in the Funing Group of Eastern Min, which has six lects.

Eastern Min contains 24 separate lects, all of which are separate languages.

Southern Min or Min Nan

Hokkien

Within Min Nan or Southern Min, a macrolanguage, there are a number of separate languages. There is a proposal to split Xiamen, Qiongwen and Teochew into three separate languages before SIL. In fact, all three of those are macrolanguages also.

Amoy Hokkien and Taiwanese Hokkien are the same language, as Taiwanese is an Amoy dialect. A good name for the entire language of Amoy-Taiwanese Hokkien is Xiamen Hokkien.

Amoy, the variety spoken in Amoy city in China, is identical to certain Taiwanese dialects. It is more or less intelligible with Taiwanese, as the differences between the two are minor, akin to British and American English. There have only been 120 years of separation between Amoy and Taiwanese. Most of the differences are in modern and local vocabulary.

Amoy and Quanzhou Hokkien are no longer intelligible with each other due to lack of a standard and the dialectal variations in each. Also, Amoy has developed more modern meanings for certain words, while Quanzhou retains more of the older meanings for the same terms.

Amoy, like Taiwanese, is a mixture of Quanzhou and Zhangzhou Hokkien.

Jinmen or Kinmen Hokkien is a dialect of Amoy spoken on Jinmen Island only two miles off the coast of Amoy. It has good intelligibility with Taiwanese.

A better name for Xiamen according to the Chinese literature is Quanzhang Hokkien (Campbell 2009). This would actually be a macrolanguage. Quanzhang is a combination of Quanzhou and Zhangzhou, two of the most important varieties in the language. Xiamen has only 51% intelligibility with Teochew.

Xiamen is still widely spoken in Taiwan as Taiwanese Hokkien. However, it is in trouble as fewer young people speak it anymore. 20 years ago in Đàoviên, Taiwan, it was common to hear young women in their late teens and twenties speaking Hokkien, but now it is uncommon (Kirinputra 2014).

Within Taiwanese Hokkien, the situation regarding Taipei Hokkien in the past was interesting. The dialects of the city were a mix of Zhangzhou and Quanzhou.

The dialect of the center of the city, Taipei City Hokkien, was mixed between the two, with a slight Quanzhou lean to it.

The dialect spoken in Sulim, Sulim (Shilin) Hokkien, heavily favored Zhangzhou. Other districts spoke a Tong’an-type dialect, which is just Quanzhou mixed with Amoy.

All these conditions are more common with the older generation. Young Taiwanese either speak the mixed Zhangzhou-leaning “Southern” style favored in the media, or they do not speak any Hokkien at all.

The Yilan Hokkien dialect on Taiwan is so different that it alone has posed serious problems for the task of standardizing Taiwanese, yet it is intelligible with Standard Taiwanese Hokkien. Yilan is a city in Taiwan.

Lugang Hokkien is also very different but is intelligible with Standard Taiwanese (Campbell 2009).

Elsewhere on Taiwan, there are some communication problems for Tainan Hokkien speakers hearing Taipei, but it appears that they are still intelligible with each other (Campbell 2009). Tainan is a city in Taiwan. A similar dialect is spoken in Gaoxiong as Gaoxiong Hokkien. Tainan and Gaoxiong are the prestige dialects of Taiwanese Hokkien that Standard Taiwanese is based on.

Taichung Hokkien is another dialect of Taiwanese spoken in the city of that name.

Tong’an Hokkien is said to be a dialect of Amoy, but the truth is that it is in between Amoy and Quanzhou. Tong’an Hokkien is spoken in the city of that name. A Tong’an variety is also spoken in Malaysia and Indonesia.

There is a group of Hokkien speakers among the Tanka fisherpeople located to the north of the Four Counties area. They speak a language that resembles Anxi Hokkien. We will call this Hong Kong Tanka Hokkien for now. They communicate well with speakers from the Hokkien homeland, so it looks like their language has not changed much. Most of them arrived in Hong Kong in the 1930’s and 1940’s.

Longhai Hokkien is very similar to the standard variety, while Zhangpu Hokkien is somewhat different.

Zhao’an Hokkien, Yunxiao Hokkien, and Dongshan Hokkien are all spoken in Southern Zhangzhou. They have been strongly affected by Teochew such that there is controversy over whether they are Teochew or Hokkien. Yunxiao and Dongshan have changed n → ng and t → k as in Teochew. Zhao’an resembles Teochew more than the others, as it has an ir vowel. Intelligibility data for these diverse Zhangzhou varieties is not available.

With the possible exception of the three varieties mentioned above, all Zhangzhou varieties are mutually intelligible.

Zhangzhou and Quanzhou are not fully intelligible with each other in China. Taiwanese speakers can no longer understand the pure Quanzhou spoken in the Chinese city of that name, and some Quanzhou speakers say they cannot understand Taiwanese either. Nevertheless, Taiwanese has 80% intelligibility of Quanzhou and Zhangzhou. After all, Taiwanese itself is just a mixture between Zhangzhou and Quanzhou.

Zhangping Hokkien, though close to Xiamen, is a separate language according to a 200 word Swadesh test (Ben Hamed 2005).

Pinghe Hokkien is said to be a separate language.

Diaspora, Nusantaran or Overseas Hokkien, that is, all Hokkien spoken outside of the area in China extending a few hundred miles up and down the coast in either direction from Amoy, could be seen as being composed of two main groups. It is a language in trouble as young people everywhere in the diaspora switch to Mandarin, and many children are not learning Hokkien. Technically, Taiwanese is included in Overseas Hokkien, but since it is merely a dialect of Amoy, we put it under Amoy instead.

50 years ago, we could learn interesting things about Overseas Hokkien forms spoken in Jakarta, Yangon, Bandung, Phuket, Trang, Cebu, and possibly Palembang and Surabaya. Now Hokkien may be extinct in Jakarta, Yangon, Palembang and Surabaya and is in trouble in Phuket, Bandung and Cebu (Kirinputra 2014).

The first group, called Eastern Hokkien, is in the north and encompasses Taiwan (Kirinputra 2014).

The second group, which we shall call Malayland Hokkien for lack of a better term, is spoken in Malaysia and in Indonesia in Sumatra and Kalimantan. Malayland is heavily laced with Teochew.

However, the Hokkien spoken in the Philippines is classed as Malayland Hokkien because it is intelligible with Southern Malayland Hokkien even though it is in the east.

Malayland is split into two languages, Southern Malayland Hokkien and Northern Malayland Hokkien. The first language, Northern Malayland Hokkien, was formerly spoken in Northern Malaysia from Taiping along the coast all the way to Phuket, Thailand but is now spoken for the most part only to Penang and over to Terengganu in Malaysia and in Medan and other places in Northern Sumatra in Indonesia.

The language is also referred to as Penang Hokkien or Medan Hokkien, after the very similar dialects spoken in those cities. Terengganu Hokkien is different. On Penang Island, two dialects are spoken, Baba Hokkien, which is heavily creolized, and Sin Khek Hokkien, a more pure variety. There are also differences between Penang Island Hokkien and Butterworth Hokkien spoken in Butterworth just across the strait.

Hokkien is still very widely spoken in Penang, and it is possible to go through your entire day speaking nothing but Hokkien.

Northern Malayland is still spoken up into Thailand towards Phuket and in the Burmese Panhandle all the way to Rangoon. In Myanmar, the speakers are mostly elderly, and the language is dying out. Burmese Hokkien looks very much like Penang because many speakers came from Penang to Rangoon. Northern Malayland is still spoken in Surat Thani on the east side of the peninsula in Thailand by a few older speakers. On the Phuket side of the peninsula facing the Indian Ocean, it has been decimated.

All varieties of Northern Malayland are apparently mutually intelligible.

Speakers of Northern Malayland have a hard time understanding the Southern Malayland spoken in Klang and Malacca. Southern Malayland speakers in general say they cannot understand Penang.

Northern Malayland Hokkien is more of a Zhangzhou variety in terms of its accent. It is also heavily creolized, with a lot of Malay and Thai embedded deeply in the language. The differences between the two Malayland Hokkien languages are as great as between Hokkien and Teochew. Intelligibility between the two may be as low as 50%.

In Kuala Lumpur and Selangor, Southern and Northern Malayland mix, and it is difficult to say which language is being spoken here. However, the variety spoken in Selangor, Selangor Hokkien, is best described as Southern Malayland, as they cannot understand Penang well. Hokkien is still very widely spoken in Selangor.

The second language, Southern Malayland Hokkien, encompasses Southern Malaysia from Johor up to Kelantan, where it is spoken in the cities of Selangor, Kelang, Malacca, Muar, Tangkak, Segamat, Batu Pahat, Pontian, Singapore, Riau, the Riau Islands, and Johor Bahru. Kelang Hokkien and Johor Hokkien are recognized as specific dialects, and Hokkien is still very widely spoken in both cities.

It is also widely spoken in Singapore and Brunei. In Indonesia, it is spoken in the state of Riau as Riau Hokkien, which is very close to Singapore Hokkien, and the city of Bagansiapiapi on Sumatra. It is also spoken in Bangkok, Thailand and in Saigon, Vietnam, where it is dying out (Kirinputra 2014).

Southern Malayland is less creolized than Northern Malayland, if it is creolized at all. Southern Malayland is more of a Xiamen Hokkien variety, while Northern is a type of Zhangzhou.

Kelantan, Kelantanese or Kelantan Peranakan Hokkien is spoken in the Malay state of Kelantan. It is wildly creolized with Malay and is probably not intelligible with any other form of Hokkien.

The variety of Hokkien spoken in Kuching, Sarawak, Kuching Hokkien, is also very different and is said to resemble Kelantan Hokkien. Nevertheless the Hokkien dialect situation in Kelantan is poorly understood, and there are said to be two different types of Hokkien spoken in this area, Kelantan Hokkien A and Kelantan Hokkien B (Kirinputra 2014). Kelantanese is still widely spoken.

The version of Southern Malayland Hokkien spoken in Singapore is called Singapore Hokkien and is based on Amoy, and possibly even more on Jinmen, but speakers also came from Tong’an, Zhangzhou, Quanzhou, Anxi, and Hui’an. It is similar to Taiwanese, but Singaporean speakers can no longer understand Taiwanese well, though they have partial understanding of it. For instance, they have only 30-40% intelligibility with Yilan Taiwanese Hokkien.

Southern Malayland lies between Northern Malayland and Taiwanese Hokkien on the continuum.

A Singapore speaker, if immersed in Taiwan, could pick up Taiwanese fairly quickly, within three months.

Singapore has been isolated from Taiwanese for quite some time, so it has retained older features that are losing ground in mainland Hokkien varieties. Word-final unvoiced stops p, t and k are starting to be lost in Zhangzhou on the mainland and replaced with a glottal stop, whereas in Singapore, they are still preserved.

Singapore speakers, even the older ones, now mix a lot of Mandarin, English and Malay in with their speech. They have been isolated from the main Hokkien-speaking communities in Amoy and Taiwan for so long that they have lost many of the subtler aspects of the language spoken in these areas.

Singapore has withered into a weakened and corrupted version of the more pure Hokkien spoken in Taiwan and Fujian. Further, the language has changed a lot since the Singaporean speakers left the region, and Singaporean Hokkien speakers have not kept up with the continuously evolving Hokkien language spoken in the Hokkien homeland.

Singaporean has also become so heavily admixed with Teochew that it is more properly seen as Hokkien-Teochew than Hokkien Proper.

Much of the good intelligibility between Bagan and Taiwanese seems to be due to bilingual learning. They speak like the Hokkien speakers of Tong’an, China. There are only a few thousand speakers remaining, and the language seems to be on its way out.

Another very pure version is the moribund Southern Malayland dialect still spoken by a few people in Saigon, Saigon Hokkien (Kirinputra 2014).

The Southern Malayland dialect spoken in Bangkok is called Bangkok Hokkien and contains Malay loans.

This seems to imply a large trading community involving Saigon, Bangkok and Malayland which exchanged words via different speech forms (Kirinputra 2014).

Intelligibility of Bangkok and Saigon with the rest of Southern Malayland is not known, but it is assumed to be full.

The version of Southern Malayland spoken in the Philippines is called Banlam-ue, Banlamhue, Binamhue, Lanlang-ue, Minnanhua or Philippines Hokkien by speakers. Although its tones are quite different from Indonesian Southern Malayland Hokkien, the two varieties are fully intelligible. Hence Philippines Hokkien is a dialect of Southern Malayland.

Philippines is not readily intelligible with Standard Hokkien. Speakers came to the Philippines long ago, so their Hokkien contains many old words that have fallen out of other Hokkien varieties. It derives from the Jinjiang and Sheshi dialects on the outskirts of Quanzhou. Lanlang-ue means “our language.” Minnanhua is the name of this language in Mandarin (Kirinputra 2014).

At present, it is not intelligible with Quanzhou or Xiamen. That is, Philippines speakers claim that they can only understand about 70% of Taiwanese television.

Despite intelligibility issues, Philippines and Taiwanese have a very similar lexicon. The lexicons of both are similar to Amoy speech. Apparently the Amoy-Luzon-Taiwan trade route produced a convergence in the lexicons of these varieties (Kirinputra 2014). Philippines is full of Tagalog words. Philippines, like Northern Malayland, resembles Zhangzhou from the late 1800’s.

Philippines is spoken in Manila, Cebu, Zamboanga, Sulu, and Jolo. The standard is based on the variety spoken in Manila. Zamboanga Hokkien differs from Manila Hokkien in that it has more Spanish and Chavacano borrowings and fewer Tagalog words. The dialect on Sulu Island, Sulu Hokkien, is different from the rest of Philippines, sounding more like Amoy and Taiwanese with a trace of Singapore. Cebu Hokkien, spoken on Cebu, resembles Jolo Hokkien, which is spoken on the far southern island of Jolo.

Cebu and Jolo Islands were part of an important route for smuggling goods into the Philippines for centuries. Most of the smugglers were Hokkien Chinese. Philippines is still widely spoken on Sulu, in Zamboanga and in the Binondo region of Manila. Cebu is in trouble with a declining number of speakers. The situation with Jolo is not known.

Southern Min: Chaoshan Min or Teochew

Chaoshan Min or Teochew is a macrolanguage spoken in a nine-county region of Guangdong. It is also spoken a lot in Thailand. Most Overseas Chinese in Thailand speak Teochew. The Mandarin name for the language is Chaozhou, but Teochew speakers do not accept that appellation and prefer Teochew instead.

Shantou Teochew, Raoping Teochew and Jieyang Teochew are spoken outside of the Chaoyang-speaking area which hugs the coastline southwest of the Shantou area (Kirinputra 2014), which may explain why they have a hard time understanding Chaoyang.

Shantou is more intelligible with Hokkien than other types of Teochew, but intelligibility is still only 54%. However, Hokkien is utterly unintelligible with Jieyang (Kirinputra 2014). This implies that Shantou and Jieyang are quite different. The implication is that Jieyang Teochew is a separate language.

Shantou speakers cannot understand Chaozhou, as Shantou is quite a bit different from the other Teochew lects, and they also seem to have a hard time understanding other Teochew lects, as they say the Teochew changes every hour or so as you travel and becomes difficult to understand. Shantou Teochew is a separate Teochew language.

Sources report that Teochew varieties can vary greatly in the pronunciation of even single words, and the tones can be quite different too.

Intelligibility data for Raoping, Huilai Teochew, and Jindengzhan Teochew with the rest of Teochew is not known.

Teochew was formed by a group of Hokkien Min speakers who broke off from Zhangzhou Hokkien about 600-1,100 years ago. They moved down to Northeastern Guangdong, and after hundreds of years, a heavy dose of some sort of unknown substrate languages went into the language, possibly including a Cantonese-type variety, producing modern Teochew (Kirinputra 2014).

Overseas Teochew is a significant branch of Teochew that is spoken outside of the Teochew area in China in Vietnam, Cambodia, Thailand, Malaysia, Indonesia, and the Philippines. Overseas Teochew is an extremely variable macrolanguage consisting of a number of different languages.

Malayland Teochew is spoken in Malaysia, Singapore and Indonesia. Malayland Teochew, instead of being a language, is a macrolanguage composed of several languages.

The Teochew variant spoken in Malaysia, Malay Teochew, is composed of many highly variant lects. A different Teochew variety is spoken in each subregion, and varieties sometimes differ dramatically in pronunciation and tones. Whether or not they are mutually intelligible is not known.

Malay Teochew is spoken in four different places in Malaysia: two places at the southern tip of the peninsula, and Kedah and North Perak on the far northwestern coast, where there are substantial Teochew populations. Malay is not intelligible with other SE Asian Teochew varieties. Malay has converged more with Hokkien than other types of Teochew.

It seems logical to split at least North Perak Teochew and Kedah Teochew along with Southern Malay Teochew A and Southern Malay Teochew B for the time being.

Singapore Teochew is different from Malay, and both have undergone separate divergent influences, so each one should be regarded as a separate language. However, Singapore Teochew is similar to Shantou because most Singaporean speakers came from there. Singaporean is regarded by Teochew speakers on the mainland as a heavily corrupted and impure variety of Teochew. Singaporean is not intelligible with any of the Teochew spoken in China anymore, not even the Shantou that it came from.

It has come under such heavy influence from Singaporean Hokkien that it is now better regarded as Singaporean Teochew-Hokkien than a pure Teochew tongue. Many of the original Teochew terms have been replaced with Hokkien words. It is also now heavily admixed with Malay, and a lot of the characteristics of Mainland Teochew have been lost.

There are variations even among Singaporean Teochew. Speakers of some of the coarser, more rural dialects can only understand 50% of the purer varieties. This is derived from the early days when only some of the immigrants from Shantou were educated and most were uneducated peasants. The peasants did not speak the same higher, more refined Shantou that the educated people did.

In time, the differences became more dramatic. As these varieties still exist, we can call them High Singaporean Teochew and Low Singaporean Teochew, two separate languages. Low Thia Khiang, the leader of Singapore’s Workers’ Party, speaks High Singaporean Teochew and is poorly understood by speakers of Low Singaporean Teochew.

The variety spoken in Medan, Indonesia on Sumatra, Medan Teochew, is particularly interesting. It has heavy Malay, Hokkien and Cantonese influence and cannot be understood by other Teochew speakers (Kirinputra 2014). The town of Brahang 12 miles from Medan speaks Teochew.

Teochew is also spoken in other places in Indonesia such as Riau, Dabo Singrep, Tanjung Penang, Bantam Island, and Pontianak.

The Teochew spoken in Indochina, in particular in Vietnam and Cambodia (Indochinese Teochew), is a macrolanguage. Some Indochinese Teochew speakers who have returned to their family villages on the mainland say they could only understand 70% of the speech there.

Cambodian Teochew speakers say that Cambodian Teochew, Vietnamese Teochew, and Thai Teochew are all separate languages, and they cannot understand each other (Tek 2016).

Thailand Teochew or Diojiu-we is spoken in Thailand. The Chinese lingua franca in Thailand is not Mandarin but Teochew. There are 5 million Chinese Thais with roots in the Teochew region, and 3 million of them speak Diojiuwe.

Teochew is spoken in the Philippines, but there is little information available about Philippines Teochew.

Chaoyang, Shantou, Raoping, Jieyang, Huilai, Jindengzhan, Thai, Cambodian, Vietnamese, Medan, Singapore, Malay, Kedah, North Perak, Southern Malay A and B, Borneo, and Philippines are part of Teochew, which has 17 lects, 12 of which are separate languages.

Hailufeng Min

Hailok’hong, Hailufeng or Haklau Min is a separate language in Southern Min that represents a later move of Zhangzhou speakers 400-500 years ago towards Northeastern Guangdong by the same group that formed Teochew. Since then there has been convergence with Teochew (Kirinputra 2014). It also has substantial Hakka influence. Hailok’hong (Haklau) Min is spoken down the coast between the Teochew zone and the Hong Kong area.

Hailufeng Min is usually better known as Hailok’hong or Haklou Min. It has at least three dialects, Haifeng Hailufeng Min, Lufeng Hailufeng Min, and Shanwei Hailufeng Min, and has limited intelligibility of Teochew proper.

The city of Haifeng has mostly Hailufeng speakers. Lufeng is spoken in the western half of Lufeng. Shanwei is the name of the prefectural city that encompasses Lufeng and Haifeng Counties. Shanwei Min is spoken more in the urban area of Shanwei.

Intelligibility among the three main Hailufeng Min varieties is full.

There is a group of Hailufeng speakers who originally came from Shanwei living in Hong Kong as part of the Tanka fisherpeople community. They live in the northern part of Hong Kong north of the Hokkien-speaking Tankas. They originally came from the Shanwei area which is just to the north. We will call them Hong Kong Tanka Hailufeng Min for now. Intelligibility data for this lect is not available.

Many insist that Hailufeng is a Teochew language because this area was redistricted into the Teochew area administratively in the 20th Century. Chinese people are jealously loyal to their home districts and see all languages spoken in their district in geographical and not linguistic terms. So to admit that Hailufeng is not Teochew would be a sort of treason to the homeland if you will (Kirinputra 2014). The area where the language is spoken along the coast of Guangdong is actually to the south of the Teochew area.

Hailufeng is said to be halfway between Teochew and Zhangzhou. Hailok’hong or Haklou etymologically is Haihong + Lok’hong, which is the same thing as Haifeng + Lufeng, so it is a combination of Haifeng and Lufeng. Haklau is also cognate with Hokkien Holo and Cantonese Hoklo, referring either to Taiwanese Hokkien or Teochew. In an overall sense, it meant Hokkien + Teochew, which is a good description of the language (Kirinputra 2014). Hailufeng is still confused a lot with Hokkien in many casual descriptions.

Many Hailufeng speakers can now understand Teochew, but that is due to bilingual learning (Kirinputra 2014).

Lufeng is said to have over 90% intelligibility with Xiamen Hokkien, but if it is really halfway between Teochew and Hokkien, it should have 75% intelligibility instead. Intelligibility testing may be needed. There are 3 million speakers of Hailufeng Min.
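One possible reading of the 75% figure is that it is the midpoint between full intelligibility (100%) and the 51% reported earlier for Xiamen with Teochew. That reading is an assumption, not something the sources state, and the helper name `midpoint` below is hypothetical. A minimal sketch of the arithmetic:

```python
def midpoint(low: float, high: float) -> float:
    """Return the value halfway between two percentages."""
    return (low + high) / 2


# Assumption: "halfway between Teochew and Hokkien" is read as halfway
# between the Xiamen-Teochew figure (51%) and full intelligibility (100%).
xiamen_teochew = 51.0
full = 100.0

expected_lufeng_xiamen = midpoint(xiamen_teochew, full)
print(expected_lufeng_xiamen)
```

Under that assumption the result lands close to the 75% given above, well short of the reported 90%, which is why testing may be needed.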

Zhenan Min

Zhenan Min, spoken in pockets in Yixing, Anji, and Linan in Southern Jiangsu and Wenzhou in Changxing in Southern Zhejiang Province around Pingyang and Cangnan and in the Zhoushan Islands, is a separate language. Speakers are found in Anhui Guangde, Nigguo, Langxi, the eastern part of Wuhu, Jiangxi Shangrao, Yushan Island, and Guangfeng County, in addition to Pucheng on the northern border of Fujian. It is spoken along the coast far to the north of the general Min-speaking area.

Zhenan Min has 574,000–848,000 speakers. Zhenan Min is influenced by Eastern and Northern Min and has limited intelligibility with other Min languages. In the area around Wenzhou, it has come under heavy Wenzhou Wu and Manhua Wu influence. Zhenan Min is still confused with Hokkien in casual descriptions.

Intelligibility among Zhenan Min varieties is not known. Zhenan Min is a result of a migration of Hokkien speakers from Hui’an, Jinjiang, Quanzhou, Nan’an, Xiamen, and Jinmen to the area in the middle of the Ming Dynasty about 800 years ago due to pirate attacks and civil wars in the region they fled from. Once they arrived at their new home, high waves prevented them from returning, so they decided to make their new homes here in the north.

Jujiang Zhenan Min is spoken in Taishan County near the Manhua-speaking area.

Baizhang Zhenan Min is spoken as a dialect island in the south of Taishan County. It has come under severe influence from Luoyang Wu and Manhua. It is presently near extinction. Baizhang appears to be a dialect of Jujiang.

A grammar written around 1900 on the Bun-Sio dialect of Hainanese Min stated that a number of the more distant Hainanese Min varieties were “perfectly unintelligible” to Bun-Sio Hainanese Min speakers (De Souza 1903).

Bun-Sio is spoken in an area called the Bun-Sio District, also known as the Wenchang District, on Hainan. This region encompasses the far northeastern end of the island. There are also Hainanese Min speakers in Malaysia and Vietnam. These speakers speak a version of Bun-Sio which looks a lot like the type described 100 years ago.

From a glance at this grammar, Bun-Sio or Wenchang Hainanese Min has more of a Tai-Kadai substrate than Southern Min in general. There is also a trace of Cantonese and more of a Mandarin influence than in the rest of Hokkien and Teochew. All in all, it is probably acceptable to split off Bun-Sio as a separate language.

Hainanese tones also vary from region to region, once again implying more than one language. The Hainanese Min tone system does not seem to be well described.

Leizhou Min is made up of two main groups: Leizhou Min and Zhanjiang Min. Leizhou Min is a separate language, and it has a close relationship with Hainanese. Nevertheless, Leizhou consists of seven different lects. Haikang is a dialect of Leizhou.

At least some of the other six Leizhou varieties are very different in phonology and lexicon. Intelligibility data is not known, but they may be mutually intelligible. Leizhou, with four million speakers, has low intelligibility with other Min varieties and has only 85% intelligibility with Hainanese, similar to Spanish and Portuguese.

Zhanjiang Min is apparently not intelligible with Leizhou. It is spoken in Zhanjiang City in the far southwest of Guangdong. It seems to be a separate language.

Shaojiang Min or Min Gan

Shaojiang Min or Min Gan is a completely separate high-level division of Southern Min. It is spoken in Nanping County in the far northwest of Fujian bordering the Northern Min and Wu-speaking area to the east by about 984,000 people. It has four languages inside of it – Shaowu Shaojiang Min, Guangze Shaojiang Min, Jiangle Shaojiang Min, and Shunchang Shaojiang Min – that have limited mutual intelligibility. There are subdialects within these larger lects.

The substratum of Shaojiang is not for the most part Min, Gan or Hakka – instead, it is the ancient Baiyue language, although there are lesser Hakka and Gan influences. Others say that this is not Southern Min at all. Instead it is a division of Northern Min where Central Min is also included. This would make sense due to its location and the fact that Shaojiang split away from Northern Min several hundred years ago. These are Northern Min speakers who came under heavy influence of Hakka, Gan, and Baiyue.

Shaowu, Guangze, Jiangle, and Shunchang are all part of Shaojiang, which has four lects, all of which are separate languages.

It has limited intelligibility of other Min languages – for instance, Puxian Min has 60% intelligibility of Xiamen Hokkien Min, but the mutual intelligibility is lopsided, as Xiamen intelligibility with Puxian Min is lower at 30% (Terng 2016). Hence Puxian-Xiamen intelligibility is only 45% (Terng 2016).
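The 45% figure is consistent with a simple average of the two directional scores. The sketch below shows that arithmetic; the function name `mean_intelligibility` is hypothetical, and treating the combined figure as an unweighted mean is an assumption rather than a method stated in the source.

```python
def mean_intelligibility(a_of_b: float, b_of_a: float) -> float:
    """Average two directional intelligibility percentages.

    Assumption: the combined figure is the plain arithmetic mean of
    the two one-way scores.
    """
    return (a_of_b + b_of_a) / 2


puxian_of_xiamen = 60.0  # Puxian speakers understanding Xiamen Hokkien
xiamen_of_puxian = 30.0  # Xiamen speakers understanding Puxian Min

combined = mean_intelligibility(puxian_of_xiamen, xiamen_of_puxian)
print(combined)
```

A single averaged number hides the asymmetry, so it is worth keeping both one-way scores alongside it.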

The name is derived from the names of two different cities in China where this language is spoken – “Pu” for Putian and “Xian” for Xianyou.

Puxian Min has seven dialects. There is full intelligibility between all of the dialects, although there are some minor pronunciation and vocabulary differences (Terng 2016). The two main divisions of Puxian Min are into Putian Puxian Min and Xianyou Puxian Min, hence the name Puxian Min being a mix of the two main varieties. Both are dialects of the main Puxian Min language.

There are at least four subdialects spoken in Putian County, all subdialects of Putian Puxian Min. They are Jiangyou Putian Puxian Min, Chengli Putian Puxian Min, and two spoken in Putian City called North Putian City Puxian Min and South Putian City Puxian Min. There are other Putian Puxian varieties spoken in the county to the north and south of Putian City other than Chengli and Jiangyou, but their names are not known. We will call them North Putian County Puxian Min and South Putian County Puxian Min.

There are three dialects spoken in Xianyou County, one in Xianyou City called Xianyou City Puxian Min or Central Xianyou Puxian Min, another in the north of the county called North Xianyou County Puxian Min, and a third in the south of the county called South Xianyou County Puxian Min. All are subdialects of a single dialect of Puxian Min, Xianyou Puxian Min. All three subdialects are fully intelligible with each other with only some minor differences in pronunciation and some different vocabulary (Terng 2016).

For instance, North Xianyou kou, “to throw,” is lacking in Xianyou City. South Xianyou has [i] and [e] for [y] and [ɵ] in Xianyou City, and North Xianyou has [θ] for Xianyou City [ɬ] (Terng 2016).

Xianyou city trades a lot with the north and south of the county, so there is a lot of contact between the subdialects. The city gets rice and rice-derived goods from the south and fish and shellfish from the south.

There is also a lot of intermarriage between speakers of the three subdialects. Most speakers of one of the Xianyou dialects have relatives who speak another of the dialects. The only research on Xianyou Puxian Min has focused on the dialect of the city – Central Xianyou – with the other two dialects being poorly known (Terng 2016).

Intelligibility between Xianyou and Putian Puxian Min is good at 90%-100%. There are some vocabulary differences.

For instance, “white”: Xianyou City pann, Chengli Putian (城里, Putian City) pa; “officer”: Xianyou City kuann, Chengli Putian melonkua. These two pairs cause some confusion. In these cases, Chengli Putian has lost nasalization that Xianyou City has retained. As we shall see below, loss of final nasalization is not just seen in Chengli Putian but in all of Putian. Nevertheless, Xianyou City intelligibility of Chengli Putian is full at 100% (Terng 2016).

There is some different vocabulary there too, and in some cases of common words, the differences are striking.

For instance, “children”: Xianyou kann en, Putian ta a; “wet”: Xianyou iunn, Putian tang. Once again we see that Xianyou has retained the older nasalization, whereas it appears that all of Putian, not just Chengli, has lost it (Terng 2016).

There are also rhyme differences between Putian and Xianyou. Xianyou has retained more rhymes at 50 rhymes, whereas Chengli Putian has 40, and Jiangyou Putian has 36 rhymes (Terng 2016).

So in addition to loss of nasalization, there may have been rhyme reduction in Putian also. It appears that Xianyou may be the older form of the Puxian Min language and that Putian broke away from it more recently.

However, there is a form of Puxian Min spoken in Singapore, Hinghwa Puxian Min, which lacks full intelligibility with Puxian Min in China. Hinghwa Puxian Min speakers are a minority in Singapore, and their language has mixed a lot with Singapore Hokkien, Malay, English, and other languages spoken in Singapore, resulting in a separate language.

South Putian City, North Putian City, Chengli, Jiangyou, North Putian County, and South Putian County are part of Putian Puxian Min.

Xianyou City or Central, South Xianyou, and North Xianyou are part of Xianyou Puxian Min.

Xianyou City, South Xianyou, North Xianyou, South Putian City, North Putian City, Chengli, Jiangyou, North Putian County, South Putian County, and Hinghwa are all part of Puxian Min, which has 10 lects, two of which are separate languages.

Zhongshan Min

Zhongshan Min, a macrolanguage, has 130,000-150,000 speakers and has limited intelligibility with other Min lects. It is located to the south of Hailufeng Min just north of the Cantonese zone along the Southern Guangdong Coast.

This group is possibly a Northern or Eastern Min group stranded far down in Guangdong. They are sometimes referred to in old literature as “Northeastern Min”. That’s not really a category. It often means Northern Min, but sometimes it means Eastern Min. These languages have all borrowed extensively from Siyi Cantonese spoken in the Pearl River Delta.

Looking at the whole picture, it appears that various immigrants speaking Puxian Min, Northern Min, and Southern Min all settled around Zhongshan. These various Min elements, along with a hefty dose of Cantonese, have gone into the creation of Zhongshan Min.

Two Zhongshan lects, Namlong or Zhangjiabian Zhongshan Min (also spoken in Zhongshan), and Sanxiang Zhongshan Min, are separate languages. Each one is a dialect island surrounded by Cantonese speakers, and all three populations are unconnected.

Sanxiang is more divergent. Further, there are more dialects within these three languages, and dialectal divergence is considerable.

Sanxiang Min has at least two dialects, Phao Zhongshan Min and Tiopou Zhongshan Min. Phao is fairly uniform across a number of villages, but Tiopou is quite different. Nevertheless, there is near-full intelligibility between Phao and Tiopou (Bodman 1988).

For now, it is best to list Sanxiang, Namlong, and Longdu as separate languages, with possible dialects Phao, Tiopou, Namlong A, Namlong B, Longdu A, and Longdu B, among them.

Longyan Min or Coastal Min

Longyan Min or Coastal Min (Branner 2008) is a separate language. It is spoken in Longyan City’s Xinluo District and Zhangping City deep inside Fujian to the west of the Hokkien-speaking area. There is an overseas group of Coastal Min speakers in Malaysia in Penang around Parit Buntar. Although the language has been dying out in Malaysia for some time now, the language is still quite alive in Parit Buntar.

The language has anywhere from 300,000 (Branner 2008) to 740,000 speakers and has limited intelligibility with other Min languages. It has heavy Hakka influence due to the large number of Hakka speakers in the surrounding areas. Some put Coastal Min in a Southern Min Nan division of its own, others put it in Hokkien, and others put it outside of all other major Min varieties in its own Min category. The best analysis seems to be that it belongs in its own Southern Min division.

Koongfu Coastal Min and Shizhong Coastal Min are dialects of Coastal Min, but on examination, they are quite different. Koongfu is spoken in Kanshi Township in Yongding County. Shizhong is spoken in Southern Longyan County. Considering the rather extreme divergence of Coastal Min varieties in Wan’an, Koongfu Coastal Min and Shizhong Coastal Min are separate languages.

Another Coastal Min group is best called Wan’an Coastal Min. This is actually a macrolanguage comprising a number of separate languages in Wan’an County of Fujian.

Wan’an and Longyan are not mutually intelligible (Branner 2008).

Wan’an is a small township in northwestern Longyan County in Western Fujian which consists of very rugged, hard-to-access mountains with scattered, very isolated villages made up of poor farmers. Some of these villages were visited for the first time by a Westerner only in the last 20 years (Branner 2000).

To give you an idea of how remote the area is, to walk between two villages in Wan’an would take six difficult and confusing hours down ancient cobblestone paths through dark forests. But to take a bus between the two towns that are six hours walking distance away would take three days (Branner 2000)!

With many of these lects, they don’t understand each other at first, but after they talk to each other for a while, they start to figure out the other variety (Branner 2008). Owing to difficult intelligibility from village to village, the best analysis seems to be that all of the above are separate languages. Intelligibility among the Wan’an languages is ~70%.

Coastal Min seems to have about 85% intelligibility with Taiwanese Min. The intelligibility of Coastal Min with Penang Northern Malayland Hokkien is very poor.

She Min

A very strange variety called She Min is spoken by the She people in Zhejiang, Fujian and Guangdong. The She language was originally Hmong-Mien, which then added a Cantonese layer, then a Hakka layer, next a Min layer, and in Zhejiang, a Wu layer. It is best described as a Hmong-Mien language that has been Sinicized. There are probably 200,000 speakers of this language.

Zhejiang She Min is no doubt a separate language due to the distance between it and the other two principal varieties in addition to the Wu layer.

Fujian She Min is also a separate language.

In Eastern Guangdong, the She speak Chaoshan or Teochew She Min. They live in the Phoenix Mountains in Chao’an County in Chaozhou Prefecture. The language has had heavy contact with Teochew. This is probably a separate language, unintelligible with other She languages and Teochew.

There is also an original She language that is non-Sinitic (Hmong-Mien) and is spoken by only about 1,000 people in Guangdong.

Datian Min

Hakka

Hakka is an extremely diverse group of languages spoken in Southern China. There may be up to 1,000 lects in Hakka. The dialect situation with Hakka is quite confused and somewhat contradictory. Some speakers report adequate intelligibility between lects, while others report difficulty. There are also reports of great diversity and difficult intelligibility even from village to village in Western Fujian, Gannan County in Jiangxi and Northern Guangdong. Intelligibility testing could clear up some of the confusion.

Hakka Proper (Meixian or Moiyen, formerly Jiaying) is spoken in Mei County in Northeastern Guangdong.

Hakka is very different from all other forms of Chinese. Although Southern Min and Hakka are said to be close, Taiwanese Hokkien speakers can understand only 1% of even Taiwanese Hakka.

Meixian Hakka is the central Hakka version used as Standard Hakka. It is at least understood by 75% of Hakka speakers, so it is often used for communicating with Hakkas who speak other Hakka languages. Meixian was chosen as the standard because the region where it is spoken is one of the major strongholds of Hakka language and culture. In addition, it has preserved most of the original Hakka phonology and has less influence from Cantonese and Hokkien.

Nevertheless, Changting Hakka preserves more of the original Hakka than Meixian does.

Xingning Hakka, Zhenping Hakka, and Wuhua Hakka are all dialects of Meixian.

Tonggu speakers came from Wuhua a while back. Intelligibility data for these varieties is not available, but Tonggu Hakka is in its own separate group of Hakka, so it must be a separate language.

Meixian was formerly known as Jiaying Hakka. The Hakka varieties of Meixian, Pingyuan Hakka, Dabu Hakka, Xingning, Wuhua, and Jiaoling Hakka used to be included in Jiaying.

Dapu or Dabu Hakka, while close to Meixian, is a separate language. It is spoken in Dapu County, Guangdong. Dapu was the basis for Taichung Dongshi Hakka spoken in Taiwan. Actually, Dongshi Hakka was derived directly from Chisan Hakka spoken by the founder of the Hakka community in the county. However, Dongshi is now very different from Chisan. Intelligibility data for Chisan is not available.

Fengshun Hakka is a dialect of Dapu. Fengshun has five different varieties. Fengshun is also spoken in Bangkok as Bangkok Fengshun Hakka. Although it has been affected by Teochew influence in Bangkok, Bangkok Fengshun is still relatively pure.

Hopo Hakka is not intelligible with Dabu, Hailu or Meixian. Hopo Hakka has deep influence from Teochew because it is located right next to the Teochew area.

Intelligibility data on Huangbu Hakka, Huiyang Hakka, and Chetian Hakka is not available. Huiyang is close to Hong Kong Hakka. However, diversity is great within Longchuan, and dialects differ from village to village, with difficult intelligibility from village to village.

Huizhou Hakka is in its own group of Hakka, so it must be a separate language. Huizhou is heavily spoken in Huizhou City. Huizhou is not intelligible with Moiyen, Taipu, Hopo, or Taiwanese.

Banshan Hakka is spoken in the Chengkang District of Tangnan town in close proximity to Jindengzhan village, where Teochew is spoken, and Changlin village in Tangnan town in Fengshun, Guangdong, where a Hakka called Changlin Hakka is spoken. Banshan is a dialect island surrounded by Teochew. Banshan may have significant Teochew influence. Banshan is quite probably a separate language.

Liannan Hakka and Wengyuan Hakka are both spoken in Northwest Guangdong. They are members of the Yuebei Group of Hakka, which is highly divergent.

In Northern Guangdong, there may be many different Hakka languages, since dialects tend to differ from village to village, and in many cases, communication is difficult between villages.

The Yuemin Group of Hakka from Southern Fujian and Southeastern Guangdong is a separate language.

Heyuan Hakka is spoken in Central Guangdong.

Jiexi Hakka is spoken in Southeastern Guangdong.

Dongguan Qingxi Hakka is spoken in South-Central Guangdong.

Haifeng Hakka, Lufeng Hakka, and Luhe Hakka, located near each other in Haifeng, Lufeng, and Luhe Counties in Shanwei City of Guangdong, appear to be dialects of a separate language called Hailufeng Hakka. It is spoken most heavily in Luhe County, where most people speak Hakka. This is a Hakka with heavy influence from Hailufeng Min.

Sanxiang Hakka, spoken in Zhongshan Prefecture, is different from all other Hakka. In all probability, it is a separate language.

Intelligibility data for Hong Kong Hakka, Huiyang and Bao’an is not available.

Despite the fact that Hong Kong Hakka lects seem similar to Hakka lects spoken in Eastern and Northeastern Guangdong, many Hong Kong Hakka trace their origins to Guangxi.

Hong Kong Hakka has three principal dialects, Dongguan Hakka, Taipu Hakka, and Wakia Hakka. The language is similar to the Hakka spoken around Huiyang in Eastern Guangdong. Its speakers moved from that area to Hong Kong at the beginning of the Qing Dynasty, so they came to Hong Kong 375 years ago.

Shataukok has a number of dialects within it, and they are different, but they may be more or less mutually intelligible. However, the mutual intelligibility is difficult to characterize, as it is said that speakers of other dialects can “get the gist” of what the other speakers are saying. “Getting the gist” of a variety usually implies less than 90% intelligibility.
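The rule of thumb running through this survey can be sketched in a few lines of Python. This is only an illustration, not a formal method: the 90% cutoff is an assumption read off the remark above about "getting the gist," the function name `classify` is hypothetical, and the figures are the rough estimates quoted earlier in the text.

```python
# Minimal sketch, assuming a 90% cutoff (an assumption taken from the
# "getting the gist" remark, not an established standard).
SEPARATE_LANGUAGE_CUTOFF = 90  # percent intelligibility


def classify(intelligibility_pct, cutoff=SEPARATE_LANGUAGE_CUTOFF):
    """Label a pair of lects from an estimated intelligibility percentage."""
    if not 0 <= intelligibility_pct <= 100:
        raise ValueError("intelligibility must be between 0 and 100")
    if intelligibility_pct >= cutoff:
        return "dialects of one language"
    return "separate languages"


# Rough figures quoted earlier in this survey.
reported = {
    ("Xianyou Puxian Min", "Putian Puxian Min"): 90,  # low end of 90%-100%
    ("Wan'an lects", "each other"): 70,
    ("Coastal Min", "Taiwanese Min"): 85,
}

for (a, b), pct in reported.items():
    print(f"{a} / {b}: {pct}% -> {classify(pct)}")
```

A single cutoff is a simplification. Where only impressions such as "getting the gist" are available, there is no number to feed in, which is why intelligibility testing is called for so often here.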

Another variety of Hong Kong Hakka is spoken in Shuijian Village in the southern part of Yuen Long. This lect is completely different from the rest of Hong Kong Hakka. Its speakers moved to Hong Kong from Western Fujian 150 years ago. It is said to be similar to Boluo Hakka in Northeastern Guangdong, but this has not been proven.

The best name for this is Shuijian Hakka, and it is best seen as a separate language, completely apart from the rest of Hong Kong Hakka. This language is now spoken only by older people who are ashamed of their language and generally refuse to speak it with outsiders.

The Gannan Hakka Group spoken in Southern Jiangxi is extremely diverse compared to the Hakka of Guangdong and Fujian. Gannan Hakka varieties differ even from village to village.

With Gannan, we may be dealing with a situation of many different languages, as with Wu, Hui, Tuhua, and Xiang. In fact, it is quite possible that with Jiangxi Hakka, we may be dealing with every Hakka variety being a separate language.

There are two separate groups there, Bendi Hakka and Keji Hakka. Bendi varieties are some of the most divergent Hakka varieties of all, while Keji varieties are more traditional, having moved out of the core Jiaying area within the last 300 years.

Ruijin Hakka, spoken in Southeastern Jiangxi, is very different and may well be a separate language. It looks a lot like Gan.

Xinfeng Tieshikou Hakka is in all probability a separate language, spoken in Xinfeng County by 90% of the population.

Many extremely diverse forms of Hakka are spoken in Fujian. Sources say that each Hakka village in Western Fujian speaks its own variety, and that the varieties are far enough apart to make communication from village to village very difficult.

Whether these are dialects or separate languages is difficult to determine. Usually speakers cannot understand each other at first, but after a while, they figure out how to communicate with each other (Branner 2008). Communication between these villages is difficult enough that a local Mandarin dialect is used for inter-village communication (Branner 2008). This suggests that it is valid to split all of the above off into separate languages.

Hakka is also spoken in the south of Guangxi. There are 3.6 million Hakka speakers in Guangxi.

Dayu Hakka is spoken in Southern Guangxi.

Mengshan Xihe Hakka is spoken in Eastern Guangxi.

Each one is probably a separate language.

Mashan Old Naxing Hakka is spoken in Mashan Old Naxing village in Guangxi. It is located far from other Hakka and has come under the influence of other Sinitic and non-Sinitic languages such that it is now very different. It is surely a separate language.

Binyang Hakka is also spoken in Guangxi. They are Meixian speakers who came to Guangxi 400 years ago. The language is now very different from Meixian. It is quite probably a separate language.

Hakka speakers immigrated to Sichuan a long time ago.

Chengdu Hakka is spoken in Chengdu, Sichuan. It is quite different from other forms of Hakka and has poor intelligibility with other forms. At the moment, Hakka is the main means of communication in the Jinjiang, Jinniu, Chenghua, Longquanyi, Xindu, and Qingbaijiang Districts in Chengdu.

Longcheng Hakka is spoken in Longcheng by Hakka who immigrated there a long time ago. It has since come under heavy influence from Longcheng Southwestern Mandarin.

Longtanshi Hakka speakers came from Mei County in Guangdong long ago, but now Meixian and Longtanshi are very different. It resembles Wuhua and Xingning more and has since come under heavy influence from Chengdu Southwestern Mandarin.

The present koine is called Sihai Taiwanese Hakka and is a combination of Sixian Taiwanese Hakka and Hailu Taiwanese Hakka, the two most widely spoken lects. Dongshi Taiwanese Hakka comes from Dapu County, Guangdong. Hailu Hakka comes from Huizhou prefecture.

Sixian itself is currently the most widely spoken Hakka variety in Taiwan. The name comes from the four Guangdong counties of Meixian, Jiaoling, Xingning, and Pingyuan. But the Sixian speakers who came to Taiwan generally came from Jiaoling, so Sixian currently resembles Jiaoling Hakka more than Meixian. Sixian is divided into two main dialects, Miaoli Taiwanese Hakka and Liudui Taiwanese Hakka. The differences between the two appear to be great, and they may well be separate languages.

In general, speakers of other kinds of Hakka find Taiwanese Hakka to be hard to understand, possibly due to Southern Min influence. Hakka speakers make up only 5% of the population of Taiwan. Almost all are proficient in Mandarin or Hokkien, and there are few monolinguals left.

The Hakka spoken in Kuching, Sarawak, in Malaysia is known as Ho Po Hak Hakka. It is similar to Hopo Hakka, spoken in Hopo, near Meizhou.

Although Ho Po Hak speakers make up 70% of the Sarawak Hakka population, there are also speakers of Dapu, Fengshun, Huizhou, Bao’an, Dongguan, Lufeng, Wuhua, Meixian and Yongding in Sarawak. These speakers probably cannot be classed as Ho Po Hak. Intelligibility between these forms of Sarawak Hakka, Ho Po Hak and the Hakkas they are derived from is not known. Ho Po Hak is very different from the Hakka spoken in Sabah, Malaysia.

Hakka speakers make up the majority (57%) of the Chinese in Sabah where Sabah Hakka is spoken. Many arrived in the 1860’s fleeing the massacres perpetrated by the Manchus following the failed Taiping Rebellion. This group settled in Sandakan.

Others were brought from Longchuan County, Guangdong to Kudat in 1882 as laborers by the North Borneo Chartered Company. Sabah Hakka is identical to Huiyang/Fuiyong Hakka spoken in the Huiyang District of the city of Huizhou, near Shenzhen in Guangdong. Huizhou Hakka has heavy Cantonese influence. Most people in Huizhou are Hakka speakers. The main Hakka centers in Sabah are the cities of Sandakan, Kudat, Kota Kinabalu, and Tawau.

Dapu is still spoken in Malaysia and Singapore. Kuala Lumpur Dapu Hakka is very different from the Dapu spoken in China. It is now heavily creolized with Malay. It is quite probably a separate language. It is heavily spoken in the Serdang and Ampang regions of the capital.

There are also some Hakka speakers around Ipoh. It is not known what type of Hakka they speak.

In the 1800s, there were Hakkas speaking Jiaying Hakka (Jiaying Hakka was the old name for Meixian), Yongding, Fengshun, and Jengcheng Hakka from Guangdong in Singapore, Penang, Malacca and Tel Anson on the Malay Peninsula. Whether they are still present is not known. Meixian speakers were known from Singapore as recently as 1950. A type of Huiyang is still spoken in Penang as Penang Hakka.

Bangka Island Indonesian Hakka, spoken on Bangka Island in Indonesia, has diverged so radically with its tones that it is now a separate language. That is, speakers of other Indonesian Hakka varieties say that they cannot understand Bangka Island speakers. It’s a Hakka creole more than anything else.

In Indonesia, two other major Hakka varieties are spoken, Kun Dian Indonesian Hakka, spoken in Borneo, and Belitung (Ngion Voi) Indonesian Hakka, spoken mostly on Sumatra and Borneo.

Belitung is characterized by a soft way of speaking. Belitung speakers are mostly descended from Meixian speakers.

Belitung and Bangka Island speakers say they cannot understand Kun Dian, but Kun Dian speakers say they can understand the other two for the most part.

Most old people in Belitung and Singkawang are Hakka monolinguals who cannot speak Bahasa Indonesia at all. These elderly speakers have to bring interpreters with them when they go to the doctor.

A type of Meixian is spoken in East Timor as East Timor Hakka.

Although some Indonesian Hakka speakers speak a very pure Hakka similar to the Huizhou spoken on the mainland, these are mostly the oldest generation. The younger generations speak a language that is very heavily adulterated with Indonesian languages.

Wuhua, Meixian, and Dabu are members of the Xinghua subgroup of the Yuetai Group of Hakka, which has five lects. Xinghua Hakka has 3.4 million speakers (Olson 1998).

Bao’an, Lufeng, Haifeng, and Hailufeng are in the Xinhui subgroup of Yuetai Hakka, which has nine lects. Xinhui Hakka has 2.4 million speakers (Olson 1998).

The Yuetai Group of Hakka has 23 lects.

Gaoxiong, Xinzhu, Dongshi, Jiaying, and Miaoli are members of the Jiaying Group of Hakka, which has seven lects.

Tingzhou, Yongding, Liancheng, Changting, Xinquan, Basel Mission, Wuping, Ninghua, Qingliu, and Mingxi are all part of the diverse Tingzhou Group of Hakka. All told, Tingzhou Hakka has 10 lects, most of which are separate languages.

Longchuan, Boluo, and Heyuan are members of the Yuezhong Group of Hakka, which has five lects.

Huizhou is in its own subgroup of Hakka.

Xingguo and Ningdu are in the Ninglong or Gannan Group of Hakka, which has 13 lects. There may be as many as 13 different languages in this group.

Xiang

In fact, Changsha itself is divided into multiple languages in the city itself. We do not know how many there are, but we know that they exist. For the moment, we shall just add one variety to Changsha, and divide it into Changsha City Xiang A and Changsha City Xiang B, but there may be more. Furthermore, there are significant differences within the Changsha spoken in Changsha City and in the surrounding countryside.

Shuangfeng is also very different within itself, as the vocabulary changes every 10 miles or so. Intelligibility data is lacking.

Lingshuijiang Xiang, also spoken in Hunan by 300,000 people, may well be a separate language.

Shuangfeng and Lingshuijiang are both part of the Luoshao group of Xiang. Shuihui Xiang and Suantang Xiang are also part of this group; however, Shuihui is so different that it is recommended to split it from Luoshao into its own group with Suantang Xiang. Suantang itself is very different. It has Southwest Mandarin and Xiang elements along with Hmong and Dong influences.

Suantang is so different that it is controversial whether it is Southwestern Mandarin or Xiang, but the best analysis seems to be that it is a Xiang variety. Clearly Shuihui Xiang and Suantang Xiang are separate languages.

Mao Zedong spoke Xiangtan Xiang, a notoriously difficult Xiang language in Hunan, about which it was said, “No one can understand it.” Xiangtan itself is internally diverse, with differences between the dialects of the city and rural areas, but intelligibility data is lacking.

Shaoshan Xiang and Lianyuan Xiang are both spoken near Xiangtan, and both are surely separate languages. There are a number of dialects within each of these languages.

Ningxiang Xiang is said to be very different from Changsha. Given how dramatic the divergence is even between ordinary Xiang varieties, this must mean that Ningxiang is at least not intelligible with Changsha.

Ningxiang County is split into two separate dialects, North Ningxiang Xiang and South Ningxiang Xiang. The differences between the two are great. Upper Ningxiang Xiang looks more like a Lianyuan dialect, and Lower Ningxiang Xiang looks more like a Changsha dialect.

Beyond that, Ningxiang is split into four major divisions – Chengguan Xiang, Shuangjiangkou Xiang, Huaminglou Xiang, and Liushahe Xiang. Surely each is a separate language.

Liuyang Xiang is a separate Xiang language, actually a macrolanguage, spoken in Liuyang county-level city in Changsha prefecture east of Changsha City near the Jiangxi border in Hunan. Liuyang is split into five divisions – North Liuyang Xiang, South Liuyang Xiang, West Liuyang Xiang, East Liuyang Xiang, and Liuyang City Xiang.

South Liuyang Xiang and East Liuyang Xiang are separate languages, mutually unintelligible with the others. Liuyang City Xiang has recently arisen as a sort of a Liuyang koine that is understandable to speakers of all Liuyang lects. None of the three Liuyang languages is intelligible with Changsha. On closer observation, none of the Liuyang varieties are intelligible with each other. Therefore, North Liuyang Xiang and West Liuyang Xiang are separate languages also.

Even within this classification, each of the five Liuyang Xiang varieties has multiple dialects. Each village is said to have its own variety in Liuyang Xiang.

In the city of Yiyang, Henan Province, three Chinese varieties are spoken. One is a Yiyang Changyi Xiang variety, another is a Yiyang Luoshao Xiang variety, and a third is Luoyang Southwest Mandarin, a dialect of Henan Mandarin, described above. All appear to be separate languages.

We will call the two Xiang varieties Yiyang Changyi Xiang and Yiyang Luoshao Xiang.

Huangxu Xiang, a Xiang dialect island in the Southwestern Mandarin-speaking city of Deyang in Sichuan, is very different from the rest of Xiang and must surely be a separate language.

Quanzhou Xiang in Guangxi is another Xiang dialect island. It has extreme differences with Hunan dialects like Shuangfeng.

According to good sources, there is a tremendous amount of linguistic diversity in Western Hunan, most of it probably involving Xiang lects, and most or all of these varieties are not mutually intelligible. But until we get more data, we cannot carve any languages out of this mess.

Shuangfeng, Shuihui, Suantang and Lingshuijiang are members of the Luoshao Group of Xiang, which has 21 lects.

Changsha City A, Changsha City B, Changsha Rural, Hengyang, Shaodong, Xiangtan, Shaoshan, Baishi, Liling, Lianyuan, Qianshan, Houshan, Jiashanqiang, Ningxiang, Chengguan, Shuangjiangkou, Huaminglou, Liushahe, North Liuyang, South Liuyang, East Liuyang, West Liuyang, and Liuyang City are members of the Changyi Group of Xiang, which has 32 lects.

Jishou and Huayuan are members of the Jixu Group of Xiang, which has eight lects.

Xiang is composed of 74 lects. Many or possibly all of them are separate languages. The various languages of Xiang have 50 million speakers (Olson 1998).

Wu

Wu is a major group of diverse Chinese languages that is often divided into Northern Wu and Southern Wu. Southern Wu has 18 million speakers. My opinion is that in general, the Wu varieties are mostly separate languages; however, some are merely dialects of other Wu lects.

A good general rule for Zhejiang Wu varieties is that you can sort of understand the variety of the next city over, but the language of two cities away is incomprehensible. For instance, in the Taizhou Prefecture region, there are between four and five mutually unintelligible Wu varieties across a 12-mile area. In Zhejiang, the mountains go all the way down to the sea, so there are few flat areas where language can spread out and become mutually comprehensible.

Although the Suzhou City administrative area is large, Suzhou Wu language is spoken only in the city proper and its suburbs. Suzhou City dwellers say that people in the suburbs have a rural or “hard” accent, while the speech of Suzhou City is called “soft.” Suzhou is presently divided into two sets of speakers, one over 50 and another under 50. Differences between age groups in Suzhou were noted as early as the 1930’s. Suzhou Wu is still very widely spoken in the area.

Suzhou is 70% similar to Shanghaihua. That is not enough for full intelligibility. Shanghainese speakers find Suzhou to be incomprehensible. The differences between Suzhou and Shanghainese are much greater than those between suburban Shanghai languages. A Shanghainese speaker would need a few months in Suzhou to learn Suzhou. This is about the same as the difference between Castilian and Catalan or between Castilian and Asturian.

Suzhou is more complex phonologically and tone-wise than Shanghainese, so it is harder to learn. Even native Suzhou speakers have problems with the tones sometimes. Further, tone sandhi in Suzhou is quite complex.

Zhangjiagang Wu may be intelligible with Suzhou, but data is lacking. Suzhou is only 43% intelligible with Wenzhou (Cheng 1997). None of these varieties is intelligible with Shanghainese.

Wuxi Wu is spoken in the city of Wuxi. Wuxi is spoken in two areas, referred to as East and West Mountain. East Mountain refers to the city of Dongshan, and West Mountain refers to the city of Wuxi. Wuxi is not intelligible with Changzhou or Suzhou. Wuxi is only 20% similar to Shanghainese. Wuxi speakers can understand Shanghainese, but that is no doubt due to bilingual learning. Shanghainese speakers do not understand Wuxi well.

Changzhou Wu is not intelligible with Shanghainese, Wuxi or Suzhou. Changzhou and Wuxi have high but not full intelligibility. Changzhou and Wuxi are part of a dialect chain in which eastern Changzhou speakers can communicate with western Wuxi speakers, but as one moves further east into Wuxi or west into Changzhou, intelligibility drops off. It is best then to split Wuxi and Changzhou into separate languages.

Ningbo Wu is close to Shanghainese, and Ningbo speakers can learn Shanghainese in ~two months. This is because many Ningbo speakers moved to Shanghai in the past 100 years and Ningbo became a prestige language in Shanghai in the first part of the 20th Century, so Shanghainese has a lot of Ningbo influence in it.

Speakers of many of the local Wu varieties around Shanghai say that they can understand Shanghainese well, but not the other way around.

The reason for this is complex. About 100 years ago, Suzhou became a very prestigious language in Shanghai and was widely spoken there. However, in the past century, many immigrants came to Shanghai from other parts of China. In particular, many speakers of Ningbo came to Shanghai. Ningbo is quite a bit different from either Shanghainese or Suzhou.

With speakers of Ningbo, Suzhou and Shanghainese all present in the city in large numbers, a koine needed to develop. Shanghainese was chosen as the koine, and because speakers of three different languages were communicating, Shanghainese got dramatically simplified phonologically in order for it to be better understood by everyone.

Hence, Shanghainese has evolved into a highly simplified form of Taihu Wu. This is why many speakers of nearby Wu languages say that they can understand Shanghainese but not the other way around.

Several varieties are spoken in the suburbs of Shanghai. Reports vary, but Shanghai residents generally report that these varieties are not mutually intelligible with Shanghainese (Gilliland 2006).

Pudong Wu, the older form of the Shanghai language, is still spoken in the Pudong District of the city, but it is dying out. There is a question of whether or not it is mutually intelligible with Shanghainese, but Shanghainese speakers seem to feel it is not mutually intelligible (Gilliland 2006).

The Shanghai suburban varieties above are probably not fully mutually intelligible with each other. For instance, Fengxian is not fully intelligible with Jiading. Intelligibility between the two may be ~70%, but it only takes a few weeks’ exposure for a Fengxian speaker to learn Jiading Wu.

Qidong Wu, spoken in the city of Qidong, is a separate language. Qidong is said to be very close to Chongming Wu, so for the time being, we will list Chongming as a dialect of Qidong. Chongming, spoken on Chongming Island in the suburbs of Shanghai, is not intelligible with Shanghainese.

These varieties spoken in the suburbs of Shanghai are closer to the Old Shanghainese, which is quite a bit different from the New Shanghainese spoken in the city center nowadays.

Changyinsha Wu is very similar to Chongming and Qidong, so it is probably a dialect of Qidong also. Another name for Qidong is Qihai, which refers to the speech of Qidong, Haimen and Tongzhou. For the time being, we will list Changyinsha and Chongming as dialects of Qidong. Chongming, and hence Qidong, are not intelligible with Shanghainese.

Nanjing Wu is a separate language. It is close to Shanghainese Wu but is not fully intelligible with it.

Jiangyin Wu is spoken in Jiangyin city. It is related to Changzhou and has high intelligibility with Changzhou and Wuxi. It has some definite differences with Suzhou. Nevertheless it appears to be a separate language because it cannot be understood outside the city. Many older people still speak only Jiangyin.

Jinxiang also has its own Wu variety with Mandarin influences, Jinxiang Wu. This is a Taihu (Northern Wu) outlier spoken far to the south of the Taihu region.

The standard version is spoken in Lucheng District by 1 million people and can be referred to as Lucheng Wu. Ouhai Wu, Yongjia Wu and Ruian Wu are said to be dialects of Wenzhou Wu, but Ouhai, spoken in the Ouhai District, is not intelligible with Ruian. Ruian is spoken by 1 million people in the city of Ru’ian, and is related to Pingyang Wu spoken in Pingcheng County.

Yongjia, spoken in Yongjia County, is separate too, since if you go five miles in any direction in Wenzhou, there’s a new dialect, and it’s hard to understand people.

Chu River Wu is a closely related separate language from Wencheng spoken in Luoyang County in Zhejiang.

Since there are 11 different cities and counties in Wenzhou, and the language changes every five miles or so, it would be logical to assume that there are 11 separate languages within Wenzhou. However, closer analysis reveals at least 14 languages within Wenzhou.

So we should then split off at least one Wenzhou language for each major division. This gives us Cangnan Wu spoken in that county and Longwan Wu and Dongtu Wu spoken in those two districts. Although aberrant Wu varieties probably not a part of Wenzhou are spoken in Taishun and Cangnan, varieties of Wenzhou are also spoken there, so it makes sense to split those two off.

In addition, in Taishun County, there is an aberrant Wu variety spoken in the town of Luoyang influenced by both Manjiang Eastern Min and Oujiang Wu. We can call this Luoyang Wu. This is best seen as the southern extension of Yesou Wu. Liqu Wu is another Luoyang variety spoken in the area.

There is another Wu variety similar to Manjiang Eastern Min spoken in the town of Hedi in Qingyuan County in Lishui. We will call this Hedi Wu. In all probability, it is a separate language.

Manhua Wu, a macrolanguage, is quite different. It is spoken around Cangnan and Wuzhou City in Northern Zhejiang on the southern coast of Wuzhou City in about five townships. The word man literally means “barbarians.”

There is a controversy over whether or not Manhua is Macro-Min or Macro-Wu. It is probably Macro-Wu based on phonology, and it also shares some similar Min-like traits with other Wu varieties such as those in the Chuqu group. Some think it originated in a Southern Min variety that came under the influence of a non-Sinitic language. Word order is completely different from Chinese word order. However, the word order is changing under the influence of Mandarin, and many younger people are using a more Mandarin word order.

Some theories hold that it has Proto-Vietnamese, Austronesian, and She influences. The major components seem to be Old Cantonese, Old Chinese, and Mandarin. Some also suggest Northern Min, Eastern Min, Southern Min and especially Wu influences. It has 200,000-400,000 speakers.

Within Manhua Wu, there is a northern group spoken in the town of Yishan and a southern group spoken in the towns of Qianku (Qianku Manhua Wu) and Jinxiang (Jinxiang Manhua Wu). Qianku Manhua Wu is the standard for Manhua Wu. Although the internal differences in Manhua Wu are not great, Jinxiang Manhua Wu and Qianku Manhua Wu are not mutually intelligible. Manhua Wu is also very heavily spoken in the city of Lengkang.

Speakers of Jiaojiang Wu and Huangyan Wu cannot understand Linhai Wu. The area has split into so many mutually unintelligible languages mostly due to terrain.

For instance, Taizhou and Huangyan are only a 10-minute bus ride from each other, but the highway was built only recently, and there is a huge mountain between the two cities. Taizhou and Jiaojiang are only another 10-minute bus ride apart, but there is a huge river separating them, and it could be crossed only by boat until a ferry was built in the 1990’s.

Chuqu Wu is split into two subgroups, Chuzhou Wu and Longqu Wu. It contains at least 22 languages. Some members of this group extend south beyond Zhejiang into Northeastern Jiangxi and Northern Fujian. We are going to cautiously classify almost all of Chuqu Wu as separate languages, since it is much more divergent and much less mutually intelligible than Taihu Wu, and Taihu Wu itself has low internal intelligibility.

Changzhou, Yixing, Jiangyin, the Haimens, and seven others are in the Piling Group of Taihu Wu, which has 12 lects. Piling Wu has 8 million speakers.

Wenzhou, Ouhai, Yongjia, Ruian, Wencheng, and seven others are in the Oujiang Group of Taihu Wu, which contains 14 separate languages.

Hangzhou has its own group, the Hangzhou Group of Taihu Wu.

Shaoxing, Fuyang, Xiaoshan, Linan, Yuyao, Zhuji, and six others are in the Linshao Group of Taihu Wu, which contains 12 lects.

Fenghua, Zhoushan, and nine others are in the Yongjiang Group of Taihu Wu. Yongjiang Wu contains 11 lects and has 4 million speakers (Olson 1998).

Changxing and four others are in the Tiaoxi Group of Taihu Wu, which has five lects.

Taihu Wu is composed of 85 separate lects, most of which are separate languages. Taihu Wu has 47 million speakers.

The Taizhous, Huangyan, Jiaojiang, Sanmen, Tiantai, Wenling, Xianju, Leping, and Yuhuan are members of the Taizhou Group of Wu, which has 13 lects, all separate languages.

The Yiwus, Dongyang, Jinhua, Jinhua Xiaohuang, Lanxi, Tangxi, Wuyi, Pan’an, Pujiang, and Yongkang are all members of the Wuzhou Group of Wu, which contains 27 lects, almost all of which are separate languages. Wuzhou Wu has 4 million speakers (Olson 1998).

Chuqu Wu has two subgroups, Chuzhou Wu and Longqu Wu.

Lishui, Qingyuan, Jingning, Jinyun, Taishun, and four others are in the Chuzhou Group of Chuqu Wu, which contains nine languages. Chuzhou Wu has 1.5 million speakers.

Pucheng, Shangrao County, Shangrao City, Jiangshan, Songyang, Guangfeng, Longquan, Kaihua, Changshan, Suichang, Longyou, Yushan, Quzhou, and one other are members of the Longqu Group of Chuqu Wu, which has 14 languages and 5 million speakers (Olson 1998).

Huizhou/Hui

Hui or Huizhou is a major group of many different languages with wide internal variation. There is a possibility that all Hui varieties are separate languages. Hui is spoken in the historical area of Huizhou, located mostly in Southern Anhui but also partly in Zhejiang and Jiangxi. The area is very mountainous, leading to strong differentiation among the lects. Every county in the area has its own Hui version unintelligible to outsiders.

Xidi Hui, spoken in a village at the foot of Huangshan Mountain in Anhui, is a separate language. Xidi is unintelligible even to villages a few miles away.

Within Qimen County itself, there are six different Hui lects with low intelligibility between them. It is quite possible that we are talking about six different languages here. One of them appears to be Chilingkou above. The others we will just call: Qimen Hui A, Qimen Hui B, Qimen Hui C, Qimen Hui D and Qimen Hui F.

In the Yangzhou Group of Hui, Jiande Hui and Chunan Hui are separate languages. Chunan is spoken in Jiangxi. There are two other varieties in the group, Suian Hui and Shouchang Hui. Suian and Chunan are very diverse and are in all probability separate languages. Shouchang is also extremely diverse, and Jiande has some differences with Shouchang.

The Yangzhou languages are interesting because there is controversy whether they are Wu or Hui languages. Careful examination reveals that they cannot be subsumed under Southern Wu due to their great divergence from it, despite having some similarities with Wu. Some authors feel that they are Hui-Wu merged lects, and their similarity with both is given as a reason for merging Wu and Hui into a supergroup.

While it is best to classify them as Hui, they are much different from most Hui lects. All are spoken in western Zhejiang.

Jiande, Chunan, Suian and Shouchang are members of the Yangzhou Group of Hui. Yangzhou Hui has four lects, all separate languages.

Huangshan, Tunxi, Wuyuan, Xiuning, and two others are members of the Xiuyi Group of Hui, which has six lects.

Meixi Xiang, the Qimens, Chilingkou, Jingde, Ningguo, Shitai, and two others are members of the Jingzhan Group of Hui. Jingzhan Hui has 12 lects.

Jixi, Huizhou, Hongmen, the Shexians, and She are members of the Jishe Group of Hui. Jishe Hui has six lects, all separate languages.

Dexing, Dongzhi, Fuliang, and two others are members of the Qide Group of Hui. Qide Hui has five lects.

Xidi is unclassified.

There are 37 different Hui lects, at least 24 of which are separate languages. The various Hui languages have 3.2 million speakers.

Cantonese

Cantonese is a major language group spoken in the south of China. Cantonese speakers are said to be a mix between the Yue people and the Han. They take great pride in their speech, which is closer to ancient Chinese than Mandarin is.

Some Cantonese activists denounce Mandarin as a pidgin language spoken by Manchu and Mongol invaders glommed onto the Chinese of the people they conquered.

Various attempts are utilized to determine intelligibility between lects. They vary in efficacy, as the following shows.

A better method is presented in Szeto 2000, in which sentences in other varieties, say Varieties B and C, are played to speakers of Variety A, and speakers of Variety A are asked to give the basic meaning of the Variety B and C sentences played to them. A sentence is recorded as correct if the basic meaning was ascertained.
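As a rough illustration of how a sentence test of this kind could be scored, here is a minimal sketch. It assumes each played sentence is simply marked correct or incorrect for basic meaning; the function name and the sample responses are hypothetical and are not taken from Szeto 2000.

```python
def intelligibility_score(results):
    """Return the percentage of test sentences whose basic meaning was ascertained.

    `results` is a list of booleans, one per sentence played to the listener.
    """
    if not results:
        raise ValueError("no sentences were scored")
    correct = sum(1 for understood in results if understood)
    return 100.0 * correct / len(results)


# Hypothetical responses from one Variety A listener (not real data).
responses = {
    "Variety B": [True, False, True, True, False],
    "Variety C": [False, False, True, False, False],
}

scores = {variety: intelligibility_score(marks) for variety, marks in responses.items()}

for variety, score in scores.items():
    print(f"{variety}: {score:.0f}% of sentences understood")
```

In practice the scores of many listeners would be averaged, but the per-listener figure is the basic unit.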

In contrast, the more complex method through the use of complex lexical, tonal, grammatical and phonological formulae not relying on actual informants gives false positives. By this method, Cantonese has 54.7% intelligibility of Hakka, 47.45% of Teochew, and 43.5% of Hokkien. This method falsely overestimates the intelligibility of Hakka by 7.6X, of Teochew by 16.1X and of Hokkien by 19X.
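The informant-based figures themselves are not quoted here, but they can be approximately back-calculated from the numbers above by dividing each formula-based estimate by its stated overestimation factor. The sketch below does only that arithmetic; the variable names are made up for illustration, and the results are rounded approximations, not reported data.

```python
# Formula-based estimates of how well Cantonese speakers understand each lect (percent).
formula_estimates = {"Hakka": 54.7, "Teochew": 47.45, "Hokkien": 43.5}

# Stated factors by which the formula method overestimates intelligibility.
overestimate_factors = {"Hakka": 7.6, "Teochew": 16.1, "Hokkien": 19.0}

# Implied informant-based intelligibility: estimate divided by factor.
implied = {
    lect: round(formula_estimates[lect] / overestimate_factors[lect], 1)
    for lect in formula_estimates
}

for lect, pct in implied.items():
    print(f"Cantonese speakers' implied intelligibility of {lect}: about {pct}%")
```

On these figures, the implied intelligibility comes out to roughly 7% for Hakka and 2-3% for Teochew and Hokkien.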

Standard Cantonese is traditionally said to have nine tones, but phonemically there are only six tones, since the last three are just three of the first six with a voiceless stop consonant on the end.

These are often called entering tones in traditional Chinese scholarship. Entering tones disappeared from most Mandarin varieties about 800 years ago due to the influence of invading Mongols speaking Turkic languages but are still present in Cantonese, Hakka and Min.

The original entering tones of Middle Chinese have merged into other tones or into Mandarin’s four tones. Traditional Chinese tones or contour tones end in a vowel or a nasal. However, in Standard Cantonese, the entering tone has retained its original short and sharp character from Middle Chinese, so in a sense, it has a different sound quality.

One of the most well-known divisions in Cantonese is Yuehai. Yuehai contains four divisions: Guangfu, Sanyi, Zhongshan, and Guangbao.

The other major divisions of Cantonese are Goulou and Yongshun, found in the watershed of the Pearl River, and Siyi, Gaoyang, Wuhua and Qinlian.

Standard or Guangzhou Cantonese is based on the Guangzhou dialect spoken in the city of that name.

A very pure form of Cantonese is spoken in Sabah in Malaysia as Sabah Cantonese. It resembles Standard Cantonese so much that the speaker community is called Little Hong Kong.

Hong Kong Cantonese is spoken in Hong Kong. There are a few differences with Guangzhou but not enough to impair communication.

Macao Cantonese is spoken in Macao.

Xiguan Cantonese is spoken in the suburban areas of Guangzhou. It has a few differences with Guangzhou but presumably not enough to impair communication. It is spoken mostly by older people now, as young people now speak Xiguan Guangzhou Cantonese, which is more properly part of Guangzhou. The dialect is dying out.

Dialects spoken in Guangzhou City include Nishimura Cantonese, Dongshan Cantonese, and some others. Dongshan is spoken in the downtown area. Nishimura is spoken by a few old people in the Nishimura zone, but it is going extinct.

Wenzhou Cantonese is very close to Guangzhou.

Huizhou Cantonese is a Cantonese variety spoken in Huizhou City to the east of Guangzhou, to the northeast of Dongguan, and to the west of Shanwei. This is part of the Pearl River Delta. Huizhou has such heavy Hakka influence that it is probably a separate language.

Vietnamese Cantonese is quite different from Standard Cantonese, but it is said to be nevertheless intelligible with it. However, other Standard Cantonese speakers say they cannot understand Vietnamese Cantonese very well.

Malayland Cantonese is also quite different from Standard Cantonese. Cantonese speakers who talk to Malayland speakers say that Malayland sounds like a foreign language. Therefore, Malayland appears to be a separate language. Malayland is mostly spoken in Kuala Lumpur and Ipoh, less so in Singapore. There are dialects inside of Malayland such as Kuala Lumpur Cantonese and Ipoh Cantonese.

Cantonese is the most commonly spoken Chinese language around Kuala Lumpur. Although Singapore South Malayland Hokkien is the most widely popular non-Mandarin Chinese language in Singapore, Cantonese is the most commonly spoken language in Chinatown.

Foshan and Nanhai are close to Standard Cantonese and may be intelligible with it. Nanhai and Shunde Cantonese are mutually intelligible. Foshan, Xiquiao, and Jiujiyang are quite similar to Shunde.

Panyu Cantonese is definitely a separate language (Chan 1981). Panyu Cantonese is spoken in Xiaolan and Huangpu in the Zhongshan area.

Shunde Cantonese is almost the same language as Panyu, so if Panyu is a separate language, then Shunde is also. Shunde and Panyu may well be a single language, and if Nanhai is intelligible with Shunde, then Nanhai is also a part of this language. Shunde is spoken in Daliang, Longjiang, Ronggui and Beijiao.

There is at least one separate language inside of Sunde centered around Shunde, Panyu, and Nanhai, all of which are known as the Three Counties Area.

The Zhongshan Group of Cantonese spoken in Guangxi, composed of Shiqi Cantonese and Sanjiao Cantonese, is a separate language. Speakers of Standard Cantonese cannot necessarily understand Shiqi, but Shiqi people can understand Standard Cantonese. Shiqi is spoken in the urban part of Zhongshan City. Whether Shiqi and Sanjiao Cantonese are mutually intelligible is not known. It is best to call this language Shiqi Cantonese for now.

The Guangbao Group of Cantonese is spoken east of the Pearl River Delta in Shenzhen, Dongguan and Hong Kong. Within Guangbao are three major divisions, Dongguan Cantonese, Bao’an Cantonese, and Dapeng Cantonese.

Dongguan Cantonese is not intelligible with Standard Cantonese. It is spoken in Dongguan City. A lot of young people are forgetting how to speak it under the influence of Standard Cantonese.

Dongguan is divided into Guangcheng Cantonese, Houjie Cantonese, and Humen Cantonese. Guangcheng is spoken in the Guangcheng subdistrict. Humen is spoken in Humen Township on the east side of the Pearl River. Houjie is spoken in Houjie Township to the north of Humen.

Danija Cantonese is the Cantonese variety spoken by the Tanka fisherpeople who live on boats off the coast of Guangdong, Guangxi, and Zhejiang. The Tanka People also live in Fujian and Hainan. In Fujian, they speak Fuzhou Northern Min. In Hainan, they speak some form of Hainanese Min.

Another group of Tankas in Hong Kong in Aberdeen and Taio to the north of the Hokkien-speaking area are former Hakka and Hokkien speakers who speak Weitou Cantonese, a Cantonese variety close to Standard and Dongguan but closer to Dongguan. It is not intelligible with Hong Kong Hakka.

Nantou Cantonese is spoken in the Namtam area of Nantou by 5,000 people. Intelligibility with the rest of Bao’an is not known.

In Hong Kong, Gashiau Cantonese is spoken by a group of fisherpeople related to the Tanka. This language is related to Danija/Weitou but is not intelligible with it.

Dapeng Cantonese is spoken on the Dapeng Peninsula in the city of Dapeng, in Hong Kong, and Shenzhen, in Tung Ping Chau on the Ping Islands in Hong Kong, and in Tai Kok. It has been very heavily influenced by Hakka. It is so different that it must be a separate language. It may be related to or the same thing as the Junhua or Military Language, a mixed language now classified as Mandarin. If so, it is not Cantonese at all, and instead it is a Mandarin lect. In Hong Kong, Tung Ping Chau Dapeng is highly endangered.

150 years ago, there were fewer but still significant differences between Siyi and Sanyi (Standard Cantonese), but Siyi was disparaged as a “hill dialect” of poor farmers, while Sanyi was elevated as the prestige variety of the cultured and cosmopolitan. This is why Sanyi became the Standard Cantonese variety. The Siyi incorporated this negative view into their self-image even to the point where they held overseas meetings in Sanyi.

Taishanese, Hoisonese, Hoisan Cantonese, or Toison Cantonese is spoken north of Macao in Taishan County where there are 20 townships, and there is a different lect in every township. Taishanese is the Standard Siyi dialect. As late as the early 1990’s, children in this area were still being taught in the local Taishanese lect. Taishanese is still widely spoken in Chinatowns in the US such as in San Francisco (especially Stockton Street) and in New York.

The varieties in Taishan County can be quite different. For certain, there are at least three distinct languages within Taishanese besides the standard variety, Taishan Cantonese A, Taishan Cantonese B and Taishan Cantonese C, and these three have a hard time understanding each other.

There are clearly at least 17 dialects within Taishan Proper alone. Each town has its own dialect, and in fact, each village has its own dialect. The main town dialects are Taicheng Cantonese, Dajiang Cantonese, Shuibu Cantonese, Sijiu Cantonese, Baisha Cantonese, Sanhe Cantonese, Chonglou Cantonese, Doushan Cantonese, Duhu Cantonese, Chixi Cantonese, Duanfen Cantonese, Guanghai Cantonese, Haiyan Cantonese, Wencun Cantonese, Shenjing Cantonese, Beidou Cantonese, and Chuandao Cantonese.

Baisha is spoken in Bei Hou.

Speakers of Enping Cantonese, spoken in Enping County, cannot understand some other Siyi lects. Therefore, Enping is a separate language.

Kaiping or Chikan Cantonese, spoken in Kaishan County, is not fully intelligible with Enping until speakers get used to each other’s sounds. Kaiping is so different from Taishanese that it is hard to imagine how they can communicate well, though there is partial intelligibility. There are many different dialects inside of Kaiping alone, and pronunciation varies almost from neighborhood to neighborhood. One dialect is called Gee Cantonese. However, they seem to be mostly mutually intelligible.

In Xinhui, there is a dialect called Hetang Cantonese that is very divergent and has many strange features not found in other Siyi lects. Doubtless it is less than fully intelligible with other Siyi lects.

Xinhui Cantonese is somewhat different from Taishanese but appears to be intelligible with it.

Heshan Cantonese is intelligible with Xinhui and Taishanese.

Siqian Cantonese, Doumen Cantonese and Jiangmen Cantonese are three other Siyi varieties. Intelligibility data for these three lects is not known.

Curiously, Nanning Cantonese is said to be intelligible with Standard Cantonese.

The Goulou Group of Cantonese is separate from all of the rest of Cantonese and is linked with Ping and Tuhua. It is made up of Yulin Cantonese, Baobai Cantonese, Lizhou Cantonese, Guangning Cantonese, Huaiji Cantonese, Fengkai Cantonese, Deqing Cantonese, Shanglin Cantonese, Binyang Cantonese, Yangshan Cantonese, Ertang Cantonese, Shuishan Cantonese, Yunan Cantonese, and Tengxian Cantonese.

Yulin Cantonese is a representative variety in Goulou Cantonese and is the existing form of Chinese that is closest to Old Chinese.

Baobai Cantonese is spoken in Baobai south of Yulin. Yulin and Baobai are mutually intelligible, but they are not intelligible with the rest of Goulou Cantonese.

Lizhou Cantonese has difficult intelligibility with Standard Cantonese. It is spoken apart from the main group, so it may be a separate language.

Wuzhou Cantonese is a very divergent Cantonese variety spoken in Wuzhou City in Eastern Guangxi that is very hard even for other Cantonese speakers to understand.

The Gaoyang Group of Cantonese is a division of Cantonese that is composed of Gaozhou Cantonese, Yangiang Cantonese, Liangiang Cantonese and Maoming Cantonese.

Maoming Cantonese is an extremely diverse Cantonese variety that must be a separate language. Intelligibility of Maoming Cantonese with Yangiang Cantonese, Liangiang Cantonese and Gaozhou Cantonese is not known.

The Qinlian Group is divided into urban and rural varieties. The urban varieties share a high degree of mutual intelligibility with each other and even with other urban varieties in the Yongxun and Gaoyang Groups but have poor intelligibility with the rural varieties.

The higher mutual intelligibility among urban varieties, even across groups, may be because the city varieties are closer to each other than rural varieties are, even within the same group. This may have to do with histories of intense trade between cities, even across groups, which brought their speech closer together.

The urban varieties are Qinzhou Cantonese, Fangcheng Cantonese, Dongxing Cantonese, and Lingcheng Cantonese. They would seem to constitute a language called Urban Qinlian Cantonese.

The rural varieties are split into three major groups: Lianzhou Cantonese, Lingshan Cantonese, and Xiaojiang Cantonese.

Lianzhou Cantonese varieties have a Ping base with some Min and Hakka blended in. They are spoken in Hepu, the southern part of Pubei, and the coastal areas of Qinzhou. Lianzhou is so different from even the rest of the rural varieties that it is a separate language.

Hepu Cantonese is a Lianzhou Cantonese lect.

Lingshan Cantonese varieties are spoken in the countryside of Qinzhou, Lingshan and Pubei.

Xiaojiang Cantonese varieties are spoken in Pubei.

The rural varieties have poor intelligibility with the urban lects. A separate language called Rural Qinlian Cantonese seems reasonable.

Beihai Cantonese is very widely spoken in the area around Nanning as the major language. Beihai itself has five separate dialects within it, Beihai Cantonese A, Beihai Cantonese B, Beihai Cantonese C, Beihai Cantonese D and Beihai Cantonese E.

Jimmi Cantonese is an unclassified Cantonese language spoken in Jilong and Tiechong in Huidong and Erbu and Chishi in Haifeng. The popular notion is that this is a blend of Cantonese, Hakka and Min. Hailufeng Min is widely spoken in the area, and Haifeng Hakka is also spoken. Jimmi varieties appear to be mostly Cantonese with some Hakka and an even smaller trace of Min. Surely Jimmi must be a separate language.

Namlong Cantonese is an unclassified Cantonese language from the Pearl River area. It is also a separate language, or at least it was in 1949. Whether it still exists is not certain, but native speakers must still be alive.

Ping/Pinghua

Ping, now recognized as a major split from Cantonese, is composed of Guinan Ping, Guibei Ping, and Benihua Ping. Guinan and Guibei are definitely separate languages, and Benihua appears to be one also. There is high but apparently not full intelligibility between Guinan and Guibei.

Guinan Ping is spoken in Northern Guangxi around the city of Guilin near the Southern Mandarin-speaking area.

Guibei Ping is spoken in Southern Guangxi around the city of Nanning. It is close to Cantonese, especially Nanning Cantonese spoken in the same area. Guibei has some loans from Zhuang.

Benihua is a Ping language that has been heavily influenced by the Gong language, and as such, no doubt it is a separate language.

Guinan Ping has 22 lects.

Yongjiang Pinghua, Guandao Pinghua and Rongjiang Pinghua are members of Guibei Ping, which has 11 lects.

There is one Ping variety that is unclassified.

Ping has 34 lects. Ping has 2 million speakers.

Tuhua

Tuhua is a separate branch of Chinese spoken in Northern Guangdong, Western, Southeastern, and Northeastern Hunan Province and parts of Southern Guangxi. It has 132 separate lects. Tuhua is not really a language group but a wastebasket group for various varieties derisively referred to as tuhua – or “farmer’s language.”

Initial examination suggests a number of things.

First of all, the Tuhua lects, especially those of Southern Hunan, are very diverse, possibly as diverse as Wu, Xiang and Hui. Many or all of them may well be separate languages. If Tuhua is really as diverse as Wu, Xiang and Hui, then quite probably there is a different Tuhua language spoken in every county. Further, they are poorly studied and dialectally very diverse. There are many dialects inside the known Tuhua lects, and these dialects are often very different. So there appear to be languages inside even the known Tuhua lects.

Further, there appear to be links between the Tuhua varieties of Southeastern Hunan and northern Guangdong and the Ping language of Northern Guangxi, as they border each other. They all appear to be related and to have descended from a common ancestor.

Tuhua may have originally begun as a Sinicized form of the Yao language, and many of its speakers are still Yao people. One theory is that Tuhua is simply an extension of Ping. Another theory is that Tuhua started out as Middle Gan and then mixed with Cantonese, Hakka and Southwestern Mandarin.

Additionally, many Tuhua varieties are starting to splinter recently, as influences from Hakka, Cantonese and Southwest Mandarin begin to affect the younger speakers such that the language of the youngest speakers is quite a bit different from the language of the older speakers.

The best known of the Tuhua varieties is Shaozhou, referred to here as Shaozhou or Shaoguan Tuhua. Sometimes this name is used to describe all Tuhua varieties. It is spoken on the border of Hunan, Guangdong and Guangxi. Most of the speakers are in Northern Guangdong, but there are also some speakers in Southeastern Hunan.

Shaozhou is composed of eight lects, all of which appear to be separate languages. Of these, Shibei Shaozhou Tuhua and Xiangyang Shaozhou Tuhua, spoken in adjacent towns, are separate languages. Shibei has heavy Hakka influence, and Xiangyang is turning more Cantonese. Xiangyang has only been in contact with Cantonese for a few decades, while Shibei has been in contact with Hakka for centuries.

Zhoutian Shaozhou Tuhua and Shitang Shaozhou Tuhua are spoken in Renhua County. These may both be separate languages.

Really all of the Shaozhou varieties seem to be separate languages, so Nanxiong Shaozhou Tuhua is also. Nanxiong apparently shares a common ancestor with Hakka.

Longgui Shaozhou Tuhua, spoken in Qujiang County in Guangdong, is a separate language. Longgui has 2,000 speakers.

Besides Shaozhou, another major split in Tuhua is Lianzhou Tuhua. It is spoken in Lianzhou County and in Liannan Autonomous Yao County in Quingyuan City in Northern Guangdong. Lianzhou is composed of Xi’an Lianzhou Tuhua, Fengyang Lianzhou Tuhua, Xingzi Lianzhou Tuhua, and Bao’an Lianzhou Tuhua. Each is spoken in a distinct township or townships, so no doubt each is a separate language.

Also in Hunan, in northeastern Quiyang County, another Tuhua variety is spoken – Quiyang Tuhua. This must certainly be a separate language. There is a great deal of dialectal diversity within Quiyang Tuhua. Yantang Quiyang Tuhua and Yangshi Quiyang Tuhua are two of these dialects.

Xintian Tuhua, spoken in Linwu County in Southern Hunan, is a major split in Tuhua, so it is surely a separate language.

The Lengshuitan varieties appear to represent at least one language. Lengshuitan Lanjiaoshan has at least one dialect, Lengshuitan Shamuqiao Lanjiaoshan Yongzhou Tuhua. It has a close relationship to Dong’an Xuaqiao Yongzhou Tuhua.

The second type is a Jiangyong-Daoxian type comprising nine lects. At least seven of them are clearly separate languages.

Daoxian Xianglinpu Yongzhou Tuhua must be a separate language, as it is named after a county.

Daoxian Xiaojia Yongzhou Tuhua must be a separate language also, as it is a major split in this group.

There are many dialects even within the town of Yunshan where Jiangyong Yunshan is spoken. Jiangyong Yunshan is transitional between Jiangyong Chengguan and Jiangyong Xiacengpu.

Jiangyong Xiacengpu has 21 different dialects.

Jiangyong Huilongxu is the language that was the basis for the famous nishu, “women’s script”, a secret language of women (Leming 2004), originating from the Shangjiangxu (Xiao River) region of Northeastern Jiangyong County in Hunan, about which much has been written lately.

Jianghua Sumitang Qidouhua Yongzhou Tuhua has a reasonably close relationship to Jiangyong Songbai Yongzhou Tuhua and Jiangyong Chengguan, and all three are thought to have derived from the same base. Although it is spoken in the same county as Jianghua Baimangying, it appears to be completely different, so it must be a separate language.

Jianghua Baimangying Yongzhou Tuhua also appears to be quite different, so it is probably a separate language also.

As the other eleven main lects in this group are separate languages,

Intelligibility between varieties is not known, but dialectal divergence within Tuhua varieties is typically great, and some or all of the above may be separate languages. There are clearly at least 18 different languages here, and there may be up to 31 different languages.

Of these, Lanshang Tushi Yongzhou Tuhua may well be a separate language. Guiyang Yongzhou Liuhe Tuhua is probably part of a separate language also, as Guiyang is a county in Southeastern Hunan. Gangyu Yongzhou Tuhua, Xiangyu Yongzhou Tuhua, Lanshang Taiping Yongzhou Tuhua, and Shuangpai Lijiaping Yongzhou Tuhua appear to represent the names of separate counties, so no doubt each one is a separate language.

Xintian Northern Rural Yongzhou Tuhua is apparently completely different from Xintian Southern Rural Yongzhou Tuhua, so it is probably a separate language also.

Another Tuhua variety spoken in Yongzhou in the southern part of the region, Huasheng Southern Yongzhou Tuhua, may have as many as 75 different dialects inside of it. This is undoubtedly a separate language.

Luojin Chongshan Tuhua is spoken in Yongfu in Southern Guangxi. It has a close relationship to Guibei Pinghua. It is clearly a separate language.

Unclassified

Danzou is a separate group of unclassified Chinese languages. Danzou is spoken in the northwest of Hainan, and Hainanese speakers cannot understand it. It is either related to the language spoken by the Lingao people or is the same language.

Yet the Danzou people speak nine different lects, including varieties described as Hakka, others described as Cantonese, and others described as Mandarin, so obviously there are at least three separate languages inside Danzou. Let us call these Danzou Cantonese, Danzou Hakka and Danzou Mandarin.

Lingling or Linghua is an unclassified language spoken in Longsheng County, Guangxi. Linghua is a separate language. It is spoken by 20,000 ethnic Hmong in Taiping, Pingdeng Township in Longsheng. It is spoken only by residents inside the city as a sort of secret language. Southwestern Mandarin is used with outsiders. The language is a mixture of Hmong and Southwestern Mandarin.

Junhua or Military Language is spoken in Taoyuan County and Luidui in Pingtung County in Taiwan, Lufeng County and Huizhou City in Guangdong; Sanya, Changjiang, Danzhou, Zonghe, and Lingao in Hainan; Guangxi; around Hakka speakers in Wuping County in Zhongshan, Fujian, and other places.

On a Mandarin base, Junhua adds Hakka, Cantonese and Taiwanese. It is considered to be an Old Mandarin language and is normally placed in Southwest Mandarin in a group called the Junhua Group, which contains four lects. But others say that different Military Language varieties are either Hakka or Gan. Wherever these varieties are spoken, they are not understood by people nearby.

Junhua seems to derive from a lingua franca spoken by soldiers in the Ming Dynasty Army and was widely learned and understood by all soldiers at the time. It bears a strong resemblance to Ming Era Chinese.

Military Language is not the same language in the various areas where it is spoken.

Huping Junhua, spoken by 16,000 people in Zhongshan, is not understood by the surrounding peoples and is not considered part of Hakka. The language began in the area in the 1390s when the Ming Dynasty sent its army to Zhongshan to put down a rebellion. Soldiers came from all over China and remained in the area after the fighting, creating a new language out of all of their languages mixed together along with local lects. Actually this is thought to be more of a Gan language with Hakka influences.

Taiwanese Junhua in Taiwan is not the same language as the Military Language elsewhere. This language also has heavy Hakka influences, but it also has Min Nan, Mandarin and even Japanese influences. Some say this is a Hakka language.

Uncertain Affiliation/Possibly Not Sinitic

Maojiahua is a language spoken by 20,000 Hmong in the southwest of Hunan, in the northeast of Guangxi and in some areas of Hubei. Ethnologue originally listed this language as a form of Chinese, but it is now listed as Eastern Xiangxi Hmong. Another argument is that this is a Chinese language with heavy Hmong influence. As the matter is not yet settled and Ethnologue lists it as Hmong, we will not list it as Chinese.

Waxiang is an unclassified Chinese variety spoken by the Waxiang ethnic group in Luxi, Guzhang and Yongshun counties in Xiangxi Tujia and Miao Autonomous Prefecture, Zhangjiajie prefecture-level city in Dayong and Chenxi, Xupu and Yuanling Counties in Huaihua prefecture-level city in Northwestern Hunan. It is nothing like the Southwestern Mandarin, Xiang, Tujia and Xo Miao Hmong languages that surround it, and none of them can understand it. There are 362,000 speakers of Waxiang.

It shares some lexical influences from the Bai language, suggesting a substratum from the Bai languages. This is either an unclassified Chinese language or a separate minority tongue, maybe related to Hmong. Others view it as a Xiang-Hmong mixed language.

I recently talked to a man who is learning Min Nan, which is a Sinitic language often called a dialect of Chinese. He told me that Min Nan speakers say that the tones are so hard that no one who doesn’t grow up speaking Min Nan ever seems to get it very well.

Cantonese is a similar language that is very difficult. It is much harder than Mandarin, and many native Mandarin speakers say they tried to learn Cantonese and gave up on it because it was too hard. Cantonese has nine tones.

Basque is said to be very hard to learn unless you grow up with it. There is a joke that the Devil spent seven years trying to learn Basque, and he only learned how to say Hello and Goodbye.

Navajo would also be hard. Even Navajo children struggle quite a bit learning Navajo and don’t seem to get it well until maybe age 12. When Navajo children arrive at school, they often do not speak Navajo well yet.

Korean is a surprise, but apparently it is very hard to learn well. A native Korean speaker told me that Korean is so hard that no Korean speaker ever speaks it with 100% accuracy, and everyone makes errors.

Czech is also hard. Even most Czech speakers never get Czech all the way. They have TV contests in Czechoslovakia where they try to stump native speakers with hard forms in the language. If you can last 30 minutes without making even one error, you win. I think only two men have been able to do it, but one was a non-native speaker!

Piraha, spoken in the Brazilian Amazon, is also very hard. Over the course of a few centuries, several Portuguese-speaking priests had tried to learn Piraha, but they had all given up because it was too hard. And these same priests had been able to master a number of other Indian languages, but Piraha was just too much. Daniel Everett learned the language and wrote important papers on it. He is one of the only non-native speakers who was able to learn the language.

Tsez, spoken in the Caucasus, is also murderously hard. Every verb can have over 100,000 possible forms. I understand that even native speakers make regular errors when speaking Tsez.

Nice little comment here on an old post, Primitive People Have Primitive Languages and Other Nonsense?

I would like to dedicate this post to my moronic field of study itself, Linguistics, which believes in many a silly thing as consensus that have never been proved and are either untrue or probably untrue.

One of the idiocies of my field is this belief that in some way or another, most human languages are pretty much the same. They believe that no language is inherently better or worse than any other language, which itself is quite a dubious proposition right there.

They also believe, incredibly, that no language is more complex or simple than any other language. Idiocy!

Another core belief is that each language is perfectly adapted for its speakers. This leads to their rejecting claims that some languages are unsuitable for the modern world due to lack of modern vocabulary. This common belief about many minority languages is obviously true. Drop a Papuan in Manhattan, and see what good his Torricelli tongue does him. He won’t have words for most of the things around him. He won’t even have verbs for most of the actions he sees around him. His language is nearly useless in this environment.

My field also despises notions that some languages are better suited to poetry, literature or say philosophy than others or that some languages are more or less concise or exact than others or that certain concepts or ways of thinking are better expressed in one language as opposed to another. However, this is a common belief among polyglots, and I would not be surprised if it was true.

The question we are dealing with below is based on the notion that many primitive languages are exceedingly complex and the common sense observation that as languages acquire more speakers and civilization increases, one tends to see a simplification of language.

My field out and out rejects both statements.

They will tell you that primitive languages are no more complex than more civilized tongues and that there is no truth to the statement that languages simplify with greater numbers of speakers and increased civilization. However, I have shot these two rejected notions to many non-linguists, and they all felt that these statements had truth to them. Once again, my field violates common sense in the name of the abstract and abstruse “we can’t prove anything about anything” scientific nihilism so common in the intellectually degraded social sciences.

Indeed, some of the most wildly complex languages of all can be found among rather primitive peoples such as Aborigines, Papuans, Amerindians and even Africans. Most language isolates like Ket, Burushaski and Basque are pretty wild. The languages of the Caucasus are insanely complex, and that region doesn’t exactly look like Manhattan. Siberian languages are often maddeningly complex.

Even in China, in the remoter parts of China, language becomes highly differentiated and probably more complex. I know an American who was able to learn Cantonese and Mandarin who told me that at age 35, for an American to learn Hokkien was virtually impossible. He tried various schemes, but they all failed. He finally started to get a hold of the language with a strict eight hour a day study schedule. Anything less resulted in failure. Hokkien speakers that he spoke to said you needed to grow up speaking Hokkien to be able to speak the language well at all. By the way, this is another common sense notion that linguists reject. They say there are no languages so difficult that it is very hard to pick them up unless you grew up with them.

The implication here is that Min Nan is even more complex than the difficult Mandarin or even the forbidding Cantonese, which even many Mandarin speakers give up trying to learn because it is too hard.

Min Nan comes out of Fujian Province, a land of forbiddingly high mountains where language differentiation is very high, and there is often difficult intelligibility even from village to village. In one area, fifteen years ago an American researcher decided to walk to a nearby village. It took him six very difficult hours over steep mountains. He could have taken the bus, but that was a four-day trip! A number of these areas had no vehicle roads until recently and others were crossed by vast rivers that had no bridges across them. Transportation was via foot. Obviously civilization in these parts of China is at a more primitive level, and it’s hard to develop Hong Kong-style cities in places with such isolating and rugged terrain.

It’s more like, “Oh, those people on the other side of the ridge? We never go there, but we heard that their language is a lot different from ours. It’s too hard to go over that range so we never go to that area.”

In the post, I theorized that as civilization increases, time becomes money, and there is a need to get one’s point across quickly, whereas more primitive peoples often spend no more than 3-4 hours a day working and the rest sitting around, playing and relaxing. A former Linguistics professor told me that one theory is that primitive people, being highly intelligent humans (all humans are highly intelligent by default), are bored by their primitive lives, so they enjoy their wildly complex languages and like to relax, hang out and play language games with them to test each other on how well they know the structures. They also like to play tricky and maybe humorous language games with their complicated languages. In other words, these languages are a source of intellectual stimulation and entertainment in an intellectually impoverished area.

Of course, my field rejects this theory as laughably ridiculous, but no one has disproven it yet, and I doubt if the hypothesis has even been tested, hence it is an open question. My field even tends to reject the notion of open questions, preferring instead to say that anything not proven (or even tested for that matter) is demonstrably false. That’s completely anti-scientific, but that’s the trend nowadays across the board as scientistic thinking replaces scientific thinking.

Of course this is in line with the terrible conservative or reactionary trend in science where Science is promoted to a fundamentalist religion and scientists decide that various things are simply proven true or proven not true and attempts to change the consensus paradigm are regarded derisively or with out and out fury and rage and such attempts are rejected via endless moving of goalposts with the goal of making it never possible to prove the hypothesis. If you want to see an example of this in Linguistics, look at the debate around Altaic. They have set it up so that no matter how much existing evidence we are able to gather for the theory, we will probably never be able to prove it as barriers to proof have been set up to make the question nearly unprovable.

It’s rather senseless to set up Great Wall of China-like barriers to proof in science because at some point, you are hardly proving anything new, apparently because you don’t want to.

Fringe science is one of the most hated branches of science and many scientists refer to it as pseudoscience. Practitioners of fringe science have a very difficult time as the Scientific Establishment often persecutes them, for instance trying to get them fired from professorships. Yet this Establishment is historically illiterate because many of the most stunning findings in history were made by widely ridiculed fringe scientists.

The commenter below rejects my theory that increased civilization itself results in language simplification, as it gets more important to get your point across as quickly as possible with increasing complexity and development of society. Instead he says civilization leads to increased contact between speakers of different dialects or language, and in such cases, language must be simplified, often dramatically, in order for any decent communication to occur. Hence increased contact, not civilization in and of itself, is the driver of simplification.

I like this theory, and I think he may be onto something.

To me the simplification of languages of more ‘civilized’ people is mostly a product of language contact rather than of civilization itself. If the need arises to communicate with foreign people all of the time, for example in trade, then the language must become more simple in order to be able to be understood by more people.

Also population size matters a lot. It has been found that the greater the number of speakers, the greater the rate of language change. For example Polynesian languages, although having been isolated centuries or even millennia ago, still have only minor differences from one another.

In the case of many speakers, not all will be able to learn all the rules of a language, so they will tend to use the most common ones. And if the language is split into many dialects, then speakers of each dialect must find a compromise in order to communicate, which might come out as simple. If we add sociolects, specific registers for some occasions, sacred registers, slang etc, something that will arise in a big and stratified civilization, then the linguistic barriers people will need to overcome become greater. So it is just normal that after some centuries, this system would simplify.

We don’t need to look farther than Europe. Most languages of the western half being spoken in countries with strong trade links to one another and with much of the world later in history are quite analytic, but the languages of the more isolated eastern part are still like the older Indo-European languages. Basques, living in a small isolated pocket in the Iberian Peninsula, have kept a very complex language. Icelanders, also due to isolation, have kept a quite conservative Germanic language, whereas most modern Germanic languages are ridiculously simplified. No one can argue in his sane mind that Icelanders are primitives.

On the other hand, Romanian, being spoken in the more isolated Balkans, has retained more of the complex morphology of Latin compared to West Romance languages. And of course advance of civilization won’t automatically simplify the language, as Turkish and Russian, both quite complicated languages compared to the average European tongue, don’t seem to give up their complexity nowadays.

On the other hand, indigenous people were living in a much more isolated setting compared to the modern world, the number of speakers was comparatively low, and there was no need to change. Also, neighboring tribes were often hostile to one another, so each tribal group sought to make itself look special. That is the reason why places with much inter-tribal warfare like New Guinea have so many languages which are so different from one another. When these languages need to communicate, we get ridiculously simple contact languages like Hiri Motu.
So language simplification is more a result of language contact rather than civilization itself.

My Internet enemies (you know who you are) love to rip me to pieces over this stuff, but I suspect that is because operating under the cover of anonymity, plus the general loud-mouthed jerk “troll culture” of the Internet, combines to produce a Linguisticus Sociopathicus that is seldom found in the hallowed halls of reserved academe.

The funny thing is, if this Chinese work is so horrible, why has it earned praise from some of the world’s top Sinologists, who in fact actually assisted me with the project? Perhaps they should answer that. If I “know less about Linguistics than a Linguistics 10 student” then why do I sit on the review board of a peer-reviewed linguistics academic journal? Why did an 80 page paper of mine that will soon be published in a book make it through two peer reviews and a dozen editors, including some of the world’s top Turkologists?

The funny thing is that I get along pretty well with other linguists outside of the Internet. We work together calmly, chat about this, that and the other, share papers and gather information from each other, all the things that academics do. I even get addressed as Dear Colleague. And then on the Internet, suddenly I’m so stupid I don’t know what a verb is. Whatever.

Anyway, a huge project of mine, A Reworking of Chinese Language Classification, has received a massive update. It underwent a ton of fixes, a lot of dead links were removed, and many matters were cleared up or explained better. Also the language count jumped by more than 200, from ~360 to 573. Now some of these may not be full languages and I may be exaggerating, but I believe that using the 90% intelligibility criterion, there are a good 2,000 separate languages within Sinitic alone.

We simply cannot carve them out because the Chinese government will go crazy, and no Sinologist wants to make the Chinese government mad. The Chinese government lies and says there is one Chinese language with 3,000+ dialects in it, including such massive lects as Cantonese, Hakka, Min, Hui, Wu, Peng, Gan and Ji? Not to mention that Mandarin itself is of course not a single language but is actually a collection of scores or more languages inside of itself.

The project involves a brief description in English of the Chinese lects, stating such things as names, where they are spoken, the number of speakers, classification, degree of endangerment, linguistic history and development, classification issues, mutual intelligibility issues, dialects within, membership in language groups, the language/dialect question, anthropological history, sociolinguistic issues historical and modern, future trends, controversies, and sometimes more arcane linguistic data.

I am not trying to brag here and I am not real familiar with the literature, but my account of Chinese dialects is the most thorough such account I have ever run across so far in English. Now there may be better publications out there, but I am not aware of them. Further, most do not seem to have tackled the dialect vs. language problem.

Almost all of the good material on this stuff is in Chinese, and I do not read Chinese, so this caused massive problems, but I seem to be able to deal with them ok, as a lot of the research that I referenced was in Chinese and I am able to sort of make my way through it to get the gist of it despite the language barrier. I have also come up with a few native speaker informants who have given me excellent information on their particular lects. For instance, I recently ran into a speaker of something called Cambodian Teochew (I had no idea such a thing existed) who told me that the four SE Asian Teochew lects, Malay Teochew, Thai Teochew, Cambodian Teochew and Vietnamese Teochew, were not mutually intelligible. That is, there are four separate languages within Overseas Teochew alone! Unbelievable.

Results. A ratings system was designed in terms of how difficult it would be for an English-language speaker to learn the language. In the case of English, English was judged according to how hard it would be for a non-English speaker to learn the language. Speaking, reading and writing were all considered.

This post will look at the Mandarin language in terms of how difficult it would be for an English speaker to learn it.

Sino-Tibetan
Sinitic
Chinese
Mandarin

It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple – short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you often tend to hit a wall, often because the syntactic structure is so strangely different from English (isolating).

Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English. No word is capable of declension, and there is no tense, case, or number, nor are there articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word orders that are used to mark tense. Mandarin has 12 different adverbs for which there is no good English translation.

Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are such things as aspect, serial verbs, a complex classifier system, syntax marked by something called topic-prominence, a strange form called the detrimental passive, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange things. Verb complements can be baffling, especially potential and directional complements. The 把, 是 and 的 constructions can be very hard to understand.

The topic-prominence is interesting in that only a few major languages have topic-comment syntax, and most of those are Oriental languages with a lot of Chinese borrowing. Topicalization is not marked morphologically.

There are sentences where the entire meaning changes with the addition of a single character. Chinese sentences are SVO (Subject – Verb – Object) at their base, but that is a bit of an illusion. A sentence that causes you to discuss time duration makes you repeat the verb after the direct object – SVOVT (T = time phrase). In the case of topicalization, sentences can have the structure of OSV (Object – Subject – Verb). Relative clauses and all subordinate clauses come before the noun they modify. In other words:

English: The man [who always wore red] walked into the room.
Chinese: [Who always wore red] the man walked into the room.

The relative clause in the sentences above is marked in brackets.

In Chinese, the prepositional phrase comes between the subject and the verb:

English: The man hit the ball [into the yard].
Chinese: The man [into the yard] hit the ball.

The prepositional phrase is bracketed in the sentences above.

In Chinese, adjectives are actually stative verbs as in Nahuatl and Lakota.

那个热的菜很好吃。 Nàgè rède cài hěnhǎochī. The it is hot food is good to eat. The hot food is delicious.

The 的 symbol turns food hot into food it is hot, an attributive verb. 的 means something like to be.

There are dozens of words called particles which shade the meaning of a sentence ever so slightly.

Chinese phonology is not as easy as some say. There are way too many instances of the zh, ch, sh, j, q, and x sounds in the language such that many of the words seem to sound the same to an English speaker. There is a distinction between aspirated and nonaspirated consonants. There is also the presence of odd retroflex consonants. All of these will present problems to an English speaker.

Chinese orthography is probably the hardest orthography of any language. The alphabet uses symbols, so it’s not even a real alphabet. There are at least 85,000 symbols and actually many more, but you only need to know about 3-5,000 of them, and many Chinese don’t even know 1,000. To be highly proficient in Chinese, you need to know 10,000 characters, and probably less than 5% of Chinese know that many.

In addition, the characters have not been changed in 3,000 years, and the alphabet is at least somewhat phonetic, so we run into a serious problem of lack of a spelling reform.

The Communists tried to simplify the system (Simplified Mandarin), but instead of making the connections between the phonetic aspects of characters more sensible by decreasing their number and increasing their regularity (they did do this somewhat but not enough), they simply decreased the number of strokes needed for each symbol, typically without dealing with the phonetic aspect at all. The simplification did not work well, so now you have a mixture of two different types of written Chinese – Simplified and Traditional.

In addition to all of this, Chinese borrowed a lot from the Japanese symbolic alphabet a full 1,000 years after it had already been developed and had not undergone a spelling reform, adding insult to injury.

Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese prose. It’s as if it’s written in a different language – actually, it is technically a different language similar to Middle English or Old English. However, few Middle English or Old English texts are read anymore, and Classical Chinese is still widely read.

However, the orthography is at least consistent. 90% of characters have only one reading. Once you learn the character, you generally know the meaning in any context.

Writing the characters is even harder than reading them. One wrong dot or wrong line either completely changes the meaning or turns the symbol into nonsense.

It’s a real problem when you encounter a symbol you don’t know because there is often no way to sound out the word. You are really and truly lost and screwed. There is a clue at the right side of the symbol, but it is not always accurate. You need to learn quite a bit of vocabulary just to speak simple sentences.

Similarly, a dictionary is not necessarily helpful when trying to read Chinese. You can have a Chinese sentence in front of you along with a dictionary, and the sentence still might not make sense even after looking it up in the dictionary.

Some Chinese Muslims write Chinese using an Arabic script. This is often considered to be one of the worst orthographies of all.

The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another. However, compared to other tone systems around the world, the tonal system in Chinese is comparatively easy.

A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. Since Chinese uses short words and is disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones.

Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones even with the four tones, and meaning is often discerned by context, stress, rhythm and intonation. Chinese, like French and English, is heavily idiomatic.

It’s little known, but Chinese also uses different forms (classifiers) to count different things, like Japanese.

There is zero common vocabulary between English and Chinese, so you need to learn a whole new set of lexical forms.

In addition, nouns often show relatedness or hierarchy. For instance, in English, you can simply say my brother or my sister, but in Chinese, you cannot do this. You have to indicate whether you are speaking of an older or younger sibling.

On the positive side, Chinese grammar is fairly regular, and word derivation and compound words are sensible, so the meaning can be determined by looking at the word. In other languages, compound words are not necessarily so obvious.

Many agree that Chinese is the hardest to learn of all of the major languages. A recent survey of language professors rated Chinese as the hardest language on Earth to learn.

Hello, I recently talked to a Westerner who is learning Min Nan, which is a Sinitic language often called a dialect of Chinese. He already speaks Mandarin, but he told me Min Nan is vastly harder than Mandarin. At age 35, he was studying it 2 hours a day, and at some point, he hit a wall, and he didn’t seem to be making any progress. He kept adding more study hours to the day – four hours, six hours – with little effect. Finally when he was studying it for eight hours a day, he started making some good progress. I believe he said contour tones and tone sandhi were the major roadblocks.

Min Nan speakers say that even Cantonese is easier than Min Nan, and Cantonese is deadly hard. They also say that Min Nan tones are so hard that no one who did not learn Min Nan growing up gets anywhere near native fluency.

Cantonese is a similar language that is very difficult. It is much harder than Mandarin, and many native Mandarin speakers say they tried to learn Cantonese and gave up on it because it was too hard. Cantonese has 9 tones. The general consensus among Chinese is that Cantonese is much harder to learn than Mandarin.

Basque is said to be very hard to learn unless you grow up with it. There is a joke that the Devil spent seven years trying to learn Basque, and he only learned how to say Hello and Goodbye.

Navajo would also be murderously hard. Even Navajo children struggle quite a bit learning Navajo. When they show up at school at age 5-6, they are still struggling with Navajo. There are reports that Navajo children don’t seem to get Navajo well until maybe age 12.

Korean is a surprise, but apparently it is very hard to learn well. A native Korean speaker told me that Korean is so hard that no Korean speaker ever speaks it with 100% accuracy, and everyone makes errors.

As another respondent pointed out, Japanese is also quite notorious, and most Westerners get nowhere near native fluency.

Czech is also hard. Even most Czech speakers never get Czech all the way. They have TV contests in Czechoslovakia where they try to stump native speakers with hard forms in the language. If you can last 30 minutes without making even one error, you win. I think only two men have been able to do it, but one was a non-native speaker! Czech also has a strange r sound found only in one other language on Earth. It is said that no native speaker ever gets this phoneme quite right.

Piraha is also very hard as another respondent pointed out. Only two non-natives have ever been able to speak Piraha with any fluency. When Daniel Everett went to study the language, he found a number of reports from priests who had tried to learn Piraha since the early 1800s, and only one had succeeded. The others tried to learn but gave up because they said it was too hard.

Tsez, spoken in the Caucasus, is also murderously hard. Every verb can have tens of thousands of possible forms. Reports say that even native speakers make regular errors when speaking Tsez.

I think Lindsay is right in using mutual intelligibility as the criterion for determining what’s a language. I also think that intelligibility can be real tough to measure, and that something should be said for the kind of situation where mutual unintelligibility is only temporary, i.e. where a week of exposure has the speakers off and running.

As Campbell puts it, “But the question remains, does one actually have to specifically pick out and learn new phrases on their way to learning or can you pick them up in passing assuming to understand?”

So languages A and B are mutually unintelligible, but speakers become able to understand each other after a week of steady contact. Languages C and D are mutually unintelligible, and speakers still can’t understand each other after months of steady contact, unless they learn each other’s language or use a third language. Do we treat both situations the same and call them different languages? I think that’s worth thinking about.

Campbell brings up another valid point: attitudes influence intelligibility. Part of this is raw, conscious effort. Part of this is psychological and pretty much subconscious.

Another point that nobody has brought up yet is topic dependency. Mutual intelligibility usually varies depending on what the speakers are trying to talk about. A “deep” Taiwanese Hokkien speaker and a “deep” Medan (Sumatra) Hokkien speaker could probably understand each other reasonably well across a wide range of household and agricultural topics, but if it came to fixing a car or a motorbike, they’d be speaking different languages, in effect.

The task of quantifying intelligibility gets harder if we wanna pin this down. Maybe a “basket of topics” concept could be advanced, kind of like the “basket of goods and services” concept used to measure inflation.
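A rough sketch of how such a basket might be scored, the same way a price index weights goods and services. The topic names, weights, and scores below are invented placeholders, not measurements of any real pair of lects.

```python
# Hypothetical "basket of topics" intelligibility index.
# Every topic, weight, and score here is a made-up placeholder.

def basket_intelligibility(scores, weights):
    """Weighted mean of per-topic intelligibility scores (0-100)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same topics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[t] * weights[t] for t in scores) / total_weight

# Assumed share of everyday talk that falls under each topic.
weights = {"household": 0.5, "farming": 0.3, "vehicle repair": 0.2}
# Assumed percent understood when the talk is about each topic.
scores = {"household": 90.0, "farming": 80.0, "vehicle repair": 20.0}

index = basket_intelligibility(scores, weights)
print(f"basket index: {index:.1f}%")
```

The hard part is choosing the weights, since a basket built around village life and one built around city life could rank the same two lects quite differently.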

There’s a video on Youtube where two Siam Thai speakers go up into central Guangxi and try to communicate w/ Zhuang speakers speaking only Siam Thai. First it doesn’t work, then it starts working. They realize that it only works when the topic is one that’s heavy on shared vocabulary.

Based on intelligibility criteria, how many languages is Hokkien (what Lindsay calls “Xiamen”)? A lot of Penang Hokkien would go over a Taiwanese Hokkien speaker’s head at first exposure, just b/c of intrinsic linguistic differences. Typically, there would also be a lack of effort on the part of the Taiwanese speaker to understand a non-Taiwanese form of Hokkien.

Even beyond this, psychologically, both sides (but esp. the Taiwanese) have a hard time acknowledging an unfamiliar form of their familiar Hokkien tongue. Due to subconscious psychological reasons and a lack of effort, they may honestly not be able to understand each other (assuming the Penang speaker is one of the few with no Taiwanese Hokkien media intake). The shared vocabulary, collocations, idioms, etc., though, are definitely enough for them to understand each other w/ just an attitude adjustment.

Yet, I don’t think the shared vocabulary and grammar are “good enough” to establish that PngHk and TWHk are dialects of the same language. How do we really know? What strikes me as being much better evidence is having witnessed TWHk and PngHk speakers communicating effectively in their respective dialects w/o having to resort to another language – even though such encounters have typically resulted in a quick switch to Mandarin as of the last 10 or 15 years or so.

Intelligibility is tricky to quantify, no doubt; but lexical and syntactic similarity have got to be even trickier to measure in any meaningful way.

I have to take exception with a couple of Campbell’s minor points. They sound suspiciously like the stuff you read in papers by some (not all) Chinese scholars.

Campbell says, “Fangyan we have determined as topolect, but as used many centuries ago could also refer to any language of a different region. Today it has a specific use and currently applies to a “county”, notwithstanding the fangyan of neighboring counties may be the exact same thing.”

I don’t know what Campbell means by “today it has a specific use”. It’s not only common for laypeople to use “fangyan” to refer to the speech of a province or any other region, it’s also pretty common for scholars to spit out collocations like “Yue (~ Cantonese) fangyan”, never mind that “Yue” is a group of languages spoken across two provinces of China and taking in at the very, very least three mutually unintelligible languages.

Campbell also says, “It reminds me of Sinoxenic borrowings of Chinese words into neighboring Korean, Japanese, and Vietnamese which all now have approximately 60% of their core lexicon borrowed from Chinese. But these languages belong to other families and developed separately…”

This is kind of begging the question. What if the North Chinese political grip on Vietnam was somehow renewed? Sure enough, Vietnamese would continue to absorb “Chinese” elements deeper and deeper into its lexicon and structures, to the point where a linguist from the “modern” linguistics tradition would say it was a Chinese language.

And indeed the evidence seems to reveal that this is exactly how Hokkien, Teochew, Hailamese, Wenzhou, Hoisan (Taishan), etc. “became” Chinese languages. The best paper I’ve seen on this was by a Chinese scholar named Pan Wuyun (潘悟云). What’s Sinoxenic? Who was neighboring what? What’s core lexicon? Who developed separate and who developed together, and where and when? These are unresolved questions, not the open-and-shut case that most linguists in the field (even many non-Chinese) seem to think it is.

Campbell is probably right in saying, “Hua is usually tacked on to a place name. The “speech” of a particular place as long as there are no others competing (for example Nanning in Guangxi has several languages).” I would add that competing languages w/i counties is the rule rather than the exception throughout tropical and coastal subtropical China.

The tendency in each area (not necessarily just one county) with competing languages is for each language to go by a two or three syllable nickname where the last syllable is usually 話 (hua in Mandarin). Cantonese (but not the Hoisan type) is usually known as 白 hua. Hokciu (a.k.a. Foochow) is known locally as 平 hua (exact same name as Tuhua). In the Leizhou area, 海 hua and 黎 hua are two distinct “Min” varieties, reportedly mutually intelligible only w/ each other or at most also w/ some type of Hailamese / Hainanese Min.

Speaking of which, a primer on Hailamese was published about a century ago in Singapore. The author (de Souza) explains in the introduction which dialect of Hailamese the book is based on, and says that dialects of Hailamese from the other side of the island are “perfectly impossible to understand”. So there may actually be more than one language w/i just Hailamese Min.

Finally, about the Chinese scholars falling down on the job. I would say that, first of all, they generally don’t think this is their job. To them, “Chinese” is basically “assumed” to be one language. U could just call that shoddy academics. Secondly, though, some Chinese scholars are doing a pretty good job, such as Pan Wuyun.

In the Anglo tradition, a guy like Pan Wuyun would come out at some point with a “come-on-and-own-up, most-of-all-y’all-is-wrong” paper. But unfortunately that kind of thing is really rare in China. And so it’s left to foreign scholars or guys like Lindsay or myself to say this, w/ the disclaimer (at least in my case) that there are many individual decent scholars in China too.

The truth is that among most linguists, mutual intelligibility is not a controversial topic. There are a few loudmouths who scream that it cannot be measured, but to most of us linguists it is a ho-hum subject, not the source of a lot of screaming and yelling. Most of the tumult comes from outside the field, amateurs or simply ignorant people who are not linguists. They usually bring up all sorts of arguments, but in the field, we do not worry much about any of these rejoinders.

Often we will do more than one study. If the results are different, we just average them together to get a mean.

Surely attitude matters, but if you test enough people, all of that levels out. You have some that really want to understand the other language and others who just give up easily. You average them all together and get a mean for the population.
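As a minimal sketch of that averaging, assuming each listener gets a percent-understood score: the two lists below are invented numbers standing in for two studies, with eager and reluctant listeners mixed together.

```python
from statistics import mean

# Hypothetical percent-understood scores; none of these are real results.
study_a = [92, 55, 78, 61, 84]   # listeners in one study
study_b = [70, 66, 88, 49, 77]   # listeners in a second study

mean_a = mean(study_a)           # population mean for the first study
mean_b = mean(study_b)           # population mean for the second study

# Average the two study results together to get a single figure.
combined = mean([mean_a, mean_b])
print(mean_a, mean_b, combined)
```

Averaging the two means weights each study equally. If one study tested far more listeners than the other, pooling all the individual scores would be another reasonable choice.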

There are not many languages that can be learned after only a week of contact. And if there were, we would not say they were mutually unintelligible. Even very closely related languages like Azeri and Turkish take about 3-4 weeks of close contact before they are communicating pretty well.

I have an informant in China in Hubei Province. She said about every third city over was a new Mandarin language, and you could learn the new language after about 3 weeks of close contact.

In Africa, they have a concept called 1 day languages and 2 day languages because that is how long it takes to learn them. These would not be considered languages because they are too easily learned.

As an example, I have heard Latin Americans say that when they fly into El Salvador in the morning, they don’t understand all of what the Salvadorans around them are saying, and the Salvadorans do not understand everything they are saying. However, by the end of the day, everyone is drinking and slapping each other on the back and they all understand each other.

So Salvadoran Spanish could be considered a 1 day language. Salvadoran Spanish is a dialect of the Spanish language, not a separate language.

About topic dependency: we usually test for mutual intelligibility by playing a relatively neutral recording of someone speaking in the language. I suppose you could use a video too. You cannot use two people trying to talk to each other because then you have all of this extralinguistic coaching going on that interferes with the result and makes it higher than it is.

“Due to subconscious psychological reasons and a lack of effort, they may honestly not be able to understand each other (assuming the Penang speaker is one of the few with no Taiwanese Hokkien media intake). The shared vocabulary, collocations, idioms, etc., though, are definitely enough for them to understand each other w/ just an attitude adjustment.”

This has been brought up by a well-known linguist as a complaint to me against using native speaker knowledge as a criterion for mutual intelligibility. He told me we could not rely on native speakers to tell us how much they understand of another language because, well, native speakers lie. Instead we could only rely on the knowledge of linguists.

He gave the example of two groups that understand each other very well but hate each other so much that they say they can’t understand the speech of the other people even though they can. In other words, they lie. Realistically, I have been studying mutual intelligibility for a long time now (in fact, I am a bit of an expert in it) and I have yet to come across this situation. This really is just a red herring.

“Yet, I don’t think the shared vocabulary and grammar are “good enough” to establish that PngHk and TWHk are dialects of the same language. How do we really know? What strikes me as being much better evidence is having witnessed TWHk and PngHk speakers communicating effectively in their respective dialects w/o having to resort to another language – even though such encounters have typically resulted in a quick switch to Mandarin as of the last 10 or 15 years or so.”

That doesn’t really count. You might be looking at an intelligibility situation of 80-85% between those Hokkien lects. Also we do not look at two speakers negotiating a conversation because that throws in new variables.

For inherent intelligibility, we want someone listening to a recording or watching a video. Quite a few speakers of very closely related languages (and some not so closely related) can negotiate the sort of conversation described above. Yet the fact that they both revert to Mandarin instead of carrying on in different Hokkien forms implies we are dealing with two separate languages here. They abandoned their own tongues and switched to common Mandarin presumably because there are too many misunderstandings when they use their Hokkien varieties.

“Intelligibility is tricky to quantify, no doubt; but lexical and syntactic similarity have got to be even trickier to measure in any meaningful way.”

Not really, we have many measures of lexical similarity and we use them all the time. We also measure syntactic and morphological differences – variations in grammar. A lot of linguists decide that two tongues are different languages simply based on the fact that they are too far apart – structurally separate languages.
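One common family of lexical measures compares word lists meaning by meaning. The sketch below uses normalized edit distance as a stand-in; it is only one of several possible measures, and the word pairs are invented placeholder strings, not forms from any real language.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def lexical_similarity(pairs):
    """Mean of (1 - normalized edit distance) over aligned word pairs."""
    sims = []
    for a, b in pairs:
        longest = max(len(a), len(b))
        sims.append(1.0 if longest == 0 else 1 - levenshtein(a, b) / longest)
    return sum(sims) / len(sims)

# Invented forms for the same three meanings in two hypothetical lects.
pairs = [("pata", "pata"), ("lomu", "nomu"), ("kisen", "kis")]
print(round(lexical_similarity(pairs), 3))
```

A score like this says how close the word forms are. It does not say how much a listener would actually understand, which is why it is usually reported alongside intelligibility testing rather than instead of it.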


This post will look at how hard it is to learn Chinese for an English speaker.

It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple – short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you hit a wall, often because the isolating syntactic structure is so strangely different from English.

Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English with no tense or articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word orders that are used to mark tense.

Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are serial verbs, a complex classifier system, syntax marked by something called topic-prominence, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange stuff. Verb complements can be baffling, especially potential and directional complements. The 了 character can have seemingly countless meanings. You also need to learn quite a bit of vocabulary just to speak simple sentences.

Chinese phonology is not as easy as some say. There are too many instances of the zh, ch, sh, j, q, and x sounds in the language such that many of the words seem to sound the same. There is a distinction between aspirated and nonaspirated consonants which does not exist in English.

Chinese orthography is probably the hardest orthography of any language. The alphabet uses symbols, so it’s not even a real alphabet. There are at least 85,000 symbols and actually many more (although this is controversial), but you only need to know about 4-6,000 of them, and many Chinese don’t even know 1,000. To be highly proficient in Chinese, you need to know 10,000 characters, and probably less than 5% of Chinese know that many.

The Communists tried to simplify the system (simplified Mandarin), but they simply decreased the number of strokes needed for each symbol. The Communists’ spelling reform left much to be desired.

To make matters worse, there are different ways to write each symbol – different styles of Chinese calligraphy. For instance, Classical Chinese may be written in so called “grass-style” calligraphy or in another style altogether.

It’s a real problem when you encounter a symbol you don’t know because there is often no good way to sound out the word as the system simply is not very phonetic. The Chinese alphabet is probably only 25% phonetic, and many frequently-used characters tell you nothing about how to pronounce them. Further, you need to learn at least 300 characters before you can start to use the meager phonetics of the writing system at all.

Furthermore, word boundaries are not obvious, as one character does not necessarily equal one word. Therefore it is hard to tell where one word starts and stops and another one begins.
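A small sketch of the word boundary problem: given an unspaced string and a word list, list every way to cut it into known words. The four-entry lexicon is a toy chosen for the example; a real dictionary is far larger and produces many more competing cuts.

```python
def segmentations(text, lexicon):
    """Return every way to split text into words found in lexicon."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results

# Toy lexicon: 研究 "research", 研究生 "graduate student",
# 生命 "life", 命 "fate".
lexicon = {"研究", "研究生", "生命", "命"}

for seg in segmentations("研究生命", lexicon):
    print(" / ".join(seg))
```

The same four characters can be cut as 研究 / 生命 or as 研究生 / 命, and nothing on the page marks which one the writer meant.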

Similarly, a dictionary is not necessarily helpful when trying to read Chinese. You can have a Chinese sentence in front of you along with a dictionary, and the sentence still might not make sense even after looking it up in the dictionary.

Furthermore, merely learning how to look up words in the dictionary in the first place takes new Chinese learners several months and learning how to use a dictionary well is typically not possible until a year of study. Even people who have studied for several years sometimes encounter characters that they simply cannot find in the dictionary. In China, dictionary look-up contests are often held, showing that the process is not transparent at all.

A good student of Chinese often has more than one dictionary, and some have up to 20 different dictionaries. There are separate dictionaries for simplified and traditional characters and dictionaries that have both. There are entire dictionaries just for Classical Chinese particles, and others for four-character idioms (chéngyǔ), for a type of two-part allegorical saying (xiēhòuyǔ), and for proverbs (yànyǔ). There are separate dictionaries for terms that entered Chinese during the Chinese era and others for specifically Buddhist terms. There is an easier way to use a Chinese dictionary called four-part look-up, but it takes a long time to learn it and most learners never master it for whatever reason.

To solve all of these problems with the ideographic writing system, numerous romanization schemes have been invented. At last count, there were a dozen or so of them, but a number of those are rarely used. Certainly, there are 2-3 heavily used ones, and that is not counting the bopomofo phonetic alphabet used in Taiwan. One of the main problems with these romanization systems is that none of them are very good and they all have serious limitations. Furthermore, the romanization system you studied as a Chinese learner tends to affect your accent in Chinese.

Writing the characters is even harder than reading them. One wrong dot or wrong line either completely changes the meaning or turns the symbol into nonsense. The writing system is often so opaque that even native speakers forget how to write the characters of even commonly used words.

Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese (wenyanwen) prose. It’s actually written in a different language, so to learn to read Chinese properly like an educated Chinese person does, you will have to learn not one language but two.

One rejoinder is that Classical Chinese to Chinese people is similar to Greek and Latin to an English speaker, but this is a bad analogy, as Classical Chinese is widely studied in Chinese secondary schools and some of the finest Chinese prose is written in this language (see the Confucius and Mencius examples below). Further, after studying French for a few years, you should be able to read French authors who wrote 300 years ago, but after a similar period of studying Chinese, you will not be able to read Confucius or Mencius.

Hence most educated Chinese would be expected to know something about Classical Chinese, and if you wanted to learn Chinese like an educated Chinese speaker, you would have to learn this other language also.

In addition, you need to learn Classical Chinese even if you do not aspire to be an educated Chinese speaker because one encounters Classical Chinese often in modern Chinese society, often in paintings or character scrolls.

The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another.

One problem with the tone system is that when you want to change the meaning of a sentence in a subtle manner via changing intonation of a word, you are bound to change the tone of the word in Chinese. Merely by placing semantic emphasis on a single word, you may deliver a gibberish sentence. Chinese speakers have their own way of using tone as a way of generating subtle semantic meaning, but they do so in an entirely different way than speakers of non-tonal languages do.

However, compared to other tone systems around the world, the tonal system in Chinese is comparatively easy.

A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. Since Chinese uses short words and is disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones.

Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones even with the tones, and in that case, meaning is often discerned by context, stress, rhythm and intonation.
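To make the point concrete, here is a small sketch that groups a handful of common Mandarin words first by syllable alone and then by syllable plus tone. Pinyin is written with tone numbers; the seven-word list is only a sample picked for the example.

```python
from collections import defaultdict

# (character, pinyin with tone number, gloss)
words = [
    ("妈", "ma1", "mother"),
    ("麻", "ma2", "hemp"),
    ("马", "ma3", "horse"),
    ("骂", "ma4", "scold"),
    ("是", "shi4", "to be"),
    ("事", "shi4", "matter"),
    ("市", "shi4", "city"),
]

def group_by(words, key):
    """Group characters by a key computed from their pinyin."""
    groups = defaultdict(list)
    for char, pinyin, _gloss in words:
        groups[key(pinyin)].append(char)
    return dict(groups)

without_tone = group_by(words, lambda p: p.rstrip("1234"))  # drop tone digit
with_tone = group_by(words, lambda p: p)                    # keep tone digit

print(without_tone)
print(with_tone)
```

Tone pulls the four ma words apart, but 是, 事 and 市 still land in the same group, so the listener has to fall back on context.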

It’s little known, but Chinese also uses different forms to count different things, like Japanese.

There is zero common vocabulary between English and Chinese, so you need to learn a whole new set of lexical forms and have no cognates to fall back on.

In addition, nouns often show relatedness or hierarchy. For instance, in English, you can simply say my brother or my sister, but in Chinese, you cannot do this. You have to indicate whether you are speaking of an older or younger sibling.

Many agree that Chinese is the hardest to learn of all of the major languages. In a recent international survey of language professors worldwide, these teachers rated Chinese as the hardest language to learn among languages that are commonly studied.

Mandarin gets a 5 rating for extremely hard.

However, Cantonese is even harder to learn than Mandarin. Cantonese has nine tones to Mandarin’s four, and in addition, they continue to use a lot of the older traditional Chinese characters that were superseded when China moved to a simplified script in 1949. Furthermore, since non-Mandarin characters are not standardized, Cantonese cannot be written down as it is spoken.

In addition, Cantonese has verbal aspect, possibly up to 20 different varieties. Modal particles are difficult in Cantonese. Clusters of up to three sentence-final particles are very common. 我食咗飯 and 我食咗飯架啦喎 are both grammatical for I have had a meal, but the particles add the meaning of I have already had a meal or answering a question or even to imply I have had a meal, so I don’t need to eat anymore.

Cantonese gets a 5.5 rating, close to hardest of all.

Min Nan is also said to be harder to learn than Mandarin, as it has a more complex tone system, with five tones on three different levels. Even many Taiwanese natives don’t seem to get it right these days, as it is falling out of favor and many fewer children are being raised speaking it than before.

Min Nan gets a 5.5 rating, close to hardest of all.

A recent 15 year survey out of Fudan University utilizing both the departments of Linguistics and Anthropology looked at 579 different languages in order to try to find the most complicated language in the world. The result was that a Wu language dialect (or perhaps a separate language) in the Fengxian district of Shanghai (Fengxian Wu) was the most complex language of all, with 20 separate vowels. The nearest competitor was Norwegian with 16 vowels.

Fengxian Wu gets a 5.5 rating, close to hardest of all.

Classical Chinese is still read by many Chinese people and Chinese language learners. Unless you have a very good grasp on modern Chinese, Classical Chinese will be completely wasted on you. Classical Chinese is much harder to read than modern Chinese.

Classical Chinese covers an era extending over 3,000 years, and to attain a reading fluency in this language, you need to be familiar with all of the characters used during this period along with all of the literature of the period so you can understand all the allusions. Even with a knowledge of Classical Chinese, you need to read it in context. If you are good at Classical Chinese and someone throws you a random section of it, it will take you a good amount of time to figure it out unless you know context.

The language is much more to the point than Modern Chinese, but this is not as good as it sounds. This simplicity leaves room for ambiguity, and context plays an important role. A joke about some obscure historical or literary anecdote will be lost on you unless you know what it refers to. For reading modern Chinese, you will need at least 5,000 characters, but even then, you will still need a dictionary. With Classical Chinese, there are no lower limits on the number of characters you need to know. The sky is the limit.

Caution: This post is very long. It runs to 200 pages on the Net. Updated January 17, 2016.

This is a continuation of the earlier post. I split it up into two parts because it had gotten too long.

The post refers to which languages are the hardest for English speakers to learn, though to some extent, the ratings are applicable across languages. Most Chinese speakers would recognize Spanish as being an easy language, despite its alien nature. And even most Chinese, Navajo, Poles or Czechs acknowledge that their languages are hard to learn. To a certain extent, difficulty is independent of linguistic starting point. Some languages are just harder than others, and that’s all there is to it.

Northeast Caucasian, Northwest Caucasian and Kartvelian

Of course the Caucasian languages like Tsez, Tabasaran, Georgian, Chechen, Ingush, Abkhaz and Circassian are some of the hardest languages on Earth to learn.

Chechen and Circassian are rated 6, hardest of all.

Northeast Caucasian

NE Caucasian languages have the uvulars and ejectives of Georgian in addition to pharyngeals, lateral fricatives, and other strangeness. They have noun classes like the Bantu languages (but usually fewer). Nevertheless, they have noun class agreement markers on verbs and adjectives. One thing NE Caucasian has is lots of case. Some languages have 40+ cases. They are built from the ground up via two forms – one a spatial form such as in, on or around and the other a directional motion form such as to, from, through or at.
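A quick sketch of why the counts climb so fast: if every spatial form can pair with every directional form (an assumption; a given language may not allow every pairing), the number of combined cases is the product of the two lists. Only the forms named above are used here.

```python
from itertools import product

# Forms named in the text above; real languages have more of each.
spatial = ["in", "on", "around"]
directional = ["to", "from", "through", "at"]

# Each combined case pairs one spatial form with one directional form.
combined_cases = [f"{s}+{d}" for s, d in product(spatial, directional)]

print(len(combined_cases))
print(combined_cases[:4])
```

Three spatial forms and four directional forms already give twelve combinations, so a few more forms of each kind are enough to push the total past forty.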

Tsezic

Tsez has 64-126 different cases, making it by far the most complex case system on Earth! It is one of the few languages on Earth that has two genitive cases – Genitive 1 (-s) and Genitive 2 (-z). Genitive 1 is used when the genitive’s head noun is in absolutive case and Genitive 2 is used when the genitive’s head noun is in any other case. It also has four noun classes. It is said that even native speakers have a hard time picking up the correct inflection to use sometimes.
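The rule for choosing between the two genitives is simple to state as code. This is a sketch of the rule exactly as described above; the function name and the plain-string case labels are just for illustration.

```python
def tsez_genitive_suffix(head_noun_case):
    """Pick the genitive suffix from the case of the head noun.

    Genitive 1 (-s) when the head noun is absolutive,
    Genitive 2 (-z) when the head noun is in any other case.
    """
    return "-s" if head_noun_case == "absolutive" else "-z"

for case in ["absolutive", "ergative", "dative"]:
    print(case, tsez_genitive_suffix(case))
```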

In Tsez, you need to know a lot of Tsez grammar to communicate at a basic level. Take the sentence:

English: I like your mother.

Tsez: Дāьр деби энийу йетих. (Dǟr debi eniyu yetix.)

In order to speak that sentence in Tsez, you need to know:

• the words themselves (word order is not as important)
• that the verb -eti- requires the subject to be in the dative/lative case and the object to be in the absolutive
• the noun class for eniyu (class II)
• the dative/lative form of di (I), which is dǟr
• the genitive 1 form of mi (you), which is debi
• the congruence prefix y- that corresponds to the noun class of the absolutive argument of the phrase, in this case mother
• the present tense ending for vowel-final verbs -x

Tsez is rated 6, hardest of all.

Lezgic
Archi

Archi has an extremely complex phonology and one of the most complicated grammars on Earth. The extreme fusional aspects and the verbal morphology are what make the grammar so difficult. Every verb root has 1,502,839 possible forms! It is also an ergative language, but there is irregularity in its ergative system.

Some verbs take the typical ergative/absolutive case (absolutive for the subject of an intransitive verb and ergative for the subject of a transitive verb – where the direct object would be in absolutive). In others the subject is in dative rather than the expected ergative/absolutive case. These are usually verbs of perception like love/want, hear, see, feel, and be bored. For instance, the verb:

-эти- = to love/want must have its subject in dative case instead of the expected absolutive or ergative case.

Among non-click languages, Archi has one of the largest consonant inventories, with only the extinct Ubykh having more. There are 26 vowels and between 76 and 82 consonants, depending on the analysis. Five of the six vowels can occur in five varieties: short, pharyngealized, high tone, long (with high tone), and pharyngealized with high tone.
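The vowel count can be checked with a little arithmetic. The sketch assumes the sixth vowel occurs in only one form, which the text does not say outright; under that assumption the numbers line up.

```python
varieties = [
    "short",
    "pharyngealized",
    "high tone",
    "long with high tone",
    "pharyngealized with high tone",
]
vowels_with_all_varieties = 5   # five of the six vowels
remaining_vowels = 1            # assumed to occur in a single form only

total = vowels_with_all_varieties * len(varieties) + remaining_vowels
print(total)
```

Five vowels in five varieties each, plus one more vowel, gives the 26 quoted above.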

It has many unusual phonemes, including contrasts between several voiceless velar lateral fricatives, voiceless and ejective velar lateral affricates and a voiced velar lateral fricative. The voiceless velar lateral fricative ʟ̝̊, the voiced velar lateral fricative ʟ̝, and the corresponding voiceless and ejective affricates k͡ʟ̝̊ and k͡ʟ̝̊ʼ are extremely unusual sounds, as velar fricatives are not typically laterals.

Samur
Eastern Samur
Lezgi–Aghul–Tabasaran

Nakh
Vainakh

Ingush has a very difficult phonology, an extremely complex grammar, and furthermore, is extremely irregular. Ingush also has a proximate/obviative distinction and is the only language in the region that has this feature. Ingush along with Chechen both have a closed class of verbs, an unusual feature in the world’s languages. New verbs are formed by adding a noun to the verb do:

shoot – do gun

Ingush is rated 6, hardest of all.

Kartvelian
Karto-Zan

One problem with Georgian is the strange alphabet: ქართულია ერთ ერთი რთული ენა. It also has lots of glottal stops that are hard for many foreigners to speak; consonant clusters can be huge – up to eight consonants stuck together (CCCCCCCCVC) – and many consonant sounds are strange. In addition, there are uvulars and ejectives. Georgian is one of the hardest languages on Earth to pronounce. It regularly makes it onto craziest phonologies lists.

Its grammar is exceedingly complex. Georgian is both highly agglutinative and highly irregular, which is the worst of two worlds. Other agglutinative languages such as Turkish and Finnish at least have the benefit of being highly regular. The verbs in particular seem nearly random with no pattern to them at all. The system of argument and tense marking on the verb is exceedingly complex, with tense, aspect, mood on the verb, person and number marking for the subject, and direct and indirect objects.

Although it is an ergative language, the ergative (or active-stative case marking as it is called) oddly enough is only used in the aorist and perfect tenses where the agent in the sentence receives a different case, while the aorist also masquerades as imperative. In the present, there is standard nominative-accusative marking. A single verb can have up to 12 different parts, similar to Polish, and there are six cases and six tenses.

Georgian also features something called polypersonal agreement, a highly complex type of morphological feature that is often associated with polysynthetic languages and to a lesser extent with ergativity.

In a polypersonal language, the verb has agreement morphemes attached to it dealing with one or more of the verb’s arguments (usually up to four arguments). In a non-polypersonal language like English, the verb either shows no agreement or agrees with only one of its arguments, usually the subject. In a polypersonal language, by contrast, the verb agrees with one or more of the subject, the direct object, the indirect object, the beneficiary of the verb, etc. The polypersonal marking may be obligatory or optional.

In Georgian, the polypersonal morphemes appear as either suffixes or prefixes, depending on the verb class and the person, number, aspect and tense of the verb. The affixes also modify each other phonologically when they are next to each other. In the Georgian system, the polypersonal affixes convey subject, direct object, indirect object, genitive, locative and causative meanings.

g-mal-av-en = they hide you
g-i-mal-av-en = they hide it from you

mal (to hide) is the verb, and the other four forms are polypersonal affixes.
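The two forms above can be pulled apart mechanically once they are written with hyphens. This sketch only splits on the hyphens and separates what comes before the root from what comes after; it says nothing about what each affix means.

```python
ROOT = "mal"  # "to hide", as given in the text above

def split_verb(form, root=ROOT):
    """Split a hyphen-segmented verb into (prefixes, root, suffixes)."""
    morphemes = form.split("-")
    i = morphemes.index(root)  # raises ValueError if the root is absent
    return morphemes[:i], root, morphemes[i + 1:]

for form in ["g-mal-av-en", "g-i-mal-av-en"]:
    prefixes, root, suffixes = split_verb(form)
    print(form, "->", prefixes, root, suffixes)
```

The only difference between the two forms is the extra prefix i-, and that one small piece is what changes “they hide you” into “they hide it from you”.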

In the case below,

xelebi ga-m-i-tsiv-d-a = My hands got cold.

xelebi means hands. The m marker indicates genitive or my. With intransitive verbs, Georgian often omits my before the subject and instead puts the genitive onto the verb to indicate possession.

Georgian verbs of motion focus on deixis, whether the goal of the motion is towards the speaker or the hearer. You use a particle to signify who the motion is heading towards. If it is heading towards neither of you, you use no deixis marker. You specify the path taken to reach the goal through the use of prefixes called preverbs, similar to “verbal case.” These come before the deixis marker:

up a-
out ga-
in sha-
down into cha-
across/through garda-
thither mi-
away c’a-
down da-

Hence:

up towards me = amo-. The deixis marker is mo- and up is a-

On the plus side, Georgian has borrowed a great deal of Latinate foreign vocabulary, so that will help anyone coming from a Latinate or Latinate-heavy language background.

Georgian is rated 5, extremely difficult.

Northwest Caucasian

All NW Caucasian languages are characterized by a very small number of vowels (usually only two or three) combined with a vast consonant inventory, the largest consonant inventories on Earth. Almost any consonant can be plain, labialized or palatalized. This is apparently the result of an historical process whereby many vowels were lost and their various features became assigned to consonants. For instance, palatalized consonants may have come from Ci sequences and labialized consonants may have come from Cu sequences.

The grammars of these languages are complex. Unlike the NE Caucasian languages, they have simple noun systems, usually with only a handful of cases.

However, they have some of the most complex verbal systems on Earth. These are some of the most synthetic languages in the Old World. Often the entire syntax of the sentence is contained within the verb. All verbs are marked with ergative, absolutive and direct object morphemes in addition to various applicative affixes.

These are akin to what some might call “verbal case.” For instance, in applicative voice systems, applicatives may take forms such as comitative, locative, instrumental, benefactive and malefactive. These roles are similar to the case system in nouns – even the names are the same. So you can see why some call this “verbal case.”

NW Caucasian verbs can be marked for aspect (whether something is momentary, continuous or habitual) and mood (whether something is certain, likely, desired, potential, or unreal). Other affixes can shape the verb in an adverbial sense, to express pity, excess or emphasis.

These are some of the strangest sounding languages on Earth. Of all of these languages, Abaza has the most consonants. Here is a video in the Abaza language.

Ubykh

Ubykh, a Caucasian language of Turkey, is now extinct, but there is one second language speaker, a linguist who is said to have taught himself the language. It has more consonants than any non-click language on Earth – 84 consonant sounds in all. Furthermore, the phonemic inventory allows some very strange consonant clusters.

Ubykh has many rare consonant sounds. tʷ is otherwise found only in two of Ubykh’s relatives, Abkhaz and Abaza, and in two other languages, both in the Brazilian Amazon. The pharyngealized labiodental voiced fricative vˁ does not exist in any other language. It often makes it onto weirdest phonologies lists. Ubykh also got a very high score on a study of the weirdest languages on Earth.

Combine that with only two vowel sounds and a highly complex grammar, and you have one tough language.

In addition, Ubykh is both agglutinative and polysynthetic, ergative, and has polypersonal agreement:

Aχʲazbatʂʾaʁawdətʷaajlafaqʾajtʾmadaχ!
If only you had not been able to make him take it all out from under me again for them…

There are an incredible 16 morphemes in that nine syllable word.

Ubykh has only four cases on its nouns, but much case function has shifted over to the verb via preverbs and determinants. It is these preverbs and determinants that make Ubykh monstrously complex. The following are some of the directional preverbs:

above and touching

above and not touching

below and touching

below and not touching

at the side of

through a space

through solid matter

on a flat horizontal surface

on a non-horizontal or vertical surface

in a homogeneous mass

towards

in an upward direction

in a downward direction

into a tubular space

into an enclosed space

There are also some preverbal forms that indicate deixis:

j- = towards the speaker

Others can indicate ideas that would take up whole phrases in English:

jtɕʷʼaa- = on the Earth, in the Earth

ʁadja ajtɕʷʼaanaaɬqʼa
They buried his body. (Lit. They put his body in the earth.)

Number is marked on the verb via a verbal suffix and is only marked on the noun in the ergative case.

However, it does lack the convoluted case systems of the Caucasian languages next door and there is no grammatical gender.

Ubykh is rated 6, hardest of all.

Abkhaz-Abazin

Abkhaz is an extremely difficult language to learn. Each basic consonant has eight different positions of articulation in the mouth. Imagine how difficult that would be for an Abkhaz child with a speech impediment. Abkhaz seems to put agreement markers on just about everything in the language. Abkhaz makes it onto many craziest language lists, and it recently got a very high score on a weirdest language study.

Abkhaz is rated 6, hardest of all.

Burushaski

Burushaski is often thought to be a language isolate, related to no other languages. However, I think it is Dene-Caucasian. It is spoken in the Himalaya Mountains of far northern Pakistan in an area called the Hunza. Its verb conjugation is complex, it has a lot of inflections, there are complicated ways of making sentences depending on many factors, and it is an ergative language, which is hard to learn for speakers of non-ergative languages. In addition, there are very few to no cognates for the vocabulary.

Burushaski is rated 6, hardest of all.

American Indian Languages

American Indian languages are also notoriously difficult, though few try to learn them in the US anyway. In the rest of the continent, they are still learned by millions in many different nations. You almost really need to learn these as a kid. It’s going to be quite hard for an adult to get full competence in them.

One problem with these languages is the multiplicity of verb forms. For instance, the standard paradigm for the overwhelming majority of English verbs has a maximum of five forms:

steal
steals
stealing
stole
stolen

Many Amerindian languages have over 1,000 forms of each verb in the language.

Kootenai

Yet the Salishans (see below) always considered the neighboring language Kootenai to be too hard to learn. Kootenai also has a distinction between proximate/obviate along with direct/inverse alignment, probably from contact with Algonquian.

However, the Kootenai direct/inverse system is less complex than Algonquian’s, as it is present only in the 3rd person. Kootenai also has a very strange feature in that it has particles that look like subject pronouns, but these go outside of the full noun phrase. This is a very rare feature in the world’s languages. Kootenai scored very high on a weirdest language survey.

Kootenai is an isolate spoken in Idaho by 100 people.

Kootenai is rated 6, hardest of all.

Yuchi

Yuchi is a language isolate spoken in the Southern US. They were originally located in Eastern Tennessee and were part of the Creek Confederacy at one time. Yuchi is nearly extinct, with only five remaining speakers.

Yuchi has noun genders or classes based on three distinctions of position: standing, sitting or lying. All nouns are either standing, sitting or lying. Trees are standing, and rivers are lying, for instance. If it is taller than it is wide, it is standing. If it is wider than it is tall, it is lying.

If it is about as wide as it is tall, it is sitting. All nouns are one of these three genders, but you can change the gender for humorous or poetic effect. A linguist once asked a group of female speakers whether a penis was standing, sitting or lying. After lots of giggles, they said the default was sitting, but you could say it was standing or lying for poetic effect.
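The shape rule of thumb for the three position classes can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not a description of how speakers actually assign nouns. The function name and the tolerance value for "about as wide as it is tall" are assumptions of mine.

```python
def yuchi_position_class(height, width, tolerance=0.2):
    """Guess the position class from an object's rough proportions.

    tolerance is a hypothetical cutoff for "about as wide as it is tall".
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    ratio = height / width
    if ratio > 1 + tolerance:
        return "standing"  # taller than it is wide
    if ratio < 1 / (1 + tolerance):
        return "lying"     # wider than it is tall
    return "sitting"       # roughly as wide as it is tall


print(yuchi_position_class(20, 2))    # something tree-shaped
print(yuchi_position_class(1, 100))   # something river-shaped
print(yuchi_position_class(1, 1))
```

Since speakers can switch the class for humorous or poetic effect, a rule like this could at best give a default.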

Also all Yuchi pronouns must make a distinction between age (older or younger than the speaker) and ethnicity (Yuchi or non-Yuchi).

Yuchi gets a 6 rating, hardest of all.

Dene-Yeniseian
Na-Dene
Athabascan-Eyak
Tlingit

Tlingit is probably one of the hardest, if not the hardest, languages in the world. Tlingit is analyzed as partly synthetic, partly agglutinative, and sometimes polysynthetic. It has not only suffixes and prefixes, but it also has infixes, or affixes in the middle of words.

‘eech – to pick

All prefixes must be in proper order for the word to work.

tuyakaoonagadagaxayaeecheen.
I am usually picking, on purpose, a long object through the hole while standing on a table.

tuyakaoonagootxaya‘eecheen.
I am usually being forced to pick a long object through the hole while standing on a table.

tuyaoonagootxawa’eecheen.
I am usually picking the edible long object through the hole while standing on a table.

Tlingit has a pretty unusual phonology. For one thing, it is the only language on Earth with no l. This despite the fact that it has five other laterals: dl (tɬ), tl (tɬʰ), tl’ (tɬʼ), l (ɬ) and l’ (ɬʼ). The tɬʼ and ɬʼ sounds are rare in the world’s languages. ɬʼ is only found in the wild NW Caucasian languages. It also has two labialized glottal consonants, ʔʷ and hw (hʷ).

Tlingit gets a 6 rating, hardest of all.

Athabascan
Southern

Navajo has long, short and nasal vowels, a tone system and a grammar totally unlike anything in Indo-European. A stem of only four letters or so can take enough affixes to fill a whole line of text.

Navajo is a polysynthetic language. In polysynthetic languages, very long words can denote an entire sentence, and it’s quite hard to take the word apart into its parts and figure out exactly what they mean and how they go together. The long words are created because polysynthetic languages have an amazing amount of morphological richness. They put many morphemes together to create a word out of what might be a sentence in a non-polysynthetic language.

Some Navajo dictionaries have thousands of entries of verbs only, with no nouns. Many adjectives have no direct translation into Navajo. Instead, verbs are used as adjectives. A verb has no particular form like in English – to walk. Instead, it assumes various forms depending on whether or not the action is completed, incomplete, in progress, repeated, habitual, one time only, instantaneous, or simply desired. These are called aspects. Navajo must have one of the most complex aspect systems of any language:

The Primary aspects:

Momentaneous – punctually (takes place at one point in time)
Continuative – an indefinite span of time & movement with a specified direction
Durative – over an indefinite span of time, non-locomotive uninterrupted continuum
Repetitive – a continuum of repeated acts or connected series of acts
Conclusive – like durative but in perfective terminates with static sequel
Semelfactive – a single act in a repetitive series of acts
Distributive – a distributive manipulation of objects or performance of actions
Diversative – a movement distributed among things (similar to distributive)
Reversative – results in directional change
Conative – an attempted action
Transitional – a shift from one state to another
Cursive – progression in a line through time/space (only progressive mode)

The subaspects:

Completive – an event/action simply takes place (similar to the aorist tense)
Terminative – a stopping of an action
Stative – sequentially durative and static
Inceptive – beginning of an action
Terminal – an inherently terminal action
Prolongative – an arrested beginning or ending of an action
Seriative – an interconnected series of successive separate & distinct acts
Inchoative – a focus on the beginning of a non-locomotion action
Reversionary – a return to a previous state/location
Semeliterative – a single repetition of an event/action

The tense system is almost as wild as the aspectual system.

For instance, the verb ndideesh means to pick up or to lift up. But it varies depending on what you are picking up:

ndideeshtiil – to pick up a slender stiff object (key, pole)
ndideeshleel – to pick up a slender flexible object (branch, rope)
ndideesh’aal – to pick up a roundish or bulky object (bottle, rock)
ndideeshgheel – to pick up a compact and heavy object (bundle, pack)
ndideeshjol – to pick up a non-compact or diffuse object (wool, hay)
ndideeshteel – to pick up something animate (child, dog)
ndideeshnil – to pick up a few small objects (a couple of berries, nuts)
ndideeshjih – to pick up a large number of small objects (a pile of berries, nuts)
ndideeshtsos – to pick up something flexible and flat (blanket, piece of paper)
ndideeshjil – to pick up something I carry on my back
ndideeshkaal – to pick up anything in a vessel
ndideeshtloh – to pick up mushy matter (mud).

But picking up is only one way of handling the 12 different consistencies. One can also bring, take, hang up, keep, carry around, turn over, etc. objects. There are about 28 different verbs one can use for handling objects. If we multiply these verbs by the consistencies, there are over 300 different verbs used just for handling objects.
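The arithmetic behind that figure is easy to check. The following Python sketch takes the 12 stem endings from the "pick up" list above and multiplies by the roughly 28 handling verbs. The variable and function names are mine, and treating each form as the shared base ndideesh plus an ending simply follows the spellings given above rather than a real morphological analysis.

```python
# Endings of the 12 "pick up" forms listed above, one per consistency class.
CONSISTENCY_STEMS = [
    "tiil", "leel", "’aal", "gheel", "jol", "teel",
    "nil", "jih", "tsos", "jil", "kaal", "tloh",
]
HANDLING_VERB_COUNT = 28  # approximate figure given in the text


def handling_verb_total(n_consistencies, n_handling_verbs):
    """Number of distinct handling verbs if every combination exists."""
    return n_consistencies * n_handling_verbs


# Rebuild the "pick up" column from the shared base plus each ending.
pick_up_forms = ["ndideesh" + stem for stem in CONSISTENCY_STEMS]

total = handling_verb_total(len(CONSISTENCY_STEMS), HANDLING_VERB_COUNT)
print(len(pick_up_forms), "forms for 'pick up' alone")
print(total, "handling verbs in all")
```

Twelve times 28 comes to 336, which is where "over 300" comes from, assuming every handling verb really does combine with every consistency.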

In Navajo textbooks, there are conjugation tables for inflecting words, but it’s pretty hard to find a pattern there. One of the most frustrating things about Navajo is that every little morpheme you add to a word seems to change everything else around it, even in both directions.

Navajo is said to have a very difficult system for counting numerals.

There is also a noun classifier system with more than a dozen classifiers that affect inflection. This is quite a few classifiers even for a noun classifier language and is similar to African languages like Zulu. In addition, it has the strange direct/inverse system.

To add insult to injury, Navajo is an ergative language.

Navajo also has an honorifics or politeness system similar to Japanese or Korean.

Navajo also has the odd feature where the word niinaa – because can be analyzed as a verb.

X áhóót’įįd biniinaa…
Because X happened…

Shiniinaa sits’il.
It broke into pieces because of me.

In the latter sentence, the only way we know that 1st singular was involved is because of the person marking on niinaa.

There are 25 different kinds of pronominal prefixes that can be piled onto one another before a verb base.

Navajo has a very strange feature called animacy, where nouns take certain verbs according to their rank in the hierarchy of animation which is a sort of a ranking based on how alive something is. Humans and lightning are at the top, children and large animals are next and abstractions are at the bottom.

All in all, Navajo, even compared to other polysynthetic languages, has some of the most incredibly complicated polysynthetic morphology of any language. On craziest grammar and craziest language lists, Navajo is typically listed.

It is even said that Navajo children have a hard time learning Navajo as compared to children learning other languages, but Navajo kids definitely learn the language. As with Hopi below, linguists find even the best Navajo grammars difficult or even impossible to understand.

However, Navajo is quite regular, a common feature in Amerindian languages.

Navajo is rated 6, hardest of all.

Northern

Slavey, a Na-Dene language of Canada, is hard to learn. It is similar to Navajo and Apache. Verbs take up to 15 different prefixes. All Athabascan languages have wild verbal systems. It also uses a completely different alphabet, a syllabic one designed for Canadian Indians.

Slavey is rated 6, hardest of all.

Haida

Haida is often thought to be a Na-Dene language, but proof of its status is lacking. If it is Na-Dene, it is the most distant member of the family. Haida is in the competition for the most complicated language on Earth, with 70 different suffixes.

Haida is rated 6, hardest of all.

Salishan

The Salishan languages spoken in the Northwest have a long reputation for being hard to learn, in part because of long strings of consonants, in one case 11 consonants long. Salish languages are the only languages on Earth that allow words without sonorants.

Many of the vowels and consonants are not present in most of the world’s widely spoken languages. The Salish languages are, like Chukchi, polysynthetic. Some translations treat all Salish words as either verbs or phrases. Some say that Salish languages do not contain nouns, though this is controversial. The verbal system of Salish languages is absurdly complex.

All Salishan languages are rated 6, hardest of all.

Nuxálk (Bella Coola)

Nuxálk is a notoriously difficult Salishan Amerindian language spoken in British Columbia. It is famous for having some really wild words and even sentences that don’t seem to have any vowels in them at all. For instance:

xłp̓x̣ʷłtłpłłskʷc̓ (xɬpʼχʷɬtʰɬpʰɬːskʷʰt͡sʼ in IPA)
He had a bunchberry plant.

Nuyamłamkis timantx tisyuttx ʔułtimnastx.
The father sang the song to his son.

Musis tiʔimmllkītx taq̓lsxʷt̓aχ.
The boy felt that rope.

However, this word is not typically used by speakers and by no means do most words consist of all consonants. The language sounds odd when spoken. It has been described as “whispering while chewing on a granola bar” (see the video sample under Montana Salish below).

These wild consonant clusters are even crazier than the ones in Ubykh and NW Caucasian. In fact, the nutty consonant clusters in Salish are causing a debate in linguistics about whether or not the syllable is even a universal phenomenon in language, as some Salish words and phrases appear to lack syllables. Some Berber dialects have raised similar questions about the syllable.

This link shows an elder on the Flathead Indian Reservation in Montana, Steven Smallsalmon, speaking Montana Salish. He also leads classes in the language. This is probably one of the strangest sounding languages on Earth.

Montana Salish is rated 6, hardest of all.

Central

Straits Salish has an aspectual distinction between persistent and nonpersistent. Persistent means the activity continues after its inception as a state. The persistent morpheme is -í. The result is similar to English:

figure out – nonpersistent
know – persistent

look at – nonpersistent
watch – persistent

take – nonpersistent
hold – persistent

-í is referred to as a “parasitic morpheme” and only occurs in a stem that has an underlying ə which serves as a “host” for the -í morpheme.

How strange.

The Saanich dialect of Straits Salish is often listed in the rogue’s gallery of craziest grammars on Earth. The writing system is often listed as one of the worst out there. In addition, Saanich makes it onto craziest grammars lists for the parasitic morphemes and for having no distinction between nouns and verbs!

Straits Salish gets a 6 rating, hardest of all.

Halkomelem, spoken by 570 people around Vancouver, British Columbia, is widely considered to be one of the hardest languages on Earth to learn. In Halkomelem, many verbs have an orientation towards water. You can’t just say, She went home. You have to say how she was going home in relation to nearby bodies of water. So depending on where she was walking home in relation to the nearest river, you would say:

She was farther away from the water and going home.
She was coming home in the direction away from the water.
She was walking parallel to the flow of the water downstream.
She was walking parallel to the flow of the water upstream.

Halkomelem gets a 6 rating, hardest of all.

Lushootseed

Lushootseed is said to be just as hard to learn as Nuxálk. Lushootseed is one of the few languages on Earth that has no nasals at all, except in special registers like baby talk and the archaic speech of mythological figures. It also has laryngealized glides and nasals: w̰, m̥̰, and n̥̰.

Lushootseed is rated 6, hardest of all.

Iroquoian

All Iroquoian languages are extremely difficult, but Athabaskan is probably even harder. Siouan languages may be equal to Iroquoian in difficulty.

Compare the same phrases in Tlingit (Athabaskan) and Cherokee (Iroquoian).

Tlingit:

kutíkusa‘áat – It’s cold outside.
kutíkuta‘áat – It’s cold right now.

In Tlingit, you can add or modify affixes at the beginning as prefixes, in the middle as infixes and at the end as suffixes. In the above example, you changed a part of the word within the clause itself.

Cherokee:

doyáditlv uyvtlv – It is cold outside. (Lit. Outside it is cold.)
ka uyvtlv – It is cold now. (Lit. Now it is cold.)

As you can see, Cherokee is easier.

Cherokee

Cherokee is very hard to learn. In addition to everything else, it has a completely different alphabet. It’s polysynthetic, to make matters worse. It is possible to write a Cherokee sentence that somehow lacks a verb. There are five categories of verb classifiers. Verbs needing classifiers must use one. Each regular verb can have an incredible 21,262 inflected forms! All verbs contain a verb root, a pronominal prefix, a modal suffix and an aspect suffix. In addition, verbs inflect for singular, plural and also dual. For instance:

ᎠᎸᎢᎭ a'lv'íha
You have 126 different forms:
ᎬᏯᎸᎢᎭ gvyalv'iha I tie you up
ᏕᎬᏯᎸᎢᎭ degvyalviha I'm tying you up
ᏥᏯᎸᎢᎭ jiyalv'ha I tie him up
ᎦᎸᎢᎭ I tie it
ᏍᏓᏯᎸᎢᎭ sdayalv'iha I tie you (dual)
ᎢᏨᏯᎢᎭ ijvyalv'iha I tie you (pl)
ᎦᏥᏯᎸᎢᎭ gajiyalv'iha I tie them (animate)
ᏕᎦᎸᎢᎭ I tie them up (inanimate)
ᏍᏆᎸᎢᎭ squahlv'iha You tie me
ᎯᏯᎸᎢᎭ hiyalv'iha You're tying him
ᎭᏢᎢᎭ hatlv'iha You tie it
ᏍᎩᎾᎸᎢᎭ skinalv'iha You're tying me and him
ᎪᎩᎾᏢᎢᎭ goginatlv'iha They tie me and him etc.

Let us look at another form:

to see

I see myself gadagotia
I see you gvgohtia
I see him/her tsigotia
I see it tsigotia
I see you two advgotia
I see you (plural) istvgotia
I see them (live) gatsigotia
I see them (things) detsigotia

You see me sgigotia
You see yourself hadagotia
You see him/her higo(h)tia
You see it higotia
You see another and me sginigotia
You see others and me isgigotia
You see them (living) dehigotia
You see them (living) gahigotia
You see them (things) detsigotia

He/she sees me agigotia
He/she sees you tsagotia
He/she sees you atsigotia
He/she sees him/her agotia
He/she sees himself/herself adagotia
He/she sees you + me ginigotia
He/she sees you two sdigotia
He/she sees another + me oginigotia
He/she sees us (them + me) otsigotia
He/she sees you (plural) itsigotia
He/she sees them dagotia

You and I see him/her/it igigotia
You and I see ourselves edadotia
You and I see one another denadagotia/dosdadagotia
You and I see them (living) genigotia
You and I see them (living or not) denigotia

You two see me sgninigotia
You two see him/her/it esdigotia
You two see yourselves sdadagotia
You two see us (another and me) sginigotia
You two see them desdigotia

Another and I see you sdvgotia
Another and I see him/her osdigotia
Another and I see it osdigotia
Another and I see you-two sdvgotia
Another and I see ourselves dosdadagotia
Another and I see you (plural) itsvgotia
Another and I see them dosdigotia

You (plural) see me isgigoti
You (plural) see him/her etsigoti

They see me gvgigotia
They see you getsagotia
They see him/her anigoti
They see you and me geginigoti
They see you two gesdigoti
They see another and me gegigotia/gogenigoti
They see you (plural) getsigoti
They see them danagotia
They see themselves anadagoti

I will see datsigoi
I saw agigohvi

He/she will see dvgohi
He/she saw ugohvi

Number is marked for inclusive vs. exclusive and there is a dual. 3rd person plural is marked for animate/inanimate. Verbs take different object forms depending on if the object is solid/alive/indefinite shape/flexible. This is similar to the Navajo system.

Cherokee also has lexical tone, with complex rules about how tones may combine with each other. Tone is not marked in the orthography. The phonology is noted for somehow not having any labial consonants.

However, Cherokee is very regular. It has only three irregular verbs. It is just that there are many complex rules.

Wyandot, a dormant language that has been extinct for about 50 years, has some unbelievably complex structures. Let us look at one of them. Wyandot is the only language on Earth that allows negative sentences that somehow do not contain a negative morpheme. Wyandot makes it onto craziest grammars lists. (To be continued).

Lakota and other Siouan languages may well be as convoluted as Iroquoian. In Lakota, all adjectives are expressed as verbs. Something similar is seen in Nahuatl.

Ógle sápe kiŋ mak’ú.
The shirt it is black he gave it to me.
He gave me the black shirt.

In the above, it is black is a stative verb and serves as an adjective.

Ógle kiŋ sabyá mak’ú.
Shirt the blackly he gave it to me.
He gave me the black shirt. (Lit. He gave me the shirt blackly.)

Blackly is an adverb serving as an adjective above.

Lakota gets a 5.5 rating, hardest of all.

Algic
Algonquian

All Algonquian languages have distinctions between animate/inanimate nouns, in addition to having proximate/obviate and direct/inverse distinctions. However, most languages that have proximate/obviate and direct/inverse distinctions are not as difficult as Algonquian.

Proximate/obviative is a way of marking the 3rd person in discourse. It distinguishes between an important 3rd person (proximate) and a more peripheral 3rd person (obviative). Animate nouns and possessor nouns tend to be marked proximate while inanimate nouns and possessed nouns tend to be marked obviative.

Direct/inverse is a way of marking discourse in terms of saliency, topicality or animacy. A noun that ranks higher than another in terms of saliency, topicality or animacy also ranks higher in terms of the person hierarchy. It is used only in transitive clauses. When the subject has a higher ranking than the object, the direct form is used. When the object has a higher ranking than the subject, the inverse form is used.
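The choice between direct and inverse can be sketched as a simple comparison of ranks. In the Python sketch below, the particular hierarchy (2nd person over 1st, over proximate 3rd, over obviative 3rd) is an assumption added for illustration, since the text above does not spell the ranking out. The names PERSON_HIERARCHY and verb_direction are hypothetical.

```python
# Assumed ranking for illustration only, highest-ranked first.
PERSON_HIERARCHY = ["2", "1", "3-proximate", "3-obviative"]
RANK = {person: i for i, person in enumerate(PERSON_HIERARCHY)}


def verb_direction(subject, obj):
    """Return 'direct' if the subject outranks the object, else 'inverse'."""
    s, o = RANK[subject], RANK[obj]
    if s == o:
        raise ValueError("subject and object must differ in rank")
    # A lower index means a higher position in the hierarchy.
    return "direct" if s < o else "inverse"


print(verb_direction("1", "3-proximate"))            # subject outranks object
print(verb_direction("3-obviative", "3-proximate"))  # object outranks subject
```

Only transitive clauses are covered, since an intransitive clause has no object to compare against.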

Central Algonquian
Cree-Montagnais

Cree is very hard to learn. It is written in a variety of different ways with different alphabets and syllabic systems, complicating matters even further. The syllabic alphabet has many problems and is often listed as one of the worst scripts out there. Cree is polysynthetic and has long, short and nasal vowels and aspirated and unaspirated voiceless consonants. Words are divided into metrical feet, the rules for determining stress placement in words are quite complex and there is lots of irregularity. Vowels fall out a lot, or syncopate, within words.

Cree adds noun classifiers to the mix, and both nouns and verbs are marked as animate or inanimate. In addition, verbs are marked for transitive and intransitive. In addition, verbs get different affixes depending on whether they occur in main or subordinate clauses.

Cree is rated 6, hardest of all.

Ojibwa-Potawatomi

Ojibwa is said to be about as hard to learn as Cree, as it is very similar.

Ojibwa is rated 6, hardest of all.

Plains Algonquian
Cheyenne

Cheyenne is well-known for being a hard Amerindian language to learn. Like many polysynthetic languages, it can have very long words.

However, Cheyenne is quite regular, but has so many complex rules that it is hard to figure them all out.

Cheyenne is rated 6, hardest of all.

Arapahoan

Arapaho has a strange phonology. It lacks phonemic low vowels: the vowel system consists of i, ɨ~u, ɛ, and ɔ. Each vowel also has a corresponding long version. In addition, there are four diphthongs, ei, ou, oe and ie, several triphthongs, eii, oee, and ouu, as well as extended sequences of vowels such as eee with stress on either the first or the last vowel in the combination. Long vowels of various types are common:

Héétbih’ínkúútiinoo.
I will turn out the lights.

Honoosóó’.
It is raining.

There is a pitch accent system with normal, high and allophonic falling tones. Arapaho words also undergo some very wild sound changes.

Arapaho is rated 6, hardest of all.

Gros Ventre has a similar phonological system and similar elaborate sound changes as Arapaho.

Gros Ventre is rated 5, hardest of all.

Caddoan
Northern
Wichita

Wichita has many strange phonological traits. It has only one nasal. Labials are rare and appear in only two roots. It also may have only three vowels, i, e, and a, with only height as a distinction. Such a restricted vertical vowel distribution is otherwise only found in NW Caucasian and the Papuan Ndu languages. There is apparently a three-way contrast in vowel length – regular, long and extra-long. This is otherwise only found in Mixe and Estonian.

There are some interesting tenses. Perfect tense means that an act has been carried out. The strange intentive tense means that one hopes or hoped to carry out an act. The habitual tense means one regularly engages in the activity, not that one is doing so at the moment.

Long consonant clusters are permitted.

kskhaːɾʔa

nahiʔinckskih
while sleeping

There are many cases where a CVɁ sequence has been reduced to CɁ due to loss of the vowel, resulting in odd words such as:

ki·sɁ
bone

Word order is determined by novelty or importance.

hira:wisɁiha:s kiyari:ce:hire:
Our ancestors God put us on this Earth.

weɁe hira:rɁ tiɁi na:kirih
God put our ancestors on this Earth.

In the sentence above, “our ancestors” is actually the subject, so it makes sense that it comes first.

Wichita has inclusive and exclusive 1st person plural and has singular, dual and plural. There is an evidential system where if you say you know something, you must say how you know it – whether it is personal knowledge or hearsay.

Wichita gets a 6 rating, hardest of all.

Hokan
Tequislatecan
Coastal Chontal

Huamelutec or Lowland Oaxaca Chontal has the odd glottalized fricatives fʼ, sʼ, ɬʼ and xʼ as its only glottalized consonants. They alternate with plain f, s, l and x. fʼ, ɬʼ and xʼ are extremely rare in the world’s languages, usually only found in 2-3 other languages, often in NW Caucasian. xʼ occurs only in one other language – Tlingit. sʼ is slightly more common, occurring in five other languages including Tlingit. In other languages, these odd sounds derived from sequences of consonant + q: Cq -> Cʔ -> glottalized fricative.

Sentence structure is odd:

Hit the ball the man.
Hit the man the ball.
The man hit the ball.

All mean the same thing.

Huamelutec gets a 6 rating, hardest of all.

Karok

Karok is a language isolate spoken by a few dozen people in northern California. The last native speaker recently died; however, there are ~80 who have varying levels of L2 fluency.

In Karok, you can use a suffix for different types of containment – fire, water or a solid.

pa:θ-kirih
throw into a fire

pa:θ-kurih
throw into water

pa:θ-ruprih
throw through a solid

The suffixes are unrelated to the words for fire, water and solid.

Karok gets a 5 rating, hardest of all.

Uto-Aztecan
Northern

Hopi is so difficult that even grammars describing the language are almost impossible to understand. For instance, Hopi has two different words for and depending on whether the noun phrase containing the word and is nominative or accusative.

Hopi is rated 6, hardest of all.

Southern Uto-Aztecan
Corachol-Aztecan
Core Nahua
Nahuatl

In Nahuatl, most adjectives are simply stative verbs. Hence:

Umntu omde waya eTenochtitlan.
The man he is tall went to Tenochtitlan.
The tall man went to Tenochtitlan.

He is tall is a stative verb in the above.

Nahuatl gets a 6 rating, hardest of all.

Numic
Central Numic

Comanche is legendary for being one of the hardest Indian languages of all to learn. Reasons are unknown, but all Amerindian languages are quite difficult. I doubt if Comanche is harder than other Numic languages.

Bizarrely enough, Comanche has very strange sounds called voiceless vowels, which seems to be an oxymoron, as vowels would seem to be inherently voiced. English has something akin to voiceless vowels in the words particular and peculiar, where the vowel of the first syllable acts something akin to a voiceless vowel.

Comanche was used for a while by the codespeakers in World War 2 – not all codespeakers were Navajos. Comanche was specifically chosen because it was hard to figure out. The Japanese were never able to break the Comanche code.

Comanche is rated 6, hardest of all.

Oto-Manguean
Western Oto-Mangue
Oto-Pame-Chinantecan
Chinantecan

Chinantec, an Indian language of southwest Mexico, is very hard for non-Chinantecs to learn. The tone system is maddeningly complex, and the syntax and morphology are very intricate.

Chinantec is rated 6, hardest of all.

Popolocan
Mazatecan
Lowland Valley
Southern

Jalapa Mazatec has distinctions between modal, creaky and breathy-voiced vowels, along with nasal versions of those three. It also has creaky consonants and voiceless nasals. It has three tones: low, mid and high. Combining the tones results in various contour tones. In addition, it has a 3-way distinction in vowel length. Whistled speech is also possible. It has a phonemic distinction between “ballistic” and “controlled” syllables, which is only present in Oto-Manguean languages.

Maipurean
Northern
Upper Amazon
Eastern Nawiki

Tariana is a very difficult language mostly because of the unbelievable amount of information it crams into its morphology and syntax. This is mostly because it is an Arawakan language that has been heavily influenced by neighboring Tucanoan languages, with the result that it has many of the grammatical categories and particles present in both families.

This stems from the widespread bilingualism in the Vaupes Basin of Colombia, where many people grow up bilingual from childhood and often become multilingual by adulthood. Learning up to five different languages is common. Code-switching was frowned upon and anyone using a word from Language Y while speaking Language X would get laughed at. Hence the various languages tended to borrow features from each other quite easily.

For instance, Tariana has both a noun classifier system and a gender system. Noun classifiers and gender are sometimes subsumed under the single category of “noun classifiers.” Yet Tariana has both, presumably from its relationship to two completely different language families. So in Tariana it is not unusual to get both demonstratives and verbs marked for both gender and noun classifier. Tariana borrowed such things as serialized perception verbs and the dubitative marker from Tucano.

In addition, Tariana has some very odd sounds, including aspirated nasals mh (mʰ), nh (n̺ʰ) and ñh (ɲʰ) and an aspirated w (wʰ) of all things. They seem to be actually aspirated, not just partially devoiced as many voiceless nasals and liquids are.

Tariana gets 6, hardest of all.

Huitotoan
Proto-Bora-Muinane

Bora, a Witotoan language spoken in Peru and Colombia near the border between the two countries, has a mind-boggling 350 different noun classes. The noun classifier system is actually highly productive and is often used to create new nouns. New nouns can be created very easily, and their meanings are often semantically transparent. In some noun classifier systems, classifiers can be stacked one upon the other. In these cases, typically the last one is used for agreement purposes.

Bora also is a tonal language, but it has only two tones. In addition, nearly all consonantal phonemes have phonemic aspirated and palatalized counterparts. The agreement structure in the language is also quite convoluted. The classifier system effectively replaces much derivational morphology on the noun and noun compounding processes that other languages use to expand the meanings of nominals.

Bora gets a 6 rating, hardest of all.

Tucanoan
Eastern Tucanoan
Bará-Tuyuka

Tuyuca is a Tucanoan language spoken by 450 people in the department of Vaupés in Colombia. An article in The Economist magazine concluded that it was the hardest language on Earth to learn.

It has a simple sound system, but it’s agglutinative, and agglutinative languages are pretty hard. For instance, hóabãsiriga means I don’t know how to write. It has two forms of 1st person plural, I and you (inclusive) and I and the others (exclusive). It has between 50 and 140 noun classes, including strange ones like bark that does not cling closely to a tree, which can be extended to mean baggy trousers or wet plywood that has begun to fall apart.

Like Yamana, a nearly extinct Amerindian language of Chile, Tuyuca marks for evidentiality, that is, how it is that you know something. For instance:

Diga ape-wi. = The boy played soccer. (I saw him playing).
Diga ape-hiyi. = The boy played soccer. (I assume he was playing soccer, though I did not see it firsthand).

Evidential marking is obligatory on all Tuyuca verbs and it forces you to think about how you know whatever it is you know.
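To make the idea concrete, here is a small Python sketch that reads the evidence source off the verb ending. It only knows the two suffixes quoted above, and the table and function names are my own inventions for illustration, not anything from a Tuyuca grammar.

```python
# Only the two evidential endings from the examples above.
# The English descriptions paraphrase the glosses given in the text.
EVIDENTIAL = {
    "wi": "I saw it myself",
    "hiyi": "I assume it, but did not see it firsthand",
}

def evidence_source(verb: str) -> str:
    """Return how the speaker knows, based on the suffix after the last hyphen."""
    _root, _sep, suffix = verb.rpartition("-")
    return EVIDENTIAL[suffix]

print(evidence_source("ape-wi"))    # I saw it myself
print(evidence_source("ape-hiyi"))  # I assume it, but did not see it firsthand
```

Note that there is no entry for a verb with no evidential at all, which mirrors the fact that the marking is obligatory.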

However, verbs can function as adjectives, and the adjective roots can either turn into nouns themselves or they can take the inflections of either nouns or verbs. Wild!

Similar to how the grammar of Tariana has been influenced by Tucano languages, the grammar of Tucanoan Cubeo has been influenced by neighboring Arawakan languages. The grammar has been described as either SOV or OVS. That would mean that the following:

Just by looking at any given consonant-initial suffix, it is impossible to determine which of the first three categories it belongs to. They must be learned one by one.

Cubeo has nasal assimilation, common to many Amazonian languages. In some of these, nasalization is best analyzed at the syllable level – some syllables are nasal and others are not.

dĩ-bI-ko
/dĩ-bĩ-ko/nĩmĩko
She recently went.

The underlying form dĩ-bI-ko is realized on the surface as nĩmĩko. The ĩ in dĩ-bI-ko nasalizes the d, the b, and the I on either side of it, so nasal spreading works in both directions. However, it is blocked from the third syllable because k is part of a class of non-nasalizable consonants.
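The spreading rule can be sketched in a few lines of Python. This is a toy model under my own assumptions: nasality spreads outward from a nasal vowel one segment at a time and stops at a blocking consonant. Only k is named as a blocker in the description above, so the blocker set contains just k, and the nasal counterpart listed for o is assumed rather than taken from the example.

```python
import unicodedata

def nfc(text: str) -> str:
    """Normalize so that each nasal vowel is a single code point."""
    return unicodedata.normalize("NFC", text)

# Nasal counterparts of the segments in the example. "I" is the vowel written
# without nasality in the underlying form; the entry for "o" is an assumption.
NASAL_OF = {"d": "n", "b": "m", "I": nfc("ĩ"), "o": nfc("õ")}
NASAL_VOWELS = {nfc("ĩ"), nfc("õ")}
BLOCKERS = {"k"}  # non-nasalizable consonants; only k is named in the text

def spread_nasality(underlying: str) -> str:
    """Spread nasality left and right from each nasal vowel until blocked."""
    segments = list(nfc(underlying.replace("-", "")))
    sources = [i for i, seg in enumerate(segments) if seg in NASAL_VOWELS]
    for src in sources:
        for step in (-1, 1):  # spread leftward, then rightward
            i = src + step
            while 0 <= i < len(segments) and segments[i] not in BLOCKERS:
                segments[i] = NASAL_OF.get(segments[i], segments[i])
                i += step
    return "".join(segments)

print(spread_nasality("dĩ-bI-ko"))  # nĩmĩko
```

Because the loop stops as soon as it reaches k, the final o is never visited, which is the blocking effect described above.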

Pretty difficult language.

Cubeo gets a 6 rating, hardest of all.

Carib
Waiwai

Hixkaryána is famous for being the only language on Earth to have basic OVS (Object-Verb-Subject) word order.

The sentence Toto yonoye kamara, or The man ate the jaguar, actually means The jaguar ate the man.
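A tiny Python sketch shows why the word-for-word reading goes wrong. It assumes a three-word sentence in strict Object-Verb-Subject order and uses only the glosses implied by the example above; the function and table names are made up for illustration.

```python
# Word-for-word glosses implied by the example sentence above.
GLOSS = {"toto": "the man", "yonoye": "ate", "kamara": "the jaguar"}

def ovs_to_svo(sentence: str) -> tuple[str, str, str]:
    """Split a three-word OVS sentence and return (subject, verb, object)."""
    obj, verb, subj = sentence.split()
    return subj, verb, obj

subj, verb, obj = ovs_to_svo("toto yonoye kamara")
print(" ".join(GLOSS[word] for word in (subj, verb, obj)))  # the jaguar ate the man
```

An English speaker instinctively takes the first noun as the subject, so the learner has to reverse that habit for every transitive sentence.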

Grammatical suffixes attached to the end of the verb mark not only number but also aspect, mood and tense.

Hixkaryána gets a 6 rating, hardest of all.

Nambikwaran
Mamaindê

This is actually a series of closely related languages as opposed to one language, but the Southern Nambikwara language is the most well-known of the family, with 1,200 speakers in the Brazilian Amazon.

Phonology is complex. Consonants distinguish between aspirated, plain and glottalized, common in the Americas. There are strange sounds like prestopped nasals and glottalized fricatives. There are nasal vowels and three different tones. All vowels except one have nasal, creaky-voiced and nasal-creaky counterparts, for a total of 19 vowels.

The grammar is polysynthetic with a complex evidential system.

Reportedly, Nambikwara children do not pick up the language fully until age 10 or so, one of the latest recorded ages for full competence. Nambikwara is sometimes said to be the hardest language on Earth to learn, but it has some competition.

Nambikwara definitely gets a 6 rating, hardest of all!

Muran

Pirahã is a language isolate spoken in the Brazilian Amazon. Recent writings by Daniel Everett indicate that not only is this one of the hardest languages on Earth to learn, but it is also one of the weirdest languages on Earth. It is monumentally complex in nearly every way imaginable. It is commonly listed on the rogue’s gallery of craziest languages and phonologies on Earth.

It has the smallest phonemic inventory on Earth with only seven consonants, three vowels and either two or three tones. Everett recently wrote a paper about it after spending many years with the Pirahã. Previous missionaries who had spent time with the Pirahã generally failed to learn the language because it was too hard to learn. It took Everett a very long time, but he finally learned it well.

Many of Everett’s claims about Pirahã are astounding: whistled speech, no system for counting, very few Portuguese loans (they deliberately refuse to use Portuguese loans), evidence for the Sapir-Whorf linguistic relativity hypothesis, and evidence that it violates some of Noam Chomsky’s purported language universals such as embedding. It also has the t͡ʙ̥ sound – a bilabially trilled postdental affricate which is only found in two other languages, both in the Brazilian Amazon – Oro Win and Wari’.

Initially, Everett never heard the sound, but as the Pirahã got to know him better, they started to make it more often. Everett believes that they were ridiculed by other groups when they made the odd sound.

Pirahã has the simplest kinship system in any language – there is only one word for both mother and father, and the Pirahã do not have any words for anyone other than direct biological relatives.

Pirahã may have only two numerals, or it may lack a numeral system altogether.

Pirahã does not distinguish between singular and plural person. This is highly unusual. The language may have borrowed its entire pronoun set from the Tupian languages Nheengatu and Tenharim, groups the Pirahã had formerly been in contact with. This may be one of the only attested cases of the borrowing of a complete pronoun set.

There are mandatory evidentiality markers that must be used in Pirahã discourse. Speakers must say how they know something, whether they saw it themselves, whether it was hearsay or whether they inferred it circumstantially.

There are various strange moods – the desiderative (desire to perform an action) and two types of frustrative – frustration in starting an action (inchoative/incompletive) and frustration in completing an action (causative/incompletive). There are others: immediate/intentive (you are going to do something now/you intend to do it in the future).

There are many verbal aspects: perfect/imperfect (completed/incomplete), telic/atelic (reaching a goal/not reaching a goal), continuative (continuing), repetitive (iterative), and beginning an action (inchoative).

The future tense is divided into future/somewhere and future/elsewhere. The past tense is divided into plain past and immediate past.

Pirahã has a closed class of only 90 verb roots, an incredibly small number. But these roots can be combined together to form compound verbs, a much larger category. Here is one example of three verbs strung together to form a compound verb:

xig ab op
take turn go – bring back. You take something away, you turn around, and you go back to where you got it to return it.

There are no abstract color terms in Pirahã. There are only two words for colors, one for light and one for dark. The only other languages with this restricted a color sense are in Papua New Guinea. The other color terms are not really color terms, but are more descriptive – red is translated as like blood.

Pirahã can be whistled, hummed or encoded into music. Consonants and vowels can be omitted altogether and meaning conveyed instead via variations in stress, pitch and rhythm. Mothers teach the language to children by repeating musical patterns.

Pirahã may well be one of the hardest languages on Earth to learn.

Pirahã gets a 6 rating, hardest of all.

Quechuan

Quechua (actually a large group of languages and not a single language at all) is one of the easiest Amerindian languages to learn. Quechua is a classic example of a highly regular grammar with few exceptions. Its agglutinative system is more straightforward than even that of Turkish. The phonology is dead simple.

On the down side, there is a lot of dialectal divergence (these are actually separate languages and not dialects) and a lack of learning materials. Some say that Quechua speakers spend their whole lives learning the language.

Quechua has inconsistent orthographies. There is a fight between those who prefer a Spanish-based orthography and those who prefer a more phonemic one. Also there is an argument over whether to use the Ayacucho language or the Cuzco language as a base.

Quechua has a difficult feature known as evidential marking. This marker indicates the source of the speaker’s knowledge and how sure they are about the statement.

-mi expresses personal knowledge:

Tayta Wayllaqawaqa chufirmi.
Mr. Huayllacahua is a driver. (I know it for a fact.)

Aymaran
Aymara

Aymara has some of the wildest morphophonology out there. Morpheme-final vowel deletion is present in the language as a morphophonological process, and it is dependent on a set of highly complex phonological, morphological and syntactic rules (Kim 2013).

For instance, there are three types of suffixes: dominant, recessive and a third class that is neither dominant nor recessive. If a stem ends in a vowel, dominant suffixes delete the vowel but recessive suffixes allow the vowel to remain. The third class either deletes or retains the vowel on the stem depending on how many vowels are in the stem. If the root has two vowels, the vowel is retained. If it has three vowels, the vowel is deleted.
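The rule as described can be written out as a short Python function. This is a simplified sketch of the description above and not of Aymara itself: the stems and the suffix are schematic placeholders (C stands for any consonant), the class names are plain strings I chose, and the real conditions cited from Kim 2013 are far more involved.

```python
VOWELS = set("aeiou")  # assumption: lowercase vowel letters count as vowels

def attach(stem: str, suffix: str, suffix_class: str) -> str:
    """Attach a suffix, deleting or keeping the stem-final vowel as described."""
    if stem[-1] not in VOWELS:
        return stem + suffix          # nothing to delete
    if suffix_class == "dominant":
        return stem[:-1] + suffix     # dominant suffixes delete the vowel
    if suffix_class == "recessive":
        return stem + suffix          # recessive suffixes keep the vowel
    if suffix_class == "third":
        vowel_count = sum(ch in VOWELS for ch in stem)
        if vowel_count == 2:
            return stem + suffix      # two vowels: vowel retained
        if vowel_count == 3:
            return stem[:-1] + suffix # three vowels: vowel deleted
        raise ValueError("only stems with two or three vowels are described")
    raise ValueError(f"unknown suffix class: {suffix_class}")

# Schematic stems, not real Aymara words.
print(attach("CaCa", "-S", "dominant"))  # CaC-S
print(attach("CaCa", "-S", "third"))     # CaCa-S
print(attach("CaCaCa", "-S", "third"))   # CaCaC-S
```

The function raises an error for third-class suffixes on stems with other vowel counts, since the description does not say what happens there.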

Although all of this seems quite odd, Finnish has something similar going on, if not a lot worse.

Nevertheless, Aymara is still said to be a very easy language to learn. The Guinness Book of World Records claims it is almost as easy to learn as Esperanto.

Aymara gets a 2 rating, very easy to learn.

Australian

Australian Aborigine languages are some of the hardest languages on Earth to learn, like Amerindian or Caucasian languages. Some Australian languages have phonemic contrasts that few other languages have, such as apico-dental, lamino-dental, apico-postalveolar and lamino-postalveolar coronals.

Australian languages tend to be mixed ergative. Ordinary nouns are ergative-absolutive, but 1st and 2nd person pronouns are nominative-accusative. One language has a three way agent-patient-experiencer distinction in the 1st person pronoun. Australian pronouns typically have singular, plural and dual forms along with inclusive and exclusive 1st plural. In some sentences, they have what is known as double case agreement which is rare in the world’s languages:

I gave a spear to my father.
I gave a spear mine-to father’s-to.

Both elements of the phrase my father are in both dative and genitive.

However, Aboriginal languages do have the plus of being very regular.

All Australian languages are rated 6, most difficult of all.

Tor-Kwerba
Orya-Tor
Tor

Berik is a Tor-Orya language spoken in the Indonesian colony of Irian Jaya in New Guinea.

Verbs take many strange endings, in many cases mandatory ones, that indicate what time of day something happened, among other things.

Telbener – He drinks in the evening.

Where a verb takes an object, it will not only be marked for time of day but for the size of the object.

Kitobana – He gives three large objects to a man in the sunlight.

Verbs may also be marked for where the action takes place in reference to the speaker.

Gwerantena – To place a large object in a low place nearby.

Berik is rated 6, hardest of all.

Trans New Guinea
Madang
Croisilles
Gum

Amele is the world’s most complex language as far as verb forms go, with 69,000 finite and 860 infinitive forms.

Amele is rated 6, hardest of all.

Torricelli
Wapei
Valman

Valman is a bizarre case where the word and that connects two nouns is actually a verb of all things and is marked with the first noun as subject and the second noun as object.

John (subject) and Mary (object)

John is marked as subject for some reason, and Mary is marked as object, and the and word shows subject agreement with John and object agreement with Mary.

Valman gets a 6 rating, hardest of all.

Afroasiatic
Semitic

Semitic languages such as Arabic and Hebrew are notoriously difficult to learn, and Arabic (especially MSA) tops many language learners’ lists as the hardest language they have ever attempted to learn. Although Semitic verbs are notoriously complex, the verbal system does have some advantages especially as compared to IE languages like Slavic. Unlike Slavic, Semitic verbs are not inflected for mood and there is no perfect or imperfect.

Central
South
Arabic

Arabic has some very irregular manners of noun declension, even in the plural. For instance, the word girls changes in an unpredictable way when you say one girl, two girls and three girls, and there are two different ways to say two girls depending on context. Two girls is marked with the dual, but different dual forms can be used. All languages with duals are relatively difficult for most speakers that lack a dual in their native language. However, the dual is predictable from the singular, so one might argue that you only need to learn how to say one girl and three girls.

Further, it is full of irregular plurals similar to octopus and octopi in English, whereas these forms are rare in English. With any given word, there might be 20 different possible ways to pluralize it, and there is no way to know which of the 20 paradigms to use with that word, and further, there is no way to generalize a plural pattern from a singular pattern. In addition, many words have 2-3 ways of pluralizing them. Some messy Arab plurals:

When you say I love you to a man, you say it one way, and when you say it to a woman, you say it another way. On and on.

The Arabic writing system is exceedingly difficult and is one of the hardest to use of any on Earth. Short vowels are omitted. You have to learn where to insert missing vowels, where to double consonants and which vowels to skip in the script. There are 28 different symbols in the alphabet and four different ways to write each symbol depending on its place in the word.

Consonants are written in different ways depending on where they appear in a word. An h is written differently at the beginning of a word than at the end of a word. However, one simple aspect of it is that the medial form is always the same as the initial form. You need to learn not only Arabic words but also the grammar to read Arabic.

Pronouns attach themselves to roots, and there are many different verb conjugation paradigms which simply have to be memorized. For instance, if a verb has a و, a ي, or a ء in its root, you need to memorize the patterns of the derivations, and that is a good chunk of the conjugations right there. The system for measuring quantities is extremely confusing.

The grammar has many odd rules that seem senseless. Unfortunately, most rules have exceptions, and it seems that the exceptions are more common than the rules themselves. Many people, including native speakers, complain about Arabic grammar.

Arabic does have case, but the system is rather simple.

The laryngeals, uvulars and glottalized sounds are hard for many foreigners to make and nearly impossible for them to get right. The ha’ (ح), qaf (ق) and ghayn (غ) sounds and the glottal stop in initial position give a lot of learners headaches.

Arabic is at least as idiomatic as French or English, so in order to speak it right, you have to learn all of the expressionistic nuances.

One of the worst problems with Arabic is the dialects, which in many cases are separate languages altogether. If you learn Arabic, you often have to learn one of the dialects along with classical Arabic. All Arabic speakers speak both an Arabic dialect and Classical Arabic.

In some Arabic as a foreign language classes, even after 1 1/2 years, not one student could yet make a complete and proper sentence that was not memorized.

Adding weight to the commonly held belief that Arabic is hard to learn is research done in Germany in 2005 which showed that Turkish children learn their language at age 2-3, German children at age 4-5, but Arabic kids did not get Arabic until age 12.

Arabic has complex verbal agreement with the subject, masculine and feminine gender in nouns and adjectives, head-initial syntax and a serious restriction on forming compounds. If you come from a language that has a similar nature, Arabic may be easier for you than it is for so many others. Its 3-vowel system makes for easy vowels.

MSA Arabic is rated 5, extremely difficult.

Arabic dialects are often somewhat easier to learn than MSA Arabic. At least in Lebanese and Egyptian Arabic, the very difficult q’ sound has been turned into a hamza or glottal stop which is an easier sound to make. Compared to MSA Arabic, the dialectal words tend to be shorter and easier to pronounce.

To attain anywhere near native speaker competency in Egyptian Arabic, you probably need to live in Egypt for 10 years, but Arabic speakers say that few if any second language learners ever come close to native competency. There is a huge vocabulary, and most words have a wealth of possible meanings.

Egyptian Arabic is rated 4.5, very to extremely difficult.

Moroccan Arabic is said to be particularly difficult, with much vowel elision in triconsonantal stems. In addition, all dialectal Arabic is plagued by irrational writing systems.

Moroccan Arabic is rated 4.5, very to extremely difficult.

Maltese is a strange language, basically a Maghrebi Arabic language (similar to Moroccan or Tunisian Arabic) that has very heavy influence from non-Arabic tongues. It shares the problem of Gaelic that often words look one way and are pronounced another.

It has the common Semitic problem of difficult plurals. Although many plurals use common plural endings (-i, -iet, -ijiet, -at), others simply form the plural by having their last vowel dropped or adding an s (English borrowing). There’s no pattern, and you simply have to memorize which ones act which way. Maltese permits the consonant cluster spt, which is surely hard to pronounce.

On the other hand, Maltese has quite a few IE loans from Italian, Sicilian, Spanish, French and increasingly English. If you have knowledge of Romance languages, Maltese is going to be easier than most Arabic dialects.

Maltese is rated 4, very difficult.

South
Canaanite

Hebrew is hard to learn according to a number of Israelis. Part of the problem may be the abjad writing system, which often leaves out vowels which must simply be remembered. Also, other than borrowings, the vocabulary is Afroasiatic, hence mostly unknown to speakers of IE languages. There are also difficult consonants as in Arabic such as pharyngeals and uvulars.

The het or glottal h is particularly hard to make. However, most modern Israelis no longer make the het or a’ain sounds. Instead, they pronounce the het like the chaf sound and the a’ain like an alef. Almost all Ashkenazi Israeli Jews no longer use the het or a’ain sounds. But most Jews who came from Arab countries (often older people) still use the sounds, and some of their children do (Dorani 2013).

Hebrew has complex morphophonological rules. The letters p, b, t, d, k and g change to f, v, th, dh, kh and gh in certain situations. In some environments, pharyngeals change the nature of the vowels around them. The prefix ve-, which means and, is pronounced differently when it precedes certain letters. Hebrew is also quite irregular.
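The stop-to-fricative alternation is easy to lay out as a table in Python, pairing p with f, b with v, t with th, d with dh, k with kh and g with gh. The sketch below deliberately leaves the triggering environment as a yes/no flag, because the conditions are not spelled out here; the function name and the flag are my own and do not come from any Hebrew grammar.

```python
# Each stop paired with its fricative counterpart.
SPIRANT = {"p": "f", "b": "v", "t": "th", "d": "dh", "k": "kh", "g": "gh"}

def realize(consonant: str, spirantizing_context: bool) -> str:
    """Return the fricative variant when the (unspecified) context calls for it."""
    if spirantizing_context:
        return SPIRANT.get(consonant, consonant)
    return consonant

print(realize("b", True))   # v
print(realize("b", False))  # b
print(realize("m", True))   # m (not one of the six alternating letters)
```

The lookup is trivial; what the learner actually has to master is knowing when the flag is true.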

Hebrew has quite a few voices, including active, passive, intensive, intensive passive, etc. It also has a number of tenses such as present, past, and the odd jussive.

Hebrew also has two different noun classes. There are also many suffixes and quite a few prefixes that can be attached to verbs and nouns.

Even most native Hebrew speakers do not speak Hebrew correctly by a long shot.

Quite a few say Hebrew is as hard to learn as MSA or perhaps even harder, but this is controversial.

Hebrew gets a 5 rating for extremely difficult.

Berber
Northern
Atlas

Berber languages are considered to be very hard to learn. Worse, there are very few language learning resources available.

Tamazight allows doubled consonants at the beginning of a word! How can you possibly make that sound?

Tamazight gets a 6 rating, hardest of all.

In Tachelhit, words like this are possible:

tkkststt You took it off.

tfktstt You gave it.

In addition, there are words which contain only one or two consonants:

ɡ
be

ks
feed on

Tachelhit gets a 6 rating, hardest of all.

South
Ethiopian
South
Transversal
Amharic–Argobba
Amharic

Amharic is said to be a very hard language to learn. It is quite complex, and its sentence structures seem strange even to speakers of other Semitic languages. Hebrew speakers say they have a hard time with this language.

There are a multitude of rules which almost seem ridiculous in their complexity, there are numerous conjugation patterns, objects are suffixed to the verb, the alphabet has 274 letters, and the pronunciation seems strange. However, if you already know Hebrew or Arabic, it will be a lot easier. The hardest part of all is the verbal system, as with any Semitic language. It is easier than Arabic.

Amharic gets a 4.5 rating, very hard to extremely hard.

Cushitic
East Cushitic

Dahalo is legendary for having some of the wildest consonant phonology on Earth. It has all four airstream mechanisms found in languages: ejectives, implosives, clicks and normal pulmonic sounds. There are both glottal and epiglottal stops and fricatives and laminal and apical stops.

There is also a strange series of nasal clicks that are both glottalized and plain. Some of these clicks are also labialized. It has both voiced and unvoiced prenasalized stops and affricates, and some of the stops are also labialized. There is a weird palatal lateral ejective. There are three different lateral fricatives, including a labialized and palatalized one, and one lateral approximant. It contrasts alveolar and palatal lateral affricates and fricatives, the only language on Earth to do this.

The Dahalo are former elephant-hunting hunter-gatherers who live in southern Kenya. It is believed that at one time they spoke a language like Sandawe or Hadza, but they switched over to Cushitic at some point. The clicks are thought to be substratum from a time when Dahalo was a Sandawe-Hadza type language.

Dahalo gets a 6 rating, hardest of all.

Somali

Somali has one of the strangest preposition systems on Earth. It actually has no real prepositions at all. Instead it has preverbal particles and possessives that serve as prepositions.

Here is how possessives serve as prepositions:

habeennimada horteeda
the night her front
before nightfall

kulaylka dartiisa
the heat his reason
because of the heat

Here we have the use of a preverbal particle serving as a preposition:

kú ríd shandádda
Into put the suitcase.
Put it into the suitcase.

Somali combines four “prepositions” with four deictic particles to form its prepositions.

There are four basic “prepositions”:

to
in
from
with

These combine with four different deictic particles:

toward the speaker
away from the speaker
toward each other
away from each other

Hence you put the “prepositions” and the deictic particles together in various ways. Both tend to go in front of and close to the verb:

Nínkíi bàan cèelka xádhig kagá sóo saaray.
…well-the rope with-from towards-me I-raised.
I pulled the man out of the well with a rope.

Way inoogá warrámi jireen.
They us-to-about news gave.
They used to give us news about it.

Prepositions are the hardest part of the Somali language for the learner.

Somali deals with verbs of motion via deixis in a similar way that Georgian does. One reference point is the speaker and the other is any other entities discussed. Verbs of motion are formed using adverbs. Entities may move:

towards each other wada
away from each other kala
towards the speaker so
away from the speaker si

Hence:

kala durka separate
si gal go in (away from the speaker)
so gal come in (toward the speaker)
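The four direction adverbs amount to a small lookup table, sketched here in Python. It uses only the forms listed above and assumes a two-word phrase with the adverb first; the table and function names are invented for this illustration.

```python
# The four deictic adverbs listed above and their meanings.
DEICTIC = {
    "wada": "towards each other",
    "kala": "away from each other",
    "so": "towards the speaker",
    "si": "away from the speaker",
}

def gloss_direction(phrase: str) -> str:
    """Gloss the direction of an 'adverb verb' phrase such as 'so gal'."""
    adverb, verb = phrase.split()
    return f"{verb} ({DEICTIC[adverb]})"

print(gloss_direction("so gal"))  # gal (towards the speaker)
print(gloss_direction("si gal"))  # gal (away from the speaker)
```

The same verb gal covers both come in and go in; the adverb alone carries the direction.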

All of the difficult sounds of Arabic are also present in Somali, another Afroasiatic language – the alef, the ha, the qaf and the kha. There are long and short vowels. There is a retroflex d, the same sound found in South Indian languages. Somali also has 2 tones – high and low. For some reason, Somali tends to make it onto craziest phonologies lists.

Somali pluralization makes no sense and must be memorized. There are seven different plurals, and there is no clue in the singular that tells you what form to use in the plural. See here:

Reduplication:

áf (language) -> afaf

Suffixation:

hoóyo (mother) -> hoyoóyin

áabbe (father) -> aabayaal

Note the tone shifts in all three of the plurals above.

There are four cases, absolutive, nominative, genitive and vocative. Despite the presence of absolutive and nominative cases, Somali is not an ergative language. Absolutive case is the basic case of the noun, and nominative is the case given to the noun when a verb follows in the sentence. There are different articles depending on whether the noun was mentioned previously or not (similar to the articles a and the in English). The absolutive and nominative are marked not only on the noun but also on the article that precedes it.

In terms of difficulty, Somali is much harder than Persian and probably about as difficult as Arabic.

Malayalam, a Dravidian language of India, has been cited as the hardest language to learn by a language foundation, but the citation is obscure and hard to verify.

Malayalam words are often even hard to look up in a Malayalam dictionary.

For instance, adiyAnkaLAkkikkoNDirikkukayumANello is a word in Malayalam. It means something like I, your servant, am sitting and mixing s.t. (which is why I cannot do what you are asking of me). The part in parentheses is an example of the type of sentence where it might be used.

The above word is composed of many different morphemes, including conjunctions and other affixes, with sandhi going on with some of them so they are eroded away from their basic forms. There doesn’t seem to be any way to look that word up or to write a Malayalam dictionary that lists all the possible forms, including forms like the word above. It would probably be way too huge of a book. However, all agglutinative languages are made up of affixes, and if you know the affixes, it is not particularly hard to parse the word apart.

However, Malayalam has the advantage of having many pedagogic materials available for language learning such as audio-visual material and subtitled videos.

Malayalam is rated 5, extremely difficult.

Tamil

Tamil, a Dravidian language, is hard, but probably not as difficult as Malayalam is. Tamil has an incredible 247 characters in its alphabet. Nevertheless, most of those are consonant-vowel combinations, so it is almost more of a syllabary than an alphabet. Going by what would traditionally be considered alphabetic symbols, there are probably only 72 real symbols in the alphabet. Nevertheless, Tamil probably has one of the easier Indic scripts as Tamil has fewer characters than other scripts due to its lack of aspiration. Compare to Devanagari’s over 1,000 characters.
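The figure of 247 is easier to believe once you see where it comes from. The traditional count, which is not given above and is supplied here as background, is 12 vowels, 18 consonants, one special character (the aytham) and every consonant-vowel combination. A quick check in Python:

```python
# Traditional Tamil character count (background figures, not from the text above).
vowels = 12
consonants = 18
aytham = 1                       # the single special character
combined = vowels * consonants   # one character per consonant-vowel pair

total = vowels + consonants + combined + aytham
print(combined)  # 216
print(total)     # 247
```

So 216 of the 247 characters are predictable combinations, which is why the script behaves more like a syllabary than an alphabet.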

But no Indic script is easy. A problem with Tamil is that all of the characters seem to look alike. It is even worse than Devanagari in that regard. However, the more rounded scripts such as Kannada, Sinhala, Telugu and Malayalam have that problem to a worse degree. Tamil has a few sharp corners in the characters that help to disambiguate them.

In addition, as with other languages, words are written one way and pronounced another. However, there are claims that the difficulty of Tamil’s diglossia is overrated.

Tamil has two different registers for written and spoken speech, but the differences are not large, so this problem is exaggerated. Both Tamil and Malayalam are spoken very fast and have extremely complicated, nearly impenetrable scripts. If Westerners try to speak a Dravidian language in south India, more often than not the Dravidian speaker will simply address them in English rather than try to accommodate them.

Tamil has the odd evidential mood, similar to Bulgarian.

However, on the plus side, the language does seem to be very logical and regular, almost like German in that regard. In addition, there are a lot of language learning materials for Tamil.

Tamil is rated 4, very difficult.

Altaic
Korean

Most agree that Korean is a hard language to learn.

The alphabet, Hangul, at least is reasonable; in fact, it is quite elegant. But there are four different Romanizations – Lukoff, Yale, Horne, and McCune-Reischauer – which is preposterous. It’s best to just blow off the Romanizations and dive straight into Hangul. This way you can learn a Romanization later, and you won’t mess up your Hangul with spelling errors, as can occur if you go from Romanization to Hangul.

Hangul can be learned very quickly, but learning to read Korean books and newspapers fast is another matter altogether because you really need to know the hanja, or Chinese characters, that are used in addition to the Hangul. After World War 2, the Koreas decided to officially get rid of their Chinese characters, but in practice this was not successful. With the use of Chinese characters in Korean, you can be a lot more precise in terms of what you are trying to communicate.

Bizarrely, there are two different numeral sets used, but one is derived from Chinese so it should be familiar to Chinese, Japanese or Thai speakers who use similar or identical systems.

Korean has a wealth of homonyms, and this is one of the tricky aspects of the language. Any given combination of a couple of characters can have multiple meanings. Japanese has a similar problem with homonyms, but at least with Japanese you have the benefit of kanji to help you tell the homonyms apart. With Korean Hangul, you get no such advantage.

Similarly, there seem to be many ways to say the same thing in Korean. The learner will feel when people are using all of these different ways of saying the same thing that they are actually saying something different each time, but that is not the case.

One problem is that the b, p, j, ch, t and d are pronounced differently than their English counterparts. The consonants, the pachim system and the morphing consonants at the end of the word that slide into the next word make Korean harder to pronounce than any major European language. Korean shares a problem with Japanese: if you mess up one vowel in a sentence, you render it incomprehensible.

The vocabulary is very difficult for an English speaker who does not have knowledge of either Japanese or Chinese. On the other hand, Japanese or Chinese will help you a lot with Korean.

Korean is agglutinative and has a subject-topic discourse structure, and the logic of these systems is difficult for English speakers to understand. In addition, there are hundreds of ways of conjugating any given verb based on tense, mood, age or seniority. Adjectives also decline and take hundreds of different suffixes.

Meanwhile, Korean has an honorific system that is even wackier than that of Japanese. A single sentence can be said in three different ways depending on the relationship between the speaker and the listener. However, the younger generation is not using the honorifics so much, and a foreigner isn’t expected to know the honorific system anyway.

Maybe 60% of the words are based on Chinese words, but unfortunately, much of this Chinese-based vocabulary intersects with Japanese versions of Chinese words in a confusing way.

Speakers of Korean can learn Japanese fairly easily. Korean seems to be a more difficult language to learn than Japanese. There are maybe twice as many particles as in Japanese, the grammar is dramatically more difficult and the verbs are quite a bit harder. The phonemic inventory in Korean is also larger and includes such oddities as double consonants.

Korean is rated by language professors as being one of the hardest languages to learn.

Korean is rated 5, extremely hard.

Japonic

Japanese also uses a symbolic alphabet, but the symbols themselves are sometimes undecipherable in that even Japanese speakers will sometimes encounter written Japanese and will say that they don’t know how to pronounce it. I don’t mean that they mispronounce it; that would make sense. I mean they don’t have the slightest clue how to say the word! This problem is essentially nonexistent in a language like English.

The Japanese orthography is one of the most difficult to use of any orthography.

There are over 2,000 frequently used characters in three different symbolic alphabets that are frequently mixed together in confusing ways. Due to the large number of frequently used symbols, it’s said that even Japanese adults learn a new symbol a day well into adulthood.

The Japanese writing system is probably crazier than the Chinese writing system and it often makes it onto lists of worst orthographies. The very idea of writing an agglutinative language in a combination of two syllabaries and an ideography seems wacky right off the bat. Japanese borrowed Chinese characters.

But then they gave each character several pronunciations, and in some cases as many as 24. Next they made two syllabaries using another set of characters, then over the next millennium came up with all sorts of contradictory and often senseless rules about when to use the syllabaries and when to use the character set. Later on they added a Romanization to make things even worse.

Chinese uses 5-6,000 characters regularly, while Japanese only uses around 2,000. But in Chinese, each character has only one or maybe two pronunciations. In Japanese, there are complicated rules about when and how to combine the hiragana with the characters. These rules are so hard that many native speakers still have problems with them. There are also personal and place names (proper nouns) which are given completely arbitrary pronunciations often totally at odds with the usual pronunciation of the character.

There are some writers, typically of literature, who deliberately choose to use kanji that even Japanese people cannot read. For instance, Ryuu Murakami uses the odd symbols 擽る, 轢く and 憑ける.

The Japanese system is made up of three different systems: the katakana and hiragana (the kana) and the kanji, similar to the hanzi used in Chinese. Chinese has at least 85,000 hanzi. The number of kanji is much less than that, but kanji often have more than one meaning in contrast to hanzi.

After WW2, Japan decided to simplify its language. They both simplified and reduced the number of Chinese characters used, and they unified the written and spoken language, which previously had been different.

Speaking Japanese is not as difficult as everyone says, and many say it’s fairly easy. However, there is a problem similar to English in that one word can be pronounced in multiple ways, like read and read in English.

A common problem is that a sentence uttered by a Japanese language learner, while perfectly grammatically correct, is still not accepted by Japanese speakers because “we just don’t say it that way.” The Japanese speaker often cannot tell why the unacceptable sentence you uttered is not ok. On the other hand, this problem may be common to more languages than Japanese.

There is also a class of Japanese called “honorifics” or “keigo” that is quite hard to master. Honorifics are meant to show respect and to indicate one’s place or status in the social hierarchy. These typically affect verbs but can also affect particles and prefixes. They are usually formed by archaic or highly irregular verbs. However, there are both regular and irregular honorific forms. Furthermore, there are five different levels of honorifics. Honorifics vary depending on who you are and who you are talking to. In addition, gender comes into play.

Although it is true that young Japanese people are said to not understand the intricacies of keigo, it is still expected that they know how to speak it well. Consequently, many young Japanese will opt out of certain conversations because they feel that their keigo is not very good. Books explaining how to use keigo properly have been big sellers among young people in Japan in recent years as young people try to appear classy, refined or cultured.

In addition, Japanese born overseas (especially in the US), while often learning Japanese pretty well, typically have a very poor understanding of keigo. Instead of embarrassing themselves by not using keigo or using it wrong, these Japanese speakers often prefer to speak in English to Japanese people rather than bother with keigo-less Japanese. Hypercorrection in keigo is also a problem, when someone makes errors in keigo due to “trying too hard.” This looks like phony or insincere politeness and is often worse than not using keigo at all.

One wild thing about Japanese is counting forms. You actually use different numeral sets depending on what it is you are counting! There are dozens of different ways of counting things which involve the use of a complex numerical noun classifier system.

Japanese grammar is often said to be simple, but that does not appear to be the case on closer examination. Particles are especially vexing. Verbs engage in all sorts of wild behavior, and adverbs often act like verbs. Nouns can act like adjectives and adverbs. Meanwhile, honorifics change the behavior of all words. There are particles like ha and ga that have many different meanings. One problem is that all noun modifiers, even phrases, must precede the nouns they are modifying.

It’s often said that Japanese has no case, but this is not true. Actually, there are seven cases in Japanese. The aforementioned ga is a clitic meaning nominative, made is terminative case, -no is genitive and -o is accusative.

In this sentence:

The plane that was supposed to arrive at midnight, but which had been delayed by bad weather, finally arrived at 1 AM.

The whole relative clause (that was supposed to arrive at midnight, but which had been delayed by bad weather) must precede the noun plane:

Was supposed to arrive at midnight, but had been delayed by bad weather, the plane finally arrived at 1 AM.

One of the main problems with Japanese grammar is that it is going to seem so different from the sort of grammar an English speaker is likely to be used to.

Speaking Japanese is one thing, but reading and writing it is a whole new ballgame. It’s perfectly possible to know the meaning of every kanji and the meaning of every word in a sentence, but you still can’t figure out the meaning of the sentence because you can’t figure out how the sentence is stuck together in such a way as to create meaning.

The real problem is that the Japanese you learn in class is one thing, and the Japanese of the street is another. One problem is that in street Japanese, the subject is typically not stated in a sentence. Instead it is inferred through such things as honorific terms or the choice of words you used in the sentence. Probably no one goes crazier on negatives than the Japanese. Particularly in academic writing, triple and quadruple negatives are common, and can be quite confusing.

Yet there are problems with the agglutinative nature of Japanese. It’s a completely different syntactic structure than English. Often if you translate a sentence from Japanese to English it will just look like a meaningless jumble of words.

However, Japanese grammar has the advantage of being quite regular. For instance, there are only four frequently used irregular verbs.

Like Chinese, the nouns are not marked for number or gender. However, while Chinese is forgiving of errors, if you mess up one vowel in a Japanese sentence, you may end up with incomprehension.

Although many Japanese learners feel it’s fairly easy to learn, surveys of language professors continue to rate Japanese as one of the hardest languages to learn. A study by the US Navy concluded that the hardest language the corpsmen had to learn in the course of service was Japanese. However, it’s generally agreed that Japanese is easier to learn than Korean. Japanese speakers are able to learn Korean pretty easily.

Japanese is rated 5, extremely hard.

Classical Japanese is much harder to read than Modern Japanese. Though you can get by with far fewer kanji when reading the modern language, you will need a minimum knowledge of 3,000 kanji for reading Classical Japanese, and that’s using a dictionary. There are only about 500-1,000 frequently used characters, but there are countless other words that will come up in your reading, especially, say, special words used in the Imperial Court. Many words have more than one meaning, and unless you know this, you will be lost. 東宮(とうぐう) for instance means Eastern Palace. However, it also means Crown Prince because his residence was to the east of the Emperor’s.

Turkic
Oghuz
Western Oghuz

Turkish is often considered to be hard to learn, and it’s rated one of the hardest in surveys of language teachers. However, it’s probably easier than its reputation makes it out to be. It is agglutinative, so you can have one long word where in English you might have a sentence of shorter words. One such word is:

Çekoslovakyalılaştıramadıklarımızdanmışsınız?
Were you one of those people whom we could not turn into a Czechoslovakian?

Many words have more than one meaning. However, the agglutination is very regular in that each particle of meaning has its own morpheme and falls into an exact place in the word. See here:

göz – eye
göz-lük – glasses
göz-lük-çü – optician
göz-lük-çü-lük – the business of an optician
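
The way each suffix lands in a fixed slot can be sketched in a few lines of Python. This is only an illustration: the morphemes and glosses are copied from the list above, and the function name build_words is made up for the example. It does not compute vowel harmony; the suffix forms are taken as given.

```python
from itertools import accumulate

# Morphemes and glosses copied from the göz example in the text.
MORPHEMES = ["göz", "lük", "çü", "lük"]
GLOSSES = ["eye", "glasses", "optician", "the business of an optician"]


def build_words(morphemes):
    """Return each intermediate word formed by adding one suffix at a time."""
    # accumulate() on strings concatenates them left to right.
    return list(accumulate(morphemes))


for word, gloss in zip(build_words(MORPHEMES), GLOSSES):
    print(f"{word:<14}{gloss}")
```

Every step is itself a usable word, which is the point of the example: nothing is rearranged, each new morpheme is simply appended to the end.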

Nevertheless, agglutination means that you can always create new words or add new parts to words, and for this reason even a lot of Turkish adults have problems with their language.

There is no verb to be, which is hard for many foreigners. Instead, the concept is wrapped onto the subject of the sentence as a -dim or -im suffix. Turkish is an imagery-heavy language, and if you try to translate straight from a dictionary, it often won’t make sense.

However, the suffixation in Turkish, along with the vowel harmony, are both precise. Nevertheless, many words have irregular vowel harmony. The rules for making plurals are very regular, with no exceptions (the only exceptions are in foreign loans). In Turkish, incredible as it sounds, you can make a plural out of anything, even a word like what, who or blood. However, there is some irregularity in the strengthening of adjectives, and the forms are not predictable and must be memorized.
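
As a rough sketch of how regular the plural is, the standard two-way rule (-ler after a front vowel, -lar after a back vowel) fits in a short Python function. The function name pluralize and the sample words are chosen for illustration, input is assumed to be lowercase, and the loanword exceptions mentioned above are not handled.

```python
# Turkish vowels split by backness; the plural suffix copies the backness
# of the LAST vowel in the word.
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def pluralize(word):
    """Attach -ler or -lar according to the last vowel of a lowercase word."""
    for ch in reversed(word):
        if ch in FRONT_VOWELS:
            return word + "ler"
        if ch in BACK_VOWELS:
            return word + "lar"
    raise ValueError(f"no vowel found in {word!r}")


# ev = house, kitap = book, ne = what, kim = who, kan = blood
for w in ["ev", "kitap", "ne", "kim", "kan"]:
    print(w, "->", pluralize(w))
```

The same function handles what, who and blood without any special cases, which is what makes the rule feel so mechanical.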

Turkish is a language of precision in other ways. For instance, there are eight different forms of subjunctive mood that describe various degrees of uncertainty that one has about what one is talking about. This relates to the evidentiality discussed under Tuyuca above, and Turkish has an evidential form similar to Tamil and Bulgarian. On Turkish news, verbs are generally marked with miş, which means that the announcer believes it to be true though he has not seen it firsthand. The particle miş is interesting because this evidential form is coded into the tense system, which is an unusual use of evidentiality.

The Roman alphabet and almost mathematically precise grammar really help out. Turkish lacks gender and has but a single irregular verb – olmak. Nevertheless, there are many verbal forms. However, this is controversial and it depends on how you define grammatical irregularity. There is some strangeness in some of the verb paradigms, but it is argued that these oddities are rule-based. The aorist tense is said to have irregularity.

There is some irregular morphophonology, but not much. The oblique relative clauses have complex morphosyntax. Turkish has two completely different ways of making relative clauses, one of which may have been borrowed from Persian. There are many gerunds for verbs, and these have many different uses. At the end of the day, Turkish grammar is not as regular or as simple as it is made out to be.

Words are pronounced nearly the same as they are written. A suggestion that Turkish may be easier to learn than many think is the research that shows that Turkish children attain basic grammatical mastery of Turkish at age 2-3, as compared to 4-5 for German and 12 for Arabic. The research was conducted in Germany in 2005.

In addition, Turkish has a phonetic orthography.

However, Turkish is hard for an English speaker to learn for a variety of reasons. It is agglutinative like Japanese, and all agglutinative languages are difficult for English speakers to learn. As in Japanese, you start your Turkish sentence the way you would end your English sentence. As in the Japanese example above, the subordinate clause must precede the subject, whereas in English, the subordinate clause must follow the subject. The subordinate clause in the examples below is that he will be on time.

In English, we say, “I hope that he will be on time.”

In Turkish, the sentence would read, “That he will be on time I hope.”

Turkish vowels are unusual to speakers of IE languages, and Turkish learners say the vowels are hard to make or even tell apart from one another.

Turkish is rated 3.5, harder than average to learn.

Uralic

Finno-Ugric

One test of the difficulty of any language is how much of the grammar you must know in order to express yourself on a basic level. On this basis, Finno-Ugric languages are complicated because you need to know quite a bit more grammar to communicate on a basic level in them than in say, German.

Finnic
Northern

Finnish is very hard to learn, and even long-time learners often still have problems with it. Famous polyglot Barry Farber said it was one of the hardest languages he learned. You have to know exactly which grammatical forms to use where in a sentence. In addition, Finnish has 15 cases in the singular and 16 in the plural. This is hard to learn for speakers coming from a language with little or no case.

For instance, talo – the house

Cases:
talon – house's
taloa – some of the house
taloksi – into (becoming) the house
talossa – in the house
talosta – from inside the house
taloon – into the house
talolla – on or at the house
talolta – from beside the house
talolle – onto the house
taloista – from the houses
taloissa – in the houses
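
For a stem as well behaved as talo, most of the forms above are just the stem plus a fixed ending, which a short Python sketch can reproduce. The case labels are the usual grammatical names, added here for illustration; the function name decline is made up, and the code assumes a back-vowel stem ending in a short vowel with no consonant gradation, which is far from the general case.

```python
# Endings read off the talo forms listed above; case names are labels
# added for this example.
SUFFIXES = {
    "genitive": "n",       # talon
    "partitive": "a",      # taloa
    "translative": "ksi",  # taloksi
    "inessive": "ssa",     # talossa
    "elative": "sta",      # talosta
    "adessive": "lla",     # talolla
    "ablative": "lta",     # talolta
    "allative": "lle",     # talolle
}

# Cases whose plural is simply stem + i + the same ending for this stem type.
SIMPLE_PLURAL = {"translative", "inessive", "elative",
                 "adessive", "ablative", "allative"}


def decline(stem, case, plural=False):
    """Build a case form for a talo-type stem (a deliberately narrow sketch)."""
    if case == "illative" and not plural:
        # The illative lengthens the final vowel and adds -n: talo -> taloon.
        return stem + stem[-1] + "n"
    if plural:
        if case not in SIMPLE_PLURAL:
            raise ValueError(f"plural {case} is not covered by this sketch")
        return stem + "i" + SUFFIXES[case]
    return stem + SUFFIXES[case]


print(decline("talo", "inessive"))               # talossa
print(decline("talo", "illative"))               # taloon
print(decline("talo", "elative", plural=True))   # taloista
```

The sketch breaks down quickly for other stems, since front-vowel words take different endings and many stems change through consonant gradation. That gap between this toy and the real system is a fair measure of how much the learner has to absorb.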

It gets much worse than that. This web page shows that the noun kauppa – shop can have 2,253 forms.

A simple adjective + noun type of noun phrase of two words can be inflected in up to 100 different ways.

Adjectives and nouns belong to 20 different classes. The rules governing their case declension depend on what class the substantive is in.

Like Turkish, Finnish agglutination is very regular. Each bit of information has its own morpheme and has an exact place in the word.

Like Turkish, Finnish has vowel harmony, and as in Turkish, it is very regular. Unlike Turkish or Hungarian, consonant gradation forms a major part of Finnish morphology. In order to form a sentence in Finnish, you will need to learn about verb types, cases and consonant gradation, and it can take a while to get your mind around those things.

Finnish, oddly enough, always puts the stress on the first syllable. Finnish vowels will be hard to pronounce for most foreigners.

However, Finnish has the advantage of being pronounced precisely as it is written. This is also part of the problem though, because if you don’t say it just right, the meaning changes. So, as with Polish, when you mangle the language, you will only achieve incomprehension. Whereas with, say, English, if a foreigner mangles the language, you can often winnow some sense out of it.

However, despite the fact that written Finnish can be easily pronounced, when learning Finnish, as in Korean, it is as if you must learn two different languages – the written language and the spoken language. A better way to put it is that there is “one language for writing and another for speaking.” You use different forms whether conversing or putting something on paper.

Some pronunciation is difficult. The contrast between short and long vowels and consonants is particularly troublesome. Check out these minimal pairs:

sydämellä
sydämmellä

jollekin
jollekkin

A problem for the English speaker coming to Finnish would be the vocabulary, which is alien to the speaker of an IE language. Finnish language learners often find themselves looking up over half the words they encounter. Obviously, this slows down reading quite a bit!

In the grammar, the partitive case and potential tense can be difficult. Here is an example of how Finnish verb tenses combine with various cases to form words:

Finnish verbs are very regular. The irregular verbs can almost be counted on one hand:

juosta
käydä
olla
nähdä
tehdä

and a few others. In fact, on the plus side, Finnish in general is very regular.

One easy aspect of Finnish is the way you can build many forms from a base root:

kirj-

kirja – book
kirje – letter
kirjoittaa – to write
kirjailija – writer

As in many Asian languages, there are no masculine or feminine pronouns, and there is no grammatical gender. The numeral system is quite simple compared to other languages. Finnish has a complete lack of consonant clusters. In addition, the phonology is fairly simple.

Finnish is rated 5, extremely hard to learn.

Southern

Estonian has similar difficulties as Finnish, since they are closely related. However, Estonian is more irregular than Finnish. In particular, the very regular agglutination system described in Finnish seems to have gone awry in Estonian. Estonian has 14 cases, including strange cases such as the abessive, adessive, elative and inessive. On the other hand, all of these cases can simply be analyzed as the genitive case plus a single unvarying suffix for each case. In addition, there is no gender, so the only things you have to worry about when forming cases are singular and plural.
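
The “genitive plus one unvarying suffix” analysis can be sketched in Python using the four cases named above. The example word is linn (town), whose genitive linna appears later in this section; the function name decline and the English glosses in the comments are added for illustration, and the genitive itself is assumed to be known, since forming it is where the irregularity lives.

```python
# One fixed ending per case, attached to the genitive stem.
CASE_SUFFIXES = {
    "inessive": "s",    # in
    "elative": "st",    # out of
    "adessive": "l",    # on, at
    "abessive": "ta",   # without
}


def decline(genitive, case):
    """Attach the case ending to an already formed genitive."""
    return genitive + CASE_SUFFIXES[case]


# linn (town) has the genitive linna.
for case in CASE_SUFFIXES:
    print(f"{case:<10}{decline('linna', case)}")
```

Once the genitive is memorized, the rest is mechanical, which is why the long list of cases looks more frightening than it is.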

Estonian has a strange mood form called the quotative, often translated as “reported speech.”

tema on – he/she/it is

tema olevat – it’s rumored that he/she/it is or he/she/it is said to be

This mood is often used in newspaper reporting and is also used for gossip.

Estonian has an astounding 25 diphthongs. It also has three different degrees of length, which is rare among the world’s languages. There are short, long and extra-long vowels and consonants.

lina – linen – short n
linna – the town’s – long n, written as nn
`linna – into the town – extra-long n, not written out!

There are differences in the pronunciation of the three forms above, but in rapid speech, they are hard to hear, though native speakers can make them out. Difficulties are further compounded in that extra-long sonorants (m, n, ng, l, and r) and vowels are not written out. All in all, phonemic length can be a problem in Estonian, and foreigners never seem to get it completely down.

Estonian pronunciation is not very difficult, though the õ sound can cause problems. However, Estonian has completely lost the vowel harmony system it inherited from Finnish, resulting in words that seem very hard to pronounce.

At least in written form, Estonian is not as complex as Finnish. Estonian can be seen as an abbreviated and modernized form of Finnish. The grammar is also like a simplified version of Finnish grammar and may be much easier to learn.

Estonian is rated 4.5, very to extremely difficult.

Sami
Eastern

Skolt Sami’s Latinization is often listed as one of the worst Latinizations around. The rest of the language is quite similar to, and as difficult as, Finnish.

Skolt Sami gets a 5 rating, extremely hard to learn.

Ugric
Hungarian

It’s widely agreed that Hungarian is one of the hardest languages on Earth to learn. Even language professors agree. The British Diplomatic Corps did a study of the languages that its diplomats commonly had to learn and concluded that Hungarian was the hardest. Hungarian grammar is maddeningly complex, and Hungarian is often listed on craziest grammar lists. For one thing, there are many different forms for a single word via word modification. This enables the speaker to make his intended meaning very precise. Looking at nouns, there are about 257 different forms per noun.

Hungarian is said to have from 24-35 different cases (there are charts available showing 31 cases), but the actual number may only be 18. Nearly everything in Hungarian is inflected, similar to Lithuanian or Czech. Similar to Georgian and Basque, Hungarian has polypersonal agreement, albeit to a lesser degree than those two languages. There are many irregularities in inflections, and even Hungarians have to learn how to spell all of these in school and have a hard time learning this.

The case distinctions alone can create many different words out of one base form. For the word house, we end up with 31 different words using case forms:

házba – into the house
házban – in the house
házból – from [within] the house
házra – onto the house
házon – on the house
házról – off [from] the house
házhoz – to the house
házig – until/up to the house
háznál – at the house
háztól – [away] from the house
házzá – Translative case, where the house is the end product of a transformation, such as They turned the cave into a house.
házként – as the house, which could be used if you acted in your capacity as a house or disguised yourself as one. He dressed up as a house for Halloween.
házért – for the house, specifically things done on its behalf or done to get the house. They spent a lot of time fixing things up (for the house).
házul – Essive-modal case. Something like “house-ly” or in the way/manner of a house. The tent served as a house (in a house-ly fashion).

And we do have some basic cases:

ház – Nominative. The house is down the street.
házat – Accusative. The ball hit the house.
háznak – Dative. The man gave the house to Mary.
házzal – Similar to instrumental, but more similar to English with. Refers to both instruments and companions.

The genitive takes 12 different declensions, depending on person and number:

Only about five of those terms are archaic and seldom used; the rest are in current use. However, to be fair, a Hungarian native speaker might only recognize half of those words.

In addition, while most languages have names for countries that are pretty easy to figure out, in Hungarian even the names of nations are hard because they have changed the names so much. Italy becomes Olaszország, Germany becomes Németország, etc.

As in Russian and Serbo-Croatian, word order is relatively free in Hungarian. It is not completely free as some say but rather it is governed by a set of rules. The problem is that as you reorder the words in a sentence, you say the same thing but the meaning changes slightly in terms of nuance. Further, there are quite a few dialects in Hungarian. Native speakers can pretty much understand them, but foreigners often have a lot of problems. Accent is very difficult in Hungarian due to the bewildering number of rules used to determine accent. In addition, there are exceptions to all of these rules. Nevertheless, Hungarian is probably more regular than Polish.

Hungarian spelling is also very strange for non-Hungarians, but at least the orthography is phonetic. Nevertheless, the orthography often makes it onto worst orthographies lists.

Hungarian phonetics is also strange. One of the problems with Hungarian phonetics is vowel harmony. Since you stick morphemes together to make a word, the vowels that you have used in the first part of the word will influence the vowels that you will use to make up the morphemes that occur later in the word. The vowel harmony gives Hungarian a “singing effect” when it is spoken. The ty, ny, sz, zs, dzs, dz, ly, cs and gy sounds are hard for many foreigners to make. The á, é, ó, ö, ő, ú, ü, ű, and í vowel sounds are not found in English.

Elmentegettethetnélek.
I could make others save you occasionally (on a disk).

Verbs change depending on whether the object is definite or indefinite.

Olvasok könyvet.
I read a book. (indefinite object)

Olvasom a könyvet.
I read the book. (definite object)

As noted in the introduction to the Finno-Ugric section, you need to know quite a bit of Hungarian grammar to be able to express yourself on a basic level. For instance, in order to say:

I like your sister.

you will need to understand the following Hungarian forms:

verb conjugation and definite or indefinite forms

possessive suffixes

case

how to combine possessive suffixes with case

word order

explicit pronouns

articles

It’s hard to say, but Hungarian is probably harder to learn than even the hardest Slavic languages like Czech, Serbo-Croatian and Polish. At any rate, it is generally agreed that Hungarian grammar is more complicated than Slavic grammar, which is pretty impressive as Slavic grammar is quite a beast.

Hungarian is rated 5, extremely hard to learn.

Sino-Tibetan
Sinitic
Chinese
Mandarin

It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple – short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you often tend to hit a wall, often because the syntactic structure is so strangely different from English (isolating).

Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English. No word is capable of declension, and there is no tense, case, and number, nor are there articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word orders that are used to mark tense. Mandarin has 12 different adverbs for which there is no good English translation.

Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are such things as aspect, serial verbs, a complex classifier system, syntax marked by something called topic-prominence, a strange form called the detrimental passive, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange stuff. Verb complements can be baffling, especially potential and directional complements. The 把, 是 and 的 constructions can be very hard to understand.

The topic-prominence is interesting in that only a few major languages have topic-comment syntax, and most of those are Oriental languages with a lot of Chinese borrowing. Topicalization is not marked morphologically.

There are sentences where the entire meaning changes with the addition of a single character. Chinese sentences are SVO (Subject -Verb – Object) at their base, but that is a bit of an illusion. A sentence that causes you to discuss time duration makes you repeat the verb after the direct object – SVOVT (T= time phrase). In the case of topicalization, sentences can have the structure of OSV (Object – Subject – Verb). Relative clauses and all subordinate clauses come before the noun they modify. In other words:

English: The man who always wore red walked into the room.
Chinese: Who always wore red the man walked into the room.

The relative clause in the sentences above is who always wore red.

In Chinese, the prepositional phrase comes between the subject and the verb:

English: The man hit the ball into the yard.
Chinese: The man into the yard hit the ball.

The prepositional phrase in the sentences above is into the yard.

In Chinese, adjectives are actually stative verbs as in Nahuatl and Lakota.

那个热的菜很好吃。
Nàgè rède cài hěnhǎochī.
The it is hot food is good to eat.
The hot food is delicious.

The 的 symbol turns food hot into food it is hot, an attributive verb. 的 means something like to be.

There are dozens of words called particles which shade the meaning of a sentence ever so slightly.

Chinese phonology is not as easy as some say. There are way too many instances of the zh, ch, sh, j, q, and x sounds in the language such that many of the words seem to sound the same. There is a distinction between aspirated and nonaspirated consonants. There is also the presence of odd retroflex consonants.

Chinese orthography is probably the hardest orthography of any language. The alphabet uses symbols, so it’s not even a real alphabet. There are at least 85,000 symbols and actually many more, but you only need to know about 3-5,000 of them, and many Chinese don’t even know 1,000. To be highly proficient in Chinese, you need to know 10,000 characters, and probably less than 5% of Chinese know that many.

In addition, the characters have not been changed in 3,000 years, and the alphabet is at least somewhat phonetic, so we run into a serious problem of lack of a spelling reform.

The Communists tried to simplify the system (simplified Mandarin), but instead of making the connections between the phonetic aspects of characters more sensible by decreasing their number and increasing their regularity (they did do this somewhat but not enough), they simply decreased the number of strokes needed for each symbol, typically without dealing with the phonetic aspect at all. The simplification did not work well, so now you have a mixture of two different types of written Chinese – simplified and traditional.

In addition to all of this, Chinese borrowed a lot from the Japanese symbolic alphabet a full 1,000 years after it had already been developed and had not undergone a spelling reform, adding insult to injury.

Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese prose. It’s as if it’s written in a different language – actually, it is technically a different language similar to Middle English or Old English. However, few Middle English or Old English texts are read anymore, and Classical Chinese is still widely read.

However, the orthography is at least consistent. 90% of characters have only one reading. Once you learn the character, you generally know the meaning in any context.

Writing the characters is even harder than reading them. One wrong dot or wrong line either completely changes the meaning or turns the symbol into nonsense.

It’s a real problem when you encounter a symbol you don’t know because there is no way to sound out the word. You are really and truly lost and screwed. There is a clue at the right side of the symbol, but it is not always accurate. You need to learn quite a bit of vocabulary just to speak simple sentences.

Similarly, a dictionary is not necessarily helpful when trying to read Chinese. You can have a Chinese sentence in front of you along with a dictionary, and the sentence still might not make sense even after looking it up in the dictionary.

Some Chinese Muslims write Chinese using an Arabic script. This is often considered to be one of the worst orthographies of all.

The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another. However, compared to other tone systems around the world, the tonal system in Chinese is comparatively easy.

A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. Since Chinese uses short words and is disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones.

Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones, and meaning is often discerned by context, stress, rhythm and intonation. Chinese, like French and English, is heavily idiomatic.

It’s little known, but Chinese also uses different forms (classifiers) to count different things, like Japanese.

There is zero common vocabulary between English and Chinese, so you need to learn a whole new set of lexical forms.

In addition, nouns often show relatedness or hierarchy. For instance, in English, you can simply say my brother or my sister, but in Chinese, you cannot do this. You have to indicate whether you are speaking of an older or younger sibling.

On the positive side, Chinese grammar is fairly regular, and word derivation and compound words are sensible: the meaning can usually be determined by looking at the word. In other languages, compound words are not necessarily so obvious.

Many agree that Chinese is the hardest to learn of all of the major languages. A recent survey of language professors rated Chinese as the hardest language on Earth to learn.

Mandarin gets a 5.5 rating for nearly hardest of all.

However, Cantonese is even harder to learn than Mandarin. Cantonese has eight tones to Mandarin’s four, and in addition, they continue to use a lot of the older traditional Chinese characters that were superseded when China moved to a simplified script in 1949. Furthermore, since non-Mandarin characters are not standardized, Cantonese cannot be written down as it is spoken.

In addition, Cantonese has verbal aspect, possibly up to 20 different varieties. Modal particles are difficult in Cantonese. Clusters of up to three sentence-final particles are very common. 我食咗飯 and 我食咗飯架啦喎 are both grammatical for I have had a meal, but the particles add the meaning of I have already had a meal, answering a question, or even imply I have had a meal, so I don’t need to eat anymore.

Cantonese gets a 5.5 rating, nearly hardest of all.

Min Nan is also said to be harder to learn than Mandarin, as it has a more complex tone system, with five tones on three different levels. Even many Taiwanese natives don’t seem to get it right these days, as it is falling out of favor, and many fewer children are being raised speaking it than before.

Min Nan gets a 5.5 rating, nearly hardest of all.

A recent 15 year survey out of Fudan University utilizing both the departments of Linguistics and Anthropology looked at 579 different languages in 91 linguistic families in order to try to find the most complicated language in the world. The result was that a Wu language dialect (or perhaps a separate language) in the Fengxian district of southern Shanghai (Dônđän Wu) was the most phonologically complex language of all, with 20 separate vowels (Wang 2012). The nearest competitor was Norwegian with 16 vowels.

Dônđän Wu gets a 5.5 rating, nearly hardest of all.

Classical Chinese is still read by many Chinese people and Chinese language learners. Unless you have a very good grasp on modern Chinese, classical Chinese will be completely wasted on you. Classical Chinese is much harder to read than reading modern Chinese.

Classical Chinese covers an era extending over 3,000 years, and to attain a reading fluency in this language, you need to be familiar with all of the characters used during this period along with all of the literature of the period so you can understand all the allusions. Even with a knowledge of Classical Chinese, you need to read it in context. If you are good at Classical Chinese and someone throws you a random section of it, it will take you a good amount of time to figure it out unless you know context.

The language is much more to the point than Modern Chinese, but this is not as good as it sounds. This simplicity leaves room for ambiguity, and context plays an important role. A joke about some obscure historical or literary anecdote will be lost on you unless you know what it refers to. For reading modern Chinese, you will need at least 5,000 characters, but even then, you will still need a dictionary. With Classical Chinese, there is no upper limit on the number of characters you need to know. The sky is the limit.

Classical Chinese gets a 6 rating, hardest of all.

Tibeto-Burman
Qiangic
Northern
Qiang

In Qiang, a language of Sichuan Province in China, not only are there rhotic vowels, which are present in only 1% of the world’s languages, but there is also rhoticity harmony, where a non-rhotic vowel in a morpheme becomes rhotic when it is followed by a morpheme with a rhotic vowel.

ʀuɑ + kʰe˞ > ʀuɑ˞kʰe˞
me + we˞ > me˞we˞
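The harmony rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the small vowel set, and the choice to mark the last vowel of the preceding morpheme are all hypothetical simplifications, not a description of Qiang phonology as a whole.

```python
RHOTIC_HOOK = "\u02DE"  # the rhotic hook diacritic used in the examples above

# Assumed, deliberately small vowel set; a real analysis would need the full inventory.
VOWELS = set("aeiou\u0251\u0259")


def rhoticize(morpheme):
    """Add a rhotic hook after the last vowel of a morpheme that has none yet."""
    if RHOTIC_HOOK in morpheme:
        return morpheme
    for i in range(len(morpheme) - 1, -1, -1):
        if morpheme[i] in VOWELS:
            return morpheme[:i + 1] + RHOTIC_HOOK + morpheme[i + 1:]
    return morpheme  # no vowel found, leave unchanged


def apply_harmony(morphemes):
    """Rhoticize each morpheme that is directly followed by a rhotic morpheme."""
    result = list(morphemes)
    for i in range(len(morphemes) - 1):
        # Check the original following morpheme, so the rule does not chain.
        if RHOTIC_HOOK in morphemes[i + 1]:
            result[i] = rhoticize(morphemes[i])
    return "".join(result)
```

Feeding in the two morpheme pairs from the examples above reproduces the harmonized forms shown there. Whether harmony spreads further leftward across several morphemes is not stated in the text, so the sketch only looks one morpheme ahead.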

Rhotic vowels are found in US English – Unstressed ɚ: standard, dinner, Lincolnshire, editor, measure, martyr.

Qiang also has a very bad romanization, so bad that the Qiang will not even use it. Voiced consonants are written by adding a vowel to the symbol for the voiceless consonant. It has long and short vowels, but these are not represented in the system.

Qiang gets a 5 rating, extremely hard to learn.

Western Tibeto-Burman
Bodish
Central Bodish
Central

Tibetan probably has one of the least rational orthographies of any language. The orthography has not changed in ~1,000 years while the language has gone through all sorts of changes. A language learner in Tibet can get by using phonetic spelling. The problem comes when you try to spell using the Classical Alphabet. For instance:

Srong rtsan Sgam po (written)
soŋtsɛn ɡampo (spoken)

bsgrubs (written)
d`up (spoken)

While the orthography is etymological and completely outdated, it is quite predictable.

Tibetan gets a 5 rating, extremely hard to learn.

Southern

Dzongkha, the official language of Bhutan, has some pretty wild phonology, in addition to having the Tibetan writing system, this time using Bhutanese forms of the Tibetan script.

It contrasts all of the following: s, sʰ, ʰs, ʰsʰ, ts, ʰts, tsʰ, z, ʱz, dz, ʱdz, ⁿsʰ, ᵐtsʰ, ⁿtsʰ, ⁿdz, ᵖts, ᵖtsʰ, ᵖtsʷʰ, and ᶲs, and in addition it has four tones, but there is no single word that is distinguished by tone only. On top of that, there are 22 different vowels.

Dzongkha gets a 5 rating, extremely hard to learn.

Austroasiatic
Mon-Khmer
Vietic

Vietnamese is also hard to learn because to an outsider, the tones seem hard to tell apart. Therefore, foreigners often make themselves difficult to understand by not getting the tone precisely correct. It also has “creaky-voiced” tones, which are very hard for foreigners to get a grasp on.

Vietnamese grammar is fairly simple, and reading Vietnamese is pretty easy once you figure out the tone marks. Words are short as in Chinese. However, the simple grammar is relative, as you can have 25 or more forms just for I, the 1st person singular pronoun. In addition, the Latin orthography is said to be quite bad. It was invented by missionaries a few centuries ago, and it has never made much sense.

Vietnamese gets a 5 rating, extremely hard to learn.

Mon-Khmer
Khmer

Khmer has a reputation for being hard to learn. I understand that it has one of the most complex honorifics systems of any language on Earth. Over a dozen different words mean to carry depending on what one is carrying. There are several different words for slave depending on who owned the slave and what the slave did. There are 28-30 different vowels, including sets of long and short vowels and long and short diphthongs. The vowel system is so complicated that there isn’t even agreement on exactly what it looks like. Khmer learners, especially speakers of IE languages, often have a hard time producing or even distinguishing these vowels.

Speaking it is not so bad, but reading and writing it is pretty difficult. For instance, you can put up to five different symbols together in one complex symbol. The orthographic script is even worse than the Thai one. There are actually rules to this mess, but no one seems to know what they are.

Khmer gets a 4.5 rating, very to extremely hard.

Bahnaric
North Bahnaric
West
Sedang-Todrah
Sedang

Sedang, a language of Vietnam, has the highest number of vowel sounds of any language on Earth, at 55 distinct vowel sounds.

Sedang gets a 5 rating, extremely hard to learn.

Hmong-Mien
Hmongic
Chuanqiandian

Hmong is widely spoken in this part of California, but it’s not easy to learn. There are eight tones, and they are not easy to figure out. It’s not obviously related to any other major language but the obscure Mien.

It has some very strange consonants called voiceless nasals. We have them in English as allophones – the m in small is voiceless, but in Hmong, they put them at the front of words – the m in the word Hmong is voiceless. These can be very hard to pronounce.

The romanization is widely criticized for being a lousy one, but the Hmong use it anyway.

Hmong gets a 5 rating, extremely hard to learn.

Austro-Tai
Austronesian
Tsouic

Tsou is a Taiwanese aborigine language spoken by about 2,000 people in Taiwan. It has the odd feature whereby the underlying glides y and w turn into or surface as non-syllabic mid vowels e̯ and o̯ in certain contexts:

jo~joskɨ -> e̯oˈe̯oskɨ = fishes

Tsou is also ergative like most Formosan languages. Tsou is the only language in the world that has no prepositions or anything that looks like a preposition. Instead it uses nouns and verbs in the place of prepositions. Tsou allows more potential consonant clusters than most other languages. About 1/2 of all possible CC clusters are allowed.

Tsou has an inclusive/exclusive distinction in the 1st person plural and a very strange visible and non-visible distinction in the 3rd person singular and plural. Both adjectives and adverbs can turn into verbs and are marked for voice in the same way that verbs are. Verbs are extensively marked for voice. Nouns are marked for a variety of odd cases, often referring to perception (visible/invisible), person, and place deixis.

‘e – visible and near speaker
si/ta – visible and near hearer
ta – visible but away from speaker
‘o/to – invisible and far away, or newly introduced to discourse
na/no ~ ne – non-identifiable and non-referential (often when scanning a class of elements)

Tsou gets a 5 rating, extremely hard to learn.

Malayo-Polynesian
Malayo-Chamic
Malayic
Malay

Bahasa Indonesia is an easy language to learn. For one thing, the grammar is dead simple. There are only a handful of prefixes, only two of which might be seen as inflectional. There are also several suffixes. Verbs are not marked for tense at all. And the sound system of these languages, in common with Austronesian in general, is one of the simplest on Earth, with only two dozen phonemes. Bahasa Indonesia has few homonyms, homophones, homographs, or heteronyms. Words in general have only one meaning.

Though the orthography is not completely phonetic, it only has a small number of nonphonetic exceptions. The orthography is one of the easiest on Earth to use.

The system for converting words into either nouns or verbs is regular. To make a plural, you simply repeat a word, so instead of saying pencils, you say pencil pencil.

Bahasa Indonesia gets a 1.5 rating, extremely easy to learn.

Malay is only easy if you learn the standard spoken form or one of the creoles. Learning the literary language is quite a bit more difficult. However, the Jawi script, which is Malay written in Arabic script, is often considered to be perfectly awful.

Malay gets a 2 rating, moderately easy.

Philippine
Greater Central Philippine
Central Philippine
Tagalog

However, Tagalog is much harder than Malay or Indonesian. Compared to many European languages, Tagalog syntax, morphology and semantics are often quite different. Also, Tagalog is typically spoken very fast. Unlike Malay, verbs conjugate quite a bit in Tagalog. The main idea of Tagalog grammar is something called focus. Once you figure that out, the language gets pretty easy, but until you understand that concept, you are going to have a hard time.

Maori and other Polynesian languages have a reputation for being quite easy to learn. The main problem for English speakers is that the sentence structure is backwards compared to English. In addition, macrons can cause problems.

One problem with Maori is dialects. The dialects are so diverse that there are multiple words for the same thing. Swiss German has a similar issue, with up to 50 words for each common household item (nearly every major dialect has its own word for common objects).

On the plus side, the pronunciation is simple, and there is no gender. The language is as regular as Japanese. No Polynesian language has more than 16 sounds, and they all lack tones. They all have five vowels, which can be either long or short. A consonant must be followed by a vowel, so there are no consonant clusters. All consonants are easy to pronounce.

Maori gets a 3 rating, average difficulty.

Marquesic

Hawaiian is a pretty easy language to learn. It is easy to pronounce, has a simple alphabet, lacks complex morphology and has a fairly simple syntax.

Hawaiian gets a 2 rating, very easy to learn.

North and Central Vanuatu
East Santo
North

Sakao is a very strange language spoken by 4,000 people in Vanuatu. It is a polysynthetic Austronesian language, which is very unusual. It allows extreme consonant clusters. Sakao has an incredible seven degrees of deixis. The language has an amazing four numbers: singular, dual, paucal and plural. The neighboring language Tolomako has singular, dual, trial and plural. The trial form is very odd. Sakao’s paucal derived from Tolomako’s trial:

jørðœl
they, from three to ten

jørðœl løn
the five of them (Literally, they three, five)

All nouns are always in the singular except for kinship forms and demonstratives, which only display the plural:

ðjœɣ – my mother/aunt -> rðjœɣ – my aunts

walðyɣ – my child -> raalðyɣ – my children

It has a number of nouns that are said to be “inalienably possessed”, that is, whenever they occur, they must be possessed by some possessor. These often take highly irregular inflections:

Here, mouth is either œsɨŋœ-, ɔsɨŋɔ- or œsœŋ-, and hair is either uly-, ulœ- or nøl-

Sakao, strangely enough, may not even have syllables in the way that we normally think of them. If it does have syllables at all, they would appear to be at least a vowel optionally surrounded by any number of consonants.

Kwaio is an Austronesian language spoken in the Solomon Islands. It has four different forms of number to mark pronouns – not only the usual singular and plural, but also the rarer dual and the very rare paucal. In addition, there is an inclusive/exclusive contrast in the non-singular forms.

1 paucal inclusive (you, I and a few others)
1 paucal exclusive (I and a few others)

1 plural inclusive (I, you and many others)
1 plural exclusive (I and many others)

Pretty wild!

Kwaio gets a 5 rating, extremely hard to learn.

Greater Barito
East Barito
Malagasy

Malagasy, the official language of Madagascar, has a reputation for being even easier to learn than Indonesian or Malay.

Malagasy gets a 1 rating, easiest of all to learn.

Tai-Kadai
Kam-Tai
Tai
Southwestern

Thai is a pretty hard language to learn. There are 75 symbols in the strange script, there are no spaces between words in the script, and vowels can come before, after, above or below consonants in any given syllable. There seem to be many different glyphs for every consonant, but the different glyphs for the same consonant will sometimes change the sound of the neighboring vowel. The orthography is as insensible as that of English since centuries have gone by with no spelling reforms. In fact, Thai has not changed its system in 1000 years. The wild card of having tone thrown in adds to the insanity.

Consonant pronunciations vary depending on the location of the syllable in the word – for instance, s can change to t. There are many vowels which are spoken but not written. There are many consonants that are pronounced the same – for instance, there are six different t‘s, not counting the s‘s that turn into t‘s. The Thai script is definitely one of the most difficult phonetic scripts. Nevertheless, the Thai script is easier to learn than the Japanese or Chinese character sets. In spite of all of that, the syntax is simple, like Chinese.

There are five tones, including a neutral tone. Tones are determined by a variety of complex things, including a combination of tone marks, the class of consonants, if the syllable ends in a sonorant or a stop and what the tone of the preceding syllable was. Tone marking in the orthography is quite complex.

The vowels are different than in many languages, and there are some unusual diphthongs: eua, euai, aui and uu. There is a contrast between aspirated and unaspirated consonants.

There is a system of noun classifiers for counting various things, similar to Japanese. In addition, common to many Asian languages, there is a complicated honorifics system.

On the plus side, Thai is a regular language, with few exceptions to the rules. However, the rules are quite complex. The syntax is about as complex as that of Chinese, and the grammar is dead simple.

Thai gets a 5 rating, extremely hard to learn.

Lao is very similar to Thai. In fact, it is identical to a Thai language spoken by 16 million people in northeast Thailand called Northeastern Thai. The Lao script is similar to Thai, but it has fewer letters, so there is somewhat less confusion.

Lao gets a 4.5 rating, very to extremely hard to learn.

Kam-Sui

The Kam languages of the Dong people in southwest China were rated by the Fudan University study referenced above under Wu as the 2nd most phonologically complex on Earth (Wang 2012). There are 32 stem initial consonants, including oddities like tɕ, tɕʰ, pʲ, pʲʰ, ɕ, kʷ, kʷʰ, ŋʷ, tʃʰ, tsʰ. Note the many contrasts between aspirated and unaspirated voiceless consonants, including bilabial palatalized stops, labialized velar stops, and alveolar affricates. There are an incredible 64 different syllable finals, and 14 others that occur only in Chinese loans.

There are an astounding 15 different tones, nine in open syllables and six in checked syllables (entering tones). Main tones are high, high rising, high falling, low, low rising, low falling, mid, dipping and peaking. When they speak, it sounds as if they are singing.

Kam gets a 5 rating, extremely hard to learn.

Kra
Paha

According to the Fudan University study quoted above, Buyang is the 3rd most phonologically complex language in the world. Buyang is a cluster of 4 related languages spoken by 1,900 people in Yunnan Province, China. Buyang has a completely wild consonant inventory.

It has a full set of both voiced and voiceless plain and aspirated stops, including voiceless uvulars. The contrast between aspirated and plain voiced stops is peculiar. The stop series also has distinctions between palatalized and rounded stops throughout the series. It has a labialized voiceless palatal fricative and a voiceless dental aspirated lateral, unusual sounds. It has four different voiceless aspirated nasals. It has voiceless y and w, more odd sounds. It also has plain and labialized palatal glides.

That is one heck of a wild phonology.

Buyang gets a 5 rating, extremely hard to learn.

Niger-Kordofanian
Niger-Congo
Atlantic–Congo
Kwa
Nyo
Ga-Dangme

The African Kwa language Ga has a bad reputation for being a tough nut to crack. It is spoken in Ghana by about 600,000 people. It has two tones and engages in a strange behavior called tone terracing that is common to many West African languages. There is a phonemic distinction between three different types of vowel length. All vowels have 3 different lengths – short, long and extra long. It also has many sounds that are not in any Western languages.

Ga gets a 5 rating, extremely hard to learn.

Potou-Tano
Tano
Central Bia
Northern

Anyi is a language spoken by 610,000 people in Côte d’Ivoire. It is relatively straightforward as far as African languages go. Probably the hardest part about the language is that it is tonal, and it does have two tones. The phonology does have the unusual +-ATR contrast which will seem very odd. ATR stands for advanced tongue root, so the language has a contrast between vowels with an advanced tongue root and without one. However, the grammar is pretty regular. There are few confusing phonological processes.

Anyi has a simple tense system, with only present, past and future. There is no aspect, mood or voice marking, and it lacks the noun class systems so common in many African languages. It has a plural marker, but it is often optional.

The syntax does have serial verbs, which will seem odd to Westerners. It distinguishes between relative clauses marked with bɔ and subordinate clauses marked with kɛ.

S
Nguni

Xhosa, a language of South Africa, is quite difficult, with up to nine click sounds. Clicks only exist in one language outside of Africa – the Australian language Damin – and are extremely difficult to learn. Even native speakers mess up the clicks sometimes. Nelson Mandela said he had problems making some of the click sounds in Xhosa. The phonemics in general of Xhosa are pretty wild.

Xhosa gets a 5 rating, extremely hard to learn.

Zulu and Ndebele also have these impossible click sounds. However, outside of click sounds, the phonology of Nguni languages is straightforward. All Nguni languages are agglutinative. These languages also make plurals by changing the prefix of the noun, and the manner varies according to the noun class. If you want to look up a word in the dictionary, first of all you need to discard the prefix. For instance, in Ndebele,

river – umfula
rivers – imifula, but

stone – ilitshe
stones – amatshe, yet

tree – isihlahla
trees – izihlahla
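The prefix swapping in the three Ndebele pairs above, and the need to strip the prefix before a dictionary lookup, can be sketched as a small Python table. This is a toy illustration only: the table holds just the three prefix pairs visible in the examples, the function names are hypothetical, and real Ndebele has more noun classes (and prefixes that overlap) than this covers.

```python
# Singular -> plural prefix pairs, taken only from the three examples above.
PREFIX_PAIRS = [("um", "imi"), ("ili", "ama"), ("isi", "izi")]


def split_prefix(noun):
    """Return (singular_prefix, stem) for a noun using one of the known prefixes."""
    for singular, _plural in PREFIX_PAIRS:
        if noun.startswith(singular):
            return singular, noun[len(singular):]
    raise ValueError(f"no known singular prefix on {noun!r}")


def dictionary_stem(noun):
    """Discard the class prefix, as you would before looking the word up."""
    return split_prefix(noun)[1]


def pluralize(noun):
    """Swap the singular class prefix for its plural counterpart."""
    singular, stem = split_prefix(noun)
    return dict(PREFIX_PAIRS)[singular] + stem
```

With this table, `pluralize("umfula")` gives `imifula` and `dictionary_stem("isihlahla")` gives `hlahla`. The point of the sketch is that the plural cannot be formed without first knowing which class a noun belongs to.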

Ndebele gets a 5 rating, extremely hard to learn.

Zulu has pitch accent, tones and clicks. There are nine different pitch accents, four tones and three clicks, but each click can be pronounced in five different ways. However, tones are not marked in writing, so it’s hard to figure out when to use them. Zulu also has depressor consonants, which lower the tone in the vowel in the following syllable. In addition, Zulu has multiple gender – 15 different genders. And some nouns behave like verbs. It also has 12 different noun classes, but 90% of words are part of a group of only three of those classes.

Zulu gets a 5 rating, extremely hard to learn.

G
Swahili

For unknown reasons, Swahili is generally considered to be an easy language to learn. The US military ranks it 1, among the easiest of all languages to learn. This seems to be the typical perception. Why Swahili is so easy to learn, I am not sure. It’s a trade language, and trade languages are often fairly easy to learn. There’s also a lot of controversy about whether or not Swahili can be considered a creole, but that has not been proven. For the moment, the reasons why Swahili is so easy to learn will have to remain mysterious.

On the down side, Swahili has many noun classes, but they have the benefit of being more or less logical.

Swahili gets a 2 rating, moderately easy.

Khoisan
Southern Africa
Southern
Hua

!Xóõ (Taa), spoken by only 4,200 Bushmen in Botswana and Namibia, is a notoriously difficult Khoisan language replete with the notoriously impossible to comprehend click sounds. Taa has anywhere from 130 to 164 consonants, the largest phonemic inventory of any language. Of this vast wealth of sounds, there are anywhere from 30-64 different click sounds. There are five basic clicks and 17 accompanying ones. Speakers develop a lump on their larynx from making the click sounds.

In addition, there are four types of vowels: plain, pharyngealized, breathy-voiced and strident. On top of that, there are four tones. Taa appears on many lists of the wildest phonologies and craziest languages period on Earth.

Taa gets a 5 rating, extremely hard to learn.

Northern

Ju|’hoan, a Khoisan language spoken by 5,000 people in Botswana, has one of the wildest phonological inventories on Earth. The voiced aspirated consonants – b͡pʰ, d͡tʰ, d͡tsʰ, d͡tʃʰ, ɡ͡kʰ, and ᶢǃʰ – are particularly odd. Some question whether these segments actually exist and say that they are instead spoken with a “breathy-voice.” However, voiced aspirated consonants do appear to be real. In addition, Ju|’hoan has a closed class of only 17 adjectives since descriptive functions are done by verbs. They are the following:

female
male
other (those remaining)
other (strange)
true
old
new
a certain
each
all
some
the numbers one through four

Ju|’hoan scored very high on a study of the weirdest languages on Earth.

Ju|’hoan gets a 5 rating, extremely hard to learn.

Eskimo-Aleut
Eskimo
Inuit-Inupiaq

Inuktitut is extremely hard to learn. Inuktitut is polysynthetic-agglutinative, and roots can take many suffixes, in some cases up to 700. Verbs have 63 forms of the present indicative, and conjugation involves 252 different inflections. Inuktitut has the complicated polypersonal agreement system discussed under Georgian above and Basque below. In a typical long Inuktitut text, 92% of words will occur only once. This is quite different from English and many other languages where certain words occur very frequently or at least frequently. Certain fully inflected verbs can be analyzed both as verbs and as nouns. Words can be very long.

Inuktituusuungutsialaarungnanngittuaraaluuvunga.
I truly don’t know how to speak Inuktitut very well.
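The 92% figure mentioned above is the kind of statistic that can be computed from any text. Here is a minimal Python sketch, assuming that "words" means distinct whitespace-separated word forms and that the figure is the share of those forms that appear exactly once. The function name and the tokenization are illustrative assumptions, not the method behind the original figure.

```python
from collections import Counter


def share_of_single_occurrence_words(text):
    """Fraction of distinct word forms that occur exactly once in the text."""
    tokens = text.lower().split()  # crude tokenization: split on whitespace
    counts = Counter(tokens)
    if not counts:
        return 0.0
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)
```

In a language where a whole sentence is packed into one heavily suffixed word, nearly every word form is unique, so this share climbs very high. In English, frequent function words like the and of keep it much lower.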

You may need to analyze up to 10 different bits of information in order to figure out a single word. However, the affixation is all via suffixes (there are no prefixes or infixes) and the suffixation is extremely regular.

Inuktitut is also rated by linguists as one of the hardest languages on Earth to pronounce. Inuktitut may be as hard to learn as Navajo.

Inuktitut is rated 6, hardest of all.

Kalaallisut (Western Greenlandic) is very closely related to Inuktitut. Look at this sentence:

Aliikusersuillammassuaanerartassagaluarpaalli…
However, they will say that he is a great entertainer, but …

That word is composed of 12 separate morphemes. A single word can conceptualize what could be an entire sentence in a non-polysynthetic language.

Kalaallisut is rated 6, hardest of all.

Chukotko-Kamchatkan
Northern
Chukot

Chukchi is a polysynthetic, agglutinating and incorporating language and is often listed as one of the hardest languages on Earth to learn.

Təmeyŋəlevtpəγtərkən.
I have a fierce headache.

There are five morphemes in that word, and there are three lexical morphemes (nouns or adjectives) incorporated in that word: meyŋ – great, levt – head, and pəγt – ache.

Chukchi gets a 6 rating, hardest of all.

Basque

Basque, of course, is just a wild language altogether. There is an old saying that the Devil tried to learn Basque, but after seven years, he only learned how to say Hello and Goodbye. Many Basques, including some of the most ardent Basque nationalists, tried to learn Basque as adults. Some of them succeeded, but a very large number of them failed. Based on the number that failed, it does seem that Basque is harder for an adult to learn as an L2 than many other languages are. Basque grammar is maddeningly complex and it often makes it onto craziest grammars and craziest language lists.

There are 11 cases, and each one takes four different forms. The verbs are quite complex. This is because it is an ergative language, so verbs vary according to the number of subjects and the number of objects and if any third person is involved.

This is the same polypersonal agreement system that Georgian has above. Basque’s polypersonal system is a polysynthetic system consisting of two verb types – synthetic and analytical. Only a few verbs use the synthetic form.

Three of Basque’s cases – the absolutive (intransitive verb case), the ergative (transitive verb case) and the dative – can be marked via affixes to the verb. In Basque, only present simple and past simple synthetic tenses take polypersonal affixes.

The analytical forms are composed of more than one word, while the synthetic forms are all one word. The analytic verbs are built via the synthetic verbs izan – be, ukan – have and egin – do.

Synthetic:

d-akar-ki-o-gu = We bring it to him/her. The verb is ekarri – bring.

z-erama-zki-gu-te-n = They took them to us. The verb is eraman – take.

Analytic:

Ekarriko d-i-o-gu = We’ll bring it to him/her. Literally: We will have-bring it to him/her. The analytic verb is built from ukan – have.

Eraman d-ieza-zki-gu-ke-te = They can take them to us. Literally: They can be taking them to us. The analytic verb is built from izan – be.

Most of the analytic verbs require an auxiliary which carries all sorts of information that is often carried on verbs in other languages – tense, mood, sometimes gender and person for subject, object and indirect object.

Jaten naiz.
Eat I-am-doing.
I am eating.

Jaten nintekeen.
Eat I-was-able-to.
I could eat.

Eman geniezazkiake.
Give we-might-have-them-to-you-male.
We might have given them to you.

In the above, naiz, nintekeen and geniezazkiake are auxiliaries. There are actually 2,640 different forms of these auxiliaries!

A language with ergative morphosyntax in Europe is quite a strange thing, and Basque is the only one of its kind. The ergative itself is quite unusual:

The noun gizon takes a different form whether it is the subject of a transitive or intransitive verb. The first sentence is in absolutive case (unmarked) while the second sentence is in the ergative case (marked by the morpheme -k). If you come from a non-ergative IE language, the concept of ergativity itself is difficult enough to conceptualize, much less trying to actually learn an ergative language. Consequently, any ergative language will automatically be more difficult than a non-ergative one for all speakers of IE languages.

If you don’t grow up speaking Basque, it’s hard to attain native speaker competence. It’s quite a bit easier to write in Basque than to speak it.

Nevertheless, Basque verbs are quite regular. There are only a few irregularities in conjugations and they have phonetic explanations. In fact, the entire language is quite regular. In addition, most words above the intermediate level are borrowings from large languages, so once you reach intermediate Basque, the rest is not that hard. In addition, pronunciation is straightforward.