漢字プロジェクト - The Kanji Project

This webpage presents a research project conducted on both Twitter and YouTube that analyzed tweets from Japanese idols and comments on Japanese YouTube channels. Its purpose was to identify not only the most common Japanese Kanji characters but also the most common Japanese words used in social media.

What are Japanese idols?

"Japanese idol" is a term often used for people who want to pursue a career in music, modeling, acting, and similar fields but lack prior experience in the entertainment industry.
Some Japanese talent agencies give these people an opportunity to reach their goals by working with them in their chosen careers.

Why analyze tweets from Japanese idols?

Japanese idols tend to use social media heavily, especially Twitter, to let their fans and potential fans know about their daily activities.

Advantages of the research

Japanese idols use Twitter intensively on a daily basis

Japanese idols not only tweet about their professions but also about their personal lives in general

Limitations of the research

Tweets can only be retrieved from the Japanese idols you follow

Cannot extract tweets from Japanese people based on location, since not all users list their location

Can only retrieve tweets from public accounts, not protected accounts

Can only download YouTube comments from public videos, not private videos

There are two other factors that need to be taken into account. First, on social media, Japanese people may write the same word in any of the three writing systems; for example:

The word "beautiful" is sometimes written as:

綺麗 (Using Kanji)

きれい (Using Hiragana)

キレイ (Using Katakana)
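Counting these three spellings as one word needs some normalization step, and the page does not say how this was done. As a minimal sketch of one possible approach (an assumption, not the project's method), katakana can be folded into hiragana, because each katakana letter sits exactly 0x60 code points above its hiragana counterpart in Unicode. The Kanji form 綺麗 cannot be recovered this way; matching it would need a reading dictionary, which the sketch does not attempt. The function name `katakana_to_hiragana` is hypothetical.

```python
def katakana_to_hiragana(text: str) -> str:
    """Fold katakana letters (U+30A1..U+30F6) into hiragana (U+3041..U+3096).

    Each katakana letter is exactly 0x60 code points above its hiragana
    counterpart. Kanji, hiragana, and the long-vowel mark ー are returned
    unchanged.
    """
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


for word in ["綺麗", "きれい", "キレイ"]:
    print(word, "->", katakana_to_hiragana(word))
```

Here キレイ and きれい end up as the same key, while 綺麗 stays separate.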

Second, words are sometimes "misspelled." I put "misspelled" in quotation marks because the changes seem to be made on purpose rather than by mistake; for example:

ありがとう (thank you) is sometimes written as ありがと

おはよう (good morning) is sometimes written as おはよ or simply おは
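One hypothetical way to keep such variants from splitting a word's count is a hand-maintained lookup table that maps each variant to its standard form before counting. The sketch below assumes the text has already been split into words (Japanese is written without spaces, so that step would need a morphological analyzer, not shown); the `VARIANTS` table and the `canonical` function are illustrative names, not part of the project.

```python
from collections import Counter

# Hypothetical lookup table: each intentional "misspelling" maps to the
# standard form it should be counted under.
VARIANTS = {
    "ありがと": "ありがとう",
    "おはよ": "おはよう",
    "おは": "おはよう",
}


def canonical(word: str) -> str:
    """Return the standard spelling of a word, or the word unchanged if unknown."""
    return VARIANTS.get(word, word)


# Assumes the text is already split into words.
tokens = ["ありがと", "おは", "おはよう", "おはよ", "ありがとう"]
counts = Counter(canonical(token) for token in tokens)
print(counts.most_common())  # [('おはよう', 3), ('ありがとう', 2)]
```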

Note that Kanji characters used exclusively in Japanese names, as well as characters used to abbreviate words, were excluded from the analysis.
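The core counting step can be sketched as a character frequency count that skips an exclusion set. This is a minimal illustration under stated assumptions, not the project's code: Kanji are detected using only the CJK Unified Ideographs block (U+4E00 to U+9FFF), and the `excluded` set stands in for the name-only and abbreviation characters mentioned above, whose actual list is not given here. The function names, sample texts, and placeholder exclusion set are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def is_kanji(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF).

    Extension blocks and the iteration mark 々 are ignored in this sketch.
    """
    return "\u4e00" <= ch <= "\u9fff"


def top_kanji(texts: Iterable[str], excluded: Set[str], n: int) -> List[Tuple[str, int]]:
    """Count every kanji across all texts, skip excluded characters,
    and return the n most frequent as (character, count) pairs."""
    counts = Counter(
        ch
        for text in texts
        for ch in text
        if is_kanji(ch) and ch not in excluded
    )
    return counts.most_common(n)


# Made-up sample texts and a placeholder exclusion set; the character
# chosen here is arbitrary and only demonstrates the filter.
sample = ["今日は本当に楽しかった！", "明日も楽しみ", "今日のライブありがとう"]
placeholder_excluded = {"本"}
print(top_kanji(sample, placeholder_excluded, 3))
```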

Statistics

This project was run using a PC with 16 GB of RAM

Python 3.4.3 was used for this research

3,452 Japanese idol profiles and 100 Japanese YouTube channels were used for this research

75,467,261 tweets and 25,168,139 YouTube comments were analyzed

It took 145 days to process 75 million tweets and 60 days to process 25 million YouTube comments
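To put these durations in perspective, the average throughput follows from dividing the item counts by the number of days. This assumes processing ran continuously over the whole period, which the page does not state, so the figures are rough averages only; the function name `average_rate` is hypothetical.

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(items: int, days: int) -> Tuple[float, float]:
    """Average number of items processed per day and per second,
    assuming processing ran continuously for the given number of days."""
    per_day = items / days
    return per_day, per_day / SECONDS_PER_DAY


for label, items, days in [
    ("tweets", 75_467_261, 145),
    ("YouTube comments", 25_168_139, 60),
]:
    per_day, per_second = average_rate(items, days)
    print(f"{label}: ~{per_day:,.0f} per day, ~{per_second:.1f} per second")
```

With the counts above, this works out to roughly 520,000 tweets and 419,000 comments per day.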

Download the complete list of the 2,136 Japanese Kanji characters and 1,746 Japanese words used most often in social media, along with their pronunciations, meanings, examples, and sentences:

The following list shows only the first 58 Kanji characters extracted from the tweets and YouTube comments; it is not the complete list.

These are the first 58 Japanese words:

The example sentences provided for each Japanese Kanji character were modified according to the following guidelines:

Replaced いいえ with いや

Replaced ええ with はい

Replaced the first が (when used as a particle) with は

Removed あなたは and 私は when they were the first words in the sentence

Added ？ after か when the sentence is a question
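Most of these guidelines are simple enough to express as string rules. The sketch below is a hypothetical encoding, not the project's script. It assumes each replacement rule swaps the first word out for the second (いいえ becomes いや, ええ becomes はい), applies that swap only at the start of a sentence, and treats a sentence ending in か or か。 as a question. The rule for the first particle が is omitted, because telling the particle apart from が inside other words (as in ありがとう) needs part-of-speech information from a morphological analyzer. All names are illustrative.

```python
# Hypothetical encoding of four of the five guidelines.
ANSWER_WORDS = {"いいえ": "いや", "ええ": "はい"}
LEADING_PRONOUNS = ("あなたは", "私は")


def apply_guidelines(sentence: str) -> str:
    s = sentence
    # Swap a sentence-initial answer word (assumed to be followed by 、 or 。).
    for old, new in ANSWER_WORDS.items():
        if s.startswith(old + "、") or s.startswith(old + "。"):
            s = new + s[len(old):]
            break
    # Drop a leading あなたは or 私は.
    for pronoun in LEADING_PRONOUNS:
        if s.startswith(pronoun):
            s = s[len(pronoun):]
            break
    # Mark a sentence-final question particle か with ？.
    if s.endswith("か。"):
        s = s[:-1] + "？"
    elif s.endswith("か"):
        s += "？"
    return s


for example in ["いいえ、違います。", "あなたは学生ですか。"]:
    print(apply_guidelines(example))
```

Note that a sentence-final か is not always a question (for example, a rhetorical そうか), so a real pipeline would likely still need a manual check.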

Highlights

The following list compares the 2,136 Kanji characters taught in Japanese schools with the 2,136 Kanji characters used most in social media.
The characters in red represent the Kanji from social media.

The following list compares the 2,211 Kanji characters studied for the JLPT (Japanese-Language Proficiency Test) with the 2,136 Kanji characters used most in social media.
The characters in red represent the Kanji from social media.
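A comparison like this comes down to set operations on two ranked lists. The sketch below is one possible way to compute it (the page does not say how the red highlighting was produced); the function name `compare_lists` and the four-character toy lists are made up for illustration and are not taken from the project's data.

```python
from typing import Dict, List


def compare_lists(reference: List[str], social_media: List[str]) -> Dict[str, List[str]]:
    """Split two ranked kanji lists into shared characters and characters
    unique to each side, preserving each list's original order."""
    ref_set, sm_set = set(reference), set(social_media)
    return {
        "shared": [k for k in social_media if k in ref_set],
        "social_media_only": [k for k in social_media if k not in ref_set],
        "reference_only": [k for k in reference if k not in sm_set],
    }


# Toy inputs only; the real lists hold 2,136 (school) or 2,211 (JLPT)
# characters on one side and 2,136 social media characters on the other.
school = ["日", "本", "学", "校"]
social = ["日", "今", "本", "推"]
result = compare_lists(school, social)
for key, chars in result.items():
    print(key, chars)
```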
