Translating Classical Japanese Literature

Researchers from the Japanese government, academia, research institutes, and Google have published three datasets of Japanese script to help preserve Japanese cultural knowledge. The datasets contain nearly 500,000 images of characters written in Kuzushiji, a classical Japanese cursive script that most native Japanese speakers cannot read because it is no longer part of the official school curriculum. The researchers labeled each image with its modern equivalent, drawn from a set of 4,000 characters. Millions of classical Japanese books are written in Kuzushiji, and these datasets could promote the development of machine learning algorithms that translate Kuzushiji into the modern Japanese writing system.

Michael McLaughlin is a research assistant at the Center for Data Innovation. He researches and writes about a variety of issues related to information technology and Internet policy, including digital platforms, e-government, and artificial intelligence. Michael graduated from Wake Forest University, where he majored in Communication with minors in Politics and International Affairs and in Journalism. He received his Master’s in Communication from Stanford University, specializing in Data Journalism.