Bilingual Web Page and Site Readability Assessment

Readability assessment measures the difficulty of a piece of text, and it is widely used in the educational field to help instructors prepare appropriate materials for students. In this paper, we investigate the application of readability assessment to Web development, so that users can retrieve information appropriate to their level. We propose a bilingual (English and Chinese) assessment scheme for Web page and Web site readability based on textual features, and conduct a series of experiments with real Web data to evaluate our scheme. Experimental results show that, apart from indicating the readability level, the estimated score serves as a good heuristic for identifying pages with low content value. Furthermore, we can obtain the overall content distribution of a Web site by studying how its readability varies.