Who Can Understand “Dunno”? Automatic Assessment of Text Complexity in Children’s Literature

2021. № 5, 55-68

Boris L. Iomdin,

Vinogradov Russian Language Institute (Russian Academy of Sciences),

(Russia, Moscow)



Dmitry A. Morozov,

Novosibirsk State University

(Russia, Novosibirsk)



The need to assess the readability of a given text may arise in different situations: drafting legal texts or manuals, writing textbooks, selecting literature for extracurricular reading. Assessing the readability of educational texts for children is especially interesting, since such texts are expected to satisfy multiple requirements that may contradict each other. Children should understand these texts well, the texts should be relevant and interesting, and at the same time they should teach readers new concepts, words and constructions. Currently, age marking of texts for children is carried out manually by experts, which makes the process long and laborious, and the results are likely to be subjective. We propose a method for automatically classifying texts by complexity using a neural network model. The method is intended for creating a corpus of children's literature with target age markup (within the framework of the Russian National Corpus). The prediction quality of our model reaches 0.92. An automatic mechanism that estimates the readability level of a given text with acceptable accuracy will make it possible to quickly create a representative corpus of texts written for children, from which one can select texts that are certain to be understandable to children of a given age. Such a corpus would be in demand among teachers, parents, translators of fiction, linguists, and anyone else who needs to select fiction that children can understand.

For citation:

Iomdin B. L., Morozov D. A. Who Can Understand “Dunno”? Automatic Assessment of Text Complexity in Children’s Literature. Russian Speech = Russkaya Rech’. 2021. No. 5. Pp. 55–68. DOI: 10.31857/S013161170017239-1.


This research was supported by RFBR grant No. 19-29-14224.