Measuring Readability of Polish Texts: Baseline Experiments

Bartosz Broda, Maciej Ogrodniczuk, Bartłomiej Nitoń, Włodzimierz Gruszczyński


Measuring the readability of a text is a sensible first step towards simplifying it. In this paper we present an overview of the most common approaches to automatically measuring readability. Of the approaches described, we implemented and evaluated the Gunning FOG index and the Flesch-based Pisarek method. We also present two other approaches. The first is based on measuring the distributional lexical similarity of a target text to reference texts. In the second, we propose a novel method for automating the Taylor test, which in its basic form requires conducting a large number of surveys. The Taylor test is automated using statistical language modelling. We have developed a free on-line web-based system and constructed plugins for the most common text editors, namely Microsoft Word and […]. The inner workings of the system are described in detail. Finally, we perform extensive evaluations for Polish, a highly inflected Slavic language. We show that Pisarek's method is highly correlated with the Gunning FOG index, even though the two differ in form, and that both the similarity-based approach and the automated Taylor test achieve high accuracy. The merits of each approach are discussed.
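For reference, the first of the implemented measures, the Gunning FOG index, is computed as 0.4 × (average sentence length in words + percentage of "hard" words). The Python sketch below is illustrative only and is not the authors' implementation: the Polish syllable-counting heuristic, the regex-based tokenisation and sentence splitting, and the names `count_syllables`, `gunning_fog` and `hard_word_min_syllables` are all assumptions. The hard-word threshold is left as a parameter because the English cut-off of three syllables may be too low for Polish words; the threshold actually used in the paper is not stated in this abstract.

```python
import re

# Vowel letters of Polish orthography; words are lowercased before counting.
POLISH_VOWELS = set("aąeęioóuy")

# A "word" is a maximal run of Unicode letters (no digits or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")

# Naive sentence boundary: one or more of . ! ? (abbreviations are not handled).
SENTENCE_END_RE = re.compile(r"[.!?]+")


def count_syllables(word: str) -> int:
    """Approximate the number of syllables in a Polish word.

    Assumed heuristic: each vowel letter is a syllable nucleus, except an
    'i' directly followed by another vowel, which usually only marks
    softening of the preceding consonant (e.g. 'nie', 'się').
    """
    w = word.lower()
    count = 0
    for pos, ch in enumerate(w):
        if ch not in POLISH_VOWELS:
            continue
        if ch == "i" and pos + 1 < len(w) and w[pos + 1] in POLISH_VOWELS:
            continue  # 'i' before a vowel: not counted as a separate syllable
        count += 1
    return max(count, 1)  # vowel-less words such as 'w' or 'z' count as one


def gunning_fog(text: str, hard_word_min_syllables: int = 3) -> float:
    """Gunning FOG index: 0.4 * (words per sentence + 100 * hard words / words)."""
    words = WORD_RE.findall(text)
    sentences = [s for s in SENTENCE_END_RE.split(text) if WORD_RE.search(s)]
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    hard = sum(1 for w in words if count_syllables(w) >= hard_word_min_syllables)
    avg_sentence_len = len(words) / len(sentences)
    hard_word_pct = 100.0 * hard / len(words)
    return 0.4 * (avg_sentence_len + hard_word_pct)


if __name__ == "__main__":
    sample = "Czytelność tekstów jest ważna."
    print(round(gunning_fog(sample), 2))     # 'Czytelność' (3 syllables) is hard → 11.6
    print(round(gunning_fog(sample, 4), 2))  # higher hard-word threshold → 1.6
```

This sketch treats abbreviations such as "np." as sentence ends and omits the exclusions in Gunning's original definition (e.g. proper nouns); a real system for Polish would rely on a proper tokeniser and morphological analyser instead of these regex heuristics.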
Authors: Bartosz Broda, Maciej Ogrodniczuk, Bartłomiej Nitoń, Włodzimierz Gruszczyński (Wydział Nauk Humanistycznych i Społecznych, Faculty of Humanities and Social Sciences)
Publication size in sheets: 0.5
Book: Nicoletta Calzolari et al. (eds.): Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 2014, European Language Resources Association (ELRA), ISBN 978-2-9517408-8-4, 4162 p.
Language: English
Full text: 427-Paper.pdf (200.02 KB), dated 14-09-2015
Score (nominal): 5
Citation count*: 25 (as of 2019-12-03)
* The presented citation count is obtained through analysis of information available on the Internet and is close to the number calculated by the Publish or Perish system.