Automatic readability assessment

January 2010

Author: Lijun Feng, City University of New York
Adviser: Matt Huenerfauth, City University of New York

Publisher: City University of New York, New York, NY, United States

ISBN: 978-1-124-28999-1

Order Number: AAI3426751

Pages: 204

Abstract

We describe the development of an automatic tool to assess the readability of text documents. Our readability assessment tool predicts elementary school grade levels of texts with high accuracy. The tool is developed using supervised machine learning techniques on text corpora annotated with grade levels and other indicators of reading difficulty. Various independent variables or features are extracted from texts and used for automatic classification. We systematically explore different feature inventories and evaluate the grade-level prediction of the resulting classifiers. Our evaluation comprises well-known features at various linguistic levels from the existing literature, such as those based on language modeling, part-of-speech, syntactic parse trees, and shallow text properties, including classic readability formulas like the Flesch-Kincaid Grade Level formula. We focus in particular on discourse features, including three novel feature sets based on the density of entities, lexical chains, and coreferential inference, as well as features derived from entity grids. We evaluate and compare these different feature sets in terms of accuracy and mean squared error by cross-validation. Generalization to different corpora or domains is assessed in two ways. First, using two corpora of texts and their manually simplified versions, we evaluate how well our readability assessment tool can discriminate between original and simplified texts. Second, we measure the correlation between grade levels predicted by our tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities. The applications of this work include selection of reading material tailored to varying proficiency levels, ranking of documents by reading difficulty, and automatic document summarization and text simplification.
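As a rough illustration of two ingredients the abstract names, the sketch below computes the Flesch-Kincaid Grade Level as a shallow text feature and the two evaluation metrics (accuracy and mean squared error) over predicted grade levels. It is not the thesis's implementation: the function names are hypothetical, the syllable counter is a crude vowel-run heuristic assumed here for brevity, and only the formula's published coefficients are standard.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables as the number of vowel runs (a rough heuristic, assumed here)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    # Naive segmentation: sentences end at ., ! or ?; words are runs of letters or apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def accuracy(predicted, actual):
    """Fraction of texts whose predicted grade level matches the annotated one exactly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_squared_error(predicted, actual):
    """Average squared distance between predicted and annotated grade levels."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
```

Reporting both metrics is useful because accuracy treats every wrong grade alike, while mean squared error penalizes a prediction that is three grades off more heavily than one that is a single grade off.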

Cited By

Vajjala, S. and Meurers, D. On improving the accuracy of readability classification using insights from second language acquisition. Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pp. 163-173.

Contributors

Matt Huenerfauth
Rochester Institute of Technology
Lijun Feng
The City University of New York
