Abstract
Modern information access systems promise to give users direct access to key information from authoritative primary sources such as scientific literature, but non-experts tend to avoid these sources because of their complex language and internal vernacular, or because they lack the background knowledge the texts assume. Text simplification approaches can remove some of these barriers, so that users need not rely on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. The CLEF 2021 SimpleText track will address head-on the opportunities and challenges of text simplification approaches for improving scientific information access. We aim to provide appropriate data and benchmarks, starting with pilot tasks in 2021, and to create a community of NLP and IR researchers working together to resolve one of the greatest challenges of today.
Everything should be made as simple as possible, but no simpler
Albert Einstein
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ermakova, L. et al. (2021). Text Simplification for Scientific Information Access. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_68
DOI: https://doi.org/10.1007/978-3-030-72240-1_68
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72239-5
Online ISBN: 978-3-030-72240-1
eBook Packages: Computer Science, Computer Science (R0)