Text Simplification for Scientific Information Access

Liana Ermakova, Patrice Bellot, Pavel Braslavski, Jaap Kamps, Josiane Mothe, Diana Nurbakova, Irina Ovchinnikova & Eric San-Juan

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12657)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Modern information access systems promise to give users direct access to key information from authoritative primary sources such as the scientific literature, but non-experts tend to avoid these sources because of their complex language and internal vernacular, or because the readers lack the necessary background knowledge. Text simplification approaches can remove some of these barriers, so that users no longer have to rely on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. The CLEF 2021 SimpleText track will address head-on the opportunities and challenges of text simplification approaches for improving scientific information access. We aim to provide appropriate data and benchmarks, starting with pilot tasks in 2021, and to create a community of NLP and IR researchers working together to resolve one of the greatest challenges of today.
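The abstract names complex language as one barrier that simplification should lower. A classic way to put a number on that barrier is a readability formula such as Flesch Reading Ease (Flesch, 1948), where higher scores mean easier text. The sketch below is only an illustration under stated assumptions: the vowel-group syllable counter is a rough heuristic, the sample sentences are made up, and nothing here is the SimpleText track's own evaluation method.

```python
import re

# ASSUMPTION: a rough syllable heuristic, not a dictionary-based count.
# It counts runs of consecutive vowels, drops a trailing silent "e"
# (but not "-le" as in "simple"), and gives every word at least one syllable.
_VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    word = word.lower()
    groups = len(_VOWEL_GROUPS.findall(word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * ASL - 84.6 * ASW.

    ASL = average sentence length in words,
    ASW = average number of syllables per word.
    """
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    # Hypothetical example texts, written for this illustration only.
    original = (
        "Non-experts tend to avoid primary sources due to their complex "
        "language and internal vernacular."
    )
    simplified = "Many readers skip science papers. The words are hard."
    print(f"original:   {flesch_reading_ease(original):.1f}")
    print(f"simplified: {flesch_reading_ease(simplified):.1f}")
```

With this heuristic the original sentence scores about 39 and the simplified pair about 89. Surface formulas like this reward shorter sentences and shorter words but cannot tell whether the simplified text still says the same thing, so they capture only one side of what a simplification benchmark has to measure.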

Everything should be made as simple as possible, but no simpler

Albert Einstein

Author information

Authors and Affiliations

Université de Bretagne Occidentale, HCTI - EA 4249, Brest, France
Liana Ermakova
Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France
Patrice Bellot
Ural Federal University, Yekaterinburg, Russia
Pavel Braslavski
University of Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Université de Toulouse, IRIT, Toulouse, France
Josiane Mothe
Institut National des Sciences Appliquées de Lyon, Lyon, France
Diana Nurbakova
Sechenov University, Moscow, Russia
Irina Ovchinnikova
Avignon Université, LIA, Avignon, France
Eric San-Juan

Corresponding author

Correspondence to Liana Ermakova.

Editor information

Editors and Affiliations

Radboud University Nijmegen, Nijmegen, The Netherlands
Djoerd Hiemstra
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Toulouse Institute of Computer Science Research, Toulouse, France
Josiane Mothe
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
Raffaele Perego
Leipzig University, Leipzig, Germany
Martin Potthast
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
Fabrizio Sebastiani

About this paper

Cite this paper

Ermakova, L. et al. (2021). Text Simplification for Scientific Information Access. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_68

DOI: https://doi.org/10.1007/978-3-030-72240-1_68
Published: 30 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72239-5
Online ISBN: 978-3-030-72240-1
eBook Packages: Computer Science, Computer Science (R0)
