
Text Simplification for Scientific Information Access

CLEF 2021 SimpleText Workshop

  • Conference paper
  • Advances in Information Retrieval (ECIR 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12657)


Abstract

Modern information access systems promise to give users direct access to key information from authoritative primary sources such as the scientific literature. However, non-experts tend to avoid these sources because of their complex language, their internal vernacular, or the reader's own lack of prior background knowledge. Text simplification approaches can remove some of these barriers. In doing so, they keep users from relying on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. The CLEF 2021 SimpleText track will tackle head-on the opportunities and challenges of using text simplification to improve scientific information access. We aim to provide appropriate data and benchmarks, starting with pilot tasks in 2021, and to build a community of NLP and IR researchers working together to resolve one of the greatest challenges of today.

Everything should be made as simple as possible, but no simpler

Albert Einstein

Author information

Corresponding author

Correspondence to Liana Ermakova.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ermakova, L. et al. (2021). Text Simplification for Scientific Information Access. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_68

  • DOI: https://doi.org/10.1007/978-3-030-72240-1_68

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72239-5

  • Online ISBN: 978-3-030-72240-1

  • eBook Packages: Computer Science, Computer Science (R0)
