DOI: 10.1145/3529372.3530930
research-article

Complexities associated with user-generated book reviews in digital libraries: temporal, cultural, and political case studies

Published: 20 June 2022

Abstract

While digital libraries (DL) have made large-scale collections of digitized books increasingly available to researchers [31, 67], there remains a dearth of comparable data provisions or infrastructure for computational studies of the consumption and reception of books. In the last two decades, user-generated book reviews on social media have opened up unprecedented research possibilities for humanities and social sciences (HSS) scholars who are interested in book reception. However, limitations and gaps have emerged in existing digital humanities (DH) research that uses social media data to answer HSS questions. To shed light on the under-investigated features of user-generated book reviews and the challenges they might pose to scholarly research, we conducted three exemplar case studies: (1) a longitudinal analysis profiling the temporal changes in ratings and popularity of 552 books across ten years; (2) a cross-cultural comparison of ratings of the same 538 books across two platforms; and (3) a classification experiment on 20,000 sponsored and non-sponsored book reviews. Correspondingly, our research reveals the real-world complexities and under-investigated features of user-generated book reviews in three dimensions: the transience of book ratings and popularity (temporal dimension), the cross-cultural differences in reading interests and book reception (cultural dimension), and the user power dynamics behind the publicly accessible reviews ("political" dimension). Our case studies also demonstrate the challenges that these real-world complexities pose to the scholarly use of user-generated book reviews, and propose solutions to these challenges. We conclude that DL stakeholders and scholars working with user-generated book reviews should look into these under-investigated features and real-world challenges to evaluate and improve the scholarly usability and interpretability of their data.
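
The abstract does not say how the cross-platform comparison in case study (2) was computed. As a purely illustrative sketch, and not the authors' method, one plausible approach is to put both platforms' ratings on a common scale and check how closely their rankings of the same books agree. The platform scales (5-point and 10-point), the book identifiers, the ratings, and all function names below are hypothetical.

```python
from statistics import mean


def to_unit_scale(rating, scale_max):
    """Map a rating on a 0..scale_max scale onto 0..1 (a simplifying assumption)."""
    return rating / scale_max


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up ratings for the same four books on two hypothetical platforms:
# platform A uses a 5-point scale, platform B a 10-point scale.
ratings = {
    "book_a": (4.2, 8.9),
    "book_b": (3.8, 7.1),
    "book_c": (4.0, 9.2),
    "book_d": (3.5, 6.8),
}

platform_a = [to_unit_scale(a, 5) for a, _ in ratings.values()]
platform_b = [to_unit_scale(b, 10) for _, b in ratings.values()]

# Rank agreement between the two platforms (1.0 means identical ordering).
rho = spearman(platform_a, platform_b)

# Per-book gap on the common 0..1 scale (positive: rated higher on platform A).
gaps = {book: a - b for book, a, b in zip(ratings, platform_a, platform_b)}
```

A rank-based measure is used here because two platforms with different scales and rating cultures may disagree in absolute terms while still ordering books similarly; the per-book gaps then flag the titles where reception diverges most.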

References

[1]
Parsers "admin". 2020. US court fully legalized website scraping and technically prohibited it. https://parsers.me/us-court-fully-legalized-website-scraping-and-technically-prohibited-it/
[2]
Anne-Mette Bech Albrechtslund. 2017. Negotiating ownership and agency in social media: Community reactions to Amazon's acquisition of Goodreads. First Monday (2017).
[3]
Maria Antoniak and Melanie Walsh. 2020. The Crowdsourced "Classics" and the Revealing Limits of Goodreads Data.
[4]
Maria Antoniak, Melanie Walsh, and David Mimno. 2021. Tags, Borders, and Catalogs: Social Re-Working of Genre on LibraryThing. Proceedings of the ACM on Human-Computer Interaction 5, CSCW 1 (2021), 1--29.
[5]
Tong Bao and Tung-lung Steven Chang. 2014. Why Amazon uses both the New York Times Best Seller List and customer reviews: An empirical study of multiplier effects on product sales from multiple earned media. Decision Support Systems 67 (2014), 1--8.
[6]
Peishan Bartley. 2009. Book tagging on LibraryThing: how, why, and what are in the tags? Proceedings of the American Society for Information Science and Technology 46, 1 (2009), 1--22.
[7]
Douban Books. 2019. How many Douban Top 250 Books have you read? https://mp.weixin.qq.com/s?_biz=MzAwNzYyNDMyMA==&mid=2651117440&idx=1&sn=86f24dcbc54b18c40978ce325fbefb08
[8]
Douban Books. 2020. Big changes to Douban Top 250 Books: 107 new books on the list for the first time. https://mp.weixin.qq.com/s/iYCf7lGdLkgNurzv_HNa-Q
[9]
Peter Boot. 2013. The desirability of a corpus of online book responses. In Proceedings of the Workshop on Computational Linguistics for Literature. Association for Computational Linguistics (ACL), 32--40.
[10]
Karen Bourrier and Mike Thelwall. 2020. The social lives of books: Reading Victorian literature on Goodreads. Journal of Cultural Analytics 1, 1 (2020), 12049.
[11]
HathiTrust Research Center. 2017. HathiTrust Research Center Non-Consumptive Use Policy. https://www.hathitrust.org/htrc_ncup
[12]
Kent K Chang and Simon DeDeo. 2020. Divergence and the Complexity of Difference in Text and Culture. Journal of Cultural Analytics 4, 11 (2020), 1--36.
[13]
Wikipedia contributors. 2021. Amazon Books. https://en.wikipedia.org/wiki/Amazon_Books
[14]
Wikipedia contributors. 2021. Amazon (company). https://en.wikipedia.org/wiki/Amazon_(company)
[15]
Wikipedia contributors. 2021. Douban. https://en.wikipedia.org/wiki/Douban
[16]
Wikipedia contributors. 2021. Goodreads. https://en.wikipedia.org/wiki/Goodreads
[17]
Wikipedia contributors. 2021. LibraryThing. https://en.wikipedia.org/wiki/LibraryThing
[18]
Wikipedia contributors. 2021. Milan Kundera. https://en.wikipedia.org/wiki/Milan_Kundera
[19]
Lianbin Dai. 2017. From history of the book to history of reading: theories and methods for historical studies of reading. Xinxing.
[20]
Pat Deely. 1975. Copyright: Limitation on Exclusive Rights, Fair Use. Hous. L. Rev. 13 (1975), 1041.
[21]
Stefan Dimitrov, Faiyaz Zamal, Andrew Piper, and Derek Ruths. 2015. Goodreads versus Amazon: the effect of decoupling book reviewing and book selling. In Ninth international AAAI conference on web and social media.
[22]
Douban. 2021. About Douban. https://www.douban.com/about
[23]
Douban. 2022. Lolita (webpage for the book). https://book.douban.com/subject/1465324/
[24]
Beth Driscoll. 2021. How goodreads is changing book culture. Kill Your Darlings (2021), 213--216.
[25]
James F English. 2021. A Future for Empirical Reader Studies. https://culturalanalytics.org/post/1208-a-future-for-empirical-reader-studies
[26]
RA Gekoski. 2004. Tolkien's Gown: And Other Stories of Great Authors and Rare Books. Constable.
[27]
Goodreads. 2021. About Goodreads. https://www.goodreads.com/about/us
[28]
Goodreads. 2022. Lolita (webpage for the book). https://www.goodreads.com/book/show/7604.Lolita
[29]
Zhu guang shi (on Zuoshu2013). 2020. Big Changes to the Douban Books Top 250 List, The Kite Runner Is No Longer Ranked No. 1. https://post.smzdm.com/p/a830r7gq/
[30]
Nan Hu, Indranil Bose, Noi Sian Koh, and Ling Liu. 2012. Manipulation of online reviews: An analysis of ratings, readability, and sentiments. Decision support systems 52, 3 (2012), 674--684.
[31]
Jacob Jett, Boris Capitanu, Deren Kudeki, Timothy Cole, Yuerong Hu, Peter Organisciak, Ted Underwood, Eleanor Dickson Koehl, Ryan Dubnicek, and J. Stephen Downie. 2020. The HathiTrust Research Center Extracted Features Dataset (2.0).
[32]
Ming Jiang and Jana Diesner. 2016. Issue-focused documentaries versus other films: Rating and type prediction based on user-authored reviews. In Proceedings of the 27th ACM Conference on Hypertext and Social Media. 225--230.
[33]
Ming Jiang and Jana Diesner. 2016. Says who...? Identification of expert versus layman critics' reviews of documentary films. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2122--2132.
[34]
Tomoya Kambara, Shohei Okamoto, Yuka Teramoto, Kazuma Kusu, and Kenji Hatano. 2018. Evaluating usefulness of reviews based on evaluation standpoints of consumers. In Proceedings of the 10th International Conference on Management of Digital EcoSystems. 110--117.
[35]
Ho Kiu-Chor et al. 2007. A Case Study of Douban: Social Network Communities. Masaryk University Journal of Law and Technology 1, 2 (2007), 43--56.
[36]
Marijn Koolen, Peter Boot, and Joris J van Zundert. 2020. Online Book Reviews and the Computational Modelling of Reading Impact. CEUR Workshop Proceedings (http://ceur-ws.org, ISSN 1613-0073) (2020).
[37]
Joshua Kotin, Rebecca Sutton Koeser, Carl Adair, Serena Alagappan, Paige Allen, Jean Bauer, Oliver J Browne, Nick Budak, Harriet Calver, Jin Yun Chow, et al. 2021. Shakespeare and Company Project Dataset: Lending Library Events. (2021).
[38]
Balázs Kovács and Amanda J Sharkey. 2014. The paradox of publicity: How awards can negatively affect the evaluation of quality. Administrative science quarterly 59, 1 (2014), 1--33.
[39]
Solomon Kullback and Richard A Leibler. 1951. On information and sufficiency. The annals of mathematical statistics 22, 1 (1951), 79--86.
[40]
Theodoros Lappas. 2012. Fake reviews: The malicious perspective. In International Conference on Application of Natural Language to Information Systems. Springer, 23--34.
[41]
Theodoros Lappas, Gaurav Sabnis, and Georgios Valkanas. 2016. The impact of fake reviews on online visibility: A vulnerability assessment of the hotel industry. Information Systems Research 27, 4 (2016), 940--961.
[42]
Hongliu Li, Xingyuan Wang, Shuyang Wang, Wenkai Zhou, and Zhilin Yang. 2022. The power of numbers: an examination of the relationship between numerical cues in online review comments and perceived review helpfulness. Journal of Research in Interactive Marketing (2022).
[43]
LibraryThing. 2021. About LibraryThing. https://www.librarything.com/about
[44]
Zhiwei Liu and Sangwon Park. 2015. What makes a useful online review? Implication for travel product websites. Tourism management 47 (2015), 140--151.
[45]
Hoyt Long. 2021. Culture at Global Scale. https://culturalanalytics.org/post/1160-culture-at-global-scale
[46]
Ana Isabel Lopes, Nathalie Dens, Patrick De Pelsmacker, and Freya De Keyzer. 2020. Which cues influence the perceived usefulness and credibility of an online review? A conjoint analysis. Online Information Review (2020).
[47]
Caimei Lu, Jung-ran Park, and Xiaohua Hu. 2010. User tags versus expert-assigned subject terms: A comparison of LibraryThing tags and Library of Congress Subject Headings. Journal of information science 36, 6 (2010), 763--779.
[48]
Michael Luca and Georgios Zervas. 2016. Fake it till you make it: Reputation, competition, and Yelp review fraud. Management Science 62, 12 (2016), 3412--3427.
[49]
Suman Kalyan Maity, Abhishek Panigrahi, and Animesh Mukherjee. 2017. Book reading behavior on goodreads can predict the amazon best sellers. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. 451--454.
[50]
Matthias Mauch, Robert M MacCallum, Mark Levy, and Armand M Leroi. 2015. The evolution of popular music: USA 1960--2010. Royal Society open science 2, 5 (2015), 150081.
[51]
Megan McCluskey. 2021. How Extortion Scams and Review Bombing Trolls Turned Goodreads Into Many Authors' Worst Nightmare. https://time.com/6078993/goodreads-review-bombing/
[52]
Ian Milligan. 2016. The problem of history in the age of abundance. (2016).
[53]
Simone Murray. 2021. Secret agents: Algorithmic culture, Goodreads and datafication of the contemporary book world. European Journal of Cultural Studies 24, 4 (2021), 970--989.
[54]
Lisa Nakamura. 2013. "Words with friends": socially networked reading on Goodreads. PMLA/Publications of the Modern Language Association of America 128, 1 (2013), 238--243.
[55]
Edward Daniel Newell, Stefan Dimitrov, Andrew Piper, and Derek Ruths. 2016. To buy or to read: How a platform shapes reviewing behavior. In Tenth International AAAI Conference on Web and Social Media.
[56]
Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Emre Kiciman. 2019. Social data: Biases, methodological pitfalls, and ethical boundaries. Frontiers in Big Data 2 (2019), 13.
[57]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825--2830.
[58]
Lambodara Ptuabhof and Phool Rani Da. 2018. Goodreads Ratings and Reviews Analysis of Booker Prize Titles. In ICDT 2018:Publishing Technology and Future of Academia. Segment Publication, 363--371.
[59]
Monisha Rajesh. 2021. Pointing out racism in books is not an 'attack' - it's a call for industry reform. https://www.theguardian.com/books/2021/aug/13/pointing-out-racism-in-books-is-not-an-attack-kate-clanchy
[60]
Juan Ramos et al. 2003. Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning, Vol. 242. Citeseer, 29--48.
[61]
Rezvaneh Rezapour and Jana Diesner. 2017. Classification and detection of micro-level impact of issue-focused documentary films based on reviews. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 1419--1431.
[62]
Rui. 2013. Douban Top 250 Books Old Version 2013.06. https://www.douban.com/note/536479320/
[63]
Nazanin Sabri and Ingmar Weber. 2021. Users Data. (2021).
[64]
Ruchira Sharma. 2021. Black and LGBTQ+ authors say they're being harassed on Goodreads and trolled with one-star book reviews. https://inews.co.uk/culture/books/goodreads-book-reviews-black-lgbtq-authors-harrassed-trolled-949179
[65]
Shuyang. 2016. douban.com top 250 movies and books. https://github.com/Shuyang/douban_top250/tree/master
[66]
Eastern Express (Taiyuan). 2011. Douban Top 250 Books. https://www.douban.com/doulist/513669/
[67]
Ted Underwood. 2019. Distant horizons: digital evidence and literary change. University of Chicago Press.
[68]
Melanie Walsh and Maria Antoniak. 2021. The Goodreads 'Classics': A Computational Study of Readers, Amazon, and Crowdsourced Amateur Criticism. Journal of Cultural Analytics 4 (2021), 243--287.
[69]
Mengting Wan and Julian J. McAuley. 2018. Item recommendation on monotonic behavior chains. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, October 2--7, 2018, Sole Pera, Michael D. Ekstrand, Xavier Amatriain, and John O'Donovan (Eds.). ACM, 86--94.
[70]
Mengting Wan, Rishabh Misra, Ndapa Nakashole, and Julian J. McAuley. 2019. Fine-Grained Spoiler Detection from Large-Scale Review Corpora. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28--August 2, 2019, Volume 1: Long Papers, Anna Korhonen, David R. Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, 2605--2610.
[71]
Jing Wang, Anindya Ghose, and Panos Ipeirotis. 2012. Bonus, disclosure, and choice: what motivates the creation of high-quality paid reviews?. In ICIS 2012 Proceedings. Citeseer.
[72]
Lotte M Willemsen, Peter C Neijens, Fred Bronner, and Jan A De Ridder. 2011. "Highly recommended!" The content characteristics and perceived usefulness of online consumer reviews. Journal of Computer-Mediated Communication 17, 1 (2011), 19--38.
[73]
Adam Worrall. 2015. "Like a Real Friendship": Translation, Coherence, and Convergence of Information Values in LibraryThing and Goodreads. iConference 2015 Proceedings (2015).
[74]
Yuanyuan Wu, Eric WT Ngai, Pengkun Wu, and Chong Wu. 2020. Fake online reviews: Literature review, synthesis, and directions for future research. Decision Support Systems 132 (2020), 113280.
[75]
Ruolin Xie. 2021. Investigation into Douban water army: 15 RMB for a short review, votes and thumb-ups available as well. http://www.xinhuanet.com/fortune/2021-02/25/c_1127136296.htm
[76]
Gregory Yauney. 2021. shakespeare-and-company-online-readership. https://github.com/gyauney/shakespeare-and-company-social-readership
[77]
Zebulon2020. 2020. Douban Read Top250 Crawler. https://github.com/zebulon2020/DoubanReadTop250Crawler
[78]
Jialian Zhou. 2018. douban.com top 250 movies and books.

Cited By

  • (2023) Children’s Olfactory Picturebooks: Charting New Trends in Early Childhood Education. Early Childhood Education Journal 52:7 (1339-1348). DOI: 10.1007/s10643-023-01457-z. Online publication date: 18-Mar-2023
  • (2023) Focused Issue on Digital Library Challenges to Support the Open Science Process. International Journal on Digital Libraries 24:4 (185-189). DOI: 10.1007/s00799-023-00388-9. Online publication date: 29-Nov-2023
  • (2023) Research with User-Generated Book Review Data: Legal and Ethical Pitfalls and Contextualized Mitigations. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity (163-186). DOI: 10.1007/978-3-031-28035-1_13. Online publication date: 13-Mar-2023

      Published In

      JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
      June 2022
      392 pages
      ISBN:9781450393454
      DOI:10.1145/3529372
      • General Chairs: Akiko Aizawa, Thomas Mandl, Zeljko Carevic
      • Program Chairs: Annika Hinze, Philipp Mayr, Philipp Schaer
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      In-Cooperation

      • IEEE Technical Committee on Digital Libraries (TC DL)

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cultural analytics
      2. digital humanities
      3. digital libraries
      4. social media
      5. user-generated content
      6. web archives

      Qualifiers

      • Research-article

      Conference

      JCDL '22

      Acceptance Rates

      JCDL '22 Paper Acceptance Rate 35 of 132 submissions, 27%;
      Overall Acceptance Rate 415 of 1,482 submissions, 28%

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 58
      • Downloads (Last 6 weeks): 7
      Reflects downloads up to 16 Jan 2025
