DOI: 10.5555/3200334.3200371
research-article

Citation sentence reuse behavior of scientists: a case study on massive bibliographic text dataset of computer science

Published: 19 June 2017

Abstract

Our current knowledge of scholarly plagiarism is largely based on the similarity between full-text research articles. In this paper, we propose a novel conceptualization of scholarly plagiarism in the form of reuse of explicit citation sentences in scientific research articles. Note that while full-text plagiarism is an indicator of a gross-level behavior, copying of citation sentences is a more nuanced micro-scale phenomenon observed even for well-known researchers. The current work poses several interesting questions and attempts to answer them by empirically investigating a large bibliographic text dataset from computer science containing millions of lines of citation sentences. In particular, we report evidence of massive copying behavior. We also present several striking real examples throughout the paper to showcase the widespread adoption of this undesirable practice. Contrary to popular perception, we find that the tendency to copy increases as an author matures. Copying behavior exists in all fields of computer science; however, the theoretical fields exhibit more copying than the applied fields.



    Published In

    JCDL '17: Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries
    June 2017
    383 pages
    ISBN:9781538638613

    Publisher

    IEEE Press

    Author Tags

    1. citation context
    2. plagiarism
    3. text reuse

    Qualifiers

    • Research-article

    Conference

    JCDL '17

    Acceptance Rates

    Overall Acceptance Rate 415 of 1,482 submissions, 28%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 36
    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 14 Jan 2025
