Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science?

Published: 01 January 2013

Abstract

Two kinds of bibliographic tools are used to retrieve scientific publications and make them available online. For one kind, access is free as they store information made publicly available online. For the other kind, access fees are required as they are compiled on information provided by the major publishers of scientific literature. The former can easily be interfered with, but it is generally assumed that the latter guarantee the integrity of the data they sell. Unfortunately, duplicate and fake publications are appearing in scientific conferences and, as a result, in the bibliographic services. We demonstrate a software method of detecting these duplicate and fake publications. Both the free services (such as Google Scholar and DBLP) and the charged-for services (such as IEEE Xplore) accept and index these publications.
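The abstract does not spell out the detection method, but the author tags for this article name inter-textual distance and text mining. As a rough illustration only, the Python sketch below computes a simple word-frequency distance between two texts: the longer text's counts are scaled down to the length of the shorter one, and the absolute differences are summed and normalised to the range 0 to 1. The tokenizer, the function names, and the exact normalisation are assumptions made for this example and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def intertextual_distance(text_a, text_b):
    """Return a word-frequency distance in [0, 1] between two texts.

    0 means identical word-frequency profiles, 1 means no shared vocabulary.
    This is a simplified sketch, not the authors' implementation.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("both texts must contain at least one token")

    # Make A the shorter text so that B's counts are scaled down, never up.
    if n_a > n_b:
        counts_a, counts_b = counts_b, counts_a
        n_a, n_b = n_b, n_a

    scale = n_a / n_b  # expected count in a text of A's length
    vocabulary = set(counts_a) | set(counts_b)
    total_diff = sum(
        abs(counts_a[word] - counts_b[word] * scale) for word in vocabulary
    )
    # After scaling, both texts have (about) n_a tokens, hence the 2 * n_a.
    return total_diff / (2 * n_a)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    near_copy = "the quick brown fox leaps over the lazy dog"
    print(intertextual_distance(original, original))
    print(intertextual_distance(original, near_copy))
```

Under these assumptions, a pair of papers with a distance close to 0 would be a candidate duplicate worth manual inspection. Any cut-off value would have to be calibrated on texts whose status is already known.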

Published In

Scientometrics, Volume 94, Issue 1
January 2013
425 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Bibliographic tools
  2. Fake publications
  3. Google Scholar
  4. Inter-textual distance
  5. Scientific conferences
  6. Scopus
  7. Text-mining
  8. WoK

Qualifiers

  • Article

Cited By

  • (2023) ChatGPT and AIChat Epistemology. Computer, 56:5 (130-137). DOI: 10.1109/MC.2023.3252379. Online publication date: 1-May-2023
  • (2023) Automated scholarly paper review. Information Fusion, 98:C. DOI: 10.1016/j.inffus.2023.101830. Online publication date: 26-Jul-2023
  • (2023) ChatGPT and the potential growing of ghost bibliographic references. Scientometrics, 128:9 (5351-5355). DOI: 10.1007/s11192-023-04804-4. Online publication date: 31-Jul-2023
  • (2023) Translated Texts Under the Lens: From Machine Translation Detection to Source Language Identification. Advances in Intelligent Data Analysis XXI (222-235). DOI: 10.1007/978-3-031-30047-9_18. Online publication date: 12-Apr-2023
  • (2022) Authorship Attribution in Greek Literature Using Word Adjacencies. Proceedings of the 12th Hellenic Conference on Artificial Intelligence (1-9). DOI: 10.1145/3549737.3549750. Online publication date: 7-Sep-2022
  • (2021) Towards Man/Machine Co-authoring of Advanced Analytics Reports Around Big Data Repositories. Intelligent Human Computer Interaction (574-583). DOI: 10.1007/978-3-030-98404-5_53. Online publication date: 20-Dec-2021
  • (2021) Prevalence of nonsensical algorithmically generated papers in the scientific literature. Journal of the Association for Information Science and Technology, 72:12 (1461-1476). DOI: 10.1002/asi.24495. Online publication date: 8-Nov-2021
  • (2020) Which visual elements make texts appear scientific? Proceedings of Mensch und Computer 2020 (61-65). DOI: 10.1145/3404983.3410014. Online publication date: 6-Sep-2020
  • (2020) Flagging incorrect nucleotide sequence reagents in biomedical papers: To what extent does the leading publication format impede automatic error detection? Scientometrics, 124:2 (1139-1156). DOI: 10.1007/s11192-020-03463-z. Online publication date: 1-Aug-2020
  • (2019) Detecting Machine-Translated Paragraphs by Matching Similar Words. Computational Linguistics and Intelligent Text Processing (521-532). DOI: 10.1007/978-3-031-24337-0_36. Online publication date: 7-Apr-2019
