Abstract
This paper examines how authorship attribution techniques perform when faced with a pastiche. We ask whether the techniques can distinguish the real thing from the fake, or whether the author can fool the computer. If the latter, is this because the pastiche is good, or because the technique is faulty? Using a number of mainly vocabulary-based techniques, Gilbert Adair's pastiche of Lewis Carroll, Alice Through the Needle's Eye, is compared with the original 'Alice' books. Two standard measures of lexical richness, Yule's K and Orlov's Z, both distinguish Adair from Carroll, though Z also distinguishes the two originals. A principal component analysis based on word frequencies finds that the main differences are not due to authorship. A discriminant analysis based on word usage and lexical richness successfully distinguishes the pastiche from the originals. Weighted cusum tests, however, were unable to distinguish the two authors in a majority of cases. As a cross-validation, we made similar comparisons with control texts: another children's story from the same era, and other work by Carroll and Adair. The implications of these findings are discussed.
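As an illustration of one of the lexical richness measures named in the abstract, the following is a minimal sketch of Yule's K using the commonly cited formula K = 10^4 (Σ m² V(m) − N) / N², where N is the number of word tokens and V(m) is the number of word types occurring exactly m times. The function names `tokenize` and `yules_k` and the simple regular-expression tokenizer are assumptions made for this example; the abstract does not describe the preprocessing used in the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens.

    Assumption: a word is a run of ASCII letters, optionally followed by
    an apostrophe and more letters. The paper's tokenization may differ.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def yules_k(tokens):
    """Return Yule's K = 10^4 * (sum_m m^2 * V(m) - N) / N^2."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    type_counts = Counter(tokens)             # word type -> frequency
    spectrum = Counter(type_counts.values())  # frequency m -> V(m)
    s2 = sum(m * m * v for m, v in spectrum.items())
    return 1e4 * (s2 - n) / (n * n)
```

A lower K indicates a richer vocabulary: a text in which every token is a distinct word gives K = 0, and repetition raises the value. To compare two texts, one would apply `yules_k(tokenize(text))` to each, bearing in mind that the result depends on the tokenization choices assumed here.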
References
Adair G. (1984) Alice Through the Needle's Eye: A Third Adventure for Lewis Carroll's 'Alice'. Macmillan, London.
Adair G. (1986) Myths & Memories. Fontana Paperbacks, London.
Baayen H., van Halteren H., Tweedie F. (1996) Outside the Cave of Shadows: Using Syntactic Annotation to Enhance Authorship Attribution. Literary and Linguistic Computing, 11, pp. 121–131.
Baum L.F. (1900) The Wonderful Wizard of Oz. G.M. Hill, Chicago.
Bee R.E. (1971) Statistical Methods in the Study of the Masoretic Text of the Old Testament. Journal of the Royal Statistical Society A, 134, pp. 611–622.
Bee R.E. (1972) A Statistical Study of the Sinai Pericope. Journal of the Royal Statistical Society A, 135, pp. 406–421.
Bell A. (1985) Linked by a Single Tail. Times Literary Supplement, 4th January 1985, p. 18.
Benson J.D., Brainerd B. (1988) Chesterton's Parodies of Swinburne and Yeats: A Lexical Approach. Literary and Linguistic Computing, 3, pp. 221–231.
Binongo J.N.G. (1994) Joaquin's Joaquinesquerie, Joaquinesquerie's Joaquin: A Statistical Expression of a Filipino Writer's Style. Literary and Linguistic Computing, 9, pp. 267–279.
Bissell A.F. (1995a) Weighted Cumulative Sums for Text Analysis Using Word Counts. Journal of the Royal Statistical Society A, 158, pp. 525–545.
Bissell D. (1995b) Statistical Methods for Text Analysis by Word-Counts. European Business Management School, University of Wales, Swansea.
Burrows J.F. (1987) Computation into Criticism: A Study of Jane Austen's Novels and an Experiment in Method. Clarendon Press, Oxford.
Burrows J.F. (1989) “An Ocean Where Each Kind…”: Statistical Analysis and Some Major Determinants of Literary Style. Computers and the Humanities, 23, pp. 309–321.
Burrows J.F. (1992) Computers and the Study of Literature. In Butler C.S. (ed.), Computers and Written Texts. Blackwell, Oxford, pp. 167–204.
Carroll L. (1865) Alice's Adventures in Wonderland. Macmillan, London.
Carroll L. (1872) Through the Looking Glass. Macmillan, London.
Carroll L. (1891) The Nyctograph. The Lady, 29th October 1891; reproduced in Fisher J. (ed.), The Magic of Lewis Carroll, Harmondsworth, Middlesex (1975): Penguin, pp. 214–217.
Dodgson C.L. (1889) Curiosa Mathematica Part I: A New Theory of Parallels. Macmillan, London.
Farringdon J.M. (1996) Analysing for Authorship: A Guide to the Cusum Technique. University of Wales Press, Cardiff.
Flesch R. (1974) The Art of Readable Writing. Harper & Row, New York.
Fuller J. (1985) Lewis Carroll is not Dead. The New York Times Book Review, 5th May 1985, p. 42.
Hardcastle R.A. (1997) CUSUM: A Credible Method for the Determination of Authorship? Science & Justice, 37, pp. 129–138.
Hilton M.L., Holmes D.I. (1993) An Assessment of Cumulative Sum Charts for Authorship Attribution. Literary and Linguistic Computing, 8, pp. 73–80.
Holmes D.I. (1994) Authorship Attribution. Computers and the Humanities, 28, pp. 87–106.
Holmes D.I. (1998) The Evolution of Stylometry in Humanities Scholarship. Literary and Linguistic Computing, 13, pp. 111–117.
Holmes D.I., Forsyth R.S. (1995) The Federalist Revisited: New Directions in Authorship Attribution. Literary and Linguistic Computing, 10, pp. 111–127.
Holmes D.I., Singh S. (1996) A Stylometric Analysis of Conversational Speech of Aphasic Patients. Literary and Linguistic Computing, 11, pp. 133–140.
Holmes D.I., Tweedie F.J. (1995) Forensic Stylometry: A Review of the Cusum Controversy. Revue Informatique et Statistique dans les Sciences Humaines, 31, pp. 19–47.
Irizarry E. (1989) Exploring Conscious Imitation of Style with Ready-made Software. Computers and the Humanities, 23, pp. 227–233.
Ledger G., Merriam T. (1994) Shakespeare, Fletcher and the Two Noble Kinsmen. Literary and Linguistic Computing, 9, pp. 235–247.
Mealand D.L. (1995) Correspondence Analysis of Luke. Literary and Linguistic Computing, 10, pp. 171–182.
Morton A.Q. (1978) Literary Detection: How to Prove Authorship and Fraud in Literature and Documents. Bowker, London.
Ogden C.K. (1934) The System of Basic English. Harcourt, Brace, New York.
Orlov J.K. (1983) Ein Modell der Häufigkeitsstruktur des Vokabulars. In Guiter H. and Arapov M. (eds.), Studies on Zipf's Law. Brockmeyer, Bochum, pp. 154–233.
Potter R.G. (1991) Statistical Analysis of Literature: A Retrospective on Computers and the Humanities, 1966–1990. Computers and the Humanities, 25, pp. 401–429.
Sigelman L. (1995) By Their (New) Words Shall Ye Know Them: Edith Wharton, Marion Mainwaring, and The Buccaneers. Computers and the Humanities, 29, pp. 271–283.
Sigelman L., Jacoby W. (1996) The Not-so-simple Art of Imitation: Pastiche, Literary Style, and Raymond Chandler. Computers and the Humanities, 30, pp. 11–28.
Somers H. (1999) Computational Stylometry and Pastiche: Can a Good Fake Fool the Computer? Unpublished paper presented at ILASH Seminar, University of Sheffield, 8th December 1999. http://www.dcs.shef.ac.uk/research/ilash/Seminars/somers.html
Tweedie F.J., Baayen H.R. (1998) How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32, pp. 323–352.
Tweedie F.J., Holmes D.I., Corns T.N. (1998) The Provenance of De Doctrina Christiana, Attributed to John Milton: A Statistical Investigation. Literary and Linguistic Computing, 13, pp. 77–87.
Yule G.U. (1944) The Statistical Study of Literary Vocabulary. Cambridge University Press, Cambridge.
Somers, H., Tweedie, F. Authorship Attribution and Pastiche. Computers and the Humanities 37, 407–429 (2003). https://doi.org/10.1023/A:1025786724466
DOI: https://doi.org/10.1023/A:1025786724466