Authorship attribution

Computers and the Humanities

Abstract

This paper considers the problem of quantifying literary style and looks at several variables which may be used as stylistic “fingerprints” of a writer. A review of work done on the statistical analysis of “change over time” in literary style is then presented, followed by a look at a specific application area, the authorship of Biblical texts.

Author information

David Holmes is a Principal Lecturer in Statistics at the University of the West of England, Bristol, with specific responsibility for co-ordinating the research programmes in the Department of Mathematical Sciences. He has taught literary style analysis to humanities students since 1983 and has published articles on the statistical analysis of literary style in the Journal of the Royal Statistical Society, History and Computing, and Literary and Linguistic Computing. He presented papers at the ACH/ALLC conferences in 1991 and 1993.

Cite this article

Holmes, D.I. Authorship attribution. Comput Hum 28, 87–106 (1994). https://doi.org/10.1007/BF01830689
