Advanced Models for Stylometric Applications

Machine Learning Methods for Stylometry

Abstract

The previous chapter explained some well-known models, but various more advanced approaches have also been suggested. Closer to the humanities, the Zeta test focuses on terms used recurrently by one author and largely ignored by the others. After selecting stylistic markers according to this criterion, the model builds a graph showing the similarities between text excerpts. Compression algorithms can also be applied to identify the true author of a text based on similar word frequencies. Closer to the natural language processing domain, latent Dirichlet allocation (LDA) can be applied to determine the most probable author of a given document. Several dedicated approaches have been suggested to solve the verification problem, and this chapter includes an overview of them. Although we usually assume that a novel is written by a single person, collaborative authorship is possible. To detect the passages written by each possible author, the rolling Delta and other ad hoc approaches are described. As neural models constitute an important research field, three sections are dedicated to them: one on the basic neural approach, one on word embeddings, and one on long short-term memory (LSTM), a well-known deep learning model. The last section is dedicated to adversarial stylometry and obfuscation, that is, how a computer can be programmed to hide the stylistic markers left by the original author.
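
The selection criterion behind the Zeta test, terms used recurrently by one author and mostly ignored by the others, can be illustrated with a minimal sketch. It assumes one common formulation (the share of the target author's segments containing a term, plus the share of the other authors' segments lacking it), which may differ from the variant presented in the chapter. The function name `zeta_scores`, the whitespace tokenization, and the toy segments are illustrative assumptions, not taken from the source.

```python
from collections import Counter


def zeta_scores(target_segments, other_segments):
    """Return a Zeta-style score per term, in the range 0..2.

    Score = (share of target segments containing the term)
          + (share of other segments NOT containing the term).
    Assumption: segments are plain strings split on whitespace.
    """
    def segment_share(segments):
        # Count in how many segments each term occurs at least once.
        counts = Counter()
        for segment in segments:
            counts.update(set(segment.lower().split()))
        return {term: n / len(segments) for term, n in counts.items()}

    target = segment_share(target_segments)
    other = segment_share(other_segments)
    vocabulary = set(target) | set(other)
    return {
        term: target.get(term, 0.0) + (1.0 - other.get(term, 0.0))
        for term in vocabulary
    }


# Toy data (hypothetical): three segments by the target author, two by others.
target = ["upon the hill", "upon a time", "the upon"]
others = ["while the sun", "the moon while"]
scores = zeta_scores(target, others)
```

Terms scoring close to 2 are preferred by the target author and avoided by the others, terms close to 0 show the opposite pattern, and common words shared by everyone fall in between.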


    Google Scholar 

  186. F. Johansson, Supervised classification of Twitter accounts based on textual content of tweets. Notebook for PAN at CLEF 2019, in Working Notes Papers of the CLEF 2019 Evaluation Labs volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)

    Google Scholar 

  187. M. Joos, The Five Clocks. A Linguistic Excursion into the Five Styles of English Usage (Harvest/HBJ Book, New York, 1961)

    Google Scholar 

  188. P. Joule, D. Vescovi, Analyzing stylometric approaches for author obfuscation, in Conference on Digital Forensics (Springer, Berlin, 2011), pp. 115–125

    Google Scholar 

  189. P. Juola, The time course of language change. Comput. Humanit. 37(1), 77–96 (2003)

    Google Scholar 

  190. P. Juola, Authorship attribution. Found. Trends Inf. Retr. 1(3), 233–334 (2006)

    Google Scholar 

  191. P. Juola, How a computer program helped show J.K. Rowling write a Cuckoo’s Calling. Scientific American, August 20th, 2013

    Google Scholar 

  192. P. Juola, Using the Google n-gram corpus to measure cultural complexity. Lit. Linguis. Comput. 28(4), 668–675 (2013)

    Google Scholar 

  193. P. Juola, The Rowling case: a proposed standard analytic protocol for authorship questions. Digit. Scholarsh. Humanit. 30(1), i100–i113 (2016)

    Google Scholar 

  194. P. Juola, Thesaurus-based semantics similarity judgments: a new approach to authorship similarity? in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 47–59

    Google Scholar 

  195. P. Juola, G.K. Mikros, S. Vinsick, Correlations and potential cross-linguistic indicators of writing style. J. Quant. Linguis. 26(2), 146–171 (2019)

    Google Scholar 

  196. G. Kacmarcik, M. Gamon, Obfuscating document stylometry to preserve author anonymity, in Proceedings of the Conference on Computational Linguistics (COLING-ACL) (The ACL Press, Stroudsburg, 2006), pp. 444–451

    Google Scholar 

  197. O.V. Kakushkina, A.A. Polikarpoc, D.V. Khmelev, Using literal and grammatical statistics for authorship attribution. Probl. Inf. Transm. 37(2), 172–184 (2001)

    MathSciNet  MATH  Google Scholar 

  198. D. Kalb, G. Peters, State of the Union. Presidential Rhetoric from Woodrow Wilson to George W. Bush (CQ Press, Washington, 2007)

    Google Scholar 

  199. D. Kalb, G. Peters, Analysis of Phylogenetics and Evolution with R (Springer, New York, 2012)

    Google Scholar 

  200. A. Karpathy, The unreasonable effectiveness of recurrent neural networks, May 2015

    Google Scholar 

  201. L. Kaufman, P.J. Rousseeuw, Finding Groups in Data. An Introduction to Cluster Analysis (Wiley, Hoboken, 2005)

    Google Scholar 

  202. J. Kelleher, Deep Learning (The MIT Press, Cambridge, 2019)

    Google Scholar 

  203. C. Kesler, C. Rossiter, The Federalist Papers (Signet Classic, New York, 2003)

    Google Scholar 

  204. M. Kestemont, S. Moens, J. Deploige, Collaborative authorship in the twelfth century: a stylometric study of Hildegard of Birgen and Guibert of Gembloux. Lit. Linguis. Comput. 20(2), 199–224 (2015)

    Google Scholar 

  205. V. Kešelj, F. Peng, N. Cercone, C. Thomas, N-gram-based author profiles for authorship attribution, in Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING’03 (The ACL Press, Stroudsburg, 2003), pp. 255–264

    Google Scholar 

  206. R. Ketcham, The Anti-Federalist Papers and Constitutional Convention Debates (Signet Classic, New York, 2003)

    Google Scholar 

  207. B. Kjell, Authorship determination using letter pair frequency features with neural network classifier. Lit. Linguis. Comput. 9(2), 119–124 (1994)

    Google Scholar 

  208. M. Kocher, J. Savoy, A simple and efficient algorithm for authorship verification. J. Assoc. Inf. Sci. Technol. 68(1), 259–269 (2015)

    Google Scholar 

  209. M. Kocher, J. Savoy, Distance measures in author profiling. Inf. Process. Manage. 53(5), 1103–1119 (2017)

    Google Scholar 

  210. M. Kocher, J. Savoy, Distributed language representation for authorship attribution. Digit. Scholarsh. Humanit. 33(2), 425–441 (2018)

    Google Scholar 

  211. M. Kocher, J. Savoy, Evaluation of text representation schemes and distance measures for authorship linking. Digit. Scholarsh. Humanit. 34(1), 189–207 (2019)

    Google Scholar 

  212. M. Kolakowski, T.H. Neale, The president’s State of the Union message: frequently asked questions. Congressional Research Service (RS20021), 2006

    Google Scholar 

  213. M. Koppel, J. Schler, Exploiting stylistic idiosyncrasies for authorship attribution, in IJCAI’03 Workshop on Computational Approaches to Style Analysis and Synthesis (2003), pp. 69–72

    Google Scholar 

  214. M. Koppel, S. Seidman, Detecting pseudoepigraphic texts using novel similarity measures. Digit. Scholarsh. Humanit. 33(1), 72–81 (2018)

    Google Scholar 

  215. M. Koppel, Y. Winter, Determining if two documents are by the same author. J. Assoc. Inf. Sci. Technol. 65(1), 178–187 (2014)

    Google Scholar 

  216. M. Koppel, S. Argamon, A.R. Shimoni, Automatically categorizing written texts by author gender. Lit. Linguis. Comput. 17(4), 401–412 (2002)

    Google Scholar 

  217. M. Koppel, N. Akiva, I. Dagan, Feature instability as a criterion for selecting potential style markers. J. Assoc. Inf. Sci. Technol. 57(11), 1519–1525 (2006)

    Google Scholar 

  218. M. Koppel, J. Schler, E. Bonchek-Dokow, Measuring differentiability: unmasking pseudonymous authors. J. Mach. Learn. Res. 8(6), 1261–1276 (2007)

    MATH  Google Scholar 

  219. M. Koppel, J. Schler, S. Argamon, Computational methods in authorship attribution. J. Assoc. Inf. Sci. Technol. 60(1), 9–26 (2009)

    Google Scholar 

  220. M. Koppel, J. Schler, S. Argamon, Authorship attribution in the wild. Lang. Resour. Eval. 45(1), 83–94 (2011)

    Google Scholar 

  221. M. Koppel, J. Schler, S. Argamon, Y. Winter, The ‘fundamental problem’ of authorship attribution. Engl. Stud. 93(3), 284–291 (2012)

    Google Scholar 

  222. D. Kosmajac, V. Kešelj, Twitter user profiling: bot and gender identification, in Working Notes Papers of the CLEF 2019 Evaluation Labs volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)

    Google Scholar 

  223. S. Kudugunta, E. Ferrara, Deep neural networks for bot detection. Inf. Sci. 467, 312–322 (2018)

    Google Scholar 

  224. N. Laan, Stylometry and methods. the case of Euripides. Lit. Linguis. Comput. 10(4), 271–278 (1995)

    Google Scholar 

  225. D. Labbé, Experiments on authorship attribution by intertextual distance in English. J. Quant. Linguis. 14(1), 33–80 (2007)

    Google Scholar 

  226. D. Labbé, Romain Gary et Emile Ajar. HAL 00279663, 2008

    Google Scholar 

  227. D. Labbé, Si deux et deux font quatre, Molière n’a pas écrit Dom Juan (Max Milo, Paris, 2009)

    Google Scholar 

  228. C. Labbé, D. Labbé, How to measure the meaning of words? Amour in Corneille’s work. Lang. Res. Eval. 39(4), 335–351 (2005)

    Google Scholar 

  229. D. Labbé, C. Labbé, A tool for literary studies. Lit. Linguis. Comput. 21(3), 311–326 (2006)

    Google Scholar 

  230. C. Labbé, D. Labbé, Duplicate and fake publications in the scientific literature. Scientometrics 94(1), 379–396 (2013)

    Google Scholar 

  231. C. Labbé, N. Grima, T. Gautier, B. Favier, J.A. Byrne, Semi-automated fact-checking of nucleotide sequence reagents in biomedical research publications: the Seek and Blastn tool. PLoS One 14(3), e0213266 (2019)

    Google Scholar 

  232. G. Lakoff, E. Wehling, The Little Blue Book: The Essential Guide to Thinking and Talking Democratic (Free Press, New York, 2012)

    Google Scholar 

  233. M. Lalli, F. Tria, V. Loreto, Data-compression approach to authorship attribution, in Elena Ferrante: A Virtual Author, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 61–83

    Google Scholar 

  234. Q. Le, T. Mikolov, Distributed representations of sentences and documents, in Proceedings International Conference on Machine Learning, vol. 32 (2015), pp. II-1188–II-1196

    Google Scholar 

  235. L. Lebart, A. Salem, L. Berry, Exploring Textual Data (Kluwer, Dordrecht, 1998)

    Google Scholar 

  236. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015)

    Google Scholar 

  237. G. Ledger, R. Merriam, Shakespeare, Fletcher, and The Two Noble Kinsmen. Lit. Linguis. Comput. 9(3), 235–248 (1994)

    Google Scholar 

  238. J.J. Lee, H.Y. Cho, H.R. Park, N-gram-based indexing for Korean text retrieval. Inf. Process. Manage. 35(4), 427–441 (1999)

    Google Scholar 

  239. R.J. Leigh, J. Casson, D. Ewald, A scientific approach to the Shakespeare authorship question. Lit. Rev. 9(1), 1–13 (2019)

    Google Scholar 

  240. O. Levy, Y. Goldberg, Linguistic regularities in sparse and explicit word representations, in Proceedings Computational Language Learning (2014), pp. 171–180

    Google Scholar 

  241. M. Li, X. Chen, X. Li, B. Ma, P.M.B. Vitanyi, The similarity metric. IEEE Trans. Inf. Theory 50(12), 3250–3264 (2004)

    MathSciNet  MATH  Google Scholar 

  242. G.J. Lidstone, Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities. Trans. Fac. Actuaries 8, 182–192 (1920)

    Google Scholar 

  243. E.T. Lim, Five trends in presidential rhetoric: an analysis of rhetoric from George Washington to Bill Clinton. Pres. Stud. Q. 32(2), 328–348 (2002)

    MathSciNet  Google Scholar 

  244. D.E. Losada, F. Crestani, J. Parapar, Overview of eRisk: Early risk prediction on the internet. in Experimental IR Meets Multilinguality, Multimodality, and Interaction, ed. by P. Bellot, C. Trabelsi, J. Mothe, F. Murtagh, J.Y. Nie, L. Soulier, E. SanJuan, L. Cappellato, N. Ferro. Lecture Notes in Computer Science, vol. 11018 (Springer, Cham, 2018), pp. 343–361

    Google Scholar 

  245. D.E. Losada, F. Crestani, J. Parapar, Overview of eRisk 2019: early risk prediction on the internet, in Experimental IR Meets Multilinguality, Multimodality, and Interaction, ed. by F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D.E. Losada, G.H. Bürki, L. Cappellato, N. Ferro. Lecture Notes in Computer Science, vol. 11696 (Springer, Cham, 2019), pp. 340–357

    Google Scholar 

  246. H. Love, Attributing Authorship: An Introduction (Cambridge University Press, Cambridge, 2002)

    Google Scholar 

  247. K. Luyckx, W. Daelemans, The effect of author set size and data size in authorship attribution. Lit. Linguis. Comput. 26(1), 35–44 (2011)

    Google Scholar 

  248. P. Maier, Ratification. The People Debate the Constitution, 1787–1788. Simon and Schuster Paperbacks, New York, 2010

    Google Scholar 

  249. C.D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing (The MIT Press, Cambridge, 2000)

    MATH  Google Scholar 

  250. C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval (Cambridge University Press, Cambridge, 2008)

    MATH  Google Scholar 

  251. D. Mannion, P. Dixon, Sentence-length and authorship attribution: the case of Oliver Goldsmith. Lit. Linguis. Comput. 19(4), 497–508 (2004)

    Google Scholar 

  252. M.P. Marcus, B. Santorini, M.A. Marcinkiewicz, Building a large annotated corpus of English: the Penn Treebank. Comput. Linguis. 19(2), 313–330 (1993)

    Google Scholar 

  253. Y. Marton, N. Wu, L. Hellerstein, On compression-based text classification, in European Conference on Information Retrieval (ECIR) (Springer, Cham, 2005), pp. 300–314

    Google Scholar 

  254. R. Matthews, T. Merriam, Neural computation in stylometry: an application to the works of Shakespeare and Fletcher. Lit. Linguis. Comput. 8(4), 203–209 (1993)

    Google Scholar 

  255. C. McCormick, BERT word embeddings tutorial, May 2019

    Google Scholar 

  256. G. McCulloch, Because Internet. Understanding the New Rules of Language (Riverhead Books, New York, 2019)

    Google Scholar 

  257. P. McNamee, J. Mayfield, Character n-gram tokenization for European language text retrieval. Inf. Retr. J. 7(1–2), 73–98 (2004)

    Google Scholar 

  258. T. Mendenhall, The characteristic curves of composition. Science 214, 237–249 (1887)

    Google Scholar 

  259. R. Merriam, Letter frequency as a discriminator of authors. Notes Queries 41(4), 467–469 (1994)

    Google Scholar 

  260. M.I. Meyerson, Liberty’s Blueprint. How Madison and Hamilton Wrote the Federalist Papers, Defined the Constitution, and Made Democracy Safe for the World (Basic Books, Philadelphia, 2008)

    Google Scholar 

  261. J.-B. Michel, Y.K. Shen, A.P. Aiden, A. Veres, M.K. Gray, The Google Books Team, J.P. Pickett, D. Hoiberg, D. Clancy, P. Norvig, J. Orwant, S. Pinker, M.A. Nowak, E.L. Aiden, Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182 (2011)

    Google Scholar 

  262. J. Michell, Who Wrote Shakespeare (Thames and Hudson, London, 1999)

    Google Scholar 

  263. T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in Proceedings of Workshop at ICLR 2013 (2013)

    Google Scholar 

  264. T. Mikolov, W.T. Yih, G. Zweig, Linguistic regularities in continuous space word representations, in Proceedings of NAACL HLT 2013 (The ACL Press, Stroudsburg, 2013), pp. 746–751

    Google Scholar 

  265. G.K. Mikros, Blended authorship attribution: Unmasking Elena Ferrante. in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 85–96

    Google Scholar 

  266. A. Miranda-Garcia, J. Calle-Martin, Yule’s characteristic K revisited. Lang. Res. Eval. 39(4), 287–294 (2005)

    Google Scholar 

  267. A. Miranda-Garcia, J. Calle-Martin, Function words in authorship attribution studies. Lit. Linguis. Comput. 22(1), 49–66 (2007)

    Google Scholar 

  268. A. Miranda-Garcia, J. Calle-Martin, The authorship of the disputed Federalist Papers with an annotated corpus. Engl. Stud. 93(3), 371–390 (2012)

    Google Scholar 

  269. T.M. Mitchell, Machine Learning (McGraw-Hill, New York, 1997)

    MATH  Google Scholar 

  270. D. Mitchell, Type-token models: a comparative study. J. Quant. Linguis. 22, 1–21 (2015)

    Google Scholar 

  271. R. Mitton, Spelling checkers, spelling corrections and the misspellings of poor spellers. Inf. Process. Manage. 23(5), 495–505 (1987)

    Google Scholar 

  272. F. Mosteller, D.L. Wallace, Inference in an authorship problem. J. Am. Stat. Assoc. 58(302), 275–309 (1963)

    MATH  Google Scholar 

  273. F. Mosteller, D.L. Wallace, Inference and Disputed Authorship, The Federalist (Addison-Wesley, Reading, 1964)

    MATH  Google Scholar 

  274. M. Motta, The dynamics and political implication of anti-intellectualism in the United States. Am. Polit. Res. 46(3), 465–498 (2018)

    Google Scholar 

  275. C. Muller, Principes et méthodes de statistique lexicale (Honoré Champion, Paris, 1992)

    Google Scholar 

  276. F. Murtagh, P. Legendre, Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J. Classif. 31(3), 274–295 (2014)

    MathSciNet  MATH  Google Scholar 

  277. M.J. Narag, M.N. Soriano, Identifying the painter using texture features and machine learning algorithms, in Proceedings International Conference on Cryptography, Security, and Privacy (ICCSP’19) (2019), pp. 201–205

    Google Scholar 

  278. T. Neal, K. Sundararajan, A. Fatima, Y. Yan, Y. Xiang, D. Woodard, Surveying stylometry techniques and applications. ACM Comput. Surv. 50(6) (2019). Article 86

    Google Scholar 

  279. L. Neidorf, M.S. Krieger, M. Yakubek, P. Chaudhuri, J.P. Dexter, Large-scale quantitative profiling of the Old English verse tradition. Nat. Hum. Behav. 3, 560–567 (2019)

    Google Scholar 

  280. Y. Neuman, Computational Personality Analysis: Introduction, Practical Applications and Novel Directions (Springer, Cham, 2016)

    Google Scholar 

  281. R.E. Neustadt, The Accidental President (Grossman, New York, 1967)

    Google Scholar 

  282. R.E. Neustadt, The Presidential Power and the Modern Presidents. The Politics of Leadership from Roosevelt to Reagan (Free Press, New York, 1990)

    Google Scholar 

  283. J. Noecker, M. Ryan, P. Juola, Psychological profiling through textual analysis. Lit. Linguis. Comput. 28(3), 382–387 (2013)

    Google Scholar 

  284. J.S. Nye, Presidential Leadership and the Creation of the American Era (Princeton University Press, Princeton, 2013)

    Google Scholar 

  285. M.P. Oakes, M. Farrow, Use of the chi-squared test to examine vocabulary differences in English language corpora representing seven different countries. Lit. Linguis. Comput. 22(1), 85–99 (2007)

    Google Scholar 

  286. K.A. O’Halloran, C. Coffin, Getting Started. Describing the Grammar of Speech and Writing (The Open University, Milton Keynes, 2005)

    Google Scholar 

  287. C. Olah, Understanding LSTM networks, August 2015

    Google Scholar 

  288. W. Oliveira, E. Justino, L.S. Oliveira, Comparing compression models for authorship attribution. Forensic Sci. Int. 228, 100–104 (2013)

    Google Scholar 

  289. J. Olsson, Forensic Linguistics (Continuum, London, 2008)

    Google Scholar 

  290. J. Olsson, Word Crime. Solving Crime Through Forensic Linguistics (Bloomsbury, London, 2009)

    Google Scholar 

  291. J. Olsson, More Wordcrime. Solving Crime Through Forensic Linguistics (Bloomsbury, London, 2018)

    Google Scholar 

  292. B. Pang, L. Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, in Proceedings Association for Computational Linguistics (ACL), pp. 115–124 (The ACL Press, Stroudsburg, 2005)

    Google Scholar 

  293. R.R. Panko, What we known about spreadsheet errors. J. End User Comput. 10(2), 51–21 (1998)

    Google Scholar 

  294. G. Park, D.B. Yaden, H.A. Schwartz, M.L. Kern, J.C. Eichstaedt, M. Kosinski, D. Stillwell, L.H. Ungar, M.E.P. Seligman, Women are warmer but no less assertive than men: gender and language on Facebook. PLoS One 11(5), e0155885 (2016)

    Google Scholar 

  295. A. Pawłowski, Séries temporelles en linguistique: Application à l’attribution de textes, Romain Gary et Emile Ajar (Slatkine, Lausanne, 1996)

    Google Scholar 

  296. L. Pearl, M. Steyvers, Detecting authorship deception: a supervised machine learning approach using author writeprints. Lit. Linguis. Comput. 27(2), 183–196 (2012)

    Google Scholar 

  297. C. Peersman, W. Daelemans, L. Van Vaerenbergh, Predicting age and gender in online social networks, in International Workshop on Search and Mining User-generated Contents (SMUC’11) (Springer, Cham, 2011), pp. 37–44

    Google Scholar 

  298. A. Penas, A. Rodrigo, A single measure to assess nonresponse, in Proceedings 49th Conference of the Association for Computational Linguistics (ACL), pp. 1415–1424 (The ACL Press, Stroudsburg, 2011)

    Google Scholar 

  299. J.W. Pennebaker, The Secret Life of Pronouns. What Our Words Say About Us (Bloomsbury Press, New York, 2011)

    Google Scholar 

  300. J. Pennington, R. Socher, C.D. Manning, GloVe: Global vectors for word representations, in Proceedings of the Empirical Methods in Natural Language Processing (2014), pp. 1532–1543

    Google Scholar 

  301. S. Pinker, The Sense of Style (Penguin Books, London, 2014)

    Google Scholar 

  302. P. Plechác̆, K. Bobenhausen, B. Hammerich, Versification and authorship attribution. Pilot study on Czech, German, Spanish, and English poetry. Studia Metrica et Poetica 5(2), 29–54 (2018)

    Google Scholar 

  303. I.-I. Popescu, G. Altmann, P. Grzybek, B.D. Jayaram, R. Köhler, V. Krupa, J. Mačutek, R. Pustet, L. Uhlířovà, M.N. Vidya, Word Frequency Studies (De Gruyter Mouton, Berlin, 2009)

    Google Scholar 

  304. I.-I. Popescu, K.H. Best, G. Altmann, Unified Modeling of Length in Language (RAM-Verlag, Lüdenscheid, 2014)

    Google Scholar 

  305. M.F. Porter, An algorithm for suffix stripping. Program 14, 130–137 (1980)

    Google Scholar 

  306. N. Potha, E. Stamatatos, Improving author verification based on topic modeling. J. Assoc. Inf. Sci. Technol. 70(10), 1074–1088 (2019)

    Google Scholar 

  307. M. Potthast, A. Barròn-Cedeno, B. Stein, P. Rosso, Cross-language plagiarism detection. Lang. Resour. Eval. 45(1), 1–18 (2011)

    Google Scholar 

  308. M. Potthast, M. Hagen, B. Stein, Author obfuscation: attacking the state of the art in authorship verification, in Working Notes Papers of the CLEF 2016 Evaluation Labs volume 1609 of CEUR Workshop (CEUR, Aachen, 2016)

    Google Scholar 

  309. M. Potthast, F. Rangel, M. Tschuggnall, E. Stamatatos, P. Rosso, B. Stein, Overview of PAN’17: author identification, author profiling, and author obfuscation, in Experimental IR Meets Multilinguality, Multimodality, and Interaction, ed. by G. Jones, S. Lawless, J. Gonzalo, L. Kelly, L. Goeuriot, T. Mandl, L. Cappellato, N. Ferro. Lecture Notes in Computer Science, vol. 10456 (Springer, Berlin, 2017), pp. 275–290

    Google Scholar 

  310. M. Potthast, F. Schremmer, M. Hagen, B. Stein, Overview of the author obfuscation task at PAN 2018: a new approach to measuring safety, in Working Notes Papers of the CLEF 2018 Evaluation Labs Volume 2125 of CEUR Workshop (CEUR, Aachen, 2018)

    Google Scholar 

  311. M. Potthast, P. Rosso, E. Stamatatos, B. Stein, A decade of shared tasks in digital text forensics at PAN, in Proceedings ECIR2019. Springer Lecture Notes in Computer Science, vol. 11438 (2019), pp. 291–300

    Google Scholar 

  312. R. Queneau, Exercices de style (Gallimard, Paris, 1947)

    Google Scholar 

  313. F. Rangel, P. Rosso, On the impact of emotions on author profiling. Inf. Process. Manage. 52(1), 73–92 (2016)

    Google Scholar 

  314. F. Rangel, P. Rosso, Overview of the 7th author profiling task at PAN 2019: bots and gender profiling in twitter, in Working Notes Papers of the CLEF 2019 Evaluation Labs Volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)

    Google Scholar 

  315. F. Rangel, P. Rosso, M. Montes y Gómez, M. Potthast, B. Stein, Overview of the 6th author profiling task at PAN 2018: multimodal gender identification in twitter, in Working Notes Papers of the CLEF 2018 Evaluation Labs Volume 2125 of CEUR Workshop (CEUR, Aachen, 2018)

    Google Scholar 

  316. J.R. Rao, P. Rohatgi, Can pseudonymity really guarantee privacy? in Proceedings of the 9th USENIX Security Symposium (USENIX Association, New Orleans, 2000), pp. 85–96

    Google Scholar 

  317. T.R. Reddy, B.V. Vardhan, P.V. Reddy, A survey on authorship profiling techniques. Int. J. Appl. Eng. Res. 11(5), 3092–3102 (2016)

    Google Scholar 

  318. W.J. Ridings, S.B. McIver, Rating the Presidents: A Ranking of U.S. Leaders, from the Great and Honorable to the Dishonest and Incompetent (Carol Publishing, Secaucus, 1997)

    Google Scholar 

  319. P. Rizvi, An improvement to Zeta. Digit. Scholarsh. Humanit. 34(2), 419–422 (2019)

    Google Scholar 

  320. P. Rizvi, The interpretation of the Zeta test results. Digit. Scholarsh. Humanit.34(2), 401–418 (2019)

    Google Scholar 

  321. A. Rocha, W.J. Scheirer, C.W. Forstall, T. Cavalcante, A. Theophilo, B. Shen, A.R.B. Carvalho, E. Stamatatos, Authorship attribution for social media forensics. IEEE Trans. Inf. Forensics Secur. 12(1), 5–33 (2017)

    Google Scholar 

  322. X. Rong, Word2vec parameter learning explained (2016). arXiv.org. arXiv:1411.2738

    Google Scholar 

  323. M. Rosen-Zvi, T. Griffiths, T. Steyvers, P. Smyth, The author-topic model for authors and documents, in Proceedings of the Uncertainty in Artificial Intelligence (The AUAI Press, Arlington, 2004), pp. 487–494.

    Google Scholar 

  324. M. Rosen-Zvi, C. Chemudugunta T. Griffiths, T. Steyvers, P. Smyth, Learning author-topic models from text corpora. ACM Trans. Inf. Syst. 28(1) (2010). Article 4

    Google Scholar 

  325. J. Rudman, The state of authorship attribution studies: some problems and solutions. Comput. Humanit. 31(4), 351–365 (1998)

    MathSciNet  Google Scholar 

  326. J. Rudman, Unediting, de-editing, and editing in non-traditional authorship attribution studies: with an emphasis on the canon of Daniel Defoe. Pap. Bibliogr. Soc. Am. 99(1), 5–36 (2005)

    Google Scholar 

  327. J. Rudman, The twelve disputed Federalist Papers: a case for collaboration, in Proceedings Digital Humanities 2012 (2012), pp. 353–356

    Google Scholar 

  328. A. Rule, J.P. Cointet, P.S. Bearman, Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014, in Proceedings National Academy of Sciences, vol. 112(35) (2015), pp. 10837–10844

    Google Scholar 

  329. D. Rumelhart, G. Hinton, R. Williams, Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)

    MATH  Google Scholar 

  330. J. Rybicki, Partners in life, partners in crime? in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 111–122

    Google Scholar 

  331. J. Rybicki, M. Eder, Deeper Delta across genres and languages: do we really need the most frequent words. Lit. Linguis. Comput. 26(3), 315–321 (2011)

    Google Scholar 

  332. J. Rybicki, M. Heydel, The stylistics and stylometry of collaborative translations: Woolf’s night and day in Polish. Lit. Linguis. Comput. 28(4), 708–717 (2013)

    Google Scholar 

  333. J. Rybicki, D.L. Hoover, M. Kestemont, Collaborative authorship: Conrad, Ford and rolling Delta. Lit. Linguis. Comput. 29(3), 422–431 (2014)

    Google Scholar 

  334. G. Sampson, Empirical Linguistics (Continuum, London, 2001)

    Google Scholar 

  335. J. Savoy, Lexical analysis of US political speeches. J. Quant. Linguis. 17(2), 123–141 (2010)

    Google Scholar 

  336. J. Savoy, Authorship attribution based on specific vocabulary. ACM-Trans. Inf. Syst. 30(2), 170–199 (2012)

    Google Scholar 

  337. J. Savoy, Authorship attribution based on a probabilistic topic model. Inf. Process. Manage. 49(1), 341–354 (2013)

    Google Scholar 

  338. J. Savoy, The Federalist Papers revisited:a collaborative attribution scheme, in Proceedings ASIST 2013, Montreal, November 2013

    Google Scholar 

  339. J. Savoy, Comparative evaluation of term selection functions for authorship attribution. Digit. Scholarsh. Humanit. 30(2), 246–261 (2015)

    MathSciNet  Google Scholar 

  340. J. Savoy, Text clustering: an application with the State of the Union addresses. J. Assoc. Inf. Sci. Technol. 66(8), 1645–1654 (2015)

    Google Scholar 

  341. J. Savoy, Vocabulary growth study: An example with the State of the Union addresses. J. Quant. Linguis. 22(4), 289–310 (2015)

    Google Scholar 

  342. J. Savoy, Estimating the probability of an authorship attribution. J. Assoc. Inf. Sci. Technol. 67(6), 1462–1472 (2016)

    Google Scholar 

  343. J. Savoy, Text representation strategies: an example with the State of the Union addresses. J. Assoc. Inf. Sci. Technol. 67(8), 1858–1870 (2016)

    Google Scholar 

  344. J. Savoy, Analysis of the style and the rhetoric of the American presidents over two centuries. Glottometrics 38(1), 55–76 (2017)

    Google Scholar 

  345. J. Savoy, Analysis of the style and the rhetoric of the 2016 US presidential primaries. Digit. Scholarsh. Humanit. 33(1), 143–159 (2018)

    Google Scholar 

  346. J. Savoy, Elena Ferrante unmasked. in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 123–142

    Google Scholar 

  347. J. Savoy, Is Starnone really the author behind Ferrante? Digit. Scholarsh. Humanit. 33(4), 902–918 (2018)

    Google Scholar 

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Savoy, J. (2020). Advanced Models for Stylometric Applications. In: Machine Learning Methods for Stylometry. Springer, Cham. https://doi.org/10.1007/978-3-030-53360-1_7

  • DOI: https://doi.org/10.1007/978-3-030-53360-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-53359-5

  • Online ISBN: 978-3-030-53360-1

  • eBook Packages: Computer Science (R0)
