Advanced Models for Stylometric Applications

Savoy, Jacques

doi:10.1007/978-3-030-53360-1_7

Abstract

The previous chapter explained some well-known models, but various more advanced approaches have also been suggested. Closer to the humanities, the Zeta test focuses on terms used recurrently by one author and mainly ignored by the others. After selecting stylistic markers based on this criterion, the model builds a graph showing the similarities between text excerpts. Compression algorithms can also be applied to identify the true author of a text based on similar word frequencies. Closer to the natural language processing domain, latent Dirichlet allocation (LDA) can be applied to determine the most probable author of a given document. To solve the verification problem, several dedicated approaches have been suggested, and this chapter includes an overview of them. Although we usually assume that a novel is written by a single person, collaborative authorship is possible. To detect the passages written by each possible author, the rolling Delta and other ad hoc approaches are described. As neural models constitute an important research field, three sections are dedicated to them: one on the basic neural approach, one on word embeddings, and one on long short-term memory (LSTM), a well-known deep learning model. The last section is dedicated to adversarial stylometry and obfuscation, that is, how a computer can be programmed to hide the stylistic markers left by the original author.
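To make the Zeta criterion concrete, here is a minimal sketch in Python. It assumes one common formulation (often attributed to Craig), in which a term's score is the proportion of the target author's segments that contain it plus the proportion of the other authors' segments that do not. The function name `zeta_scores`, the segment format (lists of word tokens), and the toy data are all hypothetical illustrations, not taken from the chapter, which may define the test differently.

```python
from collections import Counter


def zeta_scores(author_segments, other_segments):
    """Zeta-style score per term (assumed formulation, see lead-in).

    Each argument is a non-empty list of segments; a segment is a list
    of word tokens. Scores range from 0 (avoided by the author, used
    everywhere else) to 2 (used in every author segment, never elsewhere).
    """
    def segment_proportion(segments):
        # Fraction of segments in which each term occurs at least once.
        counts = Counter()
        for segment in segments:
            counts.update(set(segment))
        return {term: n / len(segments) for term, n in counts.items()}

    p_author = segment_proportion(author_segments)
    p_other = segment_proportion(other_segments)
    vocabulary = set(p_author) | set(p_other)
    return {
        term: p_author.get(term, 0.0) + (1.0 - p_other.get(term, 0.0))
        for term in vocabulary
    }


# Toy data, invented for illustration only.
author = [["upon", "the", "whilst"], ["upon", "the", "matter"]]
others = [["on", "the", "while"], ["the", "matter", "on"]]

scores = zeta_scores(author, others)
# Terms sorted from most to least characteristic of the target author.
markers = sorted(scores, key=scores.get, reverse=True)
```

In this toy example, "upon" occurs in every segment by the target author and in none by the others, so it ranks first, while a term shared by everyone, such as "the", lands in the middle of the scale. Counting segments rather than raw occurrences is a deliberate choice here: it rewards terms that recur consistently across an author's texts over terms that are merely frequent in one of them.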

References

A. Abbasi, H. Chen, Writeprints: a stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Trans. Inf. Syst. 26(2) (2008). Article 7
Google Scholar
S. Adamovic, V. Miskovic, M. Milosavljevic, M. Sarac, M. Veinovic, Automated language-independent authorship verification (for Indo-European languages). J. Assoc. Inf. Sci. Technol. 70(8), 858–871 (2019)
Google Scholar
D. Adger, Language Unlimited. The Science Behind Our Most Creative Power (Oxford University Press, Oxford, 2019)
Google Scholar
S. Afroz, M. Brennam, R. Greenstadt, Detecting hoaxes, frauds, and deception in writing style online, in Proceedings of the 2012 IEEE Symposium on Security and Privacy, pp. 402–416 (IEEE Computer Society, Washington, 2012)
Google Scholar
C.C. Aggarwal, Mining text streams, in Mining Text Data, ed. by C.C. Aggarwal, C.X. Zhai (Springer, New York, 2012), pp. 297–321
Google Scholar
S. Ahmadian, S. Azarshahi, D.L. Paulhus, Explaining Donald Trump via communication style: grandiosity, informality, and dynamism. Personal. Individ. Differ. 107, 49–53 (2017)
Google Scholar
N. Akiva, M. Koppel, Identifying distinct components of a multi-author document, in European Intelligent and Security Informatics Conference (2012), pp. 205–209
Google Scholar
M. Alfaro, The daily 202: Alexander Hamilton has been cast in a starring for impeachment’s closing argument, in Washington Post, 143 (Dec. 17th) (2019)
Google Scholar
M. Almishari, G. Tsudik, Exploring linkability of user reviews, in Proceedings Computer Security ESORICS. Lecture Notes in Computer Science, vol. 7459 (Springer, Berlin, 2012), pp. 307–324.
Google Scholar
S.M. Alzahrani, N. Salim, A. Abraham, Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans. Syst. Man Cybernet. Part C (Appl. Rev.) 42(2), 133–149 (2012)
Google Scholar
A. Antonia, C. Hugh, J. Elliott, Language chunking, data sparseness, and the value of a long marker list: explorations with word n-grams and authorial attribution. Lit. Linguis. Comput. 29(2), 147–163 (2014)
Google Scholar
S. Argamon, Interpreting Burrows’ Delta: geometric and probabilistic foundations. Lit. Linguist. Comput. 23(2), 131–147 (2008)
Google Scholar
S. Argamon, M. Koppel, J.W. Pennebaker, J. Schler, Automatically profiling the author of an anonymous text. Commun. ACM 52(2), 119–123 (2009)
Google Scholar
H.R. Baayen, Word Frequency Distributions (Kluwer Academic Press, Dordrecht, 2001)
MATH Google Scholar
H.R. Baayen, Analysis Linguistic Data: A Practical Introduction to Statistics Using R (Cambridge University Press, Cambridge, 2008)
Google Scholar
H. Baayen, H. van Halteren, F.J. Tweedie, Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Lit. Linguis. Comput. 11(3), 121–132 (1996)
Google Scholar
A. Bacciu, M. La Morgia, A. Mei, E. Nerio Nemmi, V. Neri, J. Stefa, Bot and gender detection of Twitter accounts using distortion and LSA. Notebook for PAN at CLEF 2019, in Working Notes Papers of the CLEF 2019 Evaluation Labs volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)
Google Scholar
E. Backer, P. van Kranenburg, On musical stylometry - A pattern recognition approach. Patt. Recogn. Lett. 26(3), 299–309 (2005)
Google Scholar
N. Bagnall, Newspaper Language (Focal Press, Oxford, 1993)
Google Scholar
D.W. Barowy, E.D. Berger, B. Zorn, ExceLint: automatically finding spreadsheet formula errors, in Proceedings ACM Programming Language, vol. 2 (2018). Article 148
Google Scholar
M. Barrick, M.K. Mount, The big five personality dimensions and job performance: a meta-analysis. Person. Psychol. 44(1), 1–26 (1991)
Google Scholar
L. Bauer, P. Trudgill, Language Myths (Penguin Books, London, 1998)
Google Scholar
A. Bellaachia, E. Jimenez, Exploring performance-based music attributes for stylometric analysis. World Acad. Sci. Eng. Technol. 3(7), 1795–1797 (2009)
Google Scholar
D. Benedetto, E. Caglioti, V. Loreto, Language trees and zipping. Phys. Rev. Lett. 88(4), 048702 (2002)
Google Scholar
Y. Bengio, Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
MATH Google Scholar
Y. Bengio, R. Ducharme, P. Vincent, C. Jauvin, A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
MATH Google Scholar
I. Bensalem, P. Rosso, S. Chikhi, One the use of character n-grams as the evidence of plagiarism. Lang. Resour. Eval. 53(2), 1–34 (2019)
Google Scholar
S. Benzel, A simple stylometry comparator: Nifty assignment. J. Comput. Sci. Coll. 31(2), 283–284 (2015)
Google Scholar
D. Biber, Representativeness in corpus design. Lit. Linguis. Comput. 8(4), 243–257 (1993)
Google Scholar
D. Biber, Dimensions of the Register Variation. (Cambridge University Press, Cambridge, 1995)
Google Scholar
D. Biber, S. Conrad, Register, Genre, and Style (Cambridge University Press, Cambridge, 2009)
Google Scholar
D. Biber, S. Conrad, G. Leech, The Longman Student Grammar of Spoken and Written English (Longman, London, 2002)
Google Scholar
J.N.G. Binongo, Who wrote the 15th Book of Oz? An application of multivariate analysis to authorship attribution. Chance 16(2), 9–17 (2003)
MathSciNet Google Scholar
J.N.G. Binongo, M.W. Smith, The application of principal component analysis to stylometry. Lit. Linguis. Comput. 14(4), 445–465 (1999)
Google Scholar
D.M. Blei, Probabilistic topic models. Commun. ACM 55(4), 77–84 (2003)
Google Scholar
D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation. Mach. Learn. 3(1), 993–1022 (2003)
MATH Google Scholar
T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, A. Kalai, Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, in Advanced in Neural Information Processing Systems 29 (NIPS 2016), vol. 30 (The IEEE Press, Washington, 2016), pp. 4356–4364
Google Scholar
R.A. Bosch, J.A. Smith, Separating hyperplanes and the authorship on the Federalist Papers. Am. Math. Mon. 105(7), 601–608 (1991)
Google Scholar
B.E. Boser, E. Sackinger, J. Bromley, Y. Le Cun, L.D. Jackel, An analog neural network processor with programmable topology. J. Solid State Circ. 26(12), 2017–2025 (1991)
Google Scholar
R.L. Boyd, J.W. Pennebaker, Language-based personality: a new approach to personality in a digital world. Curr. Opin. Behav. Sci. 18, 63–68 (2017)
Google Scholar
W.J. Braun, D.J. Murdoch, A First Course in Statistical Programming with R (Cambridge University Press, Cambridge, 2007)
MATH Google Scholar
M. Brennam, S. Afroz, R. Greenstadt, Adversarial stylometry: circumventing authorship recognition to preserve privacy and anonymity. ACM Trans. Inf. Syst. Secur. 13(3) (2011). Article 12
Google Scholar
L.D. Brown, T.T. Cai, A. DasGupta, Interval estimation for a binomial proportion. Stat. Sci. 16(2), 101–133 (2001)
MathSciNet MATH Google Scholar
J.D. Burger, J. Henderson, G. Kim, G. Zarrella, Discriminating gender on Twitter, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (2011), pp. 1301–1309
Google Scholar
J.F. Burrows, Not unless you ask nicely: the interpretative Nexus between analysis and information. Lit. Linguis. Comput. 7(1), 91–109 (1992)
Google Scholar
J.F. Burrows, Delta: a measure of stylistic difference and a guide to likely authorship. Lit. Linguis. Comput. 17(3), 267–287 (2002)
Google Scholar
J.F. Burrows, All the way through: testing for authorship in different frequency strata. Lit. Linguis. Comput. 22(1), 27–47 (2007)
Google Scholar
J.W. Caesar, G.E. Thurow, J. Tulis, J.M. Bessette, The rise of rhetorical presidency. Pres. Stud. Q. 11(2), 158–171 (1981)
Google Scholar
C. Cai, L. Li, D. Zeng, Behavior enhanced deep bot detection in social media, in Proceedings IEEE International Conference on Intelligence and Security Informatics (ISI) (2017), pp. 128–130
Google Scholar
F. Can, J.M. Patton, Change of writing style with time. Comput. Humanit. 38(1), 61–82 (2004)
Google Scholar
D.V. Canter, An evaluation of the “CUSUM” stylistic analysis of confessions. Expert Evid. 3(1), 93–99 (1992)
Google Scholar
S.-H. Cha, Comprehensive survey on distance similarity measures between probability density functions. Int. J. Math. Models Methods Appl. Sci. 1(4), 300–307 (2007)
MathSciNet Google Scholar
E. Charniak, Introduction to Deep Learning (The MIT Press, Cambridge, 2018)
Google Scholar
C. Chaski, Best practices and admissibility of forensic author identification. J. Law Policy 21(2), 333–376 (2013)
Google Scholar
L. Chen, H. Zhang, J.M. Jose, H. Yu, Y. Moshfeghi, P. Triantafillou, Topic detection and tracking on heterogeneous information. J. Intell. Inf. Syst. 51(1), 115–137 (2018)
Google Scholar
Z. Chu, S. Gianvecchio, H. Wang, S. Jajodia, Detecting automation of twitter accounts: are you a human, bot, or cyborg? IEEE Trans. Dependable Secure Comput. 9(6), 811–824 (2003)
Google Scholar
K.W. Church, P. Hanks, Word association norms, mutual information, and lexicography, in Proceedings Association for Computational Linguistics (ACL), pp. 76–83 (The ACL Press, Stroudsburg, 1999)
Google Scholar
R. Cilibrasi, P.M.B. Vitanyi, Clustering by compression. IEEE Trans. Inf. Theory 51(4), 1523–1545 (2005)
MathSciNet MATH Google Scholar
K. Connolly, Der Spiegel says top journalist faked stories for years. The Guardian, Dec. 19th, 2018
Google Scholar
W.J. Conover, Practical Nonparametric Statistics (Wiley, New York, 1980)
Google Scholar
G. Coppersmith, M. Dredze, C. Harman, Quantifying mental health signals in Twitter, in ACL Workshop on Computational Linguistics and Clinical Psychology (The ACL Press, Stroudsburg, 2014), pp. 51–60
Google Scholar
M. Corazza, S. Menini, E. Cabrio, S. Tonelli, S. Villata, A multilingual evaluation for online hate speech detection. Lit. Linguis. Comput. 20(2) (2020). Article 10
Google Scholar
M.A. Cortelazzo, P. Nadalutti, A. Tuzzi, Improving Labbé intertextual distance: Testing a revised version on a large corpus of Italian literature. J. Quant. Linguis. 20(2), 125–152 (2013)
Google Scholar
M. Coulthard, On admissible linguistics evidence. J. Law Policy 21(2) (2012). Article 8
Google Scholar
H. Craig, A.F. Kinney, Shakespeare, Computers, and the Mystery of Authorship (Cambridge University Press, Cambridge, 2009)
Google Scholar
M.J. Crawley, Statistics. An Introduction Using R (Wiley, Chichester, 2005)
Google Scholar
M.J. Crawley, The R Book (Wiley, Chichester, 2007)
MATH Google Scholar
S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi, DNA-inspired online behavioral modeling and its application to spambot detection. IEEE Intell. Syst. 31(5), 58–64 (2016)
Google Scholar
S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi, Social fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling. IEEE Trans. Dependable Secure Comput. 15(4), 561–576 (2017)
Google Scholar
F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D.E. Losada, G.H. Bürki, L. Cappellato, N. Ferro, Experimental IR Meets Multilinguality, Multimodality, and Interaction (Springer, Cham, 2019)
Google Scholar
D. Crystal, The Cambridge Encyclopedia of English Language (Cambridge University Press, Cambridge, 2003)
Google Scholar
D. Crystal, Making Sense of Grammar (Pearsons, Harlow, 2004)
Google Scholar
D. Crystal, ‘Think on my Words’ Exploring Shakespeare’s Language (Cambridge University Press, Cambridge, 2008)
Google Scholar
D. Crystal, Txtng: The Gr8 Db8 (Oxford University Press, Oxford, 2008)
Google Scholar
D. Crystal, The Cambridge Encyclopedia of Language (Cambridge University Press, Cambridge, 2010)
Google Scholar
D. Crystal, A Little Book of Language (Yale University Press, Yale, 2010)
Google Scholar
D. Crystal, Internet Linguistics (Routledge, London, 2011)
Google Scholar
D. Crystal, Making a Point. The Pernickety Story of English Punctuation (Profile Books, London, 2016)
Google Scholar
B. Crystal, D. Crystal, You Say Potato: The Story of English Accents (MacMillan, Hampshire, 2015)
Google Scholar
W. Daelemans, Explanation in computational stylometry, in Computational Linguistics and Intelligent Text Processing (CICLing) (Springer, Cham, 2013), pp. 451–462
Google Scholar
W. Daelemans, M. Kestemont, E. Manjavacas, M. Potthast, F. Rangel, P. Rosso, G. Specht, E. Stamatatos, B. Stein, M. Tschuggnall, M. Wiegmann, E. Zangerle, Overview of PAN 2019: bots and gender profiling, celebrity profiling, cross-domain authorship attribution and style change detection, in Experimental IR Meets Multilinguality, Multimodality, and Interaction, ed. by F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D.E. Losada, G.H. Bürki, L. Cappellato, N. Ferro (Springer, Cham, 2019), pp. 402–416
Google Scholar
P. Dalgaard, Introductory Statistics with R (Springer, Heidelberg, 2002)
MATH Google Scholar
F. Damereau, The use of function word frequencies as indicator of style. Comput. Humanit. 9(6), 271–280 (1975)
Google Scholar
C. Davies, Divided by a Common Language. A Guide to British and American English (Houghton Mifflin Harcourt, Boston, 2007)
Google Scholar
M. De Choudhury, E. Kiciman, M. Dredze, G. Coppersmith, M. Kumar, Discovering shifts to suicidal ideation from mental health content in social media, in Proceedings Conference on Human Factor in Computing Systems (SIGCHI’16) (The ACM Press, New York, 2016), pp. 2098–2110
Google Scholar
A. de Morgan, Letter to Rev. Heald 18/08/1851, in Memoirs of Augustus de Morgan by his Wife Sophia Elizabeth de Morgan with Selections from his Letters, ed. by S. Elizabeth, D. Morgan (Longman’s Green and Co., London, 1851)
Google Scholar
M. Del Vicario, A. Bessi, F. Zollo, F. Petroni, A. Scala, G. Caldarelli, H.E. Stanley, W. Quattrociocchi, The spreading of misinformation online. Proc. Natl. Acad. Sci. 113(3), 554–559 (2016)
Google Scholar
M.P. Deisenroth, A.A. Faisal, C.S. Ong, Mathematics for Machine Learning (Cambridge University Press, Cambridge, 2020)
MATH Google Scholar
L. Deng, J. Wiebe, MPQA 3.0: an entity/event-level sentiment corpus. In Proceedings Human Language Technologies (HLT/NAACL) (2015), pp. 1323–1328
Google Scholar
G. Desagulier, Corpus Linguistics and Statistics with R (Springer, Heidelberg, 2017)
Google Scholar
S.H.H. Ding, B.C.M. Fung, F. Iqbal, W.K. Cheung, Learning stylometric representation for authorship analysis. IEEE Trans. Cybernet. 49(1), 107–121 (2019)
Google Scholar
P. Dixon, D. Mannion, Goldsmith’s periodical essays: a statistical analysis of eleven doubtful cases. Lit. Linguis. Comput. 8(1), 1–19 (1993)
Google Scholar
R. Dror, L. Peled-Cohen, S. Shlomov, R. Reichart, Statistical Significance Testing for Natural Language Processing (Morgan & Claypool, San Francisco, 2020)
Google Scholar
M. Du, N. Liu, X. Hu, Techniques for interpretable machine learning. Commun. ACM 63(1), 68–77 (2020)
Google Scholar
T. Dunning, Accurate methods for the statistics of surprise and coincidence. Comput. Linguis. 19(1), 61–74 (1993)
Google Scholar
E. Dwoskin, Trump lashes out at social media companies after Twitter labels tweets with fact checks. Washington Post, 144(May. 26th), 2020
Google Scholar
P. Eckert, S. McConnell-Ginet, Language and Gender (Cambridge University Press, Cambridge, 2013)
Google Scholar
M. Eder, Does size matter? Authorship attribution, small samples, big problem. Digit. Scholarsh. Human. 30(2), 167–182 (2015)
Google Scholar
M. Eder, Rolling Delta. Digit. Scholarsh. Humanit. 31(3), 457–469 (2016)
Google Scholar
M. Eder, Visualization in stylometry: cluster analysis using networks. Digit. Scholarsh. Humanit. 32(1), 50–64 (2017)
Google Scholar
M. Eder, Elena Ferrante: a virtual author, in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (eds.) (Padova University Press, Padova, 2018), pp. 31–46
Google Scholar
M. Eder, J. Rybicki, Do birds of a feather really flock together, or how to choose test samples for authorship attribution. Lit. Linguis. Comput. 28(2), 229–236 (2013)
Google Scholar
M. Eder, J. Rybicki, M. Kestemont, Stylometry with R: a package for computational text analysis. R J. 8(1), 107–121 (2016)
Google Scholar
P. Edmondson, S. Wells (eds.), Shakespeare, Beyond Doubt. Evidence, Argument, Controversy (Cambridge University Press, Cambridge, 2013)
Google Scholar
B. Efron, T. Hastie, Computer Age Statistical Inference. Algorithms, Evidence, and Data Science (Cambridge University Press, Cambridge, 2016)
Google Scholar
B. Efron, R. Thisted, Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 63(3), 435–447 (1976)
MATH Google Scholar
F.J. Eisenstein, Introduction to Natural Language Processing (The MIT Press, Cambridge, 2019)
Google Scholar
S.E.M. El, I. Kassou, Authorship analysis studies: a survey. Int. J. Comput. Appl. 86(12), 22–29 (2014)
Google Scholar
D.Y. Espinosa, H. Gómez-Adorno, G. Sidorov, Bots and gender profiling using character bigrams. Notebook for PAN at CLEF 2019, in Working Notes Papers of the CLEF 2019 Evaluation Labs Volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)
Google Scholar
J. Estepa, Sean Spicer says ‘covfefe’ wasn’t a typo: Trump knew ‘exactly what he meant’. USA Today, May 31, 2017
Google Scholar
S. Evert, T. Proisl, F. Jannidis, I. Reger, S. Pielström, C. Schöch, T. Vitt, Understanding and explaining Delta measures for authorship attribution. Digit. Scholarsh. Humanit. 32(2), ii4–ii16 (2017)
Google Scholar
C. Fautsch, J. Savoy, Algorithmic stemmers or morphological analysis? An evaluation. J. Am. Soc. Inf. Sci. 60(8), 1616–1624 (2009)
Google Scholar
C. Fellbaum, Wordnet and wordnets, in Encyclopedia of Language and Linguistics, ed. by K. Brown (Elsevier, Amsterdam, 2005), pp. 665–670
Google Scholar
C. Fellbaum, G.A. Miller, WordNet: An Electronic Lexical Database (The MIT Press, Cambridge, 1998)
MATH Google Scholar
E. Ferrara, O. Varol, F. Menczer, A. Flammini, Using sentiment to detect bots on twitter: are humans more opinionated than bots? in Proceedings of the IEEE/ACM Conference on Advances in Social Networks Analysis and Mining (ASONAM’14) (2014), pp. 620–627
Google Scholar
E. Ferrara, O. Varol, F. Menczer, A. Flammini, Detection of promoted social media campaigns, In Proceedings of the 10th AAAI Conference on Web and Social Media (ICWSM 2016) (2016), pp. 563–566
Google Scholar
O. Ferret, Typing relations in distributional thesauri, in Language Production, Cognition, and the Lexicon, pp. 113–134 (Springer, Cham, 2014)
Google Scholar
N. Ferro, What happened in CLEF …for a while? in Experimental IR Meets Multilinguality, Multimodality, and Interaction, ed. by F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D. Losada, G. Heinatz, L. Cappellato, N. Ferro (eds.) (Springer, Berlin, 2019)
Google Scholar
J.R. Firth, A synopsis of linguistic theory 1930–1955, in Studies in Linguistic Analysis (Blackwell, Oxford, 1957), pp. 1–32
Google Scholar
G. Forman, An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res. 3, 1289–1305 (2003)
MATH Google Scholar
R.S. Forsyth, Stylochronometry with substrings, or: a poet young and old. Lit. Linguis. Comput. 14(4), 467–478 (1999)
Google Scholar
O. Fourkioti, S. Symeonidis, A. Arampatis, Language models and fusion for authorship attribution. Inf. Process. Manage. 6(56), 102061 (2019)
Google Scholar
W.N. Francis, H. Kucera, Frequency Analysis of English Usage (Houghton Mifflin Co., Boston, 1982)
Google Scholar
G. Fung, O. Mangasarian, The disputed Federalist Papers: SVM feature selection via concave minimization, in Proceedings on Diversity in Computing (2003), pp. 42–46
Google Scholar
W.A. Gale, K.W. Church, What is wrong with adding one? in Corpus-Based Research into Language, ed. by N. Oostdijk, P. de Hann (Harcourt Brace, New York, 1994)
Google Scholar
L. Gavalotti, F. Sebastiani, M. Simi, Experiments on the use of feature selection and negative evidence in automated text categorization, in Proceedings European Conference in Digital Libraries (ECDL). Lecture Notes in Computer Science, vol. 1923 (Springer, Heidelberg, 2000), pp. 59–68
Google Scholar
C. Gelderman, All the Presidents’ Words. The Bully Pulpit and the Creation of the Virtual Presidency (Walker & Co., New York, 1997)
Google Scholar
F.A. Gers, J. Schmidhuber, LSTM recurrent networks learn simple context free and context sensitive languages. IEEE Trans. Neural Netw. 12(6), 1333–1340 (2005)
Google Scholar
A. Giachanou, J. Gonzalo, F. Crestani, Propagating sentiment signals for estimating reputation polarity. Inf. Process. Manage. 6(56), 102079 (2019)
Google Scholar
G. Giodan, C. Saint-Blancat, S. Sbalchiero, Exploring the history of American sociology through topic modelling, in Tracing the Life Cycle of Ideas in the Humanities and Social Sciences, ed. by A. Tuzzi (Springer, Cham, 2018), pp. 45–64
Google Scholar
M. Glickman, J. Brown, Assessing authorship of Beatles songs from musical content: Bayesian classification modeling from bags-of-words representations, in Proceedings JSM, American Statistical Association (2018)
Google Scholar
Y. Goldberg, Neural Network Methods for Natural Language Processing (Morgan & Claypool Publishers, San Rafael, 2017)
Google Scholar
H. Gómez Adorno, A.I. Valencia, C. Stephens Rhodes, G. Fuentes Pineda, Bots and gender identification based on stylometry of tweet minimal structure and n-grams model. Notebook for PAN at CLEF 2019, in Working Notes Papers of the CLEF 2019 Evaluation Labs volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)
Google Scholar
I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (The MIT Press, Cambridge, 2016)
MATH Google Scholar
N. Graham, G. Hirst, B. Marthi, Segmenting documents by stylistic character. Nat. Lang. Eng. 11(4), 397–415 (2005)
Google Scholar
A. Granados, M. Cebirán, D. Camacho, F. de Borja Rodríguez, Reducing the loss of information through annealing text distortion. IEEE Trans. Knowl. Data Eng. 23(7), 1090–1102 (2011)
Google Scholar
T. Grant, TXT 4N6: method consistency, and distinctiveness in the analysis of SMS messages. J. Law Policy 21(2) (2012). Article 9
Google Scholar
C. Gregori-Signes, B. Clavel-Arroitia, Analysing lexical density and lexical diversity in the university students’ written discourse, in Proceedings International Conference on Corpus Linguistics (2015), pp. 546–556
Google Scholar
S. Gries, Quantitative Corpus Linguistics with R: A Practical Introduction (Routledge, London, 2019)
Google Scholar
P. Grzybek, E. Kelih, E. Stadlober, The relationship between word length and sentence length: an intra-systemic perspective in the core data structure. Glottometrics 16, 111–121 (2008)
Google Scholar
P. Guiraud, Les caractères statistiques du vocabulaire (Presses Universitaires de France, Paris, 1954)
Google Scholar
P. Guiraud, Essais de stylistique (Klincksieck, Paris, 1969)
Google Scholar
S.C. Guntuku, D.B. Yaden, M.L. Kern, L.H. Ungar, J.C. Eichstaedt, Detecting depression and mental illness on social media: an integrative review. Curr. Opin. Behav. Sci. 18, 43–49 (2017)
Google Scholar
M. Hagen, M. Potthast, B. Stein, Overview of the author obfuscation task at PAN 2017: safety evaluation revisited, in Working Notes Papers of the CLEF 2017 Evaluation Labs Volume 1866 of CEUR Workshop, ed. by L. Cappellato, N. Ferro, L. Goeuriot, T. Mandl (CEUR, Aachen, 2017)
Google Scholar
A. Hall, L. Terveen, A. Halfaker, Bot detection in Wikipedia using behavioral and other informal cues, in Proceedings of the ACM on Human-Computer Intercation (2018), pp. 620–627
Google Scholar
H.V. Halteren, Author verification by linguistic profiling: An exploration of the parameter space. ACM Trans. Speech Lang. Process. 4(1) (2007). Article 1
Google Scholar
O. Halvani, C. Winter, L. Graner, On the usefulness of compression models for authorship verification, in ARES’17 (The ACM Press, New York, 2017), pp. 1–32
Google Scholar
O. Halvani, L. Graner, I. Vogel, Authorship verification in the absence of explicit features and thresholds, in Proceedings European Conference in Information Retrieval (ECIR). Lecture Notes in Computer Science, vol. 10772 (Springer, Heidelberg, 2018), pp. 454–465
Google Scholar
R.A. Hardcastle, CUSUM: a credible method for the determination of authorship? Sci. Just. 37(2), 129–138 (1997)
Google Scholar
D. Harman, How effective is suffixing? J. Am. Soc. Inf. Sci. 42(1), 7–15 (1991)
MathSciNet Google Scholar
D. Harman, Information retrieval: the early years. Found. Trends Inf. Retr. 13(5), 425–577 (2019)
Google Scholar
Z. Harris, Distributional structure. Word 10(23), 146–162 (1954)
Google Scholar
R.P. Hart, Verbal Style and The Presidency. A Computer-Based Analysis (Academic, Orlando, 1984)
Google Scholar
R.P. Hart, Trump and Us: What He Says and Why People Listen (Cambridge University Press, Cambridge, 2020)
Google Scholar
R.P. Hart, J.P. Childers, C.J. Lind, Political Tone. How Leaders Talk and Why (The Chicago University Press, Chicago, 2013)
Google Scholar
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning. Data Mining, Inference, and Prediction (Springer, New York, 2009)
Google Scholar
G. Herdan, Quantitative Linguistics (Butterworth, London, 1964)
MATH Google Scholar
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1996)
Google Scholar
T. Hofmann, Probabilistic latent semantic indexing, in Proceedings of the International Conference on Information Retrieval (SIGIR 1999) (The ACM Press, New York, 1999), pp. 50–57
Google Scholar
D.R. Hoffman, A.D. Howard, Addressing the State of the Union. The Evolution and Impact of the President’s Big Speech (Lynne Rienner, Boulder, 2006)
Google Scholar
D.I. Holmes, A stylometric analysis of Mormon scripture and related text. J. R. Stat. Soc. 155(1), 91–120 (1992)
Google Scholar
D.I. Holmes, The Federalist revisited: new directions in authorship attribution. Lit. Linguis. Comput. 10(1), 111–127 (1995)
Google Scholar
D.I. Holmes, The evolution of stylometry in humanities scholarship. Lit. Linguis. Comput. 13(3), 111–117 (1998)
Google Scholar
J. Holmes, Woman talk too much, in Language Myths, ed. by L. Bauer, P. Trudgill (Penguin Books, London, 1998), pp. 41–49
Google Scholar
D.I. Holmes, J. Kardos, Who was the author? An introduction to stylometry. Chance 16(2), 5–8 (2003)
MathSciNet Google Scholar
D.I. Holmes, F.J. Tweedie, Forensic stylometry: a review of the CUSUM controversy. Revue Informatique et Statistique dans les Sciences Humaines 31(1), 19–47 (1995)
Google Scholar
D.L. Hoover, Another perspective on vocabulary richness. Comput. Humanit. 37(2), 151–178 (2003)
Google Scholar
D.L. Hoover, Delta prime? Lit. Linguis. Comput. 19(4), 477–495 (2004)
MathSciNet Google Scholar
D.L. Hoover, Testing Burrows’ Delta. Lit. Linguis. Comput. 19(4), 453–475 (2004)
MathSciNet Google Scholar
D.L. Hoover, Teasing out authorship and style with t-tests and Zeta, in Proceedings Digital Humanities (2010), pp. 1–3
Google Scholar
D.L. Hoover, The microanalysis of style variation. Digit. Scholarsh. Humanit. 32(Supplement 2), ii17–ii30 (2017)
Google Scholar
D.L. Hoover, S. Hess, An exercise in non-ideal authorship attribution: the mysterious Maria Ward. Lit. Linguis. Comput. 24(4), 467–489 (2009)
Google Scholar
P.N. Howard, S. Woolley, R. Calo, Algorithms, bots, and political communication in the US 2016 election: the challenge of automated political communication for election law and administration. J. Inf. Technol. Polit. 15(2), 81–93 (2018)
Google Scholar
J. Hudson, S. Mekhennet, G-7 failed to agree on statement after U.S. insisted on calling coronavirus outbreak ‘Wuhan virus’. Washington Post, 144, March 25th, 2020
Google Scholar
J.M. Hughes, N.J. Foti, D.C. Krakauer, D.N. Rockmore, Quantitative patterns of stylistic influence in the evolution of literature. Proc. Natl. Acad. Sci. 109(20), 7682–7686 (2012)
Google Scholar
J. Humes, Confessions of a White House Ghostwriter: Five Presidents and Other Political Adventures (Regnery Publishing, New York, 1997)
Google Scholar
C. Ikae, S. Nath, J. Savoy, Unine at PAN-CLEF 2019: Bots and gender task, in Working Notes Papers of the CLEF 2019 Evaluation Labs volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)
Google Scholar
C.R. Jacobsen, M. Nielsen, Stylometry of painting using hidden Markov modelling of contourlet transforms. Signal Process. 93(3), 579–591 (2013)
Google Scholar
G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning with Applications in R (Springer, New York, 2013)
MATH Google Scholar
M.L. Jockers, Macroanalysis. Digital Methods and Literary History (University of Illinois Press, Urbana, 2013)
Google Scholar
M.L. Jockers, Testing authorship in the personal writings of Joseph Smith using NSC classification. Lit. Linguis. Comput. 28(3), 371–381 (2013)
MathSciNet Google Scholar
M.L. Jockers, Text Analysis with R for Students of Literature (Springer, New York, 2014)
Google Scholar
M.L. Jockers, D.M. Witten, A comparative study of machine learning methods for authorship attribution. Lit. Linguis. Comput. 25(2), 215–223 (2010)
Google Scholar
M.L. Jockers, D.M. Witten, C. Criddle, Reassessing authorship of the Book of Mormon using Delta and nearest shrunken centroid classification. Lit. Linguis. Comput. 23(4), 465–491 (2008)
Google Scholar
V. Johansson, Lexical diversity and lexical density in speech and writing. Working Papers, Lund University, vol. 53, pp. 61–79, 2008
Google Scholar
F. Johansson, Supervised classification of Twitter accounts based on textual content of tweets. Notebook for PAN at CLEF 2019, in Working Notes Papers of the CLEF 2019 Evaluation Labs volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)
Google Scholar
M. Joos, The Five Clocks. A Linguistic Excursion into the Five Styles of English Usage (Harvest/HBJ Book, New York, 1961)
Google Scholar
P. Joule, D. Vescovi, Analyzing stylometric approaches for author obfuscation, in Conference on Digital Forensics (Springer, Berlin, 2011), pp. 115–125
Google Scholar
P. Juola, The time course of language change. Comput. Humanit. 37(1), 77–96 (2003)
Google Scholar
P. Juola, Authorship attribution. Found. Trends Inf. Retr. 1(3), 233–334 (2006)
P. Juola, How a computer program helped show J.K. Rowling write a Cuckoo’s Calling. Scientific American, August 20th, 2013
P. Juola, Using the Google n-gram corpus to measure cultural complexity. Lit. Linguis. Comput. 28(4), 668–675 (2013)
P. Juola, The Rowling case: a proposed standard analytic protocol for authorship questions. Digit. Scholarsh. Humanit. 30(1), i100–i113 (2016)
P. Juola, Thesaurus-based semantics similarity judgments: a new approach to authorship similarity? in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 47–59
P. Juola, G.K. Mikros, S. Vinsick, Correlations and potential cross-linguistic indicators of writing style. J. Quant. Linguis. 26(2), 146–171 (2019)
G. Kacmarcik, M. Gamon, Obfuscating document stylometry to preserve author anonymity, in Proceedings of the Conference on Computational Linguistics (COLING-ACL) (The ACL Press, Stroudsburg, 2006), pp. 444–451
O.V. Kukushkina, A.A. Polikarpov, D.V. Khmelev, Using literal and grammatical statistics for authorship attribution. Probl. Inf. Transm. 37(2), 172–184 (2001)
D. Kalb, G. Peters, State of the Union. Presidential Rhetoric from Woodrow Wilson to George W. Bush (CQ Press, Washington, 2007)
E. Paradis, Analysis of Phylogenetics and Evolution with R (Springer, New York, 2012)
A. Karpathy, The unreasonable effectiveness of recurrent neural networks, May 2015
L. Kaufman, P.J. Rousseeuw, Finding Groups in Data. An Introduction to Cluster Analysis (Wiley, Hoboken, 2005)
J. Kelleher, Deep Learning (The MIT Press, Cambridge, 2019)
C. Kesler, C. Rossiter, The Federalist Papers (Signet Classic, New York, 2003)
M. Kestemont, S. Moens, J. Deploige, Collaborative authorship in the twelfth century: a stylometric study of Hildegard of Bingen and Guibert of Gembloux. Lit. Linguis. Comput. 20(2), 199–224 (2015)
V. Kešelj, F. Peng, N. Cercone, C. Thomas, N-gram-based author profiles for authorship attribution, in Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING’03 (The ACL Press, Stroudsburg, 2003), pp. 255–264
R. Ketcham, The Anti-Federalist Papers and Constitutional Convention Debates (Signet Classic, New York, 2003)
B. Kjell, Authorship determination using letter pair frequency features with neural network classifier. Lit. Linguis. Comput. 9(2), 119–124 (1994)
M. Kocher, J. Savoy, A simple and efficient algorithm for authorship verification. J. Assoc. Inf. Sci. Technol. 68(1), 259–269 (2017)
M. Kocher, J. Savoy, Distance measures in author profiling. Inf. Process. Manage. 53(5), 1103–1119 (2017)
M. Kocher, J. Savoy, Distributed language representation for authorship attribution. Digit. Scholarsh. Humanit. 33(2), 425–441 (2018)
M. Kocher, J. Savoy, Evaluation of text representation schemes and distance measures for authorship linking. Digit. Scholarsh. Humanit. 34(1), 189–207 (2019)
M. Kolakowski, T.H. Neale, The president’s State of the Union message: frequently asked questions. Congressional Research Service (RS20021), 2006
M. Koppel, J. Schler, Exploiting stylistic idiosyncrasies for authorship attribution, in IJCAI’03 Workshop on Computational Approaches to Style Analysis and Synthesis (2003), pp. 69–72
M. Koppel, S. Seidman, Detecting pseudepigraphic texts using novel similarity measures. Digit. Scholarsh. Humanit. 33(1), 72–81 (2018)
M. Koppel, Y. Winter, Determining if two documents are by the same author. J. Assoc. Inf. Sci. Technol. 65(1), 178–187 (2014)
M. Koppel, S. Argamon, A.R. Shimoni, Automatically categorizing written texts by author gender. Lit. Linguis. Comput. 17(4), 401–412 (2002)
M. Koppel, N. Akiva, I. Dagan, Feature instability as a criterion for selecting potential style markers. J. Assoc. Inf. Sci. Technol. 57(11), 1519–1525 (2006)
M. Koppel, J. Schler, E. Bonchek-Dokow, Measuring differentiability: unmasking pseudonymous authors. J. Mach. Learn. Res. 8(6), 1261–1276 (2007)
M. Koppel, J. Schler, S. Argamon, Computational methods in authorship attribution. J. Assoc. Inf. Sci. Technol. 60(1), 9–26 (2009)
M. Koppel, J. Schler, S. Argamon, Authorship attribution in the wild. Lang. Resour. Eval. 45(1), 83–94 (2011)
M. Koppel, J. Schler, S. Argamon, Y. Winter, The ‘fundamental problem’ of authorship attribution. Engl. Stud. 93(3), 284–291 (2012)
D. Kosmajac, V. Kešelj, Twitter user profiling: bot and gender identification, in Working Notes Papers of the CLEF 2019 Evaluation Labs volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)
S. Kudugunta, E. Ferrara, Deep neural networks for bot detection. Inf. Sci. 467, 312–322 (2018)
N. Laan, Stylometry and method. The case of Euripides. Lit. Linguis. Comput. 10(4), 271–278 (1995)
D. Labbé, Experiments on authorship attribution by intertextual distance in English. J. Quant. Linguis. 14(1), 33–80 (2007)
D. Labbé, Romain Gary et Emile Ajar. HAL 00279663, 2008
D. Labbé, Si deux et deux font quatre, Molière n’a pas écrit Dom Juan (Max Milo, Paris, 2009)
C. Labbé, D. Labbé, How to measure the meaning of words? Amour in Corneille’s work. Lang. Resour. Eval. 39(4), 335–351 (2005)
D. Labbé, C. Labbé, A tool for literary studies. Lit. Linguis. Comput. 21(3), 311–326 (2006)
C. Labbé, D. Labbé, Duplicate and fake publications in the scientific literature. Scientometrics 94(1), 379–396 (2013)
C. Labbé, N. Grima, T. Gautier, B. Favier, J.A. Byrne, Semi-automated fact-checking of nucleotide sequence reagents in biomedical research publications: the Seek and Blastn tool. PLoS One 14(3), e0213266 (2019)
G. Lakoff, E. Wehling, The Little Blue Book: The Essential Guide to Thinking and Talking Democratic (Free Press, New York, 2012)
M. Lalli, F. Tria, V. Loreto, Data-compression approach to authorship attribution, in Elena Ferrante: A Virtual Author, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 61–83
Q. Le, T. Mikolov, Distributed representations of sentences and documents, in Proceedings International Conference on Machine Learning, vol. 32 (2014), pp. II-1188–II-1196
L. Lebart, A. Salem, L. Berry, Exploring Textual Data (Kluwer, Dordrecht, 1998)
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015)
G. Ledger, T. Merriam, Shakespeare, Fletcher, and The Two Noble Kinsmen. Lit. Linguis. Comput. 9(3), 235–248 (1994)
J.J. Lee, H.Y. Cho, H.R. Park, N-gram-based indexing for Korean text retrieval. Inf. Process. Manage. 35(4), 427–441 (1999)
R.J. Leigh, J. Casson, D. Ewald, A scientific approach to the Shakespeare authorship question. Lit. Rev. 9(1), 1–13 (2019)
O. Levy, Y. Goldberg, Linguistic regularities in sparse and explicit word representations, in Proceedings Computational Language Learning (2014), pp. 171–180
M. Li, X. Chen, X. Li, B. Ma, P.M.B. Vitanyi, The similarity metric. IEEE Trans. Inf. Theory 50(12), 3250–3264 (2004)
G.J. Lidstone, Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities. Trans. Fac. Actuaries 8, 182–192 (1920)
E.T. Lim, Five trends in presidential rhetoric: an analysis of rhetoric from George Washington to Bill Clinton. Pres. Stud. Q. 32(2), 328–348 (2002)
D.E. Losada, F. Crestani, J. Parapar, Overview of eRisk: Early risk prediction on the internet. in Experimental IR Meets Multilinguality, Multimodality, and Interaction, ed. by P. Bellot, C. Trabelsi, J. Mothe, F. Murtagh, J.Y. Nie, L. Soulier, E. SanJuan, L. Cappellato, N. Ferro. Lecture Notes in Computer Science, vol. 11018 (Springer, Cham, 2018), pp. 343–361
D.E. Losada, F. Crestani, J. Parapar, Overview of eRisk 2019: early risk prediction on the internet, in Experimental IR Meets Multilinguality, Multimodality, and Interaction, ed. by F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D.E. Losada, G.H. Bürki, L. Cappellato, N. Ferro. Lecture Notes in Computer Science, vol. 11696 (Springer, Cham, 2019), pp. 340–357
H. Love, Attributing Authorship: An Introduction (Cambridge University Press, Cambridge, 2002)
K. Luyckx, W. Daelemans, The effect of author set size and data size in authorship attribution. Lit. Linguis. Comput. 26(1), 35–44 (2011)
P. Maier, Ratification. The People Debate the Constitution, 1787–1788 (Simon and Schuster Paperbacks, New York, 2010)
C.D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing (The MIT Press, Cambridge, 2000)
C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval (Cambridge University Press, Cambridge, 2008)
D. Mannion, P. Dixon, Sentence-length and authorship attribution: the case of Oliver Goldsmith. Lit. Linguis. Comput. 19(4), 497–508 (2004)
M.P. Marcus, B. Santorini, M.A. Marcinkiewicz, Building a large annotated corpus of English: the Penn Treebank. Comput. Linguis. 19(2), 313–330 (1993)
Y. Marton, N. Wu, L. Hellerstein, On compression-based text classification, in European Conference on Information Retrieval (ECIR) (Springer, Cham, 2005), pp. 300–314
R. Matthews, T. Merriam, Neural computation in stylometry: an application to the works of Shakespeare and Fletcher. Lit. Linguis. Comput. 8(4), 203–209 (1993)
C. McCormick, BERT word embeddings tutorial, May 2019
G. McCulloch, Because Internet. Understanding the New Rules of Language (Riverhead Books, New York, 2019)
P. McNamee, J. Mayfield, Character n-gram tokenization for European language text retrieval. Inf. Retr. J. 7(1–2), 73–98 (2004)
T. Mendenhall, The characteristic curves of composition. Science 214, 237–249 (1887)
T. Merriam, Letter frequency as a discriminator of authors. Notes Queries 41(4), 467–469 (1994)
M.I. Meyerson, Liberty’s Blueprint. How Madison and Hamilton Wrote the Federalist Papers, Defined the Constitution, and Made Democracy Safe for the World (Basic Books, Philadelphia, 2008)
J.-B. Michel, Y.K. Shen, A.P. Aiden, A. Veres, M.K. Gray, The Google Books Team, J.P. Pickett, D. Hoiberg, D. Clancy, P. Norvig, J. Orwant, S. Pinker, M.A. Nowak, E.L. Aiden, Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182 (2011)
J. Michell, Who Wrote Shakespeare (Thames and Hudson, London, 1999)
T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in Proceedings of Workshop at ICLR 2013 (2013)
T. Mikolov, W.T. Yih, G. Zweig, Linguistic regularities in continuous space word representations, in Proceedings of NAACL HLT 2013 (The ACL Press, Stroudsburg, 2013), pp. 746–751
G.K. Mikros, Blended authorship attribution: Unmasking Elena Ferrante. in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 85–96
A. Miranda-Garcia, J. Calle-Martin, Yule’s characteristic K revisited. Lang. Resour. Eval. 39(4), 287–294 (2005)
A. Miranda-Garcia, J. Calle-Martin, Function words in authorship attribution studies. Lit. Linguis. Comput. 22(1), 49–66 (2007)
A. Miranda-Garcia, J. Calle-Martin, The authorship of the disputed Federalist Papers with an annotated corpus. Engl. Stud. 93(3), 371–390 (2012)
T.M. Mitchell, Machine Learning (McGraw-Hill, New York, 1997)
D. Mitchell, Type-token models: a comparative study. J. Quant. Linguis. 22, 1–21 (2015)
R. Mitton, Spelling checkers, spelling corrections and the misspellings of poor spellers. Inf. Process. Manage. 23(5), 495–505 (1987)
F. Mosteller, D.L. Wallace, Inference in an authorship problem. J. Am. Stat. Assoc. 58(302), 275–309 (1963)
F. Mosteller, D.L. Wallace, Inference and Disputed Authorship, The Federalist (Addison-Wesley, Reading, 1964)
M. Motta, The dynamics and political implications of anti-intellectualism in the United States. Am. Polit. Res. 46(3), 465–498 (2018)
C. Muller, Principes et méthodes de statistique lexicale (Honoré Champion, Paris, 1992)
F. Murtagh, P. Legendre, Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J. Classif. 31(3), 274–295 (2014)
M.J. Narag, M.N. Soriano, Identifying the painter using texture features and machine learning algorithms, in Proceedings International Conference on Cryptography, Security, and Privacy (ICCSP’19) (2019), pp. 201–205
T. Neal, K. Sundararajan, A. Fatima, Y. Yan, Y. Xiang, D. Woodard, Surveying stylometry techniques and applications. ACM Comput. Surv. 50(6) (2019). Article 86
L. Neidorf, M.S. Krieger, M. Yakubek, P. Chaudhuri, J.P. Dexter, Large-scale quantitative profiling of the Old English verse tradition. Nat. Hum. Behav. 3, 560–567 (2019)
Y. Neuman, Computational Personality Analysis: Introduction, Practical Applications and Novel Directions (Springer, Cham, 2016)
R.E. Neustadt, The Accidental President (Grossman, New York, 1967)
R.E. Neustadt, The Presidential Power and the Modern Presidents. The Politics of Leadership from Roosevelt to Reagan (Free Press, New York, 1990)
J. Noecker, M. Ryan, P. Juola, Psychological profiling through textual analysis. Lit. Linguis. Comput. 28(3), 382–387 (2013)
J.S. Nye, Presidential Leadership and the Creation of the American Era (Princeton University Press, Princeton, 2013)
M.P. Oakes, M. Farrow, Use of the chi-squared test to examine vocabulary differences in English language corpora representing seven different countries. Lit. Linguis. Comput. 22(1), 85–99 (2007)
K.A. O’Halloran, C. Coffin, Getting Started. Describing the Grammar of Speech and Writing (The Open University, Milton Keynes, 2005)
C. Olah, Understanding LSTM networks, August 2015
W. Oliveira, E. Justino, L.S. Oliveira, Comparing compression models for authorship attribution. Forensic Sci. Int. 228, 100–104 (2013)
J. Olsson, Forensic Linguistics (Continuum, London, 2008)
J. Olsson, Word Crime. Solving Crime Through Forensic Linguistics (Bloomsbury, London, 2009)
J. Olsson, More Wordcrime. Solving Crime Through Forensic Linguistics (Bloomsbury, London, 2018)
B. Pang, L. Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, in Proceedings Association for Computational Linguistics (ACL), pp. 115–124 (The ACL Press, Stroudsburg, 2005)
R.R. Panko, What we know about spreadsheet errors. J. End User Comput. 10(2), 15–21 (1998)
G. Park, D.B. Yaden, H.A. Schwartz, M.L. Kern, J.C. Eichstaedt, M. Kosinski, D. Stillwell, L.H. Ungar, M.E.P. Seligman, Women are warmer but no less assertive than men: gender and language on Facebook. PLoS One 11(5), e0155885 (2016)
A. Pawłowski, Séries temporelles en linguistique: Application à l’attribution de textes, Romain Gary et Emile Ajar (Slatkine, Lausanne, 1996)
L. Pearl, M. Steyvers, Detecting authorship deception: a supervised machine learning approach using author writeprints. Lit. Linguis. Comput. 27(2), 183–196 (2012)
C. Peersman, W. Daelemans, L. Van Vaerenbergh, Predicting age and gender in online social networks, in International Workshop on Search and Mining User-generated Contents (SMUC’11) (Springer, Cham, 2011), pp. 37–44
A. Peñas, A. Rodrigo, A simple measure to assess non-response, in Proceedings 49th Conference of the Association for Computational Linguistics (ACL), pp. 1415–1424 (The ACL Press, Stroudsburg, 2011)
J.W. Pennebaker, The Secret Life of Pronouns. What Our Words Say About Us (Bloomsbury Press, New York, 2011)
J. Pennington, R. Socher, C.D. Manning, GloVe: Global vectors for word representation, in Proceedings of the Empirical Methods in Natural Language Processing (2014), pp. 1532–1543
S. Pinker, The Sense of Style (Penguin Books, London, 2014)
P. Plecháč, K. Bobenhausen, B. Hammerich, Versification and authorship attribution. Pilot study on Czech, German, Spanish, and English poetry. Studia Metrica et Poetica 5(2), 29–54 (2018)
I.-I. Popescu, G. Altmann, P. Grzybek, B.D. Jayaram, R. Köhler, V. Krupa, J. Mačutek, R. Pustet, L. Uhlířová, M.N. Vidya, Word Frequency Studies (De Gruyter Mouton, Berlin, 2009)
I.-I. Popescu, K.H. Best, G. Altmann, Unified Modeling of Length in Language (RAM-Verlag, Lüdenscheid, 2014)
M.F. Porter, An algorithm for suffix stripping. Program 14, 130–137 (1980)
N. Potha, E. Stamatatos, Improving author verification based on topic modeling. J. Assoc. Inf. Sci. Technol. 70(10), 1074–1088 (2019)
M. Potthast, A. Barrón-Cedeño, B. Stein, P. Rosso, Cross-language plagiarism detection. Lang. Resour. Eval. 45(1), 1–18 (2011)
M. Potthast, M. Hagen, B. Stein, Author obfuscation: attacking the state of the art in authorship verification, in Working Notes Papers of the CLEF 2016 Evaluation Labs volume 1609 of CEUR Workshop (CEUR, Aachen, 2016)
M. Potthast, F. Rangel, M. Tschuggnall, E. Stamatatos, P. Rosso, B. Stein, Overview of PAN’17: author identification, author profiling, and author obfuscation, in Experimental IR Meets Multilinguality, Multimodality, and Interaction, ed. by G. Jones, S. Lawless, J. Gonzalo, L. Kelly, L. Goeuriot, T. Mandl, L. Cappellato, N. Ferro. Lecture Notes in Computer Science, vol. 10456 (Springer, Berlin, 2017), pp. 275–290
M. Potthast, F. Schremmer, M. Hagen, B. Stein, Overview of the author obfuscation task at PAN 2018: a new approach to measuring safety, in Working Notes Papers of the CLEF 2018 Evaluation Labs Volume 2125 of CEUR Workshop (CEUR, Aachen, 2018)
M. Potthast, P. Rosso, E. Stamatatos, B. Stein, A decade of shared tasks in digital text forensics at PAN, in Proceedings ECIR2019. Springer Lecture Notes in Computer Science, vol. 11438 (2019), pp. 291–300
R. Queneau, Exercices de style (Gallimard, Paris, 1947)
F. Rangel, P. Rosso, On the impact of emotions on author profiling. Inf. Process. Manage. 52(1), 73–92 (2016)
F. Rangel, P. Rosso, Overview of the 7th author profiling task at PAN 2019: bots and gender profiling in Twitter, in Working Notes Papers of the CLEF 2019 Evaluation Labs Volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)
F. Rangel, P. Rosso, M. Montes y Gómez, M. Potthast, B. Stein, Overview of the 6th author profiling task at PAN 2018: multimodal gender identification in Twitter, in Working Notes Papers of the CLEF 2018 Evaluation Labs Volume 2125 of CEUR Workshop (CEUR, Aachen, 2018)
J.R. Rao, P. Rohatgi, Can pseudonymity really guarantee privacy? in Proceedings of the 9th USENIX Security Symposium (USENIX Association, New Orleans, 2000), pp. 85–96
T.R. Reddy, B.V. Vardhan, P.V. Reddy, A survey on authorship profiling techniques. Int. J. Appl. Eng. Res. 11(5), 3092–3102 (2016)
W.J. Ridings, S.B. McIver, Rating the Presidents: A Ranking of U.S. Leaders, from the Great and Honorable to the Dishonest and Incompetent (Carol Publishing, Secaucus, 1997)
P. Rizvi, An improvement to Zeta. Digit. Scholarsh. Humanit. 34(2), 419–422 (2019)
P. Rizvi, The interpretation of the Zeta test results. Digit. Scholarsh. Humanit. 34(2), 401–418 (2019)
A. Rocha, W.J. Scheirer, C.W. Forstall, T. Cavalcante, A. Theophilo, B. Shen, A.R.B. Carvalho, E. Stamatatos, Authorship attribution for social media forensics. IEEE Trans. Inf. Forensics Secur. 12(1), 5–33 (2017)
X. Rong, Word2vec parameter learning explained (2016). arXiv.org. arXiv:1411.2738
M. Rosen-Zvi, T. Griffiths, M. Steyvers, P. Smyth, The author-topic model for authors and documents, in Proceedings of the Uncertainty in Artificial Intelligence (The AUAI Press, Arlington, 2004), pp. 487–494
M. Rosen-Zvi, C. Chemudugunta, T. Griffiths, M. Steyvers, P. Smyth, Learning author-topic models from text corpora. ACM Trans. Inf. Syst. 28(1) (2010). Article 4
J. Rudman, The state of authorship attribution studies: some problems and solutions. Comput. Humanit. 31(4), 351–365 (1998)
J. Rudman, Unediting, de-editing, and editing in non-traditional authorship attribution studies: with an emphasis on the canon of Daniel Defoe. Pap. Bibliogr. Soc. Am. 99(1), 5–36 (2005)
J. Rudman, The twelve disputed Federalist Papers: a case for collaboration, in Proceedings Digital Humanities 2012 (2012), pp. 353–356
A. Rule, J.P. Cointet, P.S. Bearman, Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014, in Proceedings National Academy of Sciences, vol. 112(35) (2015), pp. 10837–10844
D. Rumelhart, G. Hinton, R. Williams, Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
J. Rybicki, Partners in life, partners in crime? in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 111–122
J. Rybicki, M. Eder, Deeper Delta across genres and languages: do we really need the most frequent words? Lit. Linguis. Comput. 26(3), 315–321 (2011)
J. Rybicki, M. Heydel, The stylistics and stylometry of collaborative translations: Woolf’s Night and Day in Polish. Lit. Linguis. Comput. 28(4), 708–717 (2013)
J. Rybicki, D.L. Hoover, M. Kestemont, Collaborative authorship: Conrad, Ford and rolling Delta. Lit. Linguis. Comput. 29(3), 422–431 (2014)
G. Sampson, Empirical Linguistics (Continuum, London, 2001)
J. Savoy, Lexical analysis of US political speeches. J. Quant. Linguis. 17(2), 123–141 (2010)
J. Savoy, Authorship attribution based on specific vocabulary. ACM Trans. Inf. Syst. 30(2), 170–199 (2012)
J. Savoy, Authorship attribution based on a probabilistic topic model. Inf. Process. Manage. 49(1), 341–354 (2013)
J. Savoy, The Federalist Papers revisited: a collaborative attribution scheme, in Proceedings ASIST 2013, Montreal, November 2013
J. Savoy, Comparative evaluation of term selection functions for authorship attribution. Digit. Scholarsh. Humanit. 30(2), 246–261 (2015)
J. Savoy, Text clustering: an application with the State of the Union addresses. J. Assoc. Inf. Sci. Technol. 66(8), 1645–1654 (2015)
J. Savoy, Vocabulary growth study: An example with the State of the Union addresses. J. Quant. Linguis. 22(4), 289–310 (2015)
J. Savoy, Estimating the probability of an authorship attribution. J. Assoc. Inf. Sci. Technol. 67(6), 1462–1472 (2016)
J. Savoy, Text representation strategies: an example with the State of the Union addresses. J. Assoc. Inf. Sci. Technol. 67(8), 1858–1870 (2016)
J. Savoy, Analysis of the style and the rhetoric of the American presidents over two centuries. Glottometrics 38(1), 55–76 (2017)
J. Savoy, Analysis of the style and the rhetoric of the 2016 US presidential primaries. Digit. Scholarsh. Humanit. 33(1), 143–159 (2018)
J. Savoy, Elena Ferrante unmasked. in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 123–142
J. Savoy, Is Starnone really the author behind Ferrante? Digit. Scholarsh. Humanit. 33(4), 902–918 (2018)
J. Savoy, Trump’s and Clinton’s style and rhetoric during the 2016 presidential election. J. Quant. Linguis. 25(2), 168–189 (2018)
J. Savoy, Authorship of Pauline epistles revisited. J. Assoc. Inf. Sci. Technol. 70(10), 1089–1097 (2019)
N. Schaetti, J. Savoy, Comparison of visualisable evidence-based authorship attribution using reservoir computing and deep learning architecture. Technical Report, University of Neuchatel, 2020
H. Schmid, Improvements in part-of-speech tagging with an application to German, in Proceedings in the ACL SIGDAT-Workshop (The ACL Press, Stroudsburg, 1995), pp. 47–50
S. Schöberlein, Poe or not Poe? A stylometric analysis of Edgar Allan Poe’s disputed writings. Digit. Scholarsh. Humanit. 32(3), 643–659 (2017)
H.A. Schwartz, J.C. Eichstaedt, M.L. Kern, L. Dziurzynski, S.M. Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M.E.P. Seligman, L.H. Ungar, Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS One 8(9), e73791 (2013)
D. Sculley, C.E. Brodley, Compression and machine learning: a new perspective on feature space vectors, in Data Compression Conference (DCC’06) (The IEEE Press, Piscataway, 2006), pp. 332–341
P. Seargeant, The Emoji Revolution. How Technology Is Shaping the Future of Communication (Cambridge University Press, Cambridge, 2019)
F. Sebastiani, Machine learning in automated text categorization. ACM Comput. Surv. 34(1), 1–47 (2002)
C.J. Shogan, The president’s State of the Union address: tradition, function, and policy implications. Congressional Research Service (R40132), 2016
C.J. Shogan, T.H. Neale, The president’s State of the Union address: Tradition, function, and policy implications. Congressional Research Service (7-5700), 2012
K. Shu, H. Liu, Detecting Fake News on Social Networks (Morgan & Claypool, San Francisco, 2019)
K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake news detection on social media: a data mining perspective. ACM SIGKDD Explorations Newsletter 19(1), 22–36 (2017)
H.S. Sichel, On a distribution law for word frequencies. J. Am. Stat. Assoc. 70(351), 542–547 (1975)
E.H. Simpson, Measurement of diversity. Nature 163, 688 (1949)
R.B. Slatcher, C.K. Chung, J.W. Pennebaker, Winning words: individual differences in linguistic style among U.S. presidential and vice presidential candidates. J. Res. Personal. 41, 63–75 (2007)
F. Smadja, Retrieving collocations from text: Xtract. Comput. Linguis. 19(1), 143–178 (1993)
G. Smith, The AI Delusion (Oxford University Press, Oxford, 2018)
G. Smith, J. Cordes, The 9 Pitfalls of Data Science (Oxford University Press, Oxford, 2019)
J.A. Smith, C. Kelly, Stylistic constancy and change across literary corpora: using measures of lexical richness to date works. Comput. Humanit. 36(4), 411–430 (2002)
V. Sotirova, The Bloomsbury Companion to Stylistics (Bloomsbury, London, 2016)
K. Sparck Jones, A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
D. Spiegelhalter, The Art of Statistics. Learning from Data (Pelican, London, 2019)
R. Sproat, Morphology and Computation (The MIT Press, Cambridge, 1992)
E. Stamatatos, Authorship attribution based on feature set subspacing ensembles. J. Artif. Intell. Tools 15(5), 823–838 (2006)
E. Stamatatos, A survey of modern authorship attribution methods. J. Assoc. Inf. Sci. Technol. 60(3), 538–556 (2009)
E. Stamatatos, Authorship attribution using text distortion, in Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (ACL) (The ACL Press, Stroudsburg, 2017), pp. 1138–1149
E. Stamatatos, N. Fakotakis, G. Kokkinakis, Computer-based authorship attribution without lexical measures. Comput. Humanit. 35(2), 193–214 (2001)
E. Stamatatos, W. Daelemans, B. Verhoeven, M. Potthast, B. Stein, P. Juola, M.A. Sanchez-Perez, A. Barrón-Cedeño, Overview of the author identification task at PAN 2014, in Proceedings CLEF-2014, Working Notes, ed. by L. Cappellato, N. Ferro, M. Halvey, W. Kraaij (CEUR, Aachen, 2014), pp. 877–897
E. Stamatatos, M. Tschuggnall, B. Verhoeven, W. Daelemans, G. Specht, B. Stein, M. Potthast, Clustering by authorship within and across documents, in Notebook Papers of CLEF 2016 Labs and Workshop (CEUR, Aachen, 2016)
C. Stamou, Stylochronometry: Stylistic development, sequence of composition, and relative dating. Lit. Linguis. Comput. 23(2), 181–199 (2008)
B. Stein, N. Lipka, P. Prettenhofer, Intrinsic plagiarism analysis. Lang. Resour. Eval. 45(1), 63–82 (2011)
J.M. Stella, E. Ferrara, M. De Domenico, Bots increase exposure to negative and inflammatory content in online social systems. Proc. Natl. Acad. Sci. 115(49), 12435–12440 (2018)
P.J. Stone, The General Inquirer: A Computer Approach to Content Analysis (The MIT Press, Cambridge, 1966)
D.M. Strong, Y.W. Lee, R.Y. Wang, Data quality in context. Commun. ACM 40(5), 103–110 (1997)
L.M. Stuart, S. Tazhibayeva, A.R. Wagoner, J.M. Taylor, On identifying authors with style, in Proceedings of the 2013 IEEE Conference on Systems, Man, and Cybernetics (The IEEE Press, Washington, 2013), pp. 3048–3053
I. Sutskever, J. Martens, G. Hinton, Generating text with recurrent neural networks, in Proceedings of the 28th International Conference on Machine Learning (ICML-11) (Omnipress, Madison, 2011), pp. 1017–1024
I. Sutskever, O. Vinyals, Q.V. Le, Sequence to sequence learning with neural networks, in Advances in Neural Information Processing Systems 27 (NIPS 2014), vol. 28 (The IEEE Press, Washington, 2014), pp. 3104–3112
M. Taddy, Document classification by inversion of distributed language representations, in Proceedings Association for Computational Linguistics (ACL) (The ACL Press, Stroudsburg, 2015), pp. 45–49
K. Tanaka-Ishii, S. Aihara, Computational constancy measures of texts - Yule’s K and Rényi’s entropy. Comput. Linguis. 41(3), 481–502 (2015)
L. Tassinari, John Florio, The Man who was Shakespeare (Giano Books, Montreal, 2009)
Y.R. Tausczik, J.W. Pennebaker, The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 29(1), 24–54 (2010)
G. Taylor, G. Egan, The New Oxford Shakespeare: Authorship Companion (Oxford University Press, Oxford, 2017)
G. Taylor, R. Loughnane, The life and theatrical interests of Edward de Vere, seventeenth Earl of Oxford, in Shakespeare, Beyond Doubt. Evidence, Argument, Controversy, ed. by P. Edmondson, S. Wells (Cambridge University Press, Cambridge, 2013), pp. 39–48
Google Scholar
G. Taylor, R. Loughnane, The canon and chronology of Shakspeare’s works, in The New Oxford Shakespeare: Authorship Companion, ed. by G. Taylor, G. Egan (Oxford University Press, Oxford, 2017), pp. 417–603
Google Scholar
W.J. Teahan, D.J. Harper, Using compression-based languages model for text categorization, in Language Modeling for Information Retrieval (Springer, Cham, 2003), pp. 141–165
MATH Google Scholar
R. Thisted, B. Efron, Did Shakespeare write a newly-discovered poem? Biometrika 4740(3), 445–455 (1987)
MathSciNet MATH Google Scholar
F.N. Thomas, M. Turner, Clear and Simple as the Truth. Writing Classic Prose (Princeton University Press, Princeton, 2011)
Google Scholar
J.R.R. Tolkien, Beowulf. The monsters and the critics, in Proceedings of the British Academy (1936)
Google Scholar
P. Törnberg, Echo chambers and viral misinformation: Modeling fake news as complex contagion. PLoS One 13(9), e0203958 (2018)
Google Scholar
K. Toutanova, D. Klein, C. Manning, Y. Singer, Feature-rich part-of-speech tagging with a cyclic dependency network, in Proceedings of HLT-NAACL 2003, pp. 252–259 (The ACL Press, Stroudsburg, 2003)
Google Scholar
A.W. Trask, Deep Learning (Manning, Shelter Island, 2019)
Google Scholar
M. Trevisani, A. Tuzzi, A portrait of JASA: the history of statistics through analysis of keyword counts in an early scientific journal. Qual. Quant. 49(3), 1287–1304 (2013)
Google Scholar
M. Trevisani, A. Tuzzi, Learning the evolution of disciplines from scientific literature: a functional clustering approach to normalized keyword count trajectories. Knowl.-Based Syst. 146, 129–141 (2018)
Google Scholar
J. Tuldava, The development of statistical stylistics (a survey). J. Quant. Linguis. 11(1–2), 141–151 (2004)
Google Scholar
J. Tulis, The Rhetorical Presidency (Princeton University Press, Princeton, 1987)
Google Scholar
A. Tuzzi, What to put in the bag? Comparing and contrasting procedures for text clustering. Ital. J. Appl. Stat. 22(1), 77–94 (2010)
Google Scholar
A. Tuzzi (ed.), Tracing the Life Cycle of Ideas in the Humanities and Social Sciences (Springer, Cham, 2018)
Google Scholar
A. Tuzzi, M. Cortelazzo, Drawing Elena Ferrante’s Profile (Padova University Press, Padova, 2018)
Google Scholar
A. Tuzzi, M. Cortelazzo, What is Elena Ferrante? A comparative analysis of a secretive bestselling Italian writer. Digit. Scholarsh. Humanit. 33(3), 685–702 (2018)
Google Scholar
A. Tuzzi, M.A. Cortelazzo, It takes many hands to draw Elena Ferrante’s profile, in Drawing Elena Ferrante’s Profile, ed. by A. Tuzzi, M.A. Cortelazzo (Padova University Press, Padova, 2018), pp. 9–30
Google Scholar
F.J. Tweedie, R.H. Baayen, How variable may a constant be? Measures of lexical richness in perspective. Comput. Humanit. 32(5), 323–352 (1998)
Google Scholar
F.J. Tweedie, S. Singh, D.I. Holmes, Neural network applications in stylometry: the Federalist Papers. Comput. Humanit. 30(1), 1–10 (1996)
Google Scholar
J. Urbano, H. Lima, A. Hanjalic, Statistical significance testing in information retrieval: an empirical analysis of type I, type II and type III errors, in Proceedings ACM-SIGIR (The ACM Press, New York, 2019), pp. 505–514
Google Scholar
R. van der Goot, N. Ljubešić, I. Matroos, M. Nissim, B. Plank, Bleaching text: abstract features for cross-lingual gender prediction, in Proceedings of the Annual meeting of the Association for Computational Linguistics (ACL) (The ACL Press, Stroudsburg, 2018), pp. 383–389
Google Scholar
O. Varol, E. Ferrara, C.A. Davis, F. Menczer, A. Flammini, Online human-bot interactions: detection, estimation, and characterization, in Proceedings of the 11th AAAI Conference on Web and Social Media (ICWSM 2017), pp. 280–289 (2017)
Google Scholar
T. Veale, M. Cook, Twitterbots. Making Machines that Make Meaning (The MIT Press, Cambridge, 2018)
Google Scholar
B. Vickers, Shakespeare, Co-author. A Historical Study of Five Collaborative Plays (Oxford University Press, Oxford, 2002)
Google Scholar
H. Voorhees, D. Harman, The TREC Experiment and Evaluation in Information Retrieval (The MIT University Press, Cambridge, 2005)
Google Scholar
P. Vossen, EuroWordNet: a Multilingual Database with Lexical Semantic Networks (Kluwer, Dordrecht, 1998)
MATH Google Scholar
A. Vrij, Detecting Lies and Deceit. Pitfalls and Opportunities (Wiley, Chichester, 2008)
Google Scholar
T. Wilson, P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, E. Riloff, S. Patwardhan, Opinionfinder: a system for subjectivity analysis, in Proceedings Empirical Methods for Natural Language Processing (HLT/EMNLP) (2005), pp. 34–35
Google Scholar
I.H. Witten, E. Frank, M.A. Hall, Data Mining. Practical Machine Learning Tools and Techniques (Morgan Kaufmann, Burlington, 2013)
Google Scholar
R. Wittgenstein, Philosophical Investigations (Basil Blackwell, London, 1953)
MATH Google Scholar
D.H. Wolpert, The lack of a priori distinctions between learning algorithms. Neural Comput. 8, 1341–1390 (1996)
Google Scholar
D.H. Wolpert, The supervised learning no-free-lunch theorems. in Proceedings of the 6th Online World Conference on Soft Computing in Industrial Applications (2001), pp. 25–42
Google Scholar
Y. Yang, X. Liu, A re-examination of text categorization methods, in Proceedings ACM-SIGIR Conference (The ACM Press, New York, 1999), pp. 42–49
Google Scholar
Y. Yang, J.O. Pederson, A comparative study of feature selection in text categorization, in Proceedings International Conference on Machine Learning (The ACM Press, New York, 1997), pp. 412–420
Google Scholar
B. Ycart, Alberti’s letter counts. Lit. Linguis. Comput. 29(2), 255–265 (2014)
Google Scholar
L. Young, S. Soroka, Affective news: the automated coding of sentiment in political texts. Am. Polit. Res. 29(2), 205–231 (2012)
Google Scholar
G. Yule, The Study of Language, 7th edn. (Cambridge University Press, Cambridge, 2020)
Google Scholar
E. Zangerle, M. Tschuggnall, G. Specht, B. Stein, M. Potthast, Overview of the style change detection task at PAN 2019, in Working Notes Papers of the CLEF 2019 Evaluation Labs Volume 2380 of CEUR Workshop (CEUR, Aachen, 2019)
Google Scholar
R. Zbib, L. Zhao, D. Karakos, W. Hartmann, J. DeYoung, Z. Huang, Z. Jiang, N. Rivkin, L. Zhang, R. Schwartz, J. Makhoul, Neural-network lexical translation for cross-lingual IR from text and speech, in Proceedings ACM-SIGIR (The ACM Press, New York, 2019), pp. 645–654
Google Scholar
Y. Zhao, J. Zobel, Entropy-based authorship search in large document collection, in Proceedings ECIR2007. Springer Lecture Notes in Computer Science, vol. 4425 (2007), pp. 381–392
Google Scholar
G.K. Zipf, The Psychology of Language (Houghton-Mifflin, Boston, 1935)
Google Scholar

Author information

Authors and Affiliations

Department of Computer Science, University of Neuchâtel, Neuchâtel, Switzerland
Jacques Savoy

About this chapter

Cite this chapter

Savoy, J. (2020). Advanced Models for Stylometric Applications. In: Machine Learning Methods for Stylometry. Springer, Cham. https://doi.org/10.1007/978-3-030-53360-1_7

DOI: https://doi.org/10.1007/978-3-030-53360-1_7
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53359-5
Online ISBN: 978-3-030-53360-1
eBook Packages: Computer Science, Computer Science (R0)
