Abstract
Statistical information on a substantial corpus of representative Spanish texts is needed to determine, by comparison, the significance of data about individual authors or texts. This study describes the organization and analysis of a 150,000-word corpus drawn from the work of 30 well-known twentieth-century Spanish authors. Tables show the computational results of analyses involving sentences, segments, quotations, and word length.
The article explains the considerations that guided content, selection, and sample size, and describes special editing needed for the input of Spanish text. Separate sections highlight and comment upon some of the findings.
The corpus and the tables provide objective data for studies of homogeneity and heterogeneity. The format of the tables permits others to add to the original 30 authors, organize the results by categories, or use the cumulative results for normative comparisons.
Author information
Estelle Irizarry is Professor of Spanish at Georgetown University and author of 20 books and annotated editions dealing with Hispanic literature, art, and hoaxes. Her latest book, an edition of Infortunios de Alonso Ramirez, treats the disputed authorship of Spanish America's first novel. She is Courseware Editor of CHum.
About this article
Cite this article
Irizarry, E. Stylistic analysis of a corpus of twentieth-century Spanish narrative. Comput Hum 24, 265–274 (1990). https://doi.org/10.1007/BF00123413
DOI: https://doi.org/10.1007/BF00123413