Abstract
Dialectometry produces aggregate DISTANCE MATRICES in which a distance is specified for each pair of sites. Clustering the sites and projecting the resulting groups onto geography allows one to compare results with traditional dialectology, which produced maps partitioned into implicitly non-overlapping DIALECT AREAS. Proponents of CONTINUA have challenged the importance of dialect areas, but they, too, need to compare their findings with the older literature, which is expressed in terms of areas.
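The sketch below makes the notion of an aggregate distance matrix concrete. It assumes that sites are compared by plain Levenshtein (edit) distance on word transcriptions, averaged over a word list. The abstract does not say how its distances were computed. Published dialectometric work often adds refinements, such as length normalisation, that are omitted here. The function names and the toy transcriptions are invented for illustration.

```python
def levenshtein(a, b):
    """Plain edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def aggregate_distances(sites):
    """Symmetric site-by-site matrix of edit distances averaged over the word list.

    sites[s][w] is a (hypothetical) transcription of word w at site s; every site is
    assumed to have a transcription for every word.
    """
    n, items = len(sites), len(sites[0])
    dist = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for t in range(s + 1, n):
            total = sum(levenshtein(sites[s][w], sites[t][w]) for w in range(items))
            dist[s][t] = dist[t][s] = total / items
    return dist


if __name__ == "__main__":
    # Invented transcriptions of two words at three sites, for illustration only.
    sites = [["mleko", "voda"], ["mljako", "voda"], ["mljako", "vada"]]
    print(aggregate_distances(sites))
    # prints [[0.0, 1.0, 1.5], [1.0, 0.0, 0.5], [1.5, 0.5, 0.0]]
```

Every cell of the result is a single number per pair of sites. That matrix is the kind of input the clustering procedures discussed here operate on.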
Simple clustering is unstable: small differences in the input matrix can lead to large differences in the results (Jain et al. 1999). This is illustrated with a 500-site data set from Bulgaria, where input matrices that correlate very highly (r = 0.97) still yield very different clusterings. Kleiweg et al. (2004) introduced COMPOSITE CLUSTERING, in which random noise is added to the distance matrix before each of many repeated clusterings. The resulting borders are then projected onto the map.
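Composite clustering, as described above, can be sketched as three steps: add noise, re-cluster, and tally how often each border appears. The sketch rests on several assumptions, none of which come from the paper:

- average linkage (UPGMA) is used as the clustering method;
- the noise is uniform in [0, c·sd], where sd is the standard deviation of the distances;
- a hypothetical list of neighbouring site pairs stands in for adjacency on the map.

The names `average_linkage`, `add_noise` and `composite_borders`, and the defaults for `runs` and `c`, are likewise illustrative.

```python
import random
import statistics


# Assumption: average linkage (UPGMA) stands in for the paper's clustering method.
def average_linkage(dist, k):
    """Merge the two closest groups (mean pairwise distance) until k remain; return a label per site."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so position a is unaffected
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Assumption: uniform noise in [0, c * sd], sd = standard deviation of the off-diagonal distances.
def add_noise(dist, c, rng):
    """Return a noisy, still symmetric copy of dist with a zero diagonal."""
    n = len(dist)
    sd = statistics.pstdev([dist[i][j] for i in range(n) for j in range(i + 1, n)])
    noisy = [row[:] for row in dist]
    for i in range(n):
        for j in range(i + 1, n):
            noisy[i][j] = noisy[j][i] = dist[i][j] + rng.uniform(0, c * sd)
    return noisy


def composite_borders(dist, neighbours, k, runs=100, c=0.2, seed=0):
    """For each pair of neighbouring sites, the fraction of noisy clusterings that separate them.

    `neighbours` is a hypothetical list of (i, j) site pairs that share a border on the map.
    """
    rng = random.Random(seed)
    split = {pair: 0 for pair in neighbours}
    for _ in range(runs):
        labels = average_linkage(add_noise(dist, c, rng), k)
        for i, j in neighbours:
            if labels[i] != labels[j]:
                split[(i, j)] += 1
    return {pair: count / runs for pair, count in split.items()}


if __name__ == "__main__":
    # Toy matrix: sites 0-2 and 3-5 form two tight groups far apart.
    toy = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 10.0) for j in range(6)]
           for i in range(6)]
    print(composite_borders(toy, [(0, 1), (2, 3), (4, 5)], k=2))
    # prints {(0, 1): 0.0, (2, 3): 1.0, (4, 5): 0.0}
```

A value near 1 marks a border that survives nearly every perturbation. Intermediate values flag borders that depend on small differences in the input, which is the instability described above. On a map, these fractions could, for example, set the darkness of the line drawn between two neighbouring sites.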
The present contribution compares Kleiweg et al.’s procedure with bootstrap clustering, in which the data are resampled. It also shows how the procedure used to project borders from composite clustering can be used to project borders obtained by bootstrapping.
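Bootstrap clustering can be sketched in the same style. Resample the items (for example, words) with replacement, re-aggregate the distances, re-cluster, and count how often each pair of neighbouring sites ends up in different groups. The border tally is the same one used for composite clustering; only the source of the replicate matrices changes. This is why one projection procedure can serve both methods.

The sketch rests on these assumptions, not taken from the paper:

- average linkage (UPGMA) is the clustering method;
- the input is a list of per-item distance matrices;
- a hypothetical list of neighbouring site pairs stands in for adjacency on the map.

The function names are illustrative.

```python
import random


# Assumption: average linkage (UPGMA) stands in for the paper's clustering method.
def average_linkage(dist, k):
    """Merge the two closest groups (mean pairwise distance) until k remain; return a label per site."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so position a is unaffected
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def bootstrap_borders(item_dists, neighbours, k, runs=100, seed=0):
    """For each pair of neighbouring sites, the fraction of bootstrap clusterings that separate them.

    item_dists[w][i][j] is the distance between sites i and j on item (e.g. word) w;
    `neighbours` is a hypothetical list of (i, j) site pairs that share a border on the map.
    """
    rng = random.Random(seed)
    n_items, n = len(item_dists), len(item_dists[0])
    split = {pair: 0 for pair in neighbours}
    for _ in range(runs):
        # Draw as many items as there are, with replacement, and re-aggregate by averaging.
        sample = [rng.randrange(n_items) for _ in range(n_items)]
        agg = [[sum(item_dists[w][i][j] for w in sample) / n_items for j in range(n)]
               for i in range(n)]
        labels = average_linkage(agg, k)
        for i, j in neighbours:
            if labels[i] != labels[j]:
                split[(i, j)] += 1
    return {pair: count / runs for pair, count in split.items()}


if __name__ == "__main__":
    # Toy data: every item separates sites {0, 1} from {2, 3}.
    item = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
    print(bootstrap_borders([item, item, item], [(0, 1), (1, 2), (2, 3)], k=2))
    # prints {(0, 1): 0.0, (1, 2): 1.0, (2, 3): 0.0}
```

With real data the items disagree, so borders receive intermediate values. A border that appears in most replicates is well supported by the items, in the same spirit as bootstrap support values in phylogenetics (cf. Felsenstein 2004).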
References
EMBLETON, S. (1987): Multidimensional Scaling as a Dialectometrical Technique. In: R. M. Babitch (Ed.) Papers from the Eleventh Annual Meeting of the Atlantic Provinces Linguistic Association, Centre Universitaire de Shippagan, New Brunswick, 33-49.
FELSENSTEIN, J. (2004): Inferring Phylogenies. Sinauer, Sunderland, MA.
FISCHER, M. (1980): Regional Taxonomy: A Comparison of Some Hierarchic and Non-Hierarchic Strategies. Regional Science and Urban Economics 10, 503-537.
GOEBL, H. (1984): Dialektometrische Studien: Anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF. 3 Vols. Max Niemeyer, Tübingen.
HAAG, K. (1898): Die Mundarten des oberen Neckar- und Donaulandes. Buchdruckerei Egon Hutzler, Reutlingen.
JAIN, A. K., MURTY, M. N., and FLYNN, P. J. (1999): Data Clustering: A Review. ACM Computing Surveys 31(3), 264-323.
KLEIWEG, P., NERBONNE, J. and BOSVELD, L. (2004): Geographic Projection of Cluster Composites. In: A. Blackwell, K. Marriott and A. Shimojima (Eds.) Diagrammatic Representation and Inference. 3rd Intn’l Conf, Diagrams 2004. Cambridge, UK, Mar. 2004. (Lecture Notes in Artificial Intelligence 2980). Springer, Berlin, 392-394.
KÖNIG, W. (1991; 1st ed. 1978): DTV-Atlas zur deutschen Sprache. DTV, München.
KRUSKAL, J. (1999): An Overview of Sequence Comparison. In: D. Sankoff and J. Kruskal (Eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed. CSLI, Stanford, 1-44.
MANNI, F., HEERINGA, W. and NERBONNE, J. (2006): To what Extent are Surnames Words? Comparing Geographic Patterns of Surnames and Dialect Variation in the Netherlands. Literary and Linguistic Computing 21(4), 507-528.
MUCHA, H.J. and HAIMERL, E. (2005): Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry. In: C. Weihs and W. Gaul (Eds.) Classification – the Ubiquitous Challenge. Proc. of 28th Mtg Gesellschaft für Klassifikation, Dortmund, Mar. 9-11, 2004. Springer, Berlin, 513-520.
NERBONNE, J., HEERINGA, W. and KLEIWEG, P. (1999): Edit Distance and Dialect Proximity. In: D. Sankoff and J. Kruskal (Eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed. CSLI, Stanford, v-xv.
NERBONNE, J. and SIEDLE, Ch. (2005): Dialektklassifikation auf der Grundlage aggregierter Ausspracheunterschiede. Zeitschrift für Dialektologie und Linguistik 72(2), 129-147.
PAGE, R.D.M. and HOLMES, E.C. (2006; 1st ed. 1998): Molecular Evolution: A Phylogenetic Approach. Blackwell, Oxford.
SCHILTZ, G. (1996): German Dialectometry. In: H.-H. Bock and W. Polasek (Eds.) Data Analysis and Information Systems: Statistical and Conceptual Approaches. Proc. of 19th Mtg of Gesellschaft für Klassifikation, Basel, Mar. 8-10, 1995. Springer, Berlin, 526-539.
SPRUIT, M. (2006): Measuring Syntactic Variation in Dutch Dialects. In: J. Nerbonne and W. Kretzschmar, Jr. (Eds.) Progress in Dialectometry: Toward Explanation. Special issue of Literary and Linguistic Computing 21(4), 493-506.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nerbonne, J., Kleiweg, P., Heeringa, W., Manni, F. (2008). Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_76
DOI: https://doi.org/10.1007/978-3-540-78246-9_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics (R0)