Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering

Nerbonne, John; Kleiweg, Peter; Heeringa, Wilbert; Manni, Franz

doi:10.1007/978-3-540-78246-9_76


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

Dialectometry produces aggregate DISTANCE MATRICES in which a distance is specified for each pair of sites. Projecting the groups obtained by clustering onto geography allows the results to be compared with traditional dialectology, which produced maps partitioned into implicitly non-overlapping DIALECT AREAS. The importance of dialect areas has been challenged by proponents of CONTINUA, but they too need to compare their findings to the older literature, which is expressed in terms of areas.

Simple clustering is unstable: small differences in the input matrix can lead to large differences in the results (Jain et al. 1999). This is illustrated with a 500-site data set from Bulgaria, where input matrices that correlate very highly (r = 0.97) still yield very different clusterings. Kleiweg et al. (2004) introduce COMPOSITE CLUSTERING, in which random noise is added to the matrices during repeated clustering. The resulting borders are then projected onto the map.
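
The idea of clustering repeatedly under added noise can be sketched in a few lines of Python. This is a minimal illustration, not Kleiweg et al.'s implementation: the group-average linkage, the uniform noise with a ceiling of 0.2 standard deviations, the cut at k = 2 clusters, the toy five-site matrix, and the names `average_linkage` and `noisy_border_weights` are all assumptions made for the example. For each pair of sites it reports the fraction of noisy runs in which the pair ends up in different clusters, which is one simple way to obtain graded border strengths.

```python
import random
import statistics


def average_linkage(dist, k):
    """Group-average agglomerative clustering of a symmetric distance
    matrix into k clusters; returns one cluster label per site."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # mean distance over all cross-cluster site pairs
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def noisy_border_weights(dist, k=2, runs=100, noise=0.2, seed=0):
    """Fraction of noisy clustering runs in which each site pair is split.
    The noise ceiling (noise * standard deviation of the distances) is an
    assumed setting for this sketch, not a value taken from the paper."""
    rng = random.Random(seed)
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    scale = noise * statistics.pstdev([dist[i][j] for i, j in pairs])
    split = {p: 0 for p in pairs}
    for _ in range(runs):
        noisy = [row[:] for row in dist]
        for i, j in pairs:
            noisy[i][j] = noisy[j][i] = dist[i][j] + rng.uniform(0, scale)
        labels = average_linkage(noisy, k)
        for i, j in pairs:
            if labels[i] != labels[j]:
                split[(i, j)] += 1
    return {p: c / runs for p, c in split.items()}


# Hypothetical toy data: two tight groups (sites 0-1 and 2-3) and a
# fifth site that is almost equally far from both groups.
dist = [
    [0, 1, 10, 10, 5.0],
    [1, 0, 10, 10, 5.0],
    [10, 10, 0, 1, 5.1],
    [10, 10, 1, 0, 5.1],
    [5.0, 5.0, 5.1, 5.1, 0],
]
weights = noisy_border_weights(dist)
for pair in [(0, 1), (0, 2), (0, 4), (2, 4)]:
    print(pair, weights[pair])
```

In this toy matrix sites 0 and 1 are never separated and sites 0 and 2 always are. Site 4 lies almost equally far from both groups, so its assignment changes from run to run and its borders receive intermediate weights.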

The present contribution compares Kleiweg et al.’s procedure to resampled bootstrapping, and also shows how the same procedure used to project borders from composite clustering may be used to project borders from bootstrapping.
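
A bootstrap counterpart can be sketched in the same spirit. The abstract does not say what is resampled, so this example assumes that the aggregate distance is a mean over per-item distances (for instance, one matrix per word) and that the items are drawn with replacement. The categorical 0/1 item distance, the group-average linkage, the cut at k = 2, the toy data, and the names `average_linkage` and `bootstrap_border_weights` are likewise hypothetical. The pairwise split fraction is used here as a stand-in for border strength, so the same projection step could in principle be fed by either procedure.

```python
import random


def average_linkage(dist, k):
    """Group-average agglomerative clustering of a symmetric distance
    matrix into k clusters; returns one cluster label per site."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # mean distance over all cross-cluster site pairs
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def bootstrap_border_weights(item_dists, k=2, runs=100, seed=0):
    """Fraction of bootstrap replicates in which each site pair is split.
    Each replicate draws as many items as there are, with replacement,
    and averages their distance matrices before clustering."""
    rng = random.Random(seed)
    m = len(item_dists)
    n = len(item_dists[0])
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    split = {p: 0 for p in pairs}
    for _ in range(runs):
        sample = [rng.randrange(m) for _ in range(m)]
        agg = [[0.0] * n for _ in range(n)]
        for i, j in pairs:
            agg[i][j] = agg[j][i] = sum(item_dists[s][i][j] for s in sample) / m
        labels = average_linkage(agg, k)
        for i, j in pairs:
            if labels[i] != labels[j]:
                split[(i, j)] += 1
    return {p: c / runs for p, c in split.items()}


# Hypothetical toy data: each string gives the variant used at sites 0-3
# for one item. Site 2 sides with sites 0-1 on half of the items and with
# site 3 on the other half.
variants = ["aaab", "aaab", "aaab", "aabb", "aabb", "aabb"]
item_dists = [
    [[0 if v[i] == v[j] else 1 for j in range(4)] for i in range(4)]
    for v in variants
]
weights = bootstrap_border_weights(item_dists)
for pair in [(0, 1), (0, 3), (0, 2), (2, 3)]:
    print(pair, weights[pair])
```

Here sites 0 and 1 stay together in every replicate and site 3 is always separated from them, whereas site 2 changes sides depending on which items happen to be drawn, so its borders receive intermediate weights.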


References

EMBLETON, S. (1987): Multidimensional Scaling as a Dialectometrical Technique. In: R. M. Babitch (Ed.) Papers from the Eleventh Annual Meeting of the Atlantic Provinces Linguistic Association, Centre Universitaire de Shippagan, New Brunswick, 33-49.
FELSENSTEIN, J. (2004): Inferring Phylogenies. Sinauer, Sunderland, MA.
FISCHER, M. (1980): Regional Taxonomy: A Comparison of Some Hierarchic and Non-Hierarchic Strategies. Regional Science and Urban Economics 10, 503-537.
GOEBL, H. (1984): Dialektometrische Studien: Anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF. 3 Vol. Max Niemeyer, Tübingen.
HAAG, K. (1898): Die Mundarten des oberen Neckar- und Donaulandes. Buchdruckerei Egon Hutzler, Reutlingen.
JAIN, A. K., MURTY, M. N., and FLYNN, P. J. (1999): Data Clustering: A Review. ACM Computing Surveys 31(3), 264-323.
KLEIWEG, P., NERBONNE, J. and BOSVELD, L. (2004): Geographic Projection of Cluster Composites. In: A. Blackwell, K. Marriott and A. Shimojima (Eds.) Diagrammatic Representation and Inference. 3rd Intn’l Conf, Diagrams 2004. Cambridge, UK, Mar. 2004. (Lecture Notes in Artificial Intelligence 2980). Springer, Berlin, 392-394.
KÖNIG, W. (1991, ¹1978): DTV-Atlas zur deutschen Sprache. DTV, München.
KRUSKAL, J. (1999): An Overview of Sequence Comparison. In: D. Sankoff and J. Kruskal (Eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed. CSLI, Stanford, 1-44.
MANNI, F., HEERINGA, W. and NERBONNE, J. (2006): To what Extent are Surnames Words? Comparing Geographic Patterns of Surnames and Dialect Variation in the Netherlands. Literary and Linguistic Computing 21(4), 507-528.
MUCHA, H.J. and HAIMERL, E. (2005): Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry. In: C. Weihs and W. Gaul (Eds.) Classification - the Ubiquitous Challenge. Proc. of 28th Mtg Gesellschaft für Klassifikation, Dortmund, Mar. 9-11, 2004. Springer, Berlin, 513-520.
NERBONNE, J., HEERINGA, W. and KLEIWEG, P. (1999): Edit Distance and Dialect Proximity. In: D. Sankoff and J. Kruskal (Eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed. CSLI, Stanford, v-xv.
NERBONNE, J. and SIEDLE, Ch. (2005): Dialektklassifikation auf der Grundlage aggregierter Ausspracheunterschiede. Zeitschrift für Dialektologie und Linguistik 72(2), 129-147.
PAGE, R.D.M., and HOLMES, E.C. (2006): Molecular Evolution: A Phylogenetic Approach. (¹1998) Blackwell, Oxford.
SCHILTZ, G. (1996): German Dialectometry. In: H.-H. Bock and W. Polasek (Eds.) Data Analysis and Information Systems: Statistical and Conceptual Approaches. Proc. of 19th Mtg of Gesellschaft für Klassifikation, Basel, Mar. 8-10, 1995. Springer, Berlin, 526-539.
SPRUIT, M. (2006): Measuring Syntactic Variation in Dutch Dialects. In J. Nerbonne and W. Kretzschmar, Jr. (Eds.) Progress in Dialectometry: Toward Explanation. Special issue of Literary and Linguistic Computing 21(4), 493-506.


Author information

Authors and Affiliations

Alfa-informatica, University of Groningen, Netherlands
John Nerbonne, Peter Kleiweg & Wilbert Heeringa
Musée de l’Homme, Paris, France
Franz Manni


Editor information

Editors and Affiliations

Institute of Computer Science and Institute of Business Economics and Information Systems, University of Hildesheim, Marienburgerplatz 22, 31141, Hildesheim, Germany
Christine Preisach
Lehrstuhl für Mustererkennung und Bildverarbeitung, Universität Freiburg, Gebäude 052, 79110, Freiburg i. Br, Germany
Hans Burkhardt
Institute of Computer Science and Institute of Business Economics and Information Systems, Marienburgerplatz 22, 31141, Hildesheim, Germany
Lars Schmidt-Thieme
Fakultät für Wirtschaftswissenschaften, Lehrstuhl für Betriebswirtschaftslehre, insbes. Marketing, Universitätsstraße 25, 33615, Bielefeld, Germany
Reinhold Decker

About this paper

Cite this paper

Nerbonne, J., Kleiweg, P., Heeringa, W., Manni, F. (2008). Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_76

DOI: https://doi.org/10.1007/978-3-540-78246-9_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics (R0)
