Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering

Conference paper in: Data Analysis, Machine Learning and Applications

Abstract

Dialectometry produces aggregate DISTANCE MATRICES in which a distance is specified for each pair of sites. By projecting groups obtained by clustering onto geography, one can compare the results with traditional dialectology, which produced maps partitioned into implicitly non-overlapping DIALECT AREAS. The importance of dialect areas has been challenged by proponents of CONTINUA, but they too need to compare their findings to the older literature, which is expressed in terms of areas.

Simple clustering is unstable, meaning that small differences in the input matrix can lead to large differences in results (Jain et al. 1999). This is illustrated with a 500-site data set from Bulgaria, where input matrices which correlate very highly (r = 0.97) still yield very different clusterings. Kleiweg et al. (2004) introduce COMPOSITE CLUSTERING, in which random noise is added to matrices during repeated clustering. The resulting borders are then projected onto the map.

The present contribution compares Kleiweg et al.’s procedure to resampled bootstrapping, and also shows how the same procedure used to project borders from composite clustering may be used to project borders from bootstrapping.
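The two procedures can be contrasted in a small sketch. It is not the authors' implementation: the average-linkage clustering, the uniform noise with a fixed ceiling, the toy data, and all function names are assumptions made for illustration. Both procedures feed many perturbed matrices to the same clustering step, and the share of runs that separates two sites serves as the strength of the border between them. Restricting the pairs to geographically adjacent sites and drawing the borders on a map are omitted here.

```python
import random
from itertools import combinations


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, stopped at k groups."""
    clusters = [frozenset([i]) for i in range(len(dist))]

    def mean_distance(pair):
        a, b = pair
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=mean_distance)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


def separation_frequency(matrices, k):
    """Share of matrices whose k-way clustering separates each pair of sites."""
    matrices = list(matrices)
    n = len(matrices[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for dist in matrices:
        groups = average_linkage(dist, k)
        label = {s: g for g, members in enumerate(groups) for s in members}
        for i, j in counts:
            counts[i, j] += label[i] != label[j]
    return {pair: c / len(matrices) for pair, c in counts.items()}


def noisy_matrices(dist, runs, ceiling, rng):
    """Noisy clustering input: add uniform noise in [0, ceiling] to each pair."""
    n = len(dist)
    for _ in range(runs):
        noisy = [row[:] for row in dist]
        for i, j in combinations(range(n), 2):
            noisy[i][j] = noisy[j][i] = dist[i][j] + rng.uniform(0.0, ceiling)
        yield noisy


def bootstrap_matrices(per_item, runs, rng):
    """Bootstrap input: resample items with replacement, then re-aggregate."""
    n = len(per_item[0])
    for _ in range(runs):
        sample = [rng.choice(per_item) for _ in per_item]
        yield [[sum(m[i][j] for m in sample) / len(sample) for j in range(n)]
               for i in range(n)]


# Hypothetical toy data: 5 sites in two groups, 6 items (e.g. words).
GROUP = [0, 0, 0, 1, 1]


def item_matrix(t):
    within = 0.1 * (1 + t % 3)  # within-group distance varies by item
    return [[0.0 if i == j else (within if GROUP[i] == GROUP[j] else 1.0)
             for j in range(5)] for i in range(5)]


per_item = [item_matrix(t) for t in range(6)]
aggregate = [[sum(m[i][j] for m in per_item) / len(per_item) for j in range(5)]
             for i in range(5)]

rng = random.Random(0)
noisy = separation_frequency(noisy_matrices(aggregate, 50, 0.2, rng), k=2)
boot = separation_frequency(bootstrap_matrices(per_item, 50, rng), k=2)
print(noisy[0, 3], boot[0, 1])  # prints 1.0 0.0
```

In this toy case the two groups are so well separated that both procedures agree on every run. With realistic data the frequencies fall between 0 and 1, and those intermediate values are what distinguish firm borders from unstable ones.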

References

  • EMBLETON, S. (1987): Multidimensional Scaling as a Dialectometrical Technique. In: R. M. Babitch (Ed.) Papers from the Eleventh Annual Meeting of the Atlantic Provinces Linguistic Association, Centre Universitaire de Shippagan, New Brunswick, 33-49.

  • FELSENSTEIN, J. (2004): Inferring Phylogenies. Sinauer, Sunderland, MA.

  • FISCHER, M. (1980): Regional Taxonomy: A Comparison of Some Hierarchic and Non-Hierarchic Strategies. Regional Science and Urban Economics 10, 503-537.

  • GOEBL, H. (1984): Dialektometrische Studien: Anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF. 3 Vol. Max Niemeyer, Tübingen.

  • HAAG, K. (1898): Die Mundarten des oberen Neckar- und Donaulandes. Buchdruckerei Egon Hutzler, Reutlingen.

  • JAIN, A. K., MURTY, M. N., and FLYNN, P. J. (1999): Data Clustering: A Review. ACM Computing Surveys 31(3), 264-323.

  • KLEIWEG, P., NERBONNE, J. and BOSVELD, L. (2004): Geographic Projection of Cluster Composites. In: A. Blackwell, K. Marriott and A. Shimojima (Eds.) Diagrammatic Representation and Inference. 3rd Intn’l Conf, Diagrams 2004. Cambridge, UK, Mar. 2004. (Lecture Notes in Artificial Intelligence 2980). Springer, Berlin, 392-394.

  • KÖNIG, W. (1991, ¹1978): DTV-Atlas zur deutschen Sprache. DTV, München.

  • KRUSKAL, J. (1999): An Overview of Sequence Comparison. In: D. Sankoff and J. Kruskal (Eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed. CSLI, Stanford, 1-44.

  • MANNI, F., HEERINGA, W. and NERBONNE, J. (2006): To what Extent are Surnames Words? Comparing Geographic Patterns of Surnames and Dialect Variation in the Netherlands. Literary and Linguistic Computing 21(4), 507-528.

  • MUCHA, H.J. and HAIMERL, E. (2005): Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry. In: C. Weihs and W. Gaul (Eds.) Classification - the Ubiquitous Challenge. Proc. of 28th Mtg Gesellschaft für Klassifikation, Dortmund, Mar. 9-11, 2004. Springer, Berlin, 513-520.

  • NERBONNE, J., HEERINGA, W. and KLEIWEG, P. (1999): Edit Distance and Dialect Proximity. In: D. Sankoff and J. Kruskal (Eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed. CSLI, Stanford, v-xv.

  • NERBONNE, J. and SIEDLE, Ch. (2005): Dialektklassifikation auf der Grundlage aggregierter Ausspracheunterschiede. Zeitschrift für Dialektologie und Linguistik 72(2), 129-147.

  • PAGE, R.D.M. and HOLMES, E.C. (2006): Molecular Evolution: A Phylogenetic Approach. (¹1998) Blackwell, Oxford.

  • SCHILTZ, G. (1996): German Dialectometry. In: H.-H. Bock and W. Polasek (Eds.) Data Analysis and Information Systems: Statistical and Conceptual Approaches. Proc. of 19th Mtg of Gesellschaft für Klassifikation, Basel, Mar. 8-10, 1995. Springer, Berlin, 526-539.

  • SPRUIT, M. (2006): Measuring Syntactic Variation in Dutch Dialects. In J. Nerbonne and W. Kretzschmar, Jr. (Eds.) Progress in Dialectometry: Toward Explanation. Special issue of Literary and Linguistic Computing 21(4), 493-506.

© 2008 Springer-Verlag Berlin Heidelberg

Cite this paper

Nerbonne, J., Kleiweg, P., Heeringa, W., Manni, F. (2008). Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_76
