Abstract
Distance functions expressing the degree of dissimilarity of sets have found use in physical anthropology [1], psychology [2], numerical taxonomy [3], ecology [4] and elsewhere. During an ecological study by one of us [5], it was noticed that the similarity coefficient of Jaccard [6], used in ecology, gives rise to a metric function satisfying the triangle inequality. For two non-empty finite sets X, Y, the Jaccard coefficient is the number of elements in the intersection X∩Y of X and Y divided by the number of elements in their union X∪Y, that is, |X∩Y| / |X∪Y|. This coefficient (we use absolute value signs to indicate number of elements) has a heuristic interpretation. It measures the probability that an element of at least one of two sets is an element of both, and thus is a reasonable measure of similarity or “overlap” between the two. The one-complement 1 − |X∩Y| / |X∪Y| may then be considered a measure of the dissimilarity of the two sets.
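The definitions above can be tried out numerically. The following is a minimal Python sketch, not the authors' method. It assumes the coefficient takes its usual form, intersection size divided by union size, |X∩Y| / |X∪Y|, with the one-complement as the dissimilarity. The function names (`jaccard_similarity`, `jaccard_distance`, `nonempty_subsets`, `triangle_inequality_holds`) are hypothetical and chosen only for illustration. The exhaustive check covers a small universe and is a sanity test, not a proof of the triangle inequality.

```python
from fractions import Fraction
from itertools import combinations, product


def jaccard_similarity(x: frozenset, y: frozenset) -> Fraction:
    """Return |X ∩ Y| / |X ∪ Y| for two non-empty finite sets."""
    if not x or not y:
        raise ValueError("both sets must be non-empty")
    # Exact rational arithmetic avoids floating-point noise in comparisons.
    return Fraction(len(x & y), len(x | y))


def jaccard_distance(x: frozenset, y: frozenset) -> Fraction:
    """Return the one-complement 1 - |X ∩ Y| / |X ∪ Y|."""
    return 1 - jaccard_similarity(x, y)


def nonempty_subsets(universe):
    """Yield every non-empty subset of `universe` as a frozenset."""
    items = list(universe)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def triangle_inequality_holds(universe) -> bool:
    """Check d(X, Z) <= d(X, Y) + d(Y, Z) for all triples of subsets."""
    subsets = list(nonempty_subsets(universe))
    return all(
        jaccard_distance(x, z)
        <= jaccard_distance(x, y) + jaccard_distance(y, z)
        for x, y, z in product(subsets, repeat=3)
    )


if __name__ == "__main__":
    x, y = frozenset("abc"), frozenset("bcd")
    print(jaccard_similarity(x, y), jaccard_distance(x, y))
    print(triangle_inequality_holds(range(4)))
```

For X = {a, b, c} and Y = {b, c, d}, the intersection has 2 elements and the union has 4, so both the coefficient and its one-complement equal 1/2. Identical sets give a distance of 0 and disjoint sets give a distance of 1.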
References
1. Mahalanobis, P. C., Proc. Nat. Inst. Sci. India, 2, 49 (1936).
2. McGill, W. J., Psychometrika, 19, 97 (1954).
3. Sokal, R. R., and Sneath, P. H., Principles of Numerical Taxonomy (W. H. Freeman, 1963).
4. Orloci, L., J. Ecol., 54, 193 (1966).
5. Levandowsky, M., thesis, Columbia University, New York, 1970.
6. Jaccard, P., Bull. Soc. Vaud. Sci. Nat., 38, 69 (1902).
Cite this article
LEVANDOWSKY, M., WINTER, D. Distance between Sets. Nature 234, 34–35 (1971). https://doi.org/10.1038/234034a0
DOI: https://doi.org/10.1038/234034a0