A new analysis of the false positive rate of a Bloom filter

https://doi.org/10.1016/j.ipl.2010.07.024

Abstract

A Bloom filter is a space-efficient data structure used for probabilistic set membership testing. When testing an object for set membership, a Bloom filter may give a false positive. Analysis of the false positive rate is key to understanding the Bloom filter and the applications that use it. We show experimentally that the classic analysis of the false positive rate is wrong. We formally derive a correct formula using a balls-and-bins model and show how to compute it in a numerically stable manner. We also prove that the new formula always predicts a greater false positive rate than the classic formula. Finally, we compare the two formulas numerically in terms of relative error: for a small Bloom filter, the classic formula mispredicts the false positive rate.
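As an illustration, the sketch below compares the classic estimate (1 - (1 - 1/m)^(kn))^k for an m-bit filter holding n elements with k hash functions against a false positive rate computed under a balls-and-bins model, and against a direct simulation. It assumes idealized hashing: each of the kn hash values picks one of the m bits uniformly and independently, so positions may repeat. The balls-and-bins function tracks the probability that exactly i bits are set, adding one ball at a time, which keeps every intermediate value in [0, 1]; this is one stable way to evaluate the sum over i of P(i bits set) * (i/m)^k, not necessarily the formulation or numerical method used in the paper. The function names are hypothetical.

```python
import random


def classic_fp_rate(m: int, n: int, k: int) -> float:
    """Classic estimate: (1 - (1 - 1/m)^(k*n))^k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def balls_and_bins_fp_rate(m: int, n: int, k: int) -> float:
    """False positive rate under an idealized balls-and-bins model.

    Assumption: each of the k*n hash values (balls) lands in one of the
    m bits (bins) uniformly and independently. dist[i] is the probability
    that exactly i distinct bits are set; it is updated one ball at a time,
    so no large Stirling numbers or binomial coefficients are formed.
    """
    dist = [0.0] * (m + 1)
    dist[0] = 1.0
    for _ in range(k * n):
        new = [0.0] * (m + 1)
        for i in range(m + 1):
            p = dist[i]
            if p == 0.0:
                continue
            new[i] += p * (i / m)                 # ball hits an already-set bit
            if i < m:
                new[i + 1] += p * ((m - i) / m)   # ball sets a new bit
        dist = new
    # A non-member is a false positive if all k of its probes hit set bits.
    return sum(dist[i] * (i / m) ** k for i in range(m + 1))


def simulated_fp_rate(m: int, n: int, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate under the same idealized random-hashing assumption."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bits = [False] * m
        for _ in range(k * n):                    # insert n elements, k bits each
            bits[rng.randrange(m)] = True
        if all(bits[rng.randrange(m)] for _ in range(k)):  # probe a non-member
            hits += 1
    return hits / trials


if __name__ == "__main__":
    m, n, k = 32, 4, 4                            # a deliberately small filter
    print(f"classic:        {classic_fp_rate(m, n, k):.6f}")
    print(f"balls-and-bins: {balls_and_bins_fp_rate(m, n, k):.6f}")
    print(f"simulated:      {simulated_fp_rate(m, n, k, 50_000):.6f}")
```

If X is the number of set bits, the classic value equals (E[X]/m)^k while the balls-and-bins value is E[(X/m)^k], so Jensen's inequality gives one quick way to see that the latter is never smaller for k >= 1 (with equality at k = 1). The gap shrinks as m grows because X/m concentrates around its mean, which matches the observation that the classic formula errs most for small filters.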


This material is based on work supported by the National Science Foundation under CNS-0520081.

Miguel Jimeno is on leave from Universidad del Norte, Barranquilla, Colombia.
