Zipf's law and the diversity of biology newsgroups

Scientometrics

Abstract

Usenet newsgroups provide a popular means of scientific communication. We demonstrate striking order in the diversity of biology newsgroups: submissions to newsgroups obey a form of Zipf's law, a simple power law relating the frequency of posts to the rank of each contributor, where contributors are ranked by number of posts. We show that a simple stochastic process, due to Günther et al. (1992, 1996), Levitin and Schapiro (1993), and Schapiro (1994), accounts for this pattern and reproduces many of the properties of newsgroups. This model successfully predicts the relative contribution of each poster from the size of the newsgroup, measured by its number of posters and its total number of posts.
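
As a rough illustration of the rank-frequency pattern described in the abstract, the sketch below ranks posters by number of posts and estimates a power-law exponent from a least-squares fit of log(posts) against log(rank). It is not the stochastic model of Günther et al., nor the fitting procedure used in the article; the function names and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def rank_frequency(posters):
    """Return post counts ordered from the most to the least prolific poster.

    `posters` is a sequence with one entry (the poster's name) per post.
    """
    return sorted(Counter(posters).values(), reverse=True)


def zipf_exponent(counts):
    """Estimate the exponent s in count ~ C / rank**s.

    Uses an ordinary least-squares fit of log(count) on log(rank) and
    returns the negated slope. Requires at least two ranked posters.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical toy newsgroup: four posters with 12, 6, 4 and 3 posts.
posters = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
counts = rank_frequency(posters)
exponent = zipf_exponent(counts)
print(counts, round(exponent, 3))
```

The toy counts were chosen to equal 12 divided by the rank, so the fitted exponent comes out very close to 1. Real newsgroup data would be noisier, and a log-log regression of this kind is only a first check, not a rigorous test of a power law.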

References

  • Baayen, R. H. (2001), Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht, Netherlands.

  • Bar-Ilan, J. (1997), The “mad cow” disease, Usenet newsgroups and bibliometric laws. Scientometrics, 39: 29–55.

  • David, H. A., Hartley, H. O., Pearson, E. S. (1954), The distribution of the ratio, in a single normal sample, of range to standard deviation. Biometrika, 41: 482–493.

  • Frontier, S. (1985), Diversity and structure in aquatic ecosystems. Oceanography and Marine Biology: An Annual Review, 23: 253–312.

  • Günther, R., Levitin, L., Schapiro, B., Wagner, P. (1996), Zipf's law and the effect of ranking on probability distributions. International Journal of Theoretical Physics, 35: 395–417.

  • Günther, R., Schapiro, B., Wagner, P. (1992), Physical complexity and Zipf's law. International Journal of Theoretical Physics, 31: 525–543.

  • Hauben, M., Hauben, R. (1997), Netizens: On the History and Impact of Usenet and the Internet. IEEE Computer Society Press, Los Alamitos, California, USA.

  • Hubbell, S. P. (2001), The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton, New Jersey, USA.

  • Huberman, B. A., Pirolli, P. L. T., Pitkow, J. E., Lukose, R. M. (1998), Strong regularities in World Wide Web surfing. Science, 280: 95–97.

  • Kanji, G. K. (1999), 100 Statistical Tests. Sage Publications, London, UK.

  • Levitin, L. B., Schapiro, B. (1993), Zipf's law and information complexity in an evolutionary system. Proceedings IEEE International Symposium on Information Theory, 76.

  • Li, W. (1992), Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38: 1842–1845.

  • Mandelbrot, B. (1953), An information theory of the statistical structure of language. In: W. E. Jackson (Ed.), Communication Theory, Academic Press, New York, New York, USA, pp. 486–502.

  • Mandelbrot, B. (1961), On the theory of word frequencies and on related Markovian models of discourse. In: R. Jakobson (Ed.), Structure of Language and its Mathematical Aspects, American Mathematical Society, Providence, Rhode Island, USA, pp. 190–219.

  • Magurran, A. E. (1988), Ecological Diversity and Its Measurement. Princeton University Press, Princeton, New Jersey, USA.

  • Marsili, M., Zhang, Y.-C. (1998), Interacting individuals leading to Zipf's law. Physical Review Letters, 80: 2741–2744.

  • Miller, G. A., Newman, E. B., Friedman, E. A. (1957), Some effects of intermittent silence. American Journal of Psychology, 70: 311–313.

  • Okuyama, K., Takayasu, M., Takayasu, H. (1999), Zipf's law in income distribution of companies. Physica A, 269: 125–131.

  • Osborne, L. N. (1998), Topic development in USENET newsgroups. Journal of the American Society for Information Science, 49: 1010–1016.

  • Schapiro, B. (1994), An approach to the physics of complexity. Chaos, Solitons and Fractals, 4: 115–123.

  • Simon, H. A. (1955), On a class of skew distribution functions. Biometrika, 42: 425–440.

  • Smith, M. A. (1999), Invisible crowds in cyberspace: mapping the social structure of the Usenet. In: M. A. Smith, P. Kollock (Eds), Communities in Cyberspace, Routledge, London, UK, pp. 195–219.

  • Tokeshi, M. (1993), Species abundance patterns and community structure. Advances in Ecological Research, 24: 111–186.

  • Wilson, J. B., Wells, T. C. E., Trueman, I. C., Jones, G., Atkinson, M. D., Crawley, M. J., Dodd, M. E., Silvertown, J. (1996), Are there assembly rules for plant species abundance? An investigation in relation to soil resources and successional trends. Journal of Ecology, 84: 527–538.

  • Yule, G. U. (1924), A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F. R. S. Philosophical Transactions B, 213: 21.

  • Zipf, G. K. (1935), The Psycho-Biology of Language. Houghton Mifflin, Boston, Massachusetts, USA.

  • Zipf, G. K. (1949), Human Behavior and the Principle of Least Effort. Addison-Wesley Publishing Company, Cambridge, Massachusetts, USA.

Cite this article

Kot, M., Silverman, E. & Berg, C.A. Zipf's law and the diversity of biology newsgroups. Scientometrics 56, 247–257 (2003). https://doi.org/10.1023/A:1021971212438
