J. Chem. Inf. Comput. Sci., 38 (3), 379-386, 1998. ci970437z S0095-2338(97)00437-X
Web Release Date: April 4, 1998

Copyright © 1998 American Chemical Society

On the Properties of Bit String-Based Measures of Chemical Similarity

Darren R. Flower

Department of Physical and Metabolic Sciences, ASTRA Charnwood, Bakewell Road, Loughborough, Leicestershire, UK, LE11 5RH

Received November 26, 1997

Abstract:

With the growth of interest in database searching and compound selection, the quantification of chemical similarity has become an area of intense practical and theoretical interest. One of the most widely used methods of measuring chemical similarity is based on mapping fragments within a molecule as bits within a binary string. We present empirical results which suggest that bit strings provide a nonintuitive encoding of molecular size, shape, and global similarity. Other results, this time statistical in nature, suggest that the observed behavior of bit string-based searches has a large nonspecific component. On this basis, we question whether bit string-based similarity methods possess all the features desirable in a quantitative chemical distance measure or metric and suggest that there are instances when they may not be the most appropriate tool for searching or segregating chemical structures.
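
As an illustration of the fragment-to-bit mapping mentioned in the abstract, the following is a minimal Python sketch, not the method used in the paper. The fragment labels, the 64-bit string length, the CRC32 hashing scheme, and the choice of the Tanimoto coefficient as the comparison measure are all assumptions made for illustration; the abstract does not specify any of them.

```python
from zlib import crc32

N_BITS = 64  # hypothetical bit string length, chosen only for illustration


def fingerprint(fragments, n_bits=N_BITS):
    """Set one bit per fragment by hashing the fragment label to a position.

    Different fragments can collide on the same bit, so the mapping is lossy.
    """
    bits = 0
    for frag in fragments:
        bits |= 1 << (crc32(frag.encode("utf-8")) % n_bits)
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient: bits set in both strings / bits set in either."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention assumed here for two empty bit strings
    return bin(a & b).count("1") / union


# Hypothetical fragment lists for two molecules (labels are made up).
mol_a = fingerprint(["C-C", "C=O", "C-N"])
mol_b = fingerprint(["C-C", "C=O", "C-Cl"])
similarity = tanimoto(mol_a, mol_b)
```

In this sketch the score depends only on which bits are set, not on where the fragments occur in the molecule or how often they occur.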
