|
1. |
Propagating Updates in SPIDER
Koudas, N.; Marathe, A.; Srivastava, D.;
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
15-20 April 2007
Page(s):1146
-
1153
Abstract:
SPIDER, developed at AT&T Labs-Research, is a system that efficiently supports flexible string matching against attribute values in large databases, and is extensively used in AT&T. The scoring methodology is based on tf.idf weighting and cosine similarity, and SPIDER maintains indexes containing string tokens and their weights, for fast matching at query time. Given the "global" nature of the weights maintained in the indexes, even a few updates to the underlying database tables would necessitate a (near-complete recomputation of the indexes, which can be prohibitively expensive. In this paper, we explore novel techniques to considerably reduce the cost of propagating updates in SPIDER, without a significant degradation of answer accuracy or query performance. We present experimental evidence using real data sets to demonstrate the practical benefits of our techniques.
|