ABSTRACT
Inversions are used as a fundamental quantity to measure the sortedness of data, to evaluate different ranking methods for databases, and in the context of rank aggregation. Considering the volume of the data sets in these applications, the data stream model [14, 2] is a natural setting to design efficient algorithms.

We obtain a suite of space-efficient streaming algorithms for approximating the number of inversions in a permutation. The best space bound we achieve is $O(\log n \log \log n)$ through a deterministic algorithm. In contrast, we derive an $\Omega(n)$ lower bound for randomized exact computation for this problem; thus approximation is essential.

We also consider two generalizations of this problem: (1) approximating the number of inversions between two permutations, for which we obtain a randomized $O(\sqrt{n} \log n)$-space algorithm, and (2) approximating the number of inversions in a general list, for which we obtain a randomized $O(\sqrt{n} \log^2 n)$-space two-pass algorithm. In contrast, we derive $\Omega(n)$-space lower bounds for deterministic approximate computation for these problems; thus both randomization and approximation are essential.

All our algorithms use only $O(\log n)$ time per data item.
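For concreteness, the quantity being approximated can be computed exactly in one pass if linear memory is available. The sketch below is not one of the streaming algorithms from the paper: it is a standard Fenwick-tree baseline, with hypothetical function names, that assumes the input is a permutation of 1..n. It spends $O(\log n)$ time per item but $\Theta(n)$ words of memory, which is the cost the small-space approximation algorithms are designed to avoid.

```python
from typing import Hashable, Sequence


def count_inversions(perm: Sequence[int]) -> int:
    """Exact number of pairs i < j with perm[i] > perm[j].

    Assumption: perm is a permutation of 1..n. This baseline keeps a
    Fenwick tree of size n, so it uses linear memory.
    """
    n = len(perm)
    tree = [0] * (n + 1)  # Fenwick tree over the values 1..n
    inversions = 0
    for seen, value in enumerate(perm):
        # Prefix query: how many earlier items are <= value.
        not_larger = 0
        k = value
        while k > 0:
            not_larger += tree[k]
            k -= k & -k
        # Each earlier item larger than value forms one inversion.
        inversions += seen - not_larger
        # Point update: mark value as seen.
        k = value
        while k <= n:
            tree[k] += 1
            k += k & -k
    return inversions


def inversions_between(sigma: Sequence[Hashable], pi: Sequence[Hashable]) -> int:
    """Number of pairs that sigma and pi order differently.

    Assumption: sigma and pi are orderings of the same set of items.
    Relabel each item by its rank in pi, then count inversions.
    """
    rank_in_pi = {item: rank for rank, item in enumerate(pi, start=1)}
    return count_inversions([rank_in_pi[item] for item in sigma])
```

For instance, `count_inversions([3, 1, 2])` returns 2, since the pairs (3, 1) and (3, 2) are out of order, and a fully reversed permutation of length n gives n(n-1)/2.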
- N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 7(4):567--583, 1986.
- N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. JCSS, 58(1):137--147, 1999.
- A. Andersson and O. Petersson. Approximate indexed lists. J. Algorithms, 29(2):256--276, 1998.
- S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. WWW7/Computer Networks, 30(1-7):107--117, 1998.
- A. Caprara. Sorting permutations by reversals and Eulerian cycle decompositions. SIAM J. Discrete Math., 12:91--110, 1999.
- P. Diaconis. Group Representation in Probability and Statistics. IMS Lecture Series 11, IMS, 1988.
- P. Diaconis and R. Graham. Spearman's footrule as a measure of disarray. J. of the Royal Statistical Society, Series B, 39(2):262--268, 1977.
- P. F. Dietz. Optimal algorithms for list indexing and subset rank. Proc. WADS, Springer LNCS 382:39--46, 1989.
- C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Ranking aggregation methods for the Web. Proc. 10th WWW, pp. 613--622, 2001.
- F. Ergun, S. Kannan, R. Kumar, R. Rubinfeld, and M. Viswanathan. Spot-checkers. JCSS, 60(3):717--751, 2000.
- V. Estivill-Castro and D. Wood. A survey of adaptive sorting algorithms. ACM Computing Surveys, 24(4):441--476, 1992.
- J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate $L^1$-difference algorithm for massive data streams. Proc. 40th FOCS, pp. 501--511, 1999.
- P. Flajolet. Approximate counting: A detailed analysis. BIT, 25:113--134, 1985.
- A. C. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. A few good terms: Efficient streaming computation of wavelet decompositions. Manuscript, 2001.
- S. Guha, N. Koudas, and K. Shim. Data streams and histograms. Proc. 33rd STOC, pp. 471--475, 2001.
- M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. DIMACS series in Discr. Math. & Theor. Comp. Sc., 50:107--118, 1999.
- B. Kalyanasundaram and G. Schnitger. The probabilistic communication complexity of set intersection. SIAM J. Discrete Math., 5(4):545--557, 1992.
- D. E. Knuth. The Art of Computer Programming III: Sorting and Searching. Addison-Wesley, 1998.
- E. Kushilevitz and N. Nisan. Communication complexity. Cambridge University Press, 1997.
- C. Levcopoulos and O. Petersson. Exploiting few inversions when sorting: sequential and parallel algorithms. TCS, 163(1&2):211--238, 1996.
- R. Morris. Counting large numbers of events in small registers. C. ACM, 21(10):840--842, 1978.
Index Terms
- Approximate counting of inversions in a data stream