DOI: 10.1145/1410140.1410143
research-article

Aggregate documents: making sense of a patchwork of topical documents

Published: 16 September 2008

ABSTRACT

With the dramatic increase in the quantity and diversity of online content, particularly user-generated content, we now have access to unprecedented amounts of information. Whether you are researching the purchase of a new cell phone, planning a vacation, or trying to assess a political candidate, there are now countless resources at your fingertips. However, finding and making sense of all this information is laborious, and it is difficult to assess high-level trends in what is said. Web sites like Wikipedia and Digg democratize the process of organizing the information from countless documents into a single source where it is somewhat easier to understand what is important and interesting. In this talk, I describe a complementary set of automated alternatives to these approaches, demonstrate them with a working example, the commercial web site Wize.com, and derive some basic principles for aggregating a diverse set of documents into a coherent and useful summary.

Published in

DocEng '08: Proceedings of the eighth ACM symposium on Document engineering
September 2008, 312 pages
ISBN: 9781605580814
DOI: 10.1145/1410140

Copyright © 2008 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

DocEng '08 paper acceptance rate: 21 of 62 submissions, 34%
Overall acceptance rate: 178 of 537 submissions, 33%