ABSTRACT
With the number and types of documents in digital library systems increasing, tools for automatically organizing and presenting their content are needed. While many approaches focus on topic-based organization and structuring, hardly any system incorporates automatic structural analysis and representation. Yet genre information forms, often unconsciously, one of the most distinguishing features in conventional libraries and in information searches. In this paper we present an approach to automatically analyze the structure of documents and to integrate this information into an automatically created content-based organization. In the resulting visualization, documents on similar topics but of different genres are depicted as books in differing colors. This representation helps users intuitively locate relevant information presented in a relevant form.
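The abstract does not specify which structural features or color mapping are used. The following Python sketch is only a rough illustration of one possible pipeline, not the method of the paper. It assumes a handful of surface-level style metrics: words per sentence, characters per word, punctuation ratio, and type-token ratio. It organizes these metrics with a small one-dimensional self-organizing map and turns each document's winning map unit into a hue. All names (`style_features`, `train_som_1d`, `genre_colors`) and all parameter values are hypothetical choices made for this example.

```python
import colorsys
import math
import random
import re
from dataclasses import astuple, dataclass

# Hypothetical feature set: simple surface metrics, not the paper's own features.
PUNCTUATION = set(",;:()-\"'!?.")


@dataclass
class StyleFeatures:
    words_per_sentence: float
    chars_per_word: float
    punctuation_ratio: float
    type_token_ratio: float


def style_features(text: str) -> StyleFeatures:
    """Compute a few surface-level style metrics for one document."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    punct = sum(1 for c in text if c in PUNCTUATION)
    return StyleFeatures(
        words_per_sentence=len(words) / max(len(sentences), 1),
        chars_per_word=sum(len(w) for w in words) / n_words,
        punctuation_ratio=punct / max(len(text), 1),
        type_token_ratio=len({w.lower() for w in words}) / n_words,
    )


def normalize(vectors):
    """Min-max scale each feature to [0, 1] across the collection."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    return [
        [(v[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0 for d in range(dims)]
        for v in vectors
    ]


def best_unit(units, x):
    """Index of the map unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(units)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(units[i], x)))


def train_som_1d(data, n_units=8, epochs=200, seed=0):
    """Online Kohonen training on a 1-D lattice with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1.0 - frac)                          # decaying learning rate
        radius = max(1.0, (n_units / 2) * (1.0 - frac))  # shrinking neighborhood
        x = data[rng.randrange(len(data))]
        bmu = best_unit(units, x)
        for i, w in enumerate(units):
            h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
            for d in range(dim):
                w[d] += lr * h * (x[d] - w[d])
    return units


def genre_colors(texts, n_units=8, seed=0):
    """Hypothetical mapping: documents on nearby map units get similar hues."""
    if not texts:
        return []
    data = normalize([astuple(style_features(t)) for t in texts])
    units = train_som_1d(data, n_units=n_units, seed=seed)
    colors = []
    for x in data:
        # Scale hue to [0, 0.8] so the two ends of the map do not wrap to the same red.
        hue = 0.8 * best_unit(units, x) / (n_units - 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.6, 0.9)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors
```

`genre_colors(texts)` returns one hex color per document, and `n_units` must be at least 2. Stylistically identical documents always receive the same color. A viewer could then draw each book in that color while a separate topic-based map decides where on the shelves it is placed.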
Index Terms
- Integrating automatic genre analysis into digital libraries
Recommendations
Text Mining in the SOMLib Digital Library System: The Representation of Topics and Genres
With the increasing amount of textual information available in electronic form, more powerful methods for exploring, searching, and organizing the available mass of information are needed to cope with this situation. This paper presents the SOMLib ...
Ranked Centroid Projection: A Data Visualization Approach With Self-Organizing Maps
The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and ...
Digital web library of a website with document clustering
IBERAMIA'10: Proceedings of the 12th Ibero-American conference on Advances in artificial intelligence
Digital libraries allow organizing, classifying and publishing collections of electronic contents that are available in computers or networks. Also, digital libraries are easy to use and configure and they offer a user interface with access to fast ...