Parallel Implementation of Part of Speech Tagging for Text Mining Using Grid Computing

  • Conference paper
Advances in Computing and Communications (ACC 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 190)

Abstract

There is an urgent need to develop new text mining solutions to cope with the exponential growth of text data. Problem sizes increase daily as new text documents are added. Grid-aware text mining is one solution for extracting knowledge from such large volumes of text. Part-of-speech (POS) tagging is an important preprocessing task in text mining, but tagging algorithms applied to a very large document collection take a very long time to produce results on conventional computers. In this paper we present a framework for a parallel implementation of part-of-speech tagging for text mining using grid computing. The framework is developed in a grid environment with the Globus Toolkit, a middleware for scientific and data-intensive grid applications. Experimental results show that this model significantly reduces part-of-speech tagging time for text mining. The model can be integrated into a grid-based text mining tool, helping to improve the overall performance of the text mining process.
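
The abstract does not describe the framework's internals, so the following is only a minimal sketch of the general data-parallel idea: partition the document collection, tag each partition independently, and merge the results in order. Everything in it is an assumption for illustration. The toy lexicon tagger, the function names (`tag_document`, `partition`, `tag_chunk`, `parallel_tag`), and the local thread pool standing in for grid worker nodes are hypothetical; it does not use the Globus Toolkit and is not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy lexicon standing in for a real POS tagger.
TOY_LEXICON = {
    "the": "DT", "a": "DT",
    "grid": "NN", "text": "NN", "tagger": "NN",
    "tags": "VBZ", "reduces": "VBZ",
    "large": "JJ",
}


def tag_document(text):
    """Tag one document; words missing from the lexicon default to NN."""
    return [(tok, TOY_LEXICON.get(tok.lower(), "NN")) for tok in text.split()]


def partition(documents, n_workers):
    """Split documents into at most n_workers contiguous chunks."""
    size = max(1, -(-len(documents) // n_workers))  # ceiling division
    return [documents[i:i + size] for i in range(0, len(documents), size)]


def tag_chunk(chunk):
    """Work unit for one worker: tag every document in the chunk."""
    return [tag_document(doc) for doc in chunk]


def parallel_tag(documents, n_workers=4):
    """Tag all documents, one chunk per worker, preserving input order.

    The thread pool is an assumed local stand-in for grid nodes; in a
    grid setting each chunk would be submitted as a remote job instead.
    """
    chunks = partition(documents, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(tag_chunk, chunks)  # map keeps chunk order
    return [tagged for chunk in results for tagged in chunk]
```

Because documents are tagged independently of one another, the merged output of `parallel_tag(docs)` matches tagging the same documents sequentially; only the wall-clock time is expected to change with the number of workers.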

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kumar, N., Kumar, S., Kumar, P. (2011). Parallel Implementation of Part of Speech Tagging for Text Mining Using Grid Computing. In: Abraham, A., Lloret Mauri, J., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22709-7_45

  • DOI: https://doi.org/10.1007/978-3-642-22709-7_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22708-0

  • Online ISBN: 978-3-642-22709-7

  • eBook Packages: Computer Science, Computer Science (R0)