Journal of Parallel and Distributed Computing
Volume 68, Issue 1, January 2008, Pages 37-53
Parallel Techniques for Information Extraction
 
doi:10.1016/j.jpdc.2007.06.007
Copyright © 2007 Elsevier Inc. All rights reserved.

Middleware for data mining applications on clusters and grids

Leonid Glimcher (a), Ruoming Jin (b) and Gagan Agrawal (a, corresponding author)

(a) Department of Computer Science and Engineering, Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA
(b) Department of Computer Science, Kent State University, Kent, OH 44242, USA

Received 24 August 2006; revised 9 June 2007; accepted 9 June 2007. Available online 10 July 2007.

Abstract

This paper gives an overview of two middleware systems that have been developed over the last 6 years to address the challenges involved in developing parallel and distributed implementations of data mining algorithms. FREERIDE (FRamework for Rapid Implementation of Data mining Engines) focuses on data mining in a cluster environment. FREERIDE is based on the observation that parallel versions of several well-known data mining techniques share a relatively similar structure, and can be parallelized by dividing the data instances (or records or transactions) among the nodes. The computation on each node involves reading the data instances in an arbitrary order, processing each data instance, and performing a local reduction. The reduction involves only commutative and associative operations, which means the result is independent of the order in which the data instances are processed. After the local reduction on each node, a global reduction is performed. This similarity in the structure can be exploited by the middleware system to execute the data mining tasks efficiently in parallel, starting from a relatively high-level specification of the technique.
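The processing structure described above can be illustrated with a minimal sketch. It is not FREERIDE's actual interface: the function names (`local_reduction`, `global_reduction`, `partition`, `mine_item_counts`) and the item-counting task are assumptions chosen for illustration, and the "nodes" are simulated as list partitions within one process. Because the reduction here is addition of counts, which is commutative and associative, the result should not depend on record order or on how records are divided among nodes.

```python
from collections import Counter
from functools import reduce

def local_reduction(records):
    """Simulate one node: process its records in any order and
    accumulate into a local reduction object (here, item counts)."""
    counts = Counter()
    for record in records:
        counts.update(set(record))  # count each item once per transaction
    return counts

def global_reduction(local_results):
    """Combine the per-node reduction objects with a commutative,
    associative operation (element-wise addition of counts)."""
    return reduce(lambda a, b: a + b, local_results, Counter())

def partition(records, num_nodes):
    """Divide the data instances among the simulated nodes (round-robin)."""
    return [records[i::num_nodes] for i in range(num_nodes)]

def mine_item_counts(records, num_nodes):
    """Local reduction on every partition, then one global reduction."""
    parts = partition(records, num_nodes)
    return global_reduction(local_reduction(p) for p in parts)

# Hypothetical toy data set of transactions.
transactions = [
    ["milk", "bread"],
    ["bread", "eggs"],
    ["milk", "eggs", "bread"],
    ["eggs"],
]
result = mine_item_counts(transactions, num_nodes=2)
```

In this sketch, changing `num_nodes` or reordering `transactions` leaves `result` unchanged, which is the property that lets a middleware layer choose the data distribution and processing order freely.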

To enable processing of data sets stored in remote data repositories, we have extended the FREERIDE middleware into FREERIDE-G (FRamework for Rapid Implementation of Data mining Engines in Grid). FREERIDE-G supports a high-level interface for developing data mining and scientific data processing applications that involve data stored in remote repositories. The added functionality in FREERIDE-G aims at abstracting the details of remote data retrieval, movement, and caching from application developers.
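As a rough illustration of what hiding retrieval and caching from the application developer could look like, the sketch below wraps a remote fetch behind a chunk iterator. Everything in it is hypothetical: `CachingChunkReader`, `fake_remote_fetch`, and the chunk layout are assumptions made for this example and are not taken from FREERIDE-G. The application only iterates over chunks; whether a chunk comes from the remote repository or from the local cache is decided inside the reader.

```python
from typing import Callable, Dict, Iterator, List

class CachingChunkReader:
    """Hypothetical wrapper: serves data chunks to the application and
    caches each chunk locally after its first remote retrieval."""

    def __init__(self, fetch_chunk: Callable[[int], List[int]], num_chunks: int):
        self._fetch_chunk = fetch_chunk      # stands in for a remote repository call
        self._num_chunks = num_chunks
        self._cache: Dict[int, List[int]] = {}
        self.remote_fetches = 0              # bookkeeping for the example only

    def chunks(self) -> Iterator[List[int]]:
        for chunk_id in range(self._num_chunks):
            if chunk_id not in self._cache:
                # Cache miss: retrieve from the (simulated) remote repository.
                self._cache[chunk_id] = self._fetch_chunk(chunk_id)
                self.remote_fetches += 1
            yield self._cache[chunk_id]

def fake_remote_fetch(chunk_id: int) -> List[int]:
    """Simulated remote repository: chunk k holds the integers 3k, 3k+1, 3k+2."""
    return list(range(chunk_id * 3, chunk_id * 3 + 3))

reader = CachingChunkReader(fake_remote_fetch, num_chunks=4)

# An iterative algorithm makes several passes over the same data;
# only the first pass triggers remote retrieval in this sketch.
first_pass = sum(sum(chunk) for chunk in reader.chunks())
second_pass = sum(sum(chunk) for chunk in reader.chunks())
```

The design choice shown is that the application code (the two `sum` passes) never mentions the repository or the cache, so the retrieval and caching policy can change without touching the mining logic.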

Keywords: Data mining; Clusters; Grids; Middleware

