A Proposal: High-Throughput Robust Architecture for Log Analysis and Data Stream Mining

Hussain, Adnan Rashid; Hameed, Mohd Abdul; Fatima, Sana

doi:10.1007/978-981-10-0419-3_36

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 413)

Abstract

Many data mining approaches can now handle large static data sets despite limited computational resources. However, these approaches fall short when mining high-speed, unbounded streams: their learning procedures, though simple, require the entire training process to be repeated for each newly arriving instance. Continuous data streams pose three main challenges: they are many times larger than the available memory, they arrive in real time, and each new instance can be inspected at most once while predictions must still be made. A further issue with continuous real-time data is that the underlying concepts change over time, a phenomenon known as concept drift. This paper addresses these problems by proposing a real-time, scalable, and robust architecture. It is a general-purpose architecture based on online machine learning that efficiently logs and mines stream data in a fault-tolerant manner. It consists of two frameworks: (1) an event aggregation framework, which reliably collects events and messages from multiple sources and ships them to a destination for processing; and (2) a real-time computation framework, which processes streams online to extract information patterns. It guarantees reliable processing of billions of messages per second. Furthermore, it facilitates the evaluation of stream learning algorithms and offers change detection strategies to detect concept drift.
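
To make the single-pass constraint and the idea of change detection concrete, the following is a minimal Python sketch of a test-then-train (prequential) loop with a very simple drift detector. It is an illustration under stated assumptions, not the architecture or the algorithms proposed in the paper: the names `WindowDriftDetector`, `MajorityClassLearner`, and `prequential_run`, as well as the window size and threshold, are hypothetical, and the detector is a naive comparison of two error-rate windows rather than any published method.

```python
from collections import deque


class WindowDriftDetector:
    """Flags drift when the recent error rate exceeds the older one by `threshold`.

    Hypothetical, simplified stand-in for a change detector (assumption).
    """

    def __init__(self, window=50, threshold=0.3):
        self.window = window
        self.threshold = threshold
        # Keeps only the last 2 * window outcomes: 1 = misprediction, 0 = correct.
        self.errors = deque(maxlen=2 * window)

    def update(self, error):
        self.errors.append(error)
        if len(self.errors) < 2 * self.window:
            return False  # not enough history to compare two windows yet
        items = list(self.errors)
        old_rate = sum(items[:self.window]) / self.window
        new_rate = sum(items[self.window:]) / self.window
        if new_rate - old_rate > self.threshold:
            self.errors.clear()  # start fresh after signalling a change
            return True
        return False


class MajorityClassLearner:
    """Trivial incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential_run(stream, window=50, threshold=0.3):
    """Each instance is seen once: predict first, then learn from it."""
    learner = MajorityClassLearner()
    detector = WindowDriftDetector(window, threshold)
    mistakes = 0
    drift_points = []
    for i, (x, y) in enumerate(stream):
        error = int(learner.predict(x) != y)
        mistakes += error
        if detector.update(error):
            drift_points.append(i)
            learner = MajorityClassLearner()  # discard the outdated model
        learner.learn(x, y)
    return mistakes, drift_points


if __name__ == "__main__":
    # Synthetic stream whose concept changes abruptly at instance 200.
    stream = [((), "a")] * 200 + [((), "b")] * 200
    print(prequential_run(stream))
```

On the synthetic stream in the usage example, the label switches at instance 200; the detector signals a change once the recent window has accumulated enough mispredictions, and the learner is then rebuilt from the new data. No instance is stored beyond the bounded error window, which is the property the abstract's memory and single-inspection constraints call for.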

Author information

Authors and Affiliations

Research & Development, Host Analytics Software Pvt. Ltd., Hyderabad, 500 081, AP, India
Adnan Rashid Hussain
Department of Computer Science, University College of Engineering, Osmania University, Hyderabad, India
Mohd Abdul Hameed
Department of Computer Science, M.J College of Engineering and Technology, Hyderabad, India
Sana Fatima

Corresponding author

Correspondence to Adnan Rashid Hussain.

Editor information

Editors and Affiliations

Guru Nanak Institutions, Professor & Managing Director, Ibrahimpatnam, Andhra Pradesh, India
H. S. Saini
Guru Nanak Institutions, Professor & Associate Director, Ibrahimpatnam, Andhra Pradesh, India
Rishi Sayal
Guru Nanak Institutions, Professor and Head – CSE and IT, Ibrahimpatnam, India
Sandeep Singh Rawat

About this paper

Cite this paper

Hussain, A.R., Hameed, M.A., Fatima, S. (2016). A Proposal: High-Throughput Robust Architecture for Log Analysis and Data Stream Mining. In: Saini, H., Sayal, R., Rawat, S. (eds) Innovations in Computer Science and Engineering. Advances in Intelligent Systems and Computing, vol 413. Springer, Singapore. https://doi.org/10.1007/978-981-10-0419-3_36

DOI: https://doi.org/10.1007/978-981-10-0419-3_36
Published: 20 February 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0417-9
Online ISBN: 978-981-10-0419-3
eBook Packages: Engineering, Engineering (R0)