
Discovering likely invariants of distributed transaction systems for autonomic system management

Cluster Computing

Abstract

A large amount of monitoring data can be collected from distributed systems as observables for analyzing system behavior. However, without reasonable models to characterize these systems, such monitoring data can hardly be interpreted effectively for system management. In this paper, we introduce a new concept, flow intensity, to measure the intensity with which internal monitoring data reacts to the volume of user requests in distributed transaction systems. We propose a novel approach to automatically model and search for relationships between the flow intensities measured at various points across the system. If a modeled relationship holds all the time, it is regarded as an invariant of the underlying system. Experimental results from a real system demonstrate that such invariants exist widely in distributed transaction systems. Further, we discuss how such invariants can be used to characterize complex systems and support autonomic system management.
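
The idea of a relationship between two flow intensities that "holds all the time" can be illustrated with a minimal sketch. The abstract does not specify the model class or the validation rule, so everything below is an assumption made for illustration: a simple linear least-squares model, a normalized fitness score, a fixed window size, and a threshold of 0.9. The function names (`fit_linear`, `fitness`, `is_likely_invariant`) and the sample data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fit_linear(x, y):
    """Least-squares fit of y ~ a * x + b (assumed model, for illustration only)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; cannot fit a slope")
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b = my - a * mx
    return a, b


def fitness(x, y, a, b):
    """Normalized fitness score: 1.0 is a perfect fit, lower is worse."""
    my = mean(y)
    resid = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    total = sum((yi - my) ** 2 for yi in y)
    if total == 0:
        return 1.0 if resid == 0 else 0.0
    return 1 - (resid / total) ** 0.5


def is_likely_invariant(x, y, window=4, threshold=0.9):
    """Fit on the first window, then require the same model to keep
    fitting every later window. Window and threshold are assumed values."""
    a, b = fit_linear(x[:window], y[:window])
    for start in range(window, len(x) - window + 1, window):
        xs, ys = x[start:start + window], y[start:start + window]
        if fitness(xs, ys, a, b) < threshold:
            return False
    return True


# Hypothetical flow intensities: HTTP requests per interval at the front end,
# and SQL queries per interval at the database.
requests = [10, 20, 30, 40, 15, 25, 35, 45, 12, 22, 32, 42]
queries = [2 * r + 1 for r in requests]        # relationship holds throughout
broken = queries[:8] + [30, 90, 20, 70]        # relationship breaks in the last window

stable = is_likely_invariant(requests, queries)
unstable = is_likely_invariant(requests, broken)
print(stable, unstable)
```

In this sketch, a pair of measurement points is kept as a likely invariant only if the model learned from early data continues to explain later data; a sustained drop in fitness is what a management system would treat as a signal worth investigating.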

Author information

Corresponding author

Correspondence to Guofei Jiang.

Additional information

Guofei Jiang received the B.S. and Ph.D. degrees in electrical and computer engineering from Beijing Institute of Technology, China, in 1993 and 1998, respectively. During 1998–2000, he was a postdoctoral fellow in computer engineering at Dartmouth College, NH. During 2000–2004, he was a research scientist in the Institute for Security Technology Studies at Dartmouth College. He is currently a research staff member with the Robust and Secure Systems Group in NEC Laboratories America at Princeton, NJ. His current research focuses on distributed systems, dependable and secure computing, and system and information theory. He has published over 50 technical papers in these areas. He is an associate editor of IEEE Security and Privacy magazine and has served on the program committees of many conferences.

Haifeng Chen received the BEng and MEng degrees, both in automation, from Southeast University, China, in 1994 and 1997, respectively, and the PhD degree in computer engineering from Rutgers University, New Jersey, in 2004. He has worked as a researcher in the Chinese national research institute of power automation. He is currently a research staff member at NEC Laboratories America, Princeton, NJ. His research interests include data mining, autonomic computing, pattern recognition, and robust statistics.

Kenji Yoshihira received the B.E. in EE from the University of Tokyo in 1996 and designed processor chips for enterprise computers at Hitachi Ltd. for five years. He served as CTO at Investoria Inc. in Japan through 2002, developing an Internet service system for financial information distribution, and received the M.S. in CS from New York University in 2004. He is currently a research staff member with the Robust and Secure Systems Group in NEC Laboratories America, Inc. in NJ. His current research focuses on distributed systems and autonomic computing.

About this article

Cite this article

Jiang, G., Chen, H. & Yoshihira, K. Discovering likely invariants of distributed transaction systems for autonomic system management. Cluster Comput 9, 385–399 (2006). https://doi.org/10.1007/s10586-006-0008-1

  • DOI: https://doi.org/10.1007/s10586-006-0008-1
