doi:10.1016/j.peva.2004.10.013
Copyright © 2004 Elsevier B.V. All rights reserved.
Repeated results analysis for middleware regression benchmarking
Lubomír Bulej (a, b), Tomáš Kalibera (a), and Petr Tůma (a)
(a) Distributed Systems Research Group, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, 11800 Prague, Czech Republic
(b) Institute of Computer Science, Czech Academy of Sciences, Pod Vodárenskou věží 2, 18207 Prague, Czech Republic
Available online 2 December 2004.
Abstract
The paper outlines the concept of regression benchmarking as a variant of regression testing focused on detecting performance regressions. Applying regression benchmarking to middleware development, the paper explains how regression benchmarking differs from middleware benchmarking in general, and uses real-world examples to show why existing benchmarks do not give results sufficient for regression benchmarking. Considering two broad groups of benchmarks distinguished by their complexity, the paper proposes novel techniques for the repeated analysis of results for the purpose of detecting performance regressions.
Keywords: Middleware benchmarking; Regression benchmarking; Regression testing
Fig. 1. The architecture of an environment for regression benchmarking.
Fig. 2. Results of consecutive runs of a benchmark measuring the time to marshal an input array of 1024 CORBA::ULong values on TAO 1.3.1.
Fig. 3. Results of consecutive runs of a benchmark measuring the time to marshal an input array of 1024 CORBA::ULong values on TAO 1.3.1 and 1.3.2.
Fig. 4. Results of consecutive runs of a benchmark measuring the time to marshal an input array of 1024 CORBA::Octet values on TAO 1.3.1–1.3.5.
Fig. 5. Results of nonparametric and parametric statistical tests for consecutive runs of a benchmark measuring the time to marshal an input array of 1024 CORBA::Octet values on TAO 1.3.1 to TAO 1.3.5.
Fig. 6. The architecture of an online bookstore for the TPC-W benchmark.
Fig. 7. Results of the benchmark measuring the time to retrieve book information on original and damaged omniORB.
Fig. 8. Medians and averages of the time to retrieve book information.
Fig. 9. Results of the benchmark measuring the time to retrieve book information on original and damaged omniORB.
Fig. 10. Contribution of three individual operations to the complete results.
Fig. 11. Clusters as identified by the k-means algorithm.

Corresponding author. Tel.: +420 221914267/266053831; fax: +420 221914323.