Applications of clustering techniques to software partitioning, recovery and restructuring
Introduction
Non-functional quality attributes such as maintainability and reliability are essential factors in controlling software life-cycle costs. It is widely acknowledged that maintaining existing software accounts for as much as 60–80% of a software system’s total cost. Cohesion and coupling are two properties that strongly affect critical software quality attributes, including maintainability. Managing cohesion and coupling is therefore of critical importance for system design and cost reduction.
Cohesion refers to a component’s internal strength, that is, the strength that holds the internal elements of a component together to perform a certain function. The term component is used generically in this paper: it could be a high-level architecture component, a module consisting of procedures, a procedure, a class, or even a variable. While cohesion is an intra-component property, coupling measures the interdependence among components. A desirable system partitioning should achieve high cohesion and low coupling, so that all the elements in one component are closely related for the realization of a certain feature, and changes made to that component have as little impact as possible on other components. Alexander (1964) also postulated that the major design principle common to all engineering disciplines is the relative isolation of one component from other components.
Software engineering is a relatively new area compared to well-established disciplines such as mechanical engineering and manufacturing. Software partitioning is usually conducted in an ad hoc manner and is based primarily on the designer’s experience. Moreover, software systems may be ill-designed from the start, or may drift or erode over time as requirements and technology change (Perry and Wolf, 1992). The resulting system can become highly coupled, which in turn creates problems for downstream software phases or evolution. Thus, effective partitioning or re-partitioning is needed. Effective partitioning or clustering is also a paramount goal in other disciplines, where clustering techniques have been used successfully to group similar components and support the partitioning of a system. In this research, clustering and partitioning are viewed as two sides of the same coin. Partitioning is a top-down approach that decomposes a system into smaller subsystems. Clustering, on the other hand, is a bottom-up method: similar components are grouped together to form clusters or subsystems. Those clusters or subsystems are the partitions that constitute a system.
In fact, partitioning or clustering analysis has been of long-standing interest and is a fundamental method used in science and engineering. The technique can facilitate better understanding of the observations and the subsequent construction of complex knowledge structures from features and component clusters. For instance, it has been used to classify botanical species and mechanical parts. The key concept of clustering is to group similar things together to form a set of clusters, such that intra-cluster similarity (cohesion) is high and inter-cluster similarity (coupling) is low. This objective of high cohesion and low coupling is the same one pursued in software design.
Various clustering techniques have also been studied in software engineering. In this paper, we borrow some clustering ideas from established disciplines and tailor them to software partitioning, recovery, and restructuring. The clustering techniques adopted in this paper are based on numerical taxonomy, or agglomerative hierarchical approaches. Numerical taxonomy uses numerical methods to classify components. There are several reasons for adopting it. The first is its conceptual and mathematical simplicity, as will be demonstrated in Section 2. Although its concept is simple, no scientific study has shown that numerical taxonomy is inferior to other, more complex multivariate methods (Romesburg, 1990). Another reason is that existing clustering techniques used in software engineering are often limited to the reverse engineering process and are based on source code. The approach presented in this paper can easily be applied at various levels of abstraction and used in round-trip engineering (i.e., both forward and reverse engineering). Furthermore, the technique can add value by facilitating software (design or code) restructuring, rather than design recovery alone. Lastly, the computation is fast, which is an important factor if the technique is applied interactively or incrementally.
The objective of this paper is to examine existing numerical clustering techniques used in other well-established disciplines, tailor those techniques for various software applications, and present empirical studies of the techniques in software engineering. The approach has been applied to several projects at Nortel Networks, and some of the results are presented in this paper. The rest of the paper is organized as follows. Section 2 presents an overview of the clustering technique and discusses the method adopted for this research and the rationale behind it. Section 3 demonstrates several practical applications of the clustering technique to software partitioning, recovery, and restructuring. Section 4 discusses some lessons learned from applying the approach to various projects. Section 5 highlights some related work in software engineering. Finally, Section 6 presents the summary and discusses future directions.
Clustering
This section first describes the general concept behind the numerical taxonomy clustering technique. Following that, we will discuss the method adopted in this research.
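To make the agglomerative hierarchical idea concrete, the following is a minimal sketch rather than the method used in the paper. It assumes pairwise similarities are already available and, as an illustrative choice not stated in this excerpt, uses average linkage to compare clusters. The function name `agglomerate`, the `threshold` parameter, the component names, and the similarity values are all hypothetical.

```python
from itertools import combinations


def agglomerate(names, sim, threshold=0.0):
    """Bottom-up clustering: repeatedly merge the two most similar clusters.

    names     -- component identifiers
    sim       -- dict mapping frozenset({p, q}) to a similarity in [0, 1]
    threshold -- stop merging once the best similarity falls below this value
    Returns (clusters, merges), where merges records each merge step.
    """
    clusters = [frozenset([n]) for n in names]
    merges = []

    def linkage(c1, c2):
        # Average linkage (an assumption): mean similarity over all
        # cross-cluster pairs of components.
        total = sum(sim[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        score = linkage(c1, c2)
        if score < threshold:
            break
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), score))
    return clusters, merges


# Hypothetical similarities among four components.
similarity = {
    frozenset("AB"): 0.9, frozenset("CD"): 0.8,
    frozenset("AC"): 0.2, frozenset("AD"): 0.1,
    frozenset("BC"): 0.3, frozenset("BD"): 0.2,
}
clusters, merges = agglomerate("ABCD", similarity, threshold=0.5)
print(clusters)
```

With these made-up values, A and B merge first, then C and D, and merging stops because the remaining pair of clusters is less similar than the threshold. The recorded merge history is what a dendrogram would display.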
Applications of clustering to software
This section presents applications of the Sorenson method and demonstrates different ways to define and obtain the contents of the input matrix. By defining the matrix differently, we show the various uses of the clustering method. Specifically, examples include software partitioning, recovery, restructuring, and decoupling. A generic example is given first as an illustration of the technique. Some measures have also been adopted which can be used to quantitatively evaluate clustering
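As a hedged illustration of how a similarity value can be derived from rows of a binary input matrix, the sketch below computes the Sørensen–Dice coefficient, 2a / (2a + b + c), and the simple matching coefficient, (a + d) / (a + b + c + d). It assumes that the "Sorenson method" named above corresponds to the Sørensen–Dice coefficient. The function names and the two feature vectors are hypothetical; the vectors were chosen so that a = 1, b = 1, c = 1, d = 2.

```python
def match_counts(x, y):
    """Return (a, b, c, d) for two equal-length binary feature vectors.

    a: features present in both, b: only in x, c: only in y, d: in neither.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def sorensen_dice(x, y):
    """2a / (2a + b + c): joint absences (d) are ignored."""
    a, b, c, _ = match_counts(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def simple_matching(x, y):
    """(a + d) / (a + b + c + d): joint absences count as agreement."""
    a, b, c, d = match_counts(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0


# Two hypothetical components described by five binary features.
comp_1 = [1, 1, 0, 0, 0]
comp_2 = [1, 0, 1, 0, 0]
print(sorensen_dice(comp_1, comp_2))    # 0.5
print(simple_matching(comp_1, comp_2))  # 0.6
```

The two coefficients differ only in how they treat features that both components lack. Ignoring those joint absences, as Sørensen–Dice does, prevents components from appearing similar merely because neither uses most features.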
Lessons learned
As discussed earlier in this paper, the approach has been applied to various software applications. Specifically, we have applied the approach to software architecture partitioning at the design stage, design recovery in the reverse engineering process, software restructuring to support evolution, and source code decoupling. The approach provides a useful method for revealing the degree of similarity or coupling among components.
When applied to partitioning, as discussed in Section 3.1, there
Related work
Applications of the clustering concept specific to software partitioning have been studied. Andreu and Madnick (1977) applied the partitioning concept to a database management system in order to minimize coupling. The requirements and their interdependencies were first identified and then converted to a graph problem. Various alternatives for partitioning were examined and a quantitative metric was calculated for each alternative. The alternative with the lowest value of coupling was chosen
Summary and future work
This paper presented a clustering method and demonstrated how it can be applied to software partitioning, recovery, restructuring, and decoupling. The key value of this approach is that it supports a rapid and effective evaluation of a system based on the relationships between components and features, or on component interdependencies at various levels of abstraction. System partitioning is usually performed by designers based on their experience. The proposed method can help designers quickly
Acknowledgements
Much of the research presented in this paper was carried out while the authors were affiliated with Nortel Networks. We are grateful to Kalai Kalaichelvan and Rama Munikoti for their support. We would also like to express our gratitude to the reviewers and editors for their time and helpful comments.
References (35)
- Quantitative models of cohesion and coupling in software. J. Syst. Software (1995)
- A unified framework for expressing software subsystem classification techniques. J. Syst. Software (1997)
- Notes on the Synthesis of Form (1964)
- Cluster Analysis for Applications (1973)
- Andreu, R.C., Madnick, S.E., 1977. A systematic approach to the design of complex systems: application to DBMS design...
- Anquetil, N., Lethbridge, T., 1998. Extracting concepts from file names: a new file clustering criterion. In:...
- Anquetil, N., Lethbridge, T., 2000. Experiments with clustering as a software remodularization method. In: Proceedings...
- Software Reengineering (1993)
- et al. Measuring design-level cohesion. IEEE Trans. Software Eng. (1998)
- et al. Measuring functional cohesion. IEEE Trans. Software Eng. (1994)
- Measuring Software Design Quality
- Cornering the Chimera. IEEE Software
- Cluster Analysis
- System structure analysis: clustering with data bindings. IEEE Trans. Software Eng.
Cited by (66)
Automatic clustering constraints derivation from object-oriented software using weighted complex network with graph theory analysis
2017, Journal of Systems and Software
A search engine for finding and reusing architecturally significant code
2017, Journal of Systems and Software
Citation Excerpt: In addition, some public repositories (e.g., GitHub and SourceForge) support the search of code snippets and/or software projects. Several researchers have developed techniques for identifying architectural information in the source code (Tsantalis et al., 2006; Velasco-Elizondo et al., 2016; Lung et al., 2004; Cai et al., 2013; Jansen et al., 2008; Huang et al., 2006), however, to the best of our knowledge, no one has attempt to build a search engine to help developers find and reuse such architectural information. Our work is the first attempt to develop a tactic search engine that can assist developers find and reuse tactical code snippets.
Efficient software clustering technique using an adaptive and preventive dendrogram cutting approach
2013, Information and Software Technology
Citation Excerpt: An example of a class diagram is shown in Fig. 2, while the features matrix is presented in Table 2. The Sorensen–Dice coefficient is used in some research on software clustering [23–25]. Using Fig. 2 and Table 2 as an example, if Simple Matching is used, the coefficient between Class 1 and Class 5 (a = 1, b = 1, c = 1, d = 2) will be 0.6 and will be 0.5 if Sorensen–Dice is used.
Incorporating Use History in Information System Remodularization
2024, IEEE Transactions on Engineering Management
Software Architecture in Practice: Challenges and Opportunities
2023, ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering