Abstract

The feature-based software birthmark is an inherent property of software that can be used to detect software theft and for other purposes, such as assessing security in end-to-end communication systems. Research on the feature-based software birthmark shows that combining it with software birthmark estimation can deliver an accurate and effective method for detecting software piracy and the extent of piracy in a piece of software. This can also guide developers in improving the security of end-to-end communication systems. The modern software industry needs an unbiased method for comparing feature-based software birthmarks competently, and more concretely for detecting software piracy and assessing the security of end-to-end communication systems. In this paper, we propose a mathematical model, based on a differential system, to represent the feature-based software birthmark. The model provides a unique representation of the feature-based birthmark of software, which can then be used for comparing birthmarks and assessing the security of end-to-end communication systems. The results show that the proposed model is effective and correct for feature-based software birthmark comparison and security assessment.

1. Introduction

Software piracy is a foremost concern for the software industry, and it has grown with the rapid expansion of the Internet and of the software industry itself. Extensive research [1] into software piracy detection has encouraged the development of techniques such as software watermarking, fingerprinting, and, more recently, the software birthmark. A software birthmark is an inherent characteristic or property of software that can be used effectively to detect software theft and piracy. Software watermarks and fingerprints have long been used in practice, but these techniques have some limitations. Researchers and industry practitioners use advanced versions of software watermarks [1–12], fingerprints [13, 14], software clone detection [15, 16], and software birthmarks [17–29]. Plagiarism detection is closely related to these methods; it is used to detect source code theft and to discover similarities between original and duplicated source code [30–35]. A software watermark expresses the ownership of a piece of software by adding supplementary code or information to the existing software. A software fingerprint is used to identify the intellectual property. Software cloning is the copy-paste of source code from copyrighted software, in part or in full, into another version of the software, and clone detection methods are used to detect piracy in such cases. The software birthmark is the most recent technique for software piracy detection. Birthmarks use the inherent characteristics or properties of software to identify its originality, and the similarity between the birthmarks of two programs shows the extent of piracy between them.

The concept of the software birthmark was introduced for the same purpose of identifying software theft and detecting piracy. Software birthmarks are so far recognized to be resilient to obliteration and obfuscation techniques. Several studies have identified diverse types of software birthmark [20, 24–27, 29, 36–40]. Nazir et al. [36] proposed the feature-based software birthmark and a formal estimation process for software birthmarks [37]. Although software birthmarks have been studied extensively from several viewpoints in the area of software piracy and theft detection, there is still no objective measure for comparing software birthmarks efficiently in order to detect piracy and to assess the security of end-to-end communication systems. The aim of this work is to provide a mathematical model for comparing feature-based software birthmarks and for assessing the security of end-to-end communication systems. The proposed model is based on a system of differential equations and uses the birthmark features presented by Nazir et al. [36]. It can be evaluated against the feature-based birthmark of another (duplicate) piece of software and used to assess the security of end-to-end communication systems. These comparisons ultimately confirm or reject the presence of piracy in the software and of security changes in the applications.

The paper is organized as follows. Section 2 presents related work on software birthmarks and piracy detection. Section 3 gives the details of the proposed mathematical model, the rationale for using a mathematical model for birthmark comparison, and an explanation of the use of differential equations as the system model. Section 4 presents the results together with a case study of the method. Section 5 discusses the findings and the threats to validity, and the paper concludes in Section 6.

2. Related Work

The software industry faces a serious problem of piracy and of security changes in software, while software pirates make vast sums of money from trading in pirated and tampered software. According to the 2013 report of the Business Software Alliance (BSA) [41], about 43% of the software installed on personal computers worldwide was pirated and not properly licensed. The commercial value of this unlicensed software was about 62.7 billion dollars. Myles and Collberg [29] outlined the three foremost threats to the software industry: the illegal reselling of legitimate software, malicious reverse engineering, and software tampering. The software industry adopts diverse practices to trace software theft. Among these, the software birthmark is one of the techniques used to detect pirated software and to trace the pirated or duplicated version. Tamada et al. [42] designed the very first birthmark method, which is based on four types of birthmark: constant values in field variables (CVFV), inheritance structure (IS), sequence of method calls (SMC), and used classes (UC). This technique was used effectively by the software industry to detect software theft. Myles and Collberg [29] suggested the "Whole Program Path Birthmark" (WPPB), which is based on the whole control flow of the program. The properties of resilience and credibility were used to assess the effectiveness of the method, and the results show that WPPB is more resilient than the existing birthmark methods. Zeng et al. [43] proposed a semantic-based abstract interpretation framework for software birthmarks. Mahmood et al. [44] proposed a method-based similarity level for software birthmarks; with this method, the elements of code and their properties can be found, and modifications made to the program can be traced. Wang et al. [45] suggested an operand-stack-dependence-based static software birthmark to address the loss of semantics that occurs when birthmarks are mined with the k-gram algorithm.

Besides proposing different types of birthmark, several researchers have provided case studies to support their analysis and evaluation. Choi et al. [23] analyzed a static API-based software birthmark for Windows binary executables and compared 49 executables. They reported that their birthmark can easily distinguish and identify program copies. The birthmark was checked against the Windows dynamic birthmark and was found likely to be suitable for applications with graphical user interfaces. Kakimoto et al. [28] analyzed birthmark similarities in ArgoUML and visualized them using multidimensional scaling. Park et al. [24] proposed a static API trace birthmark for detecting theft of Java-based programs and assessed it for the properties of resilience and credibility. Their experimental results show that the static API birthmark can identify related components of two packages where other birthmark techniques fail to do so. Xie et al. [46] suggested a static birthmark based on k-grams and their weights, where the weight is computed by analyzing the rate of change in k-gram frequency between the original and modified versions of the program. Myles and Collberg [47] carried out an empirical analysis of the k-gram-based software birthmark on 111 programs written in Java. Several studies [20, 24–27, 29, 35, 36, 42, 43, 48–51] were explored for the types of birthmark, their analysis, and their assessment, while the work in [48, 49] analyzed in depth the birthmarks used for different purposes. In the majority of these studies, only case study results and empirical suggestions are provided as support.

The current research proposes a mathematical model for comparing feature-based software birthmarks and for evaluating the security of end-to-end communication systems. The model is based on a system of differential equations and uses the birthmark features presented in the literature.

3. Methodology

The following subsections describe the proposed research methodology for the feature-based birthmark of software.

3.1. Need for a Mathematical Model

Researchers and practitioners use diverse mathematical methods for modelling real-life phenomena. These techniques include exact equations, linear equations, separation of variables, solution by substitution, and numerical methods, all of which are used to solve first-order differential equations [52].

The software industry seeks a policy-independent and strategy-independent description of software birthmarks that can be used for proper estimation and comparison of birthmarks. Such a definition will help the software industry to detect software theft and piracy as well as changes in the security of end-to-end communication systems. In this work, the recommended feature-based software birthmark [36] is modelled mathematically to enable birthmark comparison based on the defined features. This feature-based comparison identifies the similarities among software programs for the purpose of detecting piracy and changes in the security of end-to-end communication systems.

In this research work, the underlying model takes the form of a homogeneous linear differential system. When solving this type of system, three cases are generally distinguished: repeated eigenvalues, distinct real eigenvalues, and complex eigenvalues. In the proposed research, the eigenvalues are complex.

Mathematically, if $\lambda_1 = \alpha + i\beta$ and $\lambda_2 = \alpha - i\beta$, where $\beta > 0$, are complex eigenvalues of the matrix A, then the corresponding eigenvectors also contain complex values [52]. This study proposes a mathematical model for the feature-based software birthmark to enable comparisons among birthmarks based on the predefined features.

3.2. Terminologies Used for Modelling Software Piracy Detection

The following subsections briefly describe the methods and terminology used in this research for modelling the feature-based birthmark of software.

3.2.1. Differential Model for Software Birthmark

A differential equation contains the derivatives of one or more dependent variables with respect to one or more independent variables [52]. Consider an equation in unknown variables about whose construction no information is available. Such an equation (function) can be represented as, for example, y′ = ϕ(x).

3.2.2. Eigenvalues and Eigenvectors

The characteristic polynomial of a square matrix A is defined by [53]

$$p(\lambda) = \det(A - \lambda I). \tag{1}$$

If $p(\lambda)$ is the characteristic polynomial of the matrix A, then the roots of $p(\lambda)$ are the eigenvalues of A. If λ is an eigenvalue of A and x ≠ 0 satisfies (A − λI)x = 0, then x is an eigenvector corresponding to the eigenvalue λ. In the context of this research, there are three main feature categories, from which a differential system is obtained. This system is a linear differential system. To solve it, we need the eigenvectors corresponding to the eigenvalues.
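To make the definition concrete, the following minimal sketch computes eigenvalues and eigenvectors numerically with NumPy. The matrix `A` below is a hypothetical example, not the coefficient matrix of this paper; it is chosen only because it has one real eigenvalue and one complex-conjugate pair, which is the case considered in this work.

```python
import numpy as np

# Hypothetical 3x3 matrix (an assumption for illustration, NOT the paper's A).
# By construction its eigenvalues are 33 and 9 +/- 4*sqrt(3)*i.
beta = 4.0 * np.sqrt(3.0)  # approximately 6.9282
A = np.array([
    [33.0, 0.0, 0.0],
    [0.0, 9.0, -beta],
    [0.0, beta, 9.0],
])

# eig returns the eigenvalues and a matrix whose columns are eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Check the defining relation (A - lambda*I) v = 0 for every pair.
residuals = [
    np.linalg.norm((A - lam * np.eye(3)) @ eigenvectors[:, k])
    for k, lam in enumerate(eigenvalues)
]

for lam, res in zip(eigenvalues, residuals):
    print(f"lambda = {lam:.4f}, residual = {res:.2e}")
```

The residuals should be at the level of floating-point round-off, which confirms that each returned column is an eigenvector for its eigenvalue.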

3.3. Model for Comparison of Birthmark for Detection of Software Piracy and Assessment of Security in End-to-End Communication Systems

Diverse approaches have been used in the literature in the area of development of healthcare mobile applications. The proposed technique models the feature-based software birthmark mathematically in order to facilitate the comparison of birthmarks and the assessment of the security of end-to-end communication systems based on the identified features. The features used in this study are those already identified in previous research [36, 48, 49]. This feature-based comparison indicates the similarity among different modules of the software, which can then be used to investigate changes in the security of end-to-end communication systems. In this study, we considered the four main feature categories that were previously identified [36]: preconditional features, input features, nonfunctional features, and functional features. These categories are further divided into subcategories. The preconditional category has three subfeatures: program availability, runnable, and identification of components. These features are significant and can be checked for all kinds of programs when detecting similarities. Figure 1 shows the details of the feature-based birthmark of software as already defined [36].

After the initial analysis, the remaining three feature categories are used as the basis of the mathematical model, while the preconditional category is excluded, because it can be examined for all types of software when detecting piracy and security changes. The input feature category is divided into 17 features: program context, program contents, internal data structure, program flow, configurable terminologies, program responses, control flow, size of program, interface description, number of statements in program, naming, functions, restriction, limitation and constraints, comprehensive documentation, global data structure, user interface, and internal quality. The nonfunctional feature category is divided into 12 subfeatures: automation, ease of use, friendly, scalability, applicability, interface connections, robustness, dependency, portability, scope, standard, and external quality. The functional feature category is divided into four subfeatures: data and control process, functional specification, behaviour, and functionality. All of these feature categories are combined and expressed mathematically as a homogeneous linear differential system, equation (2), which in matrix form reads

$$X'(f) = A\,X(f), \qquad X = (x, y, z)^{T},$$

where x, y, and z are the three feature categories, f is the independent variable, and A is the 3 × 3 coefficient matrix of equation (2).

To find the three features x, y, and z, we need to solve equation (2). To obtain the exact solution, we have to find the eigenvalues and eigenvectors of the matrix A. The process is carried out in the following steps.

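Step 1. To find the eigenvalues: according to Section 3.2.2, by using equation (1), the characteristic equation of the matrix A is given by det(A − λI) = 0. After simplification, this is a cubic equation in λ which, consistent with the eigenvalues reported below, reads

$$\lambda^{3} - 51\lambda^{2} + 723\lambda - 4257 = 0.$$

By using synthetic division, we have

$$(\lambda - 33)(\lambda^{2} - 18\lambda + 129) = 0.$$

Thus, the eigenvalues of the matrix A are 33, 9 + 6.9282i, and 9 − 6.9282i, where λ1 is real, λ2 is complex, and λ3 is the complex conjugate of λ2.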
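The eigenvalues quoted in Step 1 can be cross-checked numerically. The sketch below assumes the monic cubic obtained by expanding (λ − 33)(λ − 9 − 6.9282i)(λ − 9 + 6.9282i), with 6.9282 taken as 4√3; it divides out the real root as in synthetic division and then recovers all three roots.

```python
import numpy as np

# Coefficients of lambda^3 - 51*lambda^2 + 723*lambda - 4257, an assumption
# derived from the eigenvalues quoted in the text (33 and 9 +/- 6.9282i).
cubic = np.array([1.0, -51.0, 723.0, -4257.0])

# Divide by (lambda - 33): the polynomial analogue of synthetic division.
quotient, remainder = np.polydiv(cubic, np.array([1.0, -33.0]))

# All three roots of the cubic.
roots = np.roots(cubic)

print("quotient coefficients:", quotient)   # lambda^2 - 18*lambda + 129
print("remainder:", remainder)
print("roots:", roots)
```

A zero remainder confirms that 33 is a root, and the quadratic quotient has discriminant 18² − 4·129 = −192, which gives the complex pair 9 ± 4√3 i ≈ 9 ± 6.9282i.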

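Step 2. To find the eigenvectors corresponding to the eigenvalues:
If λ = 33, then the corresponding eigenvector X satisfies AX = λX, that is, (A − 33I)X = 0. Solving this linear system gives the eigenvector V1 for the eigenvalue λ = 33.
Similarly, the eigenvectors for 9 + 6.9282i and 9 − 6.9282i are obtained by solving (A − λI)X = 0 for each of these eigenvalues. Because the two eigenvalues are complex conjugates and A is real, the corresponding eigenvectors are complex conjugates of each other.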

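Step 3. Thus, the solution of equation (2) is given by

$$X(f) = c_1 V_1 e^{33 f} + c_2 e^{\alpha f}\left(B_1 \cos\beta f - B_2 \sin\beta f\right) + c_3 e^{\alpha f}\left(B_2 \cos\beta f + B_1 \sin\beta f\right),$$

where λ = α + iβ is the complex eigenvalue, B1 is the real part of its eigenvector, B2 is the imaginary part of its eigenvector, and c1, c2, and c3 are constants. Putting the values of the eigenvalues and eigenvectors into this expression gives the components x(f), y(f), and z(f). Putting f = 0 in these expressions and using the initial conditions gives a linear system for the constants, and solving it yields the required solution of (2), where x(f), y(f), and z(f) represent the solution of the differential system (2) for the available features of the software birthmark.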
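The three steps can be sketched numerically as follows. This is a minimal illustration and not the authors' implementation: the matrix `A` and the initial vector `x0` are hypothetical, and the function `solve_linear_system` assumes that the matrix is diagonalizable. It builds the solution as a sum of eigenvector modes and fixes the constants from the initial conditions at f = 0.

```python
import numpy as np

def solve_linear_system(A, x0, f):
    """Return X(f) for X' = A X with X(0) = x0 (A assumed diagonalizable)."""
    eigenvalues, V = np.linalg.eig(A)
    # Step 3 constants: solve V c = x0, the initial conditions at f = 0.
    c = np.linalg.solve(V, np.asarray(x0, dtype=complex))
    # X(f) = sum_k c_k * exp(lambda_k * f) * v_k
    X = V @ (c * np.exp(eigenvalues * f))
    # For real A and real x0 the imaginary parts cancel.
    return X.real

# Hypothetical inputs (assumptions for illustration, NOT the paper's values).
beta = 4.0 * np.sqrt(3.0)
A = np.array([
    [33.0, 0.0, 0.0],
    [0.0, 9.0, -beta],
    [0.0, beta, 9.0],
])
x0 = np.array([1.0, 2.0, 3.0])

x, y, z = solve_linear_system(A, x0, 0.05)
print(f"x(0.05) = {x:.4f}, y(0.05) = {y:.4f}, z(0.05) = {z:.4f}")
```

Writing the solution in complex exponential form and taking the real part is equivalent to the cosine and sine form used for complex eigenvalues; it is chosen here only because it keeps the code short.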
To compare software birthmarks for the purpose of detecting piracy and assessing the security of end-to-end communication systems, the birthmarks of several instances of (the same) software application, defined over the same feature-based birthmark [36], can each be modelled using the given differential system. If the solutions of the resulting differential systems are the same or nearly the same, then the software is a copy of the original software; hence, it is considered pirated, and any differences indicate changes in the security of end-to-end communication systems.
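One possible way to turn "the same or nearly the same" into a computable check is sketched below. Everything in it is an assumption made for illustration: the source does not specify a distance measure, a sampling grid, or a threshold, so the relative Euclidean distance, the grid, and the `tolerance` value are hypothetical choices, as are the example matrices.

```python
import numpy as np

def trajectory(A, x0, grid):
    """Sample the solution of X' = A X, X(0) = x0 on a grid of f values."""
    eigenvalues, V = np.linalg.eig(A)
    c = np.linalg.solve(V, np.asarray(x0, dtype=complex))
    # Rows are sample points; columns are the features x, y, z.
    return np.array([(V @ (c * np.exp(eigenvalues * f))).real for f in grid])

def relative_difference(A_original, A_suspect, x0, grid):
    """Relative distance between the two sampled solutions."""
    X1 = trajectory(A_original, x0, grid)
    X2 = trajectory(A_suspect, x0, grid)
    return np.linalg.norm(X1 - X2) / np.linalg.norm(X1)

def looks_like_copy(A_original, A_suspect, x0, grid, tolerance=0.05):
    """Hypothetical decision rule: solutions within `tolerance` of each other."""
    return relative_difference(A_original, A_suspect, x0, grid) <= tolerance

# Hypothetical systems (NOT taken from the paper).
beta = 4.0 * np.sqrt(3.0)
A_original = np.array([[33.0, 0.0, 0.0], [0.0, 9.0, -beta], [0.0, beta, 9.0]])
A_unrelated = np.eye(3)
x0 = np.array([1.0, 2.0, 3.0])
grid = np.linspace(0.0, 0.1, 11)

print(looks_like_copy(A_original, A_original, x0, grid))
print(looks_like_copy(A_original, A_unrelated, x0, grid))
```

A relative measure is used so that the check does not depend on the overall scale of the feature values; other norms or thresholds may be more appropriate in practice.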

4. Results and Discussion

The following subsection presents the case study used to validate the proposed model, together with its results.

4.1. Experimentation with a Case Study and the Results

The proposed mathematical model for the feature-based software birthmark has been validated through a case study. The case study tested an Android mobile application for its feature-based software birthmark. Multiple versions of the application were generated to support and validate the comparison process. Copies of the Android application were modified (in part) to add and remove portions of functionality. These modifications were made by third-party developers. This was done to mimic pirated copies of the test application and to assess the security of end-to-end communication systems.

After the modified copies of the Android application were obtained, the feature-based software birthmark was derived individually from each copy, as shown in equation (2). The features of each copy, along with their details, were extracted and considered for checking piracy among the applications and for assessing the security of end-to-end communication systems. The birthmarks of the pirated copies were then compared with the feature-based software birthmark of the original application to determine whether piracy and security changes had occurred, and to show the similarities and the security view between the original and pirated copies. Figure 2 shows the case study performed for feature extraction from the original and pirated versions of the software and the comparison process.

The differential system in equations (18)–(20) was taken as an example to show the validity of the model, and its exact solution is given in equation (21).

Equation (21) satisfies equations (18)–(20): if equation (21) is substituted into equations (18)–(20), the left-hand side equals the right-hand side. This shows that the proposed model works well. Because equation (21) is the exact solution of equations (18)–(20), it satisfies them for all values of the variables x, y, and z, each of which can be any real number. Likewise, the threshold for the variables x, y, and z can be any real number.
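The substitution check described above, in which the left-hand side is compared with the right-hand side, can be sketched numerically. Because equations (18)–(21) are not reproduced here, the sketch uses a hypothetical system and a hypothetical closed-form candidate solution; only the checking procedure itself is the point. The derivative on the left-hand side is approximated by a central finite difference.

```python
import numpy as np

# Hypothetical system X' = A X and initial values (NOT equations (18)-(20)).
BETA = 4.0 * np.sqrt(3.0)
A = np.array([[33.0, 0.0, 0.0], [0.0, 9.0, -BETA], [0.0, BETA, 9.0]])
X0 = np.array([1.0, 2.0, 3.0])

def candidate_solution(f):
    """Closed-form candidate for the hypothetical system above."""
    growth = np.exp(9.0 * f)
    c, s = np.cos(BETA * f), np.sin(BETA * f)
    return np.array([
        X0[0] * np.exp(33.0 * f),
        growth * (X0[1] * c - X0[2] * s),
        growth * (X0[1] * s + X0[2] * c),
    ])

def residual(f, h=1e-6):
    """Largest |LHS - RHS| at f, with LHS = X'(f) and RHS = A X(f)."""
    lhs = (candidate_solution(f + h) - candidate_solution(f - h)) / (2.0 * h)
    rhs = A @ candidate_solution(f)
    return np.max(np.abs(lhs - rhs))

for f in (0.0, 0.02, 0.05):
    print(f"f = {f:.2f}, residual = {residual(f):.2e}")
```

A residual close to zero at every sampled point indicates that the candidate satisfies the system there; a numerical check of this kind supports, but does not replace, the analytical substitution.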

The proposed feature-based model accepts software as input and compares the features of the original software with those of a fully or partially pirated copy. This comparison can ultimately show the extent of piracy and of the changes made to the security of end-to-end communication systems. In the case study, features of the original software were extracted as shown at the top of Figure 2. Features were then extracted from the pirated copies, as shown on the right side of Figure 2. The comparison of these features is expressed mathematically in equations (18)–(20), with their solution in equation (7). From the above description, the proposed model works well, and piracy and changes can be detected to a satisfactory level.

Furthermore, several other examples were tried to show the validity of the proposed method.

A numerical solution can also be found, although it always contains some error. The proposed mathematical model accepts features as inputs, as shown in equation (2), to check for piracy and security changes using the feature-based software birthmark. In the current case study, features were extracted from multiple copies of the Android mobile application to show the piracy and security changes among these copies.

5. Discussion

The software development and end-to-end communication industries use diverse approaches to detect software piracy and to assess the security of end-to-end communication systems. Techniques such as watermarks, fingerprints, and digital signatures have been used to show the originality of software, but they have limitations: with code obfuscation and semantics-preserving transformations, watermarks and digital signatures can be removed. Because of these limitations, the concept of the software birthmark came into existence. Software birthmarks are considered highly valuable, resilient to obliteration, and able to identify specific software uniquely. Software features fall into several categories, and a program is a combination of several types of features. Investigating the code of a program on the basis of the defined features supports the detection of similarity among multiple instances of seemingly identical software applications. Such similarity detection ultimately facilitates identifying software theft and piracy. The feature-based birthmark provides a more comprehensive birthmark, and hence a more comprehensive representation, of a piece of software. The differential-system-based mathematical model proposed in this study, which uses eigenvalues and eigenvectors, provides a unique solution for the feature-based birthmark of software. This solution provides an unbiased measure for comparing feature-based software birthmarks, which can be used to check for piracy and to assess security in end-to-end communication systems.

5.1. Threat to Validity

A software birthmark is an inherent characteristic of software that is used to detect software theft and can also be used for other purposes, such as showing the ownership of the software and detecting the level of piracy in it. The existing literature was searched to analyze prior efforts in the area of software birthmarks, but some work may have been missed because of limits on open access and availability of research. Validation of the work is also essential; it is only partly covered by this research, and the work was validated through expert opinion.

6. Conclusion

This research has presented a mathematical model, based on a differential system, for comparing feature-based software birthmarks and assessing security in end-to-end communication systems. These comparisons ultimately reveal piracy and changes in the security of end-to-end communication systems. The main objective of the study is to compare the feature-based software birthmark introduced by Nazir et al. [36]. The feature-based birthmark is divided into categories that include input features, functional features, and nonfunctional features, which are jointly known as the software birthmark. The contribution of this paper is a mathematical model, based on a differential system, that supports the comparison of feature-based software birthmarks for piracy detection and for security assessment of end-to-end communication systems. The solutions of the differential system, obtained from the eigenvalues computed for the feature categories of the birthmarks, provide an unbiased measure and an effective means of comparing software birthmarks for the purpose of detecting piracy. Therefore, this model-based comparison can make software piracy and theft detection smoother and can help assess the security of end-to-end communication systems.

Data Availability

No primary data were collected.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by Qatar National Library, Qatar.