1 Introduction

Model-based testing (MBT) consists in reusing the specification for testing purposes. It is one of the main applications of formal methods and it offers several advantages over classical testing procedures. Test cases are derived from models and subsequently used to test the code. In the classical MBT approach, the model is abstract, yet it should contain enough detail to test all the desired aspects of the SUT (system under test). The designer should spend a good amount of time validating the model before it can be used for test generation and conformance testing [5, 11, 17, 19]. If a conformance fault is found, the system (or sometimes the model) should be modified. If no error is found, the designer gains confidence that the SUT conforms to its specification. MBT does not suffer from the weaknesses of code testing based on coverage criteria, such as the inability to detect missing logic [23]. On the other hand, the classical MBT approach has several drawbacks that we try to address in this paper: (a) Before testing can start, considerable effort must be spent to obtain a correct and complete model, so testing can start only late in the SUT life cycle. (b) Focusing only on the specification level may leave some critical implementation parts uncovered; for instance, if the specification misses some critical cases that are instead handled in the code, MBT will not test them. (c) If no fault is found, it may not be clear whether the testing activity has been sufficient. In general, if one still has resources to spend on testing, there is no guidance on where these resources should be spent.

In this paper we propose an iterative approach which is based on the use of Abstract State Machines (ASMs) and combines conformance testing [15, 25, 28, 30] with the refinement methodology [8] guided by code coverage. Initially the designer models the system at a high level with a first ASM. This model must be validated in a classical way (by simulation and property verification, for example). Starting from this ASM model, tests are generated and executed on the real system. A coverage report is provided with information about which parts of the code are not covered by the model. Based on this information, the developer refines the initial ASM model by adding details about the uncovered parts of the real system code. The process is executed iteratively until good coverage is reached. This process mixes a black-box approach, where tests are generated from the specifications, and a white-box approach, where the code is instrumented and coverage information is collected in order to understand where the models must be refined. We emphasize that models are not modified arbitrarily: they must be refined as defined by the ASM refinement [8].

The approach we propose in this paper makes use of Abstract State Machines, but it can be applied to any formal method that supports refinement and test-case generation.

The paper is structured as follows. In Sect. 2 we introduce Abstract State Machines, their supporting tool, the refinement of ASMs, and the IEEE 11073-20601 protocol used in our case study. Our approach of combining testing and model refinement is explained in Sect. 3 and its application to the case study is presented in Sect. 4. The evaluation of the results and a comparison with other techniques (mainly combinatorial testing) are presented in Sect. 5.

2 Background

This work is based on the use of Abstract State Machines (ASMs) [14], which are an extension of Finite State Machines (FSMs) in which unstructured control states are replaced by states with arbitrarily complex data. They are presented in this section along with the case study of the IEEE 11073-20601 protocol [1], which is a core component in the standards family of IEEE 11073 Personal Health Data (PHD).

2.1 ASM and the Asmeta Framework

ASM states are mathematical structures, i.e., domains of objects with functions and predicates defined on them, and the transition from one state \(s_i\) to another state \(s_{i + 1}\) is obtained by firing transition rules (see Fig. 1). Functions are classified as static (never change during any run of the machine) or dynamic (may change as a consequence of agent actions or updates). Dynamic functions are distinguished between monitored (only read by the machine and modified by the environment) and controlled (read in the current state and updated by the machine in the next state).
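To make the step semantics concrete, the following is a minimal Python sketch of a single ASM step, not the Asmeta simulator. It assumes a state is a plain dictionary of locations, a rule is a function returning proposed updates, and the names `step`, `run`, `count_rule`, `counter`, and `enabled` are hypothetical. The environment writes the monitored locations, the rules only read them, and the updates to controlled locations become visible in the next state.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]               # location name -> value
Update = Tuple[str, object]             # (location, new value)
Rule = Callable[[State], List[Update]]  # a rule reads the state and proposes updates


def step(state: State, rules: List[Rule], monitored: State) -> State:
    """Perform one step: read monitored values, fire all rules, apply the updates."""
    current = {**state, **monitored}    # the environment sets the monitored locations
    updates: Dict[str, object] = {}
    for rule in rules:
        for loc, val in rule(current):
            if loc in monitored:
                raise ValueError(f"rule tries to update monitored location {loc!r}")
            if loc in updates and updates[loc] != val:
                raise ValueError(f"inconsistent update of {loc!r}")
            updates[loc] = val
    return {**current, **updates}       # controlled locations change in the next state


def count_rule(s: State) -> List[Update]:
    """Toy rule: increment a controlled counter while the monitored flag is set."""
    return [("counter", s["counter"] + 1)] if s["enabled"] else []


def run(initial: State, inputs: List[State]) -> List[State]:
    """Build the sequence of states obtained by feeding one monitored input per step."""
    states = [initial]
    for monitored in inputs:
        states.append(step(states[-1], [count_rule], monitored))
    return states
```

All rules are evaluated on the same current state and their updates are applied together, which is why two rules writing different values to the same location are reported as an error rather than resolved by ordering.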

Fig. 1. An ASM run with a sequence of states and state transitions (steps)

Fig. 2. The ASM development process powered by the Asmeta framework

The ASM method can facilitate the entire life cycle of software development, i.e., from modeling to code generation. Figure 2 shows the development process based on ASMs supported by the Asmeta (ASM mETAmodeling) framework [9], which provides a set of tools to help the developer in various activities:

  • modeling: the system is modeled using the language AsmetaL. The user is supported by the editor AsmEE and by AsmetaVis, the ASMs visualizer which transforms the textual model into a graphical representation.

  • validation: the process is supported by the model simulator AsmetaS, the scenario executor AsmetaV, and the model reviewer AsmetaMA. The simulator AsmetaS supports two types of simulation: interactive simulation (the user inserts the values of monitored functions) and random simulation (the tool randomly chooses the values of monitored functions among those available). AsmetaV executes scenarios written in the Avalla language. Each scenario contains the expected system behavior, and the tool checks whether the machine runs correctly. The model reviewer AsmetaMA performs static analysis. It determines whether a model has sufficient quality attributes, e.g., minimality (the specification does not contain elements that are defined or declared in the model but never used), completeness (every behavior of the system is explicitly modeled), and consistency (locations are never simultaneously updated to different values).

  • verification: the properties derived from the requirements document are verified to check whether the behavior of the model complies with the intended behavior. The AsmetaSMV tool supports this process.

  • testing: the tool ATGT generates abstract unit tests starting from the ASM specification by exploiting the counterexample generation of a model checker (NuSMV).

  • code generation: the final ASM specification is automatically translated into C++ code [12, 32]. Moreover, the abstract tests generated by the ATGT tool are translated into unit tests [13].

2.2 ASM Refinement

The modeling process of an ASM is based on model refinement. The designer starts with a high-level description of the system and he/she proceeds through a sequence of more detailed models each introducing, step-by-step, design decisions and implementation details. In ASM, stuttering refinement is introduced in [8]. It consists in adding state functions and rules in a way that one step in the ASM at higher level can be performed by several steps in the refined model. The refinement is correct if any behavior (i.e., run or sequence of states) in the refined model can be mapped to a run in the abstract model. In this way, the refined ASM preserves the behaviors of the abstract machine. At the end, the designer builds a chain of refined models \(\mathsf {ASM_0}, \dots ,\mathsf {ASM_{n}}\) and the AsmRefProver tool checks whether \(\mathsf {ASM_{i}}\) is a correct refinement of \(\mathsf {ASM_{i-1}}\). We note that an important question in this process is when to stop the refinement. In other words, how many details would we consider adequate in the final refined model, i.e., \(\mathsf {ASM_{n}}\)? This question is one of the motivations behind the work presented in this paper.
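The mapping between refined and abstract runs can be illustrated at the level of a single pair of runs. The sketch below is a simplification written for this purpose: it is not the definition from [8] nor what AsmRefProver checks, since both work on models rather than on individual traces. It assumes that an abstraction function projects each refined state onto an abstract state, and that several refined steps mapping to the same abstract state count as stuttering. The names `destutter`, `maps_to`, and the `phase` component of the refined states are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, TypeVar

R = TypeVar("R")


def destutter(run: Sequence[Hashable]) -> List[Hashable]:
    """Collapse consecutive repetitions of the same state."""
    out: List[Hashable] = []
    for s in run:
        if not out or out[-1] != s:
            out.append(s)
    return out


def maps_to(refined_run: Sequence[R],
            abstract_run: Sequence[Hashable],
            abstraction: Callable[[R], Hashable]) -> bool:
    """Check that the projected refined run equals the abstract run up to stuttering."""
    projected = [abstraction(s) for s in refined_run]
    return destutter(projected) == destutter(abstract_run)


# Abstract states are status names; refined states add a hypothetical phase.
abstract_run = ["Unassociated", "Operating", "Unassociated"]
refined_run = [
    ("Unassociated", "idle"),
    ("Unassociated", "checking"),   # extra refined step, invisible at the abstract level
    ("Operating", "idle"),
    ("Unassociated", "idle"),
]
ok = maps_to(refined_run, abstract_run, abstraction=lambda s: s[0])
```

Here the two refined steps spent in `Unassociated` collapse into the single abstract state, so the refined run is accepted.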

2.3 IEEE 11073 PHD Communication Model

IEEE 11073-20601 defines a communication model that allows personal healthcare devices to exchange data with devices with more computing resources like mobile phones, set-top boxes, and personal computers. The measured health data exchanged between these devices can be transmitted to healthcare professionals for remote health monitoring or health advising.

IEEE 11073 PHD defines an efficient data exchange protocol as well as the necessary data models for communication between two types of devices, i.e., the agent and the manager. Agents are personal healthcare devices that are used to obtain measured health data from the user. They are normally portable, energy-efficient and have limited computing capacity. Examples of agent devices include blood pressure monitors, weighing scales and blood glucose monitors. Managers are computing devices that are used to manage and process the data collected by agents. Managers typically have more computing resources than agents. Examples of managers include mobile phones, set-top boxes, and personal computers.

The messages, called APDUs, at low level are encoded in ASN.1 format, and should support at least the MDER (Medical Device Encoding Rules) standard. The communication must have one primary, reliable virtual channel, plus some secondary virtual channels.

The message types are divided into the following categories:

  • messages related to the association procedure: aarq (Association Request), aare (Association Response), rlrq (Association Release Request), rlre (Association Release Response), abrt (Association Abort);

  • messages related to the confirmed service mechanism: roiv-* (Remote Operation Invoke messages): roiv-cmip-confirmed-action, roiv-cmip-confirmed-event-report, roiv-cmip-confirmed-set; and rors-* (Reception of Response messages): rors-cmip-confirmed-action, rors-cmip-confirmed-event-report, rors-cmip-get;

  • messages related to fault or abnormal conditions: roer (Reception of Error Result), rorj (Reception of Reject Result);

  • messages related to the unconfirmed service mechanism: roiv-cmip-action, roiv-cmip-event-report, roiv-cmip-set.

Fig. 3. State machine of the IEEE 11073 PHD Manager

Fig. 4. An example sequence of data exchange

IEEE 11073 State Machine Diagram. There are seven states in the manager state machine defined by the IEEE 11073 specification, as shown in the specification diagram in Fig. 3. We use an example scenario to illustrate how the agent and manager exchange data. In Fig. 4, a weighing scale (our agent device) sends an association request to the manager, containing device configuration information. If the manager recognizes such information, it sends a response of association acceptance, and both devices enter the Operating state. Then the agent sends measured data to the manager with a Confirmed Event Report APDU, and the manager responds with an acknowledgment. Finally, the agent requests to release the association; the manager responds to this request, and both devices enter the Unassociated state.

Fig. 5. An overview of the applied framework

3 Conformance Testing with Model Refinements

The proposal of this paper is to combine model refinement with testing in order to perform more efficient conformance testing of a real system. The process we propose is depicted in Fig. 5 and explained in the following.

We assume that at the beginning the user specifies the core functionalities of the system by means of an initial ASM, \(\mathsf {ASM_0}\) in the picture. \(\mathsf {ASM_0}\) captures the most critical behaviors but it leaves some details and behaviors out of the specification. \(\mathsf {ASM_0}\) is validated by means of techniques like those introduced in Sect. 2.1. Even if it is simple, \(\mathsf {ASM_0}\) must be suitable for test generation and test execution, i.e., it must be possible to derive some tests from it and execute them on the real system. During the testing activity, conformance of the system is checked and information about the coverage of the code is collected. Such coverage information is then used to guide the refinement of \(\mathsf {ASM_0}\) in order to obtain a more detailed version \(\mathsf {ASM_1}\). For instance, if some code statements and branches are not covered the first time, the user has to insert the corresponding functionalities in the new version of the abstract state machine. Some V&V activities are then performed over the new specification. Then the testing process starts over again: tests are derived and executed, and the coverage information is collected and used to drive the next refinement step. This methodology addresses the issues presented in the introduction in several ways:

  1. Conformance testing activity can start immediately after a simple first ASM is developed. It is not required to have a complete specification and the most critical behaviors can be tested from the beginning. V&V results on the previous step are not lost during refinement since it preserves the original behaviors (according to the definition given in Sect. 2.2).

  2. By analyzing the code coverage, the tester can identify if the specification misses some important areas of functionality that are correctly implemented in the code.

  3. Even when no fault has been found, code coverage can give a measure of how much the implementation has been tested and which functionalities and details should be added to the specification.

  4. This methodology enables an interleaving approach to perform model verification and testing. Thus, it allows closer interaction between the two activities. In particular, the alternating views of model and implementation could help discover problems that would otherwise not be discovered.

In the following we better explain each step of the process.

Test Generation from ASMs. Starting from an ASM, test sequences can be generated via different approaches present in the literature. We consider test generation based on the following coverage criteria, defined in [20]:

  • Basic Rule Coverage. A test suite satisfies the basic rule coverage if for every rule \(r_i\) there exists at least one test sequence for which \(r_i\) fires at least once, and there exists at least one test sequence for which \(r_i\) does not fire at least once.

  • All-Rule Coverage. A test suite satisfies the all-rule coverage if it satisfies the basic rule coverage plus the Rule Guard coverage and the MCDC coverage described in [20].

According to these criteria, we generate the tests using the tool ATGT, which builds abstract tests starting from the ASM specification by exploiting the counterexample generation of the NuSMV model checker.
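As an illustration of the basic rule coverage criterion, the sketch below checks a given test suite against a set of rules. It is not how ATGT works (ATGT generates the tests through NuSMV counterexamples); it only evaluates the criterion on tests that already exist. It assumes that each test sequence is recorded as a list of steps, each step being the set of rule names that fired, and it reads "does not fire at least once" as "there is at least one step in which the rule does not fire". The rule names in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Step = Set[str]            # names of the rules that fired in one step
TestSequence = List[Step]  # one test sequence = list of steps


def basic_rule_coverage(rules: Iterable[str],
                        suite: List[TestSequence]) -> Dict[str, bool]:
    """For each rule, report whether the suite both fires it and skips it."""
    report: Dict[str, bool] = {}
    for rule in rules:
        fires = any(rule in step for seq in suite for step in seq)
        skips = any(rule not in step for seq in suite for step in seq)
        report[rule] = fires and skips
    return report


rules = ["r_associate", "r_release"]            # hypothetical rule names
suite = [[{"r_associate"}, {"r_release"}]]      # one sequence with two steps
report = basic_rule_coverage(rules, suite)
satisfied = all(report.values())
```

In this example each rule fires in one step and stays silent in the other, so the single sequence is enough to satisfy the criterion for both rules.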

Test Execution and Coverage Information. Once abstract tests are generated, they must be executed on the real implementation so that coverage information can be collected. There are several methodologies to obtain concrete test cases from abstract ones [27]. In our case we use the external tool ProTest [34], which will be presented later.

Model Refinement Guided by the Coverage Information. During the testing activity, coverage information is collected. This requires access to the implementation, which must be instrumented to produce event logs or behavior traces. Our approach is thus not a classical black-box testing approach, but rather a gray-box approach. The goal of this activity is to discover which parts or features of the system are not exercised by the tests derived from the abstract model. This information gives a hint about what is missing in the model (i.e., the ASM) and suggests to the user what to add. New behaviors are added to the ASM regardless of how they are implemented in the code. This must be done by preserving the behavior tested so far, and it is performed by applying the refinement approach explained in Sect. 2.2.
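The overall loop of this section can be summarized by the following sketch. It is a schematic rendering under strong assumptions, not the tool chain used in the paper: test generation, execution, and coverage measurement are collapsed into a single `run_tests` callable, the refinement step (which in practice is a manual modeling activity) is a `refine` callable, and the stopping threshold `target` is chosen by the user. The toy stand-ins, including the feature names, are hypothetical.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Model = FrozenSet[str]  # toy model: the set of features it describes


def refine_until_covered(
    model: Model,
    run_tests: Callable[[Model], Tuple[float, Set[str]]],
    refine: Callable[[Model, Set[str]], Model],
    target: float,
    max_iterations: int = 10,
) -> Tuple[Model, List[float]]:
    """Alternate testing and refinement until the coverage target is reached."""
    history: List[float] = []
    for _ in range(max_iterations):
        coverage, uncovered = run_tests(model)   # generate, execute, measure
        history.append(coverage)
        if coverage >= target or not uncovered:
            break
        model = refine(model, uncovered)         # add what the model is missing
    return model, history


# Toy stand-ins: the "implementation" is a set of features, and coverage is
# the fraction of them that the model describes.
IMPLEMENTED = {"association", "configuration", "error", "protocol_id"}


def toy_run_tests(model: Model) -> Tuple[float, Set[str]]:
    covered = IMPLEMENTED & model
    return len(covered) / len(IMPLEMENTED), IMPLEMENTED - model


def toy_refine(model: Model, uncovered: Set[str]) -> Model:
    return model | {sorted(uncovered)[0]}        # add one missing feature per step


final_model, history = refine_until_covered(
    frozenset({"association"}), toy_run_tests, toy_refine, target=0.75)
```

The returned history records the coverage reached after each iteration, which gives the user a basis for deciding whether a further refinement is worth its cost.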

4 Application to the PHD Communication Module

In this section we present how the proposed methodology can be applied to test the conformance of an implementation of the IEEE 11073 PHD communication protocol, to its specification. We present how the tests were executed, which steps of refinements were applied, and which coverage was achieved.

4.1 Test Execution and Coverage Information

The abstract tests generated from ASMs are sequences of abstract states that must be translated into concrete tests that can be executed on the system under test. For this purpose, we use ProTest [34], which includes a test agent that interacts with the manager implementation. Each abstract state contains all the necessary information about the transition to be triggered in that state; ProTest builds the APDU message, sends it to the manager implementation, and checks the conformance of the response from the manager.
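The conformance check performed at each step can be pictured as follows. This is a minimal sketch and does not reproduce the ProTest API: building and encoding the APDU is hidden behind a hypothetical `send` callable, an abstract step is reduced to a pair (transition, expected response), and the stub manager and the response name `rx_aare` are invented for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

AbstractStep = Tuple[str, str]  # (transition to trigger, expected response)


def first_mismatch(steps: List[AbstractStep],
                   send: Callable[[str], str]) -> Optional[int]:
    """Return the index of the first non-conforming step, or None if all conform."""
    for index, (transition, expected) in enumerate(steps):
        actual = send(transition)   # concretize, send, and decode the response
        if actual != expected:
            return index
    return None


def make_stub(responses: Dict[str, str]) -> Callable[[str], str]:
    """Build a stub manager that answers from a table and is otherwise silent."""
    return lambda transition: responses.get(transition, "no_response")


test = [("rx_aarq", "rx_aare"), ("req_assoc_abort", "rx_abrt")]
silent_on_abort = make_stub({"rx_aarq": "rx_aare"})
failing_step = first_mismatch(test, silent_on_abort)
```

With this stub the second step fails, because the test expects `rx_abrt` while the stub gives no response.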

At each refinement step we added new messages and ProTest took care of the details of the concretization. In addition, the tool can be customized, as it has a configuration file that allows the user to specify, for each message type, some sub-types by defining the values for the fields in the messages to send. For further customization outside the scope of the PHD protocol, however, it may be necessary to implement the code that automates the concretization function, in our case by extending the ProTest code. With the refinement methodology proposed in this paper, however, it is possible to start testing with just a few implemented concretization functions, and to implement the additional ones only when the model refinement requires them.

We use Antidote 2.0.0 as the implementation of the manager of the PHD protocol. The Antidote source code is written in C and is composed of the following source folders: api, asn1, communication, dim, resources, specializations, trans, and util. We measure the coverage on the communication source folder only, as it is the one containing the code that handles the different messages described by the protocol, and it is the most critical part of the library. The other folders contain mainly utility functions for handling the data types, and for the encoding and decoding of the messages. To compute the code coverage we have instrumented Antidote with GCOV and LCOV, open source tools for coverage measurement: the former is a tool that computes the code coverage, while the latter is only a graphical front-end for the visualization of GCOV results. This way we can obtain coverage reports in an automated way. The code for test generation and the ASM models we produced are available open source as part of the ASMETA tool set.
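Restricting the measurement to one source folder can be done, for instance, by post-processing an LCOV tracefile. The sketch below is one possible way to do it and is not taken from the paper's tooling. It relies only on three record types of the tracefile format (`SF:` for the source file, `DA:<line>,<count>` for line execution counts, and `end_of_record`); the function name `line_coverage` and the sample records are made up for illustration.

```python
from typing import Iterable, Tuple


def line_coverage(tracefile_lines: Iterable[str], folder: str) -> Tuple[int, int]:
    """Return (lines hit, lines found) for source files inside the given folder."""
    hit = found = 0
    selected = False
    for raw in tracefile_lines:
        line = raw.strip()
        if line.startswith("SF:"):
            # A new source file record starts; keep it only if it is in the folder.
            selected = f"/{folder}/" in line[3:]
        elif line.startswith("DA:") and selected:
            _line_number, count = line[3:].split(",")[:2]
            found += 1
            if int(count) > 0:
                hit += 1
        elif line == "end_of_record":
            selected = False
    return hit, found


# Made-up sample records, for illustration only.
sample = [
    "TN:",
    "SF:/src/communication/operating.c",
    "DA:10,3",
    "DA:11,0",
    "end_of_record",
    "SF:/src/util/example.c",
    "DA:5,1",
    "end_of_record",
]
hit, found = line_coverage(sample, "communication")
percentage = 100.0 * hit / found if found else 0.0
```

On the sample, only the two lines of the file in the communication folder are counted, and one of them was executed.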

Results are reported in Table 1. For each refinement of the ASM model, and for each applied test generation technique, the table reports the number of sequences composing the generated sequence set, the minimum, the maximum, and the average number of steps per sequence, and the total number of steps composing the generated set of sequences. An execution step corresponds to an execution of the main rule of the ASM model of the system. The test execution time is proportional to the total length (i.e., the number of steps) of the exercised test sequences. Given the same coverage, a test set with fewer total steps is to be preferred in terms of execution time. We ran the process generating the tests with only the basic rule coverage criterion, and with all the criteria presented in Sect. 3 together. For reference, we also report the coverage achieved with the Finite State Machine integrated in the ProTest tool [34], using the FSM-based test generation criteria of edge coverage and 2-way coverage.

Table 1. Results of the application of the test generation strategies to different model refinement versions

4.2 First ASM: Ground Model

We specify in ASMETA the first model of the manager, Ground model \(\mathsf {ASM_0}\). This model has only three states: Disassociating, Unassociated, and Operating. Figure 6 reports a fragment of \(\mathsf {ASM_0}\). The signature of \(\mathsf {ASM_0}\) contains three functions: status, transition, and message. The transition represents the type of request to be sent to the manager, and it is defined as a monitored function, as its value can be driven externally, e.g., by the agent. The status represents the current state of the manager, and the message represents the response from the manager. These two functions are modeled as controlled functions (defined in Sect. 2.1). In terms of Finite State Machines, the status, transition, and message of the ASM represent respectively the status, input, and output of the FSM.

Then, in the definitions section, we define the rules; the main rule executes all the rules in parallel at each step. Each rule, based on the current state and the transition, sets the expected next state and the response message. Finally, we need to specify an initial status, defined in the default init s0 section; the machine starts in Unassociated state.
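The structure just described can be mimicked by a small table-driven sketch. It is an illustration of the idea and not a translation of the AsmetaL model: only the three state names and the transition RX_AARQ_ACCEPTABLE_AND_KNOWN_CONFIGURATION come from the text, while every other transition and message name, the table entries themselves, and the default of staying in the same state with no response are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# (status, transition) -> (next status, response message)
# Apart from the state names and the first transition name, all entries
# are hypothetical and do not reproduce the actual model.
RULES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("UNASSOCIATED", "RX_AARQ_ACCEPTABLE_AND_KNOWN_CONFIGURATION"):
        ("OPERATING", "TX_AARE_ACCEPTED"),
    ("OPERATING", "RX_RLRQ"): ("UNASSOCIATED", "TX_RLRE"),
    ("OPERATING", "REQ_ASSOC_REL"): ("DISASSOCIATING", "TX_RLRQ"),
    ("DISASSOCIATING", "RX_RLRE"): ("UNASSOCIATED", "NO_RESPONSE"),
}


def main_rule(status: str, transition: str) -> Tuple[str, str]:
    """One step: status and message are controlled, transition is monitored."""
    # Assumption: an unhandled transition leaves the status unchanged.
    return RULES.get((status, transition), (status, "NO_RESPONSE"))


def run(transitions: List[str], status: str = "UNASSOCIATED") -> List[Tuple[str, str]]:
    """Feed monitored transitions one per step, starting from the initial status."""
    trace: List[Tuple[str, str]] = []
    for transition in transitions:
        status, message = main_rule(status, transition)
        trace.append((status, message))
    return trace


trace = run(["RX_AARQ_ACCEPTABLE_AND_KNOWN_CONFIGURATION", "RX_RLRQ"])
```

In terms of Finite State Machines, the dictionary key carries the current state and the input, and the value carries the next state and the output.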

Fig. 6. ASMETA specification of the ASM model V0, specifying transitions in the state diagram

Verification and Validation. The ASM representation allows us to formally verify some properties. Although the machine was simple in this version, we have specified and verified the following temporal properties:

  • the system can reach the operating state starting from UNASSOCIATED: AG((status=UNASSOCIATED) implies EF(status=OPERATING))

  • if the state is UNASSOCIATED and a known configuration is received, then the status in the next state is OPERATING: AG((status=UNASSOCIATED and transition=RX_AARQ_ACCEPTABLE_AND_KNOWN_CONFIGURATION) implies AX(status=OPERATING))

  • if the state is OPERATING, then the system can either remain in the OPERATING status or leave it: AG((status=OPERATING) implies EF(status=OPERATING or status!=OPERATING))

The properties above were extracted from the official PHD documentation. We verified these properties to gain confidence in the correctness of the specification.
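To show what a property of the form AG(p implies EF(q)) requires, the following sketch evaluates it on an explicit state graph. It is only an illustration of the semantics: in this work the properties are checked on the ASM by a model checker, whereas here the state is reduced to the status alone, the monitored transition is ignored, and the edges of the graph are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # state -> set of successor states


def ef(graph: Graph, target: Set[str]) -> Set[str]:
    """States from which some path reaches a target state (CTL operator EF)."""
    result = set(target)
    changed = True
    while changed:
        changed = False
        for state, successors in graph.items():
            if state not in result and successors & result:
                result.add(state)
                changed = True
    return result


def reachable(graph: Graph, init: str) -> Set[str]:
    """States reachable from the initial state."""
    seen: Set[str] = {init}
    stack: List[str] = [init]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ag_implies_ef(graph: Graph, init: str, premise: Set[str], target: Set[str]) -> bool:
    """Check AG(premise implies EF target) over the states reachable from init."""
    good = ef(graph, target)
    return all(s in good for s in reachable(graph, init) if s in premise)


# Hypothetical status graph with three states.
graph: Graph = {
    "UNASSOCIATED": {"UNASSOCIATED", "OPERATING"},
    "OPERATING": {"OPERATING", "UNASSOCIATED", "DISASSOCIATING"},
    "DISASSOCIATING": {"DISASSOCIATING", "UNASSOCIATED"},
}
holds = ag_implies_ef(graph, "UNASSOCIATED", {"UNASSOCIATED"}, {"OPERATING"})
```

The backward fixpoint in `ef` first marks the target states and then repeatedly adds every state with a marked successor, which is the standard way to compute EF on a finite graph.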

Testing. By applying all the test generation criteria presented in Sect. 3, we generated 30 test sequences, with a total of 93 steps. This achieved a statement coverage of the communication folder of just 50.3%. Function coverage and branch coverage are also very low.

4.3 First Refinement: PHD Configuration Management

The coverage of the model \(\mathsf {ASM_0}\) was not satisfactory, and in particular the code that manages configurations was not covered since the configuration management was completely missing in the model. In this refinement (\(\mathsf {ASM_1}\)) we therefore added the states for exchanging the configuration: Checking Config, and Waiting for Config, with their related transitions, messages, and rules. Figure 7 shows a compact graphical representation of \(\mathsf {ASM_1}\).

Testing. Test generation produced 64 sequences, with a total of 216 steps. The code coverage of the communication package increased to 77.2%, mainly due to more functions and statements covered in the configuration management part.

Fig. 7. A graphical view of \(\mathsf {ASM_1}\) of the IEEE PHD manager

4.4 Second Refinement: Error Management

From coverage analysis, we noticed that all the rors APDU messages, related to error management, were missing, and some functions, such as communication_process_rors(ctx, apdu) in communication/operating.c, were never exercised. Therefore we designed a new refined model (\(\mathsf {ASM_2}\)) in which we included the rors message with its subtypes (rors-*). These messages trigger a relevant part of the protocol between the states Disassociating and Unassociated, and within the states Operating, Checking Config, and Waiting for Config. Furthermore, we marked the following two particular sequences of transitions in the model, since from the coverage report we noticed that these behaviors were not captured by the model:

  1. the behavior of rx_roiv_confirmed_event_report that brings from the state Waiting for Config to Checking Config has to be handled differently depending on whether the state Waiting for Config was entered with a transition from the state Unassociated or from the state Checking Config. In the former case, no configuration similar to the one transmitted by the agent is present in the manager pool of configurations and the function ext_configurations_get_configuration_attributes is called; in the latter case, a configuration was transmitted previously, and thus the configuration is already in memory of the Antidote manager.

  2. the behavior of rx_roiv_confirmed_event_report, that causes a loop in the Checking Config state, is different if executed right after another same message that brought the manager from the state Waiting for Config into the Checking Config state. The function configuring_new_measurement_response_tx, that adds a new measurement from the agent, is executed when this particular sequence occurs.

Testing. Test generation using all the rule-based criteria stated in Sect. 3 produced 77 sequences, with a total of 266 steps. The statement coverage of the communication package was 78.8%, and function coverage 75.2%, both with an increase of about 3% with respect to the previous refinement of the model.

4.5 Third Refinement: Protocol and Configuration Management

The coverage reached by the previous refinement was quite good, but from coverage analysis we noticed that two important aspects of the connection procedure were not considered. In the first phase, an agent can try to establish a connection with a wrong protocol-id or with an unknown configuration, marked as a specific protocol-id value (0xFFFF) and recognized by Antidote as an external specification. Thus, we added two new variants of the rx_aarq transition in the \(\mathsf {ASM_3}\), respectively with an invalid protocol-id and an external protocol-id.

Testing. Test generation using the all-rule criteria stated in Sect. 3 produced 80 sequences, with a total of 272 steps. The statement coverage of the communication package was 79.4%, and the function coverage 75.3%, with an increase of 0.6% both in statement coverage and in branch coverage, with respect to the previous model in the refinement chain, \(\mathsf {ASM_2}\).

5 Process Evaluation

In this section, we evaluate the proposed approach and we compare it with other approaches. In particular, we are interested in answering the following three research questions:

  • RQ1 Is refinement a viable option in MBT and does it really improve the efficiency of conformance testing in terms of code coverage?

  • RQ2 Do ASM-based coverage criteria for test generation achieve different results in terms of code coverage?

  • RQ3 Is our method suitable for discovering faults in the implementation?

5.1 RQ1: How Does Refinement Influence Coverage?

We have observed that refinements always increase code coverage, regardless of the criteria used. For \(\mathsf {ASM_0}\), each criterion achieves around 50% in statement coverage. For \(\mathsf {ASM_1}\), the coverage is increased to 72%. The highest coverage is obtained by \(\mathsf {ASM_3}\), with more than 79% of the statements covered by the test sequences. As expected, the number of generated sequences and the total number of steps increase with the refinements: the sequences vary from a minimum of 16 to a maximum of 80, and the total number of steps from 79 in \(\mathsf {ASM_0}\) with the basic-rule coverage to 272 in \(\mathsf {ASM_3}\) with the all-rule coverage. Full code coverage is never reached. However, we were able to increase the statement coverage from 50% to around 80%.

By refinements, the average and the maximum length of test sequences increase. In this case, from Table 1 we can see that the maximum sequence length is 6. It is a relatively high length as these ASM models are not so large, and for larger models the length of the generated test sequences could be higher.

Analyzing the statements that are not covered, we have noticed that they are mainly related to procedures of the agent (which was not the object of testing), dead code, or negative use cases (exceptions), often regarding internal configurations of the manager. We believe that a further increase in code coverage could be achieved not by adding new messages, but by including in the model different configurations of the manager at startup (in particular to enable some remote messages that come from the manager and actively ask the associated agent(s) for new data). Full coverage is unachievable due to the presence of some dead code (such as functions declared with an empty body, and never used), but we believe that it is possible to achieve almost full coverage in the communication package by exercising Antidote also as an agent, thus completing the transition tables of the specification. Nonetheless, testing the agent was beyond the scope of this work.

5.2 RQ2: Comparing Between Coverage Criteria

We have noticed that, regardless of the refinement, the all-rule coverage criterion always achieves a higher statement and branch coverage than the basic rule coverage criterion. The difference between the coverage of the two criteria, however, is minimal (a gap of just 0.2%), and in some cases the statement and function coverage are the same. The all-rule coverage criterion, however, leads to 20% more steps in the generated test sequences with respect to the basic rule coverage, meaning that it requires more time for test execution. All in all, we can notice that model refinement affects the code coverage more than the choice of coverage criterion: even if one applies a stronger test generation criterion, the increase in code coverage (around 0.2%) is lower than that obtained by applying a refinement (around 1–10%). Table 1 also reports the code coverage of combinatorial testing obtained by ProTest [34]. Note that the coverage achieved by our method from the second refinement on is higher than the coverage obtained by the tests generated with the edge and 2-way coverage of the FSM model in ProTest.

5.3 RQ3: Faults Found

We have found a few mismatches in some of the test executions, namely the actual response from the manager was different from the expected one, according to the model. We analyzed these inconsistencies, and three of them turned out to be real bugs in the implementation, with respect to the protocol specification:

  1. The specification of the standard IEEE 11073-20601 requires rx_abrt as response for the sequence “unassociated + req_assoc_abort”. The Antidote implementation uses no response instead. The fault was revealed from the first model (\(\mathsf {ASM_0}\)).

  2. The length of the message rx_roer was computed incorrectly, which results in a rejection by the encoding module. The fault was revealed after the first refinement (\(\mathsf {ASM_1}\)).

  3. The sequence “checking_config + rx_aarq \(\rightarrow \) no response” causes a transition mismatch. A transition labeled by event rx_aarq was defined for state checking_config. However, in the actual code, three transitions were implemented for three sub-types of event rx_aarq_*, which can never be fired. This bug means that the Antidote Manager only responds to three sub-types of event rx_aarq_*, but does not respond to rx_aarq itself. The fault was revealed after the first refinement (\(\mathsf {ASM_1}\)).

Figure 8 shows an example of test case execution in ProTest, ending with a conformance error between the model and the implementation, denoted by a red cross in the tool. Furthermore, we have found that the state Associating is not part of the Antidote FSM table, since it was merged with the Unassociated state. In order to make our process work, we had to ignore this state in the ASMs as well, but we believe that this is an implementation fault due to an oversimplification made by the Antidote team. We have reported the faults to the developers and filed them in the issue tracker of the Antidote repository on GitHub.

Fig. 8. A test sequence execution, and coverage report, with ProTest [34]

6 Related Work

Work on conformance and interoperability testing for medical and healthcare devices can be classified into two categories: testing health information systems, and testing medical or healthcare devices. Snelick et al. [22] and Namli [29] have studied conformance testing strategies for HL7, a widely used standard for healthcare clinical data exchange. They have compared such testing strategies and proposed a test execution framework for HL7-based systems built on top of an extensible test execution model. This model is represented by an interpretable test description language, which allows dynamic test setup. These works have mainly focused on developing a general test execution framework. This is in contrast with our work, which focuses on test generation and model refinement for the communication model of the IEEE 11073 PHD protocol. Garguilo et al. [21] have developed conformance testing tools based on an XML schema derived directly from the IEEE 11073 standard, which provide syntactic and semantic validation of individual medical device messages according to IEEE 11073. This is complementary to our work: we focus on testing event sequences, and their tool can be used to check the correctness of the individual APDUs. Lim et al. [26] have proposed a toolkit that can generate standard PHD messages from user-defined device information, helping users who are not familiar with the details of the standard. This is another way of representing a model of the protocol messages, as we do in the modeling part of the proposed approach. Yu et al. [34] have proposed a general conformance testing framework for the IEEE 11073 PHD protocol that streamlines the entire testing process, from test generation to test execution and evaluation. Our work is built on top of that framework, adding model refinement to improve test coverage, and rule-based test generation to make test sequences more efficient.
Similarly to ProTest, there are also methods that generate test cases and test protocol conformance directly from Finite State Machines, such as those in [2, 6, 18]; many of them are included in a survey by Dorofeeva et al. [17]. Refinement is often used in combination with formal verification of properties [16, 24, 35]. In this work, instead, we combine refinement and testing. There are also other methodologies for protocol testing, such as the use of extended finite state models [31] and timed automata (TA). For timed automata, for instance, different testing techniques have been proposed, based on different coverage criteria such as transition coverage [7, 33] and fault-based coverage [3, 4], and they can be used for protocol validation.

7 Conclusion

In this paper, we have presented an approach that combines model refinement with model-based testing in order to improve testing effectiveness. Tests are derived from ASM specifications, which are obtained by refinement applied iteratively after testing the system under test. During test execution, coverage information is used to identify system features or behaviors that are not captured in the model. These missing features or behaviors are then added to the model, in a manner that is independent of the implementation. This process has been applied to the case study of the IEEE 11073 PHD communication model. This work extends the testing framework presented by Yu et al. [34], which aims at streamlining the entire testing process, including test generation, test execution and test evaluation. We have shown that refinement can improve testing results (coverage and faults found) and that rule-based test generation strategies are a good alternative to t-way test generation. Model refinement is a crucial process to achieve good results. As future work, we will apply this framework also to the Antidote agent, and to some real medical devices to check their compliance with the IEEE 11073 PHD standards. Moreover, we plan to optimize the generated tests across the model refinements, by not executing again in \(\mathsf {ASM_{n+1}}\) the test sequences already executed for the previous model versions, up to \(\mathsf {ASM_n}\). The tests themselves could also be refined between different model versions, for example by using the technique in [10].
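The planned optimization of not re-executing test sequences across refinements could look roughly like the following sketch. It is only an illustration under stated assumptions: test sequences are modelled as tuples of event names, the helper `select_new_tests` is hypothetical and not part of ProTest, and the sketch assumes that a sequence executed for an earlier model keeps the same expected outcome in the refined model.

```python
from typing import Iterable, List, Set, Tuple

TestSequence = Tuple[str, ...]  # a test is modelled as a tuple of event names


def select_new_tests(
    generated: Iterable[TestSequence], executed: Set[TestSequence]
) -> List[TestSequence]:
    """Return the generated sequences that were not executed before.

    The order of `generated` is preserved, and every returned sequence is
    recorded in `executed` so that later refinements skip it as well.
    """
    fresh: List[TestSequence] = []
    for sequence in generated:
        if sequence not in executed:
            executed.add(sequence)
            fresh.append(sequence)
    return fresh


# Illustrative test suites for two consecutive model versions.
executed: Set[TestSequence] = set()
tests_asm0 = [("req_assoc_abort",), ("rx_aarq",)]
tests_asm1 = [("rx_aarq",), ("rx_aarq", "rx_roer")]

run_for_asm0 = select_new_tests(tests_asm0, executed)
run_for_asm1 = select_new_tests(tests_asm1, executed)
print(run_for_asm0)
print(run_for_asm1)
```

If a refinement changes the expected response for an already executed sequence, that sequence would have to be run again, so a real implementation would need to compare expected outcomes and not only the input events.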

The goal of our project is to promote methods that help in testing the conformance of medical devices designed to be compliant with the IEEE 11073 PHD protocol and, more generally, with any other protocol specification.