1 Background

Since October 2020, digital health applications (so-called DiGAs) have been part of standard care for people covered by statutory health insurance in Germany. Given Germany’s pioneering role in international comparison, other countries and health systems follow with great interest the coverage policies and assessment processes for digital health applications implemented in Germany [1, 2]. The legal and regulatory requirements were set by the Digital Healthcare Act (Digitale-Versorgung-Gesetz; DVG) from December 2019 and the Digital Health Applications Ordinance (Digitale-Gesundheitsanwendungen-Verordnung; DiGAV) from April 2020.

1.1 Definition of DiGA

DiGAs are active, lower-risk medical devices, classified as class I or IIa medical devices according to the European Union (EU) Medical Device Regulation 2017/745 (MDR) [3]. Therefore, a medical purpose to be achieved by the digital main function of the DiGA must be defined by the manufacturer. This function is intended to support the recognition, monitoring, treatment, or alleviation of diseases or the recognition, compensation, treatment, or alleviation of injuries or disabilities. The target group for DiGAs is patients with a confirmed diagnosis, which must be assignable to a 3- or 4-digit ICD-10 code [4]. All features necessary for a digital health application to qualify as a DiGA can also be found in the DiGA Guide for Manufacturers, Service Providers and Users [5] in accordance with Section 139e of Book V of the Social Security Code (Sozialgesetzbuch V) and are summarized in Fig. 1. DiGAs can be used following a prescription of treating physicians and psychotherapists or after patients request their use and gain approval directly from their statutory health insurance.

Fig. 1 Overview DiGA Fast Track, based on Brönneke et al. [66]

At the request of the manufacturer and as a precondition for reimbursement, the applications are reviewed, assessed, and approved by the Federal Institute for Drugs and Medical Devices (Bundesinstitut für Arzneimittel und Medizinprodukte; BfArM) [5]. In addition to the required characteristics of DiGAs outlined above, the BfArM also reviews whether the manufacturer has provided proof that their application fulfills basic requirements such as patient safety, functionality, data protection, data security, quality, and interoperability of the medical device [5]. If the manufacturer has provided proof that these basic requirements are fulfilled, the BfArM will review whether the manufacturer has demonstrated a positive healthcare effect of their application. If all these requirements are met, the DiGA will be approved and included in the DiGA directory maintained by the BfArM [6]. Listing in the DiGA directory is a prerequisite for the prescription of a DiGA and reimbursement of its costs by the statutory health insurance.

1.2 Evaluation of positive healthcare effects

Positive healthcare effects can be demonstrated by proof of medical benefit or patient-relevant improvement of structure and processes (PISP). The term medical benefit refers to outcomes that are known from trials of clinical or pharmaceutical interventions, including (a) improvement of the state of health, (b) reduction of the duration of a disease, (c) prolongation of survival, or (d) improvement in the quality of life. In contrast, PISP refers to an outcome core area that is innovative both in Germany and in international comparison, in the context of proof of benefits as well as in terms of reimbursement (see Fig. 2) [1]. The 9 PISP outcome domains are (a) coordination of treatment procedures, (b) alignment of treatment with guidelines and recognized standards, (c) adherence, (d) facilitating access to care, (e) patient safety, (f) health literacy, (g) patient autonomy, (h) coping with illness-related difficulties in everyday life, and (i) reduction of therapy-related efforts and strains for patients and their relatives [5]. Table 2 describes each outcome domain, as defined by the DiGA Guide of the BfArM [5].

In the context of assessment and approval, positive healthcare effects in terms of both medical benefits and PISPs are now equally important. Therefore, a positive healthcare effect compared to standard care must be demonstrated for only one of these [5]. The motivation behind this decision was to empower patients to become more active and informed, encourage shared decision-making, and promote health literacy [5]. In addition, the integration of this new outcome core area reflects a more comprehensive understanding of the quality of care and its evaluation for the purpose of patient benefits. Therefore, it increases patient-centeredness in healthcare delivery and contributes to the principles of value-based healthcare [7, 8]. Furthermore, this approval process and the coverage of costs by statutory health insurances provide the basis for patients having low-threshold access to quality-assured digital health applications [5].

The perspective of patients on their health status and healthcare delivery, and their participation in their therapy, have become increasingly relevant in evidence-based medicine [9,10,11]. This is reflected by the significant increase in the use of patient-reported outcome measures (PROMs) and patient-reported experience measures (PREMs) [12,13,14] that can also be observed regarding digital health interventions [10, 15]. In particular, all 9 PISP outcome domains can be assigned to PROMs. PROMs are used to monitor health conditions and the effectiveness of treatments and interventions, whereas PREMs are used to evaluate and monitor experiences during the delivery and use of healthcare services, both measured from the patient perspective.

1.3 Research gaps concerning PISP

Patient perspective is key in the various evaluation frameworks and guidelines on digital health interventions. However, they differ in their definition and operationalization of the patient perspective. For example, the authors of the Model for the Assessment of Telemedicine Applications (MAST) recommend that the patient perspective should be considered by measuring usability and acceptance [16]. The National Institute for Health and Care Excellence (NICE) framework proposes within evidence tier C, which relates to applications that fit the definition of a DiGA, that the focus should be on measuring effectiveness in terms of quality of life or symptom severity [17]. Frameworks also exist that recommend the measurement of particular PISP outcome domains as part of the evaluation of digital health applications, such as the Khoja–Durrani–Scott Evaluation Framework [18] and the design and evaluation framework for digital health interventions (DEDHI) [19], which covers aspects such as “improved access to care,” “equity of care,” “effects on the delivery of care,” and “service quality.”

Thus, some elements of the PISP domains are considered by existing frameworks for evaluating health applications. However, to date, none covers all 9 domains. Whether the developers of the 9 PISP domains considered any of the frameworks listed above when further defining PISP for the DiGAV is unclear, and so is the decision process that led to the 9 domains. Consequently, given the novelty of PISPs as outcome domains for the evaluation of DiGA, no standardized set of outcomes exists to operationalize the 9 domains, let alone a set of measurement instruments to assess them. However, because PISPs on their own are sufficient criteria for demonstrating a positive healthcare effect of a DiGA, this research gap needs to be closed.

As demonstrated by the aforementioned frameworks, some of the aspects that define the 9 PISP domains have been considered important for evaluating health applications. Therefore, learning from previous evaluation studies can help standardize outcomes and measurement instruments for the collection of PISP in the course of evaluation studies of digital health applications. For medical benefits of analogue as well as digital interventions, comprehensive measurement tools exist, some of which are codified within Core Outcome Sets [20].

Against this background, this study aimed to examine the characteristics of PISP measured in previous prospective controlled evaluation studies of DiGA-compliant digital health applications, published internationally before the DiGAV.

1.4 Research questions

This systematic review examined the following research questions:

  1. What were the characteristics of evaluation studies in which PISPs were collected?

  2. Which outcome domains, outcomes, and outcome measurement instruments were used to assess PISP and medical benefits?

  3. How frequently were different outcome domains, outcomes, and outcome measurement instruments used in the included evaluation studies?

2 Methods

2.1 Search strategy

To investigate these research questions, we conducted an in-depth review of the evaluation studies of telemedicine applications included in the systematic review of Knapp et al. published in November 2021 [15]. The review investigated the use of PROMs and PREMs in those evaluation studies. Since PISPs belong to the field of PROMs as outlined above, this review provided an appropriate basis to investigate our research questions. The preceding review was performed as an electronic database search on MEDLINE and Embase. The inclusion and exclusion criteria, as well as the search string can be found in Additional file 1. The search string was based on previous works of other research groups on the topics of telemedicine [21] and PROMs [22]. The screening of the 2671 hits in the databases, the extraction as well as the data analysis were each performed independently by two reviewers.

The initial review included 303 studies, which provided the basis for our in-depth review. Studies up to April 2020 were included in the review because the Digital Health Applications Ordinance became effective then and we aimed to examine evaluation studies published before it. In addition, we performed an in-depth hand search, as well as a comprehensive forward and backward reference search starting from the research items finally included in the original review, to ensure that we considered all relevant articles. Backward and forward as well as hand searches were finalized in September 2021.

2.2 Eligibility criteria

We defined inclusion and exclusion criteria for the in-depth review presented in the current paper in order to identify evaluation studies for telemedicine applications complying with the definition of a DiGA in accordance with the requirements mentioned within the introduction. The inclusion and exclusion criteria were based on participant, intervention, comparison, outcome, and study type (PICOS scheme) as shown in Table 1. Given the explicit focus on PISP, further inclusion criteria beyond those of the initial review were added in order to identify studies appropriate to answer our research questions.

Table 1 Inclusion and exclusion criteria

2.3 Study selection and data extraction

Based on the eligibility criteria, two researchers (MS, AK) independently conducted a title and abstract screening of all articles included in the initial review as well as those gained from hand, forward, and backward searches. They then independently assessed the full texts of the preselected articles for inclusion. Any disagreements were discussed between both reviewers and resolved by consensus.

Data extraction was conducted by both researchers (MS, AK) independently. To ensure a consistent approach to data extraction, the matrix for data extraction was jointly developed in advance. The selection of relevant characteristics for the extraction matrix was based on characteristics commonly used in systematic reviews, including author, title, year of publication, journal, study country, study type, and intended medical purpose indicated by 3-digit ICD-10. These were supplemented by characteristics relevant to answering our research questions, including (1) main intended use of the DiGA, describing the type of study intervention; (2) assignment of outcome(s) to the categories of medical benefit, patient-relevant improvement of structure and processes, or others; (3) outcome domain(s); (4) outcome(s); and (5) outcome measurement instrument(s). The latter were divided into validated and non-validated instruments, considering that the DiGAV requires the use of validated outcome measurement instruments to demonstrate the healthcare effects of digital health applications. Figure 2 illustrates this subdivision, based on examples from the initial review by Knapp et al. [15].

Fig. 2 Example of division into outcome measurement instrument(s), outcome(s), outcome domain(s), and outcome core areas
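The extraction matrix described above can be pictured as a small nested data structure. The following is only a minimal sketch under stated assumptions: the class and field names (`CoreArea`, `OutcomeRecord`, `StudyRecord`, `pisp_outcomes`) are hypothetical and do not reproduce the authors' actual extraction table (Additional file 2), and the example entry is a placeholder rather than one of the included studies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class CoreArea(Enum):
    """Outcome core areas used for assignment during extraction."""
    MEDICAL_BENEFIT = "medical benefit"
    PISP = "patient-relevant improvement of structure and processes"
    OTHER = "other"


@dataclass
class OutcomeRecord:
    """One extracted outcome: core area > domain > outcome > instrument."""
    core_area: CoreArea
    domain: str        # e.g. one of the 9 PISP outcome domains
    outcome: str
    instrument: str
    validated: bool    # validated vs. non-validated measurement instrument


@dataclass
class StudyRecord:
    """General study characteristics plus the list of extracted outcomes."""
    author: str
    year: int
    country: str
    study_type: str
    icd10: str         # 3-digit ICD-10 code of the indication
    intended_use: str  # recognition, monitoring, treatment, alleviation, or compensation
    outcomes: List[OutcomeRecord] = field(default_factory=list)

    def pisp_outcomes(self, validated_only: bool = False) -> List[OutcomeRecord]:
        """Return PISP outcomes, optionally only those with validated instruments."""
        return [
            o for o in self.outcomes
            if o.core_area is CoreArea.PISP and (o.validated or not validated_only)
        ]


# Hypothetical placeholder entry, not taken from the included studies.
example = StudyRecord(
    author="Example et al.", year=2018, country="Exampleland",
    study_type="randomized controlled trial",
    icd10="J44",  # J44 is the ICD-10 category for COPD
    intended_use="monitoring",
    outcomes=[
        OutcomeRecord(CoreArea.PISP, "health literacy", "placeholder outcome",
                      "placeholder questionnaire", True),
        OutcomeRecord(CoreArea.PISP, "adherence", "placeholder outcome",
                      "self-reported data", False),
        OutcomeRecord(CoreArea.MEDICAL_BENEFIT, "quality of life", "placeholder outcome",
                      "placeholder questionnaire", True),
    ],
)
```

Keeping the `validated` flag at the level of the individual outcome, rather than the study, mirrors the distinction between validated and non-validated instruments made during extraction.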

For the purpose of characterizing the main intended use of the DiGAs, we used the terminology in the DiGA Guide, which in turn matches the MDR [3]. The main intended area of usage included recognition, monitoring, treatment, alleviation, and compensation.

Due to a lack of existing binding definitions of the different PISPs and a lack of transparency on the development of the PISP domains, we used the DiGA Guide as a reference in order to assign outcomes to the respective PISPs [5]. Table 2 includes all PISP domains and corresponding explanations as direct citations from the German original DiGA Guide, translated by the authors.

Table 2 Description of PISP outcome domains based on the DiGA Guide [5]

The results of the independent data extraction were discussed by both reviewers (MS, AK). Any disagreements were discussed and resolved by consensus. Additional file 2 shows the entire data extraction table.

As our review explicitly did not aim to prove the effectiveness of DiGAs, but rather to focus on the allocation and application of different outcome domains for the evaluation of DiGAs, we did not perform a risk of bias analysis which would have been necessary in the context of an effectiveness analysis.

3 Results

3.1 Study selection

The initial review by Knapp et al. identified 2,671 studies, which resulted in 303 included studies [15]. After applying the inclusion and exclusion criteria of the in-depth review presented here, 133 studies remained for title and abstract screening, and 17 were included in full-text screening. Six studies met all inclusion criteria and were included in data extraction. Eleven studies were excluded because no PISPs were measured as outcome domains of the intervention (n = 6) [23,24,25,26,27,28], PISPs were not measured by validated outcome measurement instruments as required in the approval process for DiGAs in Germany (n = 3) [29,30,31], a non-controlled study design was used (n = 2) [27, 32], or the record was a study protocol (n = 1) [33]. The reasons are not mutually exclusive: one study [27] is listed under two reasons, so the reason counts sum to 12. The flow chart in Fig. 3 shows the entire study selection process. Hand, backward, and forward searches did not yield any additional studies that met all inclusion criteria.

Fig. 3 PRISMA Flow Chart
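The full-text stage of the selection process can be checked with a short consistency sketch. It is an illustration only: the dictionary keys are shorthand labels chosen here, and the sets contain the reference numbers exactly as cited in the text for each exclusion reason.

```python
# Exclusion reasons at the full-text stage, keyed by a shorthand label (assumed here)
# and holding the reference numbers cited in the text for each reason.
exclusions = {
    "no PISP measured": {23, 24, 25, 26, 27, 28},
    "no validated instrument": {29, 30, 31},
    "non-controlled design": {27, 32},
    "study protocol": {33},
}
full_text_screened = 17
included = 6

# Number of reason assignments (a study may carry more than one reason).
reason_count = sum(len(refs) for refs in exclusions.values())

# Number of distinct excluded studies.
unique_excluded = set().union(*exclusions.values())

# Studies that appear under more than one exclusion reason.
multi_reason = {
    ref for ref in unique_excluded
    if sum(ref in refs for refs in exclusions.values()) > 1
}

print(reason_count, len(unique_excluded), sorted(multi_reason))
```

The number of distinct excluded references equals the 17 full texts minus the 6 included studies, while the per-reason counts add up to one more because reference [27] is cited under two reasons.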

3.2 Study characteristics

Of the 6 included studies, 4 were randomized controlled trials [34,35,36,37], 1 was a controlled pragmatic pilot trial [38], and 1 was a controlled trial [39]. The studies were published between 2016 and 2019 and covered a study period from 2009 to 2017. Three studies were conducted in the United States [35, 37, 39], and 1 each was conducted in Sweden [38], India [34], and Canada [36]. The indications addressed by the DiGA-compliant telemedicine applications were chronic obstructive pulmonary disease (COPD) [38], type 2 diabetes mellitus [34], osteoarthritis [35], bronchial asthma [36], chronic heart failure [39], and spinal cord injury [37]. Of the included studies, five investigated a monitoring intervention [34,35,36,37, 39], and one investigated an intervention for alleviating symptom burden [38]. Further characteristics of the included studies can be found in Additional file 2 but are not shown here because they were not relevant to answering our research questions.

3.3 Characteristics of outcome core areas, outcome domains, outcomes, and outcome measurement instruments

A total of 48 outcomes were collected in the 6 included studies, corresponding to a mean of 8.0 outcomes per study. Of these, 14 (29.2%) outcomes addressed PISP, and 29 (60.4%) addressed medical benefit. The remaining 5 (10.4%) outcomes could not be assigned to the core areas of PISP or medical benefit and hence were assigned to the category “other.” They included outcome domains such as satisfaction and usability.
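As a quick arithmetic check, the shares of the three categories can be recomputed from the reported counts. This is a minimal sketch; the variable names are chosen here for illustration.

```python
# Outcome counts per core area as reported in the text.
counts = {"PISP": 14, "medical benefit": 29, "other": 5}
n_studies = 6

total = sum(counts.values())

# Percentage share of each category, rounded to one decimal place.
shares = {area: round(100 * n / total, 1) for area, n in counts.items()}

mean_per_study = total / n_studies

print(total, mean_per_study, shares)
```

With 48 outcomes in total, the three counts yield shares of 29.2%, 60.4%, and 10.4%, and a mean of 8.0 outcomes per study.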

PISPs are divided into 9 outcome domains as outlined in the introduction. The 14 outcomes identified in the studies of our review could be assigned to 5 of these domains. The most commonly used PISP outcome domain was health literacy, with a frequency of 7 (50.0%), followed by coping with illness-related difficulties in everyday life, with a frequency of 3 (21.4%), and adherence, with a frequency of 2 (14.3%) (Table 3). Within our studies, we found no outcomes fitting the domains of coordination of treatment procedures, facilitating access to care, patient autonomy, or patient safety.

Table 3 Overview of included studies detailing PISP and medical benefit outcomes, outcome priority, and outcome measurement instruments

A total of 13 different measurement instruments were used for the assessment of PISP outcome domains. One outcome measurement instrument, the Patient Activation Measure (PAM)-13 questionnaire [40], was used twice. Table 3 and Additional file 2 provide a comprehensive overview of all PISP and medical benefit outcome domains, outcomes, and outcome measurement instruments used, presented according to the individual studies.

The majority of PISP outcomes (71.4%, 10/14) were assessed by validated questionnaires, as shown in Fig. 4. One questionnaire (7.1%) was self-developed for evaluation purposes in a single study. In 3 cases (21.4%), PISPs were not collected by questionnaires but by patient-reported data such as self-reported medication or frequency of blood glucose testing according to schedule.

Fig. 4 Use of validated outcome measurement instruments
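The split between validated and non-validated measurement of PISP outcomes can be tallied in the same spirit. The sketch below assumes one category label per PISP outcome; the labels are shorthand chosen here, and only the counts come from the text.

```python
from collections import Counter

# One label per PISP outcome, reconstructed from the reported counts (10, 1, 3).
labels = (
    ["validated questionnaire"] * 10
    + ["self-developed questionnaire"] * 1
    + ["patient-reported data"] * 3
)

tally = Counter(labels)
n_pisp = sum(tally.values())


def pct(n: int) -> float:
    """Share of all PISP outcomes in percent, rounded to one decimal place."""
    return round(100 * n / n_pisp, 1)


by_type = {label: pct(n) for label, n in tally.items()}

# Everything that is not a validated questionnaire, computed from the raw count
# rather than by adding already rounded percentages.
non_validated = pct(n_pisp - tally["validated questionnaire"])

print(by_type, non_validated)
```

Computing the combined non-validated share from the raw count (4 of 14) avoids the small rounding drift that arises when the two rounded percentages are simply added.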

Table 4 summarizes for which PISP outcome domains we found validated outcome measurement instruments and for which PISP outcome domains knowledge is lacking.

Table 4 Overview of validated outcome measurement instruments by PISP outcome domains used in studies on DiGA-compliant applications

All 6 included studies investigated outcomes from the core area of PISP in addition to outcomes from the core area of medical benefits. We found no study investigating solely outcomes from the core area of PISP. Furthermore, outcomes from the core area of PISP were investigated as secondary outcomes across all included studies except the one by Evangelista et al., in which no distinction was made between primary and secondary endpoints [39].

Table 3 provides a comprehensive overview of all included studies, detailing the investigated outcome domains, the outcomes including their priority, and the applied outcome measurement instruments.

4 Discussion

Our study aimed to examine the characteristics of PISP use in the context of previous prospective evaluation studies of DiGA-compliant digital health applications, published internationally before the DiGAV in April 2020.

4.1 Main findings in the context of previous research

The core area of PISP was introduced to strengthen the role of patients and take their assessments and benefits into greater account in the approval process of digital health applications [5]. Notably, PISP outcome categories are primarily process quality indicators and not outcome quality indicators as represented by medical benefit. Considering process quality indicators in addition to outcome quality indicators, which mutually influence each other, provides the basis for a more holistic, not to mention patient-centered, evaluation of digital health applications [56].

Altogether, we included 6 studies in our review. All included studies used a controlled study design, 4 (4/6) of which were randomized controlled trials. The approval process as a DiGA in Germany also requires a controlled study design, underlining the appropriateness of this design for demonstrating the effectiveness of digital interventions compared to non-controlled study designs [57,58,59]. The majority of evaluated applications focused on patients with chronic conditions (4/6) and offered monitoring of the corresponding disease as a feature (5/6). Both findings reflect the current range of telemedicine applications, which primarily offer monitoring for patients with chronic conditions [15].

The most commonly used PISP outcome domain in evaluation studies was health literacy (7/14, 50.0%), followed by coping with illness-related difficulties in everyday life (3/14, 21.4%). One possible reason for this is that health literacy [60] is a widespread and well-known outcome domain and various established outcome measurement instruments already exist in the form of validated questionnaires. For instance, the Health Literacy Tool Shed online database listed a total of 240 health literacy measurement instruments as of April 28, 2023 [61].

Within our studies, we found no outcomes belonging to the PISP domains of patient autonomy, coordination of treatment procedures, facilitating access to care, or patient safety. Since validated measurement instruments for most of these domains exist [62,63,64], and increasing patient empowerment and access to care are among the key promises of digital health use [65], this result is surprising.

Regarding the PISP outcome domains in our evaluation studies, self-developed questionnaires as well as process-generated, tracked, or directly measured data were used for only 4 out of 14 (28.6%) of the outcomes measured. In this regard, our study provides evidence that for some PISP outcome categories, validated outcome measurement instruments may be lacking. This is also critical because validated questionnaires are mandatory in the DiGA evaluation studies for measuring and demonstrating a positive healthcare effect [5].

In addition, the analysis showed that clearly assigning the outcomes from evaluation studies to the PISP outcome domains is sometimes difficult. This is due to a lack of detailed guidance on which outcomes can be assigned to each domain and which outcome measurement instruments should be used for assessment [5]. Since the development process of the 9 PISP outcome domains is unclear, this result is not surprising. Using performance management models as suggested for the public sector could help in increasing transparency in this context [66].

Notably, all included studies evaluated outcomes from the core area of PISPs in addition to outcomes from the core area of medical benefits and, wherever primary and secondary endpoints were distinguished, only as secondary outcomes. No study solely evaluated positive effects for outcomes from the core area of PISPs. This is likely due to the novelty of PISP outcome domains in the context of approval trials and the fact that positive healthcare effects in terms of medical benefits and PISPs have only recently been given equal importance.

4.2 Implications for future research

Our study is the first systematic review of evidence concerning the characteristics of PISPs in the context of previous evaluation studies of DiGA-compliant digital health applications. Therefore, our results are a starting point concerning the guidance needed on which outcomes and outcome measurement instruments can be used in evaluation studies of such applications to measure PISP outcome domains. The results also highlight PISP outcome domains where knowledge is still lacking about outcomes and outcome measurement instruments that can be used and thus help sharpen the focus on results of DiGA ratification in Germany and beyond. Thus, our findings can be used to further develop existing evaluation frameworks and outcome taxonomies. The discussion of our findings also provides several valuable ideas for targeted implementation and dissemination of PISP in Germany, as well as other countries that would like to strengthen the patient perspective in the evaluation and implementation of digital health applications.

Future research should advance the distinct and transparent assignment of outcomes and validated outcome measurement instruments to all 9 of the existing PISP categories based on the results of our review. Research is especially needed concerning the 6 domains for which we found no insights concerning validated outcome measurement instruments.

Another topic for future research is the further inclusion of PISP outcome categories in existing evaluation frameworks and outcome taxonomies. The questions of how PISPs can be classified within existing taxonomies, how taxonomies could be adapted, or even if new ones should be developed hold comprehensive research potential.

Additionally, updating the review in one or two years will be of great interest as it will help to analyze how the characteristics of PISP measurement in evaluation studies have developed since the first DiGA was approved in Germany in October 2020.

4.3 Implications for practice

With DiGAs now being part of standard care for people with statutory health insurance in Germany and with PISPs now relevant in terms of approval and reimbursement decisions, Germany is breaking ground. These innovations, the accompanying discussions, and thus, the results of the present study are of international interest for different target groups:

  (I) International digital health application manufacturers and vendors who plan to have their application approved as a DiGA in Germany, an important market in the field of mobile health applications [2].

  (II) Representatives of governments and healthcare systems worldwide who are interested in the significance of PISP in evaluation studies of digital health applications and in the associated DiGA approval and reimbursement process [2].

  (III) Patients, who gain low-threshold access to quality-assured and evidence-based digital health applications and whose perspective and potential benefits have become significantly more important through innovative German legislation. This greater consideration of patient benefits when implementing and evaluating care interventions follows the principles of value-based healthcare [7].

4.4 Limitations

Currently, no legally binding document elaborates on the individual PISP outcome categories, the corresponding outcome domains, outcomes, or outcome measurement instruments. This lack of explanation made it difficult for us to assign the individual outcomes to the respective PISP outcome categories. However, to sharpen our understanding, we used the explanations from the DiGA Guide (Table 2) [5] and insights from a textbook published by an expert committee of the Federal Ministry of Health for digital health interventions [67]. Furthermore, the interventions and study settings described in the articles were considered for the assignment of outcomes to the respective PISP outcome categories, and the entire data extraction was done by two reviewers independently. Nonetheless, the difficulties we faced illustrate again the need for further clarification of which outcomes and outcome measurement instruments can be assigned to the 9 PISP outcome domains.

As this was the follow-up analysis of an existing review not directed at measuring the effectiveness of an intervention, no a priori registration of the review was filed. However, the previous review used validated search strategies as well as a piloted data extraction strategy.

Given the fact that we did not aim to measure intervention effectiveness, we refrained from any quality assessment. Therefore, no statement can be made on potential risks of bias within the included studies.

Since we were unable to obtain any additional references from the extensive additional searches, we are confident that the initial data set, and thus the conduct of a subreview, was an appropriate method to address our research questions.

5 Conclusions

Our review provides an overview of the characteristics of PISP use in the context of previous prospective controlled evaluation studies of DiGA-compliant digital health applications, published internationally before the DiGAV in April 2020. Our findings also show which outcomes and outcome measurement instruments can be used in evaluation studies of digital health applications to measure PISP outcome domains. Concurrently, we have highlighted PISP outcome domains where knowledge is still lacking about outcomes and validated outcome measurement instruments that can be used. Given the few studies included, the results are a starting point for operationalizing and standardizing PISPs and thereby for increasing the quality of PISP outcome measurement.

The possibility of demonstrating positive healthcare effects by proof of PISP, and their relevance for cost reimbursement decisions by statutory health insurance funds, both underline the active role and personal responsibility of patients in dealing with their health. This should become even more important in terms of patient-centered care. Prescription and care practice will show to what degree DiGAs and outcomes from the PISP core area are presumed relevant by healthcare providers and patients alike. The outlined need for a clear assignment of outcomes to individual PISP outcome categories is also crucial to enable informed decision-making by physicians and patients for or against a DiGA.