1 Introduction

The Emergency Care Research Institute (ECRI) defines alarm fatigue as the emotional pressure care providers experience when they are exposed to too many alarm sounds. In other words, alarm fatigue is a phenomenon that occurs when nurses work in a clinical environment where alarm sounds are heard frequently [1–3]. Humans are able to distinguish between five and seven categories of sound [4]. In many hospitals, noise levels exceed the limit recommended by the World Health Organization (WHO), i.e. 30 decibels in ward rooms [5]. Which type of alarm sound should be used in hospitals is a controversial issue: some studies suggest that at busy times, nurses may not be able to recognize melodies as alarms and respond to them [6]. In their study of alarm-sounding equipment in hospitals, Hirose et al. [7] conclude that the level of alarm sounds should be set according to the usual sound level of the environment; they also recommend that such equipment be configured so that the maximum alarm level is activated automatically whenever the equipment is turned on. This recommendation is not consistent with the findings of Ryherd et al. [8], who conclude that, since loud noises can increase stress, fatigue and tension headaches in medical staff and make concentrating more difficult, visual and vibrating alarm systems deserve to be studied in more depth. Other forms of alarm notification devices must also be available to guarantee that the sound alarm is audible [9].

The psychological pressure caused by frequent exposure to alarm sounds can desensitize nurses to alarm signals, which can in turn lead to neglect of important clinical alarms. As a result of alarm fatigue, nurses may not only become slow in responding to clinical alarms, but may also readjust the alarms to settings that are unsafe for patients, in effect silencing the alarm systems or turning them off [1]. If doctors or nurses deactivate alarm sounds, put them on silent, or ignore them, patients’ safety is potentially threatened [3, 10]. False alarms can have many negative consequences: they can produce a “cry wolf” effect and lead nurses to ignore the alarm systems or respond slowly to repetitive alarms. Moreover, false alarms can distract nurses and interfere with their efficient planning and performance [3]. A growing problem, alarm fatigue is so serious that ECRI ranked it number one on its list of the top 10 health technology hazards from 2012 to 2015 [11]. Alarm fatigue is considered a kind of human error, and according to a report released by the Institute of Medicine in 1999, human errors are among the leading causes of death in hospitals. The U.S. Food and Drug Administration has reported that 237 deaths from 2002 to 2004 were caused by disregard for clinical alarms [12]. Because of hundreds of alarm-related deaths over the past five years, the Joint Commission set the 2014 National Patient Safety Goals to improve the safety of alarm systems [13, 14]. Many of the alarm-related incidents investigated by ECRI were found to have been caused by alarm fatigue. Information from databases shows that alarms triggered by a patient’s serious condition are frequently either not heard by clinical staff or responded to too late. In other cases, alarm sounds are heard by the staff but are turned off because of their annoying noise [15]. It is essential that healthcare equipment manufacturers and operators and hospital authorities take steps to eliminate unnecessary alarms.

In view of the importance of recognizing alarm fatigue as a potentially dangerous phenomenon, there is a need for valid, reliable, and transferable instruments to measure alarm fatigue in nurses. A review of the articles in the PubMed (MEDLINE), CINAHL, Scopus, and Web of Science databases showed that a comprehensive and valid instrument for measuring nurses’ alarm fatigue had not yet been designed. Graham and Cvach [16] designed a 5-item questionnaire to survey medical staff solely about how to improve patient safety with regard to monitor alarm sounds; Baillargeon [2] developed a 6-item instrument to collect information about patient monitoring equipment alarms (type of alarm, alarm description, the number of times an alarm goes off, alarm response time, personalizing alarm parameters, and explanation), which had to be completed by a researcher in a clinical environment. In an online survey in 2006, the Healthcare Technology Foundation (HTF) had 1300 technicians, engineers and hospital managers complete a questionnaire on clinical alarms. The results showed that most of the respondents believed that nuisance alarms happened frequently and adversely affected the performance of caregivers, sometimes leading caregivers to deactivate them. A similar survey conducted in 2011 yielded similar results. In both cases, repeated false alarms were considered the most serious problem with alarm signals in clinical environments [17].

In their study of nurses’ attitude toward clinical alarms in intensive care units (ICU), Cho et al. [18] report that alarm sounds from clinical equipment sometimes make nurses impatient and compromise their competence as care-providers.

Though a number of interventions have been suggested to reduce the number of false alarms, there is a need for more thorough research that addresses the possible negative consequences of such interventions. Based on the results of several studies on alarm fatigue and alarm management in the U.S., a variety of measures have been suggested to lessen alarm fatigue in that country since 2010 [2, 3, 10, 12, 16, 17]. In Iran, however, little research has been conducted on medical device alarms, and alarm fatigue is a new concept. No comprehensive instrument has been designed exclusively for measuring alarm fatigue in nurses; hence the need for a valid and reliable alarm fatigue questionnaire. Accordingly, the present study is an attempt at developing a nurses’ alarm fatigue questionnaire and testing its validity and reliability in the hope of improving the quality of clinical care and services.

2 Methods

2.1 Design, setting, and subjects

This is a cross-sectional study with the purpose of designing a questionnaire for evaluating alarm fatigue in nurses and subsequently analyzing its validity and reliability. In the present study, the researchers addressed audio alarms: the alarm sounds made by patient cardiac monitors, infusion pumps, pulse oximeters, syringe pumps, and mechanical ventilators.

In order to develop and analyze items, the researchers initially provided a practical definition of “Nurses’ Alarm Fatigue” and set the specific objectives of the study based on that definition. As there is no standard definition of alarm fatigue, after reviewing the available literature, the researchers used the following definition to explain the concept: having to respond to too many alarm signals at work reduces one’s sensitivity to clinical alarms, which can, in turn, result in alarm sounds being missed or responded to with delay. Next, a blueprint was made and the significance of each objective was measured according to the comments of experts and professors who were familiar with the concept and with psychometrics. Each item was included in or excluded from the questionnaire based on the results of the literature review, the experts’ comments, and the item’s significance and relevance to the objectives of the instrument. The remaining items were examined several times and the repeats (items that were similar in content) were eliminated. Eventually, after the final draft had been approved, the validity of the instrument was measured using the two methods of face validity (the quantitative and qualitative approaches) and content validity (the qualitative and quantitative approaches). Test–retest reliability, Cronbach’s alpha, and Principal Component Analysis (PCA) were used to assess the stability and consistency of the instrument.

2.2 Data analysis

2.2.1 Validity

To verify the validity of the questionnaire, the researchers had it reviewed by several professors in the fields of anesthesiology, intensive care, and nursing, as well as some experienced head nurses and nurses in intensive care units, all of whom were familiar with the concept in question; based on the suggested revisions, certain items were removed or added. Both the face validity and the content validity of the questionnaire were then tested.

2.2.2 Face validity

To confirm the qualitative face validity, the researchers interviewed ten nurses face-to-face and asked for their opinion about the difficulty level (difficulty of comprehending the sentences), relevancy (relevance of the sentences to different aspects of the questionnaire), and ambiguity (chances of misinterpreting the sentences) of the questions. The researchers also ascertained that the sentences were grammatically correct and coherent by double-checking the items and having two language experts examine them.

2.2.3 Content validity

The content validity of the questionnaire was confirmed both qualitatively and quantitatively. Ten experts were asked to evaluate the content validity of the questionnaire: the experts, who were authorities on questionnaire development, medicine, epidemiology, and nursing, were asked to evaluate the qualitative content validity based on grammatical structures, choice of words, and order of the sentences. To evaluate the quantitative content validity of the questionnaire, its Content Validity Ratio (CVR) and Content Validity Index (CVI) were examined.

To determine the CVR of the questionnaire, the experts were asked to rate each item on a three-part scale: necessary, helpful but unnecessary, and unnecessary. Based on the minimum CVR values in Lawshe’s table, items with a CVR above 0.62 (the minimum value for a panel of ten experts) were considered significant (p value <0.05) and were retained [19]. Subsequently, the content validity index of the questionnaire was analyzed based on Waltz and Bausell’s method [20]: the experts were asked to evaluate the relevancy, clarity, simplicity and specificity of each item on a 5-point Likert scale. The CVI score of each item was calculated by dividing the number of experts who had given that item a score of 3 or 4 by the total number of experts [21]. Hyrkas et al. [22] recommend a score of 0.79 or above for accepting the CVI of an item. Finally, the average content validity index of the questionnaire (S-CVI/Ave) was calculated as the mean of the CVI scores of all items. Polit and Beck [21] recommend a score of 0.90 or above for accepting the S-CVI/Ave of a questionnaire.
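The CVR and CVI calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names and the expert ratings are hypothetical, and the CVR is computed with Lawshe's standard formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating an item "necessary" and N is the panel size. Following the text, ratings of 3 or 4 are counted as endorsing an item.

```python
from statistics import mean
from typing import Sequence


def content_validity_ratio(n_necessary: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_necessary - half) / half


def item_cvi(ratings: Sequence[int]) -> float:
    """Proportion of experts who gave the item a score of 3 or 4."""
    return sum(1 for r in ratings if r in (3, 4)) / len(ratings)


def scale_cvi_ave(all_ratings: Sequence[Sequence[int]]) -> float:
    """S-CVI/Ave: mean of the item-level CVI scores."""
    return mean(item_cvi(r) for r in all_ratings)


# Hypothetical data: 9 of 10 experts rate an item "necessary".
cvr = content_validity_ratio(n_necessary=9, n_experts=10)

# Hypothetical ratings from ten experts for two items.
ratings = [
    [4, 4, 3, 4, 3, 4, 4, 3, 4, 4],
    [4, 3, 2, 4, 3, 4, 4, 3, 4, 1],
]
item_scores = [item_cvi(r) for r in ratings]
s_cvi_ave = scale_cvi_ave(ratings)

print(f"CVR = {cvr:.2f}, retained: {cvr > 0.62}")
print(f"I-CVI = {item_scores}, S-CVI/Ave = {s_cvi_ave:.2f}")
```

With these invented ratings, the first item is endorsed by all ten experts and the second by eight, so both would meet the 0.79 item-level threshold cited above.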

2.2.4 Reliability

The reliability of the questionnaire was verified by analyzing its internal homogeneity and consistency in 102 ICU nurses. The Cronbach’s alpha of the questionnaire was calculated to measure its internal homogeneity. It is believed that the internal homogeneity of a questionnaire is satisfactory when its Cronbach’s alpha is between 0.7 and 0.8 [23]. To measure the consistency of the questionnaire, the researchers used the test–retest approach. The key point about this approach is the time interval between the tests; according to Fowler [24], the interval should be long enough for the respondents to forget the items on the questionnaire to a certain extent, but not so long that the phenomenon being measured changes. Burns and Grove [25] suggest that the length of the interval be from 2 weeks to 1 month. Thus, the retest was given 14 days after the initial test. The correlation between the scores obtained from the test and the retest was analyzed. Principal Component Analysis (PCA) was used to define the organizing principle of the instrument. Factor loadings and “alpha if item deleted” were used to measure the contribution of the items to the instrument and to the scale’s reliability, and to reduce the number of items if necessary. SPSS v. 21 was used for the descriptive and analytical statistical analysis of the collected data.
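Cronbach’s alpha and the “alpha if item deleted” statistic mentioned above can be illustrated with a short Python sketch. The study itself used SPSS; the function names and the response matrix below are hypothetical and serve only to show the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores), with sample variances.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """scores: one row per respondent, one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(scores: Sequence[Sequence[float]]) -> List[float]:
    """Alpha recomputed with each item left out in turn."""
    k = len(scores[0])
    return [
        cronbach_alpha([list(row[:i]) + list(row[i + 1:]) for row in scores])
        for i in range(k)
    ]


# Hypothetical responses: 5 respondents x 4 items, each scored 0 to 4.
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
]

print(f"alpha = {cronbach_alpha(responses):.3f}")
for item, a in enumerate(alpha_if_item_deleted(responses), start=1):
    print(f"alpha if item {item} deleted = {a:.3f}")
```

An item whose removal raises alpha noticeably contributes little to internal homogeneity and becomes a candidate for deletion, which is how this statistic supports item reduction.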

3 Results

3.1 Participant demographics

In the quantitative part of the present study, in order to evaluate the reliability of the designed questionnaire, 102 nurses with an average age of 32.64 ± 4.81 years and an average length of experience of 7.12 ± 6.34 years were tested. Table 1 shows the distribution of the participants based on gender, education, and employment status.

Table 1 Personal characteristics of the participants in the reliability section of the study

3.2 Validity

Based on the results from stage one of the study, alarm fatigue can be defined as lacking the full capacity and ability for identifying and prioritizing clinical alarms, which leads to inefficient response to alarm sounds. In stage one, 16 statements were composed, which were scored on a 5-point Likert scale: never, rarely, occasionally, usually, and always.

3.3 Face validity

For evaluating the face validity of the questionnaire, the experts and nurses’ opinions were considered and the necessary revisions were made.

3.4 Content validity

Based on the results of the content validity evaluation, the following changes were made: item 7 was revised in compliance with the experts’ documented comments; 3 items were removed based on Waltz and Bausell’s content validity index and the experts’ documented comments (Table 2); 5 new items were added to the questionnaire based on the experts and research team’s documented comments. Eventually, the nurses’ alarm fatigue questionnaire ended up consisting of 19 items and was tested for item reduction and reliability analysis (Table 3). It should be noted that the average content validity index of the questionnaire (S-CVI/Ave) was found to be 0.92.

Table 2 Evaluation results of the content validity index (CVI) of the questionnaire
Table 3 Results of the retest and internal homogeneity of the ICU nurses’ alarm fatigue questionnaire

3.5 Item reduction in the scale

Based on the factor loadings of the items and “alpha if item deleted”, six items were suggested for removal from the scale. After a second round of consultation with the expert panel, these items were removed. The results of PCA on the revised scale suggested the possible existence of two underlying factors according to the factor loadings of the remaining items (Table 4). The final scale and its underlying structure and a summary of item and scale analyses are presented in Tables 2 and 3.

Table 4 Rotated Factor Matrix

3.6 Reliability

3.6.1 Internal homogeneity reliability

The Cronbach’s alpha coefficient for the internal homogeneity of the final version of the questionnaire was found to be very good (Cronbach’s alpha = 0.91).

3.6.2 Test–retest reliability

Moreover, to verify the consistency of the retest, the researchers calculated the Spearman–Brown coefficient of the statements, which was found to be very good (Spearman–Brown coefficient = 0.99) for the entire scale (Table 3).
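The text does not state how the Spearman–Brown coefficient was obtained from the two administrations. One common reading, assumed here purely for illustration, is that the correlation r between test and retest totals is computed first and then adjusted with the Spearman–Brown formula, r_SB = n·r / (1 + (n − 1)·r), with n = 2. The function names and the scores below are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r: float, factor: float = 2.0) -> float:
    """Spearman-Brown formula; factor=2 is an assumption, not from the source."""
    return factor * r / (1 + (factor - 1) * r)


# Hypothetical total scores of six nurses, 14 days apart.
test_scores = [18, 25, 31, 22, 40, 27]
retest_scores = [19, 24, 30, 23, 41, 26]

r = pearson_r(test_scores, retest_scores)
print(f"test-retest r = {r:.3f}, Spearman-Brown = {spearman_brown(r):.3f}")
```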

4 Discussion

In view of the importance of recognizing alarm fatigue in nurses and its consequences for nurses and patients alike, and considering the lack of a proper instrument for measuring alarm fatigue, the present study aimed to develop an ICU nurses’ alarm fatigue questionnaire. The face validity and content validity (quantitative and qualitative validity), internal homogeneity (Cronbach’s alpha), and consistency (test–retest) of the questionnaire were confirmed. PCA was used for item reduction and reliability analysis of the instrument.

So far, there have not been any instruments exclusively designed for measuring alarm fatigue, and the available instruments are intended for collecting data from patients’ monitoring alarms and improving patients’ alarm-related safety. Moreover, the validity and reliability of the available instruments have not been tested, and the instruments have been designed for use in certain hospital wards and are not transferable to all wards where patient monitoring devices are used. One of the instruments, designed for collecting data from patients’ monitoring equipment alarms, consists of only 6 items which address type of alarm, alarm description, the number of times an alarm goes off, alarm response time, personalizing alarm parameters, and explanation; it does not deal with alarm fatigue [2]. Another questionnaire, intended for surveying hospital staff in order to improve alarm-related patient safety, consists of 5 questions: question 1 is a yes/no question and deals with adjusting the limits of alarms; in question 2, nurses are asked to rank the amount of noise in the ward and the contribution of alarms to the noise from 1 (least) to 5 (most); in question 3, nurses are asked to list five alarms which they believe are most important; in question 4, nurses should describe their usual response to the alarms they listed in the previous question; and the last question, which is optional, asks for nurses’ suggestions about alarm management [16]. The questionnaire used by HTF consists of several items, and the participants’ responses reflect the extent to which they agree with each statement. The initial part consists of 19 general statements about clinical alarms. The last section contains 9 items about issues that inhibit effective management of clinical alarms. No information is provided about the reliability and validity of the instrument [17]. The instrument used by Cho et al. is a two-part questionnaire. Part one, which consists of 14 items about the respondent’s knowledge of alarms made by clinical equipment, is based on the questionnaire of the Healthcare Technology Foundation (HTF) translated into Korean. Part two consists of 9 items about barriers to handling alarms effectively, which are extracted from two other sources in the literature [18]. The validity and reliability of the instrument are assessed only on the basis of the content validity index and Cronbach’s alpha coefficient, respectively.

It is evident that the above-mentioned instruments lack exclusivity for measuring alarm fatigue. The questionnaire developed in the present study, however, exclusively addresses alarm fatigue. In addition to evaluating nurses’ attitude, the instrument used by Sowan et al. [26] includes a few items about nurses’ performance with regard to new cardiac monitors; however, the face validity of the instrument has been examined by only 4 expert ICU nurses and the other monitoring devices in the ICU are not dealt with.

Though a number of studies have addressed ICU nurses’ attitudes and practices related to clinical alarms, none of them uses a reliable and valid instrument. Most of these studies use the instrument developed by HTF, whose reliability and validity have not been verified. In the present study, to perform a psychometric analysis of the instrument, the researchers used face validity (the quantitative and qualitative approaches), content validity (CVI and CVR), test–retest reliability, Cronbach’s alpha, and Principal Component Analysis (PCA); also, the instrument addresses all audible clinical monitoring devices.

Moreover, the Likert scale used in the commonly-used instruments ranges between “agree” and “disagree”; the problem with this is that a respondent may agree with an item but not actually act as stated in the item, which can prevent the results from being a reflection of the real situation. This may account for the fact that there is not a significant difference between the results of the two surveys conducted by Funk et al. (HTF). In the instrument developed in the present study, however, the Likert scale used assesses the respondents’ performances rather than their attitudes.

Even though some of the above-mentioned studies have used very large sample sizes, the instruments used to measure alarm fatigue in nurses have not been accurately tested for reliability and validity.

Validity is the extent to which a method or instrument is accurate in measuring a certain characteristic. In the present study, the researchers evaluated both the face validity and content validity (CVR and CVI) of the designed questionnaire, which resulted in the deletion of 3 statements and the revision of one. The average content validity index (S-CVI/Ave) of the questionnaire was found to be 0.92, which is a satisfactory value (according to Polit and Beck, values of 0.90 and above are acceptable for S-CVI/Ave) [27]. Thus, in terms of content validity, the ICU nurses’ alarm fatigue questionnaire developed in the present study is valid.

Reliability is defined as the homogeneity of respondents’ scores for a set of items as obtained in two separate situations or based on two equivalent instruments. In the present study, the researchers evaluated the internal homogeneity (Cronbach’s alpha), consistency (test–retest), and item reduction of the developed questionnaire. The Cronbach’s alpha coefficient of the questionnaire was found to be 0.91, which shows that the items of the questionnaire have high internal homogeneity. Thus, the reliability of the ICU nurses’ alarm fatigue questionnaire was confirmed. The consistency of the questionnaire was tested based on the test–retest method: the results of the two tests with a 14-day interval were found to be very good (Spearman–Brown coefficient = 0.99), and the questionnaire was found to be consistent.

The score range of the developed questionnaire is between 8 (minimum) and 44 (maximum), with higher scores indicating a greater impact of alarm fatigue on nurses’ performance. Each item on the questionnaire is scored from 0 (“never”) to 4 (“always”), except items 2 and 14, which are reverse-scored.
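The item-scoring rule can be sketched as follows. This is only an illustration of reverse scoring under stated assumptions: the names are hypothetical, reverse scoring is taken to mean 4 minus the raw score, and the item numbers 2 and 14 are used as given in the text. The sketch makes no claim about which item numbers appear in the final version or about the total score range.

```python
from typing import Dict

# Response labels and raw scores as described in the text.
RESPONSES = {"never": 0, "rarely": 1, "occasionally": 2, "usually": 3, "always": 4}

# Items reported as reverse-scored (numbering taken from the text as is).
REVERSED_ITEMS = {2, 14}


def item_score(item_number: int, response: str) -> int:
    """Score one item; reversed items map 'never' to 4 and 'always' to 0."""
    raw = RESPONSES[response]
    return 4 - raw if item_number in REVERSED_ITEMS else raw


def total_score(answers: Dict[int, str]) -> int:
    """Sum the item scores for one respondent (item number -> response label)."""
    return sum(item_score(n, r) for n, r in answers.items())


# Hypothetical answers to three items only.
example = {1: "always", 2: "always", 14: "never"}
print(total_score(example))
```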

As this is the first attempt at developing and evaluating the psychometric characteristics of an ICU nurses’ alarm fatigue questionnaire, the present research project is innovative in Iran. The fact that the present study was conducted on the ICU nurses in Iran limits the transferability of the results: the results cannot be applied to all nurses. Therefore, the researchers suggest that further studies be conducted with larger numbers of participants and across different countries.

The designed questionnaire, in its final form, consists of 13 items on a 5-point Likert scale (never, rarely, occasionally, usually, and always). The questionnaire is intended for measuring ICU nurses’ alarm fatigue, and has been designed based on an extensive literature review and the comments of several experts and nurses.

5 Conclusion

In the present study, a questionnaire was designed in Iran for measuring nurses’ alarm fatigue. The results of the study show that the validity and reliability of the questionnaire are satisfactory. It should be noted that this questionnaire is easy to use and can be completed in about 10 min. Therefore, this psychometric questionnaire is efficient enough for measuring nurses’ alarm fatigue and can be used in the development of programs for reducing alarm fatigue-related problems and issues.