1 Introduction

The aim of the neonatal intensive care unit (NICU) is to provide an environment that replaces the womb as closely as possible, for optimal health recovery and growth. However, the noise level at the NICU is higher than advised. A quieter environment is expected to reduce the frequently observed detrimental outcomes of prematurely born infants. This section discusses the properties of sounds in the NICU, including equipment alarms, the effect of these sounds on the infants as reported in the literature, and ongoing incentives to improve these sounds. It then explains the research aim: to improve the sound situation for the babies.

1.1 Influence of sounds on health, development and well-being of babies

Typical average sound levels at the NICU are around 54 dBA. Williams et al. (2007), for example, recorded sound levels exceeding 45 dB more than 70 % of the time during their measurements in three levels of NICU care. These sound levels conflict with guidelines stating that sound levels exceeding 45 dB are cause for concern and should be avoided, and that levels exceeding 50 dB are acceptable at most 10 % of the time (AAP 1997). At NICUs, alarms are heard even more frequently than at other ICUs, and they even exceed the recommended impulse maximum of 65 dB (Darcy et al. 2008). Maximum sound levels come from, for example, infusion pumps (>65 dB) and alarms in respiratory devices (>80 dB) (Van Stuijvenberg et al. 2009). Reduced sound levels are needed.

However, in the womb, the foetus also receives stimulating auditory cues by which it develops its hearing. Foetuses have not learned to organize perception in the way an adult or child would, but already at 32 weeks, more conversational turns can be observed when parents are present than when they are absent, indicating the maturation of the auditory response but also the maturation of a recognition pattern (Caskey et al. 2011).

An indirect influence of sound on the child is via the parents: even with full-term infants, “the parents found the intensive care unit shocking. The equipment with wiring, tubing and blinking lights looked frightening. The parents felt very unpleasant because of the audible signals” (Jämsä and Jämsä 1998). They suggest that comprehensive health technology assessment should include the assessment of the experiences of child and parents and that these experiences should be taken into account when designing devices.

1.2 Alarms to support nursing in the intensive care unit

To assist in nursing tasks, each patient is surrounded by many medical devices. These are responsible either for sensing and representing specific parameters related to the physical functions of the patient (monitoring equipment, e.g., monitor, ventilator) or for supporting these physical functions (supporting equipment, e.g., ventilator, dialysis machine, infusion pumps) (Edworthy 2000). Alarms are one part of the total information provided to nurses to do their work.

Consequently, many alarms are heard, although a high proportion of alarms are unrelated to emergencies. For example, Lawless (1994) observed in a paediatric ICU that 68 % of all alarms were false alarms and more than 94 % of the alarms had no clinical significance. Most alarms are simply triggered by one parameter threshold crossing. According to Hollnagel and Niwa (2001), this forms a fundamental mismatch with complex human cognitive work to be supported; the onus of response selection is left with the nurse. Imhoff and Fried (2009) conclude that still not much has changed; the overwhelming number of false alarms at the wards remains the same, despite numerous efforts for improvement.
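The single-threshold triggering described above can be sketched as follows. This is a minimal illustration and not code from any monitor: the function name `threshold_alarms`, the limits and the sample trace are hypothetical and carry no clinical meaning. It shows how a value hovering around one limit raises a new alarm at every crossing.

```python
def threshold_alarms(samples, low, high):
    """Return the sample indices at which a new alarm starts.

    An alarm starts whenever the value leaves the band [low, high]
    after having been inside it (hypothetical single-parameter rule).
    """
    alarm_starts = []
    in_alarm = False
    for index, value in enumerate(samples):
        out_of_range = value < low or value > high
        if out_of_range and not in_alarm:
            alarm_starts.append(index)
        in_alarm = out_of_range
    return alarm_starts


# Hypothetical trace that hovers around the lower limit of 90.
trace = [94, 89, 91, 88, 92, 87, 93]
print(threshold_alarms(trace, low=90, high=100))  # three separate alarms
```

Each crossing is treated as a separate event, without reference to any other cue about the patient, which is the mismatch with cognitive work noted by Hollnagel and Niwa (2001).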

1.3 Ongoing incentives to change the alarm situation

The problems concerning alarms in the ICU are well known and have been the subject of numerous studies. Especially the large number of alarms and the resulting noise and information overload are problematic.

Many researchers and companies are working on improving the alarm handling situation at the ICU. To improve sensory and cognitive handling of alarms, there are several approaches: (1) Intelligent alarming. Imhoff and Kuhls (2006) give an overview of several computer science methods to improve alarms. These can be divided into roughly two groups: the first group aims at reflecting protocols of decision-making in algorithms, while the second group allows the computer to find out how actual medical practice appears to work and then to produce this decision-making by itself. Intelligent alarming can indeed reduce the number of alarms of a single device (Chambrin 2001). (2) Eliminating false alarms caused by movements of patients or nurses, see e.g., Workie et al. (2005). (3) Sound design, for example, on the relation between the type of sound and how well it is learned and retained (Edworthy and Hards 1999). (4) Improving the correctness of alarms and supporting care at the same time, for example, Claure et al. (2001). (5) Hospital studies focusing on reducing noise for infant well-being, for example, by changing nurses’ behaviour, technical changes (e.g., changing the materials of the ceiling) or organizational changes (e.g., initiating a rest hour); these have a clear but limited positive impact (Van Stuijvenberg et al. 2009). According to Imhoff and Fried (2009), “several approaches have shown efficacy and effectiveness in reducing the rate of false alarms in clinical study”. Unfortunately, they add, “still, very little has been implemented in commercially available medical devices”. A possible reason for this is that sound is perceived in a logarithmic way. A single alarm in a quiet ward therefore has a big impact on the sound level and could startle, while taking away the same alarm in a noisy ward has almost no impact on the (high) sound level. This means that the target sound levels can be reached, and undisturbed sleep guaranteed, only by a massive reduction in alarms and noise and not by single device improvements.
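The logarithmic argument can be made concrete with the standard energy-summation rule for uncorrelated sound sources expressed in decibels. The sketch below is illustrative only: the alarm level and the two background levels are hypothetical values chosen for the example, not measurements from this study.

```python
import math


def combine_levels(levels_db):
    """Combine uncorrelated sources given in dB by summing their energies."""
    total_energy = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_energy)


ALARM_DB = 65.0        # hypothetical alarm level at the listener
QUIET_WARD_DB = 40.0   # hypothetical quiet background
NOISY_WARD_DB = 70.0   # hypothetical noisy background

rise_quiet = combine_levels([QUIET_WARD_DB, ALARM_DB]) - QUIET_WARD_DB
rise_noisy = combine_levels([NOISY_WARD_DB, ALARM_DB]) - NOISY_WARD_DB

print(f"Alarm raises the quiet ward by {rise_quiet:.1f} dB")
print(f"Alarm raises the noisy ward by {rise_noisy:.1f} dB")
```

Under these assumed values, the same alarm raises the level by about 25 dB in the quiet ward and by only about 1 dB in the noisy ward, so removing it from the noisy ward is barely noticeable in the overall level.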

1.4 Research aim

In spite of the positive results of many academic and company studies, several physicians reported in personal communication that the actual sound situation at the ward has at most changed marginally over the years. The aim of the investigation therefore is to identify what measures are needed to improve the sound situation at the NICU, so as to provide premature babies with the many undisturbed rest hours they require for their health, development and well-being.

2 Methods

The investigation was conducted in three stages, by three different principal designers and a team. The method used was built up from techniques used in the Fuzzy Front End (FFE) (e.g., see Koen et al. 2001) and in human factors. FFE is the idea finding phase in product development. Recently, specific human factors approaches for FFE have been developed or adapted for use in this phase. One essential element in FFE is management of the methods used. Design is tacitly or explicitly built up in five layers of design thinking (Sanders 2010). Culture: designers need to learn about cultural aspects in the usage domain, which are alien to them in a new project (domain); Mindset: designers intentionally direct their attention to certain important issues, for example, in this project, towards the human stakeholders and (social) organization; Methodologies: the framework of methods applied; Methods: usually several in one project; and Tools and techniques: what is actually done and the materials used. In all three stages, design management (e.g., defining research questions, selecting the methods, judging progress, planning) was essential, and human factors approaches were selected, tailored and managed towards the design goals.

2.1 Research and design methods

2.1.1 First stage: understanding the problem and vision building

In the first stage, a wide exploration was conducted to understand the problems with ICU alarms, and ideas for solutions were identified and evaluated with nurses, see Pijl (2004) and Freudenthal et al. (2005). The two main methodologies were the FFE approach, structured in stages by ‘Vision in Product Development’ (ViP) (Hekkert and van Dijk 2001), combined with a human factors approach including co-design. Human factors were considered at all levels of the “Human-tech ladder”: physical, psychological, team, organization and politics (Vicente 2004). Vicente explains that design should begin by understanding a human or societal need at all these levels. The methods were the following: Literature search; Interviews with experts and users, for example, explanations about work and potentially useful technologies were discussed with nurses and head nurses; Searching opportunities in technology; Day and night ethnography for design (observations, interviews and document study), including co-design with end users: 50 h in five ICUs, each with a different speciality, and, for a more detailed understanding of patient care, another 34 h in one thorax ICU to study one specific syndrome, that is, a clogged artery treated with bypass surgery. Various analysis methods were used and some were adapted, for example, the abstraction hierarchy (Rasmussen 1983) was adapted from process-oriented to task-oriented and critical decision retrieval (Klein et al. 1989) was used; Observing and interviewing in other critical domains, that is, in an operating room, at a fire department, at a control room of an oil company and in aviation; Exploring technological solutions via literature, the internet and personal experience, for example, trying a tactile vest (described in van Erp et al. 2003), combined with personal meetings with the researchers; Evaluations by storyboard group interviews with nurses.

2.1.2 Second stage: verifying and specifying

After the first study, a physician from the NICU at the participating hospital requested the research team to partner in conducting a feasibility study to find out whether the solution could make the NICU ward quieter, because a quieter environment is expected to have a positive impact on the later functional outcome of neonates, see Langeveld (2006) and Freudenthal et al. (2010). The FFE approach was continued and focussed on identifying (implicit) assumptions from the first stage, deciding which are essential for the success of the whole idea, and testing these. Special approaches (not common in FFE) were an in-depth human factors approach, including cognitive ergonomics, and research through design (Horváth 2008), in which design is used as a research means. The methods include the following: Literature search; 30 h of ethnography for design in an academic NICU; Observing and interviewing in a new (1 year old) ICU; Focus group; Interviews with experts, for example, nurses, physicians and technical staff; Brainstorming, sharing and evaluating ideas with nurses; Evaluations with nurses. Common tools and techniques for the above were used. Some of the special tools used were the following: In the focus group, 12 female participants with 4–26 years of work experience took part, using generative tools (collaging and telling stories), see Sanders (2001); the focus group was preceded by a workbook, filled in by 10 participants; Adapted abstraction hierarchy; Two storyboard evaluations: in the first round, three different solutions were evaluated, and in the second round, one design was evaluated by 1 doctor and 3 nurses, for 6 work scenarios representing ‘all’ nursing work; A questionnaire with a 5-point scale and open questions, and testing of a working demonstrator of the tactile alarm (see Fig. 3), with visual cues and a stop button, during 15 min.

2.1.3 Third stage: preparation for strict development

The third stage focused on project management (choosing research targets, investigations needed and expertise needed) in preparation of strict development. The outcome is a set of proposed interlinked measures and research and development areas that could potentially solve the problem, see Freudenthal et al. (2010). A new human factors ‘checklist’ tool was designed and used. The new tool is called ‘check matrix of human factors and design thinking’. It is meant to identify, and give an overview of, all research questions and methods needed in the following phase, as well as design methods and targets. The matrix (see Fig. 1) combines two checklists (each of five attention areas), which together form a matrix of 25 cells. The columns present the five levels from “the Human-tech ladder” by Vicente (2004) and the rows the five layers of design thinking by Sanders (2010), see the introduction of Sect. 2. Vicente (2004) explains that it is important to understand the human or societal needs and that solutions should be tailored to reflect specific human factors. Although Vicente refers to Moray (1994) and Rasmussen (1997), both writing from the safety domain, his view is much wider: it is about ‘design’. The five levels physical, psychological, team, organization and politics “are not unique. For example, the “physical” level could be subdivided into anatomy and physiology, and the “political” level could be subdivided in public opinion, government and regulatory associations. Also, the exact number of levels that are useful may differ across sectors (e.g., health care vs. nuclear power)”.

Fig. 1 Check matrix of human factors (columns) and design thinking (rows) with a few filled-in examples of ‘key phrases’, representing research question(s) or design target(s) and related approach(es). The matrix as a tool and more examples were presented in Freudenthal et al. (2010)

By checking all cells, questions to be asked at the different layers and for the different levels were identified. While conducting this work, it was noticed that there were too many questions. Therefore, the key phrases were placed in the matrix only once, even though they usually matched several cells. This was not a problem as the check matrix became only a memory aid for the principal designer who remembered that the phrase also applied to (several) other cells. She moderated the overview to the other team members.

The check matrix of human factors and design thinking was updated several times and is expected to be updated in the future. In the last ‘freeze,’ it contained 52 ‘key phrases’ representing sets of research question(s) and method(s) or design target(s) and method(s). Some examples from the matrix are shown in Fig. 1.
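The structure of the check matrix can be sketched as a small data structure. This is only an illustration of the 5 × 5 layout and of the rule that each key phrase is stored once; the function names and the example key phrase are hypothetical and are not taken from the actual matrix in Freudenthal et al. (2010).

```python
from itertools import product

# Columns: levels of the Human-tech ladder (Vicente 2004).
LEVELS = ["physical", "psychological", "team", "organization", "politics"]
# Rows: layers of design thinking (Sanders 2010).
LAYERS = ["culture", "mindset", "methodologies", "methods", "tools and techniques"]


def empty_check_matrix():
    """Create the 25 cells, each holding a list of key phrases."""
    return {(layer, level): [] for layer, level in product(LAYERS, LEVELS)}


def place_key_phrase(matrix, phrase, layer, level):
    """Store a key phrase in exactly one cell, even if it matches several."""
    if any(phrase in cell for cell in matrix.values()):
        raise ValueError(f"key phrase already placed: {phrase!r}")
    matrix[(layer, level)].append(phrase)


def unchecked_cells(matrix):
    """List the cells that do not yet hold any key phrase."""
    return [cell for cell, phrases in matrix.items() if not phrases]


matrix = empty_check_matrix()
# Hypothetical placeholder, not one of the 52 key phrases.
place_key_phrase(matrix, "example key phrase", "methods", "team")
print(len(matrix), "cells,", len(unchecked_cells(matrix)), "still empty")
```

Because a phrase is stored in one cell only, an empty cell in such a structure does not mean the cell was overlooked; as described above, the link to the other matching cells stayed with the principal designer.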

One more step was needed to make the final set of investigations transparent. They had to be integrated into meaningful ‘work packages’, with (multiple) clear expertise domains. As Rasmussen (1997) explains: “We need cross disciplinary studies of the … interaction amongst levels of the socio-technical systems with reference to the nature of the technological basis at the lower level” and Rasmussen (2000) “Focus will be on the selection of those academic paradigms from the involved professions which are mutually compatible and, therefore, useful for the problems at hand”. These compatible pairs formed the proposed sets of investigations (see chapters 4 and 5 and Fig. 5). Indeed, these “will not be concerned with comparative paradigm evaluations within the individual disciplines” Rasmussen (2000).

2.2 Measuring noise levels anno 2012

Sound measurements were conducted at a NICU ward that had been measured earlier in 2004/2005 (Van Stuijvenberg et al. 2009). A sound level meter was placed in the centre of the NICU on a vibration-free tripod in ward A. Ten measurements of 1 h each were conducted between 8:00 AM and 10:00 PM. A Rion NL-32, class 1, hand-held sound level meter was used to measure the sound pressure levels. The effect measure was the A-weighted level of sound in decibels (dBA). From these data, the following values were calculated: the L_EQ, the equivalent continuous sound level, which is a time integration of the sound pressure level and represents the constant sound level that would deliver the same sound exposure as the variable sound measured during that period; the L_10, which is the 10 % exceedance level (the sound pressure level exceeded for 10 % of the measurement period); and the L_MAX, which is the highest sound pressure level recorded during the measurement period. Baseline measurements included the L_EQ only.
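The three measures can be illustrated with a short calculation on equally spaced A-weighted samples. This is a sketch under stated assumptions, not the processing performed by the sound level meter: the function names are hypothetical, the sample values are invented, and the exceedance level uses a simple rank-based convention (the level reached or exceeded by the loudest 10 % of samples).

```python
import math


def l_eq(levels_dba):
    """Equivalent continuous level: energy average of equally spaced samples."""
    mean_energy = sum(10 ** (level / 10) for level in levels_dba) / len(levels_dba)
    return 10 * math.log10(mean_energy)


def exceedance_level(levels_dba, percent):
    """Level reached or exceeded by the loudest `percent` % of samples."""
    ordered = sorted(levels_dba, reverse=True)
    rank = max(1, math.ceil(len(ordered) * percent / 100))
    return ordered[rank - 1]


def l_max(levels_dba):
    """Highest level recorded in the period."""
    return max(levels_dba)


# Invented samples: nine quiet readings and one loud event.
samples = [50.0] * 9 + [70.0]
print(f"L_EQ  = {l_eq(samples):.1f} dBA")
print(f"L_10  = {exceedance_level(samples, 10):.1f} dBA")
print(f"L_MAX = {l_max(samples):.1f} dBA")
```

Because the L_EQ averages energy rather than decibel values, the single loud sample pulls it to roughly 60 dBA, well above the arithmetic mean of 52 dBA.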

3 Results

The results from human factors and design work include:

  • the current sound situation at (neonatal) ICUs (Sect. 3.1);

  • a model of the current work process and the role of auditory alarms (Sect. 3.2);

  • task analysis of monitoring physiological functions (Sect. 3.2.2);

  • proposed sound interventions by design and organization (Sect. 3.3);

  • a model of the envisioned workflow with the proposed technological support (Sect. 3.4).

Furthermore, the literature presented in chapter 1 is part of the results.

For readability, in this article ‘she’ is used for male and female nurses and ‘he’ for female and male physicians. Quotes are from nurses and doctors participating in the field studies. VP refers to Vera Pijl, main designer in stage 1; SL refers to Sanne Langeveld, main designer in stage 2.

3.1 Sound situation anno 2012

In spite of the positive results of many academic and company studies as described in Sect. 1.3, several physicians reported in personal communication that the actual sound situation at the ward has only changed marginally (if at all) over the years. To check this, new sound measurements were done in 2012. Results are listed in Table 1.

Table 1 Sound level measurements at NICU wards (dBA)

Table 1 shows that the average sound level in the NICU as well as the peak levels are indeed much higher at this ward than the guidelines prescribe. They are far above the advised daytime sound levels of 45 dBA for the NICU (AAP 1997) and the recommended impulse maximum of 65 dB (Darcy et al. 2008).

3.2 Current nursing work

In order to understand whether and how alarms can be redesigned, nurses’ information usage was studied and modelled.

3.2.1 Nursing tasks and parallel alarm flow

(III) The task division between physician and nurse is in principle that the physician conducts diagnostics and decides on the treatment protocol. The nurse should adjust treatment within a set margin if required. If the changes in the patient go beyond the set margin, she should first consult the physician, who then decides on changes in treatment or other measures. Of course, a nurse should always immediately start reanimation if needed.

A nurse conducts many tasks, for example, caring for the patient, consulting the physician, taking care of administration, taking care of her own education and training others (Melles 2011). Furthermore, (III) nurses and physicians also conduct practice-based research to improve their care and investigations for evidence-based medicine. This means, amongst others, that patients are sometimes enrolled in new decision-making rules, or in control groups.

(I) Three main tasks are the following: (1) monitoring the patient’s physiological functions; (2) supporting or completely taking over the patient’s physiological functions with the help of medical equipment; (3) preparing and maintaining the medical equipment. These three tasks are guided or influenced by alarms (based on Van den Brink et al. (2000) and observations).

According to VP, and confirmed by SL, to conduct both alarm handling and the other nursing tasks there is a constantly updated schedule in the mind of the nurse, and partially outside the mind. Ongoing tasks cannot be disturbed frequently by the multitude of incoming alarms. Therefore, there are two flows of work that run in parallel. Fig. 2 shows a model of the work process: alarm handling is depicted at the left and ongoing work at the right. These ongoing tasks are also crucial for babies’ health, development and well-being. For example, medication should be provided several times a day, feeding needs to be executed on time and the parents need attending to. If alarms interrupt these planned care moments, this not only undermines the humanistic treatment management; the incubator also needs to be closed again (noise) and hands need to be cleaned, which costs time.

Fig. 2 Current cognitive and physical work process of ICU nurses. Two parallel processes: at the right, tasks are constantly (re)scheduled, and at the left are incoming alarms and other demands

3.2.2 Cues about physiological functions

ICU nurses think in physiological functions (I, II): a patient (a person being part of a family with parents and developing towards an emotional and social being) has a set of anatomic subsystems (e.g., brain, lungs). The physiological system is linked to a set of physiological processes (e.g., ventilation, cardiovascular system). The vital functions are monitored (i.e., lungs and heart). If changes occur (decrease in stability), nurses have to pick these up, start treatment of the most urgent problems and give a sign to other members of the medical team in charge. According to Reddy and Dourish (2002), and as expressed by the nurses, the primary goal of the ICU staff is to stabilize, not cure, the patient. VP found that “in alarming events multiple anatomic subsystems can be involved” and that in almost every event, one subsystem can be indicated as “initiator”.

SL tested the assumptions by VP via several generative exercises. She asked what physiological functions are recognized and used in monitoring and treatment. Furthermore, she asked participants to identify the cues used for monitoring. Three physiological functions were studied in detail: lung/respiratory system, heart/cardiovascular system and digestive system. By sorting all signals, eight groups of cues could be distinguished: cues from the patient, about food, about respiratory support, about medication, test results, from parents, from the monitor and from alarms.

For example, to monitor lung function, three out of five subjects listed: respiration; skin colour; SpO2 (saturation); apnoea; EtCO2 (end-tidal CO2, i.e., CO2 in expired air); oxygen concentration of applied air; date of intubation; size of tube; insertion and fixation depth of tube; suction, suction depth; tube blocked; stenosis; Mv low (minute volume − respiration), Mv high; % own respiration; ventilation pressures (these tend to change frequently); diagnosis based on X-ray; blood gas analysis; humidity of air; temperature of expired air; manual respiration support; mechanical respiration support; low flow; CPAP (continuous positive airway pressure); respiratory depth; respiratory rhythm; respiratory frequency; body position. Two out of five subjects additionally listed: pulse, Hf (heart rate), asystole, ABP systolic (blood pressure); medication; diagnosis based on ultrasound. One subject also mentioned ABP, pulse and CVD (central venous pressure).

As can be concluded, alarms are only one part of a set of cues that need to be combined to decide about the situation or expected future situation. This also means that the same “alarm” might require action in one patient situation, but not in another; in the latter case, it might be rated to be ‘false’.

3.2.3 Alarm filtering and processing

Fig. 2 depicts that the first task is to filter urgent alarms pertaining to the nurse’s own patients from the stream of auditory signals (II). Very often, it is unclear where the sound signal comes from: which baby, which equipment.

The first step is to integrate a relevant subset of cues and to decide at once: (II) Which one of my babies? (Usually a nurse treats several.) (I, II) Is the alarm urgent? Which physiological function is involved? And as a second but immediate cognitive step: what type of handling will be needed? This is the very first set of required data. Indeed, all planned work instantly has to be rescheduled. If help from colleagues is required, this should be arranged. The nurse can also be disturbed by another nurse calling for help in an emergency situation. These pieces of information, however, are not presented directly by the alarm signals, but have to be integrated by the nurse, as was explained in the previous section.

In observations (I, II), it became clear that the number of ‘alarms’ to be handled in combination with the parallel nursing processes makes it impossible to (cognitively) handle all (important) alarms. VP presumed that the first stage must be preattentive. She followed Woods (1995), who describes the preattentive mechanism for dynamic fault management in domains with similar characteristics. When operators have to handle other important work, they apply a filtering strategy, dependent on deviations from expectations. He refers to Broadbent’s (1977) research on preattentive processes, which suggests an early, passive and global analysis of information followed by a later, active and more detailed stage. VP observed that nurses specifically screen for expected and also likely deviations. Her impression was that this screening is trained and intuitive. The incoming multitude of sound signals is thus filtered. According to Groen (1995), the type of current and expected work (e.g., vigilance or hands-on routine) determines the preattentive processing, by way of explicitly drawing attention to alarms or not, and reactions depend (amongst others) on whether the alarm is expected or comes as a surprise. This ‘level of vigilance’ might be different per physiological function, depending on the patient’s condition. This is indicated in Fig. 2.

3.3 Proposed technical measures to reduce alarms and noise

To meet the goals for the newborn babies and to fit to cognitive capacities and available work skills and practice, an idea for a new technical system was defined. A complete set of technological interventions will be listed and the three main technology directions will be explained, that is, intelligent alarming, personal multisensory display and sound design.

3.3.1 The set of measures

The set of proposed technology measures is the following:

  (a) Centrally controlled sound-generating unit(s) with redesigned urgent alarm sounds [alarms from other equipment will be permanently silenced] (I, II, III).

  (b) A multisensory display worn on the skin of the nurse (I, II, III). Firstly, the device should indicate that it is the nurse’s own baby who has an urgent alarm, for example, by red light-emitting diodes (LEDs). Secondly, the device should have tactile output, meaning that the message is alerting: some handling is needed soon, but it can wait. The tactile signals and the related (e.g., yellow) LEDs could be presented in a sequence/rhythm (II). Messages should convey the physical function, whether it is about equipment or patient, and the type of handling to be anticipated. Tactile messaging should not be annoying or overloading.

  (c) An information screen, similar to the current vital signs monitor, but providing more integrated data, including an overview of the stability of the physical functions relevant to monitor in this patient (I, II, III). The new overview screen should take into account parents’ experiences. For nurses, monitoring stability directly is important, but for parents this is hard to understand, and they can easily be misled by such information if they can view it. [A nurse commented: “We talk about patients that way: He is stable, but taking the situation into account he is doing badly in a relatively stable way”. To relatives of the patient this point of view is too complicated.]

  (d) Sound design anticipating infants’ and nurses’ perception, not disturbing patient or nurse, but attracting nurses’ attention and conveying meaning appropriately.

  (e) A ‘buddy system’ allowing transfer of a patient to another nurse, including personal signals on the arm (II, III). [Nurses work in a nursing team, helping each other. It is a 24/7 business, meaning that transfer of shift happens regularly and that nurses occasionally take care of other patients about whose specific individual health problems they have not been briefed, e.g., while a nurse leaves the ward for a cup of coffee or a reanimation elsewhere in the hospital.]

3.3.2 Intelligent alarming

VP proposed a system to follow nurses’ thinking. The system conducts the initial preattentive step now executed by the nurse. She has the clinical picture in her mind from various cues and has to filter out and figure out whether an incoming ‘alarm’ signal actually is an alarm and requires immediate or later actions. For this, she in fact has to execute the monitoring of physiological functions with the cues described in Sect. 3.2.2. She has to combine a long list of cues from various sources. The idea is that artificial intelligence (AI) could also perform this task.

AI should integrate information to identify the initial presentation modality of the integrated message: urgent, as sound and visual; notification, as tactile and visual (light/rhythm); or information, as visual on screen and later information for handling (I, II, III).

The system should decide: (II) Which baby? [The system should be clear even when two babies have an alarm simultaneously] (I, II) Is the alarm urgent or is it notifying? This defines presentation modality. (II) The nurses indicated they want to be in charge of deciding what is urgent and what should be notifying—so they should control AI settings.

If the alarm is urgent or notifying, the system should also assess which physiological function is involved (I, II). Is it about equipment or the patient? (III) What type of patient or equipment handling will be needed? (III)
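The final step of this decision chain, mapping an integrated message to its presentation modality under nurse-controlled settings, can be sketched as follows. This is a hypothetical illustration and not the system that was proposed or prototyped: the class and function names, the event labels and the settings table are all assumptions, and the AI integration of cues itself is not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    URGENT = "urgent"
    NOTIFICATION = "notification"
    INFORMATION = "information"


# Presentation modality per priority, following the three classes in the text.
MODALITIES = {
    Priority.URGENT: ("sound", "visual"),
    Priority.NOTIFICATION: ("tactile", "visual"),
    Priority.INFORMATION: ("screen",),
}


@dataclass(frozen=True)
class Message:
    baby: str            # which baby
    function: str        # physiological function involved
    source: str          # "patient" or "equipment"
    priority: Priority
    modalities: tuple


def route(baby, function, source, event, nurse_settings):
    """Pick the priority from nurse-controlled settings and attach modalities.

    Events missing from the settings are treated as urgent here; that
    fail-safe default is an assumption of this sketch.
    """
    priority = nurse_settings.get(event, Priority.URGENT)
    return Message(baby, function, source, priority, MODALITIES[priority])


# Hypothetical settings chosen by the nurses, with invented event labels.
settings = {
    "apnoea": Priority.URGENT,
    "pump_nearly_empty": Priority.NOTIFICATION,
    "trend_update": Priority.INFORMATION,
}
message = route("baby-2", "respiratory", "patient", "apnoea", settings)
print(message.priority.value, message.modalities)
```

Keeping the settings table outside the routing function reflects the nurses’ wish to stay in charge of what counts as urgent and what as notifying.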

VP: The nurses did agree with the chosen way of exchanging information. They liked the thought of receiving only one alarm instead of multiple alarms at the same time that all indicate the same event.

The opinions differ when it comes to ‘action suggestion’. Several nurses were generally positive, even towards getting suggestions on potential problems. The doctor mentioned he would like to know which type of medicine (in a pump) is causing problems. He suggested presenting this on the overview screen, so he could see it from a distance. One nurse said “But by confirming you can save your actions in the PDMS-system [Patient Data Management System], right? This is a very quick way of entering data”. This was a reaction to a nurse with a negative viewpoint, who would not like to confirm or reject optional actions (as was proposed at this stage). At present, handling information is generated by the nurse herself in order to reschedule work. One nurse reacted: “If the equipment thinks for the nurse and the nurse need not think for herself anymore she may become recalcitrant against the system”. However, handling information given by the equipment could probably support the current filtering strategy. Therefore, altogether, we conclude that nurses would be helped by receiving some information by which they can reschedule their work; how exactly this should be designed still has to be decided.

The implication of the new system would be that a central system is needed that connects all medical equipment and into which manual data can also be entered, for example, on skin colour (I, II, III).

3.3.3 Personal multisensory display

In the first investigation, a tactile device was proposed and evaluated via the storyboard technique. It notified by ‘a tap on the shoulder’. This did not seem to appeal to the nurses. In the second round, a multisensory (tactile-visual) display was proposed. It touches the skin, under the clothes (e.g., on the upper arm). It notifies by tactile patterns and accompanying yellow LED lights. Furthermore, when an auditory alarm for the nurse sounds, red LEDs light up. If a colleague gets an urgent auditory alarm, her personal device lights up, visible to the other nurses. It had a ‘stop button’ right next to it, see Fig. 3. There was a ‘buddy function’: the personal device functions could be completely transferred to a colleague, permanently (shift change) or temporarily. This device was prototyped (Fig. 3) and its evaluation was accompanied by storyboards explaining the whole workflow. The device was liked, because it was expected to result in sound reduction and the users expected that it would be immediately clear where something is wrong: which patient or which equipment. The multiple modalities allow for a more complex transmission of messages as compared to the tactile (mono) modality interface explored earlier.

Fig. 3 Prototyped multisensory display as it was tested with nurses, under their clothes. It consisted of four RGB LEDs and vibrating elements. It was coded in MAX/MSP and signalled a rhythmic pattern accompanied by one of three different colours. It had a push button to stop and a sound element (for private sounds). The elements were placed in soft cloth under the clothes

3.3.4 Sound design

Four technical measures to reduce and change remaining sounds were proposed:

  1. Sound design for babies, parents and nurses. The sound requirements for the babies and the nurses conflict and have to be balanced. Infants’ well-being, health and development would benefit from a quiet ward with some stimulating, comfortable sounds (also good for parents). Nurses also require a reduction in average sound levels and maximum peak levels, but probably to a lesser extent. They also need the sounds to conduct their cognitive work. The sounds should be heard and understood.

  2. Auditory alarms should be significantly reduced. The remaining alarm sounds should be context-dependent, with sensors and control loops directing the actual volume and frequency build-up, so that they are never unnecessarily loud and always audible.

  3. Hospital-driven measures, for example, changing nurses’ behaviour, as described in Sect. 1.3, should be applied to reduce context sounds. New hardware design is also needed, such as incubators that do not slam loudly when they close and a reduction in mechanical sounds, for example, air flow sounds made by a respiratory device.
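The context-dependent volume named in the second measure can be illustrated with a minimal sketch. It assumes that a sensor supplies the ambient sound level near the nurse and that the alarm is played a fixed margin above it. The function name, the 10 dB margin and the 40 dB floor are hypothetical placeholders; the 65 dB ceiling is taken from the impulse maximum of Darcy et al. (2008) cited in Sect. 1.1. It is not a validated control loop.

```python
def alarm_level_db(ambient_db, margin_db=10.0, ceiling_db=65.0, floor_db=40.0):
    """Return a playback level (dB) for an alarm, given the ambient level (dB).

    All default values are illustrative assumptions, not clinical settings.
    """
    target = ambient_db + margin_db          # loud enough to stand out from the ward
    capped = min(target, ceiling_db)         # never above the assumed ceiling
    return max(capped, floor_db)             # never below the assumed audibility floor


# A quiet ward yields a quiet alarm; a noisy ward is limited by the ceiling.
for ambient in (20.0, 45.0, 60.0):
    print(ambient, "->", alarm_level_db(ambient))
```

In this sketch the ceiling wins whenever the ambient level comes within the margin of it, so "always audible" is then no longer guaranteed; this is exactly the conflict between the two requirements that a real design would have to resolve, for example by switching modality.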

3.4 New proposed nursing work process

An envisioned workflow with the new technology was developed. Current work is characterized by an overload of alarms that need to be filtered to be able to conduct other work. In the new situation, only a few ‘real’ alarms are provided and these are expected to be handled consciously; fewer alarms are expected to be missed. The new work process is discussed next.

In the envisioned workflow, the nurse only gets information about her own patients and about other nurses’ urgent alarms, see Fig. 4. All urgent alarms are really urgent. Since a really urgent alarm occurs only a few times a day, it should be possible to handle all these alarms with full attention. Vigilance will therefore no longer be a filter for urgent alarms, as it is now, but will result from correct information provided by the system. It is immediately clear whether an urgent alarm is about one’s own baby or about another baby. Which nurse is concerned becomes clear from reading body language, verbal expressions or looking at the arms of colleagues (who has the red light?).

Fig. 4 Envisioned new cognitive (and physical) work process for NICU nursing. On the left, new incoming alarms and notifications; on the right, the parallel handling of alarms, notifications, scheduled work and rescheduling of work

In the new situation, the nurse knows that she should immediately process any incoming sound alarm, and she can do that. If she is handling other work that cannot be dropped, she has to delegate some tasks. If a notification comes in, the nurse will recognize it by its modality (tactile). She is also immediately informed about the handling that will be needed, and she can reschedule her work accordingly. Furthermore, she can already adjust her vigilance level and (re)direct her attention, if needed.
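The routing described above (a nurse receives everything about her own patients, plus awareness of colleagues’ urgent alarms) and the ‘buddy function’ of Sect. 3.3.3 can be sketched as follows. This is a minimal illustration under assumed data structures; the names `Message`, `recipients` and `hand_over`, and the role labels, are hypothetical and not part of the prototype.

```python
from dataclasses import dataclass


@dataclass
class Message:
    patient: str   # which baby the message is about
    urgent: bool   # True for an urgent alarm, False for a notification


def recipients(message, assignments, nurses):
    """Map each nurse who should perceive the message to her role."""
    roles = {assignments[message.patient]: "responsible"}
    if message.urgent:
        # Colleagues only become aware of urgent alarms (e.g., via the red light).
        for nurse in nurses:
            roles.setdefault(nurse, "aware")
    return roles


def hand_over(assignments, from_nurse, to_nurse):
    """Buddy function: move all patients of one nurse to a colleague."""
    return {patient: (to_nurse if nurse == from_nurse else nurse)
            for patient, nurse in assignments.items()}


assignments = {"bed 1": "nurse A", "bed 2": "nurse B"}
nurses = ["nurse A", "nurse B"]
print(recipients(Message("bed 1", urgent=False), assignments, nurses))
print(recipients(Message("bed 1", urgent=True), assignments, nurses))
print(hand_over(assignments, "nurse A", "nurse B"))
```

A temporary hand-over would additionally need to remember the original assignment so that it can be restored; that is left out of this sketch.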

Background information is no longer required to assess urgency or the physiological function involved but is used to handle the alarms. Furthermore, changes in the patient are expected to be anticipated sooner and more often, because notifications are handled in a more organized way. The AI can also be developed to provide notifications that are specifically relevant to the NICU.

When a complicated situation (calamity) occurs, multiple urgent alarms signalling simultaneously are not very helpful for the nurse. The AI should be able to recognize such a situation and adapt presentation modalities after attention is achieved.

The nurses evaluated the system. One said it is “like a virtual umbilical cord between nurse and patient”. The nurses expected the signals to become less annoying, which could reduce stress. However, “the nurses and doctors need to learn another way of perceiving information. At the start this will be difficult, a big step into the unknown. It will take time to get a good overview of the system”. The ‘buddy system’, the possible transfer of a patient to another nurse (including tactile notification), “might influence maybe the way of communicating; you do not need to shout anymore through the ward to the other side”.

Such a radical change in work process requires training in the new nursing work. Besides new cognitive tactics, general behaviour also has to be trained (e.g., moving quietly). Together, this will mean a radical training programme, even before a pilot test can be done.

3.5 Investigations to implement the design

In the third stage of research, the complete set of measures was identified as well as investigations needed to develop the measures. In Fig. 5, an overview is given of all measures.

Fig. 5 The set of investigations and development areas to solve the sound-related problems impacting health, development and well-being of infants at the NICU

4 Discussion

Fifty-two research questions and methods were identified in a range of expertise domains. It should be stressed that if the alarm system is to be developed and implemented, all these measures (see Fig. 5) should be anticipated to avoid problems. Sufficient reduction in noise and alarms means massive and fairly complete reduction. Single device improvements, the focus so far, will not result in babies’ undisturbed sleep. It is impossible to discuss all of these measures in detail. Some are fairly straightforward, for example, the measures regarding building materials at the ward and influencing behaviour have been well studied and described (see Sect. 1.3): these can ‘simply’ be rolled out. This discussion is limited to the main design challenges expected in strict development.

4.1 Persistent sound problems

The measurements at the wards show that the sound problems for premature babies are still not solved. Three measurements are shown in Table 1. The first two columns show ward A, which participated in a sound reduction programme at the time of measurement in 2004/2005, and ward B, a reference ward. In the third column, ward A is measured again in 2012. The measurements are once again in the range found in the literature, near 55 dBA, and still far above the advised daytime sound level of 45 dBA for the NICU (AAP 1997). Peak volumes are also similar to other measurements and much higher than the guidelines prescribe, that is, over 80 dB, while the impulse maximum according to Darcy et al. (2008) should be 65 dB.

Comparison with the previous situation is not possible because there are too many differences, for example, concerning number of patients at the wards. But the new measurement can be compared to the overview from literature, see Sect. 1.1, including the 2004/2005 measurements (Van Stuijvenberg et al. 2009) in a more global way. Such a comparison suggests that no improvements of the sound situation have taken place yet.

Imhoff and Fried (2009) report that there is still an overwhelming number of false medical device alarms at the paediatric ICU. These probably contribute largely to this situation. “Interestingly”, they state, “there is no scarcity of research addressing the problem of medical device alarms…. Still, very little is implemented in commercially available medical devices”. The sound-related health, development and well-being problems for premature infants are indeed persistent.

4.2 Artificial intelligence and graphical user interface

Development of AI to reduce alarms has been studied extensively over the years, see Sect. 1.3. Considering that despite these incentives no changes are manifest at the wards, developing AI and transferring it into clinical use is not a trivial matter. Computer science is needed to find the proper algorithms. But before that, much human factors and medical research is also needed. The main approaches towards AI development and related information presentation described in the literature are the following:

  (A) Ethnography to understand conscious decisions (a cognitive systems engineering approach);

  (B) Making or applying existing medical protocols (a medical approach; if applied to technology, often in collaboration with engineers);

  (C) Extracting decision-making without really studying it, for example, by systems’ intelligence, detecting patterns in nursing behaviour (computer science);

  (D) Understanding and supporting dynamic fault management (alarm handling), for example, Woods (1994, 1995) (cognitive systems engineering);

  (E) Various user interface design approaches, for example, graphical design, industrial design, computer science;

  (F) Investigating best medical approaches coupled to different technology properties to achieve medical performance, for example, Claure et al. (2001). This type of work is often done in collaboration between engineers and doctors;

  (G) Developing automatic trend detection. Especially when many parameters have to be combined, automatic systems can outperform nurses (e.g., see Hravnak et al. 2011, on cardio respiratory instability). Timely warnings can prevent urgent alarm situations (mainly medical research and computer science).

The applicability of the current approaches to the alarm redesign will be discussed as well as their limitations.

4.2.1 Approaches A and B—decision replacement

In the study, the cues used for nursing initial decisions were explored. The idea is that these could be used to feed AI so that AI can mimic the nurses’ initial decisions. The same idea was applied by Chen et al. (2010). They designed a monitoring system for anaesthesiology in the OR: “Production Rules are created for alerts, notifications and reminders. Alerts are combinations of one or several monitor, EMR, or calculated variables that may potentially cause adverse outcome if not addressed in a timely manner. Notification rules contain normal, abnormal and marginal ranges for variables such as bispectral index (BIS), monitored anaesthesia care (MAC), systolic blood pressure (SBP), heart filling volume, end-tidal (ET) CO2, peak airway pressure (PAP), pulse oximeter oxygen saturation (SpO2), body temperature, haematocrit (HCT), estimated HCT, glucose, positive end-expiratory pressure (PEEP) and creatine. All the rules and thresholds are based on well defined and agreed upon anaesthesia practice”. Their system was designed to reduce the overload of alarms in anaesthesiology. The authors claim it meets their expectations, but it is still being clinically tested so definitive conclusions cannot be drawn. However, it is likely that Pijl’s idea to integrate the large number of signals and readings the clinician uses and to combine monitor readings with data from hospital records or even manual input could massively reduce the alarms and aid the clinician.

It would be tempting to use Chen and colleagues’ design as the solution, as it is already implemented and being tested. This is, however, not an option. In anaesthesiology, there is a tradition of developing treatment protocols in the form of decision trees. Protocols for the NICU have only been developed for certain situations, conditions or treatments. For example, ‘respiratory nurses’ are trained to follow strict protocols, focussing on a specific group of patients, for example, premature babies suffering from IRDS (idiopathic respiratory distress syndrome). Therefore, almost all decision models still have to be developed. One might expect that these can be derived from observing and interviewing nurses and doctors, but this, too, is probably not (fully) the case. Many clinical questions remain, for example, about how to monitor SO2 and how to respond (Merritt and Mazela 2010), so decision models are a matter of ongoing research at the wards.

Unfortunately, the protocols from anaesthesia will only be usable in certain regards. There are many differences between anaesthesia and NICU nursing, for example, different (main) treatment goals, situations and patient types; conditions that develop over a longer time, such as the gradual development of sepsis or intestinal problems; nurses taking over from each other; and the parallel tasks of monitoring and caring for the patient 24/7.

Altogether, identifying all the possible decision protocols for the NICU is expected to be infeasible, certainly in the short term, so other mechanisms of support will also be needed.

4.2.2 Approach C—automatic decision imitation

This brings us to consider detecting patterns in nursing behaviour automatically and then applying them to subsequent patients. This has to be done with great care, and it may not even be feasible. Besides regular treatment, nurses and physicians also conduct practice-based research to improve their care and investigations for evidence-based medicine. This means, among other things, that patients are sometimes enrolled in new decision-making rules, or in control groups. If work has changed through new procedures, the detected patterns have to be corrected in a timely manner. The chance of this process going wrong is significant. There is a fair chance that the software becomes outdated or follows rejected decision-making models.

4.2.3 New approaches

To solve the limitations of approaches A, B and C, other principles from psychology should be considered for investigation. Not only conscious decision-making should be studied: deliberation-without-attention in many cases outperforms conscious decision-making in complex decisions (Dijksterhuis et al. 2006). The quality of both conscious and unconscious decisions depends on experience and expertise. Exactly which cues are taken in by the nurses is hard to assess. Therefore, in design work, it should be recognized that many cues are used, even cues the designer is not aware of. In general, any patterns should be made recognizable. According to Shapiro et al. (2006), identification of key pieces of data, pattern recognition and interpretation of significance and meaning are key elements in medical decision-making. These approaches have been applied to some extent, for example, in advertisement design, but have not formally been explored for complex biomedical interface design.

4.2.4 Approach D—dynamic fault management

The expectation is that it is impossible to replace the current system in one go. Alarms are spread over many devices and manufacturers. It is more likely that there will be a gradual shift from the current situation (Fig. 2) to the ideal situation (Fig. 4). Therefore, design guidelines about dynamic fault management and preattentive filtering of alarms will remain relevant for the next phases. However, these guidelines were developed from and for other domains, for example, for an operating room or a nuclear power plant, see Woods (1994, 1995) and McCrickard et al. (2003). Specifics for the NICU are needed. In particular, the task of rescheduling work parallel to handling alarms has only been described from the nursing domain (e.g., Groen 1995). In this literature, there are no indications of design consequences. The parallel rescheduling task is expected to influence design requirements significantly and therefore has to be studied.

4.2.5 Approach E—user interface design

VP and SL proposed a simple overview screen showing physiological functions and their stability. When an urgent alarm sounds, all background information is immediately visible. Background information can also be looked up on the overview for handling notifications or self-initiated work. This design is extremely close to the design by Chen et al. (2010). They designed a graphical view of the organs that is constantly visible. The colour and shape of the heart, lungs, circulation and brain give a live view of how the functions are doing. It is visible from several metres’ distance in clear colour and shape codes. For example, a low heart filling volume is depicted as a picture of a red (meaning abnormal), half-full heart. When closer to the monitor, one can read supportive data next to the organs explaining the details. An example alarm is a message across the overview screen, “potential hypotension”. Considering the similarities between the two designs, the graphical idea is probably an appropriate design direction that can be explored further.

4.2.6 Approach F and G—technological treatment support and automatic trend detection

These two approaches are ongoing in many research groups, and collaboration with these research groups is suggested. These approaches can have a positive impact on sound reduction and can solve some barriers to eliminating sounds.

4.3 Designing the multimodal alarm codes

Jones and Sarter (2008) state that tactile stimuli attract attention in a subtle yet reliable way and have been successfully applied for large-scale supervisory control environments and that tactons (tactile icons) can support attention and interruption management in complex, event-driven domains. Tactons can be used to convey complex concepts and ideas, and tactons can be combined in a hierarchical way, for example, a family of errors (Brewster and Brown 2004). For the NICU, a more complex set of messages is required; therefore, a multisensory display is proposed, combining tactile and visual modality in rhythm and patterns. An extensive literature search by Lu et al. (2011) comparing auditory and tactile modality interruptions of ongoing tasks revealed in general faster response times for tactile, while auditory interruptions were advised for alerts and complex messages and tactile for notifications. They note that most studies to date fail to vary ongoing task workload, although it may significantly limit detection times. Sklar and Sarter (1999) found that “tactile feedback did not interfere with, nor was its effectiveness affected by, the performance of concurrent visual tasks”. No studies were found comparing tactile/visual cues to auditory/visual cues. Design guidelines for tactile displays should be followed, wherever applicable for the multisensory display (e.g., Jones and Sarter 2008).

Tactile displays have not yet been successfully applied in medical domains. They are in the experimental phase, for example, in anaesthesiology; see, for example, the design tested by Ng et al. (2005). These devices have not been seriously explored for the NICU. If the tactile unit, which is placed in direct contact with the skin, is not comfortable, nurses could remove it and place it in their pocket or tamper with it (placing extra cloth under it, etc.).

The multimodal alarm system has to be coded in a coherent way across modalities (sound/tactile/visual). A form family conveying, for example, physiological function or type of equipment is expected to be needed; no literature could be traced to give directions about such a cross-modality form family, not even when the search is limited to the combination of tactons (tactile icons) with visual cues.

4.4 Sound design

Designing the remaining sounds is challenging, because not much is known about infants’ auditory capacities and comfort, and knowledge about proper auditory warning design for the medical setting is also lacking, according to Edworthy and Hellier (2005). ICU alarm design has classically been done by thinking of beeps and tunes. For example, IEC 60601-1-8 describes recommended melodies, but these have been shown to be difficult to learn and are confusing for critical care nurses (Wee and Sanderson 2008).

To set goals for sound design, knowledge is needed about infants’ auditory capacities and experienced comfort. Auditory brainstem response (ABR) was considered as a potential method and was therefore explored in several studies. In a vast majority of infants, even those born at 26 weeks of gestation, an auditory signal could be detected in the developing brain (Coenraad et al. 2011b). Also, differences in aetiologic factors could be seen, distinguishing between infants with and without certain diseases (Coenraad et al. 2011a). These outcomes are promising and suggest that ABR can indeed be used to quantify infants’ hearing. Secondly, to measure experienced comfort, an earlier developed comfort scale for neonates could be useful (van Dijk et al. 2009).

Thinking out of the box will probably accelerate solving the ‘complicated’ problems. For example, how about just having an attention drawing sound, and then a nice female voice gently telling the nurse all the information needed about the urgent alarm? Tests on understandability and on memorizability will be needed, and creative design will be needed.

4.5 Designing multiple interconnected emerging technologies

The system comprises several emerging (ICT) technologies for medicine, that is, tactile alarms, AI for medicine, a multimodal alarm system and mobile communication in critical care. Furthermore, some of the medical and human factors areas have hardly been studied yet, for example, preterm infants’ perception and decision protocols in NICU nursing. Developing a system with the defined aims and properties is therefore a challenging exercise.

Specifically for such a condition, the method of collaborative co-design for emerging multi-(ICT) technologies in medicine was developed (for surgery) in an international consortium with eight partners: academics, hospitals and industries (Freudenthal et al. 2011). Small teams developed various emerging technologies, and the components of several teams were combined in integrated prototypes and iteratively tested with users. User requirements were investigated for the total system and fed development. Integration meant that the software and hardware of the parts had to match but also that the system as a whole performed to meet medical requirements and human factors needs. Every researcher within a small team was responsible for identifying the research questions and paradigms needed for his part and for choosing the actual methods. Several inventions were needed, which is typical for emerging technologies. Collaborative co-design has proven to be a method that can be used to develop integrated solutions in domains with several gaps in human factors, medical and technical backgrounds; it could therefore also be useful for the alarm system.

4.6 Designing for Human-tech levels

Several authors (e.g., Vicente 2004 and Rasmussen 1997, 2000) have stressed the need for a systems approach. Unfortunately, there are limited examples in medicine of how to execute these aims. Also, in the case of the sound situation at the NICU, investigations only focus on certain fields of research. Investigations are either narrow, for example, Ferris and Sarter (2011), who test a prototype, or broad, for example, Edworthy (2000) on the ergonomics of sound. But no study brings together the whole scope of selected issues that need to be considered regarding nurses’ work and infants’ health, development and well-being, and how they relate to each other.

There is a gap between the details needed for product design, related to safety and operability, and higher level organization recommendations, coming from risk analysis and other evaluations. Indeed hospital risk management methods, for example, root cause analysis system (RCAS), see Bagian et al. (2002), analyse risk at ‘all’ levels, including recommendations about redesign of equipment. The nitty-gritty details for redesign are, however, generally missing. These should be covered by approaches such as those prescribed in ISO 14971, as used in the medical device industry. Unfortunately, “The FDA, in its regulatory review of new devices, focuses on individual device performance with relatively little attention to the integration of the device into the clinical environment. Furthermore add-on, multi-device communications systems have received little attention from the FDA, in part because they are currently in the grey zone of whether or not they are themselves medical devices”, ACCE (2006).

Long ago, airplanes were already developed as coherent systems: parts had to fit in the planned total system. There are some cross-disciplinary approaches around the world attempting to do just this. For example, Lockheed Martin and Johns Hopkins are teaming up to use methods from aviation to integrate systems in the ICU to, for example, “prevent alarm fatigue” and “to help us build a safer and more efficient ICU model” (Hopkinsmedicine website 2011), and Philips and Microsoft are teaming up to “actually provide access to health information when and where it is needed” (Microsoft website 2010).

Rasmussen (2000) indicates that “focus will be on the selection of those academic paradigms from the involved professions which are mutually compatible and, therefore, useful for the problems at hand”. Bagian (2012), with a background in aviation as well as medicine, promotes the coupling of organization-oriented approaches, such as RCAS, to the actual details in product design via human factors engineering methods and gives several examples from design practice. Unfortunately, such examples are scarce, and how to execute systems design for a medical ward like the NICU, with a multitude of separate devices and procedures, has not yet been described.

Therefore, a new method was devised combining human factors engineering with Fuzzy Front End (FFE), opportunity finding in industrial design engineering. This was adapted for the complex domain of critical care. The main adaptation was the use of several human factors methods and a focus on all Human-tech levels and on their interaction. Hereby, the traditional FFE focus, mainly business driven, was transferred into a human-centred approach.

5 Conclusions

The sound situation at the neonatal intensive care unit (NICU) is a persistent problem. New sound measurements reconfirmed a typical average sound level of over 54 dBA, which is far above the advised sound levels for NICUs and for concentrated work.

Furthermore, there are still peaks of over 80 dB. This is problematic for the neonate infants, because noise is a direct cause of long-lasting auditory problems and a significant cause of cardiovascular and respiratory problems and neurologic impairment.

Unfortunately, despite many investigations to reduce sound levels by various means, including environmental and organization measures and reducing (false) alarms, very few advances are implemented in commercially available medical devices. Massively reducing sounds at the NICU is needed, because sound perception is logarithmic in nature. The aim of the investigation is to identify how to accomplish this—how to improve neonate babies’ health, development and well-being.
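The logarithmic character of the decibel scale can be made concrete with a short calculation. The sketch below uses the standard definition that a level difference of ΔL dB corresponds to a sound intensity ratio of 10^(ΔL/10); the function name is illustrative. It says nothing about perceived loudness, which does not scale with intensity in the same way.

```python
def intensity_ratio(level_a_db, level_b_db):
    """Sound intensity ratio between two levels given in dB: 10 ** (difference / 10)."""
    return 10 ** ((level_a_db - level_b_db) / 10)


# Measured average of about 54 dBA versus the advised 45 dBA (AAP 1997).
print(round(intensity_ratio(54, 45), 1))   # prints 7.9

# Peaks of 80 dB versus the 65 dB impulse maximum (Darcy et al. 2008).
print(round(intensity_ratio(80, 65), 1))   # prints 31.6
```

In other words, bringing the average level from 54 to 45 dBA means reducing sound intensity by roughly a factor of eight, which is why small single-device improvements are not expected to suffice.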

An investigation in three stages was conducted, covering the earliest stages of design: understanding current work and identifying what exactly the problems are and what measures could be useful. The focus was on the selection of those solutions that are mutually compatible and, therefore, useful for the problems at hand.

The main proposed technology measures are the following (see Fig. 5):

  (a) A central system connecting all medical equipment. Artificial intelligence should integrate information (not only from devices) and alarms to identify the initial presentation modality of the integrated message: urgent (sound and visual); notification (tactile and visual, as light/rhythm); or information (visual on screen, with later information for handling);

  (b) Centrally controlled sound-generating unit(s) with redesigned sounds;

  (c) User interfaces: a multisensory display worn on the skin of the nurse to communicate about urgent alarms and notifications. Multimodal coding will be developed to present meaning across modalities (sound, tactile and visual) and in the graphical user interface, which is important for instant understanding and later information handling;

  (d) Sound design anticipating infants’ and nurses’ perception, not disturbing patient or nurse, but attracting nurses’ attention and conveying meaning appropriately;

  (e) An information screen, including an overview of the stability of physical functions;

  (f) A ‘buddy system’ allowing transfer of a patient to another nurse;

  (g) To support nursing work, the initial signal of an alarm or notification should convey which baby or which equipment is concerned, the urgency, the physiological function and the type of handling needed. Handling information can be used to reschedule work for alarm handling and probably also to support the current filtering strategy.
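Measures (a) and (g) together describe what an integrated message carries and how its class determines the initial presentation modality. A minimal sketch of that mapping is given below. The class and field names are hypothetical; only the three message classes and their modalities are taken from the list above, and the classification step itself (the AI) is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    URGENT = "urgent"
    NOTIFICATION = "notification"
    INFORMATION = "information"


# Initial presentation modality per message class, as listed in measure (a).
MODALITIES = {
    Kind.URGENT: ("sound", "visual"),
    Kind.NOTIFICATION: ("tactile", "visual"),
    Kind.INFORMATION: ("screen",),
}


@dataclass
class IntegratedMessage:
    baby: str        # which baby is concerned
    equipment: str   # which equipment is concerned
    kind: Kind       # urgency class assigned by the central system
    function: str    # physiological function involved
    handling: str    # type of handling needed


def present(message):
    """Return the modalities in which the message is first presented."""
    return MODALITIES[message.kind]


example = IntegratedMessage("bed 3", "infusion pump", Kind.NOTIFICATION,
                            "circulation", "refill pump")
print(present(example))   # prints ('tactile', 'visual')
```

Keeping the mapping in one table makes the cross-modality coding explicit and easy to review with nurses before any device is built.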

Other measures:

  (h) An envisioned workflow with the new technology was developed. Current work is characterized by an overload of alarms that need to be filtered to be able to conduct other work. In the new situation, only a few ‘real’ alarms are provided and these can be handled consciously; fewer alarms are expected to be missed;

  (i) Training of new nursing work;

  (j) Influencing behaviour (e.g., moving quietly);

  (k) The new user interface should take into account parents’ experiences.

Several research and development approaches should be combined, including infants’ and nurses’ perception research; studying nurses’ decision-making, but also pattern recognition and other strategies of information use; behavioural research considering nurses and experience research considering parents; multimodal user interface design work; software and hardware design.

Although many authors indicate the need for a systems approach, methods to do so are generally lacking for the complicated situation in critical care. A new approach from the industrial design engineering domain was applied, that is, combining Fuzzy Front End earliest stage product development and human factors methods, with a focus on all Human-tech levels and on their interaction. To manage the huge number of interrelated problems and proposed solutions, a new tool was developed, the check matrix of human factors and design thinking. The expectation is that the tool will remain useful in strict development to guide Human-tech level integrations for concurrent engineering.

To develop the technical and organization system, concurrent engineering is needed, conducted by several teams of multiple disciplines working together, including technology developers, human factors experts, user interface designers and medical researchers. Because the design impacts the complexity of the nursing context, testing in the real context is required to validate the envisioned new workflow and to assess technology solutions. Development will be different from what is common now in medical devices: not the single device is considered but the whole ward as a system. The health and development of the patient should get the same high level of safety attention as, for example, the prevention of human errors in nursing tasks.