
1 Introduction

Designing interaction for interactive art installations can be challenging. Evaluation is especially difficult: it is hard to choose methodological strategies based on objective, reliable and rigorously validated methods that produce meaningful, generalizable outcomes capable of pushing the research community further. This is partly because the area is still developing and lacks common practices compared to areas dealing with more traditional Human-Computer Interaction (HCI). But it also reflects a clash between the goal of objectivity and the art world, where experiences are fundamentally subjective [1]. The majority of art installations presented in the literature seem to be evaluated once they are installed in real-world settings, meaning that evaluation is carried out to gain insight into how the finished, implemented installation was received. While this is important and meaningful, there may also be a need to evaluate various parts of the interaction during development, as is common in other areas of HCI. For instance, a common practice in HCI usability evaluation is to perform lab experiments that expose users to different stimuli or different variations of a user interface in order to evaluate the importance of different features. Such controlled comparisons are often difficult to carry out when the evaluation takes place in a rich, real-world context.

What makes the challenge greater for interactive art installations is that they often involve evaluating softer concepts, such as the ones in question in this paper: intimacy and ludic engagement. Here, rigid usability evaluation methods may fall short of measuring the success or failure of an interactive activity that is supposed to have an artistic effect on the participant. Höök et al. [1] argue that it is not easy to adapt HCI methods to an artistic context. As they state, “It would be ludicrous for us to suggest replacing art criticism with HCI evaluation, and we will not answer the question ‘is this good art?’” Instead, they propose using methods inspired by HCI to understand usability issues that might be part of the experience of interacting with an art piece. Artists should be free to create their artworks from their own perspective, but carrying out usability-inspired evaluation during development can help the artist see the artwork from the perspective of the participant [1].

Examples of more experimental, comparative studies do exist [2, 3], and the paper presented here can be seen as a contribution in this direction. This is not to argue that this type of evaluation is more important than exploratory or ethnographic methods [4]. In fact, later in the paper it is argued that evaluation in this field should be multi-faceted, using mixed-methods approaches to evaluate the softer factors often involved in interactive art installations.

This paper addresses the issues discussed above by evaluating an installation that is under development and is meant to provide a collaborative, intimate and ludic experience to remotely located users. The initial development of this musical interactive installation, called the OperaBooth, was described in [5], where it was evaluated in an exploratory fashion using qualitative observation and interviews to gather initial feedback and uncover central issues related to user interaction. Here, the evaluation is taken a step further by more rigorously evaluating aspects that the initial exploratory evaluation found to be important.

2 Mediated Intimacy and Ludic Engagement

Understanding how technology can be used to mediate emotions between remote users is challenging. Saadatian et al. [6] provide a useful review of technologies for mediated intimate communication. Saadatian et al. [7] further describe intimacy as “the perception of closeness to the extend of sharing the physical, emotional and mental personal space”, arguing that mediated intimacy can be divided into three overall dimensions (not mutually exclusive): emotional, physical and cognitive. The installation presented in this paper explores all three dimensions: emotionally and cognitively, it explores ludic engagement through non-verbal communication, and physically, it intrudes on the participant’s personal space. Vetere et al. [8] present a model defining several themes involved in mediated intimacy, divided into three stages: prior to the act (Antecedents), the act itself (Constituents), and the consequences of the experience (Yields). For each stage they outline a set of themes: Self-disclosure, Trust and Commitment (Antecedents); Emotional, Physical, Expressive, Reciprocity, and Public & Private (Constituents); and Presence in Absence and Strong Yet Vulnerable (Yields). This framework has helped steer the development of the installation, which focuses mainly on the Emotional, Expressive, Reciprocity and Strong Yet Vulnerable attributes. While physicality in a literal sense is often viewed as a strong modality for intimate communication [9, 10], it has not been used directly here. However, the installation interprets physicality as the physical space close to the participant: one goal became to give participants the ability to virtually intrude on each other’s intimate space.

Exploring ludic engagement was not a focus at the start of the project but emerged as exploratory evaluations were carried out with test participants. These studies [5] made clear how important the humorous, playful, exploratory and open-ended properties of the installation were for the intimate connection between the participants. This aligns well with Gaver et al.’s [11] assumptions about designing for ludic activities: “Promote curiosity, exploration and reflection”, “De-emphasise the pursuit of external goals” and “Maintain openness and ambiguity”. Gao et al.’s [12] discussion of ludic engagement also fits well here, as they state: “systems that promote ludic engagement should not be concerned with achieving clear goals, or be overly structured with defined tasks”.

3 The OperaBooth

The installation evaluated in this paper is called the OperaBooth. Although still under development, it is the result of two development iterations. The initial idea for the interactive art installation was to create a platform where strangers from different parts of the world could share a mediated intimate experience. The goal was to show that people from politically or culturally different regions of the world are still just humans and can share intimate, pleasurable experiences. To this end, the starting point was to use the international language of music as a means of communication. Since most potential participants would have no musical experience, this posed a series of challenges. These challenges concerned not so much the overall artistic intentions of the piece as its user experience and usability. They are summarized below (note that ludic engagement was not a specific concern at the outset; only after evaluating the system in use did we become aware of the importance of providing users with a ludic experience):

  • Provide intimate communication through musical exploration (non-verbal communication)

  • Intrude on the intimate space of the other participant (explore vulnerability)

  • Provide simple control mappings tailored to musical novices

  • Make the control interface expressive

  • Explore different roles for each player for improved musical communication

The following briefly describes the development of the installation over two iterations. The evaluation of the second iteration, referred to below as Prototype 2, led to the study presented in Sect. 4.

3.1 OperaBooth Prototypes 1 and 2

The premise of the OperaBooth was to develop an interface that would give novice users a sense of being able to communicate musically with each other. Several input technologies were considered (including ones involving more physical and whole-body interaction) before settling on facial gestures as the means of making music. This choice was partly based on the idea that participants would see each other’s faces while performing different gestures. Additionally, the face-tracking algorithm faceOSC by Kyle McDonald (Footnote 1) was found to be a playful and interesting musical controller. The open-source algorithm detects the user’s face and reports information such as the size, position and orientation of the detected face; mouth height and width; eye size; and eyebrow position. Besides providing an interesting controller for exploring sound, face-to-face communication between remote strangers (similar to a Skype video call) was expected to enhance the intimate connection, especially since users would have to make many different facial gestures to control the music. Different forms of musical output were also explored, including various forms of amplitude and frequency modulation, granular synthesis and sample-based synthesis. While many combinations of the controller and different synthesis algorithms were interesting, we chose a musical output that connected naturally to the movements of the mouth. Different types of voice were explored, including singing, shouting, baby laughter and bird song. Opera voices were finally chosen, mainly for their theatrical quality.

A simple prototype was built using faceOSC for face tracking and Ableton Live (Footnote 2) for handling the audio. A Max (Footnote 3) patch handled communication between faceOSC and Live. Max was also used to stream live video of each participant’s face between two laptops, similar to traditional video conferencing. The audio consisted of custom recordings of female and male voices singing “ahh”, “ooh” and “bah” notes on a harmonic minor scale spanning three octaves. From the user’s point of view, opening one’s mouth triggered a random opera singing sample, which was looped using Live’s built-in Sampler (Footnote 4) and thus kept playing until the mouth was closed. Both a male and a female voice were implemented. Additionally, a background track was produced: a string section playing harmonic minor chords in the same key as the voices. The two laptops were placed inside a cardboard box, and lights were added to improve the tracking. Figure 1 (top) gives an impression of the initial prototype (see [5] for more details).
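To make the Regular Opera behavior concrete, the following Python sketch models only the control logic described above. Mouth height acts as an open/closed switch; opening picks a random (vowel, note) sample from a three-octave harmonic minor scale, and the sample loops until the mouth closes. This is not the original implementation, which used faceOSC, a Max patch and Live's Sampler. The open threshold, the key (A, starting at MIDI note 45), the class name `RegularOpera` and the event tuples are all hypothetical, and no audio or OSC handling is included.

```python
import random

# Harmonic minor scale as semitone offsets above the tonic.
HARMONIC_MINOR = (0, 2, 3, 5, 7, 8, 11)
VOWELS = ("ahh", "ooh", "bah")


def harmonic_minor_notes(tonic_midi, octaves=3):
    """MIDI note numbers of a harmonic minor scale spanning `octaves` octaves."""
    return [tonic_midi + 12 * octave + step
            for octave in range(octaves)
            for step in HARMONIC_MINOR]


class RegularOpera:
    """Hypothetical sketch of the Regular Opera mapping.

    Opening the mouth picks a random (vowel, note) sample that loops until the
    mouth closes; mouth height is used only as an open/closed switch.
    """

    def __init__(self, samples, open_threshold=2.0, seed=None):
        self.samples = list(samples)
        self.open_threshold = open_threshold  # hypothetical mouth-height threshold
        self.rng = random.Random(seed)
        self.current = None  # sample currently looping, or None if silent

    def update(self, mouth_height):
        """Process one mouth-height reading and return an event tuple or None."""
        is_open = mouth_height >= self.open_threshold
        if is_open and self.current is None:
            self.current = self.rng.choice(self.samples)
            return ("start_loop", self.current)
        if not is_open and self.current is not None:
            stopped, self.current = self.current, None
            return ("stop_loop", stopped)
        return None  # no change: keep looping, or stay silent


# Usage: A harmonic minor from A2 (MIDI 45); key and threshold are assumptions.
samples = [(vowel, note) for vowel in VOWELS for note in harmonic_minor_notes(45)]
voice = RegularOpera(samples, seed=1)
events = [voice.update(h) for h in (0.5, 3.1, 4.2, 3.8, 1.0)]
```

A single threshold like this can flicker when the mouth hovers near it. Separate opening and closing thresholds (hysteresis) would be one way to avoid retriggering, although the source does not say how this was handled.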

Fig. 1.

Prototype 1 (top) and Prototype 2 (bottom) of the OperaBooth

Based on a simple evaluation of the first prototype, a high-fidelity prototype (Prototype 2, Fig. 1, bottom) was developed with the following improvements:

  • The OperaBooth boxes now communicated with each other over a network, which required handling latency and synchronization.

  • Direct eye-contact between the remote users was enabled.

  • Recorded samples were improved.

  • Lighting conditions were improved for better tracking.

  • Perceived latency was reduced.

This second prototype was also evaluated, revealing factors that were important for the intimate experience between the installation participants. It was suggested that the humorous and theatrical feel of the opera genre was important for the overall engagement. It was also questioned whether users had too little control over the musical output, the suggestion being that more control might lead to more exploration. Finally, while it was fairly clear that participants achieved an intimate connection with each other, it was questioned whether that connection was due to direct eye contact and the intrusion of intimate space, or to the musical interaction itself.

4 Evaluation

As described earlier, evaluating interactive installation art is an ongoing challenge, as factors like experience, play, exploration and emotion become central, as opposed to function and performance. Morrison et al. [13] suggest approaching such evaluation through a “Lens of Ludic Engagement”, building on work by Gaver [14] and arguing that success criteria differ from those of more traditional HCI. Several approaches have been presented for evaluating such criteria both quantitatively and qualitatively. Jacucci et al. [15] mention several quantitative methods and end up using the Positive and Negative Affect Schedule (PANAS) [16] together with interviews and video recordings to evaluate visitor experiences of two interactive art installations. Similarly, Kortbek & Grønbæk [17] use a mixed-methods approach, including their own multiple-choice questionnaire and interviews, to evaluate interactive installations in an art museum. Gilroy et al. [18] analyse trajectories of affect relating to Flow, using the Pleasure-Arousal-Dominance (PAD) model by Mehrabian [19] to evaluate the user experience of an augmented reality art installation.

The evaluation presented here compares the importance of several factors that appear crucial for ludic and intimate communication and that were identified in earlier, less rigorous evaluations [5]. The experiments had three overall goals: (1) to understand the importance of direct eye contact, (2) to understand the influence of different musical outputs, and (3) to understand whether giving participants more detailed control supports exploration. All three goals were considered in light of the overall purpose of supporting intimate and ludic engagement between participants.

4.1 Methodological and Technical Setup

The evaluation was a comparative study of how well different versions of the installation performed in terms of whether participants experienced an intimate connection with each other, how ludic engagement emerged, and how well musical communication was supported. Three versions of the system were prepared, each with a different mapping between facial expression and musical output (note that only the detected mouth height was used to control the musical output):

  • Regular Opera: Regular system, where only mouth open/close triggered random opera singing notes.

  • Responsive Opera: Responsive system, where six different mouth heights each triggered an opera singing note (pitch increased with mouth height).

  • Responsive Synthetic: Responsive system, where six different mouth heights each triggered a different pitch of an abstract synthetic sound (pitch increased with mouth height).

The Regular Opera version worked as described for Prototype 2. The Responsive Opera version detected six different thresholds of mouth height, each triggering a different pitch of the opera voice. The idea was to give participants a stronger sense of control over the system, which was expected to lead to more exploration. The Responsive Synthetic version detected the same thresholds as the Responsive Opera version but mapped them to abstract synthetic notes, using Ableton Live’s built-in Sampler (presets “Lead-Dark Thought” and “Lead-Ambient Encounters”). The two timbres were chosen because (1) they accompanied the background musical theme nicely, (2) they were distinct from each other, approximating a female and a male voice, and (3) they sounded humorous when played with the mouth.
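The difference between the Regular and Responsive versions can be sketched as a change of mapping. Instead of a random sample per mouth opening, the Responsive versions quantize mouth height into six bands, each with a higher pitch. The Python sketch below assumes illustrative, evenly spaced thresholds and the first six notes of an A harmonic minor scale (MIDI 57 to 65). The actual threshold values, the pitches and the class name `ResponsiveMapping` are hypothetical. The same mapping would serve both the opera and the synthetic timbres, since only the sound source differs.

```python
import bisect


class ResponsiveMapping:
    """Hypothetical sketch of the Responsive mapping: six ascending mouth-height
    thresholds, each selecting a higher pitch."""

    def __init__(self, thresholds, pitches):
        if len(thresholds) != len(pitches):
            raise ValueError("need exactly one pitch per threshold")
        if list(thresholds) != sorted(thresholds):
            raise ValueError("thresholds must be in ascending order")
        self.thresholds = list(thresholds)
        self.pitches = list(pitches)
        self.current = None  # pitch currently sounding, or None if silent

    def pitch_for(self, mouth_height):
        """Pitch for the highest threshold reached, or None below the lowest one."""
        # bisect_right counts how many thresholds are <= mouth_height.
        index = bisect.bisect_right(self.thresholds, mouth_height) - 1
        return self.pitches[index] if index >= 0 else None

    def update(self, mouth_height):
        """Return (old_pitch, new_pitch) when the band changes, otherwise None."""
        new_pitch = self.pitch_for(mouth_height)
        if new_pitch == self.current:
            return None
        change = (self.current, new_pitch)
        self.current = new_pitch
        return change


# Usage: thresholds and pitches (A3 harmonic minor, MIDI) are illustrative assumptions.
mapping = ResponsiveMapping([2.0, 3.0, 4.0, 5.0, 6.0, 7.0], [57, 59, 60, 62, 64, 65])
changes = [mapping.update(h) for h in (1.0, 2.5, 2.8, 4.5, 9.0, 0.5)]
```

Because `update` reports only band changes, holding the mouth steady sustains one note, while any movement across a threshold switches pitch immediately.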

Finally, the system was set up so that participants could see each other either (1) in a full-screen mode enabling direct eye contact or (2) in a limited-screen mode, where they saw each other in a window that filled approximately two thirds of the screen and was shifted slightly to the left. This setup simulated interaction without direct eye contact. See Fig. 2 for the setup used in the evaluation.

Fig. 2.

Setup used in the evaluation. The two OperaBooths were set up near each other for convenience. The screens between the OperaBooths, displaying the participants’ faces, were video recorded.

4.2 Test Procedure

The experiments were carried out over two days in February 2015. A total of 24 participants (15 female, 9 male), forming 12 pairs, took part in the evaluation. Each pair went through three sessions; in each session they tried a different version of the OperaBooth for 3 min and then answered a Likert-scale questionnaire. The questionnaire asked to what extent they agreed or disagreed with 12 statements regarding their overall engagement, their experienced connection with the other participant (intimate, playful, humorous, uncomfortable), their exploration of the system, their perceived control of the system, their ability to express themselves musically, and overall pleasure (Footnote 5). The participants were not told what the installation was about or how to control it before interacting with it; the only instructions they received were to look inside the box and open their mouths. Finally, when all three sessions were over, a short interview was carried out in which the participants were asked to explain to each other how they had experienced the installation.

Three versions were tested: (1) Regular Opera, (2) Responsive Opera and (3) Responsive Synthetic. Each pair tried the versions in a randomized order to reduce learning effects. Since the evaluation also aimed to examine the importance of direct eye contact, one of the three versions in each pair’s set of sessions was experienced without direct eye contact. Each session was filmed, capturing both faces, their tracking data and the resulting audio in the same recording.
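The randomization in the test procedure can be sketched as follows. This Python example assumes that each pair receives an independently shuffled order of the three versions and that the session without direct eye contact is chosen at random. The source states only that the order was randomized and that one version per pair lacked direct eye contact, so the selection rule and the function name `plan_sessions` are assumptions.

```python
import random

VERSIONS = ("Regular Opera", "Responsive Opera", "Responsive Synthetic")


def plan_sessions(n_pairs, seed=None):
    """Hypothetical sketch: for each pair, shuffle the three versions and pick one
    session at random to run in the limited-screen (non-direct eye contact) mode."""
    rng = random.Random(seed)
    plans = []
    for pair in range(1, n_pairs + 1):
        order = list(VERSIONS)
        rng.shuffle(order)
        no_eye_contact = rng.randrange(len(order))  # index of the limited-screen session
        plans.append([
            {"pair": pair, "session": i + 1, "version": version,
             "direct_eye_contact": i != no_eye_contact}
            for i, version in enumerate(order)
        ])
    return plans


# Usage: 12 pairs, as in the study; the seed is arbitrary.
plans = plan_sessions(12, seed=2015)
```

With 12 pairs and 3! = 6 possible orders, a fully counterbalanced alternative would assign each order to exactly two pairs and make each version the non-direct eye contact session for four pairs. Independent shuffling, as in this sketch, does not guarantee that balance.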

5 Results

5.1 Quantitative Data

Regular Opera scored slightly better than Responsive Opera (see Fig. 3). The only significant differences between the two were for expressiveness, where Regular Opera was rated higher, and for the frustrating item, where Regular Opera scored lower. Interestingly, the Regular Opera version, which responded only to the opening and closing of the mouth, also scored higher for both control and exploration, although these differences were not significant (p = 0.12 and p = 0.14, respectively).

Fig. 3.

Mean scores for the three versions of the OperaBooth: (1) Regular Opera, (2) Responsive Opera and (3) Responsive Synthetic.

The Regular Opera version generally scored better than the Responsive Synthetic version, with significant differences for all items except more time (p = 0.14), musical (p = 0.20) and pleasing (p = 0.18). The Responsive Opera version, which gave participants more control, mostly scored between the other two. Significant differences between Responsive Opera and Responsive Synthetic were, however, found for the connection and intimate connection scores. This suggests that participants were not able to connect as well through the synthetic sounds as through the opera sounds, which is also supported by the qualitative data described below.

Surprisingly, comparing the versions experienced with and without direct eye contact revealed only marginal differences for connection (I felt a connection with the other person) and intimate connection (I felt an intimate connection with the other person), as seen in Fig. 4. The only significant difference between the two was for expressive (I felt that I was able to express myself musically), with p = 0.05.

Fig. 4.

Mean scores for the versions with (1) direct eye contact and (2) non-direct eye contact.

5.2 Observations and Interviews

Video recordings and interviews were analyzed using a critical incidents approach, in which events relevant to the overall purpose of the evaluation were identified. Incidents in which participants appeared surprised, bored, confused, in control or not in control, communicative, uneasy (looking away), happy (smiling/grinning) or theatrical were identified and noted in order to compare the different versions.

Observations and subsequent interviews generally showed that participants appreciated the installation. For some participants (approximately 15 percent), however, the installation was not understood well enough to provide an engaging experience: they did not explore the system enough, they were too passive, or the tracking did not work as intended (the latter was the case for three participants).

Generally, participants seemed confused when first encountering the installation. Those who tried the Responsive Synthetic version first found it difficult to understand that they influenced the sound and who was influencing which voice (see Fig. 5a for the passive confusion of the participants). This was most likely due to the less natural connection between mouth and sound. Participants who started with the non-direct eye contact version were also confused about what the other person was able to see. Observation revealed that communication was reduced here and that exploring the system became a more personal experience. Participants seemed to look at representations of one another rather than connect with one another (see Fig. 5c).

Fig. 5.

Screenshots of interactions with the OperaBooth representing (a) confusion, (b) engagement, (c) disconnect, (d) competition, (e) exploration, (f) intimacy and (g) added gesturing.

The opposite was observed for the direct eye contact versions, which showed increased non-verbal communication (eye contact, smiling and grinning in reaction to the other participant’s movements, musical following and turn-taking; see Figs. 5b, d, e, f, g), as also confirmed by the interview data. Participants described direct eye contact as intense, as sometimes not being able to look directly at each other, or as feeling trapped in front of another person. One participant even stated: “It felt like he could smell my bad breath”. A few participants stated that the musical experience made it easier to maintain eye contact when they were in control of the sounds and were able to “communicate” with each other, in contrast to the silent, uncertain moments, which felt very intense and awkward. One participant even felt embarrassed in front of the other because she was not able to control the voice. Observations supporting the notion of intense communication included participants looking away or even pulling their heads out of the booth when laughing too loudly.

The Responsive Synthetic version seemed more playful for the few who were able to control the interaction (only three participants were able to fully control this version, and all three tried it last); see, for instance, Fig. 5d, where participants are almost battling over who could reach the highest note. Still, Regular Opera was the preferred version of the three, especially for its musicality, naturalness and appealing sound, even for participants who stated that the synthetic version was more playful. As one participant put it: “The opera voices really [gave] the sense that we were really singing together”. As the quantitative data also suggest, the Responsive Opera version was perhaps too difficult to control and therefore led to less exploration, probably because participants ended up producing more monotonous sounds than in the version where the voices were randomized.

Only a few improvements were suggested. These included a more intelligent algorithm that would detect higher-level features such as smiling, surprise or confusion and express them through sound. A few participants also stated that they felt inhibited because they wanted to use their hands to communicate; three of the pairs even waved to each other during the interaction (see Fig. 5g). According to them, including some kind of hand gestures would have enhanced the communication.

6 Discussion and Conclusion

This paper has presented a lab-based usability evaluation comparing different versions of an interactive art installation called the OperaBooth, using both quantitative and qualitative data gathering techniques. Interestingly, the observation and interview data do not align with the questionnaire data regarding the importance of direct eye contact. One reason could be that, when answering the questionnaire, participants were not conscious of this particular aspect of the installation and focused more on their direct interaction with and control of the sound. Another explanation could be that even though participants felt a difference, it was overshadowed by the effort of trying to understand how to control the system.

In that respect, it is relevant to ask whether a quantitative approach like the one presented here is effective in this setting. The answer is probably no, if it is to stand alone. However, as part of a multi-faceted, mixed-methods approach, the quantitative approach is effective at bringing forth new insight into certain aspects, especially those related to the usability of the system.

Finally, it is the author’s strong belief that evaluating different alternatives, whether using qualitative or quantitative methods, brings us a step further towards understanding not only whether certain forms of interaction work, but also how important certain factors are for the success of those interactions. Understanding how such factors enhance mediated intimacy or ludic engagement is what can help drive the research forward.