1 Introduction

The term extended reality (XR) refers to all combined real-and-virtual environments and human-machine interactions generated by computer technology and wearables, where the ‘X’ represents a variable for any current or future spatial computing technology. In other words, it is an umbrella term encompassing augmented reality (AR) and virtual reality (VR), but also haptic controls, holograms, and, in general, all immersive techniques, which can enhance our natural senses, deceive them, or both Extended Reality Convergence (2019). More specifically, AR is a generic term for a continuum ranging from assisted reality to mixed reality (MR) Milgram and Kishino (1994). At the MR end of this continuum, in a perfect setting, users cannot distinguish between virtual elements and real objects Dwivedi et al. (2020). The perception that XR applications are limited to the gaming sphere is disappearing, as their range of possible applications is rapidly expanding (Hong-zhi et al. 2019; Muñoz-Saavedra et al. 2020; de Souza Cardoso et al. 2020).

From a managerial point of view, it is important to highlight that the amount of funding poured into such technologies is booming: consumer and industrial market spending is estimated to grow by 69% and 134%, respectively, over the next three years. While their cost is falling, their performance is rising: widespread adoption is expected soon, along with the new habits and routines these technologies will shape in people’s lives (Waking up to a new reality 2019).

As such technologies mature, the traditional separation between AR and VR imposed by hardware devices is also fading. Indeed, more and more head-mounted displays, from those based on mobile phones to those engineered for cutting-edge applications such as the Varjo XR-3 (Introducing Varjo 2021), are converging towards tools capable of implementing all types of XR scenarios, opening the path to an increasingly mixed world.

One domain in which such systems may well integrate and bloom is fashion. The fashion industry’s digital transformation process is pushing online branding and the adoption of e-commerce platforms. To date, the online fashion segment is worth 332 billion USD, representing 28% of all e-commerce transactions. This value is likely to increase (Digital Market Outlook 2019): in the US, mobile commerce (m-commerce) is sustaining this growth, accounting for a share of the retail clothing market that exceeds 50%.

An interesting question is whether online XR-based commerce (x-commerce) may further support this trend. However, the use of XR technologies in fashion scenarios is currently limited to exclusive and sporadic shows, aimed at achieving a “wow effect” capable of attracting customers and/or enhancing brand reputation. The reason is a combination of (a) the lack of maturity of XR technologies, (b) low adoption among consumers, and (c) limited understanding of their use by non-specialists, be they consumers or businesses. Let us recall that not long ago, in the early 2000s, a similar situation had already occurred: the e-commerce market was then in its primordial stage. At that time, Venkatesh et al. (2003) investigated how it could grow by formulating the following research questions:

  1. “What do consumers want?”;

  2. “Would consumers adopt it?”;

  3. “How would consumers behave?”;

  4. “What capabilities are required to make e-commerce viable?”.

Two decades later, the same questions may guide the approach to a next-generation digital commerce system.

For question #1, let us isolate one of the aspects that may influence a customer’s shopping experience, making him/her feel at ease. The scientific literature has identified the relationship built with a professional salesperson as an aspect that may improve the loyalty and satisfaction of a customer entering a retail clothing shop (Goff et al. 1997; Marzo-Navarro et al. 2004). In terms of interaction, such a relationship entails the exchange of visual and verbal information.

On this basis, we then turned to question #2. To provide the visual and vocal interactions required by a next-generation digital commerce system, it is possible to resort to emerging technologies such as virtual reality and voice assistants (VAs), in other words, to x-commerce systems. Recent forecasts estimate that, by the end of 2021, 57.4 million people in the US will use VR, while almost 120 million will use VAs (US Virtual and Augmented Reality Users 2020; Voice Assistant Use Reaches Critical Mass 2021).

Having addressed question #2, we can now concentrate on how consumers may accept x-commerce (question #3). Interestingly, preliminary works show promising results when integrating VR and voice in a simple gaming environment Weiß et al. (2018). However, to the best of our knowledge, the potential of embedding an off-the-shelf voice assistant in a more complex retail x-commerce scenario has not yet been investigated.

In this work, we narrowed down the research space to the design and implementation of a VR-based shopping experience for fashion retail, in which we assessed the benefits of integrating an off-the-shelf VA such as Amazon Alexa. We extended a preliminary study by Morotti et al. (2020), providing detailed insights into the technical solution along with a thorough assessment of the possible use of integrated VR-Voice Assistant (VR-VA) solutions in x-commerce. We hence sought clues that could help answer an updated version of question #4: “Would an integrated VR-VA environment make fashion retail x-commerce viable?”. To this aim, we approached the problem from two different perspectives to understand (a) the usability of the proposed VR-VA environment and (b) its feasibility for a fashion retail x-commerce application. These questions were addressed by interviewing a special group of users, namely fashion experts, who fully tested the proposed x-commerce experience. In summary, the main contributions of this work fall into the following categories:

  • Usability: This work confirms that vocal interactions within a fashion retail x-commerce application may effectively improve the shopping experience;

  • Technology readiness: Off-the-shelf voice assistant technologies such as Alexa, designed for personal scenarios, may be challenging to integrate seamlessly into public ones, such as retail x-commerce;

  • Fashion x-commerce: Fashion x-commerce applications may go beyond the goals of engagement and branding, as they could be fruitfully used to purchase specific categories of products.

In the next section, we review the most relevant state-of-the-art academic and industrial works related to our contribution.

Section 3 describes our research hypotheses, the design of the proposed VR experience based on two fashion applications, and the questionnaire we submitted to our group of users to evaluate their VR experiences; Sect. 4 reports the obtained results. Subsequently, in Sect. 5, we draw some observations and present future work, expanding the study to AR fashion applications. Finally, in Sect. 6, we provide an overview of our work and discuss the possible future of x-commerce.

2 Related works

The use of XR technologies has inspired many different initiatives and possible application scenarios in the fashion field, in both industrial (The topshop virtual reality experience aw14 2019; Dior eyes: Virtual reality headset 2019; Covergirl offers augmented reality makeup trials in times square 2019; Double the fun - the world’s first online-to-offline virtual fitting solution 2018; Increase your sales and reduce your returns thanks to accurate sizing 2018) and academic (Park et al. 2018; Xi and Hamari 2021; Singh et al. 2019; Cruz et al. 2019; Van Kerrebroeck et al. 2017; Papagiannidis et al. 2017; Rauschnabel et al. 2019) contexts. In the XR-speech interaction domain, recent academic research has analyzed how verbal commands could be effectively embedded in immersive experiences to facilitate the approach to new devices, making the interaction more natural (Callaghan et al. 2019; Polap 2018; Farinazzo Martins et al. 2016; Araújo et al. 2016; Wang et al. 2018).

To the best of our knowledge, only two works have so far created and analyzed a VR shopping environment controlled by speech input (Speicher et al. 2017; Speicher 2018). In these works, the authors described the design and implementation of a mobile VR shopping setting in which product search was based on speech input (processed at word level using Google’s speech recognition web service). In the evaluation study, the authors investigated the task of searching for a product in a VR web store with different combinations of hands-free input (pointing versus speech) and output (desktop versus VR) types. The combination of speech input and VR output proved to work best in terms of user performance and preference.

A new impetus for novel XR-based fashion initiatives may come from the growing popularity of verbal interaction services. Google Assistant (Android), Cortana (Windows), and Siri (Apple) provide useful and easy-to-use voice assistants on smartphones (e.g., to make a call or convert voice messages to text). Smart home speakers such as Amazon Echo, Google Home, or Apple HomePod provide vocal interfaces able to play music from online platforms and control indoor lights or the air-conditioning temperature. In May 2020, Statista interviewed 1,015 US citizens older than 18, asking them whether they were currently using a voice-operated personal assistant: more than 600 answered positively (Do you currently ever use a voice-operated personal assistant 2020).

To the best of our knowledge, the VR system proposed in this work is the first to integrate an off-the-shelf voice assistant to foster interaction with a digital sales assistant. With respect to the state-of-the-art literature, it offers the following elements of distinction: (a) the use of a consumer-grade HMD (i.e., the HTC Vive headset), (b) support for voice commands through Amazon Alexa, and (c) an evaluation by a group of non-tech-savvy fashion domain experts.

3 Research method

3.1 Apparatus

The components adopted in the proposed system are:

  (a) A head-mounted display, namely an HTC Vive headset. At the time of the experiments, this device slightly outperformed the Oculus Rift in terms of perceived ease of use and perceived intuitiveness Suznjevic et al. (2017);

  (b) A Dell Alienware desktop computer (Dell Alienware desktop models 2021);

  (c) An Amazon Echo Input speaker powered by Alexa, the AI-based voice assistant developed by Amazon.

3.2 Experience design

To quantify the users’ satisfaction with x-commerce for fashion retail and the possible benefits that may derive from the integration of voice-based interaction into the VR environment, we built two different immersive experiences. Both may be considered prototypes of fashion x-commerce services and forerunners of a virtual shop to provide users with different experiences Mirri et al. (2018).

In the first one, the user is placed on a small desert island, a metaphor for a lonely and isolated setting, hosting a virtual dressing room; using the hand controllers, the user can select and try on clothes and accessories and explore the environment. In the second, the user is immersed in a classical fashion shop environment, a place where customers may engage in dialogue with other parties. Here, it is possible to try on clothes as before, but the user can also interact with a sales assistant avatar through vocal commands. In particular, the sales assistant can answer the user’s questions and convert his/her words into (virtual) actions, just as in a physical store when a customer asks the staff for clarification and/or assistance. The voice assistant simplifies the interaction between the user and the VR environment by restoring one of the most natural forms of human interaction, i.e., voice, which may also help alleviate the sense of alienation and loneliness created by the virtual world.

To further explain the reasons behind our choices: the desert island scenario was chosen because it is an environment very far from a traditional VR shop. What we emphasized with this setting is the difference between a distant, lonely environment and a “normal” shop, represented by the Virtual Store integrating Alexa’s vocal commands, where both the location and the social interactions may resemble a real situation. This is a topic of interest for psychology and marketing research. Kim et al. (2005), for example, found that retail environments can reduce loneliness and, in turn, lead given classes of consumers to spend more time and money. By having our participants test these two experiences in sequence, we wanted to observe the effect of such differences on their answers.

Our results are built upon the comparison between these experiences. More detailed presentations of the two applications are reported in the following.

3.2.1 Fashion Island application

Our baseline application is “Fashion Island”, which may be considered an introductory-level VR fitting room Donatiello et al. (2018). As shown in Fig. 1, a pre-modeled male avatar is placed, barely dressed, on a desert island in front of a mirror. The game consists of dressing him by exploring and selecting the available shirts, pants, and accessories. With one hand controller, the user can teleport within the virtual environment, while the other is used to select and wear clothing items. A simple graphical interface is placed in a fixed position on a large mirror located at the center of the scene. The avatar moves its hands, arms, and head according to the HTC Vive tracking system, and the user can watch these movements in the mirror: this is meant to improve the sense of immersion and presence.

In essence, the user experiences a playful environment that, however, is deprived of any social dimension, as the user (avatar) is left alone on the desert island.

Fig. 1 Some frames of the “Fashion Island” environment. At the top, the third-person view of the user; at the center, the avatar exploring the available shirts from the user’s view; at the bottom, the avatar appraising the selected outfit in the mirror

3.2.2 A voice-assisted fashion store: virtual store

The second test application, called “Virtual Store”, offers a shopping experience in a fashion store where both male and female clothes are available for (virtual) shopping. The top image in Fig. 2 shows the environment, designed with modern-style elements. A very simple avatar accompanies the user and embodies a shopping assistant in the form of a friendly, smiling emoticon. The emoticon is always in the user’s field of view, together with the cart icon, forming an essential and minimally invasive interface. The assistant can interpret the customer’s verbal commands thanks to its integration with Amazon Alexa, implemented through Alexa Skills, which perform speech recognition and natural language understanding. As a result, the assistant can explain how to use the shopping platform and provide information about the items selected by the user: this information is both shown on a supporting panel (central image in Fig. 2) and spoken by Alexa. Via vocal commands, the user can also add a product to the cart (bottom image in Fig. 2), manage the cart itself, and finalize the purchase of the selected products. All these functionalities can also be activated with the hand controllers; only navigation is managed exclusively via the controllers.

Fig. 2 Some frames of the Virtual Store application. Top: the interior of the male section. Center: pointing at a dress displays its price and the information previously requested from the shopping assistant. Bottom: the user’s add-to-cart action via Alexa

3.3 VR-VA architectural framework

The virtual assistant relies on the Amazon Echo Input speaker, one of the most affordable smart speakers on the market. Besides the device’s widespread use and popularity, our implementation choice was driven by the simplicity of the development framework and the support offered by Amazon itself (Amazon Alexa 2020).

Amazon allows device manufacturers to use the Alexa Voice Service (AVS) for free. AVS offers voice recognition and speech synthesis through a cloud-based service that provides APIs to access Alexa’s automatic speech recognition and natural language understanding.

We therefore chose to leverage the Echo device and Alexa Skill capabilities. Skills (specific apps for Alexa) take full advantage of the Amazon Web Services (AWS) platform, which acts as an infrastructure provider whose services can be accessed via REST API calls. Skills are hosted and run directly in the AWS cloud, following a serverless approach in which the underlying platform is abstracted away and billed according to usage. With this approach, all the logic is always online, and no infrastructure management is required.

As anticipated, AWS provides the entire stack of technologies needed to make our Skills effective:

  • DynamoDB provides data persistence inside tables;

  • Lambda runs the business logic in the cloud;

  • API Gateway (accessed from NodeJS) allows the use of WebSockets to establish communication between endpoints.
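To make the interplay of these services more concrete, the following is a minimal Python sketch of a Skill’s Lambda handler, assuming a custom skill that uses Alexa’s JSON request/response format. It is not the implementation used in this work: the intent names (`AddToCartIntent`, `CheckoutIntent`), the `item` slot, and the in-memory `CART_TABLE` and `OUTBOX` objects are hypothetical, the last two merely standing in for a DynamoDB table and for the WebSocket messages that API Gateway would push to the VR client.

```python
# Hypothetical in-memory stand-ins for the AWS services listed above (no real AWS calls):
CART_TABLE: dict = {}  # plays the role of a DynamoDB table: user id -> list of items
OUTBOX: list = []      # events that API Gateway would push to the VR client over a WebSocket


def speak(text: str, end_session: bool = False) -> dict:
    """Build a reply in the Alexa custom-skill JSON response format."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def push_to_vr(user_id: str, action: str, items: list) -> None:
    """Queue an event for the VR scene (stand-in for a WebSocket push)."""
    OUTBOX.append({"user": user_id, "action": action, "items": list(items)})


def lambda_handler(event: dict, context=None) -> dict:
    """Route an incoming Alexa request to the matching branch of business logic."""
    request = event["request"]
    user_id = event["session"]["user"]["userId"]

    if request["type"] == "LaunchRequest":
        return speak("Welcome to the store. How can I help you?")
    if request["type"] == "SessionEndedRequest":
        return {"version": "1.0", "response": {}}  # no spoken reply for this request type

    intent = request.get("intent", {})
    name = intent.get("name")
    if name == "AddToCartIntent":  # hypothetical custom intent
        item = intent.get("slots", {}).get("item", {}).get("value")
        if not item:
            return speak("Which item would you like to add?")
        cart = CART_TABLE.setdefault(user_id, [])
        cart.append(item)
        push_to_vr(user_id, "cart_updated", cart)
        return speak(f"I added the {item} to your cart.")
    if name == "CheckoutIntent":  # hypothetical custom intent
        cart = CART_TABLE.pop(user_id, [])
        push_to_vr(user_id, "checkout", cart)
        return speak(f"Your order of {len(cart)} item(s) is complete.", end_session=True)
    return speak("Sorry, I did not get that. You can say: add this shirt to my cart.")
```

Keeping the cart in an external table rather than inside the function matches the serverless model described above, in which consecutive invocations may run on different instances.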

In essence, Alexa natively generates a conversational model and recognizes the request contained in each user voice command, so that the right branch of code is executed.

It is important to note that this approach imposes some constraints, dictated by the design guidelines of Alexa Skills, which may interfere with an application’s flow. The first is that Alexa cannot be left on hold: if a user stays silent for more than 30 seconds, the dialog session expires. In that case, a new session can be started by saying one of Alexa’s wake words aloud (“Alexa” or “Computer”).

Another restriction concerns the case in which Alexa asks a question and no answer is provided within 8 seconds: Alexa then repeats the question only once, after which the session expires.
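The two timing rules above can be summarized in a small simulation. This is a minimal sketch under simplifying assumptions: the function names are hypothetical, the 30 s and 8 s limits are taken from the text (the platform’s actual values may differ or change over time), and the time Alexa needs to speak the repeated question is ignored.

```python
from typing import Optional, Sequence

# Limits as described in the text, kept as parameters rather than platform facts.
IDLE_TIMEOUT_S = 30.0   # silence after which the dialog session expires
REPROMPT_WAIT_S = 8.0   # wait for an answer before the question is repeated
MAX_REPROMPTS = 1       # the question is repeated only once


def after_question(answer_delay_s: Optional[float]) -> str:
    """Outcome when the user answers `answer_delay_s` seconds after a question
    (None = no answer). Simplification: the reprompt's own duration is ignored."""
    if answer_delay_s is not None:
        for attempt in range(MAX_REPROMPTS + 1):  # original prompt + reprompt(s)
            if answer_delay_s <= REPROMPT_WAIT_S * (attempt + 1):
                return "answered" if attempt == 0 else "answered_after_reprompt"
    return "expired"  # the user must say a wake word to start a new session


def expires_from_silence(utterance_times_s: Sequence[float]) -> bool:
    """True if any gap between consecutive user utterances exceeds the idle limit."""
    gaps = [b - a for a, b in zip(utterance_times_s, utterance_times_s[1:])]
    return any(gap > IDLE_TIMEOUT_S for gap in gaps)


print(after_question(5.0))    # answered
print(after_question(12.0))   # answered_after_reprompt
print(after_question(20.0))   # expired
```

Under these rules, a user who hesitates for more than about 16 seconds after a question loses the session, which is consistent with the kind of hesitation-related expiry that participants later reported.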

In addition, the Alexa account employed in our setting had been trained on a male voice: we observed recognition uncertainties with a few female speakers (note, however, that all experiments took place in an open room with a noisy background). This may not be a problem in practice, as assistants are usually employed by their account holders.

Finally, despite the limited customization opportunities offered by a proprietary solution, we preferred a proprietary voice assistant such as Alexa over open-source ones, such as Snips (2019) or Mycroft site (2019), for the following reasons: although the alternatives allow greater freedom in designing the user experience, they have some disadvantages, such as a lower adoption rate, a longer development time, or even the lack of some assistant components (e.g., voice synthesis).

3.4 Participants

We recruited thirty-one subjects with at least one year of experience in the fashion field, in different roles, including (a) researchers from the Fashion faculty at the University of Bologna, (b) Master students from the Design and Technology for Fashion Communication Master’s program at the University of Bologna, (c) Bachelor students from the Fashion Cultures and Practices program and Master students from the Fashion Studies program, both offered at the University of Bologna, and (d) professionals (including product developers, photographers, and clerks) working in the fashion field. They tested the applications and answered a survey in which they reported their opinions about perceived comfort and XR usability. The participants had an average age of 33 years; 25 were female and 6 male. Most participants declared themselves non-tech-savvy, having no expertise in 3D-modeling desktop applications and not playing video games at all.

None of them had tried the HTC Vive before; hence, their opinions were well suited to evaluating the ease of use of our interfaces and experiences.

The number of participants (31) represents a trade-off between the need to acquire sufficient feedback from a specific population (i.e., fashion experts) and the time spent on the evaluation phase. This number exceeds 10, a sample size that has repeatedly proven sufficient to discover over 80% of existing interface design problems (Salomoni et al. 2017; Hwang and Salvendy 2010; Faulkner 2003).
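Sample-size rules of this kind are commonly reasoned about with the cumulative problem-discovery model \(P(n) = 1 - (1 - p)^n\), where \(p\) is the probability that a single participant uncovers a given problem. The sketch below evaluates it; the value \(p = 0.15\) is purely an illustrative assumption (it was not measured in this study), chosen because it yields roughly 80% discovery with 10 participants.

```python
def discovery_rate(p: float, n: int) -> float:
    """Expected share of interface problems found by n participants, assuming each
    participant independently detects a given problem with probability p."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must lie in [0, 1] and n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float) -> int:
    """Smallest n whose expected discovery rate reaches `target` (0 <= target < 1)."""
    if not 0.0 < p <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    n = 0
    while discovery_rate(p, n) < target:
        n += 1
    return n


# Illustrative value only: p = 0.15 is an assumption, not a result of this study.
print(round(discovery_rate(0.15, 10), 3))  # 0.803
print(round(discovery_rate(0.15, 31), 3))  # 0.994
print(participants_needed(0.15, 0.80))     # 10
```

The model assumes that all problems are equally easy to detect and that participants detect them independently, so it is best read as a rough justification rather than a guarantee.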

3.5 Ethics

Written consent to participate in this experimental study was collected from each subject. The entire experimental session was possible thanks to the protocol adopted in May 2021 by the University of Bologna, granting access to the Virtual and Augmented Reality Laboratory. This protocol allowed participants to safely employ the devices available at the laboratory.

3.6 Assessment model

Individuals’ acceptance of technological innovations such as x-commerce may be analyzed through different theoretical approaches already established in the literature. One of the most notable is the technology acceptance model (TAM) Davis (1989), which is based on two key constructs, namely perceived usefulness and perceived ease of use. Perceived usefulness is defined as the degree to which individuals believe that adopting a particular technology will improve their job performance, whereas perceived ease of use is the degree to which an individual believes that using a particular technology will be free of effort. Since perceived ease of use refers not only to physical but also to mental effort, our research aims to analyze the impact of virtual interfaces on consumers’ appreciation. While user-centered design and good practice in user interface design have already developed international standards and general principles for well-established computing systems Bevan (2001), the ease of use of XR devices is still a challenging issue (Wrzesien et al. 2015; Muhanna 2015).

In particular, immersive VR devices may prove difficult for non-expert users (Donatiello et al. 2018). Nevertheless, more usable virtual content and scenes may positively affect consumers’ perception of and attitude towards VR technologies. We therefore formulated the following hypotheses:

  H1. Fashion experts appreciate immersive VR interfaces;

  H2. The VR headset and controllers, together with the voice command integration, provide a positive immersive experience.

Turning our attention to the perceived usefulness of virtual experiences, we expanded our study to understand the feasibility of XR tools within the fashion domain. As already discussed in Sect. 2, x-commerce has the potential to support traditional marketing strategies, both through online channels and in brick-and-mortar stores: in addition to supporting user engagement, XR environments could enrich 3D product presentations while reducing the consumer distrust caused by the lack of interaction with items and assistants. Therefore, we also formulated the following hypotheses, distinguishing two levels of applicability:

  H3. XR represents an appealing channel for fashion communication;

  H4. XR applications can be adopted to buy fashion products.

Table 1 Items and questions used in the survey to assess the perceived ease and enjoyment of use (PEEU) and voice gain (VG) constructs
Table 2 Items and questions used in the survey to assess the attitude towards using for communication (ATUC) and behavioural intention (BI) constructs

3.7 Assessment survey

Once our participants tested both experiences, they were asked to complete a questionnaire.

This has been designed to assess four constructs, namely, perceived ease and enjoyment of use (PEEU), voice gain (VG), attitude towards using for communication (ATUC), and behavioural intention (BI), formulated according to Rese et al. (2017).

3.7.1 Perceived ease and enjoyment of use (PEEU) and voice gain (VG) constructs

The first set of questions aimed at investigating the PEEU (for both experiences) and the VG constructs (for the Virtual Store). For simplicity, a general overview of these constructs is reported in Table 1.

For the PEEU, a 5-point Likert scale Likert (1932) was used to quantify respondents’ agreement with the following items, based on those used by Rese et al. (2017):

  I1. The interface is easy to use;

  I2. Once I learned how to use the interface, it was simple to manipulate objects;

  I3. I prefer the new interface to the mouse or keyboard;

  I4. I enjoy the overall experience.

Item I1 gets straight to the point; item I2 becomes crucial if I1 receives a low score. Conversely, if I1 receives high agreement, item I3 allows us to check users’ willingness to adopt the proposed technologies and change their habits. Through I4, we ask for a broad evaluation of each VR application. In this first section, we also included some open questions to assess issues related to motion sickness.

By asking users to evaluate the two interfaces separately, we aimed to understand whether the voice-command integration actually eased their approach to a new device; hence, the comparison of the average evaluations was taken into account in the VG construct. The VG construct contains the following specific statements (evaluated on a 5-point Likert scale):

  A1. The introduction of voice commands makes the system easier to use;

  A2. I would prefer to complete all tasks with voice commands and have no hand controllers at all.

3.7.2 Attitude towards using for communication (ATUC) and behavioural intention (BI) constructs

The second set of questions is meant to assess the ATUC and BI constructs. These questions draw on the participants’ knowledge and studies to analyze the actual potential and feasibility of an XR-based fashion e-retailer. The items were inspired by previous studies and adapted to suit our case (Rese et al. 2017; Kim et al. 2007). These constructs were evaluated only on the Virtual Store application. A general overview is reported in Table 2.

In particular, both constructs were investigated using two groups of questions: the F and Q groups. The F group consists of the following statements, evaluated on a 5-point Likert scale:

  F1. I would visit an XR retail store;

  F2. I would buy products on an XR application;

  F3. I would buy more products on an XR application than on online stores.

This group of items appears sufficient to evaluate our constructs. Nevertheless, to avoid the neutral-score problem related to odd-point Likert scales (Croasmun and Ostrom 2011), we encouraged users to take a stronger stance by also submitting to them the Q group of questions, answered on a yes/no scale:

  Q1. Would you use an XR application to advertise your products?

  Q2. Would you use an XR application to compose outfits?

  Q3. Would you use an XR application to explore new fashion collections?

  Q4. Would you use an XR application to buy online?

  Q5. Would you use an XR application to buy a swimsuit?

  Q6. Would you use an XR application to buy a sweater?

As shown in Table 2, questions Q1–Q3 reinforce the user perception of the XR retail store captured by F1, ensuring a sound evaluation of the ATUC construct. In addition, Q1 puts respondents in the shoes of a business owner who needs to advertise her/his products and reach as many customers as possible. All the remaining questions, listed in Table 2, adopt a consumer’s perspective. Statements F2–F3 and questions Q4–Q6 measure behavioural intention (BI) from the consumers’ point of view. Q4 is a control question that validates the F2 score regarding a buyer’s purchase intention in an XR application. We also included questions Q5 and Q6, concerning two practical scenarios, to understand whether XR can extend traditional e-commerce: swimsuits and sweaters fit very differently. The fit of the former strictly depends on a client’s body shape, whereas that of the latter is more easily predictable.

4 Results

The collected data underwent a reliability check to test its internal consistency and validate our research. We computed the widely used Cronbach’s alpha index, which corresponds to the Kuder-Richardson Formula 20 (KR-20) in the case of binary-choice questions, such as our Q-items.
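For reference, both indices can be computed with a few lines of Python. This is a minimal sketch of the standard formula \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)\), where \(k\) is the number of items, \(\sigma_i^2\) the variance of item \(i\), and \(\sigma_X^2\) the variance of the respondents’ total scores; KR-20 replaces each item variance with \(p_i(1 - p_i)\). The function names and the toy matrices are invented for illustration and are not the survey data; the sketch also checks that, for 0/1 items, KR-20 coincides with alpha.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix (a list of rows).

    Population variances are used throughout; sample variances used throughout
    give the same value, because the n / (n - 1) factor cancels in the ratio.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_var = sum(pvariance(column) for column in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


def kr20(answers):
    """KR-20 for yes/no items coded 1/0: each item variance becomes p * (1 - p)."""
    k, n = len(answers[0]), len(answers)
    if k < 2:
        raise ValueError("KR-20 requires at least two items")
    pq = sum((sum(col) / n) * (1 - sum(col) / n) for col in zip(*answers))
    total_var = pvariance([sum(row) for row in answers])
    return k / (k - 1) * (1 - pq / total_var)


likert = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2]]  # toy data, not survey results
print(round(cronbach_alpha(likert), 3))  # 0.939

yes_no = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]  # toy data, not survey results
print(abs(kr20(yes_no) - cronbach_alpha(yes_no)) < 1e-9)  # True
```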

The results reported in Table 3 show that the PEEU and ATUC items may be deemed reliable (\(\alpha \geq 0.70\), as indicated by Taber (2018)). The BI items fell slightly below the 0.70 threshold, whereas the VG items cannot be accepted according to the adopted theoretical framework. The latter inconsistency may be due to questions that proved to be somewhat divisive. Indeed, the first item (A1) asked whether integrating a voice assistant with other interfaces could make the system easier to use, whereas the second one (A2) asked whether a participant would rely exclusively on voice, giving up the other available interfaces.

Table 3 Cronbach’s \(\alpha\) index for the considered constructs

4.1 Survey analysis: perceived ease and enjoyment of use (PEEU) and voice gain (VG)

Figure 3 reports the scores of the interface evaluations for the Fashion Island and Virtual Store applications. Recall that on a Likert scale higher scores are better, and any score above the midpoint value of 3 indicates a satisfactory result.

Fig. 3 Histogram comparison of five-point Likert questionnaire results related to the I-x items, which are relative to the perceived ease and enjoyment of use (PEEU). In blue and red we report the mean scores obtained by the Fashion Island application and the Virtual Store, respectively, along with their standard deviations

The histogram in Fig. 3 exhibits satisfactory results for both the Fashion Island and Virtual Store applications. However, the scores obtained for the I-x statements show that the Virtual Store application registered an average value slightly lower than that of Fashion Island (4.14 vs. 4.32, with standard deviations of 0.3 and 0.35, respectively). In particular, Virtual Store scores are slightly below Fashion Island ones for items I1 and I2, possibly because voice commands increased the complexity of the interactions in such short experiments. The values obtained for items I3 and I4 are roughly the same for the two applications: more complex interactions are, all in all, appreciated, and a similar level of enjoyment is reached. These observations are confirmed by the A1 item score reported in Table 4, which corroborates that the introduction of vocal commands and the presence of a virtual assistant made the VR store experience easier to use. Nevertheless, users showed interest in keeping the hand controllers (item A2).
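As a sketch of how per-item figures of this kind can be obtained, the following Python computes the mean and standard deviation of each Likert item for one application, plus the average of the item means. The data layout and function names are hypothetical, the use of the sample standard deviation is an assumption (the text does not specify the estimator), and the toy scores are invented.

```python
from statistics import mean, stdev

LIKERT_SCORES = range(1, 6)  # 5-point scale: 1 = strongly disagree, 5 = strongly agree


def summarize(answers: dict) -> dict:
    """Per-item (mean, sample standard deviation) for one application.

    `answers` maps an item id (e.g. "I1") to the list of participants' scores.
    """
    summary = {}
    for item, scores in answers.items():
        if any(s not in LIKERT_SCORES for s in scores):
            raise ValueError(f"{item}: scores must be integers from 1 to 5")
        summary[item] = (mean(scores), stdev(scores))
    return summary


def overall_mean(summary: dict) -> float:
    """Average of the per-item means; values above the midpoint 3 read as satisfactory."""
    return mean(item_mean for item_mean, _ in summary.values())


# Invented toy scores for three participants (not the survey data).
fashion_island = {"I1": [5, 4, 5], "I2": [4, 4, 5]}
virtual_store = {"I1": [4, 4, 3], "I2": [4, 3, 5]}
for app, answers in [("Fashion Island", fashion_island), ("Virtual Store", virtual_store)]:
    summary = summarize(answers)
    print(app, round(overall_mean(summary), 2), summary)
```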

Table 4 Mean and standard deviation (std) of the five-point Likert questionnaire results for the A-x items. A-x items are those related to the voice gain (VG)

To further understand these results, we focused our analysis on those participants who assigned much lower values to the Virtual Store I-x items than to the Fashion Island ones. In particular, we chose a threshold of \(k=2\) points for the difference between the two 5-point Likert scores (e.g., a subject assigned 4 to a Fashion Island I-x item and 2 to the corresponding Virtual Store item). For these participants, we carried out a qualitative analysis based on the “thinking aloud” method Lewis (1982) to capture the pros and cons of the Virtual Store experience. To this aim, we report here the answers they gave to the following question: “Why was the Virtual Store experience not as easy to use as the Fashion Island one?”. The answers can be summarized in the following three arguments:

  (a) Participants would have liked to converse with Alexa while pausing between sentences. However, Alexa was designed to process sentences continuously within a short time frame, so a few seconds of hesitation caused the Alexa session to expire. For example, participant #11 stated: “It is annoying to call Alexa every 10 seconds!”;

  (b) Subjects expected to be able to use a greater variety of vocal commands for the same action (e.g., words like “buy”, “buy this”, or “I want to buy that”). Indeed, subject #17 put it as follows: “I think that just having two or three commands to act is too restrictive.”;

  (c) Sometimes participants preferred hand controls because they found them more efficient. For instance, interviewee #26 expressed the following concern: “I think that the exclusive use of Alexa could slow down some operations”.
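The participant-selection rule that precedes these arguments can be sketched as a simple filter. The sketch rests on two assumptions not stated in the text: a participant is flagged when the Virtual Store score is at least \(k=2\) points below the Fashion Island score on any single I-x item, and scores are stored per participant as lists ordered I1 to I4. The function name, participant ids, and scores are hypothetical.

```python
K = 2  # score gap from the text; applying ">= K" to any single item is an assumption


def flag_critical_participants(fi_scores, vs_scores, k=K):
    """Return ids of participants whose Virtual Store score on some I-x item
    is at least k points lower than their score on the same Fashion Island item."""
    flagged = []
    for pid, fi_items in fi_scores.items():
        vs_items = vs_scores[pid]
        # Compare item by item: I1 with I1, I2 with I2, and so on.
        if any(fi - vs >= k for fi, vs in zip(fi_items, vs_items)):
            flagged.append(pid)
    return sorted(flagged)


# Hypothetical I1..I4 scores keyed by participant id.
fashion_island = {1: [4, 4, 5, 4], 2: [5, 5, 4, 4], 3: [4, 4, 4, 4], 4: [5, 4, 4, 4]}
virtual_store = {1: [2, 4, 5, 4], 2: [4, 4, 4, 4], 3: [5, 4, 4, 4], 4: [3, 4, 4, 4]}

print(flag_critical_participants(fashion_island, virtual_store))  # [1, 4]
```

If the criterion was instead meant as an exact gap of two points, replacing `>=` with `==` gives that variant; the flagged subset is what the thinking-aloud interviews are then run on.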

These points may have penalized the perceived ease of use of the Virtual Store. Nevertheless, even the most critical participants acknowledged the usefulness of vocal commands when using the application for the first time (as also confirmed by the high score recorded for item A1). They also suggested using such a voice assistant for certain categories of people (e.g., people with disabilities or specific health conditions) and underlined its important role when multitasking (e.g., buying a dress while doing something else).

4.2 Survey analysis: attitude towards using for communication (ATUC) and behavioural intention (BI) constructs

We now consider the responses to the five-point Likert items F1–F3, reported in Table 5.

Table 5 Mean and standard deviation (std) of the five-point Likert questionnaire results for the F-x items. F-x items are those related to the attitude towards using for communication (ATUC) and behavioural intention (BI)

The F1 and F2 scores reflect the results presented so far: the users positively accepted the idea of purchasing a fashion item using an XR application. However, their preference for XR systems over classical online stores when buying multiple fashion products is only slight (F3). To clarify their attitude towards XR-based shopping, we also asked the participants closed questions Q1–Q6, whose answers are summarized in Fig. 4.

Fig. 4 Yes/no answer percentages for the Q-x items. Q-x items are those related to the attitude towards using for communication (ATUC) and behavioural intention (BI)

Once forced to make a clear decision, the fashion experts confirmed the potential of XR technologies as tools for expanding marketing operations. Putting themselves in the shoes of a fashion manager, they would use XR as an innovative tool for their advertising campaigns (90.3%) and to conceive new outfits (96.8%). As customers, instead, they would use XR tools to explore new fashion collections (90.3%) and buy them (93.5%). When asked which kind of purchase they would carry out in XR, they raised an interesting issue: their level of trust depended on the product. They would feel more confident using XR to buy items such as sweaters than swimsuits (80.6% vs. 64.5%, respectively), since virtual swimsuits might not look realistic and may not fit as expected. Finally, only one user rejected all the options, probably because he/she reported a slight headache and eye fatigue.
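Percentages such as those above are the share of “yes” replies to each closed question. A minimal Python sketch is given below; the question ids and replies are made up for illustration and do not reproduce the collected answers.

```python
def yes_percentages(answers):
    """Map each question id to the percentage of 'yes' replies, rounded to one decimal."""
    result = {}
    for question, replies in answers.items():
        yes_count = sum(1 for reply in replies if reply)
        result[question] = round(100 * yes_count / len(replies), 1)
    return result


# Made-up replies: True = "yes", False = "no".
toy_answers = {
    "Q1": [True] * 9 + [False],
    "Q2": [True, True, True, False],
}
print(yes_percentages(toy_answers))  # {'Q1': 90.0, 'Q2': 75.0}
```

With n respondents, every such percentage is a multiple of 100/n, which offers a quick sanity check on reported figures.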

5 Discussion and future works

To set the stage for a discussion of the contribution presented in this paper, it is worth noting that a very recent survey reviewed seventy-two studies that analyzed the use of VR in shopping Xi and Hamari (2021). These seventy-two works include the two that the present contribution extends (Morotti et al. 2020; Donatiello et al. 2018). The survey thus provides a valuable tool for placing our work in the current panorama and for foreseeing its possible extensions. Our contribution accounts for one of the two works that have so far explored voice input (Morotti et al. 2020; Speicher et al. 2017), two of the 12 concerned with clothing (Morotti et al. 2020; Speicher et al. 2017; Dzardanova et al. 2017; Jang et al. 2019; Kapusy et al. 2017; Wong Lau and Lee 2019; Lau et al. 2014; Moes and van Vliet 2017; Park et al. 2018; Zhang et al. 2014; Huiyue et al. 2019; Bigne et al. 2016; Ketelaar et al. 2018; Martínez-Navarro et al. 2019; Zhao et al. 2017; Wölfel and Reinhardt 2019), one of the three involving a specific social factor such as the inclusion of a sales assistant (Dzardanova et al. 2017; Zhao et al. 2017; Morotti et al. 2020), and the only one integrating a voice assistant into VR shopping Morotti et al. (2020).

Building upon the analysis performed in Xi and Hamari (2021), the results presented in Sect. 4 let us highlight how the present contribution advances the use of VR technologies in the fashion domain. We start by observing that the PEEU (i.e., Perceived Ease and Enjoyment of Use) and VG (i.e., Voice Gain) evaluations confirm both hypotheses H1 (i.e., fashion experts appreciate immersive VR interfaces) and H2 (i.e., the VR headset and controllers, together with the voice command integration, provide a positive immersive experience): in essence, even though using Amazon Alexa as the voice assistant imposed some design limitations, the presence of a voice assistant may improve the perceived immersion. It is worth noting that these results were partially anticipated by Speicher et al. (2017), who studied the following two hypotheses by implementing a WebVR shopping environment, empowered by a vocal search service, on a mobile VR platform: (H2 in Speicher et al.) “VR is preferred by the user in terms of user experience and usability” and (H4 in Speicher et al.) “VR with speech input outperforms the others in all aspects”. Studying these hypotheses, the authors found that: (a) voice input is preferred over head-pointing devices in terms of both usability and user experience, (b) the combination of VR and voice input outperforms all the tested combinations in terms of usability and user experience, and (c) VR is not preferred over classical keyboard/mouse input in terms of usability. The findings provided in Sect. 4 reinforce and extend the reach of Speicher et al.’s work in different directions. From the point of view of the sales and retail domain, we concentrate on fashion, building fashion-related VR experiences and running an experimental campaign that included fashion domain experts.
In terms of technology, our proposal goes beyond the inclusion of voice input in VR scenarios by adding a voice assistant (i.e., Amazon Alexa), thus probably anticipating the near future, in which smart avatars may assist users in their shopping activities and add a social component to VR. Comparing our results with finding (a) of Speicher et al., our group of users appreciated an integrated approach combining a voice assistant and VR pointing input, whereas voice was not preferred over pointing devices (interacting solely through the voice assistant drew some criticism). Finding (b) is confirmed by our experiments. Finally, regarding finding (c), our results lead to the opposite conclusion, as we observe a preference for VR interfaces over the classical keyboard/mouse one. This may be because our work employed a more advanced HMD, whose controllers are easier to use than head pointing on a mobile device. This aspect may point to a direction for future research in mobile VR.

Moving on to the usability and feasibility findings of XR, the questionnaire results for the ATUC (i.e., Attitude Towards Using for Communication) and BI (i.e., Behavioural Intention) constructs confirm hypothesis H3 (i.e., XR represents an appealing channel for fashion communication): our respondents enjoyed the opportunity of exploring virtual collections and composing new outfits in XR, and would buy products in an XR shop. These findings agree with those reported by Park et al. (2018) [one of the twelve works concerned with clothing as reported in Xi and Hamari (2021)], who explored the effect of VR on fashion marketing. In Park et al., immersive experiences were found to be positively related to pleasure and purchase intention. The authors also presented VR as a promising tool for understanding consumers’ in-store exploratory behavior and their evaluation of store designs, minimizing the time and money associated with developing physical fashion stores. Regarding H3, other researchers have also started to analyze the effects of XR on fashion marketing, using AR-based smartphone apps instead (Rauschnabel et al. 2019; Yim and Park 2019; Rauschnabel et al. 2016; Cruz et al. 2019; Brengman et al. 2019). For example, Rauschnabel et al. (2019) concentrated on the inspirational power of such apps, defining inspiration as a motivational state in which the revelation of new possibilities may lead to the realization of new ideas; they demonstrated that consumer inspiration is a mediating construct between the benefits consumers derive from AR apps and changes in brand attitude. Our contribution may complement this line of work, as it provides a new perspective on interaction with a complex digital system, i.e., the role that a voice assistant (VA) such as Alexa may play in an XR experience.
In essence, a VA may help the user concentrate on what most inspires him/her rather than on interface-related details and technicalities (e.g., simply saying “Alexa, I want to wear a purple shirt and blue jeans.” rather than having to identify the correct sequence of commands needed to achieve the same goal). With hypothesis H4 (i.e., XR applications can be adopted to buy fashion products), we extend these lines of work. While the ATUC items confirm H4, the BI assessment gives a mixed picture: only two out of three BI yes/no questionnaire items yield strongly positive results. We therefore cannot confirm that x-commerce can overcome some of the well-known issues of fashion e-commerce (e.g., the high return rate due to fitting problems). Indeed, selling particular products online (e.g., swimsuits, shoes, trousers, and dresses) remains difficult, as most customers would probably still prefer a traditional shopping experience. Such challenges were not addressed in our prototype experiences, as realistic dress fitting was not possible.
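To make the example utterance concrete, the sketch below maps a free-form request to a list of (action, color, garment) commands using plain keyword matching. It is a hypothetical illustration, not the Alexa Skills Kit API and not the implementation used in the prototypes: the vocabularies and the function name `parse_utterance` are assumptions, and a production voice skill would instead declare intents, sample utterances, and slot types in its interaction model. Treating any sentence that contains “buy” as a purchase request also illustrates the wider command variety participants asked for (e.g., “buy”, “buy this”, “I want to buy that”).

```python
import re

# Hypothetical vocabularies; a real voice skill would declare these as slot
# types in its interaction model instead of hard-coding them.
COLORS = {"purple", "blue", "red", "black", "white"}
GARMENTS = {"shirt": "shirt", "shirts": "shirt", "jeans": "jeans",
            "dress": "dress", "sweater": "sweater"}


def parse_utterance(text):
    """Turn a free-form request into (action, color, garment) commands.

    Any sentence containing "buy" maps to the buy action, so "buy",
    "buy this" and "I want to buy that" are all accepted; "wear" maps
    to the try-on action. Garments are matched word by word and take
    the color word immediately preceding them, if any.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if "buy" in words:
        action = "buy"
    elif "wear" in words:
        action = "wear"
    else:
        action = None
    commands = []
    for i, word in enumerate(words):
        garment = GARMENTS.get(word)
        if garment is None:
            continue
        color = words[i - 1] if i > 0 and words[i - 1] in COLORS else None
        commands.append((action, color, garment))
    return commands


print(parse_utterance("Alexa, I want to wear a purple shirt and blue jeans."))
# [('wear', 'purple', 'shirt'), ('wear', 'blue', 'jeans')]
```

A single sentence thus expands into two try-on commands, which is the sequence a user would otherwise have to issue step by step through the interface.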

In conclusion, Xi and Hamari’s survey, together with the discussion of our results, suggests how x-commerce may best be implemented: (a) further exploiting voice assistants, fully exploring their social dimension and their use in mobile settings, (b) using realistic and personalized avatars, and (c) exploiting algorithms capable of realistic clothes fitting. These ideas provide a roadmap for future investigations, which may be pursued by increasing the number of users involved in the analysis, developing the role of the voice assistant in all its aspects (e.g., appearance and interaction capabilities), and including a realistic fitting service among the features offered within an XR context.

To identify additional directions of work, we asked our participants further questions about their experience with a fashion app developed for the Microsoft HoloLens glasses, the Real Dress Up application (Microsoft App Store 2019; Hololens 2021). This application lets users create outfits by overlaying pre-selected clothes on a picture of a girl. The participants expressed remarkable appreciation for the HoloLens-based AR technology, as it did not alienate the user from reality. This suggests that not only avatar realism and dress fitting are important, but also the relationship between the user and the XR technology employed. This provides another interesting direction for future contributions, considering that most of the AR literature in this field concentrates on smartphones rather than on AR glasses.

Finally, to summarize the contribution of this work, we tried to anticipate possible managerial implications deriving from the adoption of XR technologies in the fashion domain (Herz and Rauschnabel 2019; XR Technology Survey: Key Stakeholders Optimistic About Mass Adoption 2019). Unlike other studies, we involved fashion experts with little or no technical skills to assess our VR experiences, and we integrated another technology that will likely occupy an important place on the stage, i.e., voice assistants. The implications of our results are limited by the low diffusion of such technologies, as highlighted by Herz and Rauschnabel (2019). Nevertheless, as the penetration rate of VR is bound to grow with the diffusion of low-cost HMD devices and LiDAR-enabled smartphones (devices such as the iPhone 12 and the Samsung S21 are expected to be game-changers for AR/VR usage), the present contribution may provide a valuable asset in the analysis of future trends.

6 Conclusion

Fashion is characterized by a search for technological innovation and digital presence, by a strong attachment to physical contact with products, and by a strong relationship with brands through their flagship stores. Nevertheless, despite initial skepticism, global trends show an increasing acceptance of fashion e-commerce by a wide class of consumers. Here we considered the particular case of VR, which, like the many other technologies that make up XR, may be exploited to further enhance customers’ virtual experiences and, in general, strengthen the impact of brand retail strategies. In this work, we began analyzing one of the aspects that have restrained the diffusion of VR applications in retail, i.e., the difficulty of VR interfaces for non-expert users. As a viable solution, we identified the embedding of a Voice Assistant into fashion VR applications as a potential way to improve users’ perceived ease of use. We thus designed an experiment based on two immersive VR experiences simulating the processes typically involved in fashion stores, only one of which allowed its users to interact vocally with a virtual shopping assistant. Our findings show a high interest in voice-based interaction leveraging a popular Voice Assistant such as Amazon Alexa. The Technology Acceptance Model drove our work, linking the simplicity of the VR interface to the subsequent willingness to adopt such a technology in a fashion retail setting. The group of users who tested the application confirmed the feasibility of x-commerce for fashion retail purposes, ranging from advertising to shopping platform providers. Although technological limitations of the proposed solution emerged, x-commerce appears to be a potential mainstream channel for fashion retail, among others, as soon as the readiness level and costs of XR devices appeal to the mass market.