1 Introduction

In popular personal videophone systems using PCs and tablets, listening and talking with downcast eyes is inevitable, because the camera is installed above the display and the gaze lines of the two speakers do not match. In a teleconference system for multiple persons, half mirrors and cameras were used to realize eye-contact conversation [1]. In another study, the picture plane was rotated to compensate for the gaze direction, and an improvement in subjective perception was reported based on votes by 52 subjects [2]. A gaze correction method [3] and a multi-viewpoint video merging method [4] have also been reported to improve eye-contact communication. However, none of these studies provides an objective evaluation.

Generally, in natural conversation, eye contact and face-to-face communication are observed frequently, and these human behaviors should be taken into account when evaluating a system. In e-learning applications, an eye mark recorder, which records the movement of the fixation point over the visual field, was applied to analyze the effectiveness of presentation methods [5].

In this paper, we define two objective measures, the Eye-Contact Conversation Ratio (ECCR) and the Face-to-Face Conversation Ratio (FFCR), and present experimental results obtained with an eye mark recorder [6] while the camera position was changed from above the display to its center. We also discuss what makes a conversation over a videophone or virtual conversational system natural.

2 Human Behaviors in Conversation

Mutual gaze during natural conversation is one of the important interactions [7]. In personal videophone systems, however, inconsistent gaze behavior, e.g., gazing at the partner's clothes or outside the display, is frequently observed.

Figure 1 shows the relationship between the gaze at the partner's eyes Geye(t), the gaze at the partner's face Gface(t), the talk by the subject Ts(t), the talk by the partner Tp(t), and the resulting behavioral states. Since each feature is represented as ON (1) or OFF (0), there are 16 behavioral states. The most important states are those in which Geye(t) = 1 or Gface(t) = 1 while Ts(t) = 1 or Tp(t) = 1. As shown in Fig. 1, we define the Eye-Contact Conversation (ECC) state as Geye(t) = 1 and (Ts(t) = 1 or Tp(t) = 1), and the Face-to-Face Conversation (FFC) state as Gface(t) = 1 and (Ts(t) = 1 or Tp(t) = 1). The former is marked by a black bar and the latter by a gray bar at the bottom of the figure.

Fig. 1. Relationship between 4 features and 2 states
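The state definitions above can be sketched in a few lines. The following Python fragment is illustrative only: it assumes the four features have been sampled into binary sequences and derives the per-sample ECC and FFC indicators.

```python
# Illustrative sketch: deriving the ECC and FFC states of Fig. 1 from the
# four binary features Geye, Gface, Ts, Tp, sampled at a fixed rate.
# Function and variable names are hypothetical, not from the paper.

def ecc_ffc_states(g_eye, g_face, t_s, t_p):
    """Return per-sample ECC and FFC indicators (1 = state active)."""
    ecc, ffc = [], []
    for e, f, ts, tp in zip(g_eye, g_face, t_s, t_p):
        talking = ts == 1 or tp == 1                  # someone is speaking
        ecc.append(1 if e == 1 and talking else 0)    # gaze at eyes during speech
        ffc.append(1 if f == 1 and talking else 0)    # gaze at face during speech
    return ecc, ffc
```

Because the eye area lies within the face area, Geye(t) = 1 implies Gface(t) = 1 in practice, so every ECC sample is also an FFC sample.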

2.1 Eye-Contact Conversation Ratio (ECCR)

In order to estimate eye-contact conversation objectively, we sum the ECC durations and calculate the ECC ratio by the following equation:

$$ ECCR = \frac{\sum_{m=1}^{M} T_{ECC}(m)}{\sum_{i=1}^{I} T_{s}(i) + \sum_{j=1}^{J} T_{p}(j)} \times 100 $$
(1)

where TECC(m) is the m-th duration of the ECC state, Ts(i) is the i-th duration of the subject's talk, and Tp(j) is the j-th duration of the partner's talk.

From the equation, ECCR represents the eye-contact conversation ratio over both talking and listening periods.

2.2 Face to Face Conversation Ratio (FFCR)

In the same way as for ECCR, we sum the FFC durations and calculate the FFC ratio by the following equation:

$$ FFCR = \frac{\sum_{n=1}^{N} T_{FFC}(n)}{\sum_{i=1}^{I} T_{s}(i) + \sum_{j=1}^{J} T_{p}(j)} \times 100 $$
(2)

where TFFC(n) is the n-th duration of the FFC state.

From the equation, FFCR represents the face-to-face conversation ratio over both talking and listening periods. As shown in Fig. 1, the FFC duration includes the ECC duration, because the eyes are part of the face.
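Since Eqs. (1) and (2) share the same denominator, both ratios can be computed by a single helper. The sketch below is illustrative; the durations in the comment are made up, not taken from the experiment.

```python
# Illustrative implementation of Eqs. (1) and (2): all durations in seconds.
# Pass the ECC durations for ECCR, or the FFC durations for FFCR.

def conversation_ratio(state_durations, talk_s, talk_p):
    """Ratio of summed state durations to total talk time, in percent."""
    total_talk = sum(talk_s) + sum(talk_p)   # denominator of Eqs. (1)/(2)
    return 100.0 * sum(state_durations) / total_talk

# e.g. ECC episodes of 3 s and 2 s during 10 s + 10 s of talk give 25 %
```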

3 Experimental System

In order to evaluate eye-contact and face-to-face conversation under different camera positions, we developed a videophone system in which the camera position can be changed. The gaze point is recorded by an eye mark recorder that uses the infrared reflection of the pupil and cornea, and whether the subject's gaze falls on the face, on the eyes, or elsewhere is decided by analyzing the recorded images. The conversations are also recorded and, after noise reduction, separated into the subject's talk and the partner's talk.

In this section, the developed videophone system and the flow of signal processing are described.

3.1 Videophone System

The developed videophone system is shown in Fig. 2. A half mirror is placed in front of the subject at a 45° angle to realize face-to-face conversation with the image of the partner. A horizontally flipped image is displayed on the monitor so that left and right are not reversed.
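The left-right compensation amounts to a horizontal flip of each camera frame before display, which cancels the mirror reversal introduced by the half mirror. A minimal NumPy sketch (not the actual ManyCam implementation used in the system) is:

```python
import numpy as np

# Illustrative left-right compensation: flipping the camera frame
# horizontally before display cancels the half mirror's reversal.

def flip_horizontal(frame):
    """frame: H x W x C image array; returns its mirror image."""
    return frame[:, ::-1]   # reverse the column (width) axis
```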

Fig. 2. Developed videophone system

The camera can be mounted at any height. In this experiment we use two positions: the center position and the above position, which simulates a PC camera. Two sets of the system are used in the experiment. The specification of each system is as follows:

  • Display size: 24.1 in. LCD

  • Videophone application: Skype

  • Left-right reversal: ManyCam

  • Camera: 640 × 480 pixels, 24-bit color, 30 fps

  • Audio: fs = 44.1 kHz, 16 bit

A scene of the experiment using the developed videophone system is shown in Fig. 3.

Fig. 3. Experimental setup

3.2 Gaze Point Estimation

The gaze points and audio of the subject, who wears the eye mark recorder (EMR-9) [6], are recorded.

The recorder measures the subject's sight angle from the infrared reflection image on the cornea and the pupil movement. The detection range is ±40° horizontally and ±20° vertically. The gaze points (left eye: +, right eye: □) and the parallax-corrected gaze point (○) are superimposed on the image (640 × 480 pixels) taken by the field-of-view camera installed at the brim of a cap, as shown in Fig. 4. The image was recorded during the conversation and analyzed together with the recorded voice after the experiment to determine the location of the parallax-corrected gaze point.

Fig. 4. Example of recorded image from EMR-9

Figure 4 shows examples of (a) an "Eye-Contact Conversation while talking" scene and (b) a "Face-to-Face Conversation while listening" scene.

The face area and eye area are determined manually from facial features such as skin color, eyebrows, eyes, nose, mouth, and chin, as shown in Fig. 5.

Fig. 5. Face area and eye area
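The region decision for each parallax-corrected gaze point can be illustrated by a simple point-in-rectangle test. The box coordinates below are hypothetical, since the actual areas were drawn by hand from the facial features.

```python
# Illustrative sketch of the region decision: axis-aligned rectangles for
# the face and eye areas, in pixel coordinates of the 640 x 480 field-of-
# view image. Boxes are given as (x0, y0, x1, y1); values are made up.

def classify_gaze(x, y, eye_box, face_box):
    """Return 'eye', 'face', or 'other' for a gaze point (x, y)."""
    def inside(box):
        x0, y0, x1, y1 = box
        return x0 <= x <= x1 and y0 <= y <= y1
    if inside(eye_box):      # check the eye area first: it lies inside the face
        return 'eye'
    if inside(face_box):
        return 'face'
    return 'other'
```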

3.3 Signal Processing Flow

Figure 6 shows the signal processing flow that produces the four signals, i.e., Geye(t), Gface(t), Ts(t), and Tp(t), and the ECC and FFC durations, TECC and TFFC, in talking and listening. In this study, Geye(t) and Gface(t) are extracted manually, while the talk by the subject Ts(t) and the talk by the partner Tp(t) are extracted automatically based on the audio signal power.

Fig. 6. Signal processing flow
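The automatic extraction of Ts(t) and Tp(t) from the audio signal power can be sketched as a frame-wise threshold test. The frame length and threshold below are illustrative, not the values used in the study.

```python
# Illustrative talk detection: mean signal power per frame compared
# against a threshold. Parameters are hypothetical examples.

def talk_signal(samples, frame_len, threshold):
    """Return a per-frame binary talk indicator from audio samples."""
    talk = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len   # mean signal power
        talk.append(1 if power > threshold else 0)
    return talk
```

Applying this to the subject's and the partner's channels separately, after noise reduction, yields the two binary talk signals used in Eqs. (1) and (2).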

4 Experimental Results and Discussion

Ten male subjects (ages 22–24) were divided into five groups and held free conversations for 6 min or more. After the first minute, the image including the gaze point shown in Fig. 4 and the speech were recorded for 5 min and analyzed.

4.1 ECCR and FFCR in Higher Camera Position

Both the subject and the partner in a PC-based videophone are expected to talk and listen with downcast eyes. This degrades the communication quality and should lead to low ECCR and FFCR. Table 1 shows the ECCR and the FFCR for the 5-min conversations. The total talking times of the subject and of the partner are also indicated in seconds.

Table 1. ECCR, FFCR, and talking time in higher camera position

From Table 1, the averaged ECCR of the five subjects is 29.0%, and its deviation is not large. On the other hand, the averaged FFCR reaches 74.7%, even though the partner talks or listens with downcast eyes. The five FFCRs depend on the subject and vary between 53.9% and 89.3%.

In order to inspect ECCR and FFCR in more detail, we separate them into the ratios during talking and during listening, summarized in Table 2. The suffixes "T" and "L" denote "talking" and "listening", respectively.

Table 2. ECCR and FFCR in talking and listening in higher camera position

Except for the ECCR of subject 5, the averaged ECCRT and FFCRT are lower than the averaged ECCRL and FFCRL, respectively. This means that almost all subjects watch the partner's eyes and face more while listening than while talking. It is also found that, while talking, the subjects watch the partner's face rather than the eyes.

4.2 ECCR and FFCR in Center Camera Position

Table 3 shows the ECCR, the FFCR, Total talking time of subject, and Total talking time of partner, and Table 4 shows the detail of ECCR and FFCR.

Table 3. ECCR, FFCR, and talking time in center camera position
Table 4. ECCR and FFCR in talking and listening in center camera position

By comparing Table 3 with Table 1, the following are found:

  (1) The averaged ECCR decreases when the camera is moved to the center. This trend holds for all subjects except subject 1.

  (2) The averaged FFCR increases when the camera is moved to the center. This trend holds for all subjects.

By comparing Table 4 with Table 2, the following are found:

  (3) The averaged ECCRT increases slightly when the camera is centered, but this is not a remarkable trend.

  (4) The averaged ECCRL decreases slightly when the camera is centered. This trend holds for all subjects except subject 5.

  (5) Both FFCRT and FFCRL increase when the camera is centered. This trend holds for all subjects.

5 Conclusion

In order to improve the naturalness of conversation over a videophone or virtual conversational system, we have proposed two objective measures, the ECCR and the FFCR, and developed a videophone system with a half mirror. By changing the camera position from above the display to its center, the FFCR increases from 74.7% to 88.0%. This means that face-to-face conversation is affected by the gaze of the partner. The conversation with mutual gaze clearly increased when the camera was centered, and the naturalness of the conversation was improved. However, because of the wide eye area and the lack of consideration of the partner's gaze, the ECCR in talking and listening does not change. To clarify the true eye-contact conversation ratio, the eye area and the partner's gaze should be considered in future work. In addition, measures of affect or emotion such as the PANAS (Positive and Negative Affect Schedule) should be studied.