Abstract
To realize eye contact and face-to-face communication, videophone and virtual conversational systems that take the gaze line into consideration have been developed. Although many such systems have been studied and reported, there is no objective measure for evaluating the quality of the conversations. In this paper, we propose two objective measures, the eye-contact conversation ratio (ECCR) and the face-to-face conversation ratio (FFCR), for evaluating communication quality. By moving the camera from above the display to its center, the ECCR increases from 24.3% to 25.3% while talking and decreases from 33.1% to 25.6% while listening. It is also found that the FFCR improves from 74.7% to 88.0% by centering the camera.
1 Introduction
In popular personal videophone systems on PCs and tablets, listening and talking with downcast eyes is inevitable. This is because the camera is installed above the display, so the gaze lines of the two participants do not match. In a teleconference system for multiple persons, half mirrors and cameras were used to realize eye-contact conversation [1]. In another study, the picture plane was rotated to compensate for the gaze direction, and an improvement in subjective perception was reported based on votes by 52 subjects [2]. A gaze correction method [3] and a multi-viewpoint video merging method [4] have also been reported to improve eye-contact communication. However, none of these studies provides an objective evaluation result.
Generally, in natural conversation, eye contact and face-to-face communication occur frequently, and these human behaviors should be taken into account when evaluating a system. In e-learning applications, an eye mark recorder, which records fixation-point movement over the visual field, was applied to analyze the effectiveness of presentation methods [5].
In this paper, we define two objective measures, the ECCR and the FFCR, and present experimental results obtained with an eye mark recorder [6] while changing the camera position from above the display to its center. We also discuss what makes a conversation over a videophone or virtual conversational system natural.
2 Human Behaviors in Conversation
Mutual gaze during natural conversation is one of the important interactions [7]. However, in personal videophone systems, inconsistent gaze behavior, e.g., gazing at the partner’s clothes or outside the display, is frequently observed.
Figure 1 shows the relationship between the gaze at the partner’s eye Geye(t), the gaze at the partner’s face Gface(t), the talk by the subject Ts(t), the talk by the partner Tp(t), and the behavioral states. Since each feature is represented as ON (1) or OFF (0), there are 16 behavioral states. In this figure, the durations in which Geye(t) = 1 and Gface(t) = 1 while Ts(t) = 1 or Tp(t) = 1 are the most important states. As shown in Fig. 1, we define the Eye-Contact Conversation (ECC) state, in which Geye(t) = 1 and (Ts(t) = 1 or Tp(t) = 1), and the Face-to-Face Conversation (FFC) state, in which Gface(t) = 1 and (Ts(t) = 1 or Tp(t) = 1). The former is marked by a black bar and the latter by a gray bar at the bottom of the figure.
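As a minimal sketch of this state logic (not the authors’ code), the per-sample ECC and FFC flags can be derived from the four binary signals; the signal names follow the text, while the per-sample representation is an assumption:

```python
def conversation_states(g_eye, g_face, t_s, t_p):
    """Return per-sample ECC and FFC flags (1/0) for four equal-length
    binary signals: gaze-at-eye, gaze-at-face, subject talk, partner talk."""
    ecc, ffc = [], []
    for ge, gf, ts, tp in zip(g_eye, g_face, t_s, t_p):
        talking = ts or tp                         # someone is speaking
        ecc.append(1 if ge and talking else 0)     # Eye-Contact Conversation
        ffc.append(1 if gf and talking else 0)     # Face-to-Face Conversation
    return ecc, ffc

# Hypothetical 6-sample signals
g_eye  = [1, 1, 0, 0, 1, 0]
g_face = [1, 1, 1, 0, 1, 1]
t_s    = [1, 0, 0, 0, 0, 1]
t_p    = [0, 1, 1, 0, 0, 0]
ecc, ffc = conversation_states(g_eye, g_face, t_s, t_p)
print(ecc)  # [1, 1, 0, 0, 0, 0]
print(ffc)  # [1, 1, 1, 0, 0, 1]
```

Note that gaze alone (sample 5 above, where nobody talks) produces neither state, matching the definitions.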
2.1 Eye-Contact Conversation Ratio (ECCR)
In order to estimate eye-contact conversation objectively, we sum up the ECC durations and calculate the ECC ratio by the following equation:

$$\mathrm{ECCR} = \frac{\sum_{m} T_{\mathrm{ECC}}(m)}{\sum_{i} T_{s}(i) + \sum_{j} T_{p}(j)} \times 100\;[\%]$$

where TECC(m) is the m-th duration of the ECC state, Ts(i) is the i-th duration of the subject’s talk, and Tp(j) is the j-th duration of the partner’s talk.
This equation shows that the ECCR represents the eye-contact conversation ratio over both the talking and listening periods.
2.2 Face to Face Conversation Ratio (FFCR)
In the same way as the ECCR, we sum up the FFC durations and calculate the FFC ratio by the following equation:

$$\mathrm{FFCR} = \frac{\sum_{n} T_{\mathrm{FFC}}(n)}{\sum_{i} T_{s}(i) + \sum_{j} T_{p}(j)} \times 100\;[\%]$$

where TFFC(n) is the n-th duration of the FFC state.
This equation shows that the FFCR represents the face-to-face conversation ratio over both the talking and listening periods. As shown in Fig. 1, the FFC duration includes the ECC duration, because the eye is part of the face.
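The FFCR shares its denominator with the ECCR, so with the same hypothetical speech durations the two measures are directly comparable; since the eye region lies inside the face region, FFCR ≥ ECCR always holds:

```python
def ffcr(t_ffc, t_s, t_p):
    """FFCR in percent: total FFC duration over total speech duration
    (same denominator as the ECCR). Durations are in seconds."""
    return 100.0 * sum(t_ffc) / (sum(t_s) + sum(t_p))

# Hypothetical durations: 8 s of FFC state out of 10 s of speech -> 80%
print(ffcr(t_ffc=[5.0, 3.0], t_s=[4.0, 2.0], t_p=[3.0, 1.0]))  # 80.0
```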
3 Experimental System
To evaluate eye-contact and face-to-face conversation over a videophone with different camera positions, we developed a videophone system in which the camera position can be changed. The gaze point is recorded by the eye mark recorder, which uses the infrared reflection of the pupil/cornea, and whether the subject’s gaze falls on the face, on the eyes, or elsewhere is decided by analyzing the recorded images. Conversations are also recorded and, after noise reduction, separated into the subject’s talk and the partner’s talk.
In this section, the developed videophone system and the flow of signal processing are described.
3.1 Videophone System
The developed videophone system is shown in Fig. 2. A half mirror is placed in front of the subject at a 45° angle to enable face-to-face conversation with the image of the partner. A horizontally flipped image is displayed on the monitor so that left and right are not reversed.
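The system performs this flip with ManyCam (see Sect. 3.1); purely as an illustration of the operation, the same left-right mirroring can be applied to a raw frame array, with NumPy standing in for the video pipeline:

```python
import numpy as np

def mirror_frame(frame):
    """Flip an H x W x C image left-to-right, so the half-mirror view
    of the partner is not reversed (equivalent to a horizontal flip)."""
    return frame[:, ::-1, :]

# Tiny 2 x 2 RGB frame as a stand-in for a camera image
img = np.arange(12).reshape(2, 2, 3)
print(mirror_frame(img)[0, 0].tolist())  # [3, 4, 5] — the former right pixel
```

Applying the flip twice recovers the original frame, which is a quick sanity check for the mirroring stage.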
The height of the camera can be set to any position. In this experiment, we use two positions: a center position and an above position, which simulates a PC’s built-in camera. Two sets of the system are used in the experiment. The specification of each system is as follows:
- Display: 24.1 in. LCD
- Videophone application: Skype
- Left and right reverse: ManyCam
- Camera: 640 pixels (H) × 480 pixels (V), 24-bit color, 30 fps
- Audio: fs = 44.1 kHz, 16 bit
A scene of the experiment using the developed videophone system is shown in Fig. 3.
3.2 Gaze Point Estimation
The gaze points and audio information of the subject wearing the eye mark recorder (EMR-9) [6] are recorded.
The recorder measures the subject’s sight angle based on the infrared reflection image on the cornea and the pupil movement. The detection range is ±40° horizontally and ±20° vertically. The gaze points (left: +, right: □) and a parallax-corrected gaze point (○) are displayed on the image (640 × 480 pixels) taken by field-of-view cameras installed on the brim of a cap, as shown in Fig. 4. The image was recorded during the conversation and analyzed together with the recorded voice after the experiment to locate the parallax-corrected gaze point.
Figure 4 shows examples of (a) “Eye-Contact Conversation in talking” scene, and (b) “Face to Face Conversation in listening” scene.
The face area and the eye area are determined manually from facial features such as skin color, eyebrows, eyes, nose, mouth, and chin, as shown in Fig. 5.
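Once these areas are outlined, classifying a gaze point reduces to a point-in-region test; the rectangular boxes and coordinates below are hypothetical stand-ins for the manually drawn regions of Fig. 5:

```python
def classify_gaze(x, y, face_box, eye_box):
    """Classify a gaze point (x, y) in image pixels.
    Boxes are (left, top, right, bottom); the eye box lies inside the face box."""
    def inside(box):
        l, t, r, b = box
        return l <= x <= r and t <= y <= b
    if inside(eye_box):
        return "eye"     # counts toward both ECC and FFC
    if inside(face_box):
        return "face"    # counts toward FFC only
    return "other"       # inconsistent gaze (clothes, outside the display, ...)

# Hypothetical region boundaries in a 640 x 480 frame
FACE = (250, 120, 390, 330)
EYE = (270, 180, 370, 220)
print(classify_gaze(320, 200, FACE, EYE))  # eye
print(classify_gaze(320, 300, FACE, EYE))  # face
```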
3.3 Signal Processing Flow
Figure 6 shows the flow of signal processing used to obtain the four signals, i.e., Geye(t), Gface(t), Ts(t), and Tp(t), and the four state durations, i.e., TECC and TFFC in talking/listening. In this study, Geye(t) and Gface(t) are extracted manually, while the talk by the subject Ts(t) and the talk by the partner Tp(t) are extracted automatically based on the audio signal power.
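The automatic talk extraction can be sketched as a simple frame-power threshold; the frame length and threshold below are assumptions for illustration, not the authors’ parameters:

```python
import numpy as np

def talk_signal(samples, fs=44100, frame_ms=20, threshold=0.01):
    """Return a binary talk signal, one flag per frame: 1 where the mean
    squared amplitude of the frame exceeds the threshold, else 0."""
    n = int(fs * frame_ms / 1000)          # samples per frame (882 at 44.1 kHz)
    frames = len(samples) // n
    power = [np.mean(samples[i * n:(i + 1) * n] ** 2) for i in range(frames)]
    return [1 if p > threshold else 0 for p in power]

# One frame of silence followed by one frame of speech-level amplitude
x = np.concatenate([np.zeros(882), 0.5 * np.ones(882)])
print(talk_signal(x))  # [0, 1]
```

In practice the threshold would be set after the noise-reduction step mentioned in Sect. 3, so that residual background noise does not trigger false talk frames.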
4 Experimental Results and Discussion
Ten male subjects (ages 22–24) were divided into five pairs and held free conversations for 6 min or more. After the first minute, the image including the gaze point shown in Fig. 4 and the speech were recorded for 5 min and analyzed.
4.1 ECCR and FFCR in Higher Camera Position
It is expected that both a subject and a partner using a PC-based videophone talk and listen with downcast eyes. This degrades the communication quality and leads to a low ECCR and FFCR. Table 1 shows the ECCR and the FFCR for the 5-min conversations; the total talking times of the subject and of the partner are also indicated in seconds.
From Table 1, the averaged ECCR of the five subjects is 29.0%, and its deviation is not large. On the other hand, the averaged FFCR reaches 74.7% even though a partner talks or listens with downcast eyes. The five FFCRs depend on the subject and vary between 53.9% and 89.3%.
To inspect the ECCR and FFCR in detail, we separate them into the ratios during talking and during listening, as summarized in Table 2. The suffixes “T” and “L” denote “talking” and “listening”, respectively.
Except for the ECCR of subject 5, the averaged ECCRT and FFCRT are both less than the averaged ECCRL and FFCRL, respectively. This means that almost all subjects watch the partner’s eyes and face more while listening than while talking. It is also found that, while talking, the subjects watch the partner’s face rather than the eyes.
4.2 ECCR and FFCR in Center Camera Position
Table 3 shows the ECCR, the FFCR, and the total talking times of the subject and of the partner, and Table 4 shows the details of the ECCR and FFCR.
By comparing Table 3 with Table 1, the following are found:

(1) The averaged ECCR decreases by moving the camera to the center. This trend holds for all subjects except subject 1.

(2) The averaged FFCR increases by moving the camera to the center. This trend holds for all subjects.
By comparing Table 4 with Table 2, the following are found:

(3) The averaged ECCRT slightly increases by centering the camera, but this is not a remarkable trend.

(4) The averaged ECCRL slightly decreases by centering the camera. This trend holds for all subjects except subject 5.

(5) Both the FFCRT and the FFCRL increase by centering the camera. This trend holds for all subjects.
5 Conclusion
In order to improve the naturalness of conversation over a videophone or virtual conversational system, we have proposed two objective measures, the ECCR and the FFCR, and developed a videophone system with a half mirror. By moving the camera position from above the display to its center, the FFCR increases from 74.7% to 88.0%. This means that face-to-face conversation is affected by the partner’s gaze. The conversation with mutual gaze clearly increased when the camera was centered, and the naturalness of the conversation improved. However, because of the wide eye area and the lack of consideration of the partner’s gaze, the ECCR in talking/listening hardly changes. To clarify the true eye-contact conversation ratio, the eye area and the partner’s gaze should be considered in future work. In addition, measures of affect or emotion, such as the PANAS (Positive and Negative Affect Schedule), should be studied.
References
De Silva, L.C., et al.: A teleconferencing system capable of multiple person eye contact using half mirrors and cameras placed at common points of extended lines of gaze. IEEE Trans. Circ. Syst. Video Technol. 5(4), 268–277 (1995)
Solina, F., Ravnik, R.: Fixing missing eye-contact in video conferencing system. In: Proceedings of ITI 2011, pp. 233–236 (2011)
Lu, J., Tao, X., Dong, L., Ge, N.: Chunk-wise face model based gaze correction in conversational videos with single camera. In: Proceedings of CITS 2016, pp. 1–5 (2016)
Ebara, Y., Nabuchi, T., Sakamoto, N., Koyamada, K.: Study on eye-to-eye contact by multi-viewpoint videos merging system for tele-immersive environment. In: Proceedings of AINA 2006, vol. 2 (2006)
Ando, M., et al.: An analysis using eye-mark recorder of the effectiveness of presentation methods for e-learning. In: Proceedings of ICALT 2007, pp. 183–185 (2007)
Broz, F., et al.: Mutual gaze, personality, and familiarity: dual eye-tracking during conversation. In: Proceedings of IEEE RO-MAN 2012, pp. 858–864 (2012)
© 2017 Springer International Publishing AG
Masuda, K., Hishiki, R., Hangai, S. (2017). A Proposal of Objective Evaluation Measures Based on Eye-Contact and Face to Face Conversation for Videophone. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds.) Image Analysis and Processing – ICIAP 2017. Lecture Notes in Computer Science, vol. 10485. Springer, Cham. https://doi.org/10.1007/978-3-319-68548-9_32