Detecting individuals' spatial familiarity with urban environments using eye movement data

https://doi.org/10.1016/j.compenvurbsys.2022.101758

Highlights

  • This is the first study to use real-world eye movement data to infer spatial familiarity.

  • The performance of different time window sizes and feature types was compared.

  • We achieved a best accuracy of 81% in 10-fold classification and 70% in leave-one-task-out classification.

  • We found important eye movement features for detecting users' spatial familiarity.

  • Only a few seconds (e.g., 5 s) of eye movement data is sufficient for spatial familiarity detection.

Abstract

Users' spatial familiarity with the environment is an important high-level user context for location-based services (LBS). Knowing users' level of familiarity with an environment enables context-aware LBS that can automatically adapt their information services accordingly. Unlike state-of-the-art studies that used questionnaires, sketch maps, mobile phone positioning (GPS) data, and social media data to measure spatial familiarity, this study explored the potential of a new type of sensory data - eye movement data - to infer users' spatial familiarity with environments using a machine learning approach. We collected 38 participants' eye movement data while they performed map-based navigation tasks in familiar and unfamiliar urban environments. We trained and cross-validated a random forest classifier to infer whether the users were familiar or unfamiliar with the environments (i.e., binary classification). By combining basic statistical features and fixation semantic features, we achieved a best accuracy of 81% in 10-fold classification and 70% in leave-one-task-out (LOTO) classification. We found that pupil diameter, fixation dispersion, saccade duration, and fixation count and duration on the map were the most important features for detecting users' spatial familiarity. Our results indicate that detecting users' spatial familiarity from eye tracking data is feasible in map-based navigation, and that only a few seconds (e.g., 5 s) of eye movement data are sufficient for such detection. These results could be used to develop context-aware LBS that adapt their services to users' familiarity with the environment.

Introduction

Currently, location-based services (LBS) often provide adaptive services based on various user and context factors, such as the user's location, walking speed, road conditions, weather, and visit history. Understanding users and their needs is key to LBS (Huang, Gartner, Krisp, Raubal, & Van de Weghe, 2018; Jiang & Yao, 2006). In addition to providing well-established services (e.g., planning the shortest route and recommending points of interest based on the user's location), recent LBS integrate solutions that are more tailored to high-level user context and user information needs. Examples include suggesting beautiful, pleasant, and quiet routes rather than the shortest route (Huang et al., 2014; Quercia, Schifanella, & Aiello, 2014) and even adapting map content according to users' visual behavior (Anagnostopoulos et al., 2017; Giannopoulos, Kiefer, & Raubal, 2015).

This study focused on an important high-level user context that has received little attention: users' familiarity with the environment. Knowing users' level of familiarity with an environment is helpful for enabling context-aware LBS that can automatically adapt information services accordingly (Savage, Chun, Chavez, & Mexico, 2011; Schmid & Richter, 2006; van Haeren & Mackaness, 2015; Zhou, Weibel, & Huang, 2021). First, displaying adaptive map content according to users' familiarity level can optimize the communication of map information and thus increase the effectiveness and/or efficiency of map services. This avoids overwhelming familiar users with unnecessarily detailed information, or providing insufficient information to unfamiliar users, and can also reduce users' cognitive load in processing geoinformation (Bunch & Lloyd, 2006). Second, users' spatial familiarity affects their route choice in wayfinding (Lovelace, Hegarty, & Montello, 1999; van Haeren & Mackaness, 2015). Users who are familiar with an environment may be more willing to actively explore new changes and new things, whereas unfamiliar users may want to find their destinations quickly and accurately. Therefore, a navigation service can provide familiar users with more interesting routes and places while offering unfamiliar users the shortest routes. Third, LBS adapted to users' familiarity with the environment can potentially help them acquire new spatial knowledge more easily, because they can relate new places to previously known places (Merriman, Ondřej, Roudaia, O'Sullivan, & Newell, 2016).

Unlike state-of-the-art studies that used questionnaires, mobile phone positioning (GPS) data, and social media data to measure spatial familiarity (as reviewed in Section 2.1), this study focused on the use of a new type of sensory data - eye movement data - to infer users' familiarity with urban environments. People perceive and understand urban environments largely through vision, and human visual behavior is considered to be associated with users' high-level cognitive states and intentions (Henderson, Shinkareva, Wang, Luke, & Olejarczyk, 2013). Thus, tracking people's eye movements to infer high-level user context is a more intuitive method than mining their GPS and social media data. Furthermore, advances in eye-tracking technology have made eye trackers lighter, less expensive, and more accurate. In the near future, eye trackers may become ubiquitous sensors (similar to GPS) embedded in portable devices such as smartphones, tablets, and glasses. Therefore, eye-gaze-based human-computer interaction (e.g., detecting users' familiarity by tracking their eye movements in real time and providing tailored information accordingly) may become possible in future LBS (Anagnostopoulos et al., 2017).

In this study, we explored the potential of eye movement data to infer users' familiarity with environments using a machine learning approach. We collected users' eye movement data when they were navigating familiar and unfamiliar urban environments. We extracted two sets of eye movement features (namely, basic statistical features and fixation semantic features) to characterize users' visual behavior. We then trained and cross-validated a random forest classifier to infer whether the users were familiar or unfamiliar with the environments (i.e., binary classification). While the results of this work can be used to enable familiarity-aware LBS, such an investigation using mobile eye tracking may also be helpful for research on activity/intention recognition, tourist behavior, environment perception, and urban planning.
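As a rough sketch, the classification setup described above (windows of eye movement features labeled as familiar or unfamiliar, a random forest classifier, cross-validation) could look like the following. The feature matrix here is synthetic, and the dimensions and hyperparameters are illustrative assumptions, not the study's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the eye movement feature matrix:
# each row is one time window, each column one feature
# (e.g., mean pupil diameter, fixation dispersion, saccade duration).
rng = np.random.default_rng(42)
n_windows, n_features = 200, 12
X = rng.normal(size=(n_windows, n_features))
y = rng.integers(0, 2, size=n_windows)  # 0 = unfamiliar, 1 = familiar

clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 10-fold cross-validated binary classification, as in the study.
scores = cross_val_score(clf, X, y, cv=10)
print(f"mean accuracy: {scores.mean():.2f}")
```

On real data, each row would aggregate the eye movement signal over one time window (e.g., 5 s), so the window size becomes a design parameter that can be varied before feature extraction.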

The manuscript has five additional sections. Related work on measuring spatial familiarity and eye tracking is reviewed in Section 2. Methods are described in Section 3. Results are presented in Section 4 and discussed in Section 5. Finally, we draw conclusions and propose future work in Section 6.

Section snippets

Measuring individuals' spatial familiarity with environments

An intuitive interpretation of the term ‘spatial familiarity’ is a “close acquaintance with an environment or its elements” (Gale, Golledge, Halperin, & Couclelis, 1990). Spatial familiarity reflects the completeness of an individual's understanding of an environment, such as the location, distance and direction of objects; the routes between two buildings; and the overall structure of the environment (Lovelace et al., 1999). When an individual is more familiar with an environment, their

Data

The data were collected in a previously reported experiment (Experiment 1) (anonymous for peer review, 2019) and a newly conducted experiment (Experiment 2). Both experiments were reviewed and approved by the local institutional ethics review board. All participants provided written consent.

Accuracy

Fig. 4 shows the accuracy across feature types and time window sizes in the cross-route probe (10-fold classification). Basic statistical features achieved an accuracy of 64% to 69%, lower than that of fixation semantic features (78% to 81%). However, combining the two feature sets did not yield a higher accuracy (79% to 81%). Interestingly, the accuracy was insensitive to the time window size: increasing it did not improve accuracy.
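A comparison of feature sets in such a 10-fold setup can be illustrated with a minimal sketch; the column groupings and data below are hypothetical stand-ins, not the study's actual features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# Hypothetical layout: columns 0-5 hold basic statistical features,
# columns 6-11 hold fixation semantic features.
X = rng.normal(size=(n, 12))
y = rng.integers(0, 2, size=n)

feature_sets = {
    "basic": slice(0, 6),
    "semantic": slice(6, 12),
    "combined": slice(0, 12),
}
clf = RandomForestClassifier(n_estimators=100, random_state=0)
for name, cols in feature_sets.items():
    acc = cross_val_score(clf, X[:, cols], y, cv=10).mean()
    print(f"{name}: {acc:.2f}")
```

Evaluating each column subset under the same folds isolates the contribution of the feature type from fold-to-fold variation.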

Important features to infer spatial familiarity

The pupil diameter ranked first in the feature importance of the basic statistical features in both the cross-route and cross-task probes (Figs. 6a and 10a). We speculate that this is because wayfinding imposes a higher cognitive workload in unfamiliar environments than in familiar ones. Pupillary responses have been shown to be associated with cognitive workload; a higher workload increases the pupil diameter (Bunch & Lloyd, 2006; van der Wel & van Steenbergen, 2018).
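With a random forest, a feature ranking like the one reported can be read directly from the fitted model's impurity-based importances. The sketch below uses synthetic data in which the pupil-diameter column is made informative by construction; the feature names are illustrative, not the study's exact set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
feature_names = [
    "pupil_diameter", "fixation_dispersion", "saccade_duration",
    "fixation_count_map", "fixation_duration_map", "saccade_amplitude",
]
X = rng.normal(size=(300, len(feature_names)))
# Make the first column informative so it tends to dominate the ranking.
y = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
ranking = sorted(zip(feature_names, clf.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances sum to 1 across features; permutation importance is a common alternative when features are correlated.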

Conclusions and future work

This study explored using eye movement data to infer users' spatial familiarity levels of environments. Machine learning methods were used to classify users' eye movement data into either familiar or unfamiliar groups. By combining basic statistical features and fixation semantic features, we achieved a best accuracy of 81% in a 10-fold classification and 70% in the leave-one-task-out classification. We found that the pupil diameter, fixation dispersion, saccade duration, fixation count and

Funding

This research was supported by the National Natural Science Foundation of China [NSFC, Grant Nos. 42001410 and 41871366], the Natural Science Foundation of Hunan Province [Grant No. 2021JJ40350], and the Scientific Research Foundation of Hunan Provincial Education Department [Grant No. 19B367].

Declaration of Competing Interest

None.

References (48)

  • Bulling, A., et al. (2011). Eye movement analysis for activity recognition using electrooculography. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Bunch, R.L., et al. (2006). The cognitive load of geographic information. The Professional Geographer.

  • Cordts, M., et al. The Cityscapes dataset for semantic urban scene understanding.

  • Cutler, A., et al. Random forests.

  • Dong, W., et al. (2020). Comparing pedestrians' gaze behavior in desktop and in real environments. Cartography and Geographic Information Science.

  • Fabrikant, S.I., et al. (2010). Cognitively inspired and perceptually salient graphic displays for efficient spatial inference making. Annals of the Association of American Geographers.

  • Gale, N., et al. (1990). Exploring spatial familiarity. The Professional Geographer.

  • Giannopoulos, I., et al. GazeNav: Gaze-based pedestrian navigation.

  • Gokl, L., et al. Towards urban environment familiarity prediction.

  • van Haeren, M., et al. The influence of familiarity on route choice: Edinburgh as a case study.

  • Henderson, J. Eye movements and scene perception.

  • Henderson, J.M., et al. (2013). Predicting cognitive state from eye movements. PLoS One.

  • Huang, H., et al. (2018). Location based services: Ongoing evolution and research agenda. Journal of Location Based Services.

  • Huang, H., et al. (2014). AffectRoute - considering people's affective responses to environments for enhancing route-planning services. International Journal of Geographical Information Science.