Feature analysis for human recognition and discrimination: Application to a person-following behaviour in a mobile robot

https://doi.org/10.1016/j.robot.2012.05.014

Abstract

One of the most important abilities that personal robots need when interacting with humans is the ability to discriminate amongst them. In this paper, we carry out an in-depth study of the possibilities of a colour camera placed on top of a robot to discriminate between humans, and thus achieve a reliable person-following behaviour on the robot. In particular, we review and analyse the most popular colour and texture features used in object and texture recognition, and their suitability for identifying and modelling the target (the person being followed). Nevertheless, real-time constraints make it necessary to select a reduced subset of these features in order to limit the computational burden. This subset was selected after carrying out a redundancy analysis, and after considering how these features perform when discriminating amongst similar human torsos. Finally, we also describe several scoring functions able to dynamically adjust the relevance of each feature according to the particular conditions of the environment where the robot moves, together with the characteristics of the clothes worn by the people in the scene. The results of this in-depth study have been implemented in a novel and adaptive system (described in this paper), which is able to discriminate between humans and thus achieve reliable person-following behaviours on a mobile robot. The performance of our proposal is clearly shown through a set of experimental results obtained with a real robot working in real and difficult scenarios.

Highlights

► Person-following behaviour on a robot to learn routes from demonstration.
► Fusion of information provided by a camera and a laser scanner.
► Selection of colour and texture features according to a redundancy analysis.
► Real-time weighting of features to make the target clearly distinguishable.
► Use of scoring functions to determine the discrimination ability of each feature.

Introduction

In the next few years, personal service robots are expected to become part of our everyday life, playing an important role as our appliances, servants and assistants; they will be our helpers and elder-care companions. These robots will need to acquire a sufficient understanding of the environment, be aware of different situations, and detect and track people with minimum instruction and with high quality and precision. This will enable natural human–robot interaction, and it will also allow the robot to focus its attention on one individual. Dautenhahn [1] summarised the importance of recognising a human with the following expression: ‘Humans are individuals, and they want to be treated as such’. Thus, we believe that many robots could enhance their abilities by including human recognition and discrimination in the tasks that they are already able to carry out. Robots should be able to discriminate between humans, or in general amongst similar objects, even when they move through cluttered areas, when objects overlap in the visual field, and in the presence of shadows, illumination changes, and objects being introduced into or removed from the scene. Hence, robots need robust human recognition and tracking that can account for such a wide range of effects.

As examples of service robot applications that require such person recognition, we can mention the entertainment robot SDR-4X [2], the museum or exhibition guide robot TOURBOT [3], a shopping mall guide robot by Kanda et al. [4], and also robots designed to care for the elderly, such as the well-known Care-O-Bot [5]. In all the aforementioned examples the robot needs a robust and flexible human discrimination ability to distinguish the person with whom it is interacting (the target), avoiding mistaking this target for the rest of the people present in the same scene (the distractors).

Our goal is to create a robust and adaptive human recognition system able to discriminate our target from the rest of the people. Our system must be flexible enough to handle important variations in illumination, scene clutter, multiple moving objects, and other arbitrary changes to the observed scene. Moreover, the person being recognised and followed will not need to wear special clothes or localisation gadgets, thus achieving a more natural human–robot interaction. We intend to use the system described in this paper in a general-purpose guide robot which can be deployed at different museums or events, where it will have to show routes of interest to the visitors. The robot will learn these routes by following a person who demonstrates them. We will refer to this person as the demonstrator or instructor (we will use both words interchangeably). This person will probably be a staff member of the museum or the event, without any kind of expertise in robotics. Because of this, we also need a person-following behaviour on the robot, so that the robot is able to recognise the target that must be followed, or a visitor who wants to follow the robot across the event. If our robot mistakes a distractor for the target during the route learning process, it will probably have to start learning the route all over again.

In this paper we describe our proposal for a person-following behaviour which enhances the role of the camera as a full-body human discriminator. We start by giving a general description of the design of our multimodal person-following behaviour. We highlight the major enhancements related to the camera, and we give details on how our robot merges the information from the different sensors. After this introduction to the behaviour, we detail how our discrimination algorithm works. First, we review the most important colour and texture features (extracted from the human’s torso region) which can be used for human recognition on a robot. In this part of the work, our goal is to obtain a subset of these features which achieves a good discrimination rate with a low computational cost. Then, we describe how we compare these features, and investigate mechanisms which provide our proposal with the adaptability necessary to work in crowded and changing environments. Basically, we pursue a strategy able to dynamically select those features that allow the best recognition of the person being followed at each instant. Finally, in the experimental results we show the performance of our proposal when it is applied to a real robot working in real and challenging environments.

Section snippets

Related work

In the past few years, many works have dealt with the problem of object tracking [6], [7]. However, in most cases the system is restricted to rigid objects, such as cars, or semi-rigid ones, such as faces or humans with limited variations in their pose. These works often describe strategies developed for conventional cameras located in a fixed position, far from the people or objects being tracked. In this paper we want to present a new alternative to achieve a

The person-following behaviour

An overview of the system we have developed to solve the person-following task can be seen in Fig. 1. The robot uses the information provided by a camera and a laser scanner (Fig. 2) to obtain the position of the target. This information is sent to a motion controller, which must determine the motor commands that the robot has to execute in order to follow the target while avoiding collisions with the environment.

Therefore, in our system there are two clearly separated modules which work
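
A minimal sketch of this sensor-fusion and control idea is given below, assuming hypothetical (x, y) target estimates in the robot frame from each sensor module; this is not the paper's actual controller, and the fusion rule, gains and proportional control law are all illustrative assumptions.

```python
import math

def fuse_target(cam_xy, cam_conf, laser_xy, laser_conf):
    """Blend the camera and laser estimates according to their confidences."""
    w = cam_conf + laser_conf
    return tuple((cam_conf * c + laser_conf * l) / w
                 for c, l in zip(cam_xy, laser_xy))

def motion_command(target_xy, follow_dist=1.0, k_v=0.8, k_w=1.5):
    """Derive (linear, angular) velocities that keep the target centred
    at a fixed following distance."""
    x, y = target_xy
    dist, bearing = math.hypot(x, y), math.atan2(y, x)
    v = k_v * (dist - follow_dist)   # approach/retreat to hold the distance
    w = k_w * bearing                # rotate to keep the target in front
    return v, w
```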

Analysis of an initial pool of features suitable for the task

As mentioned before, the aim of the camera module is to recognise the target from its torso. To this end, the first task that this module carries out is detecting people in the images using the algorithm developed by Dalal [26]. With this algorithm it is possible to detect areas in the image where there seems to be a person. Next, for each of these areas, the camera module extracts the visual features of the region that contains the person’s torso to determine whether the person
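
A hedged sketch of this detection step is shown below, using OpenCV's built-in HOG pedestrian detector (Dalal–Triggs); the torso-cropping fractions are illustrative assumptions, not the authors' exact values.

```python
import cv2

# HOG descriptor with the default pre-trained linear-SVM people detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_torsos(frame):
    """Return one torso crop per person detected in the BGR image `frame`."""
    boxes, _weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    torsos = []
    for (x, y, w, h) in boxes:
        # Keep the upper-middle part of each detection as the torso region.
        torsos.append(frame[y + int(0.20 * h):y + int(0.55 * h),
                            x + int(0.15 * w):x + int(0.85 * w)])
    return torsos
```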

Offline feature selection

In general, we believe that using a set of several features is beneficial for adaptive human recognition and discrimination, since a certain redundancy can help when there is little prior information about the people in the scene, or when two persons dress similarly. Nevertheless, the higher the number of features we use, the slower the process will be, and the worse the performance of the robot. Therefore, we should find a trade-off between the requirement of real-time operation and the
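
As a rough illustration of what such a redundancy analysis might look like (the paper's exact procedure is not reproduced here), one can correlate the pairwise torso distances induced by each feature over a training set and greedily discard features whose distance vectors are nearly collinear with an already selected one:

```python
import numpy as np

def select_features(dist_vectors, max_corr=0.9):
    """dist_vectors: dict feature name -> 1-D array of pairwise torso
    distances under that feature (hypothetical input format)."""
    kept = []
    for name, vec in dist_vectors.items():
        # A feature is redundant if it is strongly correlated with a kept one.
        redundant = any(abs(np.corrcoef(vec, dist_vectors[k])[0, 1]) > max_corr
                        for k in kept)
        if not redundant:
            kept.append(name)
    return kept
```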

Discrimination algorithm

The discrimination algorithm pursues the differentiation of the demonstrator (the target being followed by the robot) from the rest of the people moving in the same area (the distractors). This algorithm runs inside the camera module (the human discrimination block in Fig. 1). Basically, the discrimination algorithm uses the information of the torsos extracted from the people detected in the image to identify whether the target is present or not. To understand the process we must realise that
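
A minimal sketch of such a matcher follows, assuming histogram-type features compared with the Bhattacharyya distance and combined through per-feature weights; the distance choice, threshold and data layout are our assumptions, not necessarily the paper's exact algorithm.

```python
import numpy as np

def bhattacharyya(p, q):
    """Distance in [0, 1] between two normalised histograms."""
    return np.sqrt(max(0.0, 1.0 - float(np.sum(np.sqrt(p * q)))))

def match_target(candidates, target_model, weights, threshold=0.4):
    """candidates: list of dicts feature -> histogram; returns the index of
    the torso matching the target model, or None if the target is absent."""
    best, best_dist = None, float('inf')
    for i, cand in enumerate(candidates):
        # Weighted average of per-feature distances to the target model.
        d = sum(w * bhattacharyya(cand[f], target_model[f])
                for f, w in weights.items()) / sum(weights.values())
        if d < best_dist:
            best, best_dist = i, d
    return best if best_dist < threshold else None
```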

Online feature weighting

This section describes the process called ‘feature weighting’ in Fig. 1. This process consists of dynamically selecting the most appropriate weight for each feature in order to adapt to changes in the environment, such as the illumination conditions or people’s clothes, and thus enhance the discrimination of the target from the distractors. This process has been studied in the area of image retrieval with query relevance feedback, and consists of measuring the discrimination ability of each feature
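
One illustrative scoring function in this spirit (an assumption on our part; the paper defines its own set of scoring functions) rewards a feature when the target is close to its model while the distractors are far from it, i.e. when the feature separates them well:

```python
import numpy as np

def update_weights(target_dists, distractor_dists):
    """target_dists: dict feature -> distance of the target torso to the model;
    distractor_dists: dict feature -> list of distractor distances."""
    scores = {}
    for f, d_target in target_dists.items():
        d_distr = np.mean(distractor_dists[f]) if distractor_dists[f] else 1.0
        scores[f] = float(np.clip(d_distr - d_target, 0.0, 1.0))  # margin score
    total = sum(scores.values())
    if total == 0.0:
        return {f: 1.0 / len(scores) for f in scores}  # fall back to uniform
    return {f: s / total for f, s in scores.items()}   # normalised weights
```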

Experimental results

We have implemented our proposal (Fig. 1) on a Pioneer P3DX mobile robot, equipped with a SICK LMS200 laser scanner and a PointGrey Chameleon CMLN-13S2C camera fitted with a Fujinon (Fujifilm) vari-focal CCTV lens (omnidirectional). The robot’s on-board computer was a laptop with a Core 2 Duo P8600 (2.4 GHz) processor and 4 GB of RAM. The whole system (Fig. 1), excluding the human detection process (the human detector block in Fig. 1), runs in about 20 ms per cycle.

We ran several experiments at the Department of

Conclusions

Real-time human recognition and tracking is highly relevant in robotics, particularly for service robots requiring advanced human–robot interaction capabilities. This article presents an approach to this task for a robot that must detect and follow a specific human (the target) in crowded environments where the illumination conditions might change drastically.

We present a mechanism that combines colour and texture features to characterise human appearance. Our proposal is able to

References (40)

  • A. Yilmaz, et al., Object tracking: a survey, ACM Computing Surveys (CSUR), 2006.
  • M. Mucientes, W. Burgard, Multiple hypothesis tracking of clusters of people, in: Intelligent Robots and Systems, 2006...
  • K.O. Arras, S. Grzonka, M. Luber, W. Burgard, Efficient people tracking in laser range data using a multi-hypothesis...
  • M. Kleinehagenbrock, S. Lang, J. Fritsch, F. Lomker, G.A. Fink, G. Sagerer, Person tracking with a mobile robot based...
  • N. Bellotto, et al., Multisensor-based human detection and tracking for mobile service robots, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 2009.
  • C. Wong, D. Kortenkamp, M. Speich, A mobile robot that recognizes people, in: Tools with Artificial Intelligence, 1995,...
  • S. Chen, et al., A real-time face detection and recognition system for a mobile robot in a complex background, Artificial Life and Robotics, 2010.
  • S. Lang, M. Kleinehagenbrock, S. Hohenner, J. Fritsch, G.A. Fink, G. Sagerer, Providing the basis for...
  • H. Zhou, et al., Target detection and tracking with heterogeneous sensors, IEEE Journal of Selected Topics in Signal Processing, 2008.
  • J. Fritsch, M. Kleinehagenbrock, S. Lang, G.A. Fink, G. Sagerer, Audiovisual person tracking with a mobile robot, in:...

V. Alvarez-Santos received the B.S. degree in Computer Science Engineering and the M.S. degree in Information Technologies from the University of Santiago de Compostela (Spain) in 2010 and 2011, respectively. He is currently doing his Ph.D. at the University of Santiago de Compostela. His research interests include human–robot interaction, robot vision, and robot learning.

X.M. Pardo is a Professor of Software and Computer Systems at the University of Santiago de Compostela. He received a Ph.D. in Physics from the same university in 1998, and was a post-doctoral research fellow at the Computer Vision Center (Barcelona, Spain) and at INRIA Sophia-Antipolis (France) between 1998 and 2000. His research interests include visual saliency, object and scene recognition, human activity recognition and machine learning. At the moment, he is mostly working on projects related to robot vision, photogrammetry and dynamic visual attention models.

R. Iglesias received the B.S. and Ph.D. degrees in Physics from the University of Santiago de Compostela, Spain, in 1996 and 2003, respectively. He is currently an Associate Professor in the Department of Electronics and Computer Science at the University of Santiago de Compostela, Spain. His research interests focus on control and navigation in mobile robotics, continuous robot and machine learning (mainly reinforcement learning and artificial neural networks), and scientific methods in mobile robotics (modelling and characterisation of robot behaviour).

A. Canedo-Rodriguez received the B.S. degree in Telecommunication Engineering from the University of Vigo (Spain) in 2010 and the M.S. degree in Information Technologies from the University of Santiago de Compostela (Spain) in 2011. He is currently doing his Ph.D. at the University of Santiago de Compostela. His research interests focus on ubiquitous robotics, computer vision, and sensor networks applied to robotics.

C.V. Regueiro received the B.S. and Ph.D. degrees in Physics from the University of Santiago de Compostela, Spain, in 1992 and 2002, respectively. Since December 1993 he has been an Associate Professor in the Faculty of Computer Science at the University of A Coruna, Spain, where he teaches undergraduate and graduate courses on computer architecture. His research interests focus on control architectures, perception, control, localisation, navigation and machine learning in mobile robotics.

This work was funded by the research grants TIN2009-07737 and INCITE08PXIB262202PR.
