Learning search heuristics for finding objects in structured environments
Highlights
► A technique for efficiently finding an object with a mobile robot in an unknown, structured environment. ► Improving search efficiency by learning domain-specific search heuristics. ► Search heuristics are learned in training environments and evaluated in an unknown test environment. ► Example scenario: searching for a product in an unknown supermarket. ► Simulation and real-world experiments and a comparison to alternative methods and to the performance of humans.
Introduction
Consider the situation where you want to find a product in a supermarket that you have never been to before. You will certainly not just wander randomly through the market, nor will you systematically visit every not-yet-visited aisle until you find the product. Instead, your search will be guided by your current observations and by your expectations about how objects in supermarkets are usually arranged. Over time, you may even have developed heuristics that have proved useful for quickly finding a certain product, like “if you want to find yogurt, follow the aisle with the cooling shelves”.
The search for a product in an unknown supermarket is a problem everyone is familiar with, which makes it an illustrative instance of the kind of search problems we want to tackle with the techniques presented in this article. Even for humans [1], this task is not easy, and we will therefore also compare our search techniques to the performance of human participants who took part in a field study conducted in a real supermarket [2]. However, the supermarket is just an example scenario; searching for objects or places in offices or domestic environments is conceptually similar. All we assume is that the environment is structured and that the object arrangements exhibit spatial dependencies, so that generalization to an unknown but similarly structured environment is possible. We regard this as a rather weak assumption that holds for a wide variety of man-made environments. We thus strive for a general way to model, learn, and utilize background knowledge such that a mobile robot can find an object more efficiently than would be possible without such domain-specific knowledge. To this end, we present and evaluate two approaches, which provide alternative views on the search problem.
The first approach is a reactive search technique that depends only on local information about the objects in the robot’s vicinity when deciding where to search next. This approach emphasizes that search is a sequential decision-making process: in each state, we must choose among a set of available actions. In this setting, background knowledge can be encoded as a state-to-action mapping, a policy, that tells us what to do in a given situation. In the supermarket scenario, a state includes the objects currently observed in the direction of the different aisles, and the available actions correspond to the aisles we may choose to visit next. To learn this state-to-action mapping, we draw on ideas from imitation learning [3]. In particular, we want to imitate a simulated robot that exhibits optimal search behavior by approaching the target object on the shortest path. In each visited state of a demonstrated example path, the robot takes a certain action and discards the other available actions in this state. It thereby provides positive and negative examples of state-action pairs, i.e., actions to be taken or not, respectively. These examples can be used to learn a classifier for state-action pairs, which yields a classifier-based policy representation [4]. This classifier might either be a multi-class classifier that directly outputs the action to be taken in a given state, or a binary classifier that labels each available action in a state as promising or non-promising (if several actions are labeled promising, we may choose randomly among them). The latter has been shown empirically [4] to yield policies that perform better than those represented by multi-class classifiers. We use decision trees as binary classifiers, which result in compact policy representations that resemble search heuristics like the above-mentioned heuristic for finding yogurt.
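The following Python sketch illustrates how one optimal demonstration could be turned into positive and negative state-action examples, and how a binary classifier could then act as a policy. It rests on simplifying assumptions that are not from the article: the environment is an unweighted graph given as a dict of neighbor lists, a state is just the current node (rather than the objects observed in the direction of each aisle), the optimal path is computed with breadth-first search, and `is_promising` stands in for any trained binary classifier. The names `shortest_path`, `demonstration_examples`, and `classifier_policy` are hypothetical.

```python
import random
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search on an unweighted graph {node: [neighbors]}.

    Returns the node sequence from start to goal, or None if goal is unreachable.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if goal not in parent:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def demonstration_examples(graph, start, goal):
    """Turn one optimal demonstration into labeled (state, action, label) triples.

    In every state on the shortest path, the action taken is a positive example (1)
    and every other available action is a negative example (0).
    """
    path = shortest_path(graph, start, goal)
    if path is None:
        return []
    examples = []
    for state, taken in zip(path, path[1:]):
        for action in graph[state]:
            examples.append((state, action, 1 if action == taken else 0))
    return examples


def classifier_policy(state, actions, is_promising, rng=random):
    """Binary-classifier policy: pick uniformly among the actions labeled promising.

    Falling back to all actions when none is promising is an assumption of this sketch.
    """
    promising = [a for a in actions if is_promising(state, a)]
    return rng.choice(promising or list(actions))


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
    print(demonstration_examples(graph, "A", "D"))
    # [('A', 'B', 1), ('A', 'C', 0), ('B', 'A', 0), ('B', 'D', 1)]
```

Random tie-breaking among promising actions follows the description above; the fallback to all actions when nothing is labeled promising is only one possible choice.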
The second approach treats the search problem as an inference problem. This is motivated by the observation that we constantly reason about the location of the object while searching for it. This reasoning is influenced by the objects and the structure of the environment observed so far, as well as by our expectations about usual object arrangements in such environments. In the supermarket scenario this means, for example, that if we are searching for beer and observe milk in one aisle, we may conclude that the beer is probably not in the same aisle. In this setting, background knowledge is encoded as expectations about how objects co-occur. However, co-occurrence of objects can only be defined with regard to a spatial context, such as objects being “in the same aisle” or one object being “in the neighboring aisle” of the other. Each particular spatial context induces a different local co-occurrence model. In general, there is no single best spatial context, and object arrangements in real-world environments are too complex to be faithfully represented by any one of these rather basic models alone. Nevertheless, each local model captures useful statistical properties of such object arrangements. Based on these considerations, and motivated by the idea of combining an ensemble of base classifiers into a more robust classifier, we proceed as follows: we use a diverse set of local co-occurrence models, each considering a different spatial relation, and fuse their outputs as features in a maximum entropy (MaxEnt) model [5], [6], which in our case models the discrete distribution over all possible locations of the target object. The robot then essentially moves to the location that most likely contains the target object. Each time new information becomes available, e.g., newly detected objects or newly discovered parts of the environment, the robot recomputes the distribution.
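A minimal sketch of the MaxEnt fusion step, assuming each local co-occurrence model has already been reduced to one numeric feature per candidate location (for instance, a hypothetical score of how strongly the target co-occurs with the objects seen in that aisle). The distribution has the log-linear form p(l) ∝ exp(Σ_i λ_i f_i(l)). The weights are fit here by plain gradient ascent on an L2-regularized log-likelihood over example environments, which is an illustrative choice rather than the training procedure used in the article. All function names are hypothetical.

```python
import math


def maxent_distribution(features, weights):
    """Log-linear (maximum entropy) distribution over candidate target locations.

    features: {location: [f_1, ..., f_k]}, one feature per local co-occurrence model
    weights:  [lambda_1, ..., lambda_k]
    Returns {location: p} with p(location) proportional to exp(sum_i lambda_i * f_i).
    """
    scores = {loc: sum(w * f for w, f in zip(weights, fs)) for loc, fs in features.items()}
    top = max(scores.values())  # shift by the max score for numerical stability
    unnorm = {loc: math.exp(s - top) for loc, s in scores.items()}
    z = sum(unnorm.values())
    return {loc: u / z for loc, u in unnorm.items()}


def train_weights(cases, k, lr=0.5, epochs=200, l2=0.01):
    """Fit weights by gradient ascent on the L2-regularized log-likelihood.

    cases: list of (features, true_location) pairs from example environments.
    Gradient: average over cases of f(true_location) - E_p[f], minus l2 * weights.
    """
    weights = [0.0] * k
    for _ in range(epochs):
        grad = [-l2 * w for w in weights]
        for features, true_loc in cases:
            p = maxent_distribution(features, weights)
            for i in range(k):
                expected = sum(p[loc] * fs[i] for loc, fs in features.items())
                grad[i] += (features[true_loc][i] - expected) / len(cases)
        weights = [w + lr * g for w, g in zip(weights, grad)]
    return weights


def next_location(features, weights):
    """Location the robot would move to: the mode of the current distribution."""
    p = maxent_distribution(features, weights)
    return max(p, key=p.get)
```

When new objects are detected or new parts of the environment are discovered, the feature values change and `next_location` is simply called again with the updated `features`, mirroring the recomputation described above.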
These two approaches have quite different properties. The first approach uses only local information, as it depends only on the objects in the vicinity of the robot, while the second takes into account all observations made so far. Furthermore, the underlying model of the first approach is learned by observing optimal search behavior, while the model of the second is learned from object arrangements of similarly structured example environments.
This article is organized as follows. Following a discussion of related work, Section 3 introduces the representation of the supermarket environments used by the first approach, the decision-tree strategy. Section 4 describes how the search heuristics of this strategy are learned from data of optimal search paths. Section 5 then describes several alternative search strategies, including variants of the decision-tree strategy, the inference-based approach, and an exploration strategy that serves as a baseline. Finally, Section 6 presents the results of an experimental evaluation including simulation and real-world experiments. The results demonstrate that our proposed techniques yield significantly shorter search paths than a search strategy that does not take domain-specific information into account.
Related work
There exists considerable theoretical work on general search problems in the fields of robotics and artificial intelligence [7] as well as operations research [8]. Finding an optimal search path in a graph that either minimizes the expected time to detection [9] or the expected search costs [10] is known to be NP-hard. Besides complexity considerations in theoretical work, some prior work evaluated proposed search strategies in simulation. The approach presented in [11], for example, used a
Modeling the environment
A supermarket contains a set of shelves and a graph , as illustrated in Fig. 1. Each shelf is associated with a location and an orientation . The relation associates each shelf with its corresponding market. Furthermore, we define a set of shelf types and each shelf is associated with exactly one type as defined by the relation . Each shelf contains at least one product and the same product might
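The symbols of the formal model did not survive in this excerpt, so the following Python dataclasses are only a hypothetical sketch of the entities the text names: shelves with a location, an orientation, exactly one shelf type, and at least one product, grouped into a market that also holds a graph. Field names, the coordinate and angle representation, and the undirected adjacency-set graph are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Shelf:
    shelf_id: str
    location: tuple        # (x, y) map coordinates (assumed representation)
    orientation: float     # heading in radians (assumed representation)
    shelf_type: str        # exactly one type per shelf, e.g. "cooling"
    products: frozenset    # every shelf holds at least one product


@dataclass
class Supermarket:
    name: str
    shelves: list = field(default_factory=list)
    graph: dict = field(default_factory=dict)  # node -> set of adjacent nodes

    def add_shelf(self, shelf):
        # Enforce the constraint that each shelf contains at least one product.
        if not shelf.products:
            raise ValueError(f"shelf {shelf.shelf_id} has no products")
        self.shelves.append(shelf)  # membership in self.shelves encodes shelf -> market

    def add_edge(self, u, v):
        # Undirected edges are an assumption of this sketch.
        self.graph.setdefault(u, set()).add(v)
        self.graph.setdefault(v, set()).add(u)

    def shelves_with(self, product):
        """All shelves in this market whose product set contains the product."""
        return [s for s in self.shelves if product in s.products]
```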
Learning search heuristics
We are interested in learning a reactive search strategy that depends only on local information in order to find a certain target product. We therefore use a decision tree to classify the outgoing edges of the current node into promising and non-promising directions, based on the information associated with each edge. To learn such a decision tree, we first need to define appropriate edge attributes and then generate training data by observing optimal search paths in a training set of
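The text does not state which tree-induction algorithm is used, so the sketch below substitutes a basic ID3-style learner (information gain over categorical attributes) to show how labeled edges could yield a promising/non-promising classifier. The attribute names in the example, such as `shelf_type`, are hypothetical stand-ins for the edge attributes defined in the article.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(examples, attr):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    labels = [y for _, y in examples]
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def learn_tree(examples, attributes):
    """ID3-style tree over categorical edge attributes.

    examples: list of (attribute_dict, label), label 1 = promising, 0 = non-promising.
    Returns a leaf label or a tuple (attribute, {value: subtree}, majority_label).
    """
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    best = max(attributes, key=lambda a: information_gain(examples, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        branches[value] = learn_tree(subset, rest)
    return (best, branches, majority)


def classify(tree, edge_attrs):
    """Walk the tree; unseen attribute values fall back to the node's majority label."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        value = edge_attrs.get(attr)
        if value not in branches:
            return majority
        tree = branches[value]
    return tree
```

A tree that splits on a hypothetical `shelf_type` attribute with a “cooling” branch labeled promising reads much like the yogurt heuristic from the introduction: follow the edge toward the cooling shelves.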
Alternative search strategies
Besides discussing some variants of our proposed strategy based on decision trees, we describe two other search strategies (one of them the inference-based approach) and evaluate them in comparison to the decision-tree strategy. The results are presented in the next section.
Experimental evaluation
In the following, we present several experimental evaluations. The first experiment compares the performance of the different search strategies with that of humans searching in a real supermarket environment. The second experiment is a more thorough evaluation of the search strategies, although we do not have data from human participants for this setting. Finally, we present the results obtained by real-world experiments in which a robot
Conclusions
We presented two approaches for efficiently finding an object in an unknown environment. The first approach, which was the focus of this article, is a reactive search technique that only depends on local information about the objects in the robot’s vicinity when deciding where to search next. This strategy is based on search heuristics that can be learned from data of optimal search paths. As a proof of concept, we presented real-world experiments in which a mobile robot searched autonomously
Acknowledgements
This work has been supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) under contract number SFB/TR 8 Spatial Cognition (R6-[SpaceGuide]). The field study involving human participants in the real supermarket was carried out by Christopher Kalff and colleagues of the Center for Cognitive Science at the University of Freiburg, Germany. The support provided by Jörg Müller (Department of Computer Science, University of Freiburg) during the preparation of the
References (24)
- et al., A survey of robot learning from demonstration, Robotics and Autonomous Systems (2009).
- Optimal search with positive switch cost is NP-hard, Information Processing Letters (1985).
- et al., Bayesian space conceptualization and place classification for semantic maps in mobile robotics, Robotics and Autonomous Systems (2008).
- et al., Conceptual spatial representations for indoor mobile robots, Robotics and Autonomous Systems (2008).
- et al., Robot task planning using semantic maps, Robotics and Autonomous Systems (2008).
- et al., Consumer wayfinding tasks, strategies, and errors: An exploratory field study, Psychology and Marketing (1996).
- C. Kalff, G. Strube, Where is the fresh yeast? The use of background knowledge in human navigation, in: Spatial...
- I. Rexakis, M.G. Lagoudakis, Classifier-based policy representation, in: Proc. of the Int. Conf. on Machine Learning...
- Information theory and statistical mechanics, The Physical Review (1957).
- et al., A maximum entropy approach to natural language processing, Computational Linguistics (1996).
- Agent-centered search, AI Magazine.
- A survey of the search theory literature, Naval Research Logistics.
Cited by (32)
Generalized Object Search
2023, arXiv
Kimera: From SLAM to spatial perception with 3D dynamic scene graphs
2021, International Journal of Robotics Research
Searching for objects in human living environments based on relevant inferred and mined priors
2021, 2021 10th European Conference on Mobile Robots, ECMR 2021 - Proceedings
Efficient object search through probability-based viewpoint selection
2020, IEEE International Conference on Intelligent Robots and Systems
Dominik Joho is a research assistant at the University of Freiburg, Germany working in the Laboratory for Autonomous Intelligent Systems headed by Wolfram Burgard. He studied computer science at the University of Freiburg and received his diploma degree in 2007. He is currently working towards his Ph.D. degree. His general research interests lie in artificial intelligence, machine learning, and mobile robotics.
Martin Senk studies computer science at the University of Freiburg, Germany. He received his bachelor’s degree from the University of Freiburg in 2009 and is currently working towards his master’s degree.
Wolfram Burgard is a professor of computer science at the University of Freiburg, Germany, where he heads the Laboratory for Autonomous Intelligent Systems. He received his Ph.D. degree in computer science from the University of Bonn in 1991. His areas of interest lie in artificial intelligence and mobile robots. In the past, Wolfram Burgard and his group developed several innovative probabilistic techniques for robot navigation and control, covering different aspects such as localization, map-building, path-planning, and exploration. For his work, Wolfram Burgard received several best paper awards from outstanding national and international conferences. In 2009, Wolfram Burgard received the Gottfried Wilhelm Leibniz Prize, the most prestigious German research award.