1 Introduction

Negotiation is a process in which parties interact to settle a mutual concern to improve their status quo. Negotiation is a core activity in human society, and is studied by various disciplines, including economics [147, 158], artificial intelligence [69, 91, 106, 107, 118, 182], game theory [19, 69, 91, 118, 120, 147, 172], and social psychology [170].

Traditionally, negotiation is a necessary, but time-consuming and expensive activity. Therefore, in the last two decades, there has been a growing interest in the automation of negotiation and e-negotiation systems [18, 71, 91, 99, 107], for example in the setting of e-commerce [20, 79, 105, 126]. This attention has been growing since the beginning of the 1980s with the work of early adopters such as Smith’s Contract Net Protocol [186], Sycara’s persuader [189, 190], Robinson’s oz [164], and the work by Rosenschein [168] and Klein [102]. The interest is fueled by the promise of automated agents being able to negotiate on behalf of human negotiators and to find better outcomes than human negotiators [20, 55, 89, 121, 126, 149, 192].

The potential benefits of automation include reduced time and negotiation costs [33–35, 126], a potential increase in negotiation usage when the user can avoid social confrontation [27, 126], the ability to improve the negotiation skills of the user [80, 121, 125], and the possibility of finding more interesting deals by exploring more promising portions of the outcome space [80, 126].

One of the key challenges for a successful negotiation is that usually only limited information is available about the opponent [143]. Despite the fact that sharing private information can result in mutual gains, negotiators are unwilling to share information in situations with a competitive aspect to avoid exploitation by the other party [50, 75, 81, 158]. In an automated negotiation, this problem can be partially overcome by deriving information from the offers that the agents exchange with each other. Taking advantage of this information to learn aspects of the opponent is called opponent modeling.

Having a good opponent model is a key factor in improving the quality of the negotiation outcome and can further increase the benefits of automated negotiation, including the following: reaching win-win agreements [90, 123, 206]; minimizing negotiation cost by avoiding non-agreement [151, 153, 183, 184]; and finally, avoiding exploitation by adapting to the opponent’s behavior during the negotiation [57, 85, 199]. Experiments have shown that by employing opponent models, automated agents can reach more efficient outcomes than human negotiators [22, 124, 149].

Besides improving the quality of the negotiation process, opponent models are essential for the transition of automated negotiation from theory to practice. It has been shown that non-adaptive agents are exploitable given a sufficiently large negotiation history as their behavior becomes predictable [24, 135]. The risk of exploitation can be minimized by creating adaptive agents that use opponent models to adapt their behavior.

Despite the advantages of creating an opponent model and two decades of research, there is no recent study that provides either an overview of the field, or a comparison of different opponent modeling techniques. Therefore, in order to stimulate the development of efficient future opponent models, and to outline a research agenda for the field of opponent modeling in negotiation, this survey provides an overview of existing opponent models and their underlying concepts. It discusses how to select the best model depending on the negotiation setting, and identifies a number of problems that are still open. One of our major findings is that despite the variety in opponent modeling techniques, most current models rely on a small, common set of learning techniques. Furthermore, it turns out that there are only four types of opponent attributes that are learned by these techniques.

Apart from employing different techniques to build an opponent model, different benchmarks have been used to test the effectiveness of opponent models. This makes it particularly difficult to compare present techniques. An additional contribution of this work is to give an exhaustive overview of measures that are used throughout the literature. We distinguish two types of measures, and we recommend which measures to use to reliably quantify the quality of an opponent model.

The opponent modeling techniques discussed in this work are applicable to a wide variety of negotiation protocols. Protocols may differ in many aspects, including the domain configuration, the number of agents, the state of the issue(s), and the availability of information. Table 1 provides an overview of the scope of this survey, distinguishing the main parameters of negotiation protocols as defined by Lomuscio et al. [126] and Fatima et al. [65].

Table 1 Types of negotiation settings discussed in this work. Classification based on Lomuscio et al. [126]

Finally, note that the problems involved in automated negotiation are very different from those in human negotiation. In negotiation sessions between humans, as few as ten bids may be exchanged, whereas negotiating agents may exchange thousands of bids in less than a minute. Humans may compensate for this lack of information exchange by explicitly communicating information about their preferences, both verbally and nonverbally. To delimit our scope, we do not discuss attributes that are relevant in human negotiations but are not yet used in automated negotiation, such as emotions [54, 100].

The remainder of our work is organized as follows. We start by providing an overview of related surveys in Sect. 2. Section 3 sets out the basic concepts of bilateral negotiation. Section 4 describes the fundamentals underlying the learning methods that have been applied to construct an opponent model. Different opponent models are created to learn different negotiation aspects; we introduce our taxonomy of the various concepts that are learned, and how they are learned, in Sect. 5. Section 6 provides recommendations on how to measure the quality of an opponent model. Finally, in Sect. 7 we cover the lessons learned, we examine the latest trends, and we provide directions for future work.

2 Related surveys

The field of automated negotiation has produced over 2000 papers in the last two decades. This work covers the period from the first opponent models introduced around 1997 (cf. [205]) to the latest models developed in 2014 (cf. [37, 46, 76, 77, 92]). During this period, several surveys have been conducted that are related to our work, including surveys by Beam and Segev [18], Papaioannou et al. [152], Masvoula et al. [134], Yang [203], and Chen and Pu [42]. Our work incorporates all techniques for bilateral negotiations covered in these surveys, as we consider various types of opponent models based on multiple different learning techniques, including Bayesian learning and artificial neural networks. In comparison to these surveys, we discuss a larger body of research and categorize the opponent models based on the aspect of the opponent they aim to model. Furthermore, we provide an overview of measures used to quantify the quality of opponent models, and provide guidelines on how to apply these metrics.

Beam and Segev surveyed the state of the art in automated negotiation in 1997 [18]. Their work describes machine learning techniques applied by intelligent negotiation agents, mainly discussing the potential of genetic algorithms to learn an effective negotiation strategy. Their survey naturally misses out on more recent developments, such as on-line opponent modeling techniques used in one-shot negotiations, as for example introduced by Buffett and Spencer [31, 32] and Hindriks and Tykhonov [83]. More recently, Papaioannou et al. surveyed learning techniques based on neural networks to model the opponent’s behavior in both bilateral and multilateral negotiations [152]. Masvoula et al. also surveyed learning methods to enhance the strategies of negotiation agents [134]. One of the strengths of their survey is that it provides a comprehensive overview of learning methods. The modeling techniques are divided based on the type of strategy in which they are applied. Finally, Chen and Pu survey preference elicitation methods for user modeling in decision support systems [42]. The goal of these systems is to capture the user’s preferences in a setting in which the user is willing to share their preferences, or at least does not try to misrepresent them. While the goal of decision support systems differs from opponent modeling in automated negotiation, similar learning techniques—such as pattern matching—are used to estimate the user’s or opponent’s preferences.

A number of surveys have been conducted on the general topic of automated negotiation, for example by Jennings et al. [91], Kraus [107], Braun et al. [25], and Li et al. [118]. Jennings et al. argue that automated negotiation is a main concern for multi-agent system research [91]; and Kraus examines economic theory and game-theory techniques for reaching agreements in multi-agent environments [107]. Braun et al. review electronic negotiation systems and negotiation agents, concisely describing how learning techniques have been used to learn characteristics of the opponent [25]. Li et al. distinguish different types of negotiation, and briefly discuss opponent modeling [118]. Despite the wide scope of all of the surveys above, their discussion of opponent modeling is limited.

Negotiation is also studied as an extensive-form game within the game theory literature [118], a field of study founded on the work by Nash [141] and Raiffa [157]. In cooperative game theory, the aim is to jointly find a solution within the outcome space that satisfies particular axioms, an example being the Nash outcome (see also Sect. 6.1 on accuracy measures) that satisfies the Nash axioms [141]. Non-cooperative game theory is concerned with identifying rational behavior using the concept of strategy equilibrium: the state in which for every agent it is not beneficial to change strategy assuming the other agents do not switch their tactic [148].

The game theory literature on the topic of negotiation is vast. For an overview we refer to Binmore and Vulkan [19]; Li and Giampapa [118]; and Chatterjee [40]. One prominent example of game theoretic negotiation research is by Rubinstein [172], who considers an alternating offers negotiation protocol without deadline in which two agents negotiate about the division of a pie. Another example is the work by Zlotkin and Rosenschein [208], which investigates a monotonic concession strategy that results in a strategy equilibrium. As outlined in [64], agents do not typically perform opponent modeling in the game theoretic model, but instead determine their strategy through theoretical analysis, which is possible because of the assumption of perfect rationality. The assumption of common knowledge—an assumption typically made in cooperative game theory—can lead to difficulties in practice [40, 41, 64, 118] as competitive agents aim to not share information to prevent exploitation [50, 75, 81, 158]. Other practical issues include the computational intractability of full agent rationality [40, 41, 91, 118] and the applicability of game theoretical results to specific negotiation settings only [19, 91, 118]. Despite these concerns, several authors have promoted the application of game theory results in the design of heuristic and learning negotiation strategies [91, 118]. For instance, evolutionary game theory (EGT) is a framework to describe the dynamics and evolution of strategies under the pressure of natural selection [178]. In this approach, negotiating agents can learn the best strategy through repeated interactions with their opponents. This has just started to make its impact on research into the negotiation dynamics of multi-agent bargaining settings [10, 43]. In Sect. 6.2, we discuss EGT as a way to quantify the robustness of a negotiation strategy to exploitability in an open negotiation environment.

An interesting area, although out of the scope of this paper, is that of user modeling in general (see e.g., [136] for a survey by McTear on the topic), and in particular the machine learning of dialogue-management strategies surveyed by Schatzmann and colleagues [180]. McTear’s work surveys artificial intelligence techniques applied to user modeling and is by now 20 years old (a newer one has not been published to date). Characteristics of users modeled by AI techniques include goals, plans, capabilities, attitudes, preferences, knowledge, and beliefs. The parts relevant to our survey are the preference profiling and the distinctions between learning models of individual users versus models for classes of users, and between models for one session and models maintained and updated over several sessions.

A survey on preference modeling is by Braziunas and Boutilier [26] and focuses on direct elicitation methods; i.e., by asking direct questions to the user and is therefore not in the scope of this paper. Schatzmann’s survey [180] addresses systems and methods to learn a good dialogue strategy for which automatic user simulation tools are essential. The methods to learn these strategies can be relevant for argumentation-based negotiation systems.

Another related area of research is the use of machine learning techniques in game playing, e.g., checkers, rock-paper-scissors, Scrabble, Go, and bridge. Fürnkranz argues that opponent modeling has not yet received much attention in the computer games community [67]; take for example chess, in which opponent modeling is not a critical component. However, it is essential in other games, such as computer poker [171]. This is because, as in negotiation, maximizing the reward against an effectively exploitable opponent is potentially more beneficial than exhibiting optimal play [171]. These surveys make several of the distinctions that we also make, such as between offline and online learning, and they employ many techniques that can also be used in negotiation, such as Bayesian learning and neural networks.

3 Preliminaries

Before we discuss opponent models, we first introduce the terminology used throughout the paper. The defining elements of a bilateral negotiation are depicted in Fig. 1. A bilateral automated negotiation concerns a negotiation between two agents, usually called A and B or buyer and seller. The party that is negotiated with is also called the partner or opponent.

The negotiation setting consists of the negotiation protocol, the negotiating agents, and the negotiation scenario. The negotiation protocol defines the rules of encounter to which the negotiating agents have to adhere. The negotiation scenario takes place in a negotiation domain, which specifies all possible outcomes (the so-called outcome space). The negotiation agents have a preference profile, which expresses the preference relations between the possible outcomes. Together, this defines the negotiation scenario that takes place between the agents. The negotiation scenario and protocol specify the possible actions an agent can perform, given the negotiation state.

Fig. 1 Overview of the defining elements of an automated bilateral negotiation

3.1 Negotiation domain

The negotiation domain—or outcome space—is denoted by \({\varOmega }\) and defines the set of possible negotiation outcomes. The domain size is the number of possible outcomes \(|{\varOmega }|\). A negotiation domain consists of one or more issues, which are the main resources or considerations that need to be resolved through negotiation; for example, the price or the color of a car that is for sale. Issues are also sometimes referred to as attributes, but we reserve the latter term for opponent attributes, which are properties that may be useful to model to gain an advantage in a negotiation.

To reach an agreement, the agents must settle on a specific alternative or value for each negotiated issue. That is, an agreement on n issues is an outcome that is accepted by both parties of the form \(\omega =\langle \omega _1,\ldots , \omega _n\rangle \), where \(\omega _i\) denotes a value associated with the ith issue. We will focus mainly on settings with a finite set of discrete values per issue. A partial agreement is an agreement on a subset of the issues. We say that an outcome space defined by a single issue is a single-issue negotiation, and a multi-issue negotiation otherwise.
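As a minimal sketch of these definitions, the following Python fragment enumerates the outcome space of a small discrete multi-issue domain as the Cartesian product of the issue values. The car domain, its issue names, and its values are hypothetical and serve only as an illustration.

```python
from itertools import product

# Hypothetical two-issue domain (illustrative values, not taken from the literature).
issues = {
    "price": [10000, 12000, 14000],
    "color": ["red", "blue"],
}


def outcome_space(issues):
    """Return all outcomes <w_1, ..., w_n>, with one value chosen per issue."""
    return list(product(*issues.values()))


omega = outcome_space(issues)
domain_size = len(omega)  # |Omega| equals the product of the issue sizes
```

Because the domain size is the product of the number of values per issue, it grows exponentially with the number of issues, which is why explicit enumeration is only practical for small domains.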

Negotiating agents can be designed either as general purpose negotiators, that is, domain-independent [122] and able to negotiate in many different settings, or suitable for only one specific domain (e.g., the Colored Trail domain [66, 68], or the Diplomacy game [52, 56, 108]). There are obvious advantages to having an agent designed for a specific domain: it enables the agent designer to construct more effective strategies that exploit domain-specific information. However, this is also one of the major weaknesses, as such agents need to be tailored to every new available domain and application; this is why many of the agents and learning mechanisms covered in this survey are domain-independent.

3.2 Negotiation protocol

A negotiation protocol fixes the rules of encounter [169], specifying which actions each agent can perform at any given moment. Put another way, it specifies the admissible negotiation moves. The literature discussed in this survey assumes that the protocol is shared knowledge, and that the agents strictly adhere to it. Our focus here is on bilateral negotiation protocols. For other work on one-to-many and many-to-many negotiations (for example to learn when to pursue more attractive outside options in a setting with multiple agents), we refer to [3, 119, 142, 147, 156, 183]. We do not aim to provide a complete overview of all protocols; instead, we recommend Lomuscio et al. [126] for an overview of high-level parameters used to classify them, and Marsa-Maestre et al. [128] for guidelines on how to choose the most appropriate protocol for a particular negotiation problem.

An often used negotiation protocol in bilateral automated negotiation is the alternating offers protocol, which is widely studied and used in the literature, both in game-theoretic and heuristic settings (a non-exhaustive list includes [61, 107, 109, 147, 148]). This protocol dictates that the two negotiating agents propose outcomes, also called bids or offers, in turns. That is, the agents create a bidding history: one agent proposes an offer, after which the other agent proposes a counter-offer, and this process is repeated until the negotiation is finished, for example by time running out, or by one of the parties accepting.

In the alternating offers setting, when agent A receives an offer \(x_{B\rightarrow A}\) from agent B, it has to decide at a later time whether to accept the offer, or to send a counter-offer \(x_{A\rightarrow B}\). Given a bidding history between agents A and B, we can express the action performed by A with a decision function [62, 181]. The resulting action is used to extend the current bidding history between the two agents. If the agent does not accept the current offer, and the deadline has not been reached, it will prepare a counter-offer by using a negotiation strategy or tactic to generate new values for the negotiable issues (see Sect. 3.6).
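The turn-taking described above can be sketched as a simple loop. This is an illustrative sketch only: the `PriceAgent` class, its fixed-step concession rule, its acceptance rule, and the deadline expressed as a maximum number of offers are all assumptions made for the example, not the decision function of [62, 181].

```python
class PriceAgent:
    """Hypothetical single-issue agent that concedes a fixed step per offer."""

    def __init__(self, name, start, step, is_buyer):
        self.name = name
        self.next_price = start      # price this agent will propose next
        self.step = step
        self.is_buyer = is_buyer

    def propose(self, history):
        price = self.next_price
        # A buyer concedes by raising its price, a seller by lowering it.
        self.next_price += self.step if self.is_buyer else -self.step
        return price

    def accepts(self, offer, history):
        # Accept when the offer is at least as good as our own next proposal.
        if self.is_buyer:
            return offer <= self.next_price
        return offer >= self.next_price


def alternating_offers(agent_a, agent_b, max_offers):
    """Return (agreement, bidding history); agreement is None at the deadline."""
    history = []
    proposer, responder = agent_a, agent_b
    for _ in range(max_offers):
        offer = proposer.propose(history)
        history.append((proposer.name, offer))
        if responder.accepts(offer, history):
            return offer, history
        proposer, responder = responder, proposer   # roles alternate
    return None, history


seller = PriceAgent("A", start=100, step=10, is_buyer=False)
buyer = PriceAgent("B", start=50, step=10, is_buyer=True)
agreement, history = alternating_offers(seller, buyer, max_offers=20)
```

The bidding history is passed to both `propose` and `accepts` even though these toy agents ignore it; an agent that models its opponent would derive its estimates from exactly this history.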

Various alternative versions of the alternating offers protocol have been used in automated negotiation, extending the default protocol and imposing additional constraints. For example, in a variant called the monotonic concession protocol [143, 169], agents are required to initially disclose information about their preference order associated with each issue, and the offers proposed by each agent must be a sequence of concessions, i.e., each consecutive offer has less utility for the agent than the previous one. Other examples are the three protocols discussed by Fatima et al. [65] that differ in the way the issues are negotiated: simultaneously in bundles, in parallel but independently, and sequentially. The first alternative is shown to lead to the highest quality outcomes. A final example is a protocol in which only one offer can be made. In such a situation, the negotiation can be seen as an instance of the ultimatum game, in which a player proposes a deal that the other player may only accept or refuse [185]. A similar bargaining model, with one-sided incomplete information and one-sided offers, is explored in [176], which investigates the role of confrontation in negotiations and uses optimal stopping to decide whether or not to invoke conflict.

3.3 Preference profiles

Negotiating agents are assumed to have a preference profile, which is a preference order \(\ge \) that ranks the outcomes in the outcome space. Preferences are said to be ordinal when they are fully specified by a preference order. Together with the domain they make up the negotiation scenario.

In many cases, the domain and preferences stay fixed during a single negotiation encounter, but while the domain is common knowledge to the negotiating parties, the preferences of each player are private information. This means that the players do not have access to the preferences of the opponent. In this sense, the negotiators play a game of incomplete information. However, the players can attempt to learn as much as they can during the negotiation encounter.

An outcome \(\omega '\) is said to be weakly preferred over an outcome \(\omega \) if \(\omega ' \ge \omega \). If in addition \(\omega \not \ge \omega '\), then \(\omega '\) is strictly preferred over \(\omega \), denoted \(\omega ' > \omega \). An agent is said to be indifferent between two outcomes if \(\omega ' \ge \omega \) and \(\omega \ge \omega '\). In that case, we also say that these outcomes are equally valued and we write \(\omega ' \sim \omega \). An indifference curve or iso-curve is a set of outcomes that are equally valued by an agent. In a total preference order, one outcome is always (weakly) preferred over the other outcome for any outcome pair, which means there are no undefined preference relations. Finally, an outcome \(\omega \) is Pareto optimal if there exists no outcome \(\omega '\) that is preferred by an agent without making another agent worse off [158]. For two players A and B with respective preference orders \(\ge _A\) and \(\ge _B\), this means that there is no outcome \(\omega '\) such that:

$$\begin{aligned} \left( \omega ' >_A \omega \wedge \omega ' \ge _B \omega \right) \vee \left( \omega ' >_B \omega \wedge \omega ' \ge _A \omega \right) . \end{aligned}$$

An outcome that is Pareto optimal is also said to be Pareto efficient. When an outcome is not Pareto efficient, there is potential, through re-negotiation, to reach a more preferred outcome for at least one of the agents without reducing the value for the other.

The outcome space can become quite large, which means it is usually not viable to explicitly state an agent’s preference for every alternative. For this reason, more succinct preference representations have been proposed [48, 53].

A well-known and compact way to represent preference orders is the formalism of conditional preference networks (CP-nets) [23]. CP-nets are graphical models, in which each node represents a negotiation issue and each edge denotes preferential dependency between issues. If there is an edge from issue i to issue j, the preferences for j depend on the specific value for issue i. To express conditional preferences, each issue is associated with a conditional preference table, which represents a total order of possible values for that issue, given its parents’ values.

A preference profile may be specified as a list of ordering relations, but it is more common in the literature to express the agent’s preferences by a utility function. A utility function assigns a utility value to every possible outcome, yielding a cardinal preference structure.

Cardinal preferences are ‘richer’ than ordinal preferences in the sense that ordinal preferences can only compare between different alternatives, while cardinal preferences allow for expressing the intensity of every preference [48]. Any cardinal preference induces an ordinal preference, as every utility function u defines an order \(\omega ' \ge \omega \) if and only if \(u(\omega ') \ge u(\omega )\).

Some learning techniques make additional assumptions about the structure of the utility function [98], the most common in negotiation being that the utility of a multi-issue outcome is calculated by means of a linear additive function that evaluates each issue separately [98, 158, 159]. Hence, the contribution of every issue to the utility is linear and does not depend on the values of other issues. The utility \(u(\omega )\) of an outcome \(\omega =\langle \omega _1,\ldots ,\omega _n\rangle \in {\varOmega }\) can be computed as a weighted sum from evaluation functions \(e_i(\omega _i)\) as follows:

$$\begin{aligned} u(\omega ) = \sum _{i=1}^n w_i\cdot e_i(\omega _i), \end{aligned}$$
(1)

where the \(w_i\) are normalized weights (i.e. \(\sum w_i = 1\)). Linear additive utility functions make explicit that different issues can be of different importance to a negotiating agent and can be used to efficiently calculate the utility of a bid at the cost of expressive power, as they cannot represent interaction effects (or dependencies) between issues.
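Equation (1) can be sketched in a few lines of Python. The weights and evaluation tables below are hypothetical and describe an imagined buyer in a two-issue car domain; only the weighted-sum structure comes from Eq. (1).

```python
def linear_additive_utility(outcome, weights, evaluations):
    """Compute u(w) = sum_i w_i * e_i(w_i) for normalized weights."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * e[v] for w, e, v in zip(weights, evaluations, outcome))


# Hypothetical buyer profile: price matters more than color.
weights = [0.7, 0.3]
evaluations = [
    {10000: 1.0, 12000: 0.5, 14000: 0.0},   # e_1: evaluation of each price
    {"red": 0.2, "blue": 1.0},              # e_2: evaluation of each color
]

# 0.7 * 0.5 + 0.3 * 1.0, i.e. approximately 0.65
u = linear_additive_utility((12000, "blue"), weights, evaluations)
```

Each issue is looked up independently of the others, which is what makes the evaluation cheap and also what prevents this representation from capturing dependencies between issues.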

A common alternative is to make use of non-linear utility functions to capture more complex relations between offers at the cost of additional computational complexity. Non-linear negotiation is an emerging area within automated negotiation that considers multiple inter-dependent issues [88, 129]. Typically this leads to larger, richer outcome spaces in comparison to linear additive utility functions. A key factor in non-linear spaces is the ability of a negotiator to make a proper evaluation of a proposal, as the utility calculation of an offer might even prove NP-hard [52]. Examples of this type of work can be found in [87, 101, 127, 166].

For non-linear utility functions in particular, a number of preference representations have been formulated to avoid listing the exponentially many alternatives with their utility assessment [48]. The utility of a deal can be expressed as the sum of the utility values of all the constraints (i.e., regions in the outcome space) that are satisfied [87, 130]. These constraints may in turn exhibit additional structure, such as being represented by hyper-graphs [74]. One can also decompose the utility function into subclusters of individual issues, such that the utility of an agreement is equal to the sum of the sub-utilities of different clusters [166]. This is a special case of a utility structure called k-additivity, in which the utility assigned to a deal can be represented as the sum of basic utilities of subsets with cardinality \(\le k\) [49]. For example, for \(k = 2\), the utility \(u(\omega _1, \omega _2, \omega _3)\) might be expressed as the utility value of the individual issues \(u_1(\omega _1) + u_2(\omega _2) + u_3(\omega _3)\) (as in the linear additive case), plus their 2-way interaction effects \(u_4(\omega _1, \omega _2) + u_5(\omega _1, \omega _3) + u_6(\omega _2, \omega _3)\). This is in turn closely related to the OR and XOR languages for bidding in auctions [144], in which the utility is specified for a specific set of clusters, together with rules on how to combine them into utility functions on the whole outcome space.
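To make the k-additive structure concrete, the sketch below sums basic utilities over subsets of issues and rejects any term that involves more than k issues. The representation of terms as a dictionary from index tuples to functions, and all numeric values, are assumptions chosen for illustration.

```python
def k_additive_utility(outcome, terms, k):
    """Sum the basic utilities of issue subsets, each of cardinality <= k."""
    if any(len(idx) > k for idx in terms):
        raise ValueError("a term involves more than k issues")
    return sum(f(*(outcome[i] for i in idx)) for idx, f in terms.items())


# Hypothetical 2-additive utility over three binary issues (0-based indices).
terms = {
    (0,): lambda a: 0.2 * a,            # individual contribution of the first issue
    (1,): lambda b: 0.3 * b,            # individual contribution of the second issue
    (2,): lambda c: 0.1 * c,            # individual contribution of the third issue
    (0, 1): lambda a, b: 0.4 * a * b,   # 2-way interaction of the first two issues
}

u_both = k_additive_utility((1, 1, 0), terms, k=2)   # interaction term is active
u_one = k_additive_utility((1, 0, 1), terms, k=2)    # interaction term contributes 0
```

With only the single-issue terms, this reduces to an additive utility without interaction effects; the pairwise term is what lets the value of one issue depend on another.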

Finally, the preference profile of an agent may also specify a reservation value. The reservation value is the minimal utility that the agent still deems an acceptable outcome. That is, the reservation value is equal to the utility of the best alternative to no agreement. A bid with a utility lower than the reservation value should not be offered or accepted by any rational agent. In a single-issue domain, the negotiation is often about the price P of a good [59, 62, 205, 206]. In that case, the agents usually take the roles of buyer and seller, and their reservation values are specified by their reservation prices; i.e., the highest price a buyer is willing to pay, and the lowest price at which a seller is willing to sell.

3.4 Time

Time in negotiation is limited, either because the issues under negotiation may expire, or one or more parties are pressing for an agreement [39]. Without time pressure, the negotiators have no incentive to accept an offer, and so the negotiation might go on forever. Also, with unlimited time an agent may simply try a large number of proposals to learn the opponent’s preferences. The deadline of a negotiation refers to the time before which an agreement must be reached [158]. When the deadline is reached, the negotiators revert to their best alternative to no agreement.

The negotiator’s nearness to a deadline is only one example of time pressure [38], which is defined as a negotiator’s desire to end the negotiation quickly [154]. An alternative way to model time pressure is to supplement the negotiation scenario with a discount factor, which models the decline in value of the negotiated goods over time. Let \(\delta \in [0, 1]\) be the discount factor and let \(t \in [0, 1]\) be the current normalized time. One way to compute the real-time discounted utility \(u^\delta (\omega )\) from the undiscounted utility \(u(\omega )\) is as follows:

$$\begin{aligned} u^\delta (\omega ) = u(\omega ) \cdot \delta ^t. \end{aligned}$$
(2)

If \(\delta = 1\), the utility is not affected by time, and such a scenario is considered to be undiscounted, while if \(\delta \) is very small, there is high pressure on the agents to reach an agreement.

Alternatively, time may be viewed as a discrete variable, in which case the number of negotiation exchanges (or rounds) is counted. The deadline is then specified as a maximum number of rounds n, and discounting is applied in every round \(k \le n\) as \(u^\delta (\omega ) = u(\omega ) \cdot \delta ^k\). Note that, from a utility point of view, the presence of a discount factor \(\delta \) is equivalent to the probability \(1 - \delta \) that the opponent walks away from the negotiation in any given negotiation round.
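Both forms of discounting can be sketched with a single function, since Eq. (2) and its round-based variant differ only in whether the exponent is the normalized time t or the round number k. The numeric values are hypothetical, and the break-off helper additionally assumes that a walk-away yields utility 0.

```python
def discounted_utility(u, delta, t):
    """Return u * delta**t, where t is normalized time or a round number k."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return u * delta ** t


def expected_utility_with_breakoff(u, p_walk, k):
    """Expected utility if the opponent walks away with probability p_walk in
    each of k rounds (assumption: a break-off is worth utility 0)."""
    return u * (1.0 - p_walk) ** k


real_time = discounted_utility(0.8, delta=0.5, t=0.5)   # halfway to the deadline
per_round = discounted_utility(0.8, delta=0.9, t=3)     # agreement in round 3
same_value = expected_utility_with_breakoff(0.8, p_walk=0.1, k=3)
```

Under the stated assumption, `per_round` and `same_value` coincide, which illustrates the equivalence between a discount factor and a per-round walk-away probability.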

Deadlines and discount factors can have a strong effect on the outcome of a negotiation and may also interact with each other. For example, it is shown in [177] that in a game-theoretic setting with fully rational play, time preferences in terms of deadlines may lead to a game of ‘sit and wait’ and may completely override other effects such as time discounting.

3.5 Outcome spaces

A useful way to visualize the preferences of both players simultaneously is by means of an outcome space plot (Fig. 2). The axes of the outcome space plot represent the utilities of player A and B, and every possible outcome \(\omega \in {\varOmega }\) maps to a point \((u_A(\omega ), u_B(\omega ))\). The line that connects all of the Pareto optimal agreements is the Pareto frontier.

Note that the visualization of the outcome space together with the Pareto frontier is only possible from an external point of view. In particular, the agents themselves are not aware of the opponent utility of bids in the outcome space and do not know the location of the Pareto frontier.
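From such an external point of view, where the utilities of both agents are known, the Pareto optimal outcomes can be computed directly from the definition given in Sect. 3.3. The sketch below does so with a quadratic-time scan; the outcome names and utility pairs are hypothetical.

```python
def pareto_frontier(utilities):
    """utilities maps each outcome to (u_A, u_B); return the Pareto optimal
    outcomes, sorted by decreasing utility for agent A."""
    def dominated(w):
        ua, ub = utilities[w]
        # Another outcome is strictly better for one agent and not worse for the other.
        return any(
            (va > ua and vb >= ub) or (vb > ub and va >= ua)
            for va, vb in utilities.values()
        )

    frontier = [w for w in utilities if not dominated(w)]
    return sorted(frontier, key=lambda w: utilities[w][0], reverse=True)


# Hypothetical outcome space with five outcomes.
points = {
    "w1": (1.0, 0.2),
    "w2": (0.8, 0.6),
    "w3": (0.5, 0.5),
    "w4": (0.3, 0.9),
    "w5": (0.3, 0.4),
}
frontier = pareto_frontier(points)
```

Here `w3` and `w5` are excluded because `w2` is better for both agents; a negotiating agent cannot run this computation itself unless it has an estimate of the opponent's utilities.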

Fig. 2 A typical example of an outcome space between agents A and B. The points represent all outcomes that are possible in the negotiation scenario. The line is the Pareto frontier, which connects all of the Pareto efficient outcomes

From Fig. 2 we can immediately observe certain characteristics of the negotiation scenario that are very important for the learning behavior of an agent. Examples include the domain size, the relative occurrence of Pareto optimal outcomes, and whether the bids are spread out over the domain.

3.6 Negotiation tactics

The bidding strategy, also called negotiation tactic or concession strategy, is usually a complex strategy component. Two types of negotiation tactics are very common: time-dependent tactics and behavior-dependent tactics. Each tactic uses a decision function, which maps the negotiation state to a target utility. Next, the agent can search for a bid with a utility close to the target utility and offer this bid to the opponent.

3.6.1 Time-dependent tactics

Functions which return an offer solely based on time are called time-dependent tactics. The standard time-dependent strategy calculates a target utility u(t) at every turn, based on the current time t. Perhaps the most popular time-based decision function can be found in [59, 61], which, depending on the current normalized time \(t \in [0, 1]\), makes a bid with utility closest to

$$\begin{aligned} u(t) = P_{min} + (P_{max} - P_{min}) \cdot (1 - F(t)), \end{aligned}$$
(3)

where

$$\begin{aligned} F(t) = k + (1 - k) \cdot t^{1 / e}. \end{aligned}$$

The constants \(P_{min}, P_{max} \in [0,1]\) control the range of the proposed offers, and \(k \in [0, 1]\) determines the value of the first proposal. For \(0 < e < 1\), the agent concedes only at the end of the negotiation and is called a Boulware negotiation tactic. For \(e = 1\), the target utility decreases linearly with time. If \(e > 1\), the function concedes quickly to the reservation value, and the agent is then called a Conceder. Figure 3 shows a plot of several time-dependent tactics for varying concession factors e.

Fig. 3 Target utility through time of time-dependent tactics with concession factor \(e \in \{0.2, 0.5, 1, 2, 5\}\)

The specification of these strategies given in [59, 61] does not involve any opponent modeling; that is, given the target utility, a random bid is offered with a utility closest to it.
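The decision function of Eq. (3) and the subsequent bid selection can be sketched as follows. The function names, the default parameter values, and the representation of bids are illustrative assumptions; ties between equally close bids are broken at random.

```python
import random
from typing import Callable, Sequence, TypeVar

Bid = TypeVar("Bid")


def target_utility(t: float, e: float, k: float = 0.0,
                   p_min: float = 0.0, p_max: float = 1.0) -> float:
    """Target utility u(t) of Eq. (3) at normalized time t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be normalized to [0, 1]")
    if e <= 0.0:
        raise ValueError("concession factor e must be positive")
    f = k + (1.0 - k) * t ** (1.0 / e)
    return p_min + (p_max - p_min) * (1.0 - f)


def closest_bid(bids: Sequence[Bid], utility: Callable[[Bid], float],
                target: float) -> Bid:
    """Pick a bid whose own utility is closest to the target (random tie-break)."""
    best = min(abs(utility(b) - target) for b in bids)
    candidates = [b for b in bids if abs(utility(b) - target) == best]
    return random.choice(candidates)


boulware = target_utility(0.5, e=0.2)   # still close to p_max halfway through
linear = target_utility(0.5, e=1.0)
conceder = target_utility(0.5, e=5.0)   # already close to p_min halfway through
```

Halfway through the negotiation, the Boulware tactic has barely conceded, whereas the Conceder has given up most of its utility range.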

3.6.2 Baseline tactics

The Hardliner strategy (also known as take-it-or-leave-it, sit-and-wait [4] or Hardball [117]) can be viewed as an extreme type of time-dependent tactic. This strategy stubbornly makes a bid of maximum utility for itself and never concedes, at the risk of reaching no agreement.

Random Walker (also known as the Zero Intelligence strategy [70]) generates random bids and thus provides the extreme case of a maximally unpredictable opponent. Because of its limited capabilities, it can also serve as a useful baseline strategy when testing the efficacy of other negotiation strategies.

3.6.3 Behavior-dependent tactics

Faratin et al. introduce a well-known set of behavior-dependent tactics or imitative tactics in [59]. The most well-known example of a behavior-dependent tactic is the Tit for Tat strategy, which tries to reproduce the opponent’s behavior of the previous negotiation rounds by reciprocating the opponent’s concessions. Thus, Tit for Tat is a strategy of cooperation based on reciprocity [5].

Tit for Tat has been applied and found successful in many other games, including the Iterated Prisoner’s Dilemma game [6]. In total, three behavior-dependent tactics are defined in [59]: Relative Tit for Tat, Random Absolute Tit for Tat, and Averaged Tit for Tat. The Relative Tit for Tat agent mimics the opponent in a percentage-wise fashion by proportionally replicating the concession that the opponent made a number of steps ago.
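One possible reading of Relative Tit for Tat for a single quantitative issue is sketched below. The function name, the argument layout, and the clamping to an acceptable range are our assumptions for illustration; the exact formulation is given in [59].

```python
from typing import Sequence


def relative_tit_for_tat(own_last: float, opponent_offers: Sequence[float],
                         lo: float, hi: float, delta: int = 1) -> float:
    """Scale our previous offer by the opponent's relative change delta steps ago.

    opponent_offers holds only the opponent's offers, oldest first.
    lo and hi bound the values we are willing to propose (assumed known).
    """
    if delta < 1:
        raise ValueError("delta must be at least 1")
    if len(opponent_offers) < delta + 1:
        return own_last  # not enough history to reciprocate yet
    older = opponent_offers[-(delta + 1)]
    newer = opponent_offers[-delta]
    if newer == 0:
        return own_last  # relative change undefined
    proposal = (older / newer) * own_last
    return min(max(proposal, lo), hi)


# A seller asked 100; the buyer raised its offer from 50 to 55 (a 10% increase),
# so the seller lowers its ask by the same ratio 50/55.
next_ask = relative_tit_for_tat(100.0, [50.0, 55.0], lo=60.0, hi=120.0)
```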

The standard Tit for Tat strategies from [59] do not employ any learning methods, but this work has been subsequently extended by the Nice Tit for Tat agent [15] and the Nice Mirroring Strategy [81]. These strategies achieve more effective results by combining a simple Tit for Tat response mechanism with learning techniques to propose offers closer to the Pareto frontier.

4 Learning methods for opponent models

An extensive set of learning techniques has been applied in automated negotiation. Below we provide an introduction to the most commonly used underlying methods. Readers who are already familiar with these techniques can skip to the next section.

The first two sections discuss Bayesian Learning (Sect. 4.1) and Non-linear Regression (Sect. 4.2). Both methods have mainly been applied as an online learning technique, because they do not require a training phase to produce a reasonable estimate, and because their estimates can be improved incrementally during the negotiation.

In contrast, the other two methods, Kernel Density Estimation (Sect. 4.3) and Artificial Neural Networks (Sect. 4.4), generally require a training phase, and are mainly applied when a record of the negotiation history is available. With these methods, it is computationally inexpensive to take advantage of the learned information during the negotiation.

4.1 Bayesian learning

Bayesian learning is the most prominent probabilistic approach in opponent modeling. Bayesian learning is based on Bayes’ rule:

$$\begin{aligned} P(H \mid E) = \frac{P(E \mid H)\cdot P(H)}{P(E)}. \end{aligned}$$
(4)

Bayes’ rule is a tool for updating the probability that a hypothesis H holds based on observed evidence E. In the formula above, \(P(H \mid E)\) is the posterior probability that the hypothesis H holds given evidence E, and \(P(E \mid H)\) is called the conditional probability of an event E occurring given the hypothesis H. P(H) denotes the prior probability for the hypothesis, independent of any evidence, and similarly, P(E) denotes the prior probability that evidence E is observed.

Bayesian learning is typically used to identify the most likely hypothesis \(H_i\) out of a set of hypotheses \(\mathcal {H}=\{H_1,\ldots ,H_n\}\). The negotiation literature typically assumes a finite set of hypotheses, for example about the type of the opponent. In that case, the likelihood of the hypotheses given observed evidence E can be determined using the alternative formulation of Bayes’ rule:

$$\begin{aligned} P(H_i \mid E) = \frac{P(E \mid H_i)\cdot P(H_i)}{\sum _{j=1}^n P(E \mid H_j) \cdot P(H_j)}. \end{aligned}$$
(5)

An agent can formulate a set of independent hypotheses about a property of the opponent and discover—using evidence—which hypothesis is most likely valid. The idea is that each time new evidence E is observed, we can use Equation (5) to update and compute an improved estimate of the posterior probability \(P(H_i \mid E)\). After processing the evidence, an agent can conclude which hypothesis is most probable.
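The repeated application of Eq. (5) can be sketched as follows. The two opponent types and the conditional probabilities are made-up values for illustration, and applying the update once per observation assumes that the pieces of evidence are conditionally independent given the hypothesis.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float],
                 likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Eq. (5): posterior P(H_i | E) from P(H_i) and P(E | H_i)."""
    evidence = sum(likelihood[h] * p for h, p in prior.items())
    if evidence <= 0.0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: likelihood[h] * p / evidence for h, p in prior.items()}


# Hypothetical opponent types with a uniform prior.
beliefs = {"conceder": 0.5, "boulware": 0.5}
# Assumed P(E | H) for the evidence "the opponent made a large concession".
large_concession = {"conceder": 0.8, "boulware": 0.2}

# The same kind of evidence is observed in two consecutive rounds.
for _ in range(2):
    beliefs = bayes_update(beliefs, large_concession)

most_likely = max(beliefs, key=beliefs.get)
```

Each posterior serves as the prior for the next observation, so the estimate improves incrementally during the negotiation.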

One disadvantage of using Bayesian learning is its computational complexity. Updating a single hypothesis \(H_i\) given a piece of evidence \(E_k\) may have a low computational complexity; however, there may be many such hypotheses \(H_i\), and pieces of evidence \(E_k\). For example, when modeling the opponent’s preferences, this set of hypotheses can be custom made, or generated from the structure of the functions assumed to model the preferences. Even in a negotiation scenario with linear additive utility functions, modeling the preferences requires a set of preference profiles for each negotiable issue. This already leads to a number of hypotheses that is exponential in the number of issues. Another challenge lies in defining the right input for the learning method (e.g. finding a suitable representation of the opponent’s preference profile); in general it is not straightforward to define a suitable class of hypotheses, and it may be hard to determine the conditional probabilities.

4.2 Non-linear regression

Non-linear regression is a broad field of research, and we only present the aspects needed for the application of this technique to opponent modeling. We provide a brief introduction based on [138]. For a more complete overview of the field of non-linear regression, we refer to [17].

Non-linear regression is used to derive a function which “best matches” a set of observational sample data. It is employed when we expect the data to display a certain functional relationship between input and output, from which we can then interpolate new data points. A typical negotiation application is to estimate the opponent’s future behavior from the negotiation history assuming that the opponent’s bidding strategy uses a known formula with unknown parameters.

A simple non-linear regression model consists of four elements: the dependent (or response) variable, the independent (or predictor) variables, the (non-linear) formula, and its parameters. To illustrate, suppose we have a set of observations as shown in Fig. 4, and we want to find the relationship between x and y in order to predict the value of y for new values of x. Suppose the relationship is believed to have the form \(y'(x) = ax^2 + bx + c\), where a, b, and c are parameters with unknown values. In this formula, \(y'\) is the dependent variable and x is the independent variable. Using non-linear regression, we can estimate the parameters a, b, and c such that the error between the predicted \(y'\) values and the observed y values is minimized. The error is calculated using a loss function. In the negotiation literature, the error is typically calculated as the sum of squared differences between the predicted and observed values. Alternative loss functions may, for example, calculate the absolute difference, or treat positive and negative errors differently.
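For the quadratic example, the parameters that minimize the sum of squared differences can be obtained by solving the normal equations. The sketch below does this with a small Gauss-Jordan elimination; the function name and the sample data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares estimate of (a, b, c) in y' = a*x^2 + b*x + c."""
    s = [sum(x ** p for x in xs) for p in range(5)]  # s[p] = sum of x^p
    # Augmented normal equations for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], sum(x * x * y for x, y in zip(xs, ys))],
        [s[3], s[2], s[1], sum(x * y for x, y in zip(xs, ys))],
        [s[2], s[1], s[0], sum(ys)],
    ]
    n = 3
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(n))
    return a, b, c


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x * x - 3.0 * x + 1.0 for x in xs]  # noise-free samples
a, b, c = fit_quadratic(xs, ys)
```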

The parameters for the quadratic formula discussed in this example can be solved using a closed-form expression. Non-linear regression is typically used when this is not possible, for example when there are a large number of parameters that have a non-linear relation with the solution. The calculation of the parameters is based on an initial guess of the parameters, after which an iterative hill-climbing algorithm is applied to refine the guess until the error becomes negligible. Commonly used algorithms are the Marquardt Method and the simplex algorithm. An introduction to both these methods is provided by Motulsky and Ransnas [138]. The main problem with hill-climbing algorithms is that they can return a local optimum instead of the global optimum. Furthermore, in extreme cases the algorithm may not converge at all. Both problems can be mitigated by using multiple initial estimates and selecting the best fit after a specified number of iterations.
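The iterative scheme with multiple initial estimates can be sketched as follows. Note that this sketch uses a plain coordinate-wise hill climber rather than the Marquardt Method or the simplex algorithm, and that the model (a time-dependent tactic with \(k = 0\), \(P_{min} = 0\), \(P_{max} = 1\) and unknown concession factor e) and all names are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Data = Sequence[Tuple[float, float]]
Model = Callable[[float, Sequence[float]], float]


def sum_squared_error(params: Sequence[float], data: Data, model: Model) -> float:
    return sum((model(x, params) - y) ** 2 for x, y in data)


def hill_climb(start: Sequence[float], data: Data, model: Model,
               step: float = 0.1, min_step: float = 1e-7,
               max_iter: int = 10_000) -> Tuple[List[float], float]:
    """Refine an initial guess by moving one parameter at a time."""
    params = list(start)
    best = sum_squared_error(params, data, model)
    for _ in range(max_iter):
        if step < min_step:
            break
        improved = False
        for i in range(len(params)):
            for direction in (step, -step):
                candidate = list(params)
                candidate[i] += direction
                try:
                    error = sum_squared_error(candidate, data, model)
                except ValueError:
                    continue  # candidate lies outside the model's domain
                if error < best:
                    params, best, improved = candidate, error, True
        if not improved:
            step /= 2.0  # no neighbor is better: search more finely
    return params, best


def fit_with_restarts(starts: Sequence[Sequence[float]], data: Data,
                      model: Model) -> Tuple[List[float], float]:
    """Run the hill climber from several initial guesses and keep the best fit."""
    return min((hill_climb(s, data, model) for s in starts), key=lambda r: r[1])


def concession_model(t: float, params: Sequence[float]) -> float:
    (e,) = params
    if e <= 0.0:
        raise ValueError("concession factor must be positive")
    return 1.0 - t ** (1.0 / e)


# Synthetic observations of an opponent that concedes with e = 0.5.
observed = [(t / 10.0, 1.0 - (t / 10.0) ** 2) for t in range(1, 10)]
best_params, best_error = fit_with_restarts([[0.2], [1.0], [3.0]], observed,
                                            concession_model)
```

For this one-parameter model every start converges to the same value; restarts matter when the error surface has several local optima.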

4.3 Kernel density estimation

Kernel density estimation (KDE) is a mathematical technique used to estimate the probability distribution of a population given a set of population samples [50]. Figure 5 illustrates the estimated probability density function constructed from six observations.

The first step of KDE consists of converting each sample into a so-called kernel function. A kernel function is a probability distribution which quantifies the uncertainty of the observation. Common choices for kernels are the standard normal distribution or the uniform probability distribution. The second step is to accumulate all kernels to estimate the probability distribution of the population.

Fig. 4 Example of a non-linear regression based on a polynomial of the second degree. The best fit is shown as the black line

Fig. 5 Example of the use of KDE. Observations are marked with a line on the x axis. The dashed lines mark the standard normal kernels that sum up to the probability density estimation indicated by the solid line

While KDE makes no assumptions about the values of the samples, or in which order the samples are obtained, the kernel function typically requires a parameter called bandwidth, which determines the width of each kernel. When a large number of samples is available over the complete range of the variable of interest, then a small bandwidth can lead to an accurate estimate. With few samples, a large bandwidth can help generalize the limited available information. The choice of bandwidth needs to strike a balance between under-fitting and over-fitting the resulting distributions. As there is no choice that works optimally in all cases, heuristics have been developed for estimating the bandwidth. Jones et al. provide an overview of commonly used bandwidth estimators [93]. The heuristics are based on statistical characteristics of the sample set, such as the sample variance and sample count. The estimation quality of KDE can be further improved by varying the bandwidth for each kernel, for example based on the number of samples found in a window centered at each observation. Using an adaptive bandwidth is called adaptive (or variable) KDE and can further decrease the estimation error at the cost of additional computation.

KDE is a computationally attractive learning method. The computationally intensive parts (automatic bandwidth selection and the construction of a kernel density estimate) can be done offline, after which the lookup can be performed during the negotiation.
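The two steps of KDE with normal kernels can be sketched as follows. The six observations are made-up values, and the bandwidth is chosen with the normal reference rule of thumb \(h = 1.06\,\sigma\,n^{-1/5}\), which is one heuristic of the kind mentioned above and not necessarily the one used in the cited work.

```python
import math
import statistics
from typing import Callable, Sequence


def rule_of_thumb_bandwidth(samples: Sequence[float]) -> float:
    """Normal reference rule: based on sample standard deviation and count."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1.0 / 5.0)


def make_kde(samples: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return the density estimate built from one normal kernel per sample."""
    if not samples or bandwidth <= 0.0:
        raise ValueError("need at least one sample and a positive bandwidth")
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Accumulate the kernels centered at the observations.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


observations = [1.0, 1.5, 2.0, 4.0, 4.5, 7.0]
bandwidth = rule_of_thumb_bandwidth(observations)
density = make_kde(observations, bandwidth)
```

Constructing `density` corresponds to the offline part; evaluating `density(x)` is the inexpensive lookup performed during the negotiation.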

4.4 Artificial neural networks

Artificial neural networks are networks of simple computational units that together can solve complex problems. Below we provide a short introduction to artificial neural networks (ANNs) based on Kröse et al. [110]. Our overview is necessarily incomplete due to the breadth of the field; therefore, for a more complete overview we refer to Haykin [78] and for a survey of the applications of neural networks in automated negotiation to Papaioannou et al. [152].

An ANN is a computational model with the ability to learn the relationship between the input and output by minimizing the error between the output signal of the ANN and the expected output. Since all that is required is a mechanism to calculate the error, ANNs can be applied when the relation between input and output is unknown. ANNs have been used for several purposes, including classification, remembering, and structuring of data.

Fig. 6 Example of an ANN symbolizing the logical XOR. The input neurons expect a value of 0 or 1. The edges show the weights. The activation functions are threshold functions, whose values are depicted inside the nodes. The combination rule is the sum of the inputs. If the combined input is larger than or equal to the threshold, the combined input is propagated. Otherwise, the value 0 is propagated on the output line

A neural network consists of computational units called neurons, which are connected by weighted edges. Figure 6 visualizes a simple neural network consisting of six neurons. A single neuron can have several incoming and outgoing edges. When a neuron has received all inputs, it combines them according to a combination rule, for example the sum of the inputs. Next, it tests whether it is triggered by this input by using an activation function; e.g., whether a threshold has been exceeded or not. If the neuron is triggered, it propagates the combined signal over the output lines, else it sends a predefined signal.

The set of neurons functions in an environment that provides the input signals and processes the output signals of the ANN. The environment calculates the error of the output, which the neural network uses to better learn the relation between the input and output by adjusting the weights on the edges between the neurons.

Neurons can be ordered in successive layers based on their depth. In Fig. 6 each layer has a unique color. The first layer is called the input layer, the last one is the output layer. Both the input and output neurons generally have no activation function. The layers in between are called hidden layers as they are not directly connected to the environment.

To illustrate how a simple neural network works, assume that the input \(x = 0\) and \(y = 1\) are fed to the network in Fig. 6. In that case, the leftmost light gray neuron receives input 0, which results in the output 0, as the neuron is not triggered. The rightmost light gray neuron however, is triggered since it receives input 1 and therefore propagates the output 1. The middle light gray neuron receives the input 0 and 1, which are integrated using the combination rule. The combined signal is insufficient to trigger the neuron, resulting in a 0 as output. Since the rightmost light gray neuron is the only neuron that produced a non-zero output, the final output is 1.
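The walk-through above can be reproduced with a few lines of code. Since the figure itself is not reproduced here, the weights and thresholds below are assumptions chosen to be consistent with the caption and the described behavior; they are not read off the original figure.

```python
from typing import Sequence


def threshold_neuron(inputs: Sequence[float], weights: Sequence[float],
                     threshold: float) -> float:
    """Sum the weighted inputs; propagate the sum if it reaches the threshold."""
    combined = sum(w * x for w, x in zip(weights, inputs))
    return combined if combined >= threshold else 0


def xor_network(x: int, y: int) -> float:
    # Hidden layer (assumed weights of 1 on all incoming edges).
    left = threshold_neuron([x], [1], threshold=1)
    middle = threshold_neuron([x, y], [1, 1], threshold=2)
    right = threshold_neuron([y], [1], threshold=1)
    # Output neuron without activation function (assumed weights 1, -1, 1).
    return 1 * left - 1 * middle + 1 * right


truth_table = {(x, y): xor_network(x, y) for x in (0, 1) for y in (0, 1)}
```

For the input \(x = 0\), \(y = 1\), only the rightmost hidden neuron is triggered, so the output is 1, as in the text. For \(x = y = 1\), the middle neuron propagates its combined input of 2, which cancels the other two signals.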

The number of neurons and their topology determine the complexity of the input-output relationship that the ANN can learn. Overall, the more neurons and layers, the more flexible the ANN. However, the more complex the ANN, the more complex the learning algorithm and consequently the higher the computational cost of learning.

An ANN is typically used when there is a large amount of sample data available, and when it is difficult to capture the relationship between input and output in a functional description; e.g., when negotiating against humans.

5 Learning about the opponent

A bilateral negotiation may be viewed as a two-player game of incomplete information where both players aim to achieve the best outcome for themselves. In general, an opponent model is an abstracted description of a player (and/or its behavior) during the game [193]. In negotiation, opponent modeling often revolves around three questions:

  • Preference estimation What does the opponent want?

  • Strategy prediction What will the opponent do, and when?

  • Opponent classification What type of player is the opponent, and how should we act accordingly?

These questions are often highly related. For example, some form of preference estimation is needed in order to understand how the opponent has acted according to its own utility. Then, by adequately interpreting the opponent’s actions, we may deduce its strategy, which in turn can help predict what the opponent will do in the future.

Constructing an opponent model may alternatively be viewed as a classification problem where the type of the opponent needs to be determined from a range of possibilities [179]; one example being the work by Lin et al. [124]. Here the type of an opponent refers to all opponent attributes that may be modeled to gain an advantage in the game. Taking this perspective is particularly useful when a limited number of opponent types are known in advance, which at the same time is its main limitation.

Note that our definition excludes work in which a pool of agents is tuned or evolved to optimize their performance when playing against each other, without having an explicit opponent modeling component themselves. For readers interested in this type of approach we refer to Liang and Yuan [120], Oliver [145], Sánchez-Anguix et al. [175], and Tu et al. [191].

Opponent modeling can be performed online or offline, depending on the availability of historical data. Offline models are created before the negotiation starts, using previously obtained data from earlier negotiations. Online models are constructed from knowledge that is collected during a single negotiation session. A major challenge in online opponent modeling is that the model needs to be constructed from a limited number of negotiation exchanges, and a real-time deadline may pose the additional challenge of having to construct the model as fast as possible.

Opponent modeling can be performed at many different levels of granularity. The most elementary of preference models may only yield a set of offers likely to be accepted by the opponent, for instance by modeling the reservation value. A more detailed preference model is able to estimate the acceptance probability for every outcome (e.g. using a probabilistic representation of the reservation value). An even richer model can involve the opponent’s preference order, allowing us to rank the outcomes. We can achieve the richest preference representations with a cardinal model of preferences, yielding an estimate of the opponent’s full preference profile. The preferred form of granularity depends not only on the complexity of the negotiation scenario, but also on the level of information required by the agent. For instance, if the agent is required to locate Pareto optimal outcomes in a multi-issue domain, it will require at least an ordinal preference model.

Note that in most cases, comparing different approaches is impossible due to the variety of quality measures, evaluation techniques and testbeds in use; we will have more to say on how to evaluate the different approaches in Sect. 6.

Even though there are large differences between the models, a common set of high level motivations behind their construction can be identified. We found the following motivations for why opponent models have been used in automated negotiation:

  1. Minimize negotiation cost [7–9, 11, 50, 72, 73, 90, 103, 113, 137, 143, 146, 149, 151, 153, 155, 160, 165, 166, 183, 184, 188, 205–207] In general, it costs time and resources to negotiate. As a consequence, (early) agreements are often preferred over not reaching an agreement. As such, an opponent model of the opponent’s strategy or preference profile aids towards minimizing negotiation costs, by determining the bids that are likely to be accepted by the opponent. An agent may even decide that the estimated negotiation costs are too high to warrant a potential agreement, and prematurely end the negotiation.

  2. Adapt to the opponent [1, 15, 28, 29, 44–46, 57, 72, 73, 75, 81, 82, 85, 92, 133, 139, 140, 150, 155, 162, 173, 197, 199, 204] With the assistance of an opponent model, an agent can adapt to the opponent in multiple ways. One way is to estimate the opponent’s reservation value in an attempt to deduce the best possible outcome that the opponent will settle for. Another method is to use an estimate of the opponent’s deadline to elicit concessions from the opponent by stalling the negotiation, provided, of course, that the agent itself has a later deadline. Finally, an opponent model can be used to estimate the opponent’s concessions to accurately reciprocate them.

  3. Reach win-win agreements [11, 12, 15, 21, 30, 50, 51, 81, 83, 90, 94, 103, 113, 123, 124, 137, 143, 146, 149, 155, 160, 165, 166, 174, 183, 184, 188, 194, 195, 198, 205–207] In a cooperative environment, agents aim for a fair result, for example because there might be opportunity for future negotiations. Cooperation, however, does not necessarily imply that the parties share explicit information about their preferences or strategy, as agents may still strive for a result that is beneficial for themselves and acceptable for their opponent. An agent can estimate the opponent’s preference profile to maximize joint utility.

We found that existing work on opponent models can fulfill any of the goals above by learning a combination of four opponent attributes, which we have listed in Table 2. The remainder of this section discusses, for each attribute, the applicable opponent modeling techniques, following the order of Table 2.

Table 2 All learning techniques and methods that help to learn four different opponent attributes

5.1 Learning the acceptance strategy

All negotiation agent implementations need to deal with the question of when to accept. The decision is made by the acceptance strategy of a negotiating agent, which is a boolean function indicating whether the agent should accept the opponent’s offer. Upon acceptance of an offer, the negotiation ends in agreement, otherwise it continues. More complex acceptance strategies may be probabilistic or include the possibility of breaking off the negotiation without an agreement—if that is supported by the protocol.

A common default is for the agent to accept a proposal when the value of the offered contract is higher than the offer it is ready to send out at that moment in time. The bidding strategy then effectively dictates the acceptance strategy, making this a significant case in which it suffices to learn the opponent’s bidding strategy (see Sect. 5.4). Examples include the time-dependent negotiation strategies defined in [167] (e.g. the Boulware and Conceder tactics). The same principle is used in the equilibrium strategies of [61] and the Trade-off agent [60]. Other agents use much more sophisticated methods to accept; for example, acceptance strategies based on extrapolation of all received offers [97], dynamic time-based acceptance [2, 51], and optimal stopping [13, 104, 116, 176, 201].
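The default acceptance rule, in which the bidding strategy dictates the acceptance strategy, can be sketched as follows. The function names are illustrative, the target utility uses the time-dependent decision function of Eq. (3) with \(k = 0\), \(P_{min} = 0\) and \(P_{max} = 1\), and accepting on equality is our assumption.

```python
def planned_bid_utility(t: float, e: float) -> float:
    """Own utility of the bid the agent is ready to send at normalized time t."""
    if not 0.0 <= t <= 1.0 or e <= 0.0:
        raise ValueError("require 0 <= t <= 1 and e > 0")
    return 1.0 - t ** (1.0 / e)


def accepts(offer_utility: float, t: float, e: float) -> bool:
    """Accept when the offer is at least as good as the bid we would send next."""
    return offer_utility >= planned_bid_utility(t, e)


early = accepts(0.3, t=0.1, e=1.0)  # planned bid is worth about 0.9
late = accepts(0.3, t=0.9, e=1.0)   # planned bid is worth about 0.1
```

Because the decision depends only on the planned bid, an opponent model that predicts the bidding strategy also predicts when such an agent accepts.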

Learning an opponent’s acceptance strategy is potentially of great value as it can help to find the best possible deal for an agent which at the same time is satisfactory for the opponent. Two general approaches have been used to estimate the acceptance strategy, depending on the negotiation domain:

  • Estimating the reservation value (Sect. 5.1.1) In a negotiation about a single quantitative issue, such as the price of a service, where the parties have opposing preferences that are publicly known, knowledge of the opponent’s reservation value is sufficient to determine all acceptable bids. An opponent model can learn the opponent’s reservation value by extrapolating the opponent’s concessions.

  • Estimating the acceptance strategy (Sect. 5.1.2) An alternative approach applied to multi-issue negotiations is to estimate the probability that a particular offer is accepted, based on the similarity with bids that the opponent previously offered and/or accepted.

5.1.1 Learning the acceptance strategy by estimating the reservation value

Current methods for estimating the reservation value stem from the idea that an agent will cease to concede near its reservation value, and that this behavior occurs when the negotiation deadline approaches. These methods make assumptions about the availability of domain knowledge, or assume that the opponent uses a particular strategy.

The oldest and most popular approach is by Zeng and Sycara [205, 206], who propose a Bayesian learning method to estimate the reservation value, using data from previous negotiations. One single quantitative issue is negotiated, for which it is assumed the agents have opposing preferences. Before the negotiation, a set of hypotheses \(\mathcal {H}=\{H_1,\ldots ,H_n\}\) about the opponent’s reservation value is generated. Each hypothesis \(H_i\) is of the form \(\mathrm {rv} = v_i\), where \(v_i\) is one of the possible values for the opponent’s reservation value \(\mathrm {rv}\). The hypotheses, the values \(v_i\), and their a priori likelihood, are all determined based on domain knowledge derived from previous negotiations. By applying Bayesian learning during the negotiation, the probabilities of the hypotheses are updated based on observed behavior and the available domain knowledge. Intuitively, the idea is that an offer at the beginning of the negotiation is likely to be far from the reservation value. The reservation value is estimated by using the weighted sum of the hypotheses according to their likelihood. This method is widely applied; for example in work by Ren and Anumba [162] and Zhang et al. [207].
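The general idea of maintaining hypotheses about the reservation value and taking their weighted sum can be sketched as follows. The candidate values, the uniform prior, and in particular the likelihood model (a buyer's offer is assumed to lie a fixed margin below its reservation value, with normally distributed deviations) are made-up stand-ins for the domain knowledge; they are not taken from [205, 206].

```python
import math
from typing import Dict


def offer_likelihood(offer: float, rv: float, margin: float, sigma: float) -> float:
    """Assumed P(offer | rv): offers cluster around (1 - margin) * rv."""
    expected = (1.0 - margin) * rv
    return math.exp(-0.5 * ((offer - expected) / sigma) ** 2)


def update_beliefs(beliefs: Dict[float, float], offer: float,
                   margin: float, sigma: float = 5.0) -> Dict[float, float]:
    """Bayesian update of P(rv = v_i) after observing an offer."""
    weighted = {v: offer_likelihood(offer, v, margin, sigma) * p
                for v, p in beliefs.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("offer is impossible under every hypothesis")
    return {v: w / total for v, w in weighted.items()}


def estimate_reservation_value(beliefs: Dict[float, float]) -> float:
    """Weighted sum of the hypotheses according to their likelihood."""
    return sum(v * p for v, p in beliefs.items())


# Hypotheses rv = v_i with a uniform prior (prior estimate: 110).
prior = {100.0: 1.0 / 3.0, 110.0: 1.0 / 3.0, 120.0: 1.0 / 3.0}
# Early in the negotiation the buyer is assumed to offer 30% below its rv.
posterior = update_beliefs(prior, offer=84.0, margin=0.3)
estimate = estimate_reservation_value(posterior)
```

An opening offer of 84 is most consistent with a reservation value of 120, so the estimate moves from 110 towards 120.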

Closely related to the work by Zeng and Sycara is the work by Sim et al., who apply the same procedure when the opponent is constrained to use a particular time-dependent tactic, but with a private deadline [72, 92, 183, 184]. The opponent’s decision function is assumed to be of a particular form in which the reservation value and deadline are related, in the sense that one can be derived from the other.

A different approach is taken by Hou [85], who presents a method to estimate the opponent’s tactic in a negotiation about a single quantitative issue with private deadlines. It is assumed that the opponent employs a tactic dependent on either time, behavior, or resources. Non-linear regression is used to estimate which of the three types of strategies is used, and to estimate the values of the parameters associated with the tactic [85], including the reservation value (cf. Sect. 5.4.1). A similar approach is followed by Agrawal and Chari, who model the opponent’s decision function as an exponential function [1]. When the deadline is public knowledge, Haberland’s method [73] can be used to estimate the opponent’s reservation value assuming the opponent uses a time-dependent tactic.

To improve reliability of the estimates, Yu et al. [204] combine non-linear regression with Bayesian learning to estimate the opponent’s reservation value, as well as the deadline. In their model, the opponent is assumed to use a time-dependent tactic with unknown parameters. Each round the parameters are estimated using non-linear regression. Next, that round’s estimate is used to create a more reliable set of hypotheses about the opponent’s reservation value and deadline by using Bayesian learning.

All these methods estimate the opponent’s reservation value in a single-issue negotiation using Bayesian learning (which is more computationally involved), or non-linear regression (which is faster, but requires knowledge about the structure of the opponent’s strategy). To our knowledge, artificial neural networks and kernel density estimation have not been used for this purpose. Furthermore, all these methods assume that given the reservation value all acceptable bids are known due to the known ordering of the possible values. An interesting open problem is how to apply these techniques to situations where such an ordering is not straightforward.

5.1.2 Learning the acceptance strategy by estimating the acceptance probability

The acceptance strategy can be learned by keeping track of what offers were accepted in previous negotiations and by recording the offers the opponent sends out. From this information, an agent can estimate the probability that a bid will be accepted in a particular negotiation state. As it is unlikely that such an estimate can be derived for all possible bids, regression methods can be applied to determine the acceptance probability for the entire outcome space.

It is easiest to apply this method in repeated single-issue negotiations, as Saha and Sen do in [173]. In this scenario, a seller may only propose a price once, which a buyer then accepts or rejects. An increasingly better estimate of the buyer’s acceptance strategy allows the seller to maximize its profit over time. To derive the set of samples, the seller first samples the outcome space to find bids that are either always rejected, or always accepted. After that, a number of in-between values are sampled, until the acceptance probability of a sufficient number of bids has been determined. In order to estimate the acceptance probability of all possible offers, polynomial interpolation is applied, using Chebyshev polynomials [131]. Given the probability distribution of acceptance of each offer, the seller can determine the optimal price to maximize profit.
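A rough sketch of the interpolation idea, not of the algorithm in [173], is given below. It assumes that acceptance probabilities have been measured at Chebyshev nodes of the price range, uses a synthetic logistic curve in place of a real buyer, and takes the seller's profit to be the price itself (zero cost); all names and numbers are illustrative.

```python
import math
from typing import List, Sequence


def chebyshev_nodes(n: int, lo: float, hi: float) -> List[float]:
    """Prices at which the acceptance probability is sampled."""
    return [0.5 * (lo + hi) + 0.5 * (hi - lo) * math.cos(math.pi * (k + 0.5) / n)
            for k in range(n)]


def chebyshev_coefficients(values: Sequence[float]) -> List[float]:
    """Coefficients of the interpolating Chebyshev series for the node values."""
    n = len(values)
    return [(2.0 / n) * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n)
                            for k in range(n))
            for j in range(n)]


def acceptance_probability(coeffs: Sequence[float], price: float,
                           lo: float, hi: float) -> float:
    """Interpolated acceptance probability, clamped to [0, 1]."""
    x = max(-1.0, min(1.0, (2.0 * price - lo - hi) / (hi - lo)))
    theta = math.acos(x)
    total = sum(c * math.cos(j * theta) for j, c in enumerate(coeffs))
    return max(0.0, min(1.0, total - 0.5 * coeffs[0]))


def synthetic_buyer(price: float) -> float:
    """Stand-in for measured acceptance frequencies (an assumption)."""
    return 1.0 / (1.0 + math.exp((price - 50.0) / 10.0))


LO, HI, N = 0.0, 100.0, 16
nodes = chebyshev_nodes(N, LO, HI)
coeffs = chebyshev_coefficients([synthetic_buyer(p) for p in nodes])

# Expected profit of a price is the price times its acceptance probability.
prices = [LO + 0.5 * i for i in range(201)]
best_price = max(prices, key=lambda p: p * acceptance_probability(coeffs, p, LO, HI))
```

The interpolant reproduces the sampled probabilities at the nodes and fills in the remaining prices, after which the profit-maximizing price follows from a simple search.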

Interpolation of the acceptance likelihood does not directly carry over to a multi-issue negotiation setting, because the multi-issue preference space lacks the structure of the single-issue case with opposite preferences. The key approach in overcoming this challenge is by Oshrat et al. [149] and relies on a database of negotiations against a set of human negotiators with known preference profiles. During the negotiation, it is assumed that the opponent’s preference profile is known, or that the Bayes classifier introduced in [123, 124] can be applied to reliably learn the opponent’s profile. The database traces then determine what bids have been proposed or accepted by the opponent, which are pooled together under the assumption that if an agent makes an offer, it is also willing to accept it. The authors then use kernel density estimation to estimate the acceptance probability for all the other bids.

Lau et al. apply a similar method based on Bayesian learning with the addition that the effect of time pressure and possible changes of the opponent’s negotiation strategy are taken into account [113]. The underlying idea is that a bid which is unacceptable for an opponent at the beginning of the negotiation might be acceptable at the end. The effect of time pressure is modeled by giving recent bids in a negotiation a higher weight. In addition, more recent negotiation traces receive a higher weight to account for possible opponent strategy changes.

Finally, Fang et al. [57] present a lesser known technique for multi-issue negotiation, which assumes that every presented bid is also acceptable for the opponent. The set of acceptable offers from earlier negotiations is used to train a simple neural network that can then test whether any particular bid is acceptable or not.

5.2 Learning the deadline

The deadline of a negotiation refers to the time before which an agreement must be reached to achieve an outcome better than the best alternative to a negotiated agreement [158]. Each agent can have its own private deadline, or the deadline can be shared among the agents. The deadline may be specified as a maximum number of rounds [187], or alternatively as a real-time target. Note that when the negotiation happens in real time, the time required to reach an agreement depends on the deliberation time of the agents (i.e., the amount of computation required to evaluate an offer and produce a counter offer).

When the opponent’s deadline is unknown, it is of great value to learn more about it, as an agent is likely to concede strongly near the deadline to avoid non-agreement [63]. Because of this strong connection between the two, most of the procedures discussed in Sect. 5.1.1 can also be used to estimate the opponent’s deadline. Hou [85], for example, estimates the deadline following the same procedure as for estimating the reservation value. Yu et al. [204] apply a similar method with the additional constraint that the opponent uses a time-dependent tactic. Finally, Sim et al. directly calculate an estimate for the deadline from the estimated reservation value [72, 92, 184].

As is the case for the reservation value, these methods assume a single-issue negotiation and make strong assumptions about the opponent’s strategy type. How to weaken these assumptions and estimate the deadline in multi-issue negotiations is still an open research topic.

5.3 Learning the preference profile

The preference profile of an agent represents the private valuation of the outcomes. To avoid exploitation, agents tend to keep their preference information private [50, 206]; however, when agents have limited knowledge of the other’s preferences, they may fail to reach a Pareto optimal outcome as they cannot take the opponent’s desires into account [83].

In order to improve the efficiency of the negotiation and the quality of the outcome, agents can construct a model of the opponent’s preferences [50, 83, 206]. Over time, a large number of such opponent models have been introduced, based on different learning techniques and underlying assumptions [12]. Learning the opponent’s preference profile can be of great value, as it provides enough information to allow an agent to propose outcomes that are Pareto optimal and thereby increase the chance of acceptance [60, 81].

Four approaches have been used to estimate the opponent’s preference information. The first approach, which is discussed in Sect. 5.3.1, assumes that the opponent uses a linear additive utility function. Most of the other three approaches are applicable in negotiation settings with non-linear utility functions as well.

  • Estimation of issue preference order (Sect. 5.3.1). The agent can estimate the importance of the issues by assuming that, as the opponent concedes, the issues it values the least are conceded first.

  • Classifying the opponent’s negotiation trace (Sect. 5.3.2). During the negotiation, the agent classifies the opponent’s negotiation trace, using a finite set of groups of which the preferences are known.

  • Data mining aggregate preferences (Sect. 5.3.3). The agent is assumed to have available a large database containing aggregate customer data. The problem of opponent modeling then essentially reduces to a data mining problem.

  • Applying logical reasoning and heuristics to derive outcome order (Sect. 5.3.4). The agent deduces preference relations of the opponent from the opponent’s negotiation trace, using common sense reasoning and heuristics.

5.3.1 Learning the preference profile by estimating the issue preference order

The issue preference order of an agent is the way the agent ranks the negotiated issues according to its preferences; that is, it is an ordinal preference model over the set of issues, rather than the full set of outcomes. Learning the opponent’s ranking of issues can already be sufficient to improve the utility of an agreement [60]. If the opponent is assumed to use a linear additive utility function (as defined in Sect. 3.3), then this reduces the problem of learning the continuous preference of all possible outcomes to learning the ranks of n issues, effectively limiting the size of the search space to n! discrete possibilities. Needless to say, such an assumption might not be realistic depending on the definition of the issues and the complexity of the negotiation scenario. However, especially in situations where the number of interactions between the negotiating parties is limited, an agent can fall back on learning the opponent’s issue preference order.

The learning techniques discussed in this section estimate the importance of the issues by analyzing the opponent’s concessions, assuming that an opponent concedes more strongly on issues that are valued less. They all follow the same pattern: first, each issue is assigned an initial weight. Next, in each round, the difference in value for each issue between the current and previous bid is mapped to an issue weight by applying a similarity measure. Finally, the estimated weights are used to update an incremental estimate.
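The common pattern can be sketched in a few lines. This is a generic illustration and not any of the specific methods discussed below: the mapping from change to weight (one minus the normalized change) and the learning rate `rate` are assumptions, and all names are hypothetical.

```python
def update_issue_weights(weights, prev_bid, curr_bid, ranges, rate=0.2):
    """One round: concession per issue -> observed weights -> incremental update.

    weights:            dict issue -> current weight estimate (sums to 1)
    prev_bid, curr_bid: dict issue -> numeric value offered by the opponent
    ranges:             dict issue -> (min, max) of the issue
    rate:               assumed learning rate of the incremental estimate
    """
    # Normalized change per issue, in [0, 1].
    change = {i: abs(curr_bid[i] - prev_bid[i]) / (ranges[i][1] - ranges[i][0])
              for i in weights}
    # Assumed mapping: the less an issue changes, the more important it is.
    raw = {i: 1.0 - change[i] for i in weights}
    total = sum(raw.values())
    if total == 0:
        return dict(weights)  # no usable signal this round
    observed = {i: raw[i] / total for i in weights}
    # Incremental estimate: blend the old estimate with this round's observation.
    return {i: (1 - rate) * weights[i] + rate * observed[i] for i in weights}

# Step 1: every issue starts with the same weight.
weights = {"price": 0.5, "delivery": 0.5}
ranges = {"price": (0, 100), "delivery": (0, 10)}
# Steps 2 and 3: the opponent concedes little on price and a lot on delivery.
weights = update_issue_weights(weights, {"price": 80, "delivery": 2},
                               {"price": 78, "delivery": 6}, ranges)
```

In this example the estimated weight of the price issue rises above that of delivery, because the opponent barely moved on price.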

There are multiple approaches to map the similarity between two bids to a set of issue weights. The most established one is given by Coehoorn and Jennings, who propose to use kernel density estimation [50] (a similar approach is discussed by Farag et al. [58]). The main assumption here is that the opponent uses a time-dependent, concession-based bidding strategy. Based on this assumption, the distance between sequential offers is mapped to a kernel, which represents the estimated issue weight, and the probability of the weight given the distance. The mappings are derived from previous negotiations and stored in a database. Each round, a new estimation of the issue weights is calculated by combining the current estimate with all previous estimates, using kernel density estimation (cf. Sect. 4.3). This results in a probability distribution for each issue weight.

The second most popular approach is by Jonker et al. [21, 94], who first classify the issues as either predictable or unpredictable [82], and then estimate the weights of the predictable issues by using domain-dependent heuristics. When the opponent’s offer is received, each issue value is first converted to an evaluation scale based on a predefined mapping per issue. For example, the value “5000” for the issue price and value “red” for the issue color are both mapped to the evaluation “good”. Next, the evaluation distance between two sequential bids is calculated for each issue, and mapped to weights based on a predefined mapping. As a final step, all issue weights are normalized to obtain an estimate of the weights for every predictable issue. Carbonneau and Vahidov [37] propose a similar method to estimate on which issue the opponent will likely concede. Instead of using a common evaluation scale, the difference in issue value between two sequential offers is normalized by dividing by the range of the issue.

Niemann and Lang [143] introduce an alternative method that relies on Bayesian learning. This approach assumes that the agents publicly announce their preference directions and the acceptable ranges for every issue. They solely negotiate on issues they disagree about, and thus, only the issue weights of the n issues with opposing directions need to be estimated. Note that this is a rather strong assumption, effectively assuming all win-win outcomes can be achieved, even before starting the negotiation. Initially, all issue weights have the same likelihood, and the updating process proceeds as follows: given two consecutive offers of an opponent, a normalized concession ratio \(c_i\) per issue is computed that is supposed to be inversely related to the weight associated with the issue; i.e., the more an issue is conceded on, the less important it is: \(w_i=1-c_i\). Using these estimates, the hypotheses about the weight distribution are updated using Bayesian learning.

As an alternative to these approaches, the opponent’s issue weights can also be estimated using knowledge about the opponent’s bidding strategy [90]. In this approach, the ranges of the negotiated issues are subdivided into a set of fuzzy membership functions, i.e., probability distributions over a subrange of a variable. For example, a single issue can have four membership functions that are centered around 0.25, 0.50, 0.75, and 1. Next, a set of hypotheses about the issue weights is generated, which are mappings from each issue to a membership function. The set of possible hypotheses is initialized as the Cartesian product of all possible assignments, after which hypotheses with significantly low weights are removed. During the negotiation, the likelihood of the hypotheses is updated by using a predefined estimate of the lower and upper bound of the opponent’s target utility for a particular negotiation round. In every round, the most likely hypothesis is used as the current estimate of the issue weights.

The quality of the models discussed above inherently depends on the quality of the mapping that is derived from domain knowledge, or the reliability of the estimate of the opponent’s bidding strategy. These models are thus not suited for agents that negotiate in previously unknown domains against unknown opponents.

5.3.2 Learning the preference profile by classifying the negotiation trace

Learning the opponent’s outcome preferences using classification consists of two steps: identifying the opponent groups and their preferences; and applying a classification algorithm to categorize the opponent. Bayesian learning is the most common way to classify the opponent, by determining which opponent type is most likely given its negotiation actions.

There are two main Bayesian classification methods. The first is given by Lin et al. [123, 124], who propose a method in which the set of possible groups is given. The preference profile of an agent is a mapping from an offer to a Luce number, which is the utility of the offer divided by the sum of utilities of all possible offers. Bayesian learning is used to determine which preference profile \(t^i\) is the best match given a finite set \(\{t^1, \ldots ,t^k\}\) of possible preference profiles. The preference profile \(t^i\) with the highest probability in a round is used as the currently estimated profile.
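A minimal sketch of this kind of classification is given below. The Luce number follows the definition above; the likelihood model (a profile proposes an offer in proportion to its Luce number) is an assumption made for illustration and is not claimed to be the update rule of [123, 124]. The profiles `t1` and `t2` and their utilities are hypothetical.

```python
def luce_numbers(utilities):
    """Map each offer to its utility divided by the sum of utilities of all offers."""
    total = sum(utilities.values())
    return {offer: u / total for offer, u in utilities.items()}

def classify(profiles, observed_offers, prior=None):
    """Return the posterior over the candidate profiles after the observed offers."""
    names = list(profiles)
    posterior = dict(prior) if prior else {t: 1.0 / len(names) for t in names}
    luce = {t: luce_numbers(profiles[t]) for t in names}
    for offer in observed_offers:
        # Assumed likelihood: P(offer | profile) is the offer's Luce number.
        unnorm = {t: posterior[t] * luce[t][offer] for t in names}
        z = sum(unnorm.values())
        posterior = {t: v / z for t, v in unnorm.items()}
    return posterior

# Two hypothetical profiles over three possible offers A, B, and C.
profiles = {
    "t1": {"A": 0.9, "B": 0.5, "C": 0.1},
    "t2": {"A": 0.1, "B": 0.5, "C": 0.9},
}
posterior = classify(profiles, ["A", "A", "B"])
best = max(posterior, key=posterior.get)  # currently estimated profile
```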

The second important Bayesian approach is by Hindriks and Tykhonov [83], who also enumerate all possible hypotheses, but they do so for the more complex setting of multi-issue negotiation. The exact parameters of the opponent’s utility function are unknown, but it is assumed to be linear additive. The set of all possible preference profiles is generated by combining two sets of hypotheses. The first set corresponds to all possible hypotheses about the ranks of the issues. The second set is used for estimating the evaluation functions for each issue (as in Eq. (1), p. 8). Each issue evaluation function is approximated by a set of basic triangular (i.e., continuously single-peaked) functions that consist of (combinations of) linear functions. Figure 7a shows that even two basic functions can already reasonably approximate a more complex evaluation function. The complete set of hypotheses is the Cartesian product of both sets.

Fig. 7 Estimating the opponent’s decision function using Bayesian learning [83]. a Combination of two basic functions. b Hypotheses of the opponent’s decision function

The next step is to update the likelihood of the hypotheses. To do so, the opponent is assumed to concede over time to its reservation value. As it is unknown what kind of bidding strategy the opponent is actually using, a probability distribution over a set of concession strategies is maintained, as illustrated in Fig. 7b. Each time a bid is presented, the estimated utility is calculated for each preference order hypothesis, and compared to the predicted utility by the strategy hypotheses. The difference in utility is used to update the likelihood of the hypotheses about the opponent’s preference profiles and strategy.
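The hypothesis space and the update step can be sketched together as follows. This is a strongly simplified illustration and not the model of [83]: it uses three fixed basic shapes per issue, derives weights from ranks by a simple linear rule, assumes a single linear concession strategy instead of a distribution over strategies, and uses a Gaussian likelihood with an arbitrary width `sigma`. All names are hypothetical.

```python
import itertools
import math

# Three assumed basic evaluation shapes on issue values normalized to [0, 1].
SHAPES = {
    "downhill": lambda x: 1.0 - x,
    "uphill": lambda x: x,
    "triangle": lambda x: 1.0 - 2.0 * abs(x - 0.5),
}

def rank_weights(order):
    """Turn an issue ranking (most important first) into normalized weights."""
    n = len(order)
    total = n * (n + 1) / 2
    return {issue: (n - pos) / total for pos, issue in enumerate(order)}

def make_hypotheses(issues):
    """Cartesian product of issue rankings and per-issue evaluation shapes."""
    return [(rank_weights(order), dict(zip(issues, shapes)))
            for order in itertools.permutations(issues)
            for shapes in itertools.product(SHAPES, repeat=len(issues))]

def utility(hyp, bid):
    """Linear additive utility of a bid under one hypothesis."""
    weights, shapes = hyp
    return sum(weights[i] * SHAPES[shapes[i]](bid[i]) for i in bid)

def bayes_update(hyps, probs, bid, t, sigma=0.1):
    """Update hypothesis probabilities for a bid received at normalized time t."""
    predicted = 1.0 - 0.3 * t  # assumed linear concession of the opponent
    post = [p * math.exp(-((utility(h, bid) - predicted) ** 2) / (2 * sigma ** 2))
            for h, p in zip(hyps, probs)]
    z = sum(post)
    return [p / z for p in post]

issues = ["price", "delivery"]
hyps = make_hypotheses(issues)             # 2! rankings x 3^2 shape assignments
probs = [1.0 / len(hyps)] * len(hyps)      # uniform prior
trace = [(0.0, {"price": 1.0, "delivery": 0.0}),
         (0.5, {"price": 1.0, "delivery": 0.5})]
for t, bid in trace:
    probs = bayes_update(hyps, probs, bid, t)
best_weights, best_shapes = hyps[probs.index(max(probs))]
```

Even in this toy setting the number of hypotheses grows as the product of the number of rankings and the number of shape assignments, which hints at the scalability problem of the full model.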

Due to poor scalability of the model, this approach only works for small negotiation domains. Therefore, Hindriks and Tykhonov also propose a scalable variant in which the additional assumption is made that the issue weights and evaluation functions can be learned separately. This particular variant is used by multiple negotiation agents [15, 51, 198] in the Automated Negotiating Agent Competition [10, 16, 200, 202].

Building upon the approach by Hindriks and Tykhonov, Rahman et al. [155] learn the opponent’s preference profile using data from previous negotiations. The main difference between the two methods is that the availability of historical data is exploited to estimate the issue weights. This reduces the hypothesis space, further improving the scalability of the method.

Finally, Buffett and Spencer [31, 32] discuss a contrasting method for multi-object negotiation (a special case of multi-issue negotiation about the inclusion of items in a set), in which the possible groups are automatically enumerated. The classification method relies on the assumptions that the interaction effects between objects are minimal and that both agents use a pure concession strategy. Given the set of hypotheses about the possible preference profiles, Bayesian learning is used to determine which one is most likely to be valid. The update mechanism derives individual preference relations and uses these to update the hypotheses; e.g., when the opponent has presented a bid that contains only objects X and Y, and later on presents the offer \(X,\, Z\); then it can be assumed that Y is valued over Z.
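The derivation of a preference relation from two offers, followed by an update of the hypotheses, can be sketched as follows. The representation of hypotheses as total orders over objects and the noise parameter in the likelihood are assumptions for illustration; they are not taken from [31, 32].

```python
import itertools

def derived_preferences(earlier_bid, later_bid):
    """Objects dropped from the earlier bid are assumed preferred to objects added later."""
    dropped = set(earlier_bid) - set(later_bid)
    added = set(later_bid) - set(earlier_bid)
    return {(d, a) for d in dropped for a in added}

def update_hypotheses(prior, relations, noise=0.1):
    """Bayesian update over orderings (most preferred object first).

    Orderings consistent with every derived relation get likelihood 1 - noise,
    all others get likelihood noise (an assumed, very simple noise model).
    """
    post = {}
    for order, p in prior.items():
        consistent = all(order.index(a) < order.index(b) for a, b in relations)
        post[order] = p * ((1 - noise) if consistent else noise)
    z = sum(post.values())
    return {order: p / z for order, p in post.items()}

objects = ("X", "Y", "Z")
prior = {order: 1 / 6 for order in itertools.permutations(objects)}
# The opponent first offers {X, Y} and later {X, Z}: Y is assumed valued over Z.
relations = derived_preferences({"X", "Y"}, {"X", "Z"})
posterior = update_hypotheses(prior, relations)
mass = sum(p for order, p in posterior.items()
           if order.index("Y") < order.index("Z"))
```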

5.3.3 Learning the preference profile by data mining aggregate preferences

Learning the opponent’s outcome preference order from aggregate negotiation data is fundamentally different from classification, as it deals with a single group of agents. The agents in this group are assumed to have similar preferences, but may employ different strategies. In this setting, the challenge is to derive the opponent’s preference profile from a large database of negotiation traces from similar—but not identical—opponents.

The work by Robu et al. [165, 166] has had the most impact in this area. They introduce the concept of utility graphs for complex, non-linear negotiation settings, in which the opponent’s evaluation of a bundle is assumed to be the sum of the evaluation of its clusters. For example, if a bundle contains three items \(X,\, Y\), and Z, then some clusters can be indicated to have interdependency between the items (for example \(\{\{X, Y\}, \{X, Z\}\}\)). Each cluster is assumed to have a certain evaluation for the buyer. A positive evaluation of a cluster with more than one item means that the items augment each other (a left and a right shoe), whereas a negative evaluation means that redundant items are included (two right shoes).

The learning method models the interaction effects between the items using a graph where every node is an item, and an edge between two nodes is drawn when they belong to a common cluster. Each time the opponent presents an offer, the links of the clusters corresponding to the offer are strengthened whereas the others are weakened. Selecting the best counter-offer then reduces to finding the bundle with the highest utility. The graph is assumed to be initialized using a reasonable approximation of the buyer’s preferences. In [165], Robu and Poutré introduce a method to derive such an approximation using collaborative filtering to estimate the structure of the utility graph from a database of anonymous negotiation data.

Klos et al. [103] study the same setting in an e-commerce scenario in which interaction effects are common. A buyer and seller agent negotiate about the price of a bundle of items, which are described by a boolean vector that specifies which items are included in the offer. The buyer is a selfish agent, whereas the seller tries to take the buyer’s preferences into account by recommending bundles with high social welfare and thereby minimize negotiation cost and maximize consumer satisfaction. The utility of a bundle for the buyer is defined as its valuation minus the cost; the utility for the seller is the price paid by the customer minus the production cost of the items in the bundle. The gains from trade for a bundle b are defined as the sum of the utilities of both parties, which is equal to the customer’s valuation minus the seller’s valuation. Given this formalization, the goal of the method is to estimate the buyer’s preference profile to find a Pareto improvement upon the currently negotiated bundle.
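The definition of the gains from trade can be checked with a short derivation. The symbols are introduced here for illustration only: \(v(b)\) is the buyer’s valuation of bundle b, \(p\) is the price paid (the buyer’s cost), and \(c(b)\) is the seller’s production cost, which is read here as the seller’s valuation of the bundle.

\[
GT(b) = \underbrace{\bigl(v(b) - p\bigr)}_{\text{buyer}} + \underbrace{\bigl(p - c(b)\bigr)}_{\text{seller}} = v(b) - c(b)
\]

The price cancels, so under these assumptions the gains from trade depend only on the bundle and not on how the surplus is divided between the two parties.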

There are two main methods to learn the opponent’s preferences: a simple aggregation method [103] and a method that estimates the parameters of a conjectured utility function [103, 188]. When a negotiation about a bundle ends—which happens upon acceptance or when an alternative bundle is offered—the price the customer is willing to pay for the bundle is known. The aggregation method determines the relative valuation of a bundle by comparing the price paid for similar bundles. The problem with this approach is that there are \(2^n\) possible bundles given n items. Therefore, [103, 188] introduce the assumption that the buyer uses a particular utility function of which the parameters are treated as random variables. In addition, the learning method preprocesses the data to minimize the influence of the negotiation strategy.

Finally, Saha and Sen discuss a similar concept for a multi-issue negotiation, but for the case where the agents can use arguments to decide upon the negotiation context and to reach dynamic agreements [174]. The authors discuss the idea of an opponent model, but leave its implementation for future work. The idea is that all the relevant attributes of a negotiation can be modeled using a Bayesian network. The agent has an initial estimate of the Bayesian network (derived, for example, from a database of previous negotiations), which could be updated and refined during the negotiation based on the opponent’s arguments.

5.3.4 Learning the preference profile by applying logical reasoning and heuristics

Modeling the opponent’s preferences is even more challenging when no previous negotiations against the same or similar opponents have been conducted. In that case, the opponent’s preferences need to be learned from a limited amount of data, within a limited amount of time. To do so, certain assumptions and heuristics are required to interpret the opponent’s behavior. In this section we discuss several opponent models, starting with the models that make the weakest assumptions about the opponent and its preferences.

The candidate elimination algorithm only assumes that the opponent’s preferences do not change during the negotiation. The algorithm is an inductive learning algorithm that can be adapted to learn the preferred bids of the opponent during a negotiation [8, 9]. In this approach, the opponent’s preferences are represented as a set of acceptable offers. When the opponent sends out an offer, this is interpreted as a positive training instance. When a counter-offer is rejected by the opponent, this counts as a negative example, and general hypotheses are specialized not to cover this example anymore. As a concrete example, consider the following situation: an agent negotiates with the opponent over three different issues at the same time, and receives the offer \((x_1, x_2, x_3)\). Suppose the agent responds by making the counter-offer \((x_1, x_2, x_3')\), proposing a different value for the third issue. If this offer is rejected, it reveals a lot of information on the opponent’s preferences. Before this exchange of offers, the agent could do no better than to have the general hypothesis that any offer is acceptable for the opponent. However, the rejection of its last offer counts as a negative example, and the agent can conclude that \(x_3\) is an important value for the opponent, and can specialize the general hypothesis to exclude \(x_3'\). Note that while the opponent model makes few assumptions, it is likely that only a part of the relationships between the outcomes is found.
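The three-issue example can be traced with a small sketch. It is a deliberate simplification of candidate elimination (it keeps only one general hypothesis per issue instead of full specific and general boundaries) and is not the algorithm of [8, 9]; the class and its methods are hypothetical.

```python
class AcceptabilityModel:
    """Simplified general hypothesis: per issue, the values still deemed acceptable."""

    def __init__(self, domains):
        # Most general hypothesis: any offer is acceptable.
        self.allowed = {issue: set(values) for issue, values in domains.items()}
        # Values observed in the opponent's own offers (positive examples).
        self.seen = {issue: set() for issue in domains}

    def positive(self, offer):
        """An offer made by the opponent is a positive training instance."""
        for issue, value in offer.items():
            self.seen[issue].add(value)

    def negative(self, offer):
        """A rejected counter-offer: specialize if exactly one value can be blamed."""
        suspects = [(i, v) for i, v in offer.items() if v not in self.seen[i]]
        if len(suspects) == 1:
            issue, value = suspects[0]
            self.allowed[issue].discard(value)

    def acceptable(self, offer):
        """True if the current hypothesis still covers the offer."""
        return all(v in self.allowed[i] for i, v in offer.items())

domains = {"i1": {"x1", "y1"}, "i2": {"x2", "y2"}, "i3": {"x3", "x3'"}}
model = AcceptabilityModel(domains)
opponent_offer = {"i1": "x1", "i2": "x2", "i3": "x3"}
counter_offer = {"i1": "x1", "i2": "x2", "i3": "x3'"}
model.positive(opponent_offer)   # the opponent proposes (x1, x2, x3)
model.negative(counter_offer)    # the opponent rejects (x1, x2, x3')
```

When a rejected offer differs from the positive examples in more than one value, this sketch learns nothing, which mirrors the remark that only part of the relationships between outcomes is found.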

Related to this concept, Restificar and Haddawy [163] estimate the opponent’s preference profile in a negotiation over a single issue in which the agents are assumed to have conflicting preference directions. This work assumes that the negotiators use a particular type of bidding strategy that is based on the concept of an offer/counter-offer gamble. Such a gamble is a decision whether to accept the opponent’s offer, or to make a counter-offer and thereby risk that the negotiation will (ultimately) result in non-agreement. By making this assumption, the agent can interpret the opponent’s moves to derive preference relations, for example how much the seller prefers making a counter-offer over accepting a particular bid. Similar to the approach discussed above, the authors assume that it is impossible to learn all relations and therefore, an artificial neural network is trained using the derived relations. The agent can use the network to estimate whether the opponent will accept an offer, or will take the risk of proposing an alternative offer that will potentially result in disagreement.

An important method with many possible applications is introduced in [60], where Faratin et al. propose to measure the similarity between the opponent’s most recent bid and a set of bids under consideration. The idea is that the bid most similar to the opponent’s previous bid has the highest chance to be accepted. The method is not descriptive in the sense that it does not define a model of the opponent’s preferences as such; nevertheless, we briefly discuss the work as it borders on the scope of this survey. Faratin et al. show that applying this approach in a negotiation agent can result in mutually beneficial outcomes with a relatively higher gain. The heuristic is implemented in [111], and Lau et al. use a similar approach combined with a genetic algorithm [112, 114].
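The selection step can be sketched as follows. The similarity function used here (a weighted sum of per-issue closeness on numeric issues) is an assumption for illustration and not the exact criterion of [60]; the candidate bids would typically be offers that are equally good for the agent itself.

```python
def similarity(bid_a, bid_b, ranges, weights):
    """Weighted closeness of two bids; 1.0 means identical on every issue."""
    return sum(
        weights[i] * (1.0 - abs(bid_a[i] - bid_b[i]) / (ranges[i][1] - ranges[i][0]))
        for i in ranges
    )

def most_similar(candidates, opponent_bid, ranges, weights):
    """Pick the candidate bid closest to the opponent's most recent bid."""
    return max(candidates, key=lambda b: similarity(b, opponent_bid, ranges, weights))

# Hypothetical two-issue domain.
ranges = {"price": (0, 100), "warranty": (0, 24)}
weights = {"price": 0.5, "warranty": 0.5}
opponent_bid = {"price": 40, "warranty": 24}
candidates = [
    {"price": 80, "warranty": 24},
    {"price": 60, "warranty": 12},
    {"price": 90, "warranty": 6},
]
choice = most_similar(candidates, opponent_bid, ranges, weights)
```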

Buffett et al. combine concepts of the approaches discussed above in [30], assuming that the opponent uses a similarity maximizing strategy. The challenge in this setting lies in the fact that the opponent is not guaranteed to concede in each turn. It is assumed that the opponent’s similarity function is public or at least close to the agent’s function. The applied heuristic is as follows: if the opponent presents a bid with a similarity higher than all previous bids, then it is certain that the opponent’s utility of this bid is lower than that of all previous bids, or else it would have been offered earlier. For the remaining relations, a probabilistic approach is introduced that estimates the round the other offers could likely have been considered for the first time. Combined, this results in a set of estimated preference relations between the presented offers. Note that the approach only learns the preference relations between the bids presented by the opponent and will therefore generally not result in a complete estimate of the opponent’s preference profile.

The approaches discussed above focus on identifying a subset of the preference relations between all possible bids. If an agent needs to estimate the entire preference profile, stronger assumptions are required. A popular example is the frequency analysis heuristic [75, 194, 195], which is a relatively simple technique to estimate the opponent’s preference profile in a multi-issue negotiation by keeping track of how often values occur in the opponent’s offers. Together with Bayesian learning techniques (Sect. 5.3.2), frequency analysis is one of the most popular preference profiling techniques used by the participants of the Automated Negotiating Agent Competition [10, 16, 200, 202]. It is an attractive technique especially in large outcome spaces, in which scalable learning methods are required. The main idea is that preferred values of an issue are offered relatively more often in a negotiation trace. For the issue weights, it works the other way around: if an issue changes value often, it is probably relatively unimportant to the opponent. In [194, 195], both the set of issue weights and value weights are estimated, while Hao and Leung [75–77] ignore the issue weights completely.
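A minimal sketch of the heuristic is given below. The concrete normalizations (dividing value counts by the largest count, and counting unchanged values plus one for the issue weights) are assumptions; the cited agents differ in these details.

```python
from collections import Counter

def frequency_model(trace, issues):
    """Estimate issue weights and value weights from the opponent's offers."""
    counts = {i: Counter(bid[i] for bid in trace) for i in issues}
    # Value weights: values that are offered more often are assumed to be preferred.
    value_weights = {
        i: {v: c / max(counts[i].values()) for v, c in counts[i].items()}
        for i in issues
    }
    # Issue weights: issues whose value rarely changes are assumed to matter more.
    unchanged = {
        i: 1 + sum(a[i] == b[i] for a, b in zip(trace, trace[1:]))
        for i in issues
    }
    total = sum(unchanged.values())
    issue_weights = {i: n / total for i, n in unchanged.items()}
    return issue_weights, value_weights

def estimated_utility(bid, issue_weights, value_weights):
    """Linear additive estimate of the opponent's utility for a bid."""
    return sum(issue_weights[i] * value_weights[i].get(bid[i], 0.0)
               for i in issue_weights)

# Hypothetical trace of four opponent offers over two discrete issues.
trace = [
    {"color": "red", "size": "L"},
    {"color": "red", "size": "M"},
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "L"},
]
issue_weights, value_weights = frequency_model(trace, ["color", "size"])
u = estimated_utility({"color": "red", "size": "L"}, issue_weights, value_weights)
```

Because only counters are updated per received offer, the cost of the model grows with the number of issues and values rather than with the size of the outcome space.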

5.4 Learning the bidding strategy

A negotiating agent employs a negotiation strategy to determine its offer in a given negotiation state. This mapping from negotiation states to offers may range from a simple time-dependent function to a complex function that dynamically depends on the opponent’s behavior.

Research on agent negotiators has given rise to a broad variety of bidding strategies that have been established both in the literature and in implementations [47, 59, 60, 86, 95, 125]. Examples of such general agent negotiators in the literature include, among others: Zeng and Sycara [206], who introduce a generic agent called Bazaar; Faratin et al. [60], who propose an agent that is able to make trade-offs in negotiations and is motivated by maximizing the joint utility of the outcome; Karp et al. [96], who take a game-theoretic view and propose a negotiation strategy based on game-trees; Jonker et al. [95], who propose a concession-oriented strategy called ABMP; and Lin et al. [124], who propose an agent negotiator called QOAgent.

Learning the opponent’s bidding strategy is clearly advantageous to a negotiating agent, as this would—in theory—allow an agent to exploit the bidding behavior of the opponent to reach the best possible deal. Learning the bidding strategy, however, is very challenging as there is a wide diversity of possible negotiation strategies. And worse: the opponent may change its behavior according to the offers that an agent makes [14]. That is, learning the opponent’s strategy is a moving target problem, where the agent simultaneously attempts to acquire new knowledge about the opponent while optimizing its decisions based on what is currently known.

Two approaches have been followed to estimate the opponent’s bidding strategy:

  • Regression analysis (Sect. 5.4.1) If an outline of the opponent’s strategy is known in the form of a formula with unknown parameters, then the problem of estimating the opponent’s bidding strategy reduces to regression analysis.

  • Time series forecasting (Sect. 5.4.2) On the other hand, if the opponent’s strategy is unknown, time series forecasting can be applied to predict the opponent’s future offers.

Both methods have been used rather extensively, and we will cover each of them in the sections below.

5.4.1 Learning the bidding strategy using regression analysis

An agent is generally unaware of the opponent’s exact negotiation strategy, but might have knowledge about the type of strategy used. If such knowledge is available and can be captured in a formula with unknown parameters, the opponent’s strategy can be estimated by applying regression analysis.

There are two main approaches to this problem. The first is given by Mudgal and Vassileva [139], who employ probabilistic influence diagrams to predict the opponent’s counteraction to an offer in a single-issue negotiation. Bayesian learning is used to update a probabilistic influence diagram of the opponent, which yields the probability distribution for the next opponent’s action only, so this can be viewed as a one-step regression method.

Another key approach is by Hou [85], who introduces a method to estimate the opponent’s strategy in a single-issue negotiation assuming that a standard tactic dependent on time, behavior, or resources is used (as discussed by Faratin et al. [59]). Hou derives a model for the time-dependent and resource-dependent strategies and uses non-linear regression to estimate their parameters. The opponent is estimated to use the best matching model, except when the error is higher than a threshold, in which case the opponent is assumed to use a behavior-dependent tactic. Following a similar approach, Agrawal and Chari estimate the opponent’s decision function as an exponential function [1]; and Haberland et al. [73], Ren and Zhang [161], and Yu et al. [204] present methods to estimate the decision function when the opponent employs a time-dependent tactic.
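Fitting a time-dependent tactic to observed offers can be sketched as follows. The tactic is the polynomial time-dependent form with a known opening offer; the least-squares grid search stands in for a proper non-linear regression routine, and the parameter grids are assumptions. The sketch is not the procedure of [85].

```python
def tactic(t, start, reservation, beta):
    """Polynomial time-dependent tactic: offer at normalized time t in [0, 1]."""
    return start + (reservation - start) * t ** (1.0 / beta)

def fit_tactic(times, offers, start, reservations, betas):
    """Least-squares grid search over candidate (reservation, beta) pairs."""
    def sse(params):
        rv, beta = params
        return sum((tactic(t, start, rv, beta) - o) ** 2
                   for t, o in zip(times, offers))
    return min(((rv, b) for rv in reservations for b in betas), key=sse)

# Synthetic seller that opens at 100 and concedes slowly (beta < 1) towards 60.
times = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
offers = [tactic(t, 100.0, 60.0, 0.5) for t in times]
estimate = fit_tactic(times, offers, start=100.0,
                      reservations=[50.0, 55.0, 60.0, 65.0, 70.0],
                      betas=[0.25, 0.5, 1.0, 2.0, 4.0])
```

A residual check on the best fit could then implement the threshold rule described above: if the remaining error is too large, the time-dependent model is rejected.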

Brzostowski et al. [28] introduce a more general method than Hou to predict the opponent’s bidding strategy, by applying non-linear regression to estimate the parameters of four complex models that mix time- and behavior-based components. The utility gained by using the model is significantly higher than their earlier method, which uses derivatives to estimate the opponent’s strategy [29] (we will discuss this further in Sect. 5.4.2).

With more focus on application, Papaioannou et al. compare the performance of multiple estimators in predicting the opponent’s bidding strategy [151, 153]. The estimators are used to predict the opponent’s decision function, which is then used to determine which offer should be proposed in the final round to avoid non-agreement. The setting is a single-issue bilateral negotiation where a client and provider exchange offers in turn. Three parameter estimation methods are evaluated: one is based on polynomial interpolation using cubic splines; another uses 7th degree polynomial interpolation; the third is a genetic algorithm that evolves the parameters of a polynomial function. All methods significantly improve the acceptance ratio of the negotiation.

5.4.2 Learning the bidding strategy using time series forecasting

When little is known about the general structure of the opponent’s bidding strategy, time series forecasting is a viable alternative to the regression-based methods described above. A time series is simply a set of observations that is sequentially ordered in time. In the context of negotiation, the time series typically consists of the utilities of offers received from the opponent, but causally related series can also be used (e.g., perceived cooperativeness of the opponent over time). Learning the opponent’s bidding strategy then boils down to creating a forecast of the time series, using a set of statistical techniques and smoothing methods.
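As a minimal example of such a smoothing method, the sketch below forecasts the utility of the opponent’s next offer with Holt’s linear exponential smoothing. The choice of this particular method and of the smoothing constants is an assumption; it is not one of the techniques surveyed below.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, steps=1):
    """Forecast `steps` ahead with Holt's linear (level + trend) smoothing.

    series: utilities of the opponent's offers, oldest first (at least two values)
    alpha:  smoothing constant for the level
    beta:   smoothing constant for the trend
    """
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + steps * trend

# Hypothetical opponent that concedes 0.05 utility to us in every round.
received = [0.20, 0.25, 0.30, 0.35]
next_utility = holt_forecast(received)
```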

We identified four ways to do so: using neural networks, derivatives, signal processing methods, and Markov chains.

Artificial neural networks The most frequently used method to predict the opponent’s offers is to represent the opponent’s decision function by an artificial neural network. The network is first trained using a large database of previous negotiation exchanges and is then used to predict the next bid. Neural networks are very powerful and can be used to approximate complex functions; however, studying their structure will not in general give any additional insight into the function being approximated.

Oprea [146] was one of the first to demonstrate the potential of using neural networks to predict the opponent’s future offers in bilateral negotiations. The approach focuses on single-issue negotiations and only takes one of the negotiation sides into account. The input neurons are the values for the last five opponent’s bids, which means that the agent’s own offers are assumed to have no influence on the opponent’s behavior.
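The input encoding can be illustrated by the way training samples are cut from a negotiation trace. The helper below is a hypothetical sketch that only builds the (input, target) pairs for a window of five bids; it does not implement the network of [146].

```python
def windowed_samples(opponent_bids, window=5):
    """Build (last `window` opponent bids, next opponent bid) training pairs."""
    return [(opponent_bids[i:i + window], opponent_bids[i + window])
            for i in range(len(opponent_bids) - window)]

# Hypothetical single-issue trace of opponent prices.
bids = [100, 97, 95, 92, 90, 89, 87]
samples = windowed_samples(bids)
```

Note that the agent’s own offers do not appear anywhere in the samples, which is exactly the assumption that they do not influence the opponent.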

To predict the offers of a human negotiator, Carbonneau et al. [35] use an artificial neural network in a specific domain consisting of four issues. They extend their approach in [36], in which all possible pairs of issues are allowed to serve as input to the neural network. While this significantly complicates the structure of the neural network, it makes it possible to find patterns between issues. This also facilitates training an opponent model on a particular scenario and makes it easy to apply it to scenarios where one of the issues is removed from the negotiation domain. A similar approach is discussed by Lee and Ou-Yang [115], who are even able to predict the value for each issue; for this, four output neurons are created, each returning the value for one of the four issues. Although the model was trained using data derived from other opponents, the authors find a positive correlation between the actual and predicted values.

For the general multi-issue case, multilayer perceptrons (MLPs) can be used, as is done by Masvoula et al. [133]. MLPs are artificial neural networks where some nodes have a nonlinear activation function. Masvoula et al. test two networks in an experimental setting: a network in which each issue is approximated by a separate MLP, and a network where a single MLP is used for all issues. The numbers of input neurons and hidden-layer neurons are determined empirically for both networks, after which the networks are shown to reliably predict the opponent’s next offer, with the single MLP network resulting in the lowest mean error. In more recent work [132], Masvoula investigates the performance of two artificial neural networks that learn the opponent’s strategy without relying on historical knowledge. The first model is a simple MLP that is retrained every round, using the complete negotiation trace of the opponent. The second one is more advanced (and outperforms the first one), as the structure of the neural network is optimized in every round, using a genetic algorithm that rates neural networks based on their complexity and prediction error.

The methods above do not explicitly constrain the opponent’s strategy. If it is known that the opponent employs a time-dependent tactic, work by Rau et al. and Papaioannou et al. [151, 153, 160] can be used. Owing to the reduced search space of time-dependent tactics, Rau et al. [160] find that the concession tactic and weight of every issue offered by the opponent can be learned from this process in an exact manner. Papaioannou et al. compare the performance of five estimators for the opponent’s bidding strategy, of which we have already discussed three regression-based estimators in Sect. 5.4.1. Of the remaining two estimators, one is based on a multilayer perceptron neural network, and the other uses a radial basis function neural network [151, 153]. The latter estimator outperforms all other estimators when it comes to predicting the opponent’s future offers. It achieves the lowest overall error and its application results in the largest number of successful negotiations.

Derivatives Derivatives of the opponent’s concession function are another way to reveal information about the type of decision function used and the extent to which the opponent responds to different negotiation factors. In a single-issue negotiation, given a sequence of opponent offers \(b_i \in [0, 1]\), the derivatives of order k are defined as follows:

$$\begin{aligned} {\varDelta }^1 b_i= & {} b_{i+1} - b_i, \end{aligned}$$
(6)
$$\begin{aligned} {\varDelta }^{k+1} b_i= & {} {\varDelta }^{k} b_{i+1} - {\varDelta }^{k} b_i. \end{aligned}$$
(7)
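As a minimal sketch, Eqs. (6) and (7) amount to repeatedly taking differences of consecutive offers. The function name `differences` and the example offers below are illustrative assumptions, not part of the surveyed work.

```python
def differences(offers, order):
    """Return the order-th forward differences of a sequence of offers.

    Follows Eqs. (6) and (7): each pass replaces the sequence by the
    differences of consecutive elements, so the result has
    len(offers) - order entries.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    current = list(offers)
    for _ in range(order):
        current = [nxt - cur for cur, nxt in zip(current, current[1:])]
    return current


# Hypothetical single-issue offers in [0, 1], for illustration only.
offers = [0.0, 0.25, 0.5, 1.0]
first = differences(offers, 1)   # first-order differences
second = differences(offers, 2)  # second-order differences
```

Each additional order shortens the sequence by one element, so higher orders are only informative once enough offers have been observed.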

Brzostowski et al. [29] argue that there are two main factors contributing to the overall behavior of a negotiating agent: a time-dependent and a behavior-dependent component. They make a prediction for both components and combine the results to estimate the opponent’s behavior. The two predictions are combined based on two observational measures: the time influence metric and the behavior influence metric.

For the calculation of the time influence metric, it is assumed that a time-dependent tactic may be modeled by using a polynomial or an exponential function. If the opponent uses a pure time-dependent tactic, all derivatives should have the same sign:

$$\begin{aligned} \forall _{i, j, k} \, \mathrm {sgn}\big ({\varDelta }^k b_i\big ) = \mathrm {sgn}\big ({\varDelta }^k b_j\big ). \end{aligned}$$
(8)

For each kth order derivative, Brzostowski et al. calculate how strongly the time-dependent assumption holds by comparing the number of positive and negative signs within an order. If all differences of a given order are either positive or negative, the time-dependent criterion is fully satisfied; otherwise, the time-dependent assumption is violated. The results for all orders can be combined to create the time influence metric.
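The sign comparison can be sketched as follows. The exact scoring and aggregation used by Brzostowski et al. are not given here, so the per-order score (the normalized imbalance between positive and negative signs) and the plain average over orders are assumptions made purely for illustration, as are all function names.

```python
def forward_differences(values):
    """Differences of consecutive elements, as in Eq. (6)."""
    return [nxt - cur for cur, nxt in zip(values, values[1:])]


def sign_agreement(diffs):
    """Score in [0, 1]; 1.0 when all nonzero differences share one sign.

    Assumption: zero differences are ignored, and a sequence without any
    nonzero difference counts as fully consistent.
    """
    positives = sum(1 for d in diffs if d > 0)
    negatives = sum(1 for d in diffs if d < 0)
    total = positives + negatives
    if total == 0:
        return 1.0
    return abs(positives - negatives) / total


def time_influence(offers, max_order):
    """Average the sign agreement over orders 1..max_order (an assumption)."""
    scores = []
    current = list(offers)
    for _ in range(max_order):
        current = forward_differences(current)
        if not current:
            break
        scores.append(sign_agreement(current))
    if not scores:
        raise ValueError("need at least two offers")
    return sum(scores) / len(scores)


# Hypothetical offer sequences, for illustration only.
steady = time_influence([0.0, 0.25, 0.5, 1.0], max_order=2)
erratic = time_influence([0.0, 0.5, 0.25, 0.75], max_order=1)
```

A score near one suggests that a pure time-dependent tactic is plausible, while lower scores indicate sign changes within at least one order.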

The behavior influence metric measures to what extent the opponent responds to the agent’s actions. For each round i, the agent’s own concession \({\varDelta }^1 a_i\) is compared with the opponent’s concession \({\varDelta }^1 b_i\), resulting in a metric \(r_i\) for each round:

$$\begin{aligned} r_i = \frac{{\varDelta }^1 a_i}{{\varDelta }^1 b_i}. \end{aligned}$$
(9)

The results for each round are then aggregated in a weighted sum called r. If r is equal to one, then the opponent makes concessions of the same size. For \(r < 1\), the opponent is competitive, and if \(r > 1\), the opponent is cooperative. This metric can be combined with a measure for the monotonicity of the sequence \(r_i\) to create the behavior influence metric, analogous to the time-dependent metric.
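A minimal sketch of Eq. (9) and its aggregation is given below. The weights of the sum are not specified in the text, so the normalized weighting (uniform by default) is an assumption, as is the choice to skip rounds in which the opponent's offer does not change and the ratio is therefore undefined.

```python
def concession_ratios(own_offers, opp_offers):
    """Per-round ratios r_i of Eq. (9).

    Assumption: rounds with a zero opponent difference are skipped,
    because the ratio is undefined there.
    """
    own_steps = [n - c for c, n in zip(own_offers, own_offers[1:])]
    opp_steps = [n - c for c, n in zip(opp_offers, opp_offers[1:])]
    return [a / b for a, b in zip(own_steps, opp_steps) if b != 0]


def weighted_ratio(ratios, weights=None):
    """Aggregate the r_i into r using normalized weights (an assumption)."""
    if not ratios:
        raise ValueError("no defined ratios to aggregate")
    if weights is None:
        weights = [1.0] * len(ratios)
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


# Hypothetical offer traces, for illustration only.
own = [0.0, 0.25, 0.5]
opp = [0.0, 0.5, 1.0]
ratios = concession_ratios(own, opp)
r = weighted_ratio(ratios)
```

Weights that grow with the round index would put more emphasis on recent behavior; this is a design choice rather than something prescribed by the text.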

With this information, two predictions can be made of the future behavior of the opponent and then combined to yield the final forecast. The behavior-dependent prediction uses extrapolation of the behavior influence metric to predict the opponent’s next offer. The second prediction is solely based on time and uses the negotiation history to determine the opponent’s next offer. The prediction is based on the concavity and convexity of the opponent’s concession curve as measured by the differentials. The opponent’s time-dependent behavior can then be approximated by polynomials to make a prediction of the opponent’s future offers. Again, if it is known that the opponent uses a time-dependent strategy, more specific methods can be used to approximate the opponent’s concession curve, such as the derivative-based approach by Mok and Sundarraj [137].

Signal processing A third way to forecast the opponent’s offers is to employ techniques used in signal processing. This type of modeling technique has recently attracted attention from a number of negotiation researchers, and three main methods have been developed since 2010.

The first main approach is to use a Gaussian process to predict the opponent’s decision function. During the negotiation, the opponent’s bidding history is recorded as a series of ordered pairs of time and observed utility. Next, a Gaussian process regression technique is used to determine a Gaussian distribution of expected utility for each time step. Williams et al. [196, 197] use this technique to estimate the optimal concession rate in a multi-issue negotiation with time-based discounts. Their approach can handle a wider range of scenarios than the derivative-based methods discussed above, because the opponent tactic can be more complicated than a weighted combination of time- and behavior-dependent tactics. To counter noise, only a small number of time windows are sampled, from which only the maximum utility offered by the opponent is used to make the predictions. The strategy was implemented in the IAMhaggler2011 [199] agent, which finished third in ANAC 2011 [10]. The agent performed much better than the others on large domains, but its performance on small domains was only average.
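The noise-reduction step, keeping only the maximum offered utility per time window, can be sketched as follows. This is not the IAMhaggler2011 implementation: normalized time in [0, 1], equal-width windows, and the use of the window midpoint as the time stamp are all assumptions made for illustration. The regression step itself is omitted.

```python
def window_maxima(observations, n_windows):
    """Reduce (time, utility) pairs to one maximum per time window.

    Assumptions: time is normalized to [0, 1], windows have equal width,
    and each retained sample is stamped with its window midpoint.
    Windows without observations are left out.
    """
    best = [None] * n_windows
    for time, utility in observations:
        index = min(int(time * n_windows), n_windows - 1)
        if best[index] is None or utility > best[index]:
            best[index] = utility
    return [
        ((i + 0.5) / n_windows, u)
        for i, u in enumerate(best)
        if u is not None
    ]


# Hypothetical bidding history, for illustration only.
history = [(0.05, 0.2), (0.1, 0.3), (0.3, 0.25), (0.45, 0.4), (0.6, 0.35)]
samples = window_maxima(history, n_windows=4)
```

The resulting samples would then serve as the input to the regression or smoothing technique of choice.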

Chen and Weiss [44, 46] also predict the opponent’s preferences to determine the agent’s optimal concession rate. Similar to Williams et al., the maximum offered utility in a set of time windows is recorded. This time, discrete wavelet transformation is used to decompose the signal into two parts: an approximation and a detail part. The idea is that the former captures the trend of the signal, whereas the latter contains the noise, which is therefore omitted. After an initial smoothing procedure, cubic spline interpolation is used to make a prediction for future time windows. The end result is a smooth function that indicates the maximum utility that can be expected in the future.

An alternative method employed by Chen and Weiss relies on empirical mode decomposition and autoregressive moving averaging [45]. The same procedure is used to sample the opponent’s decision function, but now, empirical mode decomposition is used to decompose the sampled signal into a finite set of components, after which autoregressive moving averaging is used to predict the future behavior of each of these components.

Finally, when the opponent is expected to change its strategy over time without signaling this explicitly to the agent, the work by Ozonat and Singhal [150] can be used to estimate the opponent’s strategy in a multi-issue negotiation. Using switching linear dynamical systems, a technique commonly used in signal processing literature to model dynamical phenomena, the opponent’s decision function is predicted in terms of what utility the agent can expect in the future.

Markov chains The final time series forecasting method relies on Markov chains. The idea is that the set of opponent strategies is known; however, it is undisclosed when the opponent changes its strategy. The strategies make up the states of a Markov chain, where the transition matrix represents the probability of going from one state (a strategy) to another. Narayanan and Jennings [140] use this method to model the opponent’s strategy in a single-issue negotiation and apply Bayesian learning to estimate the transition matrix, where each hypothesis represents a possible transition matrix. The hypotheses are updated each round using the offers received from the opponent, and are then used to derive a counter-strategy.

6 Measuring the quality of an opponent model

In the previous section, we provided an overview of several learning methods for each opponent attribute. A natural question with regard to agent design is: which of the depicted models is best for each attribute? Unfortunately, it is impossible to provide a conclusive answer to this, as most authors evaluated their opponent model in their own setting, and relative to their own baseline. A valuable direction for future work is therefore to compare the quality of these models in a common setting, or in any case, to use the existing models as baselines when designing new learning techniques.

However, even if we fix a common negotiation setting for every model, quantifying the quality of an opponent model is not straightforward, as a large number of different quality measures are being used, each with their own advantages and shortcomings, which impedes a fair comparison of different approaches. In this section, we provide an overview of the different types of measures found in the literature. To do so, we surveyed the most popular quality measures currently in use, and we show how they relate to the main benefits of opponent modeling.

In general, we found that the quality of an opponent model can be measured in two main ways: accuracy measures (Sect. 6.1), which measure how closely the opponent model resembles reality; and performance measures (Sect. 6.2), which measure the performance gain when a negotiation strategy is supplemented by a model. We provide an overview in Tables 3 and 4 of both types of quality measures. Moreover, based on this overview, we recommend in Sect. 6.3 which measures to use to evaluate an opponent model given a specific modeling aim.

6.1 Accuracy measures for opponent models

Accuracy measures are direct measures of opponent model quality, as they quantify the difference between the estimate and the estimated. We found accuracy measures for preference modeling methods (Sect. 5.3) and for strategy prediction methods (Sect. 5.4).

To the best of our knowledge there are no accuracy measures for the acceptance strategy and deadline, although some existing measures can be used for these instances as well. We will come back to this when we discuss the other accuracy measures below. Table 3 provides an overview of metrics found in the surveyed work.

Table 3 Overview of accuracy measures used in the surveyed work

Similarity between issue weights We can measure the accuracy of models that estimate the issue weights of the opponent’s preference profile (assuming linear additive utility functions, see Sect. 5.3.1) in several ways. All of them use a distance metric between the issue weights \(w = (w_1, \ldots , w_n)\) of the real opponent preferences \(u_{op}\) and the issue weights \(w' = (w'_1, \ldots , w'_n)\) of the estimated preferences \(u'_{op}\). One way to do so is to measure the distance between the issue weight vectors [90]:

$$\begin{aligned} d_{\mathrm {Euclidean}}(w, w')= \sqrt{\sum _{i = 1}^n (w_i - w'_i)^2}. \end{aligned}$$
(10)

Of course, this measure can be used for scalars as well. When modeling the opponent’s deadline (or reservation value) \(x \in \mathbb {R}\) with an estimate \(x'\), Eq. (10) simplifies to \(d_{\mathrm {Euclidean}}(x, x')= |x - x'|\).

Another way is to check whether the issue weights are ranked correctly [84] by evaluating all possible pairs of issues \(i_1, \ldots , i_n\):

$$\begin{aligned} d_{\mathrm {rank}}(w, w')=\frac{1}{n^2} \sum _{j=1}^n \sum _{k=1}^n c_{w, w'}(i_k, i_j), \end{aligned}$$
(11)

where \(c_{w, w'}\) is the conflict indicator function, which is equal to one when the ranking of the weights of issues \(i_k\) and \(i_j\) differs between the two profiles, and zero otherwise. This measure is particularly useful when the utility values are of less concern than their relative importance. An alternative is to measure the correlation between the weight vectors [84]:

$$\begin{aligned} d_{\mathrm {Pearson}}(w, w') = \frac{\displaystyle \sum _{i=1}^n (w_i - \overline{w})(w_i' - \overline{w'})}{\sqrt{\displaystyle \sum _{i=1}^n (w_i - \overline{w} )^2 \sum _{i=1}^n (w_i' - \overline{ w'})^2}}. \end{aligned}$$
(12)

Note that this expression may be undefined, for example when all weights are equal.
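The three weight-based measures of Eqs. (10) to (12) can be sketched as follows. The function names and example weights are illustrative assumptions. The conflict indicator is implemented by comparing the sign of each pairwise comparison, which treats a tie in one profile and a strict ordering in the other as a conflict; this handling of ties is an assumption.

```python
import math


def euclidean_distance(w, w_est):
    """Eq. (10): Euclidean distance between two weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_est)))


def _order(values, j, k):
    """Return -1, 0, or 1 depending on how values[k] compares to values[j]."""
    return (values[k] > values[j]) - (values[k] < values[j])


def rank_distance(w, w_est):
    """Eq. (11): fraction of ordered issue pairs that are ranked differently."""
    n = len(w)
    conflicts = sum(
        1
        for j in range(n)
        for k in range(n)
        if _order(w, j, k) != _order(w_est, j, k)
    )
    return conflicts / (n * n)


def pearson(x, y):
    """Eq. (12): Pearson correlation, or None when it is undefined."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denominator = math.sqrt(var_x * var_y)
    if denominator == 0:
        return None  # e.g., all weights equal in one of the vectors
    return cov / denominator


# Hypothetical real and estimated issue weights, for illustration only.
real_weights = [0.5, 0.3, 0.2]
estimated_weights = [0.3, 0.5, 0.2]
d_euclid = euclidean_distance(real_weights, estimated_weights)
d_rank = rank_distance(real_weights, estimated_weights)
correlation = pearson(real_weights, estimated_weights)
```

In this example the two most important issues are swapped, which shows up as a nonzero rank distance and a weak correlation even though the third weight is estimated exactly.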

Similarity between bidding strategies For opponent models that estimate the opponent’s bidding strategy (Sect. 5.4), the accuracy of the model can be determined by comparing two equally sized vectors of length N, where one vector x denotes the actual behavior of the opponent, and the other vector \(x'\) is the predicted behavior; e.g., the utilities of received offers at every time slot, or the values the opponent chooses for a particular issue. One way to measure the similarity between the two vectors is to calculate the mean squared error [115, 137]:

$$\begin{aligned} d_{\mathrm {MSE}}(x, x')= \frac{1}{N} \sum _{i = 1}^{N} (x_i - x_i')^2. \end{aligned}$$
(13)

A possible disadvantage of this metric is that it may overemphasize outliers. In this case, the mean absolute error is a viable alternative [115]:

$$\begin{aligned} d_{\mathrm {MAE}}(x, x')= \frac{1}{N} \sum _{i = 1}^{N} \left| x_i - x_i'\right| . \end{aligned}$$
(14)

Both metrics quantify the error as a positive value, which has the disadvantage that the sign is lost: in some settings a positive error may be worse than a negative error. A metric taking this into account is the percentage of error [115], which is normally calculated separately for every element:

$$\begin{aligned} d_{\mathrm {PCE}}(x_i, x'_i)= \frac{x'_i}{x_i} - 1. \end{aligned}$$
(15)

Again, the exact value of the measure may not matter; e.g., when the negotiation agent tries to predict if the opponent will concede. In this case the correlation between the two vectors can be measured, for example by using Eq. (12) [115, 132, 133].
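The error measures of Eqs. (13) to (15) are simple to compute; a minimal sketch follows. The function names and example values are illustrative assumptions. Note that the percentage of error is undefined when an actual value is zero, a case this sketch does not handle.

```python
def mean_squared_error(actual, predicted):
    """Eq. (13): average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Eq. (14): average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percentage_errors(actual, predicted):
    """Eq. (15), applied element-wise; the sign of each error is preserved."""
    return [p / a - 1 for a, p in zip(actual, predicted)]


# Hypothetical actual and predicted utilities, for illustration only.
actual = [0.5, 0.5, 1.0]
predicted = [0.25, 0.75, 1.0]
mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
pce = percentage_errors(actual, predicted)
```

In the example, the first two predictions are equally far off in absolute terms, but only the percentage of error reveals that one is an underestimate and the other an overestimate.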

Similarity between preference profiles When opponent models estimate the opponent’s preferences fully (as described in Sect. 5.3), the quality of these models depends on the similarity between the real opponent’s preference profile \(u_{op}\) and the estimated profile \(u_{op}'\) for all bids \(\omega \) in the outcome space \({\varOmega }\). Suppose the opponent uses a utility function \(u_{op}(\omega )\) to calculate its utility for bid \(\omega \), then we define the opponent model’s estimate of this function as \(u'_{op}(\omega )\).

One approach is to calculate the absolute difference between all outcomes in \({\varOmega }\) [12]:

$$\begin{aligned} d_{\mathrm {abs}}(u_{op},u_{op}')=\frac{1}{|{\varOmega }|} \sum _{\omega \in {\varOmega }} |u_{op}(\omega ) - u'_{op}(\omega )|. \end{aligned}$$
(16)

A shortcoming of this approach is that the result is not invariant to scaling and translation. An alternative is to use the ranking distance of bids [84], a measure that compares all preference orderings in a pairwise fashion, which is especially useful when ordinal preferences are involved:

$$\begin{aligned} d_{\mathrm {rank}}(u_{op},u_{op}')=\frac{1}{|{\varOmega }|^2} \sum _{\omega \in {\varOmega }, \omega ' \in {\varOmega }} c_{\prec u, \prec u'} (\omega , \omega '), \end{aligned}$$
(17)

where \(c_{\prec u, \prec u'}\) is the conflict indicator function, which is equal to one when the ranking of the outcomes \(\omega \) and \(\omega '\) differs between the two profiles, and zero otherwise. Similarly, Buffett et al. count the number of correctly estimated preference relations [31, 32]. A disadvantage of these approaches is their scalability, because all possible outcome pairs need to be compared. This problem can be overcome by using a Monte Carlo simulation; however, a more efficient solution can be to use the Pearson correlation of bids [84], which is defined as follows:

$$\begin{aligned} d_{\mathrm {Pearson}}(u_{op}, u_{op}') = \frac{ \displaystyle \sum _{\omega \in {\varOmega }} (u_{op}(\omega ) - \overline{u_{op}})(u_{op}'(\omega ) - \overline{ u_{op}' }) }{\sqrt{\displaystyle \sum _{\omega \in {\varOmega }} (u_{op}(\omega ) - \overline{u_{op}} )^2 \sum _{\omega \in {\varOmega }} (u_{op}' (\omega ) - \overline{u_{op}'})^2}}. \end{aligned}$$
(18)

In this formula \(\overline{u_{op}}\) is the average utility of all possible outcomes for the opponent, and \(\overline{u_{op}'}\) the estimated average utility. As for the weight vectors, a downside of this measure, although unlikely to occur in practice, is that it is not defined for all inputs, for example when the opponent’s utility is estimated to be constant.
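The absolute difference of Eq. (16) and the ranking distance of Eq. (17) can be sketched as follows, together with a Monte Carlo variant that samples outcome pairs instead of enumerating them. Representing utility functions as Python callables, sampling pairs uniformly with replacement, and all names and example profiles are assumptions made for illustration.

```python
import random


def absolute_difference(outcomes, u_real, u_est):
    """Eq. (16): mean absolute utility difference over the outcome space."""
    return sum(abs(u_real(o) - u_est(o)) for o in outcomes) / len(outcomes)


def _order(u, first, second):
    """Return -1, 0, or 1 depending on how u ranks first against second."""
    a, b = u(first), u(second)
    return (a > b) - (a < b)


def ranking_distance(outcomes, u_real, u_est):
    """Eq. (17): fraction of ordered outcome pairs that are ranked differently."""
    conflicts = sum(
        1
        for first in outcomes
        for second in outcomes
        if _order(u_real, first, second) != _order(u_est, first, second)
    )
    return conflicts / len(outcomes) ** 2


def sampled_ranking_distance(outcomes, u_real, u_est, samples, seed=0):
    """Monte Carlo estimate of Eq. (17) from uniformly sampled pairs."""
    rng = random.Random(seed)
    conflicts = 0
    for _ in range(samples):
        first, second = rng.choice(outcomes), rng.choice(outcomes)
        if _order(u_real, first, second) != _order(u_est, first, second):
            conflicts += 1
    return conflicts / samples


# Hypothetical three-outcome domain, for illustration only.
outcomes = ["a", "b", "c"]
real = {"a": 1.0, "b": 0.5, "c": 0.0}
estimate = {"a": 0.5, "b": 1.0, "c": 0.0}
d_abs = absolute_difference(outcomes, real.get, estimate.get)
d_rank = ranking_distance(outcomes, real.get, estimate.get)
d_rank_mc = sampled_ranking_distance(outcomes, real.get, estimate.get, 1000)
```

The exact variant needs a number of comparisons quadratic in the size of the outcome space, whereas the cost of the sampled variant is set by the number of samples, at the price of an approximate result.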

All preference profile metrics can also be applied to assess the quality of acceptance strategy models. After each negotiation round, the opponent’s acceptance strategy can be asked to provide the acceptance probability for a set of bids. Next, for these bids the actual and predicted acceptance probability can be compared using one of the metrics above.

6.2 Performance measures for negotiation strategies

The ultimate aim of employing an opponent model is to increase overall performance of the negotiation, which is why performance measures are the most popular quality measure. The most popular way is to measure the gain in utility of the outcomes due to the usage of an opponent modeling technique. Other measures that the agent designer might choose are the duration of the negotiation (i.e., how fast the agent is able to reach agreements), or fairness of the outcome (i.e., whether the agreement satisfies all negotiation parties).

Sometimes performance measures can be incorporated directly into the utility function, as is the case for discounted utility through time. However, it is usually advisable to have multiple independent performance measures available, especially when the designer wishes to assess several aspects of the negotiation outcome.

To measure the quality of an opponent model, the model can be applied by a set of agents that compete against various opponents on a number of negotiation domains. Ideally, the opponent model is tested in combination with multiple negotiation strategies to minimize the influence of how the model is applied by the strategy. Note that the measurements strongly depend on the negotiation setting [10], which therefore should be chosen with care: an opponent model may appear to be of low quality when its assumptions are not satisfied. This effect can be minimized by testing the model in a large and balanced set of negotiation settings, as discussed in [11, 12]. Furthermore, as performance measures only consider the quality of the outcome, we recommend also including the accuracy measures of Sect. 6.1 for benchmarking purposes.

Table 4 provides an overview of the types of performance measures we found in existing work.

Table 4 Overview of performance measures used in the surveyed work

Average utility Average utility is by far the most popular performance measure. A common application is to consider the average utility of an agent with and without an opponent model against a group of opponents on several domains (see for example [45, 75, 143]).

Distance to a fair outcome Similar to the average utility, a large number of authors are concerned with achieving a fair outcome. Fairness is an important aspect, especially when the parties expect to conduct future negotiations. The Nash solution \(\omega _{\mathrm {Nash}}\) in a bilateral negotiation is defined as the outcome that maximizes the product of the utilities:

$$\begin{aligned} \omega _{\mathrm {Nash}} = \mathop {\mathrm {arg\,max}}\limits _{\omega \in {\varOmega }} u_{a}(\omega ) \cdot u_{b}(\omega ). \end{aligned}$$
(19)

If \(u_{\max _a}\) and \(u_{\max _b}\) are the highest possible utilities achievable by agents a and b in a negotiation, then the Kalai-Smorodinsky solution is defined as the Pareto optimal bid \(\omega \) for which:

$$\begin{aligned} \frac{u_{\max _a}}{u_{\max _b}} = \frac{u_{a}(\omega )}{u_{b}(\omega )}. \end{aligned}$$
(20)

Distance to a fair outcome is then calculated as the average distance to a fair solution, such as the Nash solution [11, 81, 82, 137] or Kalai-Smorodinsky [11, 81, 82].

Distance to Pareto frontier An opponent model of the opponent’s preferences aids in identifying Pareto optimal bids. For this type of model—assuming it is applied by a bidding strategy that takes the opponent’s utility into account—the distance to the nearest Pareto optimal bid directly correlates with the model’s quality (see for example [11, 143, 155]). Minimizing this distance to the Pareto-optimal frontier improves fairness and the probability of acceptance.
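The fair-outcome and Pareto-distance measures can be illustrated on a small discrete outcome set, represented as pairs of utilities for agents a and b. In the sketch below, all names and the example points are assumptions. In particular, the equality of Eq. (20) may not hold exactly for any bid in a discrete domain, so the sketch picks the Pareto optimal bid whose utility ratio is closest to the target ratio, and it measures distances as Euclidean distances in utility space.

```python
import math


def pareto_frontier(points):
    """Keep the utility pairs that no other pair weakly dominates."""
    return [
        p
        for p in points
        if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)
    ]


def nash_point(points):
    """Utility pair that maximizes the product of utilities, as in Eq. (19)."""
    return max(points, key=lambda p: p[0] * p[1])


def kalai_smorodinsky_point(points):
    """Pareto optimal pair whose utility ratio is closest to that of Eq. (20).

    Assumption: the closest ratio is used because exact equality may not
    be attainable on a discrete outcome set.
    """
    target = max(p[0] for p in points) / max(p[1] for p in points)
    candidates = [p for p in pareto_frontier(points) if p[1] > 0]
    return min(candidates, key=lambda p: abs(p[0] / p[1] - target))


def distance(p, q):
    """Euclidean distance between two points in utility space."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def distance_to_pareto(point, points):
    """Distance from a point to the nearest Pareto optimal point."""
    return min(distance(point, q) for q in pareto_frontier(points))


# Hypothetical utility pairs (u_a, u_b) of all outcomes, for illustration only.
points = [(1.0, 0.0), (0.8, 0.4), (0.6, 0.6), (0.4, 0.8), (0.0, 1.0), (0.3, 0.3)]
agreement = (0.3, 0.3)
nash = nash_point(points)
ks = kalai_smorodinsky_point(points)
fair_gap = distance(agreement, nash)
pareto_gap = distance_to_pareto(agreement, points)
```

Averaging `fair_gap` or `pareto_gap` over many negotiation sessions yields the corresponding performance measure for a given agent.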

Joint utility An alternative method to measure the fairness of an outcome is to calculate the joint utility. The majority of the authors simply use the sum of the utility of the final outcome for the agents (see for example [123, 124]). An alternative used by several authors [146, 205–207] is to consider the normalized joint utility:

$$\begin{aligned} u_{\mathrm {joint}} = \frac{(P - R_s) (R_b - P)}{(R_b - R_s)^2}. \end{aligned}$$
(21)

In this equation, P is the agreed upon price, and \(R_b\) and \(R_s\) are the reservation prices of the buyer and seller respectively. Note that this definition is only applicable to single-issue negotiations. An alternative measure for multi-issue negotiations used by Jazayeriy et al. [90] is the geometric mean:

$$\begin{aligned} u_{\mathrm {joint}} = \sqrt{u_a \cdot u_b}, \end{aligned}$$
(22)

where \(u_a\) and \(u_b\) are the utilities achieved by the agents. An attractive property of this metric is that when the utilities are highly unbalanced, it reflects unfairness better than the sum of the utilities does.
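Both joint utility measures, Eqs. (21) and (22), can be sketched in a few lines. The function names and the example prices and utilities are illustrative assumptions.

```python
import math


def normalized_joint_utility(price, reserve_buyer, reserve_seller):
    """Eq. (21): normalized joint utility of a single-issue price agreement."""
    spread = reserve_buyer - reserve_seller
    if spread == 0:
        raise ValueError("reservation prices must differ")
    return (price - reserve_seller) * (reserve_buyer - price) / spread ** 2


def geometric_mean_utility(u_a, u_b):
    """Eq. (22): geometric mean of the two agents' utilities."""
    return math.sqrt(u_a * u_b)


# Hypothetical reservation prices and agreements, for illustration only.
midpoint_deal = normalized_joint_utility(75.0, reserve_buyer=100.0, reserve_seller=50.0)
skewed_deal = normalized_joint_utility(60.0, reserve_buyer=100.0, reserve_seller=50.0)

# Two outcomes with the same utility sum but different balance.
balanced = geometric_mean_utility(0.5, 0.5)
unbalanced = geometric_mean_utility(0.9, 0.1)
```

The two utility pairs in the example have the same sum, yet the geometric mean of the unbalanced pair is clearly lower. Likewise, the normalized joint utility is largest when the price lies midway between the two reservation prices.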

Percentage of agreements An opponent model may lead to better bids being offered to the opponent, possibly avoiding non-agreement. In situations where an agreement is always better than no agreement, the percentage of agreements is a direct measure of success (see for example [9, 30, 72, 73]). An important disadvantage is that the acceptance ratio does not capture the quality of the agreement; it is therefore advisable to also measure the average utility.

Buffett et al., Mudgal and Vassileva, and Agrawal and Chari use a related measure in which they calculate how often one agent outperforms the other with regard to the final outcome [1, 30, 139]. A disadvantage of this method is that an agent might outperform other agents, but still reach a bad outcome. An alternative metric is applied by Robu and Poutré [165, 166], which calculates how often an outcome is reached that maximizes social welfare.

Robustness Many of the performance measures only give a fairly narrow view of the performance of the agents, as they do not consider the interactions between different strategies. For instance, an agent may be exploitable by opponent strategies that were not expected in the design phase, requiring a switch to a different strategy [10]. To make this notion precise, game theory techniques can be combined with evolutionary modeling [10, 43]—referred to as evolutionary game theory (EGT)—to measure the robustness of negotiation strategies. EGT is used to measure the distribution of negotiation strategies evolving over time, assuming that the players may switch their agent’s strategy based on the pay-off matrix to maximize their utility against their opponents. The authors show that some agents that work well in a static setup perform poorly in an open environment in which players can change strategy.

Time of agreement Various authors measure the duration of the negotiation (e.g., [103, 149, 155]), or the communication load, because in practical settings there is often a non-negligible cost associated with both. Opponent models can lead to earlier agreements, and thereby reduce costs. An important disadvantage of this metric is that while an opponent model may lead to an earlier agreement, the quality of the outcome for the agent might be lower.

Trajectory analysis The quality of bidding strategies can be measured by analyzing the percentage and relative frequency of certain types of moves [81]. For example, unfortunate moves are offers that decrease the utility for both agents at the same time. Theoretically, a perfect opponent model of the opponent’s preferences would allow an agent to prevent any such unfortunate moves. A disadvantage of this method is that it highly depends on the concession strategy that is used in combination with the opponent model.
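As a minimal sketch, the fraction of unfortunate moves in a bidding trace can be computed as follows. Representing each offer as a pair of utilities (for the bidding agent and for its opponent) is an assumption, as are the function name and the example trace; other move types are not classified here.

```python
def fraction_unfortunate(trace):
    """Fraction of moves that lower the utility of both agents.

    `trace` lists, for consecutive offers of one agent, the pair
    (own utility, opponent utility) of each offer.
    """
    moves = list(zip(trace, trace[1:]))
    if not moves:
        raise ValueError("need at least two offers")
    unfortunate = sum(
        1
        for (own_before, opp_before), (own_after, opp_after) in moves
        if own_after < own_before and opp_after < opp_before
    )
    return unfortunate / len(moves)


# Hypothetical trace of four offers, for illustration only.
trace = [(1.0, 0.0), (0.75, 0.25), (0.5, 0.125), (0.5, 0.5)]
share = fraction_unfortunate(trace)
```

In this trace only the second move lowers both utilities, so one of the three moves is unfortunate.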

6.3 Benchmarking opponent models

As a starting point towards quantifying the quality of existing opponent models, and thereby the transition from theoretical agents to practical negotiation agents, this section provides guidelines on how to select the appropriate quality measures for a given opponent model. These guidelines are based on the overview provided above, and our previous work on quantifying the quality of opponent models that estimate the opponent’s preference profile [11, 12]. We argue that a benchmark should consist of three components: a set of accuracy measures, a set of performance measures, and a fair tournament setup.

Both types of measures should be included, as a high accuracy demonstrates the approach is successful in its own right, independent of any other factors, while a good performance shows that the model is correctly applied, and fits into the agent design as a whole.

The construction of a fair negotiation setup is a challenge in its own right and worthy of a survey of its own. The negotiation strategies and negotiation scenarios of an experimental setup should be carefully selected to not favor particular types of modeling approaches. On the other hand, the setup should be representative of what can be expected in practical negotiation. Another challenge is that a balanced tournament requires access to a large set of negotiation strategies and scenarios designed for an identical negotiation setting. We refer to [11, 12], which discuss benchmarks for quantifying the performance and accuracy of a large set of preference models.

Table 5 An overview of performance measures applied in the literature and how they relate to opponent modeling aims

In Sect. 5, we distinguished four types of opponent models: models for the acceptance strategy, deadline, preference profile, and bidding strategy. Below, we provide accuracy measure recommendations for each one of them.

  1. Acceptance strategy For models that estimate the reservation value (Sect. 5.1.1), we recommend using the percentage of error (Eq. (15)), as it quantifies the signed distance, which is especially important for this type of model: a reservation value that is estimated too low may lead to non-agreement. For the models discussed in Sect. 5.1.2 that estimate the acceptance probability for every bid in the outcome space, distance metrics such as the ranking distance of bids (Eq. (17)) or the more scalable Pearson correlation of bids (Eq. (18)) are most suitable, assuming it is possible to request the opponent’s acceptance probability of every bid.

  2. Deadline There is a close relation between the reservation value and the deadline, as both are scalars that become easier to estimate near the end of a negotiation. For deadlines, it is particularly important not to overestimate the actual deadline, as this can lead to non-agreement and hence decreased performance. Therefore, as for the reservation value, we recommend using the percentage of error (Eq. (15)).

  3. Preference profile We discussed four types of approaches to estimate the preference profile. Some assess only the issue weights; others model a larger part of the preference relations, up to the complete preference profile. For the issue weights (Sect. 5.3.1), we suggest using the Euclidean distance between the actual and estimated issue weights (Eq. (10)), as in most cases it is crucial to get them exactly right, and it is not sufficient if they are merely correlated with the real weights or in the right order. For larger subspaces of the preference profile, however, less precision suffices. For these other approaches, we propose using the ranking distance of bids (Eq. (17)) or, especially when computational performance is an issue, the Pearson correlation of bids (Eq. (18)).

  4. Bidding strategy All bidding strategy models produce an estimate of the bids that will be offered, or of the utility of these bids. The distance between the actual and predicted utility is best expressed as a single value using the mean squared error [115, 137] (Eq. (13)). In contrast to the mean absolute error (Eq. (14)), this metric puts more emphasis on outliers, which we believe is important as this type of model is often used to exploit the opponent. We do not recommend using the percentage of error (Eq. (15)), as it is not straightforward how to aggregate the per-element values, and it does not allow for easy comparison of models.

While the selection of an accuracy measure solely depends on the specific type of the opponent model, the right choice for a performance measure depends on the aim of the agent designer. Therefore, Table 5 gives an overview of how existing work has assessed the performance of opponent models, together with the associated aim(s) with which the models have been applied. Depending on the aim, we recommend the following measures:

  1. Minimize negotiation cost The negotiation cost depends on the time passed before reaching an agreement and the percentage of successful negotiations. The straightforward measure to use for this aim is the time of agreement (measured in terms of rounds or real time passed) when the utility is discounted or when negotiation rounds incur a cost. When there are multiple negotiations, the percentage of agreements needs to be taken into account. To augment these measures, we recommend also measuring the average utility to ensure that negotiation performance is not sacrificed to minimize costs.

  2. Adapt to the opponent By adapting, an agent primarily aims to optimize its average utility. In addition to this metric, we recommend using trajectory analysis to gain better insight into the application of the opponent model by the agent, and how it is influenced by the opponent. Assuming a multitude of possible opponent strategies, the robustness of the strategy could be evaluated using evolutionary game theory to validate that, conversely, opponent adaptation does not lead to exploitation of the agent.

  3. Reach win-win agreements The quality of a win-win agreement can be expressed in many ways, from its distance to the Pareto frontier to its joint utility. However, many of the important characteristics of a win-win agreement are already captured by measuring the distance to a fair outcome; therefore, we propose this measure to quantify win-win solutions.

7 Conclusion

The research field of opponent modeling in negotiation is constantly evolving, driven by more than twenty years of interest in automated negotiation and negotiation agent design in particular. In this work, we survey opponent modeling techniques used in bilateral negotiations, and we discuss all possible ways in which opponent models are employed to benefit negotiation agents. There are essentially two main opponent model categories: models that learn what the opponent wants, in terms of its reservation value, deadline, and preference order, and secondly, models that learn what the opponent will do, in terms of the opponent’s bidding strategy and acceptance strategy. In this comprehensive survey, we create a taxonomy of currently existing opponent models, based on what opponent attributes are learned, which specific methods are used to model the attribute, and finally, what general learning techniques are applied in each case.

There exists a clear relation between every opponent attribute and the corresponding learning techniques. Bayesian learning is the most popular technique, and can be applied to learn any of the four opponent attributes we distinguish in our exposition. Time series forecasting and regression techniques are the second most popular techniques, and can be used whenever a trend can be established in the opponent’s behavior, which holds true for almost all learning tasks, except preference profile estimation. Other techniques, such as reinforcement learning, have not been used so far, but this might be only a matter of time. Learning what the opponent will do is even more challenging than finding out what the opponent wants, because the former depends on the latter. This is why the most advanced machine learning techniques are required for this case, ranging from artificial neural networks to signal processing techniques.

For each type of opponent attribute, there are a large number of different learning methods to choose from. Generally, it is infeasible to compare these models with each other, as most authors evaluate them in their own specific negotiation setting. This is no surprise, given the fact that no universal model of negotiation has been adopted yet, let alone a common method to compare different learning techniques. This seems to call for a negotiation benchmark that could reliably compare the approaches that have been taken so far and those that will emerge in the future. As a first step, an additional contribution of this work is that we discuss all performance and accuracy metrics that are used to evaluate the quality of opponent models, and how each metric helps to quantify the opponent model benefits we have outlined in this survey. Consistently applying these measures would greatly improve the comparability of results, and would provide insight into possible improvements to existing models and how they can be combined to augment each other.

The many different approaches and assumptions of the opponent modeling literature raise the additional challenge of pinpointing gaps in current research. Based on our analysis, we found a number of additional directions for future research:

Learning the opponent’s reservation value and deadline in multi-issue negotiations against arbitrary opponents. Currently, all opponent models that estimate the reservation value or deadline do so in a single-issue negotiation about a quantitative issue, using Bayesian learning and/or non-linear regression. Furthermore, most models assume that the general form of the opponent’s decision function is known, except for its parameters. It would be interesting to extend current techniques to negotiations in which the opponent is not constrained to use a particular strategy and the opponent’s preference order is unknown beforehand. Since this is a multi-layered learning process of preference learning and strategy prediction, neural networks would be a natural candidate for this task.

Modeling the opponent’s preference profile in highly complex environments. We found that many models assume linear additive utility functions when estimating the opponent’s preferences. In practice, there are often interaction effects between issues, which cannot be captured using linear utility functions. While we discussed some models able to capture non-linear preferences, scalability appears to be a general issue. A direction for future work is to develop opponent models that can model non-linear preferences in large outcome spaces.
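
The limitation of linear additive utility functions can be made concrete with a small Python sketch. The two-issue domain, the weights, and the pairwise bonus term are hypothetical and serve only as an illustration; they are not taken from any of the surveyed models, and the resulting utilities are not normalized.

```python
from typing import Dict, Tuple

Bid = Dict[str, str]                       # issue name -> chosen value
Weights = Dict[str, float]                 # issue name -> issue weight
Evaluations = Dict[str, Dict[str, float]]  # issue name -> value -> evaluation
# Bonus awarded when two specific (issue, value) choices occur together.
Bonuses = Dict[Tuple[Tuple[str, str], Tuple[str, str]], float]


def linear_additive_utility(bid: Bid, weights: Weights, evals: Evaluations) -> float:
    """Weighted sum of per-issue evaluations; issues contribute independently."""
    return sum(weights[issue] * evals[issue][bid[issue]] for issue in weights)


def utility_with_interaction(
    bid: Bid, weights: Weights, evals: Evaluations, bonuses: Bonuses
) -> float:
    """Linear additive utility plus bonuses for specific value combinations."""
    bonus = sum(
        b
        for ((i1, v1), (i2, v2)), b in bonuses.items()
        if bid[i1] == v1 and bid[i2] == v2
    )
    return linear_additive_utility(bid, weights, evals) + bonus


weights = {"price": 0.5, "delivery": 0.5}
evals = {"price": {"low": 1.0, "high": 0.0}, "delivery": {"fast": 1.0, "slow": 0.0}}
# Hypothetical interaction: fast delivery is worth extra when the price is high.
bonuses = {(("price", "high"), ("delivery", "fast")): 0.2}


def gain_from_fast_delivery(price: str, with_interaction: bool) -> float:
    """Utility gained by switching delivery from slow to fast at a fixed price."""
    fast, slow = {"price": price, "delivery": "fast"}, {"price": price, "delivery": "slow"}
    if with_interaction:
        return (utility_with_interaction(fast, weights, evals, bonuses)
                - utility_with_interaction(slow, weights, evals, bonuses))
    return (linear_additive_utility(fast, weights, evals)
            - linear_additive_utility(slow, weights, evals))


for price in ("low", "high"):
    print(price, gain_from_fast_delivery(price, False), gain_from_fast_delivery(price, True))
```

Under the linear additive function, the gain from faster delivery is the same at every price. With the bonus term it depends on the price, which is exactly the kind of dependency that a linear additive opponent model cannot represent.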

Estimating the opponent’s acceptance strategy in multi-issue negotiations with arbitrary opponents. We found few models that estimate the opponent’s acceptance strategy in multi-issue negotiations, and those that do either assume the opponent’s preferences are known (or can be easily estimated) or ignore the effect of time pressure. An important direction for future work is a more general solution method to estimate what and when the opponent will accept, and to define measures for their accuracy.

Development of accuracy metrics for models estimating the reservation value, deadline, and acceptance strategy. As we discussed in Sect. 6, accuracy measures are still lacking for these three types of models. The first two types could be calculated as the distance between the actual and estimated value, but more advanced methods are not yet determined. Formulating an accuracy measure for the acceptance strategy may be the most challenging of all, and could be an important step towards more advanced strategies for multi-session negotiations.

Finally, we focused on learning about the opponent, and not so much on the reverse problem: what the opponent learns, or should learn, about us; i.e., second-order opponent models. On the one hand, we might actively attempt to keep the opponent from learning about us to avoid exploitation; on the other hand, there is a balance to be maintained, as it is also important for both parties to jointly explore the outcome space to reach a win-win agreement. Models that quantify the opponent’s knowledge about us can act as an essential link in combining different opponent modeling techniques. For example, we could employ a well-established preference estimation method to assess how the opponent acted according to its own utility. Then, we can apply one of the regression or forecasting techniques we have discussed to deduce the opponent’s strategy according to its own utility. In combination with what the opponent knows about us, this can predict what the agent will do in the future, this time according to our utility. In such a setting, it may be worthwhile to study techniques that are able to reveal exactly the right kind of information in order to reach the most beneficial outcome.