
1 Introduction

Clients requiring complex Information Technology (IT) services typically issue a request for proposals to fulfil their demands. Service providers respond with a proposed solution and enter a competitive bidding process in an attempt to win the contract [1]. One of the critical factors in this process is the pricing of the various services included in the proposal, though it is not the only factor in winning the deal [1, 2].

Traditionally, solutioners responsible for preparing the solution proposal have followed a “bottom-up” pricing approach: given the hierarchical nature of IT services [6], they obtain a quote by determining the costs and prices of each detailed service component included in the solution and then summing them up [3, 4]. This is often a time-consuming and resource-intensive task, because a complex service proposal can contain thousands of items to be priced at the lowest level. In our previous work [3, 4], we proposed a “top-down” pricing method for IT services and demonstrated how this approach can lead to efficient pricing with adequate accuracy. The method mines historical data of prior deals as well as market data, at the highest hierarchical level [3] or the second level [4], to determine the price with minimal input from the user. The algorithm works in two main steps, peer selection and cost calculation: after selecting similar historical and market peer deals, it mines these deals to estimate the costs and, subsequently, the prices.

Previous results showed that this approach enables efficient pricing and that using the second hierarchical level (referred to as level two below) can improve estimation accuracy. However, one challenge was that not all level two subservices of the deal being priced are necessarily included in the chosen peer deals to be mined. Since every deal can have different IT service demands, some of the demanded services may be absent from the chosen peer deals, and the gaps become more significant as computations move to more granular service levels. In our previous work [4], we treated this problem in a simple way: we assumed that a service missing at level two in a peer deal has a cost equal to the average cost of that same service across all the chosen peer deals that do include it.

In this work, we address this problem in a more principled way: we formulate it as a recommendation problem [5] to complete the missing data, and then proceed with the rest of our pricing algorithm. Figure 1 gives an overview of all the mentioned approaches. We show statistically that our new approach is significantly more accurate than our two previous approaches. In addition, the recommendation step can serve several applications beyond deal pricing; one important such application is performing analytics and trend analysis on historical and market data. The resulting insights could help the business revise and reassess its competitiveness.

Fig. 1. Overview of the bottom-up approach for pricing IT service deals, the latest top-down approach in [4], and our proposed one

Therefore, the contributions of this paper are threefold. First, we provide a novel formulation of the problem of missing service data in historical and market deals as a machine learning recommender system. Second, we show that embedding the results of this system in our prior pricing algorithm yields a statistically significant improvement in pricing accuracy. Third, our approach can be used directly to enable data analytics on the historical deal data, an insight of high business value.

The rest of this paper is organized as follows: in Sect. 2, we provide a review of the literature in the area of our problem. In Sect. 3, we present our new approach and show how it can be embedded in the pricing algorithm. We then show our numerical results in Sect. 4 and end the paper with the conclusions and ideas for future work in Sect. 5.

2 Related Work

References [1, 2, 7–9] provide an overview of IT services deals and how the competitive bidding process works. For pricing these deals, we refer the reader to our previous works [3, 4]; this paper is an extension of those two papers. In [3], we proposed a method for top-down pricing of IT service deals, in which high-level data for the included services are used. A peer selection algorithm along with a calculation logic was presented. Numerical results validated the paper's hypothesis that mining historical data can lead to more accurate pricing than the traditional business approach of using market benchmark data. The algorithm was enhanced in [4], where the data mining was done at level two of the services while the input remained at level one, thus preserving the concept of top-down pricing. Results showed that pricing at this lower level can yield more accurate results. Note that there is no justification for pricing at even lower levels (e.g., the nth level), since such detailed information is typically unknown when the deal is priced at the beginning of the response to the request for proposals; moreover, there are typically three, or at most four, levels of IT services in this type of business [3]. In this work, we extend the method of these two references and show the benefit of using a recommender system for augmenting missing data values.

Another related work is that of Akkiraju et al. [9], who presented a method for assessing the competitiveness of deals after they have been priced. That is, the deals need to be priced first for their method to work, rather than being priced with minimal user input as in [3, 4] and the current work.

More literature on services pricing, though not in the field of complex IT services deals, can be found in [10–20]. In [10], Li et al. provided a study of different pricing models for cloud storage; their models focus on special characteristics of the cloud, e.g., storage types and configurations. Li et al. [11] studied queuing-system-based pricing models for other IT services. Ibrahim et al. [12] proposed a pricing framework for the pay-as-you-consume cloud computing service.

Basu et al. [13] developed optimal pricing strategies for cloud providers. Their method models the utility of cloud users as a function of one set of parameters directly proportional to the utility and another set with a negative effect on it. In [14], Laatikainen and Ojala presented pricing models for software as a service (SaaS), highlighting the relationship between architectural and pricing characteristics for this service; they showed that this relationship is tight when the firm's value proposition is at a high cloud maturity level. Tawalbeh [15] presented an empirical study of a cost-based pricing method for mobile service providers. One of the main conclusions was that service providers should focus on market-oriented pricing when their objectives relate to profitability, market share, and sales maximization.

For the sake of conciseness, we refer the reader to the other works on general services pricing [16–20]. In general, none of these studies take into consideration the characteristics of IT services in complex deals, as our work and the two prior studies [3, 4] do. A literature survey of this class of distantly related studies can be found in [21]. In the next section, we present our methodology in detail.

3 Methodology

In this section, we first present our notation in Sect. 3.1. We then formulate our problem as a recommender system and describe the recommender algorithm that we applied to our real-world data in Sect. 3.2. Finally, we show in Sect. 3.3 how we embed the results obtained from this modeling approach into our pricing algorithm.

3.1 Notation

We identify two categories of services that are included in any IT service deal in our context: regular services (referred to simply as services below) and common services. Regular services have baselines/units, and their costs are independent of other services in a deal. Common services do not have baselines/units; their costs depend on the regular services included in the deal. Examples of regular services are databases and end user services. Account management is an example of a common service: its cost depends on all regular services in the deal that require some account management.

We define any deal \( d \in D \) by the tuple sets (Meta Information, Services, Common Services), where D is the set of deals (either historical or market benchmark), and Meta Information is the set of the deal's meta-data, namely: Meta Information = {Deal Outcome, Contract Year, Geography, Industry}. Deal Outcome is either won or lost (the client did not pursue the deal or the provider withdrew from bidding). Contract Year is the calendar year in which delivery of the services will begin. Geography and Industry refer to the geographical location and industry of the client, respectively.

We define the set of regular services as: \( Services = \{ Service_{1} ,\; \ldots ,\;Service_{i} ,\; \ldots ,\;Service_{M} \} \), where M is the cardinality of the set Services. Similarly, we specify the set of common services as: \( Common\;Services = \{ Common\;Service_{1} ,\; \ldots ,\;Common\;Service_{j} ,\; \ldots ,\;Common\;Service_{N} \} \), where N is the cardinality of the set Common Services.

Let us define (a) any regular service \( Service_{i} \in Services \), where \( i \in \{ 1,\; \ldots ,\;M\} \), by the tuple (Baseline, Cost, Price), and (b) any common service \( Common\;Service_{j} \in Common\;Services \), where \( j \in \{ 1,\; \ldots ,\;N\} \), by the tuple (Percentage of Total Cost, Cost, Price).

We further decompose each regular service \( Service_{i} \) into a set of level two services: \( L2Services = \{ L2Service_{1} ,\; \ldots ,\;L2Service_{a} ,\; \ldots ,\;L2Service_{P} \} \), where \( a \in \{ 1,\; \ldots ,\;P\} \) and P is the cardinality of the set L2Services (P may vary across level one services). Similar to the level one service \( Service_{i} \), we define any \( L2Service_{a} \) by the tuple (L2Baseline, L2Cost, L2Price).

Similarly, we break down each common service \( Common\;Service_{j} \) into a set of level two common services \( L2Common\;Services = \{ L2Common\;Service_{1} ,\; \ldots ,\;L2Common\;Service_{b} ,\; \ldots ,\;L2Common\;Service_{Q} \} \), where \( b \in \{ 1,\; \ldots ,\;Q\} \) and Q is the cardinality of the set L2Common Services (Q may vary across level one common services). Similar to the level one service \( Common\;Service_{j} \), we define any \( L2Common\;Service_{b} \) by the tuple (L2Percentage of Total Cost, L2Cost, L2Price).

Finally, we define any scenario S, i.e., a new deal to be priced, by the tuple sets {Meta Information\(_{s}\), Services\(_{s}\), Common Services\(_{s}\)}. The inputs to our approach are: the elements of the set Meta Information\(_{s}\), the Baseline values for each \( Service_{k} \in Services_{s} \), and the scope values for each \( Common\;Service_{l} \in Common\;Services_{s} \). The outputs of our approach are the estimated Cost and Price for each element of the sets \( Services_{s} \) and \( Common\;Services_{s} \), and thus the total cost and price of scenario S. In the following section, we describe the details of our approach.

3.2 Formulating Our Problem as a Recommendation System

In a typical recommendation system, there are “users” and “items”. The “ratings” of some user-item pairs are known, while the rest are unknown [22]. Movie recommendation is one example: some users have seen and rated some movies, but not all users have seen all movies. The movie provider would like to predict users' ratings for the movies they have not seen, so that it can recommend the movies they would have rated highly had they seen them. Another example is online retail, where the “users” are buyers and the “items” are the products they can purchase.

There are generally two main classes of recommender systems: content-based recommendation and collaborative recommendation [5]. In the former, the user is recommended items similar to the ones he/she preferred in the past. In the latter, the user is recommended items that were preferred in the past by people with similar tastes.

Now, we turn to our problem. Considering the historical and market deals as users and the services at any level as items/movies, one can see the mapping between the two problems. Figure 2 shows this analogy.

Fig. 2. Formulation of our problem as a recommender system

We note that content-based recommendation is the one that better suits our application. This is because, in our pricing algorithm, we select peers at the highest level first and then perform cost mining on the preselected peers; thus, the recommendation of missing data is performed on an already filtered set of deals similar to the one we are pricing. Collaborative recommendation is therefore not applicable, since we are already using a subset of “similar” deals. This observation was confirmed by a set of preliminary experiments: in a standard machine learning setup, we divided our data set into training and testing sets, trained several recommender systems on the training set, and applied them to the testing set. We found that the content-based recommenders (also known as “item recommendation”) give more accurate results on both sets; that is, our method identified similar deals better than collaborative filtering did, as it exploits our expert knowledge of the problem structure. We therefore decided to embed it in our approach.

We now provide an overview of the content-based recommender that we use. The basic idea is to compute a similarity s between every item i for which user u has no preference yet and every item j for which he/she has a preference. Then, u's preference for j, weighted by s, is added to a running average. Finally, the top items ranked by weighted average are returned. Note that, in our problem, this last step is not relevant: we do not recommend some services among the missing ones for each deal, but are rather interested only in computing a score for item i (that of j weighted by s). For a more detailed explanation of content-based recommendation, we refer the reader to [23]. In the next subsection, we show where exactly we embed this system in our pricing approach.
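The weighted-average scoring step described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation (which relies on Apache Mahout); the service names and similarity values are hypothetical.

```python
def estimate_preference(similarity, known_prefs, target_item):
    """Estimate a user's (deal's) preference (cost) for target_item as the
    average of the known preferences, weighted by item-item similarity."""
    num, den = 0.0, 0.0
    for item, pref in known_prefs.items():
        # Similarity may be stored under either key order.
        s = similarity.get((target_item, item), similarity.get((item, target_item)))
        if s is None:
            continue
        num += s * pref
        den += abs(s)
    return num / den if den else None

# Hypothetical similarities and known level-two service costs of one deal.
similarity = {("svcA", "svcB"): 0.8, ("svcA", "svcC"): 0.4}
known = {"svcB": 100.0, "svcC": 200.0}
print(estimate_preference(similarity, known, "svcA"))  # (0.8*100 + 0.4*200) / 1.2
```

Here the missing score for `svcA` is pulled toward the costs of the most similar known services, which is the behavior the pricing step needs.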

3.3 Embedding the Recommender System in Our Pricing Algorithm

Our previous studies [3, 4] describe our top-down pricing approach extensively. In this paper, we summarize the approach using definitions similar to those in our previous work and in Subsect. 3.1 above.

Our approach consists of the following steps: selecting peer deals; calculating scope and baselines for services at level two; recommending missing service cost values in peers at level two; and estimating costs/prices at level two and aggregating them to compute costs/prices at level one. In the following subsections, we briefly describe each step.

Selecting Peer Deals.

For each regular and common service of a scenario, our approach selects a set of historical and market benchmark deals as peers, from which the unit cost values of the service are drawn. It compares the Meta Information\(_{s}\) (Deal Outcome, Contract Year, Geography, and Industry) of the scenario to that of all historical and market benchmark deals and selects the matching ones. The reasoning behind the choice of suitable Meta Information is explained in detail in our previous work [3].

Once deals are selected based on the Meta Information match, our approach sorts them based on two different criteria, defined separately for regular and common services.

For each regular service \( Service\;k \in Services_{s} \), our approach adopts a criterion based on baseline proximity. Let \( Baseline\;Proximity_{dsk} \) denote the baseline proximity between deal \( d \in D \) and scenario S for \( Service\;k \in Services_{s} \), defined as:

$$ Baseline\;Proximity_{dsk} = \left| {Baseline\;for\;Service\;k\;of\;deal\;d - Baseline\;for\;Service\;k\;of\;scenario\;S} \right| $$

For each common service \( Common\;Service\;l \in Common\;Services_{s} \), our approach sorts the selected deals based on a different proximity, denoted \( Common\;Service\;Proximity_{dsl} \) (the proximity between deal \( d \in D \) and scenario S for \( Common\;Service\;l \in Common\;Services_{s} \)) and defined as follows:

$$ Common\;Service\;Proximity_{dsl} = \left| {Sum\;of\;costs\;of\;regular\;services\;of\;deal\;d - Sum\;of\;costs\;of\;regular\;services\;of\;scenario\;S} \right| $$

We refer the reader to [3] for a detailed explanation of the proximity criteria defined above.
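To make the sorting step concrete, the following sketch ranks meta-matched peer deals by baseline proximity for a regular service; the deal records and numbers are purely illustrative, not taken from the paper's data.

```python
def baseline_proximity(deal_baseline, scenario_baseline):
    # |Baseline of Service k in deal d - Baseline of Service k in scenario S|
    return abs(deal_baseline - scenario_baseline)

# Hypothetical meta-matched peer deals with their baselines for Service k.
peers = [
    {"id": "d1", "baseline": 500.0},
    {"id": "d2", "baseline": 120.0},
    {"id": "d3", "baseline": 300.0},
]
scenario_baseline = 150.0

# Deals with the closest baseline come first.
ranked = sorted(peers, key=lambda d: baseline_proximity(d["baseline"], scenario_baseline))
print([d["id"] for d in ranked])
```

The common-service criterion works the same way, with the sum of regular-service costs substituted for the baseline.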

Calculating Scope and Baselines for Services at Level Two.

Note that our approach takes scope and baseline values as input only for services at level one; hence, it must estimate the scope and baseline values for services at level two. Each \( Service\;k \in Services_{s} \) has many \( L2Service_{ka} \in L2Services_{s} \). To decide which of them are in scope, our approach relies on a set of predefined business rules. To calculate the baselines for \( L2Service_{ka} \in L2Services_{s} \), the method uses the peer deal selected for the corresponding level one \( Service\;k \in Services_{s} \) from the market benchmark data. We denote by \( p_{m} \in D \) the market peer deal for a \( Service\;k \in Services_{s} \) of a scenario S, by \( L2Baseline_{pma} \) the baseline of the corresponding \( L2Service_{ka} \) of peer \( p_{m} \), and by \( Baseline_{pmi} \) the baseline of the corresponding level one \( Service_{i} \) of peer \( p_{m} \). Then the baseline for \( L2Service_{ka} \in L2Services_{s} \) is defined as:

$$ L2Baseline_{ka} = \frac{{L2Baseline_{pma} \cdot Baseline_{k} }}{{Baseline_{pmi} }} $$
(1)
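Equation (1) is a simple proportional scaling, as the sketch below shows; the numeric values are hypothetical and only illustrate the ratio logic.

```python
def l2_baseline(l2_baseline_peer, baseline_scenario_k, baseline_peer_l1):
    # Eq. (1): L2Baseline_ka = L2Baseline_pma * Baseline_k / Baseline_pmi
    # The L2 baseline is scaled by the ratio of the scenario's level-one
    # baseline to the peer's level-one baseline.
    return l2_baseline_peer * baseline_scenario_k / baseline_peer_l1

# Peer has L2 baseline 40 under a level-one baseline of 200; the scenario's
# level-one baseline is 300, so the estimated L2 baseline scales up by 1.5x.
print(l2_baseline(40.0, 300.0, 200.0))  # 60.0
```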

Recommending Missing Service Cost.

We now describe how our approach obtains recommended cost values for the selected peers of a scenario's services. For each \( Service\;k \in Services_{s} \) of a scenario S, assume that the selected peer deals are \( p_{h} \in D \), where \( h \in \{ 1,\; \ldots ,\;H\} \) and H is the number of selected peers for that particular \( Service\;k \in Services_{s} \). For each \( L2Service_{ha} \in L2Services_{h} \) of the corresponding service in each peer, denote the cost by \( L2Cost_{ha} \), which may be missing for some peers. For the peers that do not have \( L2Cost_{ha} \), our approach uses a recommender algorithm to estimate the cost value from the pool of selected peers \( p_{h} \in D \). Note that these selected peers of a particular \( Service\;k \in Services_{s} \) are similar to each other with respect to (a) their Meta Information and (b) baseline proximity for regular services. Note also that we implicitly assume, both here and in our overall methodology, that historical data is available; practically speaking, this is quite a realistic assumption in this domain.

Estimating Costs/Prices.

We describe how our approach estimates the costs for each regular and common service for both the historical data and market benchmark views.

Cost Calculation for Regular Services of a Scenario.

For each \( L2Service_{ka} \) of \( Service\;k \in Services_{s} \), our approach retrieves the unit costs of that level two service in each of its sorted peer deals and then computes the nth percentile of these peer unit costs. For \( L2Service_{ka} \), we denote the resulting unit cost by \( L2Unit\text{-}Cost_{ka} \), and its cost is computed as follows:

$$ L2Cost_{ka} = L2Unit\text{-}Cost_{ka} \cdot L2Baseline_{ka} $$
(2)

Finally, our approach computes the cost of \( Service\;k \in Services_{s} \) as follows:

$$ Cost_{k} = \sum\nolimits_{{a \in L2Services_{k} }} {L2Cost_{ka} } $$
(3)
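Equations (2) and (3) together can be sketched as follows: take the nth percentile of peer unit costs per level-two service, multiply by the estimated level-two baseline, and sum up to the level-one cost. The use of numpy and the sample numbers are our own illustrative choices.

```python
import numpy as np

def service_cost(peer_unit_costs_by_l2, l2_baselines, n=50):
    """Cost of a level-one service per Eqs. (2)-(3)."""
    total = 0.0
    for a, unit_costs in peer_unit_costs_by_l2.items():
        l2_unit_cost = np.percentile(unit_costs, n)   # percentile of peer unit costs
        total += l2_unit_cost * l2_baselines[a]       # Eq. (2), accumulated per Eq. (3)
    return total

# Hypothetical peer unit costs and L2 baselines for two level-two services.
peer_costs = {"l2a": [2.0, 3.0, 4.0], "l2b": [10.0, 12.0]}
baselines = {"l2a": 100.0, "l2b": 5.0}
print(service_cost(peer_costs, baselines))  # 3*100 + 11*5 = 355.0
```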

Cost Calculation for Common Services of a Scenario.

For each \( L2Common\;Service_{s,\,l,\,b} \) of \( Common\;Service\;l \in Common\;Services_{s} \), our approach calculates the percentage of the cost of that level two service relative to the overall cost of the deal in each of its sorted peer deals. It then uses these percentages as they are, without any normalization, and applies the nth percentile to the set of peer percentage values to obtain the percentage of that service relative to the total cost of our scenario S. For each \( L2Common\;Service_{s,\,l,\,b} \), we denote the resulting percentage by \( L2P_{s,\,l,\,b} \).

We now describe how to calculate the cost values for each \( L2Common\;Service_{s,\,l,\,b} \). Define the total cost of all services in our scenario S as

$$ Sum_{s, \,all } = Sum_{s,\,com} + Sum_{s,\, reg} $$
(4)

where \( Sum_{s,\,all} \) is the total cost of the scenario (the sum of the costs of all services, both regular and common); \( Sum_{s,\,reg} \) is the sum of the costs of the regular services \( Service\;k \in Services_{s} \); and \( Sum_{s,\,com} \) is the sum of the costs of the common services \( Common\;Service\;l \in Common\;Services_{s} \), computed as follows:

$$ Sum_{s,com} = \sum\nolimits_{{l \in Common \,Services_{s} }} {Cost_{s,l} } $$
(5)

where \( Cost_{s,l} \) refers to the cost of \( Common\;Service\;l \in Common\;Services_{s} \). \( Cost_{s,l} \) can be further expressed using the level two cost values as follows:

$$ Cost_{s,l} = \sum\nolimits_{{b \in L2Common\; Services_{s,l} }} {Cost_{s,l,b} } $$
(6)

We now replace \( Cost_{s,l} \) in Eq. (5) with its definition from Eq. (6), which leads to the following:

$$ Sum_{s,com} = \sum\nolimits_{{l \in Common \;Services_{s} }} {\sum\nolimits_{{b \in L2Common \;Services_{s,l} }} {Cost_{s,l,b} } } $$
(7)

Then, for each \( L2Common\;Service_{s,l,b} \) in our scenario S, we have:

$$ Cost_{s,l,b} = Sum_{s,all} *L2P_{s,l,b} $$
(8)

Finally, we transform the above set of linear equations into a standard form as follows:

$$ \begin{aligned} (L2P_{s,1,1} - 1) \cdot Cost_{s,1,1} + L2P_{s,1,1} \cdot Cost_{s,1,2} + \ldots + L2P_{s,1,1} \cdot Cost_{{s,1,B_{1} }} + \hfill \\ L2P_{s,1,1} \cdot Cost_{s,2,1} + \ldots + L2P_{s,1,1} \cdot Cost_{{s,2,B_{2} }} + \ldots + \hfill \\ L2P_{s,1,1} \cdot Cost_{{s,L,B_{L} }} = - L2P_{s,1,1} \cdot Sum_{s,reg} \hfill \\ \end{aligned} $$
(10)
$$ \begin{aligned} L2P_{s,1,2} \cdot Cost_{s,1,1} + (L2P_{s,1,2} - 1) \cdot Cost_{s,1,2} + \ldots + L2P_{s,1,2} \cdot Cost_{{s,1,B_{1} }} + \hfill \\ L2P_{s,1,2} \cdot Cost_{s,2,1} + \ldots + L2P_{s,1,2} \cdot Cost_{{s,2,B_{2} }} + \ldots + \hfill \\ L2P_{s,1,2} \cdot Cost_{{s,L,B_{L} }} = - L2P_{s,1,2} \cdot Sum_{s,reg} \hfill \\ \end{aligned} $$
(11)
$$ \begin{aligned} L2P_{{s,L,B_{L} }} \cdot Cost_{s,1,1} + L2P_{{s,L,B_{L} }} \cdot Cost_{s,1,2} + \ldots + L2P_{{s,L,B_{L} }} \cdot Cost_{{s,1,B_{1} }} + \hfill \\ L2P_{{s,L,B_{L} }} \cdot Cost_{s,2,1} + \ldots + L2P_{{s,L,B_{L} }} \cdot Cost_{{s,2,B_{2} }} + \ldots + \hfill \\ (L2P_{{s,L,B_{L} }} - 1) \cdot Cost_{{s,L,B_{L} }} = - L2P_{{s,L,B_{L} }} \cdot Sum_{s,reg} \hfill \\ \end{aligned} $$
(12)

where \( L \) refers to the cardinality of the set \( Common\;Services_{s} \), and \( B_{1} ,\;B_{2} ,\; \ldots ,\;B_{L} \) are the cardinalities of the sets \( L2Common\;Services_{1} ,\; \ldots ,\;L2Common\;Services_{L} \). Our approach solves these equations directly, as they fulfil the requirements for such a set of linear equations (see, for instance, [24]). By solving the above equations, the approach computes the cost of each level-two common service (\( Cost_{s,l,b} \)) per year.
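To illustrate, the flattened system above can be solved with an off-the-shelf linear solver. The sketch below (with hypothetical percentage values, not the paper's data) builds the coefficient matrix of Eqs. (10)-(12) and checks the solution against Eq. (8):

```python
import numpy as np

# Hypothetical L2P_{s,l,b} percentages, flattened over (l, b), and Sum_{s,reg}.
l2p = np.array([0.05, 0.10, 0.15])
sum_reg = 700.0

# Row r of the system: L2P_r in every column, minus 1 on the diagonal;
# right-hand side: -L2P_r * Sum_reg.
n = len(l2p)
A = np.tile(l2p[:, None], (1, n)) - np.eye(n)
b = -l2p * sum_reg
costs = np.linalg.solve(A, b)

# Sanity check against Eq. (8): each cost equals Sum_all * L2P.
sum_all = sum_reg + costs.sum()
print(np.allclose(costs, sum_all * l2p))
```

Since each row only rearranges Eq. (8), the solution is unique whenever the percentages sum to less than one, which holds by construction here.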

We refer the readers to [3] to understand the difference between the cost estimations for historical and market benchmark perspectives.

To aggregate the costs at the deal level, our approach adds the costs of the services at level two to obtain the costs at level one; it then sums all level one service costs to compute the cost at the deal (scenario) level. To estimate the price, our approach adds a user-chosen gross profit to the estimated costs. Figure 3 shows an overview of our approach. We also note that market data follows the same structure as historical data, differing only in its source: market data comes from market rates rather than historical deals. Therefore, exactly the same method used for calculating the historical price point applies to calculating the market price point.

Fig. 3. Overview of our proposed pricing approach

Note that we can straightforwardly embed our output prices in a prediction model such as the one in [2, 3] to assess the probability of winning the deal at different price points. We refer the reader to these two references for the details of the prediction model, since it is out of the scope of the present work.

4 Evaluation

In this section, we first present our evaluation setup in Sect. 4.1 and then report numerical results that show the usefulness of our new approach.

4.1 Evaluation Setup

From an industrial data repository of an IT service provider, we retrieved 30 random historical deals with their complete cost structure (at levels one and two). For each deal, our test bed generated a corresponding scenario using the deal's meta-data and the baselines and scopes of its level one services. The test bed then generated cost estimates for the services of the scenarios by invoking the pricing approach described in Sect. 3. In addition, to compare with our previous approach, the test bed also generated cost estimates for the services of the same scenarios by invoking the earlier version of the top-down pricing algorithm, which does not recommend missing cost values of services in peer deals. The deals selected to create scenarios were excluded from being selected as peers when invoking the pricing algorithms. Figure 4 shows an overview of our test bed.

Fig. 4. Overview of our evaluation test bed

4.2 Numerical Results

For recommending missing values, we use the item-based recommender implementation in Apache Mahout [25]; more specifically, the Pearson-correlation-based item similarity algorithm [26]. That is, the similarity between any two services \( u \in L2Services \) and \( w \in L2Services \) is calculated from the equation below:

$$ Pearson's\;Similarity\;Correlation\;Coefficient\left( {w,u} \right) = \frac{{\mathop \sum \nolimits_{{d \in D_{wu} }} \left( {c_{d,w} - c_{avg,w} } \right) \cdot \left( {c_{d,u} - c_{avg,u} } \right)}}{{\sqrt {\mathop \sum \nolimits_{{d \in D_{wu} }} \left( {c_{d,w} - c_{avg,w} } \right)^{2} \cdot \mathop \sum \nolimits_{{d \in D_{wu} }} \left( {c_{d,u} - c_{avg,u} } \right)^{2} } }} $$
(13)

where \( D_{wu} \subseteq D \) is the set of peer deals that contain both services, \( c_{d,w} \) is the cost of service \( w \in L2Services \) in deal d, and \( c_{avg,w} \) is the average cost of service w among all peer deals that have this service; \( c_{d,u} \) and \( c_{avg,u} \) are defined analogously for service \( u \in L2Services \). Note that if the denominator in Eq. (13) is zero, we set the corresponding coefficient to 1.
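A self-contained sketch of Eq. (13) is given below. The deal and cost data are illustrative, and this is not the paper's implementation, which uses Apache Mahout's Pearson item similarity.

```python
import math

def pearson_similarity(costs_w, costs_u):
    """Pearson similarity of two services over the deals containing both.
    costs_w, costs_u: {deal_id: cost} maps for services w and u."""
    common = sorted(set(costs_w) & set(costs_u))
    avg_w = sum(costs_w[d] for d in common) / len(common)
    avg_u = sum(costs_u[d] for d in common) / len(common)
    num = sum((costs_w[d] - avg_w) * (costs_u[d] - avg_u) for d in common)
    den = math.sqrt(sum((costs_w[d] - avg_w) ** 2 for d in common)
                    * sum((costs_u[d] - avg_u) ** 2 for d in common))
    # Paper's convention: coefficient is 1 when the denominator is zero.
    return 1.0 if den == 0 else num / den

# Hypothetical costs of two level-two services across three peer deals.
w = {"d1": 10.0, "d2": 20.0, "d3": 30.0}
u = {"d1": 1.0, "d2": 2.0, "d3": 3.0}
print(pearson_similarity(w, u))  # perfectly correlated costs -> 1.0
```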

To use the Mahout library, our approach first prepares Mahout-compliant data: it maps each peer deal's service data into the form “dealID, serviceID, costValue”, matching Mahout's “userID, itemID, prefValue” format. It then invokes the library's ItemSimilarity function to build the correlation map for the services from peer deals, and the GenericItemBasedRecommenderBuilder function to generate the recommendations. Note that this call to the Mahout library happens at the appropriate step of our method, as explained in the previous section.

For both our latest work in [4] and this work, we generated two cost points from two perspectives: historical and market data. Note that we compare cost values, since prices are obtained by adding user-chosen gross profits.

Now, we define the following errors for each \( Service\;k \in Services_{s} \) and \( Common\;Service\;l \in Common\;Services_{s} \):

$$ Ver1\_Error_{historical\;data} = \left| {Calculated\;Cost_{i,v1} \;from\;Historical\;Data_{{scenario_{s} }} - Actual\;Cost_{{i,deal_{s} }} } \right| $$
$$ Ver1\_Error_{market\;data} = \left| {Calculated\;Cost_{i,v1} \;from\;Market\;Data_{{scenario_{s} }} - Actual\;Cost_{{i,deal_{s} }} } \right| $$
$$ Ver2\_Error_{historical\;data} = \left| {Calculated\;Cost_{i,v2} \;from\;Historical\;Data_{{scenario_{s} }} - Actual\;Cost_{{i,deal_{s} }} } \right| $$
$$ Ver2\_Error_{market\;data} = \left| {Calculated\;Cost_{i,v2} \;from\;Market\;Data_{{scenario_{s} }} - Actual\;Cost_{{i,deal_{s} }} } \right| $$

where \( Veri\_Error_{historical\;data} \) is the absolute difference between the actual cost and the cost calculated using version i of the algorithm, with i = 1 denoting the previous work [4] and i = 2 the current work. \( Veri\_Error_{market\;data} \) has the same definition but for market data.

Then, as in [3, 4], we validate the same result for this work: the results in Table 1 confirm that there is a significant increase in accuracy when historical data, rather than market data, are mined to estimate costs. The paired t-test hypothesis here (see [26, 27]) is as follows:

$$ H_{o} :\mu_{D} = 0 $$
$$ H_{1} : \mu_{D} < 0 $$
Table 1. Hypothesis test result for difference between historical data and market data using the method in this paper

Where,

$$ \mu_{D} = \mu_{{\left\{ {historical\, data \,error} \right\}}} - \mu_{{\left\{ {market\, data\, error} \right\}}} $$

Here, \( \mu_{D} \) is the difference between the mean of the historical data errors (\( Ver2\_Error_{historical\;data} \)) and that of the market data errors (\( Ver2\_Error_{market\;data} \)).

Next, we compare the costs obtained by our new pricing algorithm to those of the previous one in [4]. We use the difference between the errors of the two algorithms for historical data only. The reason for not performing this comparison for market data is that market data are typically complete, and there is usually only one market deal per geography setting; thus, there is no need to apply our method for recommending missing values in that case.

Table 2 shows the results of that comparison. One can see that our claim is justified: statistically speaking, for almost all services in our study, calculations performed using the method presented in this paper are more accurate than those of the previous method in [4] (which was itself shown to outperform [3]). The hypothesis test in Table 2 is as follows:

$$ H'_{o} :\mu '_{D} = 0 $$
$$ H'_{1} : \mu '_{D} < 0 $$
Table 2. Hypothesis test result for the difference between calculations using the method in this paper and those in [4] for historical data

Where, \( \mu '_{D} = \mu_{{Ver_{2} }} - \mu_{{Ver_{1} }} \)

Here, \( \mu '_{D} \) is the difference between the mean of the historical data errors of our current approach (i.e., \( Ver2\_Error_{historical\;data} \)) and that of our previous work (i.e., \( Ver1\_Error_{historical\;data} \)).

Lastly, note that the use of a paired t-test is justified by the same argument as in [3, 4]; we refer the reader to the texts [26, 27] for more details on these tests. Also note that the two-sided test is significant (\( H_{o} \) is rejected) for all the tests shown above, which justifies performing the one-sided tests whose results are shown in the tables above. Finally, we used a significance level of 0.1; at a 0.05 significance level, the results vary slightly, as can be seen in Table 2.
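The paired, one-sided t-test used above (\( H_{o} : \mu_{D} = 0 \) vs. \( H_{1} : \mu_{D} < 0 \)) can be sketched with scipy as below; the error vectors are illustrative placeholders, not the paper's data.

```python
from scipy import stats

# Hypothetical per-service error pairs, e.g., Ver2_Error_historical vs.
# Ver2_Error_market for the same services.
hist_errors = [1.0, 2.0, 1.5, 0.8, 1.2]
market_errors = [2.0, 2.5, 2.2, 1.9, 2.4]

# scipy's ttest_rel is two-sided; halve the p-value for the one-sided
# alternative mu_D < 0 (valid when the t statistic is negative).
t_stat, p_two_sided = stats.ttest_rel(hist_errors, market_errors)
p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2
print(p_one_sided < 0.1)  # reject H0 at the 0.1 significance level?
```

Newer scipy versions also accept `alternative="less"` directly in `ttest_rel`, which avoids the manual halving.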

5 Conclusion and Future Work

In this paper, we presented an enhanced top-down pricing method for IT services deals. Our approach models the problem of missing values in the historical data that is mined to estimate deal costs as an item-based recommender system. Using such a system, we augment the missing values and embed the resulting complete data set in the top-down pricing approach we proposed previously. We showed that doing so yields a statistically significant increase in the accuracy of services pricing. Additionally, using the resulting complete data set, one can perform further analytics to gain business insights and recommendations. We also showed that our results still agree with the hypothesis proposed in our previous works: statistically speaking, using historical data can yield significantly more accurate results than the traditional business usage of market data.

There are multiple directions for future work. One direction is to further automate the user-input percentile step of our algorithm, as this could potentially improve pricing accuracy. Another is to apply more sophisticated machine learning recommenders that use the context of both users and items in the prediction. Lastly, applying this method to other general services that have a tender process like ours might be another direction for future research.