1 Introduction

The paper works out a multi-period/multilateral price index to effectively compare sets of commodities over time and/or across countries. This is done by using a regression framework and a reference basket based on the set of all commodities that appear in at least two periods/countries. A linear model in the deflator indexes and reference prices is specified, and estimation is performed by the method of averages. The resulting price index estimator follows as a by-product and enjoys fitting properties in between least squares and least absolute values. The price index constructed in this manner belongs to the family of stochastic indexes, which can be traced back to the works of Jevons (1863, 1869) and Edgeworth (1887, 1925). Its computation requires knowledge of the quantities and values of the commodities included in the basket. Neither the model’s specification nor the estimation method explicitly calls for prices. This aspect becomes particularly useful when the prices of some commodities in specific periods/countries are not known. In the multi-period case, this can occur when some commodities enter/exit the basket due to a shift in consumer attitude, while in a multilateral perspective, it could happen when a commodity is not present or ceases to be present on the market in a given country. The reference basket of the index at hand is the union of the pairwise intersections of the specific baskets for each period/country. The absence of a commodity in a given period/country simply means that both its quantity and value are set equal to zero in that period/country.

Thus, just like hedonic (Pakes 2003; Brachinger et al. 2018), GEKS (Balk 2012) and country/time-product-dummy (CPD/TPD) approaches with an incomplete price tableau (Rao and Hajargasht 2016; Weinand 2021), the index does not drop items which are not present in all countries/periods. A commodity contributes to the index’s construction provided it occurs with non-null quantities in at least two periods/countries. The reference basket is, therefore, more inclusive and representative than baskets commonly referred to in the extant literature. Consequently, the index proves effective in a multi-period and/or multilateral framework and compares favorably to the Rao and Hajargasht (2016), GEKS and Ivancic et al. (2011) indexes.

Under sphericalness of errors in the regression model underlying the index estimator, the latter tallies with the Geary–Khamis (GK) index in a temporal setting and provides both a general closed-form representation of the latter and an inferential statistical apparatus as a dowry.

Moreover, the paper proposes an additional index under the assumption of commodity-dependent variances of the regression errors. This provides a more extended solution to the price index problem, which encompasses the Geary–Khamis Index as a special case and paves the way to further generalizations. In addition, the so-called reference prices, namely the prices expected to be paid for commodities in the base period/country, are easily obtainable as a spin-off of the regression estimation and can be conveniently used to evaluate the prices of those commodities that, for whatever reason, are not observable in a given period/country.

The paper is organized as follows. Section 2 formally states the problem of finding a multi-period/multilateral price index starting from the quantities and values of the commodities in each period/country in the basket/index. The regression model uses deflators and reference prices as parameters to be estimated. Section 3 shows how the deflators are estimated using the method of averages and how the estimator of the price indexes is obtained from the deflator estimates as a by-product. In Sect. 4, the hypotheses on the errors of the parent regression model are relaxed, allowing the variance to be commodity dependent. In Sect. 5, the novel index formula and the TPD index are compared, in the case of a complete and an incomplete price tableau, using simulated data. Here we show the effectiveness of our method of estimating missing prices using reference prices in comparison with the standard techniques based on the imputation of missing values. Some concluding remarks are made in Sect. 6. An appendix provides the proofs of the statements in Sect. 3.

2 Formulation of the price index problem

In order to set up the regression framework needed to work out the index, let us consider the \(N\times T\) matrices \(\varvec{V}=[v_{n,t}]\), \(\varvec{Q}=[q_{n,t}]\) and \(\varvec{P}=[p_{n,t}]\) whose entries are values, quantities and prices, respectively, of a basket of N goods in T periods (or countries). Such a basket is the union of the intersections in pairs of the baskets in the T periods (or countries), and we assume that it covers all (and only those) commodities which occur with non-null quantities in at least two periods (or countries). The matrices \(\varvec{V}\), \(\varvec{Q}\) and \(\varvec{P}\) satisfy the equality

$$\begin{aligned} \underset{(N,T)}{\varvec{V}}=\underset{(N,T)}{\varvec{Q}} *\underset{(N,T)}{\varvec{P}} \end{aligned}$$
(1)

where \(*\) is the Hadamard-product symbol. The price index problem can be read as the problem of approximating the matrix \(\varvec{P}\) by the outer product \(\varvec{\pi }\varvec{\lambda }'\) of a vector of reference prices, \(\varvec{\pi }> \varvec{0}_{N}\), and a vector of price indexes, \(\varvec{\lambda }> \varvec{0}_{T}\), that is,

$$\begin{aligned} \underset{(N,T)}{\varvec{P}} \approx \underset{(N,1)}{\varvec{\pi }} \underset{(1,T)}{\varvec{\lambda }'}. \end{aligned}$$
(2)

So, the equality in Eq. (1) can be reformulated as follows

$$\begin{aligned} \varvec{V}=\varvec{Q} *\varvec{\pi } \varvec{\lambda }'+\varvec{H}=\varvec{D}_{\varvec{\pi }}\varvec{Q}\varvec{D}_{\varvec{\lambda }}\,+\,\varvec{H} \end{aligned}$$
(3)

where \(\varvec{D}_{\varvec{a}}\) denotes a diagonal matrix whose diagonal entries are the elements of the vector \(\varvec{a}\) and the matrix \(\varvec{H}\) accounts for discrepancies (Footnote 1). Post-multiplying Eq. (3) by \(\varvec{D}_{\varvec{\phi }}=\varvec{D}_{\varvec{\lambda }}^{-1}\) yields

$$\begin{aligned} \varvec{V}\varvec{D}_{\varvec{\phi }}=\varvec{D}_{\varvec{\pi }}\varvec{Q}+ \varvec{E} \end{aligned}$$
(4)

where \(\varvec{E}=\varvec{H} \varvec{D}_{\varvec{\phi }}\) is a matrix of disturbance terms. Henceforth, in the wake of Theil (1960), a sphericalness (i.e., constant variance and uncorrelatedness) assumption will be made for the error components in (4), namely for the entries of the matrix \(\varvec{E}\). The issue of possibly relaxing this assumption by dropping the hypothesis of constant variance across commodity errors will be addressed in Sect. 4. The vector \(\varvec{\phi }\), whose elements are the reciprocals of the elements of the price-index vector \(\varvec{\lambda }\), is the deflator/exchange rate vector. Taking \(t=1\) as the base period (country), namely \(\phi _{1}=\lambda _{1}=1\), and partitioning \(\varvec{V}\), \(\varvec{Q}\), \(\varvec{E}\), \(\varvec{\phi }\) and \(\varvec{D}_{\varvec{\phi }}\) as follows

$$\begin{aligned} \begin{aligned}&\varvec{V}=\left[ \underset{(N,1)}{\varvec{v}_{1},}\,\, \underset{(N,T-1)}{\varvec{V}_2}\right] ,\,\,\, \varvec{Q}=\left[ \underset{(N,1)}{\varvec{q}_{1},}\,\, \underset{(N,T-1)}{\varvec{Q}_2}\right] ,\,\,\, \varvec{E}=\left[ \underset{(N,1)}{\varvec{\varepsilon }_{1}, } \underset{(N,T-1)}{\varvec{E}_2}\right] \\&\underset{(T,1)}{\varvec{\phi }}=\begin{bmatrix} 1 \\ \underset{(T-1,1)}{\varvec{\varphi }}\end{bmatrix}, \underset{(T,T)}{\varvec{D}_{\phi }}=\begin{bmatrix} 1 &{} \underset{(1,T-1)}{\varvec{0}} \\ \underset{(T-1,1)}{\varvec{0}} &{} \underset{(T-1,T-1)}{\varvec{D}_{\varvec{\varphi }}} \end{bmatrix}, \end{aligned} \end{aligned}$$
(5)

we pass from the matrix Eq. (4) to the system

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}\underset{(N,1)}{\varvec{v}_1}=\underset{(N,N)}{\varvec{D}_{\varvec{\pi }}}\underset{(N,1)}{\varvec{q}_1}+\underset{(N,1)}{\varvec{\varepsilon }_1} \\ &{} \underset{(N,T-1)}{\varvec{V}_2} \underset{(T-1,T-1)}{\varvec{D}_{\varvec{\varphi }}}=\underset{(N,N)}{\varvec{D_{\varvec{\pi }}}}\underset{(N,T-1)}{\varvec{Q}_2}+\underset{(N,T-1)}{\varvec{E}_2} \end{array}\right. }. \end{aligned}$$
(6)
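As a minimal numerical sketch of Eqs. (3) to (6), the snippet below takes the quantities, reference prices and price indexes specified in the simulation of Sect. 5 and, as an illustrative assumption, sets the discrepancy matrix \(\varvec{H}\) to zero. Under that assumption both blocks of system (6) should hold exactly; the variable names are chosen here for illustration only.

```python
import numpy as np

# Q, pi and lam: quantities, reference prices and price indexes
# as specified in the simulation of Sect. 5 (N = 4 goods, T = 6 periods).
Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
pi = np.array([3.5, 2.5, 1.7, 2.9])
lam = np.array([1.00, 1.05, 1.11, 1.15, 1.18, 1.25])

# Illustrative assumption: no discrepancies (H = 0), so V = Q * (pi lam') = D_pi Q D_lam.
V = Q * np.outer(pi, lam)

phi = 1.0 / lam                       # deflators, with phi_1 = 1 for the base period
v1, V2 = V[:, 0], V[:, 1:]            # partition of Eq. (5)
q1, Q2 = Q[:, 0], Q[:, 1:]
varphi = phi[1:]

eps1 = v1 - pi * q1                   # first block of system (6): v_1 - D_pi q_1
E2 = V2 * varphi - pi[:, None] * Q2   # second block: V_2 D_varphi - D_pi Q_2
print(np.abs(eps1).max(), np.abs(E2).max())
```

With noisy values the two residual blocks are no longer zero, and they become the disturbances \(\varvec{\varepsilon }_1\) and \(\varvec{E}_2\) of the regression model.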

The latter can be written in stacked form as follows

$$\begin{aligned} \varvec{y}=\varvec{X} \varvec{\beta } +\varvec{\varepsilon } \end{aligned}$$
(7)

by setting

$$\begin{aligned} \begin{aligned}&\underset{(NT,1)}{\varvec{y}}=\begin{bmatrix} \varvec{v}_{1}\\ \varvec{0}\end{bmatrix}, \underset{(NT,T-1+N)}{\varvec{X}}=\begin{bmatrix} \varvec{0} &{} \varvec{D}_{\varvec{q}_{1}}\\ -(\varvec{I}_{T-1}\otimes \varvec{V}_{2})\varvec{R}_{T-1}' &{} (\varvec{Q}_{2}'\otimes \varvec{I}_{N})\varvec{R}_{N}' \end{bmatrix}, \\&\underset{(T-1+N,1)}{\varvec{\beta }}=\begin{bmatrix} \varvec{\varphi } \\ \varvec{\pi } \end{bmatrix}, \underset{(NT,1)}{\varvec{\varepsilon }}= \begin{bmatrix}{\varvec{\varepsilon }_{1}}\\ \text {vec}\,\varvec{E}_{2} \end{bmatrix}= \begin{bmatrix}{\varvec{\varepsilon }_{1}}\\ {\varvec{\varepsilon }_{2}} \end{bmatrix}. \end{aligned} \end{aligned}$$
(8)

Here \(\otimes \) is the Kronecker product symbol, vec is the stacking operator, and \(\varvec{R}_{j}\) is the following matrix (Faliva 1996)

$$\begin{aligned} \underset{(j\times j^2)}{\varvec{R}_j} = \begin{bmatrix} \underset{(1,j)}{\varvec{e}_{1}'}\otimes \underset{(1,j)}{\varvec{e}_{1}'} &{}\\ \underset{(1,j)}{\varvec{e}_{2}'}\otimes \underset{(1,j)}{\varvec{e}_{2}'} &{}\\ \dots &{} \\ \underset{(1,j)}{\varvec{e}_{j}'}\otimes \underset{(1,j)}{\varvec{e}_{j}'} \end{bmatrix}, \end{aligned}$$
(9)

where \(\varvec{e}_{i}\) denotes the ith elementary vector (Footnote 2). The model in Eq. (7) together with a sphericalness hypothesis for the error terms is a classical linear regression model. The estimation of the parameters will be accomplished in the next section by using the method of averages, MA for short (Květoň 1987). The estimator so obtained turns out to be an instrumental-variable (IV) estimator (Goldberger 1964) with a binary instrument matrix and enjoys the desirable properties of both MA and IV inferential procedures.
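The construction of \(\varvec{y}\) and \(\varvec{X}\) in Eq. (8) can be sketched numerically as follows. The helper names `R` and `stacked_model` and the small data set (three goods, three periods, no noise) are hypothetical and serve only to check that the Kronecker-product blocks reproduce system (6), assuming vec stacks columns.

```python
import numpy as np

def R(j):
    """Matrix R_j of Eq. (9): row i is e_i' (x) e_i', so that R_j' a = vec(D_a)."""
    I = np.eye(j)
    return np.vstack([np.kron(I[i], I[i]) for i in range(j)])

def stacked_model(V, Q):
    """Return y and X of Eq. (8), built from the Kronecker-product blocks."""
    N, T = V.shape
    V2, Q2 = V[:, 1:], Q[:, 1:]
    top = np.hstack([np.zeros((N, T - 1)), np.diag(Q[:, 0])])
    bottom = np.hstack([-np.kron(np.eye(T - 1), V2) @ R(T - 1).T,
                        np.kron(Q2.T, np.eye(N)) @ R(N).T])
    y = np.concatenate([V[:, 0], np.zeros(N * (T - 1))])
    return y, np.vstack([top, bottom])

# Hypothetical noise-free data: V = D_pi Q D_lam with lam_1 = 1.
Q = np.array([[2.0, 3.0, 4.0], [5.0, 4.0, 6.0], [1.0, 2.0, 2.0]])
pi = np.array([1.5, 0.8, 3.0])
lam = np.array([1.0, 1.1, 1.2])
V = Q * np.outer(pi, lam)

y, X = stacked_model(V, Q)
beta = np.concatenate([1.0 / lam[1:], pi])   # beta = [varphi; pi]
print(X.shape, np.allclose(X @ beta, y))
```

In the noise-free case \(\varvec{X}\varvec{\beta }=\varvec{y}\) holds exactly, which is a quick way to confirm that the signs and block orderings in Eq. (8) have been coded consistently.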

3 The solution of the index estimation problem and its meaning

In the following, the MA is applied to estimate the price-indexes via the intermediate estimation of the deflator indexes. To this end, let us look at

$$\begin{aligned} \varvec{X}\varvec{\beta }\approx \varvec{y} \end{aligned}$$
(10)

as an over-identified system of linear equations and solve the derived system

$$\begin{aligned} \varvec{L}'\varvec{X}\varvec{\beta }=\varvec{L}'\varvec{y}, \end{aligned}$$
(11)

where \(\varvec{L}\) is a binary matrix of the same dimensions as \(\varvec{X}\) which satisfies the rank condition

$$\begin{aligned} r(\varvec{L}'\varvec{X})=N+T-1. \end{aligned}$$
(12)

If the binary matrix \(\varvec{X}^{b}=[x_{ij}^{b}]\), \(x_{ij}^{b}=1\) if \(x_{ij} \ne 0\) and \(x_{ij}^{b}= 0\) otherwise, associated with \(\varvec{X}\) satisfies the rank condition (12), then this matrix provides a convenient choice for \(\varvec{L}\). This leads to the linear system

$$\begin{aligned} (\varvec{X}^{b})'\varvec{X}\varvec{\beta }=(\varvec{X}^{b})'\varvec{y} \end{aligned}$$
(13)

whose solution

$$\begin{aligned} \varvec{\widehat{\beta }}=((\varvec{X}^{b})'\varvec{X})^{-1}(\varvec{X}^{b})'\varvec{y} \end{aligned}$$
(14)

gives the intended index estimator. Actually, the solution (14) turns out to be an instrumental variables (IV) estimator (Goldberger 1964), with the columns of \(\varvec{X}^{b}\) acting as instruments. Going back to the linear model in Eq. (7) and assuming sphericalness for the error terms, i.e.,

$$\begin{aligned} \mathbb {E}(\varvec{\varepsilon })=\varvec{0}_{NT}, \quad \mathbb {E}\{\varvec{\varepsilon }\varvec{\varepsilon }'\}=\sigma ^{2} \varvec{I}_{NT}, \end{aligned}$$
(15)

the dispersion matrix of the estimator (14) is given by

$$\begin{aligned} \varvec{\Sigma }(\varvec{\widehat{\beta }})=\sigma ^{2}((\varvec{X}^{b})'\varvec{X})^{-1}(\varvec{X}^{b})'\varvec{X}^{b}(\varvec{X}'\varvec{X}^{b})^{-1} \end{aligned}$$
(16)

and the error variance \(\sigma ^{2}\) is estimated by

$$\begin{aligned} \widehat{\sigma }^{2}=\frac{1}{NT-N-T+1}(\varvec{y}-\varvec{X}\varvec{\widehat{\beta }})'(\varvec{y}-\varvec{X}\varvec{\widehat{\beta }}). \end{aligned}$$
(17)

The estimator of the price index vector follows as a by-product of the estimator in Eq. (14), and the statistical properties of the former can be derived from those of the latter, accordingly. The estimator in Eq. (14) crucially rests on the following

Lemma 1

The rank condition

$$\begin{aligned} r((\varvec{X}^{b})'\varvec{X})=N+T-1 \end{aligned}$$
(18)

holds true for \(\varvec{X}\) in Eq. (8).

Proof

See Appendix. \(\square \)
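The rank condition and the estimator in Eq. (14) can be checked numerically. The sketch below uses the simulated \(\varvec{Q}\) and \(\varvec{V}\) reported in Sect. 5 and a hypothetical helper, `stacked_design`, which builds \(\varvec{y}\) and \(\varvec{X}\) block by block (an equivalent shortcut for the Kronecker form of Eq. (8), assumed here for brevity).

```python
import numpy as np

def stacked_design(V, Q):
    """y and X of Eq. (7), built period by period (column 0 is the base period)."""
    N, T = V.shape
    y = np.concatenate([V[:, 0], np.zeros(N * (T - 1))])
    X = np.zeros((N * T, T - 1 + N))
    for t in range(T):
        rows = slice(t * N, (t + 1) * N)
        X[rows, T - 1:] = np.diag(Q[:, t])   # reference-price columns
        if t > 0:
            X[rows, t - 1] = -V[:, t]        # deflator column of period t + 1
    return y, X

# Simulated quantities and values as reported in Sect. 5.
Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
V = np.array([[22.22, 27.38, 28.59, 37.77, 45.90, 54.09],
              [37.85, 53.82, 59.10, 69.94, 73.88, 85.56],
              [42.97, 39.73, 39.93, 47.30, 46.47, 53.49],
              [15.15, 22.48, 20.59, 26.88, 34.77, 58.54]])
N, T = V.shape

y, X = stacked_design(V, Q)
Xb = (X != 0).astype(float)              # binary instrument matrix X^b
B = Xb.T @ X
rank = np.linalg.matrix_rank(B)          # Lemma 1: should equal N + T - 1
beta_hat = np.linalg.solve(B, Xb.T @ y)  # Eq. (14)
resid = y - X @ beta_hat
print(rank, np.abs(Xb.T @ resid).max())
```

By construction the residuals are orthogonal to the instruments, so the second printed number should be zero up to rounding.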

On this premise, we can establish the following

Theorem 1

The estimator \(\varvec{\hat{\varphi }}=[\varvec{I}_{T-1}, \varvec{0}]\widehat{\varvec{\beta }}\) of the deflator vector is given by

$$\begin{aligned} \varvec{\widehat{\varphi }}=(\varvec{D}_{w}-\varvec{Q}'_2\varvec{D}_{\varvec{\bar{q}}}^{-1}\varvec{V}_2)^{-1}\,\varvec{Q}'_2\varvec{D}_{\varvec{\bar{q}}}^{-1}\varvec{v}_{1}= \end{aligned}$$
(19)
$$\begin{aligned} \begin{aligned} =\begin{bmatrix} \sum _{i=1}^{N}\left( v_{i2}-\frac{q_{i2}v_{i2}}{\sum _{t=1}^{T}q_{it}}\right) &{} \dots &{}-\sum _{i=1}^{N}\left( \frac{q_{i2}v_{iT}}{\sum _{t=1}^{T}q_{it}}\right) \\ \dots &{} \dots &{} \dots \\ -\sum _{i=1}^{N}\left( \frac{q_{iT}v_{i2}}{\sum _{t=1}^{T}q_{it}}\right) &{} \dots &{}\sum _{i=1}^{N}\left( v_{iT}-\frac{q_{iT}v_{iT}}{\sum _{t=1}^{T}q_{it}}\right) \end{bmatrix} ^{-1} \begin{bmatrix} \sum _{i=1}^{N}\left( \frac{q_{i2}v_{i1}}{\sum _{t=1}^{T}q_{it}}\right) \\ \dots \\ \sum _{i=1}^{N}\left( \frac{q_{iT}v_{i1}}{\sum _{t=1}^{T}q_{it}}\right) \end{bmatrix} \end{aligned} \end{aligned}$$
(20)

where \(\varvec{D_{w}}\) and \(\varvec{D_{\bar{q}}}\) are diagonal matrices with the elements of the vectors

$$\begin{aligned} \varvec{w}=\varvec{V}_{2}'\varvec{u}_{N} \, \, \,\text{ and }\,\,\, \bar{\varvec{q}}=\varvec{Q}\varvec{u}_{T} \end{aligned}$$
(21)

as diagonal entries, respectively; \(\varvec{u}_{N}\) and \(\varvec{u}_{T}\) are vectors of 1’s with N and T components, respectively. An estimator of the variance–covariance matrix of the estimator is

$$\begin{aligned} \widehat{\varvec{\Sigma }}(\varvec{\widehat{\varphi }})= \widehat{\sigma }^{2}(\varvec{D}_{\varvec{w}}-\varvec{Q}'_2\varvec{D}_{\varvec{\bar{q}}}^{-1}\varvec{V}_2)^{-1}\,\varvec{Q}'_{2}\varvec{D}_{\varvec{\bar{q}}}^{-2}\varvec{Q}_{2}\,(\varvec{D}_{\varvec{w}}-\varvec{Q}'_2\varvec{D}_{\varvec{\bar{q}}}^{-1}\varvec{V}_2)^{-1} \end{aligned}$$
(22)

where

$$\begin{aligned} \begin{aligned} \widehat{{\sigma }}^{2}=&\frac{1}{NT-N-T+1}\left\{ \varvec{v}_{1}'\varvec{v}_{1}-2[\varvec{0}',\varvec{v}_{1}'](\varvec{X}'\varvec{X}^{b})^{-1}\begin{bmatrix}\varvec{0}\\ \varvec{D}_{q_{1}}\varvec{v}_{1} \end{bmatrix}\right. \\&\left. +[\varvec{0}',\varvec{v}_{1}'](\varvec{X}'\varvec{X}^{b})^{-1}\varvec{X}'\varvec{X}((\varvec{X^{b})'}\varvec{X})^{-1}\begin{bmatrix}\varvec{0}\\ \varvec{v}_{1} \end{bmatrix}\right\} \end{aligned} \end{aligned}$$
(23)

is an estimator of the error variance.

Proof

See Appendix. \(\square \)
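A minimal sketch of the closed form in Eq. (19) is given below. The function name `ma_deflators` is hypothetical; the data are the quantities, reference prices and price indexes specified in Sect. 5, with noise-free values assumed so that the true deflators should be recovered exactly.

```python
import numpy as np

def ma_deflators(V, Q):
    """Closed-form deflator estimator of Eq. (19); column 0 is the base period."""
    v1, V2, Q2 = V[:, 0], V[:, 1:], Q[:, 1:]
    w = V2.sum(axis=0)       # w = V_2' u_N, Eq. (21)
    q_bar = Q.sum(axis=1)    # q_bar = Q u_T, Eq. (21)
    G = Q2.T / q_bar         # Q_2' D_qbar^{-1}
    return np.linalg.solve(np.diag(w) - G @ V2, G @ v1)

Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
pi = np.array([3.5, 2.5, 1.7, 2.9])
lam = np.array([1.00, 1.05, 1.11, 1.15, 1.18, 1.25])

# Illustrative assumption: noise-free values V = D_pi Q D_lam.
V0 = Q * np.outer(pi, lam)
phi_hat = ma_deflators(V0, Q)
print(np.allclose(phi_hat, 1.0 / lam[1:]))
```

Only the \((T-1)\times (T-1)\) system in Eq. (19) is solved, so the stacked \(NT\times (N+T-1)\) matrix never has to be formed when only the deflators are of interest.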

In what follows, we will denote by \(\widehat{\lambda }_{t}\) the reciprocal of the corresponding deflator estimator \(\widehat{\varphi }_{t-1}\). In this connection, we establish the following

Corollary 1

The estimator of the price index in period (or country) t, \( 2 \le t \le T\), is given by

$$\begin{aligned} \widehat{\lambda }_{t}=(\varvec{e}'_{t-1}(\varvec{D}_{w}-\varvec{Q}'_2\varvec{D}_{\varvec{\bar{q}}}^{-1}\varvec{V}_2)^{-1}\,\varvec{Q}'_2\varvec{D}_{\varvec{\bar{q}}}^{-1}\varvec{v}_{1})^{-1} \end{aligned}$$
(24)

where \(\varvec{e}_{t-1}\) is the \((t-1)\)th elementary vector of \(T-1\) components and \(\widehat{\varphi }_{t-1}\) is the \((t-1)\)th component of the estimator (19). The estimated variance of \(\widehat{\lambda }_{t}\) is approximated by

$$\begin{aligned} \widehat{var}(\widehat{\lambda }_{t})\approx {\widehat{\varphi }}_{t-1}^{-4}\,\widehat{var}({\widehat{\varphi }}_{t-1})={\widehat{\varphi }}_{t-1}^{-4}\,\varvec{e}'_{t-1}\widehat{\varvec{\Sigma }}(\varvec{\widehat{\varphi }})\varvec{e}_{t-1} \end{aligned}$$
(25)

where \(\widehat{\varvec{\Sigma }}(\varvec{\widehat{\varphi }})\) is the matrix  (22).

Proof

See Appendix. \(\square \)
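The inferential side of Corollary 1 can be sketched as follows. As a simplifying assumption, the snippet evaluates the general dispersion formula in Eq. (16) and the variance estimator in Eq. (17) on the stacked model and then applies the approximation in Eq. (25), rather than coding the block expression in Eq. (22). The helper `stacked_design` is hypothetical; the data are the simulated \(\varvec{Q}\) and \(\varvec{V}\) reported in Sect. 5.

```python
import numpy as np

def stacked_design(V, Q):
    """y and X of Eq. (7), built period by period (column 0 is the base period)."""
    N, T = V.shape
    y = np.concatenate([V[:, 0], np.zeros(N * (T - 1))])
    X = np.zeros((N * T, T - 1 + N))
    for t in range(T):
        rows = slice(t * N, (t + 1) * N)
        X[rows, T - 1:] = np.diag(Q[:, t])
        if t > 0:
            X[rows, t - 1] = -V[:, t]
    return y, X

Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
V = np.array([[22.22, 27.38, 28.59, 37.77, 45.90, 54.09],
              [37.85, 53.82, 59.10, 69.94, 73.88, 85.56],
              [42.97, 39.73, 39.93, 47.30, 46.47, 53.49],
              [15.15, 22.48, 20.59, 26.88, 34.77, 58.54]])
N, T = V.shape

y, X = stacked_design(V, Q)
Xb = (X != 0).astype(float)
B_inv = np.linalg.inv(Xb.T @ X)
beta_hat = B_inv @ Xb.T @ y                          # Eq. (14)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (N * T - N - T + 1)         # Eq. (17)
Sigma = sigma2 * B_inv @ (Xb.T @ Xb) @ B_inv.T       # Eq. (16)

phi_hat = beta_hat[:T - 1]
lam_hat = 1.0 / phi_hat                              # price indexes for t = 2, ..., T
var_lam = phi_hat ** -4 * np.diag(Sigma)[:T - 1]     # approximation of Eq. (25)
print(lam_hat, np.sqrt(var_lam))
```

The square roots of `var_lam` are the standard errors that would enter a \(2\sigma \) band around each estimated index value.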

It can be shown that under sphericalness the analysis of the residuals of the regression model in Eq. (7) associated with the estimator in Eq. (14) leads to the systems of equations which determine the Geary–Khamis (GK) index. Thus, the estimator is a closed-form expression of the GK index. Although the GK index has been widely investigated (see, e.g., Diewert and Fox (2022, 2017); Balk (2012); Heston and Lipsey (2007)), a closed-form formula of the index for the general case of an arbitrary number of periods and/or countries is lacking up to now. The index formula devised in this paper provides the intended result within a regression model framework with the inherent inferential statistical toolkit as a dowry. In order to prove that \(\widehat{\lambda }_{t}\) represents a closed-form expression of the GK index, check that

$$\begin{aligned} (\varvec{X}^{b})'\varvec{e}=\varvec{0}_{N+T-1} \end{aligned}$$
(26)

holds true for

$$\begin{aligned} \underset{(NT,1)}{\varvec{e}}=\varvec{y}-\varvec{X}\varvec{\widehat{\beta }} \end{aligned}$$
(27)

where

$$\begin{aligned} \varvec{\widehat{\beta }}=\begin{bmatrix} \varvec{\widehat{\varphi } }\\ \varvec{\widehat{\pi }}\end{bmatrix} =\begin{bmatrix} \widehat{\lambda }_{2}^{-1}\\ \dots \\ \widehat{\lambda }_{T}^{-1}\\ \widehat{\pi }_{1}\\ \dots \\ \widehat{\pi }_{N} \end{bmatrix}=((\varvec{X}^{b})'\varvec{X})^{-1}(\varvec{X}^{b})'\varvec{y}. \end{aligned}$$
(28)

Simple computations, bearing in mind Eq. (50) and (61) in Appendix, show that

$$\begin{aligned}&(\varvec{X}^{b})'\varvec{e}=(\varvec{X}^{b})'\varvec{y}-(\varvec{X}^{b})'\varvec{X}\varvec{\widehat{\beta }}= \end{aligned}$$
(29)
$$\begin{aligned}&\quad =\begin{bmatrix} \underset{((T-1),1)}{\varvec{0}} \\ \underset{(N,1)}{\varvec{v}_{1}} \end{bmatrix} - \begin{bmatrix}\varvec{D}_{w} &{} -\varvec{Q}_{2}' \\ -\varvec{V}_{2} &{} \varvec{D}_{\varvec{\bar{q}}} \end{bmatrix} \begin{bmatrix}\underset{((T-1),1)}{\varvec{\widehat{\varphi }}} \\ \underset{(N,1)}{\varvec{\widehat{\pi }}}\end{bmatrix} =\begin{bmatrix}\underset{((T-1),1)}{\varvec{0}} \\ \underset{(N,1)}{\varvec{0}} \end{bmatrix}. \end{aligned}$$
(30)

The latter, together with Eq. (26), leads to the pair of equation systems

$$\begin{aligned}&\varvec{D}_{w}\varvec{\widehat{\varphi }}=\varvec{Q}_{2}'\varvec{\widehat{\pi }} \end{aligned}$$
(31)
$$\begin{aligned}&\varvec{v}_{1}+\varvec{V}_{2}\varvec{\widehat{\varphi }}=\varvec{D}_{\varvec{\bar{q}}}\varvec{\widehat{\pi }} \end{aligned}$$
(32)

that can be rewritten as

$$\begin{aligned}&\sum _{i=1}^{N}v_{it}\widehat{\lambda }_{t}^{-1}= \sum _{i=1}^{N}q_{it}\widehat{\pi }_{i}, \quad t=2,\dots ,T \end{aligned}$$
(33)
$$\begin{aligned}&v_{i1}+\sum _{t=2}^{T}v_{it}\widehat{\lambda }_{t}^{-1}=\sum _{t=1}^{T}q_{it}\widehat{\pi }_{i}, \quad i=1,\dots ,N. \end{aligned}$$
(34)

Solving Eqs. (33) and (34) for \(\widehat{\lambda }_{t}\) and \(\widehat{\pi }_{i}\) yields

$$\begin{aligned}&\widehat{\lambda }_{t}=\frac{\sum _{i=1}^{N}v_{it}}{\sum _{i=1}^{N}q_{it}\widehat{\pi }_{i}} \end{aligned}$$
(35)
$$\begin{aligned}&\widehat{\pi }_{i}= \frac{\sum _{t=1}^{T}\frac{v_{it}}{\widehat{\lambda }_{t}}}{\sum _{t=1}^{T}q_{it}}. \end{aligned}$$
(36)

Under \(\lambda _{1}=1\), Eqs. (35) and (36) read as the equation systems of the GK index in the temporal setting and the index can be obtained, accordingly (as noticed by an anonymous referee we are indebted to).
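The equivalence with the GK system can be illustrated numerically. The sketch below solves Eqs. (35) and (36) by a plain fixed-point iteration with \(\lambda _{1}=1\) (one possible way of solving the GK equations, assumed here for illustration) and compares the result with the closed form of Eq. (19). The function names are hypothetical; the data are the simulated \(\varvec{Q}\) and \(\varvec{V}\) reported in Sect. 5.

```python
import numpy as np

def ma_index(V, Q):
    """Price indexes from the closed-form deflators of Eq. (19), with lam_1 = 1."""
    v1, V2, Q2 = V[:, 0], V[:, 1:], Q[:, 1:]
    G = Q2.T / Q.sum(axis=1)                         # Q_2' D_qbar^{-1}
    phi = np.linalg.solve(np.diag(V2.sum(axis=0)) - G @ V2, G @ v1)
    return np.concatenate([[1.0], 1.0 / phi])

def gk_index(V, Q, n_iter=5000, tol=1e-13):
    """Fixed-point iteration on Eqs. (35)-(36) with the base period fixed at 1."""
    lam = np.ones(V.shape[1])
    q_bar = Q.sum(axis=1)
    for _ in range(n_iter):
        pi = (V / lam).sum(axis=1) / q_bar           # Eq. (36)
        new = V.sum(axis=0) / (Q.T @ pi)             # Eq. (35)
        new[0] = 1.0                                 # normalization lam_1 = 1
        converged = np.max(np.abs(new - lam)) < tol
        lam = new
        if converged:
            break
    return lam

Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
V = np.array([[22.22, 27.38, 28.59, 37.77, 45.90, 54.09],
              [37.85, 53.82, 59.10, 69.94, 73.88, 85.56],
              [42.97, 39.73, 39.93, 47.30, 46.47, 53.49],
              [15.15, 22.48, 20.59, 26.88, 34.77, 58.54]])

lam_ma = ma_index(V, Q)
lam_gk = gk_index(V, Q)
print(np.max(np.abs(lam_ma - lam_gk)))
```

The closed form needs a single linear solve, whereas the iteration needs many sweeps over the data to reach the same values.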

4 Dropping the assumption of constant-variance errors

In the previous section, the estimation procedure of the deflator vector \(\varvec{\varphi }\) (and eventually of the price index \(\varvec{\lambda }\)) has been performed under the assumption of error sphericalness, which embodies both uncorrelatedness and constant variance of the disturbances. Leaving aside the issue of dependence, in particular correlation, which we exclude from our analysis, let us investigate the assumption of constant variance. A hypothesis of constant variance over time for the errors is tenable by virtue of the argument that the model specification is the outcome of a deflating transformation via \(\varvec{D}_{\varvec{\phi }}=\varvec{D}_{\varvec{\lambda }}^{-1}\). No a priori justification, other than computational convenience, can be advanced for a constant-variance hypothesis across different commodities. So it is worth considering the issue more deeply. The analysis cannot but start from the residuals corresponding to the estimator \(\varvec{\hat{\beta }}\), that is

$$\begin{aligned} \underset{(TN,1)}{\widehat{\varvec{\varepsilon }}}=\varvec{y}-\varvec{X}((\varvec{X}^{b})'\varvec{X})^{-1}(\varvec{X}^{b})'\varvec{y}. \end{aligned}$$
(37)

As a simple computation shows, the (sub)vector of the residuals referable to the nth commodity over the time span \(1 \le t \le T\) is given by

$$\begin{aligned} \underset{(T,1)}{\widehat{\varvec{\zeta }}_{n}}=\underset{(T,TN)}{\varvec{J}_{n}}\underset{(TN,1)}{\widehat{\varvec{\varepsilon }}}=\begin{bmatrix} \widehat{\varepsilon }_{n}\\ \widehat{\varepsilon }_{n+N}\\ \vdots \\ \widehat{\varepsilon }_{n+(T-1)N} \end{bmatrix} \end{aligned}$$
(38)

where \(\varvec{J}_{n}\) is the selection matrix

$$\begin{aligned} \underset{(T,TN)}{\varvec{J}_{n}}=\begin{bmatrix} \varvec{e}'_{n}\\ \varvec{e}'_{n+N}\\ \vdots \\ \varvec{e}'_{n+(T-1)N} \end{bmatrix} \end{aligned}$$
(39)

with \(\varvec{e}_{j}\) denoting the jth elementary vector of TN components. Accordingly, an estimator of the variance \(\sigma ^{2}_{n}\) of the T errors referable to the nth commodity is given by

$$\begin{aligned} \widehat{\sigma }^{2}_{n}=\frac{1}{T}\underset{(1,T)}{\widehat{\varvec{\zeta }}_{n}'} \underset{(T,1)}{\widehat{\varvec{\zeta }}_{n}}=\widehat{\sigma }^{2}\widehat{\varsigma }_{n}^{2} \end{aligned}$$
(40)

with \(\widehat{\sigma }^{2}\) given by Eq. (17) and \(\widehat{\varsigma }_{n}^{2}\) ensuing as a by-product. It follows that the former assumption of a scalar dispersion matrix for the errors no longer holds and it must be replaced by the following specification

$$\begin{aligned} E(\varvec{\varepsilon }\varvec{\varepsilon }')=\sigma ^{2}(\varvec{I}_{T}\otimes \varvec{D}^{2}_{\varvec{\hat{\varsigma }}}) \end{aligned}$$
(41)

where \(\varvec{D}_{\varvec{\hat{\varsigma }}}\) is the \(N\times N\) diagonal matrix whose diagonal entries are the scalars \(\hat{\varsigma }_{1},\hat{\varsigma }_{2},\ldots ,\hat{\varsigma }_{N}\), so that the diagonal entries of \(\varvec{D}^{2}_{\varvec{\hat{\varsigma }}}\) are their squares. Under Eq. (41), the model

$$\begin{aligned} \varvec{y}=\varvec{X}\varvec{\beta } + \varvec{\varepsilon } \end{aligned}$$
(42)

is no longer a classical linear model. Nevertheless, it can easily be brought back to a classical model by premultiplying both sides of Eq. (42) by the matrix \((\varvec{I}_{T}\otimes \varvec{D}_{\varvec{\hat{\varsigma }}}^{-1})\), which yields the specification

$$\begin{aligned} \tilde{\varvec{y}}=\tilde{\varvec{X}}\varvec{\beta }+\tilde{\varvec{\varepsilon }} \end{aligned}$$
(43)

where

$$\begin{aligned} \begin{aligned}&\tilde{\varvec{y}}=(\varvec{I}_{T}\otimes \varvec{D}_{\varvec{\hat{\varsigma }}}^{-1})\varvec{y}, \\&\tilde{\varvec{X}}=(\varvec{I}_{T}\otimes \varvec{D}_{\varvec{\hat{\varsigma }}}^{-1})\varvec{X}, \\&\tilde{\varvec{\varepsilon }}=(\varvec{I}_{T}\otimes \varvec{D}_{\varvec{\hat{\varsigma }}}^{-1})\varvec{\varepsilon }, \end{aligned}\end{aligned}$$
(44)

with \(\tilde{\varvec{\varepsilon }}\) enjoying the sphericalness property

$$\begin{aligned} E(\tilde{\varvec{\varepsilon }})=\varvec{0},\,\, E(\tilde{\varvec{\varepsilon }}\tilde{\varvec{\varepsilon }}')=\sigma ^{2}\varvec{I}_{NT}. \end{aligned}$$
(45)

Noting that

$$\begin{aligned} \tilde{\varvec{X}^{b}}=\varvec{X}^{b} \end{aligned}$$
(46)

the vector \(\varvec{\beta }\) can be re-estimated via the method of averages, with \(\varvec{X}^{b}=\tilde{\varvec{X}^{b}}\) playing the role of the instrumental variable matrix. Eventually, we get the estimator

$$\begin{aligned} \tilde{\varvec{\beta }}=[(\tilde{\varvec{X}} ^{b})'\tilde{\varvec{X}}]^{-1}(\tilde{\varvec{X}} ^{b})'\tilde{\varvec{y}}= [(\varvec{X}^{b})'(\varvec{I}_{T}\otimes \varvec{D}_{\widehat{\varvec{\varsigma }}}^{-1})\varvec{X}]^{-1}(\varvec{X}^{b})'(\varvec{I}_{T}\otimes \varvec{D}_{\widehat{\varvec{\varsigma }}}^{-1})\varvec{y}. \end{aligned}$$
(47)

This shows that the sphericalness assumption can be relaxed in the case of interest. This paves the way to further extensions, if required.
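The two-step procedure of this section can be sketched as follows: first-step residuals give the commodity-specific variances of Eq. (40), and the reweighted estimator of Eq. (47) follows. The helper names `stacked_design` and `reweighted_ma` are hypothetical, and the data are the simulated \(\varvec{Q}\) and \(\varvec{V}\) reported in Sect. 5, used here purely as an illustration.

```python
import numpy as np

def stacked_design(V, Q):
    """y and X of Eq. (7), built period by period (column 0 is the base period)."""
    N, T = V.shape
    y = np.concatenate([V[:, 0], np.zeros(N * (T - 1))])
    X = np.zeros((N * T, T - 1 + N))
    for t in range(T):
        rows = slice(t * N, (t + 1) * N)
        X[rows, T - 1:] = np.diag(Q[:, t])
        if t > 0:
            X[rows, t - 1] = -V[:, t]
    return y, X

def reweighted_ma(X, y, varsigma):
    """Estimator of Eq. (47); with unit varsigma it reduces to Eq. (14)."""
    T = X.shape[0] // varsigma.size
    w = np.tile(1.0 / varsigma, T)         # diagonal of I_T (x) D_varsigma^{-1}
    Xb = (X != 0).astype(float)
    return np.linalg.solve(Xb.T @ (X * w[:, None]), Xb.T @ (y * w))

Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
V = np.array([[22.22, 27.38, 28.59, 37.77, 45.90, 54.09],
              [37.85, 53.82, 59.10, 69.94, 73.88, 85.56],
              [42.97, 39.73, 39.93, 47.30, 46.47, 53.49],
              [15.15, 22.48, 20.59, 26.88, 34.77, 58.54]])
N, T = V.shape

y, X = stacked_design(V, Q)
beta_hat = reweighted_ma(X, y, np.ones(N))           # first step, Eq. (14)
resid = (y - X @ beta_hat).reshape(T, N)             # row t, column n
sigma2 = (resid ** 2).sum() / (N * T - N - T + 1)    # Eq. (17)
sigma2_n = (resid ** 2).mean(axis=0)                 # Eq. (40)
varsigma = np.sqrt(sigma2_n / sigma2)
beta_tilde = reweighted_ma(X, y, varsigma)           # second step, Eq. (47)
print(1.0 / beta_hat[:T - 1], 1.0 / beta_tilde[:T - 1])
```

Because the weights act on rows only, the binary pattern of the design matrix is unchanged, which is why the same instrument matrix can be reused in the second step.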

5 A simulation based on log-normal random draws

In this section we illustrate the performance of the index developed in Sect. 3, called MA index hereafter, through three simulated examples. The aim of this analysis is to compare the capability of the MA and the time-product-dummy (TPD) (de Haan et al. 2020) indexes to reproduce the “true” index values, \(\varvec{\lambda }\), in a multi-period perspective. To this end, let us assume that quantities, \(\varvec{Q}\), reference prices, \(\varvec{\pi }\), and price indexes, \(\varvec{\lambda }\), of four commodities over six periods are specified as follows

$$\begin{aligned}\begin{aligned}&\varvec{Q}= \begin{bmatrix} 5.00 &{} \quad 7.00 &{}\quad 7.00 &{} \quad 8.00 &{}\quad 10.00 &{}\quad 12.00 \\ 15.00 &{}\quad 20.00 &{} \quad 21.00 &{} \quad 24.00 &{}\quad 25.00 &{}\quad 27.00 \\ 25.00 &{}\quad 22.00 &{}\quad 20.00 &{}\quad 23.00 &{}\quad 23.00 &{}\quad 25.00 \\ 5.00 &{}\quad 6.00 &{}\quad 6.00 &{}\quad 8.00 &{}\quad 10.00 &{}\quad 15.00 \\ \end{bmatrix}= \begin{bmatrix} \underset{(4,1)}{\varvec{q}_{1}},\dots ,\underset{(4,1)}{\varvec{q}_{6}} \end{bmatrix}, \,\,\, \\&\varvec{\lambda }= \begin{bmatrix} 1.00\\ 1.05\\ 1.11\\ 1.15\\ 1.18\\ 1.25 \end{bmatrix},\,\,\, \varvec{\pi }= \begin{bmatrix} 3.5\\ 2.5\\ 1.7\\ 2.9 \end{bmatrix}. \end{aligned} \end{aligned}$$

Then, with these data at hand, the values, \(\varvec{V}\), have been computed as in (3), where the random terms, \(\varvec{H}\), have been generated, without loss of generality, from a standard log-normal distribution

$$\begin{aligned}\begin{aligned}&\varvec{V}= \begin{bmatrix} 22.22 &{} \quad 27.38 &{} \quad 28.59 &{} \quad 37.77 &{} \quad 45.90 &{} \quad 54.09 \\ 37.85 &{} \quad 53.82 &{} \quad 59.10 &{} \quad 69.94 &{} \quad 73.88 &{} \quad 85.56 \\ 42.97 &{} \quad 39.73 &{} \quad 39.93 &{} \quad 47.30 &{} \quad 46.47 &{} \quad 53.49 \\ 15.15 &{} \quad 22.48 &{} \quad 20.59 &{} \quad 26.88 &{} \quad 34.77 &{} \quad 58.54 \\ \end{bmatrix}= \begin{bmatrix} \underset{(4,1)}{\varvec{v}_{1}},\ldots ,\underset{(4,1)}{\varvec{v}_{6}}. \end{bmatrix} \end{aligned} \end{aligned}$$

The prices, \(\varvec{P}\), needed to compute the TPD index, have been worked out as ratios between values and quantities:

$$\begin{aligned}\begin{aligned}&\varvec{P}= \begin{bmatrix} 3.26 &{} 3.75 &{} 3.81 &{} 3.93 &{} 4.00 &{} 4.41 \\ 2.52 &{} 2.66 &{} 2.74 &{} 2.88 &{} 2.95 &{} 3.14 \\ 1.74 &{} 1.76 &{} 1.87 &{} 2.00 &{} 2.05 &{} 2.10 \\ 2.43 &{} 2.96 &{} 3.06 &{} 3.32 &{} 3.18 &{} 3.59 \\ \end{bmatrix}=\begin{bmatrix} \underset{(4,1)}{\varvec{p}_{1}},\ldots ,\underset{(4,1)}{\varvec{p}_{6}}. \end{bmatrix} \end{aligned} \end{aligned}$$

In this simulation, the matrices \(\varvec{Q}\) and \(\varvec{V}\) have been used to compute both the MA and the TPD indexes, in a multi-period perspective. In this regard, we have considered three different cases that cover three empirical scenarios:

  • Case.1 Complete price tableau, implying a reference basket including a complete dataset for the four commodities;

  • Case.2 Incomplete price tableau, assuming that the fourth and second commodities are missing in the first and second period, respectively (that is \(q_{41}=v_{41}=0\) and \(q_{22}=v_{22}=0\)), with a “standard” reference basket that includes only the first and the third commodities;

  • Case.3 Incomplete price tableau, assuming that the fourth and second commodities are missing in the first and second period, respectively (that is \(q_{41}=v_{41}=0\) and \(q_{22}=v_{22}=0\)), with the MA reference basket that includes commodities present in at least two periods, namely all four commodities.

The outcome of the Breusch-Pagan test rules out the presence of heteroskedasticity in all three scenarios. In what follows, the sums of the squared differences between the estimated MA and TPD indexes, \(\hat{\varvec{\lambda }}\), and the “real” index, \(\varvec{\lambda }\), have been worked out for the three cases:

  • Case.1 Complete price tableau: 0.0032 (MA) and 0.0066 (TPD);

  • Case.2 Incomplete price tableau (“standard” basket): 0.004 (MA) and 0.010 (TPD);

  • Case.3 Incomplete price tableau (“novel” basket): 0.001 (MA) and 0.003 (TPD).

It is worth noting that the MA index always provides a better fit to the index \(\varvec{\lambda }\) than the TPD one. As expected, given the absence of heteroskedasticity across commodities, the MA index tallies with the GK one. In all cases, the MA estimates turn out to be more efficient, as they have lower variances and, consequently, they always lie within a \(2\sigma \) confidence band of the TPD index, as shown in Fig. 1. Looking at Fig. 2, we see that the values of the MA index provide the best fit to the real price index \(\varvec{\lambda }\), avoiding the TPD overestimation issue present in all cases and, in particular, when there are missing prices.

Fig. 1

Comparison of the MA and the TPD indexes (with 2\(\sigma \) confidence bands, given (22) for the MA index) for Case.1, Case.2 and Case.3, respectively

Fig. 2

Comparison of the MA and the TPD indexes (without confidence bands) for Case.1, Case.2 and Case.3, respectively

These examples also highlight the role played by the reference prices, \(\varvec{\pi }\), which are the prices that consumers are expected to pay for the commodities in the base period/country. Reference prices prove useful in obtaining estimates of the prices of those commodities which, being missing from the basket, cannot be observed. Indeed, the price of a commodity, say i, missing in a period, say t, is not observable, but can be estimated as \(\hat{\pi }_i \,\hat{\lambda }_{t}\), where \(\hat{\pi }_i\) and \(\hat{\lambda }_t\) are the estimates of the reference price of commodity i and of the MA index at time t, respectively. This strategy has been used to estimate the prices of the fourth and second commodities in Case.3. According to (2), the prices \(p_{4,1}\) and \(p_{2,2}\) can be estimated as follows

$$\begin{aligned} \widehat{p}_{i,t}=\hat{\pi }_i\hat{\lambda }_t. \end{aligned}$$
(48)
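A minimal sketch of the imputation rule in Eq. (48) is given below. It mimics the missing-data pattern of Case.3 but, as an illustrative assumption, uses noise-free values built from the \(\varvec{Q}\), \(\varvec{\pi }\) and \(\varvec{\lambda }\) specified above, so it is a sanity check rather than a reproduction of the reported results. The function name `ma_estimates` is hypothetical.

```python
import numpy as np

def ma_estimates(V, Q):
    """Closed-form MA estimates: price indexes (lam_1 = 1) and reference prices."""
    v1, V2, Q2 = V[:, 0], V[:, 1:], Q[:, 1:]
    q_bar = Q.sum(axis=1)
    G = Q2.T / q_bar                                  # Q_2' D_qbar^{-1}
    phi = np.linalg.solve(np.diag(V2.sum(axis=0)) - G @ V2, G @ v1)  # Eq. (19)
    pi = (v1 + V2 @ phi) / q_bar                      # Eq. (32) solved for pi
    return np.concatenate([[1.0], 1.0 / phi]), pi

Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
pi_true = np.array([3.5, 2.5, 1.7, 2.9])
lam_true = np.array([1.00, 1.05, 1.11, 1.15, 1.18, 1.25])

# Illustrative assumption: noise-free values V = D_pi Q D_lam.
V = Q * np.outer(pi_true, lam_true)
Q[3, 0] = V[3, 0] = 0.0    # commodity 4 missing in period 1 (q_41 = v_41 = 0)
Q[1, 1] = V[1, 1] = 0.0    # commodity 2 missing in period 2 (q_22 = v_22 = 0)

lam_hat, pi_hat = ma_estimates(V, Q)
P_hat = np.outer(pi_hat, lam_hat)   # Eq. (48) for every commodity and period
print(P_hat[3, 0], P_hat[1, 1])     # imputed p_41 and p_22
```

Missing cells enter only as zeros in both \(\varvec{Q}\) and \(\varvec{V}\), so no price for them is needed to compute the estimates from which they are later imputed.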

Note that \(\hat{\pi }_i\) represents the price of the \(i\)th commodity in the base period/country, here assumed to be \(t=1\). According to (48), the price estimates \(\widehat{p}_{i,t}\) at times \(t=2,3,\dots ,T\) are obtained by updating \(\hat{\pi }_i\) by means of the values of the index \(\hat{\lambda }_t\) at these periods (or for the countries \(t=2,3,\dots ,T\)). To assess the goodness of the \(\widehat{p}_{i,t}\) estimates, the sum of the squared differences between observed, \(p_{i,t}\), and estimated prices, \(\widehat{p}_{i,t}\), has been computed for all the commodities in the three cases under study. These are

$$\begin{aligned} \begin{aligned}&\sum _{t=1}^6\sum _{i=1}^4(\hat{p}_{i,t}-p_{i,t})^2=0.258\; \text {for Case.1}\\&\sum _{t=1}^6\sum _{i\in \{1,3\}}(\hat{p}_{i,t}-p_{i,t})^2=0.028\; \text {for Case.2}\\&\sum _{t=1}^6\sum _{i=1}^4(\hat{p}_{i,t}-p_{i,t})^2=0.102\; \text {for Case.3} \end{aligned} \end{aligned}$$
(49)

and confirm the satisfactory performance of the MA index in reproducing missing prices by using reference ones. Clearly, this result hinges on the ability of this approach to provide good estimates of the reference prices. In this regard, Fig. 3, which compares the estimates of the MA reference prices with the “real” ones, provides evidence of the goodness of the estimated reference prices for all commodities, also in the presence of missing prices. Indeed, all the points are close to the bisector of each panel. Unfortunately, given that the TPD approach does not provide reference prices as a spin-off, a comparison between the MA and the TPD indexes in terms of the “quality” of missing prices constructed by using reference prices is not possible.

Fig. 3

MA estimates of the reference prices compared with the “real” ones for Case.1, Case.2 and Case.3, respectively. The dotted line represents the bisector

6 Concluding remarks

The paper provides the solution to the multi-period and/or multilateral price index problem in closed form, under proper error assumptions, taking a regression model specification as the frame of reference and the method of averages as the estimation approach. Two specifications are assumed in turn: sphericalness and commodity-dependent variances of the error terms. The former leads to an expression in compact form of the price indexes for an arbitrary number of periods and/or countries. The regression inferential apparatus applies accordingly. The price-index expression thus obtained proves to tally with the already known Geary–Khamis (GK) index, which is thereby endowed with the inferential apparatus of the regression framework. The second error specification drops the homoskedasticity assumption in favour of a commodity-dependent hypothesis for the error variances, which leads to a new and more general price-index formula with a significance that extends beyond the GK case and opens up pathways for further research.