A Latent Class Nested Logit Model for Rank-Ordered Data with Application to Cork Oak Reforestation

Environmental and Resource Economics

Abstract

We analyze stated ranking data collected from recreational visitors to the Alcornocales Natural Park (ANP) in Spain. The ANP is a large protected area which comprises mainly cork oak woodlands. The visitors ranked cork oak reforestation programs delivering different sets of environmental (reforestation technique, biodiversity, forest surface) and social (jobs and recreation sites created) outcomes. We specify a novel latent class nested logit model for rank-ordered data to estimate the distribution of willingness-to-pay for each outcome. Our modeling approach jointly exploits recent advances in discrete choice methods. The results suggest that prioritizing biodiversity would increase certainty over public support for a reforestation program. In addition, a substantial fraction of the visitor population are willing to pay more for the social outcomes than the environmental outcomes, whereas the existing reforestation subsidies are often justified by the environmental outcomes alone.

Notes

  1. Several empirical studies have recently shown the importance of consequentiality by comparing stated preference surveys (Herriges et al. 2010); financially binding experiments with different provision rules with a stated preference survey (Vossler and Evans 2009; Vossler et al. 2012); and a real referendum with a stated preference survey (Vossler and Watson 2013). These studies find that WTP differs depending on whether or not respondents believe that the survey is minimally consequential, and that hypothetical WTP and real WTP converge when respondents believe that their responses will have an impact on policy design. Although these studies focused on discrete choice referendum questions, the implications of consequentiality may apply to the hypothetical scenarios in other survey-based discrete choice methods.

  2. Carson and Groves (2007) recommend the use of coercive payment vehicles, such as a tax, in a preference elicitation survey as they are incentive-compatible.

  3. Note that when \(\tau =1\), formulas (3) and (4) simplify to the MNL model.

  4. The idea of using choice probabilities to derive a rank-ordering probability was proposed by McFadden (1986), who acknowledges the related contributions of Falmagne (1978) and Barberá and Pattanaik (1986). Layton and Levine (2003) exploit it to derive a probit model for partially ranked (best-worst) data in the presence of a large number of alternatives, and develop an accompanying Bayesian estimation algorithm. Layton and Lee (2006a, 2006b) exploit it to develop likelihood ratio tests for the poolability of different formats of stated preference responses, including rankings and ratings. To the best of our knowledge, however, the study of Dagsvik and Liu (2009) is the first to derive and estimate the nested rank-ordered logit model, which relaxes the IIA property of the exploded logit while maintaining a closed-form likelihood.

  5. The latent class approach can be exploited for a number of distinctive modeling uses. Building on the idea of Heckman and Singer (1984), our usage of the latent class approach is to specify a type of mixed logit specification which is called, inter alia, finite mixture logit, discrete mixture logit, or non-parametric mixed logit (Train 2008; Claassen et al. 2013; Keane and Wasi 2013; Yoo and Doiron 2013). As the last name, due to Train (2008), makes explicit, this specification uses a discrete distribution with C support points as a tool to obtain a non-parametric approximation to the unknown population distribution of random utility parameters, without committing to a particular parametric form (e.g. multivariate normal) of the population distribution a priori. Given this objective, it is desirable to specify as many support points as compatible with a model selection criterion (e.g. Bayesian Information Criterion) to accomplish a better approximation. For ease of exposition, it is common practice to adopt the typical latent class parlance and interpret each support point as preference “class” or “segment”, and the probability mass at each point as “class share” or “prior probability of class membership”.

    An alternative use of the latent class approach is to specify an endogenous segmentation model, a la Bhat (1997). Instead of viewing a discrete distribution as a tool to obtain a non-parametric approximation, this model makes a stronger structural assumption that there are indeed C different preference classes and decision makers are probabilistically assigned to different classes. The assignment probabilities are sometimes called class shares and often modeled as a function of demographic characteristics.

    The use of the latent class approach to obtain a finite mixture model or an endogenous segmentation model thus has distinct conceptual foundations. Moreover, while the finite mixture model has class shares (i.e. probability mass points) invariant with respect to demographic characteristics, in practice it is not necessarily a restricted functional form of the endogenous segmentation model. As Bhat (1997) points out, the number of classes, C, that one can empirically identify from a data set tends to be smaller when class shares are allowed to vary with demographic characteristics: estimating more class share parameters entails estimating fewer utility parameters. It is therefore difficult to recast a preferred finite mixture model as an endogenous segmentation model: such recasting often requires reducing the number of classes, which is at odds with the initial non-parametric approximation motive of specifying the finite mixture model.

  6. We have, however, found it useful to follow the advice of Scachar and Nalebuff (2004, p. 388) on numerical optimization and estimate our model in both spaces to double-check convergence. Over many sets of starting values, the WTP space model often resulted in a worse log-likelihood than the equivalent preference space model.

  7. The estimation routine has been written in TSP International 5.1 and is available from the authors upon request.

  8. This latter marginal WTP equals \(\omega _{SUR}+2\omega _{SUR^{2}}\,\text {SUR}\).

  9. Following Layton (2000), Calfee et al. (2001), Berry et al. (2004), Train and Winston (2007), and Dagsvik and Liu (2009) among others, we focus on modeling an observed rank ordering as a realized preference ordering, instead of modeling it as a sequence of choices. When there are three alternatives per choice set and the ROL model is the true model, a single observation on a rank ordering can be exploded into two pseudo-observations on the best and second-best choices made in sequence (Beggs et al. 1981; Train 2009, pp. 156–158). Based on this type of result, some studies contend that one should test for the poolability of pseudo-observations on the best and second-best choices à la Chapman and Staelin (1982) and Ben-Akiva et al. (1992), before making use of rank-ordered data. But except when one adopts a sequential choice model as the underlying behavioral framework so that a rank ordering is viewed as a sequence of repeated choices instead of a realized preference ordering (Giergiczny et al. 2013), such testing procedures entail a stringent maintained assumption that the ROL model is the true model so that the probability of a rank ordering equals a product of marginal choice probabilities (Hausman and Ruud 1987). The Monte Carlo study of Yan and Yoo (2014) shows that such testing procedures are highly sensitive to the maintained assumption: even when the ROL model is a slightly misspecified model (e.g. because the true error distribution is i.i.d. normal instead of i.i.d. type 1 extreme value), popular tests falsely reject poolability almost always in plausible sample size configurations. Both our NMNL and NROL estimation results reject \(H_{0}: \tau = 1\) at the 1% level, strongly suggesting that the error terms are not i.i.d. type 1 extreme value.

  10. Strictly speaking, \(\hbox {ASC}^{\text {RF}}\) measures the constant utility change from choosing a reforestation program using artificial plantation. For simplicity, our discussion treats it as that from choosing a reforestation program; this is adequate for our purpose because in all estimation results, the sum of \(\hbox {ASC}^{\text {RF}}\) and the WTP for natural regeneration (NAT) has the same sign as \(\hbox {ASC}^{\text {RF}}\). While TAX \(\times \) −1 is a composite parameter capturing both the marginal utility of money and the overall scale of utility, we abstract from the conceptual distinction between the two components as they are observationally indistinguishable.

  11. A discrete mixing distribution implicitly allows for an unrestricted pattern of correlations between random utility parameters, as no restriction is placed on how different one support point (i.e. class-specific preference vector) should be from another.

  12. All our estimation results were obtained using a Windows 7 PC running on Intel i7-4790 CPU and 32 GB of RAM.

  13. One may also incorporate demographic characteristics by specifying the population class shares, \(\pi _{c}\) for \(c=1,2,\ldots ,C\), directly as a function of demographic characteristics. From a statistical perspective, this may be the most appealing way to explore the association between demographic characteristics and preference classes. But, as we discuss in footnote 5, such an “endogenous segmentation model” specification is conceptually distinct from our use of the latent class approach. Besides, we note that our preferred 4-class model cannot be directly “generalized” by specifying the population shares to vary with the characteristics listed in Table 5, as the maximum likelihood estimation routine then fails to achieve convergence, suggesting that the resulting model is not empirically identified. For empirical identification, the dimension of the model may need to be curtailed by reducing the number of classes, thereby compromising the quality of the non-parametric approximation and comparability with our preferred model; and also by reducing the number of demographic characteristics, which would require cumbersome specification search especially because the optimal number of classes is likely to vary with which particular subset of characteristics are allowed to affect class shares.
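The explosion result mentioned in notes 4 and 9 can be illustrated with a minimal sketch. It assumes the standard ROL (exploded logit) form, in which the probability of a complete ranking is the product of MNL choice probabilities over successively smaller choice sets. The function name and the utility values are hypothetical and are not taken from the paper.

```python
import math
from itertools import permutations

def rol_ranking_probability(utilities, ranking):
    """Probability of a full ranking under the rank-ordered (exploded) logit.

    utilities: dict mapping alternative label -> systematic utility
    ranking:   list of labels ordered from best to worst
    """
    remaining = list(ranking)
    prob = 1.0
    # The last remaining alternative is "chosen" with probability 1,
    # so the product stops once a single alternative is left.
    while len(remaining) > 1:
        best = remaining[0]
        denom = sum(math.exp(utilities[j]) for j in remaining)
        prob *= math.exp(utilities[best]) / denom
        remaining.pop(0)
    return prob

# Hypothetical utilities for a three-alternative choice set.
V = {"A": 0.5, "B": 0.0, "C": -0.3}
p_abc = rol_ranking_probability(V, ["A", "B", "C"])

# The probabilities of all possible rankings should sum to one.
total = sum(rol_ranking_probability(V, list(r)) for r in permutations(V))
```

With three alternatives the product has two factors, which is the sense in which one ranking yields two pseudo-observations. The nested model discussed in the paper replaces this product form and is not reproduced here.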

References

  • Akaich F, Nayga RM, Gil JM (2013) Are results from non-hypothetical choice-based conjoint analyses and non-hypothetical recoded-ranking conjoint analyses similar? Am J Agric Econ 95:946–963

  • Barberá S, Pattanaik PK (1986) Falmagne and the rationalizability of stochastic choices in terms of random orderings. Econometrica 54:707–715

  • Bateman IJ, Mace GM, Fezzi C, Atkinson G, Turner K (2011) Economic analysis for ecosystem service assessments. Environ Resour Econ 48:177–218

  • Beggs S, Cardell S, Hausman J (1981) Assessing the potential demand for electric cars. J Econ 17:1–19

  • Ben-Akiva M, Morikawa T, Shiroish F (1992) Analysis of the reliability of preference ranking data. J Bus Res 24:149–164

  • Ben-Akiva M, Lerman SR (1985) Discrete choice analysis: theory and application to travel demand. MIT Press, Cambridge

  • Berry S, Levinsohn J, Pakes A (2004) Differentiated products demand system from a combination of micro and macro data. J Polit Econ 112:68–105

  • Bhat C (1997) An endogenous segmentation mode choice model with an application to intercity travel. Transp Sci 31:34–48

  • Boyle KJ, Holmes TP, Teisl MF, Roe B (2001) A comparison of conjoint analysis response formats. Am J Agric Econ 83:441–454

  • Calfee J, Winston C, Stempski R (2001) Econometric issues in estimating consumer preferences from stated preference data: a case study of the value of automobile travel time. Rev Econ Stat 83:699–707

  • Cameron TA, Poe GL, Ethier RG, Schulze WD (2002) Alternative non-market value-elicitation methods: are the underlying preferences the same? J Environ Econ Manag 34:391–425

  • Caparrós A, Campos P, Montero G (2003) An operative framework for total Hicksian income measurement: application to a multiple use forest. Environ Resour Econ 26:173–198

  • Caparrós A, Oviedo JL, Campos P (2008) Would you choose your preferred option? Comparing choice and recoded ranking experiments. Am J Agric Econ 90:843–855

  • Carson RT, Groves T (2007) Incentive and information properties of preference questions. Environ Resour Econ 37:181–210

  • Chang JB, Lusk JL, Norwood FB (2009) How closely do hypothetical surveys and laboratory experiments predict field behavior? Am J Agric Econ 91:518–534

  • Chapman RG, Staelin R (1982) Exploiting rank ordered choice set data within the stochastic utility model. J Mark Res 19:288–301

  • Claassen R, Hellerstein D, Kim SG (2013) Using mixed logit in land use models: can expectation-maximization (EM) algorithms facilitate estimation? Am J Agric Econ 95:419–425

  • Clark SL, Muthén B (2009) Relating latent class analysis results to variables not included in the analysis. mimeo. https://www.statmodel.com/download/relatinglca. Cited 24 Mar 2015

  • Dagsvik JK, Liu G (2009) A framework for analyzing rank-ordered data with application to automobile demand. Transp Res A 43:1–12

  • Doblas-Miranda E, Martínez-Vilalta J, Lloret F, Álvarez A, Ávila A, Bonet FJ, Brotons L, Castro J, Curiel Yuste J, Díaz M, Ferrandis P, García-Hurtado E, Iriondo JM, Keenan TF, Latron J, Llusiá J, Loepfe L, Mayol M, Moré G, Moya D, Peñuelas J, Pons X, Poyatos R, Sardans J, Sus O, Vallejo VR, Vayreda J, Retana J (2014) Reassessing global change research priorities in Mediterranean terrestrial ecosystems: how far have we come and where do we go from here? Glob Ecol Biogeogr 24:25–43

  • Duke JM, Ilvento TW (2004) A conjoint analysis of public preferences for agricultural land preservation. Agric Resour Econ Rev 33:209–219

  • European Commission (2014) Commission Regulation (EU) No 702/2014 of 25 June 2014. Off J Eur Union 57:193

  • Falmagne JC (1978) A representation theorem for finite scale systems. J Math Psychol 18:52–72

  • Fok D, Paap R, Van Dijk B (2012) A rank-ordered logit model with unobserved heterogeneity in ranking capabilities. J Appl Econ 27:831–846

  • Giergiczny M, Hess S, Dekker T, Chintakayala PK (2013) Testing the consistency (or lack thereof) between choices in best-worst surveys. Paper presented at the 3rd international choice modelling conference, Sydney, 3–5 July 2013

  • Hausman J, Ruud P (1987) Specifying and testing econometric models for rank-ordered data. J Econ 34:83–104

  • Heckman J, Singer B (1984) A method for minimizing the impact of distributional assumptions in econometric models for duration data. Econometrica 52:271–320

  • Herriges J, Kling C, Liu C-C, Tobias J (2010) What are the consequences of consequentiality? J Environ Econ Manag 59:67–81

  • Herriges JA, Phaneuf DJ (2002) Inducing patterns of correlation and substitution in repeated logit models of recreation demand. Am J Agric Econ 84:1076–1090

  • Hess S, Ben-Akiva M, Gopinath D, Walker J (2011) Advantages of latent class over continuous mixture of logit models. mimeo. http://www.stephanehess.me.uk/papers/Hess_Ben-Akiva_Gopinath_Walker_May_2011. Cited 3 Mar 2016

  • Hole AR (2015) MIXLOGITWTP: Stata module to estimate mixed logit models in WTP space. Statistical Software Components S458037, Boston College, Department of Economics

  • Hoyos D, Mariel P, Pascual U, Etxano I (2012) Valuing a Natura 2000 Network site to inform land use options using a discrete choice experiment: an Illustration from the Basque Country. J For Econ 18:329–344

  • Huber R, Hunziker M, Lehmann B (2011) Valuation of agricultural land-use scenarios with choice experiments: a political market share approach. J Environ Plan Manag 54:93–113

  • Johnson KA, Polasky S, Nelson E, Pennington D (2012) Uncertainty in ecosystem services valuation and implications for assessing land use tradeoffs: an agricultural case study in the Minnesota River Basin. Ecol Econ 79:71–79

  • Keane MP, Wasi N (2013) Comparing alternative models of heterogeneity in consumer choice behavior. J Appl Econ 28:1018–1045

  • Krawczyk M (2012) Testing for hypothetical bias in willingness to support a reforestation program. J For Econ 18:282–289

  • Layton DF (2000) Random coefficient models for stated preference surveys. J Environ Econ Manag 40:21–36

  • Layton DF, Brown G (2000) Heterogeneous preferences regarding global climate change. Rev Econ Stat 82:616–624

  • Layton DF, Lee ST (2006a) From ratings to rankings: the econometric analysis of stated preference ratings data. In: Halvorsen R, Layton DF (eds) Explorations in environmental and natural resource economics: essays in honor of Gardner M. Brown, Jr. Edward Elgar Publishing, Cheltenham

  • Layton DF, Lee ST (2006b) Embracing model uncertainty: strategies for response pooling and model averaging. Environ Resour Econ 34:51–85

  • Layton DF, Levine RA (2003) How much does the far future matter? A hierarchical Bayesian analysis of the public’s willingness to mitigate ecological impacts of climate change. J Am Stat Assoc 98:533–544

  • Loomis J (2005) Economic values without prices: the importance of nonmarket values and valuation for informing public policy debates. Choices 20:179–182

  • Louviere JJ, Flynn TN, Marley AAJ (2015) Best-worst scaling: theory, methods and applications. Cambridge University Press, Cambridge

  • McFadden D (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka P (ed) Frontiers in econometrics. Academic Press, New York

  • McFadden D (1978) Modeling the choice of residential location. In: Karlqvist A, Lundqvist L, Snickars F, Weibull J (eds) Spatial interaction theory and planning models. North Holland, Amsterdam

  • McFadden D (1986) The choice theory approach to market research. Mark Sci 5:275–297

  • McFadden D, Train KE (2000) Mixed MNL models of discrete response. J Appl Econ 15:447–470

  • Mogas J, Riera P, Bennett J (2006) A comparison of contingent valuation and choice modeling with second-order interactions. J For Econ 12:5–30

  • Nunes PALD, Schokkaert E (2003) Identifying the warm glow effect in contingent valuation. J Environ Econ Manag 45:231–245

  • Othman J, Rahajeng A (2013) Economic valuation of Jogjakarta’s tourism attributes: a contingent ranking analysis. Tour Econ 19:187–201

  • Ovando P, Campos P, Montero G (2007) Forestaciones con Encinas y Alcornoques en el Área de la Dehesa en el Marco del Reglamento (CEE) 2080/92 (1993–2000). Rev Española Estudio Agrosoc Pesq 214:173–186

  • Pacifico D, Yoo HI (2013) lclogit: a Stata command for fitting latent-class conditional logit models via the expectation-maximization algorithm. Stata J 13:625–639

  • Resano H, Sanjuan AI, Albisu LM (2012) Consumers response to the EU Quality policy allowing for heterogeneous preferences. Food Policy 37:355–365

  • Santos T, Tellería JL, Díaz M, Carbonell R (2006) Evaluating the benefits of CAP reforms: can afforestations restore bird diversity in Mediterranean Spain? Basic Appl Ecol 7:483–495

  • Scachar R, Nalebuff B (2004) Verifying the solution from a nonlinear solver: a case study: a comment. Am Econ Rev 94:382–390

  • Scarpa R, Thiene M, Train KE (2008) Utility in willingness-to-pay space: a tool to address confounding random scale effects in destination choice to the Alps. Am J Agric Econ 90:994–1010

  • Scarpa R, Notaro S, Louviere J, Raffaelli R (2011) Exploring scale effects of best/worst rank ordered choice data to estimate benefits of tourism in Alpine Grazing Commons. Am J Agric Econ 93:813–828

  • Schulz N, Breustedt G, Latacz-Lohman U (2013) Assessing farmers’ willingness to accept “greening”: insights from a discrete choice experiment in Germany. J Agric Econ 65:26–48

  • Train KE (2008) EM algorithms for nonparametric estimation of mixing distributions. J Choice Model 1:40–69

  • Train KE (2009) Discrete choice methods with simulation, 2nd edn. Cambridge University Press, New York

  • Train KE, Weeks M (2005) Discrete choice models in preference space and willingness-to-pay space. In: Alberini A, Scarpa R (eds) Applications of simulation methods in environmental resource economics. Springer, Dordrecht

  • Train KE, Winston C (2007) Vehicle choice behavior and the declining market share of U.S. automakers. Int Econ Rev 48:1469–1496

  • Varela E, Giergiczny M, Riera P, Mahieu P-A, Soliño M (2014) Social preferences for fuel break management programs in Spain: a choice modelling application to prevention of forest fires. Int J Wildland Fire 23(2):281–289

  • Vossler CA, Doyon M, Rondeau D (2012) Truth in consequentiality: theory and field evidence on discrete choice experiments. Am Econ J Microecon 4:145–171

  • Vossler CA, Evans MF (2009) Bridging the gap between the field and the lab: environmental goods, policy maker input, and consequentiality. J Environ Econ Manag 58:338–345

  • Vossler CA, Watson SB (2013) Understanding the consequences of consequentiality: testing the validity of stated preferences in the field. J Econ Behav Organ 86:137–147

  • Wustemann H, Meyerhoff J, Ruhs M, Schafer A, Hartje V (2014) Financial costs and benefits of a program of measures to implement a national strategy on biological diversity in Germany. Land Use Policy 36:307–318

  • Yan J, Yoo HI (2014) The seeming unreliability of rank-ordered data as a consequence of model misspecification. MPRA Paper No. 56285. https://mpra.ub.uni-muenchen.de/56285/. Cited 3 Mar 2016

  • Yoo HI, Doiron D (2013) The use of alternative preference elicitation methods in complex discrete choice experiments. J Health Econ 32:1166–1179

Acknowledgments

We thank Alejandro Caparrós and Pablo Campos for allowing us to access the data used in this study. We wish to thank Editor Christian Vossler and two anonymous referees for helpful and constructive comments. All views expressed herein are our own.

Funding

Oviedo’s involvement in this study was funded by the Spanish Ministry of Economy and Competitiveness (VEABA ECO2013-42110-P, I+D National Plan).

Author information

Corresponding author

Correspondence to Hong Il Yoo.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 2671 KB)

Appendices

Appendix 1: Summary of EM Algorithm

This appendix summarizes the expectation-maximization (EM) algorithm for estimating our latent class nested rank-ordered logit (LC-NROL) model. The use of the EM algorithm has been motivated by generic numerical difficulties associated with estimating finite mixture or latent class models via direct maximization of the sample log-likelihood function, not by any peculiar issue arising from estimating the LC-NROL model. While our own program was written in TSP International 5.1, it does not rely on any of TSP International’s specialized features. Programming the EM algorithm entails setting up a rudimentary loop over updating tasks (20) and (21) to be explained below, and can be implemented in any software package that allows the user to supply a self-written log-likelihood function e.g. Stata (Pacifico and Yoo 2013).

Unless specified otherwise, we follow the notations introduced in Sect. 3, and let \(n=1,2,\ldots ,N\) index individuals and \(c=1,2,\ldots ,C\) index classes. Our objective is to estimate three types of parameters: dissimilarity coefficient \(\tau \), class-specific preference parameters \(\varvec{\theta }=(\varvec{\theta }_{1},\varvec{\theta }_{2},\ldots ,\varvec{\theta }_{C})\) in the willingness-to-pay space, and the population share of each class \(\varvec{\pi }=(\pi _{1},\pi _{2},\ldots ,\pi _{C-1})\). The population share of class \(C, \pi _{C}\), is not a free parameter to be estimated since the class shares must add up to 1 and hence \(\pi _{C}\equiv 1-\sum \nolimits _{c=1}^{C-1}\pi _{c}\). As discussed at the beginning of Sect. 4, the total number of classes C needs to be set by the researcher: the common empirical practice is to estimate several specifications that vary in C, and choose the one that results in the best Bayesian Information Criterion (BIC).
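The parameter count and the BIC comparison described above can be sketched as follows. This is an illustration under stated assumptions: the BIC is taken in its usual form \(-2\ln L+k\ln N\) (so lower is better), N is taken to be the number of individuals, and the log-likelihood values, the number of preference parameters per class, and the sample size are made-up numbers, not results from the paper.

```python
import math

def n_free_parameters(n_classes, n_prefs_per_class):
    """One shared tau, C class-specific preference vectors, C - 1 free shares."""
    return 1 + n_classes * n_prefs_per_class + (n_classes - 1)

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion; smaller values are preferred."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical maximized log-likelihoods for models with 2, 3 and 4 classes.
candidates = {2: -2450.0, 3: -2390.0, 4: -2381.0}
K, N = 7, 700  # hypothetical: 7 preference parameters per class, 700 people

scores = {C: bic(ll, n_free_parameters(C, K), N) for C, ll in candidates.items()}
best_C = min(scores, key=scores.get)
```

The count reflects that \(\pi _{C}\) is implied by the other shares and that \(\tau \) is common to all classes.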

The sample log-likelihood function, \(\ln L(\tau ,\varvec{\theta },\varvec{\pi })\), can be constructed in the usual manner by summing the log of each person’s likelihood, \(L_{n}(\tau ,\varvec{\theta },\varvec{\pi })\) in Eq. (13):

$$\begin{aligned} \ln L(\tau ,\varvec{\theta },\varvec{\pi })=\sum \limits _{n=1}^{N}\ln L_{n}(\tau ,\varvec{\theta },\varvec{\pi })=\sum \limits _{n=1}^{N}\ln \left( \sum \limits _{c=1}^{C}\pi _{c}L_{n|c}(\tau ,\varvec{\theta }_{c})\right) . \end{aligned}$$
(18)

As explained in detail around Eq. (12), the kernel function \(L_{n|c}(\tau ,\varvec{\theta }_{c})\) uses class c’s preference parameters to evaluate the baseline NROL model’s likelihood of observing person n’s actual sequence of rank orderings over choice scenarios. Since \(L_{n|c}(\tau ,\varvec{\theta }_{c})\) has a closed-form expression, the sample log-likelihood \(\ln L(\tau ,\varvec{\theta },\varvec{\pi })\) also has a closed-form expression. It is therefore possible to proceed in the usual manner to compute the maximum likelihood estimates, say \(\{\tau ^{ML},\varvec{\theta }^{ML},\varvec{\pi }^{ML}\}\), by applying gradient-based optimization methods (e.g. Newton method or quasi-Newton methods like BFGS) to maximize \(\ln L(\tau ,\varvec{\theta },\varvec{\pi })\) with respect to \(\{\tau ,\varvec{\theta },\varvec{\pi }\}\). But, as Bhat (1997) and Train (2008) note in the context of latent class multinomial logit models, maximizing the sample log-likelihood of a finite mixture model tends to be susceptible to convergence failures which often arise as a numerical optimizer gets trapped in flat regions of the sample log-likelihood function.
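As a hedged illustration of Eq. (18), the sketch below evaluates the sample log-likelihood from precomputed kernel values. It assumes the log-kernels \(\ln L_{n|c}\) are supplied by some other routine (the NROL kernel itself is not reproduced here) and uses the log-sum-exp device, which is a numerical convenience chosen for this example rather than something the appendix describes. All names and numbers are hypothetical.

```python
import math

def sample_log_likelihood(shares, log_kernels):
    """Evaluate ln L = sum_n ln( sum_c pi_c * L_{n|c} ).

    shares:      list of class shares pi_c (must sum to 1)
    log_kernels: log_kernels[n][c] = ln L_{n|c}, computed elsewhere
    """
    total = 0.0
    for row in log_kernels:
        terms = [math.log(p) + lk for p, lk in zip(shares, row)]
        m = max(terms)
        # log-sum-exp: subtract the largest term before exponentiating,
        # so that very small kernel values do not underflow to zero.
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

# Hypothetical inputs: two people, two classes.
example_shares = [0.4, 0.6]
example_log_kernels = [[-3.0, -5.0], [-6.0, -2.5]]
example_ll = sample_log_likelihood(example_shares, example_log_kernels)
```

Working with log-kernels matters in practice because each \(L_{n|c}\) is a product over many choice scenarios and can be extremely small.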

It turns out that parametric values \(\{\tau ^{ML},\varvec{\theta }^{ML},\varvec{\pi }^{ML}\}\) that maximize \(\ln L(\tau ,\varvec{\theta },\varvec{\pi })\) should, in theory, also maximize another objective function \(Q(\tau ,\varvec{\theta },\varvec{\pi })\) and vice versa. The EM algorithm is an estimation strategy that aims at obtaining \(\{\tau ^{ML},\varvec{\theta }^{ML},\varvec{\pi }^{ML}\}\) by maximizing this alternative objective function which is specified as

$$\begin{aligned} Q(\tau ,\varvec{\theta },\varvec{\pi })= & {} \sum \limits _{n=1}^{N}\sum _{c=1}^{C}h_{nc}(\tau ,\varvec{\theta },\varvec{\pi })\times \left( \ln \pi _{c}+\ln L_{n|c}(\tau ,\varvec{\theta }_{c})\right) \nonumber \\= & {} \sum \limits _{n=1}^{N}\sum _{c=1}^{C}h_{nc}(\tau ,\varvec{\theta },\varvec{\pi })\ln \pi _{c}+\sum \limits _{n=1}^{N}\sum _{c=1}^{C} h_{nc}(\tau ,\varvec{\theta },\varvec{\pi })\ln L_{n|c}(\tau ,\varvec{\theta }_{c}). \end{aligned}$$
(19)

Here \(h_{nc}(\tau ,\varvec{\theta },\varvec{\pi })\) denotes person n’s posterior probability of membership in class c, that is, \(h_{nc}\) defined in Eq. (14); we change the notation slightly here to emphasize that it is derived from the set of parameters being estimated. \(Q(\tau ,\varvec{\theta },\varvec{\pi })\) may be interpreted as an expected complete data log-likelihood that views a set of indicators, \(1[\varvec{\theta }_{n}=\varvec{\theta }_{c}]\) for \(n=1,2,\ldots ,N\) and \(c=1,2,\ldots ,C\), as missing data. Maximizing \(Q(\tau ,\varvec{\theta },\varvec{\pi })\) is computationally easier and more numerically stable than maximizing \(\ln L(\tau ,\varvec{\theta },\varvec{\pi })\) directly because, as we shall summarize shortly, \(Q(\tau ,\varvec{\theta },\varvec{\pi })\) can be maximized with respect to \(\{\tau ,\varvec{\theta }\}\) and \(\varvec{\pi }\) in two separate tasks. While the use of the EM algorithm to estimate a finite mixture model is common outside discrete choice modeling too, it is Bhat (1997) who introduced this estimation strategy into the discrete choice modeling literature. Train (2008, 2009) masterfully summarizes the conceptual foundations and operational aspects of the EM algorithm for discrete choice models.
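The quantities entering Eq. (19) can be sketched numerically. Eq. (14) is not reproduced in this appendix, so the sketch assumes the standard Bayes-rule form \(h_{nc}=\pi _{c}L_{n|c}/\sum _{l}\pi _{l}L_{n|l}\); the function names and input values are hypothetical.

```python
import math

def posterior_memberships(shares, log_kernels):
    """Posterior class membership probabilities h[n][c] (assumed Bayes-rule form)."""
    h = []
    for row in log_kernels:
        terms = [math.log(p) + lk for p, lk in zip(shares, row)]
        m = max(terms)
        weights = [math.exp(t - m) for t in terms]  # rescaled for stability
        s = sum(weights)
        h.append([w / s for w in weights])
    return h

def expected_complete_loglik(h, shares, log_kernels):
    """Q = sum_n sum_c h_nc * (ln pi_c + ln L_{n|c}), as in Eq. (19)."""
    return sum(h_nc * (math.log(p) + lk)
               for h_row, row in zip(h, log_kernels)
               for h_nc, p, lk in zip(h_row, shares, row))

# Hypothetical inputs: two people, two classes.
shares = [0.4, 0.6]
log_kernels = [[-3.0, -5.0], [-6.0, -2.5]]
h = posterior_memberships(shares, log_kernels)
Q = expected_complete_loglik(h, shares, log_kernels)
```

When the posteriors are evaluated at the same parameter values as the kernels, the identity \(\ln L=Q-\sum _{n}\sum _{c}h_{nc}\ln h_{nc}\) holds, which offers a convenient check on an implementation.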

Our implementation of the EM algorithm builds on Bhat (1997) and Train (2008, 2009). Specifically, let superscript s denote candidate estimates obtained at the \(s^{th}\) iteration of this algorithm. Then, at iteration \(s+1 \), the estimates are updated as follows.

$$\begin{aligned} \{\tau ^{s+1},\varvec{\theta }^{s+1}\}= & {} \arg \max _{\{\tau ,\varvec{\theta }\}}\sum \limits _{n=1}^{N}\sum _{c=1}^{C}h_{nc}(\tau ^{s},\varvec{\theta }^{s},\varvec{\pi }^{s})\ln L_{n|c}(\tau ,\varvec{\theta }_{c}) \end{aligned}$$
(20)
$$\begin{aligned} \varvec{\pi }^{s+1}= & {} \arg \max _{\varvec{\pi }}\sum \limits _{n=1}^{N}\sum _{c=1}^{C}h_{nc}(\tau ^{s},\varvec{\theta }^{s},\varvec{\pi }^{s})\ln \pi _{c} \end{aligned}$$
(21)

Each person’s posterior class membership probabilities are evaluated at the sth estimates, thereby influencing computation of the \((s+1)\)th estimates only as any other type of known sampling weight would be. Both (20) and (21) therefore represent relatively simple maximization tasks. The algebraic structure of task (20) is just like that of estimating the baseline NROL model (not the LC-NROL model) using C years of data, allowing for year-specific coefficients and accounting for sampling weights: it can be readily solved by a maximum likelihood estimation program coded for the baseline NROL model. Our own program for estimating the baseline NROL model initially uses the BHHH method, a quasi-Newton method which is the default optimizer for maximum likelihood estimation in TSP International, and double-checks convergence by supplying the BHHH solution as starting values for executing the Newton method. The second updating task (21) is even easier to solve since it does not require any numerical optimization. An analytic solution to (21) can be derived and coded as

$$\begin{aligned} \pi _{c}^{s+1}=\frac{\sum _{n=1}^{N}h_{nc}(\tau ^{s},\varvec{\theta }^{s},\varvec{\pi }^{s})}{\sum _{n=1}^{N}\sum _{l=1}^{C} h_{nl}(\tau ^{s},\varvec{\theta }^{s},\varvec{\pi }^{s})}\quad \text {for}\quad c=1,2,\ldots ,C \end{aligned}$$
(22)

using that \(\pi _{C}\equiv 1-\sum \nolimits _{c=1}^{C-1}\pi _{c}\).
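The closed-form update in (22) amounts to averaging posterior probabilities over the sample. The short Python sketch below illustrates this; the function name `update_class_shares` is hypothetical, and `h` is assumed to be an N-by-C array of posterior membership probabilities evaluated at the \(s\)th estimates.

```python
import numpy as np

def update_class_shares(h):
    """Class-share update of Eq. (22) (illustrative sketch).

    h : (N, C) array of posterior probabilities h_nc at the s-th estimates.
    Returns a (C,) array of updated shares pi_c^{s+1}.
    """
    h = np.asarray(h, dtype=float)
    # Numerator: sum over persons for each class.
    # Denominator: sum over persons and classes.
    return h.sum(axis=0) / h.sum()
```

Because each person's posterior probabilities sum to one across classes, the denominator equals N, so each updated share is simply the sample mean of the corresponding posterior probability.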

Once starting values are provided for initial estimates at \(s=0\), the EM algorithm proceeds by repeatedly updating the candidate estimates as above until \(\Delta \ln L^{s+1}=\ln L(\tau ^{s+1},\varvec{\theta }^{s+1},\varvec{\pi }^{s+1})-\ln L(\tau ^{s},\varvec{\theta }^{s},\varvec{\pi }^{s})\) is small enough. Our own program uses the following set of starting values. To initialize the dissimilarity coefficient and class shares, we assume no within-nest correlation and equal class sizes, i.e., \(\tau ^{0}=1\) and \(\pi _{c}^{0}=1/C\) for all \(c=1,2,\ldots ,C\). To initialize class-specific preference parameters, \(\varvec{\theta }^{0}=(\varvec{\theta }_{1}^{0},\varvec{\theta }_{2}^{0},\ldots ,\varvec{\theta }_{C}^{0})\), we randomly partition the sample into C equally sized subsamples and estimate the ROL model on each subsample: then, the ROL estimates from the \(c^{th}\) subsample are used as starting values for class c’s parameters, \(\varvec{\theta }_{c}^{0}\). Our program takes the \((s+1)^{th}\) estimates as the final estimates if \(\Delta \ln L^{s+1}\) is smaller than \(0.0001\,\%\) of \(\ln L(\tau ^{s},\varvec{\theta }^{s},\varvec{\pi }^{s})\): call these final estimates \(\{\tau ^{EM},\varvec{\theta }^{EM},\varvec{\pi }^{EM}\}\).
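The random partition used for starting values and the stopping rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the \(0.0001\,\%\) threshold is expressed as the fraction 1e-6, and the comparison uses the absolute value of the previous log-likelihood because \(\ln L\) is negative.

```python
import numpy as np

def random_equal_partition(n_obs, n_classes, seed=0):
    """Randomly split person indices 0..n_obs-1 into n_classes subsamples.

    Subsample sizes differ by at most one when n_obs is not a multiple
    of n_classes (an assumption; the text only says "equally sized").
    """
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_obs), n_classes)

def em_converged(lnl_new, lnl_old, rel_tol=1e-6):
    """Stopping rule: change in ln L below 0.0001% of the previous ln L.

    Absolute values are used on both sides (an assumption), since the
    log-likelihood is negative and a literal comparison would never hold.
    """
    return abs(lnl_new - lnl_old) < rel_tol * abs(lnl_old)
```

Each index array returned by `random_equal_partition` would select the persons whose rankings are used to estimate the ROL starting values for one class.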

There are two drawbacks inherent in the EM algorithm as implemented here. First, as one may infer from the updating tasks (20) and (21), it does not produce valid standard errors of the final estimates, \(\{\tau ^{EM},\varvec{\theta }^{EM},\varvec{\pi }^{EM}\}\). Second, while \(\{\tau ^{EM},\varvec{\theta }^{EM},\varvec{\pi }^{EM}\}\) are equivalent to \(\{\tau ^{ML},\varvec{\theta }^{ML},\varvec{\pi }^{ML}\}\) in theory, they may diverge in practice since the EM algorithm may declare convergence prematurely, or even when the model is empirically unidentified: its stopping criterion is based on \(\Delta \ln L^{s+1}\) and does not execute checks on the gradient and Hessian of \(\ln L(\tau ^{s+1},\varvec{\theta }^{s+1},\varvec{\pi }^{s+1})\). To obtain standard errors and double-check convergence, therefore, we have followed the hybrid estimation strategy of Bhat (1997) that uses \(\{\tau ^{EM},\varvec{\theta }^{EM},\varvec{\pi }^{EM}\}\) as starting values for direct maximization of \(\ln L(\tau ,\varvec{\theta },\varvec{\pi })\). The main manuscript reports the resulting estimates \(\{\tau ^{ML},\varvec{\theta }^{ML},\varvec{\pi }^{ML}\}\) and associated standard errors. In our experience, this direct maximization step always achieves convergence within a very small number of iterations, since the use of a stringent stopping criterion (\(0.0001\,\%\) change in the sample log-likelihood) ensures that \(\{\tau ^{EM},\varvec{\theta }^{EM},\varvec{\pi }^{EM}\}\) are almost identical to \(\{\tau ^{ML},\varvec{\theta }^{ML},\varvec{\pi }^{ML}\}\) in practice as well as in theory. Since the starting values \(\{\tau ^{EM},\varvec{\theta }^{EM},\varvec{\pi }^{EM}\}\) tend to be close to the final solution \(\{\tau ^{ML},\varvec{\theta }^{ML},\varvec{\pi }^{ML}\} \), the use of a quasi-Newton method brings no practical benefit, and our own program immediately uses the Newton method for this direct maximization step.
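The text does not state how the standard errors are computed after direct maximization. One common approach, shown here purely as an assumption, takes the square roots of the diagonal of the inverse negative Hessian of the log-likelihood at the final estimates. The Python sketch below uses a central finite-difference Hessian; the function names and the step size are hypothetical choices.

```python
import numpy as np

def numerical_hessian(f, x, step=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    hess = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ej = np.zeros(k)
            ei[i] = step
            ej[j] = step
            hess[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                          - f(x - ei + ej) + f(x - ei - ej)) / (4 * step ** 2)
    return hess

def standard_errors(log_lik, estimates):
    """Square roots of the diagonal of the inverse negative Hessian.

    Assumes log_lik is the sample log-likelihood as a function of the
    full parameter vector and estimates is its (interior) maximizer.
    """
    cov = np.linalg.inv(-numerical_hessian(log_lik, estimates))
    return np.sqrt(np.diag(cov))
```

A Hessian that is not negative definite at the candidate solution makes this calculation fail or return invalid values, which is one practical signal of the premature convergence or empirical unidentification discussed above.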

Appendix 2

See Tables 6 and 7.

Table 6 Comparison of LC-NROL with mixed ROL estimation results
Table 7 Individual characteristics and definitions


Cite this article

Oviedo, J.L., Yoo, H.I. A Latent Class Nested Logit Model for Rank-Ordered Data with Application to Cork Oak Reforestation. Environ Resource Econ 68, 1021–1051 (2017). https://doi.org/10.1007/s10640-016-0058-7
