
On the design of optimal health insurance contracts under ex post moral hazard

The Geneva Risk and Insurance Review

Abstract

We analyze the design of optimal medical insurance under ex post moral hazard, i.e., when illness severity cannot be observed by insurers and policyholders decide for themselves on their health expenditures. The trade-off between ex ante risk sharing and ex post incentive compatibility is analyzed in an optimal revelation mechanism under hidden information and risk aversion. The optimal contract provides partial insurance at the margin, with a deductible when insurers’ rates are affected by a positive loading, and it may also include an upper limit on coverage. The potential to audit the health state leads to an upper limit on out-of-pocket expenses.


[Figures 1–13 appear in the published article.]


Notes

  1. Blomqvist (1997) argues that the indemnity schedule is S-shaped, with marginal coverage increasing for small expenses and decreasing for large expenses. As we will see, this conclusion is not valid when bunching and limit conditions are adequately taken into account.

  2. Bunching may also occur in adverse selection principal–agent models with risk-averse agents—Salanié (1990) and Laffont and Rochet (1998)—and in Mirrlees’ optimal income tax model—Lollivier and Rochet (1983), Weymark (1986) and Ebert (1992).

  3. It is well known that optimal insurance contracts may include a deductible because of transaction costs (Arrow 1963), ex ante moral hazard (Holmström 1979), or costly state verification (Townsend 1979). Drèze and Schokkaert (2013) extend Arrow’s theorem of the deductible to the case of ex post moral hazard. Although ceilings on coverage are widespread, they have been justified by much more specific arguments: the insurer’s risk aversion toward large risks and regulatory constraints (Raviv 1979), bankruptcy rules (Huberman et al. 1983), or the auditor’s risk aversion in costly state verification models (Picard 2000).

  4. A straight deductible contract, i.e., full coverage of losses above a deductible, is optimal when effort affects the probability of an accident, but not the probability distribution of losses, conditionally on the occurrence of an accident.

  5. See, for instance, the description of the health insurance plans in the Affordable Care Act at https://www.healthcare.gov/health-plan-information/.

  6. See, in particular, the references provided by Ellis et al. (2015) on multiple health treatment goods, correlated sources of health uncertainty and trade-off between treatment and prevention, and by Pflum (2015) on physician incentives.

  7. Regarding the empirical analysis of utility functions that depend on health status, see particularly Viscusi and Evans (1990), Evans and Viscusi (1991) and Finkelstein et al. (2013).

  8. For notational simplicity, we assume that there is no probability weight at the no-sickness state \(x=0\), but the model could easily be extended in that direction.

  9. In addition to being realistic, assuming that I(m) is non-decreasing is not a loss of generality if policyholders can claim insurance payment for only a part of their medical expenses: in that case, only the increasing part of their indemnity schedule would be relevant. Piecewise differentiability means that I(m) has only a finite number of non-differentiability points, which includes the indemnity schedule features that we may have in mind, in particular those with a deductible, a rate of coinsurance, or an upper limit on coverage. \(I(0)=0\) corresponds to the way insurance works in practice, but it also acts as a normalization device. Indeed, replacing contract \(\{I(m),P\}\) by \(\{I(m)+k,P+k\}\) with \(k>0\), would not change the net transfer \(I(m)-P\) from insurer to insured, hence an indeterminacy of the optimal solution. This indeterminacy vanishes if we impose \(I(0)=0\).

  10. Our notations are presented by presuming that policyholders pay m (i.e., the total cost of medical services) and they receive the insurance indemnity I(m). However, we may also assume that the insurer and policyholders, respectively, pay I(m) and \(m-I(m)\) to medical service providers. Both interpretations correspond to different institutional arrangements, and both are valid in our analysis.

  11. We use Lemma 1-(ii) to restrict attention to functions \(\widehat{I}(x)\) and m(x) that are continuous. Furthermore, \(\widehat{I}(x)\) and m(x) are piecewise differentiable because I(m) is piecewise differentiable. This allows us to use Pontryagin’s principle in the proof of Proposition 1. In this proof, it is shown that the optimal revelation mechanism is such that \(\widehat{I}^{\prime }(x)\ge 0\). Since \(m^{\prime }(x)\ge 0\), the optimal mechanism will be generated by a non-decreasing indemnity schedule I(m), as we have assumed. Note that Blomqvist (1997) studies a similar optimization problem, but he wrongly ignores the second-order conditions (8) and the sign conditions (9). Nor does he fully consider the technical implications of the assumption \(v^{\prime }(0)=+\infty\), in the absence of which we would have a corner solution with \(m(x)=0\) for x small.

  12. Note the relationship of Proposition 1 with optimal insurance under (ex ante) moral hazard when effort affects the distribution of losses should an accident occur, but not the probability of the accident itself. In that case, it may be optimal to fully cover small losses without a deductible. See Rees and Wambach (2008).

  13. This is the case, for instance, if the distribution of x is uniform or exponential.

  14. In more technical terms, we may define the value function \(v(I_{0},m_{0},x)\) as the greatest expected utility over \([x,a]\), with unchanged insurance expected cost, if we start at \(\widehat{I}(x)=I_{0},m(x)=m_{0}\). The vector of costates \((\mu _{1}(x),\mu _{2}(x))\) is the gradient at x of the value function, evaluated along the optimal trajectory.

  15. \(\varphi (x)\) is called a “switching function” in the optimal control terminology, because its sign determines the sign of the control.

  16. These conditions can be deduced from the trajectories of \(\mu _{1}(x)\) and \(\mu _{2}(x)\).

  17. The proofs do not require this assumption.

  18. A similar but more complex argument is used in the proof of Proposition 2 to show that bunching cannot occur in intervals interior to [0, a].

  19. In practice, the optimal policy could be approximated by a piecewise linear schedule with slope between 0 and 1 until the upper limit \(\overline{m}\) and with a capped indemnity when \(m>\overline{m}\). It would be interesting to estimate the welfare loss associated with this piecewise linearization. The simulations presented in Sect. 3.3 suggest that this loss may be low.
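Such a piecewise linear approximation can be sketched in a few lines. The slope and upper-limit values below are hypothetical, chosen only for illustration, and are not taken from the simulations of Sect. 3.3:

```python
def piecewise_linear_indemnity(m, slope=0.5, m_bar=10.0):
    """Hypothetical piecewise linear indemnity schedule: coinsurance at
    rate `slope` (between 0 and 1) up to the upper limit m_bar, with a
    capped indemnity when m > m_bar."""
    if not 0.0 <= slope <= 1.0:
        raise ValueError("slope must lie in [0, 1]")
    return slope * min(m, m_bar)  # capped at slope * m_bar beyond m_bar
```

Evaluating the policyholder’s expected utility under such a schedule and under the optimal non-linear schedule would quantify the welfare loss mentioned in the note.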

  20. The same intuition is at work to show that \(\widehat{I}^{\prime }(x)>0\) when x is close to zero, and thus that the indemnity schedule should not include a deductible, with additional technical specificities induced by the sign constraint \(\widehat{I}(x)\ge 0\).

  21. We use the Bocop software (see Bonnans et al. 2016 and http://bocop.org). We refer the reader to ‘Computational approach’ in Appendix 2 and, for instance, to Betts (2001) and Nocedal and Wright (1999) for more details on direct transcription methods and non-linear programming algorithms.
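As an illustration of what a direct transcription method does (independently of Bocop and of the paper’s insurance model), the standard toy control problem of minimizing \(\int _{0}^{1}u(t)^{2}{\rm{d}}t\) subject to \(x^{\prime }(t)=u(t)\), \(x(0)=0\), \(x(1)=1\) can be discretized into a non-linear program and handed to an off-the-shelf solver:

```python
import numpy as np
from scipy.optimize import minimize

N = 20          # number of grid intervals on [0, 1]
dt = 1.0 / N

def objective(u):
    # Riemann-sum discretization of the cost integral of u(t)^2
    return float(np.sum(u ** 2) * dt)

def endpoint(u):
    # Euler-discretized dynamics x' = u imply x(1) = sum(u_i) * dt
    return float(np.sum(u) * dt - 1.0)

res = minimize(objective, x0=np.zeros(N), method="SLSQP",
               constraints=[{"type": "eq", "fun": endpoint}])
# The analytical optimum is the constant control u(t) = 1, with cost 1.
```

Bocop automates this discretization step and solves the resulting non-linear program with an interior point solver (see Wächter and Biegler 2006).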

  22. Note that f(a) and \(f^{\prime }(a)\) are close to 0 when a is large.

  23. More generally, the insurer could randomly audit claims, the probability of triggering an audit depending on the size of the claim. See the references in Picard (2013) on deterministic and random auditing for insurance claims.

  24. The policyholder is subject to prior authorization for increasing her medical expenses above \(m^{*}\). After auditing the health state, this authorization will be granted but capped by m(x) if \(x>x^{*}\), and otherwise it will be denied.

  25. Since an upward discontinuity of I(m) at \(m=m^{*}\) dominates the optimal solution when I(m) is constrained to be continuous, increasing I(m) as much as possible in a small interval \((m^{*},m^{*}+\varepsilon )\) would bring the continuous function I(m) arbitrarily close to this discontinuous function. No optimal solution would exist in the set of continuous functions I(m). Thus, in addition to being realistic from an empirical point of view, the assumption \(I^{\prime }(m)\le 1\) if \(m\ge m^{*}\) eliminates this reason for which an optimal solution may not exist. As previously shown, we have \(I^{\prime }(m)<1\) in the no-audit regime where \(m<m^{*}\).

  26. If \(c=0\), then the first-best allocation would be feasible with \(x^{*}=0\), that is by auditing the health state in all possible cases. Thus, choosing \(x^{*}\) smaller than a is optimal when c is not too large, and this is what we assume in what follows.

  27. See Gollier (1987) and Bond and Crocker (1997) for similar results; see also Picard (2013) for a survey on deterministic auditing in insurance fraud models. Lemma 2 also characterizes the optimal health expenses profile m(x) when there is auditing and full insurance at the margin (that is when \(x>\widehat{x}\)): we have \(m^{\prime }(x)=-v^{\prime }(m(x))/xv^{\prime \prime }(m(x))\), which means that the increase in health expenses which follows a unit increase in the illness severity x is equal to the inverse of the elasticity of the marginal efficiency of health expenses \(v^{\prime }(m(x))\). Equivalently, the marginal utility of health care expenses \(\gamma xv^{\prime }(m(x))\) should remain constant in the auditing regime.
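The constancy property at the end of note 27 can be verified by direct differentiation: requiring the marginal utility of health care expenses \(\gamma xv^{\prime }(m(x))\) to stay constant in the auditing regime gives

$$\frac{\rm{d}}{{\rm{d}}x}\left[ \gamma xv^{\prime }(m(x))\right] =\gamma v^{\prime }(m(x))+\gamma xv^{\prime \prime }(m(x))m^{\prime }(x)=0,$$

which rearranges to \(m^{\prime }(x)=-v^{\prime }(m(x))/xv^{\prime \prime }(m(x))\), the expression stated in the note.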

  28. Of course, this discontinuity of function m(x) at \(x=x^{*}\) is compatible with a continuous function I(m).

  29. The bunching of types is no longer sustained by a kink in the indemnity schedule I(m) at \(m=\overline{m}\), but by the threat of an audit, since increasing expenses above \(\overline{m}\) will not be possible if \(x\le x^{*}\).

  30. An example is when the individual may lose a part of her business or wage income when her health level deteriorates.

  31. If \(\varepsilon\) is continuously distributed, then \(G_{\varepsilon }^{\prime }(\varepsilon \mid x)>0\) is the density of \(\varepsilon\) conditionally on x.

  32. In Fig. 5-top, indifference curves for \(x=7\) and 9 almost coincide. Figure 5-bottom shows that \(\overline{m}\) decreases when k increases, with a decrease in the upper limit of the insurance indemnity \(I(\overline{m})\). There is bunching only when \(k>0\) since Fig. 5 corresponds to the case of uniform distribution.

  33. Henceforth, we assume there is no background risk.

  34. \(U_{HR}^{\prime \prime }>0\) is assumed for the sake of simplicity. Lemma 3 is valid under more general conditions that are compatible with \(U_{HR}^{\prime \prime }\le 0\).

  35. Thus, utility is CRRA w.r.t. wealth. Parameters are \(\alpha =2,\beta =0.5,b_{0}=0.01,\) and \(b=1\).

  36. Figure 6-bottom adds a background risk and a loading factor, and it illustrates the optimality of a deductible, as shown in Sect. 5.3.

  37. Figures 8 and 9 correspond to the assumptions made in Sect. 3.3 without loading or background risk, when the distribution of x is exponential.

  38. The indifference curves are drawn for \(x=2\) in Fig. 9.

  39. See Picard (2016) for a case with linear coinsurance where this effect of wealth on the coinsurance rate vanishes completely.

  40. This is just an assumption made for illustrative purposes.

  41. The relevant values are such that \(m<1\), and thus \(q(m)=m^{\alpha }<m\).

  42. This corner solution is induced by the non-concavity of \(v(m^{\alpha })\) when \(\alpha =3\) and 4.

  43. For the sake of illustration, see for instance Kaiser Family Foundation (2009) for France, Germany, and Switzerland, and http://www.healthcare.gov for the ObamaCare Marketplace in the US.

  44. A similar proof applies to the case of leftward discontinuity.

  45. In optimal control problems with state variable constraints, the costate variable may be discontinuous at junctions between regimes where the constraint is binding or not binding; see for instance Sect. 7.6 in Beavis and Dobbs (1991). Here, \(\mu _{1}(x)\) may be discontinuous at junction points between intervals where \(\widehat{I}(x)=0\) and intervals where \(\widehat{I}(x)>0\). The proof is almost the same if the junction point is such that \(\widehat{I}(x)>0\) if \(x\in (x_{0}-\varepsilon ,x_{0}]\) and \(\widehat{I}(x)=0\) if \(x\in (x_{0} ,x_{0}+\varepsilon )\).

  46. Note that (26) and \(\mu _{1}(x)=\mu _{1}^{\prime }(x)=0\) for all \(x\in [0,x_{0}]\) imply that \(\delta (x)\) is continuous in this interval.

  47. Step 3 in the proof of Proposition 1 shows that \(\mu _{1}(x)>0\) for all \(x\in (0,a)\).

  48. We assume w.l.o.g. that h(x) is continuous at \(x=a\).

  49. We can straightforwardly check that (8) is not binding in this subproblem.

  50. One can check that \(A_{\widetilde{x}}^{\prime }|_{\left| \widetilde{x}=x\right. }>0\) if \(U_{H}^{\prime }U_{RH}^{\prime \prime }-U_{R}^{\prime }U_{H^{2}}^{\prime \prime }>0\), which holds when \(U_{RH}^{\prime \prime }>0\) and \(U_{H^{2}}^{\prime \prime }<0\) as postulated, but which is also compatible with \(U_{RH}^{\prime \prime }<0\).

  51. Note, however, that we may have \(I^{\prime }(D_{+})=0\), as illustrated in Figs. 6 (bottom) and 7.

References

  • Arrow, K.J. 1963. Uncertainty and the welfare economics of medical care. American Economic Review 53: 941–973.

  • Arrow, K.J. 1968. The economics of moral hazard: Further comment. American Economic Review 58: 537–539.

  • Arrow, K.J. 1971. Essays in the Theory of Risk Bearing. Chicago: Markham Publishing.

  • Arrow, K.J. 1976. Welfare analysis of changes in health co-insurance rates. In The Role of Health Insurance in the Health Services Sector, ed. R. Rosett, 3–23. New York: NBER.

  • Beavis, B., and I. Dobbs. 1991. Optimization and Stability Theory for Economic Analysis. Cambridge: Cambridge University Press.

  • Betts, J.T. 2001. Practical Methods for Optimal Control Using Nonlinear Programming. Philadelphia: Society for Industrial and Applied Mathematics (SIAM).

  • Blomqvist, A. 1997. Optimal non-linear health insurance. Journal of Health Economics 16: 303–321.

  • Bond, E., and K.J. Crocker. 1997. Hardball and the soft touch: The economics of optimal insurance contracts with costly state verification and endogenous monitoring costs. Journal of Public Economics 63: 239–264.

  • Bonnans, F., D. Giorgi, V. Grelard, B. Heymann, S. Maindrault, P. Martinon, and O. Tissot. 2016. Bocop—A Collection of Examples. Technical Report, INRIA.

  • Cutler, D.M., and R.J. Zeckhauser. 2000. The anatomy of health insurance. In Handbook of Health Economics, vol. 1, ed. A. Culyer and J.P. Newhouse, 563–643. Amsterdam: North-Holland.

  • Drèze, J.H., and E. Schokkaert. 2013. Arrow’s theorem of the deductible: Moral hazard and stop-loss in health insurance. Journal of Risk and Uncertainty 47 (2): 147–163.

  • Ebert, U. 1992. A reexamination of the optimal nonlinear income tax. Journal of Public Economics 49: 47–73.

  • Ellis, R.P., S. Jiang, and W.G. Manning. 2015. Optimal health insurance for multiple goods and time periods. Journal of Health Economics 41: 89–106.

  • Evans, W.N., and W.K. Viscusi. 1991. Estimation of state dependent utility functions using survey data. Review of Economics and Statistics 73: 94–104.

  • Feldman, R., and B. Dowd. 1991. A new estimate of the welfare loss of excess health insurance. American Economic Review 81: 297–301.

  • Feldstein, M. 1973. The welfare loss of excess health insurance. Journal of Political Economy 81: 251–280.

  • Feldstein, M., and B. Friedman. 1977. Tax subsidies, the rational demand for insurance and the health care crisis. Journal of Public Economics 7: 155–178.

  • Finkelstein, A., E.F.P. Luttmer, and M.J. Notowidigdo. 2013. What good is wealth without health? The effect of health on the marginal utility of consumption. Journal of the European Economic Association 11: 221–258.

  • Gollier, C. 1987. Pareto-optimal risk sharing with fixed cost per claim. Scandinavian Actuarial Journal 13: 62–73.

  • Holmström, B. 1979. Moral hazard and observability. Bell Journal of Economics 10: 74–91.

  • Huberman, G., D. Mayers, and C.W. Smith Jr. 1983. Optimum insurance policy indemnity schedules. Bell Journal of Economics 14: 415–426.

  • Kaiser Family Foundation. 2009. Cost Sharing for Health Care: France, Germany, and Switzerland. Menlo Park, CA: The Henry J. Kaiser Family Foundation.

  • Laffont, J.-J., and J.-C. Rochet. 1998. Regulation of a risk-averse firm. Games and Economic Behavior 25: 149–173.

  • Lollivier, S., and J.-C. Rochet. 1983. Bunching and second-order conditions: A note on optimal tax theory. Journal of Economic Theory 31 (2): 392–400.

  • Ma, C.T.A., and M. Riordan. 2002. Health insurance, moral hazard, and managed care. Journal of Economics and Management Strategy 11: 81–107.

  • Nocedal, J., and S.J. Wright. 1999. Numerical Optimization. New York: Springer-Verlag.

  • Pauly, M. 1968. The economics of moral hazard: Comment. American Economic Review 58: 531–537.

  • Pflum, K.E. 2015. Physician incentives and treatment choices. Journal of Economics and Management Strategy 24: 712–751.

  • Picard, P. 2000. On the design of optimal insurance policies under manipulation of audit cost. International Economic Review 41 (4): 1049–1071.

  • Picard, P. 2013. Economic analysis of insurance fraud. In Handbook of Insurance, 2nd ed., ed. G. Dionne, 349–395. New York: Springer.

  • Picard, P. 2016. A note on health insurance under ex post moral hazard. Risks 4 (38): 1–9.

  • Raviv, A. 1979. The design of an optimal insurance policy. American Economic Review 69: 854–896.

  • Rees, R., and A. Wambach. 2008. The microeconomics of insurance. Foundations and Trends in Microeconomics 4 (1–2): 1–163.

  • Salanié, B. 1990. Sélection adverse et aversion pour le risque [Adverse selection and risk aversion]. Annales d’Economie et de Statistique 18: 131–150.

  • Townsend, R. 1979. Optimal contracts and competitive markets with costly state verification. Journal of Economic Theory 21: 265–293.

  • Viscusi, W.K., and W.N. Evans. 1990. Utility functions that depend on health status: Estimates and economic implications. American Economic Review 80: 353–374.

  • Wächter, A., and L.T. Biegler. 2006. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming 106 (1): 25–57.

  • Walther, A., and A. Griewank. 2012. Getting started with ADOL-C. In Combinatorial Scientific Computing, ed. U. Naumann and O. Schenk. Chapman-Hall/CRC Computational Science. Boca Raton: CRC Press.

  • Weymark, J.A. 1986. A reduced-form optimal nonlinear income tax problem. Journal of Public Economics 30 (2): 199–217.

  • Winter, R.A. 2013. Optimal insurance contracts under moral hazard. In Handbook of Insurance, 2nd ed., ed. G. Dionne, 205–230. New York: Springer.

  • Zeckhauser, R. 1970. Medical insurance: A case study of the tradeoff between risk spreading and appropriate incentives. Journal of Economic Theory 2: 10–26.


Acknowledgements

Pierre Picard gratefully acknowledges financial support from LabEX ECODEC.

Author information

Correspondence to Pierre Picard.

Appendices

Appendix 1

Proof of Lemma 1

Step 1:

There exists an optimal revelation mechanism.

Let us change variables by denoting \(A(x)=u(w-P+\widehat{I}(x)-m(x))\) and \(B(x)=v(m(x))\). The incentive compatibility constraints and the insurer’s break-even constraint are, respectively, rewritten as

$$A(x)+\gamma xB(x)\ge A(\widetilde{x})+\gamma xB(\widetilde{x})\text { for all }x,\widetilde{x},$$
(21)
$$w\ge \int \nolimits _{0}^{a}[u^{-1}(A(x))+v^{-1}(B(x))]f(x){\rm{d}}x.$$
(22)

Furthermore, \(\widehat{I}(0)=m(0)=0\) gives \(A(0)=u(w-P)\) and \(B(0)=0\). Let \(\mathcal {S}\) be the subset of functions A(.), B(.) that belong to the Banach space \(\mathcal {L}^{\infty }([0,1],\mathbb {R}\times [0,1])\) with the sup norm topology \(\parallel .\parallel _{\infty }\) and that satisfy (21),(22), and \(B(0)=0\). Hence, \(\mathcal {S}\) is closed and convex, and furthermore \(\parallel (A(.),B(.))\parallel _{\infty }\le u(w)\) for all \((A(.),B(.))\in \mathcal {S}\). Let

$$J=\int \nolimits _{0}^{a}\{A(x)+h_{0}-\gamma x[1-B(x)]\}f(x){\rm{d}}x.$$

J is a linear (and thus weakly concave) function of A(.), B(.). Hence, it reaches a maximum in \(\mathcal {S}\), which proves the existence of an optimal incentive compatible mechanism, with \(P=w-u^{-1}(A(0))\).

Step 2:

For any incentive compatible mechanism, m(x) and \(\widehat{I}(x)\) are non-decreasing.

Incentive compatibility implies

$$u(w-P-m(x)+\widehat{I}(x))-u(w-P-m(\widetilde{x})+\widehat{I}(\widetilde{x} ))\ge \gamma x[v(m(\widetilde{x}))-v(m(x))],$$

and, reversing the roles of x and \(\widetilde{x},\)

$$u(w-P-m(x)+\widehat{I}(x))-u(w-P-m(\widetilde{x})+\widehat{I}(\widetilde{x} ))\le \gamma \widetilde{x}[v(m(\widetilde{x}))-v(m(x))].$$

We deduce \((\widetilde{x}-x)[v(m(\widetilde{x}))-v(m(x))]\ge 0\) for all \(x,\widetilde{x}\), which implies that m(.) is non-decreasing. Since I(.) is non-decreasing, \(\widehat{I}(.)\equiv I(m(.))\) is also non-decreasing.

Step 3:

For any optimal revelation mechanism, m(.) and \(\widehat{I}(.)\) are continuous.

Let \(\{m_{0}(.),\widehat{I}_{0}(.)\}\) be an optimal incentive compatible revelation mechanism and suppose that \(m_{0}(.)\) is rightward discontinuous (see note 44) at \(x_{*}\in (0,a)\), with \(m_{0}(x)\rightarrow m_{0}(x_{*})+\Delta _{m}\) and \(\widehat{I}_{0}(x)\rightarrow \widehat{I}_{0}(x_{*})+\Delta _{I}\), when \(x\rightarrow x_{*},x>x_{*}\), with \(\Delta _{m}>0\) and \(\Delta _{I}\ge 0\). Incentive compatibility implies that a type \(x_{*}\) individual is indifferent between \(m_{0}(x_{*}),\widehat{I}_{0}(x_{*})\) and \(m_{0}(x_{*})+\Delta _{m},\widehat{I}_{0}(x_{*})+\Delta _{I}\). If \(\Delta _{I}=0\), since I(m) is non-decreasing, it remains constant when \(m\in [m_{0}(x_{*}),m_{0}(x_{*})+\Delta _{m}]\). Using the concavity of \(m\rightarrow u(w-P-m+\widehat{I}_{0}(x_{*}))+\gamma x_{*}v(m)\) then shows that the type \(x_{*}\) individual reaches a higher expected utility by choosing \(m\in (m_{0}(x_{*}),m_{0}(x_{*})+\Delta _{m})\) than by choosing \(m_{0}(x_{*})\), hence a contradiction. Thus, we have \(\Delta _{I}>0\).

Since \(\widehat{I}_{0}(x)\) is piecewise continuous, there exists \(\theta >0\) such that \(\widehat{I}_{0}(x)-\widehat{I}_{0}(x_{*})\ge \Delta _{I}/2\) for all \(x\in (x_{*},x_{*}+\theta )\). Consider another revelation mechanism \(\{m_{1}(.),\widehat{I}_{1}(.)\}\) defined by

(i) If \(x\in (x_{*},x_{*}+\theta )\), let \(m_{1}(x)=m_{1}^{*}\) and \(\widehat{I}_{1}(x)=I_{1}^{*}\) close to \(m_{0}(x_{*})\) and \(\widehat{I}_{0}(x_{*})\), respectively, with \(\widehat{I}_{0} (x)-I_{1}^{*}\ge \Delta _{I}/4\), and such that

$$u(w-P-m_{1}^{*}+I_{1}^{*})+\gamma xv(m_{1}^{*})\ge u(w-P-m_{0} (x)+\widehat{I}_{0}(x))+\gamma xv(m_{0}(x)),$$

for all \(x\in (x_{*},x_{*}+\theta )\), and

$$u(w-P-m_{1}^{*}+I_{1}^{*})+\gamma xv(m_{1}^{*})<u(w-P-m_{0} (x)+\widehat{I}_{0}(x))+\gamma xv(m_{0}(x)),$$

if \(x\le x_{*}\).

(ii) If \(x\notin (x_{*},x_{*}+\theta )\), then \(m_{1}(x)\equiv m_{0}(x)\) and \(\widehat{I}_{1}(x)\equiv \widehat{I}_{0}(x)\).

Let \(\widetilde{x}_{1}(x)\) be an optimal report of a type x policyholder in \(\{m_{1}(.),\widehat{I}_{1}(.)\}\), with \(\widetilde{x}_{1}(x)=x\) for all \(x\in [0,x_{*}+\theta )\), and let \(\{m_{2}(.),\widehat{I}_{2}(.)\}\) be the incentive compatible revelation mechanism defined by \(m_{2}(x)\equiv m_{1}(\widetilde{x}_{1}(x)),\widehat{I}_{2}(x)\equiv \widehat{I}_{1}(\widetilde{x}_{1}(x))\). For P unchanged, the policyholder’s expected utility is higher for \(\{m_{2}(.),\widehat{I}_{2}(.)\}\) than for \(\{m_{0}(.),\widehat{I}_{0}(.)\}\). Furthermore, \(\widehat{I}_{2}(x)=\widehat{I}_{0}(x)\) if \(x<x_{*}\), \(\widehat{I}_{2}(x)=I_{1}^{*}\le \widehat{I}_{0}(x)-\Delta _{I}/4\) if \(x_{*}\le x<x_{*}+\theta\), and \(\widehat{I}_{2}(x)\le \widehat{I}_{0}(x)\) if \(x\ge x_{*}+\theta\). Hence, \(\{m_{2}(.),\widehat{I}_{2}(.)\}\) is feasible with P unchanged, which contradicts the optimality of \(\{m_{0}(.),\widehat{I}_{0}(.)\}\).

Step 4:

(4) and (5) are necessary and sufficient conditions for a continuous revelation mechanism to be incentive compatible.

Local first-order and second-order incentive compatibility conditions for type x are written, respectively, as

$$\frac{\partial V(x,\widetilde{x})}{\partial \widetilde{x}}\left| _{\widetilde{x}=x}\right. =0,$$
(23)
$$\frac{\partial ^{2}V(x,\widetilde{x})}{\partial \widetilde{x}^{2}}\left| _{\widetilde{x}=x}\right. \le 0,$$
(24)

at any point of differentiability. (23) and (24) are necessary conditions for incentive compatibility. We have

$$\frac{\partial V(x,\widetilde{x})}{\partial \widetilde{x}}=u^{\prime }(R(\widetilde{x}))[\widehat{I}^{\prime }(\widetilde{x})-m^{\prime }(\widetilde{x})]+\gamma xv^{\prime }(m(\widetilde{x}))m^{\prime } (\widetilde{x}),$$

and thus (23) yields (4).

Since (4) should hold for all \(x\in [0,a]\), a simple calculation yields

$$\frac{\partial ^{2}V(x,\widetilde{x})}{\partial \widetilde{x}^{2}}\left| _{\widetilde{x}=x}\right. =-\gamma v^{\prime }(m(x))m^{\prime }(x),$$

and thus (24) gives (5).

Conversely, assume (4) and (5) hold. (4) gives

$$\frac{\partial V(x,\widetilde{x})}{\partial \widetilde{x}}=\gamma (x-\widetilde{x})v^{\prime }(m(\widetilde{x}))m^{\prime }(\widetilde{x}).$$

Using (5) then shows that \(\partial V(x,\widetilde{x})/\partial \widetilde{x} \le 0\) if \(\widetilde{x}>x\) and \(\partial V(x,\widetilde{x})/\partial \widetilde{x}\ge 0\) if \(\widetilde{x}<x\), which implies incentive compatibility. \(\square\)

Proof of Proposition 1

Let \(\mu _{1}(x)\) and \(\mu _{2}(x)\) be costate variables for \(\widehat{I}(x)\) and m(x), respectively, and let \(\lambda\) and \(\delta (x)\) be Lagrange multipliers, respectively, for (2) and (9). The Hamiltonian is written as

$$\begin{aligned} \mathcal {H}&=[u(R(x))+\gamma xv(m(x))]f(x)+\mu _{1}(x)h(x)\left[ 1-\frac{\gamma xv^{\prime }(m(x))}{u^{\prime }(R(x))}\right] \\ &\quad +\mu _{2}(x)h(x)-\lambda \widehat{I}(x)f(x)+\delta (x)\widehat{I}(x). \end{aligned}$$

The optimality conditions are

$$\varphi (x)\equiv \mu _{1}(x)\left[ 1-\frac{\gamma xv^{\prime }(m(x))}{u^{\prime }(R(x))}\right] +\mu _{2}(x)\le 0,\quad \text{with equality if }h(x)>0,$$
(25)
$$\mu _{1}^{\prime }(x)=[\lambda -u^{\prime }(R(x))]f(x)-\mu _{1}(x)h(x)\gamma x\frac{v^{\prime }(m(x))u^{\prime \prime }(R(x))}{u^{\prime }(R(x))^{2}}-\delta (x),$$
(26)
$$\begin{aligned} \mu _{2}^{\prime }(x)&=[u^{\prime }(R(x))-\gamma xv^{\prime }(m(x))]f(x) \\ &\quad +\mu _{1}(x)h(x)\gamma x\left[ \frac{v^{\prime \prime }(m(x))u^{\prime }(R(x))+v^{\prime }(m(x))u^{\prime \prime }(R(x))}{u^{\prime }(R(x))^{2}}\right] , \end{aligned}$$
(27)
$$\mu _{1}(a)=\mu _{2}(a)=0,$$
(28)
$$\lambda -{\displaystyle \int _{0}^{a}}\left[ u^{\prime }(R(x))f(x)+\mu _{1}(x)h(x)\gamma x\frac{v^{\prime }(m(x))u^{\prime \prime }(R(x))}{u^{\prime }(R(x))^{2}}\right] {\rm{d}}x=0,$$
(29)

with \(\delta (x)\ge 0\) and \(\delta (x)=0\) if \(\widehat{I}(x)>0\). A tedious but straightforward calculation using (26) and (27) leads to

$$\varphi ^{\prime }(x)=\left[ \lambda f(x)-\delta (x)\right] \left[ 1-\frac{\gamma xv^{\prime }(m(x))}{u^{\prime }(R(x))}\right] -\gamma \mu _{1}(x)\frac{v^{\prime }(m(x))}{u^{\prime }(R(x))}.$$
(30)

We also have \(R^{\prime }(x)=\widehat{I}^{\prime }(x)-m^{\prime }(x)=-\gamma xh(x)v^{\prime }(m(x))/u^{\prime }(R(x))\le 0\). Thus, R(x) is non-increasing, and it is decreasing when \(h(x)>0\). The remaining part of the proof is in five steps.
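For completeness, the expression for \(R^{\prime }(x)\) follows from the state dynamics implicit in the Hamiltonian above, \(m^{\prime }(x)=h(x)\) and \(\widehat{I}^{\prime }(x)=h(x)\left[ 1-\gamma xv^{\prime }(m(x))/u^{\prime }(R(x))\right]\) (the terms multiplying \(\mu _{2}(x)\) and \(\mu _{1}(x)\)), by subtraction:

$$R^{\prime }(x)=\widehat{I}^{\prime }(x)-m^{\prime }(x)=h(x)\left[ 1-\frac{\gamma xv^{\prime }(m(x))}{u^{\prime }(R(x))}\right] -h(x)=-\frac{\gamma xh(x)v^{\prime }(m(x))}{u^{\prime }(R(x))}.$$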

Step 1:

\(m(x)>0\) for all \(x>0\).

Since \(m(0)=0\) and m(x) is non-decreasing, there exists \(\underline{x} \in [0,a]\) such that \(m(x)>0\) if and only if \(x>\underline{x}\). Suppose \(\underline{x}>0\), which implies \(h(x)=0\) over \([0,\underline{x}]\). Using \(\widehat{I}(0)=0\) and (6) gives \(\widehat{I}(x)=0\) for all \(x\in [0,\underline{x}]\). Let

$$\widehat{m}(x)\equiv \underset{\widetilde{m}\ge 0}{\arg \text { max} }\{u(w-P-\widetilde{m})+\gamma xv(\widetilde{m})\},$$
(31)

with \(\widehat{m}(x)>0\) for all \(x>0\). Define \(m_{0}(x)=\widehat{m} (x),I_{0}(x)=0\) if \(x\le \underline{x}\) and \(m_{0}(x)=m(x),I_{0} (x)=\widehat{I}(x)\) if \(x>\underline{x}\), and

$$x_{0}(x)\in \underset{\widetilde{x}\in [0,a]}{\arg \max }\{u(w-P-m_{0}(\widetilde{x})+I_{0}(\widetilde{x}))+\gamma xv(m_{0}(\widetilde{x}))\}.$$

The revelation mechanism \(m_{1}(.),\widehat{I}_{1}(.)\) defined by \(m_{1}(x)\equiv m_{0}(x_{0}(x))\) and \(\widehat{I}_{1}(x)\equiv I_{0}(x_{0}(x))\) is incentive compatible and it dominates the supposedly optimal mechanism \(m(.),\widehat{I}(.)\)—i.e., it provides a higher expected utility to the policyholder and its expected profit is non-negative for P unchanged—hence a contradiction. Thus, \(\underline{x}=0\).

Step 2:

\(\mu _{1}(x)\) is continuous in [0, a], with \(\mu _{1}(x)=0\) if \(\widehat{I}(x)=0\).

Let \(x_{0}\in (0,a)\) be a junction point such that \(\widehat{I}(x)=0\) if \(x\in (x_{0}-\varepsilon ,x_{0}]\) and \(\widehat{I}(x)>0\) if \(x\in (x_{0},x_{0}+\varepsilon )\), with \(0<\varepsilon <x_{0}\) (see note 45).

Using the same argument as in Step 1 shows that \(h(x)>0\) in \((x_{0} -\varepsilon ,x_{0})\). Let \(x\in (x_{0}-\varepsilon ,x_{0})\). Using \(h(x)>0,\widehat{I}^{\prime }(x)=0\) and (6) gives \(u^{\prime }(R(x))=\gamma xv^{\prime }(m(x))\). Then, \(\varphi (x)=0\) gives \(\mu _{2}(x)=0\) and thus \(\mu _{2}^{\prime }(x)=0\) for all \(x\in (x_{0}-\varepsilon ,x_{0}]\). Equation (30) implies \(\mu _{1}(x)=0\) for all \(x\in (x_{0}-\varepsilon ,x_{0})\), and this is true, more generally, for all \(x\in [0,a]\) such that \(\widehat{I}(x)=0\).

Let \(x\in (x_{0},x_{0}+\varepsilon )\). \(\widehat{I}(x)\) is locally increasing over \((x_{0},x_{0}+\varepsilon )\) and thus \(\widehat{I}^{\prime }(x)>0\) and \(h(x)>0\) (at least for \(\varepsilon\) small enough). Thus, we have \(\delta (x)=\varphi (x)=\varphi ^{\prime }(x)=0\) for all \(x\in (x_{0},x_{0}+\varepsilon )\). Since R(x) and m(x) are continuous functions and \(u^{\prime }(R(x_{0}))=\gamma x_{0}v^{\prime }(m(x_{0}))\), we have \(u^{\prime }(R(x))-\gamma xv^{\prime }(m(x))\rightarrow 0\) when \(x\searrow x_{0}\). Using (30) then gives \(\mu _{1}(x_{0}^{+})=0\). Thus, \(\mu _{1}(x)\) is continuous at \(x_{0}\).

Step 3:

\(\mu _{1}(x)\ge 0\) for all\(x\in [0,a]\).

Integrating \(\mu _{1}^{\prime }(x)\) given by (26) and using (28) and (29) give

$$\mu _{1}(0)=\int \nolimits _{0}^{a}\delta (x){\rm{d}}x\ge 0.$$

Suppose there exist \(x_{0},x_{1}\in [0,a]\) such that \(x_{0}<x_{1}\), \(\mu _{1}(x_{0})=\mu _{1}(x_{1})=0\) and \(\mu _{1}(x)<0\) if \(x\in (x_{0},x_{1})\). Thus, from Step 2, we have \(\widehat{I}(x)>0\) and \(\delta (x)=0\) if \(x\in (x_{0},x_{1})\). For \(\eta _{0}>0\) small enough, we have \(\mu _{1}^{\prime }(x_{0}+\eta _{0})<0\) and \(\delta (x_{0}+\eta _{0})=0\). Hence (26) gives

$${}[\lambda -u^{\prime }(R(x))]f(x)<\mu _{1}(x)h(x)\gamma x\frac{v^{\prime }(m(x))u^{\prime \prime }(R(x))}{u^{\prime }(R(x))^{2}}$$

for \(x=x_{0}+\eta _{0}\). The previous inequality holds when \(\eta _{0}\searrow 0\). Since \(\mu _{1}(x)\) is continuous and \(\mu _{1}(x_{0})=0\), we deduce \(u^{\prime }(R(x_{0}))\ge \lambda .\)

By a similar argument, for \(\eta _{1}>0\) small enough, we have \(\mu _{1} ^{\prime }(x_{1}-\eta _{1})>0\) and \(\delta (x_{1}-\eta _{1})=0\). Thus (26) gives

$${}[\lambda -u^{\prime }(R(x))]f(x)>\mu _{1}(x)h(x)\gamma x\frac{v^{\prime }(m(x))u^{\prime \prime }(R(x))}{u^{\prime }(R(x))^{2}}>0,$$

for \(x=x_{1}-\eta _{1}\). The previous inequality holds when \(\eta _{1}\searrow 0\), which implies \(\lambda >u^{\prime }(R(x_{1}))\). Thus, we have \(u^{\prime }(R(x_{0}))\ge \lambda >u^{\prime }(R(x_{1}))\). Since \(u^{\prime \prime }<0\), we deduce \(R(x_{0})<R(x_{1}),\) which contradicts \(R^{\prime }(x)\le 0\) and \(x_{0}<x_{1}\).

Step 4:

\(\widehat{I}^{\prime }(x)\ge 0\) for all \(x\in [0,a]\).

Suppose \(\widehat{I}(x)>0\) and \(\widehat{I}^{\prime }(x)<0\) if \(x\in \mathcal {[}x_{0},x_{1}]\subset (0,a]\) with \(x_{0}<x_{1}\). (6) and (8) yield \(h(x)>0\)—and thus \(\varphi (x)=0\)—and \(\gamma xv^{\prime }(m(x))>u^{\prime }(R(x))\) if \(x\in \mathcal {[}x_{0},x_{1}]\). We also have \(\delta (x)=0,\mu _{1}(x)\ge 0\) if \(x\in \mathcal {[}x_{0},x_{1}]\). Hence (30) gives \(\varphi ^{\prime }(x)<0\) if \(x\in \mathcal {[}x_{0},x_{1}]\), which contradicts \(\varphi (x)\equiv 0\) in \(\mathcal {[}x_{0},x_{1}]\). Thus, \(\widehat{I}(x)\) is non-decreasing over [0, a].

Step 5:

\(\widehat{I}(x)>0\) for all \(x\in (0,a].\)

Step 4 implies that there exists \(x_{0}\) in [0, a] such that \(\widehat{I}(x)=0\) if \(x\in [0,x_{0}]\) and \(\widehat{I}(x)>0\) if \(x\in (x_{0},a]\). Suppose \(x_{0}>0\). From Step 2, we have \(\mu _{1}(x)=0\) for all \(x\in [0,x_{0}]\), and

$$\mu _{1}(0)=\int \nolimits _{0}^{x_{0}}\delta (x){\rm{d}}x=0$$

implies \(\delta (x)=0\) over \([0,x_{0}]\). (26) then gives \(R^{\prime }(x)=0\) and thus \(h(x)=0\) for all \(x\in [0,x_{0}]\). From the same argument as in Step 1, we have \(m(x)=\widehat{m}(x)\), and thus \(h(x)>0,\) for all \(x\in [0,x_{0}]\), hence a contradiction.

We know from (6) and (7) that \(\widehat{I}^{\prime }(x)<m^{\prime }(x)\) when \(m^{\prime }(x)>0\), and thus Steps 1 and 5 prove Proposition 1.

Figure 14 illustrates the simulated trajectories of \(\mu _{1}(x)\) and \(\mu _{2}(x)\) under the calibration hypothesis introduced in Sect. 3.3, in the case of an exponential distribution function.

Fig. 14 Trajectories of costate variables

Proof of Proposition 2

Suppose there are \(x_{1},x_{2},x_{3}\) in [0, a] such that \(x_{1}<x_{2} <x_{3},h(x)=0\) if \(x\in [x_{1},x_{2}]\) and \(h(x)>0\) if \(x\in (x_{2} ,x_{3}].\) Thus, m(x) and I(x) remain constant over \([x_{1},x_{2}]\), and we may write \(m(x)=m_{0}>0,I(x)=I_{0}>0,\) and \(R(x)=w-P+I_{0}-m_{0}=R_{0}\) in this interval. Let \(\varphi (x)\) be defined as in the proof of Proposition 1. Using (26), (30), and \(\delta (x)=h(x)=0\) if \(x\in [x_{1},x_{2}]\) yields

$$\varphi ^{\prime }(x)=\lambda \left[ 1-\frac{\gamma xv^{\prime }(m_{0})}{u^{\prime }(R_{0})}\right] f(x)-\gamma \mu _{1}(x)\frac{v^{\prime }(m_{0})}{u^{\prime }(R_{0})},$$
(32)

and

$$\begin{aligned} \varphi ^{\prime \prime }(x)&=\lambda \left[ 1-\frac{\gamma xv^{\prime } (m_{0})}{u^{\prime }(R_{0})}\right] f^{\prime }(x)-\gamma \frac{v^{\prime }(m_{0} )}{u^{\prime }(R_{0})}[\lambda f(x)+\mu _{1}^{\prime }(x)]\\&=\lambda \left[ 1-\frac{\gamma xv^{\prime }(m_{0})}{u^{\prime }(R_{0} )}\right] f^{\prime }(x)-\gamma \frac{v^{\prime }(m_{0})}{u^{\prime }(R_{0})} [2\lambda -u^{\prime }(R_{0})]f(x), \end{aligned}$$

if \(x\in [x_{1},x_{2}].\) Let

$$\Lambda (x)\equiv \frac{\varphi ^{\prime \prime }(x)}{f(x)}=\lambda \left[ 1-\frac{\gamma xv^{\prime }(m_{0})}{u^{\prime }(R_{0})}\right] \frac{{\rm{d}}\ln f(x)}{{\rm{d}}x}-\gamma \frac{v^{\prime }(m_{0})}{u^{\prime }(R_{0})}[2\lambda -u^{\prime }(R_{0})].$$

We have

$$\Lambda ^{\prime }(x)=-\lambda \gamma \frac{v^{\prime }(m_{0})}{u^{\prime }(R_{0} )}\frac{d\ln f(x)}{{\rm{d}}x}+\lambda \left[ 1-\gamma x\frac{v^{\prime }(m_{0} )}{u^{\prime }(R_{0})}\right] \frac{{\rm{d}}^{2}\ln f(x)}{{\rm{d}}x^{2}}.$$

We also have \(\varphi (x)\le 0\) if \(x\in [x_{1},x_{2}]\) and \(\varphi (x_{2})=0\), which implies \(\varphi ^{\prime }(x_{2})_{-}\ge 0\). (30), \(\delta (x_{2})=0,\) and \(\mu _{1}(x_{2})>0\) give \(\gamma x_{2}v^{\prime }(m_{0})\le u^{\prime }(R_{0})\). If \({\rm{d}}f(x)/{\rm{d}}x\le 0\) and \({\rm{d}}^{2}\ln f(x)/{\rm{d}}x^{2}\ge 0\), then we have \(\Lambda ^{\prime }(x)\ge 0\) if \(x\le x_{2}\). Suppose there is \(x_{4}\in [0,x_{2}]\) such that \(\varphi (x_{4})=0\) and \(h(x)=0\) for all \(x\in [x_{4},x_{2}]\). Since \(\varphi (x)=0\) for all \(x\in [x_{2},x_{3}]\), we have \(\varphi ^{\prime \prime }(x_{2})_{+}=0\). Since \(I_{0}>0\), \(\mu _{1}(x)\) is differentiable at \(x=x_{2}\). Thus, using (30) and \(\delta (x)=0\) if \(x\in [x_{1},x_{2}]\) allows us to write

$$\varphi ^{\prime \prime }(x_{2})_{-}=\varphi ^{\prime \prime }(x_{2})_{+} +\gamma [\lambda f(x_{2})x_{2}+\mu _{1}(x_{2})]\frac{{\rm{d}}}{{\rm{d}}x}\left( \frac{v^{\prime }(m(x))}{u^{\prime }(R(x))}\right) _{\left| x=x_{2}\right. +}<0.$$

\(\Lambda (x_{2})_{-}<0\) and \(\Lambda ^{\prime }(x)\ge 0\) then yield \(\varphi ^{\prime \prime }(x)<0\) for all \(x\in [x_{4},x_{2}]\). Since \(\varphi (x_{2})=0\) and \(\varphi ^{\prime }(x_{2})_{-}\ge 0\), we have \(\varphi (x)<0\) for \(x<x_{2},\) x close to \(x_{2}\). Since \(\varphi (x_{2})=\varphi (x_{4})=0\), there is \(x_{5}\in (x_{4},x_{2})\) where \(\varphi (x)\) has a local minimum, and thus such that \(\varphi ^{\prime \prime }(x_{5})\ge 0\), which contradicts \(\varphi ^{\prime \prime }(x)<0\) for all \(x\in [x_{4},x_{2}].\) Thus, \(\varphi (x)<0\) for all x in \([0,x_{2})\), which contradicts \(\varphi (0)=0\). Hence, if \(h(x)>0\) in an interval \((x_{2},x_{3}]\), then \(h(x)>0\) in \([0,x_{3}]\), which shows that there exists \(\overline{x}\in [0,a]\) such that \(h(x)>0\) if \(x<\overline{x}\) and \(h(x)=0\) if \(x>\overline{x}\). We observe that \(\overline{x}>0\), for otherwise we would have \(I(x)=0\) for all x in [0, a].

Finally, if \(x\in (0,\overline{x})\) we have \(\mu _{1}(x)>0,\delta (x)=0,\varphi ^{\prime }(x)=0,\) and thus (30) gives \(\gamma xv^{\prime }(m(x))<u^{\prime }(R(x))\). Using (6) then yields \(\widehat{I}^{\prime }(x)>0\). \(\square\)
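Proposition 2 invokes the density conditions \({\rm{d}}f(x)/{\rm{d}}x\le 0\) and \({\rm{d}}^{2}\ln f(x)/{\rm{d}}x^{2}\ge 0\). As a minimal numeric sanity check (our own illustration, not part of the paper's simulations), both conditions hold for a truncated exponential density, whose log-density is linear in x:

```python
import numpy as np

# Illustrative check (not the paper's code): for a truncated exponential
# density on [0, a], verify f'(x) <= 0 and (ln f)''(x) >= 0 by finite
# differences -- the two conditions used in Proposition 2.
lam, a = 2.0, 1.0
x = np.linspace(0.01, a - 0.01, 500)
f = lam * np.exp(-lam * x) / (1.0 - np.exp(-lam * a))  # truncated exponential pdf

df = np.gradient(f, x)                                # f'(x) < 0: f is decreasing
d2_log_f = np.gradient(np.gradient(np.log(f), x), x)  # (ln f)'' = 0: ln f is linear

monotone = bool(np.all(df <= 1e-8))
log_convex = bool(np.all(d2_log_f >= -1e-6))
print(monotone, log_convex)
```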

Proof of Corollary 1

For notational simplicity, assume \(a=1\) and \(f(x)=1\) for all \(x\in [0,1]\). Suppose \(\overline{x}<1\). Using (30) and \(h(x)=\delta (x)=0\) if \(x\in [\overline{x},1]\) gives

$$\varphi ^{\prime \prime }(x)=-\gamma \frac{v^{\prime }(\overline{m})}{u^{\prime }(\overline{R})}[2\lambda -u^{\prime }(\overline{R})]\equiv \overline{\varphi }^{\prime \prime }$$

if \(x\in (\overline{x},1]\). The same argument as in the proof of Proposition 2 gives \(\overline{\varphi }^{\prime \prime }=\varphi ^{\prime \prime }(\overline{x})_{+}<\varphi ^{\prime \prime }(\overline{x})_{-}=0\). Since \(\varphi ^{\prime }(\overline{x})_{+}\le 0\), we have \(\varphi ^{\prime }(x)<0\) for all \(x\in [\overline{x},1]\), which contradicts \(\varphi (\overline{x} )=\varphi (1)=0\). \(\square\)

Proof of Corollary 2

Assume \(f(a)=f^{\prime }(a)=0\) and \(f^{\prime \prime }(a)>0\). Suppose \(\overline{x}=a\) and thus \(h(x)>0\) for all \(x\in [0,a]\). We also have \(\varphi ^{\prime }(x)=\delta (x)=0\) for all x. Differentiating (30) gives

$$h(x)=-\frac{v^{\prime }(m(x))J(x)}{\lambda xK(x)f(x)+v^{\prime \prime } (m(x))\mu _{1}(x)},$$

where

$$\begin{aligned} J(x)&=-\frac{{\rm{d}}\ln f(x)}{{\rm{d}}x}\mu _{1}(x)+f(x)[2\lambda -u^{\prime }(R(x))],\\ K(x)&=v^{\prime \prime }(m(x))+\frac{\gamma xu^{\prime \prime }(R(x))v^{\prime }(m(x))^{2}}{u^{\prime }(R(x))^{2}}<0. \end{aligned}$$

The rest of the proof is in three steps.

Step 1:

\(J(x)>0\) if \(x\in (0,a)\) and \(J(a)=J^{\prime }(a)=J^{\prime \prime }(a)=h(a)=0\).

Using \(K(x)<0,v^{\prime \prime }(m(x))\le 0,\mu _{1}(x)>0,\) and \(h(x)>0\) gives \(J(x)>0\) if \(x\in (0,a)\). Using \(\mu _{1}(a)=f(a)=0\) gives \(J(a)=0\). Furthermore, we have

$$\begin{aligned} J^{\prime }(x)= & {} -\frac{{\rm{d}}\ln f(x)}{{\rm{d}}x}\mu _{1}^{\prime }(x)-\frac{{\rm{d}}^{2}\ln f(x)}{{\rm{d}}x^{2}}\mu _{1}(x)\nonumber \\&+ f^{\prime }(x)[2\lambda -u^{\prime }(R(x))]-f(x)u^{\prime \prime }(R(x))R^{\prime }(x). \end{aligned}$$
(33)

Using \(\mu _{1}(a)=f(a)=0,\delta (x)=0\) for all x and (26) gives \(\mu _{1}^{\prime }(a)=0\). (33) and \(d\ln f(x)/{\rm{d}}x\) \(\nrightarrow -\infty ,d^{2}\ln f(x)/{\rm{d}}x^{2}\nrightarrow \pm \infty\) when \(x\rightarrow a\) gives \(J^{\prime }(a)=0\). Since \(J(x)>0\) if \(x\in (0,a)\) and \(J(a)=J^{\prime }(a)=0,\) we deduce that J(x) reaches a local minimum over [0, a] at \(x=a\), which implies \(J^{\prime \prime }(a)\ge 0\).

Using L’Hôpital’s rule twice yields \(h(a)=-v^{\prime }(m(a))J^{\prime \prime }(a)/\lambda aK(a)f^{\prime \prime }(a)=0\). Since \(h(x)\ge 0\) for all x, we deduce \(J^{\prime \prime }(a)\le 0\), and thus \(J^{\prime \prime } (a)=h(a)=0.\)

Step 2:

\(u^{\prime }(R(a))=\gamma av^{\prime }(m(a))=2\lambda.\)

Since \(f(a)=f^{\prime }(a)=\mu _{1}(a)=\mu _{1}^{\prime }(a)=0\), we deduce \(u^{\prime }(R(a))=\gamma av^{\prime }(m(a))\) from (26) and \(\varphi ^{\prime }(x)\equiv 0\) by using L’Hôpital’s rule twice. Furthermore, (26) gives \(\mu _{1}^{\prime \prime }(a)=0\) and (33) then yields \(J^{\prime \prime }(a)=f^{\prime \prime }(a)[2\lambda -u^{\prime }(R(a))]\), which implies \(u^{\prime }(R(a))=2\lambda\).

Step 3:

Let \(\xi (x)\equiv u^{\prime }(R(x))\varphi ^{\prime }(x)\), where \(\varphi (x)\) is defined by (25). We have \(\xi ^{\prime \prime \prime }(a)<0\), which contradicts \(\varphi (x)=0\) for all \(x\in [0,a]\) when \(\overline{x}=a\).

\(\overline{x}=a\) implies \(\xi (x)=0\) for all \(x\in [0,a]\). We may write \(\xi (x)=\lambda f(x)\Delta _{1}(x)-\gamma \Delta _{2}(x)\), with \(\Delta _{1}(x)=u^{\prime }(R(x))-\gamma xv^{\prime }(m(x)),\Delta _{2}(x)=\mu _{1}(x)v^{\prime }(m(x))\). We have \(\Delta _{1}(a)=0,\Delta _{1}^{\prime }(a)=-\gamma v^{\prime }(m(a))\) from \(h(a)=0\) and \(u^{\prime }(R(a))=\gamma av^{\prime }(m(a))\). Using (26) and Step 2 gives

$$\begin{aligned} \Delta _{2}^{\prime \prime \prime }(a)&=\mu _{1}^{\prime \prime \prime }(a)v^{\prime }(m(a))\\&=f^{\prime \prime }(a)[\lambda -u^{\prime }(R(a))]v^{\prime }(m(a))\\&=-\lambda f^{\prime \prime }(a)v^{\prime }(m(a))\text {.} \end{aligned}$$

We have

$$\begin{aligned} \xi ^{\prime \prime }(x) =\,\lambda f^{\prime \prime }(x)\Delta _{1}(x)+2\lambda f^{\prime }(x)\Delta _{1}^{\prime }(x)\,+\lambda f(x)\Delta _{1}^{\prime \prime }(x)-\gamma \Delta _{2}^{\prime \prime }(x), \end{aligned}$$

and thus, using \(\Delta _{1}(a)=0\) and \(f(a)=f^{\prime }(a)=0\), we may write

$$\xi ^{\prime \prime \prime }(a)=3\lambda f^{\prime \prime }(a)\Delta _{1}^{\prime }(a)-\gamma \Delta _{2}^{\prime \prime \prime }(a)=-\frac{4\lambda ^{2} f^{\prime \prime }(a)}{a}<0.$$
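For completeness, substituting \(\Delta _{1}^{\prime }(a)=-\gamma v^{\prime }(m(a))\) and \(\Delta _{2}^{\prime \prime \prime }(a)=-\lambda f^{\prime \prime }(a)v^{\prime }(m(a))\), then using \(\gamma av^{\prime }(m(a))=2\lambda\) from Step 2, the final equality unfolds as

$$\begin{aligned} \xi ^{\prime \prime \prime }(a)&=-3\lambda \gamma f^{\prime \prime }(a)v^{\prime }(m(a))+\lambda \gamma f^{\prime \prime }(a)v^{\prime }(m(a))\\ &=-2\lambda f^{\prime \prime }(a)\gamma v^{\prime }(m(a))=-2\lambda f^{\prime \prime }(a)\frac{2\lambda }{a}=-\frac{4\lambda ^{2}f^{\prime \prime }(a)}{a}<0. \end{aligned}$$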

\(\square\)

Proof of Proposition 3

The optimal non-linear indemnity schedule I(m) is such that

$$I^{\prime }(m)=\frac{\widehat{I}^{\prime }(x)}{m^{\prime }(x)}\text { when }m=m(x),$$

for all \(m\in (0,\overline{m})\). Thus, (6), (7), (30), and \(\varphi ^{\prime }(x)=\delta (x)=0\) if \(x\in (0,\overline{x})\) give

$$I^{\prime }(m(x))=1-\frac{\gamma xv^{\prime }(m(x))}{u^{\prime }(R(x))}=\mu _{1}(x)\frac{\gamma v^{\prime }(m(x))}{\lambda f(x)u^{\prime }(R(x))},$$

which implies \(I^{\prime }(m)\in (0,1)\) for all \(m\in (0,\overline{m})\), \(I^{\prime }(\overline{m})=0\) if \(\overline{x}=a\), and \(I^{\prime }(\overline{m})>0\) if \(\overline{x}<a\), where \(\overline{m}=m(\overline{x})\).

All types \(x\ge \overline{x}\) choose \(\overline{m}=m(\overline{x}),\) and thus the optimal allocation is sustained by an indemnity schedule such that \(I(m)=I(\overline{m})\) for \(m\ge \overline{m}\).

Let \(I^{\prime }(0)={\lim}_{m\rightarrow 0}I^{\prime }(m)\ge 0\). The rest of the proof shows that \(-mv^{\prime \prime }(m)/v^{\prime }(m)\rightarrow \eta \in (0,1)\) when \(m\rightarrow 0\) (an assumption made in what follows) is a sufficient condition for \(I^{\prime }(0)>0\). The following lemma is an intermediate step in an argument by contradiction. \(\square\)

Lemma 5

Suppose \(I^{\prime }(0)=0\); then (i) \(h(x)\rightarrow +\infty\) when \(x\rightarrow 0\), and (ii) there exists a sequence \(\{x_{n},n\in \mathbb {N}\}\subset (0,a]\) such that \(0<x_{n+1}<x_{n}\) for all n, \(x_{n}\rightarrow 0\) when \(n\rightarrow \infty\), and \(m(x_{n})/x_{n}>h(x_{n})\) for all \(n\in \mathbb {N}\).

Proof of Lemma 5

  (i)

    Note that \(I^{\prime }(0)=0\) implies \(C(x)\equiv xv^{\prime }(m(x))\rightarrow u^{\prime }(w-P)/\gamma\) when \(x\rightarrow 0\). If (i) does not hold, then there exists a sequence \(\{x_{n},n\in \mathbb {N}\}\subset (0,a]\) such that \(0<x_{n+1}<x_{n}\) for all n, \(x_{n}\rightarrow 0\) when \(n\rightarrow \infty\) and \(h(x_{n})\rightarrow \overline{h}<+\infty\) when \(n\rightarrow +\infty\). Using \(v(0)=0\) and L’Hôpital’s rule yields

    $$\begin{aligned} \underset{x\rightarrow 0}{\lim }C(x)=\frac{1}{\underset{x\rightarrow 0}{\lim }\left[ -\frac{v^{\prime \prime }(m(x))}{v^{\prime }(m(x))^{2}}h(x)\right] }=\frac{1}{\eta \overline{h}}\underset{x\rightarrow 0}{\lim }\left[ m(x)v^{\prime }(m(x))\right] . \end{aligned}$$

    Furthermore, \(-mv^{\prime \prime }(m)/v^{\prime }(m)\rightarrow \eta >0\) implies \(mv^{\prime }(m)\rightarrow 0\) when \(m\rightarrow 0\). Hence, \(C(x)\rightarrow 0\) when \(x\rightarrow 0\), which contradicts \(C(x)\rightarrow u^{\prime }(w-P)/\gamma >0\) when \(x\rightarrow 0\).

  (ii)

    Let \(x_{0}\) be such that h(x) is continuous over \((0,x_{0}]\) and consider the decreasing sequence \(\{x_{n},n\in \mathbb {N}\}\) defined by \(x_{n}=\sup \{x\in (0,x_{0}]\left| h(x^{\prime })\ge n\text { if }x^{\prime }\le x\right. \}\). \(x_{n}\) is well defined and such that \(x_{n}\rightarrow 0\) when \(n\rightarrow \infty\) from (i) and, using the continuity of h(x), we have \(h(x_{n})=n\) and \(h(x)>n\) if \(x<x_{n}\). Thus,

    $$\frac{m(x_{n})}{x_{n}}=\frac{\int \nolimits _{0}^{x_{n}}h(x){\rm{d}}x}{x_{n}} >n=h(x_{n}),$$

    which completes the proof of (ii).
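The elasticity limit invoked in part (i) can be illustrated numerically. Under a hypothetical CRRA-type specification \(v^{\prime }(m)=m^{-\eta }\) (an assumption for illustration, not the paper's calibration), \(-mv^{\prime \prime }(m)/v^{\prime }(m)=\eta\) exactly, and \(mv^{\prime }(m)=m^{1-\eta }\rightarrow 0\) as \(m\rightarrow 0\) when \(\eta <1\):

```python
# Hypothetical CRRA-type marginal utility v'(m) = m**(-eta); illustration
# only, not the paper's calibration.
eta = 0.5

def vp(m):
    return m ** (-eta)             # v'(m)

def vpp(m):
    return -eta * m ** (-eta - 1)  # v''(m)

ms = [10.0 ** (-k) for k in range(1, 8)]
elasticities = [-m * vpp(m) / vp(m) for m in ms]  # -m v''/v', constant and equal to eta
mv = [m * vp(m) for m in ms]                      # m v'(m) = m**(1 - eta), vanishing

print(elasticities[0], mv[0], mv[-1])
```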

We are now in a position to complete the proof of the proposition. Suppose \(I^{\prime }(0)=0\), and let \(D(x)\equiv \gamma xv^{\prime }(m(x))-u^{\prime }(R(x))\), with \(D(x)<0\) if \(x>0\) from \(\widehat{I}^{\prime }(x)>0\), and \(D(0)=0\) from \(I^{\prime }(0)=0\). We thus have \(D^{\prime }(x)<0\) for x close to 0. We have

$$\begin{aligned} D^{\prime }(x)&=\gamma [v^{\prime }(m(x))+xv^{\prime \prime }(m(x))h(x)]-u^{\prime \prime }(R(x))R^{\prime }(x)\\&=\frac{\gamma xv^{\prime }(m(x))}{m(x)}\left[ \frac{m(x)}{x}+h(x)\left( \frac{v^{\prime \prime }(m(x))m(x)}{v^{\prime }(m(x))}+\frac{u^{\prime \prime }(R(x))}{u^{\prime }(R(x))}m(x)\right) \right] . \end{aligned}$$

Consider the sequence \(\{x_{n},n\in \mathbb {N}\}\) defined in Lemma 5-(ii). Using \(m(x_{n})/x_{n}>h(x_{n})\) gives

$$D^{\prime }(x_{n})=\frac{\gamma x_{n}h(x_{n})v^{\prime }(m(x_{n}))}{m(x_{n})}\left[ 1+\frac{v^{\prime \prime }(m(x_{n}))m(x_{n})}{v^{\prime }(m(x_{n}))}+\frac{u^{\prime \prime }(R(x_{n}))}{u^{\prime }(R(x_{n}))}m(x_{n})\right] .$$

Since \(x_{n}\rightarrow 0\) when \(n\rightarrow +\infty ,u^{\prime \prime }(R(x))/u^{\prime }(R(x))\rightarrow u^{\prime \prime }(w-P)/u^{\prime }(w-P)\) and \(m(x)\rightarrow 0\) when \(x\rightarrow 0\), and \(-v^{\prime \prime }(m)m/v^{\prime }(m)\rightarrow \eta\) when \(m\rightarrow 0\), we deduce that \(\eta <1\) is a sufficient condition for \(D^{\prime }(x_{n})>0\) when n is large enough, which is a contradiction. We deduce \(I^{\prime }(0)>0\) when \(\eta <1\). \(\square\)

Appendix 2

1.1 Computational approach

Our simulations are performed through a discretization method. Using notation that is standard in this field, an optimal control problem is written as follows, denoting by x the vector of state variables and by u the vector of controls, both functions of time \(t\in \mathbb {R}\):

$$\begin{aligned} \begin{array}{ll} \min \ J(x(\cdot ),u(\cdot ))=g_{0}(t_{f},x(t_{f}))&{}\text {Objective (Mayer form)}\\ \dot{x}(t)=f(t,x(t),u(t))\quad \forall t\in [0,t_{f}]&{}\text {Dynamics}\\ u(t)\in U\quad \text {for a.e. }t\in [0,t_{f}]&{}\text {Admissible Controls}\\ g(x(t),u(t))\le 0&{}\text {Path Constraints}\\ \Phi (x(0),x(t_{f}))=0&{}\text {Boundary Conditions} \end{array} \end{aligned}$$

The time discretization is as follows:

$$\begin{aligned} \begin{array}{lll} t\in [0,t_{f}] &{} \longrightarrow &{}t_{0}=0,\ldots ,t_{N}=t_{f}\\ x(.),u(.) &{} \longrightarrow &{}X=\{x_{0},\ldots ,x_{N},u_{0},\ldots ,u_{N}\}\\ &{}&{}{\text {------------------------------------------}}\\ \text {Objective} &{} \longrightarrow &{}\min \ g_{0}(t_{f},x_{N})\\ \text {Dynamics} &{} \longrightarrow &{}x_{i+1}=x_{i}+hf(x_{i},u_{i})\quad i=0,\ldots ,N-1\\ \text {Admissible Controls} &{} \longrightarrow &{}u_{i}\in \mathbf {U}\quad i=0,\ldots ,N\\ \text {Path Constraints} &{} \longrightarrow &{}g(x_{i},u_{i})\le 0\quad i=0,\ldots ,N\\ \text {Boundary Conditions} &{} \longrightarrow &{}\Phi (x_{0},x_{N})=0 \end{array} \end{aligned}$$

We therefore obtain a non-linear programming problem on the discretized state and control variables. In BOCOP, the discretized non-linear optimization problem is solved by the Ipopt solver that implements a primal–dual interior point algorithm; see Wächter and Biegler (2006). The derivatives required for the optimization are computed by the automatic differentiation tool Adol-C; see Walther and Griewank (2012).
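As a toy illustration of this direct transcription approach (a minimal sketch on a made-up control problem, using scipy's SLSQP rather than BOCOP/Ipopt), consider minimizing \(x(t_{f})\) subject to \(\dot{x}=u\), \(u\in [-1,1]\), \(x(0)=1\): Euler discretization turns it into a finite-dimensional non-linear program over the controls.

```python
import numpy as np
from scipy.optimize import minimize

# Direct transcription sketch (illustrative problem, not the paper's model):
# minimize x(1) subject to x'(t) = u(t), u in [-1, 1], x(0) = 1.
N = 50
h = 1.0 / N

def objective(u):
    x = 1.0
    for ui in u:          # forward Euler: x_{i+1} = x_i + h * f(x_i, u_i)
        x = x + h * ui
    return x              # Mayer-form objective g0(tf, x(tf)) = x(tf)

res = minimize(objective, np.zeros(N), bounds=[(-1.0, 1.0)] * N, method="SLSQP")
u_opt, x_final = res.x, res.fun
print(res.success, round(x_final, 4))  # optimum: u = -1 everywhere, so x(1) = 0
```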

1.2 Complementary proofs

Proof of Lemma 2

Let \(\widehat{I}(x),\) \(x\in [0,x^{*}],\) P, and \(c^{*}\) be given, with \(I^{*}=\) \(\widehat{I}(x^{*}),m^{*}=m(x^{*}),\) and \(I^{*}\le m^{*}\). Consider the subproblem in which \(\{\widehat{I} (x),m(x),g(x),h(x)\), \(x\in [x^{*},a]\}\) maximizes

$${\displaystyle \int _{x^{*}}^{a}} \left\{ u(w-P+\widehat{I}(x)-m(x))+h_{0}-\gamma x[1-v(m(x))]\right\} f(x){\rm{d}}x,$$
(34)

subject to (7) and (10)–(12).

Let \(\mu _{1}(x)\) and \(\mu _{2}(x)\) be costate variables, respectively, for \(\widehat{I}(x)\) and m(x) and let \(\eta (x)\) and \(\lambda\) be Lagrange multipliers, respectively, for (11) and (12) in this subproblem. The Hamiltonian is written as

$$\begin{aligned} \mathcal {H} =\,[u(R(x))+\gamma xv(m(x))]f(x)+[\mu _{1}(x)-\eta (x)]g(x)\,+[\mu _{2}(x)+\eta (x)]h(x)-\lambda [\widehat{I}(x)+c]f(x), \end{aligned}$$

and the optimality conditions are

$$\mu _{1}(x)-\eta (x)\le 0,=0\text { if }g(x)>0,$$
(35)
$$\mu _{2}(x)+\eta (x)=0,$$
(36)
$$\mu _{1}^{\prime }(x)=[\lambda -u^{\prime }(R(x))]f(x),$$
(37)
$$\mu _{2}^{\prime }(x)=[u^{\prime }(R(x))-\gamma xv^{\prime }(m(x))]f(x),$$
(38)

for all x, with the transversality conditions \(\mu _{1}(a)=\mu _{2}(a)=0,\) and \(\eta (x)\ge 0\) for all x and \(\eta (x)=0\) if \(h(x)>g(x).\)

Let us consider \(x_{0}\in [x^{*},a]\) such that \(g(x)>0\) if x is in a neighborhood \(\mathcal {V}\) of \(x_{0}.\) Suppose \(h(x)>g(x)\), and thus \(\eta (x)=0\) if \(x\in \mathcal {V}\). Equation (35) gives \(\mu _{1}(x)=0\), and thus \(\mu _{1}^{\prime }(x)=0\) for all x \(\in \mathcal {V}\). Then (37) gives \(u^{\prime }(R(x))=\lambda\), and thus \(R(x)=w-P-m(x)+\widehat{I}(x)\) is constant in \(\mathcal {V}\). This implies \(m^{\prime }(x)-\widehat{I}^{\prime } (x)=h(x)-g(x)=0\), which contradicts \(h(x)>g(x)\). We deduce that \(h(x)=g(x)\) if \(x\in \mathcal {V}\). (35) and (36) yield \(\mu _{1}(x)=-\mu _{2}(x)=\eta (x),\) and thus \(\mu _{1}^{\prime }(x)=-\mu _{2}^{\prime }(x)\), for all \(x\in \mathcal {V}\). (37) and (38) then imply \(\gamma xv^{\prime }(m(x))=\lambda\) for all \(x\in \mathcal {V}\), which gives \(m^{\prime }(x)\) \(=-v^{\prime }(m(x))/xv^{\prime \prime }(m(x))\).

Let \(x_{0},x_{1},x_{2}\in [x^{*},a]\) be such that \(x_{0}<x_{1}<x_{2}\) with \(g(x)=0\) if \(x\in [x_{0},x_{1}]\) and \(g(x)>0\) if \(x\in (x_{1},x_{2}]\). Let us show that we cannot have \(g(x)>0\) if \(x\in [x_{3},x_{0}]\) with \(x_{3}<x_{0}\). We have \(\mu _{1}(x)+\mu _{2}(x)\le 0\) if \(x\in [x_{0},x_{1})\) and \(\mu _{1}(x)+\mu _{2}(x)=0\) if \(x\in [x_{1},x_{2}]\). Let \(\Psi (x)\equiv [\mu _{1}^{\prime }(x)+\mu _{2}^{\prime }(x)]/f(x)\), with \(\Psi (x_{1})=0\) because \(\mu _{1}(x)+\mu _{2}(x)\) reaches a local maximum at \(x=x_{1}\). Note that \(\Psi (x)\) is differentiable. Let \(x\in [x_{0},x_{1})\). If \(m^{\prime }(x)=0\) (and thus \(R^{\prime }(x)=0\)), we have \(d[\mu _{1}^{\prime }(x)/f(x)]/{\rm{d}}x=0\) and \(d[\mu _{2}^{\prime }(x)/f(x)]/{\rm{d}}x=-\gamma v^{\prime }(m(x_{1}))<0\), and thus \(\Psi ^{\prime }(x)<0\). If \(m^{\prime }(x)>0\) (and thus \(R^{\prime }(x)<0\)), we have \(\eta (x)=\mu _{2}(x)=\mu _{2}^{\prime }(x)=0\) and \(d[\mu _{1}^{\prime }(x)/f(x)]/{\rm{d}}x=-u^{\prime \prime }(R(x))R^{\prime }(x)<0\), and thus we still have \(\Psi ^{\prime }(x)<0\). Suppose \(g(x)>0\) if \(x\in [x_{3},x_{0}]\) with \(x_{3}<x_{0}\). In that case we would have \(\mu _{1}(x)+\mu _{2}(x)=0\) if \(x\in [x_{3},x_{0}],\) and since \(\mu _{1}(x)+\mu _{2}(x)\le 0\) if \(x\in [x_{0},x_{1})\), we would have \(\Psi (x_{0})=0\). This contradicts \(\Psi (x_{1})=0,\Psi ^{\prime }(x)<0\) if \(x\in [x_{0},x_{1})\).

Suppose there are \(x_{0},x_{1},x_{2}\in [x^{*},a]\) such that \(x_{0}<x_{1}<x_{2}\) with \(g(x)>0\) if \(x\in [x_{0},x_{1}]\) and \(g(x)=0\) if \(x\in (x_{1},x_{2}]\). In that case \(\mu _{1}(x)+\mu _{2}(x)=0\) if \(x\in [x_{0},x_{1}]\) and \(\mu _{1}(x)+\mu _{2}(x)\le 0\) if \(x\in [x_{1},x_{2}]\). Since \(\mu _{1}(a)+\mu _{2}(a)=0\) and \(\mu _{1}(x)\) and \(\mu _{2}(x)\) are continuous, we may choose \(x_{2}\) such that \(\mu _{1}(x_{2})+\mu _{2}(x_{2})=0\). The same calculation as above implies \(\Psi (x_{1})=0,\) \(\Psi ^{\prime }(x)<0\) if \(x\in [x_{1},x_{2}]\) and thus \(\Psi (x)<0\) if \(x\in [x_{1},x_{2}]\), which contradicts \(\mu _{1}(x_{2})+\mu _{2}(x_{2})=0\).

Overall, we deduce that there exists \(\widehat{x}\in [x^{*},a]\) such that \(\widehat{I}^{\prime }(x)=0\) if \(x\in [x^{*},\widehat{x}]\) and \(\widehat{I}^{\prime }(x)=m^{\prime }(x)>0\) if \(x\in [\widehat{x},a]\). The same reasoning—replacing \(\Psi (x)\) by \(\Phi (x)\equiv \mu _{2}^{\prime }(x)/f(x)\)—shows that there exists \(\widetilde{x}\in [x^{*},\widehat{x}]\) such that \(m^{\prime }(x)=0\), and thus \(m(x)=m^{*}\), if \(x\in [x^{*},\widetilde{x}]\) and \(m^{\prime }(x)>0\) if \(x\in [\widetilde{x},\widehat{x}]\). When \(m^{\prime }(x)>0\), we have \(\eta (x)=\mu _{2}(x)=0\), and thus \(\mu _{2}^{\prime }(x)\equiv 0\) if \(x\in [\widetilde{x},\widehat{x}]\), which gives \(u^{\prime }(w-P-m(x)+I^{*})=\gamma xv^{\prime }(m(x))\), and thus \(m^{\prime }(x)\equiv -\gamma v^{\prime }(m(x))/[\gamma xv^{\prime \prime }(m(x))+u^{\prime \prime }(w-P-m(x)+I^{*})]\). When \(m^{\prime }(x)=0\), we have \(\Phi ^{\prime }(x)<0\) if \(x\in [x^{*},\widetilde{x})\) and \(\Phi ^{\prime }(\widetilde{x})=0\), and thus \(\widetilde{x}\) is given by \(u^{\prime }(w-P-m^{*}+I^{*})=\gamma \widetilde{x}v^{\prime }(m^{*})\) if \(u^{\prime }(w-P-m^{*}+I^{*})>\gamma x^{*}v^{\prime }(m^{*})\), and \(\widetilde{x}=x^{*}\) if \(u^{\prime }(w-P-m^{*}+I^{*})=\gamma x^{*}v^{\prime }(m^{*})\).

If \(x^{*}<\widehat{x}\), then replacing \(m^{*}\) by \(\widehat{m}\equiv m(\widehat{x})>m^{*}\) implements the same allocation with lower audit costs. Indeed, m(x) is an optimal choice of type x individuals if \(x>\widehat{x}\), because such individuals would prefer choosing \(\widehat{m}\) rather than any \(m\in [0,\widehat{m}),\) and furthermore, for such individuals, there is full coverage at the margin in \((\widehat{m},m(x)]\) and they cannot choose expenses larger than m(x). In addition, the expected audit cost decreases from \(c[1-F(x^{*})]\) to \(c[1-F(\widehat{x})]\) when \(\widehat{m}\) is substituted for \(m^{*}\). Thus, an optimal allocation is necessarily such that \(x^{*}=\widehat{x}.\)
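The closed form implied by the full-marginal-coverage region can be checked numerically. With a hypothetical CRRA-type \(v^{\prime }(m)=m^{-\eta }\) (our illustrative assumption, not the paper's calibration), \(\gamma xv^{\prime }(m(x))=\lambda\) gives \(m(x)=(\gamma x/\lambda )^{1/\eta }\), and a forward-Euler integration of \(m^{\prime }(x)=-v^{\prime }(m(x))/xv^{\prime \prime }(m(x))\) reproduces it:

```python
# Illustrative parameters (not the paper's calibration).
gamma, lam, eta = 1.0, 2.0, 0.5

def vp(m):
    return m ** (-eta)             # v'(m)

def vpp(m):
    return -eta * m ** (-eta - 1)  # v''(m)

x0, x1, n = 0.5, 1.0, 20000
h = (x1 - x0) / n
m = (gamma * x0 / lam) ** (1.0 / eta)  # start on the closed-form branch
x = x0
for _ in range(n):
    m += h * (-vp(m) / (x * vpp(m)))   # m'(x) = -v'(m)/(x v''(m))
    x += h

closed_form = (gamma * x1 / lam) ** (1.0 / eta)
print(abs(m - closed_form) < 1e-3)
```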

Proof of Proposition 4

Let \(\mu _{1}(x)\) and \(\mu _{2}(x)\) be costate variables, respectively, for \(\widehat{I}(x)\) and m(x) and let \(\delta (x)\) and \(\lambda\) be Lagrange multipliers, respectively, for (9) and (20). The Hamiltonian is written as in the proof of Proposition 1, and the optimality conditions (25), (26), and (27) still hold. We also have \(\delta (x)\ge 0\) and \(\delta (x)=0\) if \(\widehat{I} (x)>0\), and \(\mu _{1}(x^{*})+\mu _{2}(x^{*})=0\) from the characterization of the optimal continuation allocation. The optimality conditions on \(m^{*},I^{*},x^{*},P,\) and A are written as

$$V_{1}^{\prime }-\mu _{2}(x^{*})=0,$$
(39)
$$V_{2}^{\prime }-\mu _{1}(x^{*})=0,$$
(40)
$$\begin{aligned}&\left. V_{3}^{\prime }+\{u(R^{*})+h_{0}-\gamma x^{*}[1-v(m^{*})]\}f(x^{*})\right. \nonumber \\&\,\quad \left. -\mu _{1}(x^{*})\frac{\gamma x^{*}v^{\prime }(m^{*})}{u^{\prime }(R^{*})}-[\lambda -\delta (x^{*})]I^{*}\le 0,=0\text { if }x^{*}>0,\right. \end{aligned}$$
(41)
$$V_{4}^{\prime }-{\displaystyle \int _{0}^{x^{*}}}\left[ u^{\prime }(R(x))f(x)+\mu _{1}(x)h(x)\gamma x\frac{v^{\prime }(m(x))u^{\prime \prime }(R(x))}{u^{\prime }(R(x))^{2}}\right] {\rm{d}}x=0,$$
(42)
$$V_{5}^{\prime }+\lambda =0,$$
(43)

respectively, where \(V_{1}^{\prime },V_{2}^{\prime },...\) denote the partial derivatives of \(V(m^{*},I^{*},x^{*},P,A)\) and \(R^{*}\equiv R(x^{*})=w-P-m^{*}+I^{*}\). Define \(\varphi (x)\) for all \(x\in [0,x^{*}]\) by (25) as in the proof of Proposition 1.

Step 1:

\(m(x)>0\) for all \(x>0.\)

Identical to Step 1 in the proof of Proposition 1.

Step 2:

\(\mu _{1}(x)\) is continuous in \([0,x^{*}]\) with \(\mu _{1}(x)=0\) for all \(x\in [0,x^{*}]\) such that \(\widehat{I}(x)=0.\)

Identical to Step 2 in the proof of Proposition 1.

Step 3:

\(\mu _{1}(x)\ge 0\) for all \(x\in [0,x^{*}]\) with \(\mu _{1}(x^{*})>0.\)

We know from Lemma 2 that \(R(x)=w-P-m^{*}+I^{*}\) and

$$m(x)=m^{*}- {\displaystyle \int _{x^{*}}^{x}} \frac{v^{\prime }(m(t))}{tv^{\prime \prime }(m(t))}{\rm{d}}t,$$

for all \(x\in [x^{*},a]\). Thus,

$$V_{2}^{\prime }=u^{\prime }(w-P-m^{*}+I^{*})[1-F(x^{*})],$$

and (40) gives \(\mu _{1}(x^{*})>0\). The remaining part of Step 3 is the same as in the proof of Proposition 1.

Step 4:

\(\widehat{I}(x)>0\) for all \(x\in (0,x^{*}].\)

Identical to Steps 4 and 5 in the proof of Proposition 1.

Step 5:

\(x^{*}>0.\)

We have

$$V_{3}^{\prime }=-\{u(R^{*})+h_{0}-\gamma x^{*}[1-v(m^{*} )]+\lambda (I^{*}+c)\}f(x^{*}),$$

from the definition of V(.). Thus (41) and \(\delta (x^{*})=0\) give

$$\lambda cf(x^{*})-\mu _{1}(x^{*})\frac{\gamma x^{*}v^{\prime }(m^{*})}{u^{\prime }(R^{*})}\le 0,=0\text { if }x^{*}>0,$$

which implies \(x^{*}>0\).

Step 6:

There is \(\overline{x}\in (0,x^{*}]\) such that

$$\begin{aligned} \widehat{I}^{\prime }(x)&>0,\quad h(x)=m^{\prime }(x)>0\quad \text{if }0<x<\overline{x},\\ \widehat{I}(x)&=\widehat{I}(\overline{x}),\quad m(x)=m(\overline{x}),\quad h(x)=0\quad \text{if }\overline{x}<x\le x^{*},\\ \widehat{I}^{\prime }(0)&=0,\quad \widehat{I}^{\prime }(\overline{x})=0\ \text{if }\overline{x}=a\quad \text{and}\quad \widehat{I}^{\prime }(\overline{x})>0\ \text{if }\overline{x}<x^{*}. \end{aligned}$$

Identical to the proof of Proposition 2.

Finally, \(\mu _{1}(x^{*})>0\) shows that there is an upward discontinuity in m(x) and \(\widehat{I}(x)\) at \(x=x^{*}\). \(\square\)

Proof of Proposition 5

Using \(x^{*}>0\) and \(m^{\prime }(x)>0\) if \(x\in (0,\overline{x})\) gives \(m^{*}>0\). The remaining part of the Proposition is a straightforward adaptation of Proposition 3. \(\square\)

Proof of Lemma 3

Similar to Lemma 1, with straightforward adaptation. \(\square\)

Proof of Lemma 4

We now have

$$V(x,\widetilde{x})=U\left( w-P+\widehat{I}(\widetilde{x})-m(\widetilde{x}),h_{0}-\gamma x(1-v(m(\widetilde{x})))\right) .$$

A straightforward adaptation of the proof of Lemma 1 shows that (17) is a necessary condition for incentive compatibility. (17) gives

$$\frac{\partial V(x,\widetilde{x})}{\partial \widetilde{x}}=\gamma v^{\prime }(m(\widetilde{x}))m^{\prime }(\widetilde{x})U_{H}^{\prime }(R(\widetilde{x} ),H(x,\widetilde{x}))\left[ x-\widetilde{x}A(x,\widetilde{x})\right],$$

where

$$\begin{aligned} H(x,\widetilde{x})&\equiv h_{0}-\gamma x(1-v(m(\widetilde{x}))),\quad H(\widetilde{x},\widetilde{x})\equiv H(\widetilde{x}),\\ A(x,\widetilde{x})&\equiv \frac{U_{R}^{\prime }(R(\widetilde{x}),H(x,\widetilde{x}))U_{H}^{\prime }(R(\widetilde{x}),H(\widetilde{x}))}{U_{R}^{\prime }(R(\widetilde{x}),H(\widetilde{x}))U_{H}^{\prime }(R(\widetilde{x}),H(x,\widetilde{x}))}. \end{aligned}$$

Using \(U_{H^{2}}^{\prime \prime }<0\) and \(U_{RH}^{\prime \prime }>0\) gives \(A(x,\widetilde{x})>1\) if \(\widetilde{x}>x\) and \(A(x,\widetilde{x})<1\) if \(\widetilde{x}<x\), with \(A_{\widetilde{x}}^{\prime }(x,\widetilde{x})_{\left| \widetilde{x}=x\right. }>0\), and thus

$$\frac{\partial ^{2}V(x,\widetilde{x})}{\partial \widetilde{x}^{2}}\left| _{\widetilde{x}=x}\right. =-\gamma v^{\prime }(m(x))m^{\prime }(x)U_{H} ^{\prime }(R(x),H(x))[1+A_{\widetilde{x}}^{\prime }(x,\widetilde{x})_{\left| \widetilde{x}=x\right. }].$$

Thus incentive compatibility gives (18). Conversely, assume that (17) and (18) hold. We have

$$\begin{aligned} \frac{\partial V(x,\widetilde{x})}{\partial \widetilde{x}}&\le \gamma v^{\prime }(m(\widetilde{x}))m^{\prime }(\widetilde{x})U_{H}^{\prime }(R(\widetilde{x}),H(x,\widetilde{x}))(x-\widetilde{x})<0\text { if }\widetilde{x}>x,\\ \frac{\partial V(x,\widetilde{x})}{\partial \widetilde{x}}&\ge \gamma v^{\prime }(m(\widetilde{x}))m^{\prime }(\widetilde{x})U_{H}^{\prime }(R(\widetilde{x}),H(x,\widetilde{x}))(x-\widetilde{x})>0\text { if }\widetilde{x}<x, \end{aligned}$$

which implies incentive compatibility. \(\square\)

Proof of Proposition 6

The notations of costate variables and Lagrange multipliers are the same as in the proof of Proposition 1. Observe first that Steps 1–4 of this proof remain valid, with an unchanged definition of \(\varphi (x)\), just replacing (30) by

$$\varphi ^{\prime }(x)=\left[ \lambda (1+\sigma )f(x)-\delta (x)\right] \left[ 1-\frac{\gamma xv^{\prime }(m(x))}{u^{\prime }(R(x))}\right] -\gamma \mu _{1}(x)\frac{v^{\prime }(m(x))}{u^{\prime }(R(x))},$$
(44)

and \(\lambda\) by \(\lambda (1+\sigma )\) in (26).

Suppose that \(\widehat{I}^{\prime }(x)>0\) if \(x<\varepsilon\), with \(\varepsilon >0\). Hence \(\widehat{I}(x)>0\) (and thus \(\delta (x)=0\)) for all \(x>0\). Using (6) gives

$$h(x) >0,$$
(45)
$$1-\frac{\gamma xv^{\prime }(m(x))}{u^{\prime }(R(x))}>0,$$
(46)

if \(x<\varepsilon\). (45) implies \(\varphi (x)=\varphi ^{\prime }(x)=0\) if \(x<\varepsilon\). Furthermore, using (26) (in which \(\lambda\) is replaced by \(\lambda (1+\sigma )\)), (29), and \(\mu _{1}(a)=0\) yields

$$\mu _{1}(0)=-\int _{\underline{x}}^{a}\mu _{1}^{\prime }(x){\rm{d}}x=\int _{0}^{a} \delta (x){\rm{d}}x-\lambda \sigma =-\lambda \sigma <0,$$

and thus \(\mu _{1}(x)<0\) for x small enough. Equations (44) and (46) then yield \(\varphi ^{\prime }(x)>0\), hence a contradiction. Since we know from Step 4 that \(\widehat{I}(x)\) is non-decreasing, we deduce that there exists \(d>0\) such that \(\widehat{I}(x)=0\) if \(x\le d\) and \(\widehat{I}(x)>0\) if \(x>d\).

The simulated trajectories of \(\mu _{1}(x)\) and \(\mu _{2}(x)\) are illustrated in Fig. 15 in the case of an exponential distribution function, with \(\sigma =0.1\) and with the same calibration as in Sect. 3.3. We have \(\mu _{1}(x)=\mu _{2}(x)=0\) when \(x\le d\) and \(\mu _{1}(x)>0,\mu _{2}(x)<0\) when \(x>d\), with \(d\simeq 0.41\).

The characterization of the indemnity schedule I(m) is derived in the same way as in Proposition 3, with \(D=m(d)\). \(\square\)

Fig. 15 Costate variables under loading

Proof of Corollary 3

Similar to Corollary 1. \(\square\)

Proof of Corollary 4

Similar to Corollary 2. \(\square\)


About this article


Cite this article

Martinon, P., Picard, P. & Raj, A. On the design of optimal health insurance contracts under ex post moral hazard. Geneva Risk Insur Rev 43, 137–185 (2018). https://doi.org/10.1057/s10713-018-0034-y
