1 Introduction

Finding plausible exogenous variation generated by a quasi-experimental situation has become an essential element of recent econometric policy evaluation, particularly when the treatment of interest cannot be randomly assigned.

Regression kink (RK) designs have recently been added to the methodologies available for this purpose. The basic idea of an RK design is similar to that of a regression discontinuity (RD) design, which is now one of the most popular approaches in applied microeconometrics. Whereas an RD design utilizes a “discontinuity” or “jump” in treatment status at a threshold of an assignment variable, an RK design exploits a “kink” in treatment status at a threshold. The applicability of RK designs may be broader than that of RD designs in policy evaluation because many public policies avoid an abrupt discontinuity at a threshold in the treatment and outcome status of relevant parties and instead produce a continuous but often kinked transition in status once the treatment in question is introduced.

Two such examples are progressive taxation and welfare transfer programs. Because marginal income tax rates and social welfare benefits are often determined by the income levels of taxpayers and welfare recipients, the budget constraints of individuals are often kinked at eligibility thresholds. The existence of such kinks in budget constraints is well known and is taken into account in empirical studies of labor supply and other relevant topics.Footnote 1

Nonetheless, kinks generated by institutional or policy settings have not been recognized as explicit sources of identification. One notable recent exception is the literature on bunching, such as Saez (2010), which exploits the bunching behavior of taxpayers at a kink point to estimate labor-supply elasticity. While the study of bunching utilizes endogenous sorting at a kink for identification, RK designs exploit the seemingly exogenous variation that is generated by a kink in a manner similar to the way RD designs use a discontinuity as a source of exogenous variation.

The empirical applications of RK designs are still limited but increasing. Before the seminal work of Nielsen et al. (2010) introduced the term “regression kink design,” a few papers had already exploited a kink as a means of identification (Guryan 2001; Dahlberg et al. 2008). After Nielsen et al. (2010) (working paper 2008), Card et al. (2015) [and its previous versions, Card et al. (2009, 2012)] formally discussed nonparametric identification and estimation using an RK design. Other recent applications include Bravo (2011), Ek (2013), Turner (2014), Lundqvist et al. (2014), Simonsen et al. (2016), Landais (2015), and Engström et al. (2015).Footnote 2

Although an RK design seems to be a fruitful identification strategy which can be applied to various institutional settings, there is some concern about the applicability of RK designs to real-world finite samples. That is, although an RK design tries to capture a discontinuous change in slope at a kink point, it may be incapable of providing accurate estimation with a finite sample if a kinked distribution in an outcome variable against an assignment variable is significantly confounded by noise around the cutoff point. While a similar problem arises in RD designs, RK designs may have a more severe problem due to the intrinsic subtleness of the estimation of a “kink” when compared with a “jump.”

In particular, a confounding smooth nonlinear relationship, e.g., a quadratic or more complicated nonlinearity, between an assignment variable and an outcome variable around a cutoff point could be problematic in a real-world sample, because this smooth nonlinearity could be spuriously captured as a kink. This problem is pointed out and considered by Landais (2015), but has not been directly investigated in the RK literature.Footnote 3

Another recent important contribution to this topic is the study of Ganong and Jäger (2015), which proposes a permutation test for RK estimation, assuming that the assignment of a kink is random around a threshold. The authors’ method directly addresses the issue of confounding nonlinearity around the threshold, but mainly focuses on credible statistical tests for RK designs, not the issue of the degree of bias and imprecision caused by confounding nonlinearity.

Calonico et al. (2014b) (hereafter CCT) also examine a closely related issue. The authors propose robust confidence intervals for RD/RK estimation based on a bias-corrected RD/RK estimator and a corresponding standard error estimator. Their bias-correction technique, which primarily follows the standard approach in the nonparametric literature such as Fan and Gijbels (1996), can potentially reduce bias from confounding nonlinearity around the cutoff in RK designs. The performance of CCT’s procedures in RK designs, however, is still largely unknown because CCT focus on RD designs in their simulation study.Footnote 4

In this paper, therefore, I study how well an RK design can eliminate confounding smooth nonlinearity around the threshold and capture only the “kink” that is generated by a treatment variable. I first examine this problem with MC simulations and then apply an RK design to a real-world situation where I estimate the effect of a fiscal equalization grant on local expenditure with panel data from Japanese municipalities. As I discuss below, the MC simulations and the real-world application are strong complements, yielding useful and practical implications.

In my MC simulations, I use a data-generating process (DGP) that generates both an endogeneity problem and confounding smooth nonlinear relation between an assignment variable and an outcome variable. The DGP can also be interpreted as a stylized version of a Japanese fiscal equalization scheme, which I subsequently investigate. For RK estimation in each simulation, I adopt both a conventional estimator and CCT’s bias-corrected estimator.

My finding in the MC simulations is that a conventional linear RK estimate can be biased when there is a confounding nonlinearity around the kink point. This bias could be mitigated or eliminated by using a quadratic polynomial model and/or CCT’s bias-corrected estimator, but either procedure comes at a cost of reduced precision and power, and this cost is often prohibitive. The simulation results also show that the performance of RK estimation is significantly improved by controlling for observed covariates.

I then examine the plausibility of an RK design using Japanese local public finance panel data. Specifically, I investigate the effect of Japanese fiscal equalization grants on local expenditure by exploiting the kinked formula of the fiscal equalization grant.

The results of the real-world application are broadly consistent with those of the MC simulations in at least two respects. First, in both the MC simulations and the empirical application, RK estimation without covariates can be easily biased, and this problem can be mitigated by adding observed covariates to the regressors. Second, in both cases, a smaller bandwidth, a higher-order polynomial (even a quadratic one), and/or CCT’s bias-correction procedures tend to result in highly imprecise estimates.

Overall, my studies with MC simulations and the Japanese municipal data provide robust evidence that RK designs should be used with great caution. The MC simulations, which are based on somewhat arbitrary but known data-generating processes, indicate that RK estimation provides highly misleading results when confounding nonlinearity exists around the threshold. The empirical application, which is in turn based on actual data but with an unknown data-generating process, does not provide implications as conclusive as those of the MC simulations. It does, however, strongly suggest that the problems of bias and imprecision in RK estimation, which are found in the MC simulations, actually exist in real-world settings.

Finally, this paper also considers an informal way to check whether an RK estimation suffers from serious bias or imprecision. The idea is akin to a permutation test proposed by Ganong and Jäger (2015), but I use placebo RK estimates with varying placebo cutoffs not for a statistical test but as a graphical examination, as is done in several RD and RK studies.Footnote 5 Although this placebo RD or RK estimation is not new, I discuss how this procedure may be particularly useful in detecting confounding nonlinearity and examining the validity of an RK design.

The rest of the paper is organized as follows: In Sect. 2, I briefly explain an RK design and discuss potential problems in this approach. Section 3 presents MC simulations with a DGP that can also be interpreted as a stylized fiscal equalization scheme. Section 4 provides an empirical application of an RK design to a real-world situation by studying the effects of fiscal equalization grants on local expenditures in Japanese municipalities. In Sect. 5, I examine placebo RK estimation as a tool to detect bias in RK designs. Section 6 concludes.

2 An identification problem in RK designs

2.1 Estimation in RK designs

Consider the following constant-effect and additive model that is presented by Nielsen et al. (2010):Footnote 6

$$\begin{aligned} Y = \tau B + g(V) + \varepsilon , \end{aligned}$$
(1)

where \(B=b(V)\) is a deterministic and continuous function of V with a kink at \(v=0\), \(g(V)\) is an unrestricted function, and \(\varepsilon \) is an error term. They show that if \(g(\cdot )\) and \(E(\varepsilon |V = v)\) have derivatives that are continuous in v at \(v=0\), then the RK estimand \(\tau \) can be expressed as follows:

$$\begin{aligned} \tau = \frac{\left. \displaystyle \lim _{v_0 \rightarrow \ 0^+}\frac{\mathrm{{d}}E(Y|V=v)}{\mathrm{{d}}v}\right| _{v=v_0} - \left. \displaystyle \lim _{v_0 \rightarrow \ 0^-}\frac{\mathrm{{d}}E(Y|V=v)}{\mathrm{{d}}v}\right| _{v=v_0}}{\left. \displaystyle \lim _{v_0 \rightarrow \ 0^+}\frac{\mathrm{{d}}b(v)}{\mathrm{{d}}v}\right| _{v=v_0} - {\left. \displaystyle \lim _{v_0 \rightarrow \ 0^-}\frac{\mathrm{{d}}b(v)}{\mathrm{{d}}v}\right| _{v=v_0}}}. \end{aligned}$$
(2)

Intuitively speaking, the numerator of the RK estimand is the change in the slope of the conditional expectation function \(E(Y|V=v)\) at the kink point \((v=0)\) and the denominator is the change in the slope of the deterministic assignment function b(V) at the kink. As Card et al. (2015) discuss, one important feature of an RK design is that it allows for other less extreme forms of endogeneity if the density of the assignment variable is smooth and rules out deterministic sorting at the kink point.

For clarification, suppose that the treatment variable in Eq. (1) is linearly incremental: \(B=b(V)=\kappa V\) with \(\kappa >0\) if \(v>0\), and \(B=b(V)=0\) otherwise. Given that \(g(\cdot )\) is a smooth function that is differentiable in V at \(v=0\), the slope of Y, \(\mathrm{d}Y/\mathrm{d}V\), changes discontinuously at \(v=0\) from \(g'(0)\) to \(\tau \kappa +g'(0)\). Because the change in the slope of \(B=b(V)\) at \(v=0\) is \(\kappa \), the treatment effect \(\tau \) can be recovered as

$$\begin{aligned} \frac{g'(0)+\tau \kappa -g'(0)}{\kappa -0}=\frac{\tau \kappa }{\kappa }=\tau , \end{aligned}$$

using the RK estimand in (2).

The stylized features of an RK design in this setting can also be graphically described as in Fig. 1. Here the effect of B on Y at \(v=0\) is depicted as the ratio of the change in tangent from the line CD (at \(v\rightarrow 0^-\)) to the line C’D’ (at \(v\rightarrow 0^+\)) to the change in the slope of the treatment variable at \(v=0\).

Fig. 1 Stylized features of the regression kink design

For estimation with an RK design, previous studies often use a local polynomial model, which is analogous to a local polynomial model in an RD design (Lee and Lemieux 2010):

$$\begin{aligned} Y = \alpha _0 + \beta _0\cdot D + \sum _{p=1}^{\bar{p}}[\alpha _p v^p + \beta _p v^p\cdot D]+\varepsilon \quad \mathrm{{where}}\ |v|\le h, \end{aligned}$$
(3)

where \(\varepsilon \) is a usual random error term and D is a dummy variable which takes one when the assignment variable V exceeds the threshold \(v=0\) and otherwise takes zero. \(\bar{p}\) is the degree of a polynomial and h is the bandwidth that determines the window \([-h;+h]\) within which the sample is selected. Note that Eq. (3) has the term D, implying that this model allows discontinuity at the cutoff, but this does not affect asymptotic bias and variance of the RK estimator.Footnote 7

In this local polynomial regression, the numerator of Eq. (2) is estimated by the OLS estimator of \(\beta _1\). Thus, the RK estimator of \(\tau \) can be obtained by dividing the OLS estimator of \(\beta _1\) by the slope change of \(B=b(V)\) at the cutoff.
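As a minimal sketch of this estimation step, the following Python function fits Eq. (3) by OLS within the window \(|v|\le h\) and divides the estimate of \(\beta _1\) by the slope change of \(b(V)\) at the cutoff. The function name `rk_estimate`, its arguments, and the use of unweighted OLS (i.e., a uniform kernel) are illustrative assumptions rather than the code used for the results in this paper.

```python
import numpy as np

def rk_estimate(y, v, h, p=1, kink_slope_change=1.0):
    """Fit Eq. (3) by OLS on |v| <= h and return the RK estimate of tau."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    keep = np.abs(v) <= h                    # window [-h, +h]
    y, v = y[keep], v[keep]
    d = (v > 0).astype(float)                # D = 1 above the threshold v = 0
    # Regressors in the order: 1, D, v, v*D, ..., v^p, v^p*D
    cols = [np.ones_like(v), d]
    for k in range(1, p + 1):
        cols += [v**k, v**k * d]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta1_hat = coef[3]                      # coefficient on v*D
    return beta1_hat / kink_slope_change

# Noiseless check with hypothetical values tau = 1.5 and kappa = 2:
v = np.linspace(-1.0, 1.0, 201)
y = 2.0 + 0.5 * v + 1.5 * 2.0 * np.maximum(v, 0.0)
tau_hat = rk_estimate(y, v, h=1.0, p=1, kink_slope_change=2.0)
```

In this noiseless example \(g(\cdot )\) is linear, so the linear specification recovers \(\tau \) up to floating-point error.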

2.2 A potential problem in an RK design

Equation (2) holds true only at the cutoff point \(v=0\). In a real-world sample, however, we often need to include observations that are not very close to the threshold, and this inclusion of less relevant observations might lead to a biased estimate. For example, suppose \(g(\cdot )\) can be expressed as a smooth nonlinear (e.g., quadratic) function as depicted in Fig. 1. In this case, it could be difficult to separate out this confounding nonlinearity from a kink generated by the treatment with a finite sample. The resulting estimate of \(\beta _1\), hereafter \(\widehat{\beta }_1\), can thus be biased and systematically different from \(\tau \kappa \).

This is of particular concern if the first-order polynomial \(\bar{p}=1\) is used in Eq. (3) for RK estimation. To illustrate this problem, Fig. 2 replicates Fig. 1 with an arbitrary bandwidth \([-h;+h]\). The kinked line of the treatment variable is dropped for simplicity. In this graph, a kinked line EFG expresses fitted values with the piecewise linear model \(Y=\alpha _0+\alpha _1 V+\beta _1 V\cdot D+\varepsilon \) with bandwidth \([-h;+h]\), which is equivalent to Eq. (3) with \(\bar{p}=1\) and \(\beta _0=0\). In other words, the line EF is a linear fit for observations with \(v\le 0\) and the line FG is a linear fit for observations with \(v>0\), where the continuity of the two lines (kink) at \(v=0\) is imposed. Here, the estimated coefficient \(\widehat{\beta }_1\) is the difference between the slopes of the lines FG and EF, which is clearly different from \(\tau \kappa \).

Fig. 2 Bias in an RK estimate with a linear polynomial

The reason for the difference between \(\widehat{\beta }_1\) and \(\tau \kappa \) is intuitively quite straightforward. \(\widehat{\beta }_1\) is different from \(\tau \kappa \) because the linear fits EF and FG are not identical to the tangent lines CD and C’D’ when \(g(\cdot )\) is a nonlinear function around \(v=0\). This gap between \(\widehat{\beta }_1\) and \(\tau \kappa \) can be reduced by making the bandwidth \([-h;+h]\) smaller and goes to zero as \(h\rightarrow 0\). However, with a finite sample, the bandwidth may not be sufficiently narrowed to perfectly remove this discrepancy.

In this case, an RK design with a local linear regression results in a biased estimator of \(\tau \). In addition, even if a smaller bandwidth might reduce or even eliminate this bias, it comes at the cost of less precision due to a smaller sample size and less data variation around the cutoff point. An alternative solution is to use a global or local polynomial regression. For example, in Fig. 2, quadratic fits on both sides of the threshold seem to recover the slope change \(\tau \kappa \) at \(v=0\). However, this procedure may not always work well since RK estimation with a higher-order polynomial incurs the substantial cost of larger variance in the estimator.Footnote 8
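The bias described above can be made concrete with a small noiseless example. The sketch below assumes \(g(v)=\gamma v^2\), \(b(v)=\kappa \max (v,0)\), and evenly spaced values of V on each side of the cutoff. The value of \(\gamma \), the grids, and the function name are illustrative assumptions. It fits separate polynomials on each side, as in Eq. (3) with \(\beta _0\) unrestricted, which differs slightly from the continuity-imposed fit drawn in Fig. 2. Under these assumptions, the linear fits give a slope change of \(\tau \kappa +2\gamma h\), so the bias shrinks only linearly in h, whereas the quadratic fits recover \(\tau \kappa \).

```python
import numpy as np

def slope_change_at_cutoff(h, degree, tau=1.0, kappa=1.0, gamma=1.0, n=501):
    """Slope change at v = 0 from separate polynomial fits on [-h, 0] and [0, h]."""
    def outcome(v):
        # Noiseless Eq. (1) with b(v) = kappa * max(v, 0) and g(v) = gamma * v^2
        return tau * kappa * np.maximum(v, 0.0) + gamma * v**2

    left = np.linspace(-h, 0.0, n)
    right = np.linspace(0.0, h, n)
    # np.polyfit returns coefficients from the highest degree down, so the
    # second-to-last entry is the linear term, i.e., the fitted slope at v = 0.
    slope_left = np.polyfit(left, outcome(left), degree)[-2]
    slope_right = np.polyfit(right, outcome(right), degree)[-2]
    return slope_right - slope_left

for h in (2.0, 1.0, 0.5):
    linear = slope_change_at_cutoff(h, degree=1)
    quadratic = slope_change_at_cutoff(h, degree=2)
    print(f"h={h}: linear fit {linear:.3f}, quadratic fit {quadratic:.3f}")
```

With noise added, the quadratic fits would remain unbiased in this setting but would be estimated much less precisely, which is the trade-off discussed in the text.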

In sum, although RK designs are as explicit and straightforward as RD designs, there is some concern about their applicability to real-world finite samples. In the following sections, I discuss the potential defects of an RK design with a finite sample using MC simulations and real-world empirical data.

3 Monte Carlo simulations

3.1 Stylized fiscal equalization scheme

In this section, I implement Monte Carlo (MC) simulations in order to examine the performance of RK estimators with a finite sample and in the presence of a confounding smooth nonlinear relation around the cutoff point. Because the empirical application I examine in the next section is the effects of Japanese fiscal equalization grants on local expenditure, I also set up a DGP based on a stylized fiscal equalization scheme in this section.

Many countries have fiscal equalization schemes that are meant to equalize or alleviate fiscal disparities among local governments through intergovernmental transfers. Although fiscal equalization schemes differ considerably across countries, it can be argued that there are in general two important components in fiscal equalization transfers from an upper-level government to a lower-level government. The first is the equalization of fiscal revenue capacity (revenue equalization), and the second is the equalization of expenditure needs (needs equalization).Footnote 9 According to Dafflon (2007), the fiscal equalization approach that takes into account both revenue equalization and needs equalization is referred to as “need-capacity gap” equalization. Because Japan has a unified fiscal equalization scheme which takes into account both revenue and needs equalization, the Japanese fiscal equalization scheme can be categorized as a need-capacity gap equalization scheme.

Suppose a stylized scheme of “need-capacity gap” equalization where fiscal equalization general (unconditional) grants are distributed to individual local bodies (hereafter municipalities) based on the following kinked assignment rule:

$$\begin{aligned} \mathrm{GRANT}_i= \left\{ \begin{array}{l} 0 \ \ \ \ \ \ \ \text {if}\ \ V_i\le 0\\ V_i \ \ \ \ \ \text {if}\ \ V_i>0, \\ \end{array} \right. \end{aligned}$$

where \(\mathrm{GRANT}_i\) is the amount of the fiscal equalization grant for municipality i. \(V_i\) is the “need-capacity gap” of the municipality and is defined as follows:

$$\begin{aligned} V_i = \mathrm{NEED}_i - \mathrm{CAP}_i, \end{aligned}$$

where \(\mathrm{NEED}_i\) is the expenditure need, which indicates the total cost of the standard levels of local public services for municipality i. \(\mathrm{CAP}_i\) is the revenue capacity of municipality i which represents the amount of its own tax revenues that municipality i can collect under the standard local tax system.

In short, under this fiscal equalization scheme, grants ensure that all municipalities can provide a standard level of local public services by filling the need-capacity gap in cases where \(\mathrm{NEED}_i\) outweighs \(\mathrm{CAP}_i\). On the other hand, if the need-capacity gap is negative, a municipality is considered wealthy enough to cover its expenditure needs on its own and no grant is provided. I incorporate this stylized fiscal equalization scheme with a kinked grant assignment rule into my MC simulations.
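The kinked assignment rule can be written in one line. The following sketch is illustrative only; the function name `grant` and the example values are assumptions.

```python
import numpy as np

def grant(need, cap):
    """Stylized rule: GRANT = V if V > 0 and 0 otherwise, where V = NEED - CAP."""
    gap = np.asarray(need, dtype=float) - np.asarray(cap, dtype=float)
    return np.maximum(gap, 0.0)

# Three hypothetical municipalities with need-capacity gaps of -1, 0, and 1.5
need = np.array([4.0, 5.0, 6.5])
cap = np.array([5.0, 5.0, 5.0])
grants = grant(need, cap)   # grants of 0, 0, and 1.5
```

The grant is continuous in V, but its slope with respect to V changes from zero to one at \(V=0\), which is the kink exploited in what follows.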

3.2 Data-generating process

One primary reason an RK design is required in the analysis of grant effects on local expenditure is that endogeneity results in a biased estimate if expenditure, hereafter EXP, is regressed on GRANT with a simple OLS.Footnote 10 In this subsection, I construct a DGP which has the following three properties in order to investigate the performance of an RK design: (I) an endogeneity problem when simply regressing EXP on GRANT, (II) a deterministic assignment rule \(\mathrm{GRANT}=b(V)\) with a kink at \(v=0\), and (III) confounding nonlinearity between V and EXP through observed and unobserved covariates.

It is not difficult to include this setting in a DGP for MC simulations. First, \(\mathrm{NEED}_i\) is defined as follows:

$$\begin{aligned} \mathrm{NEED}_i = \mathrm{NEED}^0+ X_i + U_i + \psi _i, \end{aligned}$$

where \(\mathrm{NEED}^0\) indicates a constant basic expenditure need, \(X_i\) is an observed covariate, \(U_i\) is an unobserved covariate, and \(\psi _i\) is a random component. The coefficients of \(X_i\) and \(U_i\) are set as one for simplicity. Second, \(\mathrm{CAP}_i\) is determined as:

$$\begin{aligned} \mathrm{CAP}_i = \mathrm{CAP}^0 + \zeta _i, \end{aligned}$$

where \(\mathrm{CAP}^0\) is a constant basic revenue capacity and \(\zeta _i\) is a random term. For simplicity, I assume that \(\mathrm{CAP}_i\) is not affected by \(X_i\) and \(U_i\).

Finally, the outcome variable \(\mathrm{EXP}_i\) is set to be affected by \(\mathrm{CAP}_i\), \(X_i\), and \(U_i\). One straightforward interpretation of the effect of \(\mathrm{CAP}_i\) on \(\mathrm{EXP}_i\) is that expenditure should be primarily determined by tax revenue capacity. Thus, I set the overall DGP (hereafter DGP 1) as follows:

$$\begin{aligned} X_i&= x_i \end{aligned}$$
(4)
$$\begin{aligned} U_i&= u_i \end{aligned}$$
(5)
$$\begin{aligned} \mathrm{NEED}_i&= \mathrm{NEED}^0+ X_i + U_i + \psi _i \end{aligned}$$
(6)
$$\begin{aligned} \mathrm{CAP}_i&= \mathrm{CAP}^0 + \zeta _i \end{aligned}$$
(7)
$$\begin{aligned} V_i&= \mathrm{NEED}_i - \mathrm{CAP}_i \end{aligned}$$
(8)
$$\begin{aligned} \mathrm{GRANT}_i&= \left\{ \begin{array}{l} 0 \ \ \ \ \ \ \text {if}\ \ V_i\le 0 \\ V_i \ \ \ \ \ \text {if}\ \ V_i>0 \\ \end{array} \right. \end{aligned}$$
(9)
$$\begin{aligned} \mathrm{EXP}_i&= \tau \mathrm{GRANT}_i + \mathrm{CAP}_i + \mathrm{CAP}^2_i + X_i + X^2_i \nonumber \\&\quad + U_i + U^2_i + \omega _i, \end{aligned}$$
(10)

where \(x_i,u_i,\psi _i,\zeta _i\) and \(\omega _i\) are all \(\mathrm{NID}(0,1)\). In (10) I include the terms \(X^2\) and \(U^2\) so that X and U have linear effects on V through NEED, but nonlinear effects on EXP. In addition, I also add the term \(\mathrm{CAP}^2\) to this equation, which implies that an increase in CAP nonlinearly boosts EXP. As a result, there is a nonlinear smooth correlation between V and EXP through CAP, X, and U. The coefficients of \(\mathrm{CAP},\mathrm{CAP}^2,X,X^2,U\) and \(U^2\) are set as one.Footnote 11 Finally, \(\tau \) is the homogeneous treatment effect of GRANT on EXP.

In this DGP, the three requirements listed above are fulfilled because the effect of GRANT on EXP cannot be estimated with a simple regression due to an omitted variable bias from unobserved U (I), GRANT is nonstochastically determined by V with a kink at \(v=0\) (II), and there is a confounding nonlinear relation between V and EXP through observed covariates CAP and X and an unobserved covariate U (III).
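For concreteness, DGP 1 can be simulated along the following lines. This is a minimal sketch, not the code used for the results reported below: the function name, the returned dictionary, and the random seed are assumptions, and the default values of n, \(\tau \), \(\mathrm{NEED}^0\), and \(\mathrm{CAP}^0\) follow the settings given in Sect. 3.3.

```python
import numpy as np

def simulate_dgp1(n=10_000, tau=1.0, need0=5.0, cap0=5.0, seed=0):
    """Draw one sample from DGP 1, Eqs. (4)-(10)."""
    rng = np.random.default_rng(seed)
    # x, u, psi, zeta, omega are independent standard normal draws
    x, u, psi, zeta, omega = rng.standard_normal((5, n))
    need = need0 + x + u + psi                   # Eq. (6)
    cap = cap0 + zeta                            # Eq. (7)
    v = need - cap                               # Eq. (8)
    grant = np.maximum(v, 0.0)                   # Eq. (9): kink at v = 0
    exp_ = (tau * grant + cap + cap**2
            + x + x**2 + u + u**2 + omega)       # Eq. (10)
    return {"V": v, "GRANT": grant, "EXP": exp_, "CAP": cap, "X": x, "U": u}

data = simulate_dgp1()
```

In an estimation exercise that mimics the setting above, `U` would be treated as unobserved and dropped before estimation.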

Figure 3 shows a causal diagram for (4)–(10). In this DGP, simple regression with observed covariates cannot properly identify \(\tau \), because a confounding effect from U cannot be controlled for. The treatment variable GRANT, however, deterministically depends on V with the kinked formula (9), and I can thus exploit an RK design by using V as an assignment variable and \(v=0\) as a threshold.

Fig. 3 Causal diagram of the stylized fiscal equalization scheme. Note: A solid arrow shows causality and an outlined arrow represents a deterministic relation

I also conduct another MC simulation with a different DGP. In this second DGP (DGP 2), I use the following equations instead of (6) and (10), while the other parts of the DGP are the same as DGP 1:

$$\begin{aligned} \mathrm{NEED}_i = \mathrm{NEED}^0+ X_i + X_i^2 + U_i + U_i^2 + \psi _i, \end{aligned}$$
(6')

In these models, X and U have nonlinear quadratic effects on V, but linear one-to-one effects on EXP. In this case, the confounding nonlinear relation between V and EXP is different from a straightforward quadratic U-shape relation as in DGP 1. That is, in DGP 2, a unit increase in X or U leads to a direct linear increase in EXP and quadratic increase in V, not vice versa. Therefore, the resulting distribution of g(V) in Eq. (1) can be more complicated in DGP 2 than in DGP 1, which has a simple U-shape confounding nonlinearity. One important thing to note, however, is that there is still no kinked relation between V and EXP other than through GRANT, implying that the condition that \(g(\cdot )\) has a derivative that is continuous at the cutoff is valid.

3.3 Simulation settings

When it comes to other MC simulation settings, I first impose a one-to-one constant treatment effect by setting \(\tau \) as one in (10) and (10’). Second, \(\mathrm{NEED}^0\) and \(\mathrm{CAP}^0\) are set at 5. Third, the sample size is set to 10,000, which is close to that of the empirical application I discuss later. Finally, the number of simulations conducted is 1,000.
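Under these settings, the confounding relation in DGP 1 can be worked out explicitly. The following derivation is an illustrative sketch that relies only on the joint normality and independence of the shocks in (4)–(10) and on \(\mathrm{NEED}^0=\mathrm{CAP}^0=5\). Because \(V_i = X_i + U_i + \psi _i - \zeta _i\) is distributed as \(N(0,4)\) and has a covariance of one with \(X_i\) and \(U_i\) and of minus one with \(\zeta _i\),

$$\begin{aligned} E(X_i|V_i)&= E(U_i|V_i) = -E(\zeta _i|V_i) = \frac{V_i}{4}, \\ E(X_i^2|V_i)&= E(U_i^2|V_i) = E(\zeta _i^2|V_i) = \frac{3}{4} + \frac{V_i^2}{16}. \end{aligned}$$

Substituting these expressions and \(\mathrm{CAP}_i = 5 + \zeta _i\) into (10) gives

$$\begin{aligned} E(\mathrm{EXP}_i|V_i) = \tau \mathrm{GRANT}_i + \frac{129}{4} - \frac{9}{4}V_i + \frac{3}{16}V_i^2. \end{aligned}$$

In this sketch, the confounding term is therefore exactly quadratic in V: a linear specification leaves the curvature term \(\frac{3}{16}V^2\) unmodeled, whereas a quadratic specification matches the functional form.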

In simulation analysis based on the above DGPs, \(\tau \) in (10) or (10’) is estimated with an RK design using Eq. (3). Because Eq. (9) shows that the slope of \(\mathrm{GRANT}=b(V)\) changes from zero to one at the threshold, the denominator of the RK estimand (2) is one. The point estimate of \(\beta _1\) with the RK regression model (3) can therefore be interpreted as the causal parameter of interest. Equation (10) or (10’) is estimated using a linear or quadratic polynomial regression and with CCT’s procedures.

In CCT’s procedures, I use both regularized and nonregularized bandwidth selectors, following Card et al. (2015). Although Imbens and Kalyanaraman (2012), who propose a seminal optimal bandwidth selector, and CCT argue that “regularization” is useful to avoid very large bandwidths, Card et al. (2015) point out that bandwidths based on these regularized selectors tend to be too small in their empirical analysis with an RK design. I therefore present RK estimation results with and without the regularization term in CCT’s bandwidth selector. I also provide estimation results with CCT’s optimal bandwidth but without CCT’s bias-correction procedure.

When it comes to the dependent variable, I use not only total expenditure itself but the residuals obtained by regressing total expenditure on observed covariates \(\mathrm{CAP},\mathrm{CAP}^2,X\), and \(X^2\) using the entire sample. The latter estimation enables me to consider how the performance of RK estimation changes if I directly control for observable confounding nonlinearity. The unobserved covariate U, which generates both endogeneity and confounding nonlinearity, is not controlled for in any of the cases.
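The residualization step can be sketched as follows. The helper `residualize`, the random seed, and the inline simulation of DGP 1 are illustrative assumptions. As in the text, the regression uses the entire sample and the unobserved U is not included.

```python
import numpy as np

def residualize(y, covariates):
    """Residuals from an OLS regression of y on a constant and the covariates."""
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones_like(y)]
                        + [np.asarray(c, dtype=float) for c in covariates])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef

# Simulated data from DGP 1 with tau = 1 and NEED^0 = CAP^0 = 5
rng = np.random.default_rng(0)
n = 10_000
x, u, psi, zeta, omega = rng.standard_normal((5, n))
cap = 5.0 + zeta
v = (5.0 + x + u + psi) - cap
exp_ = np.maximum(v, 0.0) + cap + cap**2 + x + x**2 + u + u**2 + omega

# Partial out the observed covariates; the residuals then replace EXP
# as the dependent variable in the RK regression on V.
exp_resid = residualize(exp_, [cap, cap**2, x, x**2])
```

Because the first-stage regression does not include GRANT, part of the grant effect that is correlated with the covariates may also be absorbed by this step; the sketch simply mirrors the two-step procedure described above.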

I present the rejection rates of the null hypotheses of \(\beta _1=1\) (size of test) and \(\beta _1=0\) (power of test), as well as the means and standard deviations of \(\widehat{\beta _1}\). Test levels are set at a conventional 5 percent, and I use both conventional and CCT’s bias-corrected estimators and counterpart standard errors.
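The reported statistics can be computed with a loop of the following form. This is a simplified sketch under stated assumptions: it uses only the global (entire-sample) fits of Eq. (3) for DGP 1, heteroskedasticity-robust (HC0) standard errors, and a normal critical value, and it does not implement CCT’s bandwidth selectors or bias correction. All function names are hypothetical.

```python
import numpy as np

def simulate_dgp1(n, tau, rng):
    """One draw from DGP 1, Eqs. (4)-(10), with NEED^0 = CAP^0 = 5."""
    x, u, psi, zeta, omega = rng.standard_normal((5, n))
    cap = 5.0 + zeta
    v = (5.0 + x + u + psi) - cap
    grant = np.maximum(v, 0.0)
    exp_ = tau * grant + cap + cap**2 + x + x**2 + u + u**2 + omega
    return v, exp_

def rk_fit(y, v, p):
    """Global fit of Eq. (3); returns the estimate of beta_1 and its HC0 standard error."""
    d = (v > 0).astype(float)
    cols = [np.ones_like(v), d]
    for k in range(1, p + 1):
        cols += [v**k, v**k * d]
    X = np.column_stack(cols)
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ (X.T @ y)
    resid = y - X @ coef
    meat = X.T @ (X * (resid**2)[:, None])     # sum of e_i^2 * x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv
    return coef[3], np.sqrt(cov[3, 3])         # index 3 is the v*D term

def monte_carlo(reps=1000, n=10_000, tau=1.0, p=1, seed=0):
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    se = np.empty(reps)
    for r in range(reps):
        v, y = simulate_dgp1(n, tau, rng)
        est[r], se[r] = rk_fit(y, v, p)
    crit = 1.96  # two-sided 5 percent level, normal approximation
    return {
        "mean": est.mean(),
        "sd": est.std(ddof=1),
        "reject_beta1_eq_1": np.mean(np.abs(est - 1.0) / se > crit),  # size
        "reject_beta1_eq_0": np.mean(np.abs(est) / se > crit),        # power
    }

linear = monte_carlo(reps=200, p=1)
quadratic = monte_carlo(reps=200, p=2)
```

Because the slope of GRANT changes from zero to one at the threshold, the estimate of \(\beta _1\) is itself the RK estimate of \(\tau \) here, so no further scaling is applied.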

Before presenting the results of the MC simulations, I show in Fig. 4 an example of the plots and the linear fits of EXP and GRANT against V with simulated data based on DGP 1 and DGP 2. The figure shows that in both DGPs the slope changes of the linear fits are greater than the slope changes of GRANT at the kink point, implying bias in an RK estimate with a simple linear polynomial regression.

Fig. 4 Plots and linear fits with the stylized fiscal equalization. Note: Optimal data-driven plots are generated with the default setting of rdplot command using Stata 13, which is developed in Calonico et al. (2015)

Table 1 Monte Carlo simulations with DGP 1

3.4 Results

The results of MC simulations for DGP 1 are presented in Table 1. First, row 1 shows that RK estimates using a linear polynomial model are severely biased (i.e., far from one) when the entire sample is used for estimation, but RK estimates using a quadratic polynomial model with the same sample exhibit no bias. Rows 2 and 4 suggest that the estimation bias in a linear polynomial regression remains if CCT’s optimal bandwidth is adopted but the bias-correction procedure is not incorporated. On the other hand, the means of estimates are close to one and test sizes are around 5 percent when the bias-corrected estimator is used (rows 3 and 5, column I). An important problem, however, is that test power becomes very low if CCT’s optimal bandwidth is used. This tendency is stronger when a quadratic polynomial is used for estimation, the regularization term is introduced in CCT’s procedures, and/or bias-correction and robust estimation are incorporated (rows 2–5). Increased imprecision in RK estimation by these measures is theoretically expected, and the MC results suggest that this problem can indeed be very serious in practice.

Second, rows 6–10 present the same simulation results but with a different outcome variable, that is, the residuals obtained by regressing expenditure on the observed covariates \(X,\mathrm{CAP}\) and their quadratic terms. Row 6 shows that RK estimation with a linear polynomial and the whole sample still generates biased estimates, but a quadratic polynomial regression with the same sample works very well as before. Rows 7 and 9 suggest that conventional (not bias-corrected) estimates with CCT’s optimal bandwidth but without regularization are still modestly biased when a linear polynomial is used, but counterpart quadratic polynomial regressions perform relatively well with much higher power than the “no covariates” case. CCT’s bias-corrected estimator and robust inference in rows 8 and 10 also work fine: The test size is close to 5 percent and the test power is significantly improved compared with the “no covariates” case.

Next, the results of MC simulations for DGP 2 are provided in Table 2. Overall implications from these results are similar to those based on DGP 1. First, RK estimates with the entire sample are severely biased in the “no covariates” case (row 1). This bias is mitigated or even eliminated when CCT’s optimal bandwidth and bias-correction are adopted, but at the cost of much lower test power (rows 2–5). This trade-off between bias and variance, however, almost disappears when RK estimation is implemented after conditioning on covariates (rows 6–10).

When it comes to differences between RK estimates from DGP 1 and DGP 2, quadratic RK estimation without covariates seems to work more poorly in DGP 2 with both global and CCT’s bandwidths (rows 1–5, column II). This may imply that the ability of the quadratic polynomial to control for confounding nonlinearity depends on how the nonlinearity is generated. In the specific examples above, simple U-shape confounding nonlinearity in DGP 1 seems to be more easily controlled for by a quadratic polynomial.

Table 2 Monte Carlo simulations with DGP 2

Table 5 in Appendix “MC Simulation results for OLS estimation” provides additional simulation results with which I show OLS estimates are “precisely biased.” The online appendices of Ando (2013), the working paper version of this paper, also show that MC simulations with higher-order (third- and fourth-order) polynomials result in seriously imprecise estimates, and a larger sample leads to more precision and larger power. Ando (2013) also presents the results of MC simulations with a simpler DGP where the fiscal equalization scheme is not incorporated into the models.

Overall, it is hard to fully demonstrate the validity of RK estimation at least in these specific DGPs when we do not include any relevant covariates in the regressors. The simulation results suggest that RK estimation without covariates may generate biased estimates, particularly when a linear polynomial regression is used, because its functional form does not take into account a confounding nonlinear relation between an assignment variable and an outcome variable. This bias could be mitigated or even removed by using CCT’s optimal bandwidth and bias-correction, but the resulting highly imprecise and often statistically insignificant estimates make it difficult to obtain a robust conclusion about the causal effect of interest. The inclusion of covariates may significantly improve RK estimations.

Despite these problems that may undermine the internal validity of estimation as well as inherently limited external validity as a local “treatment on the treated” parameter, it can be argued that RK designs still have some advantages over standard OLS regression as a method of causal inference. First, RK designs can potentially avoid the endogeneity bias that stems from many unobserved or unrecognized correlations by exploiting an explicit source of identification and somewhat testable identifying assumptions. Second, the availability of robustness checks using different model specifications and bandwidth choices is also an attractive feature. In the following section, I examine the usefulness of RK designs by using a sample of Japanese municipality panel data and exploiting a kinked formula in the Japanese fiscal equalization scheme.

4 An empirical application

4.1 Institutional setting

In this section, I apply an RK design to a real-world situation. I exploit the kinked formula of Japanese fiscal equalization grants to identify the effect of these grants on local expenditure.

As mentioned in the last section, the Japanese fiscal equalization scheme allocates general grants to local governments (prefectures and municipalities) in order to compensate for the “need-capacity gap” of each local government and ensure a certain standard of local services for all citizens. This fiscal equalization grant is called a Local Allocation Tax (LAT) grant.Footnote 12 The detailed allocation mechanism of LAT grants is fairly complicated, but the basic framework can be explained as follows.

First, the national-level total amount of LAT grants is determined based on the amount of central tax revenues and political and bureaucratic processes in the central government.Footnote 13 Second, the LAT grant is distributed to individual local bodies based on the same formulas (8) and (9) in Sect. 3.1. In the Japanese fiscal equalization scheme, NEED is officially referred to as “Standard Fiscal Need” and is calculated annually by the central government. CAP is officially referred to as “Standard Fiscal Revenue” and is also calculated annually by the central government.Footnote 14

The assumed DGP under this scheme is similar to the DGPs in the MC simulations. That is, I expect a smooth nonlinear relationship between the need-capacity gap (V) and local expenditure (EXP) because various covariates should affect both need-capacity gap and local expenditure in various, often nonlinear, ways (see also footnote 11). One difference from the DGPs in the MC simulations is that observed and unobserved variables (X and U in Fig. 3) should affect both NEED and CAP, whereas the DGPs in the MC simulations are constructed such that these variables affect only NEED. This difference, however, does not affect my empirical strategy of employing an RK design that exploits the deterministic kink in the assignment variable V.

Because the objective of this paper is to investigate the performance of an RK design with a confounding nonlinearity, it is preferable to use a clean dataset that does not contain unnecessary noise and to establish some expected bounds of the treatment effect. From this standpoint, there are at least two advantages in studying the kinked assignment of Japanese fiscal equalization grants. First, the problem of potential endogenous sorting can be almost completely ignored because my assignment variable is an indicator, which is calculated by the central government and mostly unmanipulable by local governments. Second, because it is a well-known fact that Japanese municipalities have relatively homogeneous revenue systems regardless of the relative size of their respective LAT grants, it is expected that grant effects on total expenditure will not be much different from one.Footnote 15

4.2 Data and preliminary investigation

In estimations, I use the panel data for cities (shi) covering fiscal years 1980–1999.Footnote 16 I exclude from the sample the cities which experienced amalgamation during the sample period because merged municipalities follow a special fiscal equalization scheme, but a large part of the cities remain in the sample. All of the fiscal data are from Reports on the Municipal Public Finance (Shichoson-betsu Kessan Jokyo Shirabe), which are published annually by the Ministry of Internal Affairs and Communications (MIC). When it comes to observed predetermined covariates, I use revenue capacityFootnote 17, population, population density, population ratios of the elderly cohort and the young cohort, and the sectoral ratios of employment. All the covariates, except for revenue capacity, are from Census data. Because Census data are only available for every fifth year, I impute annual data by linear interpolation. For additional details concerning data construction, see Appendix “Description of data arrangement.”
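The linear interpolation of Census covariates can be sketched as follows. This is a minimal illustration only: the census years, the population figures, and the helper name `interpolate_annual` are hypothetical and are not taken from the dataset used in this paper.

```python
import numpy as np

# Hypothetical census observations (every fifth year) for a single city.
census_years = np.array([1980, 1985, 1990])
census_population = np.array([100_000.0, 110_000.0, 105_000.0])


def interpolate_annual(years, known_years, known_values):
    """Linearly interpolate a covariate between census years."""
    return np.interp(years, known_years, known_values)


# Annual series for fiscal years 1980-1990 (illustrative range only).
annual_years = np.arange(1980, 1991)
annual_population = interpolate_annual(annual_years, census_years, census_population)
```

Between two census years the imputed value moves in equal annual steps, so any within-period fluctuation in the true covariate is smoothed away by construction.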

Table 3 shows the summary statistics of the variables that I use for the empirical application in this section. All the fiscal variables are expressed as per capita values and deflated by the Consumer Price Index (CPI; the reference year is 2005) published by MIC. On average, the size of the LAT grant is about 16 % of total expenditure. The sum of revenue capacity and the LAT grant is smaller than total expenditure because there are other important fiscal revenues, including conditional grants from the central and prefectural governments and debt financing.

Table 3 Summary statistics

Fig. 5 Total expenditure and LAT grant against need-capacity gap. Notes: The optimal data-driven plot is generated by the default setting of the rdplot command in Stata 13, which is developed in Calonico et al. (2015). Linear fits of expenditure per capita are obtained by RK estimation with a linear polynomial model that imposes continuity at the cutoff. Sources: Reports on the Municipal Public Finance, Census, and CPI

Fig. 6 Total expenditure against need-capacity gap (bandwidth \(|V|<50\)). Notes: Optimal data-driven plots are generated by the default setting of the rdplot command in Stata 13, which is developed in Calonico et al. (2015). Quadratic fits are based on the quadratic RK model that imposes continuity at the cutoff. The residual variable in the right graph is obtained by regressing total expenditure on all covariates listed in Table 3, their quadratic terms, and year dummies using the whole sample

Before proceeding to econometric analysis, I conduct several preliminary analyses in order to examine the validity of my identification strategy. First, Fig. 5 plots the LAT grant and total expenditure against the need-capacity gap for municipalities (only cities). This graph indicates that the LAT grant has a clear deterministic kink at the threshold and the size of the grant is not negligible for many LAT-receiving municipalities. The linear fits of total expenditure based on RK estimation with a first-order polynomial show that the size of an estimated kink decreases when the bandwidth is shifted from \(|V_{i}|<50\) to \(|V_{i}|<20\), implying the existence of some confounding nonlinear relation between the assignment variable and expenditure per capita.
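Why a shrinking kink estimate points to confounding nonlinearity can be illustrated with a small numerical sketch. The code below assumes a purely hypothetical outcome \(y = 0.01\,v^2\), which is smooth and has no kink at all, and fits a linear RK model that imposes continuity at the cutoff. The function name `linear_rk_kink` and all numbers are illustrative assumptions, not the estimation code behind Fig. 5.

```python
import numpy as np


def linear_rk_kink(v, y, h, cutoff=0.0):
    """Slope change at the cutoff from a local linear fit with continuity imposed.

    Fits y = a0 + a1 * x + b1 * x * D on observations with |x| < h,
    where x = v - cutoff and D = 1 if x > 0. Returns the estimate of b1.
    """
    x = v - cutoff
    keep = np.abs(x) < h
    x_w, y_w = x[keep], y[keep]
    d = (x_w > 0).astype(float)
    design = np.column_stack([np.ones_like(x_w), x_w, x_w * d])
    coef, *_ = np.linalg.lstsq(design, y_w, rcond=None)
    return coef[2]


# Hypothetical smooth outcome with curvature but no kink at zero.
v = np.linspace(-50.0, 50.0, 2001)
y = 0.01 * v**2

kink_wide = linear_rk_kink(v, y, h=50)    # wider bandwidth
kink_narrow = linear_rk_kink(v, y, h=20)  # narrower bandwidth
```

In this stylized case the spurious "kink" is roughly proportional to the bandwidth (about \(2 \times 0.01 \times h\) on an evenly spaced grid), so it shrinks as the window narrows even though the true kink is zero.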

Second, Fig. 6 shows the plots and quadratic fits of total expenditure (my outcome variable) and its residual against the assignment variable with the bandwidth \(|V_i|<50\). The residual variable in the graph on the right is obtained by regressing total expenditure on all covariates, their quadratic terms, and year dummies using the whole sample. This graph shows that the observed covariates effectively eliminate some nonlinearity within the bandwidth. Although a quadratic fit appears to control for this nonlinearity in the graph on the left, these two graphs also suggest that RK estimation without covariates may suffer from bias due to confounding nonlinearity, particularly when a linear RK model is used for estimation.
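The residualization step described above can be sketched as an ordinary least squares regression on the covariates, their squares, and year dummies. This is a minimal sketch under stated assumptions: the function name `residualize`, the simulated data, and the choice to drop the first year as the reference category are illustrative and do not reproduce the actual estimation code.

```python
import numpy as np


def residualize(y, covariates, years):
    """Residuals of y after regressing on covariates, their squares, and year dummies."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(covariates, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    years = np.asarray(years)
    # One dummy per year, dropping the first year as the reference category.
    levels = np.unique(years)
    dummies = (years[:, None] == levels[None, 1:]).astype(float)
    design = np.column_stack([np.ones(len(y)), x, x**2, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Hypothetical data: the outcome is fully explained by one covariate,
# its square, and a year effect, so the residuals should vanish.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=200)
years_demo = rng.integers(1980, 1985, size=200)
y_demo = 1.0 + 2.0 * x_demo - 0.3 * x_demo**2 + 0.1 * (years_demo - 1980)
resid_demo = residualize(y_demo, x_demo, years_demo)
```

The residual series then replaces the raw outcome in the RK plots and regressions, which is what the right-hand graph in Fig. 6 displays.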

Third, a key identifying assumption for a valid RK design is that the density of the assignment variable is continuously differentiable at the threshold. Since the LAT grant is calculated by centrally determined uniform formulas, there is little possibility that municipalities or the central government can precisely manipulate the need-capacity gap around the threshold. It may be suspected, however, that some institutional settings or unknown factors systematically affect the determination of whether or not a given municipality near the threshold becomes an LAT-grant receiver. I therefore conduct a density test analogous to that proposed by McCrary (2008) for the RD design. Both estimation results and graphical analysis indicate that the density of the need-capacity gap is smooth at the threshold. These results are given in Appendix “Smooth density of the assignment variable.”

Finally, according to Card et al. (2015), an important implication under the required conditions for a valid RK design is that any predetermined covariate should have a conditional distribution which evolves smoothly around the threshold. In other words, there should be no kink at the threshold for any predetermined covariate against the assignment variable.

However, the argument presented in Sect. 2 implies that a smooth nonlinear relation between a covariate and an assignment variable around the kink point could be mistakenly captured as a kink using RK estimation. It may thus be hard to assert that there are no kinks whatsoever at the threshold for any covariate. The plots and quadratic fits of covariates against the assignment variable within a certain bandwidth, however, at least indicate that no such kinks are visually apparent in the graphical representation of the data except for the covariate of population. Graphs are provided in Fig. 11 in Appendix “Distributions of covariates against V.”

4.3 Results

Table 4 presents the RK estimates generated by applying linear and quadratic polynomials to the empirical model (3). I provide RK estimates with varying bandwidths and CCT’s procedures. As in the MC simulations, I provide estimation results with and without regularization and bias correction when I adopt CCT’s procedures.

Table 4 RK estimates for total expenditure

First, I examine RK estimation without covariates (rows 1–8). When a linear polynomial is used (column I), estimates are around two when relatively narrow bandwidths (\(|V|<20, 30, \text {or}\ 40\)) are used and then become small and statistically insignificant when the bandwidth is \(|V|<10\), probably because of imprecision. When it comes to RK estimation with CCT’s procedures, estimates are significantly different from zero when the regularization term is not introduced (rows 5–6), and the size of the estimate is 1.37 if the bias-correction procedure is applied (row 6). On the other hand, CCT’s optimal bandwidth with the regularization term makes RK estimates not statistically significant, possibly due to a very narrow bandwidth of 10.42 (rows 7–8). Finally, when I adopt a quadratic polynomial (rows 1–8 in column II), RK estimates without covariates are never significantly different from zero, possibly due to imprecise quadratic RK estimation.

Second, in RK estimation with covariates (rows 9–16), linear RK estimates (column I) are around 0.6 to 1.1 and statistically significant in rows 9–11 and 13–14. These estimates are considerably smaller than their counterpart estimates in the “no covariates” case. This is suggestive evidence that RK estimates without conditioning on covariates suffer from biases from confounding nonlinearity around the kink point, and the introduction of covariates in RK estimation alleviates this problem. Linear RK estimates with narrower bandwidths (\(|V|< 10\) in row 12 and \(|V|<11.86\) in rows 15–16) are again not significantly different from zero, probably because of imprecision. Finally, quadratic RK estimates (rows 9–16 in column II) are mostly not significantly different from zero as in the “no covariates” case, but rows 9, 10, and 13 show that estimates with larger bandwidths (\(|V|<30, 40, \text {and}\ 30.43\), respectively) are around 1.0–1.1 and their standard errors are relatively small. These results imply that the precision of quadratic RK estimation increases by conditioning on covariates. In other words, low precision may be a major reason why quadratic RK estimation without covariates or with smaller bandwidths yields statistically insignificant results.

These estimation results are more or less compatible with the implications of the MC simulations. First of all, the introduction of covariates can reduce the biases in RK estimates, particularly estimates with a linear polynomial. Second, RK estimates with a quadratic polynomial fluctuate a lot and are statistically insignificant when no covariates are introduced and CCT’s procedures are used. This is compatible with the MC results, which indicated that the standard deviation of RK estimates is often too high and power too low if a quadratic polynomial is used with CCT’s bias-correction procedure.

In Table 7 in Appendix “OLS results from the empirical application,” I also present the results of OLS estimates. OLS estimates, which are obtained by simply regressing the total expenditure on the grant, change considerably if I introduce covariates or fixed effects, implying severe endogeneity bias. It is difficult to draw useful conclusions about the magnitude of the treatment effect on the basis of these OLS estimates alone. Ando (2013) provides RK estimates with higher-order polynomials. The RK estimates with a third- and fourth-order polynomial are very unstable and mostly statistically insignificant. This could be because RK estimates with higher-order polynomials tend to be very imprecise. Ando (2013) also presents RK estimates for total revenue minus the LAT grant and shows that no effect is observed once covariates are controlled for.

Overall, relatively reliable RK estimation using a lower-order polynomial model with covariates suggests that the effect of the grant on total expenditure is roughly around one. As I have already discussed, this result appears reasonable on the basis of our institutional knowledge of Japanese local public finance. When it comes to the validity of the various RK estimations examined, the empirical results and the MC simulations suggest that an RK design employing a local linear or quadratic regression with a modestly small bandwidth and additional covariates provides arguably the most reliable estimate. CCT’s procedures of bias correction and robust inference should be used with caution because they often produce imprecise estimates, particularly when regularization is incorporated.

5 Detection of confounding nonlinearity

My MC simulations and real-world application show that RK estimation can suffer from a bias caused by confounding nonlinearity around the cutoff, and that some remedies for this bias, such as a smaller bandwidth, a quadratic polynomial, and CCT’s procedure, instead result in highly imprecise and often impractical estimation.

In this section, I provide one complementary method to detect such bias-inducing confounding nonlinearity in RK designs. The idea is quite similar to the permutation inference with placebo RK estimates proposed by Ganong and Jäger (2015). As a complement to a formal permutation test, however, I recommend simply showing the distribution of placebo RK estimates along placebo cutoff values in order to detect bias caused by confounding nonlinearity. This method has already been applied in some studies such as Engström et al. (2015) and Ando (2015).

Referring to Ando (2013), Card et al. (2015) also conduct Ganong and Jäger (2015)’s permutation test after controlling for confounding nonlinearity with observed covariates in Appendix “Revenue capacity (modified for a predetermined covariate)” of the working paper version. This procedure is required because the original permutation test may not be informative when the curvature of the conditional expectation function \(E[Y|V=v]\) changes. As a corresponding analysis, I also present the distribution of placebo RK estimates conditional on observed covariates. This analysis enables me to check how controlling for observed covariates can mitigate bias caused by confounding nonlinearity.

I implement placebo RK estimation with a varying placebo cutoff \(c_{plcb}\) using the following model:

$$Y_i = \alpha _0 + \beta _0\cdot D_i + \sum _{p=1}^{\bar{p}}\left[ \alpha _p (v_i - c_{plcb})^p + \beta _p (v_i - c_{plcb})^p\cdot D_i\right] +\varepsilon \quad \text {where}\ |v_i - c_{plcb}|\le h, \tag{11}$$

where \(D_i = 1\) if \((v_i - c_{plcb}) > 0\) and otherwise 0. I apply this placebo test to a simulated set of data with DGP 2 in Sect. 3 and the dataset used in Sect. 4. Because the motivation of this placebo test is to detect confounding nonlinearity, I do not use CCT’s bias-correction procedure for RK estimation.
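A minimal sketch of placebo RK estimation based on Eq. (11) is given below. It fits the model by ordinary least squares within the window \(|v_i - c_{plcb}|\le h\) and returns the coefficient \(\beta _1\) on \((v_i - c_{plcb})\cdot D_i\) as the placebo kink estimate. The function name `placebo_rk`, the uniform weighting, and the noiseless illustrative data are assumptions made for exposition. Standard errors and confidence intervals are omitted, and this is not the code used to produce the figures in this section.

```python
import numpy as np


def placebo_rk(v, y, c_plcb, h, p_bar=1):
    """Estimate beta_1 in Eq. (11) by OLS on observations with |v - c_plcb| <= h."""
    x = v - c_plcb
    keep = np.abs(x) <= h
    x_w, y_w = x[keep], y[keep]
    d = (x_w > 0).astype(float)
    # Column order: constant, D, alpha_1..alpha_pbar terms, beta_1..beta_pbar terms.
    cols = [np.ones_like(x_w), d]
    cols += [x_w**p for p in range(1, p_bar + 1)]
    cols += [x_w**p * d for p in range(1, p_bar + 1)]
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y_w, rcond=None)
    return coef[2 + p_bar]  # coefficient on (v - c_plcb) * D


# Hypothetical noiseless data with a true kink of one at v = 0.
v = np.linspace(-6.0, 6.0, 1201)
y = 1.0 + 0.5 * v + 1.0 * np.maximum(v, 0.0)

# Sweep placebo cutoffs from -3 to 3 in steps of 0.2 with bandwidth h = 2.
cutoffs = np.linspace(-3.0, 3.0, 31)
estimates = np.array([placebo_rk(v, y, c, h=2.0, p_bar=1) for c in cutoffs])
```

Plotting `estimates` against `cutoffs` gives the kind of placebo profile discussed in this section: in the absence of confounding nonlinearity, estimates far from the true cutoff should be close to zero.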

Results for simulated data with DGP 2 are shown in Fig. 7. Here I fix the bandwidth h at 2 and vary the placebo cutoff \(c_{plcb}\) from −3 to 3 in steps of 0.2. For estimation, I use two outcome variables, that is, the original outcome EXP and the residuals after regressing EXP on the observed covariates CAP, CAP\(^2\), \(X\), and \(X^2\).

Fig. 7 Placebo RK estimates for simulated data. Notes: Data for analysis are generated based on DGP 2 in Sect. 3. All placebo estimates and their confidence intervals are obtained by Eq. (11) with \(\bar{p}=1\ \text {or}\ 2\). The bandwidth h is fixed at 2. CCT’s procedure for bias correction and a robust confidence interval is not applied to these placebo estimates. A dashed line in each graph indicates a value of one

The two upper graphs in Fig. 7 show placebo RK estimates for EXP with local linear and quadratic regression models. Most placebo estimates with a linear RK model are significantly different from zero (or one, a true “effect” in this DGP) regardless of the values of the placebo cutoff, indicating that there exists severe confounding nonlinearity and that a true RK estimate with \(c=0\) suffers from bias because of it. On the other hand, placebo estimates with a quadratic RK model are almost always statistically insignificant, and their confidence intervals are too large, suggesting highly imprecise estimation.

The two lower graphs, in turn, present placebo estimates when the dependent variable is the residual. Both graphs show that placebo estimates are closer to one, a true “effect” of GRANT, when placebo cutoffs \(c_{plcb}\) are closer to the true cutoff of zero. On the other hand, placebo estimates are around zero and statistically not different from zero when placebo cutoffs are further away from the true cutoff. This strongly implies that the true RK estimate is not severely biased, although the possibility of confounding nonlinearity exactly at the true cutoff cannot be ruled out by this placebo test.Footnote 18

Figure 8 presents placebo RK estimates for total expenditure using the Japanese municipality data. I fix the bandwidth h at 20 and vary the placebo cutoff \(c_{plcb}\) from −50 to 50 in steps of 2. For dependent variables, I use the total expenditure and its residual after regressing it on observed covariates listed in Table 3, their quadratic terms, and year dummies.

Fig. 8 Placebo RK estimates for total expenditure. Notes: The dataset used in Sect. 4 is utilized for analysis. All placebo estimates and their confidence intervals are obtained with Eq. (11) with \(\bar{p}=1\ \text {or}\ 2\). The bandwidth h is fixed at 20. CCT’s procedure for bias correction and a robust confidence interval is not applied to these placebo estimates. A dashed line in each graph indicates a value of one

The implications of the graphs are similar to those of Fig. 7. The upper-left graph shows that placebo estimates for total expenditure with a linear RK model seem to be confounded by some nonlinearity because placebo estimates are often significantly different from zero when the placebo cutoffs are between \(-30\) and 30. Placebo estimates for total expenditure with a quadratic RK model presented in the upper-right graph suggest that quadratic RK estimation yields overly wide confidence intervals due to high imprecision.

In contrast, placebo estimates for residuals with a linear RK model presented in the lower-left graph are significantly different from zero only when placebo cutoffs are close to the true cutoff of zero. In particular, placebo estimates are stably around zero when the placebo cutoffs are larger than ten, where the number of observations is relatively large. These results suggest that confounding nonlinearity around the true kink is effectively mitigated by covariates. Finally, placebo estimates with a quadratic RK model presented in the lower-right graph are not as precise as those in the counterpart graph in Fig. 7, possibly due to higher imprecision in quadratic RK estimation with real-world data.

In sum, the placebo test presented here provides an effective, though supplementary, graphical tool to detect the confounding nonlinearity that can cause bias in RK estimation. For RK estimation to be plausible, placebo RK estimates should be reasonably close to zero at the points where true local RK estimation with a certain bandwidth may capture confounding nonlinearity if it exists.

6 Conclusion

Regression kink (RK) designs, which have several attractive features similar to those of regression discontinuity (RD) designs, are potentially quite useful, but the weakness of this approach has been largely ignored in the emerging applied studies. In order to investigate the validity of RK designs, I first examined the finite sample properties of RK estimation in the presence of a confounding smooth nonlinearity around the cutoff, using Monte Carlo (MC) simulations. Then, I applied an RK design to the study of the causal effects of fiscal equalization grants on the spending of local governments.

The results of the MC simulations suggested that RK estimation often resulted in biased estimates when there was a confounding nonlinear relation between an assignment variable and an outcome variable. Introduction of a higher-order polynomial, a smaller bandwidth, or Calonico et al. (2014b)’s bias-correction procedure could mitigate this bias, but they often resulted in an imprecise estimate and low power that might significantly undermine the results obtained. The simulations also provided evidence that the introduction of observed covariates can improve RK estimation, although the improvement may be insufficient if confounding nonlinearity is generated by unobserved covariates.

When it comes to the real-world application, I first found seemingly very large biases in the estimated effect of the equalization grant on local expenditure when I did not include any additional covariate in my RK estimation. After conditioning on observed covariates, however, relatively robust positive estimates close to one were obtained when a linear polynomial was used for RK estimation. RK estimates with a quadratic polynomial were less robust, but they also imply that the effect of the grant on local expenditure is not far from one.

In conclusion, my MC simulations and real-world application provide mixed answers concerning the usefulness of an RK design with a finite sample. On the one hand, it can be argued that RK analysis is more reliable than a simple pooled or fixed-effect OLS regression when it comes to causal interpretation because an RK design has an explicit identification strategy and transparent tools for validity and robustness checks. On the other hand, if unobserved covariates generate confounding nonlinearity around the cutoff point and the sample size around this point is relatively small, a biased or imprecise estimate may be obtained. In empirical studies employing an RK design, this possible weakness should be explicitly addressed by careful robustness checks and placebo tests. The construction of formal criteria for a sufficiently credible RK design is an important area for further study.