Abstract
We study the undeclared work patterns of Hungarian employees in relatively stable jobs, using a panel dataset that matches individual-level self-reported Labour Force Survey data with administrative records of the Pension Directorate for 2001–2006. We estimate the determinants of undeclared work using Heckman-type random-effects panel probit models, and develop a two-regime model to separate permanent and transitory undeclared work, where the latter follows a Markov chain. We find that about 6–7% of workers went permanently unreported for six consecutive years, and a further 4% were transitorily unreported in any given year. The models show lower reporting rates—especially in the permanent segment—among males, high-school graduates, those in agriculture and transport, small firms and various forms of atypical employment. Transitory non-reporting may be partly explained by administrative records missing for technical reasons. The results suggest that (1) the “aggregate labour input method” widely used in Europe can indeed be a simple yet reliable tool to estimate the size of informal employment, although it slightly overestimates the true magnitude of black work and (2) the long-term pension consequences of undeclared work may be substantial because of the high share of permanent non-reporting.
Similar content being viewed by others
Notes
For instance, 11.5% of employment in full-time equivalent units for Italy in 2004 (Baldassarini 2007), 17–21% of official GDP for Slovenia between 1995 and 2004 (Nastav and Bojnec 2007), 8% of total employment for Spain in 2002 and 20–30% of GDP for Romania between 1996 and 2002 (for these and other EU country results, see GHK/FGB 2009, pp. 49 and 77).
Hungarians typically have no precise information on their accumulated accrual points and expected pensions before they actually retire. The computer records (since 1988) and printed material that records registered workdays and contribution payments are available to individuals before their retirement only after a lengthy administrative procedure.
Similar small differences occur if the data are reweighted according to the LFS weights that are supplied by the CSO to correct for non-response in the LFS. In the following we will use the unweighted data unless stated otherwise. For further details, see Bálint et al. (2010).
For the study of such events, we selected employed workers from the 2001–2006 waves of the LFS, who had entered their jobs more than 12 months before the LFS interview and counted those who said that 12 months before the interview they had not been working.
Homogeneity of the three distributions is not rejected by a Chi squared test (p value is 0.22).
The likelihood function in the RE probit model is given as an integral with respect to the unobserved heterogeneity \( c_{i}^{P} \). In the MSL procedure this is approximated with adaptive Gauss–Hermite quadrature and then maximized. We use 12 integration points, but the results are not sensitive to this choice. Technically, the estimates are obtained with the xtprobit command of the Stata software (version 12).
The average marginal effects are calculated using the gllapred command of the GLLAMM package of Stata, after bootstrapping the estimated model 1000 times.
Confidentiality is a major concern of the CSO. We do not know any case of misusing individual statistical data since 1946 (when census data were used to identify ethnic Germans).
In addition, Table 7 of “Appendix 1” displays the composition of workers split according to whether they required NPID data at all. Most high-risk groups (such as people with secondary school attainment, farmers, service and construction workers, those working in atypical forms of employment, casual workers, the self-employed and those in micro-firms) are roughly equally represented in the two groups of LFS respondents. The proportion of men and residents of Budapest are under-represented among those requested NPID data, which we attribute to the low response probability of these groups in population surveys in general. Young people are obviously much less interested in their pension prospects than their older counterparts. However, workers in the first year of their tenure were less likely to make it to the merged sample.
We do not use the linear specification [Eq. (1)] in models with endogenous selection, because the estimation of such models depends crucially on the distribution of the error term (e.g. normality). \( R_{it} \) is close to binary, and hence a binary model on \( Q_{it} \) is more appropriate. For the same reason we do not use a tobit-type model for \( R_{it} \), which is bounded on the unit interval, or a double hurdle model for the joint analysis of sample membership and reporting rates.
Technically, the system is estimated using the gllamm package of the Stata software (Rabe-Hesketh et al. 2004).
To keep the model structure relatively simple, we do not incorporate endogenous selection into the two-regime model. According to Sect. 4.2, the model with endogenous selection yields qualitatively similar results to the baseline ones.
The notation here differs slightly from the previous section because \( \varvec{X}_{i} \) now does not contain the year 2008 values of the time-varying variables (work experience and tenure).
This partly reflects the higher than average ratio of undeclared work among self-employed. 90% of those in one-member firms reported themselves as self-employed, and, on the other hand, more than 60% of self-employed people reported firm size of one. Hence, although the marginal effect of self-employment is negligible in our models, Table 3 shows that self-employed people are by 20% points more likely to be unreported than employees.
The parameters of settlement type dummies (county capitals, other urban areas and villages) become economically and statistically insignificant once the micro-regional unemployment rates and residence in Budapest are controlled for, therefore these dummies are not included in the equations.
According to the LFS, the median worker spends ten consecutive years on average in the same job, provided she has spent at least 2 years there.
These workers make up less than one tenth of the employment stock at any time according to LFS.
References
Abowd JM, Stinson MH (2013) Estimating measurement error in annual job earnings: a comparison of survey and administrative data. Rev Econ Stat 95(5):1451–1467. https://doi.org/10.1162/REST_a_00352
Baldassarini A (2007) The Italian approach of measuring undeclared work: description and main strengths. Seminar on measurement of undeclared work, organized by DG EMPL. Presentation, Brussels, December. http://ec.europa.eu/social/BlobServlet?docId=2770&langId=en. Accessed 15 Jan 2018
Baldini M, Bosi P, Lalla M (2009) Tax evasion and misreporting in income tax returns and household income surveys. Pol Econ 25(3):333–348. https://doi.org/10.1429/30792
Bálint M, Köllő J, Molnár G (2010) Accrual years and the life cycle (in Hungarian). Stat Szle 88(6):623–647
Benedek D, Elek P, Köllő J (2013) Tax avoidance, tax evasion, black and grey employment. In: Fazekas K, Benczúr P, Telegdy Á (eds) The Hungarian labour market, review and analysis 2013. Hungarian Academy of Sciences Institute of Economics, Budapest, pp 161–187. http://econ.core.hu/file/download/HLM2013/TheHungarianLabourMarket_2013_InFocusI.pdf. Accessed 15 Jan 2018
Buehn A, Schneider F (2012) Shadow economies around the world: novel insights, accepted knowledge, and new estimates. Int Tax Public Finance 19(1):139–171. https://doi.org/10.1007/s10797-011-9187-7
Cichocki S, Tyrowicz J (2010) Shadow employment in post-transition: is informal employment a matter of choice or no choice in Poland? J Soc Econ 39(4):527–535. https://doi.org/10.1016/j.socec.2010.03.003
Eckel CC, Grossman PJ (2008) Men, women and risk aversion: experimental evidence. In: Plott CR, Smith VL (eds) Handbook of experimental economics results, vol 1, Ch. 113. Elsevier, Amsterdam, pp 1061–1073. https://doi.org/10.1016/s1574-0722(07)00113-8
Elek P, Scharle Á, Szabó B, Szabó PA (2009) Measuring undeclared employment in Hungary (in Hungarian, with abstract in English). In: Semjén A, Tóth IJ (eds) Rejtett gazdaság. Be nem jelentett foglalkoztatás és jövedelemeltitkolás—kormányzati lépések és a gazdasági szereplők válaszai, KTI Könyvek, vol 11. Hungarian Academy of Sciences Institute of Economics, Budapest, pp 84–102. http://econ.core.hu/file/download/ktik11/ktik11_08_feketefoglalkoztatas.pdf. Abstract in English: http://econ.core.hu/file/download/ktik11/ktik11_15_abstracts.pdf. Accessed 15 Jan 2018
Elek P, Köllő J, Reizer B, Szabó PA (2012) Detecting wage underreporting using a double-hurdle model. In: Lehmann H, Tatsiramos K (eds) Research in labor economics, vol 34 (Informal Employment in Emerging and Transition Economies), Ch. 4. Emerald Group Publishing Limited, Bingley, pp 135–166. https://doi.org/10.1108/s0147-9121(2012)0000034007
European Commission (2014) Undeclared work in the European Union. Special Eurobarometer Report No. 402/wave EB79.2. http://ec.europa.eu/public_opinion/archives/ebs/ebs_402_en.pdf. Accessed 15 Jan 2018
Feld LP, Larsen C (2012) Undeclared work, deterrence and social norms: the case of Germany. Springer, Berlin
GHK/FGB (2009) Study on indirect measurement methods for undeclared work in the EU (VC/2008/0305). European Commission, Directorate-General Employment, Social Affairs and Equal Opportunities. Final Report submitted by GHK and Fondazione G. Brodolini. December. http://ec.europa.eu/social/BlobServlet?docId=4546&langId=en. Accessed 15 Jan 2018
Meriküll J, Staehr K (2010) Unreported employment and envelope wages in mid-transition: comparing developments and causes in the Baltic countries. Comp Econ Stud 52(4):637–670. https://doi.org/10.1057/ces.2010.17
Nastav B, Bojnec S (2007) Shadow economy in Slovenia: the labour approach. Manag Glob Transit 5: 193–208. http://www.fm-kp.si/zalozba/ISSN/1581-6311/5_193-208.pdf. Accessed 15 Jan 2018
Paulus A (2015) Tax evasion and measurement error: an econometric analysis of survey data linked with tax records. Institute for social and economic research working papers 2015/10
Pénzügyi Tudakozó (2016) Pension calculation: how are accrual years calculated? (in Hungarian). http://penzugyi-tudakozo.hu/nyugdij-szamitas-hogyan-szamoljak-a-szolgalati-idot/. Accessed 15 Jan 2018
Pickhardt M, Prinz A (2014) Behavioral dynamics of tax evasion: a survey. J Econ Psychol 40:1–19. https://doi.org/10.1016/j.joep.2013.08.006
Pischke J-S (1995) Measurement error and earnings dynamics: some estimates from the PSID validations study. J Bus Econ Stat 13(3):305–314. https://doi.org/10.1080/07350015.1995.10524604
Rabe-Hesketh S, Skrondal A, Pickles A (2004) Generalized multilevel structural equation modelling. Psychometrika 69(2):167–190. https://doi.org/10.1007/BF02295939
Renooy P, Ivarsson S, van der Wusten-Gritsai O, Meijer R (2004) Undeclared work in an enlarged Union. Final report, European Commission Directorate-General for Employment and Social Affairs, May. http://www.social-law.net/IMG/pdf/undecl_work_final_en.pdf. Accessed 15 Jan 2018
Schneider F (2012) The shadow economy and work in the shadow: what do we (not) know? IZA Discussion Paper, No. 6423
Semjén A, Tóth IJ, Medgyesi M, Czibik Á (2009) Tax evasion and corruption: population involvement and acceptance. In: Semjén A, Tóth IJ (eds) Rejtett gazdaság. Be nem jelentett foglalkoztatás és jövedelemeltitkolás—kormányzati lépések és a gazdasági szereplők válaszai, KTI Könyvek, vol 11. Hungarian Academy of Sciences Institute of Economics, Budapest, pp 228–258. http://econ.core.hu/file/download/ktik11/ktik11_14_lakossagi.pdf. Abstract in English: http://econ.core.hu/file/download/ktik11/ktik11_15_abstracts.pdf. Accessed 15 Jan 2018
Slemrod J, Yitzhaki S (2002) Tax avoidance, evasion, and administration. In: Auerbach AJ, Feldstein M (eds) Handbook of public economics 1st edn, vol 3, no 3, Ch. 22. Elsevier, Amsterdam, pp 1423–1470
Williams CC (2007) Tackling undeclared work in Europe: lessons from a study of Ukraine. Eur J Ind Relat 13(2):219–236. https://doi.org/10.1177/0959680107078254
Wooldridge JM (2010) Econometric analysis of cross section and panel data. MIT Press, Cambridge
Acknowledgements
The authors would like to thank Anikó Bíró, Márton Csillag and Gábor Kézdi for useful comments on an earlier version of the paper. Péter Elek was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences and later by the ÚNKP-17-4 New National Excellence Program of the Ministry of Human Capacities in Hungary.
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix 1
See Fig. 2 and Tables 6, 7 and 8.
Appendix 2: Maximum likelihood estimation of the two-regime model
Let \( {\text{t}}_{\text{i}} \) denote worker \( {\text{i}} \)’s first year in the sample (which can take values between 2001 and 2006) and let us use the notations \( p_{it}^{\left( k \right)} = \Pr \left( {Q_{it} = k | J_{i} = 0, \varvec{ X}_{it} } \right) \) for the conditional probability of transitory non-reporting (k = 1) and reporting (k = 0), respectively, at time t.
If \( \prod\nolimits_{{t = t_{i} }}^{2006} {Q_{it} } = 0 \) (i.e. if the person is reported at least once), then for kt ∊ {0, 1} \( ( {t = t_{i} , \ldots , 2006} ) \):
because in this case, by definition, the worker belongs to the transitory regime and then the joint probability of her undeclared pattern can be written as a product of the starting probability \( p_{{i,t_{i} }}^{{( {k_{{t_{i} }} } )}} \) and the transition probabilities \( p_{it}^{{\left( {k_{t - 1} ,k_{t} } \right)}} \).
If \( \prod\nolimits_{{t = t_{i} }}^{2006} {Q_{it} } = 1 \) (i.e. if the person is never reported), then
where the first term shows the contribution of the permanent regime and the second term gives the probability that a person of the transitory regime is undeclared for the whole period.
For workers who entered the sample at the start of their current job (i.e. for whom \( C_{{i,t_{i} }} = 1 \)), Eq. (7) implies that \( p_{{i,t_{i} }}^{\left( k \right)} = p_{{i,t_{i} }}^{{\left( {0,k} \right)}} \) and hence the above likelihood calculation is complete for them. However, the majority of our observations are left-censored because Ci,2001 > 1 for most workers who entered our sample in 2001. The missing observations from their work history could be tackled, for instance, using the expectation–maximization (EM) algorithm, or we can follow a computationally less intensive but equally satisfactory approach. Indeed, we first note that p (1) it can be calculated recursively by considering the probability of being undeclared in the previous year (p (1) i, t−1 ) and the transition probabilities (p (11) it and p (01) it ):
Hence, to obtain p (1) i,2001 and p (0) i,2001 —only they are needed in Eqs. (9)–(10) for the likelihood—we can go back to (for example) year 1997, approximate p (1) i,1997 as the stationary distribution of the two-state Markov chain whose transition probabilities are fixed at their 1997 levels:
and then calculate recursively the starting values p (1) i,2001 and p (0) i,2001 . This final step is only an approximation, because (1) the transition probabilities are slightly time-varying and (2) the Markov chain may not have reached the stationary distribution for some workers by 1997. Nevertheless, since the probability of return to the reported state, \( {\text{p}}_{\text{it}}^{{\left( {10} \right)}} \), turns out to be relatively large in our case (see Sect. 4.1), the corresponding Markov chain has good mixing properties, and it seems to be enough to go back 4 years in time to get a satisfactory approximation for \( {\text{p}}_{{{\text{i}},2001}}^{\left( 1 \right)} \) in Eqs. (9)–(10). We note that the maximum likelihood estimates turn out to be almost identical, irrespective of whether 1996, 1997 or 1998 is used as the start year in the approximation.
Rights and permissions
About this article
Cite this article
Elek, P., Köllő, J. Eliciting permanent and transitory undeclared work from matched administrative and survey data. Empirica 46, 547–576 (2019). https://doi.org/10.1007/s10663-018-9403-0
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10663-018-9403-0
Keywords
- Undeclared work
- Labour input method
- Matched administrative-survey data
- Random-effects panel probit with endogenous selection
- Markov chain