Eliciting permanent and transitory undeclared work from matched administrative and survey data

Elek, Péter; Köllő, János

doi:10.1007/s10663-018-9403-0

Eliciting permanent and transitory undeclared work from matched administrative and survey data

Original Paper
Published: 09 April 2018

Volume 46, pages 547–576, (2019)
Cite this article

Empirica Aims and scope Submit manuscript

344 Accesses
7 Citations
Explore all metrics

Abstract

We study the undeclared work patterns of Hungarian employees in relatively stable jobs, using a panel dataset that matches individual-level self-reported Labour Force Survey data with administrative records of the Pension Directorate for 2001–2006. We estimate the determinants of undeclared work using Heckman-type random-effects panel probit models, and develop a two-regime model to separate permanent and transitory undeclared work, where the latter follows a Markov chain. We find that about 6–7% of workers went permanently unreported for six consecutive years, and a further 4% were transitorily unreported in any given year. The models show lower reporting rates—especially in the permanent segment—among males, high-school graduates, those in agriculture and transport, small firms and various forms of atypical employment. Transitory non-reporting may be partly explained by administrative records missing for technical reasons. The results suggest that (1) the “aggregate labour input method” widely used in Europe can indeed be a simple yet reliable tool to estimate the size of informal employment, although it slightly overestimates the true magnitude of black work and (2) the long-term pension consequences of undeclared work may be substantial because of the high share of permanent non-reporting.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

The Role of Mental Health on Workplace Productivity: A Critical Review of the Literature

Article 15 November 2022

Claire de Oliveira, Makeila Saka, … Rowena Jacobs

The Impact of Technological Progress on the Future of Work: Insights from a Survey on Alternative Employment Contracts in OECD Countries

Article Open access 25 January 2024

Thibaud Deruelle, Andrey Ugarte Montero & Joël Wagner

Women’s and men’s work, housework and childcare, before and during COVID-19

Article 06 September 2020

Daniela Del Boca, Noemi Oggero, … Mariacristina Rossi

Notes

Examples are the consumption-income discrepancy method, the monetary method, the electricity consumption method and econometric methods such as MIMIC (Renooy et al. 2004; Schneider 2012).
For instance, 11.5% of employment in full-time equivalent units for Italy in 2004 (Baldassarini 2007), 17–21% of official GDP for Slovenia between 1995 and 2004 (Nastav and Bojnec 2007), 8% of total employment for Spain in 2002 and 20–30% of GDP for Romania between 1996 and 2002 (for these and other EU country results, see GHK/FGB 2009, pp. 49 and 77).
Hungarians typically have no precise information on their accumulated accrual points and expected pensions before they actually retire. The computer records (since 1988) and printed material that records registered workdays and contribution payments are available to individuals before their retirement only after a lengthy administrative procedure.
Similar small differences occur if the data are reweighted according to the LFS weights that are supplied by the CSO to correct for non-response in the LFS. In the following we will use the unweighted data unless stated otherwise. For further details, see Bálint et al. (2010).
For the study of such events, we selected employed workers from the 2001–2006 waves of the LFS, who had entered their jobs more than 12 months before the LFS interview and counted those who said that 12 months before the interview they had not been working.
Homogeneity of the three distributions is not rejected by a Chi squared test (p value is 0.22).
The likelihood function in the RE probit model is given as an integral with respect to the unobserved heterogeneity $ c_{i}^{P} $. In the MSL procedure this is approximated with adaptive Gauss–Hermite quadrature and then maximized. We use 12 integration points, but the results are not sensitive to this choice. Technically, the estimates are obtained with the xtprobit command of the Stata software (version 12).
The average marginal effects are calculated using the gllapred command of the GLLAMM package of Stata, after bootstrapping the estimated model 1000 times.
The raw data in Table 6 of “Appendix 1” show that 29% responded positively in the first wave and 25% in subsequent waves.
Confidentiality is a major concern of the CSO. We do not know any case of misusing individual statistical data since 1946 (when census data were used to identify ethnic Germans).
In addition, Table 7 of “Appendix 1” displays the composition of workers split according to whether they required NPID data at all. Most high-risk groups (such as people with secondary school attainment, farmers, service and construction workers, those working in atypical forms of employment, casual workers, the self-employed and those in micro-firms) are roughly equally represented in the two groups of LFS respondents. The proportion of men and residents of Budapest are under-represented among those requested NPID data, which we attribute to the low response probability of these groups in population surveys in general. Young people are obviously much less interested in their pension prospects than their older counterparts. However, workers in the first year of their tenure were less likely to make it to the merged sample.
We do not use the linear specification [Eq. (1)] in models with endogenous selection, because the estimation of such models depends crucially on the distribution of the error term (e.g. normality). $ R_{it} $ is close to binary, and hence a binary model on $ Q_{it} $ is more appropriate. For the same reason we do not use a tobit-type model for $ R_{it} $, which is bounded on the unit interval, or a double hurdle model for the joint analysis of sample membership and reporting rates.
Technically, the system is estimated using the gllamm package of the Stata software (Rabe-Hesketh et al. 2004).
To keep the model structure relatively simple, we do not incorporate endogenous selection into the two-regime model. According to Sect. 4.2, the model with endogenous selection yields qualitatively similar results to the baseline ones.
The notation here differs slightly from the previous section because $ \varvec{X}_{i} $ now does not contain the year 2008 values of the time-varying variables (work experience and tenure).
This partly reflects the higher than average ratio of undeclared work among self-employed. 90% of those in one-member firms reported themselves as self-employed, and, on the other hand, more than 60% of self-employed people reported firm size of one. Hence, although the marginal effect of self-employment is negligible in our models, Table 3 shows that self-employed people are by 20% points more likely to be unreported than employees.
The parameters of settlement type dummies (county capitals, other urban areas and villages) become economically and statistically insignificant once the micro-regional unemployment rates and residence in Budapest are controlled for, therefore these dummies are not included in the equations.
According to the LFS, the median worker spends ten consecutive years on average in the same job, provided she has spent at least 2 years there.
These workers make up less than one tenth of the employment stock at any time according to LFS.

References

Abowd JM, Stinson MH (2013) Estimating measurement error in annual job earnings: a comparison of survey and administrative data. Rev Econ Stat 95(5):1451–1467. https://doi.org/10.1162/REST_a_00352
Article Google Scholar
Baldassarini A (2007) The Italian approach of measuring undeclared work: description and main strengths. Seminar on measurement of undeclared work, organized by DG EMPL. Presentation, Brussels, December. http://ec.europa.eu/social/BlobServlet?docId=2770&langId=en. Accessed 15 Jan 2018
Baldini M, Bosi P, Lalla M (2009) Tax evasion and misreporting in income tax returns and household income surveys. Pol Econ 25(3):333–348. https://doi.org/10.1429/30792
Google Scholar
Bálint M, Köllő J, Molnár G (2010) Accrual years and the life cycle (in Hungarian). Stat Szle 88(6):623–647
Google Scholar
Benedek D, Elek P, Köllő J (2013) Tax avoidance, tax evasion, black and grey employment. In: Fazekas K, Benczúr P, Telegdy Á (eds) The Hungarian labour market, review and analysis 2013. Hungarian Academy of Sciences Institute of Economics, Budapest, pp 161–187. http://econ.core.hu/file/download/HLM2013/TheHungarianLabourMarket_2013_InFocusI.pdf. Accessed 15 Jan 2018
Buehn A, Schneider F (2012) Shadow economies around the world: novel insights, accepted knowledge, and new estimates. Int Tax Public Finance 19(1):139–171. https://doi.org/10.1007/s10797-011-9187-7
Article Google Scholar
Cichocki S, Tyrowicz J (2010) Shadow employment in post-transition: is informal employment a matter of choice or no choice in Poland? J Soc Econ 39(4):527–535. https://doi.org/10.1016/j.socec.2010.03.003
Article Google Scholar
Eckel CC, Grossman PJ (2008) Men, women and risk aversion: experimental evidence. In: Plott CR, Smith VL (eds) Handbook of experimental economics results, vol 1, Ch. 113. Elsevier, Amsterdam, pp 1061–1073. https://doi.org/10.1016/s1574-0722(07)00113-8
Elek P, Scharle Á, Szabó B, Szabó PA (2009) Measuring undeclared employment in Hungary (in Hungarian, with abstract in English). In: Semjén A, Tóth IJ (eds) Rejtett gazdaság. Be nem jelentett foglalkoztatás és jövedelemeltitkolás—kormányzati lépések és a gazdasági szereplők válaszai, KTI Könyvek, vol 11. Hungarian Academy of Sciences Institute of Economics, Budapest, pp 84–102. http://econ.core.hu/file/download/ktik11/ktik11_08_feketefoglalkoztatas.pdf. Abstract in English: http://econ.core.hu/file/download/ktik11/ktik11_15_abstracts.pdf. Accessed 15 Jan 2018
Elek P, Köllő J, Reizer B, Szabó PA (2012) Detecting wage underreporting using a double-hurdle model. In: Lehmann H, Tatsiramos K (eds) Research in labor economics, vol 34 (Informal Employment in Emerging and Transition Economies), Ch. 4. Emerald Group Publishing Limited, Bingley, pp 135–166. https://doi.org/10.1108/s0147-9121(2012)0000034007
European Commission (2014) Undeclared work in the European Union. Special Eurobarometer Report No. 402/wave EB79.2. http://ec.europa.eu/public_opinion/archives/ebs/ebs_402_en.pdf. Accessed 15 Jan 2018
Feld LP, Larsen C (2012) Undeclared work, deterrence and social norms: the case of Germany. Springer, Berlin
Book Google Scholar
GHK/FGB (2009) Study on indirect measurement methods for undeclared work in the EU (VC/2008/0305). European Commission, Directorate-General Employment, Social Affairs and Equal Opportunities. Final Report submitted by GHK and Fondazione G. Brodolini. December. http://ec.europa.eu/social/BlobServlet?docId=4546&langId=en. Accessed 15 Jan 2018
Meriküll J, Staehr K (2010) Unreported employment and envelope wages in mid-transition: comparing developments and causes in the Baltic countries. Comp Econ Stud 52(4):637–670. https://doi.org/10.1057/ces.2010.17
Article Google Scholar
Nastav B, Bojnec S (2007) Shadow economy in Slovenia: the labour approach. Manag Glob Transit 5: 193–208. http://www.fm-kp.si/zalozba/ISSN/1581-6311/5_193-208.pdf. Accessed 15 Jan 2018
Paulus A (2015) Tax evasion and measurement error: an econometric analysis of survey data linked with tax records. Institute for social and economic research working papers 2015/10
Pénzügyi Tudakozó (2016) Pension calculation: how are accrual years calculated? (in Hungarian). http://penzugyi-tudakozo.hu/nyugdij-szamitas-hogyan-szamoljak-a-szolgalati-idot/. Accessed 15 Jan 2018
Pickhardt M, Prinz A (2014) Behavioral dynamics of tax evasion: a survey. J Econ Psychol 40:1–19. https://doi.org/10.1016/j.joep.2013.08.006
Article Google Scholar
Pischke J-S (1995) Measurement error and earnings dynamics: some estimates from the PSID validations study. J Bus Econ Stat 13(3):305–314. https://doi.org/10.1080/07350015.1995.10524604
Google Scholar
Rabe-Hesketh S, Skrondal A, Pickles A (2004) Generalized multilevel structural equation modelling. Psychometrika 69(2):167–190. https://doi.org/10.1007/BF02295939
Article Google Scholar
Renooy P, Ivarsson S, van der Wusten-Gritsai O, Meijer R (2004) Undeclared work in an enlarged Union. Final report, European Commission Directorate-General for Employment and Social Affairs, May. http://www.social-law.net/IMG/pdf/undecl_work_final_en.pdf. Accessed 15 Jan 2018
Schneider F (2012) The shadow economy and work in the shadow: what do we (not) know? IZA Discussion Paper, No. 6423
Semjén A, Tóth IJ, Medgyesi M, Czibik Á (2009) Tax evasion and corruption: population involvement and acceptance. In: Semjén A, Tóth IJ (eds) Rejtett gazdaság. Be nem jelentett foglalkoztatás és jövedelemeltitkolás—kormányzati lépések és a gazdasági szereplők válaszai, KTI Könyvek, vol 11. Hungarian Academy of Sciences Institute of Economics, Budapest, pp 228–258. http://econ.core.hu/file/download/ktik11/ktik11_14_lakossagi.pdf. Abstract in English: http://econ.core.hu/file/download/ktik11/ktik11_15_abstracts.pdf. Accessed 15 Jan 2018
Slemrod J, Yitzhaki S (2002) Tax avoidance, evasion, and administration. In: Auerbach AJ, Feldstein M (eds) Handbook of public economics 1st edn, vol 3, no 3, Ch. 22. Elsevier, Amsterdam, pp 1423–1470
Williams CC (2007) Tackling undeclared work in Europe: lessons from a study of Ukraine. Eur J Ind Relat 13(2):219–236. https://doi.org/10.1177/0959680107078254
Article Google Scholar
Wooldridge JM (2010) Econometric analysis of cross section and panel data. MIT Press, Cambridge
Google Scholar

Download references

Acknowledgements

The authors would like to thank Anikó Bíró, Márton Csillag and Gábor Kézdi for useful comments on an earlier version of the paper. Péter Elek was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences and later by the ÚNKP-17-4 New National Excellence Program of the Ministry of Human Capacities in Hungary.

Author information

Authors and Affiliations

Department of Economics, Eötvös Loránd University (ELTE), Pázmány Péter sétány 1/A, Budapest, 1117, Hungary
Péter Elek
Institute of Economics, Hungarian Academy of Sciences (MTA KRTK), Tóth Kálmán u. 4., Budapest, 1097, Hungary
János Köllő
IZA, Bonn, Germany
János Köllő

Authors

Péter Elek
View author publications
You can also search for this author in PubMed Google Scholar
János Köllő
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Péter Elek.

Appendices

Appendix 1

See Fig. 2 and Tables 6, 7 and 8.

Table 6 The percentage share of selected groups among respondents requesting NPID data after the first LFS visit versus later visits

Full size table

Table 7 The composition of employed LFS respondents inside and outside the merged sample (per cent)

Full size table

Table 8 Descriptive statistics of the simulated ratios of workers undeclared for exactly k years $ \left( {k = 1, 2, \ldots , 6} \right) $ in the 6-year period and the observed distribution in the sample (in per cent)

Full size table

Appendix 2: Maximum likelihood estimation of the two-regime model

Let $ {\text{t}}_{\text{i}} $ denote worker $ {\text{i}} $’s first year in the sample (which can take values between 2001 and 2006) and let us use the notations $ p_{it}^{\left( k \right)} = \Pr \left( {Q_{it} = k | J_{i} = 0, \varvec{ X}_{it} } \right) $ for the conditional probability of transitory non-reporting (k = 1) and reporting (k = 0), respectively, at time t.

If $ \prod\nolimits_{{t = t_{i} }}^{2006} {Q_{it} } = 0 $ (i.e. if the person is reported at least once), then for k_t ∊ {0, 1} $ ( {t = t_{i} , \ldots , 2006} ) $:

$$ \Pr \left( {\mathop {\bigcap }\limits_{{t = t_{i} }}^{2006} \left\{ {Q_{it} = k_{t} } \right\} | \left\{ {\varvec{X}_{it} } \right\}} \right) = \left( {1 - \varPhi \left( {\varvec{X}_{i}\varvec{\beta}_{Z} } \right)} \right)*p_{{i,t_{i} }}^{{\left( {k_{{t_{i} }} } \right)}} *\mathop \prod \limits_{{t = t_{i} + 1}}^{2006} p_{it}^{{\left( {k_{t - 1} ,k_{t} } \right)}} $$

(9)

because in this case, by definition, the worker belongs to the transitory regime and then the joint probability of her undeclared pattern can be written as a product of the starting probability $ p_{{i,t_{i} }}^{{( {k_{{t_{i} }} } )}} $ and the transition probabilities $ p_{it}^{{\left( {k_{t - 1} ,k_{t} } \right)}} $.

If $ \prod\nolimits_{{t = t_{i} }}^{2006} {Q_{it} } = 1 $ (i.e. if the person is never reported), then

$$ \Pr \left( {\mathop {\bigcap }\limits_{{t = t_{i} }}^{2006} \left\{ {Q_{it} = 1} \right\} | \left\{ {\varvec{X}_{it} } \right\}} \right) = \varPhi \left( {\varvec{X}_{i}\varvec{\beta}_{Z} } \right) + \left( {1 - \varPhi \left( {\varvec{X}_{i}\varvec{\beta}_{Z} } \right)} \right)*p_{{i,t_{i} }}^{\left( 1 \right)} *\mathop \prod \limits_{{t = t_{i} + 1}}^{2006} p_{it}^{{\left( {11} \right)}} , $$

(10)

where the first term shows the contribution of the permanent regime and the second term gives the probability that a person of the transitory regime is undeclared for the whole period.

For workers who entered the sample at the start of their current job (i.e. for whom $ C_{{i,t_{i} }} = 1 $), Eq. (7) implies that $ p_{{i,t_{i} }}^{\left( k \right)} = p_{{i,t_{i} }}^{{\left( {0,k} \right)}} $ and hence the above likelihood calculation is complete for them. However, the majority of our observations are left-censored because C_i,2001 > 1 for most workers who entered our sample in 2001. The missing observations from their work history could be tackled, for instance, using the expectation–maximization (EM) algorithm, or we can follow a computationally less intensive but equally satisfactory approach. Indeed, we first note that p ⁽¹⁾_it can be calculated recursively by considering the probability of being undeclared in the previous year (p ⁽¹⁾_{i,
t−1} ) and the transition probabilities (p ⁽¹¹⁾_it and p ⁽⁰¹⁾_it ):

$$ p_{it}^{\left( 1 \right)} = p_{it}^{{\left( {11} \right)}} *p_{i,t - 1}^{\left( 1 \right)} + p_{it}^{{\left( {01} \right)}} *\left( {1 - p_{i,t - 1}^{\left( 1 \right)} } \right) . $$

(11)

Hence, to obtain p ⁽¹⁾_i,2001 and p ⁽⁰⁾_i,2001 —only they are needed in Eqs. (9)–(10) for the likelihood—we can go back to (for example) year 1997, approximate p ⁽¹⁾_i,1997 as the stationary distribution of the two-state Markov chain whose transition probabilities are fixed at their 1997 levels:

$$ {\text{p}}_{{{\text{i}},1997}}^{\left( 1 \right)} \approx {\text{p}}_{{{\text{i}},1997}}^{{\left( {01} \right)}} /\left( {{\text{p}}_{{{\text{i}},1997}}^{{\left( {10} \right)}} + {\text{p}}_{{{\text{i}},1997}}^{{\left( {01} \right)}} } \right) , $$

(12)

and then calculate recursively the starting values p ⁽¹⁾_i,2001 and p ⁽⁰⁾_i,2001 . This final step is only an approximation, because (1) the transition probabilities are slightly time-varying and (2) the Markov chain may not have reached the stationary distribution for some workers by 1997. Nevertheless, since the probability of return to the reported state, $ {\text{p}}_{\text{it}}^{{\left( {10} \right)}} $, turns out to be relatively large in our case (see Sect. 4.1), the corresponding Markov chain has good mixing properties, and it seems to be enough to go back 4 years in time to get a satisfactory approximation for $ {\text{p}}_{{{\text{i}},2001}}^{\left( 1 \right)} $ in Eqs. (9)–(10). We note that the maximum likelihood estimates turn out to be almost identical, irrespective of whether 1996, 1997 or 1998 is used as the start year in the approximation.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Elek, P., Köllő, J. Eliciting permanent and transitory undeclared work from matched administrative and survey data. Empirica 46, 547–576 (2019). https://doi.org/10.1007/s10663-018-9403-0

Download citation

Published: 09 April 2018
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s10663-018-9403-0

Keywords

JEL Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Eliciting permanent and transitory undeclared work from matched administrative and survey data

Abstract

Access this article

Similar content being viewed by others

The Role of Mental Health on Workplace Productivity: A Critical Review of the Literature

The Impact of Technological Progress on the Future of Work: Insights from a Survey on Alternative Employment Contracts in OECD Countries

Women’s and men’s work, housework and childcare, before and during COVID-19

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix 1

Appendix 2: Maximum likelihood estimation of the two-regime model

Rights and permissions

About this article

Cite this article

Keywords

JEL Classification

Navigation

Eliciting permanent and transitory undeclared work from matched administrative and survey data

Abstract

Access this article

Similar content being viewed by others

The Role of Mental Health on Workplace Productivity: A Critical Review of the Literature

The Impact of Technological Progress on the Future of Work: Insights from a Survey on Alternative Employment Contracts in OECD Countries

Women’s and men’s work, housework and childcare, before and during COVID-19

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix 1

Appendix 2: Maximum likelihood estimation of the two-regime model

Rights and permissions

About this article

Cite this article

Share this article

Keywords

JEL Classification

Search

Navigation