Introduction

Balanced development is essential to sustainable growth in an economy. In the case of China, imbalanced development between the eastern and western regions, as well as between the northern and southern regions, is frequently discussed. Rapid economic development has promoted urbanization and created regional imbalances in China. To address these imbalances, the Chinese government has implemented a series of strategies, such as opening up the eastern coastal cities, accelerating economic development in the central and western regions, and revitalizing the old industrial bases in the northeast. These strategies, together with urbanization, have driven the creation of an economy based on urban agglomerations, which have attained overwhelming importance.

Regional differences in economic development clearly and significantly affect the real estate market in China. In 2013, differentiation in the real estate market in China began to appear on a large scale. Housing prices in first-tier cities (Footnote 1) briefly fell after a series of regulations were put in place, such as restrictions on purchases and loans. Then, in 2014 and 2015, housing prices continued to grow in a few first- and second-tier cities but either fell or remained flat in most third- and fourth-tier cities. In 2016, housing prices increased substantially in 61 of the 70 large and midsize cities (Footnote 2). Most of these 70 cities are first- and second-tier cities, together with some third-tier cities. The data show that five of the top six cities in terms of housing price growth are in the Yangtze River Delta urban agglomeration. In 2017, growth in housing prices approached zero in the first- and second-tier cities, whereas the housing market remained hot in the third- and fourth-tier cities. In 2018, housing prices increased rapidly in many small and midsize cities but grew very little in big cities. However, this rapid growth was not seen in all the second-, third-, and fourth-tier cities, a signal of further differentiation in the real estate market. These patterns in housing price growth motivated us to analyze the fluctuation of housing prices from a new perspective.

Figures 1 and 2 show the trends in the growth rates of housing prices in recent years. Figure 1 plots the growth rates in a single graph (hence “Single graph” in its title) and gives a general description of the overall trend: although the growth rates follow similar trends in most years, they diverge to a very large extent in some years, which makes them challenging to analyze. Figure 2 plots the growth rates for each of the 70 large and midsize cities separately (“Multiple graphs”) and shows the fluctuation in each city clearly. Read together, Figs. 1 and 2 give a clearer picture of the story in this study.

Fig. 1 The Growth Rates of Housing Price (Single graph). Source: The National Statistical Bureau, the Urban Statistical Yearbook, and the Economic Statistics Database of China Economic Information Network from 2005 to 2016

Fig. 2 The Growth Rates of Housing Price (Multiple graphs). Source: The National Statistical Bureau, the Urban Statistical Yearbook, and the Economic Statistics Database of China Economic Information Network from 2005 to 2016

The two figures, together with the facts of real estate development in these years, reveal the following characteristics. First, the highest increases in housing prices are in mega-core cities, such as Shanghai, Beijing, Guangzhou, and Shenzhen. Because of the lag in housing price transmission, the increases then spread in turn to the surrounding small and midsize cities, from east to west, lifting prices across the real estate market. Second, in the core urban agglomerations, the increase in housing prices is lower in mega-core cities than in their neighboring cities. For instance, in 2016, housing prices in Hefei, Nanjing, Wuxi, and Hangzhou rose by 35%, more than the 28% increase in Shanghai. Third, housing prices rose much more in small and midsize cities in core urban agglomerations than in those in non-core urban agglomerations.

Changes in urban housing prices are no longer characterized by city tiers but increasingly by urban agglomerations. For example, housing prices in many small and midsize cities increased rapidly, but this growth was not seen in all the second-, third-, and fourth-tier cities, and housing prices differ between cities in different urban agglomerations. Given these facts and the continuous differentiation in the real estate market, the traditional division into first-, second-, third-, and fourth-tier cities can hardly reflect the real variation. A regional division based on urban agglomerations is therefore better suited to analyzing fluctuations in housing prices in China. Thus, this study analyzes the regional heterogeneity in housing price increases in China based on urban agglomerations. It focuses on housing prices in China's large urban areas, referred to as “agglomerations,” rather than on the “tier 1” versus “tier 2” (and tiers 3 and 4) labels that are used to categorize cities in China and that have been the focus of previous research. This is the innovation of this study and its marginal contribution to the existing literature. Specifically, given the particular development of China's real estate market, the Yangtze River Delta, the Pearl River Delta, and Beijing-Tianjin-Hebei are the three largest urban agglomerations in China (Zhou 1998; Lu et al. 2020) and are essentially the same as the “megalopolises” described by Gottmann (1957). Therefore, this study defines these three urban agglomerations as the core urban agglomerations.

The Yangtze River Delta urban agglomeration (i.e., CHANGSANJIAO, CSJ) has the most developed economy, the highest urbanization level, and the greatest regional competitiveness in China, and it generates about 20% of China's GDP. Shanghai is its core city, and Hangzhou and Nanjing are the secondary core cities. The Pearl River Delta urban agglomeration (i.e., ZHUSANJIAO, ZSJ), adjacent to Hong Kong and Macao, is the gateway to southern China. The Chinese central government and the Hong Kong Special Administrative Region government plan to build a "Guangdong-Hong Kong-Macao Greater Bay Area" (including Guangzhou, Shenzhen, and Hong Kong, which are the core cities in ZSJ). As the integration of the bay area urban economic belt deepens, ZSJ will become a world-class super urban agglomeration with "multi-center and multi-network coverage". ZSJ is a base for technological R&D and technological innovation in China and has the highest level of development among the three core urban agglomerations. The Beijing-Tianjin-Hebei urban agglomeration (i.e., JINGJINJI, JJJ) is the political, cultural, and technological innovation center of China, with Beijing and Tianjin as its two core cities. Unlike ZSJ, JJJ has not formed a unified development network, so its internal differences are visible.

To address the imbalanced development in China, this study focuses on housing prices in China's urban agglomerations rather than on the city tiers that have been the focus of previous research. It mainly uses a dummy variable approach to analyze whether housing prices in China's core urban agglomerations increase faster than those in non-core urban agglomerations, revealing the sharpening trend of imbalanced development among Chinese cities. In the empirical analysis, this study estimates the differences in housing price increases across agglomerations using dummy variables in both pooled and panel regression models. It compares different versions of the dummy variable models and also incorporates instrumental variables as part of that exercise. Overall, this study provides a detailed look at housing prices in China.

The main findings are as follows. First, the core urban agglomerations have a significant impact on increases in housing prices. Among them, the Pearl River Delta urban agglomeration has a significant impact on housing prices, while the Beijing-Tianjin-Hebei urban agglomeration does not, which indicates that the Pearl River Delta urban agglomeration is more economically developed and that its core cities have a positive spillover effect on the surrounding small and midsize cities. Second, the development of the secondary industry has a significant negative impact on the growth of housing prices. Third, housing prices grow faster in the central region than in the eastern region, which may be due to the high base of housing prices in eastern China, and there is no significant difference between housing prices in the western region and those in the central region. Finally, the difference in housing prices between small and midsize cities has intensified.

Hence, this study helps fill a knowledge gap in research on real estate markets in China. The innovation of this study lies in the method of dividing cities, which differs from the previous method of dividing cities into primary and secondary levels. Instead, it matches the urban agglomerations of the Yangtze River Delta, the Pearl River Delta, and Beijing-Tianjin-Hebei with the concept of the “megalopolis” proposed by Gottmann (1957) and calls them “agglomerations”. In addition to its empirical findings, this study also makes a technical contribution by using altitude and other appropriate instrumental variables to address endogeneity issues, which is useful in studying Chinese cities. The empirical design builds the models up step by step. Based on panel data for 70 large and midsize cities in China from 2005 to 2016, we select the level of economic development, industrial structure, demographic factors, urban area, and public service level as the control variables. The empirical analysis not only considers the impact of regional heterogeneity on the model results but also uses instrumental variables such as altitude to mitigate endogeneity to a certain extent, which supports the above conclusions. The results and the views are clear.

The structure of the study is as follows. After this introduction, in Literature review, we review the related literature. In Models, we present the basic empirical models. In Data, we describe the data and variables. In Empirical results and analysis, we estimate several econometric models, namely pooled and panel models with and without instrumental variables, and discuss the empirical results. In Discussion, we offer our conclusions based on these findings.

Literature review

Balanced development, including that of the housing market, has been well studied at the macroeconomic level (see Liu et al. 2019, among others). However, the balanced development of the housing market at the regional level has not been sufficiently examined.

In modern economics, Gottmann (1957) was the first to research urban agglomerations and proposed the concept of a "megalopolis." He indicates that a megalopolis is not simply a metropolitan area but an urbanized region with a wide range, high population density, and several metropolitan areas clustered with closely linked populations and economies. Perroux (1955) presents the theory of a growth pole, in which the core city is the growth pole of a regional economy and has polarization and diffusion effects on the surrounding areas. At the early stage of growth pole development, the core city attracts high-quality elements from the surrounding cities. Then the growth pole drives economic development in the surrounding areas, turning them into sub-growth centers. Later, Friedmann's (1966) "core–edge" theoretical model makes similar statements and shows that this process leads to unbalanced development. The literature indicates that the flow of factors such as knowledge, technology, and skills strengthens the economic links between developed and undeveloped regions (Saxenian 1994; Hirschman 1988). Further, matching firms and workers in urban agglomerations improves productivity and brings about unbalanced urban development (Christopher 2001; Keuschnigg et al. 2019). The continuous accumulation of capital and labor promotes the development of urban agglomerations, so urban agglomerations have become an important part of the economy.

From an empirical perspective, Kanbur and Zhang (1999) measure the contributions of urban–rural and inland-coastal unbalanced development to the overall regional imbalance in China, based on data for the 1980s and 1990s. Based on housing prices in Britain from 1957 to 1994, Muellbauer and Murphy (1997) find that changes in the financial system, especially financial liberalization, are the main factors affecting housing prices. Based on housing price data for different regions in Britain, McDonald and Taylor (1993) indicate that a cointegration relationship exists between regions, which confirms the spillover effect of housing prices. Partridge et al. (2009) analyze the agglomeration spillovers of core cities and show that urban hierarchy influences housing prices. Examining three major cities in the United States, Gupta and Miller (2012) claim that conductivity is a characteristic of housing prices between cities. Regarding the regional heterogeneity of housing price fluctuation, Negro and Otrok (2007) analyze the characteristics of housing price fluctuation in 48 states by constructing a VAR model. Their empirical results show that, based on historical data, regional factors greatly influence housing prices. Furthermore, based on housing price data for eight Australian state capitals from 1989 to 2005, Luo et al. (2007) explore the correlation in housing prices between different cities using a cointegration test and an error correction model. Similarly, Brady (2011) builds a spatial autocorrelation dynamic panel data model and conducts both ordinary least squares and instrumental variable estimations, which confirm the spatial diffusion effect and the time lag effect in California housing prices. The literature above indicates that the development of urban agglomerations strengthens the links between cities in an agglomeration, which may drive the development of undeveloped cities and may also aggravate the imbalance between cities. This may be one of the reasons for the differentiation of the real estate market.

The real estate market plays an important role in China's economy, so scholars have done extensive research on the real estate market and housing price fluctuation. Based on monthly data on housing prices in 70 large and midsize cities in China from 2015 to 2018, Tan (2018) reveals that when housing prices initially rise, first-tier cities have a greater increase, longer duration, and earlier beginning than second-tier cities, and when they fall, the opposite occurs. Liang and Gao (2007) use an error correction model to analyze differences in regional fluctuations in housing prices. Their results show that the scale of credit has a greater impact on housing prices in eastern and western China than in central China. Furthermore, Wei and Wang (2010), Fan and Guo (2014), and Zhao (2017), among others, study regional differences in housing price fluctuations and real estate differentiation in China. In addition, Xiao (2016), Chen and Wang (2018), and Zhou et al. (2018a, b) analyze the reasons for differentiation in housing prices among cities of different sizes in China. Most of these studies of urban housing prices divide cities into primary and secondary levels, but this method cannot explain the current differentiation of the real estate market in China well. Therefore, this study focuses on the core urban agglomerations.

Due to the regional imbalance in economic development in China, research on urban agglomerations is becoming more and more important, especially research on the three core urban agglomerations: Yangtze River Delta, Pearl River Delta, and Beijing-Tianjin-Hebei (Zhang et al. 2019; Li and Wang 2020). Xue et al. (2000) and Li (2007) analyze the role and ability of the surroundings of core cities to affect the economic development of an urban agglomeration. The internal development of each urban agglomeration is unbalanced (Qi 2015). Specifically, Huang and Zhou (2008) conclude that Shanghai, as a core city, has the strongest capacity for influence in China. Cai and Man (2016) arrive at similar conclusions about Shanghai but find that Beijing has no driving effect on surrounding cities. Tan (2014) uses the PMG estimation method to analyze the factors affecting the equilibrium price of real estate in the Pearl River Delta urban agglomeration and shows that Shenzhen is the source of fluctuation in housing prices there. Moreover, some scholars, such as Zhang and Lin (2015) and Zhou et al. (2018a, b), find a short-term diffusion effect of housing prices between core cities and surrounding areas in China. In contrast, Zhang and Liu (2017) find that core cities have no obvious driving effect on the surrounding areas. This disagreement in research results further motivates our research.

The literature finds significant regional heterogeneity in the factors behind housing price fluctuation in China. In summary, migration, income levels, expected future rents, and urban land use constraints together explain the housing price gap among metropolitan areas (Potepan 2010). With the development of urbanization, the urban network brings a cross-city spillover effect on housing prices (Gong et al. 2020). However, inter-city spillover effects make regional housing prices converge (Chow et al. 2016). Comparing the different tiers of cities in China, the spillover effects of the first-tier cities are the strongest, which would reduce their pressure of exuberance (Tsai and Chiang 2019). Land is a significant factor in accelerating urbanization in the Yangtze River Delta (Zhang et al. 2019). However, for adjacent cities with separate urban systems (i.e., Beijing, Tianjin, and Hebei), it is difficult to form a superior, more competitive, and more advanced urban economic unit (Li and Wang 2020).

Finally, regional differences have an important impact on housing prices. In the long run, serious differentiation in economic conditions and development levels will be reflected in the fluctuation of housing prices (Alexander and Barrow 1994; Holly et al. 2010; Miles 2013). In recent years, transportation development in China has been represented by the construction of the high-speed railway. On the one hand, transportation development has strengthened links between regional economic activities and affected the regional spatial structure (Yin et al. 2015; Wang and Ni 2016). On the other hand, it has directly or indirectly affected regional employment, wages, and economic growth (Dong and Zhu 2016). Therefore, the regional differences in China are significant and cause regional differences in housing prices (Zheng and Kahn 2013; Gong et al. 2016).

The existing literature is meaningful, especially concerning China’s real estate market. However, this market has developed rapidly, and the literature has not kept pace. No comparative study has been done on the spillover effect of housing prices in urban agglomerations in China, and little research has been conducted on intercity differentiation in its real estate market. Therefore, we investigate the difference in housing prices between core and non-core urban agglomerations. Based on our empirical research, we explore the factors that affect the increase in housing prices in urban agglomerations and deepen our theoretical understanding of the interaction between core cities and the cities surrounding them. The results would be useful for designing more precise and effective government policies to regulate and manage the real estate market, which indicates that cross-regional coordinated regulations and control mechanisms over housing prices are very important.

Models

At present, most scholars study regional housing prices with cointegration tests, Granger causality tests, and spatial econometric models. Some scholars use social network analysis to explore the characteristics of fluctuation in intercity housing prices. To focus on the most essential problem, this study instead uses dummy variable regressions in panel models with appropriate settings, in which the primary explanatory variables indicate whether a city is located within one of the three core urban agglomerations. The regression results directly reflect the characteristics of differentiation in the real estate market between core and non-core urban agglomerations.

The existing econometric models of housing prices in China, such as those of Li (2014), Liu and Zhang (2018), Chen (2018), and Lan and Wu (2018), are meaningful and important. They state that housing prices are related to the gross domestic product (GDP), the industrial structure, urban population, urban area, and the level of urban public services (i.e., education, health care, and transportation). They also offer a method that we can use to study regional heterogeneity in housing price fluctuation: adding dummy variables to the model. Most existing models construct dummy variables for cities in eastern, central, and western China or for first-, second-, and third-tier cities. In this study, we also use dummy variables for cities in eastern, central, and western China but add variables for urban agglomerations. Doing so not only improves the accuracy of the model but also has great practical significance, because a model with a division into first-, second-, and third-tier cities does not reflect the current market well. In contrast, our model is better at reflecting differences in housing price fluctuations because we examine the most essential question: is the city located within a core urban agglomeration?

Therefore, this study constructs the following empirical model:

$${HP}_{it}=F({CSQ}_{it},{GDP}_{it},{SI}_{it},{LA}_{it},{BD}_{it},{NR}_{it})$$
(1)
$${HP}_{it}=\frac{({hp}_{it}-{hp}_{i\left(t-1\right)})}{{hp}_{i\left(t-1\right)}}$$
(2)

where hp is the absolute housing price; HP is the housing price index, which is essentially the growth rate of the absolute housing price; i indexes the city; t indexes the year; CSQ is a dummy variable indicating whether a city is located within one of the three core urban agglomerations; GDP is the gross domestic product of the urban area; SI is the industrial structure; LA is the urban land area; BD measures the level of urban public services; and NR is the size of the urban population (see Table 1).
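As a quick illustration of Eq. (2), the sketch below computes year-on-year growth rates from a series of price levels. The function name `growth_rates` and the sample prices are hypothetical and are not taken from the study's data.

```python
def growth_rates(prices):
    """Return HP_t = (hp_t - hp_{t-1}) / hp_{t-1} for each consecutive pair."""
    if any(p <= 0 for p in prices[:-1]):
        raise ValueError("previous-period prices must be positive")
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


# Hypothetical price levels for one city over three consecutive years.
example_prices = [10000.0, 11000.0, 12100.0]
example_growth = growth_rates(example_prices)
```

The first year of any series is lost in this calculation because no previous-year price is available for it.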

Table 1 Descriptive statistics of variables of housing price increase in three core urban agglomerations

Models without regional distinctions

In this section, we present four empirical models, which do not include the regional divisions into eastern, central, and western China.

Model 1 is a pooled estimation model with the primary explanatory variable CSQi:

$$ \ln HP_{it} = \beta_{0} + \beta_{1} CSQ_{i} + \beta_{2} \ln GDP_{it} + \beta_{3} \ln SI_{it} + \beta_{4} \ln LA_{it} + \beta_{5} \ln BD_{it} + \beta_{6} \ln NR_{it} + \varepsilon_{it} $$
(3)

where:

$$ CSQ_{i} = \begin{cases} 1, & \text{if city } i \text{ is located in one of the three core urban agglomerations,} \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1,2,3, \ldots ,70 $$
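To make the role of the dummy concrete, the following sketch codes CSQ for a handful of cities and fits a regression on a constant and the dummy alone. In that stripped-down case the OLS coefficient on the dummy equals the difference between the two group means; in Model 1, which also includes controls, the coefficient is instead a difference conditional on those controls. The city set contains only the core cities named in the introduction, the non-core cities are illustrative picks, and the growth figures are made up.

```python
# Core cities named in the text; the study's full membership lists
# (nine, four, and five cities) are not reproduced here.
CORE_CITIES = {
    "Shanghai", "Hangzhou", "Nanjing",   # Yangtze River Delta (CSJ)
    "Guangzhou", "Shenzhen",             # Pearl River Delta (ZSJ)
    "Beijing", "Tianjin",                # Beijing-Tianjin-Hebei (JJJ)
}


def csq(city):
    """CSQ_i = 1 if the city is in a core urban agglomeration, else 0."""
    return 1 if city in CORE_CITIES else 0


def ols_constant_and_dummy(y, d):
    """OLS of y on a constant and one 0/1 dummy; returns (beta0, beta1)."""
    n = len(y)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    beta1 = sxy / sxx
    return y_bar - beta1 * d_bar, beta1


# Made-up growth rates, purely illustrative.
cities = ["Shanghai", "Shenzhen", "Beijing", "Chengdu", "Kunming", "Harbin"]
growth = [0.12, 0.16, 0.11, 0.06, 0.04, 0.05]
dummy = [csq(c) for c in cities]
beta0, beta1 = ols_constant_and_dummy(growth, dummy)
```

Here `beta0` is the mean growth of the non-core group and `beta1` is the gap between the core and non-core means.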

Model 2 includes both individual and time random effects with the primary explanatory variable CSQi, because the Hausman test results suggest the use of random effects (RE) rather than fixed effects (FE). The Hausman test is a standard tool for choosing between FE and RE models in panel regression. Some articles use the FE model directly without a Hausman test, because the FE estimator is consistent regardless of whether the explanatory variables are correlated with the individual effects. However, the RE estimator is more efficient if none of the explanatory variables is correlated with the individual effects. Therefore, although the FE model may be more commonly seen, we follow the result of the Hausman test and select the RE model in this study.
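For reference, the Hausman statistic in its textbook form compares the FE and RE estimates of the coefficients on the time-varying regressors. The notation below is ours and is not taken from the study, which may have computed a different variant:

$$ H = \left( \hat{\beta}_{FE} - \hat{\beta}_{RE} \right)^{\prime} \left[ \widehat{\mathrm{Var}}\left( \hat{\beta}_{FE} \right) - \widehat{\mathrm{Var}}\left( \hat{\beta}_{RE} \right) \right]^{-1} \left( \hat{\beta}_{FE} - \hat{\beta}_{RE} \right) $$

Under the null hypothesis that the individual effects are uncorrelated with the regressors, H is asymptotically chi-squared distributed with degrees of freedom equal to the number of coefficients compared. A small value of H (a large p-value) fails to reject the null and supports the RE model.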

In contrast, Model 3 is a pooled estimation model with the primary explanatory variables CSJi, ZSJi, and JJJi for the three core urban agglomerations: the Yangtze River Delta urban agglomeration (i.e., CHANGSANJIAO, CSJ), with nine cities; the Pearl River Delta urban agglomeration (i.e., ZHUSANJIAO, ZSJ), with four cities; and the Beijing-Tianjin-Hebei urban agglomeration (i.e., JINGJINJI, JJJ), with five cities.

$$ \ln HP_{it} = \beta_{0} + \beta_{1} CSJ_{i} + \beta_{2} ZSJ_{i} + \beta_{3} JJJ_{i} + \beta_{4} \ln GDP_{it} + \beta_{5} \ln SI_{it} + \beta_{6} \ln LA_{it} + \beta_{7} \ln BD_{it} + \beta_{8} \ln NR_{it} + \varepsilon_{it} $$
(4)

where:

$$ CSJ_{i} = \begin{cases} 1, & \text{if city } i \text{ is located in the Yangtze River Delta urban agglomeration,} \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1,2,3, \ldots ,70 $$
$$ ZSJ_{i} = \begin{cases} 1, & \text{if city } i \text{ is located in the Pearl River Delta urban agglomeration,} \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1,2,3, \ldots ,70 $$
$$ JJJ_{i} = \begin{cases} 1, & \text{if city } i \text{ is located in the Beijing-Tianjin-Hebei urban agglomeration,} \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1,2,3, \ldots ,70 $$

Here, t is the year (t = 2005, 2006, 2007, …, 2016), i is the city (i = 1, 2, 3, …, 70), SI is the share of industry, LA is the urban land area, BD is the level of urban public services, and NR is the size of the urban population. β0 is a constant, β1, β2, …, β8 are coefficients, δt is the time effect, and εit is a random error term.

In addition, Model 4 is an individual and time random effect model with the primary explanatory variables CSJi, ZSJi, and JJJi.

Models with regional distinctions

The four models in this section distinguish between eastern, central, and western China as follows:

As in the previous models, Model 5 is a pooled estimation model with the primary explanatory variable CSQi.

$$ \ln HP_{it} = \beta_{0} + \beta_{1} CSQ_{i} + \beta_{2} EAST_{i} + \beta_{3} CENTRAL_{i} + \beta_{4} \ln GDP_{it} + \beta_{5} \ln SI_{it} + \beta_{6} \ln LA_{it} + \beta_{7} \ln BD_{it} + \beta_{8} \ln NR_{it} + \varepsilon_{it} $$
(5)

Other settings are the same as in the models without regional distinctions, except that we add two regional dummy variables indicating whether a city is located in eastern China or in central China, with western China as the reference group. The dummy variables are defined as follows:

$$ EAST_{i} = \left\{ {\begin{array}{*{20}c} {1, {\text{if it is located in eastern China}}, i = 1,2,3, \ldots ,70} \\ {0, {\text{otherwise}}} \\ \end{array} } \right. $$
$$ CENTRAL_{i} = \left\{ {\begin{array}{*{20}c} {1, {\text{if it is located in central China}}, i = 1,2,3, \ldots ,70} \\ {0, {\text{otherwise}}} \\ \end{array} } \right. $$

Note that no dummy variable for western China is included: it is the omitted reference category, which avoids perfect collinearity with the constant term.
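The reason for leaving out the third regional dummy can be checked directly. The sketch below uses hypothetical region labels for six cities and shows that the three dummies together add up to the constant column, which is the exact linear dependence that omitting one category removes.

```python
# Hypothetical region labels for six cities; not the study's data.
regions = ["east", "east", "central", "west", "central", "west"]

east = [1 if r == "east" else 0 for r in regions]
central = [1 if r == "central" else 0 for r in regions]
west = [1 if r == "west" else 0 for r in regions]
constant = [1] * len(regions)

# With all three dummies, EAST + CENTRAL + WEST reproduces the constant
# column exactly, so the design matrix would be perfectly collinear.
trap = [e + c + w for e, c, w in zip(east, central, west)] == constant

# Without WEST, the two remaining dummies no longer sum to the constant
# in every row, so this particular dependence disappears.
safe = [e + c for e, c in zip(east, central)] != constant
```

With the western dummy omitted, the coefficients on EAST and CENTRAL are read as differences relative to western cities.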

Model 6 is an individual and time random effects model with the primary explanatory variable CSQi.

Model 7 is a pooled estimation model with the primary explanatory variables CSJi, ZSJi, and JJJi.

$$ \ln HP_{it} = \beta_{0} + \beta_{1} CSJ_{i} + \beta_{2} ZSJ_{i} + \beta_{3} JJJ_{i} + \beta_{4} EAST_{i} + \beta_{5} CENTRAL_{i} + \beta_{6} \ln GDP_{it} + \beta_{7} \ln SI_{it} + \beta_{8} \ln LA_{it} + \beta_{9} \ln BD_{it} + \beta_{10} \ln NR_{it} + \varepsilon_{it} $$
(6)

Finally, Model 8 is an individual and time random effects model with the primary explanatory variables CSJi, ZSJi, and JJJi.
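The eight specifications differ along three dimensions: the estimator, the agglomeration dummies, and the presence of regional dummies. The sketch below enumerates them in the order used in the text. The column labels such as `ln_GDP` and the estimator tags are hypothetical names chosen for illustration, not identifiers from the study.

```python
CONTROLS = ["ln_GDP", "ln_SI", "ln_LA", "ln_BD", "ln_NR"]
REGION_DUMMIES = ["EAST", "CENTRAL"]  # WEST is the omitted reference group


def build_models():
    """Return a dict mapping model number (1-8) to its specification."""
    models = {}
    number = 1
    for regional in (False, True):
        for agglomeration in (["CSQ"], ["CSJ", "ZSJ", "JJJ"]):
            for estimator in ("pooled", "random_effects"):
                regressors = list(agglomeration)
                if regional:
                    regressors += REGION_DUMMIES
                regressors += CONTROLS
                models[number] = {
                    "estimator": estimator,
                    "regressors": regressors,
                }
                number += 1
    return models


MODELS = build_models()
```

For example, `MODELS[7]["regressors"]` lists the ten right-hand-side variables of Eq. (6) in the same order as the equation.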

Data

Using panel data on 70 large and midsize cities in China from 2005 to 2016, we analyze the differences in price increases between core and non-core urban agglomerations. These 70 large and midsize cities are the most important and representative cities for economic development and urbanization in China. The National Bureau of Statistics regularly publishes a wide range of economic data for these 70 cities, so they provide a reliable basis for studying China's economic problems. Moreover, the development of commercial housing in China formally began in 2003. Because of the lack of data for 2003, 2004, 2017, and 2018, we use 2005–2016 as the sample period. This 12-year period covers the main stages in the development of commercial housing in China and illustrates the trends in housing prices across the urban agglomerations. During our sample period, macro factors such as interest rates in China are relatively stable (Footnote 3). In addition, we use instrumental variables (IVs) in the empirical analysis, which helps address possible endogeneity problems caused by omitted variables.

As mentioned earlier, the three core urban agglomerations are the Yangtze River Delta urban agglomeration (CSJ), the Pearl River Delta urban agglomeration (ZSJ), and the Beijing-Tianjin-Hebei urban agglomeration (JJJ). In this study, the dependent variable HP is expressed by the sales price index of newly built commercial housing. The housing price index comprehensively reflects the general trend in variations in commercial housing prices and the scale of the changes. The primary explanatory variables are the dummy variable (CSQ) for whether a city is located in one of the three major urban agglomerations and the three dummy variables (CSJ, ZSJ, and JJJ) for which urban agglomeration it is in. The control variables in this study are selected as follows. The first is the economic development level, which is positively related to housing prices. We measure the level of economic development by GDP in each area and by the industrial structure (SI), proxied by the proportion of secondary industry in GDP. The second is the population factor, which is the most basic and important factor affecting demand for housing in the long run. We measure it by the non-rural population (NR) at the end of the year, and the relationship between the non-rural population and urban housing prices is expected to be positive. Third, we include the urban land area, measured by the log of the built-up area, to reflect the potential supply of housing, which further affects housing prices. Following Wang (2011), we add a fourth control variable, the level of public service. Specifically, we use the number of buses for every 10,000 people as a proxy for the level of public transport, which is the most important factor reflecting the current level of public service.

Regional differences also have an impact on housing prices. Thus, we use the dummy variables EAST (located in eastern China) and CENTRAL (located in central China), with cities in western China as the control group.

The control variables in this study use annual data, but the sales price index of newly built commercial housing compiled by the National Statistical Bureau is monthly. To ensure consistency across the data, we calculate the annual sales price index of newly built commercial housing from the monthly growth in the monthly data. In addition, to eliminate or at least reduce heteroskedasticity, all control and explanatory variables except the dummy variables are log-transformed.
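One plausible way to annualize a monthly index is to chain the month-on-month values. The sketch below assumes that each monthly figure is an index with the previous month set to 100, which the text does not state explicitly, so it should be read as an illustration rather than the exact procedure used here. It also shows the log transformation applied to a positive control variable. All numbers are made up.

```python
import math


def annual_growth_from_monthly(mom_indices):
    """Chain month-on-month indices (previous month = 100) into one
    cumulative growth rate over the months supplied."""
    factor = 1.0
    for idx in mom_indices:
        factor *= idx / 100.0
    return factor - 1.0


# Hypothetical year in which prices rise by 1% every month.
monthly = [101.0] * 12
annual = annual_growth_from_monthly(monthly)

# Log transformation of a positive control variable (value is made up).
ln_gdp = math.log(2500.0)
```

Chaining compounds the monthly changes, so twelve monthly increases of 1% give an annual increase of about 12.7% rather than 12%.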

The descriptive statistics of the data are in Table 1.

Empirical results and analysis

Total sample regression results of pooled estimation

Table 2 shows that in the pooled estimation, the coefficient of the provincial capital (i.e., SHENGHUI) is insignificant in the OLS model, but it is almost significant when we introduce IVs to address endogeneity problems. The coefficient of the merged urban agglomerations (i.e., DUSHIQUAN, DSQ) is significant in the OLS model, but it is almost insignificant after adding IVs to the pooled estimation.

Table 2 Empirical estimation results of pooled estimation corresponding to Provincial capital (SHENGHUI) and Merging Urban Agglomerations (DUSHIQUAN) with Dependent Variable NEWHOUING_LN

Then we subdivide DSQ into the three specific core urban agglomerations: the Yangtze River Delta urban agglomeration (CSJ), the Pearl River Delta urban agglomeration (ZSJ), and the Beijing-Tianjin-Hebei urban agglomeration (JJJ). In the pooled estimation, the coefficient of CSJ is significant in the OLS model, but that of JJJ is not. After introducing IVs to address endogeneity, JJJ is still not significant, but ZSJ is significant. Moreover, the results of the endogeneity test indicate that all coefficients of the IV groups that include ALTITUDE_LN are significant. Thus, ALTITUDE_LN appears to be a suitable IV for this study.

Potential endogeneity is present in this study for several reasons. As is commonly seen in urban studies, housing prices are usually correlated with the scale of the economy, i.e., GDP. However, because of omitted variables and related problems, GDP is prone to endogeneity. Hence, in this study, the variable GDP_LN is treated as an endogenous variable, which is confirmed by the difference-in-J-statistics test shown in Tables 2 and 3.

Table 3 Empirical estimation results of pooled estimation corresponding to Provincial capital (SHENGHUI) and the Three Urban Agglomerations with Dependent Variable NEWHOUING_LN

Although the endogeneity of GDP is easy to understand theoretically, finding a good IV for GDP is difficult because almost every economic activity is correlated with GDP. However, we identified a promising candidate. The basic topography of mainland China shows low altitude in the east and high altitude in the west, and the economy of the low-altitude area is more developed than that of the high-altitude area. Therefore, we use altitude as the IV: it is correlated with GDP but, being a natural feature, is plausibly uncorrelated with the error term. In practice, ALTITUDE_LN performs well as the IV. As shown in Tables 2 and 3, it not only passes the endogeneity test and the corresponding weak instrument diagnostics but also yields better regression results.
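
The logic of the instrument can be illustrated with a small simulation. The sketch below uses purely synthetic data and hypothetical parameter values (it does not use the study's data or its full specification). It relies on the standard result that, with one endogenous regressor, one instrument, and an intercept, the two-stage least squares slope equals cov(z, y) / cov(z, x). The variable names `altitude`, `gdp`, and `price` are stand-ins for ALTITUDE_LN, GDP_LN, and the log price index.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

random.seed(0)
n = 20000
TRUE_BETA = 0.5  # hypothetical true effect of log GDP on log housing prices

altitude, gdp, price = [], [], []
for _ in range(n):
    z = random.gauss(0, 1)  # synthetic stand-in for ALTITUDE_LN
    u = random.gauss(0, 1)  # unobserved factor driving both GDP and prices
    x = -0.8 * z + u + random.gauss(0, 0.5)     # GDP falls with altitude
    y = TRUE_BETA * x + u + random.gauss(0, 1)  # u ends up in the error term
    altitude.append(z)
    gdp.append(x)
    price.append(y)

# OLS slope: biased upward because gdp is correlated with the omitted factor u.
beta_ols = cov(gdp, price) / cov(gdp, gdp)

# IV slope for the just-identified case: uses only altitude-driven variation.
beta_iv = cov(altitude, price) / cov(altitude, gdp)
```

In this setup the OLS estimate absorbs the omitted factor and overstates the effect, while the IV estimate stays close to `TRUE_BETA`, provided the instrument is relevant and unrelated to the error term.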

Regression and analysis of panel data with random effects

After supplementing the housing price index data and allowing for random effects, we rerun the regressions on the panel data. The results are reported in Tables 4 and 5.

Table 4 Empirical estimation results corresponding to Provincial capital (SHENGHUI) and Merging Urban Agglomerations (DUSHIQUAN) with Dependent Variable NEWHOUING_LN
Table 5 Empirical estimation results corresponding to Provincial capital (SHENGHUI) and the Three Urban Agglomerations with Dependent Variable NEWHOUING_LN

The regression results of the panel data with IVs are similar to those found earlier. After introducing IVs to address endogeneity, the coefficient of DSQ is almost insignificant, and among the three core urban agglomerations, only the coefficient of ZSJ is significant.

Robustness test

Test 1: Regional differences

Considering the substantial regional differences in economic development in China, we introduce dummy variables for eastern, central, and western China to check the robustness of the panel data.

In Tables 6 and 7, after these dummy variables are added, the coefficient of DSQ is insignificant.

Table 6 Empirical estimation results corresponding to Provincial capital (SHENGHUI), Merging Urban Agglomerations (DUSHIQUAN), and East & Middle with Dependent Variable NEWHOUING_LN
Table 7 Empirical estimation results corresponding to Provincial capital (SHENGHUI), Merging Urban Agglomerations (DUSHIQUAN), and Middle & West with Dependent Variable NEWHOUING_LN

Then we divide DSQ into the three core urban agglomerations in Tables 8 and 9, where the coefficient of ZSJ is significant and the coefficient of CENTRAL is nearly significant.

In the panel data regression, the coefficient of SECOND_INDUS_LN (i.e., the logarithm of the share of secondary industry in GDP, which captures industrial structure) is always significant and negative, which means that housing prices rise more slowly in cities focused on manufacturing. In contrast, in cities with more developed tertiary industries, housing prices tend to increase more rapidly.Footnote 4 The coefficient of LAND_AREA_LN (i.e., the logarithm of the urban land area) is negative and nearly significant. Thus, land area is a limiting condition for real estate: when a city has less land area, housing prices face greater upward pressure because of supply–demand principles. The coefficient of GDP_LN is positive and significant, which is consistent with most studies and shows that economic scale is relevant to housing price increases. In addition, the coefficient of NON_RURAL_LN (i.e., the logarithm of the size of the urban population) is always insignificant.

Table 8 Empirical estimation results corresponding to Provincial capital (SHENGHUI), the Three Urban Agglomerations, and East & Middle with Dependent Variable NEWHOUING_LN
Table 9 Empirical estimation results corresponding to Provincial capital (SHENGHUI), the Three Urban Agglomerations, and Middle & West with Dependent Variable NEWHOUING_LN

Test 2: Time difference (2015–2016)

Because housing prices in many Chinese cities increased rapidly over the period 2015–2016, we are particularly interested in whether our estimation results remain valid for this period.Footnote 5 We therefore conduct a robustness test by rerunning the models on the panel data from 2015 to 2016.

As shown in Tables 10 and 11, from 2015 to 2016, the coefficients of DSQ are almost insignificant. Then we conduct the regression after dividing DSQ into the three core urban agglomerations.

Table 10 Empirical estimation results corresponding to Provincial capital (SHENGHUI), Merging Urban Agglomerations (DUSHIQUAN), and East & Middle with Dependent Variable NEWHOUING_LN (2015–2016)
Table 11 Empirical estimation results corresponding to Provincial capital (SHENGHUI), Merging Urban Agglomerations (DUSHIQUAN), and Middle & West with Dependent Variable NEWHOUING_LN (2015–2016)

As seen in Tables 12 and 13, from 2015 to 2016, the coefficient of ZSJ is almost significant and the coefficient of JJJ is still insignificant. The coefficient of LAND_AREA_LN is still nearly significant. However, GDP is less significant than before, and secondary industry is insignificant. This suggests that the boom in the real estate market from 2015 to 2016 was driven mostly by capital from investment and speculation. In 2015, the real estate market surged in Shenzhen, as well as in some “hot” coastal cities. In 2016, housing prices in the big cities in coastal urban agglomerations increased rapidly, a trend that spread from east to west. Therefore, for 2015 to 2016, the estimated coefficients of GDP and the proportion of secondary industry do not fully reflect the impact of these variables on housing prices. This result reveals that in many cities in China, the housing market deviates from the fundamentals of an urban economy to some extent.

Table 12 Empirical estimation results corresponding to Provincial capital (SHENGHUI), the Three Urban Agglomerations, and East & Middle with Dependent Variable NEWHOUING_LN (2015–2016)
Table 13 Empirical estimation results corresponding to Provincial capital (SHENGHUI), the Three Urban Agglomerations, and Middle & West with Dependent Variable NEWHOUING_LN (2015–2016)

Indeed, as a fast-developing emerging market, the Chinese housing market is subject to many factors, and speculation is one of the contributors to the deviation of housing prices from fundamentals. Several researchers have discussed speculation in the Chinese housing market, including Lai et al. (2020), Chen and Wang (2020), and Chen and Wang (2022), to cite just a few. Of course, a single model cannot control for everything.

Discussion

As shown earlier, in the pooled estimation with IVs to address endogeneity, DSQ is almost insignificant. When we divide DSQ into the three core urban agglomerations, only the coefficient of ZSJ is significant. The regression results of the panel data confirm these conclusions. When we consider regional differences, the coefficient of DSQ is insignificant. However, ZSJ is still significant, and JJJ is always insignificant. This is consistent with Cai and Man (2016), who find that Beijing had no significant driving effect on housing prices in the surrounding cities. The coefficient of CENTRAL is almost significant, except from 2015 to 2016.

In all panel data models, the coefficient of SECOND_INDUS_LN is always significant and negative. This result confirms that housing prices increase more quickly in cities with a higher ratio of tertiary industry. The coefficient of LAND_AREA_LN is negative and nearly significant, which mirrors the general belief that land resources are limited. The results for GDP_LN are consistent with the view in most studies that economic scale has a significant impact on housing prices. Finally, NON_RURAL_LN is always insignificant, which differs from our expectations.

The empirical results indicate that housing prices in the three core urban agglomerations increase more rapidly, especially in ZSJ, due in part to plans for building the Guangdong-Hong Kong-Macao Greater Bay Area. Housing prices increase faster in core cities than in non-core cities. In addition, housing prices grow faster in small and midsize cities in core urban agglomerations than in those in non-core urban agglomerations. Therefore, serious internal differentiation exists among cities in China, especially small and midsize cities.

The reasons for this internal differentiation are as follows. First, in this period the large core cities in core urban agglomerations were subject to strict regulation of the housing market, so purchasing power spilled over to other cities. At the same time, small and midsize cities around the core cities benefited from the policy of reducing real estate inventory and from regulations that loosened the real estate market. On the one hand, because housing prices increased sharply in the core cities in 2016, people with rigid demand could not afford to buy there and moved to the surrounding cities. On the other hand, local rigid demand in these surrounding cities also rose, which drove the development of the local housing market. In addition, the real estate markets in these small and midsize cities attracted much investment because of factors such as relatively low housing prices, high growth in housing prices, the positive spillover effect of the core cities, and the relaxed purchasing policy (i.e., no household registration or social insurance restrictions). Thus, housing prices in these small and midsize cities were driven higher.

Second, ongoing urbanization has accelerated housing renovation in rundown areas, and monetary compensation for housing renovation has recently become another dominant factor in many small and midsize cities. The demand for housing by people who received this compensation increased rapidly in many third- and fourth-tier cities. According to the data, a total of 18 million units were renovated across the country over the three years from 2015 to 2017. In 2018, 60% of the households affected by housing renovation in rundown urban areas chose monetary compensation. This compensation enabled some people with rigid demand for housing to purchase new housing units, while others invested in real estate in the small and midsize cities around the core cities. Of course, this artificial demand is temporary and unsustainable. After this demand is satisfied, excess housing will remain in these cities, which is another problem.

What is the most important factor affecting the development of small and midsize cities? Is it a better location? Based on our empirical analysis, this study confirms that small and midsize cities in core urban agglomerations have greater development potential than those in non-core urban agglomerations. However, geographic location is not the only important factor. In the long run, the three important factors affecting the development of a city's real estate market are industry, transportation, and population, which are known as the “fundamentals.”

The regional heterogeneity of housing price fluctuation in China is due to differences in population agglomeration caused by unbalanced regional economic development. These differences produce very different degrees of mismatch between housing supply and demand, which is ultimately reflected in regional differentiation in the real estate market. The imbalance in regional development is reflected in an unbalanced allocation of resources. Population mobility and migration depend on the agglomeration of resources, such as employment opportunities, health care, education and culture, and public services. The higher the degree of urban resource agglomeration, the more a city can attract an agglomeration of population and wealth, which increases its economic vitality and drives development in its real estate market.

The differentiation among small and midsize cities leads us to divide them into three types according to the level of real estate development. The first type has a real estate market that is developing well and consists of the small and midsize cities surrounding the core cities in core urban agglomerations. These cities have the advantage of a unique geographic location and complete basic services. At the same time, these cities benefit from various preferential investment policies and industry support policies that enable industrial transfer from the core cities and attract large population inflows. The second type has a real estate market with development potential and consists of small and midsize cities with a better economic foundation. Because they have a sound industrial system and a developed transportation network, these cities can continuously attract an inflow of skilled workers, funds, and resources, which leads to innovative achievements. The third type has a poor outlook for real estate development and comprises small and midsize cities with excess housing stock, which describes a high proportion of cities in China. The real estate market in these cities benefited from recent increases in housing prices, but in the long run, these cities will feel pressure from this excessive inventory. These cities have fewer employment opportunities than other cities because of their lagging industrial development and inconvenient transportation. Thus, these cities have difficulty attracting people and even experience an outflow of local skilled workers, who depart for better opportunities.

Our summary of the implications and insights here is twofold. First, in the long run, the serious differentiation in the real estate market across cities will gradually affect households' investment expectations and consumption habits and may even widen the wage gap among cities in different urban agglomerations, which will seriously distort the rational allocation of social resources (e.g., technology, financial resources, and labor) and hence aggravate the gap between the rich and the poor. Second, the significant difference in housing prices leads many people to regard commercial housing in the big core cities of the primary urban agglomerations, such as Beijing, Shanghai, Guangzhou, and Shenzhen, as an investment. Over time, housing prices may differ hugely between cities in core urban agglomerations (both the core cities and the surrounding small and midsize cities) and small and midsize cities in non-core urban agglomerations. Thus, total factor productivity (TFP) may be reduced in the cities or provinces in core urban agglomerations, which is harmful to normal growth in the regional economy.

Conclusions

Closing remark

This study investigates urban housing prices in China. Instead of using the traditional categorization of cities into tiers 1, 2, 3, and 4, this study sheds some light on the topic from the perspective of urban agglomerations. We find that housing prices grow more rapidly in cities in the Pearl River Delta urban agglomeration. In this respect, the study offers a perspective that may be of interest to researchers and policymakers.

Based on the sales price index for new commercial housing in 70 large and midsize cities in China from 2005 to 2016 and data on variables for urban characteristics, this study uses panel data models with appropriate settings to explore variation in housing price increases between core urban agglomerations and non-core urban agglomerations in China. IVs are used to address the endogeneity problem. Our results lead to some interesting conclusions.

First, the core urban agglomerations have a significant impact on increases in housing prices. Housing prices grow more quickly in small and midsize cities in core urban agglomerations,Footnote 6 under positive influence from the core cities. The Pearl River Delta urban agglomeration has a significant impact on housing prices, while the Beijing-Tianjin-Hebei urban agglomeration does not. This suggests greater economic development in the Pearl River Delta urban agglomeration, where the mega core cities (including Hong Kong) have a positive spillover impact on surrounding small and midsize cities. In contrast, the Beijing-Tianjin-Hebei urban agglomeration is still in the process of being developed. Policymakers can tailor policies on coordinating economic growth and stabilizing housing prices to the development stage of each urban agglomeration.

Second, the coefficient of the proportion of secondary industry in GDP suggests that it has a significant negative impact on growth in housing prices. To a certain extent, greater urban industrialization hinders the development of a local real estate market. In addition, CENTRAL has a nearly significant impact on housing prices, which suggests that housing prices grew more quickly in central China than in eastern China. The housing price base in eastern China is so high that its growth rate becomes lower. Housing prices in western China are not found to differ in a statistically significant way from those in central China because they grow rapidly as well. Given the differences between eastern, central, and western China, different industrial structure adjustment policies can be formulated. For example, properly adjusting the proportion of secondary industry in central China may promote the development of the real estate market there.

Third, differentiation among small and midsize cities has been exacerbated. The mega-core cities play a leading role in core urban agglomerations, so the skilled workers, funds, resources, and innovative achievements of mega-core cities rapidly spread to the surrounding small and midsize cities. Thus, in these surrounding cities, housing prices show a stronger upward trend, and the real estate market develops faster. However, in non-core urban agglomerations, small and midsize cities do not benefit from the positive spillover effect of mega-core cities. At the same time, these cities have a weak economic foundation, undeveloped industry, and net population outflow, which slows the development of the real estate market. Therefore, differentiation in the real estate market across different cities is exacerbated. On the one hand, for the surrounding cities affected by the positive spillover effect of core urban agglomerations, purchase and loan restriction policies should be adopted to stabilize housing prices and prevent the “bubble” caused by the rapid rise of housing prices. On the other hand, for the small and midsize cities in non-core urban agglomerations, relatively supportive policies should be implemented to promote the development of the real estate market.

Policy recommendations

Urban agglomerations are at different stages of development, so the level of real estate development differs significantly across them. In light of these differences, policymakers can adopt different policies on coordinating economic growth and stabilizing housing prices in different urban agglomerations.

The results indicate that greater urban industrialization hinders the development of a local real estate market. Given the regional differences between eastern, central, and western China, different industrial structure adjustment policies can be formulated. If policymakers properly adjust the proportion of secondary industry in central China, the development of the real estate market there may be promoted.

For the surrounding cities affected by the positive spillover effect of core urban agglomerations, purchase and loan restriction policies should be adopted to stabilize housing prices and prevent the “bubble” caused by the rapid rise of housing prices.

For the small and midsize cities in non-core urban agglomerations, relatively supportive and relaxed policies should be implemented to promote the development of the real estate market.

Future research directions

The innovation of this study is that it uses urban agglomerations as the division of cities for studying housing price variation. The empirical analysis therefore relies on dummy variables, although continuous variables may be better suited to analyzing the size of the impact, which is a limitation of this study. Future work can conduct more in-depth research and analysis with spatial econometric models. With the shocks of COVID-19 (Wang and Liu 2022; Liu et al. 2022), the development of the housing market in China has become more diverse. Even so, the method and basic results shown in this study remain useful for future studies.