Introduction

Initial public offerings (IPOs) play an important role in financial markets because they open new investment opportunities, redistribute funds’ allocations and attract new investors to the market. An IPO is usually a long-awaited event in the life of a privately held company, both for the current stockholders and the public exchange investors, giving the owners the opportunity to cash in and giving the investors a chance to gain from potential underpricing and future returns. Here, numerous financial studies have addressed various behavioural biases in relation to IPOs: Ljungqvist and Wilhelm Jr (2005) analysed the satisfaction with an IPO underwriter’s performance, Ljungqvist and Wilhelm Jr (2003) indicated a unique pricing behaviour around the dot-com bubble, while Kaustia and Knüpfer (2008) found that investors’ personal experiences and previous IPO returns have a significant impact on future IPO subscriptions. Other studies have analysed IPO investments (Karhunen and Keloharju, 2001), IPO earnings (Spohr, 2004) and IPO underpricing (Keloharju, 1993) in financial markets on an aggregated level.

Financial markets, in turn, are complex systems comprised of financial decisions, information flows and direct and indirect investor interactions. A typical aspect of a financial market is multidimensionality and agent heterogeneity (Lakonishok and Maberly, 1990; Musciotto et al., 2016). Making an investment decision is a complex procedure because it is layered with different choices that are influenced by various market factors, investors’ experiences, wealth and investors’ stage of life. It is crucial to understand the characteristics of the underlying investor behaviour patterns because these, when combined with their behaviours, shape the dynamics of the whole market and thus are important factors in explaining the booms and bubbles in the financial markets (Ranganathan et al., 2018). Because investors seek higher returns, one possibility is to use social networks and other private information channels to follow other investors’ strategies and to exploit privately channelled information in stock markets. Recently, Baltakys et al. (2018a) provided evidence of the negative relationship between distance and trade timing similarity for household investors, indicating that face-to-face communication is still important in financial decision making. According to Ozsoylev et al. (2013), information links can be identified from realised trades because investors who are directly linked in the information network tend to time their transactions similarly. We follow this idea and use observations on investor-level transactions from shareholder registration data to identify the links between investors, here with a special focus on identifying investor clusters. Prior studies have investigated the structures of investor networks in different contexts (Ozsoylev et al., 2013; Tumminello et al., 2012; Gualdi et al., 2016; Musciotto et al., 2018; Ranganathan et al., 2018; Baltakys et al., 2018b), but investor clusters around IPOs have barely been explored.

We address this research gap by performing a broad multistock exploratory analysis of investor clusters over 69 stocks in the first two years after their IPOs. In particular, we seek to establish whether the identified investor clusters are persistent over the first two years after the IPOs and appear across multiple IPO securities, as well as in existing, mature stocks in the market. Our analysis unveils statistically robust investor clusters that form simultaneously in various securities and that persist over time.

Most of the earlier papers perform analyses on an aggregated category level (Karhunen and Keloharju, 2001; Grinblatt and Keloharju, 2001; Lillo et al., 2015; Siikanen et al., 2018) or concentrate on a single highly liquid stock (Tumminello et al., 2012; Musciotto et al., 2018). Even though earlier studies might have included nearly all market participants (Tumminello et al., 2011a; Musciotto et al., 2018), their focus on a single, most liquid security meant that the results were insufficient to conclude what strategies investors employ when trading over multiple securities. In contrast to previous research in the IPO literature, the current study is the first to examine early-stage trading behaviour patterns at the level of individual investor accounts. Moreover, in contrast to the existing research on investor networks, the current paper does not focus on heavily capitalised stocks; instead, we analyse collective investor trading strategies that emerge after IPOs on the Helsinki Stock Exchange (HSE).

With the growing amounts of data and the availability of new datasets, the network theory has become a popular approach in analysing financial complex systems (e.g., Emmert-Streib et al., 2018). Notwithstanding the high interest in the market structure, investor networks and the complexity of investor behavioural interrelationships remain weakly explored. Indeed, high precision financial investor-level datasets covering years of historical data and containing information about the social links are very rare and expensive because of their sensitive nature. Moreover, transactional data often have no explicit or implicit links between investors. As a consequence, the network inference methodologies have gained much interest in recent research (Ozsoylev et al., 2013; Gualdi et al., 2016). Similar to Musciotto et al. (2018), we use the statistical validation method proposed by Tumminello et al. (2011a), which best suits our objectives and the available dataset.

In the current paper, we infer investor networks based on the investors’ trading co-occurrences for 69 securities that had their IPOs between the years 1995 and 2007, and we obtain multilink networks covering two years after their IPOs. Further, by applying the Infomap algorithm (Rosvall and Bergstrom, 2008) on the investor networks, we obtain clusters of investors that share high trade-timing synchronisation. With the obtained network partitioned into clusters, we detect statistically robust clusters that persist in the networks between the first and the second years after the IPO. We also find clusters that form and re-occur over multiple securities. Finally, by cross-validating investor clusters on IPO securities with the investor clusters of more mature stocks, we conclude that the phenomenon of persistent clusters observed in earlier studies (see e.g. Musciotto et al., 2018) is not limited to mature companies but is also observable in young securities during the first years after their IPO.

Dataset and methodology

Dataset

In this paper, we use a unique database provided by Euroclear Finland. The dataset contains all transactions executed in the HSE by shareholders of Finnish stocks between 1995 and 2009, recorded on a daily basis. The data records represent the official certificates of ownership and include all the transactions executed in the HSE that change the ownership of assets. Each transaction in the dataset has a rich set of attributes (such as investor sector code, investor birth year, gender and postal code) that we make use of in our analysis to identify and characterise the investor groups. The dataset classifies investors into six main categories: households; nonfinancial corporations; financial and insurance corporations; government; nonprofit institutions; and the rest of the world. Each Finnish domestic investor is identified by a separate account ID, while foreign investors can choose nominee registration for their trades. However, the analysis cannot be conducted for nominee-registered transactions because individual nominee investors cannot be uniquely identified. Rather, the nominee investors are pooled together under the custodian’s nominee trading account. Therefore, a single nominee-registered investor’s account holdings may correspond to a large aggregated ownership of several foreign investors. To avoid inconsistencies in the results, we eliminated nominee transactions from our analysis. This dataset has also been analysed and described in previous research (e.g., Ilmanen and Keloharju, 1999; Baltakys et al., 2018a, 2018b; Ranganathan et al., 2018; Siikanen et al., 2018).

The analysed data are restricted to marketplace transactions for securities that had their IPO listing in the HSE between 1995 and 2009. The official listing dates were provided by NASDAQ OMX Nordic explicitly for the current research. We analyse 69 stocks in total (see Footnotes 1 and 2) that were listed in Finland on the Main Exchange or First North in the given time period (Table 1). Some companies (e.g. Oriola) have two share classes with different voting rights. Class A shares give the owner more voting rights than Class B shares and hence potentially fall under a separate group of investors. Therefore, comparing the share classes or directly substituting one for another seems improper, and we consider the securities with different voting classes as separate stocks.

Table 1 Summary of IPO stocks

Table 2 gives the number of investors, the number of transactions and the traded volume for the entire set of 69 IPO stocks. The total number of investors who traded an IPO security is 570,039, and the total number of transactions is 76,505,089. The table also shows the number of nominee and non-nominee-registered investors. As shown, a few nominee accounts perform roughly twice as many trades as the non-nominee accounts.

Table 2 Summary of the number of investors, absolute exchanged shares volume and the number of transactions

Methodology

The given dataset is composed of transaction data where investors’ social links are not explicitly given, nor can they be directly obtained from other sources because of data anonymisation. However, given that investors must individually react and adapt to a quickly changing environment, they should identify and follow the best trading strategies. To detect investors with similar trading strategies or, more precisely, trade timing similarity, we take a look at the pairwise investors’ trading co-occurrences. In the current paper, we use a statistically validated network (SVN) method first introduced by Tumminello et al. (2011a). This method, briefly presented below, has been demonstrated to be effective in investigating financial, biological and social systems (Tumminello et al., 2011a, 2012).

To compare the trading position taken by an investor on a given day, irrespective of the absolute volume traded, a categorical variable is introduced that describes the investor’s trading activity. For each investor i and each trading day t, with \(V_{\mathrm{s}}(i,t)\) the volume of a security sold and \(V_{\mathrm{b}}(i,t)\) the volume of a security bought, we calculate the scaled net volume ratio as follows:

$$r(i,t) = \frac{{V_{\mathrm {b}}(i,t) - V_{\mathrm {s}}(i,t)}}{{V_{\mathrm {b}}(i,t) + V_{\mathrm {s}}(i,t)}}$$
(1)

Then, a daily trading state can be assigned for an investor after having selected a threshold θ, as follows:

$$\left\{ \begin{array}{ll} b\ (\text{primarily buying state}), & \text{when } r(i,t) > \theta \\ s\ (\text{primarily selling state}), & \text{when } r(i,t) < -\theta \\ bs\ (\text{buying and selling state}), & \text{when } -\theta \le r(i,t) \le \theta \end{array} \right.$$

Note that r(i, t) is not defined for a day t with no trading activity, and therefore no trading state is assigned on such days. In our analysis, much like in Musciotto et al. (2016), we set θ = 0.25. We have verified that the calculations are not sensitive to θ selection: the results do not vary significantly for the θ threshold ranging from 0.01 to 0.25. With this categorisation, the system can be mapped into a bipartite network in which one set of nodes is composed of investors and the other of trading days.
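The categorisation above can be illustrated with a minimal Python sketch. The function names `net_volume_ratio` and `trading_state` are hypothetical and chosen for illustration only; the sketch assumes that the daily bought and sold volumes of one investor in one security are already aggregated.

```python
from typing import Optional

THETA = 0.25  # threshold value used in the text


def net_volume_ratio(v_buy: float, v_sell: float) -> Optional[float]:
    """Scaled net volume ratio r(i, t) of Eq. (1); None if nothing was traded."""
    total = v_buy + v_sell
    if total == 0:
        return None  # r(i, t) is undefined on days without trading activity
    return (v_buy - v_sell) / total


def trading_state(v_buy: float, v_sell: float, theta: float = THETA) -> Optional[str]:
    """Map one investor-day to 'b', 's' or 'bs'; None when no state is assigned."""
    r = net_volume_ratio(v_buy, v_sell)
    if r is None:
        return None
    if r > theta:
        return "b"   # primarily buying
    if r < -theta:
        return "s"   # primarily selling
    return "bs"      # buying and selling


# Illustrative volumes (not taken from the dataset)
print(trading_state(900, 100))
print(trading_state(500, 500))
```

Returning `None` for inactive days keeps those days out of any later co-occurrence count, which mirrors the statement that no trading state is assigned when r(i, t) is undefined.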

The states b, s and bs of investor i are indicated as \(i_b\), \(i_s\) and \(i_{bs}\), respectively. There are nine possible combinations of the three trading states between investors i and j: \((i_b, j_b)\), \((i_b, j_s)\), \((i_b, j_{bs})\), \((i_s, j_b)\), \((i_s, j_s)\), \((i_s, j_{bs})\), \((i_{bs}, j_b)\), \((i_{bs}, j_s)\) and \((i_{bs}, j_{bs})\). Because we are focusing on the positive relationship between investors’ trading strategies, we further analyse only the situations where both investors have been in the buy state \((i_b, j_b)\), both investors have been in the sell state \((i_s, j_s)\), and both investors have been day traders \((i_{bs}, j_{bs})\), thus excluding the other six trading state co-occurrences.
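As a hedged illustration of how the three retained co-occurrence types could be counted, the sketch below takes two hypothetical dictionaries that map a trading-day label to a state. The function name `cooccurrence_counts` and the data are assumptions made for this example; for simplicity, the sketch treats the whole sample as the joint trading period.

```python
from typing import Dict, Tuple


def cooccurrence_counts(
    states_i: Dict[str, str], states_j: Dict[str, str], state: str
) -> Tuple[int, int, int]:
    """Return (days i is in `state`, days j is in `state`, days both are)."""
    days_i = {day for day, s in states_i.items() if s == state}
    days_j = {day for day, s in states_j.items() if s == state}
    return len(days_i), len(days_j), len(days_i & days_j)


# Hypothetical daily states; days without trading are simply absent
investor_i = {"d1": "b", "d2": "b", "d3": "s", "d5": "bs"}
investor_j = {"d1": "b", "d2": "s", "d3": "s", "d4": "b", "d5": "bs"}

for state in ("b", "s", "bs"):
    print(state, cooccurrence_counts(investor_i, investor_j, state))
```

Mixed pairs such as a buy state for one investor and a sell state for the other never enter the intersection, so the six excluded combinations are ignored by construction.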

Statistically validated networks

With the categorical variables on the trading states, the co-occurrence of the trading states of investors i and j can be identified and statistically validated. First, for each investor, her or his activity period is identified. Second, for an investor pair, the length T of the joint trading period is determined, which is equal to the number of trading days in an annual data sample for a given security (≈250). Then, in the intersection periods of a trader’s activity, \(N_i^P\) (\(N_j^P\)) denotes the number of days when investor i (j) is in a given state \(P \in \{b, s, bs\}\). Moreover, \(N_{i,j}^P\) denotes the number of days when we observe the co-occurrence of the given state for investors i and j. Under the null hypothesis of the random co-occurrences of a state for investors i and j, the probability of observing X co-occurrences of the investigated state for two investors in T observations can be expressed by the hypergeometric distribution \(H(X|T, N_i^P, N_j^P)\) (Tumminello et al., 2011a). For each trading state \(P \in \{b, s, bs\}\), a p-value can be associated as follows:

$$p\left( {N_{i,j}^P} \right) = 1 - \mathop {\sum}\limits_{X = 0}^{N_{i,j}^P - 1} H (X|T,N_i^P,N_j^P)$$
(2)
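A minimal numerical sketch of Eq. (2) is given below, using only the Python standard library. The function names are hypothetical, and the input values are illustrative rather than taken from the data. The upper tail is summed directly, which equals one minus the sum in Eq. (2) but avoids cancellation when the p-value is very small.

```python
from math import comb


def hypergeom_pmf(x: int, t: int, n_i: int, n_j: int) -> float:
    """H(x | T, N_i, N_j): probability of exactly x co-occurrence days."""
    if x < 0 or x > min(n_i, n_j):
        return 0.0
    return comb(n_i, x) * comb(t - n_i, n_j - x) / comb(t, n_j)


def cooccurrence_p_value(n_ij: int, t: int, n_i: int, n_j: int) -> float:
    """Probability of at least n_ij co-occurrences under the null hypothesis."""
    upper = min(n_i, n_j)
    return sum(hypergeom_pmf(x, t, n_i, n_j) for x in range(n_ij, upper + 1))


# Illustrative case: 250 trading days, 20 and 25 active days, 10 shared days
p = cooccurrence_p_value(n_ij=10, t=250, n_i=20, n_j=25)
print(f"p-value = {p:.3e}")
```

With 20 and 25 active days out of 250, about two shared days would be expected by chance, so ten shared days yields a very small p-value.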

Using the SVN method, for each security we construct two subsequent year networks. The analysis for each security spans from the initial listing day up to the second year after the IPO. We assign the categorical variables that define the investor’s daily trading state, and we select only domestic Finnish investors who have traded an IPO stock at least five days during the first or second year. For each analysed security, we take two consecutive one-year periods of categorised trading states for investors. Taking the projection of the investor set in a year, we obtain an annual monopartite investor network, and two investor networks for consecutive years are obtained for each security.

We adjust the p-thresholds using a false discovery rate (FDR) correction (Benjamini and Hochberg, 1995) by taking the p-values sorted in increasing order, \(p_1 < p_2 < \ldots < p_{n_{\text{tests}}}\), and retaining those that satisfy \(p_i < \alpha i/n_{\text{tests}}\), \(i = 1, \ldots, n_{\text{tests}}\). Here, we apply α = 0.05, and \(n_{\text{tests}}\) equals the total number of observed relationships in a year. All networks are essentially multilink networks, where each link describes the type of trading co-occurrence between an investor pair. This adjustment is needed because there are multiple links and thus multiple tests within a given network. The link between investors i and j is considered to be statistically significant and thus existing if the corresponding p-value, \(p(N_{i,j}^P)\), is below the FDR-adjusted p-threshold. In this way, we obtain validated networks for the first and second years. As an example, Fig. C.1 in Appendix C shows the first-year sorted p-values and the FDR thresholds for the Kemira GrowHow links.
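The thresholding step can be sketched as follows. This is a sketch under one stated assumption: it follows the usual Benjamini and Hochberg step-up reading, in which every p-value up to the largest rank i satisfying \(p_i < \alpha i/n_{\text{tests}}\) is retained. The function name `fdr_validated` and the example p-values are hypothetical.

```python
from typing import List


def fdr_validated(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag which tests pass the FDR threshold (step-up rule, assumed here)."""
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda k: p_values[k])

    # Largest 1-based rank whose p-value lies below alpha * rank / n_tests
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] < alpha * rank / n_tests:
            cutoff_rank = rank

    keep = [False] * n_tests
    for rank, idx in enumerate(order, start=1):
        keep[idx] = rank <= cutoff_rank
    return keep


# Hypothetical p-values for four candidate links
print(fdr_validated([0.2, 0.035, 0.001, 0.03]))
```

In this example, the value 0.03 misses its own rank threshold (0.025) but is still retained because the next sorted value, 0.035, falls below its threshold (0.0375); this is the step-up behaviour of the procedure.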

Statistically validated clusters: persistence in time

We are interested in the investors’ cluster evolution over time. In other words, we want to verify whether investors systematically synchronise their trading strategies with other investors and if such behaviour can be detected in the subsequent year networks. With the community partition for each network, we identify persistent clusters (i.e., clusters that share the same statistically significant component of investors in both the first and the second years after the IPO). Further, we briefly present the method from Marotta et al. (2015).

We are interested in identifying statistically similar clusters that emerged in both years (i.e., clusters whose investor compositions overlap nonrandomly). The probability that X elements in the cluster \(C_1\) of the first-year network composed of \(N_{C_1}\) elements also appear in the cluster \(C_2\) of the second year composed of \(N_{C_2}\) elements, under the null hypothesis that the elements in each cluster are randomly selected, is given by the hypergeometric distribution \(H(X|N,N_{C_1},N_{C_2})\), where N is the total number of unique elements over 2 years. By using this distribution, a p-value can be associated with the observed number \(N_{C_1C_2}\) of elements of the cluster \(C_1\) reoccurring in \(C_2\) according to the following equation:

$$p(N_{C_1C_2}) = 1 - \mathop {\sum}\limits_{X = 0}^{N_{C_1C_2} - 1} H (X|N,N_{C_1},N_{C_2})$$
(3)

We reject the null hypothesis if \(p(N_{C_1C_2})\) is smaller than a given adjusted threshold, in which case we say that the cluster \(C_1\) is statistically similar to the cluster \(C_2\). We adjust the statistical threshold using the FDR correction with α = 0.05 and the number of tests being equal to the total number of cluster pairs over 2 years that shared at least one common element.
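For illustration, Eq. (3) can be evaluated directly on two sets of investor identifiers. The sketch below is not the implementation used in the study; the function name `overlap_p_value`, the identifiers and the system size are assumptions made for the example.

```python
from math import comb
from typing import Set


def overlap_p_value(cluster_1: Set[str], cluster_2: Set[str], n_total: int) -> float:
    """Upper-tail hypergeometric p-value for the overlap of two clusters."""
    n1, n2 = len(cluster_1), len(cluster_2)
    n12 = len(cluster_1 & cluster_2)
    # Sum exact integer counts first, then divide once
    favourable = sum(
        comb(n1, x) * comb(n_total - n1, n2 - x)
        for x in range(n12, min(n1, n2) + 1)
    )
    return favourable / comb(n_total, n2)


# Hypothetical clusters drawn from a system of 10 unique investors
first_year = {"inv_a", "inv_b", "inv_c"}
second_year = {"inv_a", "inv_b", "inv_d"}
print(round(overlap_p_value(first_year, second_year, n_total=10), 4))
```

In this toy system, two shared investors out of three give a p-value of 22/120, which would not pass a 0.05 threshold; in larger systems, the same relative overlap becomes far less likely under the null hypothesis.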

Statistically validated clusters: similarity across securities

Additionally, to check if the same cluster exists over multiple securities, we expand the analysis and further look for statistically significant overlapping clusters across all investigated securities. Because the IPO event is the alignment point in our analysis, we look for the overlapping clusters in the set of first-year networks and the set of second-year networks separately. We again use the method (Eq. (3)) for the cluster overlaps to detect clusters with nonrandomly overlapping elements (investors). To calculate the p-values, we take N equal to the total number of unique investors across all investigated securities in the same year, where \(N_{C_1}\) is the number of investors in the cluster \(C_1\), \(N_{C_2}\) is the number of investors in the cluster \(C_2\), and \(N_{C_1C_2}\) is the number of common investors in both \(C_1\) and \(C_2\). Again, we adjust the statistical threshold using the FDR correction, where α = 0.05 and the number of tests is equal to the total number of cluster pairs within the same year that shared at least one common element.

Overexpression and underexpression of the characterising investor attributes

To describe the investor clusters from the perspective of the attributes, such as postal code, age, gender or the type of organisation, we again use the hypergeometric test for identifying nonrandom overlap (Tumminello et al., 2011b). Once we obtain a system of N elements partitioned into clusters (communities), we want to characterise each cluster C of \(N_C\) elements. Each element of the system has a certain number of attributes from a specific class. Here, we want to see if the number of elements in the cluster with a specific attribute value is significantly larger than would be expected if the elements were selected at random from the whole system. For each attribute Q of the system, we test if Q is overexpressed in the cluster C. The probability that X elements in cluster C have the attribute Q under the null hypothesis that the elements in the cluster are randomly selected is given by the hypergeometric distribution \(H(X|N, N_C, N_Q)\), where \(N_Q\) is the total number of elements in the system with attribute Q. By using this distribution, a p-value can be associated with the observed number \(N_{C,Q}\) of elements in cluster C that have the attribute Q analogously with Eq. (3). We reject the null hypothesis if the p-value is smaller than a given FDR-adjusted p-threshold, and we then say that the attribute Q is overexpressed in cluster C. In the FDR adjustment, the number of tests is equal to the total number of unique attribute values over all attribute classes and all clusters in a network.

Alternatively, the underexpression of attribute Q can also be tested. Here, we want to see if the number of elements in the cluster with a specific attribute value is significantly lower than would be expected if the elements were selected at random from the whole system. Under the null hypothesis, the probability that the number of elements with attribute Q in a cluster C is no larger than the observed number can be obtained from the left tail of the hypergeometric distribution, as follows:

$$p_u(N_{C,Q}) = \mathop {\sum}\limits_{X = 0}^{N_{C,Q}} H (X|N,N_C,N_Q)$$
(4)

Again, if \(p_u(N_{C,Q})\) is smaller than a given FDR-adjusted p-threshold, we say that the attribute Q is underexpressed in cluster C. We used the same setting for the FDR correction.
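The two tails can be compared in a short sketch. The function names and the numbers (a system of 100 investors, 20 of them households, and a cluster of 10) are hypothetical and serve only to show how the right tail, analogous to Eq. (3), and the left tail of Eq. (4) are evaluated.

```python
from math import comb


def _pmf(x: int, n: int, n_c: int, n_q: int) -> float:
    """H(x | N, N_C, N_Q): exactly x cluster members carry attribute Q."""
    return comb(n_q, x) * comb(n - n_q, n_c - x) / comb(n, n_c)


def overexpression_p(n_cq: int, n: int, n_c: int, n_q: int) -> float:
    """Right tail: at least n_cq members with attribute Q."""
    return sum(_pmf(x, n, n_c, n_q) for x in range(n_cq, min(n_c, n_q) + 1))


def underexpression_p(n_cq: int, n: int, n_c: int, n_q: int) -> float:
    """Left tail of Eq. (4): at most n_cq members with attribute Q."""
    return sum(_pmf(x, n, n_c, n_q) for x in range(0, min(n_cq, n_c, n_q) + 1))


# Hypothetical system: N = 100 investors, N_Q = 20 households, cluster of N_C = 10
print(f"over,  8 households: {overexpression_p(8, 100, 10, 20):.2e}")
print(f"under, 0 households: {underexpression_p(0, 100, 10, 20):.3f}")
```

Because the observed count is included in both sums, the two p-values for the same observation add up to more than one; each tail is therefore tested against its own FDR-adjusted threshold.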

Results

Using the SVN methodology, for each of the 69 securities we infer b, s and bs trading state networks for the first and the second year after their IPO dates. In order to identify investor clusters, we start by aggregating the networks for all three possible joint-trading states into one weighted network. Each link in the network is given the weight \(w \in \{1, 2, 3\}\) depending on how many validated trading states have been observed for a given investor pair (see Footnote 3). Finally, for each weighted network we identify clusters using the Infomap community detection algorithm (Rosvall and Bergstrom, 2008; see Footnote 4). Identified communities are locally dense connected subgraphs in a network that play an important role in understanding a system’s topology. In the current paper, communities represent investor clusters that are timing their trades synchronously throughout the year. Table 3 summarises the number of observed clusters during the first and the second year. For example, during the first year, 54 investor clusters were identified in the networks of the security Kemira GrowHow (FI0009012843), while during the second year 64 clusters were formed. Figure 1a, b visualise these Infomap clusters for the first-year and second-year networks.
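The aggregation into one weighted network can be sketched as follows, assuming that the validated links of each trading state are available as lists of investor pairs. The function name `aggregate_links` and the investor labels are hypothetical, and the community detection step itself is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def aggregate_links(validated: Dict[str, Iterable[Pair]]) -> Dict[Pair, int]:
    """Count, per investor pair, in how many trading states a link was validated."""
    weights: Counter = Counter()
    for pairs in validated.values():
        # Ignore pair order and duplicates within one state
        unique = {tuple(sorted(pair)) for pair in pairs}
        weights.update(unique)
    return dict(weights)


# Hypothetical validated links for the b, s and bs networks of one security-year
validated_links = {
    "b": [("A", "B"), ("B", "C")],
    "s": [("B", "A")],
    "bs": [("A", "B")],
}
print(aggregate_links(validated_links))
```

Since each state contributes at most one to a pair, the resulting weights are bounded by three, matching the weight range described above.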

Table 3 Investor network clusters’ statistics
Fig. 1

Infomap clusters and their evolution for Kemira GrowHow (FI0009012843). Community detection is used with weighted links based on the total number of buy state, sell state, and day trade link types between two investors. a FDR: 54 clusters, first year after IPO, b FDR: 64 clusters, second year after IPO, c, d show five statistically significant overlapping clusters in both years. Node position is fixed. The colours of reoccurring clusters in all graphs coincide. In a, b, each cluster has a unique colour, with the exception of those with fewer than four elements, which are coloured in grey

Next, for each security, we detect clusters with a statistically significant investor overlap between the first and second year. The summary of statistically validated cluster time persistence for all 69 securities is presented in the fourth column of Table 3. For example, in the Kemira GrowHow networks, only 5 of the 54 clusters identified in the first year (9%) were observed in the second year. Figure 1c, d display those five clusters that persisted over the first two years after the IPO. The observation in the example that only a small number of clusters persist into the second year holds for the majority of the analysed IPO securities. However, there are several securities for which more than half of the first-year clusters persist into the following year. A sample of time-persistent clusters and their composition in terms of investor attributes are visualised in Appendix Figs. A.1 and A.2.

By calculating the fraction of clusters that do not persist into the second year, we observe that over all 69 securities on average 88% of the first-year clusters are not observed in the following year, while the same number falls to 78% for mature company networks inferred during the same periods (more details about the comparison to mature companies are provided in the following section). This observation can suggest the existence of IPO trading strategy-related clusters that form exclusively during the first year after the IPO date and break up in the following year.

Additionally, we analyse cluster overlap across multiple securities, separately for the first-year and second-year networks. The second and third columns in Table 3 show the number of asset-specific clusters over the total number of communities in the first and second year. Here, by asset-specific clusters, we refer to the clusters that are not observable within investor networks of the same year for other IPO securities in our investigated 69 security universe. The number of observed asset-specific clusters is rather small and is around 15% (9%) during the first (second) year averaged over all 69 securities. This means that the majority of investor clusters are found to be present in multiple securities, i.e. they execute synchronised trading strategies over multiple IPOs. Note that this cluster synchronisation is observed even though the network inference periods are not aligned in time. The observed decrease in the overall percentage of asset-specific clusters hints that during the second year after IPO more clusters use non-IPO related trading strategies. This is later supported by the mature security analysis (see the next section and Tables 4 and 5). Figure A.3 in Appendix A shows a sample of clusters with statistically significant investor overlap across multiple securities.

Combining the previous results, we observe persistent clusters that emerge in investor networks over multiple securities. Figure 2 explains the visualisation of a cluster in this study, and Fig. 3 shows a sample of clusters that overlap both over time and over multiple securities. In the figure, the top (bottom) row of the group refers to the first- (second-) year clusters. Moreover, the downward arrows associate statistically similar clusters in the first-year and second-year networks. The arrows between the clusters in the same year after IPO are omitted to simplify the visualisation. Notably, even if some of the clusters are not persistent over time, quite often they appear over different securities.

Fig. 2

Graphical representation of the clusters. A single cluster is visualised as a rectangle block, where a row represents one investor with four attributes: sector code, location, gender and birth year decade. Sector code: Households, Non-financial, Financial-insurance, General-government, Non-profit, Rest-world. Geographic location: Helsinki, South-West, Western-Tavastia, Central-Finland, Northern-Finland, Ostrobothnia, Rest-Uusimaa, Eastern-Tavastia, Eastern-Finland, South-East, Northern-Savonia. Gender: Male, Female, No-Gender. Decade: No-age, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000

Fig. 3

Statistically significant cluster overlaps across multiple securities and over time. The figure contains many subfigures separated by borders. Each subfigure presents a cluster of investors that spans multiple securities and persists in time. The row alignment shows statistically similar clusters in the same year: the top row is the first year after the IPO, and the bottom row is the second year after the IPO. The downward arrows show the timewise evolution of a cluster from the first to the second year for the same security. A cluster is represented by a rectangle. Each cluster is composed of investors with four attributes: sector code, geographic location, gender and decade. See the attribute colour mapping in Fig. 2

Next, we analyse the overexpression and underexpression of the investor attributes in the identified investor clusters. We say that a cluster is overexpressing (underexpressing) an attribute if the number of investors in the cluster with that particular attribute is significantly higher (lower) than could be expected under the null model defined in the “Dataset and methodology” section. We are primarily interested in the sector code attribute analysis, where each investor is assigned one of the following attributes: households, nonfinancial corporations, financial and insurance corporations, government, nonprofit institutions, or the rest of the world. Additionally, we test whether attributes related to gender, age or geographical location are overexpressed or underexpressed (see Footnote 5).

Over all 69 securities, we identify 115 (28) investor clusters with 182 (40) overexpressed (underexpressed) attributes during the first year after the IPO, and 130 (44) investor clusters with 236 (70) overexpressed (underexpressed) attributes during the second year. The number of overexpressed (underexpressed) attributes is larger than the number of investor clusters, because each cluster can overexpress (underexpress) more than one attribute. The overexpressed clusters are observed over 28 different securities during the first year after IPO and for 27 different securities during the second year after IPO. As for the underexpressed clusters, they are observed over 16 securities during the first year and 20 securities during the second year after IPO.

In order to present the attribute analysis in a concise way, we use the fact that the same clusters appear over multiple securities and assign overexpressed (underexpressed) investor clusters into groups if they are statistically similar. Figure 4 presents the resulting sector code attribute overexpressing investor cluster networks for the first and second years after respective IPOs. In the figure, nodes on the left (right) hand side of the vertical dashed line represent investor clusters observed in the first (second) year after IPO. Statistically similar cluster nodes are connected with links and dotted lines circle network components. Each connected component in the network relates to a group of clusters with a statistically similar investor composition. The dashed lines crossing from the left to the right-hand-side indicate that there is a statistical similarity for some of the clusters in the components between the first and the second year.

Fig. 4

Network of investor clusters with overexpressed attributes. On the left-hand side are the clusters observed in the first year after the respective IPOs and, on the right-hand side, those observed in the second year. Investor cluster nodes are connected with continuous links if they share a statistically significant number of individual investors. Dashed links represent statistical similarity between some of the connected cluster components in the first and the second year after the IPOs. Node colours identify overexpressed sector codes within clusters. For overexpressed geographical location see Appendix Fig. B.1, for underexpressed attributes see Fig. 5 and for all overexpressed and underexpressed attributes see Appendix Tables B.1 and B.2. Sector code: Households, Non-financial, Financial-insurance, General-government, Non-profit, Rest-world

Tables B.1 and B.2 in the Appendix summarise the overexpressed and underexpressed cluster attributes for each investor cluster component in Figs. 4 and 5. In the largest first- and second-year components in Fig. 4, financial and insurance corporations, general government institutions and nonprofit organisations are overexpressed. Moreover, the same components underexpress the household sector (see Fig. 5), further supporting their institutional profile. In addition, the same components overexpress location attributes, in particular the Helsinki and South-West regions (see Fig. B.1 in the Appendix). Investor clusters with an overexpression of a geographical attribute could be observed because of some locally present investment strategy, for example an investor club, or some other means of local information transfer. Overall, the results show that the largest cluster components mainly contain institutions that are timing their trades similarly in a year. Compared with household investors, institutional traders form robust clusters that execute similar trade-timing strategies over multiple IPOs, both during the first and the second year after the IPO date. Our findings thus support the studies that provide evidence of institutional herding (Nofsinger and Sias, 1999; Sias, 2004). Some of the financial institutions, such as pension insurance companies, are driven by the same legislation and portfolio restrictions, which can lead to the same trading strategies. Alternatively, traders working for financial institutions may have mutual and/or joint private information channels, leading to similar trade timing. A third explanation is that they react to public news in similar ways.

Fig. 5

Network of investor clusters with underexpressed attributes. The left-hand side shows the clusters observed in the first year after the respective IPOs, and the right-hand side shows those observed in the second year. Investor cluster nodes are connected with continuous links if they share a statistically significant number of individual investors. Node colours identify underexpressed sector code and geographical location attributes within clusters. Sector code: Households, Financial-insurance. Geographic location: Helsinki, South-West

Do clusters of IPO investors exist with mature companies?

To verify whether the identified clusters are purely IPO-related or whether they also exist with mature companies (see Footnote 6), we compare the clusters of the new-to-the-market stocks with those of five mature companies (see Table 4). For each mature security, just as previously for the IPOs, we construct SVNs and identify investor clusters with the Infomap algorithm. When constructing the first-year and second-year networks, the periods are aligned with the respective IPO dates. In this way, we construct 345 (69 × 5) networks for each year. Next, we analyse the overlaps between the mature security investor clusters and the investor clusters inferred from the IPO data to determine whether the investor clusters identified with IPO securities also exist with a mature company. When statistically validating the overlaps between mature and IPO security investor network clusters, we use the total number of cluster pairs with at least one investor in common between an IPO and all five mature securities as the number of tests for the FDR correction. Table 5 shows the number of statistically similar clusters between the IPO and mature securities, as well as the total number of clusters observed in the IPO and the mature security during exactly the same period. On average over all investigated IPO securities, only 16% of the IPO clusters are not observed in any of the five investigated mature securities during the first year after the IPO, and 13% during the second year. The same table shows that only a fraction of the total clusters observed in mature securities are also observed in IPO security networks. This may be because not all investors who trade mature securities also trade recently issued securities, and those who do might not apply the same trading strategies and, therefore, might not form synchronised clusters similar to those in mature securities.
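The validation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the overlap between two clusters is tested with an upper-tail hypergeometric probability, that the FDR correction is the Benjamini-Hochberg step-up procedure, and that the threshold is `alpha=0.01`; none of these details is specified in this section. The function names and the toy clusters are hypothetical.

```python
from math import comb

def overlap_pvalue(size_ipo, size_mature, shared, n_investors):
    """Upper-tail hypergeometric probability of sharing at least `shared` investors."""
    total = comb(n_investors, size_mature)
    tail = sum(
        comb(size_ipo, i) * comb(n_investors - size_ipo, size_mature - i)
        for i in range(shared, min(size_ipo, size_mature) + 1)
    )
    return tail / total

def validated_overlaps(ipo_clusters, mature_clusters, n_investors, alpha=0.01):
    """Return the (IPO cluster, mature cluster) pairs that pass the FDR threshold."""
    tests = []
    for ipo_name, ipo_members in ipo_clusters.items():
        for mature_name, mature_members in mature_clusters.items():
            shared = len(ipo_members & mature_members)
            if shared >= 1:  # only pairs with a common investor count as tests
                p = overlap_pvalue(len(ipo_members), len(mature_members),
                                   shared, n_investors)
                tests.append((p, ipo_name, mature_name))
    tests.sort()
    m = len(tests)  # number of tests used in the correction
    cutoff = 0
    for rank, (p, _, _) in enumerate(tests, start=1):
        if p <= alpha * rank / m:  # Benjamini-Hochberg step-up condition
            cutoff = rank
    return [(ipo, mature) for _, ipo, mature in tests[:cutoff]]

# Hypothetical toy data: investor IDs 0..199, two IPO clusters, two mature clusters.
ipo_clusters = {"ipo_A": set(range(0, 10)), "ipo_B": set(range(50, 60))}
mature_clusters = {
    "mat_X": set(range(0, 8)) | {100, 101},   # shares 8 investors with ipo_A
    "mat_Y": {59} | set(range(120, 129)),     # shares 1 investor with ipo_B
}
result = validated_overlaps(ipo_clusters, mature_clusters, n_investors=200)
```

In the toy data, only two cluster pairs share at least one investor, so the correction uses two tests; the pair sharing eight of ten investors is validated, whereas the pair sharing a single investor is not.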

Table 4 Five mature companies with the highest number of transactions in HSE
Table 5 IPO and mature companies investor clusters overlap

Conclusions

In the current paper, we analysed investor interactions and behaviours using a unique dataset of all Finnish investors’ transactions in the HSE. Our selected set of 69 securities is aligned to an IPO event, which occurs when a company first starts publicly trading its securities. We performed the analysis for multiple securities at the individual investor account level by constructing networks from statistically validated trading co-occurrences. Our main focus was on the newly emerging market networks and on the persistent, market-driven structures they share with other mature and new stocks.

Applying a community detection algorithm, we found statistically similar investor clusters with synchronised trading strategies that formed repeatedly over several years and for multiple securities. We detected statistically robust clusters between the first and second year after an IPO. We also found clusters that recur across other securities. By investigating cluster attribute overexpression and underexpression, we found a highly persistent institutional investor cluster. This finding provides further evidence of institutional herding. Comparing these findings with the clusters on mature securities, we observed that the majority of the IPO clusters can also be observed with a mature security.

Our results show that some synchronised trading strategies in financial markets span multiple stocks, are persistent over time and occur with both newly issued and mature stocks. However, this analysis applies to the HSE only and may not generalise to all markets. Further research should check whether this phenomenon also exists in other stock exchanges with a larger number of IPOs; however, to the best of our knowledge, such investor-level data are not available, for example, from the U.S. markets.

Traditional financial research assumes that investors are rational and hold optimal portfolios. However, actual investors have informational, intellectual and computational limitations, and they satisfice (see Footnote 7) when making decisions. The systematic reoccurrence of the clusters suggests that the investors may share stronger information connections. For example, they may be consistently following the same public information sources or have mutual private information channels. However, in the current research, we do not try to explain the direction or the publicity of the information transfer. On the other hand, according to Ozsoylev et al. (2013), investor networks can be considered proxies of information networks if they are fairly stable over time. In light of this argument, the persistent and security-wide investor clusters can represent the mutual information channels that exist for both new IPO securities and mature stocks (e.g., Nokia).