An abundance of research has established that phonological awareness skills are important prerequisites for early reading acquisition (e.g., Bradley & Bryant, 1983; Kjeldsen et al., 2019; for reviews see Melby-Lervåg et al., 2012). Early development of phonological awareness implies that a child moves from implicit to explicit control of the sound structure of language, and this explicit control is critical when a child learns to understand and handle the alphabetic principle (e.g., Caravolas et al., 2013; Lundberg et al., 2010; Papadopoulos et al., 2012). However, languages using the alphabetic principle vary in consistency. In a transparent orthography, the relationship between phonemes and graphemes is more predictable than in an opaque orthography. English is the most opaque alphabetic language, and English-speaking children typically learn to read at a slower pace than children learning to read in more transparent languages (Frith et al., 1998; Seymour et al., 2003). Yet, it appears that the same mechanisms underpin reading development in languages of different orthographic depths, and that phonological awareness is a valid predictor of early reading skills in both transparent and opaque languages (Caravolas et al., 2013). It should be noted, however, that some researchers argue that phonological awareness is of less importance in transparent languages (de Jong & van der Leij, 1999; Wimmer, 1993), whereas others suggest that phonological awareness training offers faster results in transparent than in opaque orthographies (Kjeldsen et al., 2019). Regardless of transparency, most phonological interventions have been carried out in combination with formal reading instruction or just before it starts, and studies have typically investigated development over short periods (Kjeldsen et al., 2019).
In the present randomized experiment, phonological awareness training was carried out in Swedish preschools for six weeks when children were four and five years old, starting three years before the start of formal reading instruction and with follow-up over five years. When children were six years old, all children, including children in the control condition, received phonological training offered by their schools in grade 0, a form of schooling intended to function as a bridge between the informal learning that occurs at preschool and the more formal learning in school. Grade 0 is non-obligatory, but 95% of Swedish six-year-old children participate (National Agency of Education, 2014).

The overall aim of the present study was to investigate the effects of this early phonological awareness training on reading and spelling in grades 2 and 3. To the best of our knowledge, this is the first study to examine phonological awareness training three years before formal reading instruction starts. Another novel contribution is the fact that all children, regardless of group assignment, received phonological training in grade 0 prior to effects being examined. Thus, any effects of training obtained in the current study derive from two short periods of phonological training at ages 4 and 5.

Previous research on phonological training

As early as 1983, Bradley and Bryant conducted a longitudinal training study among English speaking children aged four to five, where they demonstrated that there is a causal link between phonological awareness and reading. However, the children received phonological training while learning to read. To avoid a possible confounding effect between reading and phonological awareness training, Lundberg et al. (1988) conducted a phonological awareness training study in Denmark one year before formal reading instruction started. The results from the Bradley and Bryant study (1983) were replicated, but now in a more transparent orthography: the trained children outperformed the control children on phonological awareness, reading and spelling. The results also showed (Lundberg, 1994) that the majority of children in the experimental group who were in a high-risk zone for later reading problems reached a normal level of reading in Grade 3, which was not the case for the high-risk children in the control group.

The Lundberg et al. study (1988) has been replicated by Schneider et al. (1997) in Germany, and by Kjeldsen, Niemi and Olofsson (2003) in Finland with Swedish speaking children. Both studies emphasized the importance of structure in the phonological program. Kjeldsen et al. (2003) showed that training must be strictly systematic to be effective. Thus, the order of components of phonological awareness has to be carefully planned, beginning from easy listening tasks moving to the most advanced levels of phoneme identification and manipulation. High-quality interventions also need to be easy-to-implement, and to be carried out by trained professionals, as, for example, less well-planned lessons could result in teachers giving simpler tasks to children with the purpose to keep all children engaged (Kjeldsen et al., 2019). Further, Kjeldsen et al. (2003) also demonstrated that quality is more important than quantity. Substantially shorter programs than in the Lundberg et al. study were equally efficient, whereas a less structured program was not. However, a large field study was carried out in Sweden by Lundberg et al. (2010), including two cohorts of six-year-old children in Grade 0. They examined the effect of frequency of training, and found, as expected, that phonological awareness is possible to improve, and that it is strongly related to reading. However, in contrast to Kjeldsen et al. (2003), the authors found the quantity of training to be important, as there was an interaction effect between dosage and phonological awareness skills, such that children with low initial phonological ability benefitted from more practice, whereas for the higher performing children the frequency of training seemed less important (Lundberg et al., 2010). These conflicting findings may imply that comparing group means is not sufficient, as suggested by Byrne et al. (2000). 
These authors found that the actual level of phonological skills was less important than the rate of acquisition of these skills. Children who are slow in acquiring the principles of phoneme-grapheme correspondence will be slower to acquire related skills too, and will need extra training. Thus, comparing overall group means may conceal differential results for subgroups of children. In a similar vein, Kjeldsen et al. (2019) found only small group mean differences among students in grades 6 and 9 who had previously participated in a phonological training study in kindergarten. However, only 24% of at-risk readers in the intervention group, compared to 60% of at-risk readers in the control group, still belonged to the lowest quartile in word reading and reading comprehension.

As outlined by Kjeldsen et al. (2019), it is crucial to examine whether training in phonological skills will result in enhanced reading comprehension and fluency throughout the school career. Yet, only a few studies investigating phonological training have a follow-up of five years or longer. These studies have been carried out before reading instruction (Byrne et al., 2000; Elbro & Petersen, 2004; Kjeldsen et al., 2014, 2019; Partanen & Siegel, 2014), at the beginning of reading instruction (Snowling & Hulme, 2011), or when reading difficulties are already established (Blachman et al., 2014; Wolff, 2016). Kjeldsen et al. (2019) observed that phonological training in kindergarten with Danish- and Swedish-speaking children yielded long-term transfer effects on word decoding and reading comprehension (Elbro & Petersen, 2004; Kjeldsen et al., 2014). In contrast, studies including phonological training with older English- and Swedish-speaking students who had already acquired reading difficulties (Blachman et al., 2014; Wolff, 2016) succeeded in improving reading comprehension, fluency and word decoding at the immediate post-test and up to one year later, but they failed to show any effects on reading comprehension in the long-term follow-ups. Kjeldsen et al. (2019, p. 377) suggest that momentum was lost in those studies “…because the training was burdened by the need to unlearn unsuccessful strategies brought by about an initial reading failure.” This hypothesis is in line with much previous research, such as the National Reading Panel (Ehri et al., 2001) and others (e.g., Al Otaiba et al., 2009; Bus & van IJzendoorn, 1999; Tunmer, 2008).

Dimensions of phonological awareness

In an earlier study, involving the same sample as in the present study before the intervention, we have proposed a model of phonological awareness with two dimensions at age 4 (Wolff & Gustafsson, 2015): a linguistic complexity dimension and a processing complexity dimension. The linguistic complexity dimension spans morphemes, syllables and phonemes, going from less to more complex linguistic units, and the processing complexity dimension spans identification, blending/segmentation and manipulation, going from less to more complex processing. To assess these skills the two dimensions are crossed. Thus, the tasks assessing morphemes, syllables and phonemes, also, one at a time in a systematic order, require the processing skills of identification, blending/segmentation and manipulation.
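The crossing of the two dimensions can be made concrete with a small sketch. The level names are taken from the text; the ordering of the cells (processing level as the outer dimension) and the variable names are assumptions made only for illustration.

```python
from itertools import product

# Levels as named in the text, ordered from less to more complex.
LINGUISTIC_LEVELS = ["morpheme", "syllable", "phoneme"]
PROCESSING_LEVELS = ["identification", "blending/segmentation", "manipulation"]

# Crossing the two dimensions gives one cell per combination.
# Assumption: processing complexity is treated as the outer dimension.
design_cells = list(product(PROCESSING_LEVELS, LINGUISTIC_LEVELS))

for processing, linguistic in design_cells:
    print(f"{processing:22s} x {linguistic}")
```

The three by three crossing yields nine cells, each pairing one linguistic unit with one processing demand.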

General fluid intelligence (Gf), the capacity to identify patterns, understand relationships and solve novel problems, in many ways represents the activities required in phonological processing (Gustafsson & Wolff, 2015). Both reading ability and later academic success are predicted by Gf, but according to Bowey (2005), Gf best predicts early reading acquisition. In line with this, de Jong and van der Leij (1999) found that the direct relation between Gf and reading diminishes over time. One hypothesis that has obtained support (Wolff & Gustafsson, 2015) is that Gf works as a trigger for the development of phonological awareness skills, and that the impact of Gf on reading skills is mediated through phonological awareness. Wolff and Gustafsson (2015) demonstrated highly significant relations between Gf and the processing complexity factors, identification being the highest while blending/segmentation and manipulation were lower. However, manipulation would be expected to have the highest relation with Gf, given that Gf represents the ability to manipulate complex relations. One explanation why this was not found may be that the ability to manipulate sounds is generally not developed at age 4, whereas there are substantial individual differences in the ability to identify sounds (Wolff & Gustafsson, 2015). These developmental differences among phonological tasks, the strong relation between Gf and reading acquisition, and the hypothesis that Gf works as a trigger for the development of phonological awareness underscore the usefulness of including Gf in the present study as an early predictor of reading development following the phonological training.

The present study

The present study aims at extending the rich knowledge of the effects of preventive phonological interventions preceding reading instruction. These studies show positive effects on development of phonological awareness, but the problem with “treatment resisters” has been discussed. Torgesen (2000) analyzed five preventive interventions, and found that around two to six percent of the children could be defined as treatment resisters, who seem to be present even when the best practice instruction is applied. Thus, some children do not grasp the idea of phonemes as discrete entities, and they do not seem to enhance their phonological skills to an acceptable level by the training. To meet this problem, the phonological awareness training in the present study was carried out three years before formal reading instruction started when children were four years old, and continued with a second wave of training at the age of five. The intention was to begin the study at this early stage when children’s explicit awareness of the structure of speech starts to emerge (Wolff & Gustafsson, 2015; Dodd & Gillon, 2001), and to implement training on less complex phonological structures in order to facilitate the more advanced phonological training in grade 0.

The most novel contribution of the present study is that, in contrast to previous studies, both the control and experimental group received phonological training in grade 0, but only the experimental group received the early phonological training at ages 4 and 5. Thus, potential effects of the intervention derive from two waves of six weeks of training in preschool. The design also allows for studying long-term effects of this early training, and only a handful of previous studies have follow-ups of five years or more according to Kjeldsen et al. (2019). Furthermore, to our knowledge there has not been any evaluation of phonological awareness training that begins three years before formal reading instruction starts. Even in countries where school starts at an early age, very few interventions have been conducted with children as young as around four years old.

Research questions

The more specific research questions are:

1) Does structured phonological awareness training starting at the age of 4 affect later development of phonological awareness?

2) Does structured phonological awareness training starting at the age of 4 affect reading related skills four and five years later in grades 2 and 3?

3) Are there differential effects of phonological awareness training as a function of children’s cognitive abilities?

Method

Design and participants

The participants (N = 364) were recruited from 58 preschools in 8 municipalities. The participating preschools were situated in rural as well as urban regions, approximately representative of the Swedish population. Non-native Swedish-speaking children (n = 38) were also included. Each preschool had to have at least three children who could form a group and who were between 3 years 10 months and 4 years 4 months old. The preschool groups were randomly assigned to an intervention group (n = 138) or to a control group (n = 226). The groups comprised three to six children. In case there were two groups at the same preschool, both groups were assigned to the same condition. The intervention group received phonological awareness training for six weeks at the age of 4, and for six weeks at the age of 5. Before the intervention at age 4, a pre-test was given (t1) assessing Gf and phonological awareness; in grade 0, at age 6 (t2), the phonological awareness tests were re-administered; and four and five years after the pre-test, in grade 2 (t3) and grade 3 (t4), reading related skills were assessed. Informed consent was obtained from all parents before t1.
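The constraint that groups at the same preschool share a condition amounts to randomizing at the preschool level. A minimal sketch of such an assignment follows. The function name, the preschool and group labels, the allocation probability and the seed are all hypothetical; the actual randomization procedure is not described in the text and evidently produced unequal group sizes.

```python
import random

def assign_conditions(groups_by_preschool, p_intervention=0.5, seed=0):
    """Randomly assign conditions so that all groups at one preschool
    receive the same condition (hypothetical procedure)."""
    rng = random.Random(seed)
    assignment = {}
    for preschool in sorted(groups_by_preschool):
        # One random draw per preschool, shared by all of its groups.
        condition = "intervention" if rng.random() < p_intervention else "control"
        for group in groups_by_preschool[preschool]:
            assignment[group] = condition
    return assignment

# Hypothetical example: preschool A has two groups, B and C one each.
example = {"preschool_A": ["A1", "A2"], "preschool_B": ["B1"], "preschool_C": ["C1"]}
result = assign_conditions(example)
print(result)
```

Drawing once per preschool, rather than once per group, is what keeps both conditions from occurring within the same preschool.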

Attrition

From t1, when children were four years old (n = 364), to t2, when children were six years old, seven children dropped out of the study (n = 357), and there was further attrition of eight children to t3 in grade 2 (n = 349). Two more children from the control group dropped out before t4 in grade 3 (n = 347). The total attrition of 4.7% was mostly due to children moving, and a few children were excluded because of hearing problems or serious concentration problems. There were no differences in the outcome measures at age 4 between children who later dropped out and children who did not.
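The sample sizes and the attrition rate reported above can be checked with simple arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Figures reported in the text.
n_t1 = 364
dropouts = {"t2": 7, "t3": 8, "t4": 2}  # children lost before each test point

sample_sizes = {"t1": n_t1}
remaining = n_t1
for time_point, lost in dropouts.items():
    remaining -= lost
    sample_sizes[time_point] = remaining

total_lost = sum(dropouts.values())
attrition_pct = 100 * total_lost / n_t1

print(sample_sizes)
print(f"total attrition: {total_lost} of {n_t1} ({attrition_pct:.1f}%)")
```

The 17 children lost out of 364 correspond to the reported 4.7%.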

Instruments

The tests at age 4 were individually administered, whereas all tests in grades 2 and 3 were group administered, except for reading speed in grade 2. At age 4 the phonological awareness tests were administered before the Gf test battery. The Gf tests were given in the order they appear in the text below. In grade 3, the tests were given in the following order: spelling, silent word reading, phonological choice, reading speed, reading comprehension and word reading list.

Phonology

The instrument measuring phonological skills consists of 18 tasks, each including nine items (Wolff, 2013). It is designed for children between four and six years old, and it reflects two dimensions of phonological task complexity: linguistic complexity and processing complexity. The test is divided into three blocks according to the processing complexity level: Identification, Blending/Segmentation and Manipulation. At each of the processing complexity levels, the items reflect three linguistic complexity levels: the morpheme level, the syllable level and the phoneme level. Swedish is a morpho-phonological language. Compound words and other long words are frequently used. There are many consonant clusters with a maximum of five consonants in one cluster (Wolff, 2009). The use of compound words is reflected in the test battery in that the morphological tasks concern compound words; also long words are used at the various linguistic levels. However, complex consonant clusters are avoided in the test battery for these young children.

In all three blocks, the child receives two practice items with corrective feedback before each task of nine items. Corrective feedback is not provided on test items. When a child makes three consecutive errors, the testing is interrupted and moves on to the next block of tasks. The tasks in the phonological test are described below.
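The discontinuation rule can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, items are scored as correct or incorrect, and the count of consecutive errors is taken over whatever sequence of responses is passed in, since the text does not specify whether the count carries over between tasks within a block.

```python
def score_with_stop_rule(responses, max_consecutive_errors=3):
    """Return (number correct, number of items administered).

    Testing stops as soon as `max_consecutive_errors` errors occur in a row.
    """
    correct = 0
    streak = 0
    administered = 0
    for is_correct in responses:
        administered += 1
        if is_correct:
            correct += 1
            streak = 0  # a correct answer resets the error streak
        else:
            streak += 1
            if streak == max_consecutive_errors:
                break  # discontinue; remaining items are not given
    return correct, administered

# Hypothetical response pattern for one nine-item task.
example = [True, True, False, True, False, False, False, True, True]
score, n_given = score_with_stop_rule(example)
print(score, n_given)
```

In this example testing stops after the seventh item, so the last two items are never administered.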

Identification. The first task is on the morpheme level, and the child is asked to point at the two pictures among three presented ones, which represent compound words that begin with the same morpheme, in this case a complete word. An example in English would be pictures representing the words snowball and snowman with the distractor picture representing the word football. At the syllable level, the child is asked to judge whether two words rhyme or not. At the phoneme level, one item is to point at the picture among three presented ones, which represents a word that begins with the same sound as sun (i.e., /s/). There is one task on the morpheme level, one task on the syllable level and three tasks on the phoneme level. Cronbach’s alpha reliability is 0.93.

Blending/segmentation. Example items at the morpheme level are to tell which word you get if you have the word butter and you add the word fly, or vice versa, i.e., what you get if you have the word butterfly and separate it into two words. At the syllable level, one task is to listen to syllables presented in isolation, e.g., win-ter, and blend them together into a word, and another task is to indicate syllables in a word by using markers, e.g., summer. Similarly, at the phoneme level the task is to blend and segment phonemes. One example is to listen to d-o-g and blend the phonemes together; another task is to segment the word boy into phonemes and indicate them by using markers. The first task in Blending/segmentation is on the morpheme level, the next two are on the syllable level and the last three are on the phoneme level. Cronbach’s alpha reliability is 0.95.

Manipulation. At the morpheme level, one example of an item is to delete the first part (morpheme) of the compound word doorstep, i.e., door, and indicate what word this makes by pointing at one of three pictures. The next morpheme task is similar, but a vocal response is required. At the syllable level, a word is presented orally, e.g., crocodile, and the task is to say which word is left if you take the syllable cro away. At the phoneme level, an example of an item is to say the word mat without the sound /m/. The syllable and phoneme tasks increase in difficulty in that the correct responses are not real words. The first two tasks are on the morpheme level, the next two are on the syllable level and the last two are on the phoneme level. Cronbach’s alpha reliability is 0.94.
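For readers unfamiliar with the reliability coefficient reported for each block, the following sketch shows how Cronbach's alpha is conventionally computed from item-level scores. The toy data are invented for illustration and are not data from the study; the function name is likewise illustrative.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented toy data: five children, four dichotomously scored items.
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```

Alpha approaches 1 when the items covary strongly relative to their individual variances, which is what values such as 0.93 to 0.95 indicate.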

Cognitive tasks

The test battery of cognitive tasks included the following measures: Sentence Memory, WPPSI-R (Wechsler, 1991); Coloured Progressive Matrices (Raven, Raven, & Court, 2000); Corsi block tapping (modeled after Farrell Pagulayan, Busch, Medina, Bartok, and Krikorian, 2007); WNV Recognition (Wechsler & Naglieri, 2006); Wechsler Preschool and Primary Scale of Intelligence III, Block Design (Wechsler, 1991); Verbal Working Memory (Wolff, 2013); Word Span Backward and Word Span Forward (Thorell and Wåhlstedt, 2006); NEPSY Comprehension of Instructions (Korkman et al., 1998); and Wechsler Nonverbal Scale of Ability (WNV) Matrices (Wechsler & Naglieri, 2006). The tests are not presented in detail here, as their measurement properties were investigated in another study (see Gustafsson & Wolff, 2015 for detailed descriptions of the tests). In summary, the test battery included three visuospatial problem solving tests hypothesized to measure Gf (Coloured Progressive Matrices, Block Design and WNV Matrices), two visual STM tests (Corsi and WNV Recognition), two verbal STM tests (Word Span Forward and Sentence Memory) and three verbal WM tests (Verbal Working Memory, Comprehension of Instructions and Word Span Backward). A sequence of six alternative CFA models of increasing complexity was investigated (see Authors, 2015b). An oblique model with four factors fitted the data well, as did a bi-factor model with a general factor, along with verbal and visual modality factors. The bi-factor model was the preferred solution, and it was concluded that the general factor in this model represents Gf. For the purposes of the present study, individual factor scores were computed for Gf using the regression method implemented in Mplus 7 (Muthén & Muthén, 1998–2017). The two modality factors were not included due to poor reliability.

Reading related tasks in grade 2

Swedish is a fairly transparent language. On a continuum where English is considered a typically deep, or opaque, orthography, and Finnish, Italian and Hungarian are considered shallow, or transparent, orthographies, Swedish is placed somewhere in the middle. It is more opaque than German and less opaque than French. Other important features (Schmalz et al., 2015) are that polysyllabic words and consonant clusters are common. Digraphs and trigraphs are used, and context-sensitive spellings are sometimes unpredictable. The stressed syllables in Swedish include only one long sound, either a long vowel sound or a long consonant sound. The long vowel sound is never marked, whereas the long consonant sound (with some exceptions and depending on the context) is marked by the gemination of the consonants. Typically, the most pronounced obstacle for impaired spellers is the gemination of consonants following a short vowel in a stressed syllable (Wolff, 2009). In such cases, a spelling error is more likely than a reading error (Lundberg, 1985). Further, there are usually only one or two possible pronunciations of a grapheme, whereas there are several possible spellings of a phoneme (Lundberg, 1985). Thus, as in many alphabetic scripts, reading is less complex than spelling (Landerl, 2001). Latency is also a more critical issue than accuracy in Swedish, which also seems to be the case in French (Sprenger-Charolles et al., 2000), German (Frith et al., 1998), Italian (Paulesu et al., 2001) and in many other languages that are more orthographically shallow than English (Grigorenko, 2001). Apparently, there is a risk of ceiling effects in word reading accuracy in transparent orthographies (Wimmer, 1993). Therefore, spelling and speeded word decoding scores are better measures of reading related ability than reading accuracy (Wimmer, 1993; Wolff, 2009), and hence, these reading related measures are used as outcome variables besides reading comprehension.

Word decoding

Silent word reading. The task comes from the Wordchains test (Jacobson, 2001) and involves separation of triplets of words written without inter-word spaces. The participant has to correctly mark the inter-word spaces in as many triplets as possible within three minutes. High performance on this task requires fully automatized word identification. The validity of this task as an indicator of word reading skill has been demonstrated in several studies (see e.g., Samuelsson et al., 2003; Skolverket, 2001). Test-retest reliability is 0.89 (Jacobson, 2001).

Word reading list. The task was to read as many printed real words as possible within 60 s. Words were presented in vertical lists, and were not graded by difficulty. The test was originally developed for a previously reported study (Wolff, 2011; 2016).

Phonological choice. Triplets of non-words were printed in columns in a booklet. Each non-word was pronounceable but only one corresponded to a real word when read aloud. The task was to mark, in each triplet, the alternative that was a pseudo-homophone with a pronunciation equivalent to a real word. The only way to arrive at a correct decision in this task is to silently pronounce the words and find out which one matches an internal phonological representation, that is, the sound of a real word. A large number of triplets were presented, and the task was to quickly mark as many correct alternatives as possible within three minutes. A similar task has been used by Olson et al. (1994) and it has proven to yield a valid and reliable indication of phonological ability. Test-retest reliability is 0.84 (Author, 2010).

Spelling

The test leader dictated twenty single words of varying complexity, concerning for example clusters and phoneme/grapheme correspondence, which the students were required to spell. The instruction was to write as much as possible of the word, even if the student was unsure about the spelling. Accuracy was recorded with scores from 1 to 3. A correctly spelled word yielded 3 p, a word correctly spelled but with a mirrored letter or another similar mistake yielded 2 p, and a partly correct spelling yielded 1 p. The test was developed for this study. Cronbach’s alpha reliability is 0.90.

Reading speed

Pupils read one text aloud. Rate was measured and recorded as words per minute. The internal consistency reliability (alpha) for reading speed could not be calculated as the test was speeded.

Reading comprehension in grades 2 and 3

Short statements were presented with four alternative pictures each (Lundberg, 2001). The task was to choose among the pictures and indicate which one corresponded to the statement. The distractor alternatives could, for example, illustrate a boy who goes skating when the statement was “The boy goes skiing”. Working time was ten minutes and the total score was the number correct in this time. Test-retest reliability is 0.94.

Intervention group training

The phonological training program was carried out in two waves: one wave of six weeks of training when the children were four years old and a second wave of six weeks when they were five years old. It was carried out in small groups of three to six children, and lasted for 25 min each school day. The program was designed for the present study, and the preschool teachers were given three days of training by the project personnel on how to carry out the intervention. Fidelity was assessed through visits by the research team to each preschool. To assess fidelity, the preschool teachers also wrote a short report each day about how the training progressed.

The training program was systematic, and the preschool teacher followed a detailed plan each day. The children were encouraged to disregard the semantic side of the language, and attend to the sound structure. The very first activities were general listening tasks, and the program then continued with activities with words, morphemes and syllables, and towards the end of the first six weeks, they also attended to phonemes. The training was carried out in a playful way, with games and animal toys to assist the teacher. For example, one of these animal toys was always hungry, but preferred different things to eat on different days. One day he wanted compound words only, and the next day he wanted to eat word pairs that rhymed. There were also toys like Sally Seal, who only liked things starting with /s/ and Molly Mouse, who liked things starting with /m/. They introduced “their own” letter by displaying one wooden letter and one printed letter on a sheet of paper. No further activity linked to reading and writing was included in the phonological training program when the children were four and five years old. It thus included phonological activities only.

Control group training

The training in the treated control group was carried out under circumstances comparable to those of the phonological awareness training, but effort was made to avoid the critical phonological component in the training. The preschool teachers who were responsible for the control groups were also given three days of special instruction by the project personnel to prepare the training. However, they did not receive a detailed program. Rather, the preschool teachers chose an area of their own interest. They designed a program with support and consent from the research team and in collaboration with other preschool teachers in the control group. Thus, the programs did not include any phonological awareness activities, but typically focused on subjects such as geometry, animals or the seasons in nature.

Test procedures

The children were tested individually at their preschools, whereas the tests in grades 2 and 3 were both individually and group administered. All children were given the tests in the same order. The test leaders were special educational needs teachers at the child’s preschool or school, or a special needs education teacher who was affiliated to the student health team in the school district. The test leaders received training before each test point.

Analytic procedures

The method applied was Structural Equation Modeling (SEM), and the models were estimated with the Mplus 7.4 program (Muthén & Muthén, 1998–2012). Three separate analyses were carried out investigating direct and indirect effects of early phonological training. There are some obvious advantages of using SEM in the present study. It allows for estimation of relations between multiple dependent variables, and for reciprocal and indirect effects. SEM also allows for the use of manifest and latent variables in the same model. Whereas the effects and interaction effects of phonological awareness and training were investigated in one model, the effects and interaction effects of Gf and training were investigated in another model.

The models were estimated with the Robust Maximum Likelihood (MLR) estimator in Mplus 7. In order to take the cluster-sampling design of the study into account, the so-called ‘complex option’ in Mplus was used to obtain cluster-robust estimates of standard errors. Chi-square, Root Mean Square Error of Approximation (RMSEA) with confidence intervals, Comparative Fit Index (CFI), and Standardized Root Mean Square Residual (SRMR) were used to evaluate model fit. According to generally accepted rules of thumb, the RMSEA estimate should be lower than 0.07–0.08 (Browne & Cudeck, 1993) and CFI should be close to or higher than 0.95 (Hu & Bentler, 1999).
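To illustrate how the RMSEA relates to the chi-square statistic, the sketch below applies a common textbook formula to the Model 1 values reported in the Results (chi-square = 38.38, df = 15). It is an approximation under stated assumptions: the full sample of N = 364 is assumed, N rather than N − 1 is used in the denominator (conventions differ between programs), and the scaling corrections applied by the MLR estimator and the cluster adjustment are ignored. The function name is illustrative.

```python
from math import sqrt

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a chi-square statistic.

    Assumption: uses n (not n - 1) in the denominator and no
    robust-estimator scaling correction.
    """
    return sqrt(max(chi_square - df, 0.0) / (df * n))

# Model 1 values reported in the Results; n = 364 is an assumption.
value = rmsea(38.38, 15, 364)
print(f"RMSEA = {value:.3f}")

# Rule-of-thumb check against the upper cutoff mentioned in the text.
print("below 0.08:", value < 0.08)
```

Under these assumptions the result is close to the reported estimate of 0.065 and falls below the cutoff cited above.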

Results

Three SEM models were estimated in order to test the three research questions concerning the effects of the training at ages 4 and 5 on the development of phonological awareness (RQ 1); on reading related skills four and five years later in grades 2 and 3 (RQ 2); and if there were any differential effects of phonological awareness training as a function of children’s cognitive abilities (RQ 3). This section starts with descriptive data of Gf at age 4, phonological awareness at ages 4 and 6, and reading related tasks in grades 2 and 3. Then two SEM models are presented where effects and interaction effects of phonological training and Gf on phonological awareness at age 6 were examined, along with reading speed, word reading and spelling in grade 2, and reading comprehension in grades 2 and 3. A third SEM model follows, where the direct and interaction effects of initial phonological awareness ability and training were investigated.

Table 1 Means and standard deviations of phonological awareness, Gf and reading related skills

Table 1 shows all manifest variables at pre-test at age 4, follow-up of phonological awareness ability at age 6, and the reading related abilities in grades 2 and 3. There were no significant differences between the control and intervention groups at the pre-test, whereas there were significant differences between the groups on phonological awareness at age 6 and all the reading related measures in grades 2 and 3.

Fig. 1 Model 1 shows direct and indirect relations from Gf, training and their product term on phonological awareness at age 6 and word reading, spelling and reading speed in grade 2, and reading comprehension in grades 2 and 3

Table 2 Direct and total effects of Gf and treatment on reading related skills in Model 1

Two SEM models, Model 1 and 2, were estimated to examine the possible effects of the training, and to examine if there were differential effects of the training as a function of cognitive abilities. Table 2 shows the results of Model 1. In order to capture the treatment effects, a dummy-variable representing the intervention was included in the model, along with Gf and their product term. Phonological awareness at age 6 was a dependent variable, which in turn was related to word decoding, spelling, reading speed in grade 2 and reading comprehension in grades 2 and 3. There were also direct relations from Gf, the intervention dummy and the product term to reading and spelling outcomes in grades 2 and 3. Figure 1 depicts Model 1 schematically. The model fit statistics were acceptable: Chi-square = 38.38, df = 15; RMSEA = 0.065, CI90 [0.040, 0.091]; CFI = 0.991; SRMR = 0.019. Table 2 shows that Gf at age 4 explained around 50% of the variance in phonological awareness two years later. There was also a medium effect of the training (d = 0.48), and an additional interaction effect of Gf at age 4 and group assignment (r = − 0.16). The interaction effect was in favour of children low on Gf in the experimental group, as illustrated in Fig. 2a. In Fig. 2b, a corresponding interaction effect is illustrated, depicting the interaction between phonological awareness at age 4 and group assignment. Initial phonological awareness skills explained around 37% of the variance in phonological ability at age 6. There was a direct effect of training (d = 0.72, p < 0.001), and an interaction effect in favour of children low on phonological ability in the experimental group (r = − 0.17, p < 0.05).
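
The role of the product term can be made concrete with a small sketch. In a linear model of the form outcome = b0 + b_gf·Gf + b_treat·Treat + b_int·(Gf × Treat), the expected difference between the intervention and control groups at a given Gf level is b_treat + b_int·Gf, so a negative b_int means a larger benefit for children low on Gf. The coefficients below are hypothetical values chosen only for illustration; they are not the study's estimates, and Gf is assumed to be standardized.

```python
def predicted_outcome(gf: float, treat: int, b0: float, b_gf: float,
                      b_treat: float, b_int: float) -> float:
    """Linear prediction with a dummy-coded treatment and a product term."""
    return b0 + b_gf * gf + b_treat * treat + b_int * gf * treat


def treatment_effect_at(gf: float, b_treat: float, b_int: float) -> float:
    """Expected intervention-minus-control difference at a given Gf level."""
    return b_treat + b_int * gf


# Hypothetical coefficients (NOT taken from the study).
coef = dict(b0=0.0, b_gf=0.7, b_treat=0.5, b_int=-0.15)

for gf in (-1.0, 0.0, 1.0):
    gain = predicted_outcome(gf, 1, **coef) - predicted_outcome(gf, 0, **coef)
    print(f"Gf = {gf:+.1f} SD: expected gain from training = {gain:.2f}")
```

With a negative product-term coefficient, the printed gain shrinks as Gf increases, which is the pattern described as an interaction "in favour of children low on Gf".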

Fig. 2 Interaction effects between Gf (2a) and phonological awareness (2b) at age 4, and treatment, on phonological awareness at age 6

Model 1 further shows a direct effect of Gf on word reading in grade 2 (r = 0.19, p < 0.01), which, however, was much weaker than the total effect (r = 0.55; p < 0.001), obtained by adding the indirect effect via phonological awareness at age 6 to the direct effect. In Model 2 (not reported in Table 2) only the direct relations to reading and spelling in grades 2 and 3 were estimated (Chi-square = 33.16, df = 13; RMSEA = 0.065, CI90 [0.038, 0.093]; CFI = 0.991; SRMR = 0.018). The direct effects in Model 2 essentially equal the total effects in Model 1. This pattern was the same for all the estimated effects. The results in Model 2 show that Gf significantly predicted all the reading measures and that there were also significant interaction effects in favour of the low performing children: word reading r = 0.56, int. = − 0.15; text reading speed r = 0.54, int. = − 0.14; reading comprehension r = 0.61, int. = − 0.20; and spelling r = 0.47, int. = − 0.20 in grade 2; and reading comprehension r = 0.61, int. = − 0.18 in grade 3. The phonological training significantly affected the reading measures, with effect sizes ranging from d = 0.22 to d = 0.32.
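
The decomposition used here (total effect = direct effect + indirect effect) can be sketched as follows. The helper names are hypothetical. The implied indirect effect is simply backed out from the two standardized values reported above for Gf and word reading, while the product-of-paths example uses made-up path values to show how an indirect effect through a single mediator is usually formed.

```python
def indirect_effect(path_a: float, path_b: float) -> float:
    """Indirect effect through one mediator: product of the two paths."""
    return path_a * path_b


def total_effect(direct: float, indirect: float) -> float:
    """Total effect as the sum of the direct and indirect effects."""
    return direct + indirect


# Values reported in the text for Gf -> word reading in grade 2.
direct_gf_word = 0.19
total_gf_word = 0.55
implied_indirect = total_gf_word - direct_gf_word
print(f"implied indirect effect: {implied_indirect:.2f}")

# Hypothetical path values (NOT from the study), for illustration only.
example_total = total_effect(direct=0.20,
                             indirect=indirect_effect(0.60, 0.50))
print(f"example total effect: {example_total:.2f}")
```

In a model with several mediating paths, the indirect effect is the sum of the products along each path, so the single-mediator function above is a simplification.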

The effects of initial phonological skills, phonological training and the interaction between the two variables were also investigated using SEM. Manifest variables were used, except for word reading in grade 2, where three manifest word reading variables indicated the latent variable. The CFA was just-identified and thus had a trivially perfect fit. Figure 3 illustrates the hypothesis that phonological awareness and phonological training predict word reading in grade 2, which underpins the development of text reading speed, spelling and reading comprehension in grade 2, and these skills in turn predict reading comprehension in grade 3. The model had a good fit to the data (Chi-square = 33.96, df = 25, p = 0.109; RMSEA = 0.031, CI90 [0.000, 0.056]; CFI = 0.996; SRMR = 0.018). Phonological training (d = 0.54, p < 0.001) and phonological awareness (r = 0.41, p < 0.001) at age 4 predicted word reading in grade 2, whereas there was no interaction effect. Word reading in grade 2, in turn, predicted text reading speed (r = 0.93, p < 0.001), spelling (r = 0.69, p < 0.001), and reading comprehension (r = 0.78, p < 0.001) in grade 2. There was also an additional direct effect of phonological awareness at age 4 (r = 0.10, p < 0.001) and spelling (r = 0.13, p < 0.001) on reading comprehension in grade 2. Reading speed (r = 0.28, p < 0.001), spelling (r = 0.13, p < 0.001) and reading comprehension (r = 0.56, p < 0.001) in grade 2 predicted reading comprehension in grade 3.

Fig. 3 Structural equation model. Treat = group assignment; Phon, 4 yrs = phonological awareness ability at age 4; Interaction = product term of Treat and Phon, 4 yrs; Word2 = word decoding in grade 2; Spell2 = spelling in grade 2; Speed2 = text reading speed in grade 2; Rcom2 = reading comprehension in grade 2; Rcom3 = reading comprehension in grade 3

*p < 0.05; **p < 0.01; ***p < 0.001

There were also indirect effects, which are not visible in the figure. The early phonological training affected text reading speed (d = 0.50, p < 0.001), reading comprehension (d = 0.46, p < 0.001) and spelling (d = 0.37, p < 0.001) in grade 2, and reading comprehension in grade 3 (d = 0.45, p < 0.001). Phonological awareness at age 4 indirectly predicted text reading speed (r = 0.38, p < 0.001), reading comprehension (r = 0.44, p < 0.001) and spelling (r = 0.28, p < 0.001) in grade 2, and reading comprehension in grade 3 (r = 0.40, p < 0.001).

Discussion

The main aim of the present study was to examine whether early phonological awareness training preceding the ordinary kindergarten training improves children’s further development of phonological skills, and whether that implies better performance in reading and writing in early grades. The overall results show that it is indeed possible to enhance both phonological and reading skills by short periods of training before kindergarten.

The intervention comprised two waves of phonological training: six weeks at age 4 and six weeks at age 5. The training did not include any instruction in letters or reading; it addressed only different aspects of phonological awareness, gradually moving from games and exercises with morphemes and syllables to phonemes. Hulme et al. (2005) argue that it is almost impossible to train phonological awareness when children are so young that they do not know any letters. However, in this study it was possible because formal reading instruction starts late in Sweden, and because the phonological tasks were initially quite easy. The decision not to include letters does not deny the finding that phonological awareness training is most effective when combined with explicit training of phoneme-grapheme mapping (National Reading Panel, 2000). That kind of training was later introduced for all children in grade 0 when they were six years old, one year before formal reading instruction started. For children as young as four years old, the purpose was to examine whether early phonological awareness training provides preparation so that all children may benefit from the more advanced training in grade 0. The training did indeed prove to be successful.

The effect of the training on phonological awareness one year after the second wave was highly significant, with an effect size of d = 0.47 when initial Gf was taken into account. According to Cohen’s rules of thumb (Cohen, 1992), this would be a medium effect size. However, as Cohen himself pointed out, these rules of thumb should be interpreted cautiously. In educational settings, an effect size of this magnitude could often be judged as high. In this case, for example, the effect size corresponds to just under 1.5 years of growth in phonological awareness in the control group. Thus, the effect of 0.47 could be regarded as substantial. Further, there was a smaller, yet significant, interaction effect in favour of children low on Gf in the experimental group. Although the training moderated the effect, Gf at age 4 explained 48% of the variance in phonological awareness at age 6. A separate regression analysis revealed that phonological awareness at age 4 explained 37% of the variance in phonological awareness at age 6. When taking the initial phonological skills into account, the training had an effect size of d = 0.72, and the results showed that children low on phonological awareness benefitted the most from the intervention. Thus, the results clearly demonstrate that it is an advantage for children to receive phonological training as compared to not receiving training. This is, however, what many previous studies have shown. The crucial question here is whether early phonological training adds to the results after language training in grade 0 for all children regardless of group assignment. Also, is there a transfer effect of the phonological training to reading related skills?

Gf at age 4 was highly predictive of word decoding, text reading speed and spelling in grade 2, and reading comprehension in grades 2 and 3. This strong influence of early Gf on reading was, however, moderated by the phonological training. The at-risk children, who performed low on Gf, benefitted the most from the training. This was most pronounced in reading comprehension in grades 2 and 3, and spelling in grade 2. Obviously, supporting at-risk children so that they can read and comprehend a text, as well as write, is the ultimate goal of phonological training. Several studies have shown positive effects on spelling (Lundberg et al., 1988; Poskiparta et al., 1999). In Swedish, spelling is less transparent than decoding (Lundberg, 1985; Seymour et al., 2003; Wolff, 2009). However, as compared to English, spelling and word decoding both correspond well to phoneme-grapheme mapping, and phonological training can be expected to give immediate effects not only on word decoding, but also on spelling (Kjeldsen et al., 2019). According to Byrne et al. (2010), spelling is more sensitive to classroom effects than reading. This is in line with the finding of strong effects of teacher competence in a previous intervention study for children with poor phonological and decoding skills in grade 3 (Author, 2011). In that study, special needs education teachers carried out one-to-one tutoring, and the teachers’ performance on a test of the structure of written language predicted the increase in the students’ spelling abilities during the intervention (Wolff, 2017). Likewise, Kjeldsen et al. (2003) found effects on spelling among at-risk children only. The special role of spelling may arise because spelling requires very explicit knowledge and recognition of phonemes and graphemes in order to map them successfully. 
This is of course an issue for students with phonological deficits, and well-structured competent teaching may be more important for them than for other students. Further, spelling related significantly to reading comprehension in grade 2. Theoretically, spelling predicts reading comprehension in this model, but the fact that the measures are simultaneous makes the interpretation less certain. Nevertheless, there seems to be a close relationship between spelling and reading comprehension. Previous research has shown that reading comprehension in grade 3 predicts spelling in grade 4 among poor readers (Wolff, 2011). Thus, it is likely that there is a reciprocal relationship.

The introduction of structured phonological awareness training at a young age aimed to facilitate the more complex phonological training in grade 0, resulting in better phonological skills before formal reading instruction starts in grade 1. Our hypothesis is that phonological awareness underpins word decoding, which in turn underpins text reading speed, spelling and reading comprehension in grade 2, and that the reading related skills in grade 2 can predict reading comprehension in grade 3. The SEM analysis (Fig. 3) statistically supported this view. Phonological awareness at age 4 predicted word reading in grade 2. It also had a small but highly significant direct relation to reading comprehension in grade 2, and there was an additional indirect effect via word reading. As illustrated in Table 2, the total effect equals the sum of the indirect and the direct effects. Some researchers argue that phonological awareness is not important after school entrance. However, in the present study, phonological awareness at age 4 significantly predicted not only word reading and reading comprehension in grade 2, but also text reading speed and spelling in grade 2 and reading comprehension in grade 3. The early phonological training affected all the reading related measures in grades 2 and 3, with medium effect sizes ranging from d = 0.37 to d = 0.54. Bearing in mind the phonological training for all children in grade 0, these effects four and five years after training are impressive. However, there were no significant interaction effects between training and initial phonological skills. One reason may be that there were floor effects on phonological awareness at age 4, especially on tasks involving phonemes. 
Another reason may be that phonological awareness matters most at the lower end of the distribution: high scores are beneficial, but a child with extremely high scores does not necessarily score higher on reading than a child with just sufficiently high phonological awareness skills. These reasons in combination may explain why the interaction effects did not reach significance. In support of this hypothesis, a regression analysis involving the 50th percentile of the sample, with spelling as a dependent variable, showed significant interaction effects of training and initial phonological awareness skills.

Limitations and further research

The present study takes advantage of a relatively large sample and is based on a randomized longitudinal design. The interventions were, furthermore, carried out in typical day-care environments. As such, the study offers good protection against threats to both internal and external validity. However, the study also suffers from some limitations that should be attended to in further research. It would have been advantageous if the outcome measures had been linked across the different waves of measurement, so that growth curve modelling techniques could have been used to analyze individual differences in the development of skills over the first three grades of schooling. In the absence of linked scales, the outcome measures had to be analyzed on a piecemeal basis that did not allow description of development over longer periods of time. The fact that the present study demonstrated the interventions to be particularly beneficial for the development of at-risk children suggests that further research should over-sample children from at-risk categories to allow acceptable statistical power even in studies with a limited number of participants.

Concluding remarks

The phonological training was efficient for improving the further development of phonological awareness. Children who are not at risk for developing reading and writing difficulties appear to develop awareness of morphemes and syllables spontaneously, while this does not happen for children at risk. Therefore, it is important that less complex processing tasks, such as identification, and tasks involving larger linguistic units, such as syllables and morphemes, are trained early. They later form the basis for enabling training of the more complex phonemic awareness tasks. Our results further indicate that to achieve phonemic awareness, explicit training is required.

The training was especially helpful for children low on Gf and phonological awareness at age 4, i.e., children at risk for reading difficulties, who may be the children sometimes referred to as “treatment resisters” (Torgesen, 2000). However, it is important not to interpret the results as implying that only the children with the most difficulties should be selected to participate in the training. This study conducted training in unselected groups, that is, everyone at the preschool in the prescribed age span participated. The training is designed to encourage children to discuss and inspire each other, and, as noted, training in small groups has been shown to be more efficient than one-to-one training (Bus & van IJzendoorn, 1999; Ehri et al., 2001). Someone’s question or comment can make the penny drop for someone else. Furthermore, all the children may benefit from the training, just to varying degrees.

In line with our hypothesis about the relation between Gf and early reading related skills, Gf at age 4 could clearly predict children’s reading related skills in grades 2 and 3. Furthermore, the phonological training in the experimental group moderated the relation between Gf and the reading related tasks. These results support the hypothesis that phonological awareness mediates the impact of Gf on early reading skills. They also support the hypothesis that children low on Gf are more in need of structured explicit training of phonological awareness than other children.