Numerous studies have shown that phonotactic probability, the likelihood of occurrence of a sound sequence in a language, and neighborhood density, the number of words that are phonologically similar to a given sound sequence, influence spoken-language recognition, production, and acquisition of both real words and nonwords across the lifespan (e.g., Munson, 2001; Munson, Swenson, & Manthei, 2005; Newman & German, 2005; Storkel, Armbruster, & Hogan, 2006; Storkel & Lee, 2011; Vitevitch & Luce, 1999). Given the clear influences of phonotactic probability and neighborhood density across multiple tasks, age groups, and types of stimuli (i.e., real words vs. nonwords), it is crucial to control or manipulate these variables in psycholinguistic research during either stimulus selection or data analysis. To support this contention, in this study a corpus of consonant–vowel–consonant sequences (CVCs) was created (provided as supplemental materials); phonotactic probability and neighborhood density were measured, based on both child and adult corpora; and potential relationships among CVCs were investigated, in order to better inform stimulus selection. The specific issues addressed were whether (1) computations based on a child corpus would differ from those based on an adult corpus; (2) the phonotactic probability and/or neighborhood density of real words would differ from those of nonwords; and (3) phonotactic probability and/or neighborhood density would differ across CVCs varying in consonant age of acquisition.

Comparison of child and adult values

In terms of the comparability of computations based on child versus adult corpora, a prior study by Storkel and Hoover (2010) addressed this issue for a set of 380 early-acquired nouns that varied in word length and sound structure. Their results showed that child values were significantly correlated with adult values. However, the raw values did differ significantly, with child phonotactic probability being higher than adult phonotactic probability, and child neighborhood density being lower than adult neighborhood density. The transformation of these values into z scores based on the means and standard deviations of the child or the adult corpus reduced the difference between the child and adult values. This finding indicates that the significant differences in the raw values were likely related to differences in the size and composition of the child versus the adult corpus, differences that were minimized by transformation of the values in a manner that was sensitive to the individual characteristics of the corpus. Similar findings were obtained for a nonrandom sample of 310 primarily CVC nonwords. The present report was designed to extend the issue of the comparability of child and adult probability and density values to a large set of CVCs that would include both real words and nonwords. It was expected that the results of the prior study would be replicated, indicating the need to consider differences in the corpora used to compute phonotactic probability and neighborhood density.

Lexicality and consonant age of acquisition

Although Storkel and Hoover (2010) analyzed child and adult values for real words and nonwords, the two types of stimuli were never compared to one another. Thus, it is unclear whether the phonotactic probability or neighborhood density of real words would differ from that of nonwords. Prior research has suggested that the effects of phonotactic probability and neighborhood density may differ for real words versus nonwords (e.g., Munson et al., 2005; Vitevitch, 2003; Vitevitch & Luce, 1998, 1999). In addition, phonotactic probability and neighborhood density are correlated with wordlikeness judgments (Bailey & Hahn, 2001; Frisch, Large, & Pisoni, 2000); that is, nonwords that are higher in probability or density tend to be judged as sounding more like real words than do nonwords that are lower in probability or density. It is possible that this finding could be further extended to show that real words are higher than nonwords in probability and/or density. An understanding of how phonotactic probability and neighborhood density vary by lexicality may inform stimulus selection for future research.

In a similar vein, past research has indicated that phonotactic probability and neighborhood density can influence the accuracy of sound production, with production generally being more accurate for high-probability and/or high-density sound sequences (e.g., Edwards, Beckman, & Munson, 2004; Gierut & Storkel, 2002; Vitevitch, 1997; Zamuner, Gerken, & Hammond, 2004). Moreover, it has been argued that phonological acquisition in children is tightly coupled with acquisition and knowledge of words (Edwards, Munson, & Beckman, 2011; Stoel-Gammon, 2011; Velleman & Vihman, 2002). One question that arises is whether CVCs composed of earlier-acquired sounds might have higher phonotactic probability and/or neighborhood density than do CVCs composed of later-acquired sounds, a finding that would be informative for designing developmental studies of phonotactic probability or neighborhood density.

Purpose

The purpose of the present report is to provide a comprehensive corpus of legal CVCs in American English (see the supplemental materials) that can be used in psycholinguistic research. To that end, phonotactic probability and neighborhood density were computed on the basis of child and adult corpora, and CVCs were coded as real words or nonwords and by consonant age of acquisition. Three questions were addressed: (1) Do phonotactic probability and/or neighborhood density values differ depending on the corpus (i.e., child vs. adult) used for the computations? (2) Are real-word CVCs higher in phonotactic probability and/or neighborhood density than are nonword CVCs? (3) Are CVCs composed of earlier-acquired sounds higher in phonotactic probability and/or neighborhood density than are CVCs composed of later-acquired sounds?

Method

Child and adult corpora

The variables of interest were determined using an online calculator available at www.bncdnet.ku.edu/cml/info_ccc.vi. The child corpus for this calculator is described more fully in Storkel and Hoover (2010). In short, this corpus consists of 4,832 different words spoken by American kindergarten or first-grade children (Kolson, 1960; Moe, Hopkins, & Rush, 1982). The adult corpus is described more fully in Nusbaum, Pisoni, and Davis (1984). Briefly, this corpus consists of 19,290 words taken from a dictionary of American English (Merriam-Webster, 1964). For each word, both corpora contain a phonetic transcription of the target pronunciation in American English in a computer-readable format, an orthographic spelling of the word, and the log frequency of the word, based on a sample of approximately 1 million words. Both corpora generally consist of uninflected root words (e.g., “run” rather than “running”) because this is the typical format for dictionaries, and the child corpus was created to match this format (see Storkel & Hoover, 2010, for details).

Lexicality

CVCs were generated by pairing all possible combinations of initial consonants, vowels, and final consonants. These CVCs were then submitted to the online calculator (i.e., www.bncdnet.ku.edu/cml/info_ccc.vi) to compute the phonotactic probability and neighborhood density based on the child (Storkel & Hoover, 2010) or the adult (Nusbaum et al., 1984) corpus. Importantly, the calculator also identifies whether the input item occurs in either corpus. In this way, real-word CVCs were differentiated from nonword CVCs. A real word was defined as any CVC that occurred in the child corpus only (n = 84), the adult corpus only (n = 592), or both corpora (n = 720). Thus, 1,396 CVCs were identified as real words. For the remaining CVCs, which form the pool of potential nonwords, those with a phonotactic probability and neighborhood density equal to zero were removed from consideration because these were considered to be unattested sequences in this sample of American English. This yielded 4,369 CVCs identified as probable nonwords meeting the characteristics of American English. The Excel file provided in the supplemental materials contains three worksheets, showing (1) real-word CVCs, (2) nonword CVCs, and (3) real- and nonword CVCs combined. The data in the last worksheet (i.e., real- and nonword CVCs combined) were analyzed for this report.
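The generation and sorting steps described above can be sketched roughly as follows. This is a minimal illustration, not the procedure used by the online calculator: the function names, the phoneme symbols, and the reading of the exclusion rule (a candidate is dropped only when every probability and density value supplied for it is zero) are assumptions made for the example.

```python
from itertools import product


def generate_cvcs(onsets, vowels, codas):
    """Return every onset-vowel-coda combination as a tuple of phoneme symbols."""
    return list(product(onsets, vowels, codas))


def classify(cvc, child_words, adult_words, probabilities, densities):
    """Label a CVC as 'real word', 'nonword', or 'excluded'.

    child_words / adult_words: sets of phoneme tuples found in each corpus.
    probabilities / densities: the values computed for this CVC.
    Assumption: a non-occurring CVC is excluded only if all supplied
    probability and density values equal zero.
    """
    if cvc in child_words or cvc in adult_words:
        return "real word"
    if all(p == 0 for p in probabilities) and all(d == 0 for d in densities):
        return "excluded"
    return "nonword"
```

A CVC that occurs in either corpus is labeled a real word regardless of its probability or density, which mirrors the definition given in the text.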

Phonotactic probability

Two raw measures of phonotactic probability were computed on the basis of each corpus (child vs. adult), using the online calculator: positional segment sum and biphone sum. Positional segment sum is computed by first calculating the positional segment frequency for each sound in the CVC, and then adding those individual frequencies together. The positional segment frequency is computed by summing the log frequencies of all of the words in a corpus that contain the given sound in the given word position and then dividing this by the sum of the log frequencies of all the words in the corpus that contain any sound in the same word position. Biphone sum is computed in a similar manner, but the unit of calculation is the pair of adjacent sounds (i.e., CV or VC), rather than a single sound. Thus, the biphone frequency for a given pair of sounds is the sum of the log frequencies of all of the words in the corpus that contain the given sound pair in the given word position, divided by the sum of the log frequencies of all of the words in the corpus that contain any sound in the given word position. Storkel (2004b) has provided a detailed example of these calculations.
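The two computations can be sketched as below, under stated assumptions. The corpus is represented as a hypothetical dictionary mapping a tuple of phoneme symbols to a log frequency, positions are zero-based, and the position of a biphone is taken to be the position of its first sound. None of these representational choices comes from the calculator itself.

```python
def positional_segment_frequency(corpus, segment, position):
    """Log-frequency-weighted share of words with `segment` at `position`."""
    numerator = sum(f for w, f in corpus.items()
                    if len(w) > position and w[position] == segment)
    # Denominator: all words that have any sound in this position.
    denominator = sum(f for w, f in corpus.items() if len(w) > position)
    return numerator / denominator if denominator else 0.0


def biphone_frequency(corpus, biphone, position):
    """Log-frequency-weighted share of words with `biphone` starting at `position`."""
    numerator = sum(f for w, f in corpus.items()
                    if w[position:position + 2] == biphone)
    # Denominator as described in the text: words with any sound in this position.
    denominator = sum(f for w, f in corpus.items() if len(w) > position)
    return numerator / denominator if denominator else 0.0


def positional_segment_sum(corpus, cvc):
    """Sum of the positional segment frequencies of the three sounds."""
    return sum(positional_segment_frequency(corpus, s, i)
               for i, s in enumerate(cvc))


def biphone_sum(corpus, cvc):
    """Sum of the CV and VC biphone frequencies."""
    return (biphone_frequency(corpus, cvc[0:2], 0)
            + biphone_frequency(corpus, cvc[1:3], 1))


# Toy corpus with made-up log frequencies (illustration only).
toy_corpus = {("k", "ae", "t"): 2.0, ("b", "ae", "t"): 1.0, ("k", "ae", "b"): 1.0}
```

In this toy corpus, the sequence ("k", "ae", "t") has a positional segment sum of 0.75 + 1.0 + 0.75 = 2.5 and a biphone sum of 0.75 + 0.75 = 1.5.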

In addition to these raw values, transformed values were computed. The transformations were computed for real words alone, nonwords alone, and real words and nonwords combined. For each of these three sets, the mean and standard deviation for each measure of phonotactic probability (i.e., positional segment and biphone sums) were computed for each corpus (i.e., child and adult) and then used to compute a z score and a percentile for each CVC. The formula for the z score is (obtained value – mean)/(standard deviation). Percentiles were computed using an SPSS function (i.e., cdfnormal) that computes the percentile on the basis of a normal curve with the given mean and standard deviation. The z scores for the real words and nonwords combined are the data that were analyzed for this report, so that real words could be compared with nonwords. Table 1 provides the means and standard deviations used to create these z scores.
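As an illustration of the two transformations, the sketch below computes a z score and a normal-curve percentile from a given mean and standard deviation. It uses Python's standard library rather than SPSS, and expressing the percentile on a 0 to 100 scale is an assumption; the numeric values in the usage note are hypothetical.

```python
from statistics import NormalDist


def z_score(value, mean, sd):
    """(obtained value - mean) / (standard deviation)."""
    return (value - mean) / sd


def normal_percentile(value, mean, sd):
    """Percentile of `value` under a normal curve with the given mean and SD.

    Assumption: the cumulative probability is scaled to 0-100.
    """
    return 100 * NormalDist(mean, sd).cdf(value)
```

For a hypothetical measure with a mean of 10 and a standard deviation of 2, a raw value of 12 gives a z score of 1.0 and a percentile of roughly 84.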

Table 1 Means and standard deviations used for the z-score transformations

Note that the raw value for a given CVC is the same across all worksheets in the supplemental Excel file, but the transformed value changes across worksheets because the mean and standard deviation used for the transformation are specific to a given worksheet (i.e., set of CVCs: real words only, nonwords only, or both). For future studies in which the supplemental materials are used to select stimuli, a particular worksheet should be chosen on the basis of correspondence with the type of stimuli needed for the study. For example, if only real-word CVCs are being used in the study, the real-word worksheet should be used for stimulus selection, whereas if both real-word and nonword CVCs are being used, the all-CVC worksheet should be used for stimulus selection. The transformed values indicate how extreme a particular CVC is relative to the other CVCs in the same set/worksheet. That is, for the first case, of real words only, a z score of +1.0 for positional segment sum would indicate that the selected CVC has a positional segment sum that is 1.0 standard deviation above the mean positional segment sum of the real-word CVCs, whereas in the second case, of real words and nonwords, a z score for positional segment sum of +1.0 would indicate that the selected CVC’s positional segment sum is 1.0 standard deviation above the mean positional segment sum for all CVCs. Finally, the z scores place the positional segment sum and biphone sum on the same scale, making it appropriate to average the two z scores to create one measure of phonotactic probability when a single measure is desirable.
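The dependence of a transformed value on its reference set, and the averaging of the two z scores, can be illustrated with a short sketch. The raw values are invented for the example, and the use of the population standard deviation is an assumption, because the report does not state which estimator was used.

```python
from statistics import mean, pstdev


def z_within_set(value, reference_values):
    """z score of `value` relative to one reference set (worksheet).

    Assumption: population standard deviation (pstdev).
    """
    return (value - mean(reference_values)) / pstdev(reference_values)


def composite_probability(z_segment, z_biphone):
    """Single phonotactic probability measure: the mean of the two z scores."""
    return (z_segment + z_biphone) / 2


# Hypothetical raw positional segment sums (illustration only).
real_word_values = [0.10, 0.14, 0.18]
all_cvc_values = [0.02, 0.06, 0.10, 0.14, 0.18]

z_real = z_within_set(0.14, real_word_values)  # near 0: typical for real words
z_all = z_within_set(0.14, all_cvc_values)     # positive: above the all-CVC mean
```

The same raw value of 0.14 sits at the mean of the hypothetical real-word set but above the mean of the combined set, which is why the worksheet should match the type of stimuli being selected.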

Neighborhood density

Neighborhood density was computed for each corpus (child or adult) by counting the number of words appearing in the corpus that differed from the given CVC by a substitution of one sound, a deletion, or an addition in any word position. To illustrate, the neighbors of the real-word CVC “rat” include “bat” (initial sound substitution), “rot” (middle sound substitution), “rag” (final sound substitution), “at” (initial sound deletion), “brat” (initial sound addition), and “raft” (penultimate sound addition). Note that the determination of neighbors is based on sounds, rather than spelling. As with phonotactic probability, transformed values—specifically, z scores and percentiles—were computed for neighborhood density following the methods already described. See Table 1 for the means and standard deviations used for these transformations.
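The one-sound substitution, deletion, or addition rule can be sketched as follows. This is a minimal illustration rather than the calculator's implementation: words are represented as tuples of phoneme symbols, and the symbols used for the "rat" example are hypothetical transcriptions.

```python
def is_neighbor(a, b):
    """True if sequences a and b differ by one substitution, deletion, or addition."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Addition/deletion: removing one sound from the longer form yields the shorter.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(target, corpus_words):
    """Number of corpus words that are neighbors of `target`."""
    return sum(is_neighbor(target, w) for w in corpus_words)


# The example from the text, in hypothetical phoneme symbols.
rat = ("r", "ae", "t")
toy_lexicon = {
    ("b", "ae", "t"),       # bat: initial substitution
    ("r", "a", "t"),        # rot: middle substitution
    ("r", "ae", "g"),       # rag: final substitution
    ("ae", "t"),            # at: initial deletion
    ("b", "r", "ae", "t"),  # brat: initial addition
    ("r", "ae", "f", "t"),  # raft: penultimate addition
    ("d", "o", "g"),        # dog: not a neighbor
    ("r", "ae", "t"),       # rat itself is not counted
}
```

With this toy lexicon, neighborhood_density(rat, toy_lexicon) counts the six neighbors listed in the text. Because the comparison operates on sound symbols, spelling plays no role.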

Consonant age of acquisition

Categories of consonant age of acquisition were taken from Shriberg (1993), who divided the 24 American English consonants into three groups of eight consonants based on accuracy by a group of children with speech sound disorders. The groupings identified by Shriberg were consistent with data from larger, cross-sectional studies of typically developing children (e.g., Smit, Hand, Freilinger, Bernthal, & Bird, 1990). The three groupings are (1) the early-8, consisting of the sounds m, n, w, y, h, p, b, and d; (2) the middle-8, consisting of ng (e.g., king), t, k, g, f, v, ch, and j; and (3) the late-8, consisting of voiceless th (e.g., thanks), voiced th (e.g., that), s, z, sh, zh (e.g., azure), l, and r. Each consonant (initial and final) in a CVC was coded as early-, middle-, or late-8, and then each CVC was given a whole-CVC code based on the coding of the two consonants. Five whole-CVC codes were used. Specifically, Code 1 (early/early) was assigned to CVCs in which both consonants were early-8 sounds (n = 567). Code 2 (early/mid) was assigned to CVCs with one early-8 and one middle-8 sound (n = 1,387). Code 3 (mid) was assigned to CVCs with one early-8 and one late-8 sound, and also to CVCs with two middle-8 sounds (n = 1,938). Code 4 (mid/late) was assigned to CVCs with one middle-8 and one late-8 sound (n = 1,314). Finally, Code 5 (late/late) was assigned to CVCs in which both consonants were late-8 sounds (n = 559).
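The coding scheme can be expressed compactly, as in the sketch below. The labels "th" and "dh" for the voiceless and voiced th sounds are placeholders chosen for the example. Summing the two consonant ranks is simply one way to reproduce the five codes described above and is not claimed to be how the coding was originally carried out.

```python
EARLY_8 = {"m", "n", "w", "y", "h", "p", "b", "d"}
MIDDLE_8 = {"ng", "t", "k", "g", "f", "v", "ch", "j"}
# "th" = voiceless th, "dh" = voiced th (placeholder labels).
LATE_8 = {"th", "dh", "s", "z", "sh", "zh", "l", "r"}


def consonant_rank(consonant):
    """0 for early-8, 1 for middle-8, 2 for late-8."""
    if consonant in EARLY_8:
        return 0
    if consonant in MIDDLE_8:
        return 1
    if consonant in LATE_8:
        return 2
    raise ValueError(f"unknown consonant: {consonant!r}")


def whole_cvc_code(initial, final):
    """Whole-CVC code from 1 (early/early) to 5 (late/late).

    early+early=1, early+mid=2, early+late or mid+mid=3, mid+late=4, late+late=5.
    """
    return consonant_rank(initial) + consonant_rank(final) + 1
```

Because an early/late pairing and a mid/mid pairing both sum to the same rank total, they share Code 3, as in the scheme described in the text.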

Results

All analyses were performed on the combined real-word and nonword set of CVCs (i.e., on the data from the all-CVCs worksheet in the supplemental materials).

Comparison of child and adult values

A comparison of the child and adult values mirrored the findings of the prior study examining a different set of real words and nonwords (Storkel & Hoover, 2010). Figure 1 shows the child and adult raw values for positional segment sum (top panel), biphone sum (middle panel), and neighborhood density (bottom panel). As is shown in this figure, the raw positional segment sum, biphone sum, and neighborhood density from the child corpus were significantly correlated with the raw values from the adult corpus, r(5765) = .95, p < .001, R² = .91, for positional segment sum; r(5765) = .88, p < .001, R² = .78, for biphone sum; and r(5765) = .89, p < .001, R² = .79, for neighborhood density. However, a t-test analysis showed that the positional segment sum based on the child corpus was significantly higher than that based on the adult corpus, t(5764) = −61.81, p < .001. Likewise, the biphone sum based on the child corpus was significantly higher than that based on the adult corpus, t(5764) = −23.93, p < .001. In contrast, the neighborhood density based on the child corpus was significantly lower than that based on the adult corpus, t(5764) = 104.77, p < .001. Because of these significant differences, z scores were used in the subsequent analyses to rescale the measures on a common metric and minimize the differences across corpora (Storkel & Hoover, 2010).
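For readers who wish to run the same kind of comparison on their own item sets, the sketch below computes a Pearson correlation and a paired t statistic using only the standard library. Two points are assumptions rather than statements from the report: that the t tests were paired (suggested by the reported degrees of freedom, which equal the number of CVCs minus one) and that differences were taken as adult minus child (suggested by the signs of the reported t values). The data are invented.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def paired_t(first, second):
    """Paired t statistic for (first - second) and its degrees of freedom."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical raw values for four items (illustration only).
adult_values = [1.0, 2.0, 3.0, 4.0]
child_values = [1.5, 2.4, 3.6, 4.5]

r = pearson_r(adult_values, child_values)
t, df = paired_t(adult_values, child_values)  # negative t: child values are higher
```

The toy data show the pattern reported above: the two sets of values can be almost perfectly correlated while still differing reliably in magnitude.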

Fig. 1 Scatterplots of child versus adult positional segment sum (top), biphone sum (middle), and neighborhood density (bottom). Solid lines indicate the linear regression fit lines. The dashed lines are reference lines indicating perfect correlation

Lexicality and consonant age of acquisition

Three separate multivariate analyses of variance (MANOVAs) were performed, one for each measure (i.e., the z scores for positional segment sum, biphone sum, and neighborhood density). In each MANOVA, lexicality (real word vs. nonword) and whole-CVC consonant age of acquisition (1 = early/early, 2 = early/mid, 3 = mid, 4 = mid/late, 5 = late/late) were the independent variables, and the z scores for that measure based on the child and adult corpora were the dependent variables. MANOVA was used because the dependent variables based on the child and adult corpora were correlated, making a univariate approach inappropriate, due to inflation of Type I and II error rates (see, e.g., Haase & Ellis, 1987). However, it is important to note that power in MANOVA is still affected by the correlation between dependent variables, such that more highly correlated dependent variables, as in the present report, will tend to reduce power (Cole, Maxwell, Arvey, & Salas, 1994). Thus, the analyses reported here represent a potentially conservative analysis approach, although the relatively large sample size (n = 5,765) offsets this possible limitation.

For positional segment sum, the effect of lexicality was significant, F(2, 5754) = 281.91, p < .001, Wilks’s λ = .91, ηp² = .10. As is shown in the top panel of Fig. 2, real words had higher positional segment sums than did nonwords, and this was true for the child, F(1, 5755) = 563.00, p < .001, ηp² = .09, and the adult, F(1, 5755) = 489.33, p < .001, ηp² = .08, corpora. Likewise, the effect of consonant age of acquisition was significant, F(8, 11508) = 11.85, p < .001, Wilks’s λ = .98, ηp² = .01. The effect was significant for both the child, F(4, 5755) = 18.04, p < .001, ηp² = .01, and the adult, F(4, 5755) = 11.98, p < .001, ηp² = .01, corpora. This significant effect was further examined via Tukey HSD tests. As is shown in the top panel of Fig. 2, CVCs with two early consonants (i.e., 1 early/early) had significantly higher positional segment sums than did all other combinations of consonant age of acquisition, all ps < .001, for both the child and adult corpora. In contrast, CVCs with one middle and one late consonant (i.e., 4 mid/late) had significantly lower positional segment sums than all other combinations of consonant age of acquisition, all ps < .01, for the child and adult corpora. As can be seen in the top panel of Fig. 2, lexicality did not significantly interact with consonant age of acquisition, F(8, 11508) = 1.13, p = .34, Wilks’s λ = .998, ηp² = .001.

Fig. 2 Normalized (i.e., z-score) positional segment sum (top), biphone sum (middle), and neighborhood density (bottom) by consonant acquisition class for real words (RW) based on the adult (open bars) or the child (vertical-line bars) corpus, and for nonwords (NW) based on the adult (filled bars) or the child (dotted bars) corpus. Error bars indicate standard errors

For biphone sum, the effect of lexicality was significant, F(2, 5754) = 411.09, p < .001, Wilks’s λ = .88, ηp² = .13. As is shown in the middle panel of Fig. 2, real words had higher biphone sums than did nonwords, and this was true for the child, F(1, 5755) = 808.80, p < .001, ηp² = .12, and the adult, F(1, 5755) = 517.99, p < .001, ηp² = .08, corpora. Likewise, the effect of consonant age of acquisition was significant, F(8, 11508) = 9.96, p < .001, Wilks’s λ = .99, ηp² = .01. The effect was significant for both the child, F(4, 5755) = 13.40, p < .001, ηp² = .01, and the adult, F(4, 5755) = 6.23, p < .001, ηp² = .004, corpora. As can be seen in Fig. 2, CVCs with two early-acquired consonants (i.e., 1 early/early) had significantly higher biphone sums than did all other combinations of consonant age of acquisition, all ps < .01, for the child and adult corpora. In contrast, CVCs with one middle and one late-acquired consonant (i.e., 4 mid/late) had lower biphone sums than did most other combinations of consonant age of acquisition, all ps < .05, for the child (except for 5 late/late) and the adult (except for 2 early/mid) corpora. Likewise, lexicality did not interact significantly with consonant age of acquisition, F(8, 11508) = 0.29, p = .97, Wilks’s λ = 1.00, ηp² < .001.

For neighborhood density, the effect of lexicality was significant, F(2, 5754) = 728.26, p < .001, Wilks’s λ = .80, ηp² = .20. As is shown in the bottom panel of Fig. 2, real words had higher densities than did nonwords, and this was true for the child, F(1, 5755) = 1,182.77, p < .001, ηp² = .17, and the adult, F(1, 5755) = 1,446.61, p < .001, ηp² = .20, corpora. Likewise, the effect of consonant age of acquisition was significant, F(8, 11508) = 38.89, p < .001, Wilks’s λ = .95, ηp² = .03. This effect was significant for both the child, F(4, 5755) = 42.38, p < .001, ηp² = .03, and the adult, F(4, 5755) = 10.72, p < .001, ηp² = .01, corpora. As can be seen in Fig. 2, density tended to decrease as consonant age of acquisition increased. All pairwise comparisons were significant for the child corpus, all ps < .05, and most pairwise comparisons were significant for the adult corpus, all ps < .01, except for 2 early/mid versus 3 mid, and 4 mid/late versus 5 late/late. These significant main effects were qualified by a significant interaction between lexicality and consonant age of acquisition, F(8, 11508) = 7.91, p < .001, Wilks’s λ = .99, ηp² = .01. Note that the interaction was significant only for the adult corpus, F(4, 5755) = 4.69, p = .001, ηp² = .003, and not for the child corpus, F(4, 5755) = 0.32, p = .87, ηp² < .001. As is shown in the bottom panel of Fig. 2, this interaction appeared to be attributable to a stronger effect of consonant age of acquisition on density for nonwords rather than real words, especially for density based on the adult corpus. That is, the effect of consonant age of acquisition on density was significant for nonwords, F(4, 4364) = 50.21, p < .001, ηp² = .04, and for real words, F(4, 1391) = 9.50, p < .001, ηp² = .03, with the child corpus, but the effect was significant only for nonwords, F(4, 4364) = 26.00, p < .001, ηp² = .02, and not for real words, F(4, 1391) = 1.35, p = .25, ηp² < .01, with the adult corpus.

Discussion

To summarize, the three main findings are that (1) phonotactic probability based on a child corpus was higher than that based on an adult corpus, whereas neighborhood density based on the child corpus was lower than that based on the adult corpus; (2) real-word CVCs were higher in probability and in density than nonword CVCs; and (3) CVCs composed of earlier-acquired sounds were higher in both probability and density than CVCs composed of later-acquired sounds. These three findings have both theoretical and methodological implications.

Comparison of child and adult values

The first finding replicates a prior study using the same child and adult corpora and calculator (Storkel & Hoover, 2010), but with a larger and more varied set of CVCs. Even so, the prior explanation of these significant differences across corpora likely still applies to the present findings. Specifically, the prior analysis of the two corpora showed that the words in the child corpus were higher in frequency than the words in the adult corpus (Storkel & Hoover, 2010), possibly as a by-product of frequency effects on learning. That is, a child’s lexicon is likely to consist predominantly of high-frequency words, which are easier to learn (e.g., Storkel, 2004a). As the lexicon grows, low-frequency words are added, such that the adult lexicon consists of a mix of low- and high-frequency words. Because word frequency is used in calculating phonotactic probability, this change in the frequencies of the words in the lexicon could account for the observed lowering of phonotactic probability from the child to the adult corpus. Likewise, as words are added to the lexicon, the overall size of the lexicon changes, including the size of individual neighborhoods (e.g., Charles-Luce & Luce, 1990, 1995). The observed increase in neighborhood density from the child to the adult corpus is consistent with these prior results.

In terms of methodological implications, it is important to note that child phonotactic probability and neighborhood density were highly correlated with adult phonotactic probability and neighborhood density. Thus, when only broad distinctions (e.g., low vs. high probability or density) are being studied across ages, it will likely be possible to identify stimuli that are low or high for both child and adult measures of probability or density. However, studying finer distinctions in phonotactic probability or neighborhood density may be more challenging, because of differences across the corpora that likely also reflect differences across ages (e.g., changes in the size of the corpus likely mirror changes in the size of the lexicon). To illustrate, a density of five neighbors may not have the same “meaning” across the child and the adult lexicons. Specifically, five neighbors is relatively close to the mean density for children (i.e., z score = −0.62), but relatively farther from the mean density for adults (i.e., z score = −1.11). Moreover, it is unclear whether the raw density or the relative density is what critically influences language processing. That is, does the presence of five neighbors have the same effect on language processing, regardless of where this falls in the density distribution (i.e., raw values matter), or does a word’s relative position within the system (i.e., the degree of sparseness) influence processing (i.e., relative measures matter)? Note that a similar scenario could be constructed for phonotactic probability. In selecting stimuli for developmental studies investigating finer distinctions of phonotactic probability and neighborhood density, the theoretical framework would need to be considered to determine whether raw or relative values would be predicted to influence language processing. 
If strong predictions were not possible, then both types of values might need to be investigated to determine which aspect of probability or density influences language processing.

Lexicality

The finding that real words are higher in both probability and density than nonwords is consistent with prior studies of wordlikeness ratings, in which higher-probability and higher-density nonwords have been judged to be more wordlike than lower-probability or lower-density nonwords (Bailey & Hahn, 2001; Frisch et al., 2000). That is, there appears to be a relationship between lexicality or potential lexicality (i.e., wordlikeness) and phonotactic probability and neighborhood density, such that higher-probability and higher-density CVCs are preferred. Thus, across the distribution of legal CVCs, the most probable and dense CVCs tend to be actual words in the language, whereas the least probable and dense CVCs tend to be excluded from the language. This fits well with the results of other studies, suggesting that language growth (i.e., adding new words to a language) is governed by preferential attachment (e.g., Perc, 2012), a process by which new items that are highly similar to existing items are more likely to be added to a system than are new items that are less similar.

Turning to methodological considerations, it is particularly notable that the differences between nonword and real-word CVCs were quite strong, with relatively large effect sizes (i.e., ηp² range of .10–.20). Moreover, the z-score differences were approximately 1.00 for most comparisons (see Fig. 2), meaning that values for real words and nonwords differed by approximately one standard deviation. The practical implication of this is that matching real words and nonwords on phonotactic probability and/or neighborhood density requires careful attention during stimulus selection and may not be possible, depending on other study-specific criteria. If matching is not possible, then interpreting the effects of lexicality, phonotactic probability, and neighborhood density may be challenging. For example, if lexicality is manipulated without specific attention to phonotactic probability and neighborhood density, it is likely that the items will differ in probability and density, which could serve to amplify or dampen the effect of lexicality. Thus, any results could not be attributed solely to lexicality. In this scenario, more complex statistical analyses (e.g., crossed-random-effects multilevel modeling) might be a useful post-hoc solution, providing a means to account for differences in phonotactic probability and/or neighborhood density within the statistical analysis. Whether this is a viable solution would depend on precisely how large the probability and/or density difference is between the real words and nonwords. In addition, when comparing the effects of phonotactic probability and neighborhood density across real words and nonwords, there could be two potential interpretations. The first would be that phonotactic probability and/or neighborhood density affect the processing of real words differently than the processing of nonwords. The second would be that phonotactic probability and/or neighborhood density affect processing differently at different points in the probability and/or density distribution (cf. Storkel, Bontempo, Aschenbrenner, Maekawa, & Lee, 2012, for a study showing differences in word learning across the full distribution of probabilities and densities). Again, statistical analyses may be helpful in ruling out or supporting one of these alternatives over the other.

Consonant age of acquisition

Turning to consonant age of acquisition, the finding that CVCs composed of earlier-acquired sounds have higher probability and higher density than do CVCs composed of later-acquired sounds is consistent with claims that phonological and lexical development are tightly coupled (Edwards et al., 2011; Stoel-Gammon, 2011; Velleman & Vihman, 2002). In this case, both sound and word characteristics converge on favorable characteristics that should facilitate correct production.

In terms of methodological implications, the phonotactic probability and neighborhood density differences between CVCs composed of earlier-acquired and later-acquired sounds were somewhat weak, with relatively small effect sizes (i.e., ηp² range of .01–.03) and z-score differences of approximately 0.50 or less (see Fig. 2). Unlike the lexicality effect on phonotactic probability and neighborhood density, the consonant age-of-acquisition effect should be relatively easier to contend with when designing a study, because there is a fair degree of overlap in the probability and density distributions for each consonant age-of-acquisition category (see Fig. 2). Thus, it is likely that CVCs composed of earlier-acquired sounds could be closely matched in phonotactic probability and/or neighborhood density to CVCs composed of later-acquired sounds, to isolate consonant age-of-acquisition effects in an empirical study. Likewise, it should be possible to define ranges for low versus high phonotactic probability and/or neighborhood density that are the same for CVCs composed of earlier-acquired versus later-acquired sounds, making it possible to cleanly cross phonotactic probability and/or neighborhood density with consonant age of acquisition.

Conclusions

Child and adult corpora yield differing values for phonotactic probability and neighborhood density, but the implication of this for future research will depend on the level of precision needed in manipulating probability or density. In contrast, large differences in phonotactic probability and neighborhood density exist between real words and nonwords, which may present methodological challenges in designing studies that manipulate lexicality and phonotactic probability or neighborhood density. Although CVCs composed of earlier-acquired sounds did differ in probability and density from those composed of later-acquired sounds, this effect was relatively small and less likely to present significant methodological challenges to studies that manipulate consonant age of acquisition and phonotactic probability or neighborhood density.