Abstract
Purpose
To provide guidance on the appropriate size of pre-tests of psychometric questionnaires when the purpose of the pre-test is to detect misunderstandings, ambiguities, or other difficulties that participants may encounter with instrument items (hereafter called "problems").
Methods
We computed (a) the power to detect a problem for various levels of prevalence and various sample sizes, (b) the required sample size to detect problems for various levels of prevalence, and (c) upper confidence limits for problem prevalence in situations where no problems were detected.
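The three computations can be written out as a sketch, assuming a simple binomial model: each of n participants independently encounters a given problem with probability p (the prevalence), and the problem counts as detected when at least one participant encounters it. The symbols n, p, α, and p_U are notation introduced here for illustration; the abstract does not spell out its formulas.

```latex
\begin{align}
\text{(a)}\quad & \text{power}(p, n) = 1 - (1 - p)^{n} \\
\text{(b)}\quad & n \ge \frac{\ln(1 - \text{power})}{\ln(1 - p)} \\
\text{(c)}\quad & p_{U} = 1 - (\alpha/2)^{1/n}
\end{align}
```

Under these assumptions, (c) is the exact binomial (Clopper-Pearson) upper limit of a two-sided 1 − α confidence interval when zero of n participants show the problem; for a 90 % interval, α = 0.10.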
Results
As expected, power increased with problem prevalence and with sample size. If problem prevalence was 0.05, a sample of 10 participants had a power of only 40 % to detect the problem, and a sample of 20 achieved a power of 64 %. To achieve a power of 80 %, 32 participants were necessary if the prevalence of the problem was 0.05, 16 participants if prevalence was 0.10, and 8 if prevalence was 0.20. If no problems were observed in a given sample, the upper limit of the two-sided 90 % confidence interval for problem prevalence was 0.26 for a sample size of 10, 0.14 for a sample size of 20, and 0.10 for a sample size of 30.
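These figures can be checked with a few lines of Python. The following is a minimal sketch, not the authors' code: it assumes that detection means at least one of n independent participants encounters the problem, and that the confidence limit is the exact binomial limit for zero observed events. The function names are hypothetical and chosen for illustration only.

```python
import math


def detection_power(prevalence, n):
    """Probability that at least one of n participants encounters the problem."""
    return 1.0 - (1.0 - prevalence) ** n


def required_sample_size(prevalence, power=0.80):
    """Smallest n whose detection power reaches the target (binomial assumption)."""
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - prevalence))


def upper_limit_zero_events(n, confidence=0.90):
    """Exact upper limit of a two-sided interval when 0 of n show the problem."""
    alpha = 1.0 - confidence
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


if __name__ == "__main__":
    for n in (10, 20):
        print(f"prevalence 0.05, n={n}: power = {detection_power(0.05, n):.2f}")
    for p in (0.05, 0.10, 0.20):
        print(f"prevalence {p:.2f}: n for 80% power = {required_sample_size(p)}")
    for n in (10, 20, 30):
        print(f"n={n}, no problems seen: upper limit = {upper_limit_zero_events(n):.2f}")
```

Under these assumptions the script gives powers of 0.40 and 0.64, sample sizes of 32, 16, and 8, and upper limits of 0.26, 0.14, and 0.10, which agree with the values reported above.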
Conclusions
Small samples (5–15 participants) that are common in pre-tests of questionnaires may fail to uncover even common problems. A default sample size of 30 participants is recommended.


Cite this article
Perneger, T.V., Courvoisier, D.S., Hudelson, P.M. et al. Sample size for pre-tests of questionnaires. Qual Life Res 24, 147–151 (2015). https://doi.org/10.1007/s11136-014-0752-2