Improving Data Analysis in Political Science

Edward R. Tufte

doi:10.2307/2009670

Published online by Cambridge University Press: 18 July 2011

Extract

Students of politics use statistical and quantitative techniques to:

summarize a large body of numbers into a small collection of typical values;

confirm (and perhaps sanctify) the results of the analysis by using tests of statistical significance that help protect against sampling and measurement error;

discover what's going on in their data and expose some new relationships; and

inform their audience what's going on in the data.

Type: Research Notes
Information: World Politics, Volume 21, Issue 4, July 1969, pp. 641-654

DOI: https://doi.org/10.2307/2009670
Copyright: © Trustees of Princeton University 1969

References

¹ For similar categories, see Tukey, John W. and Wilk, M. B., “Data Analysis: Techniques and Approaches,” Proceedings of the Symposium on Information Processing in Sight Sensory Systems (Pasadena, California Institute of Technology, November 1965). This paper is also reprinted in Tufte, Edward R., ed., The Quantitative Analysis of Social Problems (Reading, Mass. 1969).

² An important exception is Wallis, W. Allen and Roberts, Harry V., Statistics: A New Approach (Glencoe, Ill. 1956).

³ J. David Singer, ed. (New York 1968).

⁴ See Kaplan, Abraham, The Conduct of Inquiry (San Francisco 1964), chap. 1.

⁵ Kish, Leslie, “Some Statistical Problems in Research Design,” American Sociological Review, 24 (June 1959), 336. Another good discussion of significance tests is Kruskal, William H., “Tests of Significance,” International Encyclopedia of the Social Sciences (New York 1968), vol. 14, 238–50.

⁶ Edwards, Ward, Lindman, Harold, and Savage, Leonard J., “Bayesian Statistical Inference for Psychological Research,” Psychological Review, 70 (May 1963), 217.

⁷ Kish, 336.

⁸ Mosteller, Frederick and Hammel, E. A., book review, Journal of the American Statistical Association, 58 (September 1963), 836.

⁹ For a recent statement of Stevens, S. S., see his “Measurement, Statistics, and the Schemapiric View,” Science, 161 (August 30, 1968), 849–56.

¹⁰ For a discussion of the problem of “being arbitrary,” see Nunnally, J. C., Psychometric Theory (New York 1967), chap. 1.

¹¹ One common practice is to convert the numerical values of variables into ordered ranks before computing measures of association. Such a transformation, presumably made because it somehow seems statistically more conservative (it is not), may throw away useful information in the data and also sometimes discourage efforts at multivariate analysis. One other alternative is to employ some of the nonmetric multivariate methods.

¹² The discussion here is necessarily rather brief. For more information, see Abelson, Robert P. and Tukey, John W., “Efficient Conversion of Non-Metric Information into Metric Information,” Proceedings of the Social Statistics Section of the American Statistical Association (Washington 1959), 226–30 (also in Tufte); Abelson and Tukey, “Efficient Utilization of Nonnumerical Information in Quantitative Analysis: General Theory and the Case of Simple Order,” Annals of Mathematical Statistics, 34 (December 1963), 1347–69; and Shepard, Roger N., “Metric Structures in Ordinal Data,” Journal of Mathematical Psychology, 3 (1966), 287–315.

¹³ Tukey, John, “Causation, Regression, and Path Analysis,” in Kempthorne, Oscar and others, eds., Statistics and Mathematics in Biology (Ames, Iowa 1954), 38.

¹⁴ See Forbes, Hugh Donald and Tufte, Edward R., “A Note of Caution in Causal Modelling,” American Political Science Review, LXII (December 1968), 1258–64, and further discussion at 1269–71.

¹⁵ See Wallis and Roberts, 546–56, on the hazards of ratios. The problem was discussed by Pearson, Karl, “Mathematical Contributions to the Theory of Evolution—On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs,” Proceedings of the Royal Society of London, LX (1897), 489–98. See also Kuh, Edwin and Meyer, John R., “Correlation and Regression Estimates When the Data are Ratios,” Econometrica, 23 (October 1955), 400–16; and Briggs, F.E.A., “The Influence of Errors on the Correlation of Ratios,” Econometrica, 30 (January 1962), 162–77.

¹⁶ Tukey, 35–66. Blalock, Hubert M. Jr., makes a similar argument in his “Causal Inferences, Closed Populations, and Measures of Association,” American Political Science Review, LXI (March 1967), 130–36. Blalock's application of the argument to the Miller-Stokes data, however, is a most inappropriate example. For some useful applications and contrasts between standardized and unstandardized regression coefficients, see Alker, Hayward R. Jr. and Russett, Bruce, “Multifactor Explanations of Social Change,” in Russett and others, World Handbook of Political and Social Indicators (New Haven 1964), 311–21.

¹⁷ Hayward Alker's “The Long Road to International Relations Theory: Problems of Statistical Nonadditivity,” World Politics, xviii (July 1966), at 646–47, has a useful discussion of this point.

¹⁸ There are a number of other areas in which current packaged programs are deficient for the needs of social scientists. Two examples here serve to show that we must be careful even though the result came out of the computer. First, Longley, in a test of commonly used regression programs, found many inaccuracies in the output, including even the wrong sign attached to some coefficients! In this analysis of difficult but real test data (with highly collinear variables), several well-known programs proved accurate to only one or two digits in their estimates of regression coefficients. See Longley, James W., “An Appraisal of Least Squares Programs from the Point of View of the User,” Journal of the American Statistical Association, 62 (September 1967), 819–41. Second, many cross-tabulation programs have contributed to the frequent misuse of the chi-square test in the analysis of contingency tables. The test is not appropriate for ordered metrics. Of course, it is not entirely the fault of programs when their users dutifully report whatever the printout says.

¹⁹ Tukey and Wilk, 12.

²⁰ For converting nonlinear models into linear fit problems, see the useful book by Draper, N. R. and Smith, H., Applied Regression Analysis (New York 1966), chap. 5. The best place to learn about transformations is in the informative and straightforward essay by Kruskal, Joseph B., “Transformations of Data,” International Encyclopedia of the Social Sciences (New York 1968), vol. 16, 182–53.

²¹ See Alker and Russett, 311–13; also J. Johnston, Econometric Methods (New York 1963), 44–52.

²² See J. B. Kruskal and the references cited there. Another useful discussion is Hovland, Carl I., Lumsdaine, Arthur A., and Sheffield, Fred D., “A Baseline for Measurement of Percentage Change,” in Lazarsfeld, Paul and Rosenberg, Morris, eds., The Language of Social Research (Glencoe, Ill. 1955), 77–82.

²³ Johnston, 201–07; Blalock, Hubert M. Jr., “Correlated Independent Variables: The Problem of Multicollinearity,” Social Forces, 42 (December 1963), 233–37; and Farrar, Donald E. and Glauber, Robert R., “Multicollinearity in Regression Analysis: The Problem Revisited,” Review of Economics and Statistics, 49 (February 1967), 92–107.

²⁴ See Putnam, Robert D., “Toward Explaining Military Intervention in Latin American Politics,” World Politics, xx (October 1967), 94–95. The finding that economic development is positively correlated with military intervention after the effect of social mobilization is removed is unfortunately not testable because of the high instability of the partial correlation due to multicollinearity. Another example of the problem is discussed in Forbes and Tufte, 1262–64.

²⁵ Johnston, 207. See Farrar and Glauber for discussion of some modest palliatives.
