Analysis of complex survey data using SAS

doi:10.1016/S0169-2607(00)00088-2

Computer Methods and Programs in Biomedicine

Volume 64, Issue 1, January 2001, Pages 65-69

Abstract

Commonly used statistical methods and software packages typically assume that observations are independent and identically distributed and fail to account for complex sampling designs when present. I suggest an approach to analyzing complex survey data in SAS, using weighted generalized estimating equations. Limited Monte Carlo simulations support the method. An example demonstrates application of the method and compares results to those from software commonly used in the analysis of complex survey data.

Introduction

Data analyses that ignore elements of a complex sampling design, such as clustering and unequal sample weights, may lead to dubious point estimates and measures of dispersion [1], [2]. These misleading results arise because observations in complex survey data are dependent and unequally weighted. Commonly used statistical programs, such as SAS, assume that observations are independent and identically distributed (iid). Using a set of limited and extreme Monte Carlo simulations, I demonstrate that analysis using weighted generalized estimating equations (GEE) [3], available in SAS Version 6.12 or later, provides correct point estimates and variances. In addition, I demonstrate the method by example using the Duke Established Populations for Epidemiologic Study of the Elderly (EPESE) cohort.

Section snippets

Model

Consider the standard logistic regression model, logit Pr[Y=1]=β₀+β₁x, where the logit of the response probability, log{Pr[Y=1]/(1−Pr[Y=1])}, is a linear function, β₀+β₁x, of the exposure x, with intercept β₀ and slope β₁. This model is fit using the SAS code: PROC GENMOD; MODEL Y=X/D=B; This invokes the generalized linear model procedure, models Y as the outcome and X as the exposure, and specifies a binomial distribution. The default link is the canonical link, which is the logit for a binomial distribution, hence a
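The snippet above is cut off before it reaches the survey-specific part of the call. As a hedged sketch only, a weighted GEE fit in PROC GENMOD might look like the following. The data set name `survey`, the cluster identifier `psu`, and the sampling weight `wt` are hypothetical and do not come from the article. The independence working correlation is also an assumption.

```sas
/* Sketch only: survey, psu, and wt are hypothetical names, not from the article. */
PROC GENMOD DATA=survey DESCENDING;    /* DESCENDING models Pr[Y=1] when Y is coded 0/1 */
  CLASS psu;                           /* the SUBJECT= effect must be a CLASS variable */
  MODEL y = x / DIST=BIN LINK=LOGIT;   /* long form of D=B; the logit link is the default */
  WEIGHT wt;                           /* sampling weight (assumed variable) */
  REPEATED SUBJECT=psu / TYPE=IND;     /* GEE, independence working correlation (assumed) */
RUN;
```

With a REPEATED statement, GENMOD reports empirical (robust) standard errors for the GEE parameter estimates, and exponentiating the estimate for x gives the odds ratio.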

Monte Carlo simulation

I generated 1000 samples of 500 clustered pairs for each of eight scenarios (true odds ratio 1 or 2, sample weight effect present or absent, intra-cluster correlation 0 or 1), providing a limited and extreme set of hypothetical simulations. The odds ratio was set by randomly generating a binary exposure with probability 0.5, and then randomly generating the binary outcome, conditional on the exposure when the odds ratio was 2; throughout, the probability of the outcome was 25%. The sample weight
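To make the data-generating step concrete, here is a minimal sketch of one replicate for one scenario. It rests on several assumptions that the snippet does not confirm: the 25% outcome probability is read as the marginal risk, an intra-cluster correlation of 1 is produced by copying the first pair member, and the sample weight effect is left out because the text is cut off before describing it. Variable names and the seed are hypothetical, and `RAND`, `CALL STREAMINIT`, and `IFN` require a newer SAS release than Version 6.12.

```sas
/* Sketch only: one replicate of one scenario; names and seed are hypothetical. */
DATA sim;
  CALL STREAMINIT(2001);                /* arbitrary seed for reproducibility */
  true_or = 2;                          /* true odds ratio: 1 or 2 */
  icc     = 0;                          /* intra-cluster correlation: 0 or 1 */
  /* Risks by exposure such that 0.5*p0 + 0.5*p1 = 0.25 and
     p1/(1-p1) = true_or * p0/(1-p0). For true_or = 2 this reduces to
     p0**2 + 2.5*p0 - 0.5 = 0, so p0 is about 0.186 and p1 about 0.314. */
  IF true_or = 1 THEN DO; p0 = 0.25; p1 = 0.25; END;
  ELSE DO; p0 = (-2.5 + SQRT(8.25)) / 2; p1 = 0.5 - p0; END;
  DO pair = 1 TO 500;
    DO member = 1 TO 2;
      /* icc = 1: the second member keeps the first member's x and y.
         icc = 0: the second member is drawn independently. */
      IF member = 1 OR icc = 0 THEN DO;
        x = RAND('BERNOULLI', 0.5);
        y = RAND('BERNOULLI', IFN(x = 1, p1, p0));
      END;
      OUTPUT;
    END;
  END;
  KEEP pair member x y;
RUN;
```

Repeating this step 1000 times per scenario and fitting the model to each replicate would give the sampling distribution of the estimated odds ratio and its standard error.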

Example

The Duke (N=4162) EPESE study was one of four NIA-funded studies that randomly sampled community-dwelling men and women 65 years of age or older in 1982 to identify predictors of mortality, hospitalization, and placement in long-term care facilities [4]. The sampling universe was all noninstitutionalized persons 65 years of age and older (i.e. elders) living in five counties of the north-central Piedmont area of North Carolina. The first stage of the four-stage sampling design was area sampling

Comment

In summary, I demonstrated the use of SAS for analysis of complex survey data. There are three important advantages to using SAS for analysis of complex surveys. First, there is no need to purchase and learn new software, such as SUDAAN or WesVar. Second, the procedure is relatively easy to implement. Third, in addition to accurate estimation of the odds ratio from a logistic regression model as demonstrated, PROC GENMOD fits many variations of generalized linear models. Admittedly, there are

References (5)

W.G. Cochran, Sampling Techniques (1977)
P.S. Levy et al., Sampling of Populations: Methods and Applications (1991)
