Sample Code to Pool Multiple Cycles of CHIS Data

print share

Sample Code to Pool Multiple Cycles

The following SAS/SUDAAN and Stata codes show how to combine CHIS data files and to create weights accounting for the multi-year files.

SAS/SUDAAN:

In addition to the sample code below, we also provide a SAS macro for users interested in analyzing CHIS data in SAS. This SAS macro will do all the work of weight adjustments automatically, and also generate other necessary information needed in later analysis. You can download this macro here: CHIS Pooling Macro


SAS:
/*THIS STEP CALCULATES ESTIMATES FOR EACH YEAR SEPARATELY*/
ODS OUTPUT ONEWAY = EST_2019;
PROC SURVEYFREQ DATA = CHIS2019;
	WEIGHT RAKEDW0;
	REPWEIGHT RAKEDW1-RAKEDW80/JKCOEFS=1;a
	TABLE AH1;
RUN;

ODS OUTPUT ONEWAY = EST_2018;
PROC SURVEYFREQ DATA = CHIS2018;
	WEIGHT RAKEDW0;
	REPWEIGHT RAKEDW1-RAKEDW80/JKCOEFS=1;a
	TABLE AH1/CL;
RUN;

/*ADD A POSTFIX INDICATING THE YEAR OF EACH ESTIMATES*/
DATA EST_2019;
	SET EST_2019;
	RENAME Frequency=Frequency_2019 WgtFreq=WgtFreq_2019 Percent=Percent_2019 
StdErr=StdErr_2019;
RUN;

DATA EST_2018;
	SET EST_2018;
	RENAME Frequency=Frequency_2018 WgtFreq=WgtFreq_2018 Percent=Percent_2018
StdErr=StdErr_2018;
RUN;

/*MERGE TWO DATASETS AND CALCULATE THE FINAL POOLED ESTIMATES*/
PROC SORT DATA = EST_2019;
	BY AH1;
RUN;

PROC SORT DATA = EST_2018;
	BY AH1;
RUN;

DATA EST_COMBINED;
	MERGE EST_2018 EST_2019;
	BY AH1;
	/*RAW FREQUENCY*/
	Frequency_Pooled = SUM(of Frequency_:)/2;
	/*WEIGHTED FREQUENCY*/
	WgtFreq_Pooled = SUM(of WgtFreq_:)/2;
	/*PERCENT*/
	Percent_Pooled = SUM(of Percent:)/2;
	/*STANDARD ERROR*/
	StdErr_Pooled = SQRT(SUM(StdErr_2019**2 + StdErr_2018**2))/2;
	/*CV*/
	CV = StdErr_Pooled/Percent_Pooled;
	/*UPPER 95% CONFIDENCE INTERVAL*/
	UpperCL_Pooled = MIN(1, Percent_Pooled + 1.96*StdErr_Pooled);
	/*LOWER 95% CONFIDENCE INTERVAL*/
	LowerCL_Pooled = MAX(0, Percent_Pooled - 1.96*StdErr_Pooled);
RUN;

a Jackknife coefficients are necessary for accurate variance calculaDons, and jackknife coefficients of 1 in SAS will produce equal variance calculaDons as those produced in SUDAAN. However, for SAS V.9.2(TS1M0) and earlier, a value of 1 will not be accepted; as a subsDtute, 0.9999 can be entered. Without this specificaDon, the default value of the jackknife coefficients will be [(# replicate weights - 1)/# replicate weights]; for CHIS, this would be [(80 - 1)/80] = 0.9875.

Stata:


/*OUTPUT ESTIMATES FROM EACH YEAR SEPARATELY*/
use "chis_2019.dta", replace
svyset [pw=RAKEDW0], jkrw(RAKEDW1-RAKEDW80, multiplier(1) reset) vce(jack ) mse
svy: tab INS, percent se ci
/*Weighted Percentage*/
scalar pct1_2019 = e(b)[1,1]
scalar pct2_2019 = e(b)[1,2]

/*Standard Error*/
scalar se1_2019 = sqrt(e(V)[1,1]) * 100
scalar se2_2019 = sqrt(e(V)[2,2]) * 100

/*Weighted Frequency*/
scalar wgtfreq1_2019 = pct1_2019 * e(N_pop)
scalar wgtfreq2_2019 = pct2_2019 * e(N_pop)

/*Raw Frequency*/
scalar freq1_2019 = e(Obs)[1,1]
scalar freq2_2019 = e(Obs)[2,1]
use "chis_2018.dta", replace
svyset [pw=RAKEDW0], jkrw(RAKEDW1-RAKEDW80, multiplier(1)) vce(jack) mse
svy: tab INS, percent se ci
scalar pct1_2018 = e(b)[1,1]
scalar pct2_2018 = e(b)[1,2]
scalar se1_2018 = sqrt(e(V)[1,1]) * 100
scalar se2_2018 = sqrt(e(V)[2,2]) * 100
scalar wgtfreq1_2018 = pct1_2018 * e(N_pop)
scalar wgtfreq2_2018 = pct2_2018 * e(N_pop)
scalar freq1_2018 = e(Obs)[1,1]
scalar freq2_2018 = e(Obs)[2,1]

/*Pooled Percentage*/
scalar pct1_pool = (pct1_2019 + pct1_2018)/2
scalar pct2_pool = (pct2_2019 + pct2_2018)/2
/*Pooled Standard Error*/
scalar se1_pool = sqrt(se1_2019^2 + se2_2018^2)/2
scalar se2_pool = sqrt(se2_2019^2 + se2_2018^2)/2

/*Pooled Weighted Frequency*/
scalar wgtfreq1 = (wgtfreq1_2019 + wgtfreq1_2018)/2
scalar wgtfreq2 = (wgtfreq2_2019 + wgtfreq2_2018)/2

/*Pooled Raw Frequency*/
scalar freq1_pool = freq1_2019 + freq1_2018
scalar freq2_pool = freq2_2019 + freq2_2018

/*Pooled CV*/
scalar cv1_pool = se1_pool/pct1_pool
scalar cv2_pool = se2_pool/pct2_pool

/*Pooled 95% Upper and Lower Percentage*/
scalar ul1_pool = min(1,pct1_pool + 1.96 * se1_pool)
scalar ul2_pool = min(1,pct2_pool + 1.96 * se2_pool)
scalar ll1_pool = max(pct1_pool - 1.96 * se1_pool, 0)
scalar ll2_pool = max(pct2_pool - 1.96 * se2_pool, 0)

The above SAS and Stata code illustrate how to extract stored esDmates from SAS procedures and Stata commands and perform data operaDons based on the saved data.

For SAS, ODS OUTPUT statement saves the outputs from SAS procedures to a working dataset. ODS OUTPUT ONEWAY is used for one-way table, and ODS OUTPUT CROSSTABS is used for two-way crosstabs.

In this example, results are saved in the EST_COMBINED data, users may choose to use PROC PRINT to display the results in ODS window, or use PROC EXPORT to output them to an Excel file.

For Stata, returned Stata results are saved to Stata scalars and can be accessed later. Aeer generaDng results from both years, users may choose to either use DISPLAY command to outputs these scalars in results window or use PUTEXCEL to save the results to an Excel file.

General Formula for Pooling

The above code example shows how to pool esDmates from 2 CHIS year.The general pooling should follow formula: