RESEARCH REPORT
1 Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 307 East 63rd Street, 3rd floor, New York, NY 10021;
2 Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, NY 10032; and
3 School of Dental and Oral Surgery, Columbia University, 630 West 168th Street, New York, NY 10032;
*corresponding author, panageak{at}mskcc.org
ABSTRACT
KEY WORDS: intra-cluster correlation, Mantel-Haenszel methods, site-specific periodontal data
INTRODUCTION
In this paper, we focus on analyzing data that can be summarized in a series of 2x2 tables. In such an analysis, we are concerned with detecting an association between a binary response and a binary exposure, stratified across levels of a third, categorical covariate. For correlated data (such as site-specific periodontal data), several authors have proposed modified statistical methods. Donald and Donner (1987), Donner and Banting (1988), Rao and Scott (1992), Zhang and Boos (1997), and Liang (1985) developed simple corrections to the standard Mantel-Haenszel methods for multiple 2x2 tables that account for within-subject correlation. These approaches are limited, however, because they assume that the exposure and covariate information does not vary from site to site within a subject. In periodontal studies, covariate information commonly changes from site to site (e.g., bleeding on probing, attachment loss, and probing depth). By contrast, regression-type approaches such as generalized estimating equations (GEE; Liang and Zeger, 1986) or random-effects models (Breslow and Clayton, 1993; Wolfinger and O'Connell, 1993; Lee and Nelder, 1996) can handle site-specific covariates. While more flexible, these methods are harder to explain to a general audience and may be difficult to apply in practice because of computational problems.
The purpose of this paper is to review two recently published methods that correct the well-known Mantel-Haenszel test statistic, and the variance of the Mantel-Haenszel common odds ratio estimator, for correlation within patients. Begg (1999) and Begg and Panageas (1999) proposed closed-form adjustments to these familiar statistics that correct for intra-subject correlation, whether the covariates are subject-specific or site-specific. In other words, these methods can be applied when dependencies exist across tables as well as within tables. In the following sections, we introduce a motivating example, briefly review the standard Mantel-Haenszel methods for uncorrelated data, and describe the corrected techniques for site-specific data. Throughout, we illustrate these methods using data from a clinical trial of the effect of scaling and root planing on immunologic response in patients with periodontal disease.
Motivating Example
A clinical trial was conducted to study the effects of scaling and root planing (SRP) on probing depth and the levels of interleukin-1β (IL-1β) in gingival crevicular fluid (GCF) (Engebretson et al., 2002). GCF samples and clinical measurements were recorded on up to 8 sites per patient in each of 29 patients at baseline (pre-treatment) and 24 weeks later (post-treatment).
To demonstrate the statistical methods described in this paper, we used these data to study a possible association between IL-1β and probing depth (PD), both measured at 24 weeks. Although PD and IL-1β level were originally recorded as continuous measures, both are commonly dichotomized for easier interpretation. Periodontal pockets were dichotomized as shallow (PD < 5 mm) vs. deep (PD ≥ 5 mm), and IL-1β levels as low (< 56 pg/µL) vs. high (≥ 56 pg/µL). Table 1 is a 2x2 table cross-classifying these 24-week variables.
Mantel-Haenszel Methods for Analyzing Multiple 2x2 Tables
To describe the strength of the association between probing depth and IL-1β level, it is common to report the odds ratio (OR). The OR is defined as the ratio of the odds of disease among the exposed to the odds of disease among the unexposed (here, "exposure" is a deep pocket and "disease" is a high IL-1β level), and can be estimated by ad/bc from the four cell counts a, b, c, and d of the 2x2 table. An odds ratio of 1 indicates equal odds of disease regardless of exposure status (i.e., no association between exposure and disease). Odds ratios greater than 1 indicate higher odds of disease among exposed sites than among unexposed sites, while odds ratios less than 1 indicate lower odds. For example, the estimated odds ratio for Table 1 is 1.80: the odds of high IL-1β at week 24 are about 1.8 times higher for a site that is deep at week 24 than for a shallow site.
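For readers who want to check such calculations, the crude odds ratio takes one line of code. The following is a minimal Python sketch; the function name `odds_ratio` and the counts are hypothetical (they are not the Table 1 data), and the cell layout assumes the usual convention with a and d on one diagonal of the table and b and c on the other.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Crude odds ratio ad/bc for a single 2x2 table.

    Assumed (hypothetical) layout: a = exposed with disease, b = exposed without
    disease, c = unexposed with disease, d = unexposed without disease.
    """
    if b == 0 or c == 0:
        raise ValueError("the odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts (not the Table 1 data):
# exposure = deep pocket at week 24, "disease" = high IL-1beta at week 24.
print(odds_ratio(a=12, b=8, c=30, d=45))  # 2.25
```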
However, this comparison might be distorted by baseline probing depth. In epidemiologic terms, baseline probing depth may be a confounding variable, since it is related to both the 24-week probing depth and the 24-week IL-1β level. This means that baseline probing depth may conceal or exaggerate the true association between probing depth and IL-1β at 24 weeks. Confounding is often a problem in observational (non-randomized) studies. One strategy for addressing confounding is to re-analyze the data stratified by levels of the suspected confounder. In the example above, we might stratify the exposure/response information by levels of the third factor (baseline probing depth) to reduce potential bias due to confounding. This yields a series of 2x2 contingency tables, one for each level of the confounder; each table is referred to as a stratum.
Applying this strategy to our example, we obtain two contingency tables (Table 2). The cell frequencies, row and column totals, and table total are now subscripted with the numbers 1 and 2 to refer to the first stratum (for which all pockets were shallow at baseline) and the second stratum (for which all pockets were deep at baseline), respectively. Note that a patient may contribute site-specific information to more than one of the tables.
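The stratification step can be sketched in Python, starting from one record per site. This is a minimal sketch: the records, the function name `build_strata`, and the use of the same 5 mm cutoff to define shallow vs. deep pockets at baseline are assumptions for illustration (the text states only that the strata are shallow and deep at baseline).

```python
from collections import defaultdict

# Hypothetical site-level records (not the trial data):
# (patient_id, baseline PD in mm, week-24 PD in mm, week-24 IL-1beta in pg/uL)
sites = [
    (1, 4.0, 3.0, 40.0),
    (1, 6.0, 5.0, 80.0),
    (2, 3.0, 3.0, 60.0),
    (2, 7.0, 4.0, 30.0),
    (3, 5.0, 6.0, 20.0),
    (3, 4.5, 5.5, 70.0),
]


def build_strata(records, pd_cut=5.0, il1_cut=56.0):
    """Cross-classify sites into one 2x2 table [a, b, c, d] per baseline stratum.

    Stratum 1 = shallow at baseline (PD0 < pd_cut); stratum 2 = deep (PD0 >= pd_cut).
    Exposure = deep at week 24 (PD >= pd_cut); response = high IL-1beta (>= il1_cut).
    Cells: a = deep/high, b = deep/low, c = shallow/high, d = shallow/low.
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for _patient_id, pd0, pd24, il1 in records:
        stratum = 2 if pd0 >= pd_cut else 1
        row = 0 if pd24 >= pd_cut else 2  # offset of the deep or shallow row
        col = 0 if il1 >= il1_cut else 1  # high or low IL-1beta column
        tables[stratum][row + col] += 1
    return dict(tables)


print(build_strata(sites))  # {1: [1, 0, 1, 1], 2: [1, 1, 0, 1]}
```

In these made-up data, patient 1 contributes one site to each stratum, which is exactly the situation described above. The patient identifier is carried along but ignored by the standard Mantel-Haenszel analysis.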
Typically, we do not wish to report separate analyses by stratum. Rather, we want to combine the information across strata while still adjusting for the confounding effect of the third covariate, so that we can report a concise summary of the relationship between exposure and response. Ordinarily, a standard analysis would include three procedures based on the methods of Mantel and Haenszel (1959): a test of association (the Mantel-Haenszel chi-squared statistic), an estimate of the common odds ratio, and a confidence interval for the common odds ratio.
Formulas for all three procedures are given below.
The Mantel-Haenszel test statistic is defined as:
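$$\chi^2_{MH} = \frac{\left[\sum_{s=1}^{k}\left(a_s - \dfrac{(a_s+b_s)(a_s+c_s)}{t_s}\right)\right]^2}{\sum_{s=1}^{k}\dfrac{(a_s+b_s)(c_s+d_s)(a_s+c_s)(b_s+d_s)}{t_s^2\,(t_s-1)}}$$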
where s = 1, 2, ..., k denotes the stratum (as defined by level of the third variable) and $t_s = a_s + b_s + c_s + d_s$ is the total for stratum s. For the above illustration, there are two levels of the stratifying covariate (shallow and deep baseline probing depth, PD0); thus k = 2, s = 1 refers to shallow PD0, and s = 2 refers to deep PD0. We can apply this formula to compute the Mantel-Haenszel chi-squared statistic and obtain a p-value by comparing its value with a chi-squared distribution with 1 degree of freedom. For the data from Tables 2A and 2B, we calculate a Mantel-Haenszel chi-squared value of 4.30 and a p-value of 0.038. Based on this analysis alone, we would conclude that there is a statistically significant association between IL-1β and probing depth after treatment, controlling for baseline probing depth. However, this statistic requires that each site come from a different subject; hence, it is not valid for the data analyzed here, in which each patient contributes up to 8 sites.
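The computation can be sketched in a few lines of Python using only the standard library. This is a minimal sketch assuming the form of the statistic without a continuity correction; the function names are hypothetical and the two strata are made up, not the Table 2 data. The p-value uses the identity that the upper tail of a chi-squared variate with 1 degree of freedom equals erfc(√(x/2)).

```python
import math


def mh_chi2(tables):
    """Mantel-Haenszel chi-squared statistic (no continuity correction).

    tables: iterable of (a, b, c, d) cell counts, one tuple per stratum.
    Assumes independent observations, i.e., NOT valid for clustered site data.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        if t < 2:
            continue  # a stratum with fewer than 2 observations carries no information
        n1, n0 = a + b, c + d  # row totals
        m1, m0 = a + c, b + d  # column totals
        num += a - n1 * m1 / t  # observed minus expected count in cell a
        den += n1 * n0 * m1 * m0 / (t * t * (t - 1))  # hypergeometric variance of a
    return num * num / den


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-squared variate with 1 df: P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


# Two hypothetical strata (not the Table 2 data).
example_tables = [(12, 8, 30, 45), (20, 10, 15, 20)]
stat = mh_chi2(example_tables)
print(f"chi2_MH = {stat:.2f}, p = {chi2_1df_pvalue(stat):.3f}")
```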
The Mantel-Haenszel common odds ratio estimator is:
$$\widehat{OR}_{MH} = \frac{\sum_{s=1}^{k} a_s d_s / t_s}{\sum_{s=1}^{k} b_s c_s / t_s}$$
This estimator can be viewed as a weighted average of the stratum-specific odds ratios, with weights $b_s c_s / t_s$. In our example, the stratum-specific odds ratios (Table 2) are 2.06 for sites that were shallow at baseline and 2.06 for sites that were deep at baseline. The Mantel-Haenszel common odds ratio therefore also equals 2.06, indicating a roughly two-fold increase in the odds of high IL-1β given a deep pocket at 24 weeks, controlling for baseline probing depth.
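The weighted-average interpretation follows because each stratum's weight $b_s c_s / t_s$ times its odds ratio $a_s d_s / (b_s c_s)$ equals $a_s d_s / t_s$. A minimal Python sketch, with hypothetical function names and made-up strata, checks this numerically and shows that two strata with equal odds ratios give that same common odds ratio, as in the example above:

```python
def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio: sum(a*d/t) / sum(b*c/t)."""
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    return num / den


def mh_as_weighted_average(tables):
    """Same estimate written as a weighted average of stratum ORs, weights b*c/t.

    Requires b > 0 and c > 0 in every stratum so each stratum OR is finite.
    """
    weights = [b * c / (a + b + c + d) for a, b, c, d in tables]
    ors = [a * d / (b * c) for a, b, c, d in tables]
    return sum(w * o for w, o in zip(weights, ors)) / sum(weights)


# Hypothetical strata whose individual odds ratios are both 2.
equal = [(10, 5, 10, 10), (4, 4, 2, 4)]
print(mh_odds_ratio(equal))  # approximately 2.0
```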
To generate a 95% confidence interval for the common OR, we rely on a large-sample normal approximation. Because the natural logarithm of the OR estimator follows a normal distribution more closely than does $\widehat{OR}_{MH}$ itself, we first compute a 95% confidence interval for $\ln \widehat{OR}_{MH}$:
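$$\ln \widehat{OR}_{MH} \pm 1.96\,\frac{\sqrt{\widehat{\mathrm{Var}}_H(\widehat{OR}_{MH})}}{\widehat{OR}_{MH}}$$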
A confidence interval for the OR itself then follows by exponentiating the endpoints of the interval above.
To apply this formula, we can use the estimated variance given by Hauck (1979):
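$$\widehat{\mathrm{Var}}_H(\widehat{OR}_{MH}) = \widehat{OR}_{MH}^{\,2}\,\frac{\sum_{s=1}^{k} w_s^2 v_s}{\left(\sum_{s=1}^{k} w_s\right)^2}$$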
where $w_s = b_s c_s / t_s$ and $v_s = a_s^{-1} + b_s^{-1} + c_s^{-1} + d_s^{-1}$. For our data, $\widehat{OR}_{MH}$ equals 2.06, with a 95% confidence interval extending from 1.04 to 4.08.
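Putting the estimator, Hauck's variance, and the log-scale interval together gives the following minimal Python sketch. The function name and strata are hypothetical, and the conversion from the variance of the odds ratio to the standard error of its logarithm (dividing the square root by the odds ratio) is the usual delta-method step assumed here. Like the uncorrected formulas, it treats all sites as independent.

```python
import math


def mh_or_hauck_ci(tables, z=1.96):
    """Return (OR_MH, Hauck variance of OR_MH, lower, upper) for a 95% CI.

    Uses w_s = b*c/t and v_s = 1/a + 1/b + 1/c + 1/d, so every cell must be > 0.
    The interval is built on the log scale with SE(ln OR) ~ sqrt(Var_H) / OR_MH.
    Valid only for independent observations (no clustering correction).
    """
    num = 0.0
    w = []
    v = []
    for a, b, c, d in tables:
        t = a + b + c + d
        num += a * d / t
        w.append(b * c / t)
        v.append(1 / a + 1 / b + 1 / c + 1 / d)
    sum_w = sum(w)
    or_mh = num / sum_w
    var_h = or_mh ** 2 * sum(ws * ws * vs for ws, vs in zip(w, v)) / sum_w ** 2
    se_log = math.sqrt(var_h) / or_mh
    centre = math.log(or_mh)
    return or_mh, var_h, math.exp(centre - z * se_log), math.exp(centre + z * se_log)


# Hypothetical strata (not the Table 2 data).
or_mh, var_h, lower, upper = mh_or_hauck_ci([(12, 8, 30, 45), (20, 10, 15, 20)])
print(f"OR_MH = {or_mh:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

A convenient check: with a single stratum, the Hauck variance reduces to OR²(1/a + 1/b + 1/c + 1/d).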
Corrected Mantel-Haenszel Methods for Site-specific Data
The standard Mantel-Haenszel methods are appropriate when each observation comes from a different subject. With site-specific data, however, we have multiple observations per subject, which tend to be correlated, so conclusions and inferences from the standard analysis may be misleading. The problem tends to worsen as the correlation between sites increases or as the number of sites per patient increases. The Mantel-Haenszel common OR estimator itself remains valid regardless of within-cluster correlation; the test statistic and the variance used for the confidence interval, however, may be distorted. In practical terms, the type I error rate (i.e., the rate at which we reject the null hypothesis when it is true) can be higher or lower than the nominal level. Likewise, the coverage probability of the 95% confidence interval (i.e., the rate at which the interval covers the "true" OR) may be higher or lower than 95%.
To address this problem, Begg (1999) and Begg and Panageas (1999) derived correction factors, f1 and f2, that can be applied to the Mantel-Haenszel test statistic and to Hauck's variance estimate, respectively. (Please refer to the Appendix for details and formulas [www.dentalresearch.org].) These correction factors are derived from the GEE technique (Liang and Zeger, 1986). The corrected Mantel-Haenszel statistic (denoted by an additional subscript C) is defined as:
$$\chi^2_{MHC} = \frac{\chi^2_{MH}}{f_1}$$
using correction factor f1 as defined in Begg (1999). The distribution of this statistic is approximately chi-squared with 1 degree of freedom (Rotnitzky and Jewell, 1990). The corrected version of Hauck's variance estimate (also denoted by subscript C) is obtained by applying a similar correction factor, f2 (defined in Begg and Panageas, 1999):
$$\widehat{\mathrm{Var}}_{HC}(\widehat{OR}_{MH}) = f_2\,\widehat{\mathrm{Var}}_H(\widehat{OR}_{MH})$$
A valid 95% confidence interval for the true common odds ratio can then be obtained by computing:
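$$\ln \widehat{OR}_{MH} \pm 1.96\,\frac{\sqrt{\widehat{\mathrm{Var}}_{HC}(\widehat{OR}_{MH})}}{\widehat{OR}_{MH}}$$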
and then exponentiating the endpoints.
Although both correction factors have closed forms, computing f1 and f2 by hand is not practical because of the complexity of the formulas involved; the calculation is best done by computer. Software for computing f1 (published by Begg and Paykin, 2001) is available from the following Web site: http://www.columbia.edu/~mdb3/.
Corrected Methods Applied to the Example
We re-analyzed the data from our example using the corrected Mantel-Haenszel methods. Recall that the uncorrected Mantel-Haenszel test statistic equaled 4.30, with a p-value of 0.038. The correction factor, f1, is calculated to be 2.04, so the corrected Mantel-Haenszel test statistic is 4.30/2.04 = 2.11, with a p-value of 0.144. Once we appropriately correct for the correlation within subjects, the result changes from statistically significant to non-significant, as indicated by the change in p-value from 0.038 to 0.144.
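The arithmetic of the corrected test can be reproduced from the reported values alone. This minimal Python sketch takes the rounded numbers 4.30 and 2.04 from the text as given; it does not compute f1, which requires the site-level data and the formulas in the Appendix (or the Begg and Paykin software). Because the inputs are rounded, the recomputed corrected p-value comes out at about 0.15 rather than exactly the reported 0.144; the small difference presumably reflects that rounding.

```python
import math


def chi2_1df_pvalue(x):
    """Upper-tail probability of chi-squared with 1 df: P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


chi2_mh = 4.30  # uncorrected Mantel-Haenszel statistic reported in the text
f1 = 2.04       # correction factor f1 reported in the text (computed by software)

chi2_mhc = chi2_mh / f1  # corrected statistic, still referred to chi-squared on 1 df
print(f"uncorrected: chi2 = {chi2_mh:.2f}, p = {chi2_1df_pvalue(chi2_mh):.3f}")
print(f"corrected:   chi2 = {chi2_mhc:.2f}, p = {chi2_1df_pvalue(chi2_mhc):.3f}")
```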
Our previous analysis showed that $\widehat{OR}_{MH}$ equals 2.06, with uncorrected 95% confidence interval (1.04, 4.08). Correction factor f2 is approximately 2.03, leading to a corrected confidence interval of (0.78, 5.47), which now includes 1. The corrected analysis shows that the standard techniques would have led us to overstate statistical significance, to report a confidence interval that is too narrow, and to draw incorrect conclusions regarding the true relationship between IL-1β and PD.
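Because the corrected variance is f2 times Hauck's variance, the standard error of $\ln \widehat{OR}_{MH}$ grows by a factor of √f2. The following minimal Python sketch uses only the rounded values reported in the text: it backs out the uncorrected standard error from the width of the reported interval (an assumption that the interval was computed on the log scale as described above) and rescales it. It gives roughly (0.78, 5.45), matching the reported (0.78, 5.47) up to rounding of the inputs.

```python
import math

Z = 1.96
or_mh = 2.06              # Mantel-Haenszel common OR reported in the text
lo_u, hi_u = 1.04, 4.08   # uncorrected 95% CI reported in the text
f2 = 2.03                 # variance correction factor f2 reported in the text

# Back out the uncorrected standard error of ln(OR_MH) from the reported interval.
se_log = (math.log(hi_u) - math.log(lo_u)) / (2 * Z)
# Var_HC = f2 * Var_H, so the standard error on the log scale grows by sqrt(f2).
se_log_c = se_log * math.sqrt(f2)
lo_c = math.exp(math.log(or_mh) - Z * se_log_c)
hi_c = math.exp(math.log(or_mh) + Z * se_log_c)
print(f"corrected 95% CI: ({lo_c:.2f}, {hi_c:.2f})")
```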
DISCUSSION
In a 1985 paper, Laster showed that, in general, correcting for correlation when analyzing a subject-specific variable causes p-values to become larger (i.e., less significant), while correcting when analyzing a site-specific variable causes p-values to become smaller (i.e., more significant). Correspondingly, the correction factors presented in this paper, f1 and f2, can be greater or less than one: a factor greater than one lowers the corrected test statistic and widens the corrected confidence interval, whereas a factor less than one does the opposite.
The Mantel-Haenszel methods, as originally proposed, are appropriate for the analysis of multiple 2x2 tables when responses are independent. Corrected methods must be used for analysis of 2x2 tables under cluster sampling. As stated earlier, several authors have proposed corrected Mantel-Haenszel statistics for clustered data. Most of these methods are appropriate only when the primary and secondary covariates are subject-specific. The methods reviewed in this paper have the advantage of being applicable when the covariates are either subject-specific or site-specific. In practice, this implies that our methods are valid whether the data from a single patient appear in one table or across many tables.
It is worth noting that, in our analysis, we used Hauck's formula for the variance of the Mantel-Haenszel odds ratio. This approach assumes a large number of observations per table. An alternative formula, proposed by Robins et al. (1986), does not require this assumption but is more complicated in form. Our correction factor, f2, may be applied to either variance formula.
FOOTNOTES
Received October 28, 2002; Last revision April 1, 2003; Accepted April 4, 2003
REFERENCES
Begg MD, Panageas KS (1999). Interval estimation of the common odds ratio from k(2x2) tables under cluster sampling. Stat Med 18:1087–1100.
Begg MD, Paykin AB (2001). Performance of and software for a modified Mantel-Haenszel statistic for correlated data. J Statist Comput Simul 70:175–195.
Breslow NE, Clayton DG (1993). Approximate inference in generalized linear mixed models. J Am Stat Assoc 88:9–25.
DeRouen TA, Mancl L, Hujoel P (1991). Measurements of associations in periodontal diseases using statistical methods for dependent data. J Periodontal Res 26:218–229.
Donald A, Donner A (1987). Adjustments to the Mantel-Haenszel chi-square statistic and odds ratio variance estimator when the data are clustered. Stat Med 6:491–499.
Donner A, Banting D (1988). Analysis of site-specific data in dental studies. J Dent Res 67:1392–1395.
Engebretson SP, Grbic JT, Singer R, Lamster IB (2002). GCF IL-1β profiles in periodontal disease. J Clin Periodontol 29:48–53.
Fleiss JL, Park MA, Chilton NW (1987). Within-mouth correlations and reliabilities for probing depth and attachment level. J Periodontol 58:460–463.
Hauck WW (1979). The large-sample variance of the Mantel-Haenszel estimator of a common odds ratio. Biometrics 41:55–68.
Imrey PB (1986). Considerations in the statistical analyses of clinical trials in periodontitis. J Clin Periodontol 13:517–532.
Laster LL (1985). The effect of subsampling sites within patients. J Periodontal Res 20:91–96.
Lee Y, Nelder JA (1996). Hierarchical generalized linear models (with discussion). J R Stat Soc B 58:619–678.
Liang KY (1985). Odds ratio inference with dependent data. Biometrika 72:678–682.
Liang KY, Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika 73:13–22.
Mantel N, Haenszel W (1959). Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719–748.
Rao JN, Scott AJ (1992). A simple method for the analysis of clustered binary data. Biometrics 48:577–585.
Robins J, Breslow N, Greenland S (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics 42:311–323.
Rotnitzky A, Jewell NP (1990). Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika 77:485–497.
Wolfinger R, O'Connell M (1993). Generalized linear mixed models: a pseudolikelihood approach. J Statist Comput Simul 48:233–243.
Zhang J, Boos DD (1997). Mantel-Haenszel test statistics for correlated binary data. Biometrics 53:1185–1198.