PROCEEDINGS |
1 Department of Biostatistics & Epidemiology, Cleveland Clinic Foundation/Wb4, 9500 Euclid Avenue, Cleveland, OH 44195, USA; and
2 Biostatistics Core, Division of Population and Health Promotion Sciences, National Institute of Dental and Craniofacial Research, National Institutes of Health, 45 Center Drive, Room 4As-25U, Bethesda, MD 20892-6401;
* corresponding author, pimrey{at}bio.ri.ccf.org
| ABSTRACT |
|---|
KEY WORDS: dental caries; clinical trials; data analysis; ordinal categorical data; diagnostic modalities
| INTRODUCTION |
|---|
Indeed, a consensus is emerging that the future of dental practice in developed countries lies in the early recognition of incipient caries and in therapeutic intervention to prevent initial cavitation. A recent US NIH Consensus Statement (Bowersox, 2001) noted that "Digitally acquired and postprocessed images have great potential in the detection of noncavitated caries and in the diagnosis of secondary caries. Promising new diagnostic techniques are emerging, including fiber-optic transillumination and light and laser fluorescence.... At this time the panel senses a paradigm shift in the management of dental caries toward improved diagnosis of early noncavitated lesions and treatment for prevention and arrest of such lesions." (For relevant reviews, see, e.g., Featherstone and Fried, 2001; Stookey and Gonzalez-Cabezas, 2001.) However, no index based entirely on binary classifications of surfaces as non-cavitated or frankly cavitated can capture the intermediate information on progression and regression of lesions that is becoming available from new diagnostic technologies and that should offer, at least in principle, the most sensitive indication of therapeutic benefit from agents meant to halt and/or reverse early lesions. For this purpose, we require measurement scales that distinguish among different levels of progression toward cavitation or remineralization of an individual surface, and analytic techniques that can efficiently use the information that is accumulated when such ordinal, interval, or ratio-scaled data are observed longitudinally on multiple surfaces.
Techniques for analyses of continuous (interval or ratio scale) data dominate a century of statistical literature and most statistics texts. There is also a very large body of literature on methods for ordinal data, but the core of this literature is as yet somewhat less well-known and perhaps more subject to misconception. This paper will briefly consider several topics in the analysis of correlated non-binary observations. We will emphasize but not entirely confine ourselves to consideration of ordinal categorizations, such as may arise from a combination of visual and tactile observations or from grouped continuous observations. For simplicity, it will be assumed that a single primary outcome variable has been specified initially, that the central focus is on progression/regression of a gradual process rather than avoidance of a "concluding" event such as cavitation, and that interest is in a single terminal measurement of the primary outcome, or in change between baseline and terminal measurements. Generalizations to repeated follow-up observations are not difficult. We discuss some issues relevant to analyses of caries data in such situations, emphasizing "semiparametric" methods that are relatively light on statistical assumptions, and illustrating using data also examined by Katz and Huntington (2004).
| SCORES AND MODELS FOR ORDINAL CATEGORIZATIONS |
|---|
But the grounds for concern are easily misconstrued, and the criticism is often inappropriate. Choice of scores affects the relative sensitivity (statistical power) of resulting tests of treatment differences in different situations, but does not affect the fundamental validity of these tests. Scores should be selected in advance of the data, to be consistent with the clinical severity of the lesions they describe or the progress of the underlying pathologic process. This ensures that disparities in aggregated scores will reasonably represent disparities in disease burden, and hence be readily interpreted and accepted by the scientific and clinical communities. Within the realm of clinically meaningful possibilities, scores should also be selected to yield high power against the types of treatment differences suggested by the modes and speeds of action of the test agents. Since power is typically insensitive to moderate variations in the spacing of scores, precise spacing is rarely an issue (Wainer, 1976). Where the criteria of biological relevance and statistical power are irreconcilable, even after possible restriction of the subject population, then the grounds for interest in the new treatment are suspect. (Note that in clinical trials of efficacy and superiority, economic and scientific interests coincide such that all parties share an interest in using scores of high power. In equivalence trials, where this is not necessarily the case, controversy may be avoided by importing outcomes and associated scores from efficacy and superiority studies.) Finally, as shall be seen below, analyses of ordinal variables that avoid use of pre-specified scores often incorporate implicit forms of scoring that may also be controversial.
Once scores have been selected, several analytic choices remain, including:
Although technical issues of implementation differ, these are essentially the same choices to make in analysis of continuous outcome or other variables, such as subject level averages of discrete surface or tooth scores, for which the use of continuous approximating distributions is appropriate.
Distributional Shift Models
When explicit category scores are unavailable, models for category probabilities that incorporate ordinal information may be used instead. However, it should be acknowledged that these models incorporate ordinality by constraining relationships among probabilities in ways that may, in the substantive context, be no less arbitrary than explicit category scoring. We assume now, for the sake of simple illustration, a parallel two-arm clinical trial with a J-category ordinal subject-level outcome, and let $\pi_{ij}$ be the probability that a random subject who receives treatment i ends therapy in outcome category j, for i = 1,2 and j = 1,...,J. We consider two commonly used ordinal models, each of which is readily extended to dealing with multiple sites within a subject in the presence of covariates.
Equal adjacent-odds ratios
Define the odds of being in category j vs. category j' for the ith treatment arm as the ratio $o_{i;j,j'} = \pi_{ij}/\pi_{ij'}$, for i = 1,2 and j,j' = 1,...,J; the "adjacent-category odds" as the $o_{i;j,j-1}$ for i = 1,2 and j = 2,...,J; and the "adjacent-category odds ratios" as the respective quotients of these odds in the second and first treatment arms, $OR_{j,j-1} = o_{2;j,j-1}/o_{1;j,j-1}$. The "equal adjacent-odds ratio" model, also known as the "uniform" ordinal association model (Agresti, 1996), stipulates equality of the $OR_{j,j-1}$ ($= \theta$) for all j = 2,...,J; in other words, the treatment 2 odds between adjacent categories are all the same multiple of the odds between these categories under treatment 1. A consequence is that the odds of being in category j relative to category 1 under treatment 2 equal the same odds for treatment 1 multiplied by an increasing power, $\theta^{j-1}$, of the common ratio. On the log scale, we have $\log OR_{j,j'} = \log o_{2;j,j'} - \log o_{1;j,j'} = (j - j') \log \theta$. More generally, the log odds ratios between pairs of categories, which jointly constitute one way of summarizing the relationship of outcome to treatment, are just $\log \theta$ multiplied by the number of categories separating the pair in the ordering.
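The relationships in this paragraph can be checked numerically. The following is a minimal sketch, assuming hypothetical arm-1 category probabilities and a hypothetical common adjacent-category odds ratio (called `theta` here); none of these numbers come from the trial data, and the helper names are illustrative only.

```python
import numpy as np

def adjacent_odds_ratios(p1, p2):
    """Adjacent-category odds ratios OR_{j,j-1}, j = 2..J, for arm 2 vs. arm 1."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    odds1 = p1[1:] / p1[:-1]   # adjacent-category odds in arm 1
    odds2 = p2[1:] / p2[:-1]   # adjacent-category odds in arm 2
    return odds2 / odds1

def shift_by_uniform_association(p1, theta):
    """Arm-2 probabilities implied by the equal adjacent-odds ratio model."""
    p1 = np.asarray(p1, float)
    weights = p1 * theta ** np.arange(len(p1))  # pi_1j * theta^(j-1)
    return weights / weights.sum()

# Hypothetical values, chosen only for illustration.
p1 = np.array([0.70, 0.15, 0.08, 0.05, 0.02])
theta = 1.3
p2 = shift_by_uniform_association(p1, theta)

# Every adjacent-category odds ratio equals theta under this model.
print(adjacent_odds_ratios(p1, p2))

# The log odds ratio between categories 4 and 1 is (4 - 1) * log(theta).
log_or_41 = np.log((p2[3] / p2[0]) / (p1[3] / p1[0]))
print(log_or_41, 3 * np.log(theta))
```

Applying `adjacent_odds_ratios` to observed category proportions instead of model-implied ones gives the empirical ratios whose spread indicates how well the equal adjacent-odds ratio assumption holds.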
The equal adjacent-odds ratio model may be regarded as a simplification of a model known alternatively as the multicategory, polychotomous, or generalized logit model (Agresti, 1996; Stokes et al., 2000). Under this model, the $\log OR_{j,j-1} = \log \theta_j$ may differ for each adjacent category pair. Mathematically, only the restriction that all $\theta_j = \theta$ differentiates the equal adjacent-odds model from the generalized logit model. However, the difference is more profound than this suggests, because the generalized logit model without this restriction entirely neglects any information in the ordinality of the response categories. The restriction itself may be thought of as having two components. The first, an ordinality requirement that all $\log \theta_j$ have equal sign (so that either $\theta_j < 1$ for every j or $\theta_j > 1$ for every j), ensures that the treatment effect either consistently increases ($\theta_j > 1$) or consistently decreases ($\theta_j < 1$) ratios of all odds, with numerator following denominator in the category ordering, shifting the entire distribution toward one end of the scale or the other. The second, and more restrictive, component is the additional scaling requirement that this distributional shift be uniform in magnitude, in the sense that all $\theta_j$ be equal.
When the equal adjacent-odds ratio model adequately fits the data, the parameter $\theta$ represents, in a single number, the manner in which the distribution of ordinal outcomes shifts either upward or downward for treatment 2 in comparison with the distribution in treatment 1. Values of $\theta$ close to 1 represent little or no shift, while $\theta > 1$ indicates a shift toward higher outcomes with treatment 2 than with treatment 1 and, conversely, for $\theta < 1$. Even if this model is only approximately true, a test of $H_0: \theta = 1$ or, equivalently, of $H_0: \log \theta = 0$, may be a powerful test of treatment differences, because an estimate $\hat{\theta}$ of $\theta$ may concentrate most of the treatment effect in a single parameter.
In this connection, it is relevant to note that the estimate $\hat{\theta}$ is mathematically determined by the observed mean of equally spaced scores for the ordinal categories, and that alternative models of similar mathematical form are defined by any set of category scores, no matter what their spacing. The scores in these models impose a relative spacing on the odds between different categories rather than reflecting, as in "Differences in Mean Scores" above, a presumably inherent spacing between the clinical severity of the categories. It is not clear that one type of assumption is more or less arbitrary than the other.
Proportional odds
The "proportional odds" model (Agresti, 1996; Stokes et al., 2000) is an alternative approach to representing distributional shift with a single parameter. Assuming that outcome categories are numbered in order of increasing severity, define the "cumulative odds" of being no higher than category j under treatment i as
$$c_{ij} = \frac{\pi_{i1} + \cdots + \pi_{ij}}{\pi_{i,j+1} + \cdots + \pi_{iJ}}$$
for j = 1,...,J - 1. Then the proportional odds model requires that the ratios of cumulative odds for treatment 2 relative to treatment 1, $c_{2j}/c_{1j}$, be identical for each of j = 1,...,J - 1. Remarks similar to those above regarding the common adjacent-category odds ratio $\theta$ apply to the common value $\psi$ of these J - 1 cumulative odds ratios. Under the proportional odds model, $\psi$ is an effective one-parameter representation of a distributional shift, having the same role as $\theta$ under the equal adjacent-odds ratio model.
There is, of course, no prima facie biological reason why either the equal adjacent-odds ratio or the proportional odds model should ever be a true representation of nature. As the statistician George Box famously said, "All models are wrong; some models are useful." However, if the ordered categories are presumed to reflect intervals of values of a continuous latent random variable, the two models have similar mathematical rationales that may in some circumstances be attractive. Specifically, the proportional odds model applies exactly if the outcome category for an individual observation is determined by the location, with respect to a sequence of cut-points, of a linear function of covariate values and a random observation from a logistic distribution (a symmetric continuous probability law similar to the standard normal distribution, but with lower center and heavier tails; Agresti, 2002). The cut-points, as well as the resulting probabilities and cumulative odds, may be estimated from data. Substitution of a normal distribution for the logistic in this setting implies that the uniform association model will fit approximately, provided the number of cut-points is not too large (Goodman, 1981). Since the normal and logistic distributions are themselves similar, for many datasets both models will fit adequately and give similar results with respect to estimation of a treatment effect. This is not guaranteed, though. When neither model fits, particularly when the data do not conform to a general upward or downward shift across the range of the response variable distribution, then the models may differ considerably in both numerical results and substantive implications.
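The latent-variable rationale for the proportional odds model can be illustrated with a short calculation. This is a sketch under stated assumptions: the cut-points and the treatment shift `beta` are hypothetical, and the function names are not taken from any package. It builds category probabilities by slicing a logistic distribution at fixed cut-points and confirms that all cumulative odds ratios coincide.

```python
import numpy as np

def logistic_cdf(x):
    return 1.0 / (1.0 + np.exp(-x))

def category_probs(cutpoints, shift):
    """Category probabilities when the category is the interval, defined by
    the cut-points, that contains (shift + standard logistic noise)."""
    cum = logistic_cdf(np.asarray(cutpoints, float) - shift)  # P(category <= j)
    cum = np.concatenate(([0.0], cum, [1.0]))
    return np.diff(cum)

def cumulative_odds(p):
    """Odds of being no higher than category j, for j = 1..J-1."""
    cum = np.cumsum(p)[:-1]
    return cum / (1.0 - cum)

cutpoints = [0.5, 1.5, 2.2, 3.0]   # hypothetical cut-points on the latent scale
beta = 0.4                         # hypothetical treatment shift on the latent scale

p1 = category_probs(cutpoints, shift=0.0)    # treatment 1
p2 = category_probs(cutpoints, shift=beta)   # treatment 2

# Under a logistic latent variable, every cumulative odds ratio equals exp(-beta).
ratios = cumulative_odds(p2) / cumulative_odds(p1)
print(ratios, np.exp(-beta))
```

Replacing `logistic_cdf` with a normal distribution function would make the ratios only approximately equal, which is one way to see why the two distributional shift models often, but not always, give similar answers.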
The assumptions of each of these models and of other such alternatives may be evaluated by examining the odds ratios or cumulative odds ratios observed in a given dataset. Thus, the choice of how to represent ordinality in analysis becomes arguable on empirical statistical grounds, based on properties of the data that are quite separable from and neutral regarding the existence or absence of a treatment effect, in the spirit of Berry (1987) and other data analysts. However, the incorporation of ordinality in this fashion discards a clear linkage of spacing to underlying biology. Thus, objectivity and statistical simplicity of interpretation may be gained at the possible expense of clinical interpretability.
The models above are simple representatives meant to illustrate a substantial class of ordinal data models. [See Becker (1998) and Agresti (1984) for more thorough surveys.]
| BASELINE ORDINAL COVARIATE DATA |
|---|
For the purposes of establishing a treatment effect while minimizing potentially controversial assumptions, a general method of non-parametric covariance analysis for randomized studies may be useful. The approach assumes that the basic model under which treatment effect is to be estimated, exclusive of covariate adjustment, can be implemented by applying generalized least-squares (GLS) to a vector of functions of the observations, and that consistent estimates of the covariances between these functions and the within-treatment means or category proportions of covariates are available. The adjustment is then accomplished by appending to the GLS-based model for treatment effect a model that equates the expectations of the covariate means or proportions in the two treatment arms, and estimating the treatment parameter(s) under this joint model. The expected covariate means or proportions indeed must satisfy this model: They are equal by virtue of the randomization process through which subjects are allocated to treatments.
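Consider, for example, a two-group trial with a univariate response $y_{ij}$ by the jth subject in the ith treatment arm (i = 1,2; j = 1,...,$n_i$), where a difference $\bar{y}_2 - \bar{y}_1$ between the mean responses is of primary interest, and where a k-vector of covariates $x'_{ij} = (x_{ij1},...,x_{ijk})$ is observed corresponding to $y_{ij}$. Then the model may be written as
$$E\begin{pmatrix} \bar{y}_2 - \bar{y}_1 \\ \bar{x}_2 - \bar{x}_1 \end{pmatrix} = \begin{pmatrix} \beta \\ 0_k \end{pmatrix}, \qquad \mathrm{Var}\begin{pmatrix} \bar{y}_2 - \bar{y}_1 \\ \bar{x}_2 - \bar{x}_1 \end{pmatrix} = V_1 + V_2,$$
with the $V_i$, i = 1,2 being estimated by the conventional empirical covariance matrices
$$\hat{V}_i = \frac{1}{n_i(n_i - 1)} \sum_{j=1}^{n_i} (f_{ij} - \bar{f}_i)(f_{ij} - \bar{f}_i)', \qquad f_{ij} = (y_{ij}, x'_{ij})',$$
based on sums of cross-products across subjects. Partitioning
$$V_F = V_1 + V_2 = \begin{pmatrix} V_{yy} & V'_{yx} \\ V_{yx} & V_{xx} \end{pmatrix},$$
the covariate-adjusted estimate of treatment effect is then obtained as $\hat{\beta} = (\bar{y}_2 - \bar{y}_1) - V'_{yx} V_{xx}^{-1} (\bar{x}_2 - \bar{x}_1)$ with asymptotic estimated variance $v_{\hat{\beta}} = V_{yy} - V'_{yx} V_{xx}^{-1} V_{yx}$. The ratio of the squared covariate-adjusted effect to this variance, $\hat{\beta}^2 / v_{\hat{\beta}}$, then gives a Wald chi-square statistic with df = 1 that may be used to test statistical significance of the treatment difference after adjustment for the covariates. [See Koch et al. (1998) for more detailed discussion, in a broader context, of this adjustment technique.]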
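The adjustment described above amounts to a few lines of matrix arithmetic. The sketch below is one possible implementation under stated assumptions: the function name `covariance_adjusted_effect` is hypothetical, the data are simulated rather than taken from the trial, and the empirical covariance of each arm's mean vector is taken to be the sample covariance matrix divided by the arm's sample size.

```python
import numpy as np

def covariance_adjusted_effect(y1, x1, y2, x2):
    """Difference in mean response (arm 2 minus arm 1), adjusted for chance
    covariate imbalance between randomized arms.

    y1, y2: 1-D response arrays; x1, x2: (n_i, k) covariate arrays.
    Returns (adjusted estimate, estimated variance, Wald chi-square, 1 df).
    """
    def mean_and_cov(y, x):
        f = np.column_stack([y, x])        # each row is (y_ij, x_ij')
        n = f.shape[0]
        # Sample covariance divided by n estimates the covariance of the mean vector.
        return f.mean(axis=0), np.cov(f, rowvar=False) / n

    m1, v1 = mean_and_cov(np.asarray(y1, float), np.asarray(x1, float))
    m2, v2 = mean_and_cov(np.asarray(y2, float), np.asarray(x2, float))
    d = m2 - m1                            # differences in response and covariate means
    vf = v1 + v2
    v_yy, v_yx, v_xx = vf[0, 0], vf[1:, 0], vf[1:, 1:]
    w = np.linalg.solve(v_xx, v_yx)        # V_xx^{-1} V_yx
    est = d[0] - w @ d[1:]
    var = v_yy - v_yx @ w
    return est, var, est ** 2 / var

# Simulated illustration only: one baseline covariate correlated with the response.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
y1 = 0.8 * x1[:, 0] + rng.normal(size=n)
y2 = 0.2 + 0.8 * x2[:, 0] + rng.normal(size=n)

est, var, chi2 = covariance_adjusted_effect(y1, x1, y2, x2)
unadjusted_var = y1.var(ddof=1) / n + y2.var(ddof=1) / n
print(est, var, chi2, unadjusted_var)
```

Because the subtracted quadratic form is non-negative, the adjusted variance never exceeds the unadjusted one, which is the source of the precision gain from baseline adjustment.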
| EXAMPLE |
|---|
For consistency between analyses, we analyzed only those tooth surfaces for which full data were available at both examinations.
Table 1 gives percentages of sites in each category, by treatment. The distributions differ little, with the exception of a 1.3% excess of surfaces with signs of caries under treatment B. Eighty-four percent (1.1 of 1.3%) of this excess consists of white spots or vague FOTI shadows (category 2). Thus, these data do not suggest a treatment effect beyond a very early stage of demineralization. This observation is also reflected in the observed adjacent category and cumulative odds ratios, shown in Table 2. In each case, the odds ratio formed using the boundary between first and second categories substantially exceeds all other odds ratios of the same type, which fluctuate around 1, the adjacent-odds ratios spanning a much wider range than the ratios of cumulative odds. Hence, neither distributional shift model describes these data well. Note also that in situations where treatment impact, as here, is confined to a limited range of the baseline data distribution, analyses distinguishing between ordinal categories outside that range may contaminate the restricted treatment effect with noise from data in regions where the treatment is ineffective.
Table 3 displays the results of simple analyses based upon these scores, either at the one-year examination unadjusted for baseline, or adjusted for baseline status by subtraction. Mean scores were obtained for each subject at both baseline and one-year examinations, and these means were analyzed at the subject level, so that the estimated treatment effects represent differences between the across-subject means, of the within-subject averages across sites, for Treatment B vs. Treatment A. For instance, the unadjusted results for s1 are compatible with a one-category increase for roughly one in 71 tooth surfaces among patients under Treatment B, or a one-category-increased change from baseline for one in roughly 83 tooth surfaces in such patients, beyond increases seen under Treatment A. Note that approximately two-fifths of the disadvantage of Treatment B seen in Table 1 was present at baseline. Baseline-adjusted differences between treatments are small, ranging from 0.004 to 0.046 on a 0 to 6 scale, depending upon scoring scheme. Similar results could have been obtained by analysis at the surface level with appropriate adjustment for within-subject correlation, for instance, using the svyreg command of the Stata™ statistical analysis package.
The improvement in precision obtained by adjustment for baseline is apparent in comparison of the two rows of the table, but substantively the results of the unadjusted and adjusted analyses are similar. Subject-level non-parametric covariance analysis with equally spaced scores yields an estimated effect of 0.13 with p = 0.16. Analyses at the site-specific level, adjusting for clustering using elementary sample survey-based methods (Binder, 1983), should yield results very similar to those above.
In contrast, we now consider purely ordinal models in which no explicit category scores are used, but in which simplifying constraints are placed on probabilities through assumptions about ratios of adjacent generalized or cumulative odds. Fitting the equal adjacent-odds ratio model to the status of individual surfaces at the one-year examination, with sample survey-based adjustment for correlation among sites within the mouth, estimates the odds of a surface falling in the next-higher category for subjects under Treatment B as 1.005 times the corresponding odds for surfaces in subjects under Treatment A. This effect does not attain statistical significance (Table 4). Incorporation of strata into the model, and adjustment for baseline surface status either by additional stratification or by assuming equal spacing among baseline categories, does not substantially change this result. The analogous proportional odds models, however, do find a statistically significant treatment difference after adjustment for strata, with or without adjustment for surface status at baseline, as does the generalized logit model after adjustment for both baseline risk stratum and baseline surface score (Table 4). The equal adjacent-odds ratio and generalized logit models were fit, after reformulation as loglinear models, using the Stata™ svypois command with the desmat prefix; the proportional odds models were fit using Stata™ svyologit.
The wide spread of adjacent category odds ratios noted earlier in Table 2 is virtually replicated by the fitted odds from the generalized logit model after adjustment for both strata and baseline status, confirming the poor fit of the equal adjacent-odds ratio model (data not shown). The adjusted cumulative odds remain less variable. Even when distributional shift does occur, one would anticipate that adjacent category odds models would be more vulnerable than proportional odds models to instability due to data sparseness such as found here in categories 4–6, because each adjacent category odds omits much of the data. Hence, such models may be less useful than alternatives when there is great imbalance in the distribution of observations across categories.
| HEAVY-TAILED CONTINUOUS DATA |
|---|
It is commonly assumed that continuous measurements are more informative than classifications, and that continuous data are used most efficiently when their actual values are incorporated into descriptive summaries and hypothesis tests. This assumption is usually but not always correct. One exception is when continuous measurements are made with a great deal of error but related classifications can be made far more reliably. In that case, the apparent precision of continuous data is spurious, and the extra variability introduced by measurement error may be more damaging than would reduction to a cruder but more reliable categorical representation. A second exception is when data arise from a distribution with very heavy tails relative to the Gaussian, such as the double exponential or Cauchy distributions. We generally think of heavy-tailed distributions as caused by outliers, and the Cauchy distribution (which has tails so heavy that the mean and variance of the distribution do not exist, and most conventional properties of sample statistics do not hold) is often thought of as a theoretical oddity. But the Cauchy distribution is the probability law that governs the ratio of two independent standard Gaussian random variables, and thus its occurrence in practice is hardly inconceivable. For that matter, measurements from a technology might take this or a similar form, unbeknownst to the consumer, if a random Gaussian denominator were used in a normalization process.
As an example of somewhat heavy-tailed data in practice, the Figure displays a quantile-quantile (QQ) plot of one-year ECM measurements from tooth 47 of the Lithuanian children. These ECM data are quite heavy-tailed, and not because of outliers. Normally distributed data would be expected to more or less track the solid diagonal line, while Cauchy data would be even more vertical at both upper and lower ends. The reason this matters is that parametric analyses are not robust to extreme heaviness in the tails, and conventional statistical intuition may fail dramatically in such circumstances, with major reductions in statistical efficiency relative to non-parametric methods. The most remarkable example is that sample means from the Cauchy distribution provide no more information about the center of the distribution than the first observation, no matter how large a sample is drawn!
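The behavior of Cauchy sample means can be seen in a small simulation. The following sketch uses arbitrary, assumed settings (2000 simulated trials of 400 observations each, a fixed random seed) and generates Cauchy data as the ratio of two independent standard Gaussian variables; the interquartile range across trials serves as a measure of spread, because variances do not exist for this distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n = 2000, 400   # arbitrary simulation sizes

# The ratio of two independent standard Gaussians follows a standard Cauchy law.
samples = rng.normal(size=(n_trials, n)) / rng.normal(size=(n_trials, n))

def iqr(a):
    """Interquartile range, a spread measure that exists even for Cauchy data."""
    q25, q75 = np.percentile(a, [25, 75])
    return q75 - q25

spread_single = iqr(samples[:, 0])               # a single observation per trial
spread_mean = iqr(samples.mean(axis=1))          # mean of n observations per trial
spread_median = iqr(np.median(samples, axis=1))  # median of n observations per trial

print(spread_single, spread_mean, spread_median)
```

In runs of this kind, the spread of the sample mean stays close to that of a single observation, whereas the spread of the sample median shrinks markedly as the sample size grows, which is the sense in which a robust summary recovers information that the mean cannot.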
There is no guarantee that continuous measurements using new diagnostic methodologies will closely follow a Gaussian or any other particularly convenient statistical model. Indeed, biological outliers may signal rapid demineralization, and substantial measurement errors are more common with new than with established diagnostic systems. Consequently, careful attention should be paid to distributional shapes, and non-parametric approaches or other robust approaches to analysis may be of particular importance in the interpretation of clinical trial outcomes with the use of new diagnostic tools.
| SUMMARY COMMENTS |
|---|
Caution is necessary in the design and analysis of studies of any treatment-outcome combination for which the measurable effect of the treatment during the observational time frame may vary substantially depending upon the baseline disease measurement, as apparently in the example described above. Such variation of the treatment effect with initial disease status renders results of statistical models based on general distributional shifts less relevant to the biological situation and more difficult to interpret.
Continuous measurements need not be normally distributed, and standard parametric analyses of data with very heavy tails can miss effects that are readily found with more robust approaches. Consequently, there is no guarantee that use of a site-level ordinal or continuous demineralization-based outcome measure will produce a more efficient clinical trial than would classic dichotomization by cavitation.
However, caveats notwithstanding, close attention to (i) measurement reliability, (ii) the nature of treatment differences that may reasonably be anticipated based on underlying caries biology and mechanisms of action, and (iii) the anticipated statistical properties of selected ordinal or continuous outcome measures should produce clinical trials that can be shorter and smaller than previously, because they use new diagnostic modalities to increase the harvest of information about treatment activity from each site and subject studied.
| REFERENCES |
|---|
Agresti A (1996). An introduction to categorical data analysis. New York: Wiley.
Agresti A (2002). Categorical data analysis. 2nd ed. New York: Wiley.
Becker M (1998). Ordered categorical data. In: Encyclopedia of biostatistics. Vol. 4: Med-Pre. Armitage P, Colton T, editors. New York: Wiley, pp. 3186–3195.
Berry DA (1987). Logarithmic transformations in ANOVA. Biometrics 43:439–456.
Binder DA (1983). On the variance of asymptotically normal estimators from complex surveys. Internat Statist Rev 51:279–292.
Bowersox J (2001). National Institutes of Health consensus development conference statement: diagnosis and management of dental caries throughout life, March 26–28, 2001. J Am Dent Assoc 132:1153–1161.
Cohen ME (2001). Analysis of ordinal dental data: evaluation of conflicting recommendations. J Dent Res 80:309–313.
Featherstone JDB, Fried D (2001). Fundamental interactions of lasers with dental hard tissues. Med Laser Appl 16:181–194.
Goodman LA (1981). Association models and the bivariate normal for contingency tables with ordered categories. Biometrika 68:347–355.
Heeren T, D'Agostino R (1987). Robustness of the two independent samples t-test when applied to ordinal scaled data. Stat Med 6:79–90.
Katz BP, Huntington E (2004). Statistical issues for combining multiple caries diagnostics for demonstrating caries efficacy. J Dent Res 83(Spec Iss C):C109–C113.
Koch GG, Tangen CM, Jung JW, Amara IA (1998). Issues for covariance analysis of dichotomous and ordered categorical data from randomized clinical trials and nonparametric strategies for addressing them. Stat Med 17:1863–1892.
Stokes ME, Davis CS, Koch GG (2000). Categorical data analysis using the SAS® system. 2nd ed. Cary, NC: SAS Institute, Inc.
Stookey GK, Gonzalez-Cabezas C (2001). Emerging methods of caries diagnosis. J Dent Educ 65:1001–1006.
Wainer H (1976). Estimating coefficients in linear models: it don't make no nevermind. Psycholog Bull 83:213–217.