J Dent Res 83(Spec Iss C):C109-C112, 2004
© 2004 International and American Associations for Dental Research


PROCEEDINGS
Clinical

Statistical Issues for Combining Multiple Caries Diagnostics for Demonstrating Caries Efficacy

B.P. Katz1,*, and E. Huntington2

1 Division of Biostatistics, Indiana University School of Medicine, 1050 Wishard Blvd, RG 4101, Indianapolis, IN 46260, USA; and
2 Unilever Research, Port Sunlight, Wirral, UK;

* corresponding author, bkatz{at}iupui.edu


   ABSTRACT
 
Caries efficacy in clinical trials has been based primarily on visual examinations supplemented by Fiber Optic Transillumination (FOTI) and radiography, with the assessments combined at the surface level to classify each surface as to its caries status. Newer caries diagnostic techniques measure the caries process quantitatively and thus yield continuous rather than ordinal results. The objective of this study was to examine various methods for the analysis of multiple outcomes in clinical trials and to compare their usefulness for the analysis of caries trials. Four global tests (rank sum, ordinary least squares, general least squares, and generalized estimating equations) and two caries indices (based on average and maximum values of the methods) were evaluated with the use of one-year follow-up data from 1063 children in a recent caries trial. A new hybrid method was also developed and evaluated. All of the methods performed well when the diagnostic measures showed product differences in caries in the same direction. Ease of use, interpretability, and distributional assumptions must be considered before a consensus method for analysis of multiple diagnostic measures in caries trials can be determined.

KEY WORDS: statistical analysis • caries diagnostics • clinical trials


   INTRODUCTION
 
Caries efficacy in clinical trials has been based primarily on visual examinations. The most common outcome measure for caries trials is the number of surfaces that are sound or unerupted at baseline but exhibit caries at follow-up. This caries increment has traditionally been analyzed by analysis of covariance (ANCOVA) (Grainger et al., 1984). Covariates commonly included in the analysis are age, gender, and caries history. These variables are sometimes used as stratification factors in the design of the trial (Kingman, 1984). Alternative statistical analysis methods for caries trials have also been proposed. These include efforts to incorporate the ordinal nature of the clinical exam (Fleiss, 1984) and the analysis of the caries increment as a count. The latter analyses use modeling procedures with a Poisson error structure (Hujoel et al., 1994; Bohning et al., 1999). Most recently, Caplan et al. (1999) proposed the incidence density method, an epidemiological approach based on the number of surfaces at risk over time.
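As a minimal sketch of the caries increment defined above, the following Python function counts the surfaces that were sound or unerupted at baseline and carious at follow-up. The status labels and the function name are illustrative assumptions, not the coding scheme of any particular trial.

```python
# Hypothetical status labels; real trials use their own surface codes.
AT_RISK = {"sound", "unerupted"}

def caries_increment(baseline, follow_up):
    """Count surfaces at risk at baseline that are carious at follow-up.

    baseline and follow_up are equal-length sequences of status labels,
    one entry per surface, in the same surface order.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must cover the same surfaces")
    return sum(
        1
        for before, after in zip(baseline, follow_up)
        if before in AT_RISK and after == "carious"
    )
```

Surfaces that were already carious at baseline do not contribute, which is why the increment depends on the number of surfaces at risk.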

Some caries trials have incorporated radiographic assessment in addition to visual exam into the calculation of the caries increment. Decisions are made at the surface level, where a surface is considered carious if either measure shows evidence of a lesion. Although not generally thought of as a score based on multiple outcomes, this combined increment is a caries index that uses the maximum value of the available methods at each surface. The additional information provided by x-rays or other imaging methods presented few statistical challenges. One of the few formal analyses of multiple outcomes in caries trials was the use of multivariate analysis of covariance to analyze data from four caries trials (Geary et al., 1992). However, measures of dental health other than caries—including plaque, gingivitis, and calculus—were of interest in this analysis.

The caries process is not a step function where surfaces or teeth transition instantly from sound to cavitation. Nor can visual exams or radiographic methods with several categories instead of two adequately describe the process. Caries is a more gradual disease process, with demineralization and remineralization occurring over time. Caries lesions occur when demineralization is the dominant process. As a better reflection of the biology, the new caries diagnostic tools attempt to measure the caries process on a continuum in a quantitative manner. Thus, they yield continuous rather than ordinal or dichotomous results for each surface. In addition, these methods lead to unbalanced data, since they are usually available for only a subset of the surfaces of interest. One possibility for incorporating these data into the traditional analysis of covariance method for caries trials would be to dichotomize the continuous measures and then classify the surface as caries if any of the measures (including visual exam) was above the threshold. However, establishing useful cut-off values for these new measures is a difficult process and results in a large amount of lost information. The use of the continuous data from these new methods holds the most promise for increasing the efficiency of future caries trials.

A variety of methods have been proposed for analyzing clinical trials with multiple outcomes, although not specifically for caries trials. Pocock (1997) looked at many of the proposed methods and discussed some of the associated practical and statistical issues. Although there are several possible classifications, we have chosen to characterize the methods broadly into four main themes: (1) define each of the diagnostic measures as a primary or secondary outcome; (2) perform tests for each measure but formally control the type I error rate; (3) combine the data from the multiple outcomes into a single global test; and (4) construct a combined endpoint or index based on all of the methods.

The first approach depends on a pre-specified decision rule to determine the "success" of the trial. If there is only a single primary outcome, then the success of the trial depends solely on that measure. Other pre-specified decision rules based on combinations of primary and secondary outcomes are also possible (Chi, 1998). The second method is to adjust the p-values for the individual tests (e.g., Bonferroni) or to control the type I error rate in some other way (Cook and Farewell, 1996; Zhang et al., 1997). We will not consider either of these approaches further, since they can easily yield different conclusions for each outcome, leading to uncertain interpretation of the results and/or conflict between the sponsor and the regulatory agencies.

O’Brien (1984) presented three global tests for multiple outcomes and showed that, for situations where group differences would be expected to be in the same direction for all measures, they had superior power to the traditional multivariate analyses. A brief description of the rank-sum, ordinary least-squares (OLS), and general least-squares (GLS) procedures follows.

For the rank-sum test, the value for each subject is ranked for each outcome, and then the ranks are summed across outcomes within subject. For large sample sizes, this sum can then be compared among groups by standard parametric analyses (e.g., ANCOVA). The OLS and GLS procedures require that the outcomes be transformed to a common scale. For continuous measures, this is usually accomplished by converting each value to a Z-score. The OLS method is equivalent to a repeated-measures analysis of variance, where the outcomes are the repeated factor. This yields equal weights for all of the outcomes. The GLS procedure is similar, except that the outcomes do not receive equal weights. The weights are computed from the inverse of the sample covariance matrix. The OLS and GLS tests are equivalent when the outcomes are equally correlated.
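The three procedures can be sketched in Python as follows. This is an illustration under stated assumptions, not O'Brien's published implementation: all function names are hypothetical, the outcomes passed to `composite` are assumed to be already standardized, and the GLS weights are taken as the normalized row sums of the inverse covariance matrix, following the verbal description above. The comparison of the resulting per-subject scores between groups (e.g., by ANCOVA) is not shown.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_scores(outcomes):
    """Sum each subject's ranks across outcomes (one list of values per outcome)."""
    per_outcome = [average_ranks(column) for column in outcomes]
    return [sum(r[s] for r in per_outcome) for s in range(len(outcomes[0]))]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def gls_weights(cov):
    """Weights proportional to the row sums of the inverse covariance matrix."""
    raw = solve(cov, [1.0] * len(cov))
    total = sum(raw)
    return [w / total for w in raw]

def composite(z_rows, weights):
    """Weighted sum of standardized outcomes for each subject (one row per subject)."""
    return [sum(w * z for w, z in zip(weights, row)) for row in z_rows]
```

In this sketch the OLS composite corresponds to `composite(z_rows, [1 / J] * J)` for J outcomes. Passing an equally correlated, equal-variance matrix to `gls_weights` returns equal weights, which matches the stated equivalence of OLS and GLS in that case.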

More recently, several other global tests have been proposed. One such method proposes treating the outcome measures as correlated data and then using generalized estimating equations (GEE) to perform a global analysis (Liang and Zeger, 1986; Lefkopoulou and Ryan, 1993). In addition, an approximate likelihood ratio test (Tang et al., 1989) and a multivariate linear mixed-model approach (Sammel et al., 1999) have also been developed. A simple method suggested by Wittes is to use the maximum value of the multiple outcome measures as the value for each subject and then perform a standard analysis (Follmann, 1995). With the notable exception of the rank-sum test, all of these methods require that the outcomes have the same scale. Since this is rarely the case, each outcome is transformed to a common scale. The mix of ordinal and continuous data in caries trials makes this an important issue.

The construction of a caries index could possibly be accomplished with the use of multivariate statistical methods such as factor analysis. However, we have chosen to develop candidate indices by extending current practice and combining data from different methods at the surface level. In the remainder of this paper, we will illustrate several of the global tests and construct some candidate indices using data from a recently completed caries trial.


   MATERIALS & METHODS
 
Trial Design
Investigators assigned 2141 schoolchildren aged 13, living in Lithuania, to treatment group A or B, using a stratified randomization procedure. The 12 strata were defined by gender, Decayed, Missing & Filled Surfaces (DMFS) status (3 categories), and caries risk on 2nd molars (2 categories). The trial was planned for a two-year follow-up period, but only 12-month data on half of the subjects (N = 1063) were made available for this paper and to other contributors to the ICW CCT workshop. The purpose of the trial was to assess the sensitivity of various diagnostic techniques and combinations of techniques. Some of these techniques are still experimental prototypes, and those data are proprietary, as is the full database. The purpose of the trial was not to compare the two treatments, since, in previous randomized clinical trials, Product A has already been shown to be more effective than Product B in preventing caries. At baseline and 12 months, each subject was assessed with a complete visual clinical exam (Chesters et al., 2002), Fiber Optic Transillumination (FOTI), radiography (x-ray), and DIAGNOdent (DD) for posterior approximal surfaces and occlusal surfaces, and an Electrical Caries Monitor (ECM) for posterior occlusal surfaces. For the purposes of further analysis, the visual exam, x-ray, and FOTI were combined into a single outcome, as might be done in a traditional caries trial analysis. The combination (CFX) resulted in a seven-category ordinal scale (Table 1).


Table 1. Surface Scoring System Based on Visual Exam, FOTI, and X-ray
 
Global Tests
Four global tests were implemented for the data from this trial: rank sum, OLS, GLS, and GEE (the last with an unstructured working correlation matrix). These were chosen primarily for their ease of implementation. As noted above, the OLS, GLS, and GEE require that the outcomes be transformed to the same scale. For DD and ECM, this was done by transforming the observed value at each surface into a z-score (observed value minus baseline mean, all divided by the baseline standard deviation). For the CFX, observations in each of the 7 categories were converted to a standard normal scale by transforming the median percentile within each category according to the normal distribution (i.e., the 50th percentile transforms to 0, and the 97.5th percentile converts to 1.96). For all three measures, the data were then averaged over surfaces to yield the value for each individual.
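The two transformations can be sketched in Python as follows. The function names are hypothetical, and the handling of the ordinal scale reflects one reading of the description: the "median percentile" of a category is taken as the midpoint of the cumulative proportion range that the category occupies. Empty categories are returned as `None`, since no median percentile is defined for them.

```python
from statistics import NormalDist

def z_score(value, baseline_mean, baseline_sd):
    """Standardize a continuous reading against the baseline distribution."""
    return (value - baseline_mean) / baseline_sd

def ordinal_normal_scores(category_counts):
    """Map each ordered category to the normal quantile of its median percentile.

    category_counts lists the number of observations in each category,
    from the lowest category to the highest.
    """
    total = sum(category_counts)
    scores = []
    below = 0  # observations in all lower categories
    for count in category_counts:
        if count == 0:
            scores.append(None)  # assumption: no score for an empty category
        else:
            median_percentile = (below + count / 2) / total
            scores.append(NormalDist().inv_cdf(median_percentile))
        below += count
    return scores
```

For example, with two equally frequent categories the median percentiles are 0.25 and 0.75, so the scores are symmetric about zero. A category whose median percentile is 0.5 receives a score of 0, in line with the example given in the text.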

Caries Indices
Surface data for each measure were transformed as described above. Surface scores were then calculated by two methods: the average (AVE) and the maximum (MAX) of the available data. In each case, the index for each subject was the mean of all of the surface scores.
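A minimal Python sketch of the two indices is given below. It assumes each surface is represented as a tuple of transformed values, with `None` marking a diagnostic method that was not available for that surface. This representation and the function names are illustrative choices rather than part of the original analysis.

```python
from statistics import mean

def surface_score(values, method):
    """Combine the transformed values available at one surface."""
    if method not in ("AVE", "MAX"):
        raise ValueError("method must be 'AVE' or 'MAX'")
    available = [v for v in values if v is not None]
    if not available:
        return None  # no diagnostic method recorded for this surface
    return max(available) if method == "MAX" else mean(available)

def subject_index(surfaces, method):
    """Mean of the surface scores over all surfaces with at least one value."""
    scores = [surface_score(values, method) for values in surfaces]
    return mean(score for score in scores if score is not None)
```

Because MAX keeps the largest transformed value at each surface, a lesion signalled by a single method is not diluted by methods that show nothing there, whereas AVE averages across the available methods.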

Hybrid Method
We also developed analyses that created 4 partial mouth indices per person and then analyzed these with global tests. Two groupings of teeth and surfaces were used. The first grouping was based on caries susceptibility and used the following groups of teeth: (1) anterior teeth, (2) premolars, (3) first molars, and (4) second molars. The second grouping was based on the availability of the diagnostic methods and used the following groups: (1) anterior teeth (visual exam only), (2) posterior smooth surfaces (visual exam only), (3) posterior occlusal surfaces (all methods), and (4) posterior approximal surfaces (all methods except ECM). When more than one diagnostic method was available within a grouping, an index was created based on the maximum or the average of the transformed data. Thus, 16 hybrid analyses were performed (2 groupings, 2 index calculation methods, and 4 global tests).
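The count of 16 analyses follows from crossing the three design choices, as the following Python sketch shows. The dictionary keys and labels are shorthand taken from the description above, and the function name is hypothetical.

```python
from itertools import product

# The two ways of splitting the mouth into four partial indices.
GROUPINGS = {
    "caries susceptibility": [
        "anterior teeth",
        "premolars",
        "first molars",
        "second molars",
    ],
    "diagnostic availability": [
        "anterior teeth",
        "posterior smooth surfaces",
        "posterior occlusal surfaces",
        "posterior approximal surfaces",
    ],
}
INDEX_METHODS = ["MAX", "AVE"]
GLOBAL_TESTS = ["rank sum", "OLS", "GLS", "GEE"]

def hybrid_analyses():
    """List every (grouping, index method, global test) combination."""
    return list(product(GROUPINGS, INDEX_METHODS, GLOBAL_TESTS))
```

Each combination yields four partial mouth indices per subject, which are then treated as the multiple outcomes of the chosen global test.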


   RESULTS
 
Univariate Analyses
Prior to the implementation of any of the proposed methods, the data available for CFX, ECM, and DD were analyzed separately. For CFX, 2 cut points were used to define caries and calculate the increment. These were: (1) any evidence of at least a non-cavitated (CFX1) lesion; or (2) evidence of a cavitated lesion (CFX3). ANCOVA models were then fit, including the design strata and age. For ECM and DD at 12 months, the baseline value was also included as a covariate. Only CFX1 was statistically significant (Table 2). Neither ECM nor DD showed any meaningful difference between the products, and ECM actually had slightly better mean values for the less-effective treatment. Further examination of these data showed that both methods had problems with reproducibility and also had different mean values for different teeth locations with sound surfaces. Evaluation of the usefulness of these diagnostic methods is beyond the scope of this paper, so for illustration of the statistical methods, a second dataset was created that altered the 12-month data for Product B so that ECM and DD showed trends in the correct direction but did not reach statistical significance. Specifically, 12-month Product B ECM values at each surface were increased by 5%. Similarly, DD values were increased by 1%. The analysis of these augmented data values is presented on the right side of Table 2.
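The construction of the augmented dataset can be sketched in Python as follows. The record layout (dictionaries with `product`, `visit`, `method`, and `value` keys) is a hypothetical one chosen for illustration. Only the 5% and 1% increases come from the text.

```python
# Multipliers stated in the text, applied only to Product B at 12 months.
AUGMENT_FACTORS = {"ECM": 1.05, "DD": 1.01}

def augment(records, factors=AUGMENT_FACTORS):
    """Return copies of the surface records with Product B 12-month values scaled.

    Records for Product A, for the baseline visit, or for methods without
    a factor are copied unchanged. The input list is not modified.
    """
    augmented = []
    for record in records:
        new_record = dict(record)
        if (
            record["product"] == "B"
            and record["visit"] == 12
            and record["method"] in factors
        ):
            new_record["value"] = record["value"] * factors[record["method"]]
        augmented.append(new_record)
    return augmented
```

Returning copies keeps the original data intact, so the same analyses can be run on both versions of the dataset.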


Table 2. Analysis Results Using Global Tests and Caries Indices
 
Global Tests
The transformations for the three outcomes to a standard normal scale were reasonably successful. For example, the means and standard deviations of the average scores across surfaces for the baseline exam are as follows: CFX 0.078 (0.16), DD 0.003 (0.38), and ECM 0.003 (0.45). All of the distributions were generally bell-shaped but slightly skewed, with CFX showing the greatest skewness. It should be noted that the subject values are the means across surfaces, so that the standard deviation among individuals would be expected to be less than 1.0, and CFX would have the smallest standard deviation, since it is based on more surfaces. The analyses were implemented on the differences between baseline and follow-up for each transformed measure, and age and randomization strata were included as covariates. Difference scores were chosen to adjust each technique directly for its baseline value, rather than including all three baseline values as covariates in the analysis. When the original data were used (Table 2, left side), only the GLS showed a significant difference between the products. It is not surprising that the difference seen with CFX was diluted by the other diagnostic methods. For GLS, due to its lower variance and the correlation structure, CFX had a much higher weight (over 20 times more), and so the resulting test was essentially based on one measure. For the augmented data, each of the four methods showed a statistically significant difference and yielded lower p-values and greater effect sizes than the individual measures.

Caries Indices
The two caries indices, MAX and AVE, were both approximately bell-shaped and slightly skewed. Each was analyzed by ANCOVA with baseline value, age, and design strata as covariates. For the original data, MAX showed significant differences between the products, but AVE did not. This is essentially because MAX used the CFX value when caries was detected and did not average them with the diagnostics that did not show a difference. For the augmented data, both indices showed a statistically significant difference and yielded smaller p-values and greater effect sizes than the individual diagnostic methods.

Hybrid Methods
For the original data, the MAX method of combining the measures was nearly significant for both groupings and for all the global tests (Table 3). For the augmented data, a significant product difference was achieved in all 16 combinations. The two methods for grouping surfaces showed similar results. However, from a statistical standpoint, the method based on the availability of the diagnostic methods deals with the unbalanced data issues in a more logical manner.


Table 3. Results with the Use of Hybrid Methods
 

   DISCUSSION
 
All of the methods presented in this paper perform well when the diagnostic measures all reflect the same direction. In the data from the caries trial, this was not the case, and the augmented data are a better illustration of the analysis methods. The properties of these diagnostic measures are not the focus of this paper, but none of the currently available new measurement methods has yet been sufficiently validated to be widely accepted in modern caries clinical trials. In addition, we made a series of decisions concerning transformation of the diagnostic results and the construction of our indices. Although these decisions were unlikely to make much difference in our results, there would need to be agreement on these issues if a "standard analysis" for caries trials is to be developed.

Simulation studies comparing some of the global tests have shown that the GLS, which is the most flexible, tends to perform at least as well in most situations and better in some (O’Brien, 1984). The rank-sum test has the major advantage that no transformations are needed, and descriptive statistics can be presented on the original scales. However, as with most non-parametric analyses, if the assumptions of a parametric approach can be met, the rank-sum test will be less powerful.

The caries indices may have great appeal, since they are essentially an extension of current practice. This is particularly true of the MAX index. One drawback of the new indices is that the resulting scale has no biological interpretation and is dependent on the baseline distribution of each measure. Perhaps in the future, a standard transformation could be used across trials. This would result in comparable numbers. Finally, the hybrid method performed as well as the others and exhibited more homogeneous results across the analysis methods. This may indicate that it is more robust, but it is difficult to draw conclusions from a single trial. The hybrid method does allow the investigator to examine different areas of the mouth but is the most computationally unwieldy. Still, it can handle the unbalanced data in the fairest way.

The major goal of adding new diagnostic tests to a caries trial is to increase our ability to detect differences among treatments. Analysis of the augmented data clearly shows that all of these methods are able to increase the power of a clinical caries trial if the diagnostic methods are an accurate and precise measure of the caries process.


   FOOTNOTES
 
Presented at the International Consensus Workshop on Caries Clinical Trials, Glasgow, Scotland, January 7–10, 2002


   REFERENCES
 
Bohning D, Dietz E, Schlattmann P, Mendonca L, Kirchner U (1999). The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Statist Soc A 162:195–209.

Caplan DJ, Slade GD, Biesbrock AR, Bartizek RD, McClanahan SF, Beck JD (1999). A comparison of increment and incidence density analyses in evaluating the anticaries effects of two dentifrices. Caries Res 33:16–22.

Chesters RK, Pitts NB, Matuliene G, Kvedariene A, Huntington E, Bendinskaite R, et al. (2002). An abbreviated caries clinical trial design validated over 24 months. J Dent Res 81:637–640.

Chi GYH (1998). Multiple testings: multiple comparisons and multiple endpoints. Drug Information J 32:1347S–1362S.

Cook RJ, Farewell VT (1996). Multiplicity considerations in the design and analysis of clinical trials. J R Statist Soc A 159:93–110.

Fleiss JL (1984). Assessing treatment effects in caries clinical trials using ordered categorical data. J Dent Res 63(Spec Iss):778–782.

Follmann D (1995). Multivariate tests for multiple endpoints in clinical trials. Statist Med 14:1163–1175.

Geary DN, Huntington E, Gilbert RJ (1992). Analysis of multivariate data from four dental clinical trials. J R Statist Soc A 155:77–89.

Grainger DJ, Lehnhoff RW, Bollmer BW, Zacherl WA (1984). Analysis of covariance in dental caries clinical trials. J Dent Res 63(Spec Iss):766–772.

Hujoel PP, Isokangas PJ, Tiekso J, Davis S, Lamont RJ, DeRouen TA, et al. (1994). A re-analysis of caries rates in a preventive trial using Poisson regression models. J Dent Res 73:573–579.

Kingman A (1984). Stratification methods in caries clinical trials. J Dent Res 63(Spec Iss):773–777.

Lefkopoulou M, Ryan L (1993). Global tests for multiple binary outcomes. Biometrics 49:975–988.

Liang KY, Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika 73:13–22.

O’Brien PC (1984). Procedures for comparing samples with multiple endpoints. Biometrics 40:1079–1087.

Pocock SJ (1997). Clinical trials with multiple outcomes: a statistical perspective on their design, analysis, and interpretation. Controlled Clin Trials 18:530–545.

Sammel M, Lin X, Ryan L (1999). Multivariate linear mixed models for multiple outcomes. Statist Med 18:2479–2492.

Tang DI, Gnecco C, Geller NL (1989). An approximate likelihood ratio test for a normal mean vector with nonnegative components with application to clinical trials. Biometrika 76:577–583.

Zhang J, Quan H, Ng J, Stepanavage ME (1997). Some statistical methods for multiple endpoints in clinical trials. Controlled Clin Trials 18:204–221.



