J Dent Res 83(Spec Iss C):C18-C24, 2004
© 2004 International and American Associations for Dental Research


PROCEEDINGS
Clinical

New Developments in Medical Clinical Trials

R.B. D’Agostino, Sr.1,2,*, and J.M. Massaro1,2

1 Boston University, Mathematics and Statistics Department, Statistics and Consulting Unit, 111 Cummington St., Boston, MA 02215; and
2 Harvard Clinical Research Institute;

* corresponding author, ralph{at}bu.edu


   ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 STUDY OBJECTIVES
 TARGET AND SAMPLE POPULATIONS
 EFFICACY VARIABLES
 SURROGATE VARIABLES
 CONTROL GROUP
 STUDY DESIGN (AVOID BIAS)
 STUDY DESIGN (SAMPLES)
 TYPES OF COMPARISONS
 SAMPLE SIZE
 TRIAL MONITORING
 ANALYSIS SETS
 UNIT OF ANALYSIS
 MISSING DATA
 SAFETY
 SUBSETS
 CLINICAL SIGNIFICANCE
 SUMMARY: SO WHAT IS...
 REFERENCES
 
This paper reviews several new developments and long-standing good practices for conducting clinical trials. Discussion starts with the need for clear statements of study objectives, proceeds to clarify target and sample population, and elaborates on primary vs. secondary variables with the need for alpha adjustment in the presence of multiple outcomes. Here we also review the issue of surrogate endpoints. Study design issues—including blinding, randomization, and multicenter studies—come next. Then we discuss the current trend of the replacement of placebo-controlled trials by active controlled non-inferiority trials, the increasing use of Independent Data Monitoring Committees, the prominence of analysis on Intention-to-Treat samples, and the importance of imputation of missing data. We close with a brief discussion of the unit of analysis, the role of newer statistical analysis methods, safety issues, subset analysis, and, most importantly, clinical significance.

KEY WORDS: clinical trials • study design • study conduct • study analysis


   INTRODUCTION
 
The field of medical clinical trials is broad, with many general aspects applicable to an array of drugs, biologics, and devices. Over the years, especially in the areas involving oversight and approval by regulatory agencies, there have been rapid and positive developments. The guiding principle is to "keep the study close to the product claim". This principle has an impact on all aspects of study design, conduct, analysis, and interpretation and is seen dramatically in the application to confirmatory clinical trials. For this paper, confirmatory trials are randomized controlled clinical trials where the study objective is to confirm the effectiveness of the treatment or treatments under investigation and where the outcome effectiveness variables and statistical analysis plan are pre-specified.

Table 1 lists major issues of confirmatory clinical trials. They consist first of pre-study issues such as stating the study objectives, identifying the target and patient population, declaring the primary and secondary efficacy variables, selecting study control groups, deciding upon a study design that avoids biases, determining how subjects will be allocated to treatments to achieve and justify generalizing the study results, clarifying the types of comparisons to be made (such as to demonstrate superiority or non-inferiority over a control treatment), and selecting the sample size necessary to achieve statistical significance. Second, there are the study’s monitoring issues that arise while the study is ongoing. Next are the data analysis issues, involving both efficacy and safety analyses, that require an analysis plan that addresses the selection of the datasets for analyses, the unit of analysis, missing data, and subset analyses. Last, there is the need to interpret the study results in terms of clinical significance.


 
Table 1. Issues (Design/Conduct/Analysis)
 
In this paper, we review the issues and components of designing, conducting, and analyzing confirmatory clinical trials. Our emphasis is on identifying those features that have evolved within the last couple of decades and which have become accepted and considered essential to the appropriate implementation of confirmatory clinical trials. Our review summarizes developments and information from many sources. In particular, it incorporates the activities and guidelines of the International Conference on Harmonization (ICH, 1997a) E8 and E9 (general and statistical issues for clinical trials and selection of control groups) that involve an effort on the part of regulatory agencies in the United States, Europe, and Japan to achieve harmonization within the drug approval process.


   STUDY OBJECTIVES
 
The study objectives of a confirmatory randomized controlled trial must be stated clearly prior to the study and should appear in the study protocol. They should relate directly to the efficacy variables of the study. It is best to keep them focused and simple. The objectives will ultimately determine sample sizes and the length of the study. For example, a clinical trial involving cholesterol-lowering drugs can have objectives ranging from simple lowering of total cholesterol to reducing coronary heart disease. The former requires a two- to three-month study of 300 patients, while the latter may take 10,000 subjects followed for 5 years. A caries trial with an objective of caries reduction can take three to four years, while an investigation of the progression of the caries process in terms of demineralization and remineralization may be achievable within a year.

Statements of objectives such as "the objective of the study is to compare treatments A and B" are no longer useful and need to be replaced with statements such as "the objective is to demonstrate superiority of A over B" or "the objective is to demonstrate non-inferiority of A to B".


   TARGET AND SAMPLE POPULATIONS
 
The Target Population is the population to which the clinical trial results will be generalized. The Sample Population is the population from which study subjects will be drawn. These are often not the same. Mild cases (e.g., those who rarely get headaches in analgesic trials) are often excluded, since they will show no drug effect or may not even need the treatment during the course of the study. For safety reasons, people with severe problems are excluded (e.g., those with extremely high, hard-to-control blood pressure in an anti-hypertension drug trial). Further, subjects on concomitant medications are excluded, since these medications may make it impossible to evaluate the effect of the new treatment. The inclusion and exclusion criteria take us away from the target population. In addition, the investigators’ ability to obtain certain classes of subjects adds to the separation of the target and sample populations. For years, many clinical trials were devoid of female, uneducated, and non-white subjects.

Recent thinking emphasizes that the study subjects should correspond to those to whom the product will be marketed. Age, gender, and race, as appropriate, should be covered. The target and sample populations should be as close to the same as possible. Cost savings and other such considerations should not dictate the science. In addition, there is sentiment that the study conditions should be realistic. For over-the-counter drugs, the Food and Drug Administration (FDA) often requires, before a drug is approved, the performance of clinical trials in settings similar to those in which the drug will actually be taken. These studies are called Actual Use studies.

The implications for caries clinical trials are intriguing. Given our experience, the most informative future caries trials—with decayed, filled, and missing surface (DMFS) increments as the efficacy variable—will be those involving supervised brushing with compliant subjects (such as young females), who are caries-prone and developing second molars. While generalizing from these studies may become an issue, we believe that, to achieve statistical significance, there is a need to restrict future caries clinical trials to high-risk groups. This may also require age restriction. To ensure generalizability to the target population, the studies will need, as suitable, representation of gender, race, and socio-economic status. Achieving the balance of designing studies with high-risk subjects and still being generalizable presents a substantial challenge.


   EFFICACY VARIABLES
 
Primary efficacy variables should be kept to a minimum. The study objectives and efficacy variables need to relate clearly and sharply to each other. One of the curses of the modern age is that data can be collected on a large number of variables. As these are identified for collection, deciding upon their importance and relevance to the study objectives is crucial. Then they need to be classified as primary and secondary. Ideally, there would be one primary efficacy variable, and it would be the variable capable of providing the most clinically relevant and convincing evidence directly related to the primary objective of the trial. Multiple primary efficacy variables, however, are common. One must carefully consider how to deal with "multiple testing" or "alpha spending" where the latter term refers to distributing the Type I or alpha error associated with testing the primary efficacy variables. Other efficacy variables are classified as secondary. These usually are variables that further illuminate the primary variables and/or supply more information on the study objectives. Quality of Life scales have become standard secondary efficacy variables in many fields.

We can view traditional three-year dental caries trials as fitting into this mold, where the primary variables were "change in decayed, missing, and filled surfaces (DMFS) after three years", "change in decayed, missing, and filled teeth (DMFT) after three years", and "change in the DFS of the interproximal teeth", and where secondary variables were changes in DMFS and DMFT after one and two years. In those studies, however, the classification into primary and secondary was never made, and, more importantly, the multiple testing issues were never addressed.

Tremendous effort has been spent on addressing the multiple testing problems associated with the primary variables. Separate testing of individual variables is one approach. The development of composite variables has proved to be very useful. These range from the combinations of end-points, such as combining fatal and non-fatal coronary events and hospitalizations in cardiovascular studies, to rating scales developed by sophisticated psychometric methods. The latter are used often, for example, in such diverse settings as arthritis and psychiatric studies. Global assessment variables are also considered as measuring an overall composite. (See D’Agostino [2000] for discussion of alpha spending and secondary variables.)

There are at least two modern approaches applicable to the traditional three-year dental caries trials. One approach considers the three primary variables separately, divides the desired overall alpha (say, 0.05) by three (that is, 0.05/3 = 0.0167), and tests each primary variable at this latter level. Statistical significance on any variable makes the clinical trial successful or "positive". Product claims then relate only to the significant primary outcomes. A second modern approach would be to consider the trial successful or positive only if statistical significance at the desired overall alpha (0.05) is achieved simultaneously on all three primary outcome variables. That is, all three variables are statistically significant at the 0.05 level of significance. Product claims relate to all primary variables. In the latter approach, each variable is tested at a larger significance level (0.05 vs. 0.05/3); however, the trial is positive only if all primary efficacy variables attain statistical significance. The approach to be used must be selected before the blind of the study is broken and before the statistical analysis begins.
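The two decision rules can be made concrete with a short sketch. The Python below is illustrative only: the function names and the three p-values are hypothetical and are not taken from any caries trial. It applies the first rule (each variable tested at alpha divided by the number of primary variables, trial positive if any one is significant) and the second rule (each variable tested at the full alpha, trial positive only if all are significant) to the same set of results.

```python
def bonferroni_any(p_values, alpha=0.05):
    """Approach 1: test each primary variable at alpha / k.

    The trial is positive if at least one variable is significant
    at the reduced level.
    """
    threshold = alpha / len(p_values)
    significant = [p <= threshold for p in p_values]
    return any(significant), significant, threshold


def all_must_win(p_values, alpha=0.05):
    """Approach 2: test each primary variable at the full alpha.

    The trial is positive only if every variable is significant.
    """
    significant = [p <= alpha for p in p_values]
    return all(significant), significant, alpha


# Hypothetical p-values for DMFS, DMFT, and interproximal DFS (illustration only).
p_values = [0.010, 0.030, 0.200]

positive_1, flags_1, level_1 = bonferroni_any(p_values)
positive_2, flags_2, level_2 = all_must_win(p_values)

print(f"Approach 1: level={level_1:.4f}, significant={flags_1}, positive={positive_1}")
print(f"Approach 2: level={level_2:.4f}, significant={flags_2}, positive={positive_2}")
```

With these made-up values, the first rule declares the trial positive on one variable only, while the second rule does not declare it positive because one variable misses the 0.05 level. Whichever rule is used has to be fixed before unblinding.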

Another issue of modern interest concerns the allocation of the alpha error to secondary variables, especially when the primary variables are not statistically significant (D’Agostino, 2000; Koch, 2000; Moye, 2000; O’Neill, 2000). For example, in a cardiovascular disease trial, what is the appropriate interpretation when the primary outcome variable related to exercise ability is not significant at the 0.05 level, but the significance level for overall mortality, a secondary variable, is 0.001? (See Moye [1995], Fisher [1999], and Fisher and Moye [1999].) It is hard to ignore an important variable such as mortality. A prior allocation of alpha may need to be applied to major secondary endpoints.
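One way to picture a prior allocation of alpha is as a fixed budget divided among endpoints in the protocol. The sketch below assumes a hypothetical split of 0.04 to the primary endpoint and 0.01 to mortality; the split, the endpoint names, and the primary p-value of 0.08 are illustrative assumptions, and only the mortality significance level of 0.001 echoes the example in the text.

```python
def evaluate_endpoints(p_values, allocation, overall_alpha=0.05):
    """Compare each endpoint's p-value with its pre-specified share of alpha."""
    # Small tolerance guards against floating-point rounding in the sum.
    if sum(allocation.values()) > overall_alpha + 1e-12:
        raise ValueError("allocated alpha exceeds the overall alpha")
    return {name: p_values[name] <= allocation[name] for name in allocation}


# Hypothetical split, fixed in the protocol before the blind is broken.
allocation = {"exercise_ability": 0.04, "overall_mortality": 0.01}

# Hypothetical results: primary endpoint misses, mortality is highly significant.
p_values = {"exercise_ability": 0.08, "overall_mortality": 0.001}

decisions = evaluate_endpoints(p_values, allocation)
print(decisions)
```

Under such a pre-specified split, the mortality finding could be interpreted within the overall Type I error budget even though the primary endpoint was not significant. Without a prior allocation, the same finding has no formal alpha attached to it.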

Such a situation could happen in future caries clinical trials. For example, in a one-year study, the primary efficacy variable of DMFS increment is not statistically significant, yet a secondary efficacy variable measuring demineralization and remineralization is. Future caries clinical trials may have the latter variables as the primary variables. At a minimum, anticipation of the significance of demineralization or remineralization will be important.


   SURROGATE VARIABLES
 
A surrogate endpoint or variable in a clinical trial, as defined by Temple (1995), is a laboratory measurement or a physical sign used as a substitute for a clinically meaningful endpoint that measures directly how a patient feels, functions, or survives. It is an intermediate endpoint that is usually obtained much sooner than the desired clinical endpoint and is usually much cheaper to obtain and study (see Table 2). For example, for people who are hypertensive, the clinical endpoint of real interest is whether the treatment under investigation can reduce cardiovascular outcomes; a surrogate is the ability of the treatment to reduce blood pressure. Or, with an abdominal and pelvic surgery procedure to reduce and/or prevent adhesions, the ultimate goal is improved fertility, and reduction of bowel obstruction pain is a surrogate.


 
Table 2. Sample Size Implications with Surrogate Outcomes (Wittes et al., 1989)
 
Surrogate variables have received much attention (Prentice, 1989; Freedman et al., 1992; Fleming et al., 1994; Fleming and DeMets, 1996; Buyse and Molenberghs, 1998; D’Agostino, 2000). A surrogate endpoint is useful if it can be used in place of the desired clinical endpoint in the sense that it is a reliable predictor of the treatment’s clinical benefit. In addition, the study involving the surrogate endpoint should also have the ability to capture all the information on adverse effects associated with the treatment.

Some define an intermediate endpoint as a surrogate if it is a reliable predictor of the desired clinical endpoint. Here we use the term surrogate in the sense of Temple (1995), given above: an intermediate variable obtained sooner than the more meaningful clinical endpoint. The challenge is to establish that this intermediate variable does relate meaningfully to the desired clinical endpoint. As an example, a commonly proposed intermediate surrogate variable for cardiovascular disease is stenosis or blockage in the carotid artery, as measured by carotid ultrasound. The blockage comes much sooner than the cardiovascular disease. The question is whether it relates well to later development of the disease.

Surrogate variables have come under heavy criticism. First, regulatory agencies have stated that when the desired claim has the implication of an effect on a hard clinical outcome, such as the development of disease or death, then the clinical trial outcome variable or variables should be direct measurements of these. For example, if a treatment given to people with heart attacks is to reduce the development of a second heart attack, it is not acceptable to show that blood pressure and/or cholesterol is reduced. Rather, it is necessary to show that the development of a second heart attack is reduced by use of the treatment.

Second, surrogates have a history, still unfolding, of not being related to the desired clinical outcome (D’Agostino, Jr., 2000). In the classic example of the Cardiac Arrhythmia Pilot Study (CAPS) in 1986 and the Cardiac Arrhythmia Suppression Trial (CAST), a combination of encainide/flecainide did positively affect the surrogate endpoint of incidence of arrhythmias. It was reduced. However, total mortality and arrhythmic deaths increased. More recently, there was the Heart and Estrogen/Progestin Replacement Study (HERS). In this study, estrogen use in post-menopausal women with coronary disease did lower cholesterol. However, this reduction had no effect on coronary deaths or myocardial infarctions. Also, in the Antihypertensive and Lipid-Lowering Treatment to prevent Heart Attack Trial (ALLHAT) of 44,000 patients, 9067 were randomized to doxazosin and 15,268 to chlorthalidone. Blood pressure was reduced for both treatments, but the doxazosin group had significantly more congestive heart failure incidence cases. Analysis of the data suggests that there may be some beneficial effect of chlorthalidone beyond the blood pressure effect. If blood pressure reduction, a surrogate endpoint, had been the primary endpoint, this conclusion would not have been possible.

There are two major forces at play here. First, there is a desire in many quarters for trials with hard clinical endpoints such as death and heart attack as the primary efficacy variables. Second, there is great skepticism about the ability of surrogates to relate correctly to the desired hard endpoints. These considerations have led in some fields, such as cardiovascular research, to the performance of massive clinical trials containing routinely 4000, 10,000, or more subjects (see Table 2) and lasting a minimum of 5 years. The major features of these studies often are the randomization of the subjects, minimum collection of data, and careful follow-up, mainly to obtain data on the development of hard events. Such trials are called Large Simple Trials.

Another force involved here is the insistence on using the intention-to-treat subjects (that is, all those randomized) for the primary analysis. We say more about this later. These trials are large, long, multicenter, often multinational, and extremely expensive.

The adequacy of a surrogate endpoint to answer a question about a (different) clinically meaningful endpoint is problematic. In many settings, surrogate endpoints are being replaced by clinically meaningful endpoints.

The implication for future caries trials is important, since some recent attempts and discussion may be interpreted as attempting to replace the hard endpoint of caries with surrogates measuring the caries process. This is justified as a means of reducing the length of the study and its costs (NIH Consensus Development Conference, 2001). A major challenge to the dental field is to "validate" that the new variables measuring demineralization and remineralization are not surrogate variables, but rather are meaningful hard clinical endpoints.


   CONTROL GROUP
 
The need for an appropriate control group is always essential. Figs. 1 and 2 display the problem when a study does not contain a placebo control. The comparison of the active control C with the test treatment T in Figs. 1 and 2 indicates that the two treatments are similar. However, if a placebo group does not exist, then one can never be sure if the two treatments are better than the placebo, as Fig. 1 indicates, or no different from the placebo, as Fig. 2 indicates. A current problem in clinical trials is the justification for the use of a placebo group. For mainly ethical reasons, more and more current trials can justify the use of only active controls (Ellenberg and Temple, 2000; Temple and Ellenberg, 2000).



 
Figure 1. Comparison of test treatment T with active control C and unobserved placebo P (T and C superior to P).

 


 
Figure 2. Comparison of test treatment T with active control C and unobserved placebo P (T and C not superior to P).

 

   STUDY DESIGN (AVOID BIAS)
 
While the need to randomize and blind (mask) study subjects and investigators (and even examiners, etc.) is long-standing, and the usefulness of stratification by variables such as age, gender, and severity to achieve balance has long been recognized, a recent trend has been to require substantiation and documentation of the process. In our experience, weak documentation has sometimes called the entire study into question. For example, how does one interpret a two-arm study of 100 subjects where the same treatment is assigned to a string of 10 subjects (for example, those with good prognosis) when documentation of proper randomization is not available? Without documentation, such a sequence would most likely be interpreted as resulting from a faulty treatment assignment process. Plans for randomization and blinding should be well-conceived, -executed, and -documented.
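A documented, reproducible allocation schedule is one way to substantiate the randomization process. The sketch below assumes permuted-block randomization with two arms, a block size of 4, and a recorded seed; these choices, and the function names, are illustrative assumptions rather than a method prescribed in this paper. It also reports the longest run of identical assignments, which a balanced block of size 4 limits to at most 4 in a row.

```python
import random


def permuted_block_schedule(n_subjects, arms=("A", "B"), block_size=4, seed=20040101):
    """Build a treatment schedule from randomly permuted balanced blocks.

    The seed is recorded so the schedule can be regenerated and audited.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * per_arm  # equal numbers of each arm in a block
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_subjects]


def longest_run(schedule):
    """Length of the longest string of consecutive identical assignments."""
    longest = current = 0
    previous = None
    for arm in schedule:
        current = current + 1 if arm == previous else 1
        previous = arm
        longest = max(longest, current)
    return longest


schedule = permuted_block_schedule(100)
print("A:", schedule.count("A"), "B:", schedule.count("B"))
print("longest run of one treatment:", longest_run(schedule))
```

Under this scheme, a string of 10 consecutive subjects on the same treatment cannot arise from the schedule itself, so such a string in the trial records would point to a departure from the documented process.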

Issues such as centralized vs. within-centers randomization, stratification, and randomization by blocks are receiving detailed review. Also, traditional issues such as recruitment, subject eligibility, baseline period activities (including washout procedures to obtain good baselines and run-in procedures to evaluate potential compliance capabilities of subjects), use of prophylaxes at baselines, and actual administration of treatments have become major components of study protocols and later study review. Also, because, in many studies, concern focuses not only on whether an effect is possible, but also on whether it will be maintained, issues such as how long it will take to see a significant effect and how long it will be maintained have become essential to consider. For example, if some heroic cardiac surgery prolongs life only for a matter of weeks, enthusiasm for the declaration of effectiveness may be diminished.

Other important issues—such as patient accrual, spacing of visits, and measurements to be taken on visits—relate to the length (and thus cost) of the study and, often, patient compliance.


   STUDY DESIGN (SAMPLES)
 
In the past, many clinical trials were restricted to two treatments, and the choice between a parallel-group and a crossover study design was the major decision. Often, and correctly, the parallel-group design was selected. Today there is a movement to elaborate in at least two ways. One is to perform a factorial study in which two major questions can be answered. For example, a study can compare two antihypertension treatments and, among those subjects who also have cholesterol problems, compare lipid-lowering drugs. Correct use of a factorial design allows for independent assessment of both of these questions.

The second elaboration of clinical trials is to mount large multicenter and often multinational studies to ensure generalizability and also, in some regulatory settings, to justify the need for only one study for approval.


   TYPES OF COMPARISONS
 
As mentioned above, placebo controls have often been the optimal control group for establishing effectiveness of an experimental treatment. All that was needed was to show superiority of the treatment to the placebo in two randomized studies, and approval was justified. At times it was essential to establish that the study had sensitivity (sometimes called assay sensitivity), and an active control was also added as, for example, in analgesic studies (D’Agostino and Heeren, 1991). Here the comparison of the active control with the placebo was an essential component of the analysis. The ideal is a study with a placebo, an active control, and an experimental treatment. Now, however, with the large array of proven effective treatments, ethical considerations often remove the possibility of using a placebo. Dose-response trials are possible alternatives but also raise ethical problems, since the low dose may not be any different from a placebo. In response to this, we find that superiority trials are being replaced with active control non-inferiority trials.

A whole array of questions and new issues arises with non-inferiority trials (Blackwelder, 1982; ICH E10 Guidelines, 1997c; Ebbutt and Frith, 1998; Hauck and Anderson, 1999; Hwang and Morikawa, 1999). In Tables 3, 4a, and 4b, T and "Test" represent the value of the outcome variable for the new treatment. Similarly, C and "Control" and P and "Placebo" represent the values of the outcomes for the active control and placebo, respectively. Further, here the Tables deal with trials where higher values of the outcome variable are desirable.


 
Table 3. Types of Statistical Comparisons
 

 
Table 4a. The Non-inferiority Trial
 

 
Table 4b. Assessment of Non-inferiority in Non-inferiority Trial
 
Prior to the active control non-inferiority trial (or at least before the blinding of the trial is broken), we need to state how close the new treatment T must be to the control treatment C on the outcome variable for the new treatment to be considered non-inferior to the control. This non-inferiority margin is represented by M in Table 3. In the analysis of the trial, there are three steps. First, the non-inferiority trial must give assurance that the active control would have been superior to a placebo if a placebo had been used. This is the need to demonstrate or establish "assay sensitivity". The use of past placebo control trials often accomplishes this. To do this, one must have available historical data in which it has been established that the active control C is superior to the placebo P. The setting for these historical data should be in conditions similar to those of the present clinical trial. (This is step 1 in Table 4a.) Second, the non-inferiority active control trial should demonstrate that the new treatment T is within the non-inferiority margin M of the active control C (step 1 in Table 4b). Third, it is then necessary to use the C vs. T data (step 1 of Table 4b) in conjunction with the C vs. P placebo control trial (step 1 of Table 4a) to demonstrate that T is superior to P. This step is the putative placebo comparison. In conjunction with this step, it is often necessary to establish that not only is the new treatment superior to the placebo, but also that it retains at least a certain amount (say, 80%) of the superiority that the active control trial displayed over the placebo. (Table 4b, step 2, demonstrates this last step.) If we think of C-P as representing the difference between the active control and the placebo, and T-P as the difference between the new treatment and the placebo, then the amount retained by the new treatment is (T-P)/(C-P). (For further details, see Ellenberg and Temple, 2000; Hasselblad and Kong, 2001; Temple and Ellenberg, 2001.)
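The margin comparison and the retained fraction (T-P)/(C-P) can be illustrated numerically. The sketch below is a simplified illustration under stated assumptions: higher outcomes are better, non-inferiority is judged by whether the lower bound of a two-sided normal-theory confidence interval for T-C lies above -M, and the retained fraction is computed from point estimates only, ignoring the uncertainty in the historical C vs. P data. All numbers and function names are hypothetical.

```python
from statistics import NormalDist


def noninferiority_check(mean_t, mean_c, se_diff, margin, alpha=0.05):
    """Return (is_noninferior, lower_bound) for the difference T - C.

    T is judged non-inferior when the lower bound of the two-sided
    (1 - alpha) confidence interval for T - C lies above -margin.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = (mean_t - mean_c) - z * se_diff
    return lower > -margin, lower


def fraction_retained(mean_t, mean_c, mean_p):
    """Point estimate of (T - P) / (C - P), the share of C's effect kept by T."""
    return (mean_t - mean_p) / (mean_c - mean_p)


# Hypothetical values: T and C from the current trial, P from historical data.
mean_t, mean_c, mean_p = 9.0, 10.0, 5.0
se_diff = 0.4   # assumed standard error of T - C
margin = 2.0    # assumed non-inferiority margin M, fixed before unblinding

is_noninferior, lower = noninferiority_check(mean_t, mean_c, se_diff, margin)
retained = fraction_retained(mean_t, mean_c, mean_p)

print(f"lower bound of T - C: {lower:.3f}, non-inferior: {is_noninferior}")
print(f"fraction of control effect retained: {retained:.2f}")
```

A full putative placebo analysis would also carry the variability of the historical C vs. P estimate into the retained-fraction calculation; the point estimate here only shows how the quantities relate.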


   SAMPLE SIZE
 
Given the above, the next step in the design of the study is the determination of sample size. It is essential that studies are sized and powered adequately. This usually means that the level of significance (or size) should be 0.05 and the power at least 0.80. This should be the case for superiority, non-inferiority, and equivalency studies. Most of the necessary inputs will have been considered in the above. Other inputs, such as the interim analysis plan and meaningful alternative hypotheses, are also necessary. We discuss the issues briefly below.
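As a rough illustration of how the significance level and power enter the calculation, the sketch below uses the standard normal-approximation formula for a two-arm superiority comparison of means, n per group = 2 (z for alpha/2 plus z for power)^2 sigma^2 / delta^2. The effect size delta and standard deviation sigma are hypothetical inputs, and the formula ignores interim analyses, dropouts, and the small-sample t correction.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sided, two-sample z-test.

    delta: smallest difference in means worth detecting (assumed input)
    sigma: common standard deviation of the outcome (assumed input)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical scenarios: halving the detectable difference roughly quadruples n.
n_half_sd = n_per_group(delta=0.5, sigma=1.0)
n_quarter_sd = n_per_group(delta=0.25, sigma=1.0)
print("n per group, delta = 0.50 SD:", n_half_sd)
print("n per group, delta = 0.25 SD:", n_quarter_sd)
```

The sketch makes visible why the meaningful alternative hypothesis matters so much: the required sample size grows with the inverse square of the difference to be detected.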


   TRIAL MONITORING
 
In the past, monitoring a study's progress was essentially the function of the study sponsor. Today this is less and less the case. Monitoring the quality of the study is still the function of the sponsor; however, this responsibility is often shared with a new entity, the Independent Data Monitoring Committee (IDMC), also called the Data and Safety Monitoring Committee (DSMC). This committee usually consists of three or more people, of whom at least two are clinicians and one is a biostatistician (Armstrong and Furberg, 1995; Armitage, 1999a,b). Others, such as epidemiologists, ethicists, and patient advocates, are also members as appropriate to the study.

In addition to monitoring quality, monitoring for safety and efficacy has shifted from the sponsor to the IDMC. In many present-day clinical trials, during the course of the study, the clinical investigators and those assessing study endpoints are not privy to unblinded data. Only the IDMC is. If interim analyses are performed for evaluation of efficacy or safety, they are done solely for the IDMC. These interim analyses affect alpha spending (DeMets and Lan, 1994; O’Brien and Fleming, 1979), and the IDMC has to deal with this issue. Also, usually, the IDMC alone can break the blind, and often its reports are the study reports on safety sent to the FDA. Interim analyses for safety and efficacy have become standard features of trials, and the prominence of the IDMCs has increased accordingly.
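
One way to see how interim analyses "spend" alpha is through a spending function. The sketch below uses the O'Brien-Fleming-type spending function from the Lan-DeMets approach as one common choice; it is not necessarily the rule any particular trial or IDMC would adopt. The function name is hypothetical, and the information fractions assume three equally spaced looks, as in yearly analyses of a three-year trial.

```python
from math import sqrt
from statistics import NormalDist


def obf_type_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1)."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z = NormalDist()
    z_half = z.inv_cdf(1 - alpha / 2)
    # Very little alpha is spent early; the full alpha is reached at t = 1.
    return 2 * (1 - z.cdf(z_half / sqrt(t)))


# Assumed looks after one, two, and three years of a three-year trial.
looks = [1 / 3, 2 / 3, 1.0]
spent = [obf_type_alpha_spent(t) for t in looks]
```

The values are cumulative: the alpha available at each look is the increase over the previous look, so the early analyses use only a small share of the overall 0.05 and most of it is preserved for the final analysis.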

Other activities, such as the evaluation of the need for sample size adjustments or study length extension (Kieser and Friede, 2000), are done under the direction of the IDMC. Of course, the study sponsor decides if the sample size or length of the study will be increased.

The existence and functions of the IDMC mean a loss of control for the sponsor. For example, in a traditional three-year caries trial, the IDMC would carry out the one- and two-year analyses, and the sponsor may need to wait the full three years before unblinded data can be analyzed. This differs from past practice, in which the sponsor often analyzed the one- and two-year unblinded data.

Recently, the FDA has generated guidelines for IDMCs (2001). These support the roles mentioned above.


   ANALYSIS SETS
 
At the end of a study, there are at least three datasets that can be analyzed. All should be pre-specified before the blind is broken. First, there is the intention-to-treat (ITT) dataset, consisting of all randomized subjects. Second, there is a modified ITT dataset consisting, for example, of those randomized subjects who received and took the treatment. Third, there is the dataset of those who followed the study protocol and finished the study. This is often called the per protocol dataset.
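
The three analysis sets can be sketched as successive filters on the randomized subjects. The `Subject` fields and the function name below are hypothetical, and treating the per protocol set as a subset of the modified ITT set is an assumption; an actual protocol would define each set precisely before unblinding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    subject_id: str
    randomized: bool
    took_treatment: bool
    followed_protocol: bool
    completed_study: bool


def analysis_sets(subjects):
    """Split subjects into ITT, modified ITT, and per protocol sets."""
    itt = [s for s in subjects if s.randomized]
    modified_itt = [s for s in itt if s.took_treatment]
    per_protocol = [s for s in modified_itt
                    if s.followed_protocol and s.completed_study]
    return {"itt": itt, "modified_itt": modified_itt,
            "per_protocol": per_protocol}


# Hypothetical subjects illustrating each kind of exclusion.
subjects = [
    Subject("s1", True, True, True, True),     # in all three sets
    Subject("s2", True, True, True, False),    # dropped out before the end
    Subject("s3", True, False, False, False),  # never took the treatment
    Subject("s4", False, False, False, False), # screened, never randomized
]
sets = analysis_sets(subjects)
```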

Present-day thinking favors the ITT sample for the primary data analysis. It is the only dataset that preserves randomization and, some argue, the only one that prevents biases and justifies statistical analyses (Ellenberg, 1996; Lachin, 2000). Ideally, the primary analysis should be performed on the ITT data, then analyses with the modified ITT data and the per protocol data should be done to demonstrate consistency and also to supply data on those who actually took the treatment. However, with missing data, such as arise from dropouts, there can be serious problems with the use of the ITT data. (We say more on this below.) Unfortunately, the use of other datasets can introduce even more serious biases. In general, they are not appropriate as the primary analysis dataset, and their use for any analysis needs serious justification.

In the traditional three-year caries trial, the subjects finishing the three years and supplying at least baseline and three-year data would form the analysis set. This set is a per protocol dataset and may reflect up to 30% dropout. This analysis does have justification, and we return to this example below in our discussion of missing data.


   UNIT OF ANALYSIS
 
Over the years, much confusion has arisen over how to deal with multiple measurements on the same subject collected either at one time point or over time. In periodontal trials, multiple teeth were measured and each tooth analyzed as if it were a separate subject. In other studies, multiple measurements were made over time (longitudinal) and each time point analyzed independently of the others. New methods such as generalized estimating equations and random-effects models can easily deal with these problems. They can deal with correlations within subjects and across time (Diggle et al., 1996; Cnaan et al., 1997; Burton et al., 1998; Albert, 1999) and also deal with the problem of multiple testing over time. These methods offer great promise for increased efficiency in the analysis of clinical trials.
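
To illustrate the unit-of-analysis point, the sketch below collapses tooth-level measurements to one summary value per subject, so that the subject rather than the tooth becomes the unit of analysis. This is a deliberately simpler technique than generalized estimating equations or random-effects models, not an implementation of them, and it gives up the efficiency those methods can offer. The function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def subject_level_means(tooth_measurements):
    """Average tooth-level values within each subject.

    tooth_measurements: iterable of (subject_id, value) pairs,
    one pair per measured tooth.
    """
    by_subject = defaultdict(list)
    for subject_id, value in tooth_measurements:
        by_subject[subject_id].append(value)
    return {sid: mean(values) for sid, values in by_subject.items()}


# Hypothetical data: five teeth measured, but only two independent subjects.
measurements = [("s1", 2.0), ("s1", 4.0), ("s1", 3.0),
                ("s2", 5.0), ("s2", 7.0)]
per_subject = subject_level_means(measurements)
```

Analyzing the two subject-level means, rather than the five teeth, avoids treating correlated measurements from the same mouth as independent observations.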


   MISSING DATA
 
Almost every study will have missing data due to some measurements not having been taken on visits, blood specimens being destroyed or not being analyzed, dropouts, etc. In the past, the amount of missing data was anticipated, and the sample size was selected to be sufficiently large to accommodate this. All further considerations were simply ignored. This is no longer considered acceptable. Heroic efforts are often exerted to keep subjects in a study, and where data are missing, imputation methods are often applied (Shih and Quan, 1997; Myers, 2000; Verbeke et al., 2001). One common method used in longitudinal studies is to take the last observation available on a subject and move it forward for analyses. This is called the last observation carried forward (LOCF) method (Siddiqui and Ali, 1998). It is often a reasonable procedure in a placebo-controlled superiority trial where subjects improve over time. Moving the last observation forward may penalize the treatment and favor the null hypothesis of no difference between the treatment and the placebo. In other settings, it may be completely inappropriate.
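
The LOCF rule described above can be sketched in a few lines. The function name and the visit values are hypothetical, and `None` is assumed to mark a missed visit.

```python
from typing import List, Optional


def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing visit with the most recent observed value.

    Visits before the first observed value stay missing (None),
    because there is nothing to carry forward.
    """
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in visits:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical subject observed at baseline and year 1, then lost to follow-up.
observed = [4.0, 3.5, None, None]
completed = locf(observed)
```

Here the year 1 value stands in for years 2 and 3, so any change that would have occurred after dropout is not reflected in the analysis.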

A procedure very much recommended today is to understand the mechanism that leads to missing data, perform imputation, analyze the ITT sample with imputed data, and evaluate the effects of the imputations on the results of the analyses. In this process, terms such as "missing completely at random", "missing at random", and "non-random/non-ignorable missingness" are used. The term "at random" implies that there is a random mechanism involved, and knowledge of the treatment group membership, for example, may be sufficient to obtain good imputed values. The term "non-ignorable" implies that more information may be needed to obtain imputed values. Methods do exist that may help. Recently, the methods of multiple imputation have been suggested to help with the non-ignorable aspects of missing data and also to accommodate the adjustments in precision that should accompany imputation of missing data (Schafer, 1999). The need for sensitivity analysis to judge the sensitivity to imputation schemes is also receiving attention. There are many open questions here, and the best or even reasonable procedures are not known.
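
The adjustment in precision that accompanies multiple imputation can be illustrated with the standard combining rules usually attributed to Rubin. The sketch assumes the same analysis has already been run on each imputed dataset, yielding one estimate and one variance per dataset; the function name and numbers are hypothetical, and the imputation step itself is not shown.

```python
from statistics import mean, variance


def pool_imputations(estimates, variances):
    """Combine results from m imputed datasets.

    estimates: treatment-effect estimate from each imputed dataset.
    variances: squared standard error of each of those estimates.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching results from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # spread across imputations (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed datasets.
pooled, total = pool_imputations([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance exceeds the average within-imputation variance whenever the imputed datasets disagree, which is how the uncertainty due to the missing values enters the final inference.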

For caries clinical trials, using the ITT dataset with imputation for missing values has tremendous implications. In the traditional three-year caries study, up to 30% of the subjects may drop out solely because of moving out of the study area. Such dropouts are "missing completely at random" dropouts and technically can be ignored. Imputation of missing data by moving forward the last observation for these subjects (i.e., baseline, one-year, or two-year values) will almost guarantee non-significance, because there will not be enough separation between the new treatment and the control treatment. For future caries trials, we need careful consideration of modified ITT or per protocol analyses as the appropriate analyses. We also need an investigation of the effects of the many missing data methods on caries trials data.


   SAFETY
 
Safety data have always been important, but often not rigorously analyzed. Recent developments have led to more rigorous analysis. For example, data mining techniques have been developed and successfully applied to safety data to reveal structures not previously seen (Chuang-Stein et al., 2001; Gait et al., 2000). Drug approval may now very well involve not only efficacy studies, but also substantial safety studies with complete analyses. Further, because there are so many effective treatments, the choice among these effective treatments may rest ultimately upon the safety issues.


   SUBSETS
 
Subset analysis has also become an essential component of clinical trials. Consistency of effects across subsets is usually demanded. This is especially true when a single large study is being used for regulatory approval.


   CLINICAL SIGNIFICANCE
 
Last, after the statistical analysis has established significance of the treatment under investigation, with its corresponding consistency across subsets, the quantification of effect size or clinical effect has become a routine component of clinical trials. Clinical significance is necessary to judge the efficacy of a treatment. It is not only necessary to know that there is a statistically significant effect, but it is also important to ask if this effect is clinically meaningful. Meta-analysis techniques have proved to be useful devices for generating these measures by combining effects over all available well-controlled clinical trials.
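
As a sketch of how effects can be combined across trials, the code below applies fixed-effect, inverse-variance weighting, one of several meta-analytic approaches and not necessarily the one used in any given review. The function name and the inputs are hypothetical, and the sketch assumes that each trial reports an effect estimate on a common scale together with its variance.

```python
def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted average of trial-level effects.

    Returns (pooled effect, variance of the pooled effect).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]  # more precise trials weigh more
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return pooled, 1.0 / total_weight


# Hypothetical effects from two trials, the first three times as precise.
pooled_effect, pooled_variance = fixed_effect_pool([1.0, 3.0], [1.0, 3.0])
```

The pooled estimate sits closer to the more precise trial, and whether its magnitude is clinically meaningful remains a judgment that the statistics alone do not settle.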

For caries clinical trials using DMFS increments as the primary outcome variable, we can argue that any statistically significant value of one treatment vs. another can be considered clinically significant, since this translates into surfaces saved from caries. Clinical significance for demineralization and remineralization is more complicated. The direct correspondence of a change in these to clinically meaningful outcomes is not apparent.


   SUMMARY: SO WHAT IS NEW?
 
In the above, we have attempted to review new developments in clinical trials while also putting them into the context of the various components of the trials. We have also illustrated many of the features with caries clinical trials. There are many important features, both new and of long-standing good practice. Some of the major ones are as follows: