J Dent Res 83(Spec Iss C):C113-C115, 2004
© 2004 International and American Associations for Dental Research


PROCEEDINGS
Clinical

Current Issues in Clinical Equivalence Trials

W.C. Blackwelder

8613 Hempstead Avenue, Bethesda, MD 20817-6711, USA; wcb{at}boo.net


   ABSTRACT
A clinical trial designed to show that an experimental treatment E is similar to a control treatment S in a specified direction is a one-sided equivalence or similarity trial; in the terminology of the International Conference on Harmonisation, it is a non-inferiority trial (ICH, 1998). We design such a study to show that E is not worse than S (often an accepted or standard treatment) by as much as a pre-specified margin θ0. The quantity θ0 can be either a difference or ratio of an appropriate outcome in individuals treated with E and S. A critical issue is whether one can conclude from a non-inferiority trial that E is effective. Closely related is an appropriate choice of θ0, which should be substantially less than the estimated effect of S if available from previous studies; θ0 should also be acceptable to clinicians, either because of advantages of E or because a difference or ratio less than θ0 is considered unimportant clinically. Another possible approach for showing that E is effective is to estimate its effect compared with placebo from historical data. If previous studies that consistently show an effect of S are not available, alternative study designs should be considered. Findings of superiority or non-inferiority of E, when the study was planned to show the other, are possible and may be supportable. A finding that E is at the same time statistically significantly worse than S and "non-inferior" to S should not be a problem, if the criterion θ0 is appropriate and this possibility was considered in the protocol. Various sorts of non-adherence may make treatments appear similar, even if they are not. In particular, random non-adherence of study participants to the assigned treatment regimen may cause an intention-to-treat analysis to give a misleading result of similarity. Thus, maintaining a high degree of adherence to protocol is especially important in an equivalence or non-inferiority trial. Interim analysis does not present statistical problems in these trials; early stopping may not be wise in many cases, however, because strong interim evidence for non-inferiority may actually be an indicator that E is superior to S.

KEY WORDS: equivalence • non-inferiority • superiority • clinical trials


   INTRODUCTION
An equivalence trial is a clinical trial designed to evaluate whether an experimental treatment E is similar to a control treatment S, by an appropriate definition of similarity. (The trial may involve three or more treatments, but for simplicity this paper is concerned only with studies of two treatments.) S will generally be an accepted active control treatment, and we can think of it as "standard" treatment. E may or may not actually be superior to S; often it is assumed that E and S have equal effects, hence the term ‘equivalence’ trial. The concept of similarity may be either one- or two-sided. In International Conference on Harmonisation (ICH) terminology, developed by regulatory and industry representatives from the European Union, the United States, and Japan, a two-sided trial of similarity is referred to as an ‘equivalence’ trial and a one-sided trial as a ‘non-inferiority’ trial (ICH, 1998). We compare the effects of treatments E and S through a measure θ, which is most often a difference of proportions or means, or a ratio of proportions, means, incidence rates, hazards, or odds. A two-sided equivalence trial is designed to show that the effects of E and S do not differ by as much as pre-specified quantities θ01 in one direction and θ02 in the other, i.e., θ01 < θ < θ02. An example is a bioequivalence trial, in which the amounts of a drug in blood plasma or serum might be compared for two treatments. Clinical equivalence studies are usually one-sided, i.e., designed to show that E is not worse than S by as much as a pre-specified quantity θ0. In dental research, a one-sided conclusion of similarity has been referred to as ‘at least as good’ (Kingman, 1992). The constants θ0, or θ01 and θ02, are called the ‘margin of non-inferiority’ or ‘margins of equivalence’.

The comparative measure of outcomes, θ, is arbitrarily defined here so that a difference > 0, or a ratio > 1, indicates that S is superior to E. For example, if µE and µS represent caries incidence rates or mean increments of DMFS scores for E and S, respectively, the difference is defined as µE - µS, and the ratio is defined as µE/µS. (In dental research, treatment effects have typically been expressed as percent reduction; if E is actually superior to S, the percent reduction in caries incidence or mean DMFS increment can be written 100 [1 - µE/µS]%.)

Similarity may be demonstrated by either a confidence interval (CI) or a hypothesis test, as long as the CI and test are based on the same statistic and are defined consistently with one another, so that the CI contains all values of θ, and only those values, that the test does not reject. For example, if θ is a ratio of proportions, a test and CI can be defined by the method of likelihood scores (Gart and Nam, 1988). A CI readily lends itself to graphical presentation; in addition, basing analysis on a CI avoids the potential problem of choosing the wrong null hypothesis. A hypothesis test shows whether the data are consistent with the null hypothesis, and a CI shows the hypotheses that are consistent with the data.

The Figure depicts a two-sided 100(1 - 2α)% CI for θ with upper limit less than θ0, which we assume was specified previously as the non-inferiority margin. Since the lower limit is not necessary in determining whether the data satisfy the definition of non-inferiority, we could use a one-sided 100(1 - α)% CI. Thus, we can say that, with type I error rate α (equivalently, at confidence level 1 - α), E is not worse than S by as much as θ0, and E and S are similar (i.e., E is non-inferior to S) according to the trial definition. However, the additional information given by the lower limit can be important in interpreting the data. For example, suppose θ = µE/µS is the ratio of caries incidence rates; then the lower confidence limit could be > 1, indicating that E is statistically significantly worse than S. Such a possibility illustrates how the terms ‘equivalence’ and ‘non-inferiority’ can be misleading and should not be taken literally, since E can be both inferior to (significantly worse than) S and, according to the definition, ‘non-inferior’ to S. This possibility is discussed further in a later section.



Figure. A 100(1 - 2α)% confidence interval for θ, around the point estimate of θ. An upper limit less than θ0 allows a conclusion of similarity (non-inferiority) to be drawn.

 
For the example in the Figure, the appropriate null hypothesis is H0: µE/µS ≥ θ0, tested against the alternative hypothesis H1: µE/µS < θ0 at (one-sided) significance level or type I error rate α. Assuming the test statistic and CI are defined consistently, the test will reject the null hypothesis if and only if the upper limit of the 100(1 - 2α)% CI is less than θ0.
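As a minimal sketch of this decision rule, the following Python computes a Wald-type confidence interval for a ratio of incidence rates on the log scale, together with the matching one-sided test of the null hypothesis that the ratio is at least the margin. The normal approximation, the Poisson assumption for event counts, the function name, and the data are all illustrative assumptions rather than the method or results of any trial discussed here; any other consistently defined test and CI pair could be substituted.

```python
import math
from statistics import NormalDist

def noninferiority_rate_ratio(events_e, time_e, events_s, time_s, theta0, alpha=0.025):
    """Wald CI and one-sided test for theta = rate_E / rate_S (Poisson counts assumed)."""
    log_ratio = math.log((events_e / time_e) / (events_s / time_s))
    se = math.sqrt(1 / events_e + 1 / events_s)  # SE of the log rate ratio
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Two-sided 100(1 - 2*alpha)% confidence limits, back-transformed to the ratio scale
    lower = math.exp(log_ratio - z_crit * se)
    upper = math.exp(log_ratio + z_crit * se)
    # One-sided test of H0: theta >= theta0 against H1: theta < theta0
    z = (log_ratio - math.log(theta0)) / se
    p_value = NormalDist().cdf(z)
    return {
        "ratio": math.exp(log_ratio),
        "ci": (lower, upper),
        "p_value": p_value,
        "reject_h0": p_value < alpha,
        "upper_below_margin": upper < theta0,
    }

# Hypothetical data: 210 events in 1000 person-years on E, 200 in 1000 on S
result = noninferiority_rate_ratio(210, 1000.0, 200, 1000.0, theta0=1.3)
```

Because the test and the interval use the same statistic and the same critical value, `reject_h0` and `upper_below_margin` always agree, which is the consistency requirement described above.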

It has frequently been pointed out that the null hypothesis of no difference (H0: µE/µS = 1) is not appropriate in an equivalence trial (see, for example, Dunnett and Gent, 1977; Westlake, 1979; Blackwelder, 2001). In some trials, failure to reject the hypothesis of no difference has been considered sufficient to conclude that the treatments are equivalent, and this erroneous logic is unfortunately still sometimes used. An important difference may exist and yet the null hypothesis of no difference may not be rejected, because of an insufficient sample size, unexpectedly large variability, or some other aspect of study design or conduct. On the other hand, in a study with high power, this hypothesis might be rejected when the difference is small and clinically of little consequence.
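A small numerical illustration may help. The sketch below, in Python, analyzes a hypothetical small trial using a Wald interval for a difference of proportions; the counts, the margin of 0.10, and the function name are assumptions chosen only for illustration. The test of no difference is far from significant, yet the upper confidence limit lies well above the margin, so similarity cannot be concluded.

```python
import math
from statistics import NormalDist

def compare_proportions(x_e, n_e, x_s, n_s, margin, alpha=0.025):
    """Wald analysis of the difference p_E - p_S (proportions with an unfavorable outcome)."""
    p_e, p_s = x_e / n_e, x_s / n_s
    diff = p_e - p_s
    se = math.sqrt(p_e * (1 - p_e) / n_e + p_s * (1 - p_s) / n_s)
    # Two-sided p-value for the (inappropriate) null hypothesis of no difference
    p_no_difference = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    # Upper limit of the two-sided 100(1 - 2*alpha)% CI, to be compared with the margin
    upper = diff + NormalDist().inv_cdf(1 - alpha) * se
    return p_no_difference, upper, upper < margin

# Hypothetical small trial: 12/40 unfavorable outcomes on E, 8/40 on S, margin 0.10
p_value, upper, noninferior = compare_proportions(12, 40, 8, 40, margin=0.10)
```

Here `p_value` exceeds 0.05 while `noninferior` is `False`: the data are compatible both with no difference and with a difference much larger than the margin.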

Equivalence trials in dental research have been a consideration for some time. For example, in 1991, an ad hoc committee recommended to the Council on Dental Therapeutics that efficacy of anticaries agents be established through a non-inferiority criterion (Proskin et al., 1992). The criterion was based on an average of results for a 200-ppm-F preparation and a ‘Gold Standard’ preparation.


   ISSUES IN EQUIVALENCE TRIALS
Showing that E is Effective; Choice of θ0
It is critical that, if E is shown to be non-inferior to S according to a criterion θ0, it can also be concluded that E is effective. There are several requirements for such a conclusion:

  1. There must be convincing prior evidence of the effectiveness of S compared with placebo or no treatment; the effectiveness of S must have been consistently demonstrated.
  2. It must be clear that S is effective in the current study. (For practical purposes, it is generally assumed that S must have the same effect as in previous trials, but this is not strictly necessary mathematically.)
  3. The current study must be capable of distinguishing between an effective and an ineffective treatment—in ICH parlance, the study must have ‘assay sensitivity’ (ICH, 2000).

If there are no studies establishing the effectiveness of S (or none under the conditions of the current trial), a finding of non-inferiority to S may not be sufficient to establish the effectiveness of E. All of the above requirements imply that relevant conditions in the current trial should be the same as in previous trials of S. Depending on the setting, relevant conditions might include the types of patients in the trial, details of treatment preparation and administration, concomitant treatment, and other factors. For example, S might have been studied previously in an area without fluoridation of the water supply, but the current trial might be planned for an area with fluoridation. In that case, it may not be clear that either S or E is effective, unless a placebo control can be included in the trial. A placebo may be ethical in some settings, even if effective therapy exists (Temple and Ellenberg, 2000). Possible alternatives include an add-on trial, in which all randomized patients receive some standard therapy and the randomization is to E or placebo as additional treatment; a dose-response trial; and other designs (ICH, 2000; Temple and Ellenberg, 2000).

The criterion or margin θ0 is a crucial element of a well-designed equivalence or non-inferiority trial. It should be chosen so that if E is not worse than S by as much as θ0, then it should be clear that E is an effective treatment. In addition, E should be acceptable to clinicians, either because the difference between E and S is not important clinically or because of some advantage of E (e.g., it causes fewer adverse side-effects, is easier to administer, or costs less). Suppose that, from previous studies, we have a good estimate of the effect of S and that the estimate is relevant in the present trial. Then the margin θ0 should be less than the effect of S; one possibility is a fraction of the effect, to preserve a desired proportion of the effect (Temple and Ellenberg, 2000). Another possibility is to set θ0 equal to a lower confidence limit for the effect, or a fraction of a lower confidence limit. Research into an appropriate choice of θ0 is of considerable current interest. Alternatively, one might estimate the effect of E, compared with placebo, from the trial results and historical data. Although θ0 in that case would not appear explicitly in the analysis, it or a similar quantity would still be relevant for sample size determination.
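One way to make the "fraction of the effect" idea concrete for a ratio measure is sketched below in Python. It assumes, purely for illustration, that the effect of S is summarized as a placebo-to-S rate ratio greater than 1 (or a lower confidence limit for that ratio) and that the fraction is preserved on the log scale; the function name and the numbers are hypothetical, and other scales or rules may be equally defensible.

```python
import math

def margin_preserving_fraction(effect_ratio, fraction_preserved):
    """Margin theta0 for theta = rate_E / rate_S, derived on the log scale.

    effect_ratio: historical placebo-to-S rate ratio (> 1 if S is effective),
                  or a lower confidence limit for it.
    fraction_preserved: share of the log-scale effect of S that E must retain.
    """
    if effect_ratio <= 1:
        raise ValueError("no demonstrated effect of S; a margin cannot be derived")
    if not 0 <= fraction_preserved < 1:
        raise ValueError("fraction_preserved must be in [0, 1)")
    return math.exp((1 - fraction_preserved) * math.log(effect_ratio))

# Hypothetical: lower confidence limit of 1.44 for the placebo/S rate ratio,
# with half of the log-scale effect to be preserved
theta0 = margin_preserving_fraction(1.44, 0.5)
```

Under these assumptions, if the true ratio of E to S is below the resulting margin, the placebo-to-E ratio exceeds the effect ratio raised to the power of the preserved fraction, so E retains at least that share of the effect of S.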

Superiority Conclusion from a Non-inferiority Trial
A trial may be designed to show non-inferiority but actually show a statistically significant improvement for E compared with S. The question then arises as to whether the investigator is entitled to claim superiority for E over S, rather than merely the non-inferiority for which the trial was planned. From a statistical point of view, this is not a problem, as is readily seen from a confidence interval. If the upper confidence limit in the Figure is to the left of both θ0 and 1 (or θ0 and 0 for a difference), then both θ0 and 1 can be rejected as plausible values of θ with type I error rate α (i.e., at confidence level 1 - α). That a conclusion of superiority is warranted in this situation can also be argued from a hypothesis testing point of view (Morikawa and Yoshida, 1995).

In many cases, the non-inferiority and superiority conclusions may be based on different groups of individuals (Wiens, 2001). If so, a single CI will not apply to both findings. However, a more general result is easily obtainable from a straightforward unconditional probability argument, applicable to either a CI or a hypothesis test and applicable whether the analysis datasets are the same or different. Let A refer to a non-inferiority result on dataset 1, and let B refer to a finding of superiority on dataset 2. Then the probability of both A and B, given the appropriate null value of θ (for superiority, 1 for a ratio and 0 for a difference; for non-inferiority, θ0), is less than or equal to the probability of either A or B alone, and the type I error rate is not inflated.
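Written out for a ratio measure, and assuming that each of the two analyses is carried out at one-sided level α, the argument amounts to the following pair of inequalities. This is only a restatement of the paragraph above, not an additional result.

```latex
\begin{align*}
P(A \cap B \mid \theta = \theta_0) &\le P(A \mid \theta = \theta_0) \le \alpha, \\
P(A \cap B \mid \theta = 1) &\le P(B \mid \theta = 1) \le \alpha .
\end{align*}
```

For a difference measure, the null value 1 in the second line is replaced by 0.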

Even though the argument is straightforward, when regulatory authorities will be considering the trial in an application for licensure or registration, it is advisable to discuss the possibility of showing superiority in a non-inferiority trial with them in advance of the trial—at least in advance of the analysis.

Non-inferiority Conclusion from a Trial Designed for Superiority
It may also be possible to support a conclusion of non-inferiority from a trial that was designed to show superiority. This case, however, is more problematic, unless the possibility is considered in advance and an appropriate criterion θ0 for non-inferiority is specified before the analysis. Given θ0, the mathematical justification is straightforward, since the unconditional probability argument given above applies here also. However, unless θ0 has been specified in advance and the issue discussed with regulatory authorities if appropriate, agreement on the non-inferiority criterion may be difficult.

Non-inferiority Conclusion when E is Significantly Worse than S
Let θ = µE/µS be the ratio of caries incidence rates. Suppose that the lower confidence limit is greater than 1 and the upper limit is less than the pre-specified margin θ0, indicating that E is significantly worse than S and at the same time ‘non-inferior’ to S. Such a finding should not be a problem if clinicians are really willing to use E as long as it meets the non-inferiority criterion. As was pointed out earlier, either θ0 represents a clinically unimportant difference, or else other advantages of E outweigh a small benefit of S (smaller than θ0) relative to E in the trial endpoint.
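The possible readings of a confidence interval for a ratio, including this combined finding, can be summarized in a short Python sketch. The function name, the labels, and the example limits are hypothetical; the logic simply compares the confidence limits with 1 and with the margin.

```python
def interpret_ratio_ci(lower, upper, theta0):
    """Conclusions from a CI for theta = rate_E / rate_S, where theta > 1 favors S."""
    if not (lower <= upper and theta0 > 1):
        raise ValueError("need lower <= upper and a margin theta0 > 1")
    conclusions = []
    if upper < theta0:
        conclusions.append("non-inferior")         # E not worse than S by as much as theta0
    if upper < 1:
        conclusions.append("superior")             # E significantly better than S
    if lower > 1:
        conclusions.append("significantly worse")  # E significantly worse than S
    return conclusions or ["inconclusive"]

# Hypothetical interval (1.02, 1.15) with margin 1.25
interpret_ratio_ci(1.02, 1.15, theta0=1.25)  # returns ['non-inferior', 'significantly worse']
```

An interval lying entirely between 1 and the margin yields both labels at once, which is why the protocol should state in advance how such a result will be interpreted.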

Adherence to Protocol
In a trial designed to show superiority, there is an incentive for investigators to adhere rigorously to the protocol, to obtain the desired result. That incentive is not present in an equivalence or non-inferiority trial, since various sorts of lack of adherence can make treatments appear similar, even if there is a true difference between them (Jones et al., 1996).

A particular adherence issue is compliance of study participants with the treatment regimen. In a trial designed to show superiority, the primary analysis should generally be based on data from all randomized patients (intention-to-treat, or ITT, analysis), or at least from all patients who received treatment. However, since random non-compliance will tend to cause treatments to appear similar, even if they are different, ITT analysis may lead to a misleading conclusion of non-inferiority. To be convincing in the presence of patient non-compliance, the conclusion should also be justified from appropriate analysis of compliers (Lewis and Machin, 1993).
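The dilution effect can be illustrated with a deliberately simple model, sketched in Python below. It assumes that a random fraction of participants in each arm takes no treatment at all and therefore experiences the untreated incidence rate; this model, the function name, and all the rates are assumptions made only for illustration.

```python
def itt_rate_ratio(rate_e, rate_s, rate_untreated, noncompliance):
    """Expected ITT rate ratio E/S when a random fraction of each arm takes no treatment."""
    if not 0 <= noncompliance <= 1:
        raise ValueError("noncompliance must be a proportion between 0 and 1")
    # Each arm's ITT rate is a mixture of its compliers and its untreated non-compliers
    diluted_e = (1 - noncompliance) * rate_e + noncompliance * rate_untreated
    diluted_s = (1 - noncompliance) * rate_s + noncompliance * rate_untreated
    return diluted_e / diluted_s

# Hypothetical rates: E = 2.8, S = 2.0, untreated = 4.0, so the true ratio is 1.4
ratio_full_compliance = itt_rate_ratio(2.8, 2.0, 4.0, 0.0)
ratio_30pct_noncompliance = itt_rate_ratio(2.8, 2.0, 4.0, 0.3)
```

With full compliance the expected ratio is 1.4, above a margin of, say, 1.3; with 30% non-compliance in both arms it falls to about 1.22, below that margin, even though the treatments themselves are unchanged.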

Thus, strict adherence to the protocol, by both investigators and study participants, is especially important in an equivalence or non-inferiority trial. Understanding of this point by investigators and randomized participants, as well as careful monitoring of all aspects of adherence, is therefore important. If appropriate, a run-in period to monitor compliance before randomization may be helpful.

Interim Analysis
As with clinical trials in general, an equivalence or non-inferiority trial can be evaluated in an interim analysis for possible early stopping or sample size re-estimation (Blackwelder, 2001). Familiar group sequential methods (Jennison and Turnbull, 1999) are appropriate, with θ0, rather than a ratio of 1 or difference of 0, taken as the relevant bound for a confidence limit or the null value for a hypothesis test. However, investigators may wish to consider whether early stopping because of strong evidence for non-inferiority is wise. If such evidence is present at an interim analysis, it is likely that the data show a trend toward superiority of E, unless the analysis is very close to the planned end of the trial. The possibility of demonstrating superiority may be a strong incentive to continue the trial.

Serial Non-inferiority Trials
A special illustration of the difficulty of concluding effectiveness from non-inferiority is a series of trials consisting of a placebo-controlled trial of S, followed by a non-inferiority trial of E1 and S, then a non-inferiority trial of E2 and E1, etc. Very quickly in such a series, it may become infeasible or even impossible to conclude statistically that the new experimental treatment is effective based on a finding of non-inferiority to its active control. To establish the effectiveness of new treatment Ei, one might design a trial to show Ei non-inferior to the treatment S originally found effective. If this is not feasible, it may be necessary to show that Ei is superior to placebo or to an effective active treatment, or to consider some other alternative to a non-inferiority trial.
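A rough worst-case calculation shows how quickly this can happen. The Python sketch below treats the margin as a bound on the true ratio at each step, ignores sampling error, and assumes the same ratio margin in every trial and a constant placebo-to-S effect; these are simplifying assumptions, and the function name and numbers are hypothetical.

```python
def generations_until_effect_exhausted(theta0, placebo_to_s_ratio):
    """Smallest k with theta0**k >= placebo/S ratio.

    After k successive non-inferiority steps, the worst-case ratio of the newest
    treatment to S is theta0**k; once that reaches the placebo-to-S ratio, the
    newest treatment could be no better than placebo.
    """
    if theta0 <= 1 or placebo_to_s_ratio <= 1:
        raise ValueError("both theta0 and placebo_to_s_ratio must exceed 1")
    k = 1
    while theta0 ** k < placebo_to_s_ratio:
        k += 1
    return k

# Hypothetical: margin 1.2 at every step, placebo/S rate ratio 1.5
steps = generations_until_effect_exhausted(1.2, 1.5)
```

With these numbers, the worst-case ratios after one, two, and three steps are 1.2, 1.44, and 1.728, so by the third treatment in the series non-inferiority to the immediate predecessor no longer rules out a treatment that is no better than placebo.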


   DISCUSSION
This paper has considered briefly some important issues in equivalence or non-inferiority trials comparing two active treatments. Perhaps the most critical is whether the trial can justify a conclusion that an experimental treatment is effective. If not, an alternative design should be considered; it may be that the trial should be planned to show superiority of the experimental treatment. Sponsors and investigators should be very careful about conducting a trial to show effectiveness of an experimental treatment through similarity to an active control.

Since investigators do not know the results of the trial before conducting it, it may be best in many cases not to think of the trial as a ‘superiority’ or ‘non-inferiority’ trial. Both possibilities should be considered and addressed in the protocol, as should a finding of both inferiority of the experimental treatment and, according to the criterion θ0, ‘non-inferiority’. When it seems reasonable that the experimental treatment may be superior, it may be feasible to plan for a trial size that can give high power both for that result and for non-inferiority.
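A minimal sketch of such planning, in Python, uses the standard normal-approximation sample size formula for a difference of means with a common, known standard deviation. The formula's assumptions, the function name, and the planning values (standard deviation, margin, and anticipated true difference) are all illustrative and not taken from any trial.

```python
import math
from statistics import NormalDist

def n_per_group(null_value, true_diff, sd, alpha=0.025, power=0.90):
    """Per-group sample size to reject H0: theta >= null_value, theta = mean_E - mean_S.

    Uses null_value = theta0 for non-inferiority and null_value = 0 for superiority.
    """
    if true_diff >= null_value:
        raise ValueError("the anticipated true difference must lie below the null value")
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (sd * z / (null_value - true_diff)) ** 2)

# Hypothetical planning values: SD 4.0, margin theta0 = 1.0,
# and E anticipated to be better than S by 0.8 (true_diff = -0.8)
n_noninferiority = n_per_group(1.0, -0.8, 4.0)
n_superiority = n_per_group(0.0, -0.8, 4.0)
n_planned = max(n_noninferiority, n_superiority)
```

Under these assumptions the superiority objective drives the trial size, because the anticipated difference is much closer to 0 than to the margin; whether the larger trial is feasible is a practical judgment.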


   ACKNOWLEDGMENTS
 
The author thanks Albert Kingman for very helpful discussions.


   FOOTNOTES
 
Presented at the International Consensus Workshop on Caries Clinical Trials, Glasgow, Scotland, January 7–10, 2002


   REFERENCES
Blackwelder WC (2001). Equivalence trials. In: Biostatistics in clinical trials. Redmond CK, Colton T, editors. New York: Wiley, pp. 179–185.

Dunnett CW, Gent M (1977). Significance testing to establish equivalence between treatments, with special reference to data in the form of 2 x 2 tables. Biometrics 33:593–602.

Gart JJ, Nam J (1988). Approximate interval estimation of the ratio of binomial parameters: a review and corrections for skewness. Biometrics 44:323–338.

International Conference on Harmonisation (1998). Guidance E9: statistical principles for clinical trials. Fed Register 63(179) or http://www.ifpma.org/ich1.html.

International Conference on Harmonisation (2000). Guidance E10: choice of control group and related issues in clinical trials. http://www.ifpma.org/ich1.html.

Jennison C, Turnbull BW (1999). Group sequential methods with applications to clinical trials. New York: Chapman and Hall.

Jones B, Jarvis P, Lewis JA, Ebbutt AF (1996). Trials to assess equivalence: the importance of rigorous methods. Br Med J 313:36–39.

Kingman A (1992). Specific statistical considerations relevant to the design and analysis of gingivitis trials demonstrating product superiority or equivalence. J Periodontal Res 27:378–389.

Lewis JA, Machin D (1993). Intention to treat: who should use ITT? Br J Cancer 68:647–650.

Morikawa T, Yoshida M (1995). A useful testing strategy in phase III trials: combined test of superiority and test of equivalence. J Biopharm Stat 5:297–306.

Proskin HM, Chilton NW, Kingman A (1992). Interim report of the ad hoc Committee for the Consideration of Statistical Concerns Related to the Use of Intra-oral Models in Submissions for Product Claims Approval to the American Dental Association. J Dent Res 71(Spec Iss):949–952.

Temple R, Ellenberg SS (2000). Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: ethical and scientific issues. Ann Intern Med 133:455–463.

Westlake WJ (1979). Statistical aspects of comparative bioavailability trials. Biometrics 35:273–280.

Wiens BL (2001). Something for nothing in noninferiority/superiority testing: a caution. Drug Inf J 35:241–245.



