## Introduction

Interim analysis of accruing information in clinical trials is necessary in order to monitor for unexpectedly large treatment effects and for excess toxicity. In many clinical trials survival may be one of the main outcome measures, and it would clearly be unethical and unacceptable to continue recruiting patients to the trial if early results provide conclusive evidence of a convincing superiority of one or other treatment policies. These considerations have led many clinical trials organizations to institute formal procedures for regular monitoring and interim analyses of their trials, especially those trials which are large, have lengthy recruitment periods, and involve patient survival. In many cases the results of such monitoring are reviewed by specially convened Data Monitoring Committees.

* Correspondence to: P. M. Fayers, Unit for Epidemiology and Clinical Research, Faculty of Medicine, Medisinsk Teknisk Senter, N 7005 Trondheim, Norway. † Current address: Unit for Epidemiology and Clinical Research, Faculty of Medicine, Medisinsk Teknisk Senter, N 7005 Trondheim, Norway

Tutorials in Biostatistics Volume 1: Statistical Methods in Clinical Studies Edited by R. B. D'Agostino © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-02365-1

Bayesian methods which may be used for data monitoring in clinical trials are illustrated, and a simple exposition provided showing how to apply these techniques. Previously published papers on this subject have tended either to be theoretical or to be wide ranging;1-6 in all cases, details of the methods have been presented in a manner which may not be readily accessible to those who simply seek to apply Bayesian methods to their own trials but find the mathematics unfamiliar. In this paper the description of the methods should be sufficiently simple for non-mathematical readers to appreciate the objectives, to be able to perform the calculations, and to understand the interpretation of the results; however, mathematical details intended for applied statisticians are also included. Thus this tutorial adopts a position mid-way between exposition and 'cook-book', and should be suitable for both clinicians and statisticians involved

Three worked examples are given, based upon data monitoring of the Medical Research Council (MRC) OE02 trial which is evaluating the role of surgery with or without adjuvant chemotherapy for treatment of patients with oesophageal carcinoma. Confidentiality precludes the publication of interim results, and so hypothetical scenarios are presented illustrating monitoring of: (i) a trial with positive results, but at an early stage of patient recruitment; (ii) the same trial at a later stage, when early termination would be recommended, and (iii) a trial which could be terminated because early results suggest there is unlikely to be any treatment difference. These examples relate to survival comparisons, which are very pertinent to trials in many disease areas, but adaptation to binary or continuous endpoints is relatively simple.

When a trial accrues patients over several years, results on earlier patients become available before the later ones are randomized. If the results look promising in favour of one treatment, the question can arise as to whether the trial should be terminated early. In particular, if the early results provide reasonably conclusive evidence of an advantage in favour of one of the treatments, it may be considered unethical to continue recruiting patients. This especially applies to clinical trials in potentially fatal diseases, in which patients receiving an inferior therapy may be at higher risk of death. It is crucial that such studies should be closely monitored so that if the new treatment has an effect that is larger than expected the trial may be terminated early; it is equally crucial that if the new treatment is unexpectedly found to be inferior, the trial should also terminate early. However, even where survival is not the primary outcome, it may still be unethical to continue exposing patients to an inferior treatment. Furthermore, one can also argue that it is an abuse of research funds to continue even a harmless clinical trial beyond the point at which there is sufficient evidence as to which therapy is more effective. Thus many clinical trial protocols contain explicit statements about the frequency and timing of interim analyses, and many trials have a formal, independent Data Monitoring Committee which is assigned the task of

Superficially, therefore, it might appear that clinical trials should be terminated as soon as there is a convincing and statistically significant difference between the treatments. Thus one might envisage a sequential or group sequential trial,7,8 which would specify a stopping rule based upon formal statistical tests. This approach is being used in a few MRC trials, albeit with some reservations.9 It is worth remembering that if a trial was important enough to start in the light of the knowledge available then caution should be exercised before too lightly concluding that there is sufficient evidence to terminate the trial; hence p-values alone are unlikely to suffice for decision making about the future of a clinical trial, although they may be an important

Nevertheless, there is an opposing school of thought which argues persuasively that the role of a clinical trial is to influence clinical opinion and clinical practice. Thus if a clinical trial detects a large treatment effect after half the patients have been entered, and as a consequence is terminated early, that trial may be received with considerable scepticism by clinicians; despite any significant p-values that are cited, many clinicians may still remain unconvinced by the weight of evidence that has been produced. These clinicians are likely to continue treating new patients in the same way that they have done in the past. The clinical trial, therefore, will have failed in its primary objective. Through early termination, it has failed to obtain sufficient evidence to alter the management and therapy of future patients. This philosophy has led many trialists to be cautious about stopping recruitment prematurely. The ISIS (International Study of Infarct Survival) trials, for example, explicitly state in the protocols that the Data Monitoring Committee will only disclose interim results to the steering committee if there is 'both (a) "proof beyond all reasonable doubt" that for all, or for some, types of patient one particular treatment is clearly indicated or contraindicated in terms of net difference in mortality, and (b) evidence that might reasonably be expected to influence materially the patient management of many clinicians who are already aware of the other main trial results'; the protocol also suggests that this might perhaps correspond to a difference of at least three standard deviations.10 The need to convince others is also formalized by the drug regulatory process, and a trial that has stopped early may fail to convince regulators; for this reason, too, many trialists are wary of premature termination of patient

The concept of the role of a clinical trial being to influence clinical opinion has a number of important consequences. In particular, it implies that statistical significance and statistical stopping rules will not in themselves be sufficient, and that one should additionally consider the prior opinions of clinicians. If clinicians, in general, are sceptical about the merits of a new treatment in terms of its prolonging life or curing patients, the necessary evidence to change that view will have to be substantial; if, on the other hand, most clinicians already expect the new treatment to be an improvement, far less weight of evidence will be necessary to influence

In practice, most major trials groups use an independent Data Monitoring Committee to help with the review of trials. One of the functions of a Data Monitoring Committee is to offer advice on whether a particular trial should terminate, and although this is not only a statistical decision, statistical guidelines can help formalize and clarify some of the issues outlined above.11 There are essentially two schools of thought concerning the statistical procedures and calculations that should be made. The first, adopting a 'classical frequentist' approach, pre-specifies a number of 'looks' at the accumulating data and uses the observed p-values of these looks as a basis for stopping. At each look a relatively stringent significance level is used, so that the overall level of significance for the trial is maintained at, say, 5 per cent. The Pocock rule,8 for example, uses the same significance level at every analysis, whereas the O'Brien/Fleming rule12 uses extremely stringent criteria at the very earliest visits on the grounds that early observed differences are much more likely to be spurious. These are examples of group-sequential designs. The second, or 'Bayesian', approach formalizes the idea that external or prior evidence or beliefs can be summarized mathematically, and that in stopping the trial one is balancing the evidence from the trial against this other evidence.1-6,13 When the trial evidence can outweigh this other evidence it is time to stop the trial. Clearly the formalization of this other evidence is critical. This paper examines the practical aspects of specifying prior opinions and the application of a Bayesian
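The reason group-sequential rules demand a stringent level at each look can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes five looks at equal increments of information, a standardized test statistic that is normally distributed under the null hypothesis, and the tabulated Pocock constant of 2.413 for five looks at an overall two-sided 5 per cent level. The function name and simulation settings are illustrative only.

```python
import random
from statistics import NormalDist


def overall_false_positive_rate(n_looks, z_crit, n_sims=20000, seed=1):
    """Estimate P(|Z_k| > z_crit at any look k) under the null hypothesis.

    Assumes looks after equal increments of information, so the
    standardized statistic at look k is a scaled sum of k independent
    N(0, 1) increments.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for k in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)  # new information since last look
            z = cumulative / k ** 0.5          # standardized statistic at look k
            if abs(z) > z_crit:                # "significant": trial would stop
                hits += 1
                break
    return hits / n_sims


nominal = NormalDist().inv_cdf(0.975)  # two-sided 5 per cent critical value
pocock_5_looks = 2.413                 # assumed: tabulated Pocock constant, 5 looks

single_look = overall_false_positive_rate(1, nominal)
five_naive = overall_false_positive_rate(5, nominal)
five_pocock = overall_false_positive_rate(5, pocock_5_looks)

print(f"one look at nominal 5%:    {single_look:.3f}")
print(f"five looks at nominal 5%:  {five_naive:.3f}")
print(f"five looks, Pocock level:  {five_pocock:.3f}")
```

Under these assumptions, testing at the nominal 5 per cent level on every look inflates the overall false-positive rate well above 5 per cent, whereas the stricter per-look threshold brings it back to roughly the intended level.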

As an example, we consider the MRC OE02 clinical trial. This aims to evaluate the role of pre-operative chemotherapy for patients with resectable cancer of the oesophagus.

The outlook for patients with oesophageal cancer undergoing surgery remains poor, with only 20 per cent remaining alive at 2 years, and only 5 per cent alive and disease-free at 5 years. However, results from several small, uncontrolled, phase II studies suggest that this cancer may respond to chemotherapy given either pre- or post-operatively. Two-year survival figures as large as 30 to 40 per cent have been claimed. Two of the more active chemotherapy agents are cisplatin and fluorouracil. Hence OE02 is comparing survival for patients randomized to either pre-operative chemotherapy followed by surgery, or surgery alone. The chemotherapy in OE02 consists of two four-day courses of cisplatin and fluorouracil, with an interval of three weeks between courses. (Copies of the protocol may be obtained from the MRC Cancer Trials Office.)

However, the chemotherapy is expensive. It may sometimes have adverse side-effects including nausea and vomiting, and less frequently diarrhoea, stomatitis, renal disturbance and myelosuppression. Furthermore, since these patients will eventually undergo surgery, most of them would prefer the surgery to take place as soon as possible. Demonstrating equivalence of the two treatment arms is of no interest. Therefore OE02 is testing the hypothesis that pre-operative chemotherapy will improve survival, and that the patients' overall well-being is not impaired. Thus the primary endpoint of interest in OE02 is length of survival, although clearly the general

In accordance with MRC standard policy, a Data Monitoring Committee was created. This includes one independent statistician, and two independent clinicians who are experts in oesophageal cancer but are not entering patients into OE02. The trial was launched in 1992, and has a planned sample size of 800 patients. Over 400 patients were recruited by summer 1996. As this is an on-going clinical trial, the true interim results are confidential; the examples that follow are

• $\sim N(\mu, \sigma^2)$ indicates 'is distributed as a normal (Gaussian) distribution with mean $\mu$ and variance $\sigma^2$'.

• $\log(h)$ is the log hazard ratio, and $\log(h_a)$ the log hazard ratio under the alternative hypothesis.

• We assume a clinical trial is being carried out, and that at the time of carrying out the interim analysis we have observed $O_1$ and $O_2$ deaths or 'events' in the two treatment groups.

• $E_1$ and $E_2$ are the 'expected' numbers of deaths that would have been observed under the null hypothesis; computer programs which calculate survival comparisons usually display

Suppose the clinical trial was designed to compare survival in patients randomized between a standard form of treatment versus a new treatment. Frequently there will be prior knowledge about the nature of the survival curve for the standard treatment. This may be derived from previous studies or from clinical experience. For example, in the OE02 trial past experience enabled us to expect that 20 per cent of patients receiving standard surgical treatment would still be alive at 2 years after surgery. Thus the 2-year survival rate is 0.20. More generally, we use $\mathrm{surv}_1$ and $\mathrm{surv}_2$ to represent the survival rates for the two treatment groups, where survival is measured at some fixed time relative to randomization. Thus $\mathrm{surv}_1$ might be the pre-study estimate of survival in the standard or control arm of the trial, and would represent the proportion of patients expected to be alive at some specified time point relative to when the patient was randomized. $\mathrm{surv}_2$ would be the survival rate that is hypothesized for the alternative treatment. The trial will have been designed to test a null hypothesis of no treatment difference, against an alternative hypothesis that the treatment difference is at least $\mathrm{surv}_2 - \mathrm{surv}_1$.

An estimate of $\mathrm{surv}_1$, together with a target value for the alternative hypothesis of a treatment difference of at least $\mathrm{surv}_2 - \mathrm{surv}_1$, is usually specified in clinical trial protocols and is used as a basis for sample size estimation (see example 1(a)). The sample size calculations ensure that when the clinical trial has been completed, and provided there has been adequate follow-up of the patients, the trial-based estimates of $\mathrm{surv}_1$ and $\mathrm{surv}_2$ will be sufficiently precise to enable adequately powerful hypothesis testing; a review of sample size issues is given in Fayers and Machin,14 and tables for sample size estimation are available.15,16

In terms of hazard ratios, this is equivalent to carrying out a trial to detect a log hazard ratio, which we call $\log(h_a)$, of

$$\log(h_a) = \log\left(\frac{\log(\mathrm{surv}_1)}{\log(\mathrm{surv}_2)}\right). \tag{2}$$
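One way to see where equation (2) comes from is sketched below. It assumes proportional hazards, which the passage above does not state explicitly, and takes $h_a$ to be the hazard on the standard treatment relative to the new treatment, as the ordering of the terms in equation (2) implies. The survival functions $S_1(t)$, $S_2(t)$ and cumulative hazards $\Lambda_1(t)$, $\Lambda_2(t)$ are notation introduced here for illustration only.

$$
S_i(t) = \exp\{-\Lambda_i(t)\}, \qquad \Lambda_1(t) = h_a\,\Lambda_2(t)
\quad\Longrightarrow\quad
h_a = \frac{\Lambda_1(t)}{\Lambda_2(t)} = \frac{-\log S_1(t)}{-\log S_2(t)} = \frac{\log(\mathrm{surv}_1)}{\log(\mathrm{surv}_2)},
$$

where $\mathrm{surv}_i = S_i(t)$ at the chosen fixed time point. Taking logarithms of both sides gives equation (2). Under this convention a positive $\log(h_a)$ corresponds to better survival on the new treatment.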

Hence we have a null hypothesis that the log hazard ratio is zero, and an alternative hypothesis that the log hazard ratio is $\log(h_a)$.

In OE02 the baseline proportion surviving 2 years, in patients receiving surgery alone, was assumed to be 0.20 (20 per cent of patients remaining alive after 2 years). The alternative hypothesis is that pre-operative chemotherapy produces an absolute improvement of 10 per cent, to 0.30 at 2 years. These values were used as a basis for the sample size estimation, and are specified in the study protocol. From equation (2), this translates to an alternative hypothesis with a log hazard ratio of $\log(h_a) = 0.290$.
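The arithmetic behind this figure can be checked directly. The short sketch below applies equation (2) to the OE02 design values; the function name `log_hazard_ratio` is illustrative and not part of any package.

```python
import math


def log_hazard_ratio(surv1, surv2):
    """Equation (2): log hazard ratio implied by two survival proportions
    measured at the same fixed time point (both strictly between 0 and 1)."""
    return math.log(math.log(surv1) / math.log(surv2))


# OE02 design values: 2-year survival of 0.20 on surgery alone,
# 0.30 hypothesized with pre-operative chemotherapy.
log_h_a = log_hazard_ratio(0.20, 0.30)

print(f"log hazard ratio: {log_h_a:.3f}")            # 0.290
print(f"hazard ratio:     {math.exp(log_h_a):.3f}")  # 1.337
```

Equal survival proportions give a log hazard ratio of zero, which is the null hypothesis.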
