K ak 82 2k ak

For binary data, random effects models are most commonly used with logit or probit link functions. A structural defect exists with the log and identity links in treating {ak} as normally distributed; for any parameter values with a>0, with positive probability a particular realization of the random effect corresponds to nik outside [0,1].

2.3. Treatment-by-centre interaction

Even if a model that is additive in centre and treatment effects fits sample data adequately, it is usually unrealistic to expect the true association to be identical (or essentially identical) in each stratum. This subsection considers models that permit interaction. With a fixed-effects approach, the model logit(rcik) = a* + ftk/2, logit(^2k) = ak - ftk/2 (5)

has odds ratio eftk in centre k. It is saturated (residual d.f. = 0), having 2K parameters for the 2K binomial probabilities. The ML estimate of ftk is the sample log-odds ratio in stratum k, ftk = log(Wl1k«22k/ni2k«21k ).

Usually, such as in meta analyses, one would want to extend such a model to determine explanations for the variability in associations among the strata. When the strata have a natural ordering with scores {zk}, an unsaturated model (d.f. = K — 2) results from assuming a linear trend in the log-odds ratios; that is, by replacing ftk in model (5) by ft + zkX. Often other explanatory variables are available for modelling the odds ratio [18-20]. Then, one could construct a model of form ftk = zk [

describing the centre-specific log-odds ratios, where zk is a column vector of explanatory variables and [ is a column vector of parameters. A related model adds a random effect term for each centre to reflect unexplained variability [21].

For the random effects approach without other explanatory variables, an additional parameter can represent variability in the true effects. The logit-normal model is logit(^1k ) = ak + bk/2, logit(^2k ) = ak — bk/2 (6)

where {ak} are independent from N(a, aa), {bk} are independent from N(ft,ab), and {ak} are independent of {bk}. Here, ft is the expected value of centre-specific log-odds ratios, and ab describes their variability. An equivalent model form is logit(^ik ) = ak + ftxi + bik, where xi is a treatment dummy variable (x1 = 1,x2 = 0) and b1k and b2k are independent N(0, a), where a2 corresponds to a^/2 in parameterization (6). Note that one should not formulate the model as logit(ftik ) = ak + bkxi, since the model then imposes greater variability on the logit for the first treatment unless one permits (ak,bk) to be correlated.

Analogous random effects models apply with alternative link functions. Again, the models with identity or log link are structurally improper when either variance component is positive. This suggests a caution, as results reported for software using a particular estimation method may depend on whether the parameter constraints are recognized. In our experience, the identity link often has convergence problems. Good initial estimates of (a, ft, aa,ab) can be helpful, such as using values suggested by fixed effects modelling. In some applications it is also sensible to let (ak,bk) be correlated, by treating it as a bivariate normal random effect [22]. With the identity link, for instance, centres with ak close to 0 may tend to have values of bk relatively close to

0. We do not discuss such models here, as such modelling is better supported with moderate to large K, and our examples have relatively small K with sparse data. With such examples, some will think it bold or foolhardy of us to use even relatively simple random effects models!

We now discuss model fitting and parameter estimation. Unless stated otherwise, the discussion refers to the logit models.

3.1. Model fitting

It is straightforward to fit the fixed effects models with standard software. Possibilities include software for binary responses such as PROC LOGISTIC in SAS, or software for generalized linear models such as PROC GENMOD in SAS and the glm function in S-plus.

Random effects models for binary data are more difficult to fit. One must integrate the joint mass function of the responses with respect to the random effects distributions to obtain the likelihood function [23], which is a function of ft and the other parameters of those distributions. With the logit interaction model (6), for instance, the likelihood function equals where F is a N(a,ca) CDF, G is a N(ft,cj) CDF, n1k = exp(ak + bk/2)/[1 + exp(ak + bk/2)], and %2k = exp(ak — bk/2)/[1 + exp(ak — bk/2)]. One can approximate the likelihood function using numerical integration methods, such as Gauss-Hermite quadrature. The approximation improves as the number of quadrature points q increases, more points being needed as the variance components increase in size. Performance is enhanced by an adaptive version of quadrature that transforms the variable of integration so that the integrand is sampled in an appropriate region [24,25]. Having approximated the likelihood, one can use standard maximization methods such as Newton-Raphson to obtain the estimates. As a by-product, the observed information matrix, based on the curvature (second derivatives) of the log-likelihood at the ML estimates, is inverted to provide an estimated asymptotic covariance matrix.

Other approximations for integrating out the random effects lead to related approximations of the likelihood function and the ML estimates. Most of these utilize linearizations of the model. A Laplace approximation yields penalized quasi-likelihood (PQL) estimates [26], and a related generalization includes an extra scale parameter [27]. These approximations can behave poorly when variance components are large or when distributions are far from normal, such as Bernoulli or binomial with small indices at each setting of predictors [25,26,28]. When feasible, it is better to use adaptive Gauss-Hermite quadrature with sufficiently large q, the determination of 'sufficiently large' being based on monitoring the convergence of estimates and standard errors as q increases. Other promising ML approximations use Monte Carlo approximation methods [28,29], for which the approximation error is estimable and decreases as the number of simulations increases. One can also use Markov chain Monte Carlo methods with an approximating Bayes model that uses flat prior distributions for the other parameters [30], although the danger exists of improper posterior distributions [31-33].

0 0