# The Nature Of Multilevel Models

Traditional statistical models were developed under certain assumptions about the dependency structure among the observed responses. Thus, in the simple regression model $y_i = \beta_0 + \beta_1 x_i + e_i$ the standard assumption is that the $y_i$ given $x_i$ are independently and identically distributed (i.i.d.), and the same assumption also holds for generalized linear models. In many real-life situations, however, we have data structures, whether observed or by design, for which this assumption does not hold.

Suppose, for example, that the response variable is the birthweight of a baby and the predictor is, say, maternal age, and data are collected from a large number of maternity units located in different physical and social environments. We would expect the maternity units to have different mean birthweights, so that knowledge of the maternity unit already conveys some information about the baby. A more suitable model for these data is now

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij} \qquad (1)$$

where we have added another subscript to identify the maternity unit and included a unit-specific effect $u_j$ to account for mean differences amongst units. If we assume that the maternity units are randomly sampled from a population of units, then the unit-specific effect is a random variable and (1) becomes a simple example of a two-level model. Its complete specification, assuming Normality, can be written as follows:

$$\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij} \\
u_j &\sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2) \\
\operatorname{cov}(u_j, e_{ij}) &= 0 \\
\operatorname{cov}(y_{i_1 j}, y_{i_2 j} \mid x_{ij}) &= \sigma_u^2 \ge 0
\end{aligned} \qquad (2)$$

where $i_1$, $i_2$ are two births in the same unit $j$ with, in general, a positive covariance between the responses, since both share the same unit effect $u_j$. This lack of independence, arising from two sources of variation at different levels of the data hierarchy (births and maternity units), contradicts the traditional linear model assumption and leads us to consider a new class of models. Model (2) can be elaborated in a number of directions, including the addition of further covariates or levels of nesting. An important direction is where the coefficient $\beta_1$ (and any further coefficients) is allowed to have a random distribution. Thus, for example, the age relationship may vary across clinics and, with a slight generalization of notation, we may now write (2) as

$$\begin{aligned}
y_{ij} &= \beta_{0ij} x_{0ij} + \beta_{1j} x_{1ij} \\
\beta_{0ij} &= \beta_0 + u_{0j} + e_{0ij}, \quad \beta_{1j} = \beta_1 + u_{1j} \\
x_{0ij} &= 1 \\
\operatorname{var}(u_{0j}) &= \sigma_{u0}^2, \quad \operatorname{var}(u_{1j}) = \sigma_{u1}^2, \quad \operatorname{cov}(u_{0j}, u_{1j}) = \sigma_{u01}, \quad \operatorname{var}(e_{0ij}) = \sigma_{e0}^2
\end{aligned} \qquad (3)$$

and in later sections we shall introduce further elaborations. The regression coefficients $\beta_0$, $\beta_1$ are usually referred to as the 'fixed parameters' of the model and the set of variances and covariances as the 'random parameters'. Model (3) is often referred to as a 'random coefficient' or 'mixed' model.
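To make the role of the random parameters concrete, the following is a minimal simulation sketch of model (3), not a fitting procedure. Under (3), two births in the same unit observed at the same covariate value $x$ have covariance $\sigma_{u0}^2 + 2\sigma_{u01}x + \sigma_{u1}^2 x^2$, and each response has that variance plus $\sigma_{e0}^2$. All numerical values, the function names `implied_variances` and `simulate_two_level`, and the choice of a single common $x$ for every birth are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1 = 3.3, 0.02                        # fixed parameters
VAR_U0, VAR_U1, COV_U01 = 0.04, 0.0004, 0.001   # level-2 random parameters
VAR_E0 = 0.16                                   # level-1 variance


def implied_variances(x):
    """Level-2 and total variance of y at covariate value x under model (3)."""
    level2 = VAR_U0 + 2.0 * COV_U01 * x + VAR_U1 * x ** 2
    return level2, level2 + VAR_E0


def simulate_two_level(n_units=20000, n_per_unit=2, x_value=2.0, seed=0):
    """Simulate responses from model (3), with every birth at the same x.

    Returns an array of shape (n_units, n_per_unit): one row per unit.
    """
    rng = np.random.default_rng(seed)
    omega_u = np.array([[VAR_U0, COV_U01],
                        [COV_U01, VAR_U1]])
    # One (u0j, u1j) pair per unit, shared by all births in that unit.
    u = rng.multivariate_normal(np.zeros(2), omega_u, size=n_units)
    # Independent level-1 residuals, one per birth.
    e = rng.normal(0.0, np.sqrt(VAR_E0), size=(n_units, n_per_unit))
    x = np.full((n_units, n_per_unit), x_value)
    return (BETA0 + u[:, [0]]) + (BETA1 + u[:, [1]]) * x + e


y = simulate_two_level()
level2, total = implied_variances(2.0)
empirical_cov = np.cov(y[:, 0], y[:, 1])[0, 1]
print("within-unit covariance:", empirical_cov, "implied:", level2)
print("total variance:", y.var(), "implied:", total)
```

With a large number of simulated units, the empirical within-unit covariance should lie close to the implied level-2 variance. A model assuming independent responses would instead imply a covariance of zero.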

At this point we note that we can introduce prior distributions for the parameters of (3), so allowing Bayesian models. We leave this topic, however, for a later section where we discuss MCMC estimation.

Another instructive example of a two-level data structure for which a multilevel model provides a powerful tool is that of repeated measures data. If we measure the weight of a sample of babies at successive times after birth, then the occasion of measurement becomes the lowest-level (level-1) unit of a two-level hierarchy in which the individual baby is the level-2 unit. In this case model (3) would provide a simple description, with $x_{1ij}$ being time or age. In practice linear growth will be an inadequate description and we would wish to fit at least a (spline) polynomial function, or perhaps a non-linear function, in which several coefficients vary randomly across individual babies, so that each baby has its own growth pattern. We shall return to this example in more detail later, but for now note that an important feature of such a characterization is that it makes no particular requirement for every baby to be measured at the same time points or for the time points to be equally spaced.
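The point about unequal time points can be illustrated with a short sketch. Under a random intercept-and-slope growth model of the form (3), the covariance matrix of one baby's measurements is $V_j = Z_j \Omega_u Z_j^{T} + \sigma_{e0}^2 I$, where $Z_j$ has a column of ones and a column of that baby's own measurement times. The function name `marginal_covariance`, the parameter values and the measurement times below are hypothetical, and linear growth is used only to keep the example small; a polynomial would simply add columns to $Z_j$.

```python
import numpy as np


def marginal_covariance(times, omega_u, var_e):
    """Covariance matrix of one baby's repeated measurements.

    Assumes a random intercept and a random linear slope in time:
    V = Z Omega_u Z' + var_e * I, with Z = [1, t] built from this
    baby's own measurement times.
    """
    t = np.asarray(times, dtype=float)
    z = np.column_stack([np.ones_like(t), t])
    return z @ omega_u @ z.T + var_e * np.eye(len(t))


# Hypothetical level-2 covariance matrix and level-1 variance.
omega_u = np.array([[0.25, 0.02],
                    [0.02, 0.01]])
var_e = 0.04

# Baby A: four equally spaced occasions. Baby B: two irregular occasions.
baby_a = marginal_covariance([0.0, 1.0, 2.0, 3.0], omega_u, var_e)
baby_b = marginal_covariance([0.5, 4.0], omega_u, var_e)

print(baby_a.shape, baby_b.shape)
```

Each baby contributes a covariance matrix whose size and entries depend only on that baby's own measurement times, which is why neither a common schedule nor equal spacing is needed.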

The development of techniques for specifying and fitting multilevel models since the mid-1980s has produced a very large class of useful models, including models with discrete responses, multivariate models, survival models and time series models. In this tutorial we cannot cover the full range but will give references to existing and ongoing work that readers may find helpful. In addition, readers may find useful the introductory book by Snijders and Bosker and the edited collection of health applications by Leyland and Goldstein.

A detailed introduction to the two-level model, with worked examples and discussion of hypothesis tests and basic estimation techniques, is given in an earlier tutorial that also gives details of two computer packages, HLM and SAS, that can perform some of the analyses we describe in the present tutorial. The MLwiN software has been specifically developed for fitting very large and complex models, using both frequentist and Bayesian estimation, and it is this particular set of features that we shall concentrate on.