Incidence Analysis

effects) models,32,33 but only recently has substantial work been done on applying these sorts of models to binary outcomes.34,35 In this section we describe a local, or 'transitional', model, which aims to model the incidence of events that represent a change in the individual's status on an outcome measure between one occasion and a subsequent one.

Our analysis is limited to outcome events that occur only once for each subject: truly incident events in the usual epidemiological definition. In considering the incidence of regular smoking we therefore include only those subjects who had no history of regular smoking at the inception of the cohort. We may then consider our data in the form of an inception time, $t_i^F$, at which the subject was not smoking, and a time of last review, $t_i^L$, at which the subject may or may not have been smoking. The time scale used is the subject's age, the only scale that has a common meaning across all subjects, but of course each subject's status is recorded only at times that correspond to waves of the survey. A simple empirical analysis may be performed by calculating incidence rates from these data within subgroups defined by sex and other covariates, using the standard epidemiological method of dividing the total number of events by the total person-years of follow-up accumulated over all subjects in the subgroup. In this calculation, where an individual takes up smoking during their interval of observation, we take the person-years of follow-up to be $t_i^L - t_i^F$ minus half the length of the last between-wave interval, since we do not have precise information on the time of incidence.
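The crude rate calculation can be sketched in Python as follows. The record layout (dictionary keys `group`, `t_first`, `t_last`, `event`, `last_interval`) and the two-subject cohort are hypothetical choices for illustration; ages are in years, so the result is events per 1000 person-years.

```python
from collections import defaultdict

def person_years(t_first, t_last, event, last_interval):
    """Follow-up contributed by one subject, in years of age.

    A subject who takes up smoking is credited with only half of the final
    between-wave interval, because uptake is known only to lie within it.
    """
    follow_up = t_last - t_first
    if event:
        follow_up -= 0.5 * last_interval
    return follow_up

def crude_incidence(subjects, per=1000.0):
    """Total events divided by total person-years within each subgroup, scaled by `per`."""
    events = defaultdict(int)
    pyears = defaultdict(float)
    for s in subjects:
        g = s["group"]
        events[g] += int(s["event"])
        pyears[g] += person_years(s["t_first"], s["t_last"], s["event"], s["last_interval"])
    return {g: per * events[g] / pyears[g] for g in pyears}

# Hypothetical cohort: two subjects in one subgroup.
cohort = [
    {"group": "female", "t_first": 14.0, "t_last": 16.0, "event": True, "last_interval": 0.5},
    {"group": "female", "t_first": 14.0, "t_last": 17.0, "event": False, "last_interval": 0.5},
]
rates = crude_incidence(cohort)
```

Here the new smoker contributes 2.0 - 0.25 = 1.75 person-years and the non-smoker 3.0, so the subgroup rate is 1000 x 1/4.75, roughly 210.5 per 1000 person-years.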

In order to examine more systematically the effect of risk factors and covariates on incidence, we seek an appropriate model to describe the rate of uptake of smoking between $t_i^F$ and $t_i^L$, allowing for differences in this time interval between subjects (most obviously, the interval will tend to be shorter for those who do in fact commence smoking). The particular model we propose is a discrete version of the proportional hazards regression model that is commonly used in survival analysis,36,37 where the outcome measure is the time to a particular event. The exact time of occurrence of events such as 'started smoking' is not recorded in studies such as ours, since we know only the time interval during which the event occurred or failed to occur; such data are termed 'interval censored'.37 Define $p_i$ as the probability that subject $i$ becomes a daily smoker in their interval of observation. Expressed in terms of the underlying 'survival' variable $T_i$ (the time to 'failure' for individual $i$), we may write

$$p_i = \Pr\{T_i \in (t_i^F, t_i^L] \mid T_i > t_i^F\}.$$
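A minimal Python sketch of the interval probability $p_i$, assuming the hazard is available as a callable: it integrates the hazard numerically over $(t_i^F, t_i^L]$ and uses $p_i = 1 - S(t_i^L)/S(t_i^F) = 1 - \exp\{-\int \lambda_i(u)\,du\}$. The baseline hazard, the coefficient and the ages in the usage lines are hypothetical values chosen only for illustration. The usage lines also show that multiplying the hazard by $\exp(\beta)$ shifts the complementary log-log of $p_i$ by exactly $\beta$, whatever the shape of the baseline hazard.

```python
import math

def cumulative_hazard(hazard, t_start, t_end, steps=10_000):
    """Approximate the integral of hazard(u) over (t_start, t_end] (trapezoidal rule)."""
    h = (t_end - t_start) / steps
    total = 0.5 * (hazard(t_start) + hazard(t_end))
    for k in range(1, steps):
        total += hazard(t_start + k * h)
    return total * h

def interval_probability(hazard, t_first, t_last):
    """Pr{T in (t_first, t_last] | T > t_first} = 1 - S(t_last) / S(t_first)."""
    return 1.0 - math.exp(-cumulative_hazard(hazard, t_first, t_last))

def cloglog(p):
    """Complementary log-log transform log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

# Hypothetical baseline hazard rising with age, and a hypothetical coefficient.
def baseline(t):
    return 0.02 * (t - 12.0)

beta = 0.7
p0 = interval_probability(baseline, 14.0, 17.0)                                # X_i = 0
p1 = interval_probability(lambda t: baseline(t) * math.exp(beta), 14.0, 17.0)  # X_i = 1
print(p0, cloglog(p1) - cloglog(p0))  # the difference equals beta up to rounding
```

Because this baseline hazard is linear in age, the trapezoidal rule is exact here and $p_0 = 1 - \exp(-0.21)$.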

A useful representation of the probability distribution of a survival variable is the instantaneous rate of failure, or hazard rate (the probability per unit time of failing at time $t$, given survival until that time). If we assume that the underlying incidence process fits a proportional hazards model, then the hazard rate for subject $i$, which we denote $\lambda_i(t)$, depends in a log-linear fashion on subject factors $X_i$, independently of time $t$:

$$\log \lambda_i(t) = \log \lambda_0(t) + X_i\beta$$

where $\lambda_0(t)$ is a baseline hazard rate (applying to those individuals for whom $X_i = 0$). It can then be shown straightforwardly36 that a particular transformation of $p_i$, the complementary log-log, also follows a linear model in $X_i$. Specifically,

$$\log\{-\log(1 - p_i)\} = \log\left\{\int_{t_i^F}^{t_i^L} \lambda_0(u)\,du\right\} + X_i\beta.$$

The integral in this expression reflects the dependence of the baseline risk on the time interval $(t_i^F, t_i^L]$ and, as long as $\lambda_0(u)$ does not vary greatly over the time span of interest, we may use the approximation $\int_{t_i^F}^{t_i^L} \lambda_0(u)\,du \approx (t_i^L - t_i^F)\bar\lambda_0$, where $\bar\lambda_0$ is the mean baseline hazard. This suggests the model

$$\log\{-\log(1 - p_i)\} = \beta_0 + \log(t_i^L - t_i^F) + X_i\beta \qquad (15)$$

where $\beta_0 = \log(\bar\lambda_0)$. Our analysis of incidence proceeds by fitting the generalized linear model described by (15), in which $\log(t_i^L - t_i^F)$ enters as an offset (a term with coefficient fixed at 1), using the method of maximum likelihood for binary outcomes,10 which can readily be accomplished in a number of software packages such as SAS, GLIM and Stata.
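Statistical packages fit (15) directly as a binomial GLM with a complementary log-log link and an offset. To show the mechanics, here is a dependency-free Python sketch of the maximum-likelihood fit by iteratively reweighted least squares (Fisher scoring). The function names and toy data are hypothetical, and the sketch omits the standard errors and numerical safeguards that a real package provides.

```python
import math

def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_cloglog(X, y, offset, iters=50, tol=1e-10):
    """ML fit of log(-log(1 - p)) = offset + X beta by IRLS.

    X: covariate rows, including a column of 1s for the intercept beta_0;
    y: 0/1 outcomes; offset: log(t_last - t_first) for each subject.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        xtwx = [[0.0] * k for _ in range(k)]
        xtwz = [0.0] * k
        for xi, yi, oi in zip(X, y, offset):
            eta = oi + sum(b * v for b, v in zip(beta, xi))
            mu = min(max(1.0 - math.exp(-math.exp(eta)), 1e-12), 1.0 - 1e-12)
            dmu = math.exp(eta - math.exp(eta))   # d mu / d eta for the cloglog link
            w = dmu * dmu / (mu * (1.0 - mu))     # IRLS weight with binomial variance
            z = (eta - oi) + (yi - mu) / dmu      # working response, offset removed
            for a in range(k):
                xtwz[a] += w * xi[a] * z
                for c in range(k):
                    xtwx[a][c] += w * xi[a] * xi[c]
        new = solve(xtwx, xtwz)
        if max(abs(nb - ob) for nb, ob in zip(new, beta)) < tol:
            return new
        beta = new
    return beta

# Toy data: 10 subjects with X = 0 (3 events) and 10 with X = 1 (6 events),
# every subject observed for 2 years.
X = [[1.0, 0.0] for _ in range(10)] + [[1.0, 1.0] for _ in range(10)]
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
offset = [math.log(2.0)] * 20
beta = fit_cloglog(X, y, offset)
print(beta, math.exp(beta[1]))  # exp(beta_1): rate ratio for X = 1 versus X = 0
```

Because this toy model is saturated, the fitted probabilities reproduce the observed proportions, so $\hat\beta_0 = \log\{-\log(0.7)\} - \log 2$ and $\hat\beta_1 = \log\{-\log(0.4)\} - \log\{-\log(0.7)\}$.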

The analysis just described assumes that the covariates represented by the vector $X_i$ have a constant value over time for each subject, but a simple generalization allows for time-dependent covariates. We expand the data set to allow a separate record for each wave of data in each subject. Following the same logic as above, for every interval $j$ (before the uptake of daily smoking), the corresponding probability $p_{ij}$ can now be modelled as

$$\log\{-\log(1 - p_{ij})\} = \beta_{0j} + \log(t_{ij}^L - t_{ij}^F) + X_{ij}\beta \qquad (16)$$

where $t_{ij}^L - t_{ij}^F$ is the length of the $j$th inter-wave interval for subject $i$ and $X_{ij}$ is the corresponding wave-specific covariate vector. The $\beta_{0j}$ terms allow for possible variation in the baseline hazard across waves. Ignoring potential dependence within subjects, a 'pseudo-likelihood' can be defined from the first-order specification (16) by the Bernoulli assumption that failure occurs with probability $p_{ij}$. As in the discussion of Section 3, solving the corresponding estimating equations will provide asymptotically unbiased estimates of the $\beta$ coefficients, but the standard errors will be incorrect. Again, however, we can obtain asymptotically robust standard errors by using the
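A minimal Python sketch of the data expansion for one subject. It assumes (hypothetically) that ages, daily-smoking status and covariates are supplied as wave-indexed lists with no missing waves, and that the covariates for interval $j$ are those measured at its first wave. Each record carries the 0/1 outcome, the offset $\log(t_{ij}^L - t_{ij}^F)$, and a design row made of interval indicators (for the $\beta_{0j}$ terms) followed by $X_{ij}$.

```python
import math

def expand_person_waves(ages, smoking, covariates, n_intervals):
    """Split one subject's history into one record per inter-wave interval.

    ages[j]       : age at wave j (hypothetical input layout)
    smoking[j]    : True if a daily smoker at wave j (assumed False at wave 0)
    covariates[j] : covariate vector measured at wave j, used for interval j
    Records stop at the interval in which daily smoking is first reported,
    so each subject contributes at most one event.
    """
    records = []
    for j in range(len(ages) - 1):
        length = ages[j + 1] - ages[j]
        event = 1 if smoking[j + 1] else 0
        indicators = [1.0 if w == j else 0.0 for w in range(n_intervals)]  # beta_0j terms
        records.append({
            "y": event,
            "offset": math.log(length),
            "x": indicators + list(covariates[j]),
        })
        if event:
            break
    return records

records = expand_person_waves(
    ages=[14.0, 14.5, 15.0, 15.5],
    smoking=[False, False, True, True],
    covariates=[[0], [1], [1], [1]],
    n_intervals=3,
)
```

The example subject contributes two records: one event-free interval and the interval in which daily smoking is first reported; the later wave is dropped because the event can occur only once. Stacking these records across subjects gives the data set to which (16) is fitted.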

We initially calculated the observed incidence rates of regular smoking by dividing the total number of events (a) by the total person-years of follow-up (T) accumulated over all subjects in the subgroup; these rates are presented per 1000 person-years in Table III. The 95 per cent confidence intervals for these rates were calculated by the strate command,38 using standard approximations,39 as $(a/T)\exp(\pm 1.96\sqrt{1/a})$. The covariate values were those reported by the respondent at the beginning of
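The interval formula quoted above can be written out directly. This Python sketch only mirrors that formula (a normal approximation on the log-rate scale, with standard error $\sqrt{1/a}$) and is not a reimplementation of Stata's `strate`; the event count and person-years in the example are hypothetical.

```python
import math

def rate_with_ci(events, person_years, per=1000.0, z=1.96):
    """Return (rate, lower, upper) per `per` person-years, using
    (a/T) * exp(-z * sqrt(1/a)) and (a/T) * exp(+z * sqrt(1/a))."""
    if events <= 0:
        raise ValueError("the log-scale approximation needs at least one event")
    rate = per * events / person_years
    half_width = z * math.sqrt(1.0 / events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)

# Hypothetical subgroup: 25 new regular smokers over 1000 person-years.
rate, lower, upper = rate_with_ci(25, 1000.0)
print(f"{rate:.1f} per 1000 person-years (95% CI {lower:.1f} to {upper:.1f})")
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two limits.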

Table III. Estimates* (with 95 per cent confidence intervals) of: 1, crude incidence rates per 1000 person years; 2, crude rate ratios (RR); 3, unadjusted RR from the complementary log-log survival model (16); 4, adjusted RR from model (18); 5, adjusted RR from model (16) with robust SEs

Incidence rate | RR (crude) | RR from comp. log-log survival model: Unadjusted | Adjusted (model SE) | Adjusted (robust SE)

Previous smoking status
  Non-smoker
  None in last week
  1-4 days in last week

Total CIS score
  0-5
