
[Figure 9 here; x-axis label: Predicted Prob{y = 1}]

Figure 9. Bootstrap calibration curve for the full penalized extended CR model. 150 bootstrap repetitions were used in conjunction with the lowess smoother.49 Also shown is a 'rug plot' to demonstrate how effective this model is in discriminating patients into low- and high-risk groups for Pr(Y = 0) (which corresponds to the derived variable value y = 1 when cohort = 'all').


probabilities. The actual occurrences of binary responses are smoothed using lowess (with the 'no iteration' option) to estimate probabilities. Then choose a grid of predicted values, for example 0.01, 0.03, 0.05, ..., 0.99. Fit lowess to the predicted probabilities derived from the final model and the actual binary outcomes from the original sample, then evaluate the smoothed estimates at the grid. Differences between the lowess estimates and the 45° line are the estimates of apparent calibration accuracy. Then for each bootstrap resample, the ordinal model is fitted using PMLE from a sample with replacement from the patients, and the coefficients from this model are used to predict probabilities for the original sample. The discrepancies from the 45° line are compared with the discrepancies present when the bootstrap model was evaluated on the bootstrap sample. The difference in the discrepancies is the estimate of optimism. After averaging over 150 replications, separately for each probability level in the uniform grid, the estimates of optimism in the original, apparent, calibration errors are added to those errors. Then the bootstrap-corrected calibration curve is plotted.
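The procedure above can be sketched in Python. This is a simplified stand-in, not the paper's implementation: it uses a one-predictor binary logistic model in place of the penalized extended CR model, a Gaussian-kernel smoother in place of lowess, and invented data; the function and variable names are my own.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iters=25):
    """One-predictor logistic regression fitted by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = [expit(a + b * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        ga = sum(yi - pi for yi, pi in zip(y, p))                      # score, intercept
        gb = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))        # score, slope
        haa = sum(w)                                                   # information matrix
        hab = sum(xi * wi for xi, wi in zip(x, w))
        hbb = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = haa * hbb - hab * hab
        if abs(det) < 1e-12:
            break
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

def smooth(pred, y, grid, bw=0.08):
    """Gaussian-kernel smoother of binary outcomes against predicted
    probabilities -- a stand-in for lowess with the 'no iteration' option."""
    est = []
    for g in grid:
        w = [math.exp(-0.5 * ((p - g) / bw) ** 2) for p in pred]
        est.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return est

def bootstrap_calibration(x, y, B=150, seed=1):
    """Optimism-corrected calibration curve on a uniform probability grid."""
    rng = random.Random(seed)
    n = len(y)
    grid = [i / 100.0 for i in range(1, 100, 2)]       # 0.01, 0.03, ..., 0.99
    a, b = fit_logistic(x, y)
    pred = [expit(a + b * xi) for xi in x]
    apparent = smooth(pred, y, grid)                   # apparent calibration curve
    optimism = [0.0] * len(grid)
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]     # sample with replacement
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        ab, bb = fit_logistic(xb, yb)                  # refit on the resample
        on_boot = smooth([expit(ab + bb * xi) for xi in xb], yb, grid)
        on_orig = smooth([expit(ab + bb * xi) for xi in x], y, grid)
        for j in range(len(grid)):
            # discrepancy on boot sample minus discrepancy on original sample
            optimism[j] += on_boot[j] - on_orig[j]
    corrected = [ap - o / B for ap, o in zip(apparent, optimism)]
    return grid, apparent, corrected

# Invented data: a single covariate with a moderate true effect.
rng = random.Random(0)
x = [rng.uniform(-2.0, 2.0) for _ in range(300)]
y = [1 if rng.random() < expit(-0.3 + 1.2 * xi) else 0 for xi in x]
grid, apparent, corrected = bootstrap_calibration(x, y, B=40)  # the paper uses B=150
```

The `corrected` curve plays the role of the bootstrap-corrected calibration curve in Figure 9; plotting it against `grid` alongside the 45° line would reproduce the display qualitatively.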

All these steps are done using the following Design functions:

```r
cal <- calibrate(full.pen, B=150, cluster=u$subs, subset=cohort=='all')
plot(cal)
```

The results are shown in Figure 9. One can see a slightly non-linear calibration function estimate, but the overfitting-corrected calibration is excellent everywhere, being only slightly worse than the apparent calibration. The estimated maximum calibration error is 0.043. The excellent validation of both predictive discrimination and calibration is a result of the large sample size, the frequency distribution of Y, initial data reduction, and PMLE.

Clinically-guided variable clustering and item weighting, done with very limited use of the outcome variable, resulted in a great reduction in the number of candidate predictor degrees of freedom and hence increased the true predictive accuracy of the model. Scores summarizing clusters of clinical signs, along with temperature, respiratory rate, and weight-for-age after suitable non-linear transformation and allowance for interactions with age, are powerful predictors of the ordinal response. Graphical methods are effective for detecting lack of fit in the PO and CR models and for diagramming the final model. Model approximation is a better approach than stepwise methods (that use Y) to develop parsimonious clinical prediction tools. Approximate models inherit the shrinkage from the full model. For the ordinal model developed here,
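The model-approximation idea can be illustrated with a toy sketch (not the paper's computation): regress the full model's linear predictor on a reduced set of predictors by ordinary least squares; because the target is the (possibly penalized) full-model prediction rather than the raw outcome, the approximation inherits the full model's shrinkage. All data, coefficients, and names below are invented.

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for small linear systems."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    out = [0.0] * n
    for r in range(n - 1, -1, -1):
        out[r] = (M[r][n] - sum(M[r][k] * out[k] for k in range(r + 1, n))) / M[r][r]
    return out

def ols(Z, eta):
    """Least-squares fit of eta on design matrix Z via the normal equations."""
    p = len(Z[0])
    A = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    b = [sum(z[j] * e for z, e in zip(Z, eta)) for j in range(p)]
    return solve(A, b)

# Invented scenario: the full model uses x1, x2, x3, but x3 contributes little.
rng = random.Random(2)
n = 300
x1 = [rng.uniform(-1, 1) for _ in range(n)]
x2 = [rng.uniform(-1, 1) for _ in range(n)]
x3 = [rng.uniform(-1, 1) for _ in range(n)]
# Linear predictor of the already-fitted (possibly penalized) full model:
eta = [0.5 + 1.0 * a + 0.8 * b - 0.1 * c for a, b, c in zip(x1, x2, x3)]

# Approximate the full model using only an intercept, x1 and x2:
Z = [[1.0, a, b] for a, b in zip(x1, x2)]
coef = ols(Z, eta)
fit = [sum(c * z for c, z in zip(coef, row)) for row in Z]

# R-squared of the approximation: near 1 means the parsimonious model
# reproduces the full model's predictions closely.
mean_eta = sum(eta) / n
r2 = 1.0 - (sum((e - f) ** 2 for e, f in zip(eta, fit))
            / sum((e - mean_eta) ** 2 for e in eta))
```

In practice one would drop predictors in order of how little the approximation's accuracy (such as this R²) suffers, stopping when parsimony and fidelity are balanced.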

The bootstrap, as in a wide variety of other situations, is an effective tool for validating an ordinal logistic model with respect to discrimination and calibration without the need to hold back data during model development. The final CR ordinal logistic model accurately predicted severity of diagnosis/outcome (as summarized by several disparate outcome variables) in infants screened for pneumonia, sepsis, and meningitis in developing countries. There was nothing about the continuation ratio model that made it fit the data set better than other ordinal models (which we have found to be the case in one other large data set), and in fact there is some evidence that the equal-slopes CR model fits the data more poorly than the equal-slopes PO model. The real benefit of the CR model is that, using standard binary logistic model software, one can flexibly specify how the equal-slopes assumption is to be relaxed.

Faraway64 has demonstrated how all data-driven steps of the modelling process increase the real variance in 'final' parameter estimates, when one estimates variances without assuming that the final model was prespecified. For ordinal regression modelling, the most important modelling steps are (i) choice of predictor variables; (ii) selecting or modelling predictor transformations; and (iii) allowance for unequal slopes across y-cut-offs (that is, non-PO or non-CR). Regarding steps (ii) and (iii), one is tempted to rely on graphical methods such as residual plots to make detours in the strategy, but it is very difficult to estimate variances or to properly penalize assessments of predictive accuracy for subjective modelling decisions. Regarding (i), shrinkage has been proven to work better than stepwise variable selection when one is attempting to build a main-effects model.56 Choosing a shrinkage factor is a well-defined, smooth, and often unique process, as opposed to binary decisions on whether variables are 'in' or 'out' of the model. Likewise, instead of using arbitrary subjective (residual plots) or objective (χ² due to cohort × covariable interactions, that is, non-constant covariable effects) assessments, shrinkage can systematically allow model enhancements in so far as the information content in the data will support them, through the use of differential penalization. Shrinkage is a solution to the dilemma faced when the analyst attempts to choose between a parsimonious model and a more complex one that fits the data. Penalization does not require the analyst to make a binary decision, and it is a process that can be
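The shrinkage idea can be sketched with a toy example (not the paper's PMLE implementation): a ridge penalty on the slope of a one-predictor logistic model, fitted by penalized Newton-Raphson. Following the usual convention, the intercept is left unpenalized; the data, penalty value, and names are invented.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_penalized(x, y, lam=0.0, iters=30):
    """One-predictor logistic regression by Newton-Raphson, maximizing
    loglik - (lam/2)*b**2 -- a toy stand-in for PMLE. lam=0 gives the MLE."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = [expit(a + b * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        ga = sum(yi - pi for yi, pi in zip(y, p))
        gb = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p)) - lam * b
        haa = sum(w)
        hab = sum(xi * wi for xi, wi in zip(x, w))
        hbb = sum(xi * xi * wi for xi, wi in zip(x, w)) + lam   # penalized information
        det = haa * hbb - hab * hab
        if abs(det) < 1e-12:
            break
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Invented data with a true positive slope.
rng = random.Random(3)
x = [rng.uniform(-2.0, 2.0) for _ in range(300)]
y = [1 if rng.random() < expit(0.2 + 1.0 * xi) else 0 for xi in x]

_, b_mle = fit_penalized(x, y, lam=0.0)    # ordinary MLE
_, b_pen = fit_penalized(x, y, lam=25.0)   # heavily penalized fit
# The penalized slope is pulled toward zero relative to the MLE; varying
# lam traces a smooth path of fits rather than an in/out variable decision.
```

Differential penalization, as used in the paper, amounts to assigning different `lam` values to different groups of terms (for example, penalizing non-PO/non-CR departures more heavily than main effects), so that each enhancement is admitted only to the extent the data support it.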
