## Info

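Although the p-value remains significant regardless of the correction used, it is interesting to note that for this example pms (the Miller and Siegmund adjustment) is more conservative than even the standard Bonferroni correction. This is expected because of the small number of candidate cutpoints. The Bonferroni factor grows with the number of cutpoints tested, whereas the Miller and Siegmund approximation behaves as if the search were carried out over a continuum of cutpoints [3, 11].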
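To see why, write the two adjustments side by side. Here $k$ is the number of candidate cutpoints (a symbol introduced for this illustration), $p_{\min}$ the smallest observed p-value, $\varphi$ and $\Phi$ the standard normal density and distribution function, and $\varepsilon_{\text{low}}$, $\varepsilon_{\text{high}}$ the proportions of $x$ below the lowest cutpoint and at or below the highest cutpoint. The Miller and Siegmund expression is the large-sample approximation as it is commonly quoted (for example by Altman and colleagues). Treat it as a sketch to be checked against the paper's own statement of the formula.

$$
\begin{aligned}
p_{\text{Bonf}} &= \min\left(1,\; k\,p_{\min}\right),\\
p_{\text{MS}} &= \varphi(z)\left(z-\frac{1}{z}\right)\log\frac{\varepsilon_{\text{high}}\,(1-\varepsilon_{\text{low}})}{(1-\varepsilon_{\text{high}})\,\varepsilon_{\text{low}}}+\frac{4\,\varphi(z)}{z},\qquad z=\Phi^{-1}\!\left(1-\frac{p_{\min}}{2}\right).
\end{aligned}
$$

Only $p_{\text{Bonf}}$ grows with $k$. $p_{\text{MS}}$ depends on the cutpoints only through $\varepsilon_{\text{low}}$ and $\varepsilon_{\text{high}}$, because it approximates a search over every possible cutpoint in that range. With few cutpoints the Bonferroni factor therefore stays small. For example, with $p_{\min} = 0.01$, $k = 5$, $\varepsilon_{\text{low}} = 0.10$ and $\varepsilon_{\text{high}} = 0.90$, $p_{\text{Bonf}} = 0.05$ while $p_{\text{MS}} \approx 0.16$.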

It is common practice to categorize a continuous prognostic variable for clinical use, whether to set up practical eligibility criteria, to define stratification variables for clinical trials, or to guide clinicians and patients in their choice of therapy. This paper provided guidelines for applying a progression of exploratory categorization methods, together with the adjusted minimum p-value approach, to find the best cutpoint for the given data. Several p-value adjustment methods were discussed. These methods account for the problem of multiple correlated tests and re-assess the prognostic significance of the newly dichotomized variable. Two comprehensive case studies involving binary and censored outcomes demonstrated the relative ease of applying these methods, and programs for implementation were provided.

A thorough exploration of the data using percentile-grouped, smoothed, and predicted failure time plots helped to confirm the appropriateness of proceeding with the cutpoint analysis in case study 1, and suggested an interval on which to perform the search in case study 2. In case study 1, graphical exploration of the pattern of chi-squared values over the entire range of cutpoints considered provided further insight into the possibility of a two-cutpoint model. In case study 2, exploring the patterns of both the chi-squared values and the relative risks over the entire range of candidate cutpoints ensured that a clinically relevant cutpoint was selected. Both of these observations would have been missed in a typical analysis focusing only on the minimum p-value.
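A percentile-grouped plot summarizes the outcome within groups formed from the ordered values of the prognostic variable. The Python sketch below computes the points such a plot would display. It assumes groups of (nearly) equal size and a 0/1 outcome, so the group mean is the event rate. The function name `percentile_groups` and its `groups` parameter are hypothetical, not taken from the paper's code.

```python
def percentile_groups(x, y, groups=10):
    """Split observations into `groups` nearly equal-sized groups ordered by x.

    Returns one (x_min, x_max, n, mean_y) tuple per non-empty group. With a
    0/1 outcome, mean_y is the observed event rate in that group. Tied x
    values may be split across neighbouring groups.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    out = []
    for g in range(groups):
        lo = g * n // groups          # first index of group g
        hi = (g + 1) * n // groups    # one past the last index of group g
        chunk = pairs[lo:hi]
        if not chunk:
            continue
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        out.append((xs[0], xs[-1], len(chunk), sum(ys) / len(ys)))
    return out
```

Plotting `mean_y` against the group midpoints gives the percentile-grouped curve. A flat pattern argues against dichotomizing, while a step or threshold pattern supports a cutpoint search.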

Usually a prognostic variable is dichotomized in a univariable setting and then included in a multivariable model. While a univariable exploration of the most appropriate categorization is a necessary first step, it must be recognized that the cutpoint may depend on the levels of other independent prognostic variables. In reality, breast cancer treatment decisions are based on the 'TNM' (tumour size, nodal involvement, degree of metastasis) classification system, of which tumour size is only one factor [1]. For example, a patient whose tumour is smaller than 2 cm is placed in a particular risk category only if cancer is found in the lymph nodes, whereas a patient whose tumour is 2-5 cm is placed in that same category when there is no lymph node involvement. Efforts have begun to incorporate cutpoint selection and p-value adjustments in a multivariable setting using classification and regression trees, but further evaluation of these …

In summary, the following steps are recommended for performing a comprehensive cutpoint analysis:

1. Identify continuous prognostic variable in appropriate regression setting.

(c) plot of lowess-smoothed Y curve over range of observed x's;

(b) dichotomize X at each cutpoint and compute association with Y, using:

(c) select cutpoint associated with min p-value, max chi-squared or relative risk;

(d) re-assess prognostic significance of dichotomized X by adjusting p-value using:
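Steps (b) to (d) presuppose a fixed set of candidate cutpoints, and the Miller and Siegmund adjustment also needs the proportions epsi.low and epsi.high defined for its output. The Python sketch below restricts the search to distinct observed values between two percentiles and then computes those proportions. The 10th/90th percentile defaults and the function names `candidate_cutpoints` and `epsilons` are assumptions for illustration only.

```python
def candidate_cutpoints(x, low=0.10, high=0.90):
    """Distinct observed values v with at least a fraction `low` of the data
    strictly below v and at most a fraction `high` at or below v
    (the percentile bounds are assumed defaults, not taken from the paper)."""
    n = len(x)
    cuts = []
    for v in sorted(set(x)):
        below = sum(1 for xi in x if xi < v) / n
        at_or_below = sum(1 for xi in x if xi <= v) / n
        if below >= low and at_or_below <= high:
            cuts.append(v)
    return cuts


def epsilons(x, cuts):
    """epsi.low: proportion of x below the lowest cutpoint;
    epsi.high: proportion of x at or below the highest cutpoint."""
    n = len(x)
    eps_low = sum(1 for xi in x if xi < min(cuts)) / n
    eps_high = sum(1 for xi in x if xi <= max(cuts)) / n
    return eps_low, eps_high
```

For x = 1, ..., 10 this keeps the cutpoints 2 to 9 and gives epsi.low = 0.1 and epsi.high = 0.9. Keeping the extremes out of the search avoids cutpoints that leave only a handful of patients in one group.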

Below is the code for the functions used to perform the cutpoint analyses presented in Sections 3 and 4. It is written in the S-plus programming language (version 3.3). A line beginning with '#' indicates a comment. The code for each function is preceded by a list describing what it does, the input it requires and the output it produces, to facilitate translation into other languages.

Evaluates potential cutpoints using chi-squared, p-value and relative risk criteria for data with binary outcomes (see 'minimum p-value approach', Section 2.2).

pvalue = vector of p-values associated with above chi-squared values; Relrisk = vector of relative risks for above table.

```
function(x, ybin, xcutint)
{
    tmp1 <- sapply(sort(unique(xcutint)), function(x0, x, ybin)
    {
        # sapply is a looping function that applies the function given as its
        # 2nd argument repeatedly to each element in the list given as its 1st argument
        # x0 is the cutpoint being tested; its value is read from the list
        # given as the 1st argument to sapply
        tmp <- chisq.test(1 * (x <= x0), ybin)
        tab1 <- table(1 * (x > x0), ybin)
        # relative risk from the 2 x 2 table, with 0.5 added to every cell
        rr <- ((tab1[1, 1] + 0.5)/((tab1[1, 1] + 0.5) + (tab1[2, 1] + 0.5)))/
            ((tab1[1, 2] + 0.5)/((tab1[1, 2] + 0.5) + (tab1[2, 2] + 0.5)))
        # c = collect these items into a vector; sapply stores each vector
        # as one column of its result
        c(x0, tmp$statistic, tmp$p.value, rr)
    }, x, ybin)
    # transpose so that each row corresponds to one cutpoint
    tmp1 <- t(tmp1)
    dimnames(tmp1) <- list(NULL, c("Cutpoint", "Chisquare", "pvalue", "Relrisk"))
    tmp1
}
```
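For readers translating the binary-outcome cutpoint search, a minimal Python sketch follows. It assumes that ybin is coded 0/1, that every cutpoint leaves observations on both sides, and that both outcomes occur. It also assumes that the chi-squared test applies Yates' continuity correction, as `chisq.test` does by default for 2 x 2 tables in R (taken here to match S-plus 3.3). The names `chisq_2x2` and `min_p_binary` are hypothetical.

```python
import math


def chisq_2x2(a, b, c, d, yates=True):
    """Pearson chi-squared statistic (1 df) and p-value for the 2 x 2 table
    [[a, b], [c, d]], optionally with Yates' continuity correction."""
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    cells = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            diff = abs(cells[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff * diff / expected
    # upper tail of chi-squared with 1 df equals P(|Z| > sqrt(stat))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def min_p_binary(x, ybin, cutpoints):
    """One (cutpoint, chisquare, pvalue, relrisk) tuple per distinct cutpoint."""
    rows = []
    for x0 in sorted(set(cutpoints)):
        # tab[g][y]: g = 0 if x <= x0 else 1; y = observed 0/1 outcome
        tab = [[0, 0], [0, 0]]
        for xi, yi in zip(x, ybin):
            tab[0 if xi <= x0 else 1][yi] += 1
        stat, p = chisq_2x2(tab[0][0], tab[0][1], tab[1][0], tab[1][1])
        # same ratio as the S-plus rr, with 0.5 added to every cell
        rr = ((tab[0][0] + 0.5) / (tab[0][0] + tab[1][0] + 1.0)) / (
            (tab[0][1] + 0.5) / (tab[0][1] + tab[1][1] + 1.0))
        rows.append((x0, stat, p, rr))
    return rows
```

The selected cutpoint is then `min(rows, key=lambda r: r[2])`. With x = 1, ..., 8, an outcome that switches from 0 to 1 after x = 4, and candidates 2, 4 and 6, this picks 4 with a chi-squared value of 4.5.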

Function Name MINPCENS.

### Function Performed

Evaluates potential cutpoints using chi-squared, p-value and relative risk criteria for data with censored outcomes (see 'minimum p-value approach', Section 2.2).

Input

x = vector of observed values of continuous prognostic factor;

time = vector of observed values of outcome time to event;

status = vector of observed values of censoring indicator (0 means censored);

xcutint = vector of potential cutpoints.

Output

An object (matrix) containing the following variables:

Cutpoint = sorted version of xcutint;

Chisquare = vector of chi-squared values from log-rank tests (see Section 2.2);

pvalue = vector of p-values associated with above chi-squared values;

Relrisk = vector of relative risks based on univariable Cox regressions.
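The chi-squared column for censored outcomes comes from two-sample log-rank tests. The Python sketch below uses the standard log-rank statistic with the hypergeometric variance to translate that part. The S-plus function presumably relied on built-in survival routines that are not shown here, and the Cox-based relative risk is not covered. The names `logrank_two_group` and `min_p_censored` are hypothetical, and status is assumed to be coded 1 = event, 0 = censored, as in the input description.

```python
import math


def logrank_two_group(time, status, group):
    """Log-rank chi-squared statistic (1 df) and p-value comparing group 1
    with group 0. status: 1 = event, 0 = censored; group: 0/1 indicator."""
    event_times = sorted({t for t, s in zip(time, status) if s == 1})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        # risk set: subjects still under observation just before time t
        risk = [g for ti, g in zip(time, group) if ti >= t]
        n, n1 = len(risk), sum(risk)
        d = sum(1 for ti, s in zip(time, status) if ti == t and s == 1)
        d1 = sum(1 for ti, s, g in zip(time, status, group)
                 if ti == t and s == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n          # observed minus expected in group 1
        if n > 1:
            # hypergeometric variance of d1 given the risk-set margins
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    stat = o_minus_e ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2.0))


def min_p_censored(x, time, status, cutpoints):
    """(cutpoint, chisquare, pvalue) per distinct cutpoint; x > cutpoint is group 1."""
    rows = []
    for x0 in sorted(set(cutpoints)):
        group = [1 if xi > x0 else 0 for xi in x]
        stat, p = logrank_two_group(time, status, group)
        rows.append((x0, stat, p))
    return rows
```

The statistic is symmetric in the two groups, so it does not matter whether the group above or below the cutpoint is coded 1.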

Computes the adjusted minimum p-value formulae derived by Miller and Siegmund, Altman, and …

epsi.high = proportion of observed values of factor x that are at or below the highest cutpoint;

epsi.low = proportion of observed values of factor x that are below the lowest cutpoint value;

Cut.point = (scalar) the Cutpoint associated with the minimum pvalue;

```
pval <- c(Cut.point, round(pmin, 6), epsi.high, epsi.low, round(pacor, 6))
names(pval) <- c("Cut.point", "p-min", "epsi.high", "epsi.low",
pval
```
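The quantity returned as pacor can be sketched in Python as follows. This assumes the Miller and Siegmund approximation in the form commonly quoted by Altman and colleagues, with epsi.low and epsi.high as defined in the output list. The function name `p_miller_siegmund` is hypothetical, and the formula should be checked against the paper's Section 2 before relying on it.

```python
import math
from statistics import NormalDist


def p_miller_siegmund(pmin, eps_low, eps_high):
    """Approximate adjusted minimum p-value (Miller and Siegmund form) for a
    search over cutpoints between the eps_low and eps_high quantiles of x."""
    z = NormalDist().inv_cdf(1 - pmin / 2)   # z such that P(|Z| > z) = pmin
    phi = NormalDist().pdf(z)                # standard normal density at z
    log_term = math.log(eps_high * (1 - eps_low) / ((1 - eps_high) * eps_low))
    return phi * (z - 1 / z) * log_term + 4 * phi / z
```

With pmin = 0.01 and the 10%/90% bounds, the sketch gives roughly 0.16. That is larger than the Bonferroni value 5 x 0.01 = 0.05 for five cutpoints, which illustrates why this correction can be the more conservative one when few cutpoints are examined.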

```
# PALT510(Cutpoint, pvalue)
function(Cutpoint, pvalue)
{
    pmin <- min(pvalue)
    Cut.point <- Cutpoint[pvalue == min(pvalue)]
    # Altman's approximations for a search between the 10th and 90th (pcor10)
    # or the 5th and 95th (pcor5) percentiles
    pcor10 <- -1.63 * pmin * (1 + 2.35 * log(pmin))
    pcor5 <- -3.13 * pmin * (1 + 1.65 * log(pmin))
    pval <- c(Cut.point, round(pmin, 6), round(pcor5, 6), round(pcor10, 6))
    names(pval) <- c("Cut.point", "p-min", "palt5", "palt10")
    pval
}
```
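The two closed-form corrections computed by PALT510 translate directly. The Python sketch below reproduces the same constants and returns the 5% and 10% versions. The name `p_altman` is hypothetical. Like the S-plus code, it does not cap the result at 1.

```python
import math


def p_altman(pmin):
    """Altman's approximate corrections to the minimum p-value, as coded in
    PALT510: pcor5 for a 5th-95th percentile search, pcor10 for 10th-90th."""
    log_p = math.log(pmin)
    pcor5 = -3.13 * pmin * (1 + 1.65 * log_p)
    pcor10 = -1.63 * pmin * (1 + 2.35 * log_p)
    return pcor5, pcor10
```

For a minimum p-value of 0.05, this gives about 0.62 for the wider (5%) search range and about 0.49 for the 10% range. Searching more of the data costs more in the adjustment.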

```
# PMODBONF(x, Cutpoint, pvalue)
function(x, Cutpoint, pvalue)
{
    pmin <- min(pvalue)
    Cut.point <- Cutpoint[pvalue == min(pvalue)]
    z <- qnorm(1 - pmin/2)
    f.z <- dnorm(z)
    n <- length(x)
    dsum <- 0
    # one correction term per pair of adjacent (sorted) cutpoints
    if(length(Cutpoint) > 1) {
        for(i in 1:(length(Cutpoint) - 1)) {
            l <- length(x[x <= Cutpoint[i]])
            l1 <- length(x[x <= Cutpoint[i + 1]])
            t <- sqrt(1 - (l * (n - l1))/((n - l) * l1))
            d <- sqrt(2/pi) * f.z * (t - (((z^2)/4 - 1) * t^3)/6)
            dsum <- dsum + d
        }
    }
    pmodbonf <- pmin + dsum
    pval <- c(Cut.point, round(pmin, 6), round(pmodbonf, 6))
    names(pval) <- c("Cut.point", "p-min", "pmodbonf")
    pval
}
```
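The modified Bonferroni correction adds to pmin one term for each pair of adjacent cutpoints. Each term shrinks as the two dichotomizations become more highly correlated, that is, as fewer observations lie between the two cutpoints. The Python sketch below translates that computation. It assumes that every cutpoint lies strictly inside the range of x, so that no group is empty. The name `p_mod_bonferroni` is hypothetical.

```python
import math
from statistics import NormalDist


def p_mod_bonferroni(x, cutpoints, pvalues):
    """pmin plus one correction term per pair of adjacent cutpoints,
    following the PMODBONF computation."""
    pmin = min(pvalues)
    z = NormalDist().inv_cdf(1 - pmin / 2)
    f_z = NormalDist().pdf(z)
    n = len(x)
    cuts = sorted(cutpoints)
    dsum = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        l = sum(1 for xi in x if xi <= lo)    # observations at or below cutpoint i
        l1 = sum(1 for xi in x if xi <= hi)   # observations at or below cutpoint i + 1
        t = math.sqrt(1 - (l * (n - l1)) / ((n - l) * l1))
        dsum += math.sqrt(2 / math.pi) * f_z * (t - ((z * z / 4 - 1) * t ** 3) / 6)
    return pmin + dsum
```

With a single cutpoint there are no adjacent pairs, and the result is simply pmin. Unlike the plain Bonferroni factor, each added term accounts for the correlation between neighbouring tests.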
