6 Survival Task

Abstract

TODO (150-200 WORDS)

Major changes expected!

This page is a work in progress and major changes will be made over time.

TODO

6.1 Survival Prediction Problems

This section continues by defining the survival problem. Defining a single ‘survival prediction problem’ (or ‘task’) is important mathematically as conflating survival problems could lead to confused interpretation and evaluation of models. Let $(\mathbf{X},\mathbf{t},\boldsymbol{\delta})$ and $\mathcal{D}$ be as defined in Section 4.1. A general survival prediction problem is one in which:

a survival dataset, $\mathcal{D}$, is split (Section 3.1.1) for training, $\mathcal{D}_{train}$, and testing, $\mathcal{D}_{test}$;
a survival model is fit on $\mathcal{D}_{train}$; and
the model predicts some representation of the unknown true survival time, $Y$, given $\mathcal{D}_{test}$.

The process of fitting is model-dependent, and can range from simple maximum likelihood estimation of model coefficients, to complex algorithms. The model fitting process is discussed in more abstract detail in (Section 3.1) and then concrete algorithms are discussed in (?sec-review). The different survival problems are separated by prediction types or prediction problems, which can also be thought of as predictions of different representations of $Y$. Four prediction types are possible:

The relative risk of an individual for experiencing an event: A single continuous ranking.
The time until an event occurs: A single continuous value.
The prognostic index for a model: A single continuous value.
The survival distribution: A probability distribution.

The first three of these are referred to as deterministic problems as they predict a single value whereas the fourth is probabilistic and returns a full survival distribution. Definitions of these are expanded on below but first note that survival predictions differ from other fields in two respects:

The outcome of interest is $Y$, which is different to the outcome used for model training, $(T=\min(Y, C), \Delta=\mathbb{I}(Y\leq C))$. This differs from, say, regression in which the same object (a single continuous variable) is used for fitting and predicting.
With the exception of the time-to-event prediction, all other prediction types do not predict $\mathbb{E}(Y)$ but some other related quantity.

Survival prediction problems must be clearly separated as they are inherently incompatible. For example, it is not meaningful to compare a relative risk prediction from one observation to a survival distribution of another. Whilst these prediction types are separated above, they can be viewed as special cases of each other. Both (1.) and (2.) may be viewed as variants of (3.); and (1.), (2.), and (3.) can all be derived from (4.); this is elaborated on below and discussed fully in Chapter 21.

Relative Risk/Ranking

This is a common survival problem and is defined as predicting a continuous rank for an individual’s relative risk of experiencing the event. For example, given three subjects, $\{i,j,k\}$, a relative risk prediction may predict the risk of event as $\{0.1, 0.5, 10\}$ respectively. From these predictions, the following types of conclusions can be drawn:

Conclusions comparing subjects. For example, $i$ is at the least risk; the risk of $j$ is only slightly higher than that of $i$ but the risk of $k$ is considerably higher than $j$; the corresponding ranks for $i,j,k,$ are $1,2,3$;
Conclusions comparing risk groups. For example, thresholding the risks at $1.0$ means that $i$ and $j$ are in a low-risk group whilst $k$ is in a high-risk group.

So whilst many important conclusions can be drawn from these predictions, the values themselves have no meaning when not compared to other individuals. Interpretation of these rankings has historically been conflicting, with some using the interpretation “higher ranking implies higher risk” whereas others may assume “higher ranking implies lower risk”. The difference is often due to model types (for example, PH and AFT models have opposite interpretation, Chapter 13) but can also be due to software implementation differences. In this book, a higher ranking will always imply a higher risk of event (as in the example above).

Time to Event

Predicting a time to event is the problem of predicting the expectation $\hat{y}=\mathbb{E}(y|\mathbf{x})$. A time-to-event prediction is a special case of a ranking prediction as an individual with a longer survival time will have a lower overall risk: if $\hat{y}_i,\hat{y}_j$ and $\hat{r}_i,\hat{r}_j$ are survival time and ranking predictions for subjects $i$ and $j$ respectively, then $\hat{y}_i > \hat{y}_j \rightarrow \hat{r}_i < \hat{r}_j$.

For practical purposes, time-to-event would be the ideal prediction type as it is easy to interpret and communicate. However, this type of prediction is rare because only a subset of the domain of $Y$ is observed due to censoring. For non-parametric methods, the estimated cdf is improper and thus integration requires extrapolation of the cdf. For parametric models, the distribution of event times is fully specified once the paramers of the assumed distribution have been estimated, however, if the parameters were estimated based on only a small subset of the possible domain of $Y$, this essentially still constitutes extraploation and will in most cases yield implausible predictions. A popular alternative is therefore to estimate the restricted mean survival time (RMST) (Han and Jung 2022; Andersen, Hansen, and Klein 2004).

Prognostic Index

Given covariates, $\mathbf{X}\in \mathbb{R}^{n \times p}$, and a vector of model coefficients, $\boldsymbol{\beta}\in \mathbb{R}^p$, the linear predictor is defined by $\boldsymbol{\eta}:= \mathbf{X}^\intercal\boldsymbol{\beta}\in \mathbb{R}^n$. The prognostic index is a term often used to refer to some transformation, $\phi$, on the linear predictor, $\phi(\eta)$, (which could simply be the identity function $(f(x) = x)$). Assuming a predictive function (for survival time, risk, or distribution defining function) of the form $g(\varphi)\phi(\eta)$, for some function $g$ and variables $\varphi$ where $g(\varphi)$ is constant for all observations (for example Cox PH (Section 13.1.2)), then predictions of $\eta$ are a special case of predicting a relative risk, as predictions of $\phi(\eta)$ if $\phi$ is rank preserving – this assumptions is sometimes written as “if there is a one-to-one mapping between the prediction and the expected survival times”. A higher prognostic index may imply a higher or lower risk of event, dependent on the model structure. As stated above, if the prognostic index is used for a rank prediction, then this book always assumes “higher value higher risk”.

Survival Distribution

Predicting a survival distribution refers specifically to predicting the distribution of an individual subject’s survival time, i.e., modelling the distribution of the event occurring over $\mathbb{R}_{\geq 0}$. Therefore, this is seen as the probabilistic analogue to the deterministic time-to-event prediction, these definitions are motivated by similar terminology in machine learning regression problems (Section 3.1).

Distributional prediction can in theory target any of the quantities introduced in Section 4.1.1 but predicting $S(t)$ and/or $h(t)$ is most common. Hazard based approaches are particularly relevant for non- and semi-parametric estimation of the distribution, where no assumptions are made about the underlying distribution of event times.

6.2 Survival Analysis Task

The survival prediction problems identified in Section 6.1 are now formalised as machine learning tasks.

Survival Task

Box 6.1 Let $(X,T,\Delta)$ be random variables t.v.i. $\mathcal{X}\times \mathcal{T}\times \{0,1\}$ where $\mathcal{X}\subseteq \mathbb{R}^p$ and $\mathcal{T}\subseteq \mathbb{R}_{\geq 0}$. Let $\mathcal{S}\subseteq \operatorname{Distr}(\mathcal{T})$ be a convex set of distributions on $\mathcal{T}$ and let $\mathcal{R}\subseteq \mathbb{R}$. Then,

The probabilistic survival task is the problem of predicting a conditional distribution over the positive Reals and is specified by $g: \mathcal{X}\rightarrow \mathcal{S}$.
The deterministic survival task is the problem of predicting a continuous value in the positive Reals and is specified by $g: \mathcal{X}\rightarrow \mathcal{T}$.
The survival ranking task is specified by predicting a continuous ranking in the Reals and is specified by $g: \mathcal{X}\rightarrow \mathcal{R}$.

Any other survival prediction type falls within one of the tasks in Box 6.1, for example predicting log-survival time is the deterministic task and predicting prognostic index or linear predictor is the ranking task. Removing the separation between the prognostic index and ranking prediction types is due to them both making predictions over $\mathbb{R}$ and hence their difference lies in interpretation only.

In this book, unless otherwise specified, the term survival task, will be used to refer to the probabilistic survival task. As a final note, these definitions are given in the most general case where the time variable is over $\mathbb{R}_{\geq 0}$. In practice, all models instead assume time is over $\mathbb{R}_{>0}$ and so a subject $i$ that experiences an outcome at $0$ is either deleted or their outcome time is set to $T_i = \epsilon$ for some very small $\epsilon \in \mathbb{R}_{>0}$.

--- abstract: TODO (150-200 WORDS) --- {{< include _setup.qmd >}} # Survival Task {#sec-survtsk} {{< include _wip.qmd >}} ___ TODO ## Survival Prediction Problems {#sec-surv-set-types} This section continues by defining the survival problem. Defining a single 'survival prediction problem' (or 'task') is important mathematically as conflating survival problems could lead to confused interpretation and evaluation of models. Let $(\XX,\tt,\dd)$ and $\calD$ be as defined in @sec-surv-set-math. A general survival prediction problem is one in which: * a survival dataset, $\calD$, is split (@sec-surv-setml-meth) for training, $\dtrain$, and testing, $\dtest$; * a survival model is fit on $\dtrain$; and * the model predicts some representation of the unknown true survival time, $Y$, given $\dtest$. The process of fitting is model-dependent, and can range from simple maximum likelihood estimation of model coefficients, to complex algorithms. The model fitting process is discussed in more abstract detail in (@sec-surv-setml) and then concrete algorithms are discussed in (@sec-review). The different survival problems are separated by *prediction types* or *prediction problems*, which can also be thought of as predictions of different representations of $Y$. Four prediction types are possible: 1. The *relative risk* of an individual for experiencing an event: A single continuous ranking. 2. The *time until an event* occurs: A single continuous value. 3. The *prognostic index* for a model: A single continuous value. 4. The *survival distribution*: A probability distribution. The first three of these are referred to as *deterministic* problems as they predict a single value whereas the fourth is *probabilistic* and returns a full survival distribution. Definitions of these are expanded on below but first note that survival predictions differ from other fields in two respects: * The outcome of interest is $Y$, which is different to the outcome used for model training, $(T=\min(Y, C), \Delta=\II(Y\leq C))$. This differs from, say, regression in which the same object (a single continuous variable) is used for fitting and predicting. * With the exception of the time-to-event prediction, all other prediction types do not predict $\EE(Y)$ but some other related quantity. Survival prediction problems must be clearly separated as they are inherently incompatible. For example, it is not meaningful to compare a relative risk prediction from one observation to a survival distribution of another. Whilst these prediction types are separated above, they can be viewed as special cases of each other. Both (1.) and (2.) may be viewed as variants of (3.); and (1.), (2.), and (3.) can all be derived from (4.); this is elaborated on below and discussed fully in @sec-car. #### Relative Risk/Ranking {.unnumbered .unlisted} This is a common survival problem and is defined as predicting a continuous rank for an individual's relative risk of experiencing the event. For example, given three subjects, $\{i,j,k\}$, a relative risk prediction may predict the risk of event as $\{0.1, 0.5, 10\}$ respectively. From these predictions, the following types of conclusions can be drawn: * Conclusions comparing subjects. For example, $i$ is at the least risk; the risk of $j$ is only slightly higher than that of $i$ but the risk of $k$ is considerably higher than $j$; the corresponding ranks for $i,j,k,$ are $1,2,3$; * Conclusions comparing risk groups. For example, thresholding the risks at $1.0$ means that $i$ and $j$ are in a low-risk group whilst $k$ is in a high-risk group. So whilst many important conclusions can be drawn from these predictions, the values themselves have no meaning when not compared to other individuals. Interpretation of these rankings has historically been conflicting, with some using the interpretation "higher ranking implies higher risk" whereas others may assume "higher ranking implies lower risk". The difference is often due to model types (for example, PH and AFT models have opposite interpretation, @sec-models-classical) but can also be due to software implementation differences. In this book, a higher ranking will always imply a higher risk of event (as in the example above). #### Time to Event {.unnumbered .unlisted} Predicting a time to event is the problem of predicting the expectation $\hat{y}=\EE(y|\xx)$. A time-to-event prediction is a special case of a ranking prediction as an individual with a longer survival time will have a lower overall risk: if $\hat{y}_i,\hat{y}_j$ and $\hat{r}_i,\hat{r}_j$ are survival time and ranking predictions for subjects $i$ and $j$ respectively, then $\hat{y}_i > \hat{y}_j \rightarrow \hat{r}_i < \hat{r}_j$. For practical purposes, time-to-event would be the ideal prediction type as it is easy to interpret and communicate. However, this type of prediction is rare because only a subset of the domain of $Y$ is observed due to censoring. For non-parametric methods, the estimated cdf is improper and thus integration requires extrapolation of the cdf. For parametric models, the distribution of event times is fully specified once the paramers of the assumed distribution have been estimated, however, if the parameters were estimated based on only a small subset of the possible domain of $Y$, this essentially still constitutes extraploation and will in most cases yield implausible predictions. A popular alternative is therefore to estimate the *restricted mean survival time* (RMST) [@han.restricted.2022; @andersen.regression.2004].  #### Prognostic Index {.unnumbered .unlisted} Given covariates, $\XX \in \Reals^{n \times p}$, and a vector of model coefficients, $\bbeta \in \Reals^p$, the linear predictor is defined by $\boldsymbol{\eta}:= \XX^\trans\bbeta \in \Reals^n$. The prognostic index is a term often used to refer to some transformation, $\phi$, on the linear predictor, $\phi(\eta)$, (which could simply be the identity function $(f(x) = x)$). Assuming a predictive function (for survival time, risk, or distribution defining function) of the form $g(\varphi)\phi(\eta)$, for some function $g$ and variables $\varphi$ where $g(\varphi)$ is constant for all observations (for example Cox PH (@sec-surv-models-crank)), then predictions of $\eta$ are a special case of predicting a relative risk, as predictions of $\phi(\eta)$ if $\phi$ is rank preserving -- this assumptions is sometimes written as "if there is a one-to-one mapping between the prediction and the expected survival times". A higher prognostic index may imply a higher or lower risk of event, dependent on the model structure. As stated above, if the prognostic index is used for a rank prediction, then this book always assumes "higher value higher risk".  #### Survival Distribution {.unnumbered .unlisted} Predicting a survival distribution refers specifically to predicting the distribution of an individual subject's survival time, i.e., modelling the distribution of the event occurring over $\NNReals$. Therefore, this is seen as the probabilistic analogue to the deterministic time-to-event prediction, these definitions are motivated by similar terminology in machine learning regression problems (@sec-surv-setml). Distributional prediction can in theory target any of the quantities introduced in @sec-distributions but predicting $S(t)$ and/or $h(t)$ is most common. Hazard based approaches are particularly relevant for non- and semi-parametric estimation of the distribution, where no assumptions are made about the underlying distribution of event times. ## Survival Analysis Task {#sec-surv-setmltask} The survival prediction problems identified in @sec-surv-set-types are now formalised as machine learning tasks. :::: {.callout-note icon=false} ## Survival Task ::: {#cnj-task-surv} Let $(X,T,\Delta)$ be random variables t.v.i. $\calX \times \calT \times \bset$ where $\calX \subseteq \Reals^p$ and $\calT \subseteq \NNReals$. Let $\calS \subseteq \distrT$ be a convex set of distributions on $\calT$ and let $\calR \subseteq \Reals$. Then, * The *probabilistic survival task* is the problem of predicting a conditional distribution over the positive Reals and is specified by $g: \calX \rightarrow \calS$. * The *deterministic survival task* is the problem of predicting a continuous value in the positive Reals and is specified by $g: \calX \rightarrow \calT$. * The *survival ranking task* is specified by predicting a continuous ranking in the Reals and is specified by $g: \calX \rightarrow \calR$. ::: :::: Any other survival prediction type falls within one of the tasks in @cnj-task-surv, for example predicting log-survival time is the deterministic task and predicting prognostic index or linear predictor is the ranking task. Removing the separation between the prognostic index and ranking prediction types is due to them both making predictions over $\Reals$ and hence their difference lies in interpretation only. In this book, unless otherwise specified, the term *survival task*, will be used to refer to the probabilistic survival task. As a final note, these definitions are given in the most general case where the time variable is over $\NNReals$. In practice, all models instead assume time is over $\PReals$ and so a subject $i$ that experiences an outcome at $0$ is either deleted or their outcome time is set to $T_i = \epsilon$ for some very small $\epsilon \in \PReals$.