5 Event-history Analysis

Abstract

TODO (150-200 WORDS)

Major changes expected!

This page is a work in progress and major changes will be made over time.

In this section we will go beyond single-event data or at least view time-to-event data more generally as data in which one could observe multiple, potentially mutually exclusive events. From this more general point of view such data is sometimes refered to event-history data and its analysis as event-history analysis (EHA). The table below gives one way to classify the different types of data that could occur in EHA, where we differentiate between the number of occurences of the same event type (rows) and whether we consider the occurence of the same event or different events (columns).

	single outcome	many outcomes
one/first event	single event	competing risks
multiple events	recurrent events	multi state

Another way to think about event history is in terms of transitions between different states, as illustrated in Figure 5.1. Usually, a subject starts out in state 0 (for example, healthy) and transitions to different states from there. States from which further transitions are possible are called transient, otherwise a state is called terminal. If no transition happens within the follow-up time, the subject remains in the initial state, in event history terms the observation is censored for this transition.

In the single-event setting (Figure 5.1, left panel), a subject can only transition to one state (the event of interest). In the competing risks setting (Figure 5.1, middle panel, see Section 5.1), a subject could transition to any of the $K$ mutually exclusive states, thus the subject is at risk for a transition to multiple states. However, once one of them occurs, it is impossible to transition to another (for example, death from different causes). In both cases, the event needs not be terminal, we just end the observation after the first occurence of an event.

In the most general case, the multi-state setting (Figure 5.1 right panel, see Section 5.2), there are multiple transient and terminal states with potential back transitions (for example, moving between different stages of an illness with the possibility of (partial) recovery and death as terminal event). The recurrent events setting is as a special case of the multi-state setting but could also be treated as a more general case of the single event setting with correlated observations (Section 5.3 and Figure 5.2) .

Schematic illustration of event history analysis. — Figure 5.1: Illustration of different types of time-to-event outcomes. Left: Standard single-event setting with transition from initial state (0) to state (1); Middle: Competing risks setting with $k$ competing events. Once one of the $\{1,\ldots, k\}$ events is observed, the others cannot be observed; Right: Multi-state setting where subjects can transition between multiple transient and terminal states with possible back-transitions.

5.1 Competing Risks

In constrast to single-event survival analysis, competing risks are concerned with the time to the first of many, mutually exclusive events. Mutually exclusive here means that once one of the events occurs, we can not observe the others. As in the single event case, the events may not necessarily be terminal (such as death), instead they just signify the end of the period of interest for a given observation.

Formally, let $Y$ be the time-to-event as before. In the competing risks setting, there is now an additional random variable $E_Y\in \{1,\ldots,k\}$, which denotes one of $k$ competing events that can occur at event time $Y$. The survival outcome notation, $(T, \Delta)$ has to be extended to accomodate for the possible competing risks. Previously, $\Delta = 1$ if right-censoring is observed, this can be modified by multiplying $\Delta$ by the index of the event that occurs. Now, $\Delta \in \{0, 1, \ldots, k\}$ where $\Delta = 0$ still indicates censoring and $\Delta = j$ means the $j$th event occurred.

The predominate approach to competing-risks analysis is known as cause-specific hazards, where $h_{0e}$ is defined as the hazard for the transition from initial state $0$ to state $e =1,\ldots,k$:

\[ h_{0e}(\tau) = \lim_{\Delta \tau \to 0} \frac{P(\tau \leq Y \leq \tau + \Delta \tau, E = e\ |\ Y \geq \tau)}{\Delta \tau}, \; e = 1, \dots, k. \]

As competing events are mutually exclusive at any time $\tau$, it is useful to further define the:

all-cause hazard, $h(\tau) = \sum_{e = 1}^{k}h_{0e}(\tau)$: which is the…
all-cause cumulative hazard: $H(\tau) = \int_{0}^\tau h(u)\ du = \sum_{e=1}^{k}\int_{0}^\tau h_{0e}(u)\ du = \sum_{e=1}^k H_{0e}(\tau)$, and
all-cause survival probability: $S(\tau) = P(Y \geq \tau) = \exp(-H(\tau))$

Note that whilst the all-cause survival probability is defined in the same way as the standard survival probability, the interpretation is that none of the events occurred before $\tau$. The all-cause survival probability is also not directly estimated but instead calculated from the cause specific hazards.

Another important distribution function in competing risks analysis is the Cumulative Incidence Function (CIF), which denotes the probability of experiencing an event $E$ before time $\tau$

\[ F_{e}(\tau) = P(Y \leq \tau, E = e) = \int_{0}^\tau h_{0e}(u)S(u)\ du. \] Intuitively the CIF is the probability of not experiencing any event, $S(u)$, multiplied with the hazard of experiencing an event of type $e$, $h_{oe}(u)$.

Calculating the CIF (in addition to the cause-specific hazards) is particularly important when making statements about the absolute risk (probability) for an event of type $e$, as it depends on all cause-specific hazards via $S(\tau)$.

This will be particularly relevant when assesing how features affect the probability for observing a specific event. While some features might strongly affect some of the cause-specific hazards, if an event has overall low prevalence, then the absolute probability of observing the event will still be low. Hence the CIF is useful for incorporating this information into an interpretable quantity.

Following Beyersmann, Allignol, and Schumacher (2012), this book assumes that the cause-specific hazards model fully describes the data generating processes and doesn’t require an independence assumption for the competing events.

5.2 Multi-state Models

Major changes expected!

This page is a work in progress and major changes will be made over time.

5.3 Recurrent Events

Major changes expected!

This page is a work in progress and major changes will be made over time.

Schematic illustration of recurrent events transitions. — Figure 5.2: Illustration of transitions in the recurrent events setting. Left: “Clock reset” representation of recurrent events. Each time an event occurs, the clock is reset to 0 and we once again wait for occurence of an event (of the same type); Right: “Clock forward” representation. A subject transitions from 1-st, to 2-nd, etc. to K-th event occurence

--- abstract: TODO (150-200 WORDS) --- {{< include _setup.qmd >}} # Event-history Analysis {#sec-eha} {{< include _wip.qmd >}} ___ In this section we will go beyond single-event data or at least view time-to-event data more generally as data in which one could observe multiple, potentially mutually exclusive events. From this more general point of view such data is sometimes refered to *event-history data* and its analysis as *event-history analysis* (EHA). The table below gives one way to classify the different types of data that could occur in EHA, where we differentiate between the number of occurences of the same event type (rows) and whether we consider the occurence of the same event or different events (columns). | | single outcome | many outcomes | |-|-|-| | one/first event | single event | competing risks | | multiple events | recurrent events | multi state | Another way to think about event history is in terms of transitions between different states, as illustrated in @fig-eha. Usually, a subject starts out in state 0 (for example, healthy) and transitions to different states from there. States from which further transitions are possible are called *transient*, otherwise a state is called *terminal*. If no transition happens within the follow-up time, the subject remains in the initial state, in event history terms the observation is censored for this transition. In the *single-event setting* (@fig-eha, left panel), a subject can only transition to one state (the event of interest). In the *competing risks setting* (@fig-eha, middle panel, see @sec-competing-risks), a subject could transition to any of the $K$ mutually exclusive states, thus the subject is *at risk* for a transition to multiple states. However, once one of them occurs, it is impossible to transition to another (for example, death from different causes). In both cases, the event needs not be terminal, we just end the observation after the first occurence of an event. In the most general case, the *multi-state setting* (@fig-eha right panel, see @sec-multi-state), there are multiple transient and terminal states with potential back transitions (for example, moving between different stages of an illness with the possibility of (partial) recovery and death as terminal event). The *recurrent events setting* is as a special case of the multi-state setting but could also be treated as a more general case of the single event setting with correlated observations (@sec-recurrent-events and @fig-recurrent-events) . ![Illustration of different types of time-to-event outcomes. Left: Standard single-event setting with transition from initial state (0) to state (1); Middle: Competing risks setting with $k$ competing events. Once one of the $\{1,\ldots, k\}$ events is observed, the others cannot be observed; Right: Multi-state setting where subjects can transition between multiple transient and terminal states with possible back-transitions.](Figures/survival/event-history-analysis.png){#fig-eha fig-alt="Schematic illustration of event history analysis."} ## Competing Risks {#sec-competing-risks} In constrast to single-event survival analysis, competing risks are concerned with the time to the first of many, mutually exclusive events. Mutually exclusive here means that once one of the events occurs, we can not observe the others. As in the single event case, the events may not necessarily be terminal (such as death), instead they just signify the end of the period of interest for a given observation. Formally, let $Y$ be the time-to-event as before. In the competing risks setting, there is now an additional random variable $E_Y\in \{1,\ldots,k\}$, which denotes one of $k$ competing events that can occur at event time $Y$. The survival outcome notation, $(T, \Delta)$ has to be extended to accomodate for the possible competing risks. Previously, $\Delta = 1$ if right-censoring is observed, this can be modified by multiplying $\Delta$ by the index of the event that occurs. Now, $\Delta \in \{0, 1, \ldots, k\}$ where $\Delta = 0$ still indicates censoring and $\Delta = j$ means the $j$th event occurred. The predominate approach to competing-risks analysis is known as *cause-specific hazards*, where $h_{0e}$ is defined as the hazard for the transition from initial state $0$ to state $e =1,\ldots,k$: $$ h_{0e}(\tau) = \lim_{\Delta \tau \to 0} \frac{P(\tau \leq Y \leq \tau + \Delta \tau, E = e\ |\ Y \geq \tau)}{\Delta \tau}, \; e = 1, \dots, k. $$ As competing events are mutually exclusive at any time $\tau$, it is useful to further define the: - *all-cause hazard*, $h(\tau) = \sum_{e = 1}^{k}h_{0e}(\tau)$: which is the... - *all-cause cumulative hazard*: $H(\tau) = \int_{0}^\tau h(u)\ du = \sum_{e=1}^{k}\int_{0}^\tau h_{0e}(u)\ du = \sum_{e=1}^k H_{0e}(\tau)$, and - *all-cause survival probability*: $S(\tau) = P(Y \geq \tau) = \exp(-H(\tau))$ Note that whilst the all-cause survival probability is defined in the same way as the standard survival probability, the interpretation is that *none* of the events occurred before $\tau$. The all-cause survival probability is also not directly estimated but instead calculated from the cause specific hazards. Another important distribution function in competing risks analysis is the *Cumulative Incidence Function* (CIF), which denotes the probability of experiencing an event $E$ before time $\tau$ $$ F_{e}(\tau) = P(Y \leq \tau, E = e) = \int_{0}^\tau h_{0e}(u)S(u)\ du. $$ Intuitively the CIF is the probability of not experiencing any event, $S(u)$, multiplied with the hazard of experiencing an event of type $e$, $h_{oe}(u)$. Calculating the CIF (in addition to the cause-specific hazards) is particularly important when making statements about the absolute risk (probability) for an event of type $e$, as it depends on all cause-specific hazards via $S(\tau)$. This will be particularly relevant when assesing how features affect the probability for observing a specific event. While some features might strongly affect some of the cause-specific hazards, if an event has overall low prevalence, then the absolute probability of observing the event will still be low. Hence the CIF is useful for incorporating this information into an interpretable quantity.  Following @beyersmann.competing.2012, this book assumes that the cause-specific hazards model fully describes the data generating processes and doesn't require an independence assumption for the competing events. ## Multi-state Models {#sec-multi-state} {{< include _wip.qmd >}} ## Recurrent Events {#sec-recurrent-events} {{< include _wip.qmd >}} ![Illustration of transitions in the recurrent events setting. Left: "Clock reset" representation of recurrent events. Each time an event occurs, the clock is reset to 0 and we once again wait for occurence of an event (of the same type); Right: "Clock forward" representation. A subject transitions from 1-st, to 2-nd, etc. to K-th event occurence](Figures/survival/recurrent-events.png){#fig-recurrent-events fig-alt="Schematic illustration of recurrent events transitions."}