## Load packages
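
The code for this step is not shown here. As a minimal sketch, assuming the only package needed is `survival` (which provides `tmerge`, `tdc`, `event`, `Surv`, and `coxph`), loading it would look like this:

```r
# Assumption: survival is the only package this walkthrough needs.
# install.packages("survival")  # uncomment if the package is not installed
library(survival)
```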

## Why do we need `tmerge`?

In survival analysis, we differentiate between **time-independent covariates** and **time-dependent covariates**. Time-independent covariates are constant over time, while time-dependent covariates can vary over time.

As an example, assume we are modeling time-to-death in years, with exposure to a chemical as our time-dependent covariate of interest. Assume that we can quantify the exposure as 0, 1, or 2. We will use this exposure and the sex of the subject as covariates in our model.

To represent time-dependent covariates, we need multiple rows for each subject, where each row covers a time period during which the values of the time-dependent covariates stay constant.

Take for example a subject who started with `exposure = 1`. Then at 4 years, their exposure status changed to `exposure = 0`. Then at 7 years, their exposure status changed to `exposure = 2`. Then the subject died at 10 years (i.e., `status = 1`). We would need three rows to represent this subject, since there are 3 distinct time periods: `0-4`, `4-7`, and `7-10`. The `survival` package uses the names `tstart` and `tstop` to denote the beginning and end of each time period. So when structuring data from time-dependent variables, the rows for this subject would look like this:

id | tstart | tstop | exposure | status |
---|---|---|---|---|
1 | 0 | 4 | 1 | 0 |
1 | 4 | 7 | 0 | 0 |
1 | 7 | 10 | 2 | 1 |
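
The three-row layout above can be sketched in R as follows. This is only an illustration: the object names `subject_1` and `surv_obj` are hypothetical. It shows that passing `tstart`, `tstop`, and `status` to `Surv()` gives a counting-process response in which each row is an interval that is open on the left and closed on the right.

```r
library(survival)

# Hypothetical data frame holding the three rows for the example subject
subject_1 <- data.frame(
  id       = c(1, 1, 1),
  tstart   = c(0, 4, 7),
  tstop    = c(4, 7, 10),
  exposure = c(1, 0, 2),
  status   = c(0, 0, 1)   # the death is recorded only on the last interval
)

# Three-argument Surv() builds a counting-process survival object
surv_obj <- with(subject_1, Surv(tstart, tstop, status))
print(surv_obj)
```

Only the final row carries `status = 1`, because the subject was still alive at the end of the first two periods.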

## Using `tmerge`

### Creating example data

First, we need our data in two data frames, one for the time-independent covariates and one for the time-dependent covariates.

Here’s some example data for the time-independent covariates. We have 3 subjects, and each row contains their id, sex, survival time, and whether or not they experienced the event of interest (in this case, death). We use `event = 1` to indicate death, and `event = 0` to indicate censoring.

id | sex | surv_time | event |
---|---|---|---|
1 | M | 5 | 1 |
2 | F | 10 | 1 |
3 | F | 15 | 0 |
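
The code that creates this table is not shown here. A minimal sketch, assuming a plain base R `data.frame` with `sex` stored as a character column, and using the name `df_time_ind` that the later `tmerge` calls expect:

```r
# Time-independent covariates: one row per subject
# (column types are an assumption; the values come from the table above)
df_time_ind <- data.frame(
  id        = c(1, 2, 3),
  sex       = c("M", "F", "F"),
  surv_time = c(5, 10, 15),
  event     = c(1, 1, 0)    # 1 = death, 0 = censored
)
```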

And here’s some example time-dependent data. Each subject has a record for their `exposure` status at `time = 0`, and another record whenever their exposure status changes. For example, in the data below, subject 1 has:

- exposure status 0 from time 0 to 2
- exposure status 1 from time 2 to 4
- exposure status 2 from time 4 onwards

id | time | exposure |
---|---|---|
1 | 0 | 0 |
1 | 2 | 1 |
1 | 4 | 2 |
2 | 0 | 0 |
2 | 7 | 1 |
3 | 0 | 0 |
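
As with the first table, the code that creates this one is not shown. A minimal sketch, assuming a base R `data.frame` in long format (one row per exposure measurement) and the name `df_time_dep` used in the later `tmerge` call:

```r
# Time-dependent covariate: one row per subject per change in exposure
# (values come from the table above)
df_time_dep <- data.frame(
  id       = c(1, 1, 1, 2, 2, 3),
  time     = c(0, 2, 4, 0, 7, 0),   # time at which the exposure value starts
  exposure = c(0, 1, 2, 0, 1, 0)
)
```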

We will use the `tmerge` function to turn these two data frames into a single data frame that we can use in a time-dependent survival analysis. `tmerge` is called more than once in the process: once to set up each subject's follow-up time, and once more to add the time-dependent covariate.

First, we use `tmerge` with the time-independent variables. Note that we call `tmerge` with `df_time_ind` as both the `data1` and `data2` arguments. We must also specify the `id` variable and the `event` variable using the syntax `event(survival_time_variable, event_indicator_variable)`. The name on the left of the expression does not have to be `event`; whatever name is used becomes the name of the event indicator column in the result.

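```r
# First call: data1 and data2 are the same data frame.
# event() sets each subject's follow-up window and event indicator.
# Assumes library(survival) has been loaded.
df_time_ind <-
  tmerge(data1 = df_time_ind,
         data2 = df_time_ind,
         id = id,
         event = event(surv_time, event))
```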

Now the `df_time_ind` data frame looks like this:

id | sex | surv_time | event | tstart | tstop |
---|---|---|---|---|---|
1 | M | 5 | 1 | 0 | 5 |
2 | F | 10 | 1 | 0 | 10 |
3 | F | 15 | 0 | 0 | 15 |

Notice that the `tstart` and `tstop` variables have been added, and that `event` is now the event indicator set by `tmerge`.

Now to add the time-dependent variables, we call `tmerge` again, now with `df_time_ind` as the `data1` argument and `df_time_dep` as the `data2` argument. To specify the time-dependent exposure variable, we use the `tdc` function with the syntax `time_dependent_variable = tdc(time, time_dependent_variable)`.

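```r
# Second call: split each subject's follow-up at the exposure change times.
# Assumes library(survival) has been loaded.
df_final <-
  tmerge(data1 = df_time_ind,
         data2 = df_time_dep,
         id = id,
         exposure = tdc(time, exposure))
```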

Below we have our completed dataset with properly structured time-dependent variables.

id | sex | surv_time | event | tstart | tstop | exposure |
---|---|---|---|---|---|---|
1 | M | 5 | 0 | 0 | 2 | 0 |
1 | M | 5 | 0 | 2 | 4 | 1 |
1 | M | 5 | 1 | 4 | 5 | 2 |
2 | F | 10 | 0 | 0 | 7 | 0 |
2 | F | 10 | 1 | 7 | 10 | 1 |
3 | F | 15 | 0 | 0 | 15 | 0 |

Finally, to fit a model with the `survival` package, we pass `Surv(tstart, tstop, event_indicator_variable)` as the response, for example when fitting a Cox proportional hazards model.
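
The model-fitting code is not shown here, so the following is only a sketch of what such a call could look like. It assumes `sex` and `exposure` enter the model as main effects and that `exposure` is treated as numeric; both are assumptions, not choices confirmed above. `df_final` is rebuilt by hand from the completed table so that the block runs on its own.

```r
library(survival)

# df_final, copied from the completed table above
df_final <- data.frame(
  id        = c(1, 1, 1, 2, 2, 3),
  sex       = c("M", "M", "M", "F", "F", "F"),
  surv_time = c(5, 5, 5, 10, 10, 15),
  event     = c(0, 0, 1, 0, 1, 0),
  tstart    = c(0, 2, 4, 0, 7, 0),
  tstop     = c(2, 4, 5, 7, 10, 15),
  exposure  = c(0, 1, 2, 0, 1, 0)
)

# Counting-process response: one (tstart, tstop] interval per row
fit <- coxph(Surv(tstart, tstop, event) ~ sex + exposure, data = df_final)
print(fit)
```

With only three subjects and two deaths, the estimates are not meaningful, and `coxph` will likely warn that the coefficients did not converge to finite values. The point here is the form of the call, not the fitted numbers.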

## References

For more details, see this presentation and this report on further features of `tmerge`.