Survival Analysis Step 2: Subgroup and Interaction • leo.ukb

What this step is for

This tutorial is the second survival-analysis step in leo.ukb. After the main Cox model, the next practical question is usually:

does the exposure-outcome association differ across groups, and if so, on what scale?

That question can lead to three related but different outputs:

a formal multiplicative interaction test
a subgroup table that describes the pattern of hazard ratios
an additive-interaction summary for binary-binary contrasts

The key is not to mix them up. Step 2 is where we learn how to choose the right one and how to read the results without over-interpreting them.

Start from the data and analytic roles

We keep using survival::lung, but this time we build the data frame around teaching roles rather than only model fitting.

lung_df <- dplyr::transmute(
  survival::lung,
  outcome = as.integer(status == 2),
  outcome_censor = time / 365.25,
  age = age,
  age_high = factor(
    ifelse(age >= 65, "Older", "Younger"),
    levels = c("Younger", "Older")
  ),
  sex = factor(sex, levels = c(1, 2), labels = c("Male", "Female")),
  ecog_group3 = factor(
    dplyr::case_when(
      ph.ecog == 0 ~ "ECOG0",
      ph.ecog == 1 ~ "ECOG1",
      ph.ecog >= 2 ~ "ECOG2plus",
      TRUE ~ NA_character_
    ),
    levels = c("ECOG0", "ECOG1", "ECOG2plus")
  ),
  ecog_binary = factor(
    ifelse(ph.ecog >= 2, "Poor", "Good"),
    levels = c("Good", "Poor")
  )
)

Before fitting anything, look at the structure the way you would in a real paper:

dplyr::glimpse(lung_df)
#> Rows: 228
#> Columns: 7
#> $ outcome        <int> 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ outcome_censor <dbl> 0.8377823, 1.2457221, 2.7652293, 0.5749487, 2.4175222, …
#> $ age            <dbl> 74, 68, 56, 57, 60, 74, 68, 71, 53, 61, 57, 68, 68, 60,…
#> $ age_high       <fct> Older, Older, Younger, Younger, Younger, Older, Older, …
#> $ sex            <fct> Male, Male, Male, Male, Male, Male, Female, Female, Mal…
#> $ ecog_group3    <fct> ECOG1, ECOG0, ECOG0, ECOG1, ECOG0, ECOG1, ECOG2plus, EC…
#> $ ecog_binary    <fct> Good, Good, Good, Good, Good, Good, Poor, Poor, Good, P…

Analytic roles in the Step 2 teaching data frame
Variable	Type	Teaching role
`outcome`	Binary event indicator	Outcome event
`outcome_censor`	Follow-up time in years	Outcome time
`age`	Continuous exposure	Main exposure in multiplicative examples
`sex`	Binary modifier	Binary modifier for multiplicative interaction
`ecog_group3`	3-level ECOG performance-status grouping	Multi-level modifier for subgroup display
`age_high`	Binary age grouping	Binary age variable for additive interaction
`ecog_binary`	Binary ECOG performance-status grouping	Binary ECOG modifier for additive interaction

It is also worth checking the subgroup and joint-cell sizes before talking about effect modification.

knitr::kable(
  dplyr::count(lung_df, ecog_group3, outcome),
  caption = "Outcome counts across the multi-level subgroup"
)

Outcome counts across the multi-level subgroup
ecog_group3	outcome	n
ECOG0	0	26
ECOG0	1	37
ECOG1	0	31
ECOG1	1	82
ECOG2plus	0	6
ECOG2plus	1	45
NA	1	1

knitr::kable(
  dplyr::count(lung_df, age_high, ecog_binary),
  caption = "Four joint cells for the additive-interaction example"
)

Four joint cells for the additive-interaction example
age_high	ecog_binary	n
Younger	Good	107
Younger	Poor	20
Younger	NA	1
Older	Good	69
Older	Poor	31

The second table is especially important because additive interaction needs a binary-by-binary setup with all four cells present.

Choose the scale before fitting

The easiest way to avoid confusion is to decide what question you are asking before you pick the function.

decision_table <- data.frame(
  Goal = c(
    "Formal multiplicative interaction",
    "Subgroup display",
    "Additive interaction"
  ),
  Exposure_types = c(
    "Continuous, binary, or multi-level categorical",
    "Continuous, binary, or categorical",
    "Binary only"
  ),
  Modifier_types = c(
    "Continuous, binary, or multi-level categorical",
    "Binary or multi-level categorical",
    "Binary only"
  ),
  Main_function = c(
    "leo_cox_interaction()",
    "leo_cox_subgroup()",
    "leo_cox_add_interaction()"
  ),
  Main_question = c(
    "Does the association differ on the hazard-ratio scale?",
    "What does the HR look like within each stratum?",
    "Does the joint effect exceed what we expect under additivity?"
  )
)

knitr::kable(
  decision_table,
  caption = "Which Step 2 tool matches which scientific question?"
)

Which Step 2 tool matches which scientific question?
Goal	Exposure_types	Modifier_types	Main_function	Main_question
Formal multiplicative interaction	Continuous, binary, or multi-level categorical	Continuous, binary, or multi-level categorical	leo_cox_interaction()	Does the association differ on the hazard-ratio scale?
Subgroup display	Continuous, binary, or categorical	Binary or multi-level categorical	leo_cox_subgroup()	What does the HR look like within each stratum?
Additive interaction	Binary only	Binary only	leo_cox_add_interaction()	Does the joint effect exceed what we expect under additivity?

This table is the practical rule of thumb:

multiplicative interaction is the most general option
subgroup tables need a categorical modifier because they need explicit strata
additive interaction is currently the most restrictive because RERI, AP, and S require a binary-by-binary contrast

If a modifier is continuous, leo_cox_interaction() can still test it formally, but leo_cox_subgroup() is only appropriate after a clinically meaningful grouping has been defined.

Multiplicative interaction is the general formal test

In a Cox model, multiplicative interaction asks whether the exposure effect differs across levels of a modifier on the hazard-ratio scale.

Example 1: continuous exposure by binary modifier

Here we ask whether the age-outcome association differs by sex.

res_inter_sex <- leo_cox_interaction(
  df = lung_df,
  y_out = c("outcome", "outcome_censor"),
  x_exp = "age",
  x_inter = "sex",
  x_cov = list(
    "Crude" = NULL,
    "Model A" = c("ecog_group3")
  ),
  verbose = FALSE
)

res_inter_sex$result
#>     Model Exposure Interaction   N Case N Control N Person-years Interaction DF
#> 1   Crude      age         sex 227    164        63     190.3409              1
#> 2 Model A      age         sex 227    164        63     190.3409              1
#>   P for interaction Exposure class Interaction class
#> 1             0.602     Continuous            Binary
#> 2             0.180     Continuous            Binary

How to read this result:

Exposure class = Continuous means age is handled as a continuous exposure
Interaction class = Binary means sex is treated as a binary modifier
P for interaction is the formal heterogeneity test

In this teaching example, the adjusted P for interaction is 0.18, so there is no clear evidence that the age-outcome association differs by sex on the multiplicative hazard-ratio scale.

That is already a useful result. A non-significant interaction does not mean “nothing happened”; it means the data do not strongly support effect modification on this scale.

Example 2: continuous exposure by multi-level categorical modifier

The same function also works for a multi-level subgroup variable.

res_inter_ecog <- leo_cox_interaction(
  df = lung_df,
  y_out = c("outcome", "outcome_censor"),
  x_exp = "age",
  x_inter = "ecog_group3",
  x_cov = list("Crude" = NULL),
  verbose = FALSE
)

res_inter_ecog$result
#>   Model Exposure Interaction   N Case N Control N Person-years Interaction DF
#> 1 Crude      age ecog_group3 227    164        63     190.3409              2
#>   P for interaction Exposure class      Interaction class
#> 1             0.037     Continuous Categorical (3 levels)

Here the modifier is a 3-level categorical variable, and the interaction test returns P for interaction = 0.037.

That means the age-outcome association appears to differ across ECOG groups on the multiplicative hazard-ratio scale. Because the modifier is categorical, a subgroup table is now a natural follow-up.

Subgroup analysis describes the pattern after the formal test

leo_cox_subgroup() is not a replacement for the interaction test. It is the display layer that helps you describe what the pattern looks like once a categorical modifier has been chosen.

res_sub_ecog <- leo_cox_subgroup(
  df = lung_df,
  y_out = c("outcome", "outcome_censor"),
  x_exp = "age",
  x_subgroup = "ecog_group3",
  x_cov = list("Crude" = NULL),
  verbose = FALSE
)

res_sub_ecog$result
#>      Subgroup     Level   N Model Exposure Outcome    Exposure level Case N
#> 1 ecog_group3     ECOG0  63 Crude      age outcome Per unit increase     37
#> 2 ecog_group3     ECOG1 113 Crude      age outcome Per unit increase     82
#> 3 ecog_group3 ECOG2plus  51 Crude      age outcome Per unit increase     45
#>   Control N Person-years    HR       95% CI P value P for interaction
#> 1        26     60.69268 1.059 1.013, 1.107   0.011             0.037
#> 2        31     97.28131 1.004 0.980, 1.029   0.731             0.037
#> 3         6     32.36687 0.994 0.958, 1.030   0.733             0.037

This subgroup table tells a more descriptive story:

in ECOG0, the HR per one-year increase in age is 1.059
in ECOG1, the HR is close to null at 1.004
in ECOG2plus, the HR is also close to null at 0.994
the matching P for interaction = 0.037 reminds us that the subgroup pattern should be interpreted through the formal interaction test

In manuscript language, the safest interpretation is:

There was evidence that the age-outcome association differed by ECOG performance status on the multiplicative scale. The age-associated hazard increase appeared stronger in ECOG0 than in ECOG1 or ECOG2plus.

This is much better than writing “one subgroup was significant and the others were not”, which is not a proper interaction argument.

Additive interaction answers a different question

Multiplicative and additive interaction are not the same. Additive interaction asks whether the joint effect of two binary factors is more or less than what we would expect from adding their separate effects.

That is why the current teaching workflow in leo.ukb is binary-by-binary.

Here we use:

ecog_binary: poorer versus better ECOG performance status
age_high: older versus younger age

The binary age grouping is only for teaching additive interaction. In a real analysis, the cutpoint should be clinically justified in advance rather than chosen after looking at the results.

res_add <- leo_cox_add_interaction(
  df = lung_df,
  y_out = c("outcome", "outcome_censor"),
  x_exp = "ecog_binary",
  x_inter = "age_high",
  x_cov = list("Crude" = NULL),
  verbose = FALSE
)

res_add$result
#>   Model    Exposure Interaction   N Case N Control N Person-years
#> 1 Crude ecog_binary    age_high 227    164        63     190.3409
#>                      Reference group Recode applied Multiplicative interaction
#> 1 ecog_binary=Good, age_high=Younger          FALSE                      0.454
#>   Multiplicative 95% CI P for interaction   RERI   RERI 95% CI     AP
#> 1          0.226, 0.911             0.029 -1.475 -3.199, 0.249 -0.739
#>       AP 95% CI     S     S 95% CI
#> 1 -1.723, 0.245 0.403 0.154, 1.056

This output should be read in layers.

First, the setup columns:

Reference group tells you which joint category is treated as the baseline
Recode applied tells you whether the function had to recode the binary variables internally so the reference structure was interpretable

Then the scale-specific measures:

Multiplicative interaction: interaction on the hazard-ratio scale
RERI: relative excess risk due to interaction
AP: attributable proportion due to interaction
S: synergy index

The null values differ by measure:

multiplicative interaction: 1
RERI: 0
AP: 0
S: 1

In this teaching example:

the multiplicative interaction estimate is 0.454 with P for interaction = 0.029
RERI = -1.475 and AP = -0.739 both point toward negative interaction on the additive scale
but the confidence intervals for RERI, AP, and S all include their null values

The first bullet is easy to misread unless we spell out what the number is. Multiplicative interaction is not the joint-group hazard ratio itself. It is:

HR11 / (HR10 × HR01)

So a value below 1 does not mean the joint group is automatically protective or that the joint-group hazard ratio must be below 1. It means:

the observed joint effect is smaller than what we would expect if the two separate hazard ratios combined by simple multiplication.

For a concrete example:

if ecog_binary alone had HR = 2
and age_high alone had HR = 3
then under no multiplicative interaction the joint-group HR would be expected to be 2 × 3 = 6
if the multiplicative interaction term were 0.454, the joint-group HR would instead be about 2 × 3 × 0.454 = 2.724

That joint-group HR is still above 1; it is just weaker than the pure multiplicative expectation. In other words, this points to negative interaction, or antagonism, on the multiplicative scale.

So the most careful interpretation is:

The binary contrast showed evidence of interaction on the multiplicative scale, but additive-interaction estimates were imprecise and did not provide clear evidence on the additive scale.

This is a good teaching result because it shows that the answer can depend on the scale. A significant multiplicative interaction does not guarantee a clear additive interaction, and vice versa.

How to interpret common result patterns

The practical reading rules are:

Multiplicative interaction supported, additive interaction not supported

State that effect modification is supported on the hazard-ratio scale, while additive-interaction evidence remains uncertain.
Additive interaction supported, multiplicative interaction not supported

State that the joint effect exceeds additivity, even though the hazard-ratio scale does not show clear multiplicative deviation.
Both supported

State explicitly that the association differs on both scales.
Neither supported

State that there is no clear evidence of effect modification in the current data.

And one rule should always stay in place:

“one subgroup was significant and another was not” is not itself evidence of interaction.

How to write Step 2 results in a paper

Here are manuscript-style templates that match the outputs above.

For a null multiplicative interaction:

There was no clear evidence that the association between age and the event differed by sex after adjustment (P for interaction = 0.18).

For a supported multiplicative interaction followed by a subgroup table:

The association between age and the event differed by ECOG performance status (P for interaction = 0.037). In subgroup analyses, the hazard ratio per one-year increase in age was 1.059 (95% CI 1.013, 1.107) in ECOG0, but was close to null in ECOG1 and ECOG2plus.

For additive interaction:

For the binary contrast between poor performance status and older age, the multiplicative interaction term was 0.454 (95% CI 0.226, 0.911), indicating that the joint effect was smaller than expected under pure multiplication of the two separate hazard ratios; this does not itself mean that the joint-group HR was below 1. Additive-interaction estimates remained imprecise (RERI -1.475, 95% CI -3.199, 0.249).

This wording keeps the scale explicit and avoids over-claiming.

What to take away from Step 2

Step 2 should leave you with five practical rules:

multiplicative interaction is the most general formal test in the Cox workflow
subgroup tables are descriptive and work best when the modifier is categorical
additive interaction currently belongs to binary-by-binary contrasts
the same data can look different on multiplicative and additive scales
interpretation should follow the scale actually tested, not only the raw subgroup P values

Step 3 then asks a different question:

could part of the exposure-outcome association operate through a mediator?