M-14: Assessing Consistency of Subgroup Specific Treatment Effects in Clinical Trials with Binary Endpoints
University of Bremen, Germany
The objective is to develop an equivalence test that demonstrates consistency of treatment effects between two subgroups (e.g. defined by comorbidity, gender or age) in confirmatory clinical trials with binary endpoints, and to evaluate its properties compared with the conventional test of interaction.
Monte-Carlo simulations were used to assess testing strategies in simulated trial datasets with binary endpoints. Differential odds ratios were induced for the subgroups by varying the treatment effect difference within both subgroups relative to the overall treatment effect.
Each patient’s binary endpoint indicates whether or not an event occurred and is generated and analysed as a function of covariates for treatment, subgroup and their interaction. A logistic regression model is used to analyse the effect of both covariates and their interaction on the event probability.
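The model described above can be written as logit P(Y = 1) = b0 + b1*T + b2*S + b3*T*S, where T and S are 0/1 indicators for treatment and subgroup. This notation, the coefficient values and the sample size below are assumptions made for illustration and are not taken from the study. The following Python sketch generates data from such a model and recovers the coefficients. Because a model with two binary covariates and their interaction is saturated, the maximum likelihood estimates equal the empirical log odds contrasts, so no iterative fitting routine is needed here.

```python
import math
import random

def inv_logit(x):
    """Map a linear predictor to an event probability."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_counts(n_per_cell, beta, rng):
    """Simulate events per (treatment, subgroup) cell.

    Returns {(t, s): (events, non_events)} for t, s in {0, 1}.
    """
    b0, b1, b2, b3 = beta
    counts = {}
    for t in (0, 1):
        for s in (0, 1):
            p = inv_logit(b0 + b1 * t + b2 * s + b3 * t * s)
            events = sum(rng.random() < p for _ in range(n_per_cell))
            counts[(t, s)] = (events, n_per_cell - events)
    return counts

def fit_saturated(counts):
    """Closed-form ML estimates of the saturated logistic model.

    Assumes every cell has at least one event and one non-event.
    """
    def log_odds(t, s):
        events, non_events = counts[(t, s)]
        return math.log(events / non_events)

    b0 = log_odds(0, 0)
    b1 = log_odds(1, 0) - b0                    # treatment effect in subgroup 0
    b2 = log_odds(0, 1) - b0                    # subgroup effect under control
    b3 = log_odds(1, 1) - log_odds(0, 1) - b1   # treatment-by-subgroup interaction
    return b0, b1, b2, b3

# Illustrative values only: control event probability 0.3,
# odds ratio 1.24 in subgroup 0 and 1.54 in subgroup 1.
true_beta = (math.log(0.3 / 0.7), math.log(1.24), 0.0, math.log(1.54 / 1.24))
counts = simulate_counts(n_per_cell=500, beta=true_beta, rng=random.Random(1))
estimates = fit_saturated(counts)
```

The interaction coefficient b3 is the log of the ratio of the two subgroup-specific odds ratios, which is the quantity the interaction test and the equivalence test both examine.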
We performed a) a statistical test for the overall treatment effect, b) a test for treatment-by-subgroup interaction, c) subgroup specific tests for the treatment effect and d) a new equivalence test.
First results show that the treatment-by-subgroup interaction test has low power of less than 20% when the treatment effect is twice as high in one subgroup as in the other (e.g. an odds ratio of 1.24 in one subgroup, 1.54 in the second subgroup and 1.38 overall, which is a twofold difference on the log odds scale), and power decreases further for unbalanced subgroups. When there is no treatment effect in one of the subgroups, the power of the interaction test reaches about 80%, but drops to about 30% for highly unbalanced subgroups.
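The example odds ratios can be checked on the log odds scale, and the effect of subgroup imbalance on the interaction test can be illustrated with a normal approximation to the Wald test. The sketch below is not the simulation used in the study: the control event probability of 0.3 (taken to be equal in both subgroups), the cell sizes, the two-sided level of 0.05 and all function names are assumptions for illustration. The standard error is the square root of the sum of reciprocal expected cell variances, which coincides with the Wald standard error of the interaction coefficient in the saturated logistic model.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def prob_from_odds_ratio(p_control, odds_ratio):
    """Event probability under treatment for a given odds ratio."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)

def interaction_se(n_cells, probs):
    """Approximate standard error of the log ratio of odds ratios."""
    return math.sqrt(sum(1.0 / (n * p * (1.0 - p)) for n, p in zip(n_cells, probs)))

def interaction_test_power(or_sub1, or_sub2, n_cells, p_control=0.3, alpha=0.05):
    """Approximate power of the two-sided Wald interaction test.

    n_cells = (control sub1, treated sub1, control sub2, treated sub2).
    """
    probs = (
        p_control, prob_from_odds_ratio(p_control, or_sub1),
        p_control, prob_from_odds_ratio(p_control, or_sub2),
    )
    delta = math.log(or_sub2) - math.log(or_sub1)
    shift = delta / interaction_se(n_cells, probs)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

# Arithmetic check of the example odds ratios.
log_ratio = math.log(1.54) / math.log(1.24)   # close to 2
geometric_mean = math.sqrt(1.24 * 1.54)       # close to 1.38

# Assumed cell sizes: 1000 patients in total, balanced versus 90:10 subgroups.
balanced = interaction_test_power(1.24, 1.54, (250, 250, 250, 250))
unbalanced = interaction_test_power(1.24, 1.54, (450, 450, 50, 50))
```

Under these assumed settings the approximation gives lower power for the unbalanced design than for the balanced one, in line with the direction of the simulation findings; the absolute values depend on the assumed sample size and event rate.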
The new equivalence test is based on the treatment-by-subgroup interaction coefficient from the logistic model and requires consistency margins to be determined prior to the trial. These margins allow the clinical relevance of treatment effect differences to be taken into account. Different values for these margins were investigated empirically.
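The abstract does not state how the equivalence test is constructed. One common construction for a test of this kind is the two one-sided tests (TOST) procedure applied to the interaction coefficient, and the following Python sketch assumes that construction. The symmetric margin on the log odds ratio scale, the function name and the example estimate and standard error are hypothetical.

```python
import math
from statistics import NormalDist

def equivalence_test(estimate, se, margin, alpha=0.05):
    """TOST for the interaction coefficient b3 (assumed construction).

    H0: |b3| >= margin (heterogeneity)  versus  H1: |b3| < margin (consistency).
    Returns the TOST p-value and whether heterogeneity is rejected.
    """
    std_normal = NormalDist()
    # One-sided test of H0: b3 <= -margin
    p_lower = 1.0 - std_normal.cdf((estimate + margin) / se)
    # One-sided test of H0: b3 >= +margin
    p_upper = std_normal.cdf((estimate - margin) / se)
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha

# Hypothetical margin: odds ratios may differ by at most a factor of 1.5.
margin = math.log(1.5)
p_value, consistent = equivalence_test(estimate=0.05, se=0.12, margin=margin)
```

Rejecting both one-sided hypotheses at level alpha is equivalent to the (1 - 2*alpha) confidence interval for the interaction coefficient lying entirely within the margins.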
For the equivalence test, in contrast to the conventional interaction test, the power to reject the null hypothesis of heterogeneity can be increased by larger sample sizes. Depending on the chosen margins, a significant equivalence test result can also rule out a clinically relevant interaction.
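The dependence of this power on sample size can be illustrated with a normal approximation. The sketch below assumes a two one-sided tests (TOST) construction with a symmetric margin, four equally sized cells and the same event probability of 0.3 in every cell; none of these settings come from the study, and the function name is hypothetical. As the sample size grows, the standard error of the interaction coefficient shrinks, so the estimate falls within the margins with higher probability when the true interaction is small.

```python
import math
from statistics import NormalDist

def tost_power(n_per_cell, margin, true_delta=0.0, p_event=0.3, alpha=0.05):
    """Approximate power of the TOST equivalence test (assumed construction).

    true_delta is the true interaction on the log odds ratio scale.
    """
    std_normal = NormalDist()
    # Standard error with four equal cells and a common event probability.
    se = math.sqrt(4.0 / (n_per_cell * p_event * (1.0 - p_event)))
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    upper = (margin - true_delta) / se - z_crit
    lower = (-margin - true_delta) / se + z_crit
    # The rejection region is empty when the margin is below z_crit * se.
    return max(0.0, std_normal.cdf(upper) - std_normal.cdf(lower))

# Hypothetical margin: odds ratios may differ by at most a factor of 1.5.
margin = math.log(1.5)
powers = {n: tost_power(n, margin) for n in (100, 250, 500, 1000, 2000)}
```

When the true interaction lies exactly on the margin, the same formula returns a rejection probability of about alpha for large samples, which reflects the type I error control of the procedure.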
The assessment of consistency of subgroup specific treatment effects is particularly important for the health economic risk-benefit evaluation. While the overall treatment effect might show a statistically significant benefit of a new drug, this benefit might differ across medical subgroups. On the other hand, apparently different subgroup effects might be detected, e.g. by an interaction test, merely because of a large sample size, even though the effect difference is not clinically relevant.
Instead of the interaction test, we propose an equivalence test which rejects heterogeneity of the treatment effects in the subgroups. This test addresses the question of consistency more directly, because rejecting the null hypothesis demonstrates homogeneity. Since small deviations from homogeneity of subgroup specific treatment effects might not be clinically relevant, consistency margins are introduced which can be determined prior to the study.
Another advantage is that power increases with larger sample sizes. Additional subgroup specific tests for the treatment effect might not be necessary for decision making in the trial. Thus the number of tests can be reduced, and with it the type I error inflation that would arise if no multiplicity adjustment were applied.