Abstract
Background and objectives: Prognostic models are, among other things, used to provide risk predictions for individuals who are not receiving a certain treatment, in order to assess the natural course of a disease and in turn guide treatment decisions. As treatment availability and use change over time, researchers may be
faced with assessing the validity of existing models in partially treated populations, and may account for treatment use in different ways. We aimed to investigate how treatment use contributes to the poor performance commonly observed in external validation studies, and to explore methods to address the issue. Methods: The effect of treatment use on the observed performance of a model was evaluated analytically. Development data sets representing untreated individuals were simulated using a logistic model and “optimal” models were developed using those sets. Validation sets drawn from the same theoretical population were simulated to receive an effective (binary) treatment. The prevalence and effectiveness of treatment were varied, with and without being dependent on true risk. Model performance in the validation sets was expressed in terms of calibration slope, observed:expected ratio (O:E) and C statistic. We examined the results of i) ignoring treatment, ii) restricting validation to untreated patients, and iii) adjusting the observed event rates to account for treatment effects. This was expressed through the difference (Δ) between each performance measure after applying a method and the value observed in the untreated set. Results: Validation of a model derived in untreated individuals in a treated validation set resulted in poorer model performance than that observed in the same population, if left untreated. Treatment of 50% of patients with a highly effective treatment (higher risk patients had a higher probability of receiving treatment; treatment effect odds ratio: 0.5), resulted in a decrease in the O:E from 1.0 to 0.7, and a decrease in the C statistic from 0.67 to 0.62 when compared to the observed statistics in an untreated set. This trend was observed across settings with different mechanisms for treatment allocation and different population risk distributions. 
As treatment prevalence and effectiveness increased, the observed model performance almost invariably decreased. Restricting the validation to only untreated individuals resulted in performance measures closer to those observed in the full untreated validation set, at the cost of precision (ΔO:E = 0.0; ΔC statistic = 0.03). When treatment allocation was completely based on risk (i.e. according to a risk threshold), the restriction approach was less effective (ΔO:E = 0.0; ΔC statistic = 0.07). Increasing the observed event rates to account for treatment effects improved the observed model calibration, but this was highly sensitive to incorrect assumptions about treatment use and effectiveness. Conclusions: Validating a model designed to make predictions of the “natural course” of an individual’s health in a validation data set containing treated individuals may result in an underestimation of the performance of the model in untreated individuals. Current methods are not sufficient to account for the effects of treatment, and findings from such studies should be interpreted with caution.
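The simulation design described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the intercept, slope, sample size, and the rule that makes treatment more likely at higher risk are all hypothetical choices made for illustration. Only the treatment odds ratio of 0.5, the roughly 50% treatment prevalence, and the performance measures (O:E and C statistic) are taken from the abstract. The sketch compares the same simulated individuals with and without treatment, and also applies the restriction approach.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def c_statistic(risks, outcomes):
    """C statistic via the rank-sum (Mann-Whitney) formula, with mid-ranks for ties."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    rank_sum_events = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and risks[order[j + 1]] == risks[order[i]]:
            j += 1
        mid_rank = (i + j) / 2.0 + 1.0
        rank_sum_events += mid_rank * sum(1 for k in range(i, j + 1) if outcomes[order[k]])
        i = j + 1
    n_events = sum(outcomes)
    n_nonevents = n - n_events
    return (rank_sum_events - n_events * (n_events + 1) / 2.0) / (n_events * n_nonevents)


def oe_ratio(risks, outcomes):
    """Observed:expected ratio: observed events over the sum of predicted risks."""
    return sum(outcomes) / sum(risks)


def simulate(n=20000, treatment_or=0.5, seed=1):
    rng = random.Random(seed)
    risks, y_untreated, y_observed, treated = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        lp = -1.5 + 1.0 * x                  # hypothetical "true" untreated model
        p_untreated = expit(lp)
        # Hypothetical allocation: higher-risk individuals are more likely to be
        # treated; expit(x) averages 0.5, giving about 50% treatment prevalence.
        is_treated = rng.random() < expit(x)
        p_observed = expit(lp + math.log(treatment_or)) if is_treated else p_untreated
        u = rng.random()                     # shared draw: same person, two scenarios
        risks.append(p_untreated)            # the "optimal" model predicts untreated risk
        y_untreated.append(u < p_untreated)
        y_observed.append(u < p_observed)
        treated.append(is_treated)

    # Restriction approach: validate only in individuals who were not treated.
    idx = [i for i in range(n) if not treated[i]]
    r_risks = [risks[i] for i in idx]
    r_y = [y_observed[i] for i in idx]

    return {
        "oe_untreated": oe_ratio(risks, y_untreated),
        "c_untreated": c_statistic(risks, y_untreated),
        "oe_ignoring_treatment": oe_ratio(risks, y_observed),
        "c_ignoring_treatment": c_statistic(risks, y_observed),
        "oe_restricted": oe_ratio(r_risks, r_y),
        "c_restricted": c_statistic(r_risks, r_y),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, ignoring treatment should give an O:E below 1, while restriction to untreated individuals should keep the O:E close to 1 at the cost of a smaller sample. The exact values depend on the hypothetical parameters and will not reproduce the figures reported above.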