S131: Predicting Mortality among Patients with Colorectal Cancer using Big Data: A Machine Learning Approach





Poster Presenter

      Xiaomo Xiong

      • PhD student
      • University of South Carolina
        United States

Objectives

It is crucial to understand mortality among patients with colorectal cancer (CRC), which is the second leading cause of death among cancer survivors in the United States. We therefore aimed to develop a machine learning algorithm to predict mortality in patients with CRC.

Method

Data were obtained from the MarketScan dataset, which includes claims linked with electronic medical records (EMR). Four algorithms were used to predict mortality: Cox proportional hazards regression, random survival forests, gradient boosting, and a survival support vector machine.

Results

A retrospective longitudinal study was conducted using data from 2012 to 2020, and all analyses were performed in Python 3.10. The study included two time periods: a one-year baseline period before the index date (the date of the first diagnosis of colorectal cancer) and a follow-up period after the index date. Follow-up ended at death, the end of continuous enrollment, or the end of the study period, whichever came first. We included patients with CRC who were at least 18 years old at the index date. In addition, patients had to be continuously enrolled in the dataset for 12 months before the index date and for at least 1 month after it.

A total of 13,790 patients with CRC were included. Two-thirds of them (9,194) were assigned to the training group for machine learning, and one-third (4,596) were assigned to the test group for algorithm validation. We used the concordance index (C-index), the time-dependent area under the curve (tdAUC), and the Integrated Brier Score (IBS) to identify the algorithm with the best predictive performance.

Random survival forests had the best performance, with a C-index of 0.740, a tdAUC of 0.716, and an IBS of 117. Gradient boosting ranked second, with a C-index of 0.738, a tdAUC of 0.716, and an IBS of 117, while Cox proportional hazards regression ranked third, with a C-index of 0.728, a tdAUC of 0.710, and an IBS of 110.

To identify the predictors of mortality for patients with CRC, we calculated the mean feature importance. Using the best-performing algorithm, random survival forests, the top five features contributing to the prediction of mortality among patients with CRC were metabolic syndrome (0.105 ± 0.009), congestive heart failure (0.013 ± 0.003), chronic obstructive pulmonary disease (0.011 ± 0.003), age (0.104 ± 0.004), and health care plan (0.103 ± 0.002).
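The C-index used to rank the models can be illustrated with a minimal pure-Python sketch of Harrell's concordance index. This is not the study's code: the function name, the toy data, and the choice to skip pairs with tied observed times are assumptions made for illustration, and the abstract does not state which implementation or C-index variant was used.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # patient who died earlier was ranked riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.4, 0.3, 0.6, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values between 0.728 and 0.740 indicate that the models ordered roughly three out of four comparable patient pairs correctly.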

Conclusion

In this study, we developed four algorithms to predict mortality among patients with CRC and identified the predictors associated with CRC mortality. Random survival forests had the best performance in predicting mortality, with metabolic syndrome, age, and health care plan among the top predictors.

The algorithm could be used in clinical settings to inform clinical decisions in several ways. First, it could help clinicians identify patients with CRC who are at higher risk of mortality, enabling them to intervene early and tailor treatment plans accordingly. Second, the predictors it identifies could guide clinicians in developing personalized treatment plans. For example, patients with metabolic syndrome may require more intensive management of comorbidities such as hypertension, diabetes, or obesity, while older patients may need more careful consideration of the risks and benefits of treatment options. Third, machine learning algorithms provide an efficient and reliable way to predict mortality among patients with CRC. With the increasing availability of electronic health records and big data, such algorithms could facilitate risk stratification and improve clinical decision-making in various aspects of cancer care.

Overall, using this algorithm in clinical practice has the potential to improve outcomes for patients with CRC through earlier identification of and intervention for high-risk patients, personalized treatment plans, and more efficient and accurate risk stratification.
