449 Predictive Limitations of Hematopoietic Stem Cell Transplantation Associated Mortality: A Machine Learning in-Silico Analysis of the EBMT - Acute Leukemia Working Party Registry

Track: Poster Abstracts
Saturday, February 14, 2015, 6:45 PM-7:45 PM
Grand Hall CD (Manchester Grand Hyatt)
Roni Shouval, MD , The Chaim Sheba Medical Center, Tel-Hashomer, Division of Hematology and Bone Marrow Transplantation, Ramat-Gan, Israel
Myriam Labopin, MD , EBMT Paris study office / CEREST-TC, Paris, France
Ron Unger, PhD , The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
Sebastian Giebel, MD , Department of Bone Marrow Transplantation and Oncohematology, Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Gliwice Branch, Gliwice, Poland
Fabio Ciceri, MD , Hematology and Bone Marrow Transplantation Unit, San Raffaele Scientific Institute, Milan, Italy
Christoph Schmid, MD , Department of Hematology and Oncology, Klinikum Augsburg, Ludwig-Maximilians-University, Munich, Germany
Jordi Esteve, MD , Hematology Department, IDIBAPS, Hospital Clínic, Barcelona, Spain
Norbert Gorin, MD , Hôpital Saint-Antoine, Paris, France
Frédéric Baron, MD, PhD , University of Liège, GIGA-I3, Liège, Belgium
Bipin N. Savani, MD , Medicine, Vanderbilt University, Brentwood, TN
Mohamad Mohty, MD, PhD , Department of Haematology, Saint Antoine Hospital, Paris, France
Arnon Nagler, MD , The Chaim Sheba Medical Center, Tel-Hashomer, Division of Hematology and Bone Marrow Transplantation, Ramat-Gan, Israel
Presentation recording not available for download or distribution as requested by the presenting author.

Several risk scores have been developed to predict transplant-related mortality (TRM) following allogeneic hematopoietic stem cell transplantation (HSCT). These scores have been validated; however, their predictive performance is suboptimal. Beyond the inherent uncertainty of such a complex medical procedure, prediction may be limited by methodological factors: the statistical methodology, the number and quality of features collected, or simply the population size. Using an in-silico approach (i.e., iterative computerized simulations) based on machine learning (ML) algorithms, we set out to explore the factors limiting prediction.

ML is a subfield of computer science and artificial intelligence that deals with the construction and study of systems that can learn from data, rather than follow explicitly programmed instructions. Commonly applied in complex data scenarios, such as financial and technological settings, it may be suitable for outcome prediction of HSCT.

The study design involved two phases. The first focused on the development of several ML-based prediction models of day 100 TRM. A cohort of 28,236 adult allogeneic HSCT recipients with acute leukemia was analyzed. Twenty-four variables were included. In the second phase, repeated computerized simulations were applied to explore the factors necessary for optimal prediction: algorithm type, size of the data set, number of included variables, and performance in specific subpopulations. Models were assessed and compared on the basis of the area under the receiver operating characteristic curve (AUC).
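
For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen patient who experienced the event (here, day 100 TRM) received a higher predicted risk than a randomly chosen patient who did not. The following is a minimal Python sketch of that pairwise definition. The function name `auc` and the toy labels and scores are illustrative assumptions, not code or data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 = event (e.g. day 100 TRM), 0 = no event.
    scores: predicted risk; higher means higher predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    # Count event/non-event pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 of the 4 pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.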

We developed 6 ML-based prediction models for day 100 TRM. Optimal AUCs ranged from 0.65 to 0.68. Predictive performance plateaued at a population size of n=5647 to n=8471, depending on the algorithm (Figure 1). A feature selection algorithm ranked variables according to importance. Using the ranked variables, we found that 6 to 12 of the ranked variables were necessary for optimal prediction, depending on the algorithm. The average AUC of models developed for specific subpopulations ranged from 0.59 for patients in second complete remission to 0.67 for patients receiving reduced-intensity conditioning.
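
The idea of a performance plateau with growing sample size can be illustrated with a small simulation. The sketch below is not the study's pipeline: it assumes a synthetic cohort with six hypothetical features (only two of which carry signal), uses a plain logistic regression as a stand-in for the ML algorithms, and all names (`simulate_patients`, `fit_logistic`, `learning_curve`) and effect sizes are invented for illustration.

```python
import math
import random


def auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate_patients(n, rng):
    """Synthetic cohort: 6 standard-normal features, only the first 2 carry signal."""
    true_w = [0.8, 0.5, 0.0, 0.0, 0.0, 0.0]  # assumed effect sizes, not study values
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in true_w]
        z = sum(w * v for w, v in zip(true_w, x)) - 1.5  # assumed baseline risk
        p = 1 / (1 + math.exp(-z))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_logistic(X, y, epochs=30, lr=0.05):
    """Logistic regression fitted by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1 / (1 + math.exp(-z)) - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def learning_curve(sizes, n_test=1000, seed=0):
    """Train on nested subsets of increasing size; score each on one held-out set."""
    rng = random.Random(seed)
    X_test, y_test = simulate_patients(n_test, rng)
    X_pool, y_pool = simulate_patients(max(sizes), rng)
    results = []
    for n in sizes:
        w, b = fit_logistic(X_pool[:n], y_pool[:n])
        scores = [b + sum(wi * xi for wi, xi in zip(w, x)) for x in X_test]
        results.append((n, auc(y_test, scores)))
    return results


for n, a in learning_curve([50, 200, 800, 3200]):
    print(f"n={n:5d}  test AUC={a:.3f}")
```

In a setup like this, the held-out AUC is bounded by the signal present in the features, so adding patients beyond some point is expected to yield little further gain. The same loop could be repeated over the number of included variables rather than the number of patients.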

In summary, we present a novel computational approach for prediction model development and analysis in the field of HSCT. Using data commonly collected on transplant patients, our simulations elucidate the factors limiting outcome prediction. Regardless of the methodology applied, predictive performance converged when sampling more than 5000 patients. A few variables "carry the weight" with regard to predictive influence. Overall, the presented findings reveal a phenomenon of predictive saturation with traditionally collected data. Improving predictive performance will likely require additional types of input, such as genetic, biologic, and procedural factors.

Disclosures:
Nothing To Disclose