Kavli Affiliate: Donald Geman, Laurent Younes
| Authors: Mohamed Omar, Wikum Dinalankara, Lotte Mulder, Tendai Coady, Claudio Zanettini, Eddie Luidy Imada, Laurent Younes, Donald Geman and Luigi Marchionni
| Summary:
Abstract Many gene signatures have been developed by applying machine learning (ML) on omics profiles, however, their clinical utility is often hindered by limited interpretability and unstable performance in different datasets. Here, we show the importance of embedding prior biological knowledge in the decision rules yielded by ML approaches to build robust classifiers. We tested this by applying different ML algorithms on gene expression data to predict three difficult cancer phenotypes: bladder cancer progression to muscle invasive disease, response to neoadjuvant chemotherapy in triple-negative breast cancer, and prostate cancer metastatic progression. We developed two sets of classifiers: mechanistic, by restricting the training process to features capturing a specific biological mechanism; and agnostic, in which the training did not use any a priori biological information. Mechanistic models had a similar or better performance to their agnostic counterparts in the testing data, with enhanced stability, robustness, and interpretability. Our findings support the use of biological constraints to develop robust and interpretable gene signatures with high translational potential. Competing Interest Statement The authors have declared no competing interest.