Representation of features as images with neighborhood dependencies for improvement of anti-cancer drug sensitivity predictive modeling using machine learning






Precision medicine entails the design of targeted therapies, drawn from various drugs and drug combinations, that are matched to each individual patient. Predictive modeling of anti-cancer drug responses for specific patients (here, cell lines or tumors) therefore constitutes a significant challenge for personalized therapy. It has been approached primarily by building mathematical functions that map genomic profiles and chemical drug compound structures to drug sensitivity. Numerous machine learning algorithms have been proposed for this mapping, such as ensemble-based learning techniques, linear regression, kernel-based methods, and deep learning-based approaches. The typical practice is to design a supervised predictive model for each individual drug based on the genomic characterizations of patients. These models use a training set of cell lines with experimentally measured genomic characterizations (such as RNA-Seq, microarray gene expression, Reverse Phase Protein Array, methylation, or SNPs) or chemical drug descriptors (mathematical representations of molecules' properties generated by algorithms), together with their responses to different drugs. In this dissertation, I have explored diverse approaches for drug sensitivity prediction and combination therapy design based on feature representation learning compatible with convolutional neural networks (CNNs). The primary objective of my research is to select the optimal therapy for a new cancer patient. Deep learning with CNNs has shown great promise in image-based classification and enhancement but is often unsuitable for predictive modeling with features that lack spatial correlations, such as tabular data. We present a feature representation approach termed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) that arranges high-dimensional vectors in a compact image form conducive to CNN-based deep learning.
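The core idea of treating a feature vector as an image can be illustrated with a minimal sketch that assumes a feature-to-pixel layout has already been computed. The `layout` dictionary and the `vector_to_image` function are hypothetical names for illustration, not part of REFINED's published code. Each feature value is written to its assigned pixel on the smallest square grid that holds all features, and leftover pixels are padded with a fill value.

```python
import math
from typing import Dict, List, Sequence, Tuple


def vector_to_image(x: Sequence[float],
                    layout: Dict[int, Tuple[int, int]],
                    fill: float = 0.0) -> List[List[float]]:
    """Place x[j] at pixel layout[j] on the smallest square grid holding len(x) pixels.

    `layout` is a hypothetical, precomputed feature-index -> (row, col) mapping.
    """
    side = math.ceil(math.sqrt(len(x)))
    image = [[fill] * side for _ in range(side)]  # pixels with no feature keep `fill`
    used = set()
    for j, value in enumerate(x):
        r, c = layout[j]
        if (r, c) in used:
            raise ValueError(f"two features mapped to pixel {(r, c)}")
        used.add((r, c))
        image[r][c] = value
    return image


# Usage: 5 features fit in a 3x3 grid, leaving 4 pixels at the fill value.
layout = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1), 4: (2, 2)}
img = vector_to_image([0.5, 1.2, -0.3, 2.0, 0.7], layout)
# img == [[0.5, 1.2, 0.0], [-0.3, 2.0, 0.0], [0.0, 0.0, 0.7]]
```

Because neighboring pixels hold related features, a CNN's local filters can pick up patterns across groups of similar features, which is what the layout is meant to enable.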
The similarities between features were used to generate a concise feature map in the form of a two-dimensional image by minimizing the pairwise distance values following a Bayesian Metric Multidimensional Scaling approach. We hypothesize that this approach enables embedded feature extraction and, when integrated with CNN-based deep learning, can boost predictive accuracy. The superior predictive capability of the proposed framework compared with state-of-the-art methods in drug sensitivity prediction scenarios is illustrated using synthetic datasets, drug chemical descriptors as predictors from NCI60, and both transcriptomic information and drug descriptors as predictors from GDSC.
In another study, I investigated the feasibility of developing ensemble learning techniques based on REFINED. The first study showed that REFINED-CNN models provide promising results in drug sensitivity prediction. The primary idea behind REFINED-CNN is to represent high-dimensional vectors as compact images with spatial correlations that convolutional neural network architectures can exploit. However, the mapping from a vector to a compact 2D image is not unique, because it varies with the distance measure and neighborhood considered. In this study, ensembles built from such mappings were evaluated for their ability to improve upon the best single REFINED-CNN model prediction. Results on the NCI60 and NCI-ALMANAC databases show that the ensemble approaches can provide significant performance improvement over individual models. Furthermore, a single mapping created by amalgamating the different mappings can provide performance similar to a stacking ensemble at significantly lower computational complexity.
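The mapping step described above (feature similarities, then a 2D embedding that preserves pairwise distances, then a compact grid) can be sketched under several stated assumptions. This sketch swaps the authors' Bayesian Metric Multidimensional Scaling for standard metric MDS via SMACOF stress majorization, uses Euclidean distance between feature columns, and places features on the grid with a simple greedy nearest-free-cell rule. The function names `feature_distances`, `smacof_2d`, and `snap_to_grid` are hypothetical, and REFINED's actual distance, grid-assignment, and refinement steps may differ.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def feature_distances(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Euclidean distance between every pair of feature columns of X (samples x features)."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    return [[math.dist(cols[a], cols[b]) for b in range(p)] for a in range(p)]


def stress(Y: Sequence[Point], D: List[List[float]]) -> float:
    """Raw stress: summed squared gap between embedded and target distances."""
    n = len(Y)
    return sum((math.dist(Y[a], Y[b]) - D[a][b]) ** 2
               for a in range(n) for b in range(a + 1, n))


def smacof_2d(D: List[List[float]], iters: int = 200, seed: int = 0):
    """Metric MDS into 2-D via SMACOF (Guttman transform, unit weights).

    Stand-in assumption: plain SMACOF, not the Bayesian MDS used by REFINED.
    """
    n = len(D)
    rng = random.Random(seed)
    Y = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    history = [stress(Y, D)]
    for _ in range(iters):
        new_Y = []
        for a in range(n):
            sx = sy = 0.0
            for b in range(n):
                d = math.dist(Y[a], Y[b])
                if a != b and d > 0:
                    w = D[a][b] / d  # Guttman-transform weight for the pair (a, b)
                    sx += w * (Y[a][0] - Y[b][0])
                    sy += w * (Y[a][1] - Y[b][1])
            new_Y.append((sx / n, sy / n))
        Y = new_Y
        history.append(stress(Y, D))  # SMACOF never increases stress
    return Y, history


def snap_to_grid(Y: Sequence[Point]) -> Dict[int, Tuple[int, int]]:
    """Greedily give each feature the free grid cell closest to its scaled 2-D position.

    Hypothetical simplification of REFINED's grid-assignment step.
    """
    side = math.ceil(math.sqrt(len(Y)))
    xs, ys = [p[0] for p in Y], [p[1] for p in Y]

    def scale(v: float, lo: float, hi: float) -> float:
        # map [lo, hi] onto [0, side - 1]; a degenerate axis collapses to 0
        return 0.0 if hi == lo else (v - lo) / (hi - lo) * (side - 1)

    free = {(r, c) for r in range(side) for c in range(side)}
    layout: Dict[int, Tuple[int, int]] = {}
    for j, (x, y) in enumerate(Y):
        tr, tc = scale(y, min(ys), max(ys)), scale(x, min(xs), max(xs))
        # ties broken by cell coordinates so the result is deterministic
        cell = min(free, key=lambda rc: ((rc[0] - tr) ** 2 + (rc[1] - tc) ** 2, rc))
        free.remove(cell)
        layout[j] = cell
    return layout


# Usage with toy data (4 samples x 6 features); features 0-2 and 3-5 form two groups.
X = [[1.0, 1.1, 0.9, 5.0, 5.2, 4.8],
     [2.0, 2.1, 1.9, 1.0, 1.1, 0.9],
     [3.0, 2.9, 3.1, 4.0, 4.1, 3.9],
     [4.0, 4.2, 3.8, 2.0, 2.1, 1.9]]
Y, history = smacof_2d(feature_distances(X), iters=100)
layout = snap_to_grid(Y)  # 6 features -> distinct cells of a 3x3 grid
```

Rerunning with a different `seed` or a different distance measure generally yields a different layout, which mirrors the non-uniqueness of the vector-to-image mapping that motivates the ensemble study above.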

Embargo status: Restricted until 06/2022.



Deep Learning, Representation Learning, Drug Sensitivity, Precision Medicine