Classification of protein-ligand binding using their structural dispersion






A protein's biological function is known to be related to its physical structure. Many researchers have studied this relationship, both for entire protein backbone structures and for binding sites, the regions where binding activity occurs. Despite this research, predicting a protein's function from its structure remains an open challenge. The main purpose of this research is to gain a better understanding of how structure relates to binding activity and to classify proteins according to function using structural information.

We approach the problem by first calculating the distance from each atom to each of the three principal axes. We then construct the covariance matrix of these distances for each binding site; this matrix, which we call the Covariances of Distances to Principal Axis (CDPA), serves as our data object. To apply this methodology, we used the dataset compiled by Kahraman et al. (2007) and the extended Kahraman dataset used in Hoffmann et al. (2010). We then performed classification on these matrices using a variety of techniques, including nearest mean. We applied this general approach with different types of distance, namely the Euclidean, Log-Euclidean, Cholesky, Square-Root, and Canonical distances. Finally, we compared the performance of the model-based technique using the CDPA under each distance with the alignment-based techniques of Ellingson and Zhang (2012) and Hoffmann et al. (2010).
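The pipeline described above can be illustrated with a minimal sketch. It is not the author's implementation: it assumes that the principal axes are the eigenvectors of the atom-coordinate covariance matrix passing through the centroid, and that the Euclidean and Log-Euclidean distances are Frobenius norms of the difference of the matrices and of their matrix logarithms, respectively. The function names (`cdpa`, `nearest_mean_predict`, and so on) are hypothetical, and the class mean used here is the plain arithmetic mean, which is a simplification appropriate only for the Euclidean case.

```python
import numpy as np


def cdpa(coords):
    """Return the 3x3 covariance of per-atom distances to the three principal axes."""
    X = np.asarray(coords, dtype=float)
    Xc = X - X.mean(axis=0)  # center the atoms at their centroid
    # Assumption: principal axes = eigenvectors of the coordinate covariance.
    _, axes = np.linalg.eigh(np.cov(Xc, rowvar=False))
    proj = Xc @ axes  # component of each atom along each axis (n x 3)
    # Squared distance to axis k is ||x||^2 minus the squared component along k.
    sq = (Xc ** 2).sum(axis=1, keepdims=True) - proj ** 2
    dists = np.sqrt(np.clip(sq, 0.0, None))  # clip guards against round-off
    return np.cov(dists, rowvar=False)


def spd_log(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T


def euclidean_distance(A, B):
    return np.linalg.norm(A - B, ord="fro")


def log_euclidean_distance(A, B):
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")


def nearest_mean_predict(train_mats, train_labels, test_mat, dist=euclidean_distance):
    """Assign test_mat to the class whose (arithmetic) mean matrix is closest."""
    classes = sorted(set(train_labels))
    means = {
        c: np.mean([M for M, y in zip(train_mats, train_labels) if y == c], axis=0)
        for c in classes
    }
    return min(classes, key=lambda c: dist(means[c], test_mat))
```

Because each matrix is built from distances to axes derived from the binding site itself, it should not depend on how the site is positioned or oriented in space, which is what lets this kind of model-based approach avoid an explicit structural alignment step.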

Embargo status: Restricted until 09/2022.



Protein Function, Ligand, Covariance Matrix, Covariances of Distances to Principal Axis