Analyzing genomic data using tensor-based orthogonal polynomials with application to synthetic RNAs

Date

2021

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

An important goal in molecular biology is to quantify both the patterns across a genomic sequence and the relationship between phenotype and underlying sequence. We propose a multivariate tensor-based orthogonal polynomial approach to characterize nucleotides or amino acids in a given sequence and map corresponding phenotypes onto the sequence space. We have applied this method to a previously published case of small transcription activating RNAs. Covariance patterns along the sequence showcased strong correlations between nucleotides at the ends of the sequence. However, when the phenotype is projected onto the sequence space, this pattern does not emerge. When doing second order analysis and quantifying the functional relationship between the phenotype and pairs of sites along the sequence, we identified sites with high regressions spread across the sequence, indicating potential intramolecular binding. In addition to quantifying interactions between different parts of a sequence, the method quantifies sequence–phenotype interactions at first and higher order levels. We discuss the strengths and constraints of the method and compare it to computational methods such as machine learning approaches. An accompanying command line tool to compute these polynomials is provided. We show proof of concept of this approach and demonstrate its potential application to other biological systems.

Description

C The Author(s) 2020. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Keywords

Genomic Data, RNA

Citation

Saba Nafees, Sean H Rice, Catherine A Wakeman, Analyzing genomic data using tensor-based orthogonal polynomials with application to synthetic RNAs, NAR Genomics and Bioinformatics, Volume 2, Issue 4, December 2020, lqaa101, https://doi.org/10.1093/nargab/lqaa101

Collections