Browsing by Author "Cannon, Charles H. (TTU)"
Now showing 1 - 11 of 11
Item An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data (2015)
Fan, Huan; Ives, Anthony R.; Surget-Groba, Yann; Cannon, Charles H. (TTU)
Background: Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging. Results: To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms. Conclusion: Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.

Item Development of high-throughput SNP-based genotyping in Acacia auriculiformis x A. mangium hybrids using short-read transcriptome data (2012)
Wong, Melissa M.L.; Cannon, Charles H.
(TTU); Wickneswari, Ratnam
Background: Next Generation Sequencing has provided comprehensive, affordable and high-throughput DNA sequences for Single Nucleotide Polymorphism (SNP) discovery in Acacia auriculiformis and Acacia mangium. As in other non-model species, SNP detection and genotyping in Acacia are challenging due to the lack of genome sequences. The main objective of this study is to develop the first high-throughput SNP genotyping assay for linkage map construction of A. auriculiformis x A. mangium hybrids. Results: We identified a total of 37,786 putative SNPs by aligning short read transcriptome data from four parents of two Acacia hybrid mapping populations using Bowtie against 7,839 de novo transcriptome contigs. Given a set of 10 validated SNPs from two lignin genes, our in silico SNP detection approach is highly accurate (100%) compared to the traditional in vitro approach (44%). Further validation of 96 SNPs using Illumina GoldenGate Assay gave an overall assay success rate of 89.6% and conversion rate of 37.5%. We explored possible factors lowering assay success rate by predicting exon-intron boundaries and paralogous genes of Acacia contigs using the Medicago truncatula genome as reference. This assessment revealed that the presence of an exon-intron boundary is the main cause (50%) of assay failure. Subsequent SNP filtering and improved assay design resulted in assay success and conversion rates of 92.4% and 57.4%, respectively, based on genotyping of 768 SNPs. Analysis of clustering patterns revealed that 27.6% of the assays were not reproducible and flanking sequence might play a role in determining cluster compression. In addition, we identified a total of 258 and 319 polymorphic SNPs in A. auriculiformis and A. mangium natural germplasms, respectively. Conclusion: We have successfully discovered a large number of SNP markers in A. auriculiformis x A. mangium hybrids using next generation transcriptome sequencing.
By using a reference genome from the most closely related species, we converted most SNPs to successful assays. We also demonstrated that Illumina GoldenGate genotyping together with manual clustering can provide high quality genotypes for a non-model species like Acacia. These SNP markers are not only important for linkage map construction, but will be very useful for hybrid discrimination and genetic diversity assessment of natural germplasms in the future. © 2012 Wong et al.; licensee BioMed Central Ltd.

Item Evidence for a trade-off strategy in stone oak (Lithocarpus) seeds between physical and chemical defense highlights fiber as an important antifeedant (2012)
Chen, Xi (TTU); Cannon, Charles H. (TTU); Conklin-Brittan, Nancy Lou
Trees in the beech or oak family (Fagaceae) have a mutualistic relationship with scatter-hoarding rodents. Rodents obtain nutrients and energy by consuming seeds, while providing seed dispersal for the tree by allowing some cached seeds to germinate. Seed predation and caching behavior of rodents are primarily affected by seed size, mechanical protection, macronutrient content, and chemical antifeedants. To enhance seed dispersal, trees must optimize trade-offs in investment between macronutrients and antifeedants. Here, we examine this important chemical balance in the seeds of tropical stone oak species with two substantially different fruit morphologies. These two distinct fruit morphologies in Lithocarpus differ in the degree of mechanical protection of the seed. For 'acorn' fruit, a thin exocarp forms a shell around the seed while for 'enclosed receptacle' (ER) fruit, the seed is embedded in a woody receptacle. We compared the chemical composition of numerous macronutrients and antifeedants in seeds from several Lithocarpus species, focusing on two pairs of sympatric species with different fruit morphologies.
We found that macronutrients, particularly total non-structural carbohydrate, were more concentrated in seeds of ER fruits while antifeedants, primarily fibers, were more concentrated in seeds of acorn fruits. The trade-off in these two major chemical components was more evident between the two sympatric lowland species than between two highland species. Surprisingly, no significant difference in overall tannin concentrations in the seeds was observed between the two fruit morphologies. Instead, the major trade-off between macronutrients and antifeedants involved indigestible fibers. Future studies of this complex mutualism should carefully consider the role of indigestible fibers in the foraging behavior of scatter-hoarding rodents. © 2012 Chen et al.

Item Historical distribution of Sundaland's Dipterocarp rainforests at Quaternary glacial maxima (2014)
Raes, Niels; Cannon, Charles H. (TTU); Hijmans, Robert J.; Piessens, Thomas; Saw, Leng G.; Van Welzen, Peter C.; Ferry Slik, J. W.
The extent of Dipterocarp rainforests on the emergent Sundaland landmass in Southeast Asia during Quaternary glaciations remains a key question. A better understanding of the biogeographic history of Sundaland could help explain current patterns of biodiversity and support the development of effective forest conservation strategies. Dipterocarpaceae trees dominate the rainforests of Sundaland, and their distributions serve as a proxy for rainforest extent. We used species distribution models (SDMs) of 317 Dipterocarp species to estimate the geographic extent of appropriate climatic conditions for rainforest on Sundaland at the last glacial maximum (LGM). The SDMs suggest that the climate of central Sundaland at the LGM was suitable to sustain Dipterocarp rainforest, and that the presence of a previously suggested transequatorial savannah corridor at that time is unlikely.
Our findings are supported by palynologic evidence, dynamic vegetation models, extant mammal and termite communities, vascular plant fatty acid stable isotopic compositions, and stable carbon isotopic compositions of cave guano profiles. Although Dipterocarp species richness was generally lower at the LGM, areas of high species richness were mostly found off the current islands and on the emergent Sunda Shelf, indicating substantial species migration and mixing during the transitions between the Quaternary glacial maxima and warm periods such as the present.

Item How does conversion of natural tropical rainforest ecosystems affect soil bacterial and fungal communities in the Nile river watershed of Uganda? (2014)
Alele, Peter O.; Sheil, Douglas; Surget-Groba, Yann; Lingling, Shi; Cannon, Charles H. (TTU)
Uganda's forests are globally important for their conservation values but are under pressure from increasing human population and consumption. In this study, we examine how conversion of natural forest affects soil bacterial and fungal communities. Comparisons in paired natural forest and human-converted sites among four locations indicated that natural forest soils consistently had higher pH, organic carbon, nitrogen, and calcium, although variation among sites was large. Despite these differences, no effect on the diversity of dominant taxa for either bacterial or fungal communities was detected, using polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE). Composition of fungal communities did generally appear different in converted sites, but surprisingly, we did not observe a consistent pattern among sites. The spatial distribution of some taxa and community composition were associated with soil pH, organic carbon, phosphorus and sodium, suggesting that changes in soil communities were nuanced and require more robust metagenomic methods to understand the various components of the community.
Given the close geographic proximity of the paired sampling sites, the similarity between natural and converted sites might be due to continued dispersal between treatments. Fungal communities showed greater environmental differentiation than bacterial communities, particularly according to soil pH. We detected biotic homogenization in converted ecosystems and substantial contribution of β-diversity to total diversity, indicating considerable geographic structure in soil biota in these forest communities. Overall, our results suggest that soil microbial communities are relatively resilient to forest conversion and, despite a substantial and consistent change in the soil environment, the effects of conversion differed widely among sites. The substantial difference in soil chemistry, with generally lower nutrient quantity in converted sites, does raise the question of how long this resilience will last. © 2014 Alele et al.

Item Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing (2011)
Wong, Melissa M.L.; Cannon, Charles H. (TTU); Wickneswari, Ratnam
Background: Acacia auriculiformis × Acacia mangium hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs are major objectives of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well-characterized but genes in these pathways are poorly characterized in Acacia hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and discovering informative sequence variants. Results: We sequenced transcriptomes of A. auriculiformis and A. mangium from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues using paired-end libraries and a single lane of an Illumina GAII machine.
De novo assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for A. auriculiformis and A. mangium, respectively. The assemblies of A. auriculiformis and A. mangium had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against a public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168) and one legume-specific family (miR2086). Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. By using the assemblies as a reference, we discovered 16,648 and 9,335 high quality putative Single Nucleotide Polymorphisms (SNPs) in the transcriptomes of A. auriculiformis and A. mangium, respectively, thus yielding useful markers for population genetics studies and marker-assisted selection. Conclusion: We have produced the first comprehensive transcriptome-wide analysis in A. auriculiformis and A. mangium using de novo assembly techniques. Our high quality and comprehensive assemblies allowed the identification of many genes in the lignin biosynthesis and secondary cell wall formation pathways in Acacia hybrids. Our results demonstrated that Next Generation Sequencing is a cost-effective method for gene discovery and for the identification of regulatory sequences and informative markers in a non-model plant.
© 2011 Wong et al; licensee BioMed Central Ltd.

Item Molecular Evolutionary Analysis of the Alfin-Like Protein Family in Arabidopsis lyrata, Arabidopsis thaliana, and Thellungiella halophila (2013)
Song, Yu; Gao, Jie; Yang, Fengxi; Kua, Chai Shian; Liu, Jingxin; Cannon, Charles H. (TTU)
In previous studies, the Alfin1 gene, a transcription factor, enhanced salt tolerance in alfalfa, primarily through altering gene expression levels in the root. Here, we examined the molecular evolution of the Alfin-like (AL) proteins in two Arabidopsis species (A. lyrata and A. thaliana) and a salt-tolerant close relative, Thellungiella halophila. These AL proteins could be divided into four groups and the two known DUF3594 and PHD-finger domains had co-evolved within each group of genes, irrespective of species, due to gene duplication events in the common ancestor of all three species, while gene loss was observed only in T. halophila. To detect whether natural selection acted in the evolution of AL genes, we calculated nonsynonymous-to-synonymous substitution ratios (dN/dS) and codon usage statistics, finding that positive selection operated on four branches and that biased codon usage in the AL family differed significantly between T. halophila and A. lyrata or A. thaliana. Distinctively, only the AL7 branch was under positive selection on the PHD-finger domain and the three members on the branch showed the smallest difference when codon bias was evaluated among the seven clusters. Functional analysis based on transgenic overexpression lines and T-DNA insertion mutants indicated that salt-stress-induced AtAL7 could play a negative role in salt tolerance of A. thaliana, suggesting that adaptive evolution occurred in the members of the AL gene family. © 2013 Song et al.

Item Protein domain analysis of genomic sequence data reveals regulation of LRR related domains in plant transpiration in Ficus (2014)
Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H.
(TTU); Du, Fang K.
Predicting protein domains is essential for understanding a protein's function at the molecular level. However, until now, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality with a set of programs that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly combining next-generation sequencing (NGS) assembly methods with traditional methods, peptide prediction, and protein domain prediction. The proposed new functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied our functionality to the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT-2 and LRR-8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.

Item Pseudo-Sanger sequencing: Massively parallel production of long and near error-free reads using NGS technology (2013)
Ruan, Jue; Jiang, Lan; Chong, Zechen; Gong, Qiang; Li, Heng; Li, Chunyan; Tao, Yong; Zheng, Caihong; Zhai, Weiwei; Turissini, David; Cannon, Charles H. (TTU); Lu, Xuemei; Wu, Chung I.
Background: Next generation sequencing (NGS) technology typically offers ultra-high throughput, but its read length is remarkably short compared to conventional Sanger sequencing. Paired-end NGS could computationally extend the read length, but with considerable practical inconvenience because of the inherent gaps.
Now that Illumina paired-end sequencing can read both ends of 600 bp or even 800 bp DNA fragments, how to fill in the gaps between paired ends to produce accurate long reads is intriguing but challenging. Results: We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It fills in the gaps between paired ends and can generate near error-free sequences equivalent to conventional Sanger reads in length but with the high throughput of Next Generation Sequencing. The major novelty of the PS method is that gap filling is based on local assembly of paired-end reads that overlap at either end. Thus, we are able to fill in the gaps in repetitive genomic regions correctly. PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method is introduced to transform these special paired-end reads into long and near error-free PS sequences, which correspond in length to those with the largest insert sizes. The PS construction has three advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5-fold improvement over the existing de novo assembly methods and a 3-fold advantage over the assembly of long reads from 454 sequencing. Conclusions: Our method generated near error-free long reads from NGS paired-end sequencing. We demonstrated that de novo assembly could benefit substantially from these Sanger-like reads. In addition, these long reads could be applied to tasks such as structural variation detection and metagenomics.
© 2013 Ruan et al.; licensee BioMed Central Ltd.

Item Reference-Free Comparative Genomics of 174 Chloroplasts (2012)
Kua, Chai Shian; Ruan, Jue; Harting, John (TTU); Ye, Cheng Xi; Helmus, Matthew R.; Yu, Jun; Cannon, Charles H. (TTU)
Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions. © 2012 Kua et al.

Item Variable mating behaviors and the maintenance of tropical biodiversity (2015)
Cannon, Charles H. (TTU); Lerdau, Manuel
Current theoretical studies on mechanisms promoting species co-existence in diverse communities assume that species are fixed in their mating behavior. Each species is a discrete evolutionary unit, even though most empirical evidence indicates that inter-specific gene flow occurs in plant and animal groups. Here, in a data-driven meta-community model of species co-existence, we allow mating behavior to respond to local species composition and abundance. While individuals primarily out-cross, species maintain a diminished capacity for selfing and hybridization. Mate choice is treated as a variable behavior, which responds to intrinsic traits determining mate choice and the density and availability of sympatric inter-fertile individuals. When mate choice is strongly limited, even low survivorship of selfed offspring can prevent extinction of rare species. With increasing mate choice, low hybridization success rates maintain community level diversity for extended periods of time. In high diversity tropical tree communities, competition among sympatric congeneric species is negligible, because direct spatial proximity with close relatives is infrequent. Therefore, genomic donorship presents little cost. By incorporating variable mating behavior into evolutionary models of diversification, we also discuss how participation in a syngameon may be selectively advantageous.
We view this behavior as a genomic mutualism, where maintenance of genomic structure and diminished inter-fertility allows each species in the syngameon to benefit from a greater effective population size during episodes of selective disadvantage. Rare species would play a particularly important role in these syngameons as they are more likely to produce heterospecific crosses and transgressive phenotypes. We propose that inter-specific gene flow can play a critical role by allowing genomic mutualists to avoid extinction and gain local adaptations.