Comprehensive characterization and quantification of glycoproteins derived from biological samples by enrichment techniques and LC-MS/MS






To better understand biological processes and disease progression and development, the microheterogeneity of glycoproteins must be elucidated. The glycoproteomic approach evaluates changes in glycosylated proteins by determining the occupancy (or absence) of glycosylation sites in proteins, and thereby defines the microheterogeneity associated with those sites. The overarching goal of this research is to comprehensively characterize and quantify glycoproteins/glycopeptides derived from biological samples by enrichment techniques and liquid chromatography interfaced to mass spectrometry (LC-MS or LC-MS/MS). The second chapter focused on quantifying glycosylation sites, providing information on site occupancy and on the extent to which a site is occupied by a particular glycan structure. Glycopeptides were quantified using multiple reaction monitoring (MRM), whose enhanced sensitivity and selectivity make it well suited to quantification. Oxonium ions diagnostic of glycopeptides were used as the MRM transitions. To optimize the MRM conditions, the model glycoprotein fetuin was used to evaluate different numbers of transitions. The results suggested that segmented MRM (SMRM) with 3 transitions gave the highest peak heights and reliable standard deviation (STD) values, with an average relative standard deviation (RSD) of 18% across all SMRM experiments with 3 transitions. In addition, a normalized collision energy of 40% produced better results through more efficient fragmentation. These conditions were validated using alpha-1-acid glycoprotein and then applied to blood serum; the results were consistent with those for fetuin, indicating that the conditions are reliable for efficient glycopeptide quantification.
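As a minimal illustration of the precision metric quoted above, the RSD is the sample standard deviation of replicate measurements divided by their mean, expressed as a percentage. The function name and the peak heights in this sketch are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def relative_std_dev(values):
    """Return the relative standard deviation (RSD) of the values, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero; RSD is undefined")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return 100.0 * stdev(values) / average


# Hypothetical replicate peak heights for one oxonium-ion transition.
replicate_peak_heights = [90.0, 100.0, 110.0]
rsd_percent = relative_std_dev(replicate_peak_heights)
```

A lower RSD across replicate injections indicates more reproducible quantification for a given number of transitions.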
In the third chapter, LC-CID/HCD/ETD-MS/MS was applied to the comprehensive characterization of O-glycosylation and amino acid hydroxylation in type II collagen (CO2A1). O-Glycosylation of collagen is a unique type of post-translational modification (PTM) in which galactose (Gal) or glucose-galactose (Glc-Gal) moieties are attached to hydroxylysine (HyK). In addition, hydroxyproline (HyP) results from the post-translational hydroxylation of some proline residues in collagen. The pattern and extent of these modifications significantly influence fibrillogenesis, cross-linking, and matrix mineralization of collagen. Twenty-three CO2A1 lysine residues were observed as unmodified, hydroxylated, or glycosylated with Glc-Gal or Gal moieties. Employing different types of tandem MS permitted the characterization of the CO2A1 glycosylation sites. Both Gal and Glc-Gal moieties occupied 22 of the identified glycosylation sites, while K773 was observed as unmodified. A large number of HyP residues were detected at the Yaa positions of the Gly-Xaa-Yaa motif. ETD experiments revealed partial macroheterogeneities associated with the K299-K308, K452-K464, K464-K470, and K857-K884 glycosylation sites.
The fourth chapter investigated changes in glycosylation associated with different states of esophageal disease using different enrichment techniques. Two major enrichment techniques are currently widely applied in glycoproteomics: lectin affinity chromatography (LAC) and hydrazide chemistry (HC). Separate and complementary qualitative and quantitative analyses of protein glycosylation were performed using both enrichment techniques. Chemometric and statistical evaluations (PCA plots and ANOVA tests, respectively) were employed to determine and confirm candidate cancer-associated glycoprotein/glycopeptide biomarkers. Of 139 glycoproteins, 59 (42% overlap) were common to both enrichment techniques. This overlap is very similar to that reported in previously published studies. The quantitation and evaluation of significantly changed glycoproteins/glycopeptides were complementary between the LAC and HC enrichments. LC-ESI-MS/MS analyses indicated that 7 glycoproteins enriched by LAC and 11 glycoproteins enriched by HC showed significantly different abundances between disease-free and disease cohorts. For MRM quantitation, the optimized experimental conditions from the second chapter were applied. MRM quantitation showed that 13 glycopeptides from LAC enrichment and 10 glycosylation sites from HC enrichment were statistically different among disease cohorts. Hydrophilic interaction liquid chromatography (HILIC) has recently been introduced as an efficient means of capturing glycopeptides. The fifth chapter describes the use of HILIC enrichment to monitor glycosylation variation among different HIV gp120 proteins. The glycans of HIV-1 envelope glycoproteins (Envs) play a pivotal role in viral infection and immune evasion, allowing the virus to escape neutralization by antibodies. The receptor-binding gp120 subunit of HIV-1 Envs is heavily glycosylated, with 24 consensus N-linked recognition sites.
The overarching goal of this study is to decipher the microheterogeneity of gp120s with respect to clades or viral sequences and to the glycosylation machinery of the host, prior to developing broadly neutralizing antibodies that use glycopeptides as epitopes. This study illustrates that the glycosylation of gp120 depends on clade, isolate, and host machinery. The V1/V2 (i.e., N160 recognition) and V3 (i.e., N332 recognition) domains have been shown to be critical for neutralization of HIV-1 Envs by antibodies. Here, high microheterogeneities associated with these glycosylation sites were determined among different isolates of gp120s, in agreement with what has been shown in the literature. Therefore, effective development of broadly neutralizing antibodies targeting glycosylation sites should account for the extensive microheterogeneities of the different glycosylation sites of gp120s, especially those targeted by the antibodies. Although computational methods based on mass spectrometric data have proven effective in monitoring changes in the glycome, developing such methods for the glycoproteome is challenging, largely because of the inherent complexity of simultaneously studying glycan structures and their corresponding glycosylation sites. The sixth chapter introduced a computational framework for identifying intact glycopeptides in complex proteome samples. Scoring algorithms were presented for tandem mass spectra of glycopeptides resulting from CID, HCD, and ETD fragmentation modes. An empirical false-discovery rate estimation method, based on a target-decoy search approach, was derived for assigning confidence. The power of the new method was further enhanced when multiple data sets were pooled to increase identification confidence. Using this framework, 103 glycopeptides from 53 sites across 33 glycoproteins were identified with high confidence in complex human serum proteome samples.
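The target-decoy idea mentioned above can be sketched in a few lines. The abstract does not give the estimator derived in the sixth chapter, so this example assumes the commonly used ratio of decoy to target matches at or above a score threshold; the function names and the scores are hypothetical.

```python
def estimate_fdr(matches, threshold):
    """Estimate FDR as decoys / targets among matches scoring >= threshold.

    matches is a list of (score, is_decoy) pairs. This simple ratio is an
    assumed estimator, not necessarily the one derived in the dissertation.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No target matches: treat any surviving decoy as a fully false set.
        return 0.0 if decoys == 0 else 1.0
    return decoys / targets


def threshold_for_fdr(matches, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    for candidate in sorted({score for score, _ in matches}):
        if estimate_fdr(matches, candidate) <= max_fdr:
            return candidate
    return None  # no threshold satisfies the requested FDR


# Hypothetical glycopeptide-spectrum match scores: (score, is_decoy).
matches = [(9.1, False), (8.7, False), (7.9, False), (7.2, True),
           (6.8, False), (6.1, False), (5.5, True), (5.0, False)]
fdr_at_6 = estimate_fdr(matches, 6.0)
cutoff = threshold_for_fdr(matches, 0.05)
```

Pooling several data sets before applying the cutoff, as the chapter describes, would correspond here to concatenating their match lists.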
Conventional proteomic platforms were used and tested with standard depletion of the seven most abundant proteins. The seventh chapter focused on developing a new algorithm for the quantitation of intact glycopeptides, continuing the work of the sixth chapter. The developed method was applied to an esophageal cancer study based on blood serum samples from cancer patients, in an attempt to detect potential biomarkers of site-specific glycosylation. Several glycoproteins showed significantly different abundances of site-specific glycosylation between cancer and control samples. The results demonstrated that the statistical method was robust for assessing quantitative alterations of protein glycosylation at the site-specific level across different classes of samples, thereby setting the stage for glycoproteomic biomarker discovery. In the eighth and ninth chapters, the glycosylation of prostate specific antigen (PSA) was studied. PSA is currently used as a biomarker to diagnose prostate cancer, and the PSA test has been widely used for prostate cancer screening and detection. However, in the diagnostic gray zone the PSA test does not clearly distinguish between benign prostatic hypertrophy and prostate cancer because of the overlap between the two. To develop more specific and sensitive candidate biomarkers for prostate cancer, an in-depth understanding of the biochemical characteristics of PSA (such as glycosylation) is needed. PSA has a single glycosylation site at N69, with glycans constituting approximately 8% of the protein by weight. In the eighth chapter, we reported the comprehensive identification and quantitation of N-glycans from two PSA isoforms using LC-MS/MS. There were 56 N-glycans associated with PSA, while 57 N-glycans were observed for the PSA-high pI isoform (PSAH). Three sulfated/phosphorylated glycopeptides were detected, and their identification was supported by tandem MS data.
One of these sulfated/phosphorylated N-glycans, HexNAc5Hex4dHex1s/p1, was identified in both PSA and PSAH at relative intensities of 0.52% and 0.28%, respectively. Quantitative variations between the two isoforms were also monitored. Because we were one of the labs participating in the 2012 ABRF Glycoprotein Research Group (gPRG) study, our results were compared with those presented in that interlaboratory study, and our qualitative and quantitative results were comparable to those it summarized. During our previous study of PSA N69 glycosylation, additional glycopeptides were observed in the PSA sample that had not been previously reported and that did not match glycopeptides from glycoprotein impurities present in the sample. This extra glycosylation site of PSA is associated with a mutation in the KLK3 gene. Among the single nucleotide polymorphisms (SNPs) of the KLK family, rs61752561 in the KLK3 gene is an unusual missense mutation that converts D102 to N102 in the PSA amino acid sequence. Accordingly, a new N-glycosylation site is created with an N102MS motif. In the ninth chapter, we reported the first qualitative and quantitative glycoproteomic study of the PSA N102 glycosylation site by LC-MS/MS. We successfully applied tandem MS to verify the amino acid sequence containing the N102 glycosylation site and the associated glycoforms in PSA samples acquired from different suppliers. A total of 21, 7, and 16 glycoforms were detected for the LeeBio, Sigma, and EMD PSA samples, respectively. Interestingly, fucosylated glycopeptides were not detected on N102. Among the 3 PSA samples, HexNAc2Hex5 was the predominant glycoform at N102, while HexNAc4Hex5Fuc1NeuAc1 or HexNAc4Hex5Fuc1NeuAc2 were the primary glycoforms at N69.
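To illustrate how a D-to-N substitution can create an N-glycosylation site such as the N102MS motif described above, the following sketch scans a sequence for the consensus N-linked sequon N-X-S/T, where X is any residue except proline. The peptide used here is a hypothetical toy fragment, not the actual PSA sequence, and the function name is an assumption made for this example.

```python
import re

# N-X-S/T sequon with X != P. The lookahead lets overlapping sequons
# (for example "NNTS") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy fragment (NOT the real PSA sequence) with a D-M-S triplet.
wild_type = "AGLKDMSLLR"
# Simulate the missense change D -> N, which turns D-M-S into N-M-S.
variant = wild_type.replace("DMS", "NMS")

wild_type_sites = find_sequons(wild_type)
variant_sites = find_sequons(variant)
```

In this toy case the wild-type fragment has no sequon, while the variant gains one at the substituted position. A sequon marks only a potential site; actual occupancy must still be confirmed experimentally, as was done by tandem MS in the ninth chapter.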



Glycoproteomics, LC-MS/MS, Enrichment