Using Novel Biomarkers to Facilitate AI-Driven Glycoproteomics

Published in Pharmaceutical Technology's In the Lab eNewsletter, November 2022, Volume 17, Issue 11

As the analysis of peripheral blood protein glycosylation becomes scalable, research into new biomarkers is expanding.

Most strategies for liquid-biopsy-based early cancer screening and detection suffer from modest sensitivity as well as insufficient specificity, and thus miss many early-stage cancers, while resulting in unnecessary, invasive, and expensive follow-up procedures. Despite the large sample volumes analyzed, the false negative rates of these technologies severely limit their utility. This is because the tests’ accuracy depends on tumors shedding sufficient amounts of material into patients’ circulation.

Glycoproteomics provides an alternative by measuring subtle shifts in the glycosylation profiles of relatively abundant circulating proteins, such as immunoglobulins and acute phase reactants. Recently, such changes have been demonstrated to show highly sensitive and specific correlations with a range of malignant diseases as well as other conditions, opening the door for the glycoproteome to serve as a new source of highly accurate biomarkers that offer clinically relevant, profound new insights into disease processes via simple blood samples. This opportunity should not be lost on pharmaceutical companies, given its potential to produce powerful new diagnostic technologies and therapies.

The promise

Up to 70% of all proteins are glycosylated, making glycosylation by far the most common post-translational modification. More importantly, the extreme diversity of glycans—in which every compositional combination of monosaccharides may take on a wide variety of structural moieties—generates a vast repertoire of molecules, increasing the potential analyte space of the proteome by several orders of magnitude.

A large body of research has shown that protein glycosylation is highly dynamic, changing in real time with physiological and pathological states. Located far downstream of DNA in the biological cascade and proximal to phenotype, it is thus predestined to be a highly differentiated and informative source of biomarkers.

Conceptually different from conventional proteomic analysis, glycoproteomic analysis does not depend on quantitative changes in abundance/expression of a particular protein. Instead, it records subtle shifts in the proportion of individual glycan moieties attached to a given glycosylation site of a protein, a domain completely missed by standard laboratory analytical approaches, such as immunoassays, which indiscriminately recognize the peptide backbone without further submolecular resolution.

Likewise, glycomic analysis, which relies on the aggregate measurement of different glycans after enzymatic separation from their cognate proteins, while occasionally yielding interesting results, discards the much more granular information provided by an analytical approach that preserves the integrity of the glycosylated protein. Glycoproteomics, therefore, allows characterization of individual glycoproteins at the resolution of both the glycan moieties and the amino-acid residues to which they are covalently attached (i.e., the glycosylation sites).
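The distinction between protein abundance and site-level glycoform proportions can be made concrete with a small sketch. The protein, site, and glycan labels and all peak areas below are hypothetical values chosen for illustration; they are not taken from the studies cited in this article.

```python
from collections import defaultdict


def glycoform_fractions(peak_areas):
    """Convert integrated peak areas into per-site glycoform proportions.

    peak_areas maps (protein, site, glycan) -> peak area.
    The returned fractions sum to 1.0 within each (protein, site).
    """
    site_totals = defaultdict(float)
    for (protein, site, _glycan), area in peak_areas.items():
        site_totals[(protein, site)] += area
    return {
        key: area / site_totals[key[:2]]
        for key, area in peak_areas.items()
        if site_totals[key[:2]] > 0
    }


# Hypothetical peak areas for three glycoforms at one site.
baseline = {
    ("IgG1", "N297", "G0F"): 300.0,
    ("IgG1", "N297", "G1F"): 500.0,
    ("IgG1", "N297", "G2F"): 200.0,
}
# Twice as much protein, same glycosylation pattern.
more_protein = {key: 2.0 * area for key, area in baseline.items()}
# Same total signal, but the glycoform mix has shifted.
shifted = {
    ("IgG1", "N297", "G0F"): 450.0,
    ("IgG1", "N297", "G1F"): 400.0,
    ("IgG1", "N297", "G2F"): 150.0,
}

for name, sample in [("baseline", baseline), ("more_protein", more_protein), ("shifted", shifted)]:
    fractions = glycoform_fractions(sample)
    print(name, {key[2]: round(value, 3) for key, value in fractions.items()})
```

In this sketch, doubling the amount of protein leaves the fractions unchanged, whereas the shifted sample has the same total signal but a different glycoform mix. An assay that reports only total protein would register the first change and miss the second.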

The challenge

Like everywhere else, there is no free lunch in life science either. The power of analysis at the resolution of individual protein glycoforms comes at the expense of very challenging analytics. Discovery of novel protein glycoforms relies entirely on the submolecular resolution provided by mass spectrometry (MS). Likewise, measuring known, previously characterized, glycosylation isoforms of a protein depends, by and large, on MS because of the limited specificity and/or difficult generation of glycan- and glycoprotein-specific affinity reagents. MS technology has, over the past decade or so, become increasingly robust and dependable, but the instrumentation is still expensive and complex, requiring highly trained, specialized operators.

The true bottleneck, paradoxically a consequence of the improvements in MS technology and its throughput, is now the volume and complexity of the data generated by MS. Glycoproteomic chromatogram processing relies on experts laboriously and manually calling and integrating the area under the ragged peaks characteristic of the primary data output generated by MS. Until recently, even with the best available software, this allowed only a frustratingly slow “one-off” approach to glycoproteomic research and analysis, which was unsuitable for translational studies and for applications in which large numbers of clinical samples need to be analyzed to arrive at valid interpretations and, ultimately, clinical actionability. As an example, while 100 patient samples can now be run on an MS instrument in one to two days, interpreting and processing the output, assuming 1000 peaks per sample, will keep a PhD busy for eight months.
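To illustrate what integrating a single peak involves, here is a minimal sketch using the trapezoidal rule with a straight-line baseline drawn between the peak boundaries. The function name, the baseline choice, and the toy trace are assumptions made for illustration; this is not the procedure used by any specific software package. Note that the sketch takes the peak boundaries as given. Deciding where each ragged peak starts and ends is the part that demands expert judgment, repeated in the example above for 100 × 1000 = 100,000 peaks.

```python
def integrate_peak(times, intensities, start, end):
    """Area of a chromatographic peak between start and end.

    Uses the trapezoidal rule and subtracts a linear baseline drawn
    between the first and last data points inside the window.
    """
    points = [(t, y) for t, y in zip(times, intensities) if start <= t <= end]
    if len(points) < 2:
        return 0.0
    # Total area under the signal within the window.
    total = 0.0
    for (t_a, y_a), (t_b, y_b) in zip(points, points[1:]):
        total += 0.5 * (y_a + y_b) * (t_b - t_a)
    # Area under the straight line joining the two boundary points.
    (t_first, y_first), (t_last, y_last) = points[0], points[-1]
    baseline = 0.5 * (y_first + y_last) * (t_last - t_first)
    return total - baseline


# Toy trace: a triangular peak of height 40 on a flat background of 10.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
intensities = [10.0, 10.0, 50.0, 10.0, 10.0]
print(integrate_peak(times, intensities, start=1.0, end=3.0))  # 40.0
```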

The MS-AI-Empowered solution

Enter artificial intelligence (AI). Ushering in a foundational change in complex data processing, AI-driven glycoproteomic analysis can at long last be performed at scale, five orders of magnitude faster, and with greater fidelity and reproducibility than was heretofore possible.

The biotechnology company InterVenn Biosciences, for example, has developed a neural-network-based algorithm, which was trained by manually curating hundreds of thousands of features (i.e., glycoproteome chromatographic peaks), and which performs well on unseen test samples, reaching near-perfect agreement (Pearson’s r = 0.99) with expert human annotation, as compared to ≈75% correlation using the most advanced currently available software packages. As a result, it can now process the 1000 features on each of 100 samples from the previous example to yield numerical, normalized data directly usable for biostatistical analysis in 6 minutes as opposed to full-time work by a spectrometrist for eight months (1).
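For readers less familiar with the agreement metric quoted above, the following sketch shows how Pearson's r between expert-integrated and algorithm-integrated peak areas could be computed. The peak areas are invented for illustration and are not data from reference 1; the function is a plain textbook implementation, not the company's evaluation code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical peak areas for the same four peaks.
expert_areas = [100.0, 250.0, 400.0, 800.0]
model_areas = [110.0, 240.0, 410.0, 790.0]
print(round(pearson_r(expert_areas, model_areas), 4))
```

A value close to 1 means the two sets of areas rise and fall together almost perfectly. Because r is insensitive to a constant scale factor or offset, it is usually read alongside other checks rather than as the sole measure of agreement.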

The transformation of the liquid biopsy concept

The insights gleaned from rapidly scaling glycoproteomic interrogation of peripheral blood samples, admittedly still at an early stage and based on a limited number of pilot study examples, have been intriguing and powerful (2–4). By focusing on readily accessible high-abundance plasma glycoproteins, which undergo significant changes in their glycosylation status as part of a systemic response to localized, remote disease processes, liquid biopsy-based glycoproteomics is opening an entirely new domain of R&D interrogation, resulting in identification of new biomarkers, which promise to meet the exacting needs of the clinical markets. These biomarkers have the potential to accelerate the shift to a multi-omic approach to precision medicine, with benefits to individual patients and the healthcare system overall.

Much more research is needed, and is being conducted, to understand the biology behind these glycosylation changes. Presumably, they are driven by signal cascades activated by the tumor microenvironment of very early-stage tumors, or even precancerous lesions, which are not detectable by ctDNA- or cfDNA-based methods. However, highly informative glycoproteomic biomarkers can be measured that do not depend on the presence of material shed by the tumor, representing a sea change for the role of liquid biopsy in disease management and preventive care.

References

  1. Z. Wu et al., J Proteomics 223:203820 (2020).
  2. P. Ramachandran et al., J Proteome Research 21:1083-1094 (2022).
  3. C. Pickering et al., Viruses 14:553 (2022).
  4. D. Serie et al., Urologic Oncology 40:168.e11 (2022).

About the author

Klaus Lindpaintner, MD, MPH, is chief scientific officer and distinguished scientist at InterVenn Biosciences.