Industrializing Design, Development, and Manufacturing of Therapeutic Proteins

Published in: Pharmaceutical Technology, 05-01-2011, Volume 2011 Supplement, Issue 3

The authors discuss various approaches and related issues, including production of difficult-to-express proteins using cell-free expression systems, scalability of protein expression, and site-specific chemical modifications.

From an industrial research and development (R&D) perspective, the design and development of protein therapeutics today appears somewhat akin to rational design in small-molecule discovery in the 1970s, when lead compounds were generated from known physiological substrates or ligands. Facing a need to find novel and diverse small-molecule leads, attention in the 1980s centered on high-throughput screening (HTS) technologies and compound libraries. Those libraries, albeit large, were hardly diverse, with most therapeutic agents coming from a few target protein classes. Complementation of libraries with natural products, the development of combinatorial chemistry, and application of focused-library sets followed. This evolution, together with automated methods for content-rich assay systems and fast make-test cycles, enhanced discovery of novel, potent, and diverse lead series.

Contrast this with the present analogous processes for protein therapeutics: the discovery and development of novel biologics is hardly diverse, efficient, or rapid. State-of-the-art protein discovery and development use multiple expression hosts (e.g., mouse, E. coli, Chinese hamster ovary (CHO), and NS0), and several reformatting steps between hosts are often necessary during testing, scale-up, and production. The process of developing cell-based protein expression systems that are efficient, consistent, and scalable is often difficult and sometimes impossible using currently available technology.

To date, more than 150 protein drugs have been approved for clinical use, nearly all of which are produced in cell-based expression systems, such as E. coli, CHO cells, and Saccharomyces cerevisiae (S. cerevisiae). These cell-based systems have several limitations, and many biologics cannot be developed in them. For example, these systems only allow the overexpression of proteins that do not affect the physiology of the host cells. For many expression systems, identifying cell lines that stably synthesize high titers of the desired product is a time-consuming and labor-intensive process. Ideally, the same production host would be used for rapid variant discovery, production for animal testing, and manufacturing of a clinical candidate.

Ideally, one would want to emulate the huge leap made in iterative drug design seen in small-molecule discovery, namely, rapid make-test cycles and generation of multiple parallel libraries of drug candidates with diverse structural elements to optimize activity while maintaining feasibility for manufacture. An ideal system would do the following:

  • Deliver fast make-test cycles, a prerequisite for iterative design, on the order of three to five days, similar to those for focused small-molecule libraries

  • Create efficient and rapid expression and purification that allows for libraries of hundreds to thousands of protein-sequence variants to be simultaneously tested per make-test cycle using standard off-the-shelf robotics equipment

  • Incorporate preferred sequences defined from selection technologies, such as ribosome or phage display, into whole protein therapeutics for testing

  • Enhance the diversity of chemical structures by expanding the library of available amino acids at specifically targeted points in the protein sequence from 20 natural to many hundreds of non-natural amino acids

  • Optimize several properties (e.g., agonist or antagonist, affinity, stability, and predictive manufacturability) simultaneously through rapid high-throughput make-test cycles

  • Create processes that are not only rapid but amenable to rapid scale-up and cGMP manufacturing once the desired therapeutic construct has been identified.

As ambitious as such a system would seem, several exciting technologies are emerging that improve expression systems and enhance diversity to enable modification of intrinsic properties of proteins, such as enzyme catalytic efficiency or binding. Others combine different properties in single therapeutics by conjugation chemistries. Further emerging technologies can lead to more rapid and parallel expression of many protein drug candidates. Getting all of these desirable technologies into a single amenable platform that has the flexibility to be scaled and support cGMP manufacturing is in sight.

Advances in development

Early improvements in endogenous protein-based therapeutics produced new, commercially successful therapeutics with desirable properties simply by extending the sequence through fusion to proteins such as the constant fragment of antibodies (Fc) or by PEGylating to increase half-life. Beyond these early approaches, considerable effort has been made to produce ever-more elegant constructs that combine two separate functions. One promising approach, antibody-drug conjugates (ADCs), uses an antibody directed against known tissue-selective cell-surface antigens or receptors to deliver conjugated toxins or cytotoxic drugs and so enhance selectivity over normal tissue.

Successful design of effective ADCs is complex and requires linking cytotoxic drug payloads to tumor-targeting antibody constructs. The selection of an ideal antigen target for optimal internalization and specificity for tumor tissues is critical. The design of linkers that are stable in circulation but cleave when internalized in tumor cells to release the cytotoxic drug adds to the complexity. The other major technical hurdle has been to define how the cytotoxic payload with its linker is conjugated to the targeting antibody. The biopharmaceutical companies Seattle Genetics and ImmunoGen have developed robust platforms that depend on conjugation of linkers and cytotoxic warheads to available cysteine or lysine residues, respectively, in the tumor-targeting antibody sequence.

Despite successes with ADCs, there are many examples where seemingly optimized functional components (i.e., antigen-binding motif, linker, and drug conjugate) do not translate into a developable therapeutic candidate. ADCs produced using conjugation chemistries to endogenous cysteines and lysines inevitably yield multiple ADC species, with payloads varying between one and nine drug molecules per immunoglobulin G (IgG). Furthermore, not all sites of conjugation are equal. Some conjugations interfere with antigen-binding epitopes, thereby reducing binding affinity and/or drug half-life (1). All too often, poor efficacy is revealed in the clinic only after significant investment in cell-based expression systems and scale-up.
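The heterogeneity described above is commonly summarized as an average drug-to-antibody ratio. The following is a minimal sketch, not a method from the cited work: the function name and both species distributions are hypothetical, illustrative values. It shows how a mixture with payloads of one to nine drugs per IgG collapses into a single average that hides the spread, whereas a site-specific conjugate has no spread at all.

```python
from math import sqrt


def payload_summary(species_abundance):
    """Return (mean, standard deviation) of drugs per antibody.

    species_abundance maps payload (drug molecules per IgG) to relative
    abundance. Abundances are normalized here, so they need not sum to 1.
    """
    total = sum(species_abundance.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    mean = sum(n * f for n, f in species_abundance.items()) / total
    variance = sum(f * (n - mean) ** 2 for n, f in species_abundance.items()) / total
    return mean, sqrt(variance)


# Hypothetical mixture from conjugation to endogenous residues (illustrative only).
heterogeneous = {1: 5, 2: 15, 3: 20, 4: 25, 5: 15, 6: 10, 7: 5, 8: 3, 9: 2}
# Hypothetical homogeneous product with a fixed payload of two drugs per IgG.
homogeneous = {2: 100}

for label, mixture in [("heterogeneous", heterogeneous), ("homogeneous", homogeneous)]:
    mean, spread = payload_summary(mixture)
    print(f"{label}: mean payload {mean:.2f}, spread {spread:.2f}")
```

Two preparations with the same mean payload can differ widely in spread, which is one reason an average alone does not predict how a heterogeneous ADC will behave.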


Site-directed conjugations

Several technologies aim to provide chemically amenable sites within a protein sequence for the posttranslational chemical conjugation of small-molecule drugs, peptides, or other constructs to improve or add functionality. For example, sequence-specific conjugations producing homogeneous ADCs with fixed payloads are aimed at improving tumor-cell killing and increasing therapeutic index. Engineered ThioMabs (Roche/Genentech technology) that use natural cysteine residues that must be carefully unmasked during production for subsequent site-specific conjugation have shown preclinical proof-of-concept (2). These approaches await further clinical validation.

Carlos Barbas' laboratory at The Scripps Research Institute exploited the use of exposed tyrosine residues within the complementarity-determining regions (CDRs) of IgG molecules as the basis for linking drug conjugates. CovX, acquired by Pfizer in 2008, was founded to develop this technology. These sorts of approaches are attractive in that posttranslational chemical coupling to a common IgG construct with resulting extended half-life represents a platform amenable to many different small-molecule or peptide agonists.

Non-natural amino acids

The introduction of non-natural amino acids (nnAAs), that is, amino acids other than the 20 naturally incorporated into proteins, plays an important role in basic peptide and protein research. They are increasingly used to develop biologics with enhanced pharmacological properties beyond providing sites for drug conjugation. NnAAs can be introduced through chemical synthesis in peptides or biosynthetically in proteins. Currently, only peptides and very small protein drugs with nnAAs are on the market because they can be made synthetically and avoid the limitations of cell-based expression systems. Prominent examples are the semisynthetic, broad-spectrum antibiotics ampicillin and amoxicillin, which incorporate the nnAAs D-phenylglycine and D-4-hydroxyphenylglycine, respectively (3).

The opportunities to broaden protein diversity and properties with nnAAs are enormous, as is the ability to incorporate chemical modifications in proteins that can endow current biopharmaceuticals with improved or new properties. These chemical modifications can change the characteristics of proteins, including ligand-binding properties, stability, spectroscopic properties, folding behavior, catalytic efficiency, and substrate specificity. These modifications provide possibilities to develop biobetters and biosuperiors that have superior pharmacological properties, including improved safety profiles, longer half-life, and enhanced activity (4).

Much effort has been put into developing technologies that ensure site-specific incorporation of nnAAs with high yield. Various semisynthetic and recombinant methods for site-specific introduction have been established. Few methods, however, have made it from small-scale protein production at the bench to commercial scale.

A recently formed biotechnology company, Redwood Biosciences, is using an approach based on the work of Carolyn Bertozzi's laboratory at the University of California, Berkeley. Her work focuses on genetically encoded aldehyde tags and aims to exploit a specific sequence (originally found within the sequence of sulfatases) that is posttranslationally recognized and modified by a formylglycine-generating enzyme to produce a so-called aldehyde chemical handle (5). The incorporation of the CxPxR sequence at specific positions in candidate protein therapeutics provides a means to produce a site-specific nnAA with a reactive aldehyde amenable to drug conjugation.
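Because the tag is defined by a short sequence pattern, checking a construct for it is straightforward. The sketch below is illustrative only: the function name and the example sequence are hypothetical, and it assumes the "x" positions in CxPxR may be any residue written as an uppercase one-letter code. It reports where the motif occurs, which can help confirm an intended tag or flag an unintended one elsewhere in the sequence.

```python
import re

# A lookahead is used so that overlapping occurrences are also reported.
# [A-Z] stands in for "x" (any residue) in the CxPxR pattern.
ALDEHYDE_TAG = re.compile(r"(?=(C[A-Z]P[A-Z]R))")


def find_aldehyde_tags(protein_sequence):
    """Return a list of (1-based position of the Cys, matched motif)."""
    sequence = protein_sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in ALDEHYDE_TAG.finditer(sequence)]


# Hypothetical sequence carrying two tags (not a real therapeutic protein).
example = "MKTLCTPSRGGSAALCAPARK"
for position, motif in find_aldehyde_tags(example):
    print(f"motif {motif} with Cys at position {position}")
```

The reported cysteine is the residue that the formylglycine-generating enzyme would modify, so its position is the prospective conjugation site.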

One of the oldest methods for nnAA incorporation into proteins uses auxotrophic strains of E. coli that cannot synthesize a specific natural amino acid and thus must take it up from the growth medium. A structurally similar nnAA can be supplied in the growth medium in place of the natural amino acid and will be incorporated into the protein instead. A major downside is that the nnAA will be incorporated at every site coding for the natural amino acid, which can lead to misfolding and impaired function of the target protein. The nnAA can also be incorporated into the host cell's own proteins, which can have toxic effects (6). Allozyne has pioneered this type of cell-based expression system for incorporating nnAAs into protein sequences, but the approach requires extensive re-engineering of the target protein sequence because of the residue-specific nature of nnAA incorporation with this method.

Ambrx has developed cell-based nnAA incorporation systems in which E. coli or CHO cells are engineered with orthogonal pairs of transfer RNAs (tRNAs) and tRNA synthetases to charge and incorporate nnAAs at selected codons at specific points in the coding sequence of the expressed protein. This approach is a significant advance and provides answers to at least some of the questions raised about nonspecific sites of conjugation in ADCs. To truly expand the number and variety of nnAAs that can be incorporated to determine the effect on function, even at a single amino-acid position, the approach demands significant investment to engineer further orthogonal pairs of tRNA synthetases and tRNAs that can recognize a library of nnAAs. A further complication is that these pairs should be exquisitely selective over natural amino acids to avoid their incorporation in place of the desired nnAA, which can be challenging in a drug-manufacturing context with strict regulatory requirements. When this challenge is taken into account, along with the variability in efficiency with which nnAAs are absorbed into the cell, these systems are unlikely to be amenable to fast reiterative make-test cycles with libraries of nnAAs at multiple sites of incorporation.
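To give a sense of why libraries of nnAAs at multiple sites strain any slow make-test cycle, the sketch below counts variants under a simple assumption: every modified site can independently receive any member of the nnAA library. The function names and all numbers (10 candidate sites, 20 nnAAs, 96-well plates) are illustrative assumptions, not figures from the systems described above.

```python
from math import ceil, comb


def library_size(candidate_sites, sites_modified, nnaa_choices):
    """Count distinct variants when `sites_modified` of `candidate_sites`
    positions each receive one of `nnaa_choices` non-natural amino acids."""
    if not 0 <= sites_modified <= candidate_sites:
        raise ValueError("sites_modified must be between 0 and candidate_sites")
    if nnaa_choices < 1:
        raise ValueError("nnaa_choices must be at least 1")
    return comb(candidate_sites, sites_modified) * nnaa_choices ** sites_modified


def plates_needed(variants, wells_per_plate=96):
    """Plates required if each variant occupies one well (an assumption)."""
    return ceil(variants / wells_per_plate)


# Illustrative numbers only: 10 candidate positions, a library of 20 nnAAs.
for modified in (1, 2, 3):
    variants = library_size(10, modified, 20)
    print(f"{modified} site(s): {variants} variants, {plates_needed(variants)} plates")
```

Under these assumptions the count grows from hundreds of variants for a single site to tens of thousands for two sites, which is why cycle time and parallel expression capacity dominate the feasibility of such screens.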

All of these considerations suggest a clear need to move away from the conventional cell-based protein expression systems to address the critical requirement for a rapid make-test system that is amenable to many parallel re-iterations of site-specific incorporations of defined natural amino-acid sequences or multiple nnAAs at multiple sites. The answer may not lie with cell-based systems at all, but with completely in vitro biochemical protein synthesis based on novel cell-free expression systems.

Cell-based expression systems: parallel reiterative design and scalable platforms

Adnexus Therapeutics (acquired by Bristol-Myers Squibb in 2007) has developed an E. coli-based platform for producing adnectins. Adnectins are derived from human fibronectin and many trillions of adnectin variants can be generated to represent a screenable library for desirable therapeutic properties. Scale-up from a selected lead is rapid, albeit with the requirement for PEGylation for clinical candidate manufacture (7).

Fabrus has recently addressed the rapid make-test cycle approach for biologics using arrays of predefined Fab antibody sequences produced in a high-throughput expression system based on production of proteins in E. coli, cell lysis, and subsequent protein purification (8). The large volumes of cell culture required for high-yield parallel production of Fabs require a significant investment in specialized robotics equipment. The method allows production of hundreds of variant Fab proteins over a one-week production cycle to begin variant testing in high-throughput biochemical-binding assays.

Expression systems: cell-free systems

Although cell-free protein synthesis has been practiced for decades as a research tool, only recently have advances suggested its feasibility for commercial biologics drug development and production as an alternative to conventional cell-based expression systems (9). An ideal cell-free protein production platform would produce fully soluble and correctly folded proteins at high volumetric productivities at any scale. The platform would be rapidly and predictably optimized by systems-level process design and control without the demanding requirements for maintaining cell viability and be readily adapted to high-throughput methods, including in vitro evolution of proteins to allow incorporation of nnAA into polypeptides. The platform would be based on simple batch systems using standard bioreactors that are known to scale to thousands of liters for both cell fermentation and subsequent cell-free protein production (10, 11).

Early efforts at developing such a system were hampered by projected costs that were much too high and by proteins with disulfide bonds that could not be folded effectively. By focusing on basic biochemical reactions and controlling cell-free metabolism, these limitations have been methodically addressed (12). Amino-acid supply has been stabilized, and metabolism has been activated to dramatically reduce substrate costs by requiring only the addition of nucleotide monophosphates to drive energy production. Commercially available in vitro transcription-translation kits based on E. coli, wheat germ, rabbit reticulocytes, and insect-cell extracts do not offer this advantage and are suitable only for research exploration at small scale. Control of the sulfhydryl redox potential has been gained, and a robust disulfide isomerase has been added to facilitate oxidative protein folding (13). These advances not only suggest production feasibility for pharmaceutical proteins containing the 20 natural amino acids, but they also provide enabling technology for incorporation of nnAAs at commercial scale.

A recent publication demonstrates that this open cell-free system (OCFS), developed by Swartz and collaborators, can be optimized for high-level production of proteins to allow for scale-up to commercial levels once the target protein is identified (14). The authors expressed a multidisulfide-bonded protein, biologically active recombinant human granulocyte-macrophage colony-stimulating factor (rhGM-CSF), at titers of 700 mg/L in 10 h. Importantly, they showed that production was linearly scalable from 96-well plates up to 100-L culture volume (14). The open nature of the system allows mass spectrometry-based profiling of the cell-free metabolome and proteome. The effects of adding or removing components can be tested rapidly and modeled for system optimization, without the need to tune the more complex cellular networks that maintain cell viability, a need commonly encountered in mammalian cell-line development.
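The reported figures translate directly into productivity and batch yield. The arithmetic below is a sketch under the assumption that the titer is independent of reaction volume, which is what linear scalability implies; the 700 mg/L titer, 10 h duration, and 100-L volume come from the text, while the 0.1 mL well volume and the function names are hypothetical.

```python
def volumetric_productivity(titer_mg_per_l, hours):
    """Average productivity in mg per liter per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return titer_mg_per_l / hours


def batch_yield_g(titer_mg_per_l, volume_l):
    """Protein per batch in grams, assuming titer does not change with volume."""
    return titer_mg_per_l * volume_l / 1000.0


TITER_MG_PER_L = 700.0   # reported rhGM-CSF titer
REACTION_HOURS = 10.0    # reported reaction time

print(f"productivity: {volumetric_productivity(TITER_MG_PER_L, REACTION_HOURS)} mg/L/h")

# The well volume is an assumed value for illustration; 100 L is from the text.
for label, volume_l in [("single well (assumed 0.1 mL)", 1e-4), ("100-L batch", 100.0)]:
    print(f"{label}: {batch_yield_g(TITER_MG_PER_L, volume_l):.5f} g")
```

If the assumption of a volume-independent titer holds, the same reaction conditions used for screening in plates predict the yield of a production batch without a separate process-development step.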

Difficult-to-express proteins

The lack of a membrane barrier in the OCFS provides the opportunity to express and study proteins that are difficult to express in cell-based systems. Many proteins that cannot be readily expressed by in vivo expression systems because of poor folding, inclusion-body accumulation, or toxicity can be expressed in an E. coli cell-free lysate.

Combinatorial screening of proteins

The advancements in genomic research and increased numbers of sequenced genomes require expression systems that allow fast production of the proteins under investigation. Cell-free expression systems can provide a useful tool for rapid screening and analysis of protein function, which is important for protein-drug discovery and development. DNA molecules can be amplified, transcribed, and translated in microplate wells and the expressed protein can be assayed immediately (15). Recently, HTS in a cell-free wheat germ system led to the discovery of a novel malaria vaccine candidate (16). Finally, the linear scalability allows proteins identified in display-based selections and HTS to be immediately scaled for production of multiple gram quantities, thus avoiding the delays and challenges of conventional mammalian cell line development.

Currently, cell-based expression technologies exhibit several limitations with respect to protein production at all phases of the drug discovery and development pipeline. Rapid production of proteins with novel chemical modifications, such as ADCs, is particularly challenging. E. coli-based cell-free protein synthesis systems, however, provide robust, rapid, and scalable protein production. The E. coli-based OCFS, in particular, allows rapid and multiplexed production of various difficult-to-express proteins and opens the unprecedented ability to explore therapeutics beyond the 20 amino acids that define today's proteins (17). The OCFS, combined with rational protein design and the focused use of libraries of nnAAs, allows for rapid exploration and identification of protein therapeutics, moving from the exploratory stage to clinical scale-up on an unprecedentedly rapid timescale.


Growing demand for new and better biopharmaceuticals has led to sophisticated advances in protein synthesis that now allow for:

  • Rapid production of target proteins, including those that are difficult to express in cell-based expression systems

  • Straightforward scalability of protein expression from HTS to commercial levels

  • Combinatorial screening of many proteins to identify and optimize drug candidates

  • Introduction of site-specific chemical modifications, including nnAAs into proteins to improve pharmacological properties

These new approaches to protein expression will revolutionize the development of biopharmaceuticals and open up the possibility of creating drugs that were previously inaccessible, or even unimaginable.

Trevor Hallam* is chief scientific officer and Christopher Murray is vice-president of research, both at Sutro Biopharma, 310 Utah Ave, Suite 150, South San Francisco, CA 94080,

* To whom all correspondence should be addressed


1. K.J. Hamblett et al., Clin. Cancer Res. 10 (20), 7063–7070 (2004).

2. J.R. Junutula et al., Clin. Cancer Res. 16 (19), 4769–4778 (2010).

3. J.S. Ma, Chim. Oggi 21 (6), 65–68 (2003).

4. A.R. Goerke and J.R. Swartz, Biotechnol. Bioeng. 102 (2), 400–416 (2009).

5. I.S. Carrico, B.L. Carlson, and C.R. Bertozzi, Nat. Chem. Biol. 3 (6), 321–322 (2007).

6. A.J. de Graaf et al., Bioconjugate Chem. 20 (7), 1281–1295 (2009).

7. R. Mamluk et al., MAbs 2 (2), 199–208 (2010).

8. H. Mao et al., Nat. Biotechnol. 28 (11), 1195–1202 (2010).

9. J. Swartz, J. Ind. Microbiol. Biotechnol. 33 (7), 476–485 (2006).

10. E.A. Burks et al., Proc. Natl. Acad. Sci. 94 (2), 412–417 (1997).

11. L. Jermutus, L.A. Ryabova, and A. Pluckthun, Curr. Opin. Biotechnol. 9 (5), 534–548 (1998).

12. M.C. Jewett et al., Mol. Syst. Biol. 4, 220 (2008).

13. A.R. Goerke and J.R. Swartz, Biotechnol. Bioeng. 99 (2), 351–367 (2008).

14. J.F. Zawada et al., Biotechnol. Bioeng. in press (2011).

15. M. He, New Biotechnol. 25 (2–3), 126–132 (2008).

16. T. Tsuboi et al., Infect. Immun. 76 (4), 1702–1708 (2008).

17. J. Swartz, Nat. Biotechnol. 27, 731–732 (2009).