Exploring the Skills Gap in Statistical Programming Organizations

Published in: Pharmaceutical Technology's In the Lab eNewsletter, August 2023, Volume 18, Issue 8

Interoperability difficulties resulting from discrepancies in preferred programming languages can be resolved via statistical computing environments.

In the biomedical and pharmaceutical industries, speed to market of safe and effective new therapies is everything—both for patients who need them and for the life sciences companies that develop them. As companies look to accelerate R&D, there has been an increasing focus on clinical development, where the availability of efficiency-enabling technology is currently outpacing workforce enablement.

Biostatistics leaders, in particular, are aiming to modernize the skillset of their current statistical programming organizations. The prevailing approach has been to get established Statistical Analysis System (SAS) programmers proficient in R and eventually in Python, two open-source programming languages that are constantly evolving due to thriving communities of active contributors who are continually expanding the languages’ toolsets.

However, having all SAS programmers proficient in R or Python may not be necessary to take advantage of all the functionality that these languages offer. Novel, flexible statistical computing environments (SCEs) can solve the unique challenges that biostatistics leaders face in enabling diverse teams to effectively collaborate and accelerate study submissions.

A rift appears in the biostatistics workforce

Although FDA has long held to the principle of language independence, SAS has until recently been the predominant statistical computing language used for study submissions. This is because both the expertise and infrastructure within the industry have long been SAS-based.

However, SAS is closed source, is not extensible, and requires licensing. Because open-source alternatives such as R and Python are freely available and extensible, they have evolved to meet the statistical and data science needs of the user community far faster than SAS in recent years.

New talent entering the field of biometrics has a clear preference for using R and Python over SAS, leaving pharmaceutical organizations with an interoperability challenge. Consequently, a schism is widening between more experienced biostatisticians, who tend to prefer SAS, and those who are entering or have just recently entered the field, who are more likely to be proficient in R or Python.


This fluency gap has placed an added burden on the emerging workforce to develop SAS proficiency in addition to business domain expertise—an expensive and time-intensive operation for both workers and their employers. It also puts stress on experienced statistical programmers, forcing them to choose between relying on the SAS expertise they have spent years developing or moving to a new language they are not comfortable with, all while remaining productive and efficient in their jobs. For biostatistics organizations, the challenge is acute as the need for statistical programmers is greater than the number of jobseekers, contributing to resource shortages and delays.

All of this is unfolding at a pivotal moment in the broader evolution of technology. Recent advances such as rapidly expanding graphics processing unit (GPU) compute power, the emergence of generative artificial intelligence, and increased efficiency of solid-state data storage are poised to radically transform all aspects of R&D.

A different approach to enabling the biostatistics workforce

Biostatistics leaders understand that opportunities exist to maximize the impact of available data. However, the challenge lies in enabling statistical programming teams to collaborate on studies as effectively as possible with minimal disruption, particularly when a divide as fundamental as the one between SAS and R/Python proficiency is so pronounced.

Many biostatistics leaders have assumed that the first step in this process is unifying the skillsets of their teams. However, this isn’t necessarily a prerequisite for taking advantage of today’s available open-source technologies. Instead, biostatistics organizations can adopt readily available SCE systems that enable all team members, regardless of their programming language expertise, to collaborate on studies and individually upskill with minimal disruption. Neither group needs to focus precious time and effort solely on learning a new language that can realistically require an entire career to master.

These new SCE systems allow tremendous flexibility in how individuals can contribute to team efforts:

  • Good Practice (GxP) and non-GxP work can be performed on the same platform without compromising on flexibility or compliance.
  • Each team member can use their language and integrated development environment software of choice to analyze data and create outputs within one organized project.
  • Each team member has the flexibility to choose the right tool for each individual deliverable.
  • The platform provides automatic versioning, traceability, and reproducibility of results.
  • Organized batches of SAS, R, and Python jobs can be executed from one command.
  • Organizations can leverage data for multiple purposes without having to make copies from one system to another.
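
To make the idea of running mixed-language batches from one command more concrete, the following is a minimal sketch in Python and not a description of any particular SCE product. It assumes that the `sas`, `Rscript`, and Python executables are installed and on the system path, and that SAS programs are launched with `-sysin`. The names `LAUNCHERS`, `build_command`, and `run_batch` are hypothetical and introduced only for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical mapping from file extension to launcher command.
# Assumes each executable is installed and available on PATH.
LAUNCHERS = {
    ".sas": ["sas", "-sysin"],
    ".r": ["Rscript"],
    ".py": [sys.executable],
}


def build_command(script):
    """Return the command list used to run one script, chosen by extension."""
    path = Path(script)
    launcher = LAUNCHERS.get(path.suffix.lower())
    if launcher is None:
        raise ValueError(f"No launcher configured for {path.name}")
    return [*launcher, str(path)]


def run_batch(scripts, runner=subprocess.run):
    """Run scripts in order and stop at the first failure.

    Returns a list of (script, exit_code) pairs for the jobs that ran.
    The runner argument is injectable so the logic can be tried without
    SAS or R being installed.
    """
    results = []
    for script in scripts:
        completed = runner(build_command(script))
        results.append((script, completed.returncode))
        if completed.returncode != 0:
            break
    return results


if __name__ == "__main__":
    # Scripts to run are taken from the command line, in order.
    for script, code in run_batch(sys.argv[1:]):
        print(f"{script}: exit code {code}")
```

Saved under a hypothetical file name such as `run_batch.py`, the sketch could be invoked as `python run_batch.py adsl.sas tables.R figures.py`. A production platform would add the versioning, traceability, and access controls described above, none of which this sketch attempts.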

In the author’s experience, the availability of these systems is most welcome news for biostatistics leaders struggling to build out and enable biostatistics teams that can accelerate R&D. There is a pool of candidates who have statistical programming expertise, but not in SAS, and that is okay. When facilitated by the right technical infrastructure, these candidates can contribute meaningfully to the work at hand without needing as much additional training. Meanwhile, experienced SAS programmers don’t need to divert their effort and focus away from current deliverables to learn new languages. Biostatistics leaders finally have an easy solution to the challenge of the talent gap.

By choosing to empower team members to do their innovative work in the way they want to, biostatistics leaders will not only solve their workforce enablement challenges but also enable their teams to move at the speed that their business and the life sciences industry require.

About the author

Caroline Phares is Global Head of Health and Life Sciences at Domino Data Lab.