Emerging Life Sciences Standards

April 1, 2004
Jill H. Kaufman

Pharmaceutical Technology Europe

Pharmaceutical Technology Europe, Volume 16, Issue 4

This article looks at some of the current and emerging standards that are shaping the life sciences industry, and how pharma and biotech companies can benefit by implementing them.

Although life sciences standards are still in the early days of development, some are available for use today. Standards are needed for a wide variety of products and activities, from toaster plugs, bank cards, compact discs and railroad tracks to the interchange of information between computer systems. It is important for organizations to use, follow and participate in these global standards initiatives because they help to reduce costs and make the exchange of information more efficient.

The ever-increasing complexity of developing new products and services benefits from the use of information systems technology, in both commercial and scientific development. Even the most fundamental exchange or processing of information can be fast, accurate and reliable only when it is supported by standards; the alternative, building proprietary interfaces, is costly and time-consuming.

Information technology (IT) standards facilitate interoperability, which is defined as "the ability of two or more systems or components to exchange information and to use the information that has been exchanged." IT standards include specifying which information needs to be exchanged (syntax) and the meaning of the information (semantics). Syntax information includes the specific information to be exchanged and the associated structured format, for example patient last name (15 characters, alphanumeric) and date admitted (6 characters in MMDDYY format, numeric). Additionally, semantics provides a common understanding of the information, that is, a common vocabulary or thesaurus (for example, M=Male and F=Female or 1=male and 0=female).
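The distinction between syntax and semantics can be illustrated with a minimal Python sketch. The field limits and code sets come from the examples above; the function and variable names are hypothetical and are not taken from any published standard.

```python
from datetime import datetime

# Syntax: the structured format each field must follow.
def is_valid_last_name(value: str) -> bool:
    """Patient last name: up to 15 alphanumeric characters."""
    return 0 < len(value) <= 15 and value.isalnum()

def is_valid_date_admitted(value: str) -> bool:
    """Date admitted: exactly 6 digits that form a real MMDDYY date."""
    if len(value) != 6 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%m%d%y")
    except ValueError:
        return False
    return True

# Semantics: two systems that encode the same meaning differently.
SYSTEM_A_SEX_CODES = {"M": "male", "F": "female"}
SYSTEM_B_SEX_CODES = {"1": "male", "0": "female"}

def translate_code(code: str, source: dict, target: dict) -> str:
    """Map a code from one vocabulary to another via its shared meaning."""
    meaning = source[code]
    for target_code, target_meaning in target.items():
        if target_meaning == meaning:
            return target_code
    raise KeyError(f"no code for {meaning!r} in the target vocabulary")
```

A record can pass every syntax check and still be misread if sender and receiver do not share a vocabulary, which is why the translation step needs the agreed meanings ("male", "female") rather than the raw codes.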

Research-focussed standards organizations

This article highlights some of the emerging life science standards and associated standards organizations in research, clinical trials and clinical genomics. These organizations differ in size, the formality of their structure and procedures to create standards, and how often they meet.

Research-focussed standards organizations include the Interoperable Informatics Infrastructure Consortium (I3C), the Object Management Group-Life Sciences Research (OMG-LSR) Committee, the Microarray Gene Expression Data Society (MGED) and the Global Grid Forum's Life Sciences Grid Research Group (GGF LSG-RG). Standards organizations focussed on developing clinical trial standards include the Clinical Data Interchange Standards Consortium (CDISC) and Health Level Seven's Regulated Clinical Research Information Management Committee (HL7 RCRIM). Health-related clinical genomics work is being undertaken by the HL7 Clinical Genomics Special Interest Group.

Figure 1: Research-focussed standards organizations.

MGED

MGED has created a specification that standardizes the description of microarray data so that the data can be shared. MGED worked with OMG to create the gene expression specification, which is flexible and robust. The microarray gene expression object model (MAGE-OM) defines the objects of gene expression data independently of any implementation. MGED has also translated MAGE-OM into an XML-based format (MAGE-ML) to facilitate the exchange of the data; this format is an important international standard within the life sciences community. Enhancements to MAGE are being worked on collaboratively by OMG-LSR and MGED. MGED members represent an international community of biologists, computer scientists, data analysts and enabling vendors. MGED board members represent organizations including US- and UK-based academic institutions.

OMG-LSR

The OMG-LSR committee is focussed on creating interoperability standards that benefit researchers. OMG has a structured modelling process and methodology to create standards. OMG-LSR works with other standards organizations and takes their specifications through its formal standardization process, for example, I3C's life science identifier (LSID). MAGE, which OMG-LSR created jointly with MGED (in October 2002), provides a way to represent and exchange gene expression data and associated annotations.

I3C

Started in 2002, I3C aims to accelerate life sciences discovery through software interoperability. Some of its current work includes:

  • Developing a schema for unique life sciences identifiers that provides a common way to find and name biological data (see LSID section).

  • Enabling interoperability based on the naming, description and discovery of services and data in the life sciences.

  • Fostering interoperability between de facto standards in pathways and systems biology.

  • Participating in joint standards work with HL7 and CDISC, based on the US Food and Drug Administration (FDA) 2003 draft guidance on pharmacogenomics data submissions.

I3C has a diverse membership including pharma and biotech companies, academic research organizations and supporting vendor organizations that have come together to create specifications that eliminate barriers to interoperability and promote the integration of life sciences information.

GGF

The GGF is an international community-driven forum of individual, academic, government, and corporate researchers and practitioners working on distributed computing or grid technologies. In October 2002, the GGF established its first research group with a joint industry and academic focus, the Life Sciences Grid Research Group (LSG-RG).

LSG-RG explores issues related to IT integration within life sciences on a grid infrastructure. Some of its goals include identifying clear examples and the diverse use of life sciences and health grids; discussing issues of access to data in life sciences; identifying how the grid is being challenged by the life sciences and where there is need for activity; and identifying different solution areas and possible reference architectures.

One of its projects involves creating a listing of all global grids relating to life sciences, including research, clinical trial and health/medical imaging grids. An additional project under discussion involves creating best practices for life sciences grids. LSG-RG will also collaborate on, and provide focussed content for, GGF workshops.

Clinical trial and clinical genomics standards

CDISC and HL7 work together to create clinical trial standards, while HL7 is also independently creating clinical genomics standards.

CDISC

Created in 1997, CDISC develops the industry standards that support the electronic acquisition, exchange, submission and archiving of clinical trial data. CDISC's international membership includes pharmaceutical companies, contract research organizations (CROs), labs and information technology vendors. The consortium includes a European group (ECG) and Japanese group (JCG), and works closely with HL7's committee on clinical trial standards.

CDISC work groups cover the laboratory (LAB) model, the operational data model (ODM), the submission data standards (SDS) model and the analysis dataset model (ADaM). The LAB model is discussed in more detail later in the article; below is an overview of the remaining work groups:

  • The SDS model encompasses data that is required for electronic submission to FDA for approval. It describes the content, structure and attributes of submitted data.

  • The ODM provides a format for clinical trial information representing study data, metadata and administrative data, including audit trail data, that would be exchanged during a trial or archived after it. It provides a vendor-neutral platform for the exchange of data in XML format. The model incorporates (and is fully compliant with) FDA guidance and regulations governing the use of computer systems in clinical trials.

  • The ADaM provides analysis datasets and related documentation to assist FDA in its statistical review of a sponsor's findings.

HL7

HL7 is recognized as the key standards developing organization within the international health care community. Accredited by the American National Standards Institute (ANSI), it produces standards that facilitate the exchange of medical records, clinical data and administrative/demographic data. Two areas of interest to the life sciences community are HL7's work relating to clinical trials and clinical genomics.

There is an active international community participating in HL7, with local country affiliate groups. Hospitals are active participants in HL7 committees and it is not unusual to have medical doctors participating in standards development. FDA has had a strong presence in HL7 since it was mandated to use ANSI-accredited standards.

HL7's clinical trials committee, Regulated Clinical Research Information Management (RCRIM), works in conjunction with CDISC to create clinical trial standards. The committee is creating standards for exchanging data about, and generated by, regulated clinical research. Standards will include HL7 messages and structured documents to exchange clinical trial lab, operational and submission data.

In January 2003, HL7 began work on clinical genomics standards and formed the Clinical Genomics Special Interest Group (SIG), which is the first standards work in the industry on the use of genomic data in health care. The scope of the SIG is the personalization (differences in an individual's genome) of genomic data and its link to relevant clinical information. The SIG is currently defining use cases for clinical genomics in areas including diagnostic testing for cystic fibrosis and breast cancer, tissue typing and clinical genomics data within clinical trial information exchange. Participating organizations in this SIG include the Mayo Clinic and Fred Hutchinson Cancer Research Center, as well as others with medical, genomics or information technology domain expertise. Organizations are encouraged to sign up for the SIG distribution list and participate in conference calls and meetings.

Standards: three key examples

MAGE

Microarray experiments generate a wealth of gene expression data, providing important insights into a variety of biological processes. It is very hard to interpret or analyse gene expression results without being aware of the many variables, such as experimental setup, biomaterial used and treatment procedures.

The microarray standards provide a means for specifying what data and associated annotations are needed to represent a microarray experiment. They define the structure and format in which microarray-based experiments can be communicated, giving a comprehensive representation of microarray data. The MAGE object model (MAGE-OM) describes microarray experiment setup, array designs, manufacturing information, gene expression data and analysis results. Components of microarray experiments that are represented in the standards include experimental design, samples used, the extract preparation and labelling, array design, hybridization procedures and parameters, and measurements.

MAGE consists of three parts:

  • MAGE-OM describes a comprehensive representation of microarray experiments based on different types of arrays. It includes the structure and format of microarray-based experiments that can be communicated. MAGE-OM can be used to map data structures in different platforms including Perl, Java and C++.

  • MAGE-ML, a markup language based on XML, is a document exchange format derived from the object model.

  • MAGE-stk, an open source software toolkit, assists users in creating MAGE-ML. The toolkit, which is available in Perl and Java, makes the standard easier to use.
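The relationship between an object model and an XML exchange format derived from it can be sketched in a few lines of Python. This is an illustration of the general idea only: the class names, element names and attributes below are hypothetical and greatly simplified, and they are not the actual MAGE-OM classes or MAGE-ML schema.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET

# Hypothetical, simplified object model (NOT the real MAGE-OM).
@dataclass
class Measurement:
    feature_id: str
    intensity: float

@dataclass
class Hybridization:
    array_design: str
    sample: str
    measurements: List[Measurement] = field(default_factory=list)

def to_xml(hyb: Hybridization) -> str:
    """Serialize the object model to an XML document (invented element names)."""
    root = ET.Element("Hybridization", arrayDesign=hyb.array_design, sample=hyb.sample)
    for m in hyb.measurements:
        ET.SubElement(root, "Measurement", featureId=m.feature_id, intensity=str(m.intensity))
    return ET.tostring(root, encoding="unicode")

def from_xml(text: str) -> Hybridization:
    """Rebuild the same objects from the exchanged XML document."""
    root = ET.fromstring(text)
    measurements = [
        Measurement(e.get("featureId"), float(e.get("intensity")))
        for e in root.findall("Measurement")
    ]
    return Hybridization(root.get("arrayDesign"), root.get("sample"), measurements)
```

Because both the sender and the receiver work from the same object model, a document written by one tool can be read back by another without a custom interface between them.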

LAB specification

The CDISC laboratory standards team (CDISC LAB) began work in 2000 on developing a model for the acquisition and interchange of clinical trial laboratory data. This standard is for interchange between central laboratories and their clients. Participants in this work included pharmaceutical companies, central labs, a CRO and technology companies. A high percentage of clinical trial submission data is lab data.

According to Susan Bassion, leader of the CDISC LAB team, the new model standard provides the opportunity for central labs, pharmaceutical companies and CROs to agree on a single standard format for data interchange that can support each organization's varying data content. Central labs providing clinical trial services to the pharmaceutical industry have had to write proprietary interfaces for each organization and sample type, a process that requires considerable time and money for development.

As Bassion states, some labs are working with more than 1200 different interfaces. In addition, pharmaceutical companies report that it may take up to 6 months to work out the data interchange process with a new laboratory. Some central labs are already discussing preferential pricing, reflecting the time savings made possible through use of the standard, for companies that will accept lab data transfers in a CDISC format.
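The arithmetic behind the interface burden can be sketched as follows. The source gives only the total of more than 1200 interfaces; the split into 100 clients and 12 sample types used here is a hypothetical breakdown chosen for illustration.

```python
def interfaces_needed(clients: int, sample_types: int, shared_standard: bool) -> int:
    """Count the data interchange interfaces a central lab must maintain.

    Without a shared standard, the lab writes one proprietary interface per
    client and sample type. With one, a single interface serves every client.
    """
    if shared_standard:
        return 1
    return clients * sample_types

# Hypothetical breakdown: 100 clients, 12 sample types each.
proprietary = interfaces_needed(100, 12, shared_standard=False)  # 1200
standardized = interfaces_needed(100, 12, shared_standard=True)  # 1
```

The proprietary count grows with every new client or sample type, whereas the standardized count does not, which is where the time and cost savings come from.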

Released in 2002, CDISC LAB Model V 1.0 defines the standard content for transferring clinical laboratory data to sponsors. It defines 90 fields to accommodate different types of lab data. Implementations are available in ASCII, SAS, XML and HL7. The CDISC LAB standard is aligned with logical observation identifier names and codes (LOINC), a lab code standard.

Extensions for microbiology have been added to the model as Version 1.1 and the team is currently developing a urology model. Additionally, the CDISC lab team works with HL7 to co-ordinate the LAB model with the HL7 clinical trial LAB standard.

LSID

In 2003, the I3C completed a specification on life science identifiers (LSIDs), which provide a simple and standardized way to identify and access distributed biological data. Biological data could include anything that can be expressed in bytes. This vast list includes everything from medical images, spreadsheets and publications right through to protein structures, chemical analysis reports and DNA sequences. LSIDs are persistent, location-independent resource names.

An LSID helps a scientist not only to fetch the data that the identifier names but also to collaborate with other scientists, who can be sure that they are all referring to exactly the same thing. Because LSIDs provide a globally unique name for any item of data, they are ideal for retrieving, referring to or annotating information held in one database from an unrelated third-party application.
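As an illustration of how such a name can be handled in software, the following Python sketch splits an LSID into its parts. It assumes the URN layout used by the LSID specification (urn:lsid:authority:namespace:object, with an optional revision), which this article does not spell out. The function name and the example identifier are hypothetical, and the sketch only parses a name; it does not resolve it.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str                  # organization that assigned the name
    namespace: str                  # collection within that authority
    object_id: str                  # identifier of the data item
    revision: Optional[str] = None  # optional version of the item

def parse_lsid(text: str) -> LSID:
    """Split an LSID of the assumed form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])

# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:proteins:P12345:2")
```

Because the name carries no server address or file path, two applications that hold the same LSID can be confident that they refer to the same item wherever the data is stored.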

The LSID resolution process fetches the data named by an LSID. IBM has created several resolver software implementations of the I3C LSID specification that have been contributed to the I3C. Any organization can download the resolver open source code from http://ibm.com/developerworks/opensource/lsid/.

I3C has submitted the LSID specification to the OMG, where it is going through a formal standardization process. LSIDs can be used on both public (for example, the protein data bank [PDB]) and in-house data repositories. Members of the I3C technical committee are available to assist pharma and biotech companies, research organizations and public repositories to implement LSID.

Conclusion

Although life sciences standards are just emerging, the adopting organizations are already reaping the benefits. Standards save money by eliminating the reprogramming of applications each time requirements or organizations change. Without standards, an organization would need to create a unique interface for each organization or information exchange network.

However, industry-wide adoption of life sciences standards depends on a variety of drivers, such as customers, business partners and federal governments, all of which will have a big influence. In the US, for example, FDA will eventually specify electronic submission of clinical trial information through the HL7 standard.

Life sciences standards development is at an early stage, with some standards available today and more under development. The industry will reap benefits as it uses these standards, which will facilitate interoperability. Standards bodies encourage organizations to actively participate in shaping emerging life sciences standards.

Bibliography

1. www.i3c.org

2. http://lsr.omg.org

3. www.mged.org

4. www.ggf.org

5. www.CDISC.org

6. www.hl7.org