Defining and Managing Raw Manufacturing Data

Published on: June 2, 2019
Pharmaceutical Technology, Volume 43, Issue 6
Pages: 32–38

Protecting the integrity of raw data is crucial to regulatory compliance and to proving that manufacturing and quality operations are being run and managed properly.

Data integrity ensures that information stored during pharmaceutical manufacturing is reliable and trustworthy. Electronic records (e-records) pose special data integrity challenges. Links between electronic data, raw data (i.e., the first capture of information, whether recorded on paper or electronically [1]), metadata, and records must not be compromised or broken if the data and their relationships with other data are to be valid.  

Preserving the integrity of the raw electronic data generated by manufacturing and quality operations is crucial because these data provide the only evidence that these departments are being run and managed correctly and in a way that complies with regulations. These data are the foundation for continuous process verification (CPV) and process validation. Technological controls must be in place to ensure the integrity of these data. This article discusses these controls and how they should be implemented for identification, storage, protection, retrieval, retention time, and disposition of current good manufacturing practice (cGMP) records (2).

Data lifecycle

The data lifecycle (Figure 1) helps to map and explain the controls that are necessary to manage data, raw data, metadata, and records (3) properly. Data access control is crucial, for example: any change to an e-data point should be made only by someone who has been authorized to make it. Failure to address even one element of the data lifecycle will weaken the overall effectiveness of controls implemented for the computer system and for e-data integrity.

During the data capture stage, data are collected and related actions are performed. Then, during transformation, the data are scaled and converted, and then built-in checks (summarized in European Union [EU] Annex 11-11 [4]) are performed to verify that all the data are correct after the transformation. 
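The transformation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ScalingSpec` class, the 12-bit input range, and the 0 to 150 °C span are hypothetical and are not taken from the article or from Annex 11. The sketch shows a linear scaling of a raw count to engineering units, with one built-in check before the conversion and one after it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingSpec:
    """Hypothetical scaling definition for one analog input."""
    raw_min: int      # lowest count the converter can report
    raw_max: int      # highest count the converter can report
    eng_min: float    # engineering value at raw_min
    eng_max: float    # engineering value at raw_max


def scale(raw: int, spec: ScalingSpec) -> float:
    """Convert a raw count to engineering units, rejecting implausible data."""
    # Built-in check before transformation: the count must be within converter range.
    if not spec.raw_min <= raw <= spec.raw_max:
        raise ValueError(f"raw count {raw} outside {spec.raw_min}..{spec.raw_max}")
    span = spec.raw_max - spec.raw_min
    value = spec.eng_min + (raw - spec.raw_min) * (spec.eng_max - spec.eng_min) / span
    # Built-in check after transformation: the result must lie in the scaled range.
    low, high = sorted((spec.eng_min, spec.eng_max))
    if not low <= value <= high:
        raise ValueError(f"scaled value {value} outside {low}..{high}")
    return value


# Assumed example: a 12-bit input (0..4095) mapped to 0..150 degrees Celsius.
inlet_air = ScalingSpec(raw_min=0, raw_max=4095, eng_min=0.0, eng_max=150.0)
```

In a real system this logic would typically live in the controller or I/O processing layer and would itself be validated; the point here is only that the check travels with the conversion.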

Because the data transferred during this stage move between process equipment and the computer, the interface between the two should be validated and checked periodically to ensure accuracy. The accuracy and reliability of the raw data depend not only on properly calibrated and maintained instruments and equipment, but also on the integrity of the raw data that have been recorded. When instruments and equipment cannot ensure secure data access and administration of electronic data files, collected data can be passed directly to a secure environment (typically a supervisory control and data acquisition [SCADA] system or data historian) for processing and recordkeeping.

After the data have been transformed and used, they must be moved to a recordkeeping environment, where they are held for as long as the data must be retained. During the active phase (i.e., when data are actively being used), data must be reconciled periodically, and raw data may be cleansed to correct inconsistent values after they have been used. Cleansing, or data cleaning, detects and corrects corrupt or inaccurate records in a record set, table, or database. This activity must be suitably managed and documented (e.g., by establishing an audit trail). The inactive phase starts with records archival, which applies to inactive, superseded, replaced, and withdrawn data. These records must be kept to meet the data retention schedule and traceability requirements. They usually maintain only “read” and “view” attributes. There are exceptions, however, in which “processability” may be extended for the full life of the records through to discard.

Pharmaceutical manufacturing operations

In these types of operations, data that have been loaded from field sensors contain a measurable attribute of a physical entity, process, or event (5). The loaded data are recorded, becoming raw data, which are considered “original” or “source captured” (6). When multiple raw data are generated to satisfy a cGMP requirement, such raw data become a cGMP record (7). Examples of raw data in a typical manufacturing environment include:

  • Analog readings (e.g., of temperature, pressure, flow rates, levels, weights, central processing unit [CPU] temperatures, mixer speeds, or fan speeds)

  • Digital readings (e.g., of valves, limit switches, motors on/off, and discrete level sensors)

  • Product information (e.g., IDs for product, batch, material, or raw material lot)

  • Quality information (e.g., process and product limits, custom limits)

  • Alarm information (e.g., out-of-limits or return-to-normal signals).

The raw data hold the content of the e-record that will reproduce the full cGMP automated activities (8). Properly recorded and managed raw data are the foundation that is required to demonstrate the product identity, strength, purity, and safety. The e-records associated with raw data demonstrate that the manufacturer’s processes meet the requirements of cGMPs, including those for process sequencing and instructions (9).


Accurate management of data during entry or collection, storage, transmission, and processing (10,11) provides controls required for the processing and retention of loaded data, raw data, and e-records. The integrity of manufacturing raw data is a basic prerequisite to CPV, an essential part of FDA’s process validation requirements. CPV is designed to provide continual assurance that the process remains in a state of control during commercial manufacturing. The collection of information about the performance of the process will allow detection of undesired process variability so that the process remains in control.


Identification of cGMP records

Identification of the cGMP records (12) and the associated controls is crucial to the success of any pharmaceutical manufacturing operation. The characterization of these cGMP records usually starts with a primary design document such as a process and instrumentation drawing (P&ID). A process flow diagram (PFD) or some other form of schematic may also be used.

Table I depicts the critical process parameters for a solid dosage form manufacturing process. Process equipment incorporates instrumentation designed to control the process and acquire data about each critical process parameter.

The primary goal for controllers is that they work accurately in the intended process. The controllers are dynamically verified during the qualification of the automated cell controller. The cell controllers are typically Level 0 in the ANSI/ISA-95 model and are essential to ensuring proper functioning of the process and product quality. The input/output (I/O) list identifies the information that comes into and goes out of the manufacturing system.

In Table I, for example, the air temperature, air volume, dew point, and product temperature are the I/Os associated with a fluid bed dryer (13). Field instruments measure these values and deliver them via wires that terminate in the I/O processing section of the digital system. After transformation, the data are transmitted to the SCADA system over the communications link.

The first step in documenting the I/O requirements is to compile a list of all the applicable points that are referenced on the P&ID. This is necessary so that the specific signal and termination data can be associated with each point or each instrument. Alarms and reporting requirements must also be considered.
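One way to picture this first step is as a simple cross-check between the P&ID and the I/O list. The sketch below is an assumption-laden illustration: the `IOPoint` fields, the tag names, and the termination strings are hypothetical and do not come from Table I. It shows an I/O list entry that carries signal, termination, and alarm data for each point, plus a helper that reports P&ID tags that have not yet been listed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class IOPoint:
    """One entry in a hypothetical I/O list."""
    tag: str                            # instrument tag as shown on the P&ID
    description: str
    signal: str                         # e.g., "AI" (analog input) or "DI" (digital input)
    termination: str                    # where the field wiring lands in the I/O section
    alarm_low: Optional[float] = None   # None when no low alarm is required
    alarm_high: Optional[float] = None  # None when no high alarm is required


def missing_from_io_list(pid_tags: Iterable[str], io_list: Iterable[IOPoint]) -> Set[str]:
    """Return the P&ID tags that have no corresponding I/O list entry."""
    listed = {point.tag for point in io_list}
    return set(pid_tags) - listed


# Illustrative tags for a fluid bed dryer; none of these come from Table I.
io_list = [
    IOPoint("TT-101", "Inlet air temperature", "AI", "Rack 1, slot 3, channel 0",
            alarm_low=40.0, alarm_high=80.0),
    IOPoint("FT-102", "Air volume", "AI", "Rack 1, slot 3, channel 1"),
    IOPoint("TT-104", "Product temperature", "AI", "Rack 1, slot 3, channel 3"),
]
pid_tags = ["TT-101", "FT-102", "MT-103", "TT-104"]
```

With these assumed values, `missing_from_io_list(pid_tags, io_list)` reports `MT-103` as a point that appears on the drawing but has no signal or termination data yet.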

Storage of records

Raw data are original records generated by means of computer systems and become the contents of an e-record. E-record storage comprises the devices that record, store, or retrieve e-records from any medium, including the medium itself. This is considered a short-retention environment. Design specifications or similar documents must describe the file structure(s) in which the e-records are to be stored, the capacity requirements of the storage, and how the security scheme is implemented. The file structure and security are verified and tested during qualification.

After the data are recorded and retained in computer storage (e.g., historian/SCADA storage), physical and logical controls for the e-records must be in place. These controls include physical protection, time-stamped audit trails, data management, and archival and retrieval of records. Alarms and the actions associated with them are managed by the programmable logic controller (PLC). The associated alarm records are saved in the corresponding repository system at the storage device level. Physical protection is important because environmental effects can cause media to deteriorate. Copying information without changing it offers a short-term solution, ensuring that information is stored on newer media before the old media deteriorate to the point where the information can no longer be retrieved.

To ensure data integrity during storage, any changes that have been made to an e-record must be recorded, including the previous entry, who made the change, and when the change was made (14). To reduce the risk of losing the e-records in storage and to guarantee that they will be ready for use, data must be backed up periodically, at a frequency based on an analysis of risk to cGMP e-records and the capacity of the storage device. Backup data must be stored separately from the primary storage location.
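The change-recording requirement in the paragraph above can be illustrated with a minimal, in-memory sketch. The `AuditedStore` and `AuditEntry` names are hypothetical, and a real system would persist the trail in protected storage rather than in a Python list; the sketch only shows that each write captures the previous entry, who made the change, and when.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AuditEntry:
    """One immutable audit trail entry (hypothetical structure)."""
    record_id: str
    previous_value: Optional[str]   # None when the record is first created
    new_value: str
    changed_by: str
    changed_at: datetime


class AuditedStore:
    """Hypothetical store in which every write appends an audit entry."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._trail: List[AuditEntry] = []

    def write(self, record_id: str, new_value: str, user: str) -> None:
        # Capture the previous entry before it is overwritten.
        entry = AuditEntry(
            record_id=record_id,
            previous_value=self._values.get(record_id),
            new_value=new_value,
            changed_by=user,
            changed_at=datetime.now(timezone.utc),
        )
        self._trail.append(entry)
        self._values[record_id] = new_value

    def read(self, record_id: str) -> str:
        return self._values[record_id]

    @property
    def trail(self) -> Tuple[AuditEntry, ...]:
        # Return a copy so callers cannot alter the stored trail.
        return tuple(self._trail)
```

A caller would use `store.write("TT-101", "72.9", "supervisor1")` and then inspect `store.trail[-1].previous_value` to see what the value was before the change.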

The efficacy of the backup and restore processes must be verified as part of the qualification process. In addition, the capacity level of the storage must be monitored. As with archived e-records, the e-records in storage need to be verified periodically for accessibility, readability, and integrity. If changes are made to the computer infrastructure and/or application, the ability to retrieve e-records must be ensured and tested.

One critical element to consider is legal holds on e-records, which may exist when the manufacturing company or contract manufacturer is involved in litigation. Records under a legal hold cannot be destroyed, even if the data retention period has expired. The regulated entity is under a legal obligation to retain all relevant records, and a legal-holds record system or other mechanism must be implemented to identify e-records that would be affected by a legal hold.
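The interaction between retention periods and legal holds can be expressed as a small decision rule. The function below is a sketch only: `may_discard`, the record identifiers, and the use of a plain set as the legal-holds register are assumptions made for illustration, not a description of any particular records system.

```python
from datetime import date
from typing import AbstractSet


def may_discard(record_id: str, retention_end: date, today: date,
                legal_holds: AbstractSet[str]) -> bool:
    """Return True only if the record is past retention and not under legal hold."""
    # A legal hold overrides the retention schedule.
    if record_id in legal_holds:
        return False
    # Otherwise the record may be discarded once the retention period has ended.
    return today > retention_end


# Hypothetical register of records affected by litigation.
legal_holds = {"BATCH-0042"}
```

Under these assumptions, a record whose retention period ended years ago is still reported as not discardable while its identifier remains in the register.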

In optimizing physical location requirements for e-data, web and database servers should ideally be separated. Database servers should be isolated from a website’s demilitarized zone (DMZ), based on security standards. A DMZ is a physical or logical subnetwork that contains an organization’s external-facing services and exposes them to a larger, untrusted network, usually the Internet. The purpose of a DMZ is to add a layer of security to an organization’s local area network (LAN): an external network node has direct access only to equipment in the DMZ, rather than to any other part of the network (15).

Database servers can be located on a physically separate network segment from the web and other Internet-accessible servers that support the business. Preferably, the database server should be partitioned off from the web servers by a dedicated firewall. This firewall should allow only database traffic between the web server and the database server. It should also deny and log all traffic from any other location, as well as other types of traffic from the web server. Regulators, particularly FDA, expect data written to the storage device to be saved at the time they are generated (16, 17).
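The firewall policy described above (allow only database traffic from the web server, and deny and log everything else) can be modeled as follows. This is a conceptual sketch, not firewall configuration: the IP addresses, the port number, and the `Packet` and `is_allowed` names are all hypothetical, and a real deployment would express the same rule in the firewall's own rule language.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    source_ip: str
    dest_ip: str
    dest_port: int


# Hypothetical addresses and port, chosen only for illustration.
WEB_SERVER_IP = "10.0.1.10"
DB_SERVER_IP = "10.0.2.20"
DB_PORT = 5432


def is_allowed(packet: Packet, denied_log: List[Packet]) -> bool:
    """Allow only web-server-to-database traffic on the database port."""
    allowed = (
        packet.source_ip == WEB_SERVER_IP
        and packet.dest_ip == DB_SERVER_IP
        and packet.dest_port == DB_PORT
    )
    if not allowed:
        # Deny and log traffic from other locations or of other types.
        denied_log.append(packet)
    return allowed
```

The default-deny shape matters more than the specific values: anything that is not explicitly the expected database traffic is both rejected and recorded.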

Protection of data and records 

The protection of transient data, raw data, and e-records covers data in storage, during processing, and while in transit (18–20). This protection may be set in two environments, as shown in Figure 2: transient data, before they reach the historian/SCADA, and raw data.

Transient data. At the PLC level, the analog data are extracted from the PLC memory, transformed (i.e., digitized, validated, normalized, and scaled) and sent to the SCADA. The data collected directly from manufacturing equipment and control signals between equipment and a data server (e.g., SCADA) may be regarded as transient and cannot be edited by reasonable means or reprocessed by the human user. Similar to the controls associated with e-records in transit, the data integrity controls for transient data are:

  • Qualification of the infrastructure. The outcome of this qualification provides documentary evidence that accounts for the correct implementation of integrated hardware and associated devices (21).

  • Built-in checks for the correct I/Os. These built-in checks are validated first. During the operational stage, they must be verified periodically, as required by 21 Code of Federal Regulations (CFR) 211.68(b) and EU Annex 11-5 (22, 23).

  • Accuracy checks. Usually performed at the supervisory system level, accuracy checks are required for critical data that have been entered manually by authorized personnel. These critical data require input verification to prevent incorrect data entries.
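The accuracy check for manually entered critical data, the last control in the list above, can take several forms. The sketch below assumes one common pattern, double entry combined with a limit check; the function name, the limits, and the choice of double entry are assumptions for illustration, and other verification methods (for example, review by a second person) may be equally appropriate.

```python
def verify_manual_entry(first_entry: str, second_entry: str,
                        low: float, high: float) -> float:
    """Accept a manually entered value only if both entries agree and are in limits."""
    # Input verification step 1: the two independent entries must match.
    if first_entry.strip() != second_entry.strip():
        raise ValueError("entries do not match")
    # Input verification step 2: the entry must be numeric (float raises ValueError).
    value = float(first_entry)
    # Input verification step 3: the value must fall within the defined limits.
    if not low <= value <= high:
        raise ValueError(f"value {value} outside limits {low}..{high}")
    return value
```

For example, `verify_manual_entry("72.5", "72.5", 0.0, 150.0)` returns the accepted value, whereas a mismatch or an out-of-limits value raises an error that the supervisory system would need to handle and record.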

After the data are recorded and retained, physical and logical controls for the e-records must be implemented and executed. These controls include security, access authorization, backups, periodic reviews, time-stamped audit trails, built-in checks (required by FDA Compliance Policy Guide CPG Section 425.400), and other relevant data-management controls. The PLC manages alarms and associated actions, and the resulting alarm records are saved at the storage device level. Controls are also required if the e-records and associated raw data must be transferred from the original processing environment. After the migration process has concluded, verification must be performed to ensure that the information in the original e-records has not been altered. The verified copy becomes a true or certified copy.
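One common way to support the post-migration verification described above is to compare cryptographic digests of the original and migrated files. The sketch below assumes file-based e-records and uses SHA-256 from the Python standard library; the function names are hypothetical. A matching digest shows that the bytes are identical, which addresses alteration of values but does not by itself prove that meaning and context (e.g., metadata links) were preserved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original: Path, copy: Path) -> bool:
    """Return True if the migrated file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(copy)
```

In practice the digest of the original would usually be recorded at the time of migration, so that later periodic reviews can re-check the copy even after the source environment has been retired.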

Retrieval of records

Access to e-records should be ensured throughout the retention period (as required by EU Annex 11-7.1). The access to these records must be controlled to ensure the integrity of the e-records in storage. The controls associated with e-records in storage allow those individuals who depend on the e-records to correctly fulfill their job functions.

During the active phase, manufacturing e-records will typically be held in the environment in which the records were initially created. In this environment, the e-records are visible to the tools that created them. Any features designed to allow them to be changed or deleted must maintain audit trails that record the reason for the change or deletion, as well as other information required by the applicable regulation.

Periodic (or continuous) reviews must be performed after the initial validation (as required by EU Annex 11-11) of the processing environment. These reviews check stored, backup, and archived e-records for accessibility, readability, and accuracy. They also verify the output of the backup and the accuracy of the overall audit trail, verifying the accuracy and reliability of the e-records transferred (WHO 3.2). In addition, processes for reading and managing e-records must ensure their data integrity. The infrastructure between the records in storage and the processing environment must be a controlled environment and must be qualified and checked for accuracy.

Data retention time

The EU cGMPs establish that raw data supporting information in the marketing authorization (24), such as validation or stability data, should be retained while the authorization remains in force. In some cases, raw data must be retained for up to 30 years. It may be considered acceptable to retire certain documentation when the data have been superseded by a full set of new data. In such cases, justification should be documented and should take into account the requirements for retention of batch documentation. The accompanying raw data should be retained for a period at least as long as the records for all batches whose release has been supported on the basis of that validation exercise.

For a medicinal product, the batch documentation must be retained for at least one year after the expiry date of the batches to which it relates, or at least five years after the certification referred to in Article 51(3) of Directive 2001/83/EC, whichever is the longer period. At least two years of data must be retrievable in a timely manner for the purposes of regulatory inspection.

Applicable FDA regulations, 21 CFR 211.180(a), call for data that are part of the drug product production and control records to be retained for at least one year after the expiration date of the batch or, in the case of certain over-the-counter (OTC) drug products lacking expiration dating because they meet the criteria for exemption under 21 CFR 211.137, for three years after distribution of the batch. As a result of the traceability requirements, the raw data will be retained as specified in 21 CFR 211.180(a).
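The EU and FDA minimum retention periods for batch records described above reduce to simple date arithmetic. The sketch below is illustrative only: the function names are hypothetical, the handling of February 29 (rolling forward to March 1) is an assumption chosen to err on the side of longer retention, and a company's retention schedule may set longer periods than these minimums.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Add whole years to a date; Feb 29 rolls forward to Mar 1 (an assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only Feb 29 in a non-leap target year reaches this branch.
        return date(d.year + years, 3, 1)


def eu_batch_retention_end(expiry: date, certification: date) -> date:
    """Later of expiry + 1 year and certification + 5 years."""
    return max(add_years(expiry, 1), add_years(certification, 5))


def fda_retention_end(expiry: Optional[date], distribution: date) -> date:
    """Expiry + 1 year, or distribution + 3 years when there is no expiration date."""
    if expiry is None:
        # Exempt OTC products lacking expiration dating.
        return add_years(distribution, 3)
    return add_years(expiry, 1)
```

For a batch certified on July 15, 2019, with an expiry date of June 30, 2021, the EU rule as sketched here gives July 15, 2024, because the five-year period after certification ends later than one year after expiry.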

When computer systems are used instead of written documents, the manufacturer shall first validate the systems by showing that the e-records will be appropriately stored during the anticipated period of storage. E-records stored by those systems shall be made readily available in a legible form and provided upon the regulators’ request. The electronically stored e-records shall be backed up and protected against loss or damage, and audit trails shall be maintained.


Disposition of records

If active records are transferred to another environment, validation should include checks that data have not been altered in value and/or meaning during this migration process, as required by EU Annex 11-4.8.

E-records that are placed in retention environments, other than the environments that were used for their original creation, should preserve the integrity of the raw data, associated e-record, and protection mechanisms used to prevent informational loss and/or corruption. Should records require modifications in retention environments, a clear audit trail of change or replacement history, including record removal, should be maintained.

Once e-records have been placed in the retention environments, they should never be directly modified. If technical limitations require the electronic record to be modified in the retention environment, the change must have traceability to the same change in the processing environment. The inactive phase starts with records archival. These records need to be kept to meet retention schedule and traceability requirements. They usually maintain read/view attributes. Finally, during the deletion phase, the e-records are discarded. This phase is of short duration, and the discard includes metadata and audit trails.


1. UK Medicines and Healthcare Products Regulatory Agency (MHRA), GXP Data Integrity Guidance and Definitions (MHRA, 2018).
2. ISO, ISO 9001:2000 Quality Management Systems – Requirements, 4.2.4 Control of Records (ISO, 2015).
3. ISPE/PDA, Technical Report: Good Electronic Records Management (GERM), Collection of Related Data Treated as a Unit (ISPE/PDA, July 2002).
4. European Union, Annex, (EU, 2011).
5. MHRA, MHRA CGMP Data Integrity Definitions and Guidance for Industry, March 2015.
6. ISPE/PDA, “Technical Report: Good Electronic Records Management (GERM),” July 2002.
7. MHRA, “GxP Data Integrity Guidance and Definitions,” March 2018.
8. FDA, Data Integrity and Compliance with cGMP, Q&A - Guidance for Industry (FDA, 2018).
9. CFDA, Draft Data Integrity Guidance (September 2017) (CFDA, 2017).
10. O. López, Preface, Data Integrity in Pharmaceutical and Medical Devices Regulation Operations, pp. 15–17 (CRC Press, Boca Raton, FL, 2017).
11. NIST SP 800-33, “Underlying Technical Models for Information Technology Security,” December 2001 (Withdrawn: August 2018).
12. FDA, Guidance for Industry-Process Validation: General Principles and Practices (FDA, 2011).
13. L.T. Amy, Automation Systems for Control and Data Acquisition (ISA, 1992).
14. A. Kane, Sidebar in A. Siew, “Designing Optimized Formulations,” Pharmaceutical Technology 41(4) (2017).
15. Health Canada, “Good Manufacturing Practices (GMP) Guidelines for Active Pharmaceutical Ingredients,” GUI-0104, C.02.05, Interpretation #15 (Health Canada, December 2013).
16. B. Mitchell, “Using a DMZ in Computer Networking,” Jan. 6, 2019.
17. FDA, 21 CFR Part
18. EU, Chapter 4 Section 4.8, (EU, 2011). 
19. ICH, ICH Q7 GMP Guide for APIs, Section 6.14.
20. NIST SP 800-33, “Underlying Technical Models for Information Technology Security,” December, 2001 (Withdrawn: August 2018).
21. FDA, 21 US Code of Federal Regulations (CFR) Part 211.68(b).
22. EU Annex
23. O. López, Computer Infrastructure Qualification for FDA Regulated Industries (PDA and DHI Publishing, LLC, 2006).
24. EU, Good Manufacturing Practice Medicinal Products for Human and Veterinary Use, Volume 4, Chapter 4: Documentation, Section 4.12.


When referring to this article, please cite it as O. López, “Defining and Managing Raw Manufacturing Data,” Pharmaceutical Technology 43 (6) 2019.


Submitted: January 4, 2019
Accepted:  January 24, 2019