Data Quality and Data Integrity Are the Same, Right? Wrong!
Do the terms data integrity and data quality mean the same? As “maybe” and “yes” are not the correct answers, we will explore the differences between the two terms, and how data integrity is a key part of data quality.
Data integrity has been the subject of many “Focus on Quality” columns over the past few years (1–5). However, we have never mentioned, let alone discussed, data quality. In many people’s minds, data integrity and data quality are terms that have equivalent meanings, hence the first part of the title of this month’s column. However, the MHRA (Medicines and Healthcare Products Regulatory Agency, the UK regulator) GXP data integrity guidance gives an indication from a regulator that the two terms are different (6). Newton and White discussed the differences between data quality and data integrity in an ISPE blog in 2015 (7), and, in this column, we expand on and explore the similarities and differences between the two terms.
MHRA GXP Data Integrity Guidance 2018
The introduction to the MHRA’s 2018 guidance contains the following statement in clause 2.7:
“This guidance primarily addresses data integrity and not data quality since the controls required for integrity do not necessarily guarantee the quality of the data generated (6).”
Clearly, in the mind of the regulator, there is a difference between data quality and data integrity. But what exactly is it?
Our data quality versus data integrity story starts toward the end of the last century with ALCOA: not the Aluminium Company of America, but an acronym standing for attributable, legible, contemporaneous, original, and accurate. These criteria have been discussed in several data integrity guidances from the Food and Drug Administration (FDA), MHRA, the World Health Organization (WHO), and the Pharmaceutical Inspection Convention and Pharmaceutical Inspection Co-operation Scheme (PIC/S) (6,8–10). The WHO guidance contains the most detailed discussion of ALCOA criteria in an appendix with a definition of each term, presentation of the expectations for paper and electronic records, plus special risk management factors (9).
A European Medicines Agency (EMA) publication on electronic source data extended ALCOA with four additional terms: complete, consistent, enduring, and available (11). The extended set of nine criteria is known as ALCOA plus, or ALCOA+. A recent article in LCGC North America discusses all nine ALCOA+ criteria in more detail (12).
Origin of ALCOA
Reading further through the MHRA guidance, there is clause 3.10, a part of which is presented below, discussing ALCOA and ALCOA+ criteria:
“….. ALCOA was historically regarded as defining the attributes of data quality that are suitable for regulatory purposes…. ”
This is interesting; ALCOA criteria were originally defined as the attributes of data quality? How did this come about? The ALCOA acronym was invented by an FDA good laboratory practice (GLP) inspector named Stan Wollen, who developed it as an aide memoire to remember the key requirements of data quality. This is outlined in a publication entitled “Data Quality and the Origin of ALCOA” (13), available for download on the web. There is a lot of background information together with Wollen’s interpretation of ALCOA under the US GLP regulations, 21 CFR 58 (14). For our discussion, ALCOA was originally associated with data quality, as noted by the MHRA in its GXP guidance document (6), long before it was stolen by various guidance documents for use as a cornerstone of data integrity.
Defining Data Quality and Data Integrity
Before we go into more detail about the two terms, we need to define them. The definitions for both data integrity and data quality are taken from the 2018 MHRA GXP Guidance for Data Integrity (6), and are presented in Table I. We will consider and discuss the data integrity definition first, and then spend time looking at data quality. I am not going to discuss the last paragraph in the right-hand column of Table I about sound science and risk management; although those are key inputs to data integrity, they are not part of the data integrity and data quality debate in this column.
MHRA Data Integrity Definition Changes
What is interesting about the 2018 definition of data integrity is how far it has changed since the first MHRA guidance published in 2015. The earlier definition was:
“The extent to which all data are complete, consistent and accurate throughout the data lifecycle (15).”
This definition is a modification of a 1995 FDA definition of data integrity (16). The 2015 definition can be compared with that published in 2018, where the additions are highlighted in bold text:
“Data integrity is the degree to which data are complete, consistent, accurate, trustworthy, reliable and that these characteristics of the data are maintained throughout the data life cycle (6).”
There is the addition of two criteria (trustworthy and reliable), as well as a requirement to maintain all five data integrity characteristics throughout the data life cycle.
What is not stated in this definition, but does appear in the explanation of raw data, is that dynamic data must be maintained in that dynamic format throughout the data life cycle. We shall return to this regulatory expectation, and the total inability of industry and suppliers to meet it, in a future “Focus on Quality” column.
Additional Data Integrity Requirements?
As discussed above and in Table I, we have additional criteria listed in the MHRA definition: complete, consistent, trustworthy, and reliable, in addition to the five original ALCOA criteria (6). Two are ALCOA+ criteria that will be discussed below, plus two new requirements. ALCOA+ on steroids perhaps? Let us look at these four criteria, which to make matters worse are not defined in the guidance document:
Complete: This is one of the ALCOA+ criteria, and is also an FDA GMP regulatory requirement in 21 CFR 211.194(a) where complete data are mentioned (17). Put simply, every item of data and metadata collected during an analysis from sampling to generation of the reportable result is covered under “complete,” including (but not limited to) any instrument log entries, documentation of mistakes, investigations, deviations, and instrument and calibration failures. Of course, you won’t be allowing any laboratory user deletion privileges, will you?
Consistent: Another of the ALCOA+ criteria. It requires that the data be consistent with the processes being followed, and that the time and date stamps for creation, modification, and audit trail entries are as expected. The process (manual or automatic, hybrid or electronic) should ensure that the acquisition, transformation, or calculation of data is transparent and traceable.
Trustworthy: This is a new data integrity requirement that, together with “reliable,” harks back to the last century and the requirements outlined in 21 CFR 11 for electronic records and electronic signatures (18). In essence, can you trust the data or records? In some respects, it is the reverse of the ALCOA criteria. For example, if a record is attributed to an individual (ALCOA requirement), did that person actually do the work? Was the person in the laboratory at the time the work was performed, or did somebody act on their behalf?
Reliable: Another new data integrity requirement, meaning that all records and data are suitable and can be depended upon. Where a true copy has been made of an original record, is the verification process acceptable? Can you be sure that the record set is not the nth iteration of testing to pass, and that the work has actually been performed as required under EU GMP Chapter 1.8 and 1.9 for in-process and quality control testing, respectively (19)?
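As a rough illustration of the “complete” and “consistent” criteria, the following Python sketch checks an audit trail for deleted entries and for time stamps that run out of sequence. The record layout (the AuditEntry class, its field names, and the action names) is an assumption made for this example only; it is not taken from any guidance document or commercial system, and a real review would cover far more than these two checks.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEntry:
    """One hypothetical audit trail entry (field names are assumptions)."""
    user: str
    action: str          # assumed values: "create", "modify", "delete"
    timestamp: datetime


def check_audit_trail(entries):
    """Return a list of findings for a single record's audit trail."""
    findings = []
    # Complete: nothing should have been deleted from the record set.
    for e in entries:
        if e.action == "delete":
            findings.append(f"deletion by {e.user} at {e.timestamp.isoformat()}")
    # Consistent: the first entry should be the creation of the record.
    if entries and entries[0].action != "create":
        findings.append("first entry is not a 'create' action")
    # Consistent: time stamps must never run backwards.
    for prev, curr in zip(entries, entries[1:]):
        if curr.timestamp < prev.timestamp:
            findings.append(
                f"time stamp out of sequence: {curr.action} by {curr.user}"
            )
    return findings


trail = [
    AuditEntry("analyst1", "create", datetime(2018, 3, 1, 9, 0)),
    AuditEntry("analyst1", "modify", datetime(2018, 3, 1, 8, 45)),
    AuditEntry("analyst2", "delete", datetime(2018, 3, 1, 10, 0)),
]
for finding in check_audit_trail(trail):
    print(finding)
```

In this made-up trail, the modification is stamped earlier than the creation and a later entry is a deletion, so both would be flagged for the second-person reviewer to investigate.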
The addition of trustworthy and reliable to the ALCOA+ criteria is good. So, what does this mean for data integrity as a whole?
Data Integrity: Can I Trust the Data?
Data integrity provides assurance that the analytical work in the laboratory from the sampling to the calculation of the reportable result has been carried out correctly. In short, can you trust and rely on the analysis and the reportable results? If you cannot trust the analysis and the data, then you cannot make a quality decision based on poor, incomplete, or falsified data.
It is the role of the analytical scientist and the second-person reviewer to ensure data integrity, but that also requires that the instruments, computer systems, analytical procedures, and laboratory environment are set up correctly, and that the organization within which they work ensures that they are not pressured to cut corners or falsify data.
Therefore, ensuring the integrity of data is a major contributor to data quality. But, in focusing on data integrity, we must not lose sight of data quality.
Understanding Data Quality
You can see from the definition of data quality in Table I that there are three main elements:
1. Data produced is exactly what was intended to be produced.
2. It is fit for its intended purpose.
3. Data quality also includes ALCOA.
Hmmmm, what do these mean in practice? Let us look at item 1, that data produced is exactly what was intended to be produced. The problem with this is the wording of the definition. Although a laboratory produces mountains of data, we abstract and use the information within them to make decisions. For a detailed discussion of this process together with the controlled and uncontrolled parameters, see the article by McDowall and Burgess (20). Therefore, the purpose of a laboratory is to generate information from the analysis of samples in order to make decisions. These decisions could be to aid product development or release a batch of product.
To do this requires trained staff, qualified analytical instrumentation, validated software, and analytical procedures verified or validated under actual conditions of use, as required by 21 CFR 211.194(a)(2) (17). This is the same as the first three levels of the data integrity model discussed in earlier articles (21–23). The second item comes in because the output of the analytical procedure must be fit for use. As a simple example, there is no point calculating results of an analysis to three significant figures if all that was required was the presence or absence of the analyte.
Data Quality: Can I Use the Data?
Once we have ensured the integrity of data, we must consider data quality. In part, this focuses on the laboratory’s ability to deliver the results to the sample submitter and meet their requirements for the analysis such as:
However, there is a compromise with data quality that is best illustrated by the data quality triangle.
Data Quality Triangle
There is a problem with the definitions of data quality that is not mentioned by the MHRA and the American Health Information Management Association (AHIMA); this is the data quality triangle, as shown in Figure 1. At each corner of the triangle is one of three attributes: time, cost, and quality.
The problem is that you can only select two of the three at any one time. For example, if you want a fixed time and high quality, how much are you willing to pay or how much resource will you commit? Equally, if you want analysis performed quickly and at low cost, quality suffers. The quality triangle is applicable to any laboratory in any industry, but in a regulated laboratory you still must ensure data integrity as well. The influence of senior and laboratory management on managing the three criteria of the data quality triangle will directly impact both data quality and data integrity.
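The “select two of three” constraint can be sketched as a small Python function. The attribute names (time, cost, and quality) follow the examples above; the function and constant names are hypothetical and purely illustrative.

```python
TRIANGLE = frozenset({"time", "cost", "quality"})


def compromised_attribute(fixed):
    """Given the two attributes held fixed, return the one that must give."""
    fixed = frozenset(fixed)
    if len(fixed) != 2 or not fixed <= TRIANGLE:
        raise ValueError("select exactly two of: time, cost, quality")
    # Exactly one attribute remains once two have been fixed.
    (remaining,) = TRIANGLE - fixed
    return remaining


print(compromised_attribute({"time", "quality"}))  # cost
print(compromised_attribute({"time", "cost"}))     # quality
```

Asking for all three at once raises an error, which is the point of the triangle: something has to give, and in a regulated laboratory it must not be data integrity.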
Integrity and Quality Use the Same Data Set
It is imperative to realize, as mentioned earlier, that the data set for ensuring data quality and data integrity is the same, and not two different ones. Furthermore, data integrity is not ensured first and followed by a second pass to ensure data quality. To ensure data quality and data integrity, both must occur in parallel, simultaneously and on the same dataset.
Because the two terms are different, as Newton and White (7) point out, you can have data integrity without data quality, and vice versa. A laboratory can have excellent data integrity, but if the analytical information to make a decision is delivered late, or not at all, then the data quality is useless. Alternatively, if each laboratory is permitted to create its local practices for data management, you can have integrity without quality (or, at least, very low levels of quality). All the data are there, but consolidating them requires a standardization project: good integrity, lousy quality. That is the one place where quality and integrity do differ, and where the regulations are weak (Newton, personal communication).
Newton and White (7) present a case where an organization outsources analytical work to contract laboratories, but there is little oversight of the work, resulting in data integrity lapses. When the results are received by the organization, entry to the LIMS is controlled and delivery of the results to the decision makers is rapid, and with good data quality.
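The idea that integrity and quality are assessed together on one record set, yet can reach different verdicts, can be sketched in Python as follows. The ResultRecord class and its four flags are assumptions invented for this illustration; they are simplified stand-ins for the far richer checks a laboratory would actually perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRecord:
    """A hypothetical reportable result (all field names are assumptions)."""
    sample_id: str
    audit_trail_ok: bool        # stands in for the integrity checks
    second_person_reviewed: bool
    delivered_on_time: bool     # stands in for the quality checks
    fit_for_purpose: bool


def assess(records):
    """Assess integrity and quality in one pass over the same records."""
    integrity_ok = True
    quality_ok = True
    for r in records:
        integrity_ok = integrity_ok and r.audit_trail_ok and r.second_person_reviewed
        quality_ok = quality_ok and r.delivered_on_time and r.fit_for_purpose
    return {"integrity": integrity_ok, "quality": quality_ok}


# Sound records delivered late: integrity without quality.
late_but_sound = [ResultRecord("S-001", True, True, False, True)]
# Rapid delivery with little oversight: quality without integrity.
fast_but_unverified = [ResultRecord("S-002", False, True, True, True)]

print(assess(late_but_sound))       # {'integrity': True, 'quality': False}
print(assess(fast_but_unverified))  # {'integrity': False, 'quality': True}
```

There is a single loop over a single data set, and the two verdicts are computed side by side rather than in separate passes.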
The similarities and differences between data quality and data integrity are shown in Figure 2. The similarities are:
Does ALCOA Apply to Data Quality or Data Integrity?
The problem and potential conflict comes with the third element of the MHRA definition for data quality that states that data quality includes ALCOA. This is consistent with Wollen’s original view when devising the acronym ALCOA (13). However, when we see the MHRA definition of data integrity in Table I, each ALCOA criterion is spelled out in the definition. Puzzled?
Is ALCOA an integral part of data integrity, which in turn is a component of data quality, or is ALCOA applicable to both terms, as noted in the MHRA definition?
Look at it from the perspective of the data and records being generated. Can you separate the data for data quality from data integrity? The answer is no, because the record set for both data integrity and data quality will be the same. Therefore, ALCOA criteria apply equally to data quality and data integrity. Think it through; otherwise you could have some data where all actions were attributable for data integrity, but not for data quality.
Figure 2 shows the relationship diagrammatically; in the middle is the analysis workflow from sampling to a decision based on the reportable result from the analysis. Throughout this process, data and records are generated that comprise the complete data from the analysis. This record set is impacted by the ALCOA criteria (as well as the four additional requirements) for data integrity mentioned in the MHRA definition in Table I and the ALCOA criteria for data quality.
Data Quality Attributes
Apart from ALCOA, there are no criteria for data quality mentioned by MHRA (6), as shown in Table I. Looking outside of the pharmaceutical industry, Newton and White (7) quoted AHIMA, which defines data quality in terms of ensuring a clear understanding of the meaning, context, and intent of the data. Additionally, there are 10 criteria or attributes for data quality that are summarized below. These definitions include attributes of data quality intended for a healthcare environment; where appropriate, I have edited the definitions, denoted by “…” to remove the references to healthcare data:
When looking at these definitions, there are some commonalities with ALCOA+ criteria; for example, comprehensiveness can be equated to complete.
Quality Does Not Own Quality Any More
Data quality is not owned by the quality assurance department, as quality is now everybody’s job. Data quality starts in the laboratory. Quality assurance staff are not there to identify errors as that is a laboratory function of the two most important people in any analysis: the performer of a test, and the reviewer of the work undertaken. Data integrity and data quality are a laboratory responsibility. Quality assurance provides the advice, ensures that work is compliant with regulations, and provides the quality oversight via audits and data integrity investigations (22,24).
In this column, we have discussed the differences between data integrity and data quality for the same analytical data set. Although both terms use the ALCOA+ criteria, it is imperative that both data integrity and data quality are ensured so that the numbers can be trusted (data integrity) and decisions can be taken based on the results (data quality).
I would like to thank Chris Burgess and Mark Newton for helpful comments in preparation of this column.
R.D. McDowall is the director of R.D. McDowall Limited and the editor of the “Questions of Quality” column for LCGC Europe, Spectroscopy’s sister magazine. Direct correspondence to: SpectroscopyEdit@mmhgroup.com