This is our 100th "Chemometrics in Spectroscopy" column; counting its predecessor, "Statistics in Spectroscopy," there are now a
total of 138 columns. We began in 1986 and have worked since then to provide in-depth discussions of both basic and
advanced subjects. In this newest series, we have been discussing the subject of multivariate calibration transfer (or calibration
difficult subjects. In this newest series, we have been discussing the subject of multivariate calibration transfer (or calibration
measurement in reflection, discussed the concepts of measuring and understanding instrument differences, and provided an overview
of the mathematics used for transferring calibrations and testing transfer efficacy. In this installment, we discuss the statistical
methods used for evaluating the agreement between two or more instruments (or methods) for reported analytical results. The
emphasis is on acceptable analytical accuracy and confidence levels using two standard approaches: standard uncertainty or
relative standard uncertainty, and Bland-Altman "limits of agreement."
As we have discussed in this series (1–3), calibration transfer involves several steps. The basic spectra are initially measured
on at least one instrument (that is, the parent, primary, or master instrument) and combined with the corresponding reference
chemical information (that is, actual values) for the development of calibration models. These models are maintained on the
original instrument over time, are used to make the initial calibration, and are transferred to other instruments (that is,
child, secondary, or transfer instruments) to enable analysis using the child instruments with minimal corrections and intervention.
We note that the issue of calibration transfer disappears if the instruments are precisely alike: if instruments are the "same,"
then a sample placed on any of them will yield precisely the "same" predicted result. Because instruments are not alike,
and also change over time, calibration transfer techniques are applied to produce the best attempt at model
or data transfer. As mentioned in the first installment of this series (1), there are important issues in attempting to match
calibrations based on optical spectroscopy to the reference values, because the spectroscopy measures the volume fractions of the
various components of a mixture.
Historically, instrument calibrations have been performed using the existing analytical methods to provide the reference values.
These existing methods have overwhelmingly reported their results in weight percent. Until this paradigm changes and analysts
start using the correct units for reporting their results, we have to live with this situation. We will also have to recognize
that the reference values may be some prescribed analysis method, the weight fraction of materials, the volume percent of
composition, or sometimes some phenomenological measurement having no known relation to underlying chemical entities, resulting
from some arbitrary definition developed within a specific industry or application. Amongst ourselves we have sometimes termed
these reference methods "equivalent to throwing the sample at the wall and seeing how long it sticks!" The current assumption
is that the nonlinearity arising from the differences between the spectroscopic response and the reported reference values must
be compensated for through calibration practice. This may not be as simple as presupposed, and requires further research.
Multivariate calibration transfer, or simply calibration transfer, is a set of software algorithms, together with physical materials
(or standards) measured on multiple instruments, used to move calibrations from one instrument to another. All the techniques
used to date involve measuring samples on both the parent, or primary (calibration), and the child, or secondary (transfer), instruments and
then applying a variety of algorithmic approaches for the transfer procedure. The most common approaches involve partial least
squares (PLS) calibration models with bias or slope corrections for predicted results, or the application of piecewise direct
standardization (PDS) combined with small adjustments in bias or slope of predicted values. Many other approaches have been
published and compared, but for many users these are not practicable or have not been adopted and made commercially available
for various reasons.
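As a rough illustration of the standardization step, the core of piecewise direct standardization regresses each parent-instrument wavelength channel on a small window of child-instrument channels measured on the same transfer samples. The sketch below is a minimal version under simplifying assumptions (no mean-centering or intercept term); the function and variable names are ours, not from any particular software package:

```python
import numpy as np

def pds_transfer_matrix(parent, child, half_window=2):
    """Estimate a banded transfer matrix F such that child @ F approximates parent.

    parent, child : (n_samples, n_wavelengths) arrays holding spectra of the
    same transfer samples measured on the parent and child instruments.
    """
    n_samples, n_wl = parent.shape
    F = np.zeros((n_wl, n_wl))
    for i in range(n_wl):
        lo, hi = max(0, i - half_window), min(n_wl, i + half_window + 1)
        # Regress parent channel i on a small window of child channels
        b, *_ = np.linalg.lstsq(child[:, lo:hi], parent[:, i], rcond=None)
        F[lo:hi, i] = b
    return F
```

A child-instrument spectrum `x` is then mapped toward the parent domain as `x @ F`, after which the parent calibration model (with any small bias or slope adjustment) can be applied.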
In any specific situation, if the prescribed method for calibration transfer does not produce satisfactory results, the user
simply measures more samples on the child (transfer) instrument until the model has effectively been updated to reflect the
characteristics of that instrument. We have previously described the scenario in which a user has multiple products
and constituents and must check each constituent for the efficacy of calibration transfer. This is accomplished by measuring
10–20 product samples for each constituent and comparing the average laboratory reference value to the average predicted value
for each constituent, and then adjusting each constituent model with a new bias value, resulting in an extremely tedious and
unsatisfying procedure. Such transfer of calibration is also accomplished by recalibration on the child instrument or by blending
samples measured on multiple instruments into a single calibration. Although the blending approach improves method robustness
(or ruggedness) for predicted results across instruments using the same calibration, it is not applicable for all applications,
for analytes having small net signals, or for achieving the optimum accuracy.
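The bias step in that tedious per-constituent procedure amounts to a one-number correction. A minimal sketch in Python, using hypothetical values (not data from any real transfer study):

```python
import numpy as np

# Hypothetical reference and child-instrument predicted values for one
# constituent, measured on ten transfer samples
lab_reference   = np.array([10.2, 11.5, 9.8, 12.1, 10.9,
                            11.2, 10.4, 11.8, 9.9, 10.7])
child_predicted = np.array([10.6, 11.9, 10.1, 12.6, 11.3,
                            11.5, 10.9, 12.2, 10.3, 11.1])

# New bias: average laboratory reference value minus average predicted value
bias = lab_reference.mean() - child_predicted.mean()
corrected = child_predicted + bias

# After correction, the mean difference is zero by construction
assert abs((lab_reference - corrected).mean()) < 1e-12
```

This removes only the average offset for that one constituent on that one instrument, which is why the procedure must be repeated for every constituent of every product.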
How to Tell if Two Instrument Predictions, or Method Results, Are Statistically Alike
The main question when comparing parent to child instrument predictions, a reference laboratory method to an instrument prediction,
or results from two completely different reference methods, is how to know if the differences are meaningful or significant
and when they are not. Some difference is always expected, since an imperfect world allows for a certain amount of "natural"
variation. However, when are those differences statistically significant, and when are they
too great to be acceptable? There are a number of reference papers and guides to tell us how to compute differences, diagnose
their significance, and describe the types of errors involved between methods, instruments, and analytical techniques of many
types. The analytical method can be based on spectroscopy and multivariate calibration methods, other instrumental methods,
or even gravimetric methods. We have included several of the most noted references in the reference section of this column.
One classic reference of importance for comparing methods is by Youden and Steiner (4). This reference describes some of the
issues we will discuss in this column as well as details regarding collaborative laboratory tests, ranking of laboratories
for accuracy, outlier determination, ruggedness tests for methods, and diagnosing the various types of errors in analytical methods.
Let us begin with a set of measurement data as shown in Table I. These are simulated data that are fairly representative of
spectroscopy data following multivariate calibration. The data could refer to different methods, such as methods A, B, C,
and D, or to different instruments. For our discussion, we designate the data as coming from four different instruments,
A through D, for 20 samples. The original data from the calibrated instrument are denoted A1. The results of data transferred
to the other instruments are represented by B, C, and D. There are duplicate measurements for A (A1 and A2) and for B (B1 and B2).
From these data we will perform an analysis and look for levels of uncertainty and acceptability for the analytical results.
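For the uncertainty side of that analysis, the standard uncertainty of a mean result and its relative form follow the usual definitions: the sample standard deviation of replicate results divided by the square root of the number of replicates, and that quantity divided by the mean. A sketch with illustrative values (not the Table I data):

```python
import numpy as np

# Hypothetical replicate results for one sample on one instrument
results = np.array([4.02, 3.98, 4.05, 4.01, 3.99])

s = results.std(ddof=1)         # sample standard deviation of the replicates
u = s / np.sqrt(results.size)   # standard uncertainty of the mean result
rel_u = u / results.mean()      # relative standard uncertainty
```

The relative standard uncertainty can then be compared directly against the acceptable analytical error for the application.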
Note: C1 and D1 data are used in Figure 2 and will be referred to along with A2 and B2 in the next installment of this column.
Table I: Data used for illustration, instruments (or methods) A, B, C, and D