The results from our previous subseries on classical least squares analysis provide a mechanism for understanding
when and why calibration transfer can be done easily and when it will be difficult. Those results also provide a basis for
a modified understanding of what calibration transfer means and how we can tell whether such a transfer can be performed
for any given analysis.
Calibration transfer is an important and popular topic for both the science and practical applications of near-infrared (NIR)
spectroscopy. However, there is no consensus on the meaning of the term, and any claim of calibration transfer may be
well-nigh meaningless in a scientific sense, despite the topic's practical importance. We propose a definition, and a method of evaluating
calibration transfer, based on our recent discoveries about the nature of light absorbance in spectroscopic analysis.
Jerome Workman, Jr.
Before we continue, therefore, let's recap the key findings of our previous column (1) and what the new discoveries were:
1) Different measures of concentration commonly used for reference laboratory values for NIR calibrations do not (repeat not) have a one-to-one correspondence (that is, they do not form a bijective function) with each other. An important example
is the weight fractions of the components in a mixture compared to the volume fractions of those components.
2) The different concentration measures are not linearly related to each other.
3) The NIR absorbances, operative for the spectral values used in calibrations, are in fact related to the volume fractions
of the mixture components.
4) The lack of bijectivity of point 1 and the nonlinearity of point 2 have nothing to do with the spectroscopy.
It took considerable head-scratching, but eventually the realization dawned that the underlying properties leading to both
characteristics are purely the physical chemistry of the mixtures. This is further illustrated in Figure 1, which shows the
relationships between weight fractions and volume fractions for the set of mixtures of toluene, dichloromethane, and n-heptane, as examined in our previous column (1).
Figure 1: Weight fractions versus volume fractions of (a) toluene, (b) dichloromethane, and (c) n-heptane in ternary mixtures
of the three compounds.
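The relationship between weight fractions and volume fractions can be illustrated with a short calculation. The following is a minimal sketch, not the computation used to produce Figure 1: it assumes ideal mixing (the volumes of the components are simply additive), and the densities are approximate handbook values near 20 °C. The function name and the two example mixtures are hypothetical, chosen only for illustration.

```python
# Approximate pure-component densities in g/mL near 20 °C (assumed handbook values).
DENSITY = {"toluene": 0.867, "dichloromethane": 1.326, "n-heptane": 0.684}


def weight_to_volume_fractions(weight_fractions):
    """Convert weight fractions to volume fractions.

    Assumes ideal mixing: the mixture volume is the sum of the
    pure-component volumes (an assumption, not a claim from the column).
    """
    # Volume contributed by each component per unit mass of mixture.
    volumes = {name: w / DENSITY[name] for name, w in weight_fractions.items()}
    total = sum(volumes.values())
    return {name: v / total for name, v in volumes.items()}


# Two hypothetical mixtures with the SAME toluene weight fraction (0.4),
# differing only in which compound makes up the remainder.
mix_a = {"toluene": 0.4, "dichloromethane": 0.6, "n-heptane": 0.0}
mix_b = {"toluene": 0.4, "dichloromethane": 0.0, "n-heptane": 0.6}

vol_a = weight_to_volume_fractions(mix_a)
vol_b = weight_to_volume_fractions(mix_b)

print("toluene volume fraction, mixture A:", round(vol_a["toluene"], 3))
print("toluene volume fraction, mixture B:", round(vol_b["toluene"], 3))
```

Under these assumptions, a toluene weight fraction of 0.4 corresponds to a volume fraction of roughly 0.50 when the balance is dichloromethane and roughly 0.34 when the balance is n-heptane. A single weight fraction therefore maps to more than one volume fraction, which is the lack of one-to-one correspondence described in point 1, and it arises without any reference to spectroscopy.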
These properties of mixtures shed much light (no pun intended) on the behavior of mixtures and on the effect of that behavior
on the various calibration algorithms that we apply. There are many implications and ramifications of the new knowledge we
have gained, which are touched upon in another publication (2). Here, we discuss the effects of these properties on the behavior
of data when we perform calibration transfer exercises.
"Calibration transfer" has been a buzzword in NIR spectroscopy for as long as NIR has been practiced. Other labels have been
used over the years, such as "universal calibration," but the concept was the same: to create a calibration model on one instrument
and apply it to samples measured on a different instrument. However, there were never any specifications to describe, or even
define, what we meant by calibration transfer. No objective criteria were ever set up by which we could know whether
we had, in fact, successfully transferred a calibration, or what results could be expected from applying a transferred calibration
to routine analysis.
In place of a formal definition, an empirical procedure has been used, and it has therefore also served as a de facto
definition. The procedure is to attempt to transfer a calibration from one instrument to another, and then to test whether
the transfer has succeeded. "Success" is determined by the agreement between the accuracy of the analyses on the "parent"
instrument (also called the "master" instrument) and the accuracy of the analytical results on the second instrument (usually
called the "child" or "slave" instrument). Samples different from the ones used for calibration are usually measured on the
child unit in order to simultaneously validate the calibration through the use of the separate sample set.
However, this empirical procedure conflates (the proper statistical term is "confounds") the issue of calibration transfer
with the issue of whether the original calibration is any good to start with, whether the new sample set used to test the
transfer capability is itself a proper set, and also whether the reference laboratory values are appropriate and have the
same accuracy as the ones used for the calibration sample set. In principle, a true calibration transfer procedure should
provide the same analytical performance on the two instruments, whether that performance is good or bad. With the current
state of affairs, if we get good (accurate) performance on both instruments then that constitutes evidence that all is well.
On the other hand, if we get good (accurate) performance on the parent instrument but poor performance on the child instrument,
we have no information as to whether that is because of the calibration being nontransferable, a defect in the child instrument,
an improper set of test samples, or a set of samples for the child instrument that has poor reference laboratory values.
Our previous exposition of the behavior of classical least squares (CLS) and the results from applying it to data was unfortunately
never completed. We do hope to eventually publish "the rest of the story" (with all respect to Paul Harvey), but for now we
will jump ahead of that full exposition to make use of some of the results and learn what they tell us about the behavior
of data for calibration transfer.
The data used here were the dataset for the "Software Shootout" at the 2002 International Diffuse Reflectance Conference
(IDRC, Chambersburg, Pennsylvania). The dataset was made publicly available; it is described on, and can be downloaded
from, the following web page: http://www.idrc-chambersburg.org/ss20022012.html.
The contributors of the "Software Shootout" dataset also published their results from it (3); the publication also includes
a more detailed description of the data.
More details concerning the instrumentation, sample preparation, and measurement procedures can also be found in the above-cited
article (3). In this article, we concentrate our attention on the results from the 155 pilot plant calibration samples. Suffice
it to say, for now, that we found equivalent results from the other two data sets included in that shootout data.
For the sake of completeness we note that for the purposes of the shootout, the instruments were arbitrarily designated unit
1 and unit 2. All our calibrations were performed on data from unit 1 and the models were used to predict corresponding measurements
from unit 2.
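The workflow just described (calibrate on unit 1, predict from unit 2) can be sketched on synthetic data. This is only an illustration under stated assumptions, not the authors' procedure and not the shootout data: the spectra are simulated, the five "wavelengths," the baseline offset on unit 2, and the noise levels are all hypothetical, and ordinary least squares stands in for whatever calibration algorithm is actually applied. Only the sample count of 155 is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# 155 matches the number of calibration samples; 5 wavelengths is arbitrary.
n_samples, n_wavelengths = 155, 5
concentration = rng.uniform(0.0, 1.0, n_samples)       # synthetic reference values
pure_spectrum = np.linspace(0.2, 1.0, n_wavelengths)   # hypothetical absorptivities


def measure(offset, noise_sd):
    """Simulate absorbance spectra of the same samples on one instrument."""
    noise = rng.normal(0.0, noise_sd, (n_samples, n_wavelengths))
    return np.outer(concentration, pure_spectrum) + offset + noise


unit1 = measure(offset=0.00, noise_sd=0.01)
unit2 = measure(offset=0.02, noise_sd=0.01)  # assumed small baseline shift


def with_intercept(spectra):
    """Prepend a column of ones so the model includes an intercept."""
    return np.column_stack([np.ones(len(spectra)), spectra])


# Calibrate on unit 1 only (ordinary least squares as a stand-in model).
coef, *_ = np.linalg.lstsq(with_intercept(unit1), concentration, rcond=None)

# Apply the unit 1 model, unchanged, to both instruments.
pred1 = with_intercept(unit1) @ coef
pred2 = with_intercept(unit2) @ coef


def rms(x):
    """Root-mean-square of an array of differences."""
    return float(np.sqrt(np.mean(x ** 2)))


print("unit 1 predictions vs reference values:", rms(pred1 - concentration))
print("unit 2 predictions vs reference values:", rms(pred2 - concentration))
print("unit 1 predictions vs unit 2 predictions:", rms(pred1 - pred2))
```

The last quantity compares the two instruments' predictions directly and does not involve the reference values at all, so in this sketch it is unaffected by the quality of the reference laboratory data. That separation is one way to look at instrument agreement apart from the accuracy of the original calibration.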