The results of our previous subseries on classical least squares analysis provide the mechanism for understanding when and why calibration transfer can be done easily and when it will be difficult. Those results also provide a basis for a modified understanding of what calibration transfer means and how we can tell whether such a transfer can be performed for any given analysis.
Before we continue, therefore, let's recap the key findings and new discoveries of our previous column (1):
2) The different concentration measures are not linearly related to each other.
3) The NIR absorbances, operative for the spectral values used in calibrations, are in fact related to the volume fractions of the mixture components.
4) The lack of bijectivity of point 1 and the nonlinearity of point 2 have nothing to do with the spectroscopy.
These properties of mixtures shed much light (no pun intended) on the behavior of mixtures and on the effect of that behavior on the various calibration algorithms that we apply. There are many implications and ramifications of the new knowledge we have gained, which are touched upon in another publication (2). Here, we discuss the effects of these properties on the behavior of data when we perform calibration transfer exercises.
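To make the nonlinearity of point 2 concrete, here is a minimal sketch that converts volume fraction to weight fraction for a binary mixture. It is an illustration only and rests on two assumptions that do not come from the original work: the component densities are hypothetical round numbers, and mixing is taken to be ideal (volumes are additive). Under ideal mixing the conversion for a two-component mixture is still one-to-one, so the sketch illustrates only the nonlinearity, not the lack of bijectivity.

```python
import numpy as np


def weight_fraction_from_volume_fraction(v1, rho1, rho2):
    """Weight fraction of component 1 in a binary mixture.

    Assumes ideal mixing (additive volumes). v1 is the volume fraction of
    component 1; rho1 and rho2 are the pure-component densities.
    """
    v1 = np.asarray(v1, dtype=float)
    mass1 = v1 * rho1            # mass of component 1 per unit mixture volume
    mass2 = (1.0 - v1) * rho2    # mass of component 2 per unit mixture volume
    return mass1 / (mass1 + mass2)


# Hypothetical densities in g/mL, chosen only for illustration.
RHO_A = 0.80
RHO_B = 1.00

volume_fractions = np.linspace(0.0, 1.0, 11)
weight_fractions = weight_fraction_from_volume_fraction(
    volume_fractions, RHO_A, RHO_B
)

# Both measures agree at 0 and 1, so a linear relation between them would
# have to be weight fraction == volume fraction. The deviation from that
# line is a direct measure of the nonlinearity.
deviation = weight_fractions - volume_fractions
max_abs_deviation = float(np.max(np.abs(deviation)))

for v, w in zip(volume_fractions, weight_fractions):
    print(f"volume fraction {v:.1f} -> weight fraction {w:.4f}")
print(f"largest departure from linearity: {max_abs_deviation:.4f}")
```

The departure from the straight line grows as the two densities move further apart and vanishes when they are equal, which is consistent with the observation that the nonlinearity is a property of the mixture rather than of the spectroscopy.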
"Calibration transfer" has been a buzzword in NIR spectroscopy for as long as NIR has been practiced. Other labels have been used over the years, such as "universal calibration," but the concept was the same: to create a calibration model on one instrument and apply it to samples measured on a different instrument. However, there were never any specifications to describe, or even define, what we meant by calibration transfer. No objective criteria were ever set up, by which we could ever know whether we had, in fact, successfully transferred a calibration, or what results can be expected from applying a transferred calibration to routine analysis.
In place of a formal definition, an empirical procedure has been used, and it has therefore also served as a de facto definition. The procedure is to attempt to transfer a calibration from one instrument to another and then test whether the transfer has succeeded. "Success" is determined by the agreement between the accuracy of the analyses on the "parent" instrument (also called the "master" instrument) and the accuracy of the analytical results on the second instrument (usually called the "child" or "slave" instrument). Samples different from the ones used for calibration are usually measured on the child unit in order to simultaneously validate the calibration through the use of a separate sample set.
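The empirical procedure just described can be sketched in a few lines. Everything in this sketch is an assumption made for illustration: the data are synthetic, the child instrument is simulated as the parent with a small hypothetical gain and offset change, and an ordinary multiple linear regression stands in for whatever calibration algorithm would be used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

N_WAVELENGTHS = 5
# Hypothetical per-wavelength sensitivities of the analyte.
sensitivities = rng.uniform(0.5, 1.5, size=N_WAVELENGTHS)


def simulate_spectra(concentrations, gain, offset, noise_sd=0.002):
    """Synthetic absorbances: gain * (conc * sensitivity) + offset + noise."""
    clean = np.outer(concentrations, sensitivities)
    return gain * clean + offset + rng.normal(0.0, noise_sd, clean.shape)


def fit_calibration(spectra, reference_values):
    """Least-squares calibration with an intercept term."""
    design = np.column_stack([np.ones(len(spectra)), spectra])
    coefficients, *_ = np.linalg.lstsq(design, reference_values, rcond=None)
    return coefficients


def predict(coefficients, spectra):
    design = np.column_stack([np.ones(len(spectra)), spectra])
    return design @ coefficients


def rmse(reference_values, predictions):
    diff = np.asarray(reference_values) - np.asarray(predictions)
    return float(np.sqrt(np.mean(diff ** 2)))


# Calibration samples are measured on the parent instrument only.
calibration_conc = rng.uniform(0.0, 1.0, size=60)
parent_calibration = simulate_spectra(calibration_conc, gain=1.0, offset=0.0)

# A separate test set is measured on both instruments.
test_conc = rng.uniform(0.0, 1.0, size=30)
parent_test = simulate_spectra(test_conc, gain=1.0, offset=0.0)
child_test = simulate_spectra(test_conc, gain=1.02, offset=0.01)

coefficients = fit_calibration(parent_calibration, calibration_conc)

rmse_parent = rmse(test_conc, predict(coefficients, parent_test))
rmse_child = rmse(test_conc, predict(coefficients, child_test))

print(f"RMSE on parent instrument: {rmse_parent:.4f}")
print(f"RMSE on child instrument:  {rmse_child:.4f}")
```

In this simulation the cause of any disagreement between the two error figures is known by construction. With real data, a single pair of numbers like these cannot by itself say whether the calibration, the child instrument, the test samples, or the reference values are responsible.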
However, this empirical procedure conflates (the proper statistical term is "confounds") the issue of calibration transfer with several others: whether the original calibration is any good to start with, whether the new sample set used to test the transfer capability is itself a proper set, and whether the reference laboratory values are appropriate and have the same accuracy as the ones used for the calibration sample set. In principle, a true calibration transfer procedure should provide the same analytical performance on the two instruments, whether that performance is good or bad. With the current state of affairs, if we get good (accurate) performance on both instruments, then that constitutes evidence that all is well. On the other hand, if we get good (accurate) performance on the parent instrument but poor performance on the child instrument, we have no information as to whether the cause is a nontransferable calibration, a defect in the child instrument, an improper set of test samples, or a set of samples for the child instrument that has poor reference laboratory values.
Our previous exposition of the behavior of classical least squares (CLS) and the results from applying it to data was unfortunately never completed. We do hope to eventually publish "the rest of the story" (with all respect to Paul Harvey), but for now we will jump ahead of that full exposition to make use of some of the results and learn what they tell us about the behavior of data for calibration transfer.
The data used were taken from the 2002 International Diffuse Reflectance Conference (IDRC, Chambersburg, Pennsylvania), where they served as the dataset for the "Software Shootout" at that conference. The dataset was made publicly available; it is described on, and can be downloaded from, the following web page: http://www.idrc-chambersburg.org/ss20022012.html.
The contributors of the "Software Shootout" dataset also published their results from it (3); the publication also includes a more detailed description of the data.
More details concerning the instrumentation, sample preparation, and measurement procedures can also be found in the above-cited article (3). In this article, we concentrate our attention on the results from the 155 pilot plant calibration samples. Suffice it to say, for now, that we found equivalent results from the other two data sets included in that shootout data.
For the sake of completeness, we note that for the purposes of the shootout the instruments were arbitrarily designated unit 1 and unit 2. All our calibrations were performed on data from unit 1, and the resulting models were used to predict the corresponding measurements from unit 2.