Progress
This column and the next few installments discuss the ramifications of the new findings.
We base our discussions on the previously reported results (1–11).
To review the findings until now, in our previous column (11) we examined the behavior and performance of nearly ideal samples
(clear liquid mixtures) using a nearly ideal calibration algorithm (classical least squares [CLS]) and found several characteristics
of our data set that indicate that volume fractions are the operative physical variable that spectroscopy is sensitive to.
First, we noticed that the CLS analysis of the spectral data from the second laboratory gave values that corresponded
to the target values in the original experimental design: 0%, 25%, 50%, 75%, and 100%. We also
noted that in the samples for the second laboratory, those values corresponded to the volume fractions of the various components.
We then applied the scientific method to our microcosm and formulated the hypothesis that when Beer's law holds, the absorbance
spectra of clear solutions are in fact sensitive to the volume fractions of the components in the mixture. Continuing the
application of the scientific method, we verified the hypothesis by applying it to the data from the first laboratory, which
had not previously been analyzed this way, and found that indeed the performance of the CLS method was much improved when
the CLS values were compared to the volume fractions, rather than any of the other concentration units that had been used
previously.
Figure 1: Plots of CLS values versus weight percent and volume percent, for toluene: (a) weight percent versus CLS values;
(b) volume percent versus CLS values.

This was ascertained by comparing individual values of the concentrations as determined by the various methods. A more robust
comparison can be obtained by calculating a statistical figure of merit, similarly to the way accuracy measures are obtained
for other quantitative calibration algorithms. Both graphical and numeric methods are available. It made sense
to use both approaches for the comparison, if for no other reason than that this is standard procedure when performing calibrations
with the other algorithms, and we decided to examine our CLS results as closely as is normally done. We started with the numeric
approach, and computed the root mean square differences and correlation coefficients between the concentration values from
the CLS method and the concentration values obtained using other units. The calculations were taken over all 15 of the samples,
going from 0% to 100% of each component in accordance with the experimental design used (see Figure 1 in reference 5).
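The two numeric figures of merit named above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the root mean square difference is assumed to divide by the number of samples (the column does not say whether n or n − 1 was used), and the demonstration numbers are invented rather than taken from Table I.

```python
import numpy as np

def rms_difference(cls_values, reference):
    """Root mean square difference between CLS values and reference values.

    Assumption: the mean is taken over n samples (not n - 1).
    """
    cls_values = np.asarray(cls_values, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((cls_values - reference) ** 2)))

def correlation(cls_values, reference):
    """Pearson correlation coefficient between CLS values and reference values."""
    return float(np.corrcoef(cls_values, reference)[0, 1])

# Invented illustrative numbers, NOT the data behind Table I.
reference_demo = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
cls_demo = np.array([0.5, 24.0, 51.0, 74.5, 100.5])

print("RMS difference:", rms_difference(cls_demo, reference_demo))
print("Correlation:   ", correlation(cls_demo, reference_demo))
```

Running the same two functions once per concentration unit (weight percent, volume percent, and so on) against the same CLS values would give one row of statistics per unit, which is the kind of comparison Table I reports.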
Table I: Comparison of performance statistics for different units of concentration

As we saw, the data from the second laboratory were somewhat erratic, a characteristic that was explained by David Heaps,
the scientist who performed the data collection in that laboratory, as being caused by the fact that the sample cell used was
not large enough to encompass the entire beam from the spectrometer. This caused small variations in the position of the cell,
which ordinarily would be negligible, to have inordinately large effects on the readings. We also previously noted the erratic
nature of the data from the second laboratory in reference 9. Therefore, we present only the results from the first laboratory
in Table I.
Table I makes it abundantly clear that for all sample components, the comparison of the CLS values with the volume fraction,
overall, is much better than with any of the other units. Indeed, the values of these statistical tests are competitive with
what is usually expected for "typical" calibrations using other chemometric calibration algorithms (such as PCR or PLS).
CLS calculations, done the way described in the previous columns, have several interesting and useful characteristics of their
own:

CLS is the nearest thing we have to an "absolute" analytical method; the analytical results can be obtained
without ever doing a conventional "calibration" of the sort we usually think of.

Because results can be obtained without a conventional calibration, you don't have to worry about laboratory error; there's
no "lab error" because there are no laboratory values to deal with (except for validation).

CLS values are linear with the known concentrations (as long as they are expressed in appropriate units) over the range
0–100% (or, strictly speaking, volume fractions in the range 0–1).

Because of this linearity, a conventional calibration based on volume fractions (or one of the scaled variants) should be
extrapolatable.
Given that these CLS values are obtained without any reference laboratory values, and using zero PLS or PCR factors (which
also means they can't be overfit), the results for the statistical values for performance of volume fractions in Table I are
pretty impressive.
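To make "obtained without any reference laboratory values" concrete, here is a minimal sketch of a CLS calculation. It assumes the CLS values are the least-squares coefficients that reconstruct a mixture spectrum from the measured spectra of the pure components; the exact procedure in the earlier columns may differ in detail. The function name and the synthetic Gaussian "spectra" are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def cls_values(pure_spectra, mixture_spectrum):
    """Least-squares coefficients of the pure-component spectra in a mixture spectrum.

    pure_spectra:     array of shape (n_components, n_wavelengths)
    mixture_spectrum: array of shape (n_wavelengths,)
    No reference concentrations enter the calculation.
    """
    K = np.asarray(pure_spectra, dtype=float).T        # (n_wavelengths, n_components)
    a = np.asarray(mixture_spectrum, dtype=float)
    coefficients, *_ = np.linalg.lstsq(K, a, rcond=None)
    return coefficients

# Synthetic, noise-free example (hypothetical band shapes, not real spectra).
axis = np.linspace(0.0, 1.0, 50)
pure = np.vstack([
    np.exp(-((axis - 0.3) / 0.05) ** 2),
    np.exp(-((axis - 0.5) / 0.08) ** 2),
    np.exp(-((axis - 0.7) / 0.05) ** 2),
])
true_fractions = np.array([0.25, 0.50, 0.25])
mixture = true_fractions @ pure                        # Beer's law: absorbances add

estimated = cls_values(pure, mixture)
print(estimated)
```

The only adjustable quantities are the coefficients themselves, one per component, which is why there are no factors to select and nothing comparable to the overfitting risk of PCR or PLS.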
Figure 2: Plots of CLS values versus weight percent and volume percent, for dichloromethane: (a) weight percent versus CLS
values; (b) volume percent versus CLS values.

We can further confirm the nature of the agreements by plotting the CLS values for each sample versus the concentration expressed
in the two different key units of interest. Figures 1–3 show this view of the data for each of the three different components.
The plots for the three components versus weight percent (Figures 1a, 2a, and 3a) show varying amounts of scatter. In Figure
1a, the plot of the CLS values versus weight percent of toluene shows no curvature visible to the naked eye. Figures
1a and 3a show a considerable amount of scatter, although Figure 2a does not.
Figure 3: Plots of CLS values versus weight percent and volume percent, for n-heptane: (a) weight percent versus CLS values;
(b) volume percent versus CLS values.

The other three plots, of CLS values versus volume fractions (Figures 1b, 2b, and 3b), show decreased amounts of scatter (except
for the dichloromethane). Figures 1–3 present a graphical comparison that provides a convenient "eyeball" comparison method.
Figures 1a, 2a, and 3a show that the reference values are constant, at each level of the analyte, at the value specified by
the experimental design. Figures 1b, 2b, and 3b show that all three exhibit linear relations between the CLS and volume fractions
over the range 0–100%. Nor do any of the plots of CLS values versus volume fractions show appreciable amounts of scatter;
indeed, the use of volume fractions as the unit for the reference values has caused all the data points to fall on a line
that is visibly straight, for all three components.
Table II: t-values from linearity test, spectroscopy versus indicated units

But we would also like to be able to apply a more objective, mathematical comparison method. Therefore, we also applied a
statistical linearity test (described in reference 12) to the results. This test computes a t-value against the null hypothesis that there is no nonlinearity in the relationship between paired data points. A statistically
significant value for the computed t-value indicates that there is enough nonlinearity to be definitely detected by an objective criterion, while a value that
is too small to be statistically significant means that any nonlinearity, if present, is too small to be detected. Table II
shows the t-values for the comparisons when comparing the CLS values against other units.
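The details of the test are in reference 12 and are not reproduced here. As a stand-in that follows the same logic, the sketch below fits a straight line plus a quadratic term by ordinary least squares and reports the t-value of the quadratic coefficient against the null hypothesis that it is zero. This is a common way to test for curvature, but it is an assumption that it matches the referenced test; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def quadratic_term_t_value(x, y):
    """t-value of the quadratic coefficient in the fit y = b0 + b1*x + b2*x**2.

    A large |t| suggests curvature; a small |t| means none was detected.
    Stand-in for the linearity test of reference 12, not a copy of it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(x) - X.shape[1]                      # n - 3 degrees of freedom
    sigma2 = residuals @ residuals / dof           # residual variance
    covariance = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the coefficients
    return float(beta[2] / np.sqrt(covariance[2, 2]))

# Synthetic data: 15 points, deterministic pseudo-noise (illustrative only).
x = np.linspace(0.0, 1.0, 15)
noise = 0.01 * np.cos(2.3 * np.arange(15))
y_linear = x + noise
y_curved = x + 0.3 * x * (1.0 - x) + noise

t_linear = quadratic_term_t_value(x, y_linear)
t_curved = quadratic_term_t_value(x, y_curved)
print("t (linear data):", t_linear)
print("t (curved data):", t_curved)
```

The computed value would be compared with a critical value of the t distribution for n − 3 degrees of freedom. Note how the residual variance sits in the denominator: the more scatter in the data, the smaller the t-value for a given amount of curvature.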
The t-values in Table II, for the comparison of the spectroscopic results with the gravimetric values, indicate highly significant
amounts of curvature in the relationship between the CLS values and the known weight percents for dichloromethane, in agreement
with our visual observation. The corresponding t-values obtained when the spectroscopic results were compared to the volumetric values indicate no detectable nonlinearity
for this comparison. These statistics agree with what we infer from Figure 2.
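One way to see why weight percent can curve against a quantity that tracks volume fraction is to convert between the two units directly. The sketch below assumes ideal mixing (volumes are additive) and uses approximate room-temperature handbook densities; neither the densities nor the function name comes from the column. It uses two-component blends for simplicity, whereas the actual experimental design involves three components.

```python
# Approximate densities in g/mL near room temperature (assumed handbook values).
DENSITY_G_PER_ML = {
    "toluene": 0.867,
    "dichloromethane": 1.327,
    "n-heptane": 0.684,
}

def weight_fractions(volume_fractions):
    """Convert volume fractions to weight fractions, assuming additive volumes.

    volume_fractions: dict mapping component name -> volume fraction.
    """
    masses = {name: phi * DENSITY_G_PER_ML[name]
              for name, phi in volume_fractions.items()}
    total_mass = sum(masses.values())
    return {name: mass / total_mass for name, mass in masses.items()}

# Dichloromethane blended with n-heptane at the design levels (illustrative).
for phi in (0.0, 0.25, 0.50, 0.75, 1.0):
    w = weight_fractions({"dichloromethane": phi, "n-heptane": 1.0 - phi})
    print(f"volume fraction {phi:.2f} -> weight fraction {w['dichloromethane']:.3f}")
```

Because the total mass in the denominator changes with composition, equal steps in volume fraction do not give equal steps in weight fraction. The effect is largest for the component whose density differs most from the others, which here is dichloromethane.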
Interestingly, neither toluene nor n-heptane exhibits statistically significant t-values for either nonlinearity test. Whereas Figure 1a appears to be adequately straight, Figure 3a visually shows what appears
to be a curvature of the relationship for the n-heptane; both ends of the relationship appear to bend upward from a line that follows the rest of the data. Table II, however,
shows that the t-value for this case is only 3.25, a marginally significant value at best. In the cases of toluene and n-heptane, the large amount of scatter seen in these plots is masking any systematic effect that might actually be present and
prevents us from demonstrating that the systematic portion is more than the large amount of scatter could account for, giving
us a nonsignificant result.
An interesting side note here: Both the graphical and numerical results show that volume fraction provides a spectroscopic
method that is substantially linear over the range 0–100% (when Beer's law holds, of course) and accommodates the variations
in the "matrix" — that is, the rest of the sample. Those properties would not disappear if the concentration values expressed
in volume fraction were multiplied by a constant. Convenient constants would be ones like density, moles per unit volume,
or similar implicit properties of the analyte. Suitable constants would be constants that are measures of a property per unit
volume (as the density and moles per unit volume themselves are). Multiplying one of these constants by the volume fraction
(or volume ratio) results in canceling the volume term in the numerator of the volume fraction, and thereby replacing the
numerator term in the volume fraction expression by the alternate unit (that is, mass if density is the constant, or moles
if moles per unit volume were the constant). The resulting values would then be concentration measures on a volume basis (density,
molarity, and so on), which arguably are more familiar to chemists. These other quantities would not necessarily satisfy the
more stringent requirements of the CLS algorithm, but would be expected to be linearly related to the volume concentration,
and therefore provide a satisfactory basis for expressing the analyte concentration for the more common calibration algorithms
that do indeed require the concentration for the calculations.
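The scaling argument can be illustrated numerically. The sketch below multiplies a volume fraction by a per-unit-volume property of the pure analyte (density, or moles per unit volume) to obtain mass concentration and molarity. The densities and molar masses are approximate handbook values and the function names are hypothetical; none of them comes from the column.

```python
# Approximate pure-component properties near room temperature (assumed values).
DENSITY_G_PER_ML = {"toluene": 0.867, "dichloromethane": 1.327, "n-heptane": 0.684}
MOLAR_MASS_G_PER_MOL = {"toluene": 92.14, "dichloromethane": 84.93, "n-heptane": 100.20}

def mass_concentration(volume_fraction, component):
    """Grams of component per mL of mixture: volume fraction times pure density."""
    return volume_fraction * DENSITY_G_PER_ML[component]

def molarity(volume_fraction, component):
    """Moles of component per litre of mixture.

    Uses moles per mL of the pure liquid (density / molar mass) as the constant.
    """
    mol_per_ml_pure = DENSITY_G_PER_ML[component] / MOLAR_MASS_G_PER_MOL[component]
    return volume_fraction * mol_per_ml_pure * 1000.0

for phi in (0.0, 0.25, 0.50, 0.75, 1.0):
    print(f"toluene, volume fraction {phi:.2f}: "
          f"{mass_concentration(phi, 'toluene'):.4f} g/mL, "
          f"{molarity(phi, 'toluene'):.3f} mol/L")
```

Each converted quantity is the volume fraction times a fixed constant, so it stays exactly proportional to the volume fraction over the whole 0–1 range, as the paragraph above argues.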
This is what the data show. The important question now is what it means.
