Classical Least Squares, Part X: Numerical Results from the Second Laboratory

June 1, 2012

The results from the experiment in the second laboratory are calculated and examined.

Here, we calculate and examine the results of performing the classical least squares calculations on the data from the experiment in the second laboratory.

This column is the next continuation of our discussion of the classical least squares (CLS) approach to calibration (1–9). In our previous column (9), we reported on the spectral results when a second scientist repeated the experimental exercise at his own (different) laboratory. On initial examination, we found that there was a difference in the materials used (chloroform instead of dichloromethane); otherwise the spectral results appeared to be equivalent to the results from the first laboratory.

Figure 1: Comparison of mixture spectra reproduced from CLS calculations in various wavelength regions, with actual mixture spectra, for sample consisting of 25% toluene, 25% chloroform, 50% n-heptane. Black = reconstructed spectrum; Blue = actual spectrum.

We now continue, as we did previously, by determining the ability of the CLS method to reproduce the spectra of the mixtures from the second laboratory. The spectra of the three ternary mixtures are shown in Figures 1–3, each expanded over the wavelength ranges examined previously for the first laboratory.

Figure 2: Comparison of mixture spectra reproduced from CLS calculations in various wavelength regions, with actual mixture spectra, for sample consisting of 50% toluene, 25% chloroform, 25% n-heptane. Black = reconstructed spectrum; Blue = actual spectrum.

As for the first laboratory, we again find that the reproductions of the target mixture spectra from the second laboratory are exceptionally good. This confirms that the CLS calculations are performing the way they should, and there seems to be no reason to suspect either the theory or the calculations based on that theory.

Figure 3: Comparison of mixture spectra reproduced from CLS calculations in various wavelength regions, with actual mixture spectra, for sample consisting of 25% toluene, 50% chloroform, 25% n-heptane. Black = reconstructed spectrum; blue = actual spectrum.

Therefore, we continued as before, by calculating the CLS values at the various wavelength ranges for a single selected sample from the set of data produced by the second laboratory and comparing them to the values set by the experimental design. Before doing that, however, we must point out a key difference between the samples created for this experiment and the samples created for the experiment performed in the first laboratory. In the second laboratory's experiment, we note that the samples were prepared volumetrically (refer to Table I from part IX [9]). In contrast, the samples from the first laboratory were prepared gravimetrically (refer to Table II in part V [5]). Thus, if we want to compare the spectral results to the target values in a way that corresponds to the way it was done in the first laboratory, we must convert the volumetric values to their equivalent gravimetric values.

Table I: Conversion from volume percent to weight percent for samples from the second laboratory (AstraZeneca)

The calculation is straightforward enough. Because we know the volume of each of the three components added in each mixture, we can find the weight (strictly speaking, the mass) of each component by multiplying the volume by the density of that component. Densities of organic materials are easily found in standard chemical tables (for example, see reference 10). Knowing the weights of the components makes the calculation of weight percent simple enough. The conversion of volume percent to weight percent for the samples from the second laboratory is shown in Table I.
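The conversion just described can be sketched in a few lines. This is a minimal illustration, not the calculation used to produce Table I: the density values are assumed, rounded room-temperature figures, and the function name `volume_to_weight_percent` is hypothetical. It also assumes that each volume was measured separately before mixing, so that each mass is simply volume times density.

```python
# Approximate room-temperature densities in g/mL. These are assumed,
# rounded values for illustration, not necessarily those used for Table I.
DENSITY_G_PER_ML = {
    "toluene": 0.867,
    "chloroform": 1.489,
    "n-heptane": 0.684,
}


def volume_to_weight_percent(volume_percent):
    """Convert {component: volume %} to {component: weight %}.

    Each mass is volume * density; weight % is that mass divided by the
    total mass of the mixture, times 100.
    """
    masses = {name: vol * DENSITY_G_PER_ML[name]
              for name, vol in volume_percent.items()}
    total_mass = sum(masses.values())
    return {name: 100.0 * mass / total_mass for name, mass in masses.items()}


# Nominal volumetric composition of one of the ternary mixtures.
nominal = {"toluene": 25.0, "chloroform": 25.0, "n-heptane": 50.0}
weight_percent = volume_to_weight_percent(nominal)

for name, value in weight_percent.items():
    print(f"{name:12s} {nominal[name]:5.1f} vol%  ->  {value:6.2f} wt%")
```

Because chloroform is much denser than the other two liquids, its weight percent comes out well above its volume percent, while that of n-heptane comes out below.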

Table II: Results from sample with nominal composition: toluene = 25%, chloroform = 25%, n-heptane = 50%. Tabled values are CLS concentration values.

Table I also contains, implicitly, the experimental design. Because the mixtures were made volumetrically, this time the design-specified values for the components are met exactly: the specified component amounts were measured out by volume, as described above, and therefore conform to the specifications. This is true, however, only when we recognize that it is now the volumes that conform to the design, rather than the weights as was done in the first laboratory.

Before comparing the weight percent values to the spectroscopic values, we need to make a short digression. We will come back to this comparison at a suitable time in the future, but before that, there is an important feature in Table I that we wish to point out now. As an example, we note that there are two samples containing 75% toluene by volume. The corresponding weight percent values of toluene in those two samples are 63.44 and 79.18. This is similarly true for the other concentrations of toluene. And, while it is more difficult to pick other examples out of the table, this is similarly true for the other two components. The critical fact to point out here is that the conversion between weight percent and volume percent is not unique. A given value of volume percent of a component can correspond to a wide range of weight percent values for that component, depending not on the concentrations of the component, but on the concentrations of the other components in the mixture. As we will see, this has important consequences for the performance of any attempts to perform calibrations using spectroscopy. That's the end of our brief digression.
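To make the non-uniqueness concrete, the sketch below holds toluene at 75% by volume and varies how the remaining 25% is split between chloroform and n-heptane. The splits are hypothetical and are not the samples in Table I, and the densities are assumed, rounded values.

```python
# Assumed approximate densities in g/mL (illustrative values only).
DENSITY_G_PER_ML = {"toluene": 0.867, "chloroform": 1.489, "n-heptane": 0.684}


def toluene_weight_percent(v_toluene, v_chloroform, v_heptane):
    """Weight % of toluene for the given volume percents of all three liquids."""
    masses = (
        v_toluene * DENSITY_G_PER_ML["toluene"],
        v_chloroform * DENSITY_G_PER_ML["chloroform"],
        v_heptane * DENSITY_G_PER_ML["n-heptane"],
    )
    return 100.0 * masses[0] / sum(masses)


TOLUENE_VOL = 75.0  # held fixed in every hypothetical mixture below

results = []
for v_chloroform in (25.0, 20.0, 15.0, 10.0, 5.0, 0.0):
    v_heptane = 100.0 - TOLUENE_VOL - v_chloroform
    w_toluene = toluene_weight_percent(TOLUENE_VOL, v_chloroform, v_heptane)
    results.append((v_chloroform, v_heptane, w_toluene))
    print(f"chloroform {v_chloroform:4.1f} vol%, n-heptane {v_heptane:4.1f} vol%"
          f"  ->  toluene {w_toluene:5.2f} wt%")

# Range of toluene weight percents that all share the same volume percent.
spread = max(r[2] for r in results) - min(r[2] for r in results)
print(f"spread in toluene wt% at fixed 75 vol%: {spread:.2f}")
```

The same volume percent of toluene maps onto a different weight percent for every split, and the value rises steadily as the dense chloroform is replaced by the lighter n-heptane.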

Now with the weight percent values available to us, we can continue to evaluate the data from this experiment the way we did before, which is to compute the CLS composition values for a sample for each wavelength range we described previously, average together all the values obtained, and compare that with the (now) known weight percent values for the composition of this sample. The results for one of the ternary samples are shown in Table II.
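The procedure of fitting each wavelength range separately and then averaging can be sketched with NumPy as follows. Everything here is synthetic: the band positions, the four sub-range boundaries other than 5000–6500 cm⁻¹, and the helper names `gaussian_band` and `cls_fit` are assumptions made for illustration, not the data or code used for Table II.

```python
import numpy as np


def gaussian_band(x, center, width, height):
    """One Gaussian absorbance band."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)


# Synthetic wavenumber axis (cm^-1); not the instrument's real axis.
wavenumber = np.arange(4000.0, 10000.0, 10.0)

# Made-up bands (center, width, height) for three hypothetical components,
# chosen only so that every range below contains a band from each one.
BANDS = [
    [(4400, 120, 1.0), (5700, 150, 0.8), (7100, 200, 0.4), (8800, 250, 0.20)],
    [(4700, 120, 0.9), (5900, 100, 1.2), (7400, 200, 0.3), (9200, 250, 0.25)],
    [(4250, 120, 1.1), (6200, 150, 0.7), (7700, 200, 0.5), (8400, 250, 0.30)],
]
# One column per pure component.
pure = np.column_stack([
    sum(gaussian_band(wavenumber, *band) for band in bands) for bands in BANDS
])

# Simulated mixture spectrum: linear combination plus a little noise.
true_fractions = np.array([0.25, 0.25, 0.50])
rng = np.random.default_rng(0)
mixture = pure @ true_fractions + rng.normal(0.0, 1e-4, wavenumber.size)

# Full range first, then four hypothetical sub-ranges.
RANGES = [(4000, 10000), (4000, 5000), (5000, 6500), (6500, 8000), (8000, 10000)]


def cls_fit(pure_spectra, mixture_spectrum, low, high):
    """Least-squares coefficients of the pure spectra within [low, high)."""
    in_range = (wavenumber >= low) & (wavenumber < high)
    coeffs, *_ = np.linalg.lstsq(
        pure_spectra[in_range], mixture_spectrum[in_range], rcond=None)
    return coeffs


per_range = np.array([cls_fit(pure, mixture, low, high) for low, high in RANGES])
mean_values = per_range.mean(axis=0)

for (low, high), coeffs in zip(RANGES, per_range):
    print(f"{low:5d}-{high:5d} cm^-1:", np.round(coeffs, 4))
print("mean over ranges:  ", np.round(mean_values, 4))
```

With well-behaved data such as these, every range returns essentially the same coefficients, which is the behavior that makes an outlying range stand out.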

Figure 4: (a) Absorbance spectra of all samples, over the entire wavelength range.

Whoa! It seems we were a bit premature in expecting results to fall into our laps. Notice how the computed values for the 5000–6500 cm⁻¹ range are wildly different from the values for all the other wavelength ranges, although the other four wavelength ranges are reasonably consistent. Now we have another mystery to solve, and because it is related to the spectral results, that would seem to be where we need to look. Nothing noteworthy had previously appeared in the spectra that might cause such a discrepancy (9), but we were not looking for anything in particular either. Now that we have reason to suspect a spectral effect of some sort, in one particular wavelength range, the time has come to look a bit closer. Figure 4a shows the spectra and the problem. Note the very strong absorbance band just below 6000 cm⁻¹. Expansion of the scale, as shown in Figure 4b, reveals that the peak of this absorbance band is clipped: the "peak," which is actually saturated at a value of 8 absorbance units, is spread over the range 5906–5920 cm⁻¹.

Figure 4: (b) Expansion of the region around 5912 cm⁻¹.

This was rather unexpected, since the previous spectral plots showed no signs of such a strong, yet narrow, band in the data. More importantly, the previous spectra of these materials were examined only in transmittance, rather than in absorbance (see Figures 1–3 in part IX [9]). An absorbance value of 8 indicates that the transmittance of the incoming radiation is 10⁻⁸, essentially zero. The other spectra in Figure 4b show that even when diluted by the other components of the mixtures, the peak in question is indeed very sharp and highly absorbing. Comparison of this peak with the spectra of the pure materials (see Figure 3 in part IX [9]) reveals that chloroform has a very strong, very sharp absorbance band at 5912 cm⁻¹, and it is this band that is so strong it is being "clipped."

This anomaly was not observed in the spectra from the first laboratory, since that experiment did not use chloroform. If we re-examine the spectrum of dichloromethane (Figure 3 in part IX [9]), we can see that while dichloromethane does have an absorbance band in the same region of the spectrum, that band is a doublet. Clearly, the vibration giving rise to the 5912 cm⁻¹ band in chloroform is split between the two bands of the doublet, which are therefore individually weaker than the 5912 cm⁻¹ band in chloroform.

Plotting the data only in transmittance, as we did earlier, had the unfortunate effect of masking the effect of this spectral band. Even though it is strong enough to absorb all the radiation incident on the sample at this wavelength, when plotted in transmittance the spectrum looks perfectly ordinary, and it is not clear from that plot that the transmittance at that wavelength has been reduced to such an extremely small value.
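The masking effect follows directly from the standard relation between absorbance and transmittance, A = −log₁₀(T). A short sketch, with hypothetical function names, shows how little room a linear transmittance scale leaves for distinguishing strong bands from extremely strong ones:

```python
import math


def transmittance_from_absorbance(absorbance):
    """T = 10^(-A), expressed as a fraction between 0 and 1."""
    return 10.0 ** (-absorbance)


def absorbance_from_transmittance(transmittance):
    """A = -log10(T)."""
    return -math.log10(transmittance)


for absorbance in (0.5, 1.0, 2.0, 3.0, 8.0):
    t = transmittance_from_absorbance(absorbance)
    print(f"A = {absorbance:3.1f}  ->  T = {t:.1e}  ({100.0 * t:.6f} %T)")

# Difference, on the transmittance scale, between A = 3 and A = 8.
gap = transmittance_from_absorbance(3.0) - transmittance_from_absorbance(8.0)
print(f"T(A=3) - T(A=8) = {gap:.6f}")
```

On a plot scaled from 0 to 100 %T, absorbances of 3 and 8 differ by only about a tenth of one percent transmittance, so both simply look like a band that reaches the baseline.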

After we identified the source of the discrepancy, the question became how to deal with it. If the saturation were confined to a single mixture, it might be possible to delete that mixture from the data set. Unfortunately, however, the saturated wavelength occurs in one of the pure component spectra. This circumstance means that the anomalous absorbance value will affect every sample that contains the chloroform and is subjected to CLS analysis.

The effect of this erroneous absorbance value is seen in Table II. First of all, the calculated chloroform value for the 5000–6500 cm⁻¹ spectral region is completely anomalous, being reduced so far below the value calculated for the other spectral regions that it was this anomaly that alerted us to the fact that a problem existed in the first place.

Second, the effect of that one very narrow wavelength range is so severe that it also reduced the chloroform value calculated using the full spectrum by an amount that would have alerted us to the existence of a problem, all by itself.

Additionally, even averaging together the anomalous values with the good values resulted in a value for the mean that is lower than what might have been expected. This discrepancy, however, is not so severe that it would have alerted us to a problem all by itself.

Finally, the calculated values in the 5000–6500 cm⁻¹ wavelength range for the other components are affected. The calculated value for toluene, especially, is anomalously high, being roughly 10% above the values calculated for the other spectral ranges.

Thus, if this anomalous spectrum were to be used for analysis of the mixtures, we would expect similar effects to occur in all the samples used in this experiment, with corresponding deleterious effects on the results and on the conclusions drawn from those results.

What to Do?

Actually, what we can do, and ultimately what we did, is a fairly common and straightforward operation. In this situation, however, because we are working with an algorithm relatively unknown to most of our readers, we will approach the actual solution by proposing a thought experiment that uses an algorithm to solve the problem, and then will take it to a limiting condition.

The basis of the algorithm we will propose is actually an extension of Table II itself. In Table II, we noted that several of the different wavelength ranges provide calculated results that are essentially the same for all of the different components, and only the wavelength range containing the anomalous data is detrimentally affected.

So now, instead of setting the various wavelength ranges to widths that each encompass a set of the absorbances from all samples (as we noted in reference 6), we can use other wavelength ranges. Presumably, if the entire spectrum is representative of the behavior of the mixture, then any (or almost any) spectral range of the spectrum will behave the same way and will provide CLS results that are pretty much the same, regardless of the actual location of that range in the spectrum.

If we increase the number and reduce the sizes of the wavelength ranges we test, then we can narrow down the anomaly to a smaller wavelength region than 1500 cm⁻¹. We can go further and subdivide the wavelength ranges again, to narrow down the anomalous region more closely.

We can repeat that procedure several times; however, there is a limit. Obviously, we cannot analyze single wavelengths, because there are three components and we cannot solve an equation with three unknowns using a single wavelength. Clearly, for a three-component mixture there must be a minimum of three wavelengths included in any such calculations. This provides a lower limit on the number of wavelengths in any given range, and therefore an upper limit on the number of ranges the spectrum can be divided into.
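The subdivision idea in this thought experiment can be sketched as a scan over equal-width windows. The data below are a toy model built on assumptions: smooth made-up curves stand in for the pure-component spectra, one of them carries a sharp band near 5912 cm⁻¹ that is clipped at 8 absorbance units when "recorded," and the window width and the 0.01 flagging threshold are arbitrary choices. The toy model is not meant to reproduce the size or sign of the effect in Table II, only to show how an anomalous window stands out from the rest.

```python
import numpy as np

# Synthetic axis and smooth made-up "pure component" curves (assumptions).
wavenumber = np.arange(4000.0, 10000.0, 2.0)      # 3000 points
true_pure = np.column_stack([
    1.0 + 0.5 * np.sin(wavenumber / 150.0),
    1.0 + 0.5 * np.cos(wavenumber / 210.0),
    1.0 + 0.5 * np.sin(wavenumber / 90.0),
])
# Give the second component one sharp, very strong band near 5912 cm^-1.
true_pure[:, 1] += 20.0 * np.exp(-0.5 * ((wavenumber - 5912.0) / 6.0) ** 2)

CLIP = 8.0
recorded_pure = np.minimum(true_pure, CLIP)       # pure spectra as recorded
true_fractions = np.array([0.25, 0.25, 0.50])
mixture = true_pure @ true_fractions              # stays below CLIP in this toy

WINDOW = 250                                      # points per window (500 cm^-1)
n_components = recorded_pure.shape[1]
# Three unknowns need at least three wavelengths in every window.
assert WINDOW >= n_components

# Fit each window separately with the recorded (clipped) pure spectra.
estimates = []
for start in range(0, wavenumber.size - WINDOW + 1, WINDOW):
    rows = slice(start, start + WINDOW)
    coeffs, *_ = np.linalg.lstsq(recorded_pure[rows], mixture[rows], rcond=None)
    estimates.append(coeffs)
estimates = np.array(estimates)

# Flag windows whose estimates disagree with the consensus of all windows.
consensus = np.median(estimates, axis=0)
deviation = np.abs(estimates - consensus).max(axis=1)
flagged = [i for i, d in enumerate(deviation) if d > 0.01]

for i in flagged:
    low = wavenumber[i * WINDOW]
    high = wavenumber[(i + 1) * WINDOW - 1]
    print(f"window {i}: {low:.0f}-{high:.0f} cm^-1, estimates {np.round(estimates[i], 3)}")
```

Repeating the scan with narrower windows inside a flagged range would localize the anomaly further, down to the three-wavelength limit noted above.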

Continuing our thought experiment, what's the next step? What do we do once we've established which wavelengths are the anomalous ones?

The next step is to recombine the "good" data ranges, while leaving out the "bad" ones. That is, we perform the CLS calculations on only those ranges found to contain good data and reject the anomalous data from the calculations.

All of this is really just a very complicated way to show that we can simply delete those wavelengths containing data found to be anomalous from both the calculations and the spectra. We have already identified which wavelengths are the troublemakers by inspecting the spectra (see Figure 4). For safety's sake, we also would probably want to broaden the rejection region slightly to make sure that we are not including data in the calculations if that data is also being affected, even to a lesser amount, by whatever phenomenon is causing the defective data to saturate.
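A minimal sketch of such wavelength editing is given below, again on a toy model rather than the laboratory data. The synthetic curves, the clipping level used to locate the saturated points, and the two-point safety margin on each side are all assumptions chosen for illustration. As before, the toy model is not meant to reproduce the size or sign of the errors in Table II.

```python
import numpy as np

# Synthetic axis and smooth made-up "pure component" curves (assumptions).
wavenumber = np.arange(4000.0, 10000.0, 2.0)
true_pure = np.column_stack([
    1.0 + 0.5 * np.sin(wavenumber / 150.0),
    1.0 + 0.5 * np.cos(wavenumber / 210.0),
    1.0 + 0.5 * np.sin(wavenumber / 90.0),
])
# One sharp, very strong band in the second component near 5912 cm^-1.
true_pure[:, 1] += 20.0 * np.exp(-0.5 * ((wavenumber - 5912.0) / 6.0) ** 2)

CLIP = 8.0
recorded_pure = np.minimum(true_pure, CLIP)       # pure spectra as recorded
true_fractions = np.array([0.25, 0.25, 0.50])
mixture = true_pure @ true_fractions

# Locate the saturated points, then widen the rejected region by a margin.
saturated = recorded_pure.max(axis=1) >= CLIP
MARGIN = 2                                        # extra points on each side
kernel = np.ones(2 * MARGIN + 1, dtype=int)
rejected = np.convolve(saturated.astype(int), kernel, mode="same") > 0
keep = ~rejected


def cls_fit(rows):
    """Least-squares coefficients using only the selected wavelengths."""
    coeffs, *_ = np.linalg.lstsq(recorded_pure[rows], mixture[rows], rcond=None)
    return coeffs


before = cls_fit(np.ones(wavenumber.size, dtype=bool))   # all wavelengths
after = cls_fit(keep)                                    # edited wavelengths

# Reconstruct the edited mixture spectrum from the edited pure spectra.
residual = mixture[keep] - recorded_pure[keep] @ after

print("points rejected:       ", int(rejected.sum()))
print("CLS before editing:    ", np.round(before, 4))
print("CLS after editing:     ", np.round(after, 4))
print("max |residual| (edited):", float(np.abs(residual).max()))
```

The same mask is applied to both the pure-component spectra and the mixture spectrum, so the fit and the reconstruction are carried out on matching sets of wavelengths.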

Figure 5: Prediction of wavelength-edited mixture spectrum from the similarly wavelength-edited pure component spectrum, for a mixture containing 25% toluene, 25% chloroform, and 50% n-heptane.

The result of these considerations was that we decided to delete five spectral data points around the affected wavelength. To verify that this deletion did not adversely affect the utility of the CLS calculations, the algorithm was applied to the wavelength-edited data from the same ternary sample used for the calculations given in Table II, containing 25% toluene, 25% chloroform, and 50% n-heptane, and the spectrum was predicted as described previously (7). The results from this prediction of the wavelength-edited data are shown in Figure 5. It is clear from the almost perfect overlap of the spectra that the removal of the anomalous wavelengths did not adversely affect the calculations.

Table III: Computed CLS values for ternary mixture components, after wavelength editing

By comparing the entries in Table III with the corresponding entries in Table II, it is clear that removing those few wavelengths containing anomalous data brought the computed composition values into substantial agreement with each other.

On the other hand, we observe that the computed CLS values for this sample are still not in agreement with the known weight percent values. Table IV presents the comparison values for all 15 samples used by the second laboratory.

Table IV: Comparisons of gravimetric and spectroscopic values for the second laboratory

As we can see, the computed CLS concentrations do not agree with the known weight percent values, a set of results similar to the results obtained from the first experiment. Furthermore, the discrepancies are comparably large. See, for example, the chloroform values for sample 8; the discrepancy is more than 15%. Furthermore, a comparison of these results with those in Table III of part VIII (8) shows that while numerically different (because different materials were used), the gravimetric and spectral values for any given component in a given sample show the same relationships in the two tables. That is, if the spectrally computed concentration for toluene in one table is greater than the gravimetric value then it is also greater in the other table, and by a roughly proportional amount.

Our first concern, of course, was to again verify that nothing went wrong in the execution of the experiment, as we did for the first laboratory's data (8). Examination of the results showed the same lack of evidence for any experimental causes for the problem.

These observations lend support to the hypothesis that the same physical effects are operative in the two experiments, even though they are different numerically and we have not yet identified what they are.

In Our Next Installment

Finding that the experiments in the two different laboratories gave substantially the same results, we redoubled our efforts to determine the cause of the discrepancy between the spectral and reference concentrations. Serendipity helped lead to success.

Jerome Workman, Jr. serves on the Editorial Advisory Board of Spectroscopy and is the Executive Vice President of Engineering at Unity Scientific, LLC, (Brookfield, Connecticut). He is also an adjunct professor at U.S. National University (La Jolla, California), and Liberty University (Lynchburg, Virginia). His email address is JWorkman04@gsb.columbia.edu


Howard Mark serves on the Editorial Advisory Board of Spectroscopy and runs a consulting service, Mark Electronics (Suffern, New York). He can be reached via e-mail: hlmark@prodigy.net


References

(1) H. Mark and J. Workman, Spectroscopy 25(5), 16–21 (2010).

(2) H. Mark and J. Workman, Spectroscopy 25(6), 20–25 (2010).

(3) H. Mark and J. Workman, Spectroscopy 25(10), 22–31 (2010).

(4) H. Mark and J. Workman, Spectroscopy 26(2), 26–33 (2011).

(5) H. Mark and J. Workman, Spectroscopy 26(5), 12–22 (2011).

(6) H. Mark and J. Workman, Spectroscopy 26(6), 22–28 (2011).

(7) H. Mark and J. Workman, Spectroscopy 26(10), 24–31 (2011).

(8) H. Mark and J. Workman, Spectroscopy 27(2), 22–34 (2012).

(9) H. Mark and J. Workman, Spectroscopy 27(5), 14–19 (2012).

(10) CRC Handbook of Chemistry and Physics, 92nd Edition, W.M. Haynes, Ed. (Taylor & Francis, 2011), ISBN 1439855110.