Chemometrics in Spectroscopy


This “Chemometrics in Spectroscopy” column traces the historical and technical development of these methods, emphasizing their application in calibrating spectrophotometers to predict the chemical or physical properties of measured samples, particularly in near-infrared (NIR), infrared (IR), Raman, and atomic spectroscopy, and explores how AI and deep learning are reshaping the spectroscopic landscape.


This column continues our previous column, which described and explained some algorithms and data transforms beyond those most commonly used. We present and discuss algorithms that are rarely, if ever, seen or used in practice, despite having been proposed and described in the literature.


In this column and its successor, we describe and explain some algorithms and data transforms beyond those commonly used. We present and discuss algorithms that are rarely, if ever, used in practice, despite having been described in the literature. These comprise algorithms used in conjunction with continuous spectra, as well as those used with discrete spectra.

A newly discovered effect can introduce large errors into many multivariate spectroscopic calibration results. The classical least squares (CLS) algorithm can be used to explain this effect. Having identified it, we examine its consequences for calibrations built with principal component regression (PCR) and partial least squares (PLS).
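
As a concrete, hedged illustration of what a CLS calculation does, the short Python sketch below fits a synthetic mixture spectrum as a linear combination of two made-up pure-component spectra and recovers the mixture concentrations by least squares. All of the data, shapes, and variable names are invented for illustration; this is not the data or code from the column itself.

```python
import numpy as np

# Minimal classical least squares (CLS) sketch: a mixture spectrum is modeled
# as a linear combination of pure-component spectra, x ≈ c @ S, and the
# concentrations c are recovered by least squares. All data are synthetic.

rng = np.random.default_rng(0)
wl = np.linspace(0.0, 1.0, 200)               # arbitrary wavelength axis

# Two hypothetical pure-component spectra (Gaussian-shaped bands).
S = np.vstack([
    np.exp(-((wl - 0.3) ** 2) / 0.002),
    np.exp(-((wl - 0.7) ** 2) / 0.004),
])

# Synthetic mixture spectrum with known concentrations plus a little noise.
c_true = np.array([0.6, 0.4])
x_mix = c_true @ S + rng.normal(scale=0.01, size=wl.size)

# CLS estimate: solve min_c ||x_mix - S.T @ c||^2 by least squares.
c_hat, *_ = np.linalg.lstsq(S.T, x_mix, rcond=None)
print("true:", c_true, "estimated:", np.round(c_hat, 3))
```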

As we have previously discussed, the most time-consuming and bothersome issue associated with calibration modeling and the routine use of multivariate models for quantitative analysis in spectroscopy is the constant need for intercept (bias) or slope adjustments. These adjustments must be performed routinely for every product and every constituent model. For transfer and maintenance of multivariate calibrations, this procedure must be implemented continuously to maintain calibration prediction accuracy over time. Sample composition, reference values, within- and between-instrument drift, and operator differences may all contribute to variation over time. The problem is amplified when calibration transfer is attempted between instruments of somewhat different vintage or design. In this discussion we continue to delve into the issues that cause prediction error and bias and slope changes in quantitative spectroscopic calibrations.
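
To make the routine slope and bias adjustment concrete, the sketch below shows one common way such a correction can be done: predictions from an existing multivariate model are regressed against reference values for a small check set, and the fitted slope and intercept are then applied to subsequent predictions. The function names and numbers are hypothetical; they illustrate the idea rather than reproduce any particular instrument vendor's procedure.

```python
import numpy as np

# Hypothetical illustration of a slope/bias adjustment: model predictions
# are regressed against reference values for a check set, and the fitted
# slope and intercept are used to correct later predictions.

def fit_slope_bias(y_ref, y_pred):
    """Fit y_ref ≈ slope * y_pred + bias on a standardization/check set."""
    slope, bias = np.polyfit(y_pred, y_ref, deg=1)
    return slope, bias

def apply_slope_bias(y_pred_new, slope, bias):
    """Correct new predictions with the previously fitted slope and bias."""
    return slope * y_pred_new + bias

# Synthetic example: model output that has drifted by a constant offset
# and a mild slope error relative to the reference method.
y_ref = np.array([10.2, 11.5, 12.9, 14.1, 15.6])
y_pred = 0.95 * y_ref + 0.8          # simulated drifted model output

slope, bias = fit_slope_bias(y_ref, y_pred)
print("corrected:", np.round(apply_slope_bias(y_pred, slope, bias), 2))
```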

This column addresses the issue of degrees of freedom (df) for regression models. Using the larger df (e.g., n or n − 1) underestimates the standard error, while the smaller df (e.g., n − k − 1, with k factors or terms in the model) may overestimate it. It would seem one should use the same df for both the standard error of estimate (SEE) and the standard error of cross-validation (SECV), but what is a clear statistical explanation for selecting the appropriate df? It is a good time to raise this question once again, because there appears to be some confusion, even among experts, about the df appropriate to the various calibration and prediction situations: the standard error parameters should be comparable, and they depend on the number of independent samples, the data channels containing information (i.e., wavelengths or wavenumbers), and the number of factors or terms in the regression. By convention everyone could simply agree on a definition, but is there a more correct one that should be verified and discussed for each case? The problem is that the standard deviation is computed with different df and without a rigorous explanation, and too much emphasis is then placed on the actual numbers obtained for SEE and SECV rather than on properly computed confidence intervals. Note that confidence limit computations for the standard error have been discussed previously and are routinely derived in standard statistical texts (4).
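
As a small numerical illustration of why the choice of df matters, the sketch below computes the standard error from one fixed set of calibration residuals using divisors of n, n − 1, and n − k − 1. The residuals and the value of k are made up; the point is only the arithmetic, namely that the larger the df used, the smaller the reported standard error.

```python
import numpy as np

# The same residuals yield different "standard error" values depending on
# the divisor (n, n - 1, or n - k - 1, with k model terms/factors).
# Residuals and k below are invented numbers for illustration only.

residuals = np.array([0.12, -0.08, 0.05, -0.15, 0.09, -0.03, 0.11, -0.06])
n = residuals.size   # number of calibration samples
k = 3                # number of factors/terms in the regression (assumed)

ss = np.sum(residuals ** 2)
for label, df in [("n", n), ("n - 1", n - 1), ("n - k - 1", n - k - 1)]:
    print(f"df = {label:9s} -> standard error = {np.sqrt(ss / df):.4f}")
```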

We present the first of a short set of columns dealing with the subject of statistics. The current series is organized as a “top-down” view of the subject, as opposed to the usual approach in the literature (and in our own previous columns) of giving a “bottom-up” description of the multitude of equations that are encountered. We hope this different approach will give our readers a more coherent view of the subject and persuade them to undertake further study of the field.