
Explore the evolution of chemometrics in spectroscopy, celebrating 40 years of insights and mathematical exploration in this dynamic field.

This “Chemometrics in Spectroscopy” column traces the historical and technical development of these methods, emphasizing their application in calibrating spectrophotometers for predicting measured sample chemical or physical properties—particularly in near-infrared (NIR), infrared (IR), Raman, and atomic spectroscopy—and explores how AI and deep learning are reshaping the spectroscopic landscape.

This column continues our previous column, which describes and explains some algorithms and data transforms beyond those most commonly used. We present and discuss algorithms that are rarely, if ever, seen or used in practice, despite having been proposed and described in the literature.

By providing automated tools and guidance, an ECS would aim to streamline the calibration process, improve calibration transfer, enhance operator efficiency, and improve the overall consistency and reliability of analytical results produced using advanced chemometrics and machine learning techniques.

In this column and its successor, we describe and explain some algorithms and data transforms beyond those commonly used. We present and discuss algorithms that are rarely, if ever, used in practice, despite having been described in the literature. These comprise algorithms used in conjunction with continuous spectra, as well as those used with discrete spectra.

There is a variation of the MLR calibration algorithm that can reduce sensitivity to repacked sample measurements. We explore that MLR method here in detail.

A sample library of selected references discussing the application of artificial intelligence (AI) in analytical chemistry and molecular spectroscopy is presented.

Are you intrigued by artificial intelligence, but unsure what it really means for analytical chemistry? Read on.

The past decision to use binary representation in computer architectures affects the results of chemometric-based outputs, especially if different data values are used.
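As a minimal sketch of the kind of effect binary representation can have (an illustration using standard IEEE 754 double-precision behavior, not an example taken from the column itself), the following Python snippet shows that some decimal values cannot be stored exactly and that the order of summation can change a result:

```python
import math

# 0.1, 0.2, and 0.3 have no exact binary representation,
# so the stored sum differs slightly from the stored 0.3.
exact_match = (0.1 + 0.2 == 0.3)            # False
close_match = math.isclose(0.1 + 0.2, 0.3)  # True

# Summation order matters when magnitudes differ greatly:
# adding 1.0 to 1e16 is lost to rounding before the subtraction.
large_first = (1e16 + 1.0) - 1e16   # 0.0
cancel_first = (1e16 - 1e16) + 1.0  # 1.0

print(exact_match, close_match, large_first, cancel_first)
```

Comparing floating-point values with a tolerance, as `math.isclose` does, is one common way to avoid being misled by such differences.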

We examine how variations of the multiple linear regression (MLR) algorithm confer special properties on the model that the algorithm produces, and we critique the use of derivatives in calibration models.

Raw data produced by an NIR instrument undergo some sort of processing, or transformation, to make them easier to use. In this series, we explore options for that data transformation, starting with multiple linear regression (MLR).

The second in a two-part series highlighting key explanatory or tutorial references for each of 29 chemometric methods.

The carefully selected literature references in this curated set describe the application of 29 major chemometric methods used for analyzing molecular spectroscopy data.

Mathematics is a formal logic system, perhaps the ultimate formal logic system. Here we describe the elegance of the foundations of the mathematics that chemometrics is based on.

We explore how different algorithms and different numbers of factors affect the results.

We provide a scorecard of chemometric techniques used in spectroscopy. The tables and lists of reference sources given here provide an indispensable resource for anyone seeking guidance on understanding chemometric methods or choosing the most suitable approach for a given analysis problem.

A previous analysis of data is compared to the results achieved using classical least squares and principal component analysis. What did we learn?

Alignment of the instrument y-axis is a critical step for quantitative and qualitative measurements using spectroscopy. Here, we explain in detail how to use photometric standards for ultraviolet, visible, near infrared, infrared, and Raman spectroscopy.

A newly discovered effect can introduce large errors in many multivariate spectroscopic calibration results. The CLS algorithm can be used to explain this effect. Having found this effect, we investigate how it affects calibrations that use principal component regression (PCR) and partial least squares (PLS).

The use of reference materials to align or test the wavelength–wavenumber axis for optical spectroscopy is essential for quantitative and qualitative methods. This article provides details for using reference materials with ultraviolet, visible, near-infrared, infrared, and Raman spectroscopy methods.

What are the steps to take once an outlier is discovered? There are several options.

Calibration transfer involves several strategies and mathematical techniques for applying a single calibration database consisting of samples, reference data, and calibration equations to two or more instruments. In this installment, we review the chemometric and tactical strategies used for the calibration transfer process.

How can you detect the presence of an outlier when it is mixed with multiple other, similar, samples?

Calibration transfer involves multiple strategies and mathematical techniques for applying a single calibration database to two or more instruments. Here, we explain the methods to modify the spectra or regression vectors to correct differences between instruments.

Outliers are fundamentally a very fuzzy notion. Here, we try to clear up what outliers are and how they affect your data.

As we have previously discussed, the most time-consuming and bothersome issue associated with calibration modeling and the routine use of multivariate models for quantitative analysis in spectroscopy is the constant need for intercept (bias) or slope adjustments. These adjustments must be routinely performed for every product and each constituent model. For transfer and maintenance of multivariate calibrations, this procedure must be continuously implemented to maintain calibration prediction accuracy over time. Sample composition, reference values, within- and between-instrument drift, and operator differences may all cause variation over time. When calibration transfer is attempted using instruments of somewhat different vintage or design type, the problem is amplified. In this discussion of the problem, we continue to delve into the issues causing prediction error and bias and slope changes for quantitative calibrations using spectroscopy.

Part III of this series discusses the principle of least squares.

This column addresses the issue of degrees of freedom (df) for regression models. Using the larger df (e.g., n or n-1) underestimates the size of the standard error, and using the smaller df (e.g., n-k-1) possibly overestimates it. It seems one should use the same df for both SEE and SECV, but what is a clear statistical explanation for selecting the appropriate df? It is a good time to raise this question once again, because there seems to be some confusion among experts about the use of df for the various calibration and prediction situations. The standard error parameters should be comparable, and they are related to the total number of independent samples, the data channels containing information (i.e., wavelengths or wavenumbers), and the number of factors or terms in the regression. By convention everyone could just choose a definition, but is there a more correct one that should be verified and discussed for each case? The problem with this subject lies in computing the standard deviation using different df without a more rigorous explanation, and then putting an overemphasis on the actual number derived for SEE and SECV rather than on using properly computed confidence intervals. Note that confidence limit computations for standard error have been discussed previously and are routinely derived in standard statistical texts (4).
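To make the df question concrete, here is a small sketch that computes a standard error from the same residuals using n, n-1, and n-k-1 as the divisor. The residual values and the number of regression terms k are hypothetical, chosen only for illustration, and the helper name `standard_error` is our own:

```python
import math

def standard_error(residuals, df):
    """Square root of the residual sum of squares divided by the chosen df."""
    sum_sq = sum(r * r for r in residuals)
    return math.sqrt(sum_sq / df)

# Hypothetical calibration residuals (reference value minus predicted value).
residuals = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
n = len(residuals)  # number of samples
k = 2               # hypothetical number of terms in the regression

see_n = standard_error(residuals, n)            # divisor n
see_n1 = standard_error(residuals, n - 1)       # divisor n-1
see_nk1 = standard_error(residuals, n - k - 1)  # divisor n-k-1

print(f"df = n:     {see_n:.4f}")
print(f"df = n-1:   {see_n1:.4f}")
print(f"df = n-k-1: {see_nk1:.4f}")
```

With the residuals held fixed, the reported standard error grows as the divisor shrinks, which is why two values computed with different df are not directly comparable.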

This column continues the discussion of statistics that we began in Part I.

We present the first of a short set of columns dealing with the subject of statistics. This series is organized as a “top-down” view of the subject, as opposed to the usual approach in the literature (and in our own previous columns) of giving a “bottom-up” description of the multitude of equations that are encountered. We hope this different approach will succeed in giving our readers a more coherent view of the subject, as well as persuading them to undertake further study of the field.