This tutorial contrasts classical analytical error propagation with modern Bayesian and resampling approaches, including bootstrapping and jackknifing. Uncertainty estimation in multivariate calibration remains an unsolved problem in spectroscopy, as traditional, Bayesian, and resampling approaches yield differing error bars for chemometric models like PLS and PCR, highlighting the need for deeper theoretical and practical solutions.
Abstract
Spectroscopic calibration models, particularly those using multivariate regression methods, often operate in conditions where predictor variables are highly collinear, reference methods have their own measurement error, and sample matrices vary widely. These realities violate the assumptions of ordinary least squares (OLS) and complicate uncertainty estimation. This tutorial compares classical analytical error propagation with Bayesian and resampling methods such as bootstrapping and jackknifing. The discussion emphasizes interpreting coefficient standard errors, confidence intervals for predictions, and prediction uncertainty in contexts such as near-infrared (NIR), infrared (IR), Raman, and ultraviolet–visible (UV–vis) spectroscopy. Numerical examples and derivations are presented in matrix notation, enabling direct application by experienced practitioners.
1. Introduction
The key elements of this tutorial discussion are as follows:
1. Classical Error Propagation
2. Confidence and Prediction Intervals in PLS/PCR
3. Bayesian Framework
4. Resampling Methods
5. Numerical Examples
In chemometrics, “uncertainty” is a multi-faceted concept, spanning coefficient standard errors, confidence intervals for mean predictions, and prediction intervals for individual samples.
Regulatory and consensus standards bodies, such as the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), ASTM International (formerly the American Society for Testing and Materials), and AOAC International (formerly the Association of Official Analytical Chemists), increasingly require quantitative statements of uncertainty for spectroscopic methods, especially in Process Analytical Technology (PAT) and quality control (QC) applications.
In the idealized ordinary least squares (OLS) setting, assumptions such as independent, identically distributed, and Gaussian residuals allow closed-form solutions for regression coefficients, variances, and confidence intervals (1–4). In practice—particularly for NIR, Raman, and mid-infrared (MIR) spectroscopy—these assumptions are frequently violated. Multicollinearity among predictor variables inflates variance and destabilizes coefficient estimates. Finite calibration sets limit statistical power, leading to overfitting and poor generalization. Moreover, non-Gaussian noise (for example, impulsive outliers, skewed error distributions, heavy-tailed residuals, or uniform-type variability) violates the normality assumption, undermining the reliability of classical inferential statistics such as t-tests, F-tests, and prediction intervals (1–4).
To address these issues, chemometricians often turn to partial least squares (PLS) or principal component regression (PCR) to reduce dimensionality and mitigate multicollinearity. Ridge regression and other regularization techniques stabilize coefficient estimates by shrinking them toward zero. Robust regression methods, such as M-estimators and Huber-loss minimization, help manage outliers and non-Gaussian residuals. For multiway calibration and sample-specific standard errors, parallel factor analysis (PARAFAC)-based error propagation methods provide a structured approach (5). Finally, resampling approaches such as bootstrapping or cross-validation provide empirical error estimates and confidence intervals without relying strictly on OLS assumptions (6). Together, these methods allow more reliable calibration, prediction, and uncertainty estimation in real-world spectroscopic data.
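To make the shrinkage explicit, the ridge estimator can be written in matrix notation (λ ≥ 0 sets the regularization strength; λ = 0 recovers OLS):

\[
\hat{\beta}_{\text{ridge}} = (X^{T}X + \lambda I)^{-1}X^{T}y .
\]

Larger λ shrinks the coefficients toward zero, trading a small bias for a substantial reduction in variance when XᵀX is ill-conditioned.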
Despite the availability of traditional statistical tools and advanced chemometric methods, error estimation in spectroscopy remains an unsolved problem. Even robust approaches like PLS, ridge regression, or bootstrapping rely on assumptions about the underlying data structure, sampling representativeness, and noise characteristics that are often violated in practice. Spectroscopic measurements are influenced by complex sample heterogeneity, instrument drift, baseline variations, and subtle non-linearities, which are difficult to capture fully in calibration models. Furthermore, non-Gaussian and correlated noise, coupled with limited calibration sets, means that conventional variance formulas and prediction intervals frequently underestimate uncertainty. As a result, reported error bars and confidence intervals may not reflect true predictive performance, making reliable, generalizable error estimation an ongoing challenge in NIR, Raman, and MIR spectroscopy (1–4).
When a spectroscopic calibration model predicts that a sample contains 8.2% protein, the question arises: how certain are we about that number? Do we mean ±0.3%? ±1.5%? Is that bound symmetric or skewed? Will the same uncertainty apply to all future samples, or does it depend on where the sample lies in our spectroscopic calibration space?
For univariate regression, the answer comes from classical statistics. The mathematics of standard errors, t-distributions, and prediction intervals is well-established, and we can compute exact analytical expressions for both the uncertainty in the mean prediction and the uncertainty for individual observations.
Multivariate calibration—such as PLS applied to NIR, MIR, or Raman spectroscopy—complicates this picture considerably. Here, predictor variables (spectral data channels) are often highly collinear; the number of predictors may even exceed the number of samples; and latent variable extraction means that the fitted parameters are not directly estimated from independent variables. Moreover, spectral datasets often suffer from non-constant variance (heteroscedasticity), measurement errors in both predictor and response variables, and uncertainty from pre-processing steps like baseline correction or scatter normalization.
In spectroscopic terms, heteroscedasticity occurs when the measurement noise or residual variance changes across the spectral range or with analyte concentration, so some wavelengths or concentration levels are inherently noisier than others.
Chemometricians have long recognized that the traditional “error bars” drawn in textbooks do not directly translate to real-world spectroscopic models. Yet, regulatory agencies, quality control managers, and research scientists still require a defensible quantification of uncertainty.
Without it, predictions risk being over-trusted—or worse, misinterpreted—leading to flawed scientific conclusions or production errors.
To continue with this tutorial discussion, our aim is to equip advanced practitioners and graduate-level students with both the mathematical understanding and the practical skills to report prediction uncertainty with rigor, bridging the gap between statistical theory and spectroscopic reality.
2. Theory and Derivations
2.1 Classical OLS Error Propagation
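Because the full derivations are standard textbook material (1–4), only the key OLS identities are restated here in matrix notation, with X the n × p calibration matrix and y the vector of reference values:

\[
\hat{\beta} = (X^{T}X)^{-1}X^{T}y, \qquad \widehat{\operatorname{Var}}(\hat{\beta}) = s^{2}(X^{T}X)^{-1}, \qquad s^{2} = \frac{\lVert y - X\hat{\beta}\rVert^{2}}{n - p}.
\]

The variance of the mean prediction at a new spectrum x₀ is s² x₀ᵀ(XᵀX)⁻¹x₀. When XᵀX is ill-conditioned, as it is for collinear spectral channels, this term inflates and the coefficient estimates destabilize.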
2.2 PLS and PCR Considerations
Because latent variables are constructed to be orthogonal, the variance formulas above can be adapted, but the estimated degrees of freedom become less obvious. Small calibration sets often lead to underestimated uncertainty due to overfitting.
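A sketch of the adaptation for PCR, where y is regressed on the score matrix T (n × A, with orthogonal columns), gives inner coefficients and variances

\[
\hat{q} = (T^{T}T)^{-1}T^{T}y, \qquad \widehat{\operatorname{Var}}(\hat{q}) = s^{2}(T^{T}T)^{-1},
\]

with TᵀT diagonal by construction. For PLS, however, the scores themselves depend on y, so treating them as fixed predictors tends to understate the variance; this is the degrees-of-freedom ambiguity noted above.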
2.3 Bayesian Formulation
Bayesian credible intervals for predictions are obtained by integrating over the posterior distribution of the model parameters. This approach avoids singular-matrix problems when the priors are well chosen.
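As one standard concrete case (a sketch, assuming a Gaussian likelihood with noise variance σ² and a zero-mean Gaussian prior on β with covariance τ²I), the posterior and posterior predictive are available in closed form:

\[
\beta \mid y \sim \mathcal{N}(\mu_{n}, \Sigma_{n}), \qquad \Sigma_{n} = \left(\sigma^{-2}X^{T}X + \tau^{-2}I\right)^{-1}, \qquad \mu_{n} = \sigma^{-2}\,\Sigma_{n}X^{T}y,
\]
\[
\hat{y}_{0} \mid y \sim \mathcal{N}\!\left(x_{0}^{T}\mu_{n},\; \sigma^{2} + x_{0}^{T}\Sigma_{n}x_{0}\right).
\]

The prior term τ⁻²I keeps the matrix invertible even when XᵀX is singular, which is precisely how a well-chosen prior sidesteps the singularity problem.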
2.4 Resampling Methods
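The resampling idea can be made concrete with a short sketch: refit the model on resampled calibration sets and read the interval off the empirical distribution of predictions. The following is a minimal residual-bootstrap prediction-interval sketch, assuming scikit-learn's PLSRegression; the function name and default settings are illustrative choices, not a standard API.

```python
# Minimal sketch: residual-bootstrap prediction interval for a PLS model.
# Assumes scikit-learn; bootstrap_pls_pi and its defaults are illustrative.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def bootstrap_pls_pi(X, y, x_new, n_components=6, n_boot=500, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample calibration rows with replacement
        model = PLSRegression(n_components=n_components)
        model.fit(X[idx], y[idx])
        resid = y[idx] - model.predict(X[idx]).ravel()
        # add a resampled residual so the interval covers new-observation noise
        preds[b] = model.predict(x_new.reshape(1, -1)).ravel()[0] + rng.choice(resid)
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

A jackknife variant replaces the random resampling with systematic leave-one-out refits; a sketch appears with Example 2 below.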
3. Numerical Examples
Example 1: NIR Moisture Prediction (Collinear Case)
Synthetic NIR data: 100 wavelengths, 50 calibration samples, and a moisture reference with ±0.2% laboratory error. Note: in chemometrics, PI stands for prediction interval, the statistical range around a predicted value that quantifies the uncertainty of a single new observation, accounting for both the modeling error and the variability of the data. Unlike confidence intervals for model parameters, PIs reflect the expected spread of actual future measurements, providing a practical estimate of prediction uncertainty in spectroscopic calibrations.
To continue with our example:
When applying ordinary least squares (OLS) to the full spectral dataset, classical formulas for the 95% prediction interval (PI) dramatically overestimate uncertainty, giving a width about six times larger than the actual prediction error; this inflation is primarily due to strong multicollinearity among wavelengths. Using partial least squares (PLS) with six latent variables (LVs) stabilizes the model, and the resulting PI width closely matches a bootstrap-based empirical estimate, typically within ±10%, indicating more realistic error quantification. Bayesian PLS produces even narrower PIs, reflecting its incorporation of prior information to constrain predictions, while its empirical coverage deviates only slightly from the nominal 95%, falling in the 93–96% range, a balance of precision and statistical reliability.
In simpler terms, these results show how different modeling approaches handle uncertainty in spectroscopic predictions. OLS tries to use all wavelengths at once, but because many spectral variables are highly correlated, it “overestimates” how uncertain the predictions are—like assuming every tiny wiggle in the spectrum could wildly change the result. PLS compresses the spectral information into a few key latent variables, effectively reducing noise and redundancy, so the prediction intervals become much more realistic and match what we see if we repeatedly resample the data (bootstrap). Bayesian PLS goes a step further by incorporating prior knowledge about the likely behavior of the system, producing slightly narrower intervals while still capturing most of the true variation, giving a practical balance between being confident and not overestimating uncertainty.
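A sketch of how such a synthetic set can be generated and fed to the Section 2.4 bootstrap is shown below; all values are illustrative, and the spectra are driven by a few latent factors to induce the collinearity discussed above.

```python
# Illustrative synthetic "NIR" set: 50 samples x 100 wavelengths built from a
# few latent factors, plus a ±0.2%-scale reference error. Values are arbitrary.
# Assumes bootstrap_pls_pi from the Section 2.4 sketch is already in scope.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 50, 100, 6                        # samples, wavelengths, latent factors
scores = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, p))
X = scores @ loadings + 0.05 * rng.normal(size=(n, p))   # collinear spectra
beta = rng.normal(size=k)
y = scores @ beta + 0.2 * rng.normal(size=n)             # moisture with lab error

lo, hi = bootstrap_pls_pi(X, y, X[0], n_components=6)
print(f"95% bootstrap PI for sample 0: [{lo:.2f}, {hi:.2f}]")
```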
Example 2: Raman Model with Sparse Calibration
Raman spectra for an active pharmaceutical ingredient (API); 20 calibration samples; signal-to-noise ratio ≈ 50.
In this example, a Raman spectroscopic model was developed for an active pharmaceutical ingredient (API) using only 20 calibration samples, which is a relatively small dataset. The spectra had a signal-to-noise ratio of about 50, meaning the signal from the API was fairly clear compared to the background noise. Despite the limited number of samples, the model was able to extract meaningful chemical information and make predictions, but the small calibration set naturally increases uncertainty compared with larger datasets. Essentially, this demonstrates that even with sparse data, Raman spectroscopy can provide useful quantitative insights, though care must be taken in interpreting predictions because the model’s confidence and error estimates may be less precise than they would be with a more extensive calibration set.
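With only 20 calibration samples, a leave-one-out jackknife is a natural way to gauge prediction variability. Below is a minimal sketch under the same scikit-learn assumption as before; the function name and the choice of three latent variables are illustrative.

```python
# Minimal jackknife sketch: leave-one-out refits of a PLS model to estimate
# the standard error of a prediction. Function name is illustrative.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def jackknife_pred_se(X, y, x_new, n_components=3):
    n = X.shape[0]
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i             # drop one calibration sample
        model = PLSRegression(n_components=n_components)
        model.fit(X[keep], y[keep])
        preds[i] = model.predict(x_new.reshape(1, -1)).ravel()[0]
    # standard jackknife variance: (n - 1)/n times the sum of squared deviations
    return np.sqrt((n - 1) / n * np.sum((preds - preds.mean()) ** 2))
```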
4. Discussion
Key points:
1. Classical OLS formulas can badly misstate uncertainty when spectral predictors are strongly collinear (Example 1).
2. PLS and PCR intervals are more realistic, but they should be checked against resampling-based empirical estimates.
3. Bayesian intervals can be narrower and asymmetric; their empirical coverage should be verified.
4. Narrow intervals are not automatically better: coverage of the nominal percentage of true values is the real test.
Future Research Directions:
Future research in this area is moving toward making uncertainty estimates more realistic, reliable, and standardized. One priority is combining models of instrument-level noise (such as detector drift, baseline shifts, and wavelength calibration errors) with chemometric models, so that error bars reflect both what the instrument contributes and what the model contributes. Another direction is blending Bayesian methods, which naturally produce probability distributions for predictions, with bootstrap resampling, which tests model stability on repeated subsets of the data. Such hybrid approaches could become accepted ways of reporting uncertainty to regulators, who need consistent and transparent methods.
Researchers are also pushing uncertainty estimation into nonlinear models, such as kernel PLS and deep learning, which are powerful but often criticized for being “black boxes” without interpretable error estimates. Finally, there is a strong push to create formal standards, possibly under the International Organization for Standardization (ISO), to guide laboratories and instrument manufacturers in how to calculate, report, and validate uncertainty in multivariate spectroscopy. These steps will ensure that uncertainty estimates are not only scientifically sound but also broadly trusted and comparable across industries.
5. How to Illustrate These Concepts in Practice
The abstract equations and matrix derivations in this tutorial benefit from being paired with carefully chosen visuals that make the differences between methods concrete. One such figure could show a standard calibration plot with predicted concentration on the x-axis and reference concentration on the y-axis. Around the regression line, a narrow shaded band could represent the confidence interval for the mean prediction, while a much wider band would represent the prediction interval for individual measurements. A highlighted example could have both intervals drawn explicitly, so the reader can immediately see why the PI is always wider than the confidence interval (CI).
A prediction interval (PI) is always larger than a confidence interval (CI) because it accounts for both the uncertainty in the model and the natural variability of individual measurements, whereas a CI only reflects the uncertainty in estimating the average response. In other words, the CI tells you where the “true average” is likely to lie, while the PI must also cover the extra variation you would expect if you measure a new sample, so it has to be wider to capture that additional uncertainty.
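In classical OLS notation the relationship is explicit: at a new point x₀, the CI and PI half-widths differ only by the leading "1 +" term, which carries the new observation's own noise:

\[
\text{CI half-width} = t_{\alpha/2,\,n-p}\, s\sqrt{x_{0}^{T}(X^{T}X)^{-1}x_{0}}, \qquad
\text{PI half-width} = t_{\alpha/2,\,n-p}\, s\sqrt{1 + x_{0}^{T}(X^{T}X)^{-1}x_{0}}.
\]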
Another effective visual would pair a heatmap of the predictor correlation matrix from a synthetic NIR dataset with a bar chart of regression coefficient standard errors from both OLS and PLS models. The heatmap would make collinearity obvious, and the bar chart would make the inflated standard errors in OLS equally apparent.
A third figure could compare methods by showing average prediction interval widths and coverage probabilities for OLS, Bayesian regression, bootstrapping, and jackknifing. Here, the analyst would see that narrow intervals sometimes fail to capture the nominal percentage of true values, reinforcing that “narrow” is not synonymous with “better.”
Resampling approaches like bootstrap and jackknife would benefit from a process-flow diagram showing how data are resampled, models refitted, and predictions accumulated to form an empirical distribution. A histogram of bootstrap predictions with shaded percentile bounds would make the link between sampling and intervals explicit.
The Bayesian approach could be visualized as a posterior probability density curve for a prediction, with the central 95% credible interval shaded. Comparing this to symmetric classical prediction intervals would help the reader understand how Bayesian intervals can be asymmetric when the posterior distribution is skewed.
Finally, a layered variance diagram could illustrate that total prediction uncertainty comes from multiple sources: instrument noise at the base, reference method error above that, and model parameter variance on top. This visual reinforces that uncertainty is cumulative, not singular.
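If the layers are treated as approximately independent, which is an idealization because real error sources can correlate, the diagram corresponds to a simple sum of variances:

\[
\sigma^{2}_{\text{total}} \approx \sigma^{2}_{\text{instrument}} + \sigma^{2}_{\text{reference}} + \sigma^{2}_{\text{model}}.
\]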
References
(1) Martens, H.; Næs, T. Multivariate Calibration; Wiley: Chichester, 1989.
(2) Faber, N. M.; Kowalski, B. R. Improved Prediction Error Estimates for Multivariate Calibration by Correcting for the Measurement Error in the Reference Values. Appl. Spectrosc. 1997, 51 (5), 660–666.
(3) Faber, N. M. Improved Computation of the Standard Error in the Regression Coefficient Estimates of a Multivariate Calibration Model. Anal. Chem. 2000, 72 (19), 4675–4676. DOI: 10.1021/ac0001479
(4) Mark, H.; Workman, J. Jr. Chemometrics in Spectroscopy, 2nd ed.; Academic Press: Cambridge, MA, 2021.
(5) Olivieri, A. C.; Faber, N. M. Standard Error of Prediction in Parallel Factor Analysis of Three-Way Data. Chemom. Intell. Lab. Syst. 2004, 70, 75–82. DOI: 10.1016/j.chemolab.2003.10.005
(6) Efron, B.; Tibshirani, R. J. An Introduction to the Bootstrap, 1st ed.; Chapman & Hall/CRC: New York, 1994. DOI: 10.1201/9780429246593
_ _ _
This article was partially constructed with the assistance of a generative AI model and has been carefully edited and reviewed for accuracy and clarity.