Smarter Spectroscopy With a New Machine Learning Approach to Estimate Prediction Uncertainty

Key Takeaways

  • Quantile regression forest (QRF) enhances spectroscopic analysis by providing accurate predictions and quantifying prediction uncertainty, crucial for fields like agriculture and pharmaceuticals.
  • QRF uses decision-tree ensembles to capture the full conditional distribution of predictions, offering prediction intervals and sample-specific uncertainty estimates.

A new study demonstrates how a machine learning technique, quantile regression forest, can provide both accurate predictions and sample-specific uncertainty estimates from infrared spectroscopic data. The method was tested on soil and agricultural samples, highlighting its value for chemometric modeling.

New Machine Learning Approach to Estimate Prediction Uncertainty © By Hafiz-chronicles-stock.adobe.com


Introduction

Spectroscopy has long been used to analyze the chemical and physical properties of soils, food, and agricultural products. For decades, methods such as principal component regression and partial least squares regression have been the standard tools for modeling complex spectral data. But one challenge remains central: when a quantitative or qualitative model predicts a property from absorption spectra, how certain is that prediction (1–5)?

In their article, researchers A.M.C. Wadoux from the Sydney Institute of Agriculture, The University of Sydney, Australia, and L. Ramirez-Lopez from the Department of Forest Resources, University of Minnesota, USA, present a solution. They explore how a quantile regression forest (QRF), a machine learning (ML) algorithm based on random forest, can simultaneously deliver accurate predictions and quantify prediction uncertainty.

Why Prediction Uncertainty Matters

Uncertainty estimates are seldom included in ML models used in spectroscopy, especially in commercial applications. Yet, they are critical in fields such as agricultural analysis, food quality control, and pharmaceutical applications. Reliable uncertainty estimates help determine detection limits, guide regulatory decision-making, and ensure results can be used as inputs in further modeling (1).

As Wadoux and Ramirez-Lopez emphasize, previous approaches such as partial least squares regression could estimate uncertainty through bootstrapping, jackknifing, or error-based prediction intervals. However, these techniques do not easily extend to ML models. Quantile regression forest offers a pathway forward by using decision-tree ensembles to capture the full conditional distribution of predictions (1–5).

The Quantile Regression Forest (QRF) Method

Random forest (RF) is already a widely used ML technique in chemometrics. It builds multiple decision trees from bootstrap samples of the original dataset and averages their results to produce predictions. QRF, as proposed by Meinshausen (2), modifies this framework by retaining the full distribution of responses within the trees rather than only their mean. This allows the calculation of prediction intervals and provides a sample-specific uncertainty estimate alongside each prediction (1).
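To make the idea concrete, the following is a minimal sketch of a quantile regression forest written in plain Python. It is an illustration only, not the authors' implementation or any library's API: the function names (`grow`, `fit_qrf`, `predict_quantiles`), the parameter defaults, and the synthetic data are all assumptions made for this example. Each tree is grown on a bootstrap sample, every training sample is then assigned to the leaf it reaches, and a new sample's quantiles are read from the weighted distribution of the training responses that share its leaves.

```python
import random

def sse(y, rows):
    """Sum of squared deviations of y[rows] from their mean."""
    mean = sum(y[i] for i in rows) / len(rows)
    return sum((y[i] - mean) ** 2 for i in rows)

def grow(X, y, rows, rng, mtry, min_leaf):
    """Recursively grow one regression tree on the (bootstrap) rows."""
    node = {"members": []}  # filled later with training indices (leaves only)
    if len(rows) < 2 * min_leaf:
        return node
    best = None
    for j in rng.sample(range(len(X[0])), mtry):  # random feature subset
        order = sorted(rows, key=lambda i: X[i][j])
        for k in range(min_leaf, len(order) - min_leaf + 1):
            a, b = X[order[k - 1]][j], X[order[k]][j]
            if a == b:
                continue  # cannot split between identical values
            cost = sse(y, order[:k]) + sse(y, order[k:])
            if best is None or cost < best[0]:
                best = (cost, j, a, order[:k], order[k:])
    if best is not None:
        _, j, threshold, left, right = best
        node["feature"], node["threshold"] = j, threshold
        node["left"] = grow(X, y, left, rng, mtry, min_leaf)
        node["right"] = grow(X, y, right, rng, mtry, min_leaf)
    return node

def find_leaf(node, x):
    """Follow the splits until a node without a split (a leaf) is reached."""
    while "feature" in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node

def fit_qrf(X, y, n_trees=50, mtry=1, min_leaf=5, seed=0):
    """Grow trees on bootstrap samples and keep the responses per leaf."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = grow(X, y, boot, rng, mtry, min_leaf)
        # Assumption of this sketch: every training sample is dropped down
        # the tree, so each leaf remembers which responses it contains.
        for i in range(n):
            find_leaf(tree, X[i])["members"].append(i)
        forest.append(tree)
    return forest

def predict_quantiles(forest, y, x, quantiles=(0.05, 0.5, 0.95)):
    """Weighted empirical quantiles of the training responses for sample x."""
    weights = [0.0] * len(y)
    for tree in forest:
        members = find_leaf(tree, x)["members"]
        for i in members:
            weights[i] += 1.0 / (len(members) * len(forest))
    order = sorted(range(len(y)), key=lambda i: y[i])
    result = []
    for q in quantiles:
        cumulative = 0.0
        for i in order:
            cumulative += weights[i]
            if cumulative >= q - 1e-12:  # small tolerance for rounding
                result.append(y[i])
                break
    return result

if __name__ == "__main__":
    # Synthetic one-feature data whose noise grows with x (illustrative only).
    rng = random.Random(1)
    X = [[rng.random()] for _ in range(150)]
    y = [2 * x[0] + rng.gauss(0, 0.05 + 2 * x[0]) for x in X]
    forest = fit_qrf(X, y, n_trees=25)
    for x_new in ([0.1], [0.9]):
        low, median, high = predict_quantiles(forest, y, x_new)
        print(x_new, "90% interval width:", high - low)
```

In this toy setting the interval is expected to be wider where the simulated noise is larger, which is the sample-specific behavior described above. For real spectra, an established implementation would be preferable to this sketch.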

In this study, QRF was tested using two public datasets of infrared spectroscopic measurements (1):

  • Soil properties: measured from near-infrared spectra, focusing on cation exchange capacity and total organic carbon.
  • Agricultural produce: specifically, the dry matter content of mangoes, based on visible and near-infrared spectra.

Results from Spectroscopic Analysis

According to the study, the QRF model produced highly accurate predictions across all tested datasets, often comparable to or better than results reported in earlier literature. Importantly, the algorithm could generate prediction intervals that reflected varying levels of confidence depending on sample characteristics (1).

For example, some values, especially those near detection limits, produced wider prediction intervals, indicating greater uncertainty. The model’s capacity to provide this distributional information is significant for spectroscopic workflows that require both prediction and reliability assessment (1).

However, the validation step revealed that the uncertainty was generally overestimated, particularly when interval widths were large. Despite this, the 90% prediction interval, a commonly reported metric, was found to be suitably accurate. The authors recommend QRF for operational applications and encourage its inclusion in future spectroscopy software developments (1).
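The article does not detail the validation metrics used in the study. One common way to check whether prediction intervals are too wide or too narrow is to compare their empirical coverage on held-out samples with the nominal level. The sketch below assumes that approach; the function name `interval_coverage` and the numbers are hypothetical and chosen only for illustration.

```python
def interval_coverage(y_true, lower, upper):
    """Fraction of observed values that fall inside their prediction interval."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)

# Hypothetical held-out values and interval bounds (not from the study).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [0.5, 1.5, 3.2, 3.0, 4.0]
upper = [1.5, 2.5, 4.0, 5.0, 6.0]

coverage = interval_coverage(y_true, lower, upper)
print(f"Empirical coverage: {coverage:.2f}")
```

For a nominal 90% interval, an empirical coverage well above 0.90 suggests the intervals are wider than necessary (uncertainty overestimated), while a value well below 0.90 suggests they are too narrow.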

Implications for Chemometrics and Beyond

The findings underline the importance of incorporating uncertainty analysis into ML models used in spectroscopy. As Wadoux and Ramirez-Lopez conclude, QRF represents a practical option for delivering high prediction accuracy while also quantifying uncertainty. This combination could benefit soil science, agriculture, food technology, pharmaceuticals, and any field relying on infrared spectroscopic analysis. By making prediction reliability transparent, QRF offers a more complete framework for using spectroscopy as both a scientific and operational tool (1).

References

(1) Wadoux, A. M. C.; Ramirez-Lopez, L. Uncertainty of Predictions in Absorption Spectroscopy: Modelling with Quantile Regression Forest. Chemom. Intell. Lab. Syst. 2025, 105473. DOI: 10.1016/j.chemolab.2025.105473.

(2) Meinshausen, N. Quantile Regression Forests. J. Mach. Learn. Res. 2006, 7 (6), 983–999. https://www.jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf (accessed 2025-08-20).

(3) Andrade-Garda, J. The Basics of Univariate and Multivariate Calibration. In Basic Chemometrics for Analytical Chemists; Garda, J. M. A., Leardi, R., Eds.; World Scientific, 2025; p 31.

(4) Chiappini, F. A.; Alcaraz, M. R.; Forzani, L. A Bootstrap-Assisted Methodology for the Estimation of Prediction Uncertainty in Multilayer Perceptron-Based Calibration. Anal. Chim. Acta 2025, 1353, 343954. DOI: 10.1016/j.aca.2025.343954.

(5) Carneiro, H. V.; Celani, C. P.; Booksh, K. S. The Classification Limit of Detection: Estimating Sample-Level Classification Uncertainty in Spectroscopy Using Monte Carlo Error Propagation of Spectral Noise. J. Chemom. 2025, 39 (7), e70048. DOI: 10.1002/cem.70048.
