A recent tutorial describes statistical tests for comparing the predictive performance of chemometric prediction rules.
In chemometric analysis, spectroscopists use data to generate quantitative and qualitative predictions, which correspond to regression and classification problems, respectively. To do this effectively, analysts need to know how to run statistical tests that compare the predictive performance of two or more prediction rules. According to author Tom Fearn of University College London, some published papers have used chemometric analysis ineffectively to support weakly founded claims (1). His paper, published in the Journal of Chemometrics, describes statistical tests that allow the analyst to compare the predictive performance of two or more prediction rules (1).
Chemometrics, a field at the intersection of chemistry, statistics, and mathematics, often involves developing methods to generate analyte prediction rules. These rules convert raw spectroscopic or chromatographic data into meaningful analyte predictions, either as quantitative (regression) or qualitative (classification) predictions (1). Fearn's research highlights the importance of rigorously testing these rules to ensure that any claimed improvements in predictive accuracy are statistically significant and not mere artifacts of overfitting or other biases (1). Because chemometrics has been fused with multivariate and high-dimensional statistics, Fearn’s examination of prediction rules takes on greater significance (2).
Fearn’s study focuses on the distinction between a method and a prediction rule. Methods like partial least squares regression (PLSR) or linear discriminant analysis (LDA) are general approaches, whereas a prediction rule is a specific, fixed recipe derived from these methods to convert input data into predictions (1).
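The distinction can be sketched in a few lines of Python. This is only an illustration, not code from the paper: univariate least squares stands in for a method such as PLSR, and the function names and calibration values are hypothetical.

```python
from statistics import mean

def fit_univariate_least_squares(x_train, y_train):
    """The method: a general recipe that turns training data into a rule."""
    x_bar, y_bar = mean(x_train), mean(y_train)
    sxx = sum((x - x_bar) ** 2 for x in x_train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    def prediction_rule(x_new):
        """The prediction rule: slope and intercept are now fixed."""
        return intercept + slope * x_new

    return prediction_rule

# Hypothetical calibration data: absorbance readings and reference concentrations.
absorbance = [0.10, 0.20, 0.30, 0.40, 0.50]
concentration = [1.1, 1.9, 3.2, 3.9, 5.1]

rule = fit_univariate_least_squares(absorbance, concentration)
prediction = rule(0.35)  # the rule is applied to a new sample without refitting
```

Statistical comparisons of the kind Fearn describes are made between fixed rules such as `rule`, evaluated on samples that played no part in fitting them.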
In the paper, Fearn addresses a common scenario in chemometrics research: the development of a new prediction method that is claimed to outperform existing ones. Often, the difference in performance is modest, making it essential to apply statistical tests to ensure these differences are not overstated (1). Fearn's work provides a comprehensive guide to these tests, enabling researchers to implement them without needing to delve into complex statistical literature (1).
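A small classification example shows why a modest difference needs a formal test. The sketch below uses the exact form of McNemar's test, a standard choice for comparing two classifiers on the same test samples; treating it as representative of the tests in the paper is an assumption, and the counts are invented for illustration.

```python
from math import comb

def mcnemar_exact_p(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired correct/incorrect flags."""
    # Only samples on which the two rules disagree carry information.
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        return 1.0
    k = min(a_only, b_only)
    # Under the null hypothesis each disagreement favours either rule
    # with probability 1/2, so the smaller count is Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical test set of 100 samples: rule A is right on 90, rule B on 92.
# Layout: 86 both right, 4 only A right, 6 only B right, 4 both wrong.
correct_a = [True] * 86 + [True] * 4 + [False] * 6 + [False] * 4
correct_b = [True] * 86 + [False] * 4 + [True] * 6 + [False] * 4

p_value = mcnemar_exact_p(correct_a, correct_b)
```

With these invented counts the p-value is about 0.75, so a 92% versus 90% accuracy difference on 100 samples provides no real evidence that rule B is better.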
Fearn’s paper is organized into two main sections. One section examines quantitative predictions, and the other explores qualitative predictions. In both, he discusses two primary validation approaches: using a separate test set and cross-validation. The validity of these comparisons hinges on ensuring that the samples used for prediction have not been seen during the training phase (1). This prevents overfitting, where a model performs well on training data but poorly on new, unseen data. For quantitative analysis, Fearn recommends testing bias and variance separately, with the variance comparison allowing for correlation between the two sets of prediction errors (3).
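As a rough sketch of testing bias and variance separately, the code below computes a paired t statistic for the difference in bias and a Pitman-Morgan statistic for the difference in error variance. The Pitman-Morgan test is a standard way to compare variances of correlated errors; whether it matches the exact procedure in references (1) and (3) is an assumption, and the error values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def bias_t_statistic(errors_a, errors_b):
    """Paired t statistic for equal mean error (bias); returns (t, df)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

def variance_t_statistic(errors_a, errors_b):
    """Pitman-Morgan t statistic for equal error variance; returns (t, df).

    The covariance of (a + b) and (a - b) equals var(a) - var(b), so the
    variances are equal exactly when sums and differences are uncorrelated.
    """
    sums = [a + b for a, b in zip(errors_a, errors_b)]
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(sums)
    s_bar, d_bar = mean(sums), mean(diffs)
    cov = sum((s - s_bar) * (d - d_bar) for s, d in zip(sums, diffs))
    ss_s = sum((s - s_bar) ** 2 for s in sums)
    ss_d = sum((d - d_bar) ** 2 for d in diffs)
    r = cov / sqrt(ss_s * ss_d)
    return r * sqrt((n - 2) / (1 - r ** 2)), n - 2

# Hypothetical prediction errors (predicted minus reference) of two rules
# on the same six test samples.
errors_a = [0.2, -0.1, 0.3, -0.2, 0.1, 0.0]
errors_b = [0.1, -0.1, 0.2, -0.1, 0.1, 0.0]

t_bias, df_bias = bias_t_statistic(errors_a, errors_b)
t_var, df_var = variance_t_statistic(errors_a, errors_b)
```

Each statistic is referred to a Student t distribution with the returned degrees of freedom, using a table or a statistics library (for example `scipy.stats.t.sf`). Pairing matters because both rules are evaluated on the same samples, so their errors are usually correlated.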
Fearn also cautions against a common pitfall in predictive modeling: using the same data for both tuning and comparing prediction rules. This can lead to misleadingly favorable results for more complex methods with more adjustable parameters, which can adapt too closely to the data used for tuning (1). To avoid this, Fearn stresses the importance of using genuinely unseen validation or test samples for final performance comparisons (1).
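One simple way to keep tuning and final comparison apart is a three-way split of the samples. The sketch below is a minimal illustration with hypothetical names and split fractions; it assumes the samples are independent, which may not hold when a data set contains replicate measurements of the same material.

```python
import random

def three_way_split(n_samples, frac_train=0.6, frac_tune=0.2, seed=0):
    """Return disjoint index lists for training, tuning, and final testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(frac_train * n_samples)
    n_tune = round(frac_tune * n_samples)
    train = indices[:n_train]
    tune = indices[n_train:n_train + n_tune]
    # Whatever remains is held back and used only for the final comparison.
    test = indices[n_train + n_tune:]
    return train, tune, test

train_idx, tune_idx, test_idx = three_way_split(100)
```

Adjustable parameters, such as the number of latent variables, would be chosen using `tune_idx` only, and the competing rules would then be compared once on `test_idx`.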
The practical implications of this research are significant for the chemometrics community. By following the statistical tests and validation procedures outlined by Fearn, researchers can more reliably compare prediction rules, leading to more robust and trustworthy advancements in the field. This work not only helps in distinguishing meaningful improvements from statistical noise, but it also promotes best practices in the development and assessment of predictive models.
(1) Fearn, T. Testing Differences in Predictive Ability: A Tutorial. J. Chemom. 2024, ASAP. DOI: 10.1002/cem.3549
(2) Wu, W.; Herath, A. Chemometrics and Predictive Modelling. In Nonclinical Statistics for Pharmaceutical and Biotechnology Industries; Zhang, L., Ed.; Statistics for Biology and Health; Springer: Cham, 2016. DOI: 10.1007/978-3-319-23558-5_25
(3) Fearn, T. Comparing Standard Deviations. NIR News 1996, 7 (5), 5–6.