A recent tutorial describes statistical tests for comparing the predictive performance of chemometric prediction rules.
In chemometric analysis, spectroscopists work with data to generate both quantitative and qualitative predictions to solve regression and classification problems. To accomplish this task effectively, analysts need to know how to run statistical tests for comparing the predictive performance of two or more prediction rules. According to author Tom Fearn of University College London, some published papers have claimed improvements in predictive performance on the basis of flimsy evidence (1). His paper, published in the Journal of Chemometrics, describes a few statistical tests that allow the analyst to compare the predictive performance of two or more prediction rules (1).
Chemometrics, a field at the intersection of chemistry, statistics, and mathematics, often involves developing methods to generate analyte prediction rules. These rules convert raw spectroscopic or chromatographic data into meaningful analyte predictions, either as quantitative (regression) or qualitative (classification) predictions (1). Fearn's research highlights the importance of rigorously testing these rules to ensure that any claimed improvements in predictive accuracy are statistically significant and not mere artifacts of overfitting or other biases (1). Because chemometrics has been fused with multivariate and high-dimensional statistics, Fearn’s examination of prediction rules takes on greater significance (2).
Fearn’s study focuses on the distinction between a method and a prediction rule. Methods like partial least squares regression (PLSR) or linear discriminant analysis (LDA) are general approaches, whereas a prediction rule is a specific, fixed recipe derived from these methods to convert input data into predictions (1).
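The distinction can be illustrated with a minimal sketch. Here a one-predictor least-squares fit stands in for a method such as PLSR (an assumption made for brevity, not something taken from the paper), and the function names `fit_line` and `rule` are hypothetical. The method is the fitting procedure; the prediction rule is the fixed function it returns once the coefficients are frozen.

```python
def fit_line(x_train, y_train):
    """The *method*: ordinary least squares with a single predictor.

    Returns a *prediction rule*: a function whose coefficients are fixed
    and no longer depend on any data it is later applied to.
    """
    n = len(x_train)
    mean_x = sum(x_train) / n
    mean_y = sum(y_train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def rule(x):
        # The fixed recipe: input value in, prediction out.
        return intercept + slope * x

    return rule


# Illustrative (made-up) calibration data.
rule = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
prediction = rule(5.0)
```

Applying the same method to a different training set would produce a different rule, which is why the tests in the paper compare specific rules rather than methods in general.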
In the paper, Fearn addresses a common scenario in chemometrics research: the development of a new prediction method that is claimed to outperform existing ones. Often, the difference in performance is modest, making it essential to apply statistical tests to ensure these differences are not overstated (1). Fearn's work provides a comprehensive guide to these tests, enabling researchers to implement them without needing to delve into complex statistical literature (1).
Fearn’s paper is organized into two main sections: one examines quantitative predictions, and the other explores qualitative predictions. In both, he discusses two primary validation approaches: using a separate test set and cross-validation. The validity of these comparisons hinges on ensuring that the samples used for prediction have not been seen during the training phase (1). Otherwise, an overfitted model, one that performs well on training data but poorly on new, unseen data, can appear better than it really is. For quantitative analysis, Fearn recommends testing bias and variance separately, comparing the error variances in a way that allows for correlation between the two sets of prediction errors (3).
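As a rough sketch of what testing bias and variance separately can look like, the code below applies two textbook tests to the errors that two rules make on the same test samples: a paired t-test on the mean errors, and the Pitman-Morgan test, which compares the variances of two correlated sets of errors. The choice of these two tests is an assumption on our part; the article does not state which procedures the paper uses, and the function names are hypothetical. Each function returns a t statistic and its degrees of freedom, to be referred to Student's t distribution for a p-value.

```python
import math


def paired_bias_test(errors_a, errors_b):
    """Paired t-test for a difference in bias (mean error) between two rules.

    errors_a and errors_b are prediction errors on the SAME test samples.
    Returns (t statistic, degrees of freedom).
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_diff / math.sqrt(var_diff / n)
    return t_stat, n - 1


def correlated_variance_test(errors_a, errors_b):
    """Pitman-Morgan test for equal variances of two correlated error sets.

    Uses the identity cov(a + b, a - b) = var(a) - var(b): the variances are
    equal exactly when the sums and differences are uncorrelated.
    Returns (t statistic, degrees of freedom); a negative statistic means
    rule A has the smaller error variance.
    """
    sums = [a + b for a, b in zip(errors_a, errors_b)]
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(sums)
    mean_s = sum(sums) / n
    mean_d = sum(diffs) / n
    ssd = sum((s - mean_s) * (d - mean_d) for s, d in zip(sums, diffs))
    sss = sum((s - mean_s) ** 2 for s in sums)
    sdd = sum((d - mean_d) ** 2 for d in diffs)
    r = ssd / math.sqrt(sss * sdd)
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    return t_stat, n - 2


# Illustrative (made-up) errors of two rules on the same eight test samples.
errors_a = [0.5, -0.4, 0.3, -0.6, 0.2, 0.1, -0.3, 0.4]
errors_b = [1.2, -0.9, 0.5, -1.4, 0.6, 0.0, -0.8, 1.1]

t_bias, df_bias = paired_bias_test(errors_a, errors_b)
t_var, df_var = correlated_variance_test(errors_a, errors_b)
```

Both tests work on paired errors, so they take into account that two rules applied to the same samples tend to make correlated mistakes. A test that treats the two error sets as independent would ignore that correlation.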
Fearn also cautions against a common pitfall in predictive modeling: using the same data for both tuning and comparing prediction rules. This can lead to misleadingly favorable results for more complex methods with more adjustable parameters, which can adapt too closely to the data used to tune them (1). To avoid this, Fearn stresses the importance of using genuinely unseen validation or test samples for final performance comparisons (1).
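One simple way to keep tuning and final comparison apart is to divide the samples into three disjoint groups before any modeling begins. The sketch below is an illustration only: the function name `three_way_split`, the default fractions, and the use of a purely random split are assumptions, not recommendations from the paper.

```python
import random


def three_way_split(n_samples, tune_fraction=0.2, test_fraction=0.2, seed=0):
    """Randomly partition sample indices into training, tuning, and test sets.

    training: used to fit each candidate prediction rule
    tuning:   used to choose adjustable parameters (e.g. number of factors)
    test:     touched once, for the final comparison of the finished rules
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(n_samples * test_fraction))
    n_tune = int(round(n_samples * tune_fraction))
    test = indices[:n_test]
    tune = indices[n_test:n_test + n_tune]
    train = indices[n_test + n_tune:]
    return train, tune, test


train_idx, tune_idx, test_idx = three_way_split(50)
```

Because the test indices play no part in fitting or tuning, a more flexible method gains no hidden advantage when the finished rules are compared on them.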
The practical implications of this research are significant for the chemometrics community. By following the statistical tests and validation procedures outlined by Fearn, researchers can more reliably compare prediction rules, leading to more robust and trustworthy advancements in the field. This work not only helps in distinguishing meaningful improvements from statistical noise, but it also promotes best practices in the development and assessment of predictive models.
(1) Fearn, T. Testing Differences in Predictive Ability: A Tutorial. J. Chemom. 2024, ASAP. DOI: 10.1002/cem.3549
(2) Wu, W.; Herath, A. Chemometrics and Predictive Modelling. In Nonclinical Statistics for Pharmaceutical and Biotechnology Industries; Zhang, L., Ed.; Statistics for Biology and Health; Springer: Cham, 2016. DOI: 10.1007/978-3-319-23558-5_25
(3) Fearn, T. Comparing Standard Deviations. NIR News 1996, 7 (5), 5–6.