Comparing Two Chemometric Methods for Generating Prediction Rules: A Recent Study

May 29, 2024

A recent study examined statistical tests for comparing the predictive performance of prediction rules generated by chemometric methods.

In chemometric analysis, spectroscopists use data to generate quantitative and qualitative predictions, which solve regression and classification problems, respectively. To do this effectively, analysts need to know how to run statistical tests that compare the predictive performance of two or more prediction rules. According to author Tom Fearn of University College London, some papers have used chemometric analysis ineffectively to support flimsy claims (1). His paper, published in the Journal of Chemometrics, describes a few statistical tests that allow the analyst to compare the predictive performance of two or more prediction rules (1).

Chemometrics, a field at the intersection of chemistry, statistics, and mathematics, often involves developing methods that generate prediction rules. These rules convert raw spectroscopic or chromatographic data into meaningful analyte predictions, which may be quantitative (regression) or qualitative (classification) (1). Fearn's research highlights the importance of rigorously testing these rules to ensure that any claimed improvements in predictive accuracy are statistically significant and not mere artifacts of overfitting or other biases (1). Because chemometrics has been fused with multivariate and high-dimensional statistics, Fearn's examination of prediction rules takes on greater significance (2).

Fearn’s study focuses on the distinction between a method and a prediction rule. Methods like partial least squares regression (PLSR) or linear discriminant analysis (LDA) are general approaches, whereas a prediction rule is a specific, fixed recipe derived from these methods to convert input data into predictions (1).
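The distinction can be illustrated with a short Python sketch. It uses ordinary least squares on a single predictor as a simple stand-in for a method such as PLSR; the function name `fit_line` and the calibration values are hypothetical and do not come from Fearn's paper.

```python
from typing import Callable, Sequence


def fit_line(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Method: a general recipe that turns training data into a prediction rule."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def rule(x_new: float) -> float:
        # Prediction rule: the coefficients are frozen once fitting is done.
        return intercept + slope * x_new

    return rule


# Hypothetical calibration data (instrument response, analyte concentration).
rule_a = fit_line([0.1, 0.2, 0.3, 0.4], [1.0, 2.1, 2.9, 4.0])
# Same method, different training data: a different prediction rule.
rule_b = fit_line([0.1, 0.2, 0.3], [1.0, 2.1, 2.9])

print(rule_a(0.25), rule_b(0.25))
```

Under this view, the statistical tests compare fixed rules such as `rule_a` and `rule_b`, not the method that produced them.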

In the paper, Fearn addresses a common scenario in chemometrics research: the development of a new prediction method that is claimed to outperform existing ones. Often, the difference in performance is modest, making it essential to apply statistical tests to ensure these differences are not overstated (1). Fearn's work provides a comprehensive guide to these tests, enabling researchers to implement them without needing to delve into complex statistical literature (1).
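For classification rules, one standard test of this kind is McNemar's exact test, which looks only at the test samples on which the two rules disagree. The article does not say which tests Fearn's paper describes, so the choice of test here is an assumption, and the function name and labels below are hypothetical.

```python
import math


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar test for two classifiers scored on the same samples.

    Returns (only_a, only_b, p_value), where only_a counts samples that rule A
    classifies correctly and rule B does not, and only_b counts the reverse.
    """
    triples = list(zip(truth, pred_a, pred_b))
    only_a = sum(1 for t, a, b in triples if a == t and b != t)
    only_b = sum(1 for t, a, b in triples if a != t and b == t)
    n = only_a + only_b
    if n == 0:
        # The rules never disagree, so there is no evidence of a difference.
        return only_a, only_b, 1.0
    # Under H0 each disagreement favors either rule with probability 0.5.
    k = min(only_a, only_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


# Hypothetical labels for ten test samples.
truth = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
pred_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(mcnemar_exact(truth, pred_a, pred_b))
```

Samples that both rules get right, or both get wrong, carry no information about which rule is better, which is why they drop out of the calculation.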

Fearn's paper is organized into two main sections: one on quantitative predictions and one on qualitative predictions. In both, he discusses two primary validation approaches: a separate test set and cross-validation. The validity of these comparisons hinges on ensuring that the samples used for prediction were not seen during the training phase (1). Otherwise the comparison can reward overfitting, where a model performs well on training data but poorly on new, unseen data. For quantitative analysis, Fearn recommends testing bias and variance separately, with the variance comparison allowing for correlation between the two sets of errors (3).
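As a minimal Python sketch of the quantitative case, the functions below compute a t statistic for bias and a Pitman-Morgan style t statistic for comparing the variances of two correlated sets of errors. The article does not give the exact formulas used in Fearn's paper, so the choice of test, the function names, and the error values are assumptions made for illustration.

```python
import math
import statistics


def bias_t(errors):
    """t statistic and degrees of freedom for H0: mean error (bias) is zero."""
    n = len(errors)
    standard_error = statistics.stdev(errors) / math.sqrt(n)
    return statistics.fmean(errors) / standard_error, n - 1


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_variance_t(errors_a, errors_b):
    """t statistic and degrees of freedom for H0: equal error variances.

    Both rules predict the same samples, so their errors are correlated.
    Because Cov(a + b, a - b) = Var(a) - Var(b), equal variances is the same
    as zero correlation between the per-sample sums and differences.
    """
    n = len(errors_a)
    sums = [a + b for a, b in zip(errors_a, errors_b)]
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    r = pearson_r(sums, diffs)
    return r * math.sqrt((n - 2) / (1 - r * r)), n - 2


# Hypothetical prediction errors of two rules on the same six test samples.
errors_a = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3]
errors_b = [0.2, -0.5, 0.6, -0.3, 0.5, -0.4]

print(bias_t(errors_a), bias_t(errors_b))
print(paired_variance_t(errors_a, errors_b))
```

Each function returns a statistic together with its degrees of freedom, to be compared against a Student's t critical value. A negative variance statistic here means the first rule has the smaller error variance.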

Fearn also cautions against a common pitfall in predictive modeling: using the same data for both tuning and comparing prediction rules. This can lead to misleadingly favorable results for more complex methods, whose larger number of adjustable parameters lets them adapt too closely to the data used for tuning (1). To avoid this, Fearn stresses the importance of using genuinely unseen validation or test samples for final performance comparisons (1).
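One way to keep tuning and final comparison apart is a three-way split of the samples, sketched below in Python. The function name, the default fractions, and the seed are hypothetical choices for illustration, not recommendations from Fearn's paper.

```python
import random


def three_way_split(n_samples, tune_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle sample indices and cut them into train, tuning, and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_frac))
    n_tune = int(round(n_samples * tune_frac))
    test = indices[:n_test]
    tune = indices[n_test:n_test + n_tune]
    train = indices[n_test + n_tune:]
    return train, tune, test


train_idx, tune_idx, test_idx = three_way_split(50)

# train_idx: fit each candidate rule.
# tune_idx:  choose adjustable settings (for example, the number of factors).
# test_idx:  used once, only for the final comparison of the tuned rules.
print(len(train_idx), len(tune_idx), len(test_idx))
```

Because the test indices play no part in fitting or tuning, a more flexible method gains no hidden advantage in the final comparison.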

The practical implications of this research are significant for the chemometrics community. By following the statistical tests and validation procedures outlined by Fearn, researchers can more reliably compare prediction rules, leading to more robust and trustworthy advancements in the field. This work not only helps in distinguishing meaningful improvements from statistical noise, but it also promotes best practices in the development and assessment of predictive models.

References

(1) Fearn, T. Testing Differences in Predictive Ability: A Tutorial. J. Chemom. 2024, ASAP. DOI: 10.1002/cem.3549

(2) Wu, W.; Herath, A. Chemometrics and Predictive Modelling. In Nonclinical Statistics for Pharmaceutical and Biotechnology Industries; Zhang, L., Ed.; Statistics for Biology and Health; Springer: Cham, 2016. DOI: 10.1007/978-3-319-23558-5_25

(3) Fearn, T. Comparing Standard Deviations. NIR News 1996, 7 (5), 5–6.
