Sample Selection Methods Tested Against Near-Infrared Spectral Information Entropy

March 25, 2024

Scientists from East China Jiaotong University, located in Nanchang, Jiangxi, China, recently tested different sample selection methods using near-infrared (NIR) spectral information entropy as a similarity criterion. Their findings were published in the Journal of Chemometrics (1).

Near-infrared (NIR) spectroscopy has been applied to a wide variety of tasks in the past few years. The technique has been used to predict the harvest times of cabernet sauvignon grapes, detect Covid-19, and analyze emission lines from a supernova (in tandem with mid-infrared [MIR] spectroscopy) (2–4). When using NIR, constructing models and keeping them updated are both essential. During model construction in machine learning, the sample set is usually divided into a calibration set and a validation set. The representativeness of the calibration set and the reasonable distribution of the validation set both affect the accuracy of the established model. Additionally, when maintaining and updating models, selecting the most informative update samples can not only improve prediction accuracy but also reduce the amount of sample preparation required.

For this study, the researchers proposed spectral information entropy (SIE) as a similarity criterion for dividing sample sets and used the same criterion to select update samples. Two established methods served as comparisons for evaluating the proposed method: the Kennard–Stone (KS) method, which splits data into training and test sets based on a distance metric between data points, spectra, or labels, and the sample set partitioning based on joint x–y distance (SPXY) method (5).
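For readers unfamiliar with the two comparison methods, the following is a minimal Python sketch of distance-based sample selection, not the code used in the study. It assumes Euclidean distances and the commonly described form of each method: KS starts from the two most distant samples and repeatedly adds the sample farthest from those already chosen, while SPXY applies the same procedure to the sum of range-normalized distances in the spectra (x) and in the reference values (y). The function names `pairwise_dist` and `select_samples` are hypothetical.

```python
import numpy as np


def pairwise_dist(a):
    """Euclidean distance matrix between the rows of `a` (1-D input is treated as one column)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def select_samples(X, n_select, y=None):
    """Return indices of `n_select` samples chosen by max-min distance.

    With y=None this is a Kennard-Stone style selection on the spectra X.
    With y given, distances in X and y are each divided by their maximum
    and summed, which is the usual description of SPXY (an assumption here,
    since the article does not spell out the formula).
    """
    dist = pairwise_dist(X)
    n = dist.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    if y is not None:
        dy = pairwise_dist(y)
        dist = dist / dist.max() + dy / dy.max()

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # For every sample, track its distance to the nearest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -np.inf  # never pick a sample twice

    while len(selected) < n_select:
        k = int(np.argmax(min_dist))  # farthest from the current selection
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[k] = -np.inf
    return selected


if __name__ == "__main__":
    # Toy one-variable "spectra"; real NIR data would have many wavelengths.
    X = [[0.0], [1.0], [2.0], [10.0]]
    print(select_samples(X, 3))                           # KS-style selection
    print(select_samples(X, 3, y=[0.0, 5.0, 0.0, 0.0]))   # SPXY-style selection
```

The selected indices would typically form the calibration set, with the remaining samples used for validation. The full distance matrix is built in memory for clarity, which is only practical for modest sample counts.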

The model built after dividing the sample set with SIE showed better prediction performance than the models built from sample sets divided with KS and SPXY. When predicting soluble solid content (SSC) and hardness, the coefficient of determination of prediction (R2P) improved by more than 15%, while the root mean square error (RMSE) of prediction was reduced by 50%. Regarding model updating, it was found that selecting a small number of update samples using SIE can improve the correlation coefficient of prediction (RP) by more than 80%, with the updated models having higher prediction accuracies than those updated using the KS and SPXY methods. These results suggest that SIE can make the NIR analysis technique more reliable.
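The figures of merit quoted above can be computed from reference and predicted values on a prediction set. The sketch below uses the standard textbook definitions, which are an assumption here: the article does not state the exact formulas used in the paper, and some chemometrics work reports the squared correlation coefficient as R2P instead. The function name `prediction_metrics` is hypothetical.

```python
import numpy as np


def prediction_metrics(y_true, y_pred):
    """Compute RMSE of prediction, coefficient of determination, and correlation coefficient."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred

    # Root mean square error of prediction.
    rmsep = float(np.sqrt(np.mean(resid ** 2)))

    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2p = 1.0 - ss_res / ss_tot

    # Pearson correlation coefficient between reference and predicted values.
    rp = float(np.corrcoef(y_true, y_pred)[0, 1])

    return {"RMSEP": rmsep, "R2P": r2p, "RP": rp}


if __name__ == "__main__":
    # Made-up values for illustration only, not data from the study.
    reference = [10.2, 11.5, 9.8, 12.1, 10.9]
    predicted = [10.0, 11.9, 9.5, 12.4, 11.0]
    print(prediction_metrics(reference, predicted))
```

Note that a constant offset between predictions and reference values leaves RP at 1 while lowering R2P, which is one reason the two statistics are reported separately.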

References

(1) Liu, Y.; He, C.; Jiang, X. Sample Selection Method Using Near-Infrared Spectral Information Entropy as Similarity Criterion for Constructing and Updating Peach Firmness and Soluble Solids Content Prediction Models. J. Chemom. 2023, 38 (2), e3528. DOI: https://doi.org/10.1002/cem.3528

(2) Luo, Y.; Zhao, J.; Zhu, H.; Li, X.; Dong, J.; Sun, J. Prediction of the Harvest Time of Cabernet Sauvignon Grapes Using Near-Infrared Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/prediction-of-the-harvest-time-of-cabernet-sauvignon-grapes-using-near-infrared-spectroscopy (accessed 2024-3-25)

(3) Acevedo, A. Detecting Covid-19 Using Visible or Near-Infrared Spectroscopy and Machine Learning. Spectroscopy 2023. https://www.spectroscopyonline.com/view/detecting-covid-19-using-visible-or-near-infrared-spectroscopy-and-machine-learning (accessed 2024-3-25)

(4) Wetzel, W. Observing Supernova 1987A with Near-infrared and Mid-infrared Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/observing-supernova-1987a-with-near-infrared-and-mid-infrared-spectroscopy (accessed 2024-3-25)

(5) The Kennard-Stone Algorithm. NIRPY Research 2022. https://nirpyresearch.com/kennard-stone-algorithm/ (accessed 2024-3-25)
