Scientists from East China Jiaotong University in Nanchang, Jiangxi, China, recently compared different sample selection methods, using near-infrared (NIR) spectral information entropy as a similarity criterion. Their findings were published in the Journal of Chemometrics (1).
Near-infrared (NIR) spectroscopy has been applied to a wide variety of tasks in the past few years. The technique has been used to predict the harvest times of Cabernet Sauvignon grapes, detect Covid-19, and analyze emission lines from a supernova (in tandem with mid-infrared [MIR] spectroscopy) (2–4). When using NIR, both constructing a model and maintaining and updating it afterward are essential. In machine learning, model construction usually involves dividing a sample set into a calibration set and a validation set. The representativeness of the calibration set and the reasonable distribution of the validation set both affect the accuracy of the established model. Additionally, when maintaining and updating models, selecting the most informative update samples can not only improve prediction accuracy but also reduce the amount of sample preparation required.
In this study, spectral information entropy (SIE) was proposed as a similarity criterion for dividing sample sets, and the same criterion was used to select update samples. Two established methods served as comparisons for verifying the proposed approach: the Kennard–Stone (KS) method, which splits data into training and test sets based on a distance metric between data points, spectra, or labels, and the sample set partitioning based on joint x–y distance (SPXY) method (5).
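As an illustration of the kind of distance-based split that KS performs, here is a minimal sketch in Python. It assumes Euclidean distance between spectra, and the function name `kennard_stone` and its parameters are hypothetical names chosen for this example. It is not the code used in the study, and it does not implement the SIE criterion, whose definition is not given here.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select calibration samples chosen by a
    Kennard-Stone style max-min distance rule (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 2 <= n_select <= n_samples:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all spectra (rows of X).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # For every sample, track its distance to the nearest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -np.inf  # never pick a sample twice

    # Repeatedly add the sample that is farthest from the selected set.
    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
        min_dist[selected] = -np.inf

    return selected
```

Under these assumptions, the returned indices would form the calibration set and the remaining samples the validation set. SPXY follows the same selection loop but, as its name indicates, uses a joint distance over both the spectra (x) and the reference values (y).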
The model built after dividing the sample set with SIE showed better prediction performance than the models built from sample sets divided with KS and SPXY. When predicting soluble solids content (SSC) and hardness, the determination coefficient of prediction (R2P) improved by over 15%, while the root mean square error (RMSE) of prediction was reduced by 50%. Regarding model updating, the researchers found that selecting a small number of update samples using SIE can improve the correlation coefficient of prediction (RP) by more than 80%, with the updated models achieving higher prediction accuracies than those updated with the KS and SPXY methods. These results indicate that SIE can make the NIR analysis technique more reliable.
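For readers less familiar with these figures of merit, the sketch below shows one common way to compute a determination coefficient and an RMSE on a prediction set. The function names are hypothetical, and the definitions are the standard textbook ones. The paper may define R2P differently (for example, as a squared correlation coefficient), so treat this as an assumption.

```python
import numpy as np

def r2_prediction(y_true, y_pred):
    """Coefficient of determination on the prediction set (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

def rmse_prediction(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

A higher R2P and a lower RMSE on the validation set both point to a better model, which is the direction of the changes reported above.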
(1) Liu, Y.; He, C.; Jiang, X. Sample Selection Method Using Near-Infrared Spectral Information Entropy as Similarity Criterion for Constructing and Updating Peach Firmness and Soluble Solids Content Prediction Models. J. Chemom. 2023, 38 (2), e3528. DOI: https://doi.org/10.1002/cem.3528
(2) Luo, Y.; Zhao, J.; Zhu, H.; Li, X.; Dong, J.; Sun, J. Prediction of the Harvest Time of Cabernet Sauvignon Grapes Using Near-Infrared Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/prediction-of-the-harvest-time-of-cabernet-sauvignon-grapes-using-near-infrared-spectroscopy (accessed 2024-3-25)
(3) Acevedo, A. Detecting Covid-19 Using Visible or Near-Infrared Spectroscopy and Machine Learning. Spectroscopy 2023. https://www.spectroscopyonline.com/view/detecting-covid-19-using-visible-or-near-infrared-spectroscopy-and-machine-learning (accessed 2024-3-25)
(4) Wetzel, W. Observing Supernova 1987A with Near-infrared and Mid-infrared Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/observing-supernova-1987a-with-near-infrared-and-mid-infrared-spectroscopy (accessed 2024-3-25)
(5) The Kennard-Stone Algorithm. NIRPY Research 2022. https://nirpyresearch.com/kennard-stone-algorithm/ (accessed 2024-3-25)