Identification of Different Dairy Products Using Raman Spectroscopy Combined with Fused Lasso Distributionally Robust Logistic Regression



Raman spectroscopy has recently become more widely used for quality detection in dairy products. Because it can rapidly analyze small sample sizes with high-dimensional data, its use in the dairy industry has become an active research topic. To improve the robustness and accuracy of the logistic regression identification method, we propose a new Raman spectroscopy identification method that combines distributionally robust optimization and the fused lasso with logistic regression. Raman spectra of two types of dairy products were then collected and used in anti-jamming identification tests to verify the effectiveness of the new method. The experimental results show that the proposed method is more robust and has higher recognition accuracy than traditional logistic regression.

Dairy products are an important food source and are consumed regularly around the world. Recently, driven by economic interests, sales of counterfeit dairy products have increased, seriously endangering the life and health of consumers. It has therefore become imperative to efficiently identify and detect adulterants in dairy products, such as milk powder, to ensure dairy product safety, a matter of vital significance in China today. Among the various detection technologies, Raman spectroscopic imaging is one of the few that can perform both rapid micro-region analysis and large-area scanning, and its high resolution and high throughput give it great potential for analyzing adulterants in milk powder. Raman spectral analysis is based on the Raman scattering effect discovered by the Indian scientist C. V. Raman. It analyzes scattered light whose frequency differs from that of the incident light to obtain information on molecular vibration and rotation, and it is applied to the study of molecular structure.
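The frequency difference mentioned above is usually reported as a Raman shift in wavenumbers. The relation below is the standard textbook definition, not a formula taken from this article; the symbols $\lambda_0$ (excitation wavelength) and $\lambda_s$ (scattered wavelength) are introduced here for illustration.

```latex
\Delta\tilde{\nu}\,[\mathrm{cm^{-1}}]
  = 10^{7}\left(\frac{1}{\lambda_0\,[\mathrm{nm}]} - \frac{1}{\lambda_s\,[\mathrm{nm}]}\right)
```

For Stokes scattering, $\lambda_s > \lambda_0$, so the shift is positive. As an illustration only, assuming a 785 nm excitation laser (the article does not state the laser wavelength), a Stokes peak at 1000 cm$^{-1}$ would correspond to scattered light at about 852 nm.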

Figure 1 (available in the PDF link below) shows the Raman spectra of two different substances. The differences between the Raman spectra of F and I are mainly reflected in the Stokes peaks, and the algorithm proposed in this paper identifies different material components mainly by extracting the characteristics of these peak segments.

Raw Raman spectral data have several characteristic features: a small sample size, several distinct peaks, and a large amount of noise interference. It is therefore particularly important to pretreat the spectrum to remove noise while preserving the Stokes peaks. Because of noise, small sample sizes, and the sequential correlation of features, existing methods for analyzing and applying Raman spectroscopy face many challenges, such as noise interference, low efficiency, and difficulty in identification. To solve these problems, we propose a fused lasso logistic regression model based on distributionally robust optimization, which uses the idea of robust optimization to weaken the influence of noise. The fused lasso extracts the key features and preserves the ordering among features, which highlights the spectral peaks. We also designed an algorithm for the model whose iterations have a simple closed form, making it convenient to apply in practice.
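To make the role of the fused lasso term concrete, the following is a minimal sketch of a fused lasso penalized logistic regression objective. It assumes the standard fused lasso form (an L1 term on the coefficients plus an L1 term on differences between neighboring coefficients) and labels in {-1, +1}. The article excerpt does not give the exact formulation, so the function names and the parameters `lam_sparse` and `lam_fuse` are hypothetical, and the distributionally robust part of the model is not shown.

```python
import math


def logistic_loss(weights, bias, spectra, labels):
    """Mean logistic loss over all spectra; labels are assumed to be -1 or +1."""
    total = 0.0
    for x, y in zip(spectra, labels):
        margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
        # log(1 + exp(-margin)), evaluated in a numerically stable way
        if margin > 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(labels)


def fused_lasso_penalty(weights, lam_sparse, lam_fuse):
    """Sparsity term plus a term penalizing jumps between adjacent coefficients."""
    sparsity = sum(abs(w) for w in weights)
    jumps = sum(abs(b - a) for a, b in zip(weights, weights[1:]))
    return lam_sparse * sparsity + lam_fuse * jumps


def objective(weights, bias, spectra, labels, lam_sparse=0.1, lam_fuse=0.1):
    """Nominal (non-robust) objective: data fit plus fused lasso penalty."""
    return (logistic_loss(weights, bias, spectra, labels)
            + fused_lasso_penalty(weights, lam_sparse, lam_fuse))
```

Because adjacent coefficients correspond to adjacent Raman shifts, the second penalty term favors coefficient vectors that are constant over contiguous spectral segments, which is one way to read the statement that the method preserves the ordering among features and highlights peak segments.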

Dairy products have long faced problems such as adulteration and counterfeiting, drug residues, contaminants, and the excessive use of food additives and illegal additives. In contrast to laboratory testing methods, which are costly and slow, Raman spectroscopy is non-destructive, efficient, and pollution-free, which is why it has been used in several food analysis applications. For example, Raman spectroscopy has been used for real-time detection of the quality of different types of cooking oil under frying conditions, and to classify and certify 70 samples of Spanish peppers (1,2). Ríos-Reina and others used Raman spectroscopy to distinguish categories of Spanish Protected Designation of Origin wine vinegar (3). Raman spectroscopy has also been used to classify and identify Spanish gasoline (4), and to characterize the exhaust soot of diesel and gasoline engines under different laser powers (5). Five fat samples were identified from their Raman spectra using a kernel principal component analysis nearest-neighbor model (6).

Because the signal is sensitive to experimental conditions, the analysis of Raman spectroscopy data is affected by interference factors such as noise, baseline drift, overlapping peaks, cosmic rays, and fluorescence. Researchers have made many attempts in recent years to address these problems. One study used the correlation coefficients of wavelets at different levels to identify noise and denoise Raman spectra (7). Angeyo and Gari carried out a correlation analysis between the spectra of seaweed samples and traditional inductively coupled plasma (ICP) spectra (1), using fully cross-validated partial least squares (PLS) regression and the nonlinear iterative PLS algorithm. Principal component analysis (PCA) combined with linear discriminant analysis (LDA) was used for multivariate statistical analysis to distinguish the Raman spectra of different blood groups (8). However, most Raman spectroscopy applications in dairy products are still limited to qualitative detection, and quantitative analysis is still hampered by interference factors such as poor reproducibility and the susceptibility of the signal to experimental conditions. Infrared (IR) spectroscopy was combined with chemometrics to detect quality parameters in milk powder (9). Fourier transform Raman spectroscopy was explored as a fast and reliable screening method for assessing milk powder quality and identifying whey adulteration (10,11). For the ripening process of Swiss-type cheese, PLS regression and artificial neural networks were used to model the relationship between spectral profiles and hardness values (12).

With the development of machine learning, a quantitative analysis method for surface-enhanced Raman spectroscopy (SERS) of polycyclic aromatic hydrocarbons (PAHs) was established using PCA dimensionality reduction and the support vector machine (SVM) algorithm (13). A self-adaptive genetic algorithm that dynamically adjusts the crossover and mutation probabilities was proposed for point-by-point wavelength selection in the terahertz absorption spectra of mixtures (14). The extremely randomized trees model was used to accurately match spectra over the entire spectral range to their respective minerals (15). A dynamic spectrum matching method based on convolutional Siamese neural networks was developed (16). Sha and others studied the uniformity of rice flours of four different particle sizes using relative standard deviation analysis of Raman spectra and hierarchical cluster analysis (17). A robust calibration model based on a support vector machine algorithm was designed for spectroscopic monitoring of blood glucose (18). Most studies have shown that Raman spectroscopy is broadly applicable to rapid, non-destructive quality testing of dairy products. However, advanced methods for analyzing Raman spectra in dairy product applications still need to be explored.

The contributions of this paper are as follows. First, we propose a new Raman spectroscopy recognition model that combines the fused lasso term, distributionally robust optimization, and logistic regression. Second, we use optimization theory to transform the distributionally robust optimization model, which cannot be solved directly, into a tractable form. Third, we develop an effective algorithm for solving the transformed model, in which the outer loop is the cutting plane method and the inner loop is the Variant Auxiliary Problem Principle (VAPP) method.

To view this article in its entirety, you can access the PDF of the full article below.



(1) Angeyo, H. K.; Gari, S. Direct rapid quality assurance analysis of complex matrix materials: A chemometrics enabled energy dispersive X-ray fluorescence and scattering spectrometry application. Appl. Radiat. Isot. 2022, 110274. DOI: 10.1016/j.apradiso.2022.110274

(2) Campmajó, G.; Saurina, J.; Núñez, O.; et al. Differential mobility spectrometry coupled to mass spectrometry (DMS–MS) for the classification of Spanish PDO paprika. Food Chem. 2022, 390, 133141. DOI: 10.1016/j.foodchem.2022.133141

(3) Ríos-Reina, R.; Elcoroaristizabal, S.; Ocaña-González, J. A.; et al. Characterization and authentication of Spanish PDO wine vinegars using multidimensional fluorescence and chemometrics. Food Chem. 2017, 230, 108–116. DOI: 10.1016/j.foodchem.2017.02.118

(4) Ardila, J. A.; Soares, F. L. F.; dos Santos Farias, M. A.; et al. Characterization of gasoline by Raman spectroscopy with chemometric analysis. Anal. Lett. 2017, 50 (7), 1126–1138. DOI: 10.1080/00032719.2016.1210616

(5) Ge, H.; Ye, Z.; He, R. Raman spectroscopy of diesel and gasoline engine-out soot using different laser power. J. Environ. Sci. 2019, 79, 74–80. DOI: 10.1016/j.jes.2018.11.001

(6) Wang, H.; Song, C.; Liu, J.; et al. Authenticity identification and adulteration analysis of milk powder based on Raman spectroscopy-pattern recognition method. Spectrosc. Spectr. Anal. 2017, 37 (1), 124–128.

(7) Ehrentreich, F.; Sümmchen, L. Spike removal and denoising of Raman spectra by wavelet transform methods. Anal. Chem. 2001, 73 (17), 4364–4373. DOI: 10.1021/ac0013756

(8) Lin, D.; Zheng, Z.; Wang, Q.; et al. Label-free optical sensor based on red blood cells laser tweezers Raman spectroscopy analysis for ABO blood typing. Opt. Express 2016, 24 (21), 24750–24759. DOI: 10.1364/OE.24.024750

(9) Coitinho, T. B.; Cassoli, L. D.; Cerqueira, P. H. R.; et al. Adulteration identification in raw milk using Fourier transform infrared spectroscopy. J. Food Sci. Technol. 2017, 54 (8), 2394–2402. DOI: 10.1007/s13197-017-2680-y

(10) Almeida, M. R.; Oliveira, K. D. S.; Stephani, R.; et al. Fourier-transform Raman analysis of milk powder: a potential method for rapid quality screening. J. Raman Spectrosc. 2011, 42 (7), 1548–1552. DOI: 10.1002/jrs.2893

(11) Mabood, F.; Jabeen, F.; Hussain, J.; et al. FT-NIRS coupled with chemometric methods as a rapid alternative tool for the detection & quantification of cow milk adulteration in camel milk samples. Vib. Spectrosc. 2017, 92, 245–250. DOI: 10.1016/j.vibspec.2017.07.004

(12) Vásquez, N.; Magán, C.; Oblitas, J.; et al. Comparison between artificial neural network and partial least squares regression models for hardness modeling during the ripening process of Swiss-type cheese using spectral profiles. J. Food Eng. 2018, 219, 8–15. DOI: 10.1016/j.jfoodeng.2017.09.008

(13) Chen, Y.; Yan, X.; Zhang, X.; et al. Surface-enhanced Raman spectroscopy quantitative analysis of polycyclic aromatic hydrocarbon based on support vector machine algorithm. Chin. J. Lasers 2019, 46 (3), 1–8. DOI: 10.3788/CJL201946.0311005

(14) Li, Z.; Guan, A.; Ge, H.; et al. Wavelength selection of amino acid THz absorption spectra for quantitative analysis by a self-adaptive genetic algorithm and comparison with mwPLS. Microchem. J. 2017, 132, 185–189. DOI: 10.1016/j.microc.2017.02.002

(15) Sevetlidis, V.; Pavlidis, G. Effective Raman spectra identification with tree-based methods. J. Cult. Herit. 2019, 37, 121–128. DOI: 10.1016/j.culher.2018.10.016

(16) Liu, J.; Gibson, S. J.; Mills, J.; et al. Dynamic spectrum matching with one-shot learning. Chemometrics Intell. Lab. Syst. 2019, 184, 175–181. DOI: 10.1016/j.chemolab.2018.12.005

(17) Sha, M.; Gui, D.; Zhang, Z.; et al. Evaluation of sample pretreatment method for geographic authentication of rice using Raman spectroscopy. J. Food Meas. Charact. 2019, 13, 1705–1712. DOI: 10.1007/s11694-019-00087-7

(18) Barman, I.; Kong, C. R.; Dingari, N. C.; et al. Development of robust calibration models using support vector machines for spectroscopic monitoring of blood glucose. Anal. Chem. 2010, 82 (23), 9719–9726. DOI: 10.1021/ac101754n

(19) Zhao, L.; Zhu, D. L. On iteration complexity of a first-order primal-dual method for nonlinear convex cone programming. J. Oper. Res. Soc. China 2022, 10 (1), 53–87. DOI: 10.1007/s40305-021-00344-x

Xiang Xu, Wentao Xiao, Yiyun Cao, and Zhengyong Zhang are with the School of Management Science and Engineering at Nanjing University of Finance and Economics, in Nanjing, China. Direct correspondence to:
