Investigating Food Purity Using Raman Spectroscopy Combined with Machine Learning

Determining the quality of the food we consume is important not just for reasons of safety, but for verifying authenticity as well. Changmou Xu, a Research Associate Professor at the University of Nebraska-Lincoln (UNL), and his colleagues have been exploring methods for food analysis that are rapid but do not harm the environment or the analysts. We spoke with Xu about the work he and his fellow scientists are doing using Raman spectroscopy combined with machine learning in pursuit of finding a rapid, greener way to analyze the purity of food products.

How did you become interested in investigating new ways to authenticate edible oils?

One main focus of the research activities in my laboratory at the Food Processing Center and Food Science and Technology Department at UNL is to evaluate the effect of phytochemicals and natural compounds on inhibiting lipid oxidation of edible oils and fatty foods. While my former PhD student, Hefei Zhao, was working on this project, we realized that the conventional methods for lipid analysis based on chromatography, spectrophotometry, and chemometrics were generally time-consuming, low-throughput, and labor-intensive. These analyses also generated a large amount of toxic waste and organic solvent waste, which are harmful to the analyst and the environment. Therefore, we believed it would be of great value to develop a new rapid and green method for analyzing the quality of edible oils, including the type, adulteration, and oxidation state. In addition, edible oils are an indispensable source of nutrition and, accordingly, are widely present in food. Oil adulteration has been a chronic issue for many years because of the large differences between oil prices. A rapid method to authenticate edible oils would make it easier to ensure a high-quality food product for consumers.

In a recent paper, you and your colleagues described how machine-learning (ML) and Raman spectroscopy were used for the rapid detection of edible oil type and adulteration (1). Why is Raman spectroscopy a good technique to combine with ML for applications such as the one discussed in your paper?

Raman spectroscopy observes the unique set of molecular vibrations for a given sample. In this manner, the technique can provide information, in just a few seconds, on which chemical functional groups are present, and that information can then be used to assess the chemical composition and purity of an edible oil. Raman spectroscopy does not require complicated sample pretreatment. Therefore, it is a good choice for rapid and online detection. However, the differences between the Raman spectra of different samples are subtle; therefore, it was necessary to apply statistical analysis to identify these unique spectral differences accurately and efficiently. This led us to collaborate with Dr. Robert Powers, a Professor of Chemistry, and Dr. Yuzhen Zhou, a Professor of Statistics, at the University of Nebraska-Lincoln, to employ ML algorithms for rapid spectral analysis and to provide a simple, direct readout of chemical names or concentrations.

How would you describe the ML technique you use and what happens when this type of ML is combined with Raman spectroscopy?

ML and its subset, deep learning, can be applied to many different types of multidimensional data, such as pictures, medical images, and spectra, for rapid identification, feature extraction, classification, and regression. Nowadays, ML techniques are becoming increasingly popular in many disciplines outside of computer science. However, the application of ML to solving food science problems is still limited because data collection is time-consuming. Consequently, the rapid detection offered by Raman spectroscopy makes it potentially one of the best detection technologies to combine with ML for food science data. The ML algorithms we selected can rapidly and accurately identify oil types and can also provide the percentage of each oil in adulterated mixtures. The feature extraction capabilities of certain algorithms, such as random forest (RF), were effective in identifying the specific spectral features of chemical functional groups that were critical for classification and differentiation of oil types. Importantly, we found that the accuracy of different algorithms varied when applied to the Raman spectral analysis. Therefore, we reported both high-performance and low-performance algorithms, which was another meaningful contribution of this study.
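
As a rough illustration of the random forest feature extraction described above, the following Python sketch trains a classifier on synthetic "spectra" and inspects which spectral bins it relies on. Everything here is an assumption made for illustration: the data are simulated rather than real Raman measurements, the two "oil types" differ only by an artificial peak in bins 20 to 22, and the helper `make_spectra` is hypothetical. It uses scikit-learn's `RandomForestClassifier` and is not the pipeline from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_spectra(rng, n_per_class=100, n_bins=50):
    """Simulate noisy spectra for two hypothetical oil types (illustration only)."""
    labels = np.repeat([0, 1], n_per_class)
    spectra = rng.normal(0.0, 0.1, size=(labels.size, n_bins))
    # Assumption: type 1 has an extra band in bins 20-22 that type 0 lacks.
    spectra[labels == 1, 20:23] += 1.0
    return spectra, labels


rng = np.random.default_rng(0)
X_train, y_train = make_spectra(rng)
X_test, y_test = make_spectra(rng)

# Supervised training: the forest sees both the spectra and the known labels.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Accuracy on spectra the model has not seen before.
accuracy = forest.score(X_test, y_test)

# Impurity-based importances indicate which bins drive the classification.
importances = forest.feature_importances_
top_bins = np.argsort(importances)[::-1][:3]

print(f"held-out accuracy: {accuracy:.2f}")
print("most important bins:", sorted(top_bins.tolist()))
```

In this toy setting the most important bins should fall in the simulated band, which mirrors the idea of tracing a classification back to specific functional-group features. Real spectra would need preprocessing, such as baseline correction and normalization, that is not shown here.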

How does unsupervised principal component analysis (PCA) differ from your application of ML?

PCA is an unsupervised ML method for clustering or dimension reduction. It provides a simple visualization of the relative grouping and classification of the data. A primary limitation of PCA is that the algorithm simply highlights any source of group variation present in the data set, which may not be the desired group difference. When it was applied to our problem of identifying oil types, the algorithm tried to group samples by the similarities in their Raman spectra without utilizing the known oil types in the training dataset (called an unlabeled dataset). This means the PCA method does not have a specific objective while clustering, and hence it is "unsupervised" in its learning. It is usually less efficient than supervised ML methods for classification tasks. Supervised learning methods such as random forest are trained on a labeled dataset. A supervised method uses the Raman spectroscopy data along with the known oil type information during the training (or learning) process, which guides the algorithm to discover the important and essential features within the high-dimensional Raman spectra for the purpose of accurately classifying the oil types of new samples. Therefore, it usually outperforms the PCA method. In addition, ML can provide a direct readout of a sample's classification into a specific group. It can also be easily applied to the analysis of large datasets.
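
The limitation described above, that PCA highlights the largest source of variation whether or not it relates to the groups of interest, can be sketched with simulated data. In the Python example below, every sample carries a large random baseline offset that is unrelated to oil type, while the two hypothetical oil types differ only by a small peak. All of this is assumed for illustration, including the helpers `make_spectra` and `separation`. For brevity, the supervised method shown is a Fisher linear discriminant written with NumPy, swapped in for the random forest used in the study.

```python
import numpy as np


def make_spectra(rng, n_per_class=100, n_bins=60):
    """Simulated spectra: large class-independent baseline, small class peak."""
    labels = np.repeat([0, 1], n_per_class)
    n = labels.size
    baseline = rng.normal(0.0, 1.0, size=(n, 1))   # nuisance offset per sample
    peak = np.zeros((n, n_bins))
    peak[labels == 1, 30:33] = 0.5                  # assumed class difference
    noise = rng.normal(0.0, 0.05, size=(n, n_bins))
    return baseline + peak + noise, labels


def separation(scores, labels):
    """Gap between class means in units of the pooled standard deviation."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled_sd = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))
    return abs(a.mean() - b.mean()) / pooled_sd


rng = np.random.default_rng(0)
X_train, y_train = make_spectra(rng)
X_test, y_test = make_spectra(rng)
mean_train = X_train.mean(axis=0)

# Unsupervised: the first principal component is found without any labels.
_, _, vt = np.linalg.svd(X_train - mean_train, full_matrices=False)
pc1 = vt[0]

# Supervised: the Fisher discriminant direction uses the known labels.
X0, X1 = X_train[y_train == 0], X_train[y_train == 1]
within = 0.5 * (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False))
within += 1e-6 * np.eye(within.shape[0])            # small ridge for stability
lda = np.linalg.solve(within, X1.mean(axis=0) - X0.mean(axis=0))

# Compare how well each direction separates the classes on held-out spectra.
pca_sep = separation((X_test - mean_train) @ pc1, y_test)
lda_sep = separation((X_test - mean_train) @ lda, y_test)
print(f"PC1 separation: {pca_sep:.2f}")
print(f"supervised separation: {lda_sep:.2f}")
```

In this constructed case the first principal component mostly tracks the baseline offset, so the two classes overlap along it, whereas the label-guided direction separates them clearly. Later principal components may still capture the peak; the point is only that PCA gives no guarantee that its leading component reflects the group difference of interest.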

What are the advantages of using ML with Raman spectroscopy as compared to gas chromatography (GC)–based food product authentication techniques for edible oils?

ML can be used with both analytical methods. The major difference is the time for data collection. ML with Raman spectroscopy is faster than GC, which generally requires one hour for sample analysis along with the additional time needed for sample pretreatment and post-data analysis. Raman also avoids any chemical waste associated with sample preparation. Furthermore, GC routinely requires chemical modification or derivatization to get the samples into the gas phase. Conversely, GC provides a higher accuracy in sample analysis according to our study. Overall, the two methods are complementary to each other.

Beyond food and beverage applications, are there other areas in which this technique would be most suitable?

Raman spectroscopy has been used in many research areas, including chemical, biological, medical, and environmental studies. According to Mary Meeker's annual Internet Trends report for KPCB from 2016, people shared an average of 3.2 billion digital images every single day on select platforms globally. Growth remains robust as new real-time platforms emerge (2). But think about that. Can we learn more about our world via data collections on a molecular level, in addition to photos and pictures? Our results suggest that it would be possible to integrate Raman sensors with ML and a database onto our cellphones or smartwatches. This would provide individuals with live assessments of the quality, authenticity, and safety of our food and environment, and alert us to any hazardous compounds. With further development, we believe this technology can be integrated into our daily life.

What are your next steps or what new applications might you test for using this approach in food and beverage analysis?

Besides determining the type and adulteration of edible oils with this technique, we are also exploring its application to rapidly determining the oxidation of edible oils and fatty foods, which has a broader impact on the food industry because lipid oxidation is a major reason for food spoilage. We are also interested in involving industry to further develop the ML-Raman spectroscopy technique for the rapid online detection of chemical compounds and food contaminants, or for quality control in food processing.

References

  1. H. Zhao, Y. Zhan, Z. Xu, J.J. Nduwamungu, Y. Zhou, R. Powers, and C. Xu, Food Chem 373, Part B, (2021). https://doi.org/10.1016/j.foodchem.2021.131471
  2. M. Meeker, Internet Trends 2016, p. 90 (2016). https://www.kleinerperkins.com/perspectives/2016-internet-trends-report

Changmou Xu is a Research Associate Professor at the Department of Food Science and Technology, University of Nebraska-Lincoln. He also is a Project Manager of the Food Processing Center at the department. He received his PhD degree in Food Science from the University of Florida. His research mainly focuses on the application of machine learning for intelligent food science research, and value-added and sustainable food production and processing. Xu also is an Associate Editor of the Journal of Food Biochemistry (Wiley).

Hefei Zhao is currently a Postdoctoral Scholar at the Department of Food Science and Technology, at the University of California, Davis. He received his PhD in Food Science from the University of Nebraska-Lincoln in 2021. His research focuses on food chemistry and analysis, and the combined application with machine learning methods.

Yuzhen Zhou is an Assistant Professor in the Department of Statistics at the University of Nebraska-Lincoln. His expertise is in statistical modeling of spatio-temporal data, machine learning, and deep learning methods, with a focus on applications to sequence data and image data.

Robert Powers is a Professor of Chemistry at the University of Nebraska-Lincoln and Director of the Systems Biology core facility within the Nebraska Center for Integrated Biomolecular Communication, and he is on the scientific advisory boards for Olaris Therapeutics, Inc. and Nexomics Biosciences, Inc. Powers received his BA from Rutgers University and a PhD in chemistry from Purdue University, and was an IRTA postdoctoral fellow at the National Institute of Allergy and Infectious Diseases.

This research project was funded as a Layman Seed Award of the University of Nebraska Foundation.