Hyperspectral Imaging and Machine Learning to Non-Destructively Predict Nutritional Content of Cheese

Fact checked by Caroline Hroncich

Researchers at Wittenborg University of Applied Sciences have developed a non-destructive method using hyperspectral imaging combined with chemometrics and machine learning to accurately predict fat and protein content in diverse cheese types.

Key Points

  • A research team developed a sustainable and rapid method using hyperspectral imaging (HSI) combined with chemometrics and machine learning (ML) to predict fat and protein content in 73 cheese samples, offering a viable alternative to traditional destructive testing.
  • The best-performing models, particularly UVE-PLS and IPW-PLS for chemometrics and multilayer perceptron (MLP) for ML, achieved high prediction accuracy, with R² values up to 0.98, highlighting the method’s effectiveness for macronutrient estimation across a wide variety of cheeses.
  • While ML models showed promise, they lacked interpretability compared to chemometric models, and the study’s lack of external validation limits its generalizability.

A recent study published in Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy demonstrated a new method for determining the fat and protein content of cheese using hyperspectral imaging (HSI) combined with chemometrics and machine learning (ML) (1). Led by Mercedes Bertotto, the research offers a more sustainable, rapid, and informative alternative to traditional analytical methods, which are often labor-intensive, destructive, and environmentally taxing.

Different kinds of delicious cheese on table | Image Credit: © Africa Studio - stock.adobe.com

What was the experimental procedure?

The research team analyzed 73 cheese samples representing 32 different varieties commonly found in Dutch supermarkets. By combining this diverse array in a single broad-based predictive model, the team aimed to develop a more universally applicable method for macronutrient estimation (1). The team tested several data pretreatment techniques, variable selection strategies, and predictive modeling approaches, including both classical chemometric techniques, such as partial least squares (PLS), and ML models such as multilayer perceptron (MLP) neural networks.

What were the key findings?

For protein prediction, the best-performing chemometric model employed an extended multiplicative scatter correction (EMSC) pretreatment with a degree of six, reaching an R² of prediction (R²_pred) of 0.96 and a mean squared error of prediction (MSEP) of 2.61 (1). After selecting 80 relevant wavelengths, the uninformative variable elimination PLS (UVE-PLS) model achieved an R²_pred of 0.98 and a lower RMSE of 1.41 (1).

The iterative predictor weighting PLS (IPW-PLS) model, meanwhile, performed best for fat prediction. It achieved an R²_pred of 0.94 and a root mean squared error of prediction (RMSEP) of 2.15 (1). By comparison, the best MLP model achieved R²_pred values of 0.94 for protein and 0.97 for fat, confirming that ML is a viable option for macronutrient prediction (1).
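For readers less familiar with these figures of merit, the following is a minimal sketch of how R²_pred, MSEP, and RMSEP are commonly computed from reference and predicted values. The function names and the four fat values are hypothetical illustrations, not data from the study, and the exact R² convention the authors used is an assumption (some chemometrics software centers on the calibration mean instead).

```python
import math

def msep(y_true, y_hat):
    """Mean squared error of prediction over held-out samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_hat)) / len(y_true)

def rmsep(y_true, y_hat):
    """Root mean squared error of prediction: the square root of MSEP."""
    return math.sqrt(msep(y_true, y_hat))

def r2_pred(y_true, y_hat):
    """Coefficient of determination, centered on the mean of the held-out set."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical fat contents in g/100 g (illustration only, not study data).
reference = [20.0, 25.0, 30.0, 35.0]
predicted = [21.0, 24.0, 31.0, 34.0]

print(round(r2_pred(reference, predicted), 3))  # 0.968
print(rmsep(reference, predicted))              # 1.0
```

Because RMSEP is the square root of MSEP, the two are not directly comparable: RMSEP is expressed in the same units as the measured quantity, while MSEP is in squared units.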

However, the research team acknowledges that ML models need further refinement. The study showed that ML models provided limited insight into which variables or wavelengths were most critical for prediction. The chemometric models, by contrast, identified the key spectral features, and the selected wavelengths were not random (1). Across several variable selection strategies, wavelengths at 941.1, 976.19, 1165.95, 1194.1, 1215.23, 1384.42, 1469.15, and 1716.85 nm consistently appeared as significant predictors of fat and protein content (1).
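To illustrate why a variable selection step yields an interpretable list of wavelengths, here is a minimal sketch of covariance-based selection in the spirit of CovSel, one of the strategies named in the article. It repeatedly picks the column (wavelength) with the largest squared covariance with the response, then removes what that column explains from both the spectra and the response. The function name `covsel_sketch` and the toy data are hypothetical, and this is not the authors' implementation.

```python
def covsel_sketch(X, y, n_select):
    """Return indices of n_select columns chosen by covariance with y."""
    n, p = len(X), len(X[0])
    # Mean-center every column and the response, working on copies.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]

    selected = []
    for _ in range(n_select):
        # Squared covariance (up to a constant) with the current residual of y.
        scores = [sum(Xc[i][j] * yc[i] for i in range(n)) ** 2 for j in range(p)]
        for j in selected:
            scores[j] = -1.0  # never pick the same column twice
        best = max(range(p), key=scores.__getitem__)
        z = [Xc[i][best] for i in range(n)]
        zz = sum(v * v for v in z)
        if zz == 0.0:
            break  # nothing left to explain
        selected.append(best)
        # Deflate: project every column and y onto the complement of z.
        for j in range(p):
            coef = sum(z[i] * Xc[i][j] for i in range(n)) / zz
            for i in range(n):
                Xc[i][j] -= coef * z[i]
        coef_y = sum(z[i] * yc[i] for i in range(n)) / zz
        yc = [yc[i] - coef_y * z[i] for i in range(n)]
    return selected

# Toy data (hypothetical): 5 samples, 3 "wavelengths"; y depends on columns 0 and 1.
X = [
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 0.3],
    [3.0, 2.0, 0.4],
    [4.0, 1.0, 0.1],
    [5.0, 2.0, 0.2],
]
y = [2.0 * row[0] + row[1] for row in X]

print(covsel_sketch(X, y, 2))  # [0, 1]
```

Because the score is a covariance rather than a correlation, it depends on the scale of each column, which is one reason the choice of pretreatment can change which wavelengths are selected.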

The methodology involved extensive testing of variable selection approaches such as CovSel, IPW-PLS, and interval PLS (iPLS). The researchers also compared the effect of different data pretreatment techniques, recognizing that pretreatment had a significant impact on model performance (1). This comprehensive evaluation allowed them to fine-tune the models and identify optimal strategies for prediction.
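As a rough illustration of what a scatter-correcting pretreatment does, the sketch below implements basic multiplicative scatter correction (MSC), a simpler relative of the EMSC pretreatment mentioned in the article. Each spectrum is regressed on the mean spectrum, and the fitted offset and slope are then removed. EMSC extends the same regression with additional terms such as polynomial baselines; that extension is not shown here. The function name `msc` and the toy spectra are hypothetical.

```python
def msc(spectra):
    """Basic multiplicative scatter correction against the mean spectrum."""
    n, p = len(spectra), len(spectra[0])
    # Reference spectrum: the mean over all samples at each wavelength.
    ref = [sum(s[j] for s in spectra) / n for j in range(p)]
    ref_mean = sum(ref) / p
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / p
        # Least-squares fit of s = a + b * ref (offset a, slope b).
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_ss
        a = s_mean - b * ref_mean
        # Remove the additive offset and the multiplicative scaling.
        corrected.append([(v - a) / b for v in s])
    return corrected

# Toy spectra (hypothetical): one underlying shape with different offsets and gains,
# mimicking scatter effects unrelated to composition.
base = [0.2, 0.5, 0.9, 0.4]
spectra = [
    base,
    [0.10 + 1.5 * v for v in base],
    [-0.05 + 0.8 * v for v in base],
]
corrected = msc(spectra)
```

In this toy case the three spectra differ only by offset and gain, so after correction they coincide. With real spectra, the differences that remain after correction are the ones more likely to reflect composition.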

What should future studies concentrate on?

Cheese is an important food product that has several tangible health benefits (2). Apart from serving as a source of protein and fat, it contains other nutrients such as calcium, riboflavin, zinc, and phosphorus (2). Future studies can concentrate on analyzing and interpreting the other nutrients in cheese.

It is also important to acknowledge that although a validation set was used to compare various model and preprocessing combinations, it was not an external validation set (1). The authors acknowledge this limitation and caution against over-interpreting the results in terms of real-world generalizability (1). However, the consistent use of the validation set across all comparisons enabled a fair assessment of the relative strengths and weaknesses of each modeling strategy.

This study represents a step forward in the non-invasive analysis of food composition. By merging traditional chemometric rigor with the power of machine learning, Bertotto and the team have created a model framework that is both accurate and interpretable, not only for cheese but potentially for other foods as well (1).

References

  1. Bertotto, M.; Kok, E.; Ummels, M.; et al. Comparison Between Chemometrics and Machine Learning for the Prediction of Macronutrients in Cheese Using Imaging Spectroscopy. Spectrochimica Acta Part A: Mol. Biomol. Spectrosc. 2025, 343, 126484. DOI: 10.1016/j.saa.2025.126484
  2. Schaefer, A. Is Cheese Bad for You? Healthline.com. Available at: https://www.healthline.com/health/is-cheese-bad-for-you (accessed 2025-06-20).