Key Points
- NIR spectroscopy combined with machine learning techniques (PCA, PCR, and self-organizing maps) enabled accurate, non-destructive quantification of cocoa content in chocolate.
- The study demonstrated high prediction accuracy (R² = 0.84) and effective classification of chocolate samples based on cocoa content.
- Given rising consumer demand for high-cocoa, clean-label, and specialty chocolates amid concerns over food fraud, the integration of spectroscopy and ML provides a promising solution for verifying cocoa levels and ensuring product integrity.
A recent study examined a novel method for accurately quantifying the cocoa content of commercial chocolate samples. The study, published in the journal Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, showcased the utility of near-infrared (NIR) spectroscopy and machine learning (ML) techniques for analyzing the cocoa content of chocolate (1). The research team, composed of scientists from several institutions in Brazil, combined NIR spectroscopy with data analysis tools such as principal component analysis (PCA), principal component regression (PCR), and Kohonen neural networks (KNNs) to support both chocolate quality control and product authenticity verification (1).
What is the state of the chocolate market?
Despite challenges in the global economy, demand for chocolate remains high. The chocolate industry was worth $48.29 billion in 2022 and is expected to reach $67.88 billion by 2029 (2). One current trend in the industry is the rise of specialty chocolates, which are produced in countries such as the United States, France, Belgium, and Germany (2). The industry is also seeing increased demand for clean-label and organic products, which has driven growth in dark and sugar-free chocolate (2).
Some types of chocolate also offer health benefits, which are often linked to their cocoa content; cocoa contains antioxidants, flavonoids, and central nervous system stimulants such as caffeine and theobromine (1). As consumer interest in high-cocoa chocolates grows, reliable methods for verifying cocoa levels across a wide range of chocolate products become increasingly important.
What did the researchers test in their study?
In their study, the researchers tested whether NIR spectroscopy could serve as a fast, non-destructive, and accurate method for determining cocoa content. Using the 900–1600 nm spectral range, they found that cocoa percentage correlated most significantly with absorbance features between 900 and 1400 nm (1). This region of the NIR spectrum carries rich information about the molecular bonds in chocolate’s chemical constituents, including the compounds that contribute to its nutritional and sensory profiles (1).
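The article does not say how the correlation between absorbance and cocoa percentage was computed. As a minimal sketch, assuming a simple per-wavelength Pearson correlation, the code below builds a correlation profile on synthetic toy spectra (not the study’s data) and reports the wavelength that tracks cocoa content most closely. The function names and the toy signal, deliberately confined to 900–1400 nm, are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_profile(spectra, cocoa, wavelengths):
    """(wavelength, r) pairs: correlation of absorbance at each wavelength with cocoa %."""
    return [
        (wl, pearson([row[j] for row in spectra], cocoa))
        for j, wl in enumerate(wavelengths)
    ]


# Synthetic toy data (NOT the study's): 20 samples, 900-1600 nm in 50 nm steps.
# Only wavelengths <= 1400 nm carry a cocoa-dependent signal; the rest is noise.
rng = random.Random(0)
wavelengths = list(range(900, 1601, 50))
cocoa = [rng.uniform(30, 90) for _ in range(20)]
spectra = [
    [0.01 * c * (1.0 if wl <= 1400 else 0.0) + rng.gauss(0, 0.05) for wl in wavelengths]
    for c in cocoa
]

profile = correlation_profile(spectra, cocoa, wavelengths)
best_wl, best_r = max(profile, key=lambda p: abs(p[1]))
print(best_wl, round(best_r, 3))
```

Real NIR work would typically preprocess the spectra first (for example, scatter correction or derivatives), which this sketch omits.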
The team used PCA to interpret the spectral data and identify patterns among the chocolate samples. The results showed clear separation between samples according to cocoa percentage, with the first three principal components capturing most of the spectral information (1). This demonstrated PCA’s strong discriminatory power and its ability to distinguish between chocolates of varying cocoa content (1).
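To illustrate what it means for the first three principal components to capture most of the spectral information, here is a minimal sketch that computes explained-variance ratios. It assumes mean-centered spectra and computes PCA by power iteration with deflation on a small synthetic data set built from three hypothetical latent factors. It is not the authors’ pipeline; in practice a library routine (for example, an SVD) would normally be used.

```python
import math
import random


def mean_center(X):
    """Subtract each column (wavelength) mean from every spectrum."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def covariance(Xc):
    """Sample covariance matrix (p x p) of mean-centered data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]


def power_iteration(C, rng, iters=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    p = len(C)
    v = [rng.gauss(0, 1) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * C[i][j] * v[j] for i in range(p) for j in range(p))  # Rayleigh quotient
    return lam, v


def explained_variance_ratios(X, k, seed=0):
    """Fraction of total variance captured by each of the first k principal components."""
    rng = random.Random(seed)
    C = covariance(mean_center(X))
    p = len(C)
    total = sum(C[i][i] for i in range(p))  # trace of covariance = total variance
    ratios = []
    for _ in range(k):
        lam, v = power_iteration(C, rng)
        ratios.append(lam / total)
        # Deflation: remove this component so the next pass finds the next-largest one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return ratios


# Synthetic spectra (NOT the study's data): 40 samples x 15 wavelengths,
# built from three hypothetical latent factors plus small noise.
data_rng = random.Random(1)
n_wl = 15
loadings = [[math.sin((f + 1) * j / 3.0) for j in range(n_wl)] for f in range(3)]
scales = [3.0, 2.0, 1.0]
spectra = []
for _ in range(40):
    scores = [data_rng.gauss(0, s) for s in scales]
    spectra.append([
        sum(scores[f] * loadings[f][j] for f in range(3)) + data_rng.gauss(0, 0.05)
        for j in range(n_wl)
    ])

ratios = explained_variance_ratios(spectra, k=3)
print([round(r, 3) for r in ratios], round(sum(ratios), 3))
```

Projecting each spectrum onto these components gives the PCA scores that are usually plotted to look for the kind of cocoa-based separation the study reports.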
The research team also evaluated how well NIR spectra could predict cocoa content. Using PCR, the researchers obtained a coefficient of determination (R²) of 0.84, indicating a reasonably high level of prediction accuracy (1). This result highlights the promise of NIR-PCA-PCR modeling in real-world quality assurance settings.
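For readers unfamiliar with the R² metric, the sketch below uses one common definition, R² = 1 − SS_res / SS_tot, comparing reference cocoa percentages with model predictions. The article does not state exactly how the study computed R² (some chemometric tools instead report the squared correlation between predicted and reference values), and the sample values here are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical reference vs. predicted cocoa percentages (illustrative only).
reference = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
print(round(r_squared(reference, predicted), 4))  # → 0.98
```

Here SS_res = 10 and SS_tot = 500, so R² = 1 − 10/500 = 0.98; an R² of 0.84 would correspond to residual error equal to 16% of the total variation in this sense.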
What are self-organizing maps (SOMs) and how were they used in the study?
Self-organizing maps (SOMs), also known as Kohonen networks, are unsupervised neural networks that project high-dimensional data onto a low-dimensional grid of nodes so that similar inputs map to nearby nodes. The researchers used SOMs to handle the complex spectral signatures of chocolate. The network was trained on a data set of 232 chocolate samples, which were split into training, testing, and validation sets at a 70-15-15 ratio (1). During training, the researchers applied a Gaussian neighborhood function and tuned the learning rate and neighborhood radius to optimize the network’s performance.
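The article outlines the SOM training only briefly. The sketch below is a minimal, hypothetical illustration. It assumes a small rectangular grid, nodes initialized from random training samples, and linearly decaying learning rate and neighborhood radius. The study’s grid size, decay schedules, and hyperparameters are not reported here, so every value is an assumption, and the toy 2-D data stand in for real spectra. The `split_70_15_15` helper mirrors the 70/15/15 split described above.

```python
import math
import random


def split_70_15_15(items, seed=0):
    """Shuffle samples and split them into training/testing/validation sets (70/15/15)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(0.70 * len(shuffled))
    n_test = round(0.15 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu_of(weights, x):
    """Best-matching unit: grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, radius0=None, seed=0):
    """Train a rectangular SOM with a Gaussian neighborhood function.

    Learning rate and neighborhood radius both decay linearly (an assumed schedule).
    """
    rng = random.Random(seed)
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # Initialize each node's weights with a copy of a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)  # keep the radius above zero
            bmu = bmu_of(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])  # pull node toward the sample
            step += 1
    return weights


# Hypothetical 2-D toy "spectra": two well-separated groups (low vs. high cocoa).
data_rng = random.Random(42)
low = [[0.2 + data_rng.gauss(0, 0.02), 0.3 + data_rng.gauss(0, 0.02)] for _ in range(30)]
high = [[0.8 + data_rng.gauss(0, 0.02), 0.7 + data_rng.gauss(0, 0.02)] for _ in range(30)]
train, test, val = split_70_15_15(low + high)
som = train_som(train, rows=3, cols=3)
print(len(train), len(test), len(val), bmu_of(som, [0.2, 0.3]), bmu_of(som, [0.8, 0.7]))
```

Linear decay is only one option; exponential learning-rate and radius schedules are also common, and the choice is a tuning decision rather than part of the SOM definition.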
The researchers evaluated the network’s performance using the quantization error (QE), a metric of how closely the input data match the network’s neurons, usually computed as the average distance between each input and its best-matching neuron. The results showed that the SOM significantly enhanced classification precision and helped uncover subtle relationships between the NIR spectra and cocoa content (1).
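A minimal sketch of the quantization error, assuming the common definition (mean Euclidean distance from each sample to its closest node); some SOM tools use squared distances instead, and the study’s exact formula is not given in the article. The toy data and codebook below are hypothetical.

```python
import math


def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its best-matching (closest) node."""
    total = 0.0
    for x in data:
        total += min(math.dist(x, w) for w in codebook)
    return total / len(data)


data = [[0.0, 0.0], [1.0, 1.0]]
codebook = [[0.0, 0.0], [2.0, 2.0]]
print(round(quantization_error(data, codebook), 4))  # → 0.7071
```

The first sample sits exactly on a node (distance 0) and the second is √2 from its nearest node, so the mean is √2/2. Lower values indicate that the trained map represents the input spectra more faithfully.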
As a result, integrating spectroscopy and ML allowed the team to map and recognize complex interdependencies between chemical compounds and cocoa percentage, making it possible to estimate cocoa content efficiently and accurately (1).
The authors emphasized that integrating NIR spectroscopy with data reduction and ML tools such as PCA, PCR, and KNNs creates a powerful platform for chocolate authenticity testing. As food fraud and mislabeling continue to draw concern globally, the development of rapid analytical techniques like this one offers timely and tangible benefits for both manufacturers and consumers (1).
References
- Goncalves Lima, C. M.; Silveira, P. G.; Santana, R. F.; et al. Leveraging Infrared Spectroscopy for Cocoa Content Prediction: A Dual Approach with Kohonen Neural Network and Multivariate Modeling. Spectrochimica Acta Part A: Mol. Biomol. Spectrosc. 2025, 335, 125975. DOI: 10.1016/j.saa.2025.125975
- Fortune Business Insights. Cocoa and Chocolate Market Size, Share & COVID-19 Impact Analysis, By Type (Cocoa Ingredients (Butter, Liquor, Powder) and Chocolate (Dark, Milk, White, and Filled)), By Application (Food & Beverage, Cosmetics, Pharmaceuticals, and Others), and Regional Forecast, 2022-2029. Available at: https://www.fortunebusinessinsights.com/industry-reports/cocoa-and-chocolate-market-100075 (accessed 2025-07-08).