Spectroscopy
The analysis of chemical data has undergone a profound transformation, from basic early statistical methods to the modern era of machine learning (ML) and artificial intelligence (AI). This progression is particularly evident in the field of spectroscopy, where multivariate analysis techniques such as regression, principal component analysis (PCA), and partial least squares (PLS) laid the foundation for today’s more advanced or automated ML calibration modeling techniques. This Chemometrics in Spectroscopy column traces the historical and technical development of these methods, emphasizing their application in calibrating spectrophotometers to predict the chemical or physical properties of measured samples (particularly in near-infrared (NIR), infrared (IR), Raman, and atomic spectroscopy), and explores how AI and deep learning are reshaping the spectroscopic landscape. In the previous installment of this two-part series, we looked back at the history of chemometrics. In this second part, we peer into the future to estimate where chemometrics might be going.
A series of recent Spectroscopy articles and podcasts has discussed in much more detail the current changes, definitions, and advances in chemometrics, including the terms artificial intelligence (AI) and machine learning (ML) as applied to spectroscopy. Until just a few years ago, most analytical chemistry publications used standard chemometrics terms for their data processing methods. More recently, the term machine learning has appeared with increasing frequency to describe what were considered mainstream chemometrics methods only a few years earlier. Recall the famous line by William Shakespeare from Romeo and Juliet: “A rose by any other name would smell as sweet.” In like manner for chemometrics, the essence of the algorithms doesn’t change regardless of what they are called. If one may summarize and simplify what is meant here: chemometrics is the use of advanced algorithms, including AI, deep learning, neural networks, machine learning, statistics, and other mathematics and computer science, as applied to understanding chemical data. By chemical data we mean analytical data measured from samples or processes involving their chemistry, whether zero-order data, fused data from multiple analytical techniques, or complex imaging data containing spatial, physical, and chemical information. Reference (58) gives very recent descriptions and resources for connecting with the current trends in AI and chemometrics terminology, while (59–61) discuss the typical terms and methods classically used for chemometrics in spectroscopy from 2020 and 2021.
As computational power increased, the 21st century saw a paradigm shift from what one might call classical chemometrics to more sophisticated ML and AI methods. Advances in these data analytics methods, and the nuances of their use, offer significant advantages for automation and for handling high-dimensional, non-linear, and large data sets, which are common challenges in modern spectroscopic analysis (58).
ML techniques such as support vector machines (SVMs), random forests (RFs), and neural networks (NNs) have been adopted for spectral calibration. ML methods can capture complex, non-linear relationships between spectral data and analyte concentrations, improving apparent prediction accuracy. One must take care, however, when making comparisons that refer to algorithms by name alone. One of our pet peeves is published literature that compares algorithm performance for specific applications without including a detailed description of the algorithms used for comparison. For example, one might say that such-and-such an algorithm they were “promoting” was compared to PLS and resulted in improved predictive performance; but what type of PLS are they referring to? If one explores the use of PLS across the mathematical and scientific literature, one finds an extremely diverse set of algorithms.
For example, PLS is a powerful regression and dimensionality reduction method with several variants tailored for specific applications. Key types include PLS-1 and PLS-2 for single and multiple response variables, respectively, and discriminant PLS (PLS-DA) for classification tasks. Advanced forms like orthogonal PLS (O-PLS) improve interpretability by separating predictive and non-predictive variations, while sparse PLS (sPLS) incorporates feature selection for high-dimensional data. Non-linear extensions include kernel PLS (KPLS), which maps data into a higher-dimensional space using kernel functions, and locally weighted PLS (LW-PLS), which adapts models locally to capture non-linear relationships. Polynomial and quadratic PLS expand the predictor space with higher-order terms, while neural network-based PLS (NN-PLS) integrates neural networks to model complex dependencies. Other specialized forms like multiblock PLS (MB-PLS) and three-way PLS (3W-PLS) address multi-block and tensor data structures. These methods find applications in spectroscopy, chemometrics, bioinformatics, and process monitoring, among others (62,63). Get the point?
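To make the point concrete, here is a minimal sketch of just one of these variants: PLS-1 computed with the NIPALS algorithm on mean-centered data. It is an illustration under stated assumptions, not a reference implementation. The function name pls1_nipals, the synthetic two-component "spectra," and the choice of two latent variables are all hypothetical and chosen only for the example. Note how many details (algorithm, preprocessing, number of latent variables) must be stated before "PLS" means anything specific.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """PLS-1 (single response) by NIPALS on mean-centered data.

    Returns coefficients b and intercept b0 so that y_hat = X @ b + b0.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # residual (deflated) blocks
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))       # weight vectors
    P = np.zeros((n_vars, n_components))       # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xr.T @ yr                          # direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                             # scores for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt
        q[a] = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - q[a] * t                     # deflate y
        W[:, a], P[:, a] = w, p
    b = W @ np.linalg.solve(P.T @ W, q)        # coefficients in original X space
    b0 = y_mean - x_mean @ b
    return b, b0

# Synthetic, noise-free "spectra": two hypothetical components with Gaussian bands
rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 50)
pure = np.vstack([np.exp(-((wavelengths - c) ** 2) / 0.01) for c in (0.3, 0.7)])
conc = rng.uniform(0.0, 1.0, size=(30, 2))
X = conc @ pure                                # mixture spectra (30 samples x 50 points)
y = conc[:, 0]                                 # calibrate for component 1 only

b, b0 = pls1_nipals(X, y, n_components=2)
y_hat = X @ b + b0
rmse = np.sqrt(np.mean((y_hat - y) ** 2))
print(f"PLS-1 (NIPALS, mean-centered, 2 latent variables) RMSE: {rmse:.2e}")
```

Because the synthetic data are noise-free and contain exactly two components, two latent variables should reproduce the reference values essentially to machine precision; real spectra with noise and interferences will not behave this neatly.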
The landmark paper “Attention is All You Need” by Vaswani et al. (64) introduced the transformer architecture, which revolutionized the field of machine learning, particularly in natural language processing (NLP). Unlike previous architectures that relied heavily on recurrent or convolutional networks for sequential data processing, the transformer architecture relies entirely on a mechanism called self-attention. This mechanism allows the model to weigh the importance of different words in a sequence relative to each other, without sequential dependency. As a result, transformers can be trained on large, complex datasets more efficiently and can capture long-range dependencies without stepping through a sequence one element at a time. This advance paved the way for deep learning models that are highly scalable, parallelizable, and capable of extraordinary generalization.
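The self-attention mechanism described above can be sketched in a few lines. The following is a single-head, scaled dot-product version in NumPy, with randomly initialized projection matrices standing in for learned weights; the function name self_attention, the dimensions, and the random inputs are assumptions made only for illustration. Multi-head attention, masking, and positional encoding are omitted.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X has shape (sequence_length, d_model); each row is one element
    of the sequence (a word embedding, or, hypothetically, a spectral segment).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise similarity, scaled
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V, weights                       # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 6, 8, 4
X = rng.normal(size=(seq_len, d_model))
# Random stand-ins for the learned projection matrices
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(X, Wq, Wk, Wv)
print(output.shape, weights.shape)
```

Each row of weights sums to one and states how strongly that element attends to every other element of the sequence; all rows are computed in one matrix product, which is where the parallelism comes from.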
In the context of chemometrics, the concept of ML has progressed from early statistical methods and classical regression techniques to more complex multivariate analyses and ML method variants. Initially, chemometric methods relied heavily on linear methods such as principal component analysis (PCA) and partial least squares (PLS), along with cluster analysis (CA), to uncover patterns in chemical data. However, these methods often faced limitations in handling non-linear relationships and complex interactions within high-dimensional data. With the advent of AI and deep learning, chemometrics has incorporated more sophisticated algorithms, such as neural networks and ensemble models, enhancing data interpretation in spectroscopy, chromatography, and sensor analysis.
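As a reminder of what the classical linear starting point looks like, the following is a minimal sketch of PCA computed by singular value decomposition of mean-centered data. The function name pca_svd, the synthetic two-component mixture spectra, and the noise level are assumptions chosen for illustration only.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)                          # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components]                     # directions in variable space
    explained = s**2 / np.sum(s**2)                  # fraction of variance per PC
    return scores, loadings, explained[:n_components]

# Synthetic mixture "spectra" from two hypothetical components plus small noise
rng = np.random.default_rng(1)
wavelengths = np.linspace(0.0, 1.0, 50)
pure = np.vstack([np.exp(-((wavelengths - c) ** 2) / 0.01) for c in (0.3, 0.7)])
conc = rng.uniform(0.0, 1.0, size=(30, 2))
spectra = conc @ pure + rng.normal(scale=1e-3, size=(30, 50))

scores, loadings, explained = pca_svd(spectra, n_components=2)
print("Variance explained by the first two PCs:", explained.sum())
```

Because these data are linear mixtures of two components, two principal components capture nearly all of the variance. The limitation noted above appears when the relationship between composition and spectrum is not linear, in which case more components are needed and their interpretation becomes harder.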
Self-attention and transformer architectures have the potential to drive the next frontier in chemometrics.
In summary, AI techniques such as transformers promise to be “transformative” for chemometrics and broader data analytics in chemistry. By allowing researchers to interpret increasingly complex datasets more accurately and efficiently, these methods could advance chemical research, product development, and quality assurance, ultimately leading to more informed, data-driven decisions in chemistry (64).
Deep learning methods, particularly convolutional neural networks (CNNs) and other artificial neural networks (ANNs), have shown promise in spectral analysis due to their ability to automatically extract hierarchical features. For example, deep neural networks can identify subtle spectral patterns linked to chemical composition that traditional PLS models might miss. This is particularly valuable in Raman spectroscopy, where signal-to-noise ratios can be low. ANNs are versatile non-linear computational models widely used across various fields due to their flexibility and adaptability. Their application in chemometrics, primarily for classification and regression tasks, began in the early 1990s. One paper highlights various ANN architectures and demonstrates their use in solving chemometric problems through practical examples; it also compares the strengths and limitations of ANNs relative to conventional chemometric methods (65). In their basic form, ANNs process data through fully connected layers, which suits general-purpose tasks like regression and classification. CNNs are specialized neural networks that use convolutional layers to automatically extract spatial features, making them highly effective for image and structured data analysis (66).
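To illustrate what a convolutional layer does to a spectrum, the sketch below slides a single three-point filter along a synthetic band and applies a ReLU activation. In a real CNN the filter values are learned during training and many filters are stacked in layers; here a fixed derivative-like kernel is a hypothetical stand-in, and the function name conv1d_relu and the Gaussian band are assumptions made for the example.

```python
import numpy as np

def conv1d_relu(signal, kernel):
    """One 1D convolutional filter followed by ReLU, no padding.

    CNN "convolution" is a sliding dot product (cross-correlation);
    np.convolve flips its second argument, so the kernel is reversed first.
    """
    feature_map = np.convolve(signal, kernel[::-1], mode="valid")
    return np.maximum(feature_map, 0.0)          # ReLU keeps positive responses

x = np.arange(100)
spectrum = np.exp(-((x - 50) ** 2) / (2 * 5.0**2))   # one synthetic band at index 50

# Hypothetical fixed filter; a trained CNN would learn these values
kernel = np.array([-1.0, 0.0, 1.0])
features = conv1d_relu(spectrum, kernel)

# Output index j corresponds to input position j + 1 (center of the 3-point window)
rising_edge = int(np.argmax(features)) + 1
print("Strongest response at spectral index:", rising_edge)
```

This particular filter responds to the rising edge of the band and is silenced on the falling edge by the ReLU. The same filter is applied at every position, so the feature is detected wherever the band occurs, which is the property that distinguishes a convolutional layer from a fully connected one.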
AI extends beyond data modeling into adaptive calibration systems. These models can self-correct for changes in instrument conditions or sample variability, maintaining accuracy over time—a significant advancement for real-time process monitoring (67).
In practice, AI and ML methods are now being integrated into software for commercial spectrophotometers. For instance, advanced AI models have been developed to predict moisture content in agricultural products using NIR spectroscopy or to detect trace contaminants in pharmaceuticals via Raman analysis. These advancements allow for real-time, non-destructive testing with unprecedented accuracy (68–70).
While the integration of AI and ML offers exciting possibilities, challenges remain. The need for large, high-quality datasets to train AI models is a significant hurdle. Additionally, model interpretability—critical in regulated industries like pharmaceuticals—can be a limitation with complex neural networks.
Future research is likely to focus on hybrid models, combining the strengths of classical PLS with AI’s adaptability. Explainable AI techniques may also play a crucial role in making advanced models more transparent and reliable. Overall, the advancements of chemometrics seem to be following the fundamental guidelines codified by Kowalski and Booksh in their 1994 Analytical Chemistry A-pages paper titled “Theory of Analytical Chemistry” (71). It’s always useful to reread this paper from time to time.
The advancements from simple linear regression to classical multivariate analysis to AI-driven methods represent a remarkable journey in chemometric history. Each step—whether in the development of PCA and PLS, or in the advent of deep learning—has expanded the possibilities for spectral calibration and chemical analysis. As AI continues to reshape the landscape, the principles established by early chemometricians will remain fundamental, ensuring that spectroscopy remains a powerful tool for modern science and industry. In these ongoing advancements, the challenge remains the same: turning complex numbers and measurement data into reliable, accurate, and actionable chemical knowledge. We refer you to Part 1 of this mini-series as reference (72).
(58) Workman, J., Jr. AI, Deep Learning, and Machine Learning in the Dynamic World of Spectroscopy, December 2, 2024. https://www.spectroscopyonline.com/view/ai-deep-learning-and-machine-learning-in-the-dynamic-world-of-spectroscopy (accessed 2025-08-20).
(59) Workman, J., Jr.; Mark, H. A Survey of Chemometric Methods Used in Spectroscopy. Spectroscopy 2020, 35 (8), 9–14. https://www.spectroscopyonline.com/view/a-survey-of-chemometric-methods-used-in-spectroscopy (accessed 2025-08-20).
(60) Workman, J., Jr.; Mark, H. Survey of Key Descriptive References for Chemometric Methods Used for Spectroscopy: Part I. Spectroscopy 2021, 36 (6), 15–19. https://www.spectroscopyonline.com/view/survey-of-key-descriptive-references-for-chemometric-methods-used-for-spectroscopy-part-i (accessed 2025-08-20).
(61) Workman, J., Jr.; Mark, H. Survey of Key Descriptive References for Chemometric Methods Used for Spectroscopy: Part II. Spectroscopy 2021, 36 (10), 16–19. DOI: 10.56530/spectroscopy.pj5166a9
(62) Wold, S.; Sjöström, M.; Eriksson, L. PLS-Regression: A Basic Tool of Chemometrics. Chemometr. Intell. Lab. Syst. 2001, 58 (2), 109–130. DOI: 10.1016/S0169-7439(01)00155-1
(63) Rosipal, R.; Krämer, N. Overview and Recent Advances in Partial Least Squares. In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds) Subspace, Latent Structure and Feature Selection. SLSFS 2005. Lecture Notes in Computer Science, vol 3940. Springer, Berlin, Heidelberg, 2006. DOI: 10.1007/11752790_2
(64) Vaswani, A. et al. Attention is All You Need. In Advances in Neural Information Processing Systems; 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017. arXiv:1706.03762v7. DOI: 10.48550/arXiv.1706.03762. PDF version: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (accessed 2025-08-20).
(65) Marini, F.; Bucci, R.; Magrì, A. L.; Magrì, A. D. Artificial Neural Networks in Chemometrics: History, Examples and Perspectives. Microchem. J. 2008, 88 (2), 178–185. DOI: 10.1016/j.microc.2007.11.008
(66) Zupan, J.; Novič, M.; Ruisánchez, I. Kohonen and Counterpropagation Artificial Neural Networks in Analytical Chemistry. Chemometr. Intell. Lab. Syst. 1997, 38 (1), 1–23. DOI: 10.1016/S0169-7439(97)00030-0
(67) Li, H.; Xu, H.; Li, Y.; Li, X. Application of Artificial Intelligence (AI)-enhanced Biochemical Sensing in Molecular Diagnosis and Imaging Analysis: Advancing and Challenges. Trends Anal. Chem. 2024, 174, 117700. DOI: 10.1016/j.trac.2024.117700
(68) Alemayhu, A. S.; Ji, R.; Abdalla, A. N.; Bian, H. AI and Laser-Induced Spectroscopy for Food Industry. Food and Humanity 2024, 3, 100413. DOI: 10.1016/j.foohum.2024.100413
(69) Du, X. N.; Chen, Y. W.; Wang, Q.; Yang, H. Y.; Lu, Y.; Wu, X. F. Exploring AI-Enhanced NMR Dereplication Analysis for Complex Mixtures and its Potential Use in Adulterant Detection. Phytochem. Rev. 2024, 1–36. DOI: 10.1007/s11101-024-10006-4
(70) Low, J. S. Y.; Teh, H. F.; Thevarajah, T. M.; Chang, S. W.; Khor, S. M. An AI-Assisted Microfluidic Paper-Based Multiplexed Surface-Enhanced Raman Scattering (SERS) Biosensor with Electrophoretic Removal and Electrical Modulation for Accurate Acute Myocardial Infarction (AMI) Diagnosis and Prognosis. Biosens. Bioelectron. 2025, 270, 116949. DOI: 10.1016/j.bios.2024.116949
(71) Booksh, K. S.; Kowalski, B. R. Theory of Analytical Chemistry. Anal. Chem. 1994, 66 (15), 782A–791A. DOI: 10.1021/ac00087a718
(72) Workman, J., Jr.; Mark, H. From Classical Regression to AI and Beyond: The Chronicles of Calibration in Spectroscopy: Part I. Spectroscopy 2025, 40 (2), 13–18. DOI: 10.56530/spectroscopy.pu3090t7
Howard Mark serves on the Editorial Advisory Board of Spectroscopy, and runs a consulting service, Mark Electronics, in Suffern, New York. Direct correspondence to: SpectroscopyEdit@mmhgroup.com ●
Jerome Workman, Jr. serves on the Editorial Advisory Board of Spectroscopy and is the Executive Editor for LCGC and Spectroscopy. He is the co-host of the Analytically Speaking podcast and has published multiple reference text volumes, including the three-volume Academic Press Handbook of Organic Compounds, the five-volume The Concise Handbook of Analytical Spectroscopy, the 2nd edition of Practical Guide and Spectral Atlas for Interpretive Near-Infrared Spectroscopy, the 2nd edition of Chemometrics in Spectroscopy, and the 4th edition of The Handbook of Near-Infrared Analysis.●