A researcher from Lomonosov Moscow State University has developed a convolutional neural network (CNN) model for Fourier transform infrared (FT-IR) spectra recognition. The model classifies 17 functional groups and 72 coupling oscillations with weighted F1 scores of 93% and 88%, respectively, which could speed up material analysis in fields such as organic chemistry, materials science, and biology.
In a recent study, Daniil S. Koshelev of Lomonosov Moscow State University employed deep learning to streamline the analysis of Fourier transform infrared (FT-IR) spectra. Interpreting these spectra is crucial for identifying chemical compounds and assessing their structures, but it is traditionally labor-intensive and requires considerable experience and expertise. Using a convolutional neural network (CNN), Koshelev developed a model that simplifies this process, allowing for faster and more accurate analysis. The research was published in the journal Applied Spectroscopy (1).
FT-IR spectroscopy is a common method for analyzing substances and compounds in a wide range of scientific disciplines. It measures the absorption of infrared light by chemical bonds, providing information about the functional groups and coupling oscillations within a molecule. However, interpreting FT-IR spectra can be challenging, requiring significant time and expertise. To address this, Koshelev developed a CNN-based model that automatically classifies 17 classes of functional groups and 72 classes of coupling oscillations (1).
For this research, 14,361 FT-IR spectra of organic molecules were obtained by web scanning, creating a comprehensive dataset to train the CNN model. Various CNN architectures were tested with different sizes of feature maps to optimize the model's accuracy. The resulting model achieved a weighted F1 score of 93% for functional groups and 88% for coupling oscillations, demonstrating a relatively high level of accuracy and reliability (1).
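To make the weighted F1 score concrete, here is a minimal Python sketch. It assumes the metric is computed as a per-class F1 averaged with weights proportional to each class's number of true instances (its support), which is the usual meaning of "weighted F1"; the study's exact averaging procedure is not described here. The class names and labels are invented toy data, not values from the paper.

```python
def f1_binary(y_true, y_pred):
    """F1 score for one class, given 0/1 labels per spectrum."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
    return 2 * tp / denom if denom else 0.0


def weighted_f1(labels_true, labels_pred):
    """Average per-class F1, weighting each class by its support."""
    total_support = sum(sum(v) for v in labels_true.values())
    if total_support == 0:
        return 0.0
    weighted_sum = sum(
        f1_binary(labels_true[c], labels_pred[c]) * sum(labels_true[c])
        for c in labels_true
    )
    return weighted_sum / total_support


# Hypothetical toy labels for four spectra and two functional-group classes
toy_true = {"carbonyl": [1, 1, 0, 0], "hydroxyl": [1, 1, 1, 0]}
toy_pred = {"carbonyl": [1, 0, 0, 0], "hydroxyl": [1, 1, 1, 1]}

toy_score = weighted_f1(toy_true, toy_pred)
print(f"weighted F1 on toy data: {toy_score:.3f}")
```

Weighting by support means that frequently occurring classes dominate the score, so a high weighted F1 does not by itself guarantee good performance on rare classes.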
To interpret the model's predictions, the study used visualization methods such as Shapley additive explanations (SHAP) and gradient-weighted class activation mapping (Grad-CAM). These tools highlight the absorption bands associated with specific functional groups or bonds, providing a deeper understanding of the model's decision-making process. The area under the receiver operating characteristic curve (AUC ROC), which reached 0.98 and above for most classes, further validated the model's effectiveness (1).
AUC ROC is a metric used to evaluate the performance of a binary classification model. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 minus specificity) at various decision thresholds, showing the trade-off between these rates. The AUC is the area under the ROC curve, providing a single value that summarizes the model's ability to discriminate between positive and negative instances. An AUC of 0.5 suggests the model is no better than random guessing, while an AUC of 1.0 indicates a perfect classifier. A higher AUC signifies better model performance.
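The construction described above can be sketched in a few lines of Python: sweep the decision threshold from the highest score downward, record the false positive rate and true positive rate at each step, and integrate the curve with the trapezoid rule. This is an illustrative sketch, not the study's code; the labels and scores are invented. For a multi-class model like the one in this study, such a calculation would presumably be repeated for each class separately, which is an assumption here.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative")
    # Highest scores first: each distinct score acts as a threshold
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they yield a single curve point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical example: 1 = functional group present, 0 = absent
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

toy_auc = auc(roc_points(toy_labels, toy_scores))
print(f"AUC on toy data: {toy_auc:.3f}")
```

The same value can be read as the fraction of positive-negative pairs in which the positive instance receives the higher score, which is why 0.5 corresponds to random ranking and 1.0 to perfect separation.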
The work represents a significant improvement over classical machine learning methods such as k-nearest neighbors, random forest, support vector machine, and multilayer perceptron classifiers, which typically achieved an overall class accuracy of only 23% (1). The newly developed CNN model not only outperforms these traditional methods but also brings a new level of automation and efficiency to FT-IR analysis.
This research has the potential to revolutionize the use of FT-IR in organic chemistry, materials science, and biology. By automating the analysis of FT-IR spectra, the model could save valuable time for scientists and enhance the reliability of results. The authors suggest that the model can be used to facilitate the preparation of experimental data for publication, thereby streamlining the research process (1).
The study opens the door to creating software tools based on this AI-driven model, allowing for more efficient and accurate FT-IR analysis (1–3). These advancements could lead to new applications in environmental science, quality control, and other fields where chemical analysis is critical (2,3).
References
(1) Koshelev, D. S. Expert System for Fourier Transform Infrared Spectra Recognition Based on a Convolutional Neural Network With Multiclass Classification. Appl. Spectrosc. 2024, 78 (4), 387–397. DOI: 10.1177/00037028241226732
(2) Workman, J., Jr.; Mark, H. Artificial Intelligence in Analytical Spectroscopy, Part I: Basic Concepts and Discussion. Spectroscopy 2023, 38 (2), 13–22. DOI: 10.56530/spectroscopy.og4284z8
(3) Workman, J., Jr.; Mark, H. Artificial Intelligence in Analytical Spectroscopy, Part II: Examples in Spectroscopy. Spectroscopy 2023, 38 (6), 10–15. DOI: 10.56530/spectroscopy.js8781e3