Data Fusion in Action: Integrating Different Vibrational and Atomic Spectroscopy Data

Key Takeaways

  • Multimodal data fusion in spectroscopy enhances chemical analysis by integrating vibrational and atomic spectroscopies, improving specificity and robustness.
  • Early fusion combines raw data, intermediate fusion models shared latent spaces, and late fusion integrates decision-level outputs.

This tutorial explores the motivation, mathematical underpinnings, and practical approaches to fusing spectral data, with emphasis on early, intermediate, and late fusion strategies.

Abstract

Spectroscopy is a cornerstone of chemical and biological analysis, yet no single technique can fully capture the complex composition of real-world samples. Combining data from different vibrational and atomic spectroscopies provides enhanced chemical specificity, quantitative robustness, and interpretability. This tutorial introduces three major categories of multimodal data fusion: early fusion (concatenation of raw or preprocessed features), intermediate fusion (shared latent space models such as partial least squares or canonical correlation analysis), and late fusion (decision-level combination of independent models). Matrix notation is used to formalize these strategies, to highlight the mathematical relationships among datasets, and to frame the challenges of alignment, scaling, redundancy, and interpretation. Examples illustrate how fusion strategies can enhance chemical analysis and diagnostics. Finally, future directions in physical interpretability, nonlinear fusion, and explainable artificial intelligence are discussed.

1. Introduction

Spectroscopy generates vast amounts of high-dimensional data, but each modality emphasizes different chemical and physical aspects of a sample. Vibrational spectroscopy, including infrared (IR), near-infrared (NIR), and Raman, probes molecular vibrations, functional groups, and physical/optical sample properties, while atomic and electronic spectroscopies, including ultraviolet-visible (UV–vis), fluorescence, X-ray, and plasma emission, probe electronic transitions and reveal elemental composition and oxidation states.

Inductively coupled plasma atomic/optical emission spectroscopy (ICP–AES or ICP–OES) and microwave plasma atomic emission spectroscopy (MP–AES) are techniques that measure the elemental composition of a sample by detecting the characteristic light emitted by excited atoms in a plasma.

Integrating these techniques using data fusion offers a path to more comprehensive analysis (3,4). For example, in pharmaceutical quality control, vibrational methods quantify excipients and crystallinity, while atomic methods track elemental impurities (1). In environmental monitoring, atomic spectroscopy quantifies heavy metals while vibrational spectroscopy captures organic contaminants.

The central challenge of data fusion is integrating heterogeneous signals into a coherent model without losing interpretability. Fusion must address differences in scale, redundancy, and noise while enabling robust prediction or classification (5,6). Because this analytical problem has not been clearly resolved, it is identified as one of our “Unsolved Problems in Spectroscopy” challenges.

2. Fusion Strategies: Conceptual Overview

2.1 Early Fusion: Feature-Level Integration

Early fusion combines raw or preprocessed spectra from different modalities into a single feature matrix (7).
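In matrix notation, this can be sketched as follows, assuming the same n samples are measured by two modalities with p_1 and p_2 variables (the symbols are chosen here for illustration and are not taken from the cited sources):

```latex
\mathbf{X}_{\text{fused}} = \left[\, \mathbf{X}_1 \;\middle|\; \mathbf{X}_2 \,\right]
\in \mathbb{R}^{n \times (p_1 + p_2)},
\qquad \mathbf{X}_1 \in \mathbb{R}^{n \times p_1},
\quad \mathbf{X}_2 \in \mathbb{R}^{n \times p_2}
```

Each row of the fused matrix is one sample; the rows of both blocks must refer to the same samples in the same order.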

This combined feature space can be subjected to principal component analysis (PCA), partial least squares regression (PLSR), or other multivariate methods (1,5).

Early fusion means stacking different types of spectra together. Imagine measuring the same set of samples using both Raman and UV–vis. Instead of analyzing the two datasets separately, early fusion places all those measurements side by side in a bigger dataset that contains more information than either the Raman or the UV–vis spectra alone.
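A minimal Python sketch of this side-by-side stacking is shown below. It uses synthetic random data in place of real spectra, and the block weighting by 1/sqrt(number of columns) is one common convention assumed here so that the wider block does not dominate; the function names `autoscale` and `early_fusion` are illustrative, not from any library.

```python
import numpy as np

def autoscale(block):
    """Mean-center each column and divide by its sample standard deviation."""
    centered = block - block.mean(axis=0)
    std = block.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by zero
    return centered / std

def early_fusion(blocks):
    """Autoscale each block, weight it by 1/sqrt(n_columns), then concatenate."""
    scaled = [autoscale(b) / np.sqrt(b.shape[1]) for b in blocks]
    return np.hstack(scaled)

rng = np.random.default_rng(0)
raman = rng.normal(size=(5, 8))    # 5 samples x 8 Raman channels (synthetic)
uv_vis = rng.normal(size=(5, 3))   # same 5 samples x 3 UV-vis channels (synthetic)

fused = early_fusion([raman, uv_vis])
print(fused.shape)  # (5, 11)
```

The fused matrix can then be passed to PCA, PLSR, or another multivariate method as a single dataset.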

2.2 Intermediate Fusion: Latent Variable Models

Intermediate fusion seeks a shared latent space where relationships between modalities are explicitly modeled. Techniques include canonical correlation analysis (CCA) and multi-block partial least squares (MB-PLS) (1,2).

Intermediate fusion doesn’t just stack the data but instead looks for hidden factors, or latent variables, that explain both datasets together. For instance, the concentration of a contaminant might influence both Raman bands and atomic emission lines.
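The idea of a hidden factor driving both datasets can be illustrated with a small simulation. The sketch below is not CCA or MB-PLS; as a simpler stand-in it takes the leading singular vectors of the cross-covariance matrix between the two blocks (sometimes called PLS-SVD or maximum covariance analysis). The response profiles, noise level, and sample count are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
conc = rng.uniform(0.0, 1.0, size=n)   # hidden contaminant concentration (synthetic)

# Each block = concentration x a block-specific response profile + noise.
raman_profile = np.array([0.2, 1.0, 0.5, 0.0, 0.1, 0.8])
atomic_profile = np.array([1.5, 0.0, 0.7])
raman = np.outer(conc, raman_profile) + 0.05 * rng.normal(size=(n, 6))
atomic = np.outer(conc, atomic_profile) + 0.05 * rng.normal(size=(n, 3))

# Center both blocks, then decompose their cross-covariance matrix.
raman_c = raman - raman.mean(axis=0)
atomic_c = atomic - atomic.mean(axis=0)
cross_cov = raman_c.T @ atomic_c / (n - 1)
u, s, vt = np.linalg.svd(cross_cov, full_matrices=False)

raman_scores = raman_c @ u[:, 0]    # latent variable as seen from the Raman block
atomic_scores = atomic_c @ vt[0]    # latent variable as seen from the atomic block

r_blocks = np.corrcoef(raman_scores, atomic_scores)[0, 1]
r_conc = abs(np.corrcoef(raman_scores, conc)[0, 1])
print(f"score-score correlation: {r_blocks:.3f}")
print(f"|score-concentration correlation|: {r_conc:.3f}")
```

In this toy setting the two sets of scores track each other and the simulated concentration closely, which is the behavior a shared latent variable is meant to capture. Real data with several overlapping factors would need more components and proper validation.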

2.3 Late Fusion: Decision-Level Integration

Late fusion is accomplished by first building separate models for each type of spectroscopy (like Raman and atomic absorption or UV-vis and NIR) and then combining their results as a final step (3,6).

Late fusion means keeping each spectroscopy method separate, building its own prediction model, and then combining the answers into one final estimate.
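One simple way to combine the answers is a weighted average in which each model's weight reflects how reliable it was on validation data. The sketch below assumes inverse-variance weighting based on each model's validation RMSE; this is only one of several possible combination rules, and the prediction values and RMSE figures are hypothetical.

```python
import numpy as np

def late_fusion(predictions, rmse):
    """Combine per-model predictions with inverse-variance weights.

    predictions: array-like of shape (n_models, n_samples)
    rmse: validation RMSE of each model, shape (n_models,)
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = 1.0 / np.asarray(rmse, dtype=float) ** 2
    weights /= weights.sum()          # normalize so the weights sum to 1
    return weights @ predictions, weights

# Hypothetical outputs of two independently built models for three samples.
raman_pred = [10.0, 12.0, 15.0]
aas_pred = [11.0, 12.0, 13.0]

fused_pred, weights = late_fusion([raman_pred, aas_pred], rmse=[1.0, 2.0])
print(weights)     # approximately [0.8 0.2]
print(fused_pred)  # approximately [10.2 12.  14.6]
```

The model with half the error receives four times the weight, so the fused estimate stays close to the more reliable model while still using the other.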

3. Challenges in Fusion

3.1 Data Alignment

Spectral data from different instruments may differ in resolution, in the sampling variable (for example, wavenumber versus wavelength), or in the sampling grid, and may contain different numbers of data points. Alignment often requires interpolation or warping functions (3).
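The interpolation step can be sketched with NumPy's `np.interp`, which performs linear interpolation onto a chosen grid. The example assumes two hypothetical instruments that record the same synthetic band on grids with different spacing; it covers only grid mismatch, not peak shifts that would need warping.

```python
import numpy as np

def resample_to_grid(axis, intensity, common_axis):
    """Linearly interpolate one spectrum onto a shared axis."""
    axis = np.asarray(axis, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    order = np.argsort(axis)   # np.interp requires increasing x values
    return np.interp(common_axis, axis[order], intensity[order])

# Two hypothetical instruments sampling the same region on different grids.
axis_a = np.linspace(400.0, 1800.0, 701)   # 2 cm^-1 spacing
axis_b = np.linspace(400.0, 1800.0, 351)   # 4 cm^-1 spacing
spectrum_a = np.exp(-((axis_a - 1000.0) / 30.0) ** 2)
spectrum_b = np.exp(-((axis_b - 1000.0) / 30.0) ** 2)

common_axis = np.linspace(400.0, 1800.0, 501)
aligned = np.vstack([
    resample_to_grid(axis_a, spectrum_a, common_axis),
    resample_to_grid(axis_b, spectrum_b, common_axis),
])
print(aligned.shape)  # (2, 501)
```

After resampling, both spectra share the same number of points and the same axis, so they can be stacked row by row.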

3.2 Scaling and Normalization

Because signals may differ in dynamic range or amplitude scale, scaling is essential before fusion. Mean-centering and autoscaling are common corrections (1,5).
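For reference, the standard definitions of these two corrections can be written as follows, where x_ij is the value of variable j for sample i and n is the number of samples (the symbols are chosen here for illustration):

```latex
\tilde{x}_{ij} = x_{ij} - \bar{x}_j,
\qquad
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},
\qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
\qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
```

Mean-centering gives the first expression; autoscaling gives the second, so that every variable has zero mean and unit standard deviation.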

3.3 Redundancy and Multicollinearity

Spectral features often overlap in the information they carry. Regularization methods such as ridge regression or sparse PLS can mitigate redundancy (6).
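The effect of ridge regularization on redundant features can be sketched with its closed-form solution. In the example below, two columns are exact duplicates (an extreme, synthetic case of overlapping information), so ordinary least squares has no unique solution, while ridge regression still returns stable coefficients. The penalty value and the data are invented for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data: (X'X + lam*I)^-1 X'y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    return np.linalg.solve(gram, Xc.T @ yc)

rng = np.random.default_rng(2)
signal = rng.normal(size=30)
# Two perfectly collinear features plus one unrelated feature.
X = np.column_stack([signal, signal, rng.normal(size=30)])
y = 3.0 * signal + 0.1 * rng.normal(size=30)

rank = np.linalg.matrix_rank(X - X.mean(axis=0))
print(rank)   # 2, so X'X is singular and least squares is not unique

coef = ridge_fit(X, y, lam=1.0)
print(coef)   # the duplicated columns receive equal weights
```

The penalty splits the effect evenly across the duplicated columns instead of letting the coefficients become arbitrarily large with opposite signs.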

4. Case Applications

Pharmaceuticals: IR for excipient content, ICP-MS for elemental impurities, fused to ensure drug safety (3).

Food Quality: NIR for macronutrients, UV–Vis for pigments, XRF for trace minerals (3,4).

Environmental Monitoring: Raman for organic pollutants, AAS for heavy metals (3).

5. Discussion and Future Research

The fusion of vibrational and atomic spectroscopy represents an important frontier in chemical data science. While early fusion is simple, it risks redundancy and scaling issues (6,7). Intermediate fusion offers powerful latent-variable modeling but can be complex to interpret (2). Late fusion maintains interpretability but may underutilize shared information (6).

Future research directions include:

Nonlinear Fusion: Kernel methods and deep learning to capture nonlinear cross-modal relationships (3).

Explainable AI: Developing interpretable neural networks that highlight spectral regions most responsible for predictions (3).

Transfer Learning: Applying models trained on one instrument or modality to another (1).

Hybrid Physical-Statistical Models: Incorporating spectroscopic theory into fusion models to improve interpretability (1).

The long-term vision is coherent multimodal spectroscopy, where measurements across different vibrational and atomic domains are seamlessly integrated into predictive digital twins for real-time chemical systems.

References

(1) Mark, H.; Workman, J. Chemometrics in Spectroscopy, Revised Second Edition, Elsevier/Academic Press, Burlington, MA, 2021. https://shop.elsevier.com/books/chemometrics-in-spectroscopy/mark/978-0-323-91164-1 (accessed 2025-09-05).

(2) Smilde, A. K.; Westerhuis, J. A.; de Jong, S. A Framework for Sequential Multiblock Component Methods. J. Chemom. 2003, 17 (6), 323–337. DOI: 10.1002/cem.811

(3) Lahat, D.; Adali, T.; Jutten, C. Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects. Proc. IEEE 2015, 103 (9), 1449–1477. DOI: 10.1109/JPROC.2015.2460697

(4) Næs, T.; Brockhoff, P. B.; Tomic, O. Statistics for Sensory and Consumer Science; John Wiley & Sons, 2010. https://www.wiley.com/en-us/Statistics+for+Sensory+and+Consumer+Science-p-9780470518212 (accessed 2025-09-05).

(5) Bro, R.; Smilde, A. K. Principal Component Analysis. Anal. Methods 2014, 6 (9), 2812–2831. DOI: 10.1039/C3AY41907J

(6) Balabin, R. M.; Smirnov, S. V. Variable Selection in Near-Infrared Spectroscopy: Benchmarking of Feature Selection Methods. Anal. Chim. Acta 2011, 692 (1–2), 63–72. DOI: 10.1016/j.aca.2011.03.006

(7) Wold, S.; Esbensen, K.; Geladi, P. Principal Component Analysis. Chemom. Intell. Lab. Syst. 1987, 2 (1–3), 37–52. DOI: 10.1016/0169-7439(87)80084-9

_ _ _

This article was partially constructed with the assistance of a generative AI model and has been carefully edited and reviewed for accuracy and clarity.
