Mastering Spectroscopy of Inhomogeneous Materials: Advanced Sampling Strategies to Solve the Heterogeneity Problem

Key Takeaways

  • Sample heterogeneity introduces spectral distortions, complicating spectroscopic analysis and model calibration, especially in quantitative applications.
  • Strategies like spectral preprocessing, localized sampling, and hyperspectral imaging help mitigate heterogeneity effects but lack universality.

This tutorial investigates the persistent issue of sample heterogeneity—chemical and physical—during spectroscopic analysis. Focus will be placed on understanding how spatial variation, surface texture, and particle interactions influence spectral features. Imaging spectroscopy, localized sampling strategies, and adaptive averaging algorithms will be reviewed as tools for managing this problem—one of the remaining unsolved problems in spectroscopy.

Abstract

Sample heterogeneity represents a fundamental obstacle in quantitative and qualitative spectroscopic analysis. Chemical and physical inhomogeneities—such as varying particle sizes, packing densities, surface textures, and spatial concentration gradients—can introduce significant variation in measured spectra. This tutorial explores the core challenges and modeling strategies used to understand and mitigate these effects. We present matrix-based formulations of spectral mixture models and correction strategies using data fusion, spatial averaging, and spectral preprocessing. Applications in imaging spectroscopy and localized sampling design are discussed in the context of recent research findings. The tutorial concludes with a discussion on future research directions, including adaptive sampling and uncertainty quantification.

1. Introduction

Spectroscopy is one of the most versatile and widely applied analytical tools in modern science and industry, used for identification, quantification, and characterization in fields ranging from pharmaceuticals and agriculture to polymers, food, forensics, and materials science. A major strength of spectroscopic methods—particularly vibrational techniques such as near-infrared (NIR), mid-infrared (MIR or IR), and Raman spectroscopy—is their ability to analyze samples nondestructively, rapidly, and with minimal preparation. However, a fundamental and still largely unresolved challenge in spectroscopic analysis is the issue of sample heterogeneity.

Sample heterogeneity refers to the spatial non-uniformity of a sample's composition or physical structure. It may arise from chemical inhomogeneity (for example, uneven distribution of analytes, impurities, or matrix components) or from physical variations (namely, differences in particle size, density, or surface texture). In real-world samples, particularly solids and powders, these forms of heterogeneity are more the rule than the exception.

This issue becomes especially critical in quantitative spectroscopic applications such as process analytical technology (PAT), quality control, or predictive modeling using chemometrics. Even small deviations in sample presentation or composition can lead to significant spectral variations that degrade calibration model performance by reducing prediction precision and accuracy, and by limiting model transferability between instruments or between sample batches (4).

Why is this an unsolved problem? Despite decades of research and numerous proposed corrections—such as spectral preprocessing, scatter correction, sampling strategies, and multivariate calibration modeling techniques—no universal or foolproof solution exists. Most techniques reduce the symptoms of heterogeneity rather than modeling or eliminating the cause. The complex, multidimensional nature of heterogeneity—spanning scales from microstructure to bulk properties—makes it difficult to characterize fully or account for in a generalized manner (1,2,4). Thus, the problem remains central to ongoing research in spectroscopy, chemometrics, optical sampling, and statistical sampling science.

In this tutorial, we examine the theoretical foundations, measurement challenges, and modeling approaches associated with sample heterogeneity by reviewing modern research publications. We then review correction strategies and identify emerging opportunities to mitigate these effects in practical spectroscopic workflows.

2. Understanding Sample Heterogeneity

Sample heterogeneity manifests in multiple dimensions—chemically, physically, and even optically. To fully understand and address this issue, it is useful to distinguish between two primary forms: chemical heterogeneity and physical heterogeneity. Each type introduces its own kind of spectral distortion, and both are pervasive across nearly all solid-state and particulate samples (2,3).

2.1 Chemical Heterogeneity

Chemical heterogeneity refers to the uneven distribution of molecular or elemental species throughout a measured sample. In many practical applications—such as the analysis of pharmaceutical tablets, powdered food or agricultural ingredients, geological samples, or composite polymers—the chemical components are not uniformly dispersed. This lack of homogeneity may arise from incomplete mixing, uneven crystallization, layering during manufacturing, or natural variation in raw materials (3).

From a spectroscopic perspective, the signal detected from a chemically heterogeneous sample is typically a composite spectrum, resulting from the superposition of the individual spectra of its constituents. A widely used mathematical approach for describing this scenario is the Linear Mixing Model (LMM):

    x(λ) = a₁s₁(λ) + a₂s₂(λ) + … + aₖsₖ(λ) + e(λ)

where x(λ) is the measured spectrum, s₁(λ) through sₖ(λ) are the spectra of the k pure components (endmembers), a₁ through aₖ are their fractional abundances, and e(λ) is a residual error term. In this formulation, each measured spectrum is considered a linear combination of endmember spectra. However, this model assumes linearity and non-interaction, which may not hold true in real systems. For instance, chemical interactions, band overlaps, or matrix effects can produce nonlinearities or violate additivity, complicating both interpretation and calibration (1).
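
To make the LMM concrete, the following is a minimal sketch of abundance estimation by non-negative least squares in Python; the Gaussian endmember spectra and the simulated mixture are synthetic placeholders, not data from the cited studies:

    import numpy as np
    from scipy.optimize import nnls

    # Three synthetic Gaussian "endmember" spectra over 100 NIR channels
    wavelengths = np.linspace(1100, 2500, 100)  # nm
    S = np.stack([np.exp(-((wavelengths - c) / 80.0) ** 2)
                  for c in (1450, 1940, 2100)])  # shape (3, 100)

    # Simulate a measured spectrum: a 50/30/20 mixture plus noise
    rng = np.random.default_rng(0)
    a_true = np.array([0.5, 0.3, 0.2])
    x = a_true @ S + rng.normal(0, 0.005, wavelengths.size)

    # Estimate abundances under non-negativity: minimize ||S.T a - x||, a >= 0
    a_est, _ = nnls(S.T, x)
    print("estimated abundances:", np.round(a_est, 3))

Here the non-negativity constraint encodes the physical requirement that abundances cannot be negative; a fully constrained unmixing would additionally force the abundances to sum to one.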

What makes chemical heterogeneity particularly problematic is that it often occurs on spatial scales smaller than the spectrometer’s measurement spot. This causes subpixel mixing in imaging applications or averaging effects in point measurements, leading to inaccurate estimates of concentration or identity—especially in high-stakes environments like pharmaceutical quality control or remote sensing (1,3).

2.2 Physical Heterogeneity

Physical heterogeneity encompasses differences in a sample’s morphology, surface properties, packing density, and internal structure that do not necessarily involve changes in chemical composition but nevertheless alter the measured spectrum.

Key sources of physical heterogeneity include:

  • Particle size and shape: Differences in particle size and shape alter light scattering, changing the effective pathlength and measured intensity. These effects follow various relationships, including Mie scattering and Kubelka–Munk (K–M) functions.
  • Surface roughness: Irregular surfaces produce variations in diffuse and specular reflection, affecting the resulting absorbance values.
  • Packing density: Voids and compressibility in the sample influence the optical density and the scattering pathlength of light.
  • Sample orientation: Especially in anisotropic (direction-dependent) materials, the angle of illumination and detection may alter the measured spectral intensity.

These physical attributes primarily introduce additive and multiplicative distortions in the spectra. One common way to model such distortions is through multiplicative scatter correction (MSC) (2):

    xᵢ(λ) = aᵢ + bᵢ·x_ref(λ) + eᵢ(λ)

where xᵢ(λ) is a measured spectrum, x_ref(λ) is a reference spectrum (typically the mean of the dataset), aᵢ and bᵢ are the additive and multiplicative coefficients estimated by least squares, and eᵢ(λ) is the residual. The corrected spectrum is then obtained as x_corr(λ) = (xᵢ(λ) − aᵢ)/bᵢ.
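A minimal MSC sketch in Python (numpy only; the function name and row-wise array layout are illustrative):

    import numpy as np

    def msc(spectra, reference=None):
        """Multiplicative scatter correction of row-wise spectra."""
        ref = spectra.mean(axis=0) if reference is None else reference
        corrected = np.empty_like(spectra, dtype=float)
        for i, x in enumerate(spectra):
            # Fit x = a + b * ref by least squares (polyfit returns [b, a])
            b, a = np.polyfit(ref, x, deg=1)
            corrected[i] = (x - a) / b  # remove offset, rescale by slope
        return corrected

Each spectrum is regressed against the mean spectrum, and the fitted offset and slope are then removed, mirroring the equation above.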
Physical heterogeneity is arguably even harder to control than chemical heterogeneity because it involves the interaction of light with material structure—something that is highly dependent on optical geometry, sample preparation, and even environmental factors such as humidity (4).

Moreover, while preprocessing methods like MSC, standard normal variate (SNV), and derivatives help correct for these effects to some degree, they rely on statistical assumptions and often lack direct physical interpretability. This limits their effectiveness in strongly scattering or optically complex sample systems (2,3).

Together, chemical and physical heterogeneity represent multi-faceted, system-specific, and scale-dependent challenges. As such, they continue to confound efforts to develop universal calibration models and robust prediction algorithms in spectroscopy. There is no single method that fully eliminates the influence of heterogeneity across all sample types, making it a core unsolved issue in the field (1–4).

3. Modeling and Correction Strategies

While sample heterogeneity remains a persistent obstacle in spectroscopy, several strategies—both empirical and model-based—have been developed to reduce its impact. These methods can be grouped into three main categories: spectral preprocessing, localized and adaptive sampling, and spatial-spectral data integration via imaging spectroscopy. Each plays a role in minimizing, compensating for, or explicitly modeling the variability introduced by heterogeneity.

3.1 Spectral Preprocessing

Spectral preprocessing techniques are often the first line of defense against unwanted variation in spectra arising from physical effects such as multiplicative scatter and baseline shifts. These techniques aim to transform raw spectral data in a way that emphasizes analyte-related information while suppressing irrelevant noise or spectral distortion.

Common preprocessing strategies include:

  • SNV: Each spectrum is centered and scaled individually to remove multiplicative and additive effects. SNV is especially useful when dealing with diffuse reflectance spectra from powdery or granular samples (2).
  • MSC: Each spectrum is adjusted using a linear regression against a reference spectrum (typically the mean of the dataset) to remove baseline offsets and multiplicative scatter. This method is closely linked to the physics of light scattering and has been effective in a range of heterogeneous samples (2,4).
  • Derivative Spectroscopy (Savitzky–Golay derivatives): By computing first or second derivatives of the spectra, broad baseline trends and constant offsets can be reduced. However, derivatives also amplify high-frequency noise, so they require smoothing filters (often empirically derived for each sample type) for robust use (3). A brief sketch of these preprocessing steps follows this list.
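
The following is a minimal sketch of the SNV and Savitzky–Golay steps in Python, assuming spectra stored row-wise in a numpy array; the window length and polynomial order are illustrative and should be tuned per application:

    import numpy as np
    from scipy.signal import savgol_filter

    def snv(spectra):
        """Standard normal variate: center and scale each spectrum individually."""
        mean = spectra.mean(axis=1, keepdims=True)
        std = spectra.std(axis=1, keepdims=True)
        return (spectra - mean) / std

    def sg_derivative(spectra, window=11, poly=2, deriv=1):
        """Savitzky-Golay smoothed derivative along the wavelength axis."""
        return savgol_filter(spectra, window_length=window, polyorder=poly,
                             deriv=deriv, axis=1)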

These preprocessing techniques are widely used in calibration workflows involving partial least squares (PLS) regression and other multivariate calibration models. Nevertheless, they are typically empirical—they correct data based on statistical patterns rather than explicit physical (first principles) modeling, and they may not fully account for complex, nonlinear scattering behaviors or inhomogeneities that vary at multiple spatial scales.

3.2 Localized Sampling and Adaptive Averaging

A key limitation of point-based spectroscopy is its sensitivity to sampling location, especially in inhomogeneous materials. To overcome this, strategies have been developed that rely on spatially distributed measurements across the sample surface or volume.

In localized sampling, spectra are collected from multiple points on the sample. The assumption is that by averaging across these spatial positions, the measurement will better represent the global composition of the sample. The average spectrum is given by:

    x̄(λ) = (1/N) [x₁(λ) + x₂(λ) + … + x_N(λ)]

where xᵢ(λ) is the spectrum measured at the i-th of N sampling positions.
This method reduces the impact of local variations, especially when heterogeneity exists at scales smaller than the measurement beam size. Studies have shown that increasing the number of sampling points significantly reduces calibration errors and increases reproducibility for near-infrared and Raman measurements of solid dosage forms and polymer films (3).
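
A small Python sketch, using synthetic point spectra, illustrates why this works: for uncorrelated local variation, the error of the averaged spectrum shrinks roughly as 1/√N:

    import numpy as np

    rng = np.random.default_rng(1)
    true_spectrum = np.sin(np.linspace(0, np.pi, 200))  # placeholder global spectrum

    def measure_average(n_points):
        """Simulate n_points local spectra with heterogeneity noise, then average."""
        local = true_spectrum + rng.normal(0, 0.05, (n_points, true_spectrum.size))
        return local.mean(axis=0)

    for n in (1, 4, 16, 64):
        reps = np.array([measure_average(n) for _ in range(200)])
        rmse = np.sqrt(((reps - true_spectrum) ** 2).mean())
        print(f"N = {n:3d}  RMSE vs. global spectrum = {rmse:.4f}")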

Adaptive sampling takes this concept further by dynamically guiding where to measure next based on real-time spectral variance or predefined heuristics. For instance, variance-based selection may focus on regions of high spectral contrast, while machine-learning-guided adaptive sampling uses active learning models to minimize uncertainty with the fewest measurements.
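
As a schematic illustration of variance-guided selection, the following hedged Python sketch chooses each new measurement position near the spectrum that deviates most from the running mean; the acquire function is a hypothetical stand-in for an instrument call:

    import numpy as np

    rng = np.random.default_rng(2)
    grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)

    def acquire(pos):
        """Hypothetical instrument call: returns a spectrum measured at pos."""
        drift = 1 + 0.1 * pos[0]  # synthetic spatial heterogeneity
        return drift * np.sin(np.linspace(0, np.pi, 50)) + rng.normal(0, 0.01, 50)

    measured = list(rng.choice(len(grid), size=5, replace=False))  # random seeds
    spectra = [acquire(grid[i]) for i in measured]

    for _ in range(15):
        stack = np.array(spectra)
        # Find the measured point whose spectrum deviates most from the mean
        deviation = np.linalg.norm(stack - stack.mean(axis=0), axis=1)
        focus = grid[measured[int(deviation.argmax())]]
        # Measure next at the unmeasured grid point nearest that region
        remaining = [i for i in range(len(grid)) if i not in measured]
        nxt = min(remaining, key=lambda i: np.linalg.norm(grid[i] - focus))
        measured.append(nxt)
        spectra.append(acquire(grid[nxt]))

    print(f"measured {len(measured)} of {len(grid)} positions")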

In essence, both strategies aim to combat spatially structured heterogeneity, allowing more representative sampling without physically homogenizing the sample. This is especially valuable for layered materials, nonuniform blends, and process-line applications where sample presentation cannot be easily controlled.

3.3 Imaging and Hyperspectral Strategies

One of the most powerful tools for analyzing heterogeneous samples is hyperspectral imaging (HSI), which combines the spatial resolving power of imaging with the chemical sensitivity of spectroscopy. An HSI system produces a three-dimensional data cube,

    D(x, y, λ)

in which every spatial pixel (x, y) carries a full spectrum across the wavelengths λ. For chemometric analysis, the cube is typically unfolded (reshaped) into a two-dimensional matrix of pixels × wavelengths (a short sketch of this step follows the list below).

This reshaped dataset can then be analyzed using chemometric techniques such as:

  • Principal Component Analysis (PCA): to reduce dimensionality and visualize major sources of variation,
  • Independent Component Analysis (ICA): to separate mixed signals into statistically independent sources,
  • Spectral Unmixing or Endmember Extraction: to identify pure component spectra and their fractional abundances at each pixel (1).
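
A minimal Python sketch of the unfold-then-analyze step, using a random array as a stand-in for a real HSI cube and scikit-learn's PCA:

    import numpy as np
    from sklearn.decomposition import PCA

    # Stand-in for an HSI cube: 64 x 64 pixels, 120 wavelength channels
    cube = np.random.default_rng(3).random((64, 64, 120))
    nx, ny, n_wl = cube.shape

    # Unfold the cube into a (pixels x wavelengths) matrix for chemometrics
    X = cube.reshape(nx * ny, n_wl)

    # Project pixel spectra onto the three main directions of variation
    scores = PCA(n_components=3).fit_transform(X)

    # Fold the scores back into images to map spatial heterogeneity
    score_maps = scores.reshape(nx, ny, 3)
    print(score_maps.shape)  # (64, 64, 3)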

Hyperspectral imaging has shown success in modeling endmember variability—the spectral variability of the same chemical component under different physical or environmental conditions (1). Additionally, hyperspectral cameras combined with multivariate algorithms have been deployed in real-time quality control, identifying physical heterogeneities that would otherwise go undetected by single-point spectrometers.

However, imaging comes with trade-offs: increased data volume, slower acquisition speed, and greater computational demand. Despite these challenges, HSI continues to grow as a central technique for dealing with inhomogeneous systems across scientific and industrial domains.

4. Discussion and Future Research

Despite significant progress, sample heterogeneity remains a central and unresolved problem in analytical spectroscopy. The heterogeneity of materials—whether chemical or physical—introduces spectral complexity that interferes with model building, reduces predictive accuracy, and complicates transferability across instruments and sample batches (1–4). This challenge is not merely technical but foundational, as it stems from the inherent disconnect between the scale of spectroscopic measurements and the spatial complexity of real-world materials.

Future research directions should target not only improved data corrections but also fundamental changes in how we acquire, process, and interpret spectroscopic data from heterogeneous samples. Key emerging directions include:

  1. Real-Time Feedback-Controlled Sampling
    Systems that can analyze the initial spectra in real time and adjust sampling parameters (for example, location, averaging, measurement geometry) could help reduce variance on the fly. These approaches will rely heavily on embedded chemometric algorithms and rapid preprocessing steps (3).
  2. Multi-Modal Data Fusion
    Combining spectroscopy with complementary measurement modalities—such as X-ray imaging, terahertz imaging, or optical coherence tomography—could improve heterogeneity characterization. Spectroscopic imaging combined with topographic mapping, for instance, would enable simultaneous modeling of surface and chemical effects.
  3. Physics-Informed Machine Learning and Deep Learning
    Most current deep learning applications in spectroscopy focus on classification or regression but do not explicitly model heterogeneity. Integrating physical priors (for example, scatter theory, sample geometry) into neural network architectures could lead to more robust models that generalize across variable sample presentations.
  4. Uncertainty Quantification (UQ) and Explainability
    Given that heterogeneity introduces variability in predictions, quantifying the uncertainty associated with each prediction becomes essential for decision-making. Bayesian regression models, ensemble methods, and conformal prediction techniques offer potentially useful paths forward (1,2); a brief ensemble sketch follows this list.
  5. Standardized Benchmarking for Heterogeneity Sensitivity
    As a field, spectroscopy lacks a common framework to benchmark and compare methods for handling heterogeneity. Developing standard test materials and protocols that simulate common types of inhomogeneity would greatly accelerate research, method development, and validation improvements.
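
As one hedged illustration of point 4, the following Python sketch builds a bootstrap ensemble of PLS models with scikit-learn and uses the spread across models as a rough per-sample uncertainty estimate; the spectra and reference values are synthetic:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(4)
    X = rng.random((100, 50))                            # synthetic spectra
    y = X[:, :5].sum(axis=1) + rng.normal(0, 0.05, 100)  # synthetic reference values

    # Bootstrap ensemble of PLS models
    preds = []
    for _ in range(50):
        idx = rng.integers(0, len(X), len(X))
        model = PLSRegression(n_components=5).fit(X[idx], y[idx])
        preds.append(model.predict(X).ravel())
    preds = np.array(preds)

    # Per-sample mean prediction and ensemble standard deviation
    y_hat, y_std = preds.mean(axis=0), preds.std(axis=0)
    print(f"mean predictive std: {y_std.mean():.4f}")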

Mitigating the effects of sample heterogeneity will require a synergistic approach that blends physical modeling, advanced instrumentation, chemometric algorithms, and machine learning. Progress will be iterative and application-specific, but the need for robust, interpretable, and transferable solutions is universal.

5. Bridging Sampling Heterogeneity and Light Transport Models

The persistent challenge of sampling inhomogeneous materials in spectroscopy—outlined in this tutorial—is fundamentally rooted in the optical physics of light transport through turbid, multi-phase, and spatially complex samples. This insight directly connects with the foundational concepts presented in Tutorial #1*, which focused on modeling diffuse reflectance and light-matter interactions in scattering media.

In Tutorial #1 (Unsolved Problems in Spectroscopy #1*), we explored how radiative transfer theory (RTT), Kubelka–Munk (K–M) theory, and Monte Carlo simulations model the propagation of light in optically thick, scattering-dominated media. These models show that the effective sampling depth, photon pathlength, and signal origin are all probabilistic functions of sample structure and wavelength. The key outcome is that optical sampling volumes are neither homogeneous nor constant, but vary based on factors such as:

  • the local scattering and absorption properties of the material,
  • particle size, packing, and microstructure,
  • the measurement wavelength, and
  • the optical geometry of illumination and detection.

This physical variability in photon interaction volume is a primary source of measurement heterogeneity—distinct from compositional differences alone.

In this Tutorial #4, we address this heterogeneity from a data-centric and algorithmic perspective: how to preprocess, average, or model data to compensate for unpredictable sample properties. Together, the two tutorials underscore that heterogeneity is both a physical and computational problem, requiring dual solutions from both domains.

5.1 Unified View: Physical-Chemometric Models

To meaningfully resolve the sampling issue, one could integrate physical models of light propagation with chemometric modeling—a paradigm often termed hybrid modeling or physics-informed chemometrics (7). Rather than preprocessing away scatter effects, these models explicitly include them in calibration equations. For example, the measured spectrum can be written as the chemical absorption signal scaled by a physically motivated pathlength term, plus explicit scatter contributions:

    xᵢ(λ) ≈ bᵢ(λ)·[c₁k₁(λ) + … + c_J k_J(λ)] + aᵢ(λ)

where kⱼ(λ) are component absorption spectra, cⱼ their concentrations, and aᵢ(λ) and bᵢ(λ) are additive and multiplicative terms informed by a light transport model.
Such models may incorporate empirical corrections like extended multiplicative scatter correction (EMSC), or use Monte Carlo-derived photon distributions as part of data interpretation. This approach improves model transferability, especially when spectra are acquired across different sampling geometries, containers, or optical interfaces (6,7).
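
A minimal EMSC sketch in Python (numpy only) illustrates the idea: each row-wise spectrum is regressed on a design matrix holding a constant offset, linear and quadratic wavelength terms, and a reference spectrum, and the non-chemical terms are then removed. This basic polynomial variant omits the Monte Carlo-derived terms mentioned above:

    import numpy as np

    def emsc(spectra, wavelengths, reference=None):
        """Extended multiplicative scatter correction (basic polynomial variant)."""
        ref = spectra.mean(axis=0) if reference is None else reference
        wl = (wavelengths - wavelengths.mean()) / wavelengths.std()  # scaled axis
        # Columns: constant offset, linear/quadratic baseline, reference spectrum
        M = np.column_stack([np.ones_like(wl), wl, wl ** 2, ref])
        corrected = np.empty_like(spectra, dtype=float)
        for i, x in enumerate(spectra):
            coef, *_ = np.linalg.lstsq(M, x, rcond=None)
            a, d1, d2, b = coef
            # Remove additive and baseline terms, rescale by multiplicative term
            corrected[i] = (x - a - d1 * wl - d2 * wl ** 2) / b
        return corrected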

5.2 Implications for Instrument Design and AI Integration

Linking Tutorials #1 and #4 also reveals new directions for instrument design and machine learning applications in spectroscopy. Instruments that can control or measure scattering coefficients in real time—via dual-wavelength probes, spatially offset Raman spectroscopy (SORS), or angular-resolved detection—could dynamically adjust acquisition parameters or trigger adaptive calibration paths.

Moreover, deep learning models trained on synthetic data generated from light transport simulations (that is, physics-based data augmentation) may become robust to variations caused by sample heterogeneity, much like adversarial training in image recognition.
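
A hedged sketch of such physics-based augmentation in Python: random offsets, multiplicative gains, and smooth baseline tilts (loosely mimicking the additive and multiplicative terms discussed in Sections 2 and 3) are applied to clean training spectra so that a downstream model is exposed to heterogeneity-like variation:

    import numpy as np

    def augment(spectra, n_copies=10, rng=None):
        """Replicate row-wise spectra with random scatter-like distortions."""
        rng = np.random.default_rng() if rng is None else rng
        n, p = spectra.shape
        wl = np.linspace(-1, 1, p)
        out = []
        for _ in range(n_copies):
            gain = rng.uniform(0.8, 1.2, (n, 1))     # multiplicative scatter
            offset = rng.normal(0, 0.02, (n, 1))     # additive offset
            tilt = rng.normal(0, 0.01, (n, 1)) * wl  # smooth baseline tilt
            out.append(gain * spectra + offset + tilt)
        return np.vstack(out)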

5.3 Future Integration Paths

To fully address the challenge of heterogeneous sampling, the spectroscopy community might embrace a hybrid physical-informational model—one that fuses:

  • Optical theory (RTT, K–M, Mie scattering),
  • Empirical measurements (HSI, point sampling, averaging),
  • Algorithmic modeling (PLS, PCA, deep learning).

Such integration is not only theoretically sound but increasingly practical, thanks to advances in computational power, open-source modeling tools, and multimodal instruments. As Tutorials #1 and #4 show, combining physical understanding with data-driven modeling offers a path toward systematically resolving the unsolved problem of sampling in spectroscopic analysis of heterogeneous materials.

References

*Note that Tutorial #1 (Unsolved Problems in Spectroscopy #1) is found at: Workman, J. Jr. Toward a Generalizable Model of Diffuse Reflectance in Particulate Systems. Spectroscopy Article, June 30, 2025. DOI: 10.56530/spectroscopy.sj3986i3

_ _ _

(1) Somers, B.; Asner, G. P.; Tits, L.; Coppin, P. Endmember Variability in Spectral Mixture Analysis: A Review. Remote Sens. Environ. 2011, 115 (7), 1603–1616. DOI: 10.1016/j.rse.2011.03.003

(2) Jin, J. W.; Chen, Z. P.; Li, L. M.; Steponavicius, R.; Thennadil, S. N.; Yang, J.; Yu, R. Q. Quantitative Spectroscopic Analysis of Heterogeneous Mixtures: The Correction of Multiplicative Effects Caused by Variations in Physical Properties of Samples. Anal. Chem. 2012, 84 (1), 320–326. DOI: 10.1021/ac202598f

(3) Ortega-Zuñiga, C.; Reyes-Maldonado, K.; Méndez, R.; Romañach, R. J. Study of Near Infrared Chemometric Models with Low Heterogeneity Films: The Role of Optical Sampling and Spectral Preprocessing on Partial Least Squares Errors. J. Near Infrared Spectrosc. 2017, 25 (2), 103–115. DOI: 10.1177/0967033516686653

(4) Mark, H.; Workman, J. Effect of Repack on Calibrations Produced for Near-Infrared Reflectance Analysis. Anal. Chem. 1986, 58 (7), 1454–1459. DOI: 10.1021/ac00298a041

(5) Mourant, J. R.; Canpolat, M.; Brocker, C.; Esponda-Ramos, O.; Johnson, T. M.; Matanock, A.; Stetter, K.; Freyer, J. P. Light Scattering from Cells: The Contribution of the Nucleus and the Effects of Proliferative Status. J. Biomed. Opt. 2000, 5 (2), 131–137. DOI: 10.1117/1.429979

(6) Jacques, S. L. Optical Properties of Biological Tissues: A Review. Phys. Med. Biol. 2013, 58 (11), R37. DOI: 10.1088/0031-9155/58/11/R37

(7) Esmonde-White, K. A.; Cuellar, M.; Uerpmann, C.; Lenain, B.; Lewis, I. R. Raman Spectroscopy as a Process Analytical Technology for Pharmaceutical Manufacturing and Bioprocessing. Anal. Bioanal. Chem. 2017, 409, 637–649. DOI: 10.1007/s00216-016-9824-1

_ _ _

This article was partially constructed with the assistance of a generative AI model and has been carefully edited and reviewed for accuracy and clarity.
