Describing Their Two-Step Neural Model: An Interview with Ayanjeet Ghosh and Rohit Bhargava

In the second part of this three-part interview, Ayanjeet Ghosh of the University of Alabama and Rohit Bhargava of the University of Illinois Urbana-Champaign discuss how machine learning (ML) is used in data analysis and go into more detail about the model they developed in their study.

A recent study in Applied Spectroscopy introduced a two-step regressive neural network model that improves discrete frequency infrared (IR) imaging for biomedical use, especially in studying protein structures in tissues affected by neurodegenerative diseases (1). Unlike traditional methods like principal component analysis (PCA), which are less interpretable and require dense spectral data, this model uses only seven wavenumbers to accurately reconstruct high-resolution spectra and predict structural features (1). As a result, it significantly accelerates both data acquisition and analysis, offering a more efficient and scalable solution for IR imaging in biomedical research (1).

Two of the authors of this study, Ayanjeet Ghosh, who is a professor in the Department of Chemistry and Biochemistry at the University of Alabama, and Rohit Bhargava, who is a professor in the Department of Bioengineering at the University of Illinois, Urbana-Champaign, recently sat down with Spectroscopy to discuss their findings (2,3).

DNA samples in test tubes with glowing neon lights, representing genetic research and scientific exploration. Generated with AI. | Image Credit: © Asawin - stock.adobe.com

How is machine learning (ML) used for data analysis, and why is principal component analysis (PCA) insufficient for analyzing sparsely sampled discrete frequency IR data, especially in biomedical applications?

ML approaches have been widely used for chemical imaging, specifically for both Fourier transform infrared (FT-IR) and discrete frequency IR (DFIR) microscopies. In these methods, spectral parameters such as intensities and frequencies have been leveraged to distinguish between different disease states, such as early-stage vs. metastatic cancer, or to identify chemical signatures underlying pathological markers, such as the composition of protein aggregates in neurodegenerative diseases. PCA is a dimensionality reduction technique typically used in conjunction with FT-IR imaging. DFIR does not require tools like PCA because it already provides only the spectral data relevant to the chemical characterization of a given specimen; the sparse spectral sampling in DFIR makes dimensionality reduction unnecessary.

Could you walk us through the architecture and design of the two-step regressive neural network model you developed? How does it address the challenges of curve fitting at scale?

Our two-step neural network is designed to perform the two key steps necessary for quantifying protein secondary structures from discrete frequency data. First, it reconstructs the full spectrum from seven wavenumbers. It then predicts the areas under the curve (AUCs) of the underlying spectral components for structural quantification, a task that is typically done using band fitting.

  • Step 1 (ANN1): Upsamples 7-point sparse spectra → 41-point full spectra of the amide I region, using three hidden layers (16, 32, 64 neurons) with ReLU activation.
  • Step 2 (ANN2): Predicts 3 Gaussian AUCs from the upsampled spectra, using two hidden layers (16, 9 neurons) with SELU activation.
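The layer sizes listed above can be made concrete with a small forward-pass sketch. This is an illustration only, not the authors' implementation: it assumes fully connected layers with biases and linear output layers, uses random untrained weights, and the helper names (`init_layers`, `forward`, `count_params`) are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def selu(x):
    # Standard SELU constants (Klambauer et al.).
    alpha = 1.6732632423543772
    scale = 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def init_layers(sizes, rng):
    # Assumption: dense layers with biases; weights here are random, not trained.
    return [(rng.normal(0.0, np.sqrt(1.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers, activation):
    # Hidden layers use the given activation; the output layer is linear (assumption).
    for W, b in layers[:-1]:
        x = activation(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

def count_params(layers):
    return sum(W.size + b.size for W, b in layers)

rng = np.random.default_rng(0)
ann1 = init_layers([7, 16, 32, 64, 41], rng)   # 7 bands -> 41-point spectrum
ann2 = init_layers([41, 16, 9, 3], rng)        # 41-point spectrum -> 3 AUCs

pixels = rng.random((1000, 7))                 # stand-in for 1000 sparse pixel spectra
full_spectra = forward(pixels, ann1, relu)     # shape (1000, 41)
aucs = forward(full_spectra, ann2, selu)       # shape (1000, 3)
print(full_spectra.shape, aucs.shape, count_params(ann1), count_params(ann2))
```

Under these assumptions both networks are small, and every pixel in an image passes through them as a few batched matrix multiplications rather than an iterative fit.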

Our approach is ~3000x faster than Gaussian fitting (a figure specific to the computational resources available to us at the time), which is particularly relevant for large images with more than 1 million pixels.
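For context on what the second network replaces, conventional band fitting can be written out as follows. This is a sketch that assumes each of the three components is a Gaussian with amplitude $A_k$, center $\tilde{\nu}_k$, and width $\sigma_k$; these symbols are introduced here for illustration and are not taken from the study.

```latex
S(\tilde{\nu}) \approx \sum_{k=1}^{3} A_k
  \exp\!\left(-\frac{(\tilde{\nu}-\tilde{\nu}_k)^2}{2\sigma_k^2}\right),
\qquad
\mathrm{AUC}_k = \int_{-\infty}^{\infty} A_k
  \exp\!\left(-\frac{(\tilde{\nu}-\tilde{\nu}_k)^2}{2\sigma_k^2}\right) d\tilde{\nu}
  = A_k\,\sigma_k\sqrt{2\pi}.
```

With this parameterization, a fit estimates nine parameters per pixel by iterative least squares, whereas a trained network returns the three AUCs in a single forward pass.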

Your model requires only seven wavenumbers to generate high-resolution spectral predictions—how did you determine the optimal spectral frequencies to sample, and how generalizable is this selection across tissue types or applications?

Our goal was to use the minimum number of spectral bands to reconstruct full-resolution spectra. We trained our models on largely simulated spectral data composed of three components, representative of the most common secondary structural elements in proteins. The number of bands was chosen empirically, based on performance tests comparing the mean absolute error (MAE) of the model against the band count. The specific wavenumbers chosen were not necessarily tied to specific structures but were selected to best reconstruct the spectrum. We found that our model performance was slightly better with hand-picked bands than with bands uniformly spaced across the amide I range.
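The MAE-versus-band-count test described here can be sketched on toy data. Everything specific in this example is an assumption made for illustration: the 1600 to 1700 cm⁻¹ grid, the Gaussian centers and widths, and above all the use of linear interpolation as a stand-in for the trained network, so the numbers it produces say nothing about the actual model. It compares only uniformly spaced band sets of different sizes.

```python
import numpy as np

# Assumed 41-point grid over an illustrative amide I window (cm^-1).
grid = np.linspace(1600.0, 1700.0, 41)

def simulate_spectra(n, rng):
    # Three Gaussian components per spectrum; all parameter ranges are illustrative.
    centers = np.array([1630.0, 1655.0, 1680.0]) + rng.normal(0.0, 2.0, (n, 3))
    widths = rng.uniform(8.0, 15.0, (n, 3))
    amps = rng.uniform(0.1, 1.0, (n, 3))
    z = (grid[None, None, :] - centers[:, :, None]) / widths[:, :, None]
    return (amps[:, :, None] * np.exp(-0.5 * z ** 2)).sum(axis=1)

def reconstruction_mae(spectra, band_idx):
    # Stand-in reconstructor: linear interpolation from the sampled bands.
    recon = np.stack([np.interp(grid, grid[band_idx], s[band_idx]) for s in spectra])
    return float(np.mean(np.abs(recon - spectra)))

rng = np.random.default_rng(0)
spectra = simulate_spectra(200, rng)

mae_by_count = {}
for k in (3, 5, 7, 9, 41):
    # Uniformly spaced band indices, always including both endpoints.
    idx = np.unique(np.round(np.linspace(0, 40, k)).astype(int))
    mae_by_count[k] = reconstruction_mae(spectra, idx)

for k, mae in mae_by_count.items():
    print(f"{k:2d} bands: MAE = {mae:.4f}")
```

Plotting such a curve and looking for the point where additional bands stop reducing the error is one way to pick a band count empirically; testing hand-picked index sets against the uniform ones would use the same `reconstruction_mae` call.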

The data chosen to train the models was designed to capture the possible variations of the amide I IR spectra as typically observed in biological specimens. Hence, this model should be generalizable across different tissue types. We have recently verified this by comparing the model output with band fitting for breast cancer tissue biopsies. However, retraining of the models may be necessary for specific applications where the spectra are known to be composed of additional structural components.

References

  1. Edmonds, H.; Mukherjee, S. S.; Holcombe, B.; et al. Quantification of Protein Secondary Structures from Discrete Frequency Infrared Images Using Machine Learning. Appl. Spectrosc. 2025, ASAP. DOI: 10.1177/00037028251325553
  2. The University of Alabama, Ayanjeet Ghosh. UA.edu. Available at: https://chemistry.ua.edu/people/dr-ayanjeet-ghosh/ (accessed 2025-05-01).
  3. University of Illinois, Urbana-Champaign, Rohit Bhargava. Illinois.edu. Available at: https://bioengineering.illinois.edu/people/rxb (accessed 2025-05-01).