OR WAIT 15 SECS
In celebration of Spectroscopy’s 35th Anniversary, leading spectroscopists discuss important issues and challenges in analytical spectroscopy.
Advances in instrumentation and data analysis suggest that surface enhanced Raman scattering (SERS) is a powerful tool to characterize complex samples. Since the initial experiments and subsequent realization that Raman signals are significantly enhanced on the surface of plasmonic, structured, metal surfaces, surface enhanced Raman scattering (SERS) has evolved into an ultrasensitive method for chemical analysis (1–3). Indeed, understanding of how light is concentrated and how the local fields are enhanced and re-radiated by plasmonic structures has driven SERS research forward (4). Advances in SERS substrates, now commercially available from a number of sources, and improved sampling strategies, suggest that SERS is poised to be potent analytical technique. Indeed, it is now relatively straightforward to produce a SERS signal, and with this greater understanding of signal enhancements, increased attention is being given to applications using SERS.
However, since the early days of SERS, one of the more befuddling aspects of SERS analysis has been that the SERS spectrum observed from a molecule is not always the same as the spontaneous Raman spectrum. The origin of these spectral differences has been attributed to surface selection rules, charge-transfer effects, photo-reactions, and other phenomena (5–8). Indeed, these signal variations have raised suggestions that libraries of SERS spectra are needed to properly identify analytes in mixtures. This need for libraries is complicated by observations that the SERS signal from a sample can vary depending on common variables like the substrate or laser excitation wavelength.
Although the identity of an unknown molecule, or molecules, in a complex system (such as human biofluids) that give rise to a SERS signal can be elusive, new advances in computing are enabling SERS analysis of complex systems through machine learning and related algorithms (9). The detected bands in a SERS spectrum provide an information-rich signal that can be analyzed with computer algorithms, even if an analyst does not know the molecular origin of the signal. Algorithms include neural networks, random forest, and artificial intelligence algorithms, as well as multivariate regression and other chemometric routines. With a rise in companies producing commercial software and the availability of web-based shareware routines, the activation barrier to these analyses methods continues to decrease.
The underlying principle of machine learning is that given enough data, a computer can use small fluctuations and patterns to classify the outcome. The most common approach is to use a training set‚ samples of known classes to generate a neural network or other machine learning model that will properly capture the variance in the sample data and predict to what class an unknown sample belongs. As the cost of computing power decreases, more-sophisticated methods are accessible to analyze data. The clear advantage of machine learning is that the chemically rich SERS signal, where each vibrational mode detected acts as a variable, can classify the complex sample signals as arising from different specific classes, such as healthy or diseased cells (Figure 1). As the instrumentation and methods using SERS increase the ability to generate large amounts of SERS data quickly, these analyses provide an incredibly powerful way to utilize the chemical information in the SERS spectra for diagnostic assays and to monitor chemical changes. With increased training sets, there is potential to refine such classes further, such as to determine which treatment will likely be most effective for a disease.
The downside to machine learning algorithms is that the models rely on appropriate training data and, for the most part, can only classify a sample into classes that have been predetermined. Both supervised and unsupervised methods exist, but all models require validation against data sets with known classifications. A further downside to these methods is the loss of information. Although the variance in the SERS data arises from molecular differences, the identity of these molecules can be difficult to recover. Returning to the biofluid example, the machine learning algorithm may classify a sample correctly; however, information, such as a specific upregulated biomolecule to be targeted for therapy, is lost. Complementary methods are often necessary to further assess the disease.
In a recent study we were able to perform SERS detection following a high-performance liquid chromatography (HPLC) separation of a mouse tumor cell lysate and show that the molecular composition significantly changed between healthy and various types of cancerous cells (10). The molecules with consistent increases in concentration generated patterns that appear to be characteristic of the sampled cells. These patterns can be readily differentiated by machine learning algorithms. Again, although the identity of the molecules detected by HPLC are unknown, computer models can use the patterns in the observed spectra to track and diagnose disease. Additionally, analysis of the underlying data may indicate key biomolecules important in the disease, without knowing the identity of the actual molecules. In such cases where knowing the identity of a specific molecule is not necessary, the ability to rapidly generate SERS spectra and analyze them with machine learning makes it possible to capture the chemical information in SERS data to enable informative characterization.
Zachary D. Schultz is an associate professor at The Ohio State University. Direct correspondence to firstname.lastname@example.org