A Simple Introduction to Raman Spectral Identification of Organic Materials

https://doi.org/10.56530/spectroscopy.ol8176c6

Innovative database search technology can help Raman spectroscopists identify molecular vibrations; here, we show how to use these tools more effectively.

By now, it is well known that Raman spectroscopy has the potential to be used for a variety of analytical applications, and that the hurdles to using the technique 30–40 years ago have been largely overcome. However, a remaining impediment to exploiting the technology is that very few analysts are able to interpret the spectra they record. Even for those with experience in infrared (IR) spectroscopy, which probes similar molecular vibrations, making Raman measurements and interpreting the results will be quite different. There are databases and searching programs that can aid the analyst, but using these resources effectively also involves a learning curve. In this article, I will try to provide some advice for dealing with those situations where the searching program does not provide a definitive result.

When using a database searching program (1–4), you need to keep in mind that the quality of the result will depend on whether spectra of your compounds are in the database. It will also depend on the quality of the spectra; in comparison to IR spectra, Raman spectra tend to be noisier. However, when a problem can only be solved with Raman spectroscopy, we need to know how to extract the required information. Because the time available to an analyst is limited by multiple pressures in the industrial environment, I thought it would be useful to summarize some of the things I have been learning in the last few weeks. I have been analyzing spectra of materials leached from polymers, as a follow-up to the work that I did two years ago on extractables and leachables (5). A previous column (6) discussed an earlier foray into database searching. In this column, I am going to be more specific in differentiating classes of organic molecules.

Optimal Cases Where the Library Searching Provides an Answer

The first example that I will show is the result from a deposit leached into water at 100 °C from a polymer container. Figure 1 shows our spectrum (in black) overlaid with the match from the database (in red). Inspection of the figure indicates a near-perfect match, except perhaps for the relative intensities. Note that the hit quality index (HQI) is 81: good, but not fabulous. Remember that the HQI is based on calculating the correlation coefficient (that is, the dot product of the query spectrum with the library spectrum). In this case, because the relative intensities do not match, the value for the HQI is somewhat degraded, which is why the user needs to apply human judgment to the results. After deciding that this result is a likely fit, one question needs to be asked: does it make sense chemically? In this case the match is a metal oxide, which one would not expect to see in a polymer sample unless there was residual metal oxide from the polymerization process that was carried with the polymer, or perhaps some other source of contamination.
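To make the effect of mismatched relative intensities concrete, here is a minimal Python sketch of an HQI-style score. It assumes the score is the Pearson correlation coefficient between query and library intensities on a shared wavenumber axis, scaled to 100; commercial programs may define and scale the HQI differently. The function names `band` and `hqi` and the synthetic two-band spectra are illustrative assumptions and are not taken from Figure 1.

```python
import math

def band(axis, center, width, height):
    """Gaussian-shaped band sampled on the wavenumber axis (synthetic data only)."""
    return [height * math.exp(-((x - center) / width) ** 2) for x in axis]

def hqi(query, library):
    """HQI-style score: 100 x Pearson correlation of two spectra on the same axis.

    Assumed definition for illustration; not a specific vendor's formula.
    """
    if len(query) != len(library):
        raise ValueError("spectra must be sampled on the same wavenumber axis")
    n = len(query)
    mean_q = sum(query) / n
    mean_l = sum(library) / n
    dq = [q - mean_q for q in query]
    dl = [l - mean_l for l in library]
    dot = sum(a * b for a, b in zip(dq, dl))
    norm = math.sqrt(sum(a * a for a in dq) * sum(b * b for b in dl))
    return 100.0 * dot / norm if norm > 0 else 0.0

axis = list(range(200, 1801, 2))  # hypothetical wavenumber axis, cm^-1
library = [a + b for a, b in zip(band(axis, 500, 10, 1.0), band(axis, 1200, 10, 1.0))]
# Same band positions and widths, but the second band is much weaker in the query.
query = [a + b for a, b in zip(band(axis, 500, 10, 1.0), band(axis, 1200, 10, 0.3))]

score_same = hqi(library, library)
score_query = hqi(query, library)
print(f"identical spectra: {score_same:.1f}")
print(f"same bands, different relative intensities: {score_query:.1f}")
```

Every band in the synthetic query sits at the right position, yet the score falls below that of a perfect match purely because of the intensity mismatch. That is the situation where visual inspection of the overlay is worth more than the number.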

The next example is a deposit collected from a solution taken from a polypropylene (PP) container. Figure 2a shows our spectrum of the deposit as well as the spectrum of the PP container. You can see that the bands do overlap, but there is a significant difference in the bandwidths and relative intensities of some of the sharp bands. Figure 2b shows the results of the database search; we have included the first and the 13th hits, which are, respectively, ethylene propylene ethylidene norbornene and amorphous PP (aPP). The respective HQI values are 83 and 80, which do not differ by much. But because I knew that the composition of the container was PP, I felt that aPP was the more likely choice, and some regions of the spectrum confirm that. However, keep in mind that the database may not always be 100% accurate. The best practice is to combine what you know about your sample and its spectrum with what the database tells you.

The next spectrum that I show here has a background of carbonaceous material. Spectra of organic unknowns that come from contamination or some kind of degraded species often have the carbon signature superimposed on the spectrum of the unknown. It is important for the analyst to recognize this signature, or many errors can occur; however, doing so is fraught with difficulty because the carbon Raman spectrum is quite variable. The carbons are related to graphite, which has a strong, reasonably sharp band called the G band at 1583 cm^-1. When the graphite is disordered, a second band appears somewhere in the mid-1300 cm^-1 region; I am purposely imprecise about its position because the center of the band depends on the excitation wavelength. This band is called the D band, where the D has over the years meant disorder or diamond-like. In fact, the D band is not allowed in the strictest sense of solid-state physics, but it appears because of a double resonance with electronic transitions when the crystals are small or there is a defect, a phenomenon that is unique to graphite. Reich and Thomsen explained this phenomenon in great detail (7). The correspondence between the non-zero phonon wavevector and the two electronic transitions (incident and scattered photon) leads to the prediction that the phonon frequency will depend on the excitation wavelength, which is what is observed.

While doing analytical spectroscopy of chemical materials, it is not necessary to understand all of this information, but you must be aware of it because the presence of a carbon background will definitely have an impact on the ability of a database to match your spectra. You might think that it should come up with at least a two-part mixture of the carbon and the target analyte. But once you realize that the carbon spectrum first depends on the excitation wavelength, and then that the relative intensity and widths of the D and G bands depend on the physical state of the sample, you can realize that a database may have a difficult time finding the precise carbon spectrum to fit your spectrum.
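The following Python sketch illustrates why a two-part mixture fit can struggle. It fits a synthetic "measured" spectrum as a linear combination of an analyte reference and a carbon reference by ordinary least squares, once with a carbon reference that has the same D and G band shapes as the sample and once with one that does not. All band positions other than the G band at 1583 cm^-1, and all widths, heights, and function names, are hypothetical values chosen for illustration. The linear two-component model is an assumption as well, not a description of how any commercial search program handles mixtures.

```python
import math

def band(axis, center, width, height=1.0):
    """Gaussian-shaped band on the wavenumber axis (synthetic data only)."""
    return [height * math.exp(-((x - center) / width) ** 2) for x in axis]

def add(*spectra):
    """Point-by-point sum of spectra sampled on the same axis."""
    return [sum(values) for values in zip(*spectra)]

def fit_two_components(measured, ref_a, ref_b):
    """Least-squares coefficients (ca, cb) minimizing |measured - ca*ref_a - cb*ref_b|."""
    aa = sum(a * a for a in ref_a)
    bb = sum(b * b for b in ref_b)
    ab = sum(a * b for a, b in zip(ref_a, ref_b))
    ay = sum(a * y for a, y in zip(ref_a, measured))
    by = sum(b * y for b, y in zip(ref_b, measured))
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("reference spectra are linearly dependent")
    return (ay * bb - by * ab) / det, (by * aa - ay * ab) / det

def residual(measured, ref_a, ref_b):
    """Root-sum-square misfit left after the best two-component fit."""
    ca, cb = fit_two_components(measured, ref_a, ref_b)
    return math.sqrt(sum((y - ca * a - cb * b) ** 2
                         for y, a, b in zip(measured, ref_a, ref_b)))

axis = list(range(1000, 1801, 2))          # cm^-1
analyte = band(axis, 1450, 8)              # hypothetical analyte band
# Carbon as it appears in the sample: broad D band plus G band.
carbon_in_sample = add(band(axis, 1350, 60, 0.9), band(axis, 1583, 35, 1.0))
# Carbon reference with a shifted, narrower, weaker D band (for example, a
# different excitation wavelength and a more ordered material).
carbon_in_library = add(band(axis, 1320, 40, 0.4), band(axis, 1583, 20, 1.0))

measured = [0.5 * a + c for a, c in zip(analyte, carbon_in_sample)]

matched = residual(measured, analyte, carbon_in_sample)
mismatched = residual(measured, analyte, carbon_in_library)
print(f"misfit with matching carbon reference:   {matched:.3g}")
print(f"misfit with mismatched carbon reference: {mismatched:.3g}")
```

With the matching carbon reference the fit reproduces the synthetic mixture essentially exactly. With the mismatched reference a substantial misfit remains even though the analyte reference is perfect.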

Figure 3 shows spectra of carbon fibers (provided years ago by Herman Noether, who was retired from Celanese) that give an idea of how much variability there is. The fiber made from pitch is the closest to graphite, which has no band in the 1330–1370 cm^-1 region. The two fibers made from polyacrylonitrile (PAN) were made by different processes, which accounts for the different spectra reflecting different microstructures. In addition, the top two spectra show an intense overtone at 2665 cm^-1, almost exactly twice the frequency of the D band. The spectral behavior of carbons is quite complex and interesting, which is why I am suggesting that you get some familiarity with these spectra (8).

In principle, library searching programs should be able to determine that a spectrum represents a mixture of an organic with a carbon. Figure 4 shows an example where library searching did work. Figure 4a shows the spectrum that I recorded, with a micrograph indicating the material from which this spectrum was generated. Figure 4b shows the results of the database search. Note that the gray regions in Figure 4b were deselected from the search so that those regions where there is no information did not contribute noise to the “result.”
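The benefit of deselecting uninformative regions can be sketched in Python. The example below scores a synthetic query against a synthetic library spectrum twice: once over the full range, and once using only the selected regions. It assumes a correlation-based score scaled to 100, which may differ from the scoring used by a given search program. The noise level, band positions, region limits, and function names are all hypothetical.

```python
import math
import random

def band(axis, center, width, height):
    """Gaussian-shaped band on the wavenumber axis (synthetic data only)."""
    return [height * math.exp(-((x - center) / width) ** 2) for x in axis]

def correlation_score(query, library):
    """100 x Pearson correlation; an assumed stand-in for a vendor's HQI."""
    n = len(query)
    mean_q = sum(query) / n
    mean_l = sum(library) / n
    dq = [q - mean_q for q in query]
    dl = [l - mean_l for l in library]
    dot = sum(a * b for a, b in zip(dq, dl))
    norm = math.sqrt(sum(a * a for a in dq) * sum(b * b for b in dl))
    return 100.0 * dot / norm if norm > 0 else 0.0

def keep_regions(axis, spectrum, regions):
    """Keep only the points whose wavenumber lies inside a selected (low, high) region."""
    return [y for x, y in zip(axis, spectrum)
            if any(low <= x <= high for low, high in regions)]

axis = list(range(200, 3201, 4))  # cm^-1
library = [a + b for a, b in zip(band(axis, 1000, 10, 1.0), band(axis, 2900, 10, 1.0))]

# The query matches the library except for noise in a region with no bands.
rng = random.Random(0)  # fixed seed so the example is repeatable
query = [y + (rng.gauss(0.0, 0.2) if 1800 < x < 2700 else 0.0)
         for x, y in zip(axis, library)]

selected = [(200, 1800), (2700, 3200)]  # regions kept for the search
full_score = correlation_score(query, library)
masked_score = correlation_score(keep_regions(axis, query, selected),
                                 keep_regions(axis, library, selected))
print(f"full range: {full_score:.1f}")
print(f"selected regions only: {masked_score:.1f}")
```

Because the noise here is confined to the deselected region, the masked score recovers the perfect match. In real data the improvement will be smaller, and deselecting a region that does carry information would discard evidence.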

Classifying Spectra

What happens when the search does not easily produce something reasonable? In the course of searching many spectra over the past weeks, I have noticed that, surprisingly, the CH region can be used for classification. Figure 5a shows the CH region for a number of organic materials. Olive oil is a triglyceride of mostly oleic acid, which has one double bond; the CH band at 3008 cm^-1 is an analytical band for olefinic CH. Stearic acid is representative of the class of saturated free fatty acids (FFAs), which are solid at room temperature. Any CH band below 3000 cm^-1 is normally attributed to a saturated hydrocarbon; although the CH bands of olive oil and stearic acid in this region overlap, the overall structure of the two spectra is quite different and distinguishes liquid or amorphous hydrocarbons from solid hydrocarbons. The next two spectra represent two proteins, skin and tendon. Note how different the envelope of the CH bands below 3000 cm^-1 is from that of the lipids. This envelope, peaked at ~2930 cm^-1 with shoulders on either side, is typical for proteins. It is also possible to identify the aromatic CH near 3060 cm^-1 and the >NH from the protein amide group at about 3350 cm^-1. The bottom three spectra were recorded from three fairly pure forms of cellulose: cotton, Avicel, and wood fiber. The Avicel and cotton spectra are almost identical, with perhaps a slight shift in the broad, poorly defined OH band of the cellulose. The wood spectrum was acquired under well-defined polarization conditions and shows a clear splitting between ~2900 and 2960 cm^-1 as well as a significant OH band.

Figure 5b shows the region between 1200 and 1800 cm^-1 of the same materials. Although it is usually not straightforward to assign most bands in this region of the spectrum, there are useful analytical characteristics. The double bond of the olive oil produces a band near 1666 cm^-1, and the spectrum also shows a well-defined band from the carbonyl group at 1747 cm^-1. The stearic acid has no double bond and therefore no band in the mid-1600 cm^-1 region, and there is also no band in the mid-1700 cm^-1 region from the carbonyl; sometimes, when the spectrum of an FFA is expanded vertically, a weak broad feature of the carboxylate at ~1640 cm^-1 can be seen. The band near 1300 cm^-1 of the stearic acid represents the backbone twist and it is sharp, as are the CH2 and CH3 deformations, because the material is in the solid form. In contrast, the deformations in the liquid olive oil are broadened and diffuse because of heterogeneous conformations. In addition to the backbone twist, the olive oil shows a second band near 1250 cm^-1 that is assigned to the =C-H deformation. The spectra of skin and tendon both show rather diffuse bands in the mid-1600 cm^-1 region; this band is the Amide I, and it is a mixture of the carbonyl stretch and NH bend in the protein. Note that the shape of this band is quite different in the two spectra because it is composed of contributions from the α helix, β sheet, and random coil in varying proportions. In this region, the cellulose does not have any distinct features. One other feature that is not illustrated here is the aromatic stretch at 1000 cm^-1, which is sharp and stands out even when it is low in intensity. When this band is present, one needs to confirm the presence of the aromatic group by identifying a fingerprint band near 1600 cm^-1 and the aromatic CH near 3060 cm^-1.
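The marker bands mentioned in this section can be collected into a small lookup, sketched here in Python. Given a list of observed peak positions, it reports which markers fall within a tolerance, and it applies the aromatic cross-check described above. The band positions come from the text. The 15 cm^-1 tolerance, the function names, and the example peak list are assumptions for illustration, and a hit is a hint to examine the spectrum rather than an assignment.

```python
# Marker band positions (cm^-1) as quoted in the text.
MARKERS = {
    "olefinic =C-H stretch": 3008,
    "aromatic C-H stretch": 3060,
    "amide N-H stretch": 3350,
    "carbonyl stretch (olive oil)": 1747,
    "C=C stretch (olive oil)": 1666,   # proteins show Amide I in the same region
    "aromatic fingerprint band": 1600,
    "aromatic stretch": 1000,
}

def find_markers(peaks, tolerance=15.0):
    """Map each marker name to the closest observed peak within the tolerance.

    The tolerance is a hypothetical value; adjust it to your calibration.
    """
    found = {}
    for name, position in MARKERS.items():
        hits = [p for p in peaks if abs(p - position) <= tolerance]
        if hits:
            found[name] = min(hits, key=lambda p: abs(p - position))
    return found

def aromatic_confirmed(found):
    """Aromatic group needs the 1000, ~1600, and ~3060 cm^-1 bands together."""
    needed = ("aromatic stretch", "aromatic fingerprint band", "aromatic C-H stretch")
    return all(name in found for name in needed)

# Hypothetical peak list resembling the olive oil features described above.
oil_peaks = [1250, 1300, 1666, 1747, 3008]
found_oil = find_markers(oil_peaks)
for name, peak in sorted(found_oil.items(), key=lambda item: item[1]):
    print(f"{peak:>6} cm^-1  {name}")
print("aromatic confirmed:", aromatic_confirmed(found_oil))
```

A lookup like this cannot separate overlapping assignments, such as a C=C stretch and the Amide I band in the mid-1600 cm^-1 region, so the band shape and the CH region still need to be inspected by eye.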

In my examination of many spectra, I have found that these characteristics can be useful in classifying the compounds. I actually should not have been surprised because these differences are used extensively in coherent anti-Stokes Raman spectroscopy (CARS). In addition, I have seen the relative intensities in the CH region of linear chains vary with chain length. The bands near 2840 and 2880 cm^-1 are assigned respectively to the symmetric and asymmetric stretches of the >CH2 methylene group, whereas the symmetric and asymmetric –CH3 methyl stretches occur close to 2885 and 2930 cm^-1. Because there is only one methyl group on a fatty acid, or two on an unsubstituted alkane chain, the intensity of the upper methyl (asymmetric) band relative to the rest will diminish as the chain lengthens.

Figure 6 shows the spectra, in the CH region, of olive oil (top) and a commercial product of medium-chain triglycerides (MCT). The two spectra were scaled so that the intensity at approximately 2900 cm^-1 was equivalent. Oleic acid, the major component in olive oil, is an 18-carbon chain. MCTs contain chains of 6, 8, 10, and 12 carbon atoms. The 6- and 8-carbon chains are liquid, whereas the 10- and 12-carbon chains are solid. The spectrum of the MCT shows the characteristics of a liquid, which I would normally assume results from the short-chain composition. But there is also an indication of an olefinic CH band at 3008 cm^-1 in the MCT, and unsaturation will influence the state of the material. Examining the fingerprint part of the spectrum (not shown here), we also see a shoulder at 1265 cm^-1, which, as mentioned earlier, arises from the =C-H bend and confirms the presence of the carbon double bond. But what I want to point out in this figure is the dramatic difference between the two spectra in the intensity of the sharp band at ~2850 cm^-1 relative to the remaining part of the CH region. Because the methyl group bands come in on the higher frequency side, the lower intensity of the methylene band of MCT at ~2850 cm^-1 is consistent with shorter chains in that material. I have been able to use this observation in my spectral searching; I have found that matches can be close, but by looking for a similar compound with a different chain length, I am able to get better agreement.
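The comparison in Figure 6 amounts to normalizing at ~2900 cm^-1 and comparing the height of the band at ~2850 cm^-1. A minimal Python sketch of that ratio follows. The spectra are synthetic, and the band widths, heights, window size, and function names are hypothetical; only the 2850 and 2900 cm^-1 positions come from the text.

```python
import math

def band(axis, center, width, height):
    """Gaussian-shaped band on the wavenumber axis (synthetic data only)."""
    return [height * math.exp(-((x - center) / width) ** 2) for x in axis]

def peak_height(axis, spectrum, position, half_window=6):
    """Maximum intensity within +/- half_window cm^-1 of the nominal position."""
    values = [y for x, y in zip(axis, spectrum) if abs(x - position) <= half_window]
    if not values:
        raise ValueError("no data points near the requested position")
    return max(values)

def methylene_ratio(axis, spectrum):
    """Height near 2850 cm^-1 divided by height near 2900 cm^-1.

    Dividing by the 2900 cm^-1 height plays the role of the scaling in Figure 6.
    """
    return peak_height(axis, spectrum, 2850) / peak_height(axis, spectrum, 2900)

axis = list(range(2700, 3101, 2))  # cm^-1

def synthetic_ch_region(methylene_height):
    """Hypothetical CH envelope with an adjustable sharp band at 2850 cm^-1."""
    parts = [band(axis, 2850, 8, methylene_height),
             band(axis, 2900, 25, 0.8),
             band(axis, 2930, 20, 0.5)]
    return [sum(values) for values in zip(*parts)]

long_chain = synthetic_ch_region(1.0)   # stands in for a long-chain material
short_chain = synthetic_ch_region(0.6)  # stands in for a shorter-chain material

ratio_long = methylene_ratio(axis, long_chain)
ratio_short = methylene_ratio(axis, short_chain)
print(f"long chain 2850/2900 ratio:  {ratio_long:.2f}")
print(f"short chain 2850/2900 ratio: {ratio_short:.2f}")
```

The ratio is only a qualitative guide for ranking candidate hits by chain length. Physical state and unsaturation also change the shape of the CH envelope, so it should not be read as a calibrated measure.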

Figure 7 shows the results of a search for the MCTs. I have used my own judgment to select the hits of interest. My spectrum is black, butyl laurate (12 plus 4 carbons) is green, and decanoic acid (10 carbons) is red. Butyl laurate will exhibit a well-defined carbonyl, which we can confirm is present at approximately 1740 cm^-1. The decanoic acid will have a broad band near 1650 cm^-1 because of the carboxylate. In the CH region, the butyl laurate seems to be a better match, but we need to remember that this material is a mixture of many compounds, so this analysis is only meant to give indications as to what is present. In addition, neither of these compounds explains the triplet of bands around 1000 cm^-1 or the probable presence of the double bond.

Summary

I have explored some of the issues involved in identifying unknown organic materials using commercial software and databases. In my case, I am using KnowItAll for its searching algorithms and databases. Under ideal circumstances, the software can yield good results. It is usually helpful to exclude spectral regions that have no information to improve the matching. In cases where no realistic results appear, I have offered some simple ways to sort common compounds using the CH region as well as analytical bands in the fingerprint region. For those of you who have done much more IR spectroscopy than Raman, I should also point out that there are some capabilities in the software that are commonly used in IR but are totally inappropriate for Raman. Although these “corrections” improve the results for IR spectra, they have the potential to add artifacts to Raman spectra and should be avoided unless you know what they are doing and that the algorithms are relevant. Which algorithms are present will be vendor-dependent, but for a start, I would recommend turning off automated corrections. If you feel that the spectra need to be corrected for instrument function, this should be done in the instrument vendor software before transferring the data to the searching program.

What you should have noticed by now is that simply selecting the spectrum with the highest HQI is not always the best choice. It is important to compare the spectral hits with the query spectrum to determine if the hit makes sense. Keep in mind that the HQI is a simple dot product, which will be distorted by the relative intensities of the bands. Another issue to beware of is the position of the carbonyl band. This band is quite sensitive to the chemistry of its position in a molecule, but it is weak in the Raman spectrum, so the hits tend not to get it right. When I figure out how to focus on the importance of weak bands, I will be able to offer advice for this as well. I know that it is now well understood that Raman spectra have the potential to provide a wealth of information on unknown materials. I also know that I am disappointing the novices trying to take advantage of these capabilities, because a learning curve will be required to make best use of them. My hope is that the information that I am providing will be of use.