Investigating Forensics Applications of Raman Spectroscopy, ATR FT-IR, and Chemometrics

As forensic analysis continues to advance in areas such as source identification and the analysis of trace quantities of bodily fluids, spectroscopic techniques and machine learning are playing a significant role. Igor K. Lednev, a chemistry professor at the University at Albany, SUNY, in Albany, New York, has been working in this field with his team. The analytical methods currently under investigation include Raman spectroscopy, attenuated total reflection Fourier transform infrared (ATR FT-IR) spectroscopy, and advanced chemometric classification and analysis methods. We recently interviewed him about his work.

In one recent study, you reported the development of a multivariate discriminant model using ATR FT-IR spectra of dry urine to identify the sex of the donor (1). Which specific analytes are being measured for this discriminant analysis? What specific chemometric approach was required for optimizing the discrimination?

We utilized one of the multivariate statistical methods called partial least-squares discriminant analysis (PLS-DA). To further improve the discrimination, we also applied genetic algorithm (GA) analysis to the derivative urine spectra. The GA is a computational method that identifies spectral regions that contribute the most to the differentiation power of the model. In addition, the spectral regions selected by GA may be instrumental for understanding the chemical origin of sex differences in urine samples. Although a specific biochemical assignment based on the regions selected by GA was challenging, there was good agreement with the characteristic spectral peaks of creatinine. Creatinine is a waste product from creatine, which is consumed in muscles. Detection and quantification of creatinine in biological fluids, particularly urine, have been previously investigated. It has been shown that the concentration of creatinine in urine depends on the muscle mass and donor sex, as the urine of male subjects typically has a higher concentration of creatinine than that of females. Let me also mention that this was a collaborative project between our laboratory and Prof. Takeaki Ozawa’s laboratory at the University of Tokyo. A very talented PhD student, Ayari Takamura, spent half a year in our laboratory conducting research.
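
Editorial illustration: the idea of a genetic algorithm searching for the most discriminating spectral regions can be sketched in a few lines of Python. This is a toy under stated assumptions, not the code used in the study. The six "regions", the two small spectra sets, the penalty value, and the function names `fitness` and `select_regions` are all hypothetical, and the fitness function (difference of class means minus a size penalty) is a simple stand-in for the cross-validated PLS-DA performance a real analysis would likely optimize.

```python
import random

def fitness(mask, class_a, class_b, penalty=0.05):
    """Toy score: class-mean separation over selected regions minus a size penalty."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0
    separation = 0.0
    for i in selected:
        mean_a = sum(spectrum[i] for spectrum in class_a) / len(class_a)
        mean_b = sum(spectrum[i] for spectrum in class_b) / len(class_b)
        separation += abs(mean_a - mean_b)
    return separation - penalty * len(selected)

def select_regions(class_a, class_b, generations=40, pop_size=20, seed=0):
    """Evolve boolean region masks; returns the best mask and the best score per generation."""
    rng = random.Random(seed)
    n_regions = len(class_a[0])

    def score(mask):
        return fitness(mask, class_a, class_b)

    population = [[rng.random() < 0.5 for _ in range(n_regions)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        parents = population[: pop_size // 2]      # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_regions)      # single-point crossover
            child = mother[:cut] + father[cut:]
            flip = rng.randrange(n_regions)        # mutation: flip one region on or off
            child[flip] = not child[flip]
            children.append(child)
        population = parents + children
    population.sort(key=score, reverse=True)
    history.append(score(population[0]))
    return population[0], history

# Hypothetical intensities in six spectral regions; only regions 1 and 4 differ by class.
class_a = [[0.10, 0.80, 0.30, 0.20, 0.60, 0.40],
           [0.10, 0.90, 0.30, 0.20, 0.70, 0.40]]
class_b = [[0.10, 0.40, 0.30, 0.20, 0.20, 0.40],
           [0.10, 0.50, 0.30, 0.20, 0.30, 0.40]]

best_mask, history = select_regions(class_a, class_b)
print("selected regions:", [i for i, keep in enumerate(best_mask) if keep])
```

Because the best half of each generation is carried over unchanged, the best score never decreases from one generation to the next. The regions that survive in the final mask are the ones a chemist would then try to assign to specific compounds, as was done with creatinine above.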

In your forensic research, you have applied the random forest algorithm to differentiate between Raman spectra of body fluids and environmental interferences (EIs) commonly found for semen traces (2). You reported that a classification probability threshold of 70% was able to separate and classify samples satisfactorily and that none of the 27 EI substances was classified as any form of body fluid. How did you arrive at the 70% threshold limit? Were there any special techniques for either Raman spectra collection or the application of the random forest algorithm that you used?

This is an excellent question. Our random forest (RF) model assigns each Raman spectrum to all five body fluid classes (blood, saliva, semen, sweat, and vaginal fluid) with a specific probability (ranging between 0 and 1) that a spectrum belongs to an individual class, and all five probabilities sum to 1. This process of classification in RF is based on the votes from all decision trees. The averaged classification probability values of body fluid samples were used to establish a threshold of 70% of classification probability, such that if a sample receives a classification probability ≥70% for a specific body fluid, the sample is classified as that fluid. If the highest classification probability for a sample is <70% for all five body fluids, the sample is classified as unassigned. The lowest classification probability for body fluid samples was 0.73 and the highest for all EI substances was 0.67 (red body paint). It is important to note that for this study a false positive result would occur only if the mean classification probability of spectra from one of the potential EI substances were ≥70%. Therefore, with the established threshold of 70%, the developed RF model classified all samples from the validation dataset correctly, demonstrating 100% accuracy in predictions.
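
Editorial illustration: the threshold rule described above can be written as a short Python sketch. It assumes the per-spectrum class probabilities have already been produced by a trained random forest. The function name `classify_sample` and the individual probability values are hypothetical. They were chosen only so that the sample means reproduce the two figures quoted in the answer (0.73 for the lowest body fluid sample and 0.67 for red body paint).

```python
from statistics import mean

BODY_FLUIDS = ("blood", "saliva", "semen", "sweat", "vaginal fluid")

def classify_sample(spectrum_probs, threshold=0.70):
    """Average class probabilities over a sample's spectra, then apply the threshold."""
    averaged = {
        fluid: mean(probs[fluid] for probs in spectrum_probs)
        for fluid in BODY_FLUIDS
    }
    best = max(averaged, key=averaged.get)
    label = best if averaged[best] >= threshold else "unassigned"
    return label, averaged[best]

# Hypothetical body fluid sample: two spectra whose mean semen probability is 0.73.
fluid_sample = [
    {"blood": 0.05, "saliva": 0.10, "semen": 0.75, "sweat": 0.05, "vaginal fluid": 0.05},
    {"blood": 0.04, "saliva": 0.15, "semen": 0.71, "sweat": 0.05, "vaginal fluid": 0.05},
]
# Hypothetical interferent sample: one spectrum with a top probability of 0.67.
ei_sample = [
    {"blood": 0.10, "saliva": 0.10, "semen": 0.67, "sweat": 0.08, "vaginal fluid": 0.05},
]

label_fluid, prob_fluid = classify_sample(fluid_sample)
label_ei, prob_ei = classify_sample(ei_sample)
print(label_fluid, round(prob_fluid, 2))
print(label_ei, round(prob_ei, 2))
```

With the reported extremes of 0.73 and 0.67, any threshold between those two values would separate the validation samples equally well; 70% sits inside that gap.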

Let me emphasize here that we built the RF model based on Raman spectra of body fluids only. However, spectra of both body fluids and potential environmental interferences (EIs) were included in the external validation of the model. This is important because it is challenging to predict and build a library of all potential EIs. There is one more important aspect about this RF model. We initially built a support vector machines discriminant analysis (SVMDA) classification model for the differentiation of body fluids, which showed 100% correct assignments provided that a high-quality Raman spectrum was acquired (3). When we applied that SVMDA model to the EIs’ spectra, several of the EI substances were misclassified as one of the body fluids, resulting in a false positive identification. To overcome this problem, we built the RF classification model, which we initially validated against potential EIs for bloodstains (4). The same RF model was shown later to perform very well for potential EIs for semen (2). This leads us to assume that this model should work well for all potential EIs. We are currently testing this assumption.

In a review paper, you outlined that Raman spectroscopy is able to determine the bloodstain age for up to two years (5). How is this possible? What special techniques or methods are required to accurately determine the age of bloodstains? What are some of the difficulties still remaining to achieve more accurate and conclusive bloodstain age analysis?

Blood is a very complex, heterogeneous biological system, which is subjected to several chemical, biochemical, and physical transformations post deposition, including autoxidation, denaturation, aggregation, and degradation (6). Various processes occur at different stages of bloodstain aging. For example, we have recently reported that fluorescence properties of a bloodstain change dramatically during the first day post deposition (7). We assigned these changes to oxidation processes involving tryptophan, nicotinamide adenine dinucleotide, and flavins. Near infrared Raman spectra of dry bloodstains, which are dominated by the contribution from hemoglobin, keep changing noticeably for a much longer time, which allowed us to build regression models for determining the time since deposition up to one week (6) and even two years (8). The latter model is less accurate and shows only about 70% accuracy. However, this model allows a confident differentiation of bloodstains deposited several hours, several days, several weeks, several months, or more than a year ago, which should be very helpful information to the crime scene investigator. The main difficulty, which still limits the application of the developed methodology, is due to the environmental effects on the ex vivo aging of bloodstains. These include temperature, humidity, and light, including sunlight. Several laboratories worldwide, including ours, are working on understanding these environmental effects on blood aging using various techniques and methods.

Using chemometrics with Raman spectroscopy, you developed a classification method that provides a universal, single step, non-destructive, and robust technique with 100% accuracy for sample identification of all main body fluids (9). What were some of the key surprises or discoveries that you encountered while doing this research?

Good question. We reported on the great potential of Raman spectroscopy for the nondestructive confirmatory identification of body fluids in 2008 (10). Since then, we have made significant progress by making the analysis automatic (9) and validating the method with respect to potential false positives and false negatives as well as interference from common substrates (11,12) and contaminations (13). We are very grateful to the National Institute of Justice (NIJ) for supporting this project for many years. We have been trying to commercialize our patented technology for a while now, only to find how surprisingly difficult this process is. A main reason for this, in my opinion, is that the NIJ, as part of the Department of Justice (DOJ) in general, is one of very few federal funding agencies that do not have Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) programs for the commercialization of scientific discoveries.

In one forensic analysis paper, you described the differentiation between smoker and nonsmoker donors in the Raman spectra of oral body fluid traces (14). For this technique, you reported that an artificial neural network (ANN) model showed 100% accuracy after external validation for discrimination of these two groups. Do you think that ANN has a bright future in forensic discriminant analysis using Raman spectroscopy?

A typical approach for developing classification models based on spectroscopic data includes several steps. First, the data are preliminarily evaluated using unsupervised statistical methods such as principal component analysis (PCA) and cluster analysis (CA). Then, supervised statistical methods are used to build and validate classification models. At this stage, researchers typically go from simpler methods to more complex ones until a satisfactory discrimination is achieved. Machine learning methods, including random forest (RF), support vector machine discriminant analysis (SVMDA), K-nearest neighbor, and artificial neural networks (ANNs), are known as data-driven approaches. ANNs are techniques based on mathematical models representing the human brain function that are able to efficiently map and extract nonlinear relationships from the data. They consist of mathematical neurons that are interconnected in layers and have weights that mediate these interconnections. ANNs are very robust and, when well trained, can minimize the error of the prediction values and thus provide very accurate results by adjusting the weights. ANNs have been implemented previously for various biomedical applications based on Raman spectroscopic analysis and have great potential for forensic applications too.
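
Editorial illustration: the phrase "minimize the error by adjusting the weights" can be made concrete with a single artificial neuron trained by gradient descent. This Python sketch is deliberately smaller than a real ANN, which stacks many such neurons in layers, and it is not the model from the smoker study. The two input "band intensities", the labels, the learning rate, and the epoch count are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Output of one neuron: weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def log_loss(weights, bias, samples, labels):
    """Mean cross-entropy error between predictions and the 0/1 labels."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(samples, labels):
        p = predict(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)

def train(samples, labels, learning_rate=0.5, epochs=2000):
    """Repeatedly nudge the weights against the gradient of the error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

# Hypothetical normalized intensities of two spectral bands; label 1 marks one donor group.
samples = [(0.2, 0.8), (0.3, 0.7), (0.1, 0.9), (0.8, 0.2), (0.7, 0.3), (0.9, 0.1)]
labels = [0, 0, 0, 1, 1, 1]

loss_before = log_loss([0.0, 0.0], 0.0, samples, labels)
weights, bias = train(samples, labels)
loss_after = log_loss(weights, bias, samples, labels)
print("error before training:", round(loss_before, 3))
print("error after training is lower:", loss_after < loss_before)
```

A single neuron can only draw a linear boundary. The nonlinear relationships mentioned above come from connecting many neurons in layers, and as with any supervised model, the result still has to be confirmed by external validation.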

ATR FT-IR spectroscopy has been used to discriminate sex and race based on the analysis of bloodstain samples (15). In this work, ATR FT-IR spectra were acquired from dry bloodstains and partial least squares discriminant analysis (PLSDA) was used to classify these sample spectra. What do you think is the basis of the fundamental chemical differences in these samples that were detected using ATR FT-IR?

To address this question, we used genetic algorithms (GA) and determined ATR FT-IR spectral regions that contributed the most to the differentiation of blood samples based on the donor sex and race. Specifically for the sex differentiation, we observed a region corresponding to lipid contribution that is consistent with the literature, which reports different levels of high-density lipoprotein (HDL) cholesterol and double bond index in fatty acids within the blood serum of males and females. In addition, a spectral region with a contribution from carbohydrates was selected by GA, which is consistent with the literature data indicating higher levels of glucose in the blood of female donors. As for the race differentiation, the results of our analysis of ATR FT-IR spectral data are consistent with the literature indicating that the levels of lipoproteins, apolipoproteins, hemoglobin, and total protein concentration in blood vary with the donor race.

You recently reported the development of a support vector machines discriminant analysis (SVMDA) classification method for differentiation of Raman spectra of body fluid stains applicable to forensic analysis (4). From your experience in forensic analysis, what would you say are the advantages and disadvantages in applying discriminant models using genetic algorithms, ANN, and SVMDA? Is one method proving to be an improvement over the others?

As I mentioned earlier, we typically begin by using simple statistical methods and then go for more complex ones until a satisfactory discrimination is achieved. At the moment, our random forest model for the identification of main body fluids based on Raman spectroscopy shows 100% accuracy and is not subject to environmental interferences. We have already tested this model with respect to the potential false positives for blood (4) and semen (2), and need to complete the validation for the rest of the body fluids. We are cautiously optimistic that the model will withstand the remaining tests and will be the basis for the final “product,” which is a universal, nondestructive, confirmatory method for the forensic identification of all main body fluids.

What would you consider to be the most meaningful contributions of your work, including teaching and patent work?

An academic career is attractive to me because it gives plenty of opportunity to make various meaningful contributions including those to a scientific field, profession, and even society. Training the next generation of experts is probably the most important duty of the university professor. I am very happy that all students who graduated from our lab are developing successful careers in pharma, major crime laboratories, and academia. I guess publishing about 250 research papers and reaching an h-index of 58 might indicate a noticeable contribution to the scientific field, but I leave it for my peers to evaluate. After many years of fundamental research, I have shifted the focus of our program toward applied science during the last decade or so, specifically targeting the development of new methods for medical diagnostics and forensic purposes. Most recently, I co-founded two startup companies aiming at the commercialization of our patented technologies. Should this activity be successful, it will probably make the most meaningful societal contribution.

Would you describe for our readers your work ethic, philosophy, and how you plan your daily or weekly work schedule?

This is a serious question, which deserves a separate conversation. Briefly, my work ethic is probably quite standard: be honest, be respectful, and work hard. Although this might sound trivial, it is not. One of the key elements of my teaching philosophy is to make sure that students learn how to learn. As for the research training, my main approach is to give students as much independence and responsibility as they can handle. In my opinion, and based on many years of experience, this is the best way to prepare students for their future careers. I am investing time and significant effort to generate a priority “wish” list for my activities. My daily and weekly work schedule includes my duties (teaching, service, and professional) and then, if there is time left, top activities from the priority list.

What words of wisdom do you have for any young people interested in a scientific research career?

Go for it! However, do not expect that it will be easy. Good luck!

References

(1) A. Takamura, L. Halámková, T. Ozawa, and I.K. Lednev, “Phenotype profiling for forensic purposes: determining donor sex based on Fourier transform infrared spectroscopy of urine traces,” Anal. Chem. 91(9), 6288–6295 (2019).

(2) C. Taylor, E. Mistek, L. Halámková, and I.K. Lednev, “Raman spectroscopy for forensic semen identification: Method validation vs. environmental interferences,” Vib. Spectrosc. 109, 103065 (2020).

(3) C.K. Muro, K.C. Doty, L.d.S. Fernandes, and I.K. Lednev, “Forensic body fluid identification and differentiation by Raman spectroscopy,” Forensic Chem. 1(1), 31–38 (2016).

(4) R. Rosenblatt, L. Halámková, K.C. Doty, E.A.C. de Oliveira, and I.K. Lednev, “Raman spectroscopy for forensic bloodstain identification: Method validation vs. environmental interferences,” Forensic Chem. 16, 100175 (2019).

(5) A. Weber and I.K. Lednev, “Review: Crime clock – analytical studies for approximating time since deposition of bloodstains,” Forensic Chem. 19, 100248 (2020).

(6) K.C. Doty, G. McLaughlin, and I.K. Lednev, “A Raman ‘spectroscopic clock’ for bloodstain age determination: the first week after deposition,” Anal. Bioanal. Chem. 408(15), 3993–4001 (2016).

(7) A. Wojtowicz, A. Weber, R. Wietecha-Posluszny, and I.K. Lednev, “Probing menstrual bloodstain aging with fluorescence spectroscopy,” Spectrochim. Acta A Mol. Biomol. Spectrosc. 119172 (2020).

(8) K.C. Doty, C.K. Muro, and I.K. Lednev, “Predicting the time of the crime: Bloodstain aging estimation for up to two years,” Forensic Chem. 5, 1–7 (2017).

(9) B. Vyas, L. Halámková, and I.K. Lednev, “A Universal Test for the Forensic Identification of All Main Body Fluids Including Urine,” Forensic Chem. 100247 (2020). https://doi.org/10.1016/j.forc.2020.100247

(10) K. Virkler and I.K. Lednev, “Raman spectroscopy offers great potential for the nondestructive confirmatory identification of body fluids,” Forensic Sci. Int. 181, e1–e5 (2008).

(11) G. McLaughlin, M.A. Fikiet, H.-o. Hamaguchi, and I.K. Lednev, “Universal detection of body fluid traces in situ with Raman hyperspectroscopy for forensic purposes: Evaluation of a new detection algorithm (HAMAND) using semen samples,” J. Raman Spectrosc. 50, 1147–1153 (2019).

(12) G. McLaughlin, V. Sikirzhytski, and I.K. Lednev, “Circumventing substrate interference in the Raman spectroscopic identification of blood stains,” Forensic Sci. Int. 231(1-3), 157–166 (2013).

(13) A. Sikirzhytskaya, V. Sikirzhytski, G. McLaughlin, and I.K. Lednev, “Forensic identification of blood in the presence of contaminations using Raman microspectroscopy coupled with advanced statistics: effect of sand, dust, and soil,” J. Forensic Sci. 58(5), 1141–1148 (2013).

(14) E. Al-Hetlani, L. Halámková, M.O. Amin, and I.K. Lednev, “Differentiating smokers and nonsmokers based on Raman spectroscopy of oral fluid and advanced statistics for forensic applications,” J. Biophotonics e201960123 (2019). https://doi.org/10.1002/jbio.201960123

(15) E. Mistek, L. Halámková, and I.K. Lednev, “Phenotype profiling for forensic purposes: Nondestructive potentially on scene attenuated total reflection Fourier transform-infrared (ATR FT-IR) spectroscopy of bloodstains,” Forensic Chem. 16, 100176 (2019).

Igor K. Lednev is a Professor of Chemistry at the University at Albany, State University of New York. His research is focused on the development of novel laser spectroscopy for medical diagnostics and forensic purposes. He has authored over 240 publications in peer-reviewed journals, reaching an h-index of 58. He is a co-founder of two startup companies targeting the commercialization of his technology, which is protected by seven patents. He is on the editorial boards of Raman Spectroscopy, Forensic Chemistry, Spectroscopy magazine, and High Energy Chemistry journal. He served as an advisory member on the White House Subcommittee for Forensic Science. He is a co-founder of the NIJ Forensic Science Symposium at Pittcon (the world's largest analytical chemistry congress); the symposium became a regular annual event including 34 invited talks and a poster session. He is a Fellow of the Society for Applied Spectroscopy and the Royal Society of Chemistry (UK). The media have covered his work more than 90 times, including TV and radio interviews and publications in the Wall Street Journal, Chemical & Engineering News, Forensic Magazine, etc. Discovery Channel Canada featured his work on forensic Raman spectroscopy. Congressman Tonko featured his research at the U.S. House of Representatives Hearing on the advancements in forensic science in September 2019. Dr. Lednev has received several prestigious awards, including the Gold Medal Award from the NY/NJ Section of the Society for Applied Spectroscopy, a Guest Prof. Fellowship from the Friedrich-Schiller-University, the Research Innovation Award from Research Corporation, the Chancellor’s Award for Excellence in Scholarship and Creative Activities, and the CAS Dean’s Award for Outstanding Achievements in Teaching. For his development work, Lednev and coworkers have just received a Phase I Small Business Technology Transfer (STTR) grant from the U.S. National Science Foundation.