Advances in Chemometrics for the Analytical Sciences

July 8, 2019

Barry Wise, the president of Eigenvector Research, is a renowned expert in chemometrics. We recently spoke to him to get his thoughts and opinions on the latest developments in chemometrics for the analytical sciences.

Since its inception by Profs. Bruce Kowalski and Svante Wold, chemometrics has become the preeminent methodology for quantitative and qualitative analysis, including calibration development, calibration transfer, and process monitoring. Chemometrics is a powerful tool that applies data interpretation to multiple analytical techniques, including liquid and gas chromatography, vibrational or atomic spectroscopy, and mass spectrometry. The most recent advances in chemometrics have demonstrated improved signal-to-noise ratios, better data mining and interpretation, and higher resolution in complex imaging. In addition to these benefits, technology for calibration transfer between instrument platforms, new developments in pattern recognition and qualitative methods, progress in quantitative modeling and prediction, innovations in automation and artificial intelligence, and increased deployment in process monitoring have all been made practical.

You have recently described advances in chemometrics in three-dimensional and multidimensional microscopy (1), detection of cervical cancer (2), process analytical technology (3), calibration transfer (4), model maintenance (5), decompositions using maximum signal factors (6), multiway methods (7), and classification algorithms (8). What is the newest development in chemometrics for analysis that you have been involved with? What is unique about your approach?

With the mainstreaming of artificial intelligence (AI) we’ve seen the development of new data modeling techniques. Most of these are being designed for big data sets involving millions of samples, and data structures that aren’t all that similar to those we typically see in the process analysis or analytical chemistry domain, and for purposes I’m more than a little skeptical about. But obviously a huge amount of effort has been put into these methods.

So I think our challenge is to figure out what parts of this will ultimately be useful in our domain and how we might adapt the methods to work better on our problems. To that end we’ve gathered together a number of these modeling techniques, including artificial neural networks (ANNs), support-vector machines (SVMs), and, most recently, boosted regression and classification trees (XGBoost), into our main modeling interface for analytical methods. This means that they can all use the same wide array of data preprocessing methods and are all evaluated by the same means, using full cross validation done in the classical way. So when you compare them you’re actually getting an apples-to-apples comparison. This is necessary to really determine which modeling direction to take going forward.
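As a rough illustration of what an apples-to-apples comparison means in practice, the following minimal Python sketch evaluates two stand-in models with the same cross-validation folds and the same preprocessing, refit inside each fold. It is not the software described in the interview: the function names (`kfold_indices`, `mean_center`, `fit_mean`, `fit_line`, `rmsecv`), the toy data, and the two trivial models are all assumptions chosen only to keep the example self-contained.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Split sample indices 0..n-1 into k disjoint folds (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def mean_center(train_x, test_x):
    """Preprocessing shared by all models: center using the training mean only."""
    m = sum(train_x) / len(train_x)
    return [v - m for v in train_x], [v - m for v in test_x]

def fit_mean(x, y):
    """Baseline model: always predict the mean of the training responses."""
    m = sum(y) / len(y)
    return lambda xi: m

def fit_line(x, y):
    """Univariate least-squares line, standing in for a real calibration model."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return lambda xi: my + slope * (xi - mx)

def rmsecv(fit, x, y, folds):
    """Root-mean-square error of cross validation over the given folds."""
    sq_err = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        x_tr, x_te = mean_center([x[i] for i in train], [x[i] for i in test])
        model = fit(x_tr, [y[i] for i in train])
        sq_err += sum((model(a) - y[i]) ** 2 for a, i in zip(x_te, test))
    return math.sqrt(sq_err / len(x))

# Toy data: a linear trend with a small alternating disturbance.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + 0.1 * (-1) ** i for i, xi in enumerate(x)]

# The same folds and the same preprocessing are used for every model.
folds = kfold_indices(len(x), k=5)
results = {name: rmsecv(fit, x, y, folds)
           for name, fit in [("mean", fit_mean), ("line", fit_line)]}
```

Because the folds are generated once and the preprocessing is refit on the training portion of each fold, any difference between the entries in `results` reflects the models themselves rather than differences in how they were evaluated.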

From your perspective what are the most exciting developments in chemometrics for chromatography or mass spectrometry over the past five years, in terms of both applications and instrumentation development? How about spectroscopic techniques?

The thing that I find exciting is when methods that we’ve been proponents of for many years finally find their home and start getting integrated into software packages with specific analytical methods. As an example, I’ve been a proponent of parallel factor analysis (PARAFAC) for about 20 years now, but it is only lately that it is really beginning to find routine application. In hyphenated chromatography (GC–MS) we’re seeing the development of algorithms based on PARAFAC2 that resolve overlapping peaks in an automated fashion. In particular, the PARAFAC2-based deconvolution and identification system (PARADISe) algorithm developed by Rasmus Bro’s group [at the University of Copenhagen] uses deep neural networks to identify overlapped peaks, then feeds them to PARAFAC2 for decomposition into their true chemical components. PARAFAC is also being integrated into excitation-emission fluorescence systems; we have a great collaboration with a commercial instrument company on that. Both of these systems combine the results with database searches in order to identify underlying chemical information.

It has also been great to see the (albeit slow) adoption of chemometric methods into imaging systems, from remote sensing to imaging Raman and infrared (IR) systems to surface techniques like secondary-ion mass spectrometry (SIMS). Sometimes groups are so focused on the hardware that it takes a long time for them to admit that there are also gains to be made on the data processing front. Bruce Kowalski used to say, “math is cheaper than physics,” referring to the fact that hardware is expensive (think supercolliders) but mathematics is largely done with pencils. That remains true today.

In your opinion, what are the papers you would highlight as being most influential in advancing chemometric measurements over the past five to ten years?

Honestly I get most of my ideas about what we should work on next at conferences and by following the work of particular research groups or individuals. Also from requests from our users, various news feeds and, yikes!, social media. Research journals are so fragmented now, and the pressure for academics to publish has led to so many “least publishable unit” type papers that I generally get going on what I’m interested in and then work backwards to the journal articles I need to help figure out the pieces.

As to the “most influential” part of the question, I think it’s the chemometric evangelists (and I consider myself one) who are responsible for “advancing chemometric measurements.” They do this by getting out and teaching or training, providing software solutions, and giving talks to new audiences to introduce them to methods and potential benefits.

What advances in chemometrics have you been most active in recently?

We’ve been pretty active lately in calibration model maintenance, instrument standardization, and calibration transfer, which I’ve been involved with for a long time. Model maintenance is an oft-forgotten part of the chemometric model implementation process. There are lots of reasons models become invalid. You can roughly divide the reasons into changes in the systems under consideration (such as the addition of new analytes) and changes in the measurement hardware (such as aging and drift). So it’s critical to have a plan to keep models working and the tools for doing this. We’ve developed a roadmap for model maintenance and have adapted our software so that standardization methods can be integrated directly into model structures, and we’ve created tools to evaluate the options.
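To make the idea of standardization concrete, here is a minimal Python sketch of slope and bias correction, one of the simplest ways to adjust predictions after an instrument change. It is offered only as an illustration and is not the method or roadmap described in the interview; the function names (`fit_slope_bias`, `correct`) and the transfer-sample values are hypothetical.

```python
def fit_slope_bias(secondary_pred, reference):
    """Fit reference = slope * prediction + bias by least squares.

    secondary_pred: predictions made from spectra on the changed instrument
    reference: known reference values for the same transfer samples
    """
    n = len(secondary_pred)
    mx = sum(secondary_pred) / n
    my = sum(reference) / n
    sxx = sum((p - mx) ** 2 for p in secondary_pred)
    sxy = sum((p - mx) * (r - my) for p, r in zip(secondary_pred, reference))
    slope = sxy / sxx
    bias = my - slope * mx
    return slope, bias

def correct(pred, slope, bias):
    """Apply the fitted correction to a new prediction."""
    return slope * pred + bias

# Hypothetical transfer samples: the changed instrument reads low and compressed.
slope, bias = fit_slope_bias([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
corrected = correct(1.5, slope, bias)
```

A correction like this only addresses a simple linear shift in the predictions. Changes in the sample system itself, such as a new analyte, generally call for updating the model rather than correcting its output.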

What have been your greatest challenges in scientific discovery over your career? What is your general approach to problem solving in your scientific work?

After some reflection this question made me realize that I actually consider myself an engineer (perhaps no surprise as I actually am a chemical engineer). You could say that my real job has been to develop tools that allow others to engage in the scientific discovery process.

As to my general approach to problem solving, it can be summed up by something a good friend of mine once said when considering what to do on a Friday night: “Let’s do something, even if it’s wrong.” Honestly I don’t like spending a lot of time reviewing how others have approached similar problems; I prefer to start working on a problem myself with an uncontaminated mind. And while developments in theory are great and we can learn a lot from simulations, none of this really matters if the tools we create don’t work on real data from real problems. So my general approach has been to be hands-on with both the development of the chemometric methods and the data problems they’re designed to treat, keeping them in a fairly tight loop.

What are some major gaps in knowledge in chemometrics that you would like to see more research and development time devoted to?

The major gap that I see is a lack of qualified chemometricians or data scientists relative to the number of problems there are out there to solve. Our approach has been to provide chemometrics training, and we have in fact done hundreds of classes attended by thousands of students. And while that has been great for the students we’ve worked with, it still hasn’t filled the gap. Given the philosophy so rampant in industry these days of “doing more with less,” I don’t expect that will change. So then it comes down to how to 1) make the chemometrically educated more efficient with their time so they can address more problems and 2) create tools that can be used by less-knowledgeable people. The development time needs to go into creating tools that automate large parts of typical modeling efforts and simplifying tools so that they can be used by mere humans.

At the Advances in Process Analytics and Control Technologies (APACT) conference this week I heard Brian Rohrback of Infometrix talk about autonomous model building. They’ve surveyed chemometric models in use in petrochemical plants and found that the majority of them are based on a poor selection of samples and most calibration models are overfit. This is no surprise to me. They are working on automating the model building process to avoid these issues and implementing the subsequent models in a variety of chemometric packages, including ours, for online implementation.

I’ve been reluctant to develop tools that automate the modeling process too much because I’m a firm believer that you can’t “check your brain at the door” and do a decent job of this. In fact we have a pretty nice “model optimizer” in our software and I haven’t promoted it much because of my philosophical objections to taking people out of the loop. But I’m now inspired to make it work even better with less supervision because it appears there will never be enough trained people to address all the problems or opportunities out there.

What do you anticipate is your next major area of research or application in your field?

We’re planning an EigenSummit this summer to try to figure that out. We get everybody in the company together every couple years to hash out future directions and I’m really looking forward to this one!

In the meantime we’ll be working on continued refinement of our software in terms of improving usability and implementing user feature requests. For the computationally intensive methods we’ve implemented, and especially for big data applications like hyperspectral image analysis, we’re working on making better use of parallel processors so we can speed things up, which allows for a wider search over the preprocessing and method parameter landscape, along with the ability to tackle larger problems.

What would you like to share with the readers of LCGC and Spectroscopy related to the most exciting future uses of chemometrics?

Going through this interview made me realize how many things I’m excited about right now! But if I have to pick one, I’m especially excited about all the potential applications that are opening up because of handheld instruments like Raman systems. We’ve enabled the use of sophisticated chemometric models on several systems in near real-time. We are able to take sophisticated chemometric models developed using advanced techniques and create numerical recipes from them that can be compiled or interpreted for use on low power processors typically found on handheld instruments.

The key here is that while the processing and memory requirements to compute chemometric models are great, the requirements to apply them are relatively small. The application of almost all preprocessing methods and models, from partial least squares (PLS) to ANNs and SVMs, can be boiled down to a series of simple mathematical steps applying the model parameters to the incoming data stream. This can be done with very little processing power. So now the whole process of taking data, developing a model, getting it on the instrument, and deploying in the field can be greatly compressed. This improves the agility of the system significantly and has the potential to profoundly expand the number of applications of spectroscopy and chemometrics in the field. We’re certainly excited about that!
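For a linear model such as PLS, the point about small application requirements can be seen in a few lines. The Python sketch below assumes a model that has already been reduced to a variable mean, a regression vector, and an offset; the function name `apply_linear_model` and all numerical values are hypothetical, and ANNs or SVMs would need more steps of the same simple kind.

```python
def apply_linear_model(spectrum, x_mean, reg_vector, y_offset):
    """Apply a pre-computed linear calibration model to one measured spectrum.

    Only subtraction, multiplication, and addition are needed, which is why
    this step can run on a low-power processor.
    """
    # Preprocessing step: mean-center with the mean stored from calibration.
    centered = [s - m for s, m in zip(spectrum, x_mean)]
    # Prediction step: dot product with the regression vector, plus the offset.
    return y_offset + sum(c * b for c, b in zip(centered, reg_vector))

# Hypothetical three-channel model and measurement.
x_mean = [1.0, 2.0, 3.0]
reg_vector = [0.5, -0.25, 2.0]
y_offset = 10.0
prediction = apply_linear_model([2.0, 2.0, 4.0], x_mean, reg_vector, y_offset)
```

All of the expensive work, such as choosing the preprocessing and estimating the regression vector, happens once during model development; the instrument only needs to store the resulting numbers and run these few operations for each new spectrum.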

References

(1) B.M. Wise, Visualization of three-way and higher order data sets (Conference Presentation). In Three-Dimensional and Multidimensional Microscopy: Image Acquisition and Processing XXVI (Vol. 10883, p. 108830F). International Society for Optics and Photonics (2019).

(2) B.M. Wise and J.M. Shaver, Detection of cervical cancer from evoked tissue fluorescence images using 2- and 3-way methods. In Optical Fibers and Sensors for Medical Diagnostics and Treatment Applications XIX (Vol. 10872, p. 1087210). International Society for Optics and Photonics (2019).

(3) B.M. Wise and R.T. Roginski, Model maintenance: the unrecognized cost in PAT and QbD. Chimica Oggi-Chemistry Today 33, 2 (2015).

(4) B.M. Wise, H. Martens, M. Høy, R. Bro, and P.B. Brockhoff, Calibration Transfer by Generalized Least Squares. In Seventh Scandinavian Symposium on Chemometrics (SSC7), Copenhagen, Denmark from Aug. 2001 (2015).

(5) B.M. Wise and R.T. Roginski, A calibration model maintenance roadmap. IFAC-PapersOnLine 48(8), 260-265 (2015).

(6) N.B. Gallagher, J.M. Shaver, R. Bishop, R.T. Roginski, and B.M. Wise, Decompositions using maximum signal factors. Journal of Chemometrics 28(8), 663-671 (2014).

(7) B.M. Wise, J.M. Shaver, and N.B. Gallagher, Getting to multiway: A roadmap for batch process data. Scientific and Social Program TRICAP 21, 22 (2012).

(8) B.M. Wise, D. O’Sullivan, and M.A. Palacios, A Comparison of ANNs, SVMs, and XGBoost on some Challenging Classification Problems, APACT-19, Chester, UK, May 1 (2019).

Barry M. Wise is President and co-founder of Eigenvector Research and the creator of PLS_Toolbox chemometrics software. He holds a doctorate in Chemical Engineering and has experience in a wide variety of applications spanning chemical process monitoring, modeling and analytical instrumental development. He has extensive teaching experience, having presented over 100 chemometrics courses, and has co-authored over 50 peer-reviewed articles, book chapters and patents. Dr. Wise is the winner of the 2001 EAS Award for Achievements in Chemometrics. He has organized and chaired numerous conferences, including the First International Chemometrics InterNet Conference InCINC ’94, the 1995 Gordon Research Conference on Statistics in Chemistry, Three-way Methods in Chemistry and Psychology TRICAP ’97, Chemometrics in Analytical Chemistry CAC-2002, and International Association of Spectral Imaging IASIM-2018. Wise was just awarded the 14th Herman Wold medal in gold “for his pioneering contributions in Process Chemometrics and his extensive, deep commitment to the proliferation of Chemometrics.” This biannual prestigious award is given to an individual who has contributed significantly to the development and proliferation of Chemometrics. The prize ceremony was held at the SSC16 (16th Scandinavian Symposium on Chemometrics) conference, Oslo, Norway.