A new tutorial provides a step-by-step, hands-on guide to using multivariate data analysis tools like PCA and PLS to extract meaningful insights from complex pharmaceutical data sets.
A recent tutorial published by researchers from the Dipartimento di Scienza Applicata e Tecnologia at Politecnico di Torino, in collaboration with Merck Serono SpA, provides guidance for mastering multivariate data analysis methods. The study, published in Chemometrics and Intelligent Laboratory Systems and led by Nicola Cavallini, aims to help researchers understand the complex data sets produced by modern analytical technologies such as near-infrared (NIR) spectroscopy and Raman spectroscopy (1).
NIR and Raman spectroscopy are analytical spectroscopic techniques routinely used in pharmaceutical analysis. NIR can identify impurities in drugs and pharmaceutical formulations, and it is often applied in quality assurance to ensure that the drugs sent to market are of high quality (2). Raman spectroscopy, being non-invasive, can test drug products that are already in their packaging (3), and because it requires no sample preparation, it also helps keep analysis costs down.
A collection of colorful capsules and tablets scattered on a surface. Generated by AI. | Image Credit: © Khatyjay - stock.adobe.com
In their tutorial, Cavallini and the team discuss every major stage of pharmaceutical data analysis. From raw data organization to predictive modeling, this case-study-based tutorial provides a clear and reproducible framework that is both accessible and informative (1).
Modern process analytical technology (PAT) tools routinely generate massive volumes of spectral data that contain a wealth of hidden chemical and physical information. However, extracting insights from these data requires more than sophisticated instrumentation (1). Cavallini and the team discuss the role of chemometric tools in navigating this complexity, particularly techniques such as principal component analysis (PCA), partial least squares (PLS) regression, and partial least squares-discriminant analysis (PLS-DA) (1).
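To give a flavor of what PCA does with spectral data, the sketch below runs PCA via a singular value decomposition on simulated "spectra" built from two underlying components. This is a minimal illustration in Python/NumPy, not the tutorial's own MATLAB code; the simulated data and variable names are purely illustrative.

```python
import numpy as np

# Simulated "spectra": 20 samples x 50 wavelength channels,
# generated from two underlying components plus a little noise.
rng = np.random.default_rng(0)
scores_true = rng.normal(size=(20, 2))
loadings_true = rng.normal(size=(2, 50))
X = scores_true @ loadings_true + 0.05 * rng.normal(size=(20, 50))

# PCA: mean-center the data, then take the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                       # sample coordinates in PC space
loadings = Vt                        # principal component directions
explained = s**2 / np.sum(s**2)      # fraction of variance per PC

# With two true components, the first two PCs should capture
# nearly all of the variance.
print(float(explained[:2].sum()))
```

In exploratory analysis, a scatter plot of the first two columns of `scores` would reveal clustering of samples, much as the tutorial uses PCA to explore its freeze-dried formulation data set.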
Their tutorial demonstrates this by describing a real-world data set involving multiple freeze-dried pharmaceutical formulations. Beginning with a detailed explanation of the data set's structure and characteristics, the authors methodically lead the reader through a complete data analysis pipeline (1). At each step in this process, which includes data preprocessing, exploratory analysis, regression modeling, and classification, the researchers explain what to do and how to execute it (1).
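The article does not detail which preprocessing methods the tutorial applies, but standard normal variate (SNV) correction is one common choice for NIR spectra, as it suppresses the scatter effects that can mask chemical information. A minimal sketch, assuming SNV as the preprocessing step:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum
    (each row) by its own mean and standard deviation, removing
    per-spectrum offset and multiplicative scatter effects."""
    spectra = np.asarray(spectra, dtype=float)
    means = spectra.mean(axis=1, keepdims=True)
    stds = spectra.std(axis=1, keepdims=True)
    return (spectra - means) / stds

# Two toy "spectra" that differ only by an offset and a gain:
raw = np.array([[1.0, 2.0, 3.0, 4.0],
                [3.0, 5.0, 7.0, 9.0]])   # row 1 = 2 * row 0 + 1
corrected = snv(raw)

# After SNV the two rows coincide, because the per-spectrum
# offset and scaling have been removed.
print(np.allclose(corrected[0], corrected[1]))  # → True
```

Preprocessing choices like this one strongly influence downstream PCA and PLS results, which is why the tutorial treats them as a distinct, deliberate stage of the pipeline.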
The tutorial also demonstrates how increasing levels of sucrose and arginine in the formulations influence the clustering and regression results, offering insight into how formulation variables affect the final product. It also uncovers subtler patterns, such as the impact of the operator performing the analysis and the session in which data were collected, highlighting the method's sensitivity not just to sample composition but also to procedural variability (1).
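Relating a formulation variable such as sucrose level to whole spectra is exactly the kind of task PLS regression handles. The sketch below fits a single-response PLS model with the NIPALS algorithm on simulated spectra whose peak intensity scales with an excipient level; the data, level values, and function names are illustrative assumptions, not the tutorial's actual data set or code.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS: extract latent directions in X
    that covary maximally with y, then build regression coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)            # weight vector
        t = Xk @ w                           # scores
        p = Xk.T @ t / (t @ t)               # X loadings
        q.append((yk @ t) / (t @ t))         # y loading
        W.append(w); P.append(p)
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - q[-1] * t                  # deflate y
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)      # regression coefficients
    return B, x_mean, y_mean

def pls1_predict(X, B, x_mean, y_mean):
    return (X - x_mean) @ B + y_mean

# Simulated spectra: a Gaussian band whose intensity tracks a
# hypothetical excipient level (4 levels x 5 replicates).
rng = np.random.default_rng(1)
levels = np.repeat([1.0, 2.0, 3.0, 4.0], 5)
band = np.exp(-0.5 * ((np.arange(60) - 30) / 4.0) ** 2)
X = np.outer(levels, band) + 0.01 * rng.normal(size=(20, 60))

B, xm, ym = pls1_fit(X, levels, n_components=2)
pred = pls1_predict(X, B, xm, ym)
rmse = float(np.sqrt(np.mean((pred - levels) ** 2)))
```

In the same spirit, inspecting the scores of such a model against metadata like operator or measurement session is how the tutorial surfaces procedural variability alongside compositional effects.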
This practical approach to chemometrics matters because the pharmaceutical industry has placed growing emphasis on quality control, process optimization, and regulatory compliance. Accordingly, the authors encourage critical thinking during each stage of drug analysis (1). At each stage, key questions are posed and discussed, allowing readers to reflect on the decisions they make during their own analyses. The inclusion of fully commented MATLAB code furthers this educational goal, allowing even those with limited programming experience to adapt the scripts to their own data sets (1).
By presenting a tutorial that is both technically rigorous and practically approachable, Cavallini and his co-authors provide a roadmap for advancing data literacy in one of the world’s most scientifically demanding industries (1). Ultimately, the tutorial shows that with the right tools and approach, even complex, high-dimensional spectral data can become a source of actionable insight.