A new tutorial provides a step-by-step, hands-on guide to using multivariate data analysis tools like PCA and PLS to extract meaningful insights from complex pharmaceutical data sets.
A recent tutorial from researchers at the Dipartimento di Scienza Applicata e Tecnologia at Politecnico di Torino, in collaboration with Merck Serono SpA, offers guidance for researchers who want to master multivariate data analysis methods. The tutorial, published in Chemometrics and Intelligent Laboratory Systems and led by Nicola Cavallini of Politecnico di Torino, aims to help researchers understand the complex data sets produced by modern analytical technologies such as near-infrared (NIR) spectroscopy and Raman spectroscopy (1).
NIR and Raman spectroscopy are analytical techniques routinely used in pharmaceutical analysis. NIR can identify impurities in drugs and pharmaceutical formulations, and it is often used in quality assurance to ensure that drugs sent to market are of high quality (2). Raman spectroscopy, because it is non-invasive, can test drug products that are already in their packaging (3). Because Raman requires no sample preparation, it also helps keep costs down.
In their tutorial, Cavallini and the team discuss every major stage of pharmaceutical data analysis. From raw data organization to predictive modeling, this case-study-based tutorial provides a clear and reproducible framework that is both accessible and informative (1).
Modern process analytical technologies (PAT) now routinely generate massive volumes of spectral data that contain a wealth of hidden chemical and physical information. However, extracting insights from these data requires more than just sophisticated instrumentation (1). Cavallini and the team discuss the role of chemometric tools in navigating this complexity, particularly through techniques such as principal component analysis (PCA), partial least squares (PLS) regression, and partial least squares-discriminant analysis (PLS-DA) (1).
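To make the idea behind PCA concrete, the sketch below extracts the first principal component from a handful of spectra. It is an illustration only and is not taken from the tutorial, whose code is written in Matlab. The synthetic spectra, the function names, and the use of power iteration are all assumptions made for this example.

```python
import math
import random


def mean_center(rows):
    """Subtract each column's mean (samples in rows, variables in columns)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_principal_component(rows, n_iter=500, tol=1e-12):
    """Return (loading, scores, explained_fraction) for PC1 via power iteration."""
    X = mean_center(rows)
    n_vars = len(X[0])
    rng = random.Random(0)
    p = [rng.random() for _ in range(n_vars)]
    norm = math.sqrt(sum(v * v for v in p))
    p = [v / norm for v in p]
    for _ in range(n_iter):
        # One multiplication by X^T X: scores t = X p, then p_new = X^T t.
        t = [sum(x * v for x, v in zip(row, p)) for row in X]
        p_new = [sum(row[j] * ti for row, ti in zip(X, t)) for j in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in p_new))
        p_new = [v / norm for v in p_new]
        change = sum((a - b) ** 2 for a, b in zip(p, p_new))
        p = p_new
        if change < tol:
            break
    scores = [sum(x * v for x, v in zip(row, p)) for row in X]
    total_ss = sum(x * x for row in X for x in row)
    explained = sum(t * t for t in scores) / total_ss
    return p, scores, explained


def synthetic_spectra(levels, n_points=50, noise=0.01, seed=1):
    """Hypothetical data: one Gaussian band scaled by a level, plus noise."""
    rng = random.Random(seed)
    band = [math.exp(-((i - 25) / 5.0) ** 2) for i in range(n_points)]
    return [[c * b + rng.gauss(0.0, noise) for b in band] for c in levels]


levels = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
spectra = synthetic_spectra(levels)
loading, scores, explained = first_principal_component(spectra)
print(f"PC1 explains {explained:.1%} of the variance")
print("PC1 scores:", [round(t, 2) for t in scores])
```

Because the synthetic spectra differ mainly in the height of a single band, one component captures almost all of the variance and the scores order the samples by level. Real NIR or Raman data would normally need several components and a library routine based on singular value decomposition.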
Their tutorial demonstrates this using a real-world data set involving multiple freeze-dried pharmaceutical formulations. Beginning with a detailed explanation of the data set's structure and characteristics, the authors methodically lead the reader through a complete data analysis pipeline (1). At each step in this process, which includes data preprocessing, exploratory analysis, regression modeling, and classification, the researchers explain exactly what to do and how to do it (1).
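The article does not say which preprocessing methods the tutorial applies. As one common example for NIR and Raman spectra, the sketch below shows standard normal variate (SNV) scaling, which centers and scales each spectrum by its own mean and standard deviation. The numbers are made up, and the scatter factor and offset are hypothetical.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    if std == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(v - mean) / std for v in spectrum]


# Made-up intensities standing in for one measured spectrum.
raw = [0.10, 0.12, 0.30, 0.55, 0.31, 0.13, 0.11]
# The same spectrum with a hypothetical scatter factor and baseline offset.
scattered = [1.8 * v + 0.25 for v in raw]

snv_raw = snv(raw)
snv_scattered = snv(scattered)
max_diff = max(abs(a - b) for a, b in zip(snv_raw, snv_scattered))
print(f"largest difference after SNV: {max_diff:.2e}")
```

The two inputs differ only by a scale factor and an offset, so SNV maps them onto essentially the same curve. That is the property that makes it useful before exploratory analysis, and it is also why SNV can discard information that lives purely in overall intensity.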
The tutorial also demonstrates how increasing levels of sucrose and arginine in the formulations influence the clustering and regression results, offering insight into how formulation variables affect the final product. It also uncovers subtler patterns, such as the impact of the operator performing the analysis and the session in which data were collected, highlighting the method's sensitivity not just to sample composition but also to procedural variability (1).
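As a rough illustration of how a PLS regression can link spectra to a formulation variable, the sketch below fits a one-component PLS1 model (the first NIPALS component) to synthetic spectra and predicts the level of a hypothetical excipient. The data, band positions, and function names are assumptions for this example and do not reproduce the tutorial's data set or results.

```python
import math
import random

N_POINTS = 60


def synthetic_spectrum(level, rng, noise=0.005):
    """Hypothetical spectrum: a fixed matrix band plus an analyte band scaled by level."""
    spectrum = []
    for i in range(N_POINTS):
        matrix = 0.8 * math.exp(-((i - 15) / 8.0) ** 2)
        analyte = 0.2 * level * math.exp(-((i - 40) / 6.0) ** 2)
        spectrum.append(matrix + analyte + rng.gauss(0.0, noise))
    return spectrum


def fit_pls1_one_component(X, y):
    """Fit the first PLS1 component on mean-centered X and y."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    # The weight vector is proportional to X^T y, normalized to unit length.
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(x_mean))]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w, then regress y on t to get the inner coefficient q.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict(model, spectrum):
    """Project a new spectrum onto the weight vector and map the score to y."""
    t_new = sum(
        (v - m) * wj
        for v, m, wj in zip(spectrum, model["x_mean"], model["w"])
    )
    return model["y_mean"] + model["q"] * t_new


rng = random.Random(42)
levels = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
calibration = [synthetic_spectrum(c, rng) for c in levels]
model = fit_pls1_one_component(calibration, levels)

predicted = predict(model, synthetic_spectrum(2.5, rng))
print(f"predicted level for a new sample made at 2.5: {predicted:.2f}")
```

A single component is enough here only because the synthetic spectra vary along one direction. Real formulations with several varying ingredients typically need more components, chosen by cross-validation.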
This practical approach to chemometrics is important because the pharmaceutical industry has put more emphasis on quality control, process optimization, and regulatory compliance. As a result, the authors are keen to encourage critical thinking during each stage of drug analysis (1). At each stage, key questions are posed and discussed, allowing readers to reflect on the decisions they make during their own analyses. The inclusion of fully commented MATLAB code furthers this educational goal, allowing even those with limited programming experience to adapt the scripts for their own data sets (1).
By presenting a tutorial that is both technically rigorous and practically approachable, Cavallini and his co-authors provide a roadmap for advancing data literacy in one of the world’s most scientifically demanding industries (1). Ultimately, the tutorial shows that with the right tools and approach, even complex, high-dimensional spectral data can become a source of actionable insight.