April 3, 2026

Improving Interpretability for Archaeological Applications Using PLSR and RF Modeling

What predictive modeling strategy works best for radiocarbon dating? According to Christina Macie Ryder, a postdoctoral researcher at Texas A&M University, the answer depends on the data: random forest (RF) models can outperform partial least squares regression (PLSR) models, but not in every setting.

In a study comparing predictive modeling strategies for radiocarbon dating, Ryder and her team found that RF models outperformed PLSR models on a clean validation set, achieving a lower root mean square error (RMSE) and a higher correct classification rate.1,2 However, when the models were applied to an external archaeological data set, the restricted PLSR model performed better, particularly in classifying samples from a late Pleistocene Neanderthal locality.1,2 The RF model struggled with overfitting and was sensitive to consolidants, whereas the PLSR model focused on collagen-specific absorption bands and proved more reliable.1,2 Ryder emphasized that work to improve both models is ongoing.
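
The two performance metrics mentioned above can be illustrated with a short Python sketch. It is not the study's code: the function names, the made-up sample values, and the assumption that classes are assigned by comparing collagen values against a single cutoff are hypothetical choices made only for illustration.

```python
import math


def rmse(measured, predicted):
    """Root mean square error: square root of the mean squared prediction error."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def correct_classification_rate(measured, predicted, cutoff):
    """Share of samples placed on the same side of `cutoff` by measurement and prediction.

    `cutoff` is a hypothetical threshold (e.g. a minimum collagen content);
    the classification scheme used in the study may differ.
    """
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum((m >= cutoff) == (p >= cutoff) for m, p in zip(measured, predicted))
    return agree / len(measured)


# Made-up values for illustration only (not data from the study)
measured = [0.5, 2.0, 6.0, 12.0]
predicted = [1.0, 1.5, 5.0, 11.0]
print(rmse(measured, predicted))                              # ≈ 0.79
print(correct_classification_rate(measured, predicted, 1.0))  # 0.75
```

A lower RMSE means predicted values sit closer to measured ones on average, while the classification rate only asks whether each sample lands in the right category, so a model can improve on one metric without improving on the other.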

In the video clip below, Ryder discusses in more detail how the PLSR and RF models performed when analyzing collagen in archaeological bone.

What Are Partial Least Squares Regression and Random Forest?

Partial least squares regression (PLSR) is a statistical modeling technique that finds relationships between two sets of variables, predictors (X) and responses (Y), by extracting latent components that maximize the covariance between them.3,4
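
The idea of extracting covariance-maximizing latent components can be sketched with NumPy. The example below implements PLS1 (PLSR with a single response variable) using the NIPALS algorithm, which is one common way to compute PLSR; it is an illustrative sketch, not the software used by Ryder's team, and the function name `pls1_nipals` is hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit single-response PLSR (PLS1) with the NIPALS algorithm.

    Returns coefficients B plus centering constants, so that predictions
    are (X_new - x_mean) @ B + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        # Unit-norm weight vector maximizing the covariance between Xk @ w and yk
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w            # latent scores for this component
        tt = t @ t
        p = Xk.T @ t / tt     # X loadings
        c = (yk @ t) / tt     # y loading (a scalar for a single response)
        # Deflate: remove the part of X and y this component already explains
        Xk = Xk - np.outer(t, p)
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    # Map the latent-space regression back to coefficients on the original predictors
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, x_mean, y_mean


# Sanity check on random data: with as many components as predictors,
# PLS1 reproduces ordinary least squares on the centered data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.1 * rng.normal(size=50)
B, x_mean, y_mean = pls1_nipals(X, y, n_components=5)
B_ols = np.linalg.lstsq(X - x_mean, y - y_mean, rcond=None)[0]
print(np.allclose(B, B_ols))  # True
```

The final check reflects a known property of PLSR: with as many components as linearly independent predictors, it reduces to ordinary least squares. Its practical value comes from keeping far fewer components than predictors.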

Unlike ordinary least squares regression, which regresses Y directly on X, PLSR projects both X and Y into a lower-dimensional latent space and then models the relationship there.3,4 This makes PLSR well suited to data with many highly correlated predictors, such as spectra, in which absorbance values at neighboring wavelengths tend to rise and fall together.

Meanwhile, random forest (RF) is an ensemble machine learning (ML) algorithm that builds many decision trees and combines their outputs to produce more accurate and stable predictions than any single tree.5
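
The claim that combining many trees beats a typical single tree can be checked with scikit-learn's `RandomForestRegressor`. This sketch uses the synthetic Friedman #1 benchmark as a stand-in for spectral data, and its settings (100 trees, one third of the predictors considered at each split) are illustrative assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

# Synthetic benchmark data (Friedman #1), standing in for real spectra
X, y = make_friedman1(n_samples=400, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

# max_features=1/3: each split considers a random third of the predictors
forest = RandomForestRegressor(n_estimators=100, max_features=1 / 3, random_state=0)
forest.fit(X_train, y_train)


def mse(pred):
    """Mean squared error on the held-out samples."""
    return float(np.mean((pred - y_test) ** 2))


forest_mse = mse(forest.predict(X_test))
tree_mses = [mse(tree.predict(X_test)) for tree in forest.estimators_]

# The forest's prediction is the average of its trees' predictions, so by
# convexity of squared error its MSE cannot exceed the trees' average MSE.
print(f"forest MSE: {forest_mse:.2f}")
print(f"mean single-tree MSE: {np.mean(tree_mses):.2f}")
print(f"best single-tree MSE: {min(tree_mses):.2f}")
```

The guarantee only covers the average tree; in practice the forest often beats even the best individual tree as well, because the trees' errors partly cancel when averaged.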

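A single decision tree is simple but prone to overfitting, and RF is designed to combat this.5 It builds many trees, introducing randomness so that each tree is different (each tree is typically trained on a bootstrap sample of the data and considers only a random subset of predictors at each split), and then aggregates the predictions of all the trees.5 This generally makes predictions from spectral data more accurate and stable than those of a single tree, although, as Ryder's results show, RF models can still overfit.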
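
The randomize-then-aggregate recipe described above can be written out by hand. In this minimal sketch each tree is grown on a bootstrap sample and considers a random subset of predictors at each split, as in Breiman's random forest. It assumes scikit-learn's `DecisionTreeRegressor` for the individual trees; `fit_forest` and `predict_forest` are hypothetical helper names, the Friedman #1 data are synthetic, and trying one third of the predictors per split follows a common heuristic for regression forests.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.tree import DecisionTreeRegressor


def fit_forest(X, y, n_trees=50, max_features=1 / 3, seed=0):
    """Grow one deep tree per bootstrap sample, each trying a random
    subset of predictors at every split."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)  # bootstrap: n rows drawn with replacement
        tree = DecisionTreeRegressor(
            max_features=max_features,  # fraction of predictors tried at each split
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(X[rows], y[rows])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Aggregate by averaging (regression); a classifier would use a majority vote."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)


X, y = make_friedman1(n_samples=400, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]


def rmse(pred):
    """Root mean square error on the held-out samples."""
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))


single_tree_rmse = rmse(DecisionTreeRegressor(random_state=0).fit(X_train, y_train).predict(X_test))
forest_rmse = rmse(predict_forest(fit_forest(X_train, y_train), X_test))
print(f"single tree RMSE: {single_tree_rmse:.2f}, forest RMSE: {forest_rmse:.2f}")
```

Bootstrapping and feature subsampling make the trees disagree with one another; averaging then cancels much of that disagreement, which is where the gain in accuracy and stability comes from.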

This is the second part of our interview with Ryder. The first part focused on the effectiveness of near-infrared (NIR) spectroscopy for studying collagen in archaeological bone.

References
  1. Ryder, C.; Celis, G.; Devièse, T. et al. Refining Near-infrared Spectroscopy for Collagen Quantification: A New Predictive Model for Archaeological Bone. J. Archaeol. Sci. 2026, 185, 106448. DOI: 10.1016/j.jas.2025.106448
  2. Wetzel, W.; Spectroscopy Staff. Collagen Preservation in Archaeological Bone Using NIR Spectroscopy. Spectroscopy. Available at: https://www.spectroscopyonline.com/view/collagen-preservation-in-archaeological-bone-using-nir-spectroscopy (accessed 2026-03-27).
  3. Abdi, H.; Williams, L. J. Partial Least Squares Methods: Partial Least Squares Correlation and Partial Least Square Regression. Methods Mol. Biol. 2013, 930, 549–579. DOI: 10.1007/978-1-62703-059-5_23