Scientists from the University of Granada (Spain) recently evaluated how effectively machine learning (ML) methods applied to hyperspectral imaging (HSI) data can classify inks found in historical documents. Their findings were published in Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (1).
Identifying materials used in tangible cultural heritage is vital for selecting appropriate restoration and preservation strategies. Analyzing inks in manuscripts and historical documents can enrich one’s understanding of artistic and historical context, improving efforts to date documents, determine authorship, detect falsifications or undocumented restorations, and identify causes of deterioration. Ink analysis, therefore, is key for codicologists and historians looking to explore the content and material composition of manuscripts.
To obtain compositional information while preserving objects’ integrity and value, non-invasive analytical techniques are predominantly used, the most common being X-ray fluorescence (XRF), X-ray diffraction (XRD), Fourier transform infrared (FTIR) spectroscopy, and Raman spectroscopy. Recently, however, hyperspectral imaging (HSI) has gained prominence in this field. Combining spectroscopy and spatial imaging, this technique captures the spectral reflectance at each pixel across many wavelengths, producing a three-dimensional data set known as a hypercube (two spatial coordinates plus a spectrum for every pixel of the image). According to the researchers, HSI’s primary advantage over other methods is its ability to provide spatial information, enabling the retrieval of material distribution within a document, which is critical for historical studies and conservation evaluation (2). Additionally, its non-contact and rapid data acquisition capabilities make it suitable for on-site analysis of historical artifacts at locations like museums or libraries.
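To make the hypercube idea concrete, the following minimal sketch builds a toy cube as nested Python lists and reads it both ways: as the spectrum of one pixel and as the image at one wavelength. The dimensions, band centers, and reflectance values are invented for illustration and do not come from the study.

```python
# Toy hypercube: 2 rows x 3 columns x 4 spectral bands.
# All numbers here are made up for illustration, not measured data.
ROWS, COLS, BANDS = 2, 3, 4
wavelengths_nm = [450, 550, 650, 750]  # hypothetical band centers

# cube[r][c] holds the reflectance spectrum of the pixel at row r, column c.
cube = [
    [[round(0.1 * (r + c + b + 1), 2) for b in range(BANDS)]
     for c in range(COLS)]
    for r in range(ROWS)
]


def pixel_spectrum(cube, row, col):
    """Return the full spectrum (one value per band) of a single pixel."""
    return list(cube[row][col])


def band_image(cube, band):
    """Return the 2D image (rows x columns) recorded at a single band."""
    return [[pixel[band] for pixel in row] for row in cube]


spectrum = pixel_spectrum(cube, 1, 2)  # spectral view of one pixel
image_650 = band_image(cube, 2)        # spatial view at the 650 nm band
```

The same data therefore supports two kinds of questions: what a given spot on the page is made of (its spectrum) and where a given spectral signature appears across the page (its band images).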
While HSI has its advantages, the researchers state that no prior studies had investigated the automatic classification of historical inks using machine learning (ML) and HSI data. For this study, six supervised classification models were trained and validated to automatically classify three types of inks: (1) pure metallo-gallate inks (MGP); (2) carbon-containing inks (CC), which include pure carbon-based inks like ivory black or bone black, as well as mixtures of carbon-based and metallo-gallate or sepia inks; and (3) non-carbon-containing inks (NCC), which can be pure sepia or a mixture of MGP and sepia. The six models comprised five traditional algorithms (Support Vector Machines [SVM], K-Nearest Neighbors [KNN], Linear Discriminant Analysis [LDA], Random Forest [RF], and Partial Least Squares Discriminant Analysis [PLS-DA]) and one deep learning (DL)-based model. Further, principal component analysis (PCA) was applied before classification, both to visualize the separability of the classes and to reduce dimensionality, and the researchers compared classification accuracy and running time with and without PCA.
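As a rough illustration of what per-pixel supervised classification looks like, here is a minimal K-Nearest Neighbors sketch using only the Python standard library. It is not the researchers' implementation: the four-band spectra are invented, the function names are hypothetical, and the choice of Euclidean distance and k = 3 is an assumption made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_spectra, train_labels, query, k=3):
    """Label a query spectrum by majority vote among its k nearest neighbors."""
    ranked = sorted(
        zip(train_spectra, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented 4-band reflectance spectra, two labelled examples per ink class.
train_spectra = [
    [0.10, 0.20, 0.45, 0.70], [0.12, 0.22, 0.48, 0.72],  # MGP
    [0.05, 0.06, 0.07, 0.08], [0.06, 0.07, 0.08, 0.09],  # CC
    [0.15, 0.30, 0.40, 0.50], [0.17, 0.32, 0.42, 0.52],  # NCC
]
train_labels = ["MGP", "MGP", "CC", "CC", "NCC", "NCC"]

# Classify one unlabelled pixel spectrum.
predicted = knn_predict(train_spectra, train_labels, [0.06, 0.06, 0.08, 0.09])
```

In a real workflow, each training spectrum would come from a labelled pixel of a mock-up sample or document, and the spectra would have far more bands, which is where a dimensionality reduction step such as PCA becomes useful.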
With mock-up samples and historical documents, micro-averaged accuracy above 90% was achieved for all models. The best results came from the DL model, with micro- and macro-averaged accuracy and recall exceeding 99%. Among the traditional models, SVM performed best, with all metrics above 95% and micro- and macro-averaged accuracy and recall above 97%. That said, neither the DL model nor SVM achieved perfect results. As such, the choice between a traditional and a DL model can mostly be based on the available computational resources and on how much a slight gain in accuracy matters for the application.
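The difference between micro- and macro-averaging matters when classes are unevenly represented. The sketch below uses the common definitions for single-label classification, which may differ in detail from those used in the paper: micro-averaged recall pools all samples (and then equals overall accuracy), while macro-averaged recall is the unweighted mean of the per-class recalls. The labels and predictions are invented.

```python
def per_class_recall(y_true, y_pred, labels):
    """Recall for each class: correctly predicted samples / true samples."""
    recalls = {}
    for label in labels:
        support = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / support if support else 0.0
    return recalls


def macro_recall(y_true, y_pred, labels):
    """Unweighted mean of per-class recalls: every class counts equally."""
    recalls = per_class_recall(y_true, y_pred, labels)
    return sum(recalls.values()) / len(labels)


def micro_recall(y_true, y_pred):
    """Recall pooled over all samples: every sample counts equally."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


labels = ["MGP", "CC", "NCC"]
# Invented, imbalanced example: 6 MGP, 2 CC, and 2 NCC pixels.
y_true = ["MGP"] * 6 + ["CC"] * 2 + ["NCC"] * 2
# One CC pixel is misclassified as NCC; everything else is correct.
y_pred = ["MGP"] * 6 + ["CC", "NCC"] + ["NCC"] * 2

micro = micro_recall(y_true, y_pred)         # 9 of 10 samples correct
macro = macro_recall(y_true, y_pred, labels)  # mean of 1.0, 0.5, 1.0
```

Here a single error in a small class lowers the macro average more than the micro average, which is why reporting both gives a fuller picture of performance on rarer ink classes.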
Future research will focus on more detailed classification in which the subclasses within the CC and NCC groups can be separated. Applying unmixing techniques could provide more interpretable analyses of individual components and their concentrations in mixtures than DL or ML approaches. Their effectiveness, however, will depend on the choice of mixing model, the accuracy of the extracted endmembers (spectra of pure components), and the availability of a comprehensive reference library.
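To illustrate what unmixing means in practice, the sketch below assumes the simplest case: a linear mixing model with two known endmembers whose fractions sum to one. The article does not say which mixing model the researchers would use, and real ink mixtures may not mix linearly, so this is only an illustration. The endmember spectra and the function name are invented.

```python
def unmix_two(observed, endmember_a, endmember_b):
    """Estimate fractions (f_a, f_b) for observed ~ f_a * A + f_b * B.

    Assumes a linear mixing model with f_a + f_b = 1. The least-squares
    estimate of f_a is clipped to the physically meaningful range [0, 1].
    """
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    resid = [x - b for x, b in zip(observed, endmember_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmembers are identical; fractions are undefined")
    frac_a = sum(r * d for r, d in zip(resid, diff)) / denom
    frac_a = min(1.0, max(0.0, frac_a))
    return frac_a, 1.0 - frac_a


# Invented 4-band endmember spectra standing in for two pure inks.
ink_a = [0.10, 0.20, 0.45, 0.70]
ink_b = [0.15, 0.30, 0.40, 0.50]

# Simulate a pixel containing 30% ink A and 70% ink B, then recover it.
mixed = [0.3 * a + 0.7 * b for a, b in zip(ink_a, ink_b)]
frac_a, frac_b = unmix_two(mixed, ink_a, ink_b)
```

The result is only as good as its inputs: if the endmember spectra are inaccurate, or the true mixing is not linear, the recovered fractions will be biased, which mirrors the caveats noted above.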
(1) López-Baldomero, A. B.; Buzzelli, M.; Moronta-Montero, F.; Martínez-Domingo, M. Á.; Valero, E. M. Ink Classification in Historical Documents Using Hyperspectral Imaging and Machine Learning Methods. Spectrochim. Acta – A: Mol. Biomol. Spectrosc. 2025, 335, 125916. DOI: 10.1016/j.saa.2025.125916
(2) Catelli, E.; Randeberg, L. L.; Alsberg, B. K.; Gebremariam, K. F.; Bracci, S. An Explorative Chemometric Approach Applied to Hyperspectral Images for the Study of Illuminated Manuscripts. Spectrochim. Acta – A: Mol. Biomol. Spectrosc. 2017, 177, 69–78. DOI: 10.1016/j.saa.2017.01.015