Identifying Insect Species Using Machine Learning


Using machine learning methods and spectroscopy, scientists from Central South University in Hunan, China created a unique method of analyzing empty puparia to identify insect species. Their research was published in Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (1).

Set of insects | Image Credit: © Alekss -

Species identification, particularly within entomological surveys, matters for applications ranging from biodiversity assessment to environmental management to forensic investigations (1). Insect species can be identified from a variety of specimens, including eggs, larvae, and pupae. Empty puparia, for example, can be the sole source of entomological evidence available when an insect dies, and this aspect of species identification is relatively unstudied.

Empty puparia are the exoskeletons that remain after insect eclosion; before eclosion, they safeguard the intra-puparium tissue from damage. There have been many studies on the composition of empty puparia, leading to their use in multiple fields, such as antibacterial drug development and postmortem interval (PMI) estimation (1). That said, traditional analysis methods fail to distinguish incomplete empty puparia or species that are morphologically similar. This has led to a need for easier and faster techniques for analyzing empty puparia.

In this study, attenuated total reflectance-Fourier transform infrared spectroscopy (ATR-FTIR) was used to acquire spectral information from the empty puparia of five different fly species. The data were then subjected to spectral pre-processing to obtain average spectra for preliminary analysis. Following this, principal component analysis (PCA) and orthogonal partial least squares-discriminant analysis (OPLS-DA) were used to cluster and classify the spectra. Afterwards, three machine learning models, support vector machine (SVM), k-nearest neighbor (KNN), and random forest (RF), were used to analyze spectra from different waveband groups.
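As a rough illustration of the classification step, the sketch below normalizes each spectrum, averages replicates, and assigns a species with a k-nearest neighbor vote. It is a minimal sketch under stated assumptions, not the authors' pipeline: the spectra are synthetic, the species labels are placeholders, standard normal variate scaling stands in for the unspecified pre-processing, and the choice of k = 3 is arbitrary. KNN is shown only because it is the simplest of the three models to write with the Python standard library.

```python
import math
import random


def snv(spectrum):
    """Standard normal variate: center and scale a single spectrum."""
    mean = sum(spectrum) / len(spectrum)
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / len(spectrum))
    return [(x - mean) / std for x in spectrum]


def mean_spectrum(spectra):
    """Average replicate spectra point by point."""
    return [sum(column) / len(column) for column in zip(*spectra)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training spectra closest to the query.

    train is a list of (spectrum, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def synthetic_spectrum(peak, rng, n_points=60):
    """Hypothetical spectrum: one Gaussian band plus small random noise."""
    return [
        math.exp(-((i - peak) ** 2) / 20.0) + rng.gauss(0, 0.02)
        for i in range(n_points)
    ]


rng = random.Random(0)
# Placeholder species, each with a band at a different (made-up) position.
peaks = {"species_A": 15, "species_B": 30, "species_C": 45}

train = [
    (snv(synthetic_spectrum(peak, rng)), name)
    for name, peak in peaks.items()
    for _ in range(10)
]
test = [
    (snv(synthetic_spectrum(peak, rng)), name)
    for name, peak in peaks.items()
    for _ in range(5)
]

# Average spectrum per species, as used for a preliminary visual comparison.
averages = {
    name: mean_spectrum([s for s, label in train if label == name])
    for name in peaks
}

accuracy = sum(knn_predict(train, s) == label for s, label in test) / len(test)
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

Real ATR-FTIR spectra overlap far more than these synthetic bands do, so the accuracy printed here says nothing about how the models performed in the study.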

During the clustering and classification process, two wavebands (3000–2800 cm−1 and 1800–1300 cm−1) were deemed significant in distinguishing one of the species, Aldrichina grahami. As for the machine learning models, spectra from the biological fingerprint region (1800–1300 cm−1) proved highly effective for identifying empty puparia species. Notably, the SVM model achieved 100% accuracy in identifying all five fly species. Overall, the scientists view this as a notable first step in identifying insect species from empty puparia, specifically using infrared spectroscopy and machine learning methods. According to them, this study provides “a robust research foundation for future investigations in this area” (1).
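Restricting a spectrum to the wavebands named above can be sketched as a simple filter on the wavenumber axis. This is an illustrative sketch only: the function name, the 4000 to 600 cm−1 axis with 100 cm−1 spacing, and the zero-valued absorbances are assumptions made for the example, not details from the study.

```python
def select_wavebands(wavenumbers, absorbances, bands):
    """Keep only the points whose wavenumber (cm^-1) lies inside any band.

    bands is a list of (high, low) pairs, matching how the ranges are
    usually written for infrared spectra.
    """
    kept = [
        (w, a)
        for w, a in zip(wavenumbers, absorbances)
        if any(low <= w <= high for high, low in bands)
    ]
    return [w for w, _ in kept], [a for _, a in kept]


# The two wavebands reported as significant in the article.
BANDS = [(3000, 2800), (1800, 1300)]

# Hypothetical axis from 4000 down to 600 cm^-1 in 100 cm^-1 steps.
wavenumbers = list(range(4000, 500, -100))
absorbances = [0.0] * len(wavenumbers)  # placeholder values

kept_w, kept_a = select_wavebands(wavenumbers, absorbances, BANDS)
print(kept_w)
```

The filtered spectra could then be passed to a classifier so that models trained on different waveband groups can be compared.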


(1) Zhang, X.; Yang, F.; Xiao, J.; Qu, H.; Jocelin, N. F.; Ren, L.; Guo, Y. Analysis and Comparison of Machine Learning Methods for Species Identification Utilizing ATR-FTIR Spectroscopy. Spectrochim. Acta, Part A Mol. Biomol. Spectrosc. 2024, 308, 123713. DOI:
