Machine Learning Unveils Efficient Classification of Nanoparticles from Noisy spICP-TOF-MS Data


New research has demonstrated a two-stage machine learning strategy to overcome bias in spICP-TOF-MS data and improve the classification of nanoparticles. The approach achieves high accuracy in identifying engineered, incidental, and natural nanoparticle types, providing a robust and efficient method for nanoparticle classification in complex samples.

Single-particle inductively coupled plasma time-of-flight mass spectrometry (spICP-TOF-MS) holds promise for quantifying and classifying nanoparticles (NPs) by their elemental compositions. However, systematic bias in spICP-TOF-MS data makes accurate NP classification difficult. To address this issue, researchers at Iowa State University have developed a two-stage semi-supervised machine learning (SSML) strategy that accounts for this bias and improves the classification of NP types. This work was published in the Journal of Analytical Atomic Spectrometry (1).

SSML trains a machine learning model in multiple steps using both labeled and unlabeled data. In the first stage, labeled data is used to train an initial model. This model is then used to identify the instances it misclassifies and to separate them into "noise classes." In the second stage, these noise classes are added to the training labels and the model is retrained, which enriches the labeled data and improves the model's performance. By refining the model with previously misclassified instances, SSML aims to classify new, unseen data more accurately. This approach is particularly useful when labeled data is limited or expensive to obtain, because it draws on both the labeled and the unlabeled data that are available.

The research team's approach involves identifying and incorporating "noise classes," which account for systematic particle misclassifications, into the SSML model. By doing so, a more robust classification model is developed, enabling accurate identification of NP types. The researchers conducted a case study using cerium(IV) oxide, ferrocerium mischmetal, and bastnaesite mineral NPs as representatives of engineered (ENP), incidental (INP), and natural (NNP) nanoparticle types, respectively.
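
The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for their model, the unlabeled-data part of SSML is left out, and the function names and two-feature particle signals (hypothetical Ce and La intensities) are invented for illustration. The sketch also assumes one plausible source of bias, namely that small natural particles show only Ce because La falls below detection, so they resemble engineered particles.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: mean feature vector} for the labeled particles."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in tot] for lab, tot in sums.items()}


def predict(centroids, features):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    def dist(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=dist)


def two_stage_fit(X, y):
    """Stage 1: fit on the original labels.
    Stage 2: move misclassified particles into a 'noise_<label>' class and refit."""
    stage1 = fit_centroids(X, y)
    relabeled = [
        label if predict(stage1, features) == label else f"noise_{label}"
        for features, label in zip(X, y)
    ]
    return fit_centroids(X, relabeled), relabeled


# Made-up toy data: [Ce signal, La signal] per particle (assumption, not real data).
X = [
    [10.0, 0.0], [12.0, 0.0], [11.0, 0.0], [13.0, 0.0],  # engineered (Ce only)
    [10.0, 8.0], [12.0, 9.0], [11.0, 10.0],              # natural (Ce + La)
    [7.0, 0.0], [8.0, 0.0],                              # natural, La not detected
]
y = ["ENP"] * 4 + ["NNP"] * 5

model, labels = two_stage_fit(X, y)
print(sorted(model))  # class names known to the stage-2 model
print(labels)         # training labels after noise-class relabeling
```

On this toy data, the two low-signal natural particles are misclassified in stage 1 and end up in a separate `noise_NNP` class in stage 2, so the retrained model no longer has to force such ambiguous particles into the engineered or natural class.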

The final SSML model achieved a receiver operating characteristic area under the curve (ROC AUC) of 0.979. The false-positive rates for ENPs, INPs, and NNPs were 0.030, 0.001, and 0, respectively. This level of accuracy allows for reliable particle-type classification even in mixed samples with varying concentrations. The researchers demonstrated that their two-stage SSML model can quantify particle types across a concentration range spanning more than two orders of magnitude.
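
For readers unfamiliar with per-class false-positive rates, the short Python sketch below shows how such a figure is typically computed in a multi-class setting: each class in turn is treated as "positive" and all other classes as "negative." The function name and the labels in the example are hypothetical and are not taken from the study.

```python
def false_positive_rates(y_true, y_pred, classes):
    """Per-class FPR = FP / (FP + TN), treating each class as positive in turn."""
    rates = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        # False positives: particles of another type that were predicted as c.
        fp = sum(1 for t, p in pairs if t != c and p == c)
        # True negatives: particles of another type that were not predicted as c.
        tn = sum(1 for t, p in pairs if t != c and p != c)
        rates[c] = fp / (fp + tn) if (fp + tn) else float("nan")
    return rates


# Made-up labels for ten particles (assumption, not data from the paper).
y_true = ["ENP"] * 4 + ["INP"] * 3 + ["NNP"] * 3
y_pred = ["ENP", "ENP", "ENP", "INP",
          "INP", "INP", "INP",
          "NNP", "NNP", "ENP"]

rates = false_positive_rates(y_true, y_pred, ["ENP", "INP", "NNP"])
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

In this example, one of the six non-ENP particles is labeled ENP, one of the seven non-INP particles is labeled INP, and no particle is wrongly labeled NNP, so the rates are 1/6, 1/7, and 0.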


The study is significant because it addresses the bias present in spICP-TOF-MS training data and offers a straightforward, robust way to incorporate machine learning models into NP classification strategies. By accounting for this bias, the methodology could support more accurate identification and characterization of NPs in applications such as environmental monitoring, nanotoxicology, and nanomaterial development.

With further refinement and validation, the SSML approach developed by the Iowa State University team holds great promise for enhancing the accuracy and efficiency of nanoparticle classification, ultimately contributing to advancements in nanoscience and technology.


(1) Buckman, R. L.; Gundlach-Graham, A. Machine learning analysis to classify nanoparticles from noisy spICP-TOFMS data. J. Anal. At. Spectrom. 2023, ASAP. DOI: 10.1039/D3JA00081H