Key Points
- A team from Serbia, Israel, and the UAE reviewed how random forest (RF) machine learning algorithms are advancing biomedical signal analysis, highlighting their ability to process complex data from spectroscopy, imaging, electrochemistry, and omics platforms for tasks such as toxicity prediction and disease classification.
- RF models have shown high accuracy in identifying cell damage, assessing pharmaceutical toxicity, differentiating disease states, and predicting enzyme activity by analyzing intricate biochemical and imaging data.
- Despite their strengths, RF models face limitations such as interpretability issues and challenges with very high-dimensional data.
With machine learning (ML) algorithms now routinely adopted in several application areas to automate data processing, researchers are examining how else ML models can improve biomedical signal analysis. In a recent review article, a multidisciplinary research team from institutions in Serbia, Israel, and the United Arab Emirates took a close look at the random forest ML algorithm, analyzing how it is being used, what its strengths and weaknesses are, and how it is advancing biomedical science. The findings were published in the journal Chemico-Biological Interactions (1).
What is Random Forest?
Random forest is a powerful ensemble-based supervised ML algorithm that combines multiple decision trees using bootstrap aggregation (bagging) and random feature selection (1,2). A decision tree starts with a basic question about the data; each answer leads to further questions, and these questions form the decision nodes of the tree (2). In effect, a decision tree is a way to categorize and subset the collected data.
This dual strategy of bagging and random feature selection allows random forest algorithms to improve predictive accuracy while reducing the risk of overfitting, which is a frequent challenge in ML, particularly with high-dimensional biomedical data (1). By training on diverse data sets derived from spectroscopy, electrochemistry, imaging, and omics-based platforms, random forest models have become especially valuable in physiological signal analysis and biochemical classification (1).
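To make the two ingredients concrete, the following is a minimal sketch in plain Python and is not code from the review. It assumes a synthetic four-feature data set (two informative features, two pure noise), and it uses a depth-one tree (a "stump") in place of a full decision tree to keep the example short. All function names (`fit_stump`, `fit_forest`, `predict`, `make_toy_data`) and parameter values are hypothetical choices for illustration.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Fit a depth-one tree, searching only the given candidate features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_candidate_features=2, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: draw n_samples rows with replacement.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Random feature selection: this tree may only split on these features.
        feats = rng.sample(range(n_features), n_candidate_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Classify one row by majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


def make_toy_data(n=200, seed=1):
    """Synthetic data (an assumption): features 0-1 informative, 2-3 noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4 * label, 1), rng.gauss(4 * label, 1),
                  rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


X, y = make_toy_data()
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
forest = fit_forest(X_train, y_train)
accuracy = sum(predict(forest, r) == lab for r, lab in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because each stump sees a different resampled data set and a different feature subset, the individual trees make partly independent errors, and the majority vote tends to cancel them out. In practice, an established library implementation with full-depth trees would be used instead of a hand-written sketch like this one.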
What are some of the promising uses of random forest models?
The authors explained in their article that random forest models can accurately predict cell damage and toxicity. These predictions are derived from analyzing intricate two-dimensional (2D) signal structures captured through imaging or biochemical assays (1). The review highlights recent studies, including the team’s own contributions, that demonstrate how decision tree-based algorithms can classify cellular responses to stress or toxic exposure by interpreting patterns in these signals.
Random forest models have also been valuable in toxicity assessments. By processing biochemical and morphological features from cellular imaging, random forest models can detect toxic effects of new pharmaceutical compounds with high sensitivity (1).
Random forest algorithms have also been used to differentiate between disease states, including cancer subtypes and metabolic disorders, by classifying omics data (1). In addition, they offer strong performance in predicting continuous variables such as enzyme activity levels or the concentration of specific metabolites (1).
What are the current limitations with random forest models?
Despite their promise in the biomedical field, random forest models currently have several limitations. For one, like many ML models, random forest models have a “black box” nature that hinders interpretability (1). In the biomedical space, this limitation is problematic because understanding the rationale behind a model’s prediction is crucial. Additionally, random forest models may suffer from bias in feature selection and can struggle when confronted with extremely high-dimensional data, such as full genome or proteome data sets (1).
For now, the researchers believe that random forest models should be integrated into hybrid ML frameworks because of their ability to handle noise and non-linear relationships (1). They conclude their paper by describing how the growing use of random forest models reflects the broader push to use artificial intelligence (AI) across all industries. It is expected that AI-driven sensing systems will continue to be developed as the technology improves and becomes less expensive. These integrated systems could dramatically enhance the detection, classification, and prediction of physiological and pathological processes, especially in personalized medicine and real-time monitoring (1).
Despite their current limitations, and based on the progress being made on this front, random forest models appear to have a clear pathway to wide adoption in future biomedical applications. With continued refinement and interdisciplinary collaboration, random forest models are poised to play a central role in decoding the biochemical signals that underpin human health and disease (1).
References
- Pantic, I. V.; Pantic, J. P.; Valjarevic, S.; et al. Artificial intelligence-based approaches based on random forest algorithm for signal analysis: Potential applications in detection of chemico-biological interactions. Chem. Biol. Inter. 2025, 418, 111624. DOI: 10.1016/j.cbi.2025.111624.
- IBM, What is Random Forest? IBM.com. Available at: https://www.ibm.com/think/topics/random-forest (accessed 2025-07-25).