Articles | December 1, 2021

December 2021
Volume 36
Issue 12

Cold-Hot Nature Identification of Chinese Medicine Based on an Ultraviolet Chemical Fingerprint

Authors: Guohui Wei, Xianjun Fu, Zhenguo Wang

https://doi.org/10.56530/spectroscopy.qe1076z9

A model has been developed to predict the “cold” or “hot” nature of Chinese medicines based on UV spectral data.

The nature theory of Chinese medicine (CM) is a core basic theory of traditional Chinese medicine (TCM), and CM chemical fingerprint technology has been widely used in the identification and study of CMs. Building on previous results, the UV spectral data of 61 CMs with a clear cold-hot nature, of which 30 are “cold” medicines and 31 are “hot” medicines, were analyzed. Based on the constructed CM fingerprint database, a retrieval scheme is developed to build a predictive identification model that classifies the cold-hot nature of CMs. According to the experimental results, the prediction model performs best on UV spectra measured with petroleum ether solvent; compared with existing classical models, the constructed model has better predictive stability and extrapolation performance. The proposed prediction model is shown to be effective.

As one of the core theories of traditional Chinese medicine (TCM), the nature theory of Chinese medicine (CM) has been widely explored and studied in recent years. Under this theory, CMs are divided into four characteristics—cool, cold, hot, and warm—and the cold-hot nature is important (1,2). Treating a “hot” syndrome with a medicine with a “cold” nature, or treating a “cold” syndrome with a medicine with a “hot” nature illustrates that cold-hot medicine nature theory is an important basis for TCM treatment in recovering the balance between so-called yin and yang in the human body, and is of great significance for guiding the clinical practice of TCM (3).

The study of CM cold-hot nature has attracted a lot of attention recently. Jin and associates (4) constructed a “three-element” mathematical analysis model to study cold-hot medicine nature and to explore the biological performance of Chinese materia medica. Zhao and associates (5) introduced a novel cold/hot plate differentiating assay method to study the cold and heat properties of traditional CMs. Wan and colleagues (6) compared the intervention effects of four CMs with a cold or hot property on body temperature and temperature-sensitive transient receptor potential ion channel proteins of rats with yeast-induced fever. Liang and colleagues (7) integrated bioinformatics analysis and chemical structure analysis methods to identify the biological activity of Chinese herbal medicine properties. Wang and associates (8) utilized a self-organizing map to classify the cold-hot nature of CMs based on their constituent compounds. Fu and colleagues (9) investigated the presence of anticancer activity displayed by the cold-hot nature of traditional Chinese marine medicine with phylogenetic tree analysis. Fu and associates (10) explored bioinformatics methods to understand the cold, hot, and neutral nature of CMs.

The main basis for determining the cold or hot nature of any given medicine is the bioactivity of the CM and the material composition within the CM that determines its bioactivity. Therefore, material composition is the basis for the production of the cold or hot nature of a CM. At present, research on cold-hot nature is focused on the interrelationship between cold-hot medicine nature and material composition within CMs. Methodologies involving the construction of CM nature discriminant models, such as chemical fingerprinting techniques, are often used to analyze the relationship between material composition and prediction of the cold-hot nature of CMs.

CM cold-hot nature discrimination with artificial intelligence first represents the features of a CM, using chemical fingerprint technology, the original efficacy of the CM, or metabolomics methods, and then constructs an artificial intelligence prediction model to identify the cold or hot nature of CMs whose nature is unknown. This is of great significance to the development of new resources for TCM. Li and associates (11) analyzed 1725 CMs and represented their features with 150 original efficacy characteristics to build an artificial neural network (ANN) classification model that predicts the cold or hot nature of CMs whose nature is unknown. Subsequently, Liu and associates (12) and Zhang and colleagues (13) explored partial least squares and principal component analysis-linear discriminant analysis models to identify the cold or hot nature of CMs. Nie and associates (14) utilized metabolomics data to characterize the features of CMs, and obtained better forecasting results in predicting the cold-hot nature with a random forest model. Long and colleagues (15) built a combination system for predicting the cold or hot nature of CMs that analyzed 284 CMs and used a support vector machine (SVM) as the predictive model. Li and associates (16) combined proton nuclear magnetic resonance (¹H-NMR) spectroscopy with pattern recognition techniques for cold-hot nature classification of Chinese medicinal herbs.

Thus, many studies aim to identify or predict the cold or hot nature of CMs. However, most of these studies have adopted classical, general-purpose artificial intelligence algorithms instead of building a dedicated prediction model for CM characteristic data, leading to insufficient forecasting results. In this study, the interrelationship between cold-hot medicine nature and the material composition of CMs is explored. An artificial intelligence prediction model that conforms to the characteristics of CM data is constructed to identify the cold or hot nature of CMs whose nature is unknown. First, UV fingerprint similarity is studied to propose a retrieval scheme. Second, a predictive identification model is built with the proposed retrieval scheme. Finally, a number of experiments are used to verify the feasibility of the predictive model.

Materials and Methods

CM Data set

The UV spectral data of this study come from the 973 Program project “Research of Basic Theories of CM.” The project selected 61 representative CMs with a clear cold-hot nature, of which 30 are “cold” medicines and 31 are “hot” medicines. For example, Menthae haplocalycis herba and Platycladi cacumen are cold medicines; Citri reticulatae pericarpium and Atractylodis rhizome are hot medicines. Their natures are determined by referring to the Chinese Materia Medica or Shen Nong’s Herbal Classic. The project measured the absorbance of all 61 CMs over the wavelength range of 190–400 nm. UV spectral data were measured with four different solvents (chloroform, distilled water, absolute ethanol, and petroleum ether). Figure 1 shows the UV absorption spectra of Menthae haplocalycis herba with the four different solvents.

UV Fingerprint Similarity

The correlation similarity of random variables is usually represented by a correlation coefficient, such as Pearson’s correlation or Spearman’s correlation. Pearson’s correlation has been widely used in correlation calculations of functional connectivity of brain regions and in similarity measurement of chemical fingerprints, such as spectra and chromatograms (17,18). This study uses Pearson’s correlation to calculate the similarity of the UV absorption spectra of different CMs. Higher values of Pearson’s correlation coefficient (close to 1.0) indicate that the material compositions of the corresponding CMs might be more similar, which in turn suggests that their cold-hot natures may be the same.

Pearson’s correlation is the most commonly used statistic for the linear correlation between two variables. The correlation coefficient is given by the well-known formula:

\[ r(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]

where x and y are two n-dimensional variables, x = (x₁, x₂, ..., x_n), y = (y₁, y₂, ..., y_n); x̄ and ȳ are the means of the variables; cov(x, y) is the covariance of the two variables; and σ_x and σ_y are the standard deviations of x and y, respectively. The correlation coefficient is thus the covariance of the two variables divided by the product of their standard deviations.
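As an illustration of the similarity measure described above, the following is a minimal Python sketch of Pearson's correlation between two spectra sampled at the same wavelengths. The function name `pearson` and the absorbance values are hypothetical and are not taken from the study's data set.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]          # deviations from the mean of x
    dy = [b - mean_y for b in y]          # deviations from the mean of y
    cov = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    # The 1/n factors of the covariance and standard deviations cancel.
    return cov / (norm_x * norm_y)

# Hypothetical absorbance values at five shared wavelengths.
spectrum_a = [0.10, 0.35, 0.80, 0.55, 0.20]
spectrum_b = [0.20, 0.70, 1.60, 1.10, 0.40]   # spectrum_a scaled by 2
spectrum_c = [0.80, 0.55, 0.20, 0.35, 0.60]   # roughly the opposite shape

print(pearson(spectrum_a, spectrum_b))   # identical shape, so close to 1.0
print(pearson(spectrum_a, spectrum_c))   # opposite trend, so negative
```

Because the coefficient is invariant to scaling and offset, two spectra with the same shape but different overall absorbance are treated as highly similar.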

A Retrieval Scheme (RS) for Cold-Hot Medicine Nature Identification

By using Pearson’s correlation coefficient (PCC) to measure the similarity of UV spectra, a retrieval scheme based on the similarity measurement is proposed for cold-hot medicine nature prediction. For a CM whose cold-hot nature is unknown, we first measure its UV absorbance spectrum and then calculate the similarity between the UV spectrum of this query CM and those of the CMs with a clear cold-hot nature in the database. The obtained PCCs are sorted to find the “most similar” reference CMs in the data set: the K “most similar” CMs are the reference CMs with the largest PCCs to the query CM. A cold nature probability value is then calculated to measure the degree of cold nature of the query CM, defined as the ratio of the sum of the PCCs of the cold nature medicines to the sum of the PCCs of all K “most similar” CMs. The formula is as follows (with C being the number of cold nature medicines and H being the number of hot nature medicines among the K retrieved CMs, so that C + H = K):

\[ P_{\text{cold}} = \frac{\sum_{i=1}^{C} \mathrm{PCC}_i^{\text{cold}}}{\sum_{i=1}^{C} \mathrm{PCC}_i^{\text{cold}} + \sum_{j=1}^{H} \mathrm{PCC}_j^{\text{hot}}} \]

With this retrieval scheme and a threshold of 0.5, if P_cold is above 0.5, we classify the query CM as cold; otherwise, it is classified as hot.
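The retrieval scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`pearson`, `cold_probability`, `classify`), the tiny reference database, and the spectra are all hypothetical, and the sketch assumes the retrieved PCCs are positive, a case the text does not discuss further.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (assumes non-constant, equal-length inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))

def cold_probability(query, references, k):
    """references: list of (spectrum, label) pairs, label is 'cold' or 'hot'."""
    scored = sorted(((pearson(query, spectrum), label) for spectrum, label in references),
                    key=lambda item: item[0], reverse=True)
    top_k = scored[:k]                      # the K reference CMs with the largest PCCs
    total = sum(score for score, _ in top_k)
    if total <= 0:
        # Assumption of this sketch: the ratio is only meaningful for a positive sum.
        raise ValueError("sum of the top-K PCCs must be positive")
    cold = sum(score for score, label in top_k if label == "cold")
    return cold / total

def classify(query, references, k, threshold=0.5):
    """Label the query 'cold' if its cold nature probability exceeds the threshold."""
    return "cold" if cold_probability(query, references, k) > threshold else "hot"

# Hypothetical reference database: four-point "spectra" with known nature.
references = [
    ([0.1, 0.4, 0.9, 0.3], "cold"),
    ([0.2, 0.5, 1.0, 0.4], "cold"),
    ([0.9, 0.6, 0.2, 0.1], "hot"),
    ([0.8, 0.7, 0.3, 0.2], "hot"),
    ([0.1, 0.3, 0.8, 0.5], "cold"),
]
query_a = [0.15, 0.45, 0.95, 0.35]   # shaped like the cold references
query_b = [0.85, 0.65, 0.25, 0.15]   # shaped like the hot references

print(classify(query_a, references, k=3))
print(classify(query_b, references, k=2))
```

Weighting each retrieved neighbor by its PCC, rather than counting votes, lets closer matches contribute more to the probability than marginal ones.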

Performance Assessment

To demonstrate the effectiveness and feasibility of the abovementioned methods, extensive experiments were conducted to assess the prediction performance, in terms of stability accuracy and extrapolation accuracy, of the retrieval scheme and of state-of-the-art prediction models, including ANN (19), extreme learning machine (ELM) (20), and SVM (21). ANN attempts to mimic the neural network of the human brain from the perspective of information processing, establishing a simple model that forms different networks according to different nodes and connections. The ELM algorithm is developed from the ANN algorithm; in the classification process, ELM projects data from a multidimensional space to the label space, classifying samples with similar descriptors into the same class. SVM is a binary classification model; the basic idea of SVM learning is to find the separating hyperplane that divides the training data set correctly and has the largest geometric margin. The performance assessment experiments are carried out on the CM data set. The application allows a researcher to examine a CM substance with an unknown cold-hot nature by retrieving and studying similar UV spectra of CMs with a clear cold-hot nature before determining the nature of the substance being classified. We first compared medicine nature identification using UV spectra measured with different solvents, and then evaluated the model’s performance, including stability evaluation and extrapolation evaluation, and compared it with other prediction models.

There are two metrics used in our experiments to assess the performance of the proposed prediction model. The first metric, stability evaluation, is computed by the leave-one-out method (21) in the entire data set. The detailed steps are as follows: First, for the UV spectral data of 61 CMs in the database, one CM is selected as the testing data set, and the remaining 60 CMs in the data set serve as the training data set. Next, the top K most similar CMs are retrieved by computing UV spectra similarity between the query CM and each of the reference CMs in the training data set. Then, the cold nature probability of the query CM is calculated. The above process is repeated until the cold nature probability value of each CM is calculated. As a result, 61 cold nature probabilistic values are obtained. Stability evaluation applies the receiver operating characteristic (ROC) curve and prediction accuracy (ACC) to evaluate the proposed prediction model. The ROC curve is generated by varying the threshold of the cold nature probability. The area under the ROC curve (AUC) is computed to evaluate the prediction model. The larger the area, the more stable the model. The result of ACC illustrates the probability of correctly classified cold-hot nature of CMs. The formula of ACC is as follows:

Here, M is the number of cold CMs, and N is the number of hot CMs; rank_i is the sequence number of the ith CM after sorting the cold probabilities from small to large.
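The symbols defined above (M, N, and rank_i over cold probabilities sorted in ascending order) match the standard rank-sum expression for the area under the ROC curve rather than an accuracy formula. Assuming that this is the expression intended, and treating cold as the positive class, the two quantities can be written as follows. TP and TN are names introduced here for the numbers of correctly classified cold and hot CMs at the 0.5 threshold; they do not appear in the original text.

```latex
\mathrm{AUC} = \frac{\sum_{i \in \text{cold}} \mathrm{rank}_i - \dfrac{M(M+1)}{2}}{M \times N},
\qquad
\mathrm{ACC} = \frac{TP + TN}{M + N}
```

Under this reading, AUC is the fraction of (cold, hot) pairs in which the cold CM receives the higher cold nature probability, and ACC is the fraction of all M + N CMs that are classified correctly.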

The second metric, extrapolation evaluation, indicates the extent to which cold nature CMs can be detected on the basis of the CMs that are retrieved by RS. In these experiments, the 61 CMs are randomly divided into a training set and a testing set. The training set contains 40 CMs, with cold nature CMs and hot nature CMs each making up half. The remaining 21 CMs are used for testing. For each CM in the testing set, we search for the K most similar CMs in the training set and calculate the cold nature probability value. These probabilities are used to calculate the ROC curve and ACC value for extrapolation evaluation. The experiments are repeated 10 times with randomly selected training data sets, and the results are reported as the mean and variance over the 10 experiments.
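The two evaluation protocols described above can be sketched as follows. This is an illustrative outline, not the authors' implementation: the helper names (`rank_auc`, `accuracy`, `leave_one_out`, `balanced_split`) are hypothetical, the scoring function is passed in as a parameter, and the AUC is computed with the common rank-sum formula, which is assumed here to be the one intended.

```python
import random

def rank_auc(probs, labels):
    """Rank-sum AUC with 'cold' as the positive class; tied scores share an average rank."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0.0] * len(probs)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and probs[order[end + 1]] == probs[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1            # average 1-based rank of the tied run
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    m = sum(1 for lab in labels if lab == "cold")
    n = len(labels) - m
    cold_rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == "cold")
    return (cold_rank_sum - m * (m + 1) / 2) / (m * n)

def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the true label."""
    predicted = ["cold" if p > threshold else "hot" for p in probs]
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)

def leave_one_out(samples, score_fn):
    """Score each sample against all others; samples is a list of (spectrum, label)."""
    probs = []
    for i, (spectrum, _) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        probs.append(score_fn(spectrum, training))
    return probs

def balanced_split(samples, n_train_per_class, rng):
    """Random split with equal numbers of cold and hot samples in the training set."""
    cold = [s for s in samples if s[1] == "cold"]
    hot = [s for s in samples if s[1] == "hot"]
    rng.shuffle(cold)
    rng.shuffle(hot)
    train = cold[:n_train_per_class] + hot[:n_train_per_class]
    test = cold[n_train_per_class:] + hot[n_train_per_class:]
    return train, test

# Placeholder data set with the class sizes reported in the text (spectra omitted).
samples = [([], "cold")] * 30 + [([], "hot")] * 31
train, test = balanced_split(samples, 20, random.Random(0))
print(len(train), len(test))                 # 40 21

# Hypothetical cold nature probabilities for four CMs.
print(rank_auc([0.9, 0.3, 0.8, 0.1], ["cold", "cold", "hot", "hot"]))   # 0.75
```

In an extrapolation run, `balanced_split` would be called 10 times with different random states, and the mean and variance of the resulting AUC and ACC values would be reported.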

Results

Performance Comparison with Different Solvents

As a fingerprint of CM, UV spectroscopy can identify complex substances in CMs, and be used to predict CM nature. This helps to quantitatively study the relationship between substance composition and medicine property.

In this study, the UV spectra measured with different solvents (chloroform, distilled water, absolute ethanol, and petroleum ether) were fed into the constructed prediction model to analyze the influence of the solvent on the identification of medicine nature. Spearman’s correlation of the UV spectra with petroleum ether solvent was calculated as a comparative reference (denoted as “Spearman’s + PE”). Figure 2 shows the AUC curves for medicine nature identification with the UV spectra of the four solvents and with Spearman’s + PE. The AUC value is calculated as a function of the number of retrieved similar reference CMs (K), which gives a more comprehensive picture of the performance of the predictive model. As Figure 2 shows, the UV spectra with petroleum ether solvent outperform those of the other three solvents for classification of cold-hot medicine nature, meaning that UV spectra with petroleum ether solvent can better identify CM nature. When K = 7, the AUC value for petroleum ether solvent reaches its maximum of 0.834. CM nature identification with Spearman’s + PE is inferior to that with Pearson’s correlation and petroleum ether solvent when K < 15, but outperforms that with the other solvents. This suggests that Pearson’s correlation is better suited to measuring the similarity of UV fingerprints with petroleum ether. The CM nature identification rates with absolute ethanol and distilled water solvents are poor; the maximum AUC values are only 0.636 and 0.666, respectively. Therefore, UV spectra with absolute ethanol and distilled water solvents cannot be used alone as characteristic data to identify medicine nature. Table I displays the ACC value at the maximum AUC for each of the four solvents. For UV spectra with petroleum ether solvent, the maximum AUC and the ACC value are 0.834 and 0.754, respectively. However, the prediction accuracies for the other three solvents and for Spearman’s + PE are all lower than 0.7, meaning that they are poor for identifying CM nature.

Model Performance Assessment

To verify the feasibility and stability of the proposed prediction model for identifying CM nature, this study compares the prediction performance of the proposed model (the retrieval scheme, denoted as “RS”) with that of classical classifiers (SVM, ANN, and ELM), or classifiers that have been reported in the literature to predict CM nature. According to the results of the previous section, UV spectral data with petroleum ether solvent are used as the research object. Table II shows the stability evaluation comparison of RS and the three comparative algorithms for CM nature recognition. First, ANN and ELM are poor at classifying cold-hot medicine nature with UV spectral data. Second, the prediction performance of SVM is better than that of ANN and ELM. Finally, the stability and recognition performance of the proposed RS model exceed those of the classical prediction models.

For extrapolation evaluation, Table III shows the performance comparison of RS and the baseline methods. According to Table III, the conclusion of the extrapolation experiments is consistent with the conclusion of the stability experiments. First, ANN and ELM are poor at identifying cold-hot medicine nature with UV spectral data. Second, the prediction performance of SVM is higher than that of ANN and ELM, but lower than that of RS. Finally, the extrapolation performance of RS exceeds that of the classical prediction models.

Table IV displays the statistical significance of the differences in AUC values between RS and the comparative algorithms at the 5% significance level. An unpaired t-test is applied to calculate the p-value. According to the table, the p-values between RS and ANN and between RS and ELM are lower than 0.01, meaning that the differences are statistically significant. The p-value between RS and SVM is 0.012, which is also statistically significant. Overall, the analysis shows that the differences in extrapolation performance between RS and the comparative algorithms are statistically significant.
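To make the significance test concrete, here is a small sketch of an unpaired t statistic computed over the AUC values from repeated random splits. It assumes the equal-variance (Student's) form of the test, which the text does not specify, and the AUC values and the function name `unpaired_t` are hypothetical. The standard library has no t distribution, so the sketch compares the statistic with the two-sided 5% critical value for 18 degrees of freedom (about 2.101) instead of computing a p-value; a statistics package such as SciPy (`scipy.stats.ttest_ind`) can return the p-value directly.

```python
import math
import statistics

def unpaired_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)           # sample variance, n - 1 denominator
    var_b = statistics.variance(b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical AUC values from 10 random splits for two models.
auc_model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
auc_model_b = [0.74, 0.77, 0.72, 0.76, 0.75, 0.73, 0.78, 0.74, 0.76, 0.75]

t_value, dof = unpaired_t(auc_model_a, auc_model_b)
CRITICAL_T_DF18 = 2.101                      # two-sided, 5% level, 18 degrees of freedom
print(t_value, dof, abs(t_value) > CRITICAL_T_DF18)
```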

Prediction Examples

Extrapolation evaluation is used to provide prediction examples. Two retrieval sets returned by RS are shown in Table V. For each query CM (first row), the top K = 7 reference CMs are retrieved by PCC. Perfect retrieval would result in a ranked order of reference CMs with a monotonically decreasing PCC metric. In the first column, for the query hot nature CM Fructus Piperis Alba, the seven retrieved reference CMs are all hot nature. Its cold nature probability is 0, indicating that the query CM Fructus Piperis Alba is more likely to be hot nature. In the second column, for the query cold nature CM Dianthi Herba, six retrieved CMs are cold nature and one is hot nature; its calculated cold nature probability is 0.8586, indicating that the query CM Dianthi Herba is more likely to be cold nature. From the prediction examples, it can be inferred that the similarity of UV spectra can characterize the similarity of cold-hot medicine nature. All CM retrieval results are provided in Table V, according to the leave-one-out method.

Overall Prediction Performance

We evaluated the overall prediction performance of RS using a leave-one-CM-out method with a threshold of 0.5. The ROC curve of the predictions is shown in Figure 3. The AUC value is 83.4%. The total classification accuracy is 75.4% (46/61). Both kinds of medicines obtained identification rates of approximately 75%; 23.3% (7/30) of cold nature medicines are misclassified as hot nature medicines, whereas 25.8% (8/31) of hot nature medicines are misclassified as cold nature medicines. Overall, the results illustrate relatively good prediction performance for a complex classification problem.
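The reported rates follow directly from the class sizes and the misclassification counts given above, as this short arithmetic check shows. The variable names are chosen here for illustration only.

```python
cold_total, hot_total = 30, 31      # class sizes stated in the text
cold_missed, hot_missed = 7, 8      # misclassified CMs stated in the text

cold_correct = cold_total - cold_missed
hot_correct = hot_total - hot_missed
overall = (cold_correct + hot_correct) / (cold_total + hot_total)

print(cold_correct + hot_correct)              # 46
print(f"{overall:.1%}")                        # 75.4%
print(f"{cold_correct / cold_total:.1%}")      # 76.7% of cold CMs identified
print(f"{hot_correct / hot_total:.1%}")        # 74.2% of hot CMs identified
```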

Discussion

This study investigates the feasibility of identifying CM nature with a retrieval scheme based on the similarity of UV spectral data. We demonstrate that determining the cold-hot nature of a medicine from the nature of similar CMs is a reasonable method. This study has several distinctive characteristics and experimental observations. First, to realize CM nature prediction, a UV spectral data set of 61 reference CMs is assembled in which each CM is clearly identified as cold or hot in nature. The conclusion is that it is feasible to discriminate CM nature.

Second, cold-hot nature is an important part of CM nature theory. In this study, we explore the interrelationship between cold-hot medicine nature and material composition within CM. The UV spectra are used to represent material composition. Experiment results have demonstrated that material composition within CM is related to cold-hot medicine nature, which can be used for medicine nature-type identification and classification. Meanwhile, it can be inferred that material composition is the basis for the manifestation of CM nature.

Third, according to UV spectral data characteristics of CM, we designed a retrieval scheme for predicting cold-hot nature of CM. The PCC value is applied to measure the similarity of UV spectra. Experimental results show that our specific model RS is superior to existing classical models. The potential explanations are as follows: i) The RS model is a classifier designed specifically for UV spectral data of traditional Chinese medicine; ii) On the basis of the theory that material composition is the basis for the manifestation of CM nature, RS fully mines the relationship between material composition and CM nature with high accuracy.

Fourth, in this study, the robustness of the proposed RS model needs to be demonstrated by independent studies to support future clinical applications. For an intelligent forecasting system, the ultimate goal is to aid researchers in reading and interpreting UV spectral data. The system is not sufficiently effective if it has low robustness for an independent testing data set. The proposed RS model has a high robustness if the UV spectral data of other CMs have the identical wavelength range. In the future, other CMs will be collected to verify the robustness of the model.

Despite the advantages discussed above, there are some limitations to our research. First, in this study, UV spectra are used to study material composition within CM. However, infrared (IR) spectra and liquid chromatography technologies can also be used to study material composition within CM. Medicine nature prediction methods based on these fingerprint data are one focus of future study. Second, we explore the similarity of UV spectra to predict the cold-hot nature of CM. CM fingerprint data are high dimensional with small sample sizes. The study of predictive models that fit the characteristics of such fingerprint data is the focus of follow-up research. Third, since this paper focuses on the similarity of UV spectra, UV spectrum features are not studied in depth. Consequently, fusion with other effective spectrum features will be explored to further improve the medicine nature discrimination performance. Finally, the identification of CM nature based on fingerprints is very complicated. In this study, UV spectra were collected for 61 CMs, which is not enough for training a discrimination model. Collection of UV spectral data for more CMs is also planned.

Conclusions

In this study, a cold-hot medicine nature predictive model that conforms to the characteristics of TCM is proposed. Unlike the other existing classical classification models, this identification model is based on the retrieval scheme, discriminating the unknown CM nature with the retrieved similar nature CMs. The experimental results show that the model can better predict the cold-hot nature of CMs based on the UV spectral data.

Competing Interests

The authors declare no competing financial interest.

Acknowledgments

The research is supported by the national key basic research development program (973 Program) (No. 2007CB512601); National Natural Science Foundation of China (No. 81473369); Key research and development plan of Shandong province (No. 2016CYJS08A01-1).

References

(1) J. Gao and C. Chen, J. Shanghai Univer. Trad. Chin. Med. 21, 16–18 (2007).

(2) B. Ouyang, Z.G. Wang, and P. Wang, J. Beijing Univer. Trad. Chin. Med. 29, 592–594 (2006).

(3) S. Li, Z.Q. Zhang, and L.J. Wu, IET Syst. Biol. 1, 51–60 (2007).

(4) R. Jin, B. Zhang, X.-Q. Liu, S.-M. Liu, X. Liu, L.-Z. Li, Q. Zhang, and C.-M. Xue, J. Chin. Integr. Med. 9, 715−724 (2011).

(5) Y. Zhao, L. Jia, J. Wang, W. Zou, H. Yang, and X. Xiao, Pharm. Biol. 54, 1298–1302 (2016).

(6) H.-Y. Wan, X.-Y. Kong, X.-M. Li, H.-W. Zhu, X.-H. Su, and N. Lin, China J. Chin. Materia Medica 39, 3813−3818 (2014).

(7) F. Liang, L. Li, M. Wang, X. Niu, J. Zhan, X. He, C. Yu, M. Jiang, and A. Lu, J. Ethnopharmacol. 148, 770−779 (2013).

(8) M. Wang, L. Li, C. Yu, A. Yan, Z. Zhao, G. Zhang, M. Jiang, A. Lu, and J. Gasteiger, Mol. Inform. 35, 109–115 (2016).

(9) X. Fu, X. Song, X. Li, K.K. Wong, J. Li, F. Zhang, C. Wang, and Z. Wang, Alternat. Med. 2017, 1–10 (2017).

(10) X. Fu, L.H. Mervin, X. Li, H. Yu, and J. Li, J. Chem. Inf. Model. 57, 468–483 (2017).

(11) Y. Li, X. Li, F.-Z. Xue, and Y.-X. Liu, J. Shandong Univ. (Health Sciences) 49, 57–61 (2011).

(12) W.-H. Liu, Y. Li, Y.-J. Ji, P. Wang, Y.-Q. Zhang, and F.-Z. Xue, J. Shandong Univ. (Health Sciences) 50, 151–154 (2012).

(13) X.-X. Zhang, Y. Li, Y.-J. Ji, P. Wang, Y.-Q. Zhang, and F.-Z. Xue, J. Shandong Univ. (Health Sciences) 50, 143–146 (2012).

(14) B. Nie, Z.-L. Hao, B. Gui, Z. Wang, J.-Q. Du, G.-L. Wang, and X. Zhang, J. Jiangxi Univ. Trad. Chin. Med. 27, 82–86 (2015).

(15) W. Long, P. Liu, J. Xiang, X. Pi, J. Zhang, and Z. Zou, Comput. Methods Programs Biomed. 101, 253–264 (2011).

(16) H. Li, Q. Xu, J. Zhang, and H. Xu, 2017 International Conference on Medical Science and Human Health (MSHH 2017) 174–179 (2017).

(17) L. Geerligs, C. Can, and R.N. Henson, NeuroImage 135, 16–31 (2016).

(18) J.H. Christensen, J. Mortensen, A.B. Hansen, and O. Andersen, J. Chromatogr. A 1062, 113–123 (2005).

(19) L. Tao, T. Runtao, Y. Xinlan, L. Sun, Y. He, P. Xie, and S. Ma, J. AOAC Int. 102, 720–725 (2019).

(20) E. Malar, A. Kandaswamy, D. Chakravarthy, and A.G. Dharan, Comput. Biol. Med. 42, 898–905 (2012).

(21) G. Wei, H. Ma, W. Qian, and M. Qiu, Curr. Med. Imaging Rev. 13, 210–216 (2017).

Guohui Wei, Xianjun Fu, Zhenguo Wang, and Honglei Zhou are with the Shandong University of Traditional Chinese Medicine, in Jinan, China. Direct correspondence to: bmie530@163.com or zhenguow@126.com