Cross-Algorithm Validation for Predicting Body Constitutions by Life-Style Towards Bias Recovery

Guang SHI


As a branch of complementary medicine, traditional Chinese medicine (TCM) theory has shown scientific and gentle effects in diagnosis, prediction, and healthcare. In this theory, an individual's body constitution (BC) is widely considered the most important index for diagnosis and treatment.

This research has three aims: to precisely categorize individuals into one of nine specific BCs from the evidence of physical indexes; to predict BCs from life-style; and to offer life-style health guidance for recovering so-called “biased” BCs to the healthy status known as the “Gentle BC”. Based on an original questionnaire containing 254 life-style features, machine learning (ML) algorithms, namely random forest (RF), partial least squares (PLS), LASSO, and a new pair-wise classification scheme, are employed to predict the BCs of a population of 851 persons. Moreover, the principal features (PFs) of life-style are identified and offered as health guidance for recovering biased BCs to the gentle constitution.
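The abstract does not specify how the proposed pair-wise classification scheme works. As a rough illustration only, the sketch below assumes a standard one-vs-one design: one binary model is trained for every pair of BCs, and each model casts a vote at prediction time. The base learner here is a hypothetical nearest-centroid rule standing in for whatever binary model (for example, LASSO) the actual scheme uses; all function names are illustrative assumptions, not the author's implementation.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Binary decision for a class pair (a, b): True means "vote for a".
BinaryModel = Callable[[Vector], bool]


def centroid(rows: List[Vector]) -> List[float]:
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_pair(xa: List[Vector], xb: List[Vector]) -> BinaryModel:
    """Hypothetical base learner: nearest-centroid rule for one class pair.

    Stand-in for the binary model (e.g. LASSO) a real pair-wise scheme would use.
    """
    ca, cb = centroid(xa), centroid(xb)

    def sqdist(x: Vector, c: List[float]) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    return lambda x: sqdist(x, ca) <= sqdist(x, cb)


def fit_pairwise(X: List[Vector], y: List[str]) -> Dict[Tuple[str, str], BinaryModel]:
    """Train one binary model for every unordered pair of classes (one-vs-one)."""
    by_class: Dict[str, List[Vector]] = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    return {
        (a, b): fit_pair(by_class[a], by_class[b])
        for a, b in combinations(sorted(by_class), 2)
    }


def predict_pairwise(models: Dict[Tuple[str, str], BinaryModel], x: Vector) -> str:
    """Each pairwise model votes for one class; the class with most votes wins."""
    votes: Counter = Counter()
    for (a, b), model in models.items():
        votes[a if model(x) else b] += 1
    # Ties go to the class that first received a vote (Counter.most_common order).
    return votes.most_common(1)[0][0]
```

With nine BCs, this design trains 9 × 8 / 2 = 36 binary models. A one-vs-one layout lets each model focus on separating two constitutions at a time, which is one plausible reason a pair-wise scheme could outperform a single multi-class model on weakly separable life-style data.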

In the prediction results, maximum correct rates of 88.7%, 40.9%, 69.9%, and 94.6% are achieved by RF, PLS, LASSO, and our proposed pair-wise classification algorithm, respectively, which indicates that the developed life-style questionnaire is effective. Meanwhile, the above algorithms identify an average of 17 principal features. The total numbers of PF appearances for PLS, LASSO, and pair-wise LASSO are 126, 84, and 77, respectively. Through cross-algorithm validation, the most commonly identified principal life-style features are presented for each biased BC, and health care guidance is suggested for bias recovery to the gentle constitution. In conclusion, ML algorithms are trustworthy and fair for explicit applications but sensitive in implicit applications such as life-style or body constitution analysis. Different algorithms do not lead to incorrect clinical interpretations, but the quality of their application may differ considerably. Therefore, algorithms should be chosen carefully (possibly with multiple validations) when analyzing implicit medical data, and preprocessing before applying any ML method is recommended.
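The cross-algorithm validation step can be sketched as simple counting, assuming each algorithm yields, for every biased BC, a set of selected principal features. The table layout, the `min_algorithms` threshold, and the feature names in any usage are hypothetical; the abstract does not state the exact agreement criterion. `total_appearances` mirrors the per-algorithm PF totals reported above, and `consensus_features` keeps the features that several algorithms agree on for one BC.

```python
from collections import Counter
from typing import Dict, List, Set

# Assumed layout: pfs[algorithm][bc] = set of principal life-style features
# that the algorithm selected for that biased body constitution.
PFTable = Dict[str, Dict[str, Set[str]]]


def total_appearances(pfs: PFTable) -> Dict[str, int]:
    """Total number of PF selections per algorithm, summed over all BCs."""
    return {alg: sum(len(feats) for feats in by_bc.values())
            for alg, by_bc in pfs.items()}


def consensus_features(pfs: PFTable, bc: str, min_algorithms: int = 2) -> List[str]:
    """Features selected for `bc` by at least `min_algorithms` algorithms.

    Sorted by how many algorithms selected them (descending), then alphabetically.
    """
    counts: Counter = Counter()
    for by_bc in pfs.values():
        counts.update(by_bc.get(bc, set()))
    kept = [f for f, c in counts.items() if c >= min_algorithms]
    return sorted(kept, key=lambda f: (-counts[f], f))
```

Raising `min_algorithms` trades recall for robustness: a feature kept only when all algorithms agree is less likely to be an artifact of one method's selection bias, which matches the abstract's advice to validate implicit medical findings across multiple algorithms.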