Title |
Classification of Kidney Cancer Data based on Feature Extraction Methods |
Authors |
손호선(Ho Sun Shon) ; 김경옥(Kyoung Ok Kim) ; 차은종(Eun Jong Cha) ; 김경아(Kyung Ah Kim) |
DOI |
https://doi.org/10.5370/KIEE.2020.69.7.1061 |
Keywords |
Kidney cancer; LASSO; PCA; Data mining |
Abstract |
Numerous data mining methods have recently been developed in bioinformatics for processing biological data. Using gene expression data from TCGA (60,483 gene expression features) for 1,157 patients with kidney cancer, we extracted significant genes for prognosis prediction and applied data mining-based classification methods. Significant genes were extracted using the least absolute shrinkage and selection operator (LASSO) and principal component analysis (PCA), and classification accuracy and performance were compared across classification algorithms. Clinical data from the patients were combined with the gene data to determine the optimal classification model and to estimate classification accuracy for four risk factors representing patient state: sample type, primary diagnosis, tumor stage, and vital status. Classification by sample type showed the best performance, particularly with the logistic regression and support vector machine algorithms. These results can be applied to extract biomarkers for prognosis prediction of kidney cancer from various causes, and to support the prevention and early diagnosis of kidney cancer. |
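
The workflow summarized in the abstract (reduce the gene features with LASSO or PCA, then compare classifiers) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, substitutes a small synthetic matrix for the TCGA expression data, and uses a binary label as a stand-in for a target such as sample type. The `alpha` value, the number of principal components, and the 5-fold cross-validation are illustrative assumptions, and fitting `Lasso` as a regression on 0/1 labels is a simplification chosen for brevity.

```python
# Illustrative sketch only: synthetic data stands in for the TCGA expression matrix.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in: 200 "patients", 500 "genes", binary label.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=10, random_state=0
)

# Two feature-reduction strategies (hyperparameters are assumptions).
reducers = {
    # LASSO keeps only the features with non-zero coefficients.
    "LASSO": lambda: SelectFromModel(Lasso(alpha=0.01, max_iter=10000)),
    # PCA projects all features onto a few orthogonal components.
    "PCA": lambda: PCA(n_components=10),
}

# Two classifiers highlighted in the abstract.
classifiers = {
    "LogisticRegression": lambda: LogisticRegression(max_iter=1000),
    "SVM": lambda: SVC(kernel="linear"),
}

# Mean cross-validated accuracy for every reducer/classifier pair.
results = {}
for reducer_name, make_reducer in reducers.items():
    for clf_name, make_clf in classifiers.items():
        # Scaling and reduction sit inside the pipeline so they are
        # re-fitted on each training fold, avoiding information leakage.
        pipeline = make_pipeline(StandardScaler(), make_reducer(), make_clf())
        scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
        results[(reducer_name, clf_name)] = scores.mean()

for (reducer_name, clf_name), accuracy in results.items():
    print(f"{reducer_name} + {clf_name}: mean accuracy = {accuracy:.3f}")
```

Keeping the reduction step inside the pipeline is a deliberate choice: selecting genes on the full dataset before cross-validation would tend to inflate the reported accuracy. For a multi-class target such as tumor stage, the same structure would apply with a selector suited to multi-class labels.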