1. Introduction
Kidney cancer is among the ten most common cancers worldwide (1), (2); unfortunately, it is hard to detect early through routine clinical means. It is not a single disease; rather, it comprises several histologically and genetically distinct types of cancer, each with its own clinical course and response to therapy (3), (4). The Cancer Genome Atlas (TCGA) Research Network has conducted a series of comprehensive molecular characterizations of the distinct histologic types of kidney cancer (5).
Kidney cancer is the sixth most frequent cancer in males and the tenth in females, representing 5% and 3% of all new cases, respectively (6). Gender disparities in kidney cancer incidence have been reported, with higher incidence and worse outcomes in males (7). Over half of all people over age 50 have kidney cysts, which are fluid-filled, usually benign (noncancerous), and do not need treatment (8). Solid tumors of the kidney are rare; however, approximately three-quarters of these tumors are cancerous, with the potential to spread (9). According to the Centers for Disease Control and Prevention (CDC), in the United States in 2014, black men were the most likely to get kidney cancer (24.7 per 100,000), followed by white men (22.0 per 100,000). Among women, African-American women were the most likely to get kidney and renal pelvis cancers (12.4 per 100,000), followed by Hispanic women (11.9 per 100,000) (10). Studies suggest that the distribution of kidney cancer subtypes differs between racial groups (11), (12). Race and ethnicity contribute to inter-tumoral heterogeneity in cancer, affecting everything from disease incidence, morbidity, and mortality rates to treatment outcomes (13), (14). Therefore, the identification of population-specific molecular biomarkers is essential (15).
Identifying genes that contribute to the prognosis of cancer patients is one of the challenges in providing appropriate treatment. The critical challenges in bioinformatics are finding biomarkers that represent the state of patients and predicting the prognosis of cancer patients. The number of genes is enormous compared with the number of patients, which makes the data challenging to analyze. To solve these problems, significant genes that represent the state of patients must be extracted. In addition, a classification model built from the extracted genes may be helpful for early diagnosis and for predicting the prognosis of cancer patients. Cancer is caused by genetic variation that damages the genes regulating orderly cell replication; as a result, cells multiply without limit, invade adjacent normal tissue, and spread throughout the body. Because cancer stems from mutated genes, it is considered a genetic disorder, although only a small number of cancers are hereditary. When a mutation occurs in a germ cell, it is passed down over generations and is present in every somatic cell (16). To predict the state of patients, researchers have applied deep learning techniques to analyze sequence mutations, and studies have accurately predicted major mutations that cause diseases such as spinal muscular atrophy, hereditary nonpolyposis colon cancer, and autism (17).
Kidney cancer is a primary tumor arising in the kidney; renal cell carcinoma, a malignant tumor, accounts for over 90% of cases (2), (18). Because kidney cancer has no symptoms in the early stages, it has often progressed by the time it is discovered. According to the national cancer registration statistics published in 2020, of the 243,837 cancer cases recorded in 2018, 5,456 were attributed to kidney cancer, accounting for 2.2% of all cancer cases. By gender, kidney cancer ranked eighth, with 3.0% (3,806 cases) of all male cancers (19). In addition, the symptoms and treatment of kidney cancer decrease patients’ quality of life by increasing the disease burden and medical costs. Risk factors for kidney cancer include environmental and lifestyle factors, genetic factors, and pre-existing kidney disease. Among the lifestyle factors, smoking, obesity, high blood pressure, and eating habits are associated causes (20). Recently, researchers have extracted features from the genetic data of kidney cancer patients and applied classification algorithms based on neighborhood component analysis (21). Furthermore, we previously used big data from a large cohort (the KOTCC database) of kidney cancer patients collected from eight domestic medical institutions to extract variables affecting kidney cancer recurrence and applied a machine learning algorithm to predict recurrence within five years of surgery (22).
In this study, we propose a method to extract genes that affect prognosis prediction in kidney cancer patients using a deep learning algorithm and apply a classification algorithm based on the extracted genes to predict patient prognosis. We combined gene expression data and clinical data of kidney cancer patients obtained from the TCGA portal site to extract genes that contribute to patient prognosis and applied classification techniques to demonstrate their utility (23). Next, we selected gender (male, female), sample type (primary tumor, normal), and race (white, black, Asian) as the target variables for analysis. Notably, we extracted genes from kidney cancer patients according to gender, sample type, and race to overcome heterogeneity and obtain genetic biomarkers that could allow more accurate prognosis prediction. After testing the functionality of the genes, we presented their applicability and developed the optimal prediction model by comparing and analyzing classification algorithms using the extracted genes.
2. Related works
Machine learning and deep learning algorithms are being applied to various analyses
of biological data. Some studies have predicted the risk of 20 cancers by applying machine learning techniques and artificial intelligence methods to genetic big data (24). The Bayesian classifier has been applied to the problem of classifying proteins with sequence and structural information, and studies have also used Bayesian networks to combine various protein- and gene-related data to improve the predictive performance of gene function prediction (25). As such, different machine learning technologies are being applied to the analysis of biological data. One study utilizing the TCGA-KIRC database used CT and MRI scans and the clinical data of 227 kidney cancer patients to predict cancer stage by applying deep learning (26). In another study, significant genes were extracted from kidney cancer data in TCGA using a deep autoencoder and compared with traditional methods such as the least absolute shrinkage and selection operator (LASSO); the predictive classification accuracies were compared with conventional state-of-the-art classification methods (27). Researchers have also integrated data from various cancer types, conducted analyses using AE structures, presented their availability for clinical applications, and suggested ways to efficiently perform posterior inference via stochastic variational inference and learning algorithms in the presence of intractable posterior probability distributions and continuous latent variables for extensive data (28). An AE is a neural network whose output is set equal to its input x in order to extract features.
By learning to reconstruct an input, the AE extracts basic or abstract properties that facilitate accurate prediction. In principle, a linear AE with a single hidden layer in a multilayer perceptron is equivalent to principal component analysis (PCA) (29), (30). More generally, nonlinear autoencoders have been studied to extract key properties, including high-level features and Gabor-filter features (31), (32).
The variational autoencoder (VAE) is a generative model that, given training data, produces new data by sampling values from the same distribution as the actual distribution of the training data. A VAE compresses high-dimensional input data into a smaller representation in stochastic form. Unlike the conventional AE, which maps inputs to latent vectors, the VAE maps input data to the parameters of a probability distribution, namely the mean and variance of a Gaussian. This produces a structured latent space and is therefore helpful for image generation (33-35).
The supervised autoencoder (SAE) is a neural network that jointly predicts the targets and the inputs (reconstruction). For a single hidden layer, this simply means that a classification loss is added at the output layer. For a deeper AE, the classification loss is added at the innermost layer, which is usually handed off to the supervised learner after training the AE. The SAE uses this unsupervised auxiliary task to improve generalization performance (36-38).
The CVAE is a modification of the existing VAE structure that enables supervised learning: a class label y is added to the encoder and decoder so that category information is considered when learning the data distribution. The CVAE is a deep conditional generative model for structured output prediction using Gaussian latent variables. The model is efficiently trained within the stochastic gradient variational Bayes framework and allows fast prediction using stochastic feed-forward inference (39-41).
We validated the performance of the proposed framework by comparing it with traditional data mining and classification methods. The proposed framework employs various AE-based deep learning techniques, taking advantage of pre-training and fine-tuning strategies. The experimental results show that the AE-based deep learning methods perform better than combinations of traditional data mining and classification methods.
3. Methodologies
3.1 Architecture
The main challenge in general analysis is a characteristic of genetic data: there are far more gene expression values than samples. We propose a novel deep learning–based framework that combines various AE-based techniques for cancer analysis, compare it with the existing feature extraction methods—PCA and NMF—and demonstrate its superiority. The following section describes the pre-training method for autoencoder-based feature extraction. Proper training of neural networks requires a large amount of learning data; however, we often have a small quantity of labeled learning data and a large amount of unlabeled learning data. In this case, the unlabeled learning data are used to pre-train each layer of the neural network, a procedure called unsupervised pre-training. AE and VAE have only a reconstruction loss in pre-training, whereas SAE and CVAE also include a classification loss. Once the parameters of each layer have been determined to some extent, the classification performance can be improved through fine-tuning using the labeled learning data.
In particular, feature extraction was first performed to compare traditional classification algorithms with deep learning techniques. We used conventional dimension-reduction techniques, principal component analysis (PCA) and non-negative matrix factorization (NMF), followed by state-of-the-art classification algorithms, and we used deep learning techniques, namely the autoencoder (AE), variational autoencoder (VAE), supervised autoencoder (SAE; denoted CAE in Tables 2-4), and conditional variational autoencoder (CVAE), followed by a neural network classifier. For significant gene selection using traditional classification algorithms, we used PCA and NMF, addressed the data imbalance problem, applied various classification techniques, and compared and analyzed the results. For deep learning–based significant gene selection, we used the improved algorithms based on the AE. We compared the extracted genes, and the classification accuracy was analyzed using a multilayer perceptron (MLP).
3.1.1 Autoencoder
The AE is a deep learning structure for coding data efficiently. Coding refers to compressing data; in other words, dimensionality reduction transforms data from a high-dimensional space into a low-dimensional space that represents the data efficiently. The AE neural network has identical input and output and, as shown in Fig. 1, is a symmetrically constructed structure. Because dimensionality reduction is the goal of our study, we take input data X and obtain the hidden-layer value Z through weighted sums followed by an activation function; this mapping is called the encoder.
Fig. 1. Architecture of autoencoder
The AE has the same structure as an MLP, except that the input and output layers have the same number of neurons. Because the AE reconstructs the input, the output is called the reconstruction, and the loss function is calculated from the difference between the input and the reconstruction. AE training follows unsupervised learning, and the loss corresponds to maximum likelihood (ML) estimation. Once the hidden-layer Z parameters have been determined to some extent, the labeled learning data can be used to perform supervised fine-tuning.
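For concreteness, a minimal PyTorch sketch of this symmetric encoder–decoder structure follows; the 5,000-gene input and 100-dimensional code match the dimensions used in Section 4, while the 1,000-unit hidden layer and the other hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Symmetric AE: 5,000 input genes -> 100 latent features -> reconstruction."""
    def __init__(self, n_genes=5000, n_latent=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_genes, 1000), nn.ReLU(),
            nn.Linear(1000, n_latent))
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 1000), nn.ReLU(),
            nn.Linear(1000, n_genes))

    def forward(self, x):
        z = self.encoder(x)        # low-dimensional representation Z
        return self.decoder(z)     # reconstruction of the input X

model = Autoencoder()
criterion = nn.MSELoss()           # reconstruction loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 5000)          # dummy mini-batch of expression profiles
optimizer.zero_grad()
loss = criterion(model(x), x)      # compare the reconstruction with the input
loss.backward()
optimizer.step()
```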
3.1.2 Variational autoencoder
A VAE passes the input data X through the encoder to produce two vector outputs, the mean (μ) and variance (σ²), which define a normal distribution. Sampling from this distribution creates the latent vector Z, which passes through the decoder to produce new data similar to the existing input data. Therefore, the VAE is a generative model developed to generate new data using probability distributions. The structure of the VAE is shown in Fig. 2.
We use the ideal sampling function, the posterior $q_{\Phi}(z|x)$, to sample, allowing the generator to learn the input data well. Equation (1) is used to make the value generated by the sampling function equal to the input value. The maximum likelihood estimation that maximizes (1) measures how well the reconstruction restores data resembling the input when the latent vector Z drawn from the ideal sampling function is given.
Fig. 2. Architecture of a variational autoencoder
Finding an optimal formula that satisfies these conditions yields the evidence lower bound (ELBO) formula, which holds when X is given to the network as evidence.
The first term of (2) is the reconstruction term, indicating how well the data are restored from the ideal sampling function. The second term is a regularization term, which makes the ideal sampling function as close to the prior as possible, so that sampled values resemble the prior. The third term represents the distance between two probability distributions: the ideal sampling function $q_{\Phi}(z|x)$ and the true posterior $p(z|x)$.
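The equations referenced as (1) and (2) were not carried over from the original layout. Under the standard VAE formulation of (33), with encoder $q_{\Phi}(z|x)$, decoder $p_{\theta}(x|z)$, and prior $p(z)$, they would read as follows; this is a reconstruction from the surrounding description, not the paper's verbatim equations:

```latex
% (1) the marginal log-likelihood to be maximized
\log p_{\theta}(x) = \log \int p_{\theta}(x \mid z)\, p(z)\, dz

% (2) its exact decomposition; the first two terms form the evidence lower bound (ELBO)
\log p_{\theta}(x) =
  \underbrace{\mathbb{E}_{q_{\Phi}(z \mid x)}\!\left[\log p_{\theta}(x \mid z)\right]}_{\text{reconstruction}}
  - \underbrace{D_{KL}\!\left(q_{\Phi}(z \mid x) \,\|\, p(z)\right)}_{\text{regularization}}
  + \underbrace{D_{KL}\!\left(q_{\Phi}(z \mid x) \,\|\, p_{\theta}(z \mid x)\right)}_{\text{distance to true posterior}}
```

Because the third term is non-negative, maximizing the first two terms (the ELBO) maximizes a lower bound on the log-likelihood.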
3.1.3 Supervised autoencoder
An SAE is an AE with a classification loss added to the representation layer. For a single hidden layer, this means that a classification loss is added at the output layer. For a deeper AE, the innermost layer has the classification loss added and is usually handed off to the supervised learner after training the AE, as illustrated in Fig. 3.
Fig. 3. Architecture of the supervised autoencoder
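A minimal sketch of this joint objective, under the same assumptions as the AE sketch above (the layer sizes and the loss weight are illustrative):

```python
import torch
import torch.nn as nn

class SupervisedAutoencoder(nn.Module):
    """AE with a classification head attached to the innermost (representation) layer."""
    def __init__(self, n_genes=5000, n_latent=100, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 1000), nn.ReLU(),
                                     nn.Linear(1000, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 1000), nn.ReLU(),
                                     nn.Linear(1000, n_genes))
        self.classifier = nn.Linear(n_latent, n_classes)  # classification loss enters here

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

model = SupervisedAutoencoder()
x, y = torch.randn(32, 5000), torch.randint(0, 2, (32,))
recon, logits = model(x)
# joint objective: reconstruction + classification (the weight 1.0 is an assumption)
loss = nn.functional.mse_loss(recon, x) + 1.0 * nn.functional.cross_entropy(logits, y)
loss.backward()
```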
3.1.4 Conditional variational autoencoder
The CVAE is a modification of the existing VAE structure that enables supervised learning. The CVAE adds a class label y to the encoder and decoder, considering the category information when learning the data distribution. Thus, in the CVAE, a particular condition is given: if the label information is known, it is added to the encoder and decoder. The value y is given along with x to find the latent vector z in the encoder, and similarly, the decoder generates the data conditioned on y. The loss function is therefore represented by the reconstruction loss and the classification loss. The structure is shown in Fig. 4.
Fig. 4. Architecture of the conditional variational autoencoder
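A minimal sketch of the conditioning mechanism described above, with the one-hot label y concatenated to both the encoder and decoder inputs; the layer sizes are illustrative, the KL term plays the role of the regularizer in (2), and a classification head as in the SAE sketch can be attached to z in the same way:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """Conditional VAE: the one-hot label y conditions both encoder and decoder."""
    def __init__(self, n_genes=5000, n_latent=100, n_classes=3):
        super().__init__()
        self.enc = nn.Linear(n_genes + n_classes, 1000)
        self.mu = nn.Linear(1000, n_latent)       # mean of q(z|x, y)
        self.logvar = nn.Linear(1000, n_latent)   # log-variance of q(z|x, y)
        self.dec = nn.Sequential(nn.Linear(n_latent + n_classes, 1000), nn.ReLU(),
                                 nn.Linear(1000, n_genes))

    def forward(self, x, y_onehot):
        h = F.relu(self.enc(torch.cat([x, y_onehot], dim=1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(torch.cat([z, y_onehot], dim=1)), mu, logvar

model = CVAE()
x = torch.randn(16, 5000)
y = F.one_hot(torch.randint(0, 3, (16,)), num_classes=3).float()
recon, mu, logvar = model(x, y)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
loss = F.mse_loss(recon, x) + kl   # reconstruction + KL regularization
loss.backward()
```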
3.2 Classifier
To establish the classification model, we employ a multilayer perceptron (MLP) classifier after the various autoencoder-based techniques. A multilayer perceptron is a neural network connecting multiple layers in a directed graph, which means that the signal path through the nodes goes only one way. Each node, apart from the input nodes, has a nonlinear activation function. The MLP uses backpropagation as its supervised learning technique.
3.3 Training
3.3.1 Generative pre-training
The number of samples available for a given phenotype prediction task is generally small; however, many other gene expression profiles unrelated to the phenotype are available. These profiles were grouped to form a large dataset of unlabeled samples. This unlabeled dataset cannot predict the phenotype, but it helps construct a hierarchical representation of gene expression in the neural network. The idea is to find nonlinear combinations of inputs that provide useful patterns for gene expression analysis. The unlabeled dataset is used to initialize the weights of the MLP before supervised learning. We pre-trained the AE models iteratively for each hidden layer, learning a denoising AE that reconstructs the previous layer's output.
In the current setting, a generative approach is one in which the training dataset, that is, the empirical distribution, can generate synthetic observations that exhibit the essential structural properties observed in the empirical distribution. The VAE and CVAE generative training strategies ultimately yield a pre-trained model with a good understanding of the representation, able to generate the key features of the given data well.
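A sketch of this greedy layer-wise procedure, assuming mean-squared-error reconstruction and Gaussian input corruption; the layer sizes, noise level, and epoch count are illustrative assumptions:

```python
import torch
import torch.nn as nn

def pretrain_denoising_layer(data, in_dim, out_dim, noise_std=0.1, epochs=10):
    """Greedy pre-training of one layer: a denoising AE that reconstructs
    the previous layer's (unlabeled) output from a corrupted copy."""
    enc, dec = nn.Linear(in_dim, out_dim), nn.Linear(out_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        noisy = data + noise_std * torch.randn_like(data)  # corrupt the input
        recon = dec(torch.relu(enc(noisy)))                # reconstruct the clean input
        loss = nn.functional.mse_loss(recon, data)
        opt.zero_grad(); loss.backward(); opt.step()
    return enc

# Stack layers: each one is trained on the previous layer's clean output.
unlabeled = torch.randn(256, 5000)   # stand-in for unlabeled expression profiles
sizes = [5000, 1000, 100]            # illustrative layer sizes
layers, h = [], unlabeled
for i in range(len(sizes) - 1):
    enc = pretrain_denoising_layer(h, sizes[i], sizes[i + 1])
    layers.append(enc)               # pre-trained encoders initialize the MLP
    h = torch.relu(enc(h)).detach()  # feed forward to train the next layer
```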
3.3.2 Fine-tuning for classification
Fine-tuning involves tuning parameters that were pre-trained on large-scale data using small-scale data. We fine-tuned the encoders of the VAE and CVAE, which were pre-trained on a large amount of imbalanced data. We added a supervised neural network classifier after the encoder of the VAE and CVAE, discarding the decoder. Using the model loss with cross-entropy, we trained the model with the Adam optimizer to update the model's weights.
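A minimal sketch of this fine-tuning step, assuming the pre-trained encoder has already been obtained (for instance from the layer-wise procedure above); the shapes and learning rate are illustrative:

```python
import torch
import torch.nn as nn

# `encoder` stands in for the encoder of a pre-trained VAE/CVAE (decoder discarded).
encoder = nn.Sequential(nn.Linear(5000, 1000), nn.ReLU(), nn.Linear(1000, 100))
classifier = nn.Sequential(nn.ReLU(), nn.Linear(100, 2))   # supervised head
model = nn.Sequential(encoder, classifier)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fine-tune all weights

x = torch.randn(32, 5000)            # labeled mini-batch
y = torch.randint(0, 2, (32,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```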
4. Experiments
4.1 Dataset
TCGA has collected cancer data from various platforms worldwide and has produced a dataset of immense value using standardized analysis methods. These data were obtained through TCGA's data portal. In this study, we collected 1,157 kidney cancer samples from TCGA. We used the transcription-profiling data format, which contained both case files and clinical information files for the samples. Next, we combined the clinical, expression, and case data into a single file based on the case ID and file name using Python. The resulting dataset comprised 1,157 samples and 60,483 gene expression values per patient. The frequencies of the target variables are shown in Table 1 below. To address the imbalance problem, we applied AE-based nonlinear data transformation and generation techniques during training.
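A sketch of this merging step with pandas; the file and column names (file_name, case_id) are hypothetical placeholders for the GDC manifest fields, not the exact names in our scripts:

```python
import pandas as pd

expr = pd.read_csv("expression.tsv", sep="\t")      # gene expression, keyed by file name
cases = pd.read_csv("file_case_map.tsv", sep="\t")  # file_name -> case_id mapping
clinical = pd.read_csv("clinical.tsv", sep="\t")    # clinical data, keyed by case_id

merged = (expr.merge(cases, on="file_name")         # attach case IDs to expression rows
              .merge(clinical, on="case_id"))       # attach clinical variables
merged.to_csv("kidney_integrated.csv", index=False) # one combined analysis file
```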
The samples classified by gender were 407 women (35.2%) and 750 men (64.8%). The samples classified by race were as follows: 940 white (81.2%), 150 Black or African-American (13.0%), and 17 Asian (1.5%); the remaining 50 were not reported (4.3%). Among the sample types, 1,010 cases were primary tumors (87.3%), 139 were solid tissue normal (12%), and the rest had missing values.
Table 1. Frequency according to class label (tumor, gender, race)
| Class label | Frequency | Percentage | Cumulative (%) |
| --- | --- | --- | --- |
| Primary Tumor | 1010 | 87.9 | 87.9 |
| Solid Tissue Normal | 139 | 12.1 | 100.0 |
| Total | 1149 | 100.0 | |
| Female | 407 | 35.2 | 35.2 |
| Male | 750 | 64.8 | 100.0 |
| Total | 1157 | 100.0 | |
| Asian | 17 | 1.5 | 1.5 |
| Black or African-American | 150 | 13.6 | 15.1 |
| White | 940 | 84.9 | 100.0 |
| Total | 1107 | 100.0 | |
4.2 Overall analytical structure
We leveraged the integrated data obtained from TCGA to conduct a classification analysis of gene expression data using traditional classification techniques and a deep learning–based MLP. Fig. 5 shows the overall framework used with the traditional classification algorithms. We calculated the interquartile range (IQR) for outlier detection, a widely used technique for finding outliers in continuously distributed data. We used the IQR in our preprocessing because it is a reasonably robust measure of variability: it is not affected by outliers, since it is computed from the middle 50% of the distribution, and it is computationally cheap. First, we eliminated noise and outliers from the genetic data of kidney cancer and extracted 5,000 genes via chi-square tests. We performed 5-fold cross-validation (train 80%, test 20%) on the 5,000-gene data and used PCA and NMF as data transformation methods. Subsequently, we applied the SMOTE algorithm to address the data imbalance in the gender, race, and sample type variables. Furthermore, the classification accuracy for these variables was compared and analyzed using classification algorithms: KNN, SVM, DT, RF, AB, NB, and MLP.
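A minimal end-to-end sketch of this traditional pipeline, assuming scikit-learn and the imbalanced-learn package for SMOTE; the dummy matrix stands in for the preprocessed (IQR-filtered) expression data, and SVC stands in for any of the listed classifiers:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-ins for the expression matrix and one target variable (gender/race/sample type).
rng = np.random.default_rng(0)
X = rng.random((200, 6000))          # non-negative, as chi-square selection requires
y = rng.integers(0, 2, 200)

pipe = Pipeline([
    ("chi2", SelectKBest(chi2, k=5000)),  # keep the 5,000 most relevant genes
    ("reduce", PCA(n_components=100)),    # data transformation (PCA, or NMF instead)
    ("smote", SMOTE(random_state=0)),     # over-sample minority classes (training folds only)
    ("clf", SVC()),                       # one of KNN/SVM/DT/RF/AB/NB/MLP
])
scores = cross_val_score(pipe, X, y, cv=5)  # 5-fold cross-validation
print(scores.mean())
```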
Fig. 5. Traditional classification for gene expression of kidney cancer
The classification accuracies of the AE-based deep learning techniques for race, gender, and sample type are shown in Fig. 6. In the AE-based techniques, we eliminated noise and outliers, and 5,000 genes were extracted via chi-square tests. We performed 5-fold cross-validation (80% for training and 20% for testing) on the selected 5,000 features and the corresponding data samples, followed by AE, VAE, SAE, and CVAE during the pre-training and training phases. Finally, we extracted 100 latent variables. We also addressed the imbalanced-data problem by fine-tuning the generative pre-trained encoder on the highly imbalanced data. The encoder and MLP were combined as the classifier to predict the classification accuracy for race, gender, and sample type.
Compared with Fig. 5, the experiments assess two different approaches to training the classification model: fine-tuning the entire network, and embedding the AEs into the classification network by importing only the encoding layers. Unsupervised pre-training on the gene expression data and fine-tuning on specific tasks affect the classification performance. The experimental results show that the autoencoder-based approaches achieved higher classification performance than the traditional classification approaches, as reported in the next sections.
Fig. 6. Autoencoder-based classification for gene expression data of kidney cancer
4.3 Evaluation measures
To evaluate the model's performance in predicting classification accuracy, we used the precision, recall, and F1-score computed from a confusion matrix. Precision is the proportion of predicted positives that are truly positive, and recall is the proportion of actual positives that are correctly predicted. The F1-score computes the harmonic mean of precision and recall, so the more imbalanced the two are, the greater the penalty, pulling the score toward the smaller value. We compared the macro-average and micro-average because our target data have an imbalance problem. The macro-average is used to verify whether a classifier works well for all classes and is appropriate when the classes are of equal size. The micro-average is used when the sizes of the classes differ, that is, when the sizes of the independently measured confusion matrices differ; it can therefore be used more effectively on datasets with class-imbalance problems. The abbreviations used in the confusion matrix are true positive (TP), false positive (FP), false negative (FN), and true negative (TN). The micro-average is used when the per-class counts differ; for example, if there are two class labels, the micro-precision, micro-recall, and micro-F1-score can be expressed as in equations (3) to (5).
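The equations referenced as (3) to (5) were not carried over from the original layout; for two class labels (i = 1, 2), the standard micro-averaged forms consistent with the description above are:

```latex
\text{Micro-Precision} = \frac{TP_1 + TP_2}{(TP_1 + FP_1) + (TP_2 + FP_2)} \quad (3)
\qquad
\text{Micro-Recall} = \frac{TP_1 + TP_2}{(TP_1 + FN_1) + (TP_2 + FN_2)} \quad (4)

\text{Micro-F1} =
\frac{2 \cdot \text{Micro-Precision} \cdot \text{Micro-Recall}}
     {\text{Micro-Precision} + \text{Micro-Recall}} \quad (5)
```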
Macro-averaging normalizes the sum of the per-class metrics; thus, macro-averaging does not consider the number of events in each class. The macro-precision, macro-recall, and macro-F1-score can be expressed using equations (6) to (10).
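Equations (6) to (10) were likewise not carried over; one standard formulation for k classes, consistent with the description above, computes the per-class precision and recall and then their unweighted means:

```latex
P_i = \frac{TP_i}{TP_i + FP_i} \quad (6) \qquad
R_i = \frac{TP_i}{TP_i + FN_i} \quad (7)

\text{Macro-Precision} = \frac{1}{k}\sum_{i=1}^{k} P_i \quad (8) \qquad
\text{Macro-Recall} = \frac{1}{k}\sum_{i=1}^{k} R_i \quad (9)

\text{Macro-F1} =
\frac{2 \cdot \text{Macro-Precision} \cdot \text{Macro-Recall}}
     {\text{Macro-Precision} + \text{Macro-Recall}} \quad (10)
```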
All experiments were executed on an Intel Xeon E5-2698 v4 @ 2.20 GHz with 256 GB of RAM (CPU), an NVIDIA Tesla V100 32 GB (GPU), and the Ubuntu 18.04 operating system. We used the Scikit-learn (42) and PyTorch (43) libraries with the Python programming language for all analyses.
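For reference, both averages can be obtained directly from Scikit-learn; a small usage sketch with illustrative labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])   # illustrative true labels
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 0])   # illustrative predictions

micro = precision_recall_fscore_support(y_true, y_pred, average="micro")
macro = precision_recall_fscore_support(y_true, y_pred, average="macro")
print("micro P/R/F1:", micro[:3])   # pooled over all classes
print("macro P/R/F1:", macro[:3])   # unweighted mean over classes
```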
5. Results
This section extensively evaluates our approach and compares it with other unsupervised feature extraction techniques followed by over-sampling and state-of-the-art classifiers. We also report an ablation study conducted to explore the 20 most significant genes for each item of clinical information.
Tables 2-4 show the performance comparison among all methods according to gender, race, and sample type, respectively. The classification performance measured by the micro-average is superior to that measured by the macro-average. The AE-based methods achieved higher performance than the conventional feature extraction methods, which means that AE-based methods better capture the complexity of cancer and produce more meaningful features. We used only an MLP classifier for the features extracted by the AE-based methods, owing to their neural network structure, and applied no sampling to them. Even when the data imbalance problem was addressed for the traditional algorithms by SMOTE over-sampling, the generative AE-based methods achieved higher performance.
Table 2 presents the classification results for gender. The results show that VAE achieved a macro-F1-score of 0.958 and a micro-F1-score of 0.962, indicating higher classification performance than the other methods. It offers results comparable with the other AE-based methods and improves on the highest-performing conventional result, PCA+SVM with SMOTE over-sampling, by 0.021 macro- and 0.020 micro-F1-score, respectively. A gender disparity exists in the incidence of kidney carcinoma, with higher incidence reported in men (44). Men are at a higher risk of developing kidney cancer and usually have more aggressive disease at the time of diagnosis, whereas females generally show more favorable kidney cancer histology and better oncological outcomes than males (45). Extracting valuable features by VAE or other AE-based methods may give deeper insight into gender-related differences in kidney cancer therapy.
Table 2. Classification performance evaluation according to gender
| Feature Extraction | Sampling (SMOTE) | Classifier | Micro-Precision | Micro-Recall | Micro-F1-score | Macro-Precision | Macro-Recall | Macro-F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AE | FALSE | MLP | 0.953 | 0.952 | 0.953 | 0.945 | 0.952 | 0.948 |
| VAE | FALSE | MLP | 0.963 | 0.962 | 0.962 | 0.958 | 0.960 | 0.958 |
| CAE | FALSE | MLP | 0.950 | 0.950 | 0.950 | 0.945 | 0.946 | 0.945 |
| CVAE | FALSE | MLP | 0.958 | 0.957 | 0.957 | 0.952 | 0.955 | 0.953 |
| NMF | FALSE | AB | 0.908 | 0.907 | 0.907 | 0.896 | 0.901 | 0.898 |
| NMF | FALSE | DT | 0.894 | 0.893 | 0.893 | 0.884 | 0.881 | 0.882 |
| NMF | FALSE | KNN | 0.657 | 0.671 | 0.659 | 0.630 | 0.615 | 0.616 |
| NMF | FALSE | MLP | 0.835 | 0.836 | 0.835 | 0.822 | 0.814 | 0.818 |
| NMF | FALSE | NB | 0.804 | 0.798 | 0.784 | 0.808 | 0.740 | 0.753 |
| NMF | FALSE | RF | 0.910 | 0.910 | 0.909 | 0.909 | 0.892 | 0.899 |
| NMF | FALSE | SVM | 0.781 | 0.777 | 0.759 | 0.785 | 0.710 | 0.722 |
| NMF | TRUE | AB | 0.909 | 0.908 | 0.908 | 0.897 | 0.903 | 0.900 |
| NMF | TRUE | DT | 0.868 | 0.867 | 0.867 | 0.854 | 0.856 | 0.854 |
| NMF | TRUE | KNN | 0.647 | 0.603 | 0.612 | 0.603 | 0.613 | 0.594 |
| NMF | TRUE | MLP | 0.847 | 0.847 | 0.847 | 0.834 | 0.829 | 0.831 |
| NMF | TRUE | NB | 0.815 | 0.813 | 0.805 | 0.814 | 0.768 | 0.780 |
| NMF | TRUE | RF | 0.916 | 0.915 | 0.915 | 0.908 | 0.908 | 0.907 |
| NMF | TRUE | SVM | 0.777 | 0.754 | 0.758 | 0.743 | 0.759 | 0.743 |
| PCA | FALSE | AB | 0.867 | 0.868 | 0.866 | 0.860 | 0.846 | 0.852 |
| PCA | FALSE | DT | 0.736 | 0.737 | 0.736 | 0.711 | 0.709 | 0.710 |
| PCA | FALSE | KNN | 0.737 | 0.744 | 0.732 | 0.725 | 0.689 | 0.697 |
| PCA | FALSE | MLP | 0.943 | 0.943 | 0.943 | 0.939 | 0.936 | 0.937 |
| PCA | FALSE | NB | 0.657 | 0.677 | 0.639 | 0.640 | 0.585 | 0.578 |
| PCA | FALSE | RF | 0.845 | 0.833 | 0.822 | 0.858 | 0.778 | 0.797 |
| PCA | FALSE | SVM | 0.940 | 0.939 | 0.939 | 0.936 | 0.932 | 0.933 |
| PCA | TRUE | AB | 0.867 | 0.864 | 0.865 | 0.850 | 0.858 | 0.853 |
| PCA | TRUE | DT | 0.762 | 0.759 | 0.759 | 0.738 | 0.739 | 0.737 |
| PCA | TRUE | KNN | 0.736 | 0.692 | 0.699 | 0.692 | 0.710 | 0.685 |
| PCA | TRUE | MLP | 0.940 | 0.940 | 0.940 | 0.934 | 0.934 | 0.934 |
| PCA | TRUE | NB | 0.756 | 0.764 | 0.748 | 0.746 | 0.708 | 0.712 |
| PCA | TRUE | RF | 0.857 | 0.858 | 0.856 | 0.851 | 0.834 | 0.841 |
| PCA | TRUE | SVM | 0.943 | 0.942 | 0.942 | 0.936 | 0.938 | 0.937 |
Table 3 shows that when the target variable is race, with the labels white, Black or African-American, and Asian, the class-label imbalance is very severe. Our data included 940 (81.2%) white, 150 (13.0%) African-American, and 17 (1.5%) Asian samples. Clearly, race prediction is a more challenging task than the other clinical prediction tasks, showing a much lower macro-averaged performance. The results show that CVAE achieved a macro-F1-score of 0.763 and a micro-F1-score of 0.959, indicating higher classification performance than the other methods. It offers a micro-F1-score comparable with the other AE-based methods and improves on the highest-performing conventional result, PCA+SVM with SMOTE over-sampling, by 0.121 macro- and 0.018 micro-F1-score, respectively. It also improves on the best AE result by 0.076 macro-F1-score. We can conclude that the CVAE can capture the complexity of cancer and works better on more complex tasks than the other AE-based methods. Surveillance and epidemiology data indicate that kidney cancer incidence and mortality rates are higher among African-American patients than among white patients (46). White and Asian patients (ages 63.9 and 62.6 years, respectively) had a slightly older age of onset than Black and Native American patients (ages 60.7 and 60.3 years) (47). Although feature extraction for racial information is challenging, we achieved a macro-F1-score higher than 0.70 using the CVAE method.
Table 3. Classification performance evaluation of race
| Feature Extraction | Sampling (SMOTE) | Classifier | Micro-Precision | Micro-Recall | Micro-F1-score | Macro-Precision | Macro-Recall | Macro-F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AE | FALSE | MLP | 0.953 | 0.961 | 0.956 | 0.743 | 0.660 | 0.687 |
| VAE | FALSE | MLP | 0.956 | 0.964 | 0.958 | 0.767 | 0.663 | 0.685 |
| CAE | FALSE | MLP | 0.955 | 0.960 | 0.956 | 0.720 | 0.662 | 0.678 |
| CVAE | FALSE | MLP | 0.961 | 0.959 | 0.958 | 0.832 | 0.753 | 0.763 |
| NMF | FALSE | AB | 0.857 | 0.849 | 0.848 | 0.515 | 0.479 | 0.489 |
| NMF | FALSE | DT | 0.874 | 0.880 | 0.877 | 0.530 | 0.517 | 0.523 |
| NMF | FALSE | KNN | 0.790 | 0.844 | 0.809 | 0.480 | 0.402 | 0.416 |
| NMF | FALSE | MLP | 0.845 | 0.873 | 0.852 | 0.505 | 0.456 | 0.468 |
| NMF | FALSE | NB | 0.807 | 0.589 | 0.657 | 0.384 | 0.409 | 0.355 |
| NMF | FALSE | RF | 0.889 | 0.914 | 0.889 | 0.576 | 0.508 | 0.516 |
| NMF | FALSE | SVM | 0.772 | 0.867 | 0.812 | 0.373 | 0.385 | 0.369 |
| NMF | TRUE | AB | 0.866 | 0.845 | 0.853 | 0.512 | 0.507 | 0.506 |
| NMF | TRUE | DT | 0.871 | 0.842 | 0.855 | 0.508 | 0.515 | 0.509 |
| NMF | TRUE | KNN | 0.814 | 0.622 | 0.685 | 0.421 | 0.535 | 0.405 |
| NMF | TRUE | MLP | 0.847 | 0.857 | 0.851 | 0.608 | 0.518 | 0.540 |
| NMF | TRUE | NB | 0.800 | 0.594 | 0.656 | 0.376 | 0.413 | 0.349 |
| NMF | TRUE | RF | 0.901 | 0.921 | 0.905 | 0.586 | 0.537 | 0.550 |
| NMF | TRUE | SVM | 0.810 | 0.601 | 0.671 | 0.416 | 0.475 | 0.388 |
| PCA | FALSE | AB | 0.819 | 0.852 | 0.827 | 0.468 | 0.415 | 0.424 |
| PCA | FALSE | DT | 0.822 | 0.836 | 0.828 | 0.457 | 0.433 | 0.442 |
| PCA | FALSE | KNN | 0.826 | 0.858 | 0.816 | 0.510 | 0.378 | 0.387 |
| PCA | FALSE | MLP | 0.933 | 0.946 | 0.937 | 0.626 | 0.590 | 0.604 |
| PCA | FALSE | NB | 0.797 | 0.834 | 0.809 | 0.456 | 0.402 | 0.414 |
| PCA | FALSE | RF | 0.847 | 0.860 | 0.807 | 0.575 | 0.364 | 0.364 |
| PCA | FALSE | SVM | 0.935 | 0.947 | 0.939 | 0.629 | 0.594 | 0.608 |
| PCA | TRUE | AB | 0.847 | 0.849 | 0.846 | 0.506 | 0.488 | 0.492 |
| PCA | TRUE | DT | 0.825 | 0.799 | 0.811 | 0.448 | 0.472 | 0.456 |
| PCA | TRUE | KNN | 0.817 | 0.669 | 0.722 | 0.412 | 0.478 | 0.408 |
| PCA | TRUE | MLP | 0.940 | 0.950 | 0.945 | 0.625 | 0.610 | 0.616 |
| PCA | TRUE | NB | 0.891 | 0.883 | 0.885 | 0.543 | 0.533 | 0.534 |
| PCA | TRUE | RF | 0.886 | 0.900 | 0.878 | 0.601 | 0.471 | 0.503 |
| PCA | TRUE | SVM | 0.938 | 0.948 | 0.941 | 0.691 | 0.622 | 0.642 |
Table 4 presents the breakdown results for the sample types. The labels for the sample type were 1,010 primary tumors (87.3%) and 139 solid tissue normal (12%); the rest had missing values. The results show that all AE-based methods achieved comparable results, with a macro-F1-score of 0.996 and a micro-F1-score of 0.998, a higher classification performance than the other methods. All AE-based methods improve on the highest performance results of conventional PCA+KNN and PCA+MLP without over-sampling and PCA+MLP with SMOTE over-sampling by 0.002 macro- and 0.001 micro-F1-score, respectively.
Survival in patients with kidney cancer can be correlated with the expression of various genes based solely on the expression profile of the primary kidney tumor (48). Compared with the other tasks, the extracted features are more easily distinguished, and predicting the sample type is much easier. For sample types, classifiers based on both traditional techniques and deep learning performed well, and the AE-based pre-training algorithm is slightly better overall than the other compared methods. There are several methods for predicting cancer subtypes or sample types by applying deep learning techniques to gene expression data (49-51). To the best of our knowledge, methods identifying kidney cancer biomarkers by combining AE-based methods and model interpretation techniques are still lacking.
In general, unsupervised learning algorithms applied to gene expression data extract the biological and technical signals present in the input samples. It is best to compress gene expression data using several algorithms and many different latent-space dimensionalities. These compressed gene expression features represent important biological signals, including gender, race, and the presence of a tumor. Through several experiments tracking lower-dimensional gene expression representations and supervised learning performance, we showed that optimal biological features are learned using a variety of latent-space dimensionalities and different compression algorithms.
Table 4. Classification performance evaluation according to sample type
| Feature Extraction | Sampling (SMOTE) | Classifier | Micro-Precision | Micro-Recall | Micro-F1-score | Macro-Precision | Macro-Recall | Macro-F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AE | FALSE | MLP | 0.998 | 0.998 | 0.998 | 0.996 | 0.996 | 0.996 |
| VAE | FALSE | MLP | 0.998 | 0.998 | 0.998 | 0.996 | 0.996 | 0.996 |
| CAE | FALSE | MLP | 0.998 | 0.998 | 0.998 | 0.996 | 0.996 | 0.996 |
| CVAE | FALSE | MLP | 0.998 | 0.998 | 0.998 | 0.996 | 0.996 | 0.996 |
| NMF | FALSE | AB | 0.989 | 0.989 | 0.989 | 0.970 | 0.978 | 0.974 |
| NMF | FALSE | DT | 0.984 | 0.983 | 0.984 | 0.951 | 0.975 | 0.962 |
| NMF | FALSE | KNN | 0.976 | 0.974 | 0.974 | 0.929 | 0.957 | 0.941 |
| NMF | FALSE | MLP | 0.991 | 0.990 | 0.990 | 0.983 | 0.973 | 0.977 |
| NMF | FALSE | NB | 0.995 | 0.995 | 0.995 | 0.982 | 0.994 | 0.988 |
| NMF | FALSE | RF | 0.995 | 0.995 | 0.995 | 0.991 | 0.984 | 0.988 |
| NMF | FALSE | SVM | 0.964 | 0.963 | 0.960 | 0.963 | 0.863 | 0.897 |
| NMF | TRUE | AB | 0.992 | 0.991 | 0.991 | 0.974 | 0.986 | 0.980 |
| NMF | TRUE | DT | 0.977 | 0.975 | 0.975 | 0.929 | 0.961 | 0.943 |
| NMF | TRUE | KNN | 0.959 | 0.943 | 0.948 | 0.845 | 0.955 | 0.887 |
| NMF | TRUE | MLP | 0.992 | 0.992 | 0.992 | 0.984 | 0.980 | 0.981 |
| NMF | TRUE | NB | 0.995 | 0.995 | 0.995 | 0.985 | 0.991 | 0.988 |
| NMF | TRUE | RF | 0.994 | 0.994 | 0.994 | 0.987 | 0.984 | 0.986 |
| NMF | TRUE | SVM | 0.981 | 0.978 | 0.979 | 0.931 | 0.978 | 0.952 |
| PCA | FALSE | AB | 0.993 | 0.993 | 0.993 | 0.987 | 0.980 | 0.983 |
| PCA | FALSE | DT | 0.980 | 0.979 | 0.979 | 0.962 | 0.942 | 0.950 |
| PCA | FALSE | KNN | 0.997 | 0.997 | 0.997 | 0.993 | 0.995 | 0.994 |
| PCA | FALSE | MLP | 0.997 | 0.997 | 0.997 | 0.992 | 0.995 | 0.994 |
| PCA | FALSE | NB | 0.985 | 0.984 | 0.984 | 0.961 | 0.966 | 0.964 |
| PCA | FALSE | RF | 0.987 | 0.987 | 0.987 | 0.989 | 0.949 | 0.968 |
| PCA | FALSE | SVM | 0.996 | 0.996 | 0.996 | 0.988 | 0.991 | 0.990 |
| PCA | TRUE | AB | 0.991 | 0.991 | 0.991 | 0.980 | 0.979 | 0.979 |
| PCA | TRUE | DT | 0.986 | 0.985 | 0.985 | 0.968 | 0.963 | 0.965 |
| PCA | TRUE | KNN | 0.995 | 0.995 | 0.995 | 0.983 | 0.994 | 0.988 |
| PCA | TRUE | MLP | 0.997 | 0.997 | 0.997 | 0.992 | 0.995 | 0.994 |
| PCA | TRUE | NB | 0.969 | 0.967 | 0.968 | 0.914 | 0.941 | 0.925 |
| PCA | TRUE | RF | 0.993 | 0.993 | 0.993 | 0.993 | 0.974 | 0.983 |
| PCA | TRUE | SVM | 0.996 | 0.996 | 0.996 | 0.988 | 0.991 | 0.990 |
6. Conclusions
We combined kidney cancer clinical data and gene data collected from the TCGA database to extract genes significant for gender, race, and sample type, and we conducted a classification analysis based on these data. Based on deep learning algorithms, we compared and analyzed the datasets using traditional classification techniques and pre-training processes such as AE, VAE, SAE, and CVAE. For feature extraction of significant genes in the classification analysis using traditional techniques, PCA and NMF were employed, while in our proposed deep learning–based techniques, important genes were extracted through pre-training processes (AE, VAE, SAE, and CVAE) and fine-tuning. As a result, the deep learning–based gene extraction methods performed better.
There are several methods for predicting cancer subtypes or sample types by applying deep learning techniques to gene expression data. To the best of our knowledge, there is a lack of methods for identifying kidney cancer biomarkers that combine AE-based methods and model interpretation techniques. As shown in Tables 2-4, extracting race-related features is the most challenging task, and sample-type feature extraction is much easier than the other tasks. For the challenging tasks, CVAE outperforms the other methods.
Furthermore, we compared the micro and macro measures according to the number of class labels of the target variables; the micro measures exhibited better performance. In the future, verification of the extracted genes will confirm their functions and help predict the prognosis of kidney cancer patients. In further work, we will consider other data types, such as clinical, RNA, and DNA methylation data.
Acknowledgements
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, under Grants No. 2019R1F1A1051569, No. 2020R1I1A1A01065199, and No. 2020R1A6A1A12047945.
References
V. M. G. Olivares, L. M. G. Torres, G. H. Cuartas, M. C. N. De la Hoz, 2019, Immunohistochemical
profile of renal cell tumours, Revista Espanola De Patologia, Vol. 52, No. 4, pp.
214-221
J. J. Hsieh, M. P. Purdue, S. Signoretti, C. Swanton, L. Albiges, M. Schmidinger,
D. Y. Heng, J. Larkin, V. Ficarra, 2017, Renal cell carcinoma, Nat. Rev. Dis. Primers,
Vol. 3, No. 17010, pp. 1-19
W. M. Linehan, M. M. Walther, B. Zbar, 2003, The genetic basis of cancer of the kidney,
J. Urol., Vol. 170, pp. 2163-2172
W. M. Linehan, B. Zbar, 2004, Focus on kidney cancer, Cancer Cell, Vol. 6, No. 3,
pp. 223-228
Cancer Genome Atlas Research Network, 2016, Comprehensive molecular characterization
of papillary renal-cell carcinoma, N. Engl. J. Med., Vol. 374, No. 2, pp. 135-145
L. A. Torre, B. Trabert, C. E. DeSantis, K. D. Miller, G. Samimi, C. D. Runowicz,
M. M. Gaudet, A. Jemal, R. L. Siegel, 2018, Ovarian cancer statistics, 2018, CA Cancer
J. Clin., Vol. 68, pp. 284-296
A. J. Peired, R. Campi, M. L. Angelotti, G. Antonelli, C. Conte, E. Lazzeri, F. Becherucci,
L. Calistri, S. Serni, P. Romagnani, 2021, Sex and Gender Differences in Kidney Cancer:
Clinical and Experimental Evidence, Cancers, Vol. 13, No. 18, pp. 4588
Y. Zhan, C. Pan, Y. Zhao, J. Li, B. Wu, S. Bai, 2021, Systematic Analysis of the Global,
Regional and National Burden of Kidney Cancer from 1990 to 2017: Results from the
Global Burden of Disease Study 2017, Eur. Urol. Focus, Vol. 8, No. 1, pp. 302-319
N. Chowdhury, C. G. Drake, 2020, Kidney cancer: an overview of current therapeutic
approaches, Urol. Clin., Vol. 47, No. 4, pp. 419-431
D. A. Siegel, S. J. Henley, J. Li, L. A. Pollack, E. A. Van Dyne, A. White, 2017, Rates
and trends of pediatric acute lymphoblastic leukemia—United States, 2001–2014,
Morb. Mortal. Wkly. Rep., Vol. 66, No. 36, pp. 950-954
A. F. Olshan, Y. M. Kuo, A. M. Meyer, M. E. Nielsen, M. P. Purdue, W. K. Rathmell,
2013, Racial difference in histologic subtype of renal cell carcinoma, Cancer Med.,
Vol. 2, No. 5, pp. 744-749
L. Lipworth, A. K. Morgans, T. L. Edwards, D. A. Barocas, S. S. Chang, S. D. Herrell,
D. F. Penson, M. J. Resnick, J. A. Smith, P. E. Clark, 2016, Renal cell cancer histological
subtype distribution differs by race and sex, BJU Int., Vol. 117, No. 2, pp. 260-265
T. R. Rebbeck, 2018, Prostate cancer disparities by race and ethnicity: from nucleotide
to neighborhood, Cold Spring Harbor Persp. Med., Vol. 8, No. 9, pp. a030387
S. J. O. Nomura, Y. T. Hwang, S. L. Gomez, T. T. Fung, S. L. Yeh, C. Dash, L. Allen,
S. Philips, L. Hilakivi-Clarke, Y. L. Zheng, J. H. Y. Wang, 2017, Dietary intake of
soy and cruciferous vegetables and treatment-related symptoms in Chinese-American
and non-Hispanic White breast cancer survivors, Breast Cancer Res. Treat., Vol. 168,
No. 2, pp. 467-79
P. Mamoshina, K. Kochetov, E. Putin, F. Cortese, A. Aliper, W. S. Lee, S. M. Ahn,
L. Uhn, N. Skjodt, O. Kovalchuk, M. Scheibye-Knudsen, 2018, Population specific biomarkers
of human aging: a big data study using South Korean, Canadian, and Eastern European
patient populations, J. Gerontology: Ser. A, Vol. 73, No. 11, pp. 1482-1490
H. Y. Xiong, B. Alipanahi, L. J. Lee, H. Bretschneider, D. Merico, R. K. Yuen, Y.
Hua, S. Gueroussov, H. S. Najafabadi, T. R. Hughes, Q. Morris, Y. Barash, A. R. Krainer,
N. Jojic, S. W. Scherer, B. J. Blencowe, B. J. Frey, 2015, RNA splicing. The human
splicing code reveals new insights into the genetic determinants of disease, Science,
Vol. 347, No. 6218, pp. 1-20
M. Amgad, H. Elfandy, H. Hussein, L. A. Atteya, M. A. T. Elsebaie, L. S. A. Elnasr,
R. A. Sakr, H. S. E. Salem, A. F. Ismail, A. M. Saad, J. Ahmed, M. A. T. Elsebaie,
M. Rahman, I. A. Ruhban, N. M. Elgazar, Y. Alagha, M. H. Osman, A. M. Alhusseiny,
M. M. Khalaf, A. F. Younes, A. Abdulkarim, D. M. Younes, A. M. Gadallah, A. M. Elkashash,
S. Y. Fala, B. M. Zaki, J. Beezley, D. R. Chittajallu, D. Manthey, D. A. Gutman, L.
A. D. Cooper, 2019, Structured crowdsourcing enables convolutional segmentation of
histology images, Bioinformatics, Vol. 35, No. 18, pp. 3461-3467
V. M. G. Olivares, L. M. G. Torres, G. H. Cuartas, M. C. N. De la Hoz, 2019, Immunohistochemical
profile of renal cell tumours, Rev. Esp. Patol., Vol. 52, No. 4, pp. 214-221
National Cancer Center. Available online: https://ncc.re.kr/index (accessed on 17 August 2021)
B. H. Chi, I. H. Chang, 2018, The overdiagnosis of kidney cancer in Koreans and the
active surveillance on small renal mass, Korean J. Urol. Oncol., Vol. 16, No. 1, pp.
15-24
A. M. Ali, H. Zhuang, A. Ibrahim, O. Rehman, M. Huang, A. Wu, 2018, A machine learning
approach for the classification of kidney cancer subtypes using miRNA genome data,
Appl. Sci., Vol. 8, No. 12, pp. 1-14
H. M. Kim, S. J. Lee, S. J. Park, I. Y. Choi, S. H. Hong, 2021, Machine learning approach
to predict the probability of recurrence of renal cell carcinoma after surgery: Prediction
model development study, JMIR Med. Inform., Vol. 9, No. 3, pp. e25635
Genomic Data Commons. Available online: https://portal.gdc.cancer.gov (accessed on 17 August 2021)
B. J. Kim, S. H. Kim, 2018, Prediction of inherited genomic susceptibility to 20 common
cancer types by a supervised machine-learning method, PNAS USA, Vol. 115, No. 6, pp.
1322-1327
O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, D. Botstein, 2003, A Bayesian
framework for combining heterogeneous data sources for gene function prediction (in
Saccharomyces cerevisiae), PNAS USA, Vol. 100, No. 14, pp. 8348-8353
N. Hadjiyski, 2020, Kidney cancer staging: Deep learning neural network based approach,
2020 International Conference on E-Health and Bioengineering (EHB 2020)
H. S. Shon, E. Batbaatar, K. O. Kim, E. J. Cha, K. A. Kim, 2020, Classification of
kidney cancer data using cost-sensitive hybrid deep learning approach, Symmetry,
Vol. 12, No. 1, 154
N. Simidjievski, C. Bodnar, I. Tariq, P. Scherer, H. A. Terre, Z. Shams, M. Jamnik,
P. Liò, 2019, Variational autoencoders for cancer data integration: Design principles
and computational practice, Front. Genet., Vol. 10, 1205
P. Baldi, K. Hornik, 1989, Neural networks and principal component analysis: Learning
from examples without local minima, Neural Netw., Vol. 2, pp. 53-58
M. Mohri, A. Rostamizadeh, A. Talwalkar, 2012, Foundations of Machine Learning, MIT
Press
P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P. A. Manzagol, 2010, Stacked denoising
autoencoders: Learning useful representations in a deep network with a local denoising
criterion, J. Mach. Learn. Res., Vol. 11, pp. 3371-3408
M. A. Ranzato, C. S. Poultney, S. Chopra, Y. LeCun, 2007, Efficient learning of sparse
representations with an energy-based model, Adv. Neural Inf. Process. Syst., Vol.
19, pp. 1137-1144
D. P. Kingma, M. Welling, 2014, Auto-encoding variational bayes, Proceedings of the
2nd International Conference on Learning Representations
Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, L. Carin, 2016, Variational autoencoder
for deep learning of images, labels and captions, 30th Conference on Neural Information
Processing Systems (NIPS 2016)
K. Simonyan, A. Zisserman, 2015, Very deep convolutional networks for large-scale
image recognition, The 3rd International Conference on Learning Representations (ICLR)
L. Le, A. Patterson, M. White, 2018, Supervised autoencoders: Improving generalization
performance with unsupervised regularizers, 32nd Conference on Neural Information
Processing Systems (NIPS 2018)
M. Mohri, A. Rostamizadeh, D. Storcheus, 2015, Generalization bounds for supervised
dimensionality reduction, JMLR: Workshop and Conf. Proc., Vol. 44, pp. 226-241
L. A. Gottlieb, A. Kontorovich, R. Krauthgamer, 2016, Adaptive metric dimensionality
reduction, Theor. Comput. Sci., Vol. 620, pp. 105-118
K. Sohn, H. Lee, X. Yan, 2015, Learning structured output representation using deep
conditional generative models, Proceedings of the 28th International Conference on
Neural Information Processing Systems, Vol. 2, pp. 3483-3491
S. Belharbi, R. Hérault, C. Chatelain, S. Adam, 2018, Deep neural networks regularization
for structured output prediction, Neurocomputing, Vol. 281, pp. 169-177
Y. Bengio, E. Laufer, G. Alain, J. Yosinski, 2014, Deep generative stochastic networks
trainable by backprop, Proceeding of the 31st International Conference on Machine
Learning, Vol. 32, pp. 226-234
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, 2011, Scikit-learn: Machine
learning in Python, J. Mach. Learn. Res., Vol. 12, pp. 2825-2830
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin,
N. Gimelshein, L. Antiga, A. Desmaison, 2019, Pytorch: An imperative style, high-performance
deep learning library, Proceedings of the 33rd International Conference on Neural
Information Processing Systems, pp. 8026-8037
I. Lucca, T. Klatte, H. Fajkovic, M. De Martino, S. F. Shariat, 2015, Gender differences
in incidence and outcomes of urothelial and kidney cancer, Nat. Rev. Urol., Vol. 12,
No. 12, pp. 585-592
M. Mancini, M. Righetto, G. Baggio, 2020, Gender-related approach to kidney cancer
management: Moving forward, Int. J. Mol. Sci., Vol. 21, No. 9, pp. 3378
D. Hepps, A. Chernoff, 2006, Risk of renal insufficiency in African-Americans after
radical nephrectomy for kidney cancer, Urologic Oncology: Seminars and Original Investigations,
Vol. 24, No. 5, pp. 391-395
B. Shuch, S. Vourganti, C. J. Ricketts, L. Middleton, J. Peterson, M. J. Merino, A.
R. Metwalli, R. Srinivasan, W. M. Linehan, 2014, Defining early-onset kidney cancer:
implications for germline and somatic mutation testing and clinical management, J.
Clin. Oncol., Vol. 32, No. 5, pp. 431-437
J. R. Vasselli, J. H. Shih, S. R. Iyengar, J. Maranchie, J. Riss, R. Worrell, C. Torres-Cabala,
R. Tabios, A. Mariotti, R. Stearman, M. Merino, W. M. Linehan, 2003, Predicting survival
in patients with metastatic kidney cancer by gene-expression profiling in the primary
tumor, Proceedings of the National Academy of Sciences, Vol. 100, No. 12, pp. 6958-6963
M. Mostavi, Y. C. Chiu, Y. Huang, Y. Chen, 2020, Convolutional neural network models
for cancer type prediction based on gene expression, BMC Med. Genom., Vol. 13, No.
44, pp. 1-13
N. E. M. Khalifa, M. H. N. Taha, D. E. Ali, A. Slowik, A. E. Hassanien, 2020, Artificial
intelligence technique for gene expression by tumor RNA-Seq data: a novel optimized
deep learning approach, IEEE Access, Vol. 8, pp. 22874-22883
R. Tabares-Soto, S. Orozco-Arias, V. Romero-Cano, V. S. Bucheli, J. L. Rodríguez-Sotelo,
C. F. Jiménez-Varón, 2020, A comparative study of machine learning and deep learning
algorithms to classify cancer types based on microarray gene expression data, PeerJ
Comput. Sci., Vol. 6, e270
About the Authors
2010 : Ph.D. in Computer Science, Chungbuk National University, Korea.
2012 to present : Visiting professor, Medical Research Institute, School of Medicine, Chungbuk National University, Korea.
2019 : Ph.D. in Computer Science, Chungbuk National University, Korea.
2021 to present : Researcher, Electronics and Telecommunications Research Institute, Korea.
1987 : Ph.D. in Biomedical Engineering, University of Southern California, U.S.A.
1988 to present : Professor, Department of Biomedical Engineering, School of Medicine, Chungbuk National University, Korea.
2000 : Ph.D. in Industrial Engineering, Dongguk University, Korea.
2021 to present : Research professor, Institute for Trauma Research, College of Medicine, Korea University, Korea.
2004 : Ph.D., Information and Communications University, Korea.
2004 to present : Professor, College of Electrical and Computer Engineering, Chungbuk National University, Korea.
2001 : Ph.D. in Biomedical Engineering, Chungbuk National University, Korea.
2005 to present : Professor, Department of Biomedical Engineering, School of Medicine, Chungbuk National University, Korea.