황명하 (Myeong-Ha Hwang)†, 이인태 (In-Tae Lee)¹, 채창훈 (Chang-Hun Chae)¹, 정남준 (Nam-Joon Jung)¹

(Digital Solution Laboratory, Korea Electric Power Research Institute (KEPRI), Korea)
                        
 
               
             
            
            
            Copyright © The Korean Institute of Electrical Engineers(KIEE)
            
            
            
            
            
               
                  
Key words
               
               Deep Learning, Natural Language Processing, Power Generation, Diagnostic Service, Text Mining, Framework
             
            
          
         
            
1. Introduction
               
                  Natural language processing (NLP) is emerging as one of the most frequently used technologies
                  in a wide range of artificial intelligence areas. With the recent advancement of NLP
                  technology and the continuous increase in the value and amount of data, programs capable
                  of recognizing natural language such as human speaking and writing have already been
                  widely used in relation to clinical documents, airline reservations, and vehicle roadside
                  support.
                  
               
               
                   Market forecasts for software, hardware, and services in the NLP sector indicate
                   strong growth. Tractica estimated the NLP market at $277.2 million in 2015 and
                   forecast average annual growth of 25%, reaching $2.1 billion by 2024. Owing to
                   this high demand, the market size of the NLP sector has continued to increase (1).
                  
               
               
                  Deep learning technology related to embedding that vectorizes the similarity between
                  words has been attracting particular attention in the NLP sector. However, there are
                  no cases of applying this technology to the electric power industry and no corresponding
                  service frameworks. Moreover, thousands of knowledge documents produced by electric
                  generator operation experts have been collected for about 20 years by Korea Electric
                  Power Corporation (KEPCO), but they have rarely been applied to electric generator
                  operation.
                  
               
               
                  Therefore, in this report, we propose Gen2Vec, an NLP framework for electric generator
                  operation knowledge services using deep learning technology. Gen2Vec is a framework
                  that includes a preprocessing function that extracts nouns from search sentences or
                  words, a word recommendation function that recommends words related to search words
                  using deep learning-based word embedding, and a document recommendation function that
                  recommends documents related to search words. In the future, Gen2Vec can be the core
                  engine of the knowledge system for electric generator operation experts and will be
                  used in search and chatbot services for electric generator operation programs and
                  new employee education.
                  
               
             
            
                  2. Related Work
               
                     2.1 Word Embedding based on Deep Learning
                  
                      In graph theory, an embedding refers to drawing a graph on a surface so that its
                      edges do not cross (2). In NLP, by contrast, embedding techniques map linguistic
                      units into numerical representations and have recently been widely employed. Typical
                      techniques include word embedding, in which words are represented as vectors,
                      and the term frequency (TF)–inverse document frequency (IDF) approach, in which the
                      importance of each word in a document is quantified.
                     
                  
                  
                      First, for word embedding, there are prediction-based models, including neural probabilistic
                      language models such as Word2Vec and FastText, and matrix factorization-based models
                      such as latent semantic analysis and the global word vector (GloVe) model (3-8). In this study, we selected Word2Vec as the word embedding technique
                      because it performs well in word similarity evaluation and is widely used across industries.
                      As shown in Fig. 1, Word2Vec has two training methods: the continuous bag of words
                      (CBOW) method and the Skip-gram method. CBOW is a training method that predicts
                      a target word from its surrounding words (Fig. 1(a)). On the other hand, Skip-gram is a training method that predicts the surrounding
                      words from a target word (Fig. 1(b)).
                     
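The difference between the two training methods can be illustrated by the training pairs each one builds from a token sequence. The sketch below is illustrative only; the tokens and window size are hypothetical, not taken from the paper's corpus:

```python
# Illustrative sketch: building training pairs for CBOW vs Skip-gram
# from a token sequence with a symmetric context window.
def cbow_pairs(tokens, window=2):
    """CBOW: (surrounding context words) -> target word."""
    pairs = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: target word -> each surrounding context word."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

tokens = ["generator", "rotor", "vibration", "fault", "diagnosis"]
print(cbow_pairs(tokens)[2])       # (['generator', 'rotor', 'fault', 'diagnosis'], 'vibration')
print(skipgram_pairs(tokens)[:2])  # [('generator', 'rotor'), ('generator', 'vibration')]
```

In Gensim's Word2Vec implementation, the choice between the two corresponds to the `sg` training parameter (CBOW by default, Skip-gram when `sg=1`).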
                  
                  
                      Second, TF-IDF, a weight used in information retrieval and text mining, is a statistical
                      measure of how important a word is in a particular document (9). The TF indicates how often a specific word appears in a document, and the IDF is
                      the inverse of the document frequency, i.e., of the fraction of documents containing
                      the word. The TF-IDF is obtained by multiplying the TF by the IDF. It is used to
                      rank search results in search engines and to measure the similarity between documents
                      within a document cluster.
                     
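The TF-IDF computation described above can be written directly from the definition. The documents below are toy examples, not the paper's corpus:

```python
import math

def tf_idf(term, doc, docs):
    """TF-IDF = raw term frequency in doc x log(N / document frequency)."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0

docs = [["turbine", "blade", "crack"],
        ["turbine", "bearing", "wear"],
        ["boiler", "tube", "leak"]]
# "crack" appears in 1 of 3 docs -> high IDF; "turbine" in 2 of 3 -> lower.
print(tf_idf("crack", docs[0], docs))    # 1 * ln(3/1) ≈ 1.0986
print(tf_idf("turbine", docs[0], docs))  # 1 * ln(3/2) ≈ 0.4055
```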
                  
                  
                     Minarro-Gimenez et al. conducted a study on improving the accessibility of medical
                     knowledge by applying Word2Vec to medical documents, and Husain and Dih developed
                     a mobile application recommendation system for travelers based on TF-IDF content (10-11). Similarly, research using deep learning-based word embedding and the TF-IDF has
                     been actively underway in various industries. However, research on the application
                     of this technology in the electric power sector is insufficient.
                     
                  
                  
                     
                     
                           
                           
Fig. 1. CBOW and Skip-gram Structure
                         
                     
                  
                
               
                      2.2 Framework
                  
                      A framework is a software environment that provides reusable designs and implementations
                      of specific software functions in a collaborative form, making the development of
                      a software platform effective (12). A framework can be maintained through systematic
                      code management, is highly reusable, and offers high development productivity by
                      providing a function library.
                     
                  
                  
                      Regarding research on frameworks, Bedi and Toshniwal developed a deep learning framework
                      for forecasting electricity demand using long short-term memory (13). Dowling et al. proposed an optimization framework for evaluating revenue opportunities
                      provided by multi-scale hierarchies in the electric power market and for determining
                      optimal participation strategies for individual participants (14). In addition, Pinheiro and Davis Jr. improved user convenience by developing ThemeRise,
                      a framework for producing volunteered geographic information applications, a type
                      of crowdsourcing, which manages the characteristics and structure of data collection
                      target themes, and Jack Jr. proposed the National Institute on Aging and
                      Alzheimer's Association (NIA-AA) research framework to assist research on the biological
                      definition of Alzheimer's disease (15-16). As these studies show, framework research has been gaining attention not only
                      in the electric power industry, but also in other industries. Introducing a framework
                      lets users conveniently utilize existing technologies and enables efficient platform
                      development.
                     
                  
                
             
            
                  3. Proposed Gen2Vec Framework
               
                     3.1 Framework Architecture
                  
                      The framework architecture of the proposed Gen2Vec is shown in Fig. 2. Pretraining is performed using deep learning-based Word2Vec on 1,348 expert
                      knowledge documents for the electric generator. When the user enters a sentence or
                      word in the search box, the preprocessing function extracts only the nouns.
                      The word recommendation function then applies deep learning-based word embedding
                      to the extracted, pretrained words. Next, Gen2Vecscore is calculated using the words
                      extracted by the word recommendation function and the TF-IDF value of each document.
                      Lastly, the document recommendation function recommends documents related to the
                      search word.
                     
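The pipeline above can be outlined in code. Everything here is a hypothetical stand-in: `preprocess`, `recommend_words`, and `recommend_docs` mimic the framework's three functions (the names are illustrative, not the paper's API), and the toy model and TF-IDF tables replace the trained Word2Vec model and document corpus:

```python
# Hypothetical end-to-end sketch of the Gen2Vec pipeline.
def preprocess(query):
    # Extract noun tokens (KoNLPy does this in the real framework).
    return [tok for tok in query.split() if tok.isalpha()]

def recommend_words(nouns, model, topn=3):
    # Deep-learning word-embedding lookup; here a toy similarity table.
    return [w for n in nouns for w in model.get(n, [])][:topn]

def recommend_docs(words, tfidf, topk=2):
    # Score each document by summing TF-IDF over the query + recommended words.
    scores = {doc: sum(weights.get(w, 0.0) for w in words)
              for doc, weights in tfidf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:topk]

model = {"rotor": ["vibration", "bearing"]}
tfidf = {"Doc1": {"rotor": 0.3, "vibration": 0.4}, "Doc2": {"boiler": 0.5}}

nouns = preprocess("rotor fault")
words = nouns + recommend_words(nouns, model)
print(recommend_docs(words, tfidf))  # ['Doc1', 'Doc2']
```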
                  
                  
                     
                     
                           
                           
Fig. 2. Framework Architecture of Gen2Vec
                         
                     
                  
                
               
                     3.2 Preprocessing Function
                  
                      The preprocessing function of Gen2Vec runs after the user enters a word or sentence
                      in the search box. First, the input is tokenized to separate each word into a token,
                      and part-of-speech tagging then attaches a part of speech to each token. Only the
                      noun tokens are passed to the word recommendation function. KoNLPy, a Korean natural
                      language processing package for Python, was used for the preprocessing function of
                      Gen2Vec (17).
                     
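The tokenize, POS-tag, and noun-filter steps can be sketched as follows. The real framework uses KoNLPy's tagger (e.g., `Okt().nouns()`); the English tag table below is a hypothetical stand-in so the sketch stays self-contained:

```python
# Toy POS table standing in for a real tagger such as KoNLPy's Okt.
POS_TABLE = {"generator": "Noun", "rotor": "Noun", "is": "Verb",
             "vibrating": "Verb", "badly": "Adverb"}

def extract_nouns(sentence):
    tokens = sentence.lower().split()                            # tokenization
    tagged = [(t, POS_TABLE.get(t, "Unknown")) for t in tokens]  # POS tagging
    return [t for t, tag in tagged if tag == "Noun"]             # noun filtering

print(extract_nouns("Generator rotor is vibrating badly"))  # ['generator', 'rotor']
```

With KoNLPy installed, the equivalent Korean-language call would be `Okt().nouns(sentence)`.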
                  
                
               
                     3.3 Word Recommendation Function
                  
                      The word recommendation function was developed by applying the Gensim framework's
                      Word2Vec to the words extracted by the preprocessing function (18). The training parameters of the Skip-gram model for extracting embedding words are
                      the matrices U and V, each of size the vocabulary size (|V|) by the number of
                      embedding dimensions (d). The probability that the target word (t) and context
                      word (c) form a positive sample is calculated using Eq. (1), and the probability that t and c form a negative sample is calculated using Eq. (2).
                  
                  
                     
                      Eq. (1) Positive sample probability of Skip-gram

                      $$P(+\mid t,c)=\frac{1}{1+\exp(-u_{t}\cdot v_{c})}$$
                  
                  
                      Eq. (2) Negative sample probability of Skip-gram

                      $$P(-\mid t,c)=1-P(+\mid t,c)=\frac{\exp(-u_{t}\cdot v_{c})}{1+\exp(-u_{t}\cdot v_{c})}$$
                  
                  
                      The log-likelihood function of Skip-gram is Eq. (3); after training to optimize Eq. (3), the word recommendation function can vectorize the words in a document cluster.
                      Using this approach, words related to the search word can be recommended.
                     
                  
                  
                      Eq. (3) Log-likelihood function of Skip-gram

                      $$L(\theta)=\sum_{(t,c)\in +}\log P(+\mid t,c)+\sum_{(t,c)\in -}\log P(-\mid t,c)$$
                  
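The positive- and negative-sample probabilities can be checked numerically. The target and context vectors below are toy values chosen for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_positive(u_t, v_c):
    """P(+|t,c) = sigmoid(u_t . v_c); P(-|t,c) is its complement."""
    dot = sum(a * b for a, b in zip(u_t, v_c))
    return sigmoid(dot)

u_t = [0.5, -0.2, 0.8]   # toy target-word vector
v_c = [0.4, 0.1, 0.7]    # toy context-word vector
p_pos = p_positive(u_t, v_c)
print(round(p_pos, 4))      # P(+|t,c)
print(round(1 - p_pos, 4))  # P(-|t,c)
```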
                
               
                     3.4 Document Recommendation Function
                  
                     The document recommendation function of the proposed Gen2Vec is as follows. When the
                     user enters a necessary word or sentence in the search box, the preprocessing function
                     is used to extract the noun words ($Word_{x_{i}}$). The extracted words are input
                     into the trained Word2Vec model to extract the TopN words ($Word_{y_{i}}$) with high
                     cosine similarity for each $Word_{x_{i}}$. The formula for obtaining the cosine similarity
                     is expressed in Eq. (4), and the method for obtaining $Word_{y_{i}}$ is expressed in Eq. (5).
                     
                  
                  
                      Eq. (4) The formula for obtaining the cosine similarity

                      $$\cos(\vec{a},\vec{b})=\frac{\vec{a}\cdot\vec{b}}{\Vert\vec{a}\Vert\,\Vert\vec{b}\Vert}$$
                  
                  
                      Eq. (5) The method for obtaining $Word_{y_{i}}$

                      $$Word_{y_{i}}=\underset{w\in V}{\operatorname{TopN}}\;\cos\!\left(v_{Word_{x_{i}}},\,v_{w}\right)$$
                  
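Eqs. (4) and (5) amount to ranking vocabulary words by cosine similarity to the query word's vector. A minimal sketch with a toy two-dimensional vocabulary follows; in Gensim, `model.wv.most_similar` performs the equivalent lookup:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_n_words(query_vec, vocab_vecs, n=2):
    """Return the n vocabulary words most cosine-similar to the query vector."""
    ranked = sorted(vocab_vecs, key=lambda w: cosine(query_vec, vocab_vecs[w]),
                    reverse=True)
    return ranked[:n]

vocab = {"vibration": [0.9, 0.1], "bearing": [0.7, 0.3], "boiler": [-0.2, 0.9]}
print(top_n_words([1.0, 0.0], vocab))  # ['vibration', 'bearing']
```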
                  
                      After defining each word (w), each target word (t), each document (d), the total
                      number of documents (D), and the term frequency function (f(·)) over the documents,
                      the TF-IDF is computed using Eq. (6). A data frame of TF-IDF values is then extracted, with the word list consisting
                      of $Word_{x_{i}}$ and $Word_{y_{i}}$ as the columns and the documents as
                      the rows. Table 1 shows an example of the extraction results.
                     
                  
                  
                      Eq. (6) The formula of TF-IDF

                      $$\text{TF-IDF}(t,d,D)=f(t,d)\times\log\frac{|D|}{|\{d\in D: t\in d\}|}$$
                  
                  
                     
                     
                     
                     
                           
                           
Table 1. Data Frame Example for Document Recommendation Function

|      | $Word_{x_{1}}$ | .. | $Word_{x_{n}}$ | $Word_{y_{1}}$ | .. | $Word_{y_{m}}$ |
|------|----------------|----|----------------|----------------|----|----------------|
| Doc1 | 0.27           | .. | 0.32           | 0.41           | .. | 0.19           |
| ..   | ..             | .. | ..             | ..             | .. | ..             |
| DocZ | 0.13           | .. | 0.11           | 0.21           | .. | 0.02           |
                      
                     
                  
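A data frame of this shape can be assembled directly from the TF-IDF definition in Eq. (6). The documents and column words below are toy examples, not the paper's corpus:

```python
# Hypothetical sketch of the Table-1 data frame: rows are documents,
# columns are the Word_x (entered) and Word_y (recommended) terms,
# and each cell holds a TF-IDF value.
import math

docs = {"Doc1": ["rotor", "rotor", "vibration", "crack"],
        "Doc2": ["boiler", "tube", "vibration"]}
columns = ["rotor", "vibration"]   # Word_x ∪ Word_y for some query

def tf_idf(term, doc_tokens, all_docs):
    df = sum(1 for d in all_docs.values() if term in d)
    return doc_tokens.count(term) * math.log(len(all_docs) / df) if df else 0.0

frame = {name: {w: round(tf_idf(w, toks, docs), 3) for w in columns}
         for name, toks in docs.items()}
print(frame["Doc1"])  # {'rotor': 1.386, 'vibration': 0.0}
```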
                  
                      The above data frame contains the words $Word_{x_{i}}$ that the user directly enters
                      in the search box and the related words $Word_{y_{i}}$ extracted by deep learning.
                      Because different weights must be given to $Word_{x_{i}}$ and $Word_{y_{i}}$,
                      Gen2Vecweight is defined as expressed in Eq. (7) and is used to update the TF-IDF values in the data frame.
                     
                  
                  
                      Eq. (7) The formula of Gen2Vecweight

                      $$Gen2Vec_{weight}(w,d)=\begin{cases}\alpha\cdot\text{TF-IDF}(w,d), & w\in Word_{x}\\ \beta\cdot\text{TF-IDF}(w,d), & w\in Word_{y}\end{cases}\quad(\alpha>\beta)$$
                  
                  
                      Next, the Gen2Vecweight values for each document in the data frame are summed, and
                      the TopK documents are extracted. The function used for this calculation is defined
                      as Gen2Vecscore. If T is the total number of $Word_{x_{i}}$ and $Word_{y_{i}}$
                      words, Gen2Vecscore can be obtained as shown in Eq. (8).
                     
                  
                  
                      Eq. (8) The formula of Gen2Vecscore

                      $$Gen2Vec_{score}(d)=\sum_{i=1}^{T}Gen2Vec_{weight}(w_{i},d)$$
                  
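The weighting and scoring steps can be sketched as follows. The α and β values are illustrative assumptions (the source does not state them here), and the TF-IDF values reuse Table 1's example numbers:

```python
# Hedged sketch of Gen2Vec weighting and scoring: user-entered words
# (Word_x) get a larger weight than recommended words (Word_y).
ALPHA, BETA = 1.0, 0.5   # illustrative weights, alpha > beta

def gen2vec_score(doc_tfidf, word_x, word_y):
    """Sum weighted TF-IDF over all T = |word_x| + |word_y| terms."""
    score = sum(ALPHA * doc_tfidf.get(w, 0.0) for w in word_x)
    score += sum(BETA * doc_tfidf.get(w, 0.0) for w in word_y)
    return score

frame = {"Doc1": {"rotor": 0.27, "vibration": 0.41},
         "DocZ": {"rotor": 0.13, "vibration": 0.21}}
word_x, word_y = ["rotor"], ["vibration"]

# Rank documents by Gen2Vecscore and keep the TopK.
ranked = sorted(frame, key=lambda d: gen2vec_score(frame[d], word_x, word_y),
                reverse=True)
print(ranked[:1])  # TopK = 1 -> ['Doc1']
```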
                
             
            
                  4. Experiments and Results
               
                     4.1 Expert Documents for Diagnostic Services of Power Generation Facility
                  
                      KEPCO has operated electric generators at each of its power plants for about 20
                      years, and experts have directly diagnosed the generators, accumulating 1,348 documents.
                      In these diagnoses, the experts used the categories boiler, electric generator,
                      performance, gas turbine, and steam turbine and designated subcategories such as
                      fault diagnosis and precision diagnosis. Table 2 shows the statistics of the expert documents on electric generator operation
                      collected from 2000 to 2018. The Gen2Vec developed in this study was trained on these documents.
                     
                  
                  
                     
                     
                     
                     
                           
                           
Table 2. Expert Documents for Diagnostic Services of Power Generation Facility

| Category           | Subcategory          | Number of documents |
|--------------------|----------------------|---------------------|
| Boiler             | Fault diagnosis      | 52                  |
| Boiler             | Precision diagnosis  | 380                 |
| Gas turbine        | Fault diagnosis      | 34                  |
| Gas turbine        | Precision diagnosis  | 53                  |
| Steam turbine      | Fault diagnosis      | 33                  |
| Steam turbine      | Precision diagnosis  | 57                  |
| Electric generator | Leak absorption      | 122                 |
| Electric generator | Prevention diagnosis | 37                  |
| Performance        | Insulation diagnosis |                     |
                                    
                                          
                                       			
                                        546 
                                       			
                                     | 
                                 
                                 
                                       | 
                                          
                                       			
                                        Precision diagnosis 
                                       			
                                     | 
                                    
                                          
                                       			
                                        34 
                                       			
                                     | 
                                 
                                 
                                       | 
                                          
                                       			
                                        Total 
                                       			
                                     | 
                                    
                                          
                                       			
                                        1,348 
                                       			
                                     | 
                                 
                              
                           
                        
                      
                     
                  
                  
                     
                     
                     
                     
                           
                           
Table 3. Result Example of Word Recommendation Function
                        
| $Word_{x_{1}}$ ~ $Word_{x_{5}}$ [Korean] (Cosine Similarity) | Gas turbine [가스터빈] | Compressor [압축기] | Blade [블레이드] | Crack [균열] | Occurrence [발생] |
| $Word_{y_{1}}$ ~ $Word_{y_{5}}$ | Trial run [시운전] (0.88) | Blade [블레이드] (0.97) | Compressor [압축기] (0.97) | Tiny [미세] (0.84) | Majority [다수] (0.86) |
| $Word_{y_{6}}$ ~ $Word_{y_{10}}$ | Gunsan [군산] (0.85) | Bucket [버켓] (0.91) | Bucket [버켓] (0.93) | Progress [진전] (0.83) | Similarity [유사] (0.85) |
| $Word_{y_{11}}$ ~ $Word_{y_{15}}$ | Low pressure [저압] (0.83) | Past [과거] (0.88) | Vane [베인] (0.91) | Fault [결함] (0.83) | Order [차례] (0.85) |
| $Word_{y_{16}}$ ~ $Word_{y_{20}}$ | Turbine [터빈] (0.81) | Rotor [로터] (0.85) | Recommendation [권고] (0.88) | Expansion [확대] (0.81) | Estimation [추정] (0.85) |
| $Word_{y_{21}}$ ~ $Word_{y_{25}}$ | Component [부품] (0.80) | Type [종류] (0.84) | Rotor [로터] (0.88) | Discovery [발견] (0.81) | Many [여러] (0.84) |
                  
Fig. 3. An Example of Preprocessing Function
               
                     4.2 Preprocessing and Word Recommendation
                  
The experimental example and results of the preprocessing function are shown in Fig. 3. When the user enters a word or sentence in the search box and applies the preprocessing function, tokenization first splits the input into tokens (Fig. 3(a), (b)). Each token is then tagged with its part of speech, and the nouns are extracted (Fig. 3(c), (d)). The extracted nouns are passed to the word recommendation function.
                     
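The three preprocessing steps above can be sketched as follows. The whitespace tokenizer and the tiny POS dictionary are hypothetical stand-ins for illustration only; the actual service would rely on a Korean morphological analyzer rather than this lookup table.

```python
# Sketch of the preprocessing function: tokenize -> POS-tag -> keep nouns.
# POS_DICT is a hypothetical tag lookup, not the real tagger.
POS_DICT = {
    "gas": "Noun", "turbine": "Noun", "blade": "Noun",
    "crack": "Noun", "occurred": "Verb", "a": "Determiner",
}

def tokenize(text):
    """Split the raw query into lowercase tokens (Fig. 3(a), (b))."""
    return text.lower().split()

def pos_tag(tokens):
    """Attach a (token, tag) pair to every token (Fig. 3(c))."""
    return [(t, POS_DICT.get(t, "Unknown")) for t in tokens]

def extract_nouns(text):
    """Full pass: tokenize, tag, then keep only the nouns (Fig. 3(d))."""
    return [t for t, tag in pos_tag(tokenize(text)) if tag == "Noun"]

print(extract_nouns("gas turbine blade crack occurred"))
# -> ['gas', 'turbine', 'blade', 'crack']
```

The returned noun list is what flows into the word recommendation function.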
                  
                  
The experimental example and results of the word recommendation function are shown in Table 3. In this experiment, the nouns ($Word_{x_{i}}$) extracted by the preprocessing function in Fig. 3 were queried against a pretrained Word2Vec model. For pretraining, the word-vector dimension was set to 1,000, the window size to 4, and the downsampling threshold for frequently appearing words to 1e-3. With these parameters, the TopN embedding words ($Word_{y_{i}}$) were extracted and output in descending order of cosine similarity. In this experiment, N was set to 5, and each extracted $Word_{y_{i}}$ is reported with its cosine similarity value.
                     
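Once the Word2Vec model is pretrained, the TopN lookup amounts to ranking the remaining vocabulary by cosine similarity to the query word's vector. A minimal sketch with hypothetical toy 3-dimensional vectors (the paper's model uses 1,000 dimensions); the vocabulary and values below are invented for illustration:

```python
import math

# Toy word vectors standing in for a trained Word2Vec vocabulary.
VECTORS = {
    "gas_turbine": [0.9, 0.1, 0.3],
    "trial_run":   [0.8, 0.2, 0.4],
    "gunsan":      [0.7, 0.3, 0.2],
    "voltage":     [0.1, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, topn=5):
    """Return the topn vocabulary words closest to `word`,
    in descending order of cosine similarity."""
    query = VECTORS[word]
    scores = [(w, cosine(query, v)) for w, v in VECTORS.items() if w != word]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:topn]

print(most_similar("gas_turbine", topn=2))
```

In practice this ranking is what a trained embedding model's TopN query performs over the full vocabulary.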
                  
                  
The experimental results show that words highly related to each $Word_{x_{i}}$ were extracted as $Word_{y_{i}}$. Some of the extracted words were duplicates; for each duplicate, only the occurrence with the highest cosine similarity was retained in $Word_{y_{i}}$, and the rest were excluded before use in the document recommendation function.
                     
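The duplicate-handling rule can be sketched as a small helper. The candidate list reuses the "Bucket" duplicate from Table 3 (similarities 0.91 and 0.93); the function name and return format are illustrative.

```python
def dedup_keep_best(candidates):
    """Collapse duplicate recommended words, keeping the occurrence
    with the highest cosine similarity for each word."""
    best = {}
    for word, sim in candidates:
        if word not in best or sim > best[word]:
            best[word] = sim
    # Return the surviving words in descending-similarity order.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

cands = [("bucket", 0.91), ("vane", 0.91), ("bucket", 0.93), ("rotor", 0.85)]
print(dedup_keep_best(cands))
# -> [('bucket', 0.93), ('vane', 0.91), ('rotor', 0.85)]
```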
                  
                
               
                     4.3 Document Recommendation Function
                  
The experimental example and results of the document recommendation function are presented in Table 4. The pretrained TF-IDF values of the nouns extracted by the preprocessing function ($Word_{x_{i}}$) and of the words recommended by the word recommendation function ($Word_{y_{i}}$) were updated with the proposed Gen2Vecweight. The documents ranked by Gen2Vecscore were then presented to the user. With the K value for the TopK extraction set to 10, the documents retrieved for the example search words are listed in Table 4.
                     
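Under the assumption that a document's score aggregates the updated term weights of the matched search and recommended words (the exact Gen2Vecscore definition is given earlier in the paper), the TopK retrieval step can be sketched as follows. The document names and weights here are hypothetical.

```python
import heapq

def rank_documents(doc_term_weights, query_terms, k=10):
    """Score each document by summing its (Gen2Vec-updated) term weights
    over the search and recommended words, then return the TopK documents.
    The additive scoring rule is an illustrative assumption."""
    scores = {
        doc: sum(weights.get(t, 0.0) for t in query_terms)
        for doc, weights in doc_term_weights.items()
    }
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

docs = {  # hypothetical per-document updated term weights
    "report_A": {"gas_turbine": 1.2, "blade": 0.9, "crack": 0.34},
    "report_B": {"gas_turbine": 0.8, "compressor": 1.1},
    "report_C": {"insulation": 1.5},
}
print(rank_documents(docs, ["gas_turbine", "blade", "crack"], k=2))
```

Each returned pair is a document and its score, mirroring the Rank/Gen2Vecscore columns of Table 4.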
                  
                  
                     
                     
                     
                     
                           
                           
Table 4. Result Example of Document Recommendation Function
                        
| Rank | Document name | Gen2Vecscore |
| 1 | Yeongwol natural gas power plant gas turbine report (1) | 2.44 |
| 2 | Seo-incheon 5GT OH technical support report (1) | 2.39 |
| 3 | Yeongwol natural gas power plant gas turbine report (2) | 2.35 |
| 4 | Busan 3GT report | 2.18 |
| 5 | Seo-incheon unit 1 gas turbine maintenance work technical support report | 2.17 |
| 6 | Bundang 8GT 1 blade damage report | 2.16 |
| 7 | Pyeongtaek 3GT composite report (1) | 2.10 |
| 8 | Pyeongtaek 3GT composite report (2) | 2.08 |
| 9 |
                                       			
                                        Seo-incheon 5GT OH technical support report (2) 
                                       			
                                     | 
                                    
                                          
                                       			
                                        2.07 
                                       			
                                     | 
                                 
                                 
                                       | 
                                          
                                       			
                                        10 
                                       			
                                     | 
                                    
                                          
                                       			
                                        Yeongwol 2GT high temperature parts damage report 
                                       			
                                     | 
                                    
                                          
                                       			
                                        1.98 
                                       			
                                     | 
                                 
                              
                           
                        
                      
                     
                  
                  
The results in Table 4 confirm that documents related to the nouns extracted from the search words, such as the gas turbine report, blade damage report, and high-temperature parts damage report, were retrieved. The accuracy of the document recommendation function is evaluated in Table 5, which reports the precision, recall, and accuracy obtained for Gen2Vec, Word2Vec, and TF-IDF. Gen2Vec scored about 3.9 and 10.8 percentage points higher than Word2Vec and TF-IDF, respectively.
                  
                  
                     
                     
                     
                     
                           
                           
Table 5. Evaluated performance of the document recommendation function

| Algorithm | Precision(%) | Recall(%) | Accuracy(%) |
| Gen2Vec | 81.3 | 84.9 | 83.1 |
| Word2Vec | 78.2 | 80.1 | 79.2 |
| TF-IDF | 71.9 | 72.7 | 72.3 |
                
             
            
                  5. Conclusion
               
In this paper, we proposed Gen2Vec, a knowledge service framework for electric generator operation based on user search words. Gen2Vec offers three functions to provide efficient knowledge services. First, a preprocessing function separates a sentence entered by the user into tokens and extracts only the nouns. Second, a word recommendation function recommends words related to the search words by applying a model trained by deep learning. Last, a document recommendation function extracts highly related documents by applying Gen2Vec_weight and Gen2Vec_score to the TF-IDF values pretrained for the words extracted by the preprocessing and word recommendation functions.
                  
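The document recommendation step summarized above can be sketched as follows. This is a minimal illustration only: the helper names (`tfidf`, `rank`), the toy documents, and the per-term boost weights standing in for Gen2Vec_weight/Gen2Vec_score are assumptions for the sketch, not the paper's exact formulas.

```python
# Minimal sketch: rank documents by TF-IDF of the query nouns, with each
# term boosted by a weight (standing in for the Gen2Vec-derived scores).
import math
from collections import Counter

# Toy corpus; the real system indexes KEPCO's accumulated expert reports.
docs = {
    "doc1": "gas turbine maintenance technical support report",
    "doc2": "boiler feed pump maintenance report",
    "doc3": "turbine blade high temperature damage analysis",
}

def tfidf(term: str, doc_tokens: list, all_docs: dict) -> float:
    """Smoothed TF-IDF of one term within one tokenized document."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for text in all_docs.values() if term in text.split())
    idf = math.log(len(all_docs) / (1 + df)) + 1
    return tf * idf

def rank(query_terms: list, weights: list, all_docs: dict) -> list:
    """Score each document as a weight-boosted sum of per-term TF-IDF."""
    scores = {}
    for name, text in all_docs.items():
        tokens = text.split()
        scores[name] = sum(w * tfidf(t, tokens, all_docs)
                           for t, w in zip(query_terms, weights))
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Query nouns with illustrative boost weights (e.g. word-similarity scores).
ranking = rank(["turbine", "blade", "damage"], [1.0, 0.9, 0.8], docs)
# The blade-damage analysis report ranks first for this query.
```

In the full framework, the query nouns come from the preprocessing function, and the boost weights come from the deep-learning word recommendation model rather than being fixed constants.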
               
               
When using Gen2Vec in this way, experts and new employees who operate electric generators can quickly retrieve the expert documents accumulated over 20 years by KEPCO when diagnosing generators. Consequently, operators and new employees can obtain expert knowledge even when no experts are present at the power plant and can easily apply this knowledge in the field.
                  
               
               
In the future, we plan to extend the word and document recommendation functions of Gen2Vec into a personalized recommendation function by developing optimization functions tailored to the search words of individual users. Furthermore, we are developing Gen2Vec with training in multiple languages such as English and Chinese, and we are working to improve user-friendliness by developing a chatbot service and a voice-based search service that use Gen2Vec as the core engine of knowledge services for electric generator operation.
                  
               
             
          
         
            
                  Acknowledgements
               
                  This work was funded by the Korea Electric Power Corporation (KEPCO).
                  
               
             
            
                  
                     References
                  
                     
                        
R. Madhavan, 2018, Natural language processing current applications and future possibilities,
                           Tractica Omdia

 
                      
                     
                        
I. Abraham, Y. Bartal, O. Neiman, Sep 2011, Advances in metric embedding theory, Advances
                           in Mathematics, Vol. 228, pp. 3026-3126

 
                      
                     
                        
Y. Bengio, R. Ducharme, P. Vincent, C. Jauvin, Feb 2003, A neural probabilistic language
                           model, Journal of Machine Learning Research, Vol. 3, pp. 1137-1155

 
                      
                     
                        
                        T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Dec 2013, Distributed representations
                           of words and phrases and their compositionality, Proceedings of the 26th International
                           Conference on Neural Information Processing Systems (NIPS), Australia, Vol. 2, pp.
                           3111-3119

 
                      
                     
                        
                        T. Mikolov, K. Chen, G. Corrado, J. Dean, Jan 2013, Efficient estimation of word representations
                           in vector space, Proceedings of the International Conference on Learning Representations
                           (ICLR), USA

 
                      
                     
                        
                        A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, April 2017, Bag of tricks for efficient
                           text classification, Proceedings of the 15th Conference of the European Chapter of
                           the Association for Computational Linguistics Spain, Vol. 2, pp. 427-431

 
                      
                     
                        
                        S. T. Dumais, 2005, Latent semantic analysis, Annual Review of Information Science
                           and Technology, Vol. 38, pp. 188-230

 
                      
                     
                        
                        J. Pennington, R. Socher, C. D. Manning, Oct 2014, Glove: Global vectors for word
                           representation, Proceedings of the 2014 Conference on Empirical Methods in Natural
                           Language Processing (EMNLP), Qatar, pp. 1532-1543

 
                      
                     
                        
G. Salton, M. J. McGill, 1983, Introduction to Modern Information Retrieval, McGraw-Hill

 
                      
                     
                        
                        J. A. Minarro-Gimenez, O. Marin-Alonso, M. Samwald, 2014, Exploring the application
                           of deep learning techniques on medical text corpora, 2014 European Federation for
                           Medical Informatics and IOS Press, pp. 584-588

 
                      
                     
                        
                        W. Husain, L. Y. Dih, July 2012, A framework of a personalized location-based traveler
                           recommendation system in mobile application, International Journal of Multimedia and
                           Ubiquitous Engineering, Vol. 7, pp. 11-18

 
                      
                     
                        
                        A. Gachet, Software frameworks for developing decision support systems – A new component
                           in the classification of DSS development tools, Journal of Decision Systems, Vol.
                           12, No. 3, pp. 271-281

 
                      
                     
                        
                        J. Bedi, D. Toshniwal, Jan 2019, Deep learning framework to forecast electricity demand,
                           Applied Energy, Vol. 238, pp. 1312-1326

 
                      
                     
                        
A. W. Dowling, R. Kumar, V. M. Zavala, Jan 2017, A multiscale optimization framework
                           for electricity market participation, Applied Energy, Vol. 190, pp. 147-164

 
                      
                     
                        
M. B. Pinheiro, C. A. Davis Jr, Jun 2018, ThemeRise: A theme-oriented framework for
                           volunteered geographic information applications, Journal of Open Geospatial Data,
                           Software and Standards, Vol. 1, pp. 3-9

 
                      
                     
                        
C. R. Jack Jr, D. A. Bennett, K. Blennow, M. C. Carrillo, B. Dunn, S. B. Haeberlein,
                           D. M. Holtzman, W. Jagust, F. Jessen, J. Karlawish, E. Liu, J. L. Molinuevo, T. Montine,
                           C. Phelps, K. P. Rankin, C. C. Rowe, P. Scheltens, E. Siemers, H. M. Snyder, R. Sperling,
                           2018, NIA-AA research framework: Toward a biological definition of Alzheimer's disease,
                           Alzheimer's & Dementia, Vol. 14, pp. 535-562

 
                      
                     
                        
                        E. L. Park, S. Cho, 2014, KoNLPy: Korean natural language processing in Python, Proceedings
                           of the 26th Annual Conference on Human and Cognitive Language Technology, pp. 133-136

 
                      
                     
                        
                        R. Rehurek, P. Sojka, 2011, Gensim-Statistical semantics in Python, The 4th European
                           Meeting on Python in Science

 
                      
                   
                
             
About the Authors
             
             
             
            
He received his B.S. degree from the Department of Information and Communication Engineering,
               Chungnam National University (CNU), Korea, in 2015 and his M.E. degree in Information
               and Communication Network Technology from the University of Science and Technology (UST),
               Korea, in 2018, and he currently works for the Korea Electric Power Research Institute (KEPRI).
            
His current research interests include deep learning and natural language processing (NLP).
            
He is currently working as a principal researcher at the KEPCO Research Institute (KEPRI),
               Daejeon, Korea.
            
He received his M.S. in computer science from Korea University.
            
He received his M.S. degree in Information and Mechanical Engineering from the Gwangju Institute
               of Science and Technology (GIST).
            
His major is computer science in general and, specifically, augmented reality and computer
               vision.
            
            
He received his Ph.D. degree in computer engineering from Hanbat University.
            His research interests are AI, VR/AR, and drone applications.