2.1 Literature review on big data architecture design
                  To prevent railway accidents and industrial safety accidents, prior studies have
                     addressed failure analysis and reliability evaluation methods such as FTA (Fault
                     Tree Analysis), big data and artificial intelligence technologies, and big data
                     architecture design.
                  
                  We now introduce prior research and related technologies. First, a previous study
                     on big data architecture for railways designed a railway safety platform architecture
                     divided into five parts: collection, API gateway, preprocessing, storage, and
                     analysis [3].
                  
                  Additionally, a study of service layer-centered big data architecture for predictive
                     maintenance of railroad points divided the architecture into four layers: the
                     storage layer, processing layer, service layer, and ingestion layer.
                  
                  Furthermore, the main functions of the architecture are to collect data from external
                     sources, process and retrieve the collected data, and perform data aggregation, modeling,
                     analysis, and visualization. The architecture was designed based on Hadoop and Apache
                     NiFi [4].

                  FTA is a quantitative failure analysis and reliability evaluation method that uses
                     a fault tree (FT), which logically expresses the relationships among the causes of
                     a system failure, to find vulnerable parts and improve system reliability [5].
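
The quantitative side of FTA described above can be illustrated with a minimal sketch. It assumes that basic events are statistically independent, so an AND gate multiplies input probabilities and an OR gate combines them as one minus the product of the complements. The event names and probability values are hypothetical and are not taken from the cited studies.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class BasicEvent:
    name: str
    probability: float  # assumed failure probability (illustrative value only)

    def failure_probability(self) -> float:
        return self.probability


@dataclass
class Gate:
    name: str
    kind: str    # "AND" or "OR"
    inputs: list  # child BasicEvent or Gate nodes

    def failure_probability(self) -> float:
        probs = [child.failure_probability() for child in self.inputs]
        if self.kind == "AND":
            # All inputs must fail (independence assumed).
            return prod(probs)
        if self.kind == "OR":
            # At least one input fails (independence assumed).
            return 1.0 - prod(1.0 - p for p in probs)
        raise ValueError(f"unknown gate kind: {self.kind}")


# Hypothetical fault tree: a point machine fails if its motor fails,
# or if both the primary and the backup power supply are lost.
power_loss = Gate("power loss", "AND", [
    BasicEvent("primary power loss", 0.05),
    BasicEvent("backup power loss", 0.10),
])
top_event = Gate("point machine failure", "OR", [
    BasicEvent("motor failure", 0.01),
    power_loss,
])

print(f"{top_event.failure_probability():.5f}")  # prints 0.01495
```

Comparing the top-event probability before and after changing one basic event is one simple way to see which part of the tree contributes most to the failure.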
                  
                  FTA is a well-founded failure and defect analysis method. If FTA is used as a
                     reference standard for artificial intelligence and big data analysis, the reliability
                     of defect analysis can be increased.

                  MQTT (Message Queuing Telemetry Transport) is a lightweight messaging protocol for
                     large-scale IoT communication among small devices; it was standardized as ISO/IEC 20922
                     in 2016.
                  
                  MQTT has the following three technical characteristics [6].
                  
                  ⓐ A client connects to the MQTT broker over a TCP/IP socket and remains connected
                     until it explicitly disconnects or the connection is dropped due to network
                     conditions.
                  
                  ⓑ MQTT's publish-subscribe messaging pattern can communicate only through a broker.
                     When a message is published to a topic, the broker delivers it to the clients
                     subscribed to that topic, so both one-to-one and one-to-many communication are
                     possible.
                  
                  ⓒ QoS has three levels: level 0 delivers a message at most once, level 1 delivers
                     it at least once, and level 2 delivers it exactly once.
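
The broker-mediated routing in ⓑ can be sketched with a small in-memory model. This is not an MQTT client or broker implementation: the class `InMemoryBroker`, the topic names, and the payloads are hypothetical, and the `QoS` value is carried only as a label, since the sketch has no network and therefore no retransmission or handshake logic.

```python
from collections import defaultdict
from enum import IntEnum


class QoS(IntEnum):
    """The three MQTT QoS levels, used here as labels only."""
    AT_MOST_ONCE = 0
    AT_LEAST_ONCE = 1
    EXACTLY_ONCE = 2


class InMemoryBroker:
    """Toy broker: publishers and subscribers never talk to each other directly."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload, qos=QoS.AT_MOST_ONCE) -> int:
        """Deliver the payload to every subscriber of the topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload, qos)
        return len(callbacks)


broker = InMemoryBroker()
monitor_inbox, storage_inbox = [], []

# Two subscribers on the same (hypothetical) topic: one-to-many delivery.
broker.subscribe("railway/track/vibration", lambda t, p, q: monitor_inbox.append(p))
broker.subscribe("railway/track/vibration", lambda t, p, q: storage_inbox.append(p))

delivered = broker.publish("railway/track/vibration", {"rms": 0.82}, QoS.AT_LEAST_ONCE)
unrouted = broker.publish("railway/track/temperature", {"celsius": 31.0})

print(delivered, unrouted)  # prints 2 0
```

With a single subscriber on a topic the same mechanism gives one-to-one communication, and a message published to a topic with no subscribers is simply not delivered to anyone.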
                  
                  Kafka was developed by LinkedIn and is a distributed data streaming platform based
                     on message queues that can publish, subscribe to, store, and process data streams
                     in real time. Unlike conventional messaging systems, Kafka manages messages
                     as event queues in the file system rather than in memory [7].
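
The idea of keeping messages in the file system rather than in memory can be sketched as an append-only log file that consumers read by position. The following is only an illustration of that idea and not Kafka's API or on-disk format: the class `FileLog`, the file name, and the sensor records are hypothetical, and a real deployment adds partitioning, replication, and retention that are omitted here.

```python
import json
import os
import tempfile


class FileLog:
    """Append-only log stored as one JSON record per line (illustrative only)."""

    def __init__(self, path: str) -> None:
        self.path = path
        open(path, "a", encoding="utf-8").close()  # create the file if missing

    def append(self, record: dict) -> int:
        """Write a record at the end of the file and return its offset."""
        offset = self._length()
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return offset

    def read_from(self, offset: int) -> list:
        """Return all records at or after the given offset."""
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f][offset:]

    def _length(self) -> int:
        with open(self.path, encoding="utf-8") as f:
            return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp_dir:
    log = FileLog(os.path.join(tmp_dir, "sensor-events.log"))
    offsets = [log.append({"sensor": "axle-1", "temp_c": t}) for t in (41.5, 42.0, 55.3)]

    # A storage consumer reads everything and remembers how far it got.
    storage_offset = 0
    storage_batch = log.read_from(storage_offset)
    storage_offset += len(storage_batch)

    # A second consumer starts from a later offset; reading does not remove records.
    alert_batch = log.read_from(2)

    # New records appended later are picked up from the stored offset.
    log.append({"sensor": "axle-1", "temp_c": 56.1})
    storage_new = log.read_from(storage_offset)
```

Because records stay in the file after being read, each consumer can keep its own position and re-read older records when needed.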
                  
                  MongoDB differs from relational databases such as Oracle and MySQL, which store
                     data in tables with a row-centered storage structure and are accessed using SQL.
                     MongoDB is a NoSQL database with a document-centered storage structure, in which
                     data is stored as key-value pairs in Binary JSON (BSON) format. It consists of
                     collections, which correspond to tables; documents, which correspond to rows; and
                     fields, which correspond to columns [8].

                  In a study of big data architecture for an IoT-based smart manufacturing system,
                     MQTT and Kafka are combined to collect, relay, and store sensor data, and MongoDB,
                     relational databases, and Elasticsearch are adopted as consumers [9].

                  YOLO (You Only Look Once) is a deep learning-based network for real-time object
                     detection. YOLO is a one-stage detection algorithm that performs classification
                     and localization simultaneously, and it can detect objects faster than two-stage
                     detection algorithms in the R-CNN family, such as Fast R-CNN and SPPNet [10].
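
To make the one-stage idea concrete, the sketch below decodes a single prediction vector that holds the box coordinates, an objectness score, and the class scores together, so the location and the class are read from one network output. The layout `[x, y, w, h, objectness, class scores...]` is a simplified assumption for illustration; actual YOLO versions, including YOLOv5, use anchors, multiple scales, and non-maximum suppression, which are omitted. The function `decode_cell`, the class names, and the numbers are hypothetical.

```python
def decode_cell(prediction, class_names, conf_threshold=0.5):
    """Decode one prediction vector into a labeled box, or None if below threshold.

    Assumed layout: [x, y, w, h, objectness, score_class_0, score_class_1, ...]
    """
    x, y, w, h, objectness, *class_scores = prediction
    if len(class_scores) != len(class_names):
        raise ValueError("one score per class name is required")

    # Classification: pick the class with the highest score.
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    score = objectness * class_scores[best]
    if score < conf_threshold:
        return None

    # Localization comes from the same vector, with no separate proposal stage.
    return {"label": class_names[best], "score": score, "box": (x, y, w, h)}


CLASS_NAMES = ["normal_track", "obstacle"]  # hypothetical labels

detection = decode_cell([0.5, 0.5, 0.2, 0.3, 0.9, 0.1, 0.8], CLASS_NAMES)
rejected = decode_cell([0.5, 0.5, 0.2, 0.3, 0.2, 0.1, 0.8], CLASS_NAMES)
```

A two-stage detector would instead first propose candidate regions and then classify each region in a second pass, which is the extra step that one-stage detectors avoid.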
                  
                  In this study, a railway safety platform application model was presented using the
                     IoT-based big data platform architecture. Additionally, using YOLOv5, an object detection
                     algorithm, an experiment was conducted on how image data on a railroad track can be
                     used in anomaly detection for safe railway operation, and the results of the experiment
                     are presented.
                  
                  
                        
                        
Fig. 1. Big data platform architecture design process
                      
                
               
2.2 Research Method
                  Referring to previous studies, this study applied the following research method to
                     design the railway safety big data platform architecture. The research process is
                     shown in Fig. 1.
                  
                  First, the essential elements for big data platform design were defined in five areas:
                     ① data collection area, ② transmission area, ③ storage area, ④ monitoring and control
                     area, and ⑤ artificial intelligence analysis area. 
                  
                  Second, we investigated technological details and application cases to analyze whether
                     the technologies in the five areas defined above are appropriate for IoT device communication
                     and sensor data storage and analysis. 
                  
                  Third, we combined the technologies of each area to design the optimal railway safety
                     big data platform architecture. 
                  
                  Lastly, based on the designed railway safety big data platform architecture, we presented
                     an application model that identifies and classifies railroad track status images collected
                     from trains through a deep learning algorithm.