
The Open Automation and Control Systems Journal, 2014, 6
Open Access

Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform

Zhao Xiaoyong* and Yang Chunrong
Mathematics and Computer Science Institute, XinYu University, JiangXi, China

Abstract: This paper establishes a mathematical model and algorithm for massive data processing in cloud computing, and introduces the Hadoop distributed computing method into the database management system to realize automatic partitioning of database data and the setup of master and slave nodes. The master-slave node distributed algorithm is implemented in MATLAB to provide distributed data computing, and numerical simulation is used to compute the data processing speed, transmission rate, capacity, and other system parameters. A comparison between the Hadoop distributed processing algorithm and two traditional data processing algorithms shows that the Hadoop algorithm processes data faster than the general algorithms, and that its information storage capacity and information transmission speed can satisfy the needs of high-volume data processing.

Keywords: Hadoop; Cloud computing; Massive data; Large capacity; Distributed computing; Master slave node

1. INTRODUCTION

With the development of computer, Internet, and communication technology, communication systems must handle very large volumes of data. When processing massive data, a server consumes a large amount of computing resources, and many traditional computing platforms cannot complete the task [1,2]. Drawing on the distributed computing capability of Hadoop, this paper designs a cloud computing platform for high-speed processing of massive data using cloud computing and cloud storage technology [3].
At the same time, by combining automatic master-slave node assignment with an automatically partitioned database, the platform achieves high data capacity and high transmission speed, which provides a new scheme for massive data processing algorithms and data communication technology.

2. OVERVIEW OF HADOOP CLOUD PLATFORM DATABASE MASSIVE DATA PROCESSING AND MINING METHODS

The evolution of the personalized Internet generates massive data, and the traditional single super server has gradually fallen short when faced with it, so processing massive data has become a thorny problem [4]. The open-source Hadoop cloud platform developed by the Apache Foundation opens up many possibilities for processing huge amounts of data. This paper uses the Hadoop data processing platform, combined with a distributed data processing algorithm, to establish a massive database data processing platform; its main structure is shown in Fig. (1).

Fig. (1) shows the schematic diagram of Hadoop massive data processing. As the figure shows, Hadoop processing of massive data mainly consists of two techniques, distributed computing algorithm design and master-slave node distribution, which together achieve high-speed processing of massive database data.

[Fig. (1). Hadoop massive data processing. Diagram blocks: Massive database; Hadoop database processing algorithm; Distributed computing method; Master-slave nodes distribution; Completing data processing.]

Bentham Open
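The automatic database partitioning and master-slave assignment described above can be illustrated with a minimal Python sketch. It is not the paper's MATLAB implementation: the function names `assign_partition` and `partition_records`, the use of a CRC32 hash, and the `(key, value)` record format are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


def assign_partition(key: str, num_slaves: int) -> int:
    """Map a record key to a slave index using a stable hash (CRC32).

    Assumption: hash partitioning stands in for whatever rule the
    platform actually uses to split the database.
    """
    return zlib.crc32(key.encode("utf-8")) % num_slaves


def partition_records(records, num_slaves):
    """Master-side step: group (key, value) records by target slave index."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[assign_partition(key, num_slaves)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    # Hypothetical records; a real deployment would read database rows.
    sample = [(f"row-{i}", i) for i in range(10)]
    for slave_id, chunk in sorted(partition_records(sample, 3).items()):
        print(f"slave {slave_id}: {len(chunk)} records")
```

Because the hash is deterministic, the same key always goes to the same slave node, so the master can route later lookups without storing a per-record table.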

[Fig. (3). The schematic diagram of Hadoop master node design. Diagram labels: Browser; Hadoop utility tool; Secondary; JVM; JobTracker; Win operating system; Server.]

[Fig. (4). The schematic diagram of Hadoop slave node design. Diagram labels: Secondary; JobTracker; JVM; Linux operating system; Server.]

[Fig. (5). The schematic diagram of Hadoop and HBase running.]

distributed file system HDFS and distributed database system HBase. As shown in Fig. (3), the master-slave node design is the main mode by which Hadoop implements distributed computing and data storage. Each database cluster has one master node and multiple slave nodes, and the daemons running on the master node are NameNode, Secondary NameNode, and JobTracker. As shown in Fig. (4), the Hadoop massive data cloud processing system contains many slave nodes, which carry out the distributed data processing [8]. The daemons running on each slave node are DataNode and TaskTracker. Fig. (5) shows Hadoop and HBase running simultaneously, which allows the HBase information and data partitions to be laid out. This paper uses browser Web logs as the massive data to be processed; the opening of the master node is shown in Fig. (6).

[Fig. (6). Master node opening.]

Fig. (6) shows the master node opening, which is the key to implementing Hadoop massive database data processing [9,10]. To verify the validity and reliability of the system, this paper runs a simulation and obtains the performance parameters shown in Table 1.

Table 1 lists the simulation parameters of the Hadoop cloud massive data processing platform [11]. The table shows that the Hadoop data processing reaches a high level of performance: the minimum calculated data rate is 0.001 Mb/s, the minimum information capacity is 100 Gb, and the speed is 1800 frames/s, all of which meet the design requirements. To compare how different algorithms handle massive data, this paper compares two traditional algorithms with the Hadoop cloud processing algorithm, with the amount of data ranging from tens of megabytes initially to several hundred megabytes [12, 13].

As shown in Fig. (7), the MATLAB numerical simulation indicates that the computing speed of the Hadoop cloud processing platform is significantly higher than that of the two traditional algorithms. The advantage is most pronounced once the data volume reaches the 10^5 level, where the platform cuts data processing time severalfold, making it a highly efficient data processing algorithm.

[Fig. (7). The massive data processing computing time simulation curve.]

Table 1. The massive data processing cloud platform simulation parameters.
Main simulation parameter          Value
Calculated data, min (Mb/s)        0.001
Calculated data, max (Mb/s)        0.01
Information capacity, min (Gb)     100
Information capacity, max (Gb)
Speed (frames/s)                   1800

SUMMARY

Massive image processing places very high demands on the processor and memory, so it requires a high-performance processor and large-capacity memory, because
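The browser Web-log workload described in Section 2 can be sketched in a MapReduce style. The following Python example is an illustration only, not the authors' code and not the Hadoop API: the log format (`ip url status`), the function names, and the idea of one input split per slave node are assumptions. It runs in a single process to show the data flow that the TaskTracker daemons would carry out in parallel.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Emit (url, 1) for every well-formed line.

    Assumed line format: '<ip> <url> <status>'; other lines are skipped.
    """
    for line in log_lines:
        fields = line.split()
        if len(fields) == 3:
            yield fields[1], 1


def shuffle(pairs):
    """Group intermediate values by key, as the shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each URL."""
    return {key: sum(values) for key, values in groups.items()}


def count_hits(splits):
    """Run map over each split (one per hypothetical slave), then reduce."""
    pairs = []
    for split in splits:
        pairs.extend(map_phase(split))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical log lines standing in for real browser Web logs.
    splits = [
        ["10.0.0.1 /index 200", "10.0.0.2 /about 404"],
        ["10.0.0.3 /index 200"],
    ]
    print(count_hits(splits))
```

Each split is mapped independently, which is the property that lets the work be spread across slave nodes; only the shuffle and reduce steps need to see results from more than one split.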

Send Orders for Reprints to reprints@benthamscience.ae 766 The Open Electrical & Electronic Engineering Journal, 2014, 8, 766-771 Open Access Research on Application of Neural Network in Computer Network

Send Orders for Reprints to reprints@benthamscience.ae 1582 The Open Cybernetics & Systemics Journal, 2015, 9, 1582-1586 Open Access The Construction of Seismic and Geological Studies' Cloud Platform Using

Send Orders for Reprints to reprints@benthamscience.ae 522 The Open Electrical & Electronic Engineering Journal, 2014, 8, 522-526 Open Access Regulation of Voltage and Reactive Power Based on the Control

A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL *Hung-Ming Chen, Chuan-Chien Hou, and Tsung-Hsi Lin Department of Construction Engineering National Taiwan University

Send Orders for Reprints to reprints@benthamscience.ae 50 The Open Cybernetics & Systemics Journal, 2015, 9, 50-54 Open Access Research and Application of Redundant Data Deleting Algorithm Based on the

, pp.265-274 http://dx.doi.org/10.14257/ijdta.2016.9.6.27 Research on Database Remote Disaster Recovery and Backup Technology Based on Multi Point and Multi Hop Guoyong Lin and Fan Huang, Department of

3rd International Conference on Science and Social Research (ICSSR 2014) Exploration on Security System Structure of Smart Campus Based on Cloud Computing Wei Zhou Information Center, Shanghai University

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Qian Wu, Yahui Wang, Long Zhang and Li Shen Abstract Building electrical system fault diagnosis is the

Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2015, 7, 479-484 479 Open Access Research and Design for Mobile Terminal-Based on Smart Home System

S Orders for Reprints to reprints@benthamscience.net The Open Automation and Control Systems Journal, 2013, 5, 187-193 187 Research of Data Mining Algorithm Based on Cloud Database Open Access Tianxiang

Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 839-843 839 Open Access Research on the Development and Preliminary Application of 12396

Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2015, 7, 353-357 353 Open Access Design of a Python-based Wireless Network Optimization and Testing

Send Orders for Reprints to reprints@benthamscience.ae 384 The Open Cybernetics & Systemics Journal, 2015, 9, 384-389 Open Access Study on the Evaluation for the Knowledge Sharing Efficiency of the Knowledge

Send Orders for Reprints to reprints@benthamscience.ae 2244 The Open Automation and Control Systems Journal, 2015, 7, 2244-2252 Open Access Research of Massive Spatiotemporal Data Mining Technology Based

Journal of Information & Computational Science 7: 3 (2010) 759 765 Available at http://www.joics.com A Scheme for Implementing Load Balancing of Web Server Jianwu Wu School of Politics and Law and Public

Send Orders for Reprints to reprints@benthamscience.ae The Open Mechanical Engineering Journal, 2015, 9, 213-218 213 Open Access Mechanical Analysis of Crossbeam in a Gantry Machine Tool and its Deformation

2013 IEEE International Conference on Big Data Evaluating Task Scheduling in Hadoop-based Cloud Systems Shengyuan Liu, Jungang Xu College of Computer and Control Engineering University of Chinese Academy

Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

- Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 shinji10343@hotmail.com, kwang@cs.nctu.edu.tw Abstract One of the most important issues

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A COMPREHENSIVE VIEW OF HADOOP ER. AMRINDER KAUR Assistant Professor, Department

Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 105-109 105 Open Access Study on One Map Organizational Model for Land Resources Data Used in Supervision

Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 204, 6(7):680-684 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Development of cloud computing system based on wireless

ANALYSING THE FEATURES OF JAVA AND MAP/REDUCE ON HADOOP Livjeet Kaur Research Student, Department of Computer Science, Punjabi University, Patiala, India Abstract In the present study, we have compared

Send Orders for Reprints to reprints@benthamscience.ae The Open Construction and Building Technology Journal, 2014, 8, 455-462 455 Open Access Numerical Analysis on Mutual Influences in Urban Subway Double-Hole

Cloud Computing based on the Hadoop Platform Harshita Pandey 1 UG, Department of Information Technology RKGITW, Ghaziabad ABSTRACT In the recent years,cloud computing has come forth as the new IT paradigm.