Predicting Cryptocurrencies Using Hybrid Machine Learning

In this paper, we attempt to predict the prices of twenty cryptocurrencies, taking into consideration various parameters that affect the trading and investment market. At the first level, the study aims to construct a forecasting model that predicts the future values of cryptocurrencies and then to deploy a live decision-making model for trading. The project follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology to structure the modelling process. The data source is the Academia Datathon 2018. The dataset consists of attributes related to each coin, its price and time, recorded at five-minute intervals during 2018. At the second level, the best model from level one, selected by comparing accuracy and performance, is used to build an artificial intelligence bot for trading and investment decisions. Focusing on twenty major cryptocurrencies, each with a large market size and price, this study forecasts prices using time-series features such as the minimum, maximum and mean price values.

Business Understanding

Bitcoin, the first decentralized cryptocurrency, was originally conceived as a global currency and payment system that works reliably without requiring a trusted third party. The main underlying technology that powers Bitcoin, and almost all other cryptocurrencies today, is known as the blockchain (Lewis, 2015). A blockchain requires a peer-to-peer network; ADEPT is one such network, developed by IBM in partnership with Samsung (Panikkar, Nair, Brody, & Pureswaran, 2015).

The emergence of payments using cryptocurrencies has accelerated the positive impact of this technology on human life (Bakar & Rosbi, 2017). Individuals and businesses have adopted this new system to transact money quickly and efficiently over the internet without supplying credit card or banking information or using a traditional payment system (Ahamad, Nair, & Varghese, 2013). In January 2018, the total market capitalization of cryptocurrencies was around USD 800 billion, as reported in the global chart of total market capitalization.

Cryptocurrency has come to be seen as an investment tool, attracting interest not only from investors in general but also from private investors and brokers. It is therefore necessary to predict (i.e. forecast or estimate) the future value of cryptocurrency prices to support trading and investment decisions. The objective of Level-I was to construct a forecasting model to predict the future values of cryptocurrencies. This is a financial time-series prediction problem that integrates knowledge from several sources: cryptocurrencies, quantitative finance and machine learning. The objective of Level-II was to deploy the model, i.e. to apply the "live" model to support trading and investment decisions. The methodology shown in Figure 1 was employed to forecast the values of 20 major cryptocurrencies.

Data Understanding

A dataset was provided for constructing the cryptocurrency prediction model. It contains the historical data of 1,500 different cryptocurrencies, with the price of each cryptocurrency recorded every five minutes. In Level-I, the task was to predict the future price of 20 major cryptocurrencies. Appendix I shows the code snippet for checking data quality problems.
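Appendix I is not reproduced here. As a rough illustration of the kind of checks involved, the sketch below scans one coin's five-minute price series for duplicate timestamps, missing or non-positive prices, and gaps in the five-minute grid. The record layout (a list of `(unix_timestamp, price)` tuples, with `None` for a missing price) and the function name `check_quality` are assumptions for illustration, not the datathon's actual file format.

```python
from collections import Counter

STEP_SECONDS = 5 * 60  # prices are recorded every five minutes


def check_quality(records):
    """Return simple data-quality problems for one coin.

    `records` is assumed to be a list of (unix_timestamp, price) tuples,
    where price is None when the value is missing.
    """
    counts = Counter(ts for ts, _ in records)
    duplicates = sorted(ts for ts, n in counts.items() if n > 1)

    missing = [ts for ts, price in records if price is None]
    non_positive = [ts for ts, price in records
                    if price is not None and price <= 0]

    # Gaps: consecutive distinct timestamps more than one step apart.
    unique_ts = sorted(counts)
    gaps = [(a, b) for a, b in zip(unique_ts, unique_ts[1:])
            if b - a > STEP_SECONDS]

    return {
        "duplicates": duplicates,
        "missing": missing,
        "non_positive": non_positive,
        "gaps": gaps,
    }


if __name__ == "__main__":
    sample = [(0, 100.0), (300, None), (300, 101.0), (900, -1.0)]
    print(check_quality(sample))
```

In the sample, timestamp 300 is both duplicated and missing a price, the price at 900 is negative, and the interval from 300 to 900 skips one five-minute step.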

Data Preparation and Transformation

Here, the price of each cryptocurrency was transformed into three descriptive statistics, namely the mean, maximum, and minimum. To transform the data, the sliding window concept was used with a window size of 12. Because prices are recorded every five minutes, a window of 12 observations spans one hour, so the same window size also converts the 5-minute timestamps into 1-hour timestamps. The resulting transformed file is shown in Table 1. Appendix II shows the code snippet for transforming the cryptocurrency dataset with the sliding window concept and generating the mean, maximum, and minimum values for each window.
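Appendix II is not reproduced here. The sketch below is one way to compute the window statistics described above in plain Python; `window_features` and its `step` parameter are illustrative names. With `step` equal to the window size of 12, the windows do not overlap and each output row summarises one hour; setting `step=1` gives a true sliding window. Which variant the authors used is an assumption.

```python
from statistics import mean

WINDOW = 12  # 12 x 5-minute prices = 1 hour


def window_features(prices, window=WINDOW, step=WINDOW):
    """Summarise a 5-minute price series as (mean, max, min) per window.

    With step == window the windows do not overlap, so each row covers one
    hour; with step == 1 the window slides one observation at a time.
    Trailing prices that do not fill a whole window are dropped.
    """
    rows = []
    for start in range(0, len(prices) - window + 1, step):
        chunk = prices[start:start + window]
        rows.append((mean(chunk), max(chunk), min(chunk)))
    return rows
```

For example, 24 five-minute prices yield two hourly rows with the default settings, while `step=1` yields 13 overlapping rows.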

Once the cryptocurrency files had been transformed into useful and informative features, these feature files were fed as input to the prediction algorithms, which learned the cryptocurrency patterns from the historical time-series data and constructed forecasting models to predict the future values of the 20 major cryptocurrencies. Three different prediction algorithms were trained to learn forecasting rules and to determine which one best suits our corpus: Support Vector Machine for Regression (SVMR), Linear Regression (LR), and Random Forest (RF).
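The exact inputs and target used for training are not stated. One common way to frame this as a regression problem, shown here as an assumption, is to use the (mean, max, min) features of the previous `n_lags` hours as inputs and the next hour's mean price as the target. The helper names `make_supervised` and `chronological_split` are hypothetical.

```python
def make_supervised(hourly_rows, n_lags=1):
    """Turn hourly (mean, max, min) rows into (X, y) for regression.

    Each input vector concatenates the features of the previous `n_lags`
    hours; the target is the mean price of the following hour (an assumed
    choice of target).
    """
    X, y = [], []
    for t in range(n_lags, len(hourly_rows)):
        features = []
        for lag_row in hourly_rows[t - n_lags:t]:
            features.extend(lag_row)
        X.append(features)
        y.append(hourly_rows[t][0])  # next hour's mean price
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so the test set lies strictly after training."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]
```

The resulting `X` and `y` can be passed to any regressor, for example scikit-learn's `LinearRegression`, `SVR` or `RandomForestRegressor`. Splitting chronologically rather than randomly keeps future observations out of the training data.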

This section presents the results of all 60 analyses (20 cryptocurrency files × 3 prediction algorithms). Five performance metrics are considered (MAPE, DS, CVMAPE, CE and the combined score, CS): Table 2 reports MAPE, DS and CE for each analysis, and Table 3 reports the combined results over all 20 predicted models.
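The formulas behind these metrics are not given in this section. The sketch below uses common definitions, which are assumptions about how the datathon computed them: MAPE as a fraction rather than a percentage (consistent with magnitudes such as 0.0204 in Table 2), DS as the percentage of steps where the actual and predicted changes have the same sign, and CVMAPE as the sample standard deviation of the per-coin MAPE values divided by their mean. With that CVMAPE definition, the Linear Regression MAPE column of Table 2 gives approximately 1.05943, the R value in Table 3.

```python
from statistics import mean, stdev


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.02 = 2%)."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def directional_symmetry(actual, predicted):
    """Percentage of steps where actual and predicted moves agree in sign.

    A step counts as correct when (a_t - a_{t-1}) * (p_t - p_{t-1}) >= 0.
    """
    hits = [
        (actual[t] - actual[t - 1]) * (predicted[t] - predicted[t - 1]) >= 0
        for t in range(1, len(actual))
    ]
    return 100.0 * sum(hits) / len(hits)


def cv_mape(mape_values):
    """Coefficient of variation of per-coin MAPE (sample std / mean)."""
    return stdev(mape_values) / mean(mape_values)
```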

Table 2- Experimental results (MAPE, DS and CE) for predicting the 20 major cryptocurrencies

      Linear Regression          SVM Regression             Random Forest
CID   MAPE    DS       CE        MAPE    DS       CE        MAPE    DS       CE
1442  0.0204  93.0645  10822.83  0.0151  97.6613  99.83333  0.0401  87.5806  10.16667
1443  0.1637  87.1774  3447.333  0.0214  97.5806  104.2222  0.0261  95.4839  12
1443  0.0167  94.0323  21903.5   0.0361  96.9355  186.8333  0.0792  74.4355  11.66667
1444  0.2287  88.871   20230.33  0.0261  98.7903  149       0.0652  94.7581  10
1445  0.0302  95.3226  9960.5    0.0532  97.9032  136.8333  0.0717  95.4032  10.66667
1446  0.036   97.0968  3047      0.0541  98.7903  131.6111  0.1039  96.9355  10.66667
1447  0.5237  82.6613  5311.127  0.0164  98.0645  188.6667  0.0248  96.4516  11
1448  0.5155  80.5645  20240.33  0.0204  97.0968  169.6667  0.0385  94.7581  11
1449  0.4043  89.7581  20519.17  0.0334  97.6613  187.1667  0.0378  91.6129  11.66667
1450  0.0828  94.1935  18514     0.0365  98.7903  144.1667  0.062   95.7258  10.66667
1451  0.3644  92.3387  9103      0.0348  99.3548  213.1667  0.0353  97.7419  11.66667
1452  0.0446  92.4194  18152.33  0.0184  97.2581  141.8333  0.0483  91.2903  5.5
1453  0.0575  89.6774  18282.33  0.0205  96.5323  124.8333  0.0489  92.8226  7
1454  0.4911  94.5161  5918.667  0.0459  99.1129  168.6667  0.1302  73.3065  7.333333
1455  0.2037  94.9681  3514.333  0.0577  99.1214  189.1667  0.0792  99.1129  9.833333
1456  0.8155  87.6613  22892.22  0.0197  98.7097  130.3333  0.027   95.9677  6.833333
1457  0.1853  93.2258  6648      0.0272  96.7742  176.1667  0.029   92.7419  9.333333
1458  0.0677  92.8226  6724.667  0.0226  97.9032  110.6667  0.0373  94.5968  9.666667
1459  0.0386  78.5484  34454     0.0019  85.8871  54.66667  0.005   46.129   6.833333
1460  0.0193  92.0968  5519.083  0.0311  96.8548  53.89815  0.4796  28.5484  8.5

Table 3- Average prediction results and combined score over the 20 major cryptocurrencies
(R: CVMAPE; M: mean MAPE; D: mean DS; U: mean CE; Z: combined score)

Algorithm          R                    M         D          U         Z
Linear Regression  1.0594308449457146   0.215485  90.55083   13260.24  -13171
SVM Regression     0.49118014318682995  0.029625  97.364621  143.07    -46.2262
Random Forest      1.3628232655093842   0.073455  86.77016   9.60      75.73388
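The prediction scoring formula itself is not stated here. The Table 3 values are, however, consistent with Z = D - R - M - U, i.e. the mean DS minus the CVMAPE, the mean MAPE and the mean CE, each with a weight of 1. The sketch below recomputes Z under that inferred and unconfirmed formula and selects the algorithm with the highest Z; `combined_score` is a hypothetical name.

```python
def combined_score(r, m, d, u):
    """Combined score as inferred from Table 3 (assumed unit weights)."""
    return d - r - m - u


TABLE_3 = {
    # algorithm: (R = CVMAPE, M = mean MAPE, D = mean DS, U = mean CE)
    "Linear Regression": (1.0594308449457146, 0.215485, 90.55083, 13260.24),
    "SVM Regression": (0.49118014318682995, 0.029625, 97.364621, 143.07),
    "Random Forest": (1.3628232655093842, 0.073455, 86.77016, 9.60),
}

scores = {name: combined_score(*vals) for name, vals in TABLE_3.items()}
best = max(scores, key=scores.get)  # higher Z is better

for name, z in scores.items():
    print(f"{name:18s} Z = {z:.4f}")
print("Best:", best)
```

Under this assumption the published Z values are reproduced to the precision shown (about 75.73388 for Random Forest, -46.2262 for SVM Regression and -13171 for Linear Regression), and Random Forest has the highest score.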

Deployment

The modelling approach for forecasting prices in the trading market was described in the previous section. After training and testing, as shown in Figure 1 (see Section 1), the proposed model is ready to use. This section describes how the best model from the previous section, i.e. the one with the highest performance and accuracy, is applied. This phase contains four main layers:

Layer 1 Crawling Agent (CA): CA is responsible for crawling cryptocurrency market data from the Internet. Python with the Scrapy package is a suitable choice for building the crawling agent. Scrapy is an open-source, collaborative framework for extracting data from websites, and its advantage is that it does the job in a fast, simple and extensible way.

Layer 2 Transform Agent (TA): TA runs the data transformation and cleaning process using the sliding window technique. At the same time, it checks for missing values in the data extracted by CA.
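How TA treats missing values is not specified. A minimal sketch, assuming that missing five-minute observations are filled by carrying the last known price forward before windowing (forward fill is one common choice, not necessarily the authors'), with timestamps assumed to lie on a regular five-minute grid; `forward_fill` is a hypothetical name:

```python
STEP_SECONDS = 5 * 60


def forward_fill(records, step=STEP_SECONDS):
    """Reindex (timestamp, price) records onto a regular time grid.

    Timestamps absent from the input, and prices given as None, take the
    last known price. Missing values before the first known price stay None.
    For duplicate timestamps, the last record wins.
    """
    if not records:
        return []
    by_ts = dict(records)
    start, end = min(by_ts), max(by_ts)
    filled, last = [], None
    for ts in range(start, end + 1, step):
        price = by_ts.get(ts)
        if price is None:
            price = last
        else:
            last = price
        filled.append((ts, price))
    return filled
```

For example, records at 0, 600 (price missing) and 900 seconds become four rows on the five-minute grid, with the two middle rows carrying the price observed at 0.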

Layer 3 (Analyzer Agent): This layer analyzes the data provided by TA. The agent uses Weka's time-series forecasting feature, which can be integrated with other systems through the Weka Python plugin, to produce the time-series predictions. Based on the previous experiment, Random Forest was the best-performing machine learning algorithm on this dataset.

Layer 4 (Decision Maker): This layer makes decisions that alert the investor to buy or sell according to the forecast price series. The first step is to build the input features with a statistical naive method, shifting the previous price forward to serve as the predicted price so that trading can be prepared in advance; the prediction is then made with Random Forest, as shown in the pseudocode of the forecast model (see Figure 3). Next, a similarity measure is used to relate rules to one another in the fuzzy case base. The fuzzy rules (see Figure 4) indicate the highest and lowest price status for trading, which helps the investor decide whether to trade current and future currencies. This is applied to build trading strategies at various levels, producing decisions, suggestions and recommendations that help prevent failures when trading the predicted currencies.
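The fuzzy case base and the rules in Figure 4 are not reproduced here. As a much simpler, crisp stand-in (not the authors' fuzzy method), the sketch below shows the naive shift-by-one series mentioned above, which can serve either as a baseline forecast or as an input feature, and a threshold rule that turns a forecast into a buy, sell or hold signal. The 1% threshold and the function names are illustrative assumptions.

```python
def naive_forecast(prices):
    """Shift the series by one step: the forecast for t is the price at t-1."""
    if not prices:
        return []
    return [None] + list(prices[:-1])


def decide(current_price, forecast_price, threshold=0.01):
    """Map a forecast to a trading signal using a relative-change threshold."""
    change = (forecast_price - current_price) / current_price
    if change > threshold:
        return "buy"
    if change < -threshold:
        return "sell"
    return "hold"
```

With the default threshold, a forecast 3% above the current price gives "buy", 2% below gives "sell", and a 0.5% move gives "hold". A fuzzy version would replace the hard threshold with overlapping membership functions for price levels.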

Table 2 presents the forecasting accuracy measured by the mean absolute percentage error (MAPE) and by directional symmetry (DS), the percentage of occurrences in which a model correctly predicts the direction of value changes. Three models, Linear Regression, SVM Regression and Random Forest, were tested. As expected, Random Forest had an edge over the other two machine learning algorithms. Among the three algorithms, Random Forest was the fastest to execute, whereas Linear Regression performed badly at predicting the mean value of the Bitcoin price. SVM Regression produced the most accurate predictions, with Random Forest second on that criterion. However, Random Forest was the most stable algorithm when handling different types of datasets. SVM Regression also predicted the direction of the Bitcoin price most accurately. Nevertheless, Random Forest emerged as the most suitable algorithm overall for predicting the price of the Bitcoin currency (Z = 75.73388).

(Please refer to Section 6, Results, for the results of this experiment.)

Conclusion

Based on the prediction scoring formulas, we calculated the Combined Score Model (Z) for all three machine learning algorithms (SVM Regression, Random Forest and Linear Regression). In the end, we found that Random Forest has the highest Z value, 75.73388, and was the fastest, as it has the lowest U value (9.60). We therefore conclude that Random Forest is the best machine learning algorithm for predicting the Bitcoin price at 5-minute and hourly intervals.

I can only judge from the crypto-related side, but in general I like how the solution is structured and how every step is explained. Even though I am not a data scientist, I managed to understand a bit thanks to the good explanation. The test will show whether you have managed to do the job, but besides that, great work. Keep up the good work and good luck!

Very nice structure of the paper, with all the results explained in detail and all the required tests supplied. Well done. I can’t find the source code for the prediction; I found only the MATLAB code for data manipulation.

It would be nice to compare your results with a baseline (predicting the previous point, the average of the previous points, or something similar). Also, your idea of using accumulated measures over the past hour is interesting. Did you compare it with simply giving the previous points as features to the regression?