To better understand the meaning of “Time Series Forecast”, let’s split the term into two parts:

A time series is a sequence of observations taken sequentially in time.

A forecast is a prediction about a future event.

Time series forecasting has been in use for quite some time across several industries, where it guides future decisions; in retail, for example, forecasting is important so that raw materials can be procured accordingly. The best-known example is the weather forecast, where the future is predicted based on past patterns and recent changes. These predictions are very important and are usually the first step in solving other problems.

Traditional time series analysis involves decomposing the data into components such as a trend component, a seasonal component, and noise.
Techniques such as ARIMA(p,d,q), moving averages, and autoregression have long been used to analyze time series. More recently, stateful RNNs such as the LSTM have been found to be very effective for time series analysis.
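As a sketch of that decomposition idea, here is a minimal additive decomposition on a made-up monthly series. The series, the 12-point centered window, and the season-grouping trick are all illustrative assumptions, not the exact classical 2×12 moving-average procedure:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + yearly seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(120)
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120))

# Additive decomposition sketch: estimate the trend with a centered moving
# average, then average the detrended values by month to get the seasonal part
trend = series.rolling(window=12, center=True).mean()
detrended = series - trend
seasonal = detrended.groupby(t % 12).transform("mean")
residual = detrended - seasonal
```

After subtracting the estimated trend and seasonal components, the residual should look like the noise term.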

Neural networks are a very comprehensive family of machine learning models and, in recent years, their applications in finance and economics have dramatically increased.

The idea of Recurrent Neural Networks (RNNs) is to use sequential information. In a traditional neural network, we assume that all inputs are independent, but for many tasks this assumption does not hold. RNNs are called recurrent because the same task is performed for each element of a sequence, with the output depending on the previous computations.

An RNN is a special kind of neural network, similar in spirit to a convolutional neural network; the difference is that an RNN can retain its state of information across the elements of a sequence.
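The recurrence can be sketched in a few lines of NumPy: the same weights are reused at every time step, and the hidden state carries information forward. The sizes and random weights below are toy assumptions:

```python
import numpy as np

# Minimal vanilla RNN step: h_t = tanh(W_x·x_t + W_h·h_{t-1} + b)
rng = np.random.default_rng(1)
W_x = rng.normal(size=(4, 3))   # input-to-hidden weights (reused every step)
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden weights (reused every step)
b = np.zeros(4)

def rnn_step(x_t, h_prev):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(4)                  # initial hidden state
sequence = rng.normal(size=(5, 3))
for x_t in sequence:             # one step per element of the sequence
    h = rnn_step(x_t, h)         # the state threads through the whole sequence
```

The final `h` depends on every element of the sequence, which is exactly the "retained state" described above.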

LSTM (Long Short-Term Memory) is a variant of the RNN architecture. Models were difficult to train using traditional RNN architectures, and recent work demonstrates state-of-the-art results using LSTMs. An LSTM cell has five essential components which allow it to model both long-term and short-term data: the cell state, the hidden state, the input gate, the forget gate, and the output gate. One critical advantage of LSTMs is their ability to remember information from long sequences.
The LSTM architecture also takes care of the vanishing gradient problem of the traditional RNN.

The goal of this tutorial is to provide a practical guide to neural networks for forecasting financial time series data.

A gate is a mechanism that allows information to be passed selectively. It consists of a sigmoid layer (sigmoid activation function) and a pointwise multiplication operation.

The output of the sigmoid layer is a number from 0 to 1, which determines how much is let through: zero means “let nothing through” and one means “let everything through”. The cell has three gates that control its state.
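A tiny NumPy example makes this concrete: a gate is just an element-wise multiplication by numbers in (0, 1), so values near 0 block the signal and values near 1 let it pass. The input values here are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

signal = np.array([2.0, -3.0, 0.5])
gate = sigmoid(np.array([-10.0, 0.0, 10.0]))   # approx [0.0, 0.5, 1.0]
gated = gate * signal                          # blocked, halved, passed through
```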

The first gate, the forget gate, uses a sigmoid layer to decide which information to discard from the cell state. The second gate decides which new information to write to the cell state, in two parts: first, a sigmoid layer called the input gate decides which values need to be updated; then, a tanh layer creates a vector of new candidate values, C̃_t, that can be added to the cell state.

The third gate decides what we’re going to output. This output is based on the cell state, but is a filtered version of it. First, we run a sigmoid layer which decides which parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
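The three gates can be put together in a single LSTM step. This is only a toy NumPy sketch of the standard equations, with made-up sizes and random weights stacked in dictionaries `W`, `U`, `b` for the forget (f), input (i), output (o) gates and the candidate (c):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])        # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])        # input gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])        # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate values
    c_t = f * c_prev + i * c_tilde   # keep part of the old state, add new info
    h_t = o * np.tanh(c_t)           # filtered version of the cell state
    return h_t, c_t

rng = np.random.default_rng(2)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in "fioc"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```

Because the forget gate multiplies the old cell state rather than overwriting it, information can survive many steps, which is the mechanism behind the long-term memory described above.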

The Dense layer implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
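That operation can be checked by hand with NumPy. The input, kernel, and bias values below are made up, and ReLU stands in for the activation:

```python
import numpy as np

# output = activation(dot(input, kernel) + bias), here with ReLU
x = np.array([1.0, 2.0])                 # input with 2 features
kernel = np.array([[1.0, -1.0, 0.5],
                   [0.0,  2.0, -0.5]])   # weights, shape (input_dim, units)
bias = np.array([0.1, 0.0, -0.2])
output = np.maximum(0.0, x @ kernel + bias)   # ReLU(x . kernel + bias)
```

Here x @ kernel + bias = [1.1, 3.0, -0.7], and ReLU zeroes the negative entry.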

Keras provides several optimizers, for example SGD (stochastic gradient descent), RMSprop (it is recommended to leave this optimizer’s parameters at their default values, except the learning rate, which can be freely tuned), and Adam.

Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning.
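The Adam update rule itself is short enough to sketch in NumPy. The quadratic loss w² is a toy choice; beta1, beta2, and eps are the usual default values:

```python
import numpy as np

lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8
w, m, v, t = 2.0, 0.0, 0.0, 0        # parameter and moment estimates

for _ in range(3):                   # three Adam steps on L(w) = w**2
    t += 1
    g = 2.0 * w                          # gradient of w**2
    m = beta1 * m + (1 - beta1) * g      # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)         # bias correction
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

Because the update is normalized by the second-moment estimate, each step has magnitude close to the learning rate, which is what makes Adam robust to gradient rescaling.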

A metric is a function that is used to judge the performance of your model. Metric functions are to be supplied in the metrics parameter when a model is compiled.
A metric could be a custom metric, binary_accuracy, categorical_accuracy, and so on. A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model. You may use any of the loss functions as a metric function.
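To see what such a metric computes, binary_accuracy can be reproduced by hand: it is the fraction of predictions that land on the right side of the 0.5 threshold. The labels and model outputs below are made up:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.4, 0.3, 0.6])   # model outputs in [0, 1]

# Threshold at 0.5, then compare with the true labels
binary_accuracy = np.mean((y_pred > 0.5).astype(int) == y_true)
```

Here the thresholded predictions are [1, 0, 0, 1], so three of four are correct.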

A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training. You can pass a list of callbacks (as the keyword argument callbacks) to the .fit() method of the Sequential or Model classes. The relevant methods of the callbacks will then be called at each stage of the training.
We will use History: a callback that records events into a History object. This callback is automatically applied to every Keras model, and the History object is returned by the fit method of models.

# We can just fit, or write it to a history of learning; then we will see the learning process
history = model.fit(X_train, y_train, epochs=20, batch_size=10,
                    validation_data=(X_test, y_test), verbose=2, shuffle=False)

# make functions for plotting history and predictions
def plot_history(history):
    plt.figure(figsize=(6, 4))
    plt.plot(history.history["loss"], label="Train")
    plt.plot(history.history["val_loss"], label="Test")
    plt.title("Loss over epoch")
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()

def plot(X_test):
    pred = model.predict(X_test)
    pred_inverse = scaler.inverse_transform(pred.reshape(-1, 1))
    y_test_inverse = scaler.inverse_transform(y_test.reshape(-1, 1))
    # Calculate mean_squared_error. Previously we did a MinMax scale, so apply inverse_transform to recover the values
    rmse = sqrt(mean_squared_error(y_test_inverse, pred_inverse))
    print('Test RMSE: %.3f' % rmse)
    plt.figure(figsize=(15, 8))
    plt.plot(pred_inverse, label='predict')
    plt.plot(y_test_inverse, label='actual')
    plt.legend()
    plt.show()

Actually, not the result we expect from a neural network! Keep on trying.

# Another approach to breaking up the data, using "shift"
# Take a bigger window
window_size = 100
df_s = df.copy()
for i in range(window_size):
    df = pd.concat([df, df_s.shift(-(i + 1))], axis=1)
df.dropna(axis=0, inplace=True)
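To see what this shift trick does, here is the same loop on a made-up five-point series with a window of 2: column k holds the value k steps ahead, so each surviving row becomes a window of consecutive observations.

```python
import pandas as pd

toy = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0, 5.0]})
toy_s = toy.copy()
for i in range(2):                        # window_size = 2 extra columns
    toy = pd.concat([toy, toy_s.shift(-(i + 1))], axis=1)
toy.dropna(axis=0, inplace=True)          # last rows have no "future" values
```

The result has rows [1, 2, 3], [2, 3, 4], and [3, 4, 5]: a sliding window over the original series.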

We can say that RNNs are a pretty good instrument for predicting time series, with both pros and cons.
They can give very good results, because an LSTM remembers previous periods, peaks, trends, and other features.
On the other hand, the large number of hyperparameter combinations makes tuning an LSTM for a simple time prediction hard: you can change the number of layers, the dropout, the window size, the optimizer, the batch size, the number of epochs, and whether or not to return sequences.

It seems that LSTM is a good choice when ARIMA, SARIMA, or Prophet don’t give acceptable metrics and results, or when we have lots of additional information alongside the time series. For example: using news to predict stock movements, or forecasting sales with many business conditions that describe different changes in the time series.