\subsubsection{GRU Model}
\subsubsection{GRU Model}
Another popular architecture for modeling sequential data is the recurrent neural network. There are two main variants: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). The GRU is a simplified version of the LSTM with fewer parameters, which makes it computationally less expensive. Like the linear and non-linear models, the GRU model can be trained for quantile regression using the pinball loss. The input data, however, is structured differently. The linear and non-linear models receive data of shape $(batch\_size, num\_features)$, whereas the recurrent neural network expects input of shape $(batch\_size, time\_steps, num\_features\_per\_timestep)$, as explained in the background section on recurrent neural networks.
The GRU model architecture for predicting the NRV quantiles is shown in Table \ref{tab:gru_model_architecture}. The model starts with a time embedding layer that converts the quarter of the day into an embedding vector and concatenates it with the other input features. The input of the TimeEmbedding layer has shape (Batch Size, Time Steps, Input Features Size). Its output is passed to the GRU layer, which produces a hidden state for every time step, resulting in a tensor of shape (Batch Size, Time Steps, Hidden Size). Only the last hidden state is used to predict the NRV quantiles for the next quarter, as it should summarize all the necessary information from the previous quarters. This last hidden state is passed through a linear layer that outputs the quantiles for the NRV prediction. The input and output of the model depend on whether the model is trained in an autoregressive or non-autoregressive way. The non-autoregressive variant of the GRU model receives two days' worth of time steps, i.e., $96 \cdot 2 = 192$ time steps, and has to output $96 \cdot \text{number\_of\_quantiles}$ NRV quantile values.
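The pinball loss mentioned above can be written compactly: for quantile level $\tau$ and residual $y - \hat{y}_\tau$, it penalizes underestimation with weight $\tau$ and overestimation with weight $1-\tau$. A minimal sketch in numpy (illustrative, not the thesis implementation; the function name is hypothetical):

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantiles):
    """Average pinball (quantile) loss over all samples and quantile levels.

    y_true: (n,) array of targets
    y_pred: (n, q) array of predicted quantile values
    quantiles: (q,) array of quantile levels in (0, 1)
    """
    diff = y_true[:, None] - y_pred  # positive when the quantile underestimates
    # tau * diff when diff > 0 (underestimation), (tau - 1) * diff otherwise
    loss = np.maximum(quantiles * diff, (quantiles - 1.0) * diff)
    return loss.mean()
```

Minimizing this loss jointly over all quantile levels yields one network that outputs the full set of quantile predictions at once.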
% TODO: add a proper visualization of the current model
\begin{table}[H]
\centering
\begin{tabularx}{\textwidth}{Xr} % Set the table width to the text width
\toprule
\textbf{Layer (Type)} & \textbf{Output Shape} \\ \midrule
\midrule
Time Embedding & [B, Time Steps, Input + Time Embedding Size] \\
\midrule
GRU & [B, Time Steps, Hidden Size] \\
\multicolumn{2}{c}{\textit{Only the last GRU state is passed on: [B, Hidden Size]}} \\
Linear & [B, Number of quantiles] \\
\bottomrule
\end{tabularx}
\caption{GRU Model Architecture}
\label{tab:gru_model_architecture}
\end{table}
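The data flow in Table \ref{tab:gru_model_architecture} can be sketched as follows. This is a minimal, self-contained numpy illustration of the shape transformations (quarter embedding, concatenation, GRU recurrence, last state, linear head), not the actual trained model; all names and sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyGRUQuantile:
    """Sketch: quarter embedding -> concat -> GRU -> last state -> linear."""

    def __init__(self, n_feat, emb_dim, hidden, n_quantiles, seed=0):
        rng = np.random.default_rng(seed)
        self.emb = rng.normal(size=(96, emb_dim)) * 0.1  # one vector per quarter of the day
        d = n_feat + emb_dim
        self.Wz = rng.normal(size=(d + hidden, hidden)) * 0.1  # update gate
        self.Wr = rng.normal(size=(d + hidden, hidden)) * 0.1  # reset gate
        self.Wh = rng.normal(size=(d + hidden, hidden)) * 0.1  # candidate state
        self.Wo = rng.normal(size=(hidden, n_quantiles)) * 0.1  # linear head
        self.hidden = hidden

    def forward(self, x, quarters):
        # x: (batch, time, n_feat); quarters: (batch, time) ints in [0, 96)
        b, t, _ = x.shape
        inp = np.concatenate([x, self.emb[quarters]], axis=-1)  # time embedding concat
        h = np.zeros((b, self.hidden))
        for step in range(t):  # GRU recurrence over the time steps
            xh = np.concatenate([inp[:, step], h], axis=-1)
            z = sigmoid(xh @ self.Wz)
            r = sigmoid(xh @ self.Wr)
            xh_r = np.concatenate([inp[:, step], r * h], axis=-1)
            h_tilde = np.tanh(xh_r @ self.Wh)
            h = (1 - z) * h + z * h_tilde
        # only the last hidden state feeds the quantile head: (batch, n_quantiles)
        return h @ self.Wo
```

For the non-autoregressive variant with 192 time steps and, e.g., 9 quantile levels, a batch of shape $(B, 192, n\_feat)$ maps to an output of shape $(B, 9)$ per predicted quarter.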
Multiple experiments are conducted to determine which hyperparameters and input features work best for the GRU model. The results are shown in Table \ref{tab:autoregressive_gru_model_results}.
\begin{table}[H]
\centering
\begin{adjustbox}{width=\textwidth,center}
\begin{tabular}{@{}lcccccccc@{}}
\toprule
Features & Layers & Hidden Size & \multicolumn{2}{c}{MSE} & \multicolumn{2}{c}{MAE} & \multicolumn{2}{c}{CRPS} \\
\cmidrule(lr){4-5} \cmidrule(lr){6-7} \cmidrule(lr){8-9}
& & & AR & NAR & AR & NAR & AR & NAR \\
\midrule
\multicolumn{9}{@{}l}{NRV} \\
& 2 & 256 & 39838.35 & 40097.62 & 150.81 & 150.37 & 85.04 & 76.12 \\
& 4 & 256 & 39506.55 & 39968.96 & 149.81 & 150.04 & 85.46 & 76.07 \\
& 8 & 256 & 37747.11 & 40400.37 & 146.67 & 151.03 & 83.67 & 76.59 \\
& 2 & 512 & 39955.79 & 40917.24 & 150.77 & 152.04 & 87.88 & 76.06 \\
& 4 & 512 & 43301.13 & 39954.62 & 156.73 & 150.14 & 89.78 & 76.25 \\
& 8 & 512 & 37681.71 & 40379.14 & 146.62 & 151.05 & 83.08 & 76.42 \\
\midrule
\multicolumn{9}{@{}l}{NRV + Load} \\
& 2 & 256 & 38427.91 & 40024.14 & 147.27 & 150.06 & 84.17 & 76.04 \\
& 4 & 256 & 38984.44 & 40480.73 & 147.91 & 151.24 & 85.91 & 75.82 \\
& 8 & 256 & 38343.98 & 39135.60 & 146.44 & 148.85 & 84.22 & 76.19 \\
& 2 & 512 & 41496.77 & 40808.04 & 153.53 & 151.89 & 88.26 & 75.43 \\
& 4 & 512 & 38000.40 & 40260.01 & 146.10 & 150.57 & 83.99 & 75.38 \\
& 8 & 512 & 41104.28 & 39907.44 & 152.13 & 150.11 & 89.13 & 76.42 \\
\midrule
\multicolumn{9}{@{}l}{NRV + Load + PV + Wind} \\
& 4 & 256 & 39872.46 & 40708.93 & 149.34 & 151.32 & 85.91 & 75.93 \\
& 8 & 256 & 39704.37 & 40292.25 & 148.59 & 151.19 & 85.62 & 75.94 \\
& 4 & 512 & 39024.27 & 41580.29 & 147.91 & 153.39 & 84.18 & 75.84 \\
& 8 & 512 & 42397.86 & 41043.88 & 154.00 & 152.63 & 89.87 & 76.35 \\
\midrule
\multicolumn{9}{@{}l}{NRV + Load + PV + Wind + Net Position + QE (5 dim)} \\
& 4 & 256 & 39906.53 & 40881.92 & 149.78 & 152.34 & 84.88 & 76.15 \\
& 8 & 256 & 37675.15 & 40159.91 & 145.39 & 150.42 & 83.37 & 75.89 \\
& 4 & 512 & & 40613.54 & & 151.17 & & 75.33 \\
& 8 & 512 & 35238.98 & 39896.57 & 141.02 & 149.96 & 80.92 & 75.92 \\
\bottomrule
\end{tabular}
\end{adjustbox}
\caption{Autoregressive (AR) and non-autoregressive (NAR) GRU quantile regression model results. All models used a dropout of 0.2.}
\label{tab:autoregressive_gru_model_results}
\end{table}
The results show the same behavior for the GRU model as for the linear and non-linear models. The performance of the autoregressive model improves slightly when more features are added, whereas the performance of the non-autoregressive model barely improves. The reason is the same as for the linear and non-linear models: the non-autoregressive model has a large input size, which makes it harder to learn the dependencies between the features, and it has to predict all 96 quarters at once, which is a complex task. Comparing the autoregressive and non-autoregressive GRU models yields the same observation as for the linear and non-linear models: the CRPS is always lower for the non-autoregressive model, while its MSE and MAE are higher most of the time.
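For context on the CRPS columns in the table: when a forecast is given as a finite set of quantile predictions, the CRPS is commonly approximated as twice the pinball loss averaged over the quantile levels. A small numpy sketch of this approximation (illustrative; the function name is hypothetical and this need not match how the thesis computes CRPS exactly):

```python
import numpy as np

def crps_from_quantiles(y_true, y_pred, quantiles):
    """Per-sample CRPS approximated as 2x the mean pinball loss.

    y_true: (n,) targets; y_pred: (n, q) quantile predictions;
    quantiles: (q,) levels in (0, 1).
    """
    diff = y_true[:, None] - y_pred
    pinball = np.maximum(quantiles * diff, (quantiles - 1.0) * diff)
    return 2.0 * pinball.mean(axis=1)  # one CRPS estimate per sample
```

Under this approximation, sharper and better-calibrated quantile sets yield a lower CRPS even when point-error metrics such as MSE and MAE are worse, which is consistent with the AR/NAR pattern in the table.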
% TODO: explain from which models the examples come from
\begin{figure}[H]
\centering
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/aqr_gru_model_examples/AQR_GRU_NRV_Load_Wind_PV_NP_QE-Sample_864.png}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/naqr_gru_model_examples/NAQR_GRU_NRV_Load_Wind_PV_NP_QE-Sample_864.png}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/aqr_gru_model_examples/AQR_GRU_NRV_Load_Wind_PV_NP_QE-Sample_4320.png}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/naqr_gru_model_examples/NAQR_GRU_NRV_Load_Wind_PV_NP_QE-Sample_4320.png}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/aqr_gru_model_examples/AQR_GRU_NRV_Load_Wind_PV_NP_QE-Sample_6336.png}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/naqr_gru_model_examples/NAQR_GRU_NRV_Load_Wind_PV_NP_QE-Sample_6336.png}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/aqr_gru_model_examples/AQR_GRU_NRV_Load_Wind_PV_NP_QE-Sample_7008.png}
\caption{Autoregressive GRU model}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/naqr_gru_model_examples/NAQR_GRU_NRV_Load_Wind_PV_NP_QE-Sample_7008.png}
\caption{Non-autoregressive GRU model}
\end{subfigure}
\caption{Comparison of the autoregressive and non-autoregressive GRU model examples.}
\label{fig:gru_model_sample_comparison}
\end{figure}
Examples from the test set using the GRU models are shown in Figure \ref{fig:gru_model_sample_comparison}. Again, the same behavior as for the linear and non-linear models can be observed. The non-autoregressive predictions stay around zero and do not follow the trend of the real NRV values, whereas the autoregressive predictions look much better visually and follow the real NRV values far more closely.
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/quantile_performance/AQR_GRU_QP_Train.jpeg}
\caption{AR - Train}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/quantile_performance/AQR_GRU_QP_Test.jpeg}
\caption{AR - Test}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/quantile_performance/NAQR_GRU_QP_Train.jpeg}
\caption{NAR - Train}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/quantile_performance/NAQR_GRU_QP_Test.jpeg}
\caption{NAR - Test}
\end{subfigure}
\caption{Over/underestimation of the quantiles for the autoregressive and non-autoregressive GRU models. Both the quantile performance for the training and test set are shown. The plots are generated using the input features NRV, Load, Wind, PV, Net Position, and the quarter embedding (only for the autoregressive model).}
\label{fig:gru_model_quantile_over_underestimation}
\end{figure}
The plots in Figure \ref{fig:gru_model_quantile_over_underestimation} show the over- and underestimation of the learned quantiles for the GRU models. On the training set, the fraction of real NRV values falling below the predicted quantiles is very close to the ideal fraction, although the autoregressive model shows a slight underestimation for almost all quantiles. On the test set, the autoregressive model overestimates the lower quantiles and underestimates the higher ones, while the non-autoregressive model underestimates all quantile predictions, meaning that a lower fraction of real NRV values lies below the quantiles than intended.
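The over/underestimation check behind these plots is a simple empirical coverage computation: for each quantile level, count the fraction of real values falling below the predicted quantile and compare it to the nominal level. A minimal sketch (illustrative; the function name is hypothetical):

```python
import numpy as np

def quantile_coverage(y_true, y_pred, quantiles):
    """Empirical fraction of targets at or below each predicted quantile.

    y_true: (n,) real values; y_pred: (n, q) quantile predictions;
    quantiles: (q,) nominal levels. A well-calibrated model has
    coverage close to the nominal level; a lower fraction means the
    quantile is underestimated, a higher one that it is overestimated.
    """
    below = y_true[:, None] <= y_pred  # (n, q) indicator matrix
    return below.mean(axis=0)          # empirical coverage per level
```

Plotting these empirical fractions against the nominal quantile levels produces exactly the kind of calibration diagram shown in Figure \ref{fig:gru_model_quantile_over_underestimation}.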