\subsubsection{GRU Model}
Another popular architecture for modelling sequential data is the recurrent neural network. Two main variants exist: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). The GRU is a simplified version of the LSTM with fewer parameters, which makes it computationally less expensive. Like the linear and non-linear models, the GRU can be trained for quantile regression using the pinball loss. There is, however, a difference in how the input data is structured and provided to the model. The linear and non-linear models receive data of shape $(batch\_size, num\_features)$, whereas a recurrent neural network expects input of shape $(batch\_size, time\_steps, num\_features\_per\_timestep)$, as explained in the background section on recurrent neural networks.
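The shared loss and the difference in input shape can be illustrated with a minimal sketch, assuming PyTorch; the function name \texttt{pinball\_loss} and all tensor sizes below are illustrative, not the thesis implementation.

```python
# Sketch of the pinball (quantile) loss used to train all quantile models.
import torch

def pinball_loss(pred: torch.Tensor, target: torch.Tensor,
                 quantiles: torch.Tensor) -> torch.Tensor:
    """pred: (batch, num_quantiles); target: (batch, 1); quantiles: (num_quantiles,)."""
    error = target - pred  # positive when the model under-predicts
    # Penalise under-prediction by q and over-prediction by (1 - q).
    loss = torch.maximum(quantiles * error, (quantiles - 1) * error)
    return loss.mean()

# A feed-forward model sees a flat feature vector per sample, while the
# recurrent network sees a sequence of per-time-step feature vectors:
flat_input = torch.randn(32, 10)             # (batch_size, num_features)
sequential_input = torch.randn(32, 192, 10)  # (batch_size, time_steps, num_features_per_timestep)
```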
The GRU model architecture used to predict the NRV quantiles is shown in Table \ref{tab:gru_model_architecture}. The model starts with a time-embedding layer that maps the quarter of the day to an embedding and concatenates it with the other input features at every time step. The input of this TimeEmbedding layer has shape (Batch Size, Time Steps, Input Features Size). Its output is passed to the GRU layer, which returns the hidden state for every time step, yielding a tensor of shape (Batch Size, Time Steps, Hidden Size). Only the last hidden state is used to predict the NRV quantiles for the next quarter, as it should summarise all the necessary information from the previous quarters. This last hidden state is passed through a linear layer that outputs the quantiles of the NRV prediction. The input and output of the model depend on whether it is trained in an autoregressive or non-autoregressive way. The non-autoregressive variant of the GRU model receives two days' worth of time steps, i.e. $96 \cdot 2 = 192$ quarter-hours, and outputs $(96 \cdot \text{number\_of\_quantiles})$ NRV quantile values.
TODO: Add a proper visualisation of the model here.
\begin{table}[H]
\centering
\begin{tabularx}{\textwidth}{Xr} % Set the table width to the text width
\toprule
\textbf{Layer (Type)} & \textbf{Output Shape} \\ \midrule
Time Embedding & [B, Time Steps, Input + Time Embedding Size] \\
\midrule
GRU & [B, Time Steps, Hidden Size] \\
\multicolumn{2}{c}{\textit{Only the last hidden state, of shape [B, Hidden Size], is passed on}} \\
Linear & [B, Number of quantiles] \\
\bottomrule
\end{tabularx}
\caption{GRU Model Architecture}
\label{tab:gru_model_architecture}
\end{table}
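The architecture in Table \ref{tab:gru_model_architecture} can be sketched as follows, assuming PyTorch; the class name \texttt{QuarterGRU} and the default layer sizes are illustrative choices, not the thesis implementation.

```python
# Sketch of the quarter-embedding + GRU + linear quantile head.
import torch
import torch.nn as nn

class QuarterGRU(nn.Module):
    def __init__(self, num_features: int, quarter_emb_dim: int = 5,
                 hidden_size: int = 256, num_layers: int = 2,
                 num_quantiles: int = 9):
        super().__init__()
        # Embed the quarter of the day (0..95); it is concatenated with
        # the remaining input features at every time step.
        self.quarter_embedding = nn.Embedding(96, quarter_emb_dim)
        self.gru = nn.GRU(num_features + quarter_emb_dim, hidden_size,
                          num_layers=num_layers, batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden_size, num_quantiles)

    def forward(self, features: torch.Tensor, quarter: torch.Tensor) -> torch.Tensor:
        # features: (B, T, num_features); quarter: (B, T) integer quarter index
        x = torch.cat([features, self.quarter_embedding(quarter)], dim=-1)
        out, _ = self.gru(x)      # hidden state per time step: (B, T, hidden_size)
        last = out[:, -1, :]      # only the last hidden state is used
        return self.head(last)    # (B, num_quantiles)
```

A forward pass with two days of quarter-hourly history (192 time steps) then maps a batch of sequences directly to one set of NRV quantiles per sample.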
Multiple experiments are conducted to determine which hyperparameters and input features work best for the GRU model. The results are shown in Table \ref{tab:autoregressive_gru_model_results}.
\begin{table}[H]
\centering
\begin{adjustbox}{width=\textwidth,center}
\begin{tabular}{@{}lcccccccc@{}}
\toprule
Features & Layers & Hidden Size & \multicolumn{2}{c}{MSE} & \multicolumn{2}{c}{MAE} & \multicolumn{2}{c}{CRPS} \\
\cmidrule(lr){4-5} \cmidrule(lr){6-7} \cmidrule(lr){8-9}
 & & & AR & NAR & AR & NAR & AR & NAR \\
\midrule
NRV & & & & & & & & \\
 & 2 & 256 & 39838.35 & 40097.62 & 150.81 & 150.37 & 85.04 & 76.12 \\
 & 4 & 256 & 39506.55 & 39968.96 & 149.81 & 150.04 & 85.46 & 76.07 \\
 & 8 & 256 & 37747.11 & 40400.37 & 146.67 & 151.03 & 83.67 & 76.59 \\
 & 2 & 512 & 39955.79 & 40917.24 & 150.77 & 152.04 & 87.88 & 76.06 \\
 & 4 & 512 & 43301.13 & 39954.62 & 156.73 & 150.14 & 89.78 & 76.25 \\
 & 8 & 512 & 37681.71 & 40379.14 & 146.62 & 151.05 & 83.08 & 76.42 \\
\midrule
NRV + Load & & & & & & & & \\
 & 2 & 256 & 33202.80 & 38427.91 & 138.02 & 147.27 & 79.62 & 84.17 \\
 & 4 & 256 & 33600.73 & 38984.44 & 138.62 & 147.91 & 81.03 & 85.91 \\
 & 8 & 256 & 32828.61 & 38343.98 & 136.82 & 146.44 & 79.42 & 84.22 \\
 & 2 & 512 & 35979.57 & 41496.77 & 144.16 & 153.53 & 83.50 & 88.26 \\
 & 4 & 512 & 32334.73 & 38000.40 & 135.92 & 146.10 & 78.82 & 83.99 \\
 & 8 & 512 & 35177.39 & 41104.28 & 141.79 & 152.13 & 83.79 & 89.13 \\
\midrule
NRV + Load + PV + Wind & & & & & & & & \\
 & 4 & 256 & 31594.55 & 39872.46 & 134.11 & 149.34 & 77.52 & 85.91 \\
 & 8 & 256 & 31481.22 & 39704.37 & 133.45 & 148.59 & 77.26 & 85.62 \\
 & 4 & 512 & 31368.31 & 39024.27 & 134.02 & 147.91 & 76.58 & 84.18 \\
 & 8 & 512 & 34566.66 & 42397.86 & 140.13 & 154.00 & 82.09 & 89.87 \\
\midrule
NRV + Load + PV + Wind + Net Position + QE (5 dim) & & & & & & & & \\
 & 4 & 256 & 30130.37 & 39906.53 & 130.92 & 149.78 & 75.02 & 84.88 \\
 & 8 & 256 & 28560.67 & 37675.15 & 127.77 & 145.39 & 73.77 & 83.37 \\
 & 4 & 512 & & & & & & \\
 & 8 & 512 & 27421.85 & 35238.98 & 125.32 & 141.02 & 72.73 & 80.92 \\
\bottomrule
\end{tabular}
\end{adjustbox}
\caption{GRU quantile regression model results for autoregressive (AR) and non-autoregressive (NAR) training. All models use a dropout of 0.2.}
\label{tab:autoregressive_gru_model_results}
\end{table}