
\section{Background}
% Background information
% Generative modeling
% -> a single forecast is often unreliable -> reinforcement learning is difficult -> generative modeling, many generations to train with
% - Background: electricity market
% - Background: generative modeling (of the NRV)
\subsection{Electricity market}
The electricity market involves many different parties that interact with each other, most of which ultimately aim to make a profit. An overview of the most important parties is given in Table \ref{tab:parties}.

% table
\begin{table}[h]
    \centering
    \begin{tabularx}{\textwidth}{|C|C|}
        \hline
        \textbf{Party} & \textbf{Description} \\
        \hline
        Producers & Generate electricity, for example from coal, nuclear energy or wind parks. \\
        \hline
        Consumers & Use electricity. Consumers range from households and companies to industry. \\
        \hline
        Transmission system operator (TSO) & Party responsible for reliable transmission of electricity from generation plants to local distribution networks. This is done over the high-voltage grid. In Belgium, this party is Elia.\\
        \hline
        Distribution system operator (DSO) & Party responsible for the distribution of electricity to the end users. Here, the electricity is transported over the medium- and low-voltage distribution grids. \\
        \hline
        Balance responsible party (BRP) & Party that forecasts the electricity consumption and generation of its clients and submits balanced nominations to Elia. \\
        \hline
        Balancing service provider (BSP) & Party that provides the TSO (Elia) with balancing services. BSPs submit balancing energy bids to Elia and, if needed, deliver balancing energy at a set price. \\
        \hline
    \end{tabularx}
    \caption{Overview of the most important parties in the electricity market}
    \label{tab:parties}
\end{table}

Elia, the transmission system operator (TSO) in Belgium, is responsible for keeping the grid stable. It does this by balancing electricity consumption and generation. If there is an imbalance, Elia activates reserves to restore the balance. These reserves are expensive, and their cost is paid by the market participants. The price paid for the activation of these reserves is called the imbalance price. Keeping the grid balanced is a very important but also a very difficult task: an unbalanced grid can lead to blackouts, but also to other problems such as damage to equipment.
\\\\
Balance responsible parties (BRPs) forecast the electricity consumption and generation of their portfolio in order to keep supply and demand balanced within the grid they operate in. On the day before delivery, they submit a daily balance schedule for their portfolio to the TSO. This schedule contains the expected physical injections into and offtakes from the grid, as well as the commercial power trades. These trades can be purchases and sales between BRPs, but also trades with other countries. BRPs must provide and deploy all reasonable resources to be balanced on a quarter-hourly basis, and they can exchange electricity with other BRPs for the following day or for the same day. There is one exception in which a BRP may deviate from its balance schedule: when the grid is unbalanced and the deviation helps Elia to stabilize it. In this case, the BRP receives a compensation for its help. When a BRP deviates from its balance schedule in a way that destabilizes the grid, it has to pay the imbalance price for the deviation.
\\\\
The imbalance price is determined by the reserves that Elia has to activate to stabilize the grid. Several quantities are involved. The imbalance of a BRP is the quarter-hourly difference between its total injections into and offtakes from the grid. The Net Regulation Volume (NRV) is the net volume of regulation energy that Elia activates to maintain the balance in the Elia control area. The Area Control Error (ACE) is the current difference between the scheduled and the actual values of the power exchanged in the Belgian control area. The System Imbalance (SI) is the ACE minus the NRV, \(\mathrm{SI} = \mathrm{ACE} - \mathrm{NRV}\), and the imbalance price is calculated using the SI.
\\\\
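
The definitions above can be expressed in a few lines of code. The following Python sketch assumes that all quantities refer to a single quarter-hour and are expressed in the same unit, with injections and offtakes given as positive numbers. The function names \texttt{brp\_imbalance} and \texttt{system\_imbalance} and the numbers in the example are hypothetical.

\begin{verbatim}
```python
def brp_imbalance(injections, offtakes):
    """Quarter-hourly imbalance of a BRP: total injections minus total offtakes.

    A positive result means the BRP injected more than it took off,
    a negative result means it took off more than it injected.
    """
    return sum(injections) - sum(offtakes)


def system_imbalance(ace, nrv):
    """System Imbalance as defined above: SI = ACE - NRV."""
    return ace - nrv


# Hypothetical quarter-hour: two injection points and one offtake point
print(brp_imbalance([120.0, 30.0], [140.0]))   # 10.0
# Hypothetical ACE and NRV values for the same quarter-hour
print(system_imbalance(ace=-50.0, nrv=80.0))   # -130.0
```
\end{verbatim}
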
To maintain grid stability, Elia activates three types of reserves, each designed for specific imbalance conditions. These reserves ensure that the electricity supply continuously meets the demand, which keeps the grid frequency within the required operational limits. The reserves are:

\begin{enumerate}
    \item \textbf{Frequency Containment Reserve (FCR)} \\
    FCR responds automatically to frequency deviations in the grid. It reacts within seconds and provides a response proportional to the frequency deviation. Elia must provide a minimum share of this volume within the Belgian control area. This volume can also be offered by BSPs.
    \item \textbf{Automatic Frequency Restoration Reserve (aFRR)} \\
    aFRR is the second reserve that Elia can activate. The FCR only limits the frequency deviation, and aFRR is activated to restore the frequency to 50~Hz. Every 4 seconds, Elia sends a set-point to the BSPs, which they use to adjust their production or consumption. The BSPs have a 7.5-minute window to activate the full requested energy volume.
    \item \textbf{Manual Frequency Restoration Reserve (mFRR)} \\
    When the FCR and aFRR are not sufficient to resolve the imbalance between generation and consumption, Elia activates the mFRR manually. The requested energy volume must be activated within 15 minutes.
\end{enumerate}

The reserves are activated in the order FCR, aFRR, mFRR. BSPs submit bids for the aFRR and mFRR volumes. Each bid consists of the type (aFRR or mFRR), the bid volume (MW), the bid price (per MWh) and the start price (per MWh).
The start price covers the cost of starting a unit.
\\\\
Elia selects the bids first according to the activation order of the reserves and then according to price. The marginal price of the activated upward or downward energy, i.e. the price of the last activated bid, determines the imbalance price. This price is then applied to the imbalance of every BRP that is not balanced. The imbalance price calculation is shown in Table \ref{tab:imbalance_price}.
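
The role of the last activated bid can be illustrated with a merit-order sketch. The Python code below is a simplification, not Elia's actual selection algorithm. It assumes a single quarter-hour, upward activation only and bids of a single reserve type. Bids are activated from cheapest to most expensive until the requested volume is covered, the last bid may be activated only partially, and start prices are ignored. The \texttt{Bid} class, the function \texttt{marginal\_upward\_price} and the bid ladder are hypothetical.

\begin{verbatim}
```python
from dataclasses import dataclass


@dataclass
class Bid:
    volume_mw: float      # offered volume
    price_per_mwh: float  # energy bid price


def marginal_upward_price(bids, requested_mw):
    """Activate upward bids from cheapest to most expensive until the
    requested volume is covered and return the price of the last activated bid.

    Simplifications: one quarter-hour, one reserve type, the last bid may be
    activated partially, and start prices are ignored.
    """
    if requested_mw <= 0:
        raise ValueError("requested_mw must be positive for upward activation")
    remaining = requested_mw
    for bid in sorted(bids, key=lambda b: b.price_per_mwh):
        remaining -= bid.volume_mw
        if remaining <= 0:
            return bid.price_per_mwh
    raise ValueError("not enough bid volume to cover the requested volume")


# Hypothetical bid ladder
ladder = [Bid(50, 120.0), Bid(100, 80.0), Bid(30, 250.0)]
print(marginal_upward_price(ladder, 120))  # 120.0
```
\end{verbatim}

In the notation of Table~\ref{tab:imbalance_price}, the returned value plays the role of the MIP.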

\begin{table}[h]
    \centering
    \begin{tabular}{|c|c|c|}
        \hline
        & \multicolumn{2}{c|}{\textbf{System Imbalance}} \\
        \cline{2-3}
        \textbf{Imbalance of the balance responsible party} & \textbf{Positive} & \textbf{Negative or zero} \\
        \hline
        \textbf{Positive} & MDP - \(\alpha\) & MIP + \(\alpha\) \\
        \hline
        \textbf{Negative} & MDP - \(\alpha\) & MIP + \(\alpha\) \\
        \hline
    \end{tabular}
    \caption{Imbalance price applied to the imbalance of a BRP}
    \label{tab:imbalance_price}
\end{table}

The imbalance price calculation uses the following variables:
\begin{itemize}
    \item MDP: marginal price of downward activation
    \item MIP: marginal price of upward activation
    \item \(\alpha\): additional component that depends on the System Imbalance
\end{itemize}
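
Table~\ref{tab:imbalance_price} amounts to a simple lookup once the marginal prices are known. The Python sketch below assumes that MDP, MIP and \(\alpha\) are already given for the quarter-hour. How \(\alpha\) is derived from the System Imbalance is not modelled. The function name \texttt{imbalance\_price} and the example prices are hypothetical. Because both rows of the table are identical, the sign of the BRP's own imbalance is not needed as an input.

\begin{verbatim}
```python
def imbalance_price(system_imbalance, mdp, mip, alpha):
    """Imbalance price of one quarter-hour following the imbalance price table.

    system_imbalance: System Imbalance (SI) of the quarter-hour
    mdp: marginal price of downward activation
    mip: marginal price of upward activation
    alpha: extra component, assumed to be given (not computed here)

    Both rows of the table are identical, so the BRP's own imbalance
    does not influence the price.
    """
    if system_imbalance > 0:
        return mdp - alpha   # positive SI
    return mip + alpha       # negative or zero SI


print(imbalance_price(-130.0, mdp=40.0, mip=150.0, alpha=5.0))  # 155.0
print(imbalance_price(200.0, mdp=40.0, mip=150.0, alpha=5.0))   # 35.0
```
\end{verbatim}
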

% TODO: Add more information about the imbalance price calculation, alpha?

Given the bids of a quarter-hour and the System Imbalance, the imbalance price can be reconstructed. In this thesis, the System Imbalance is assumed to be determined almost entirely by the Net Regulation Volume. Because \(\mathrm{SI} = \mathrm{ACE} - \mathrm{NRV}\), this amounts to neglecting the ACE, so that the SI is approximately equal in magnitude to the NRV but has the opposite sign. This is a simplification, but it is a good approximation. The goal of this thesis is to model the NRV, which can then be used to reconstruct the imbalance price and to decide when to buy or sell electricity.

\subsection{Generative modeling}
A single point forecast of the NRV is often inaccurate, and a policy built on such a forecast can lead to wrong decisions. A better approach is to model the distribution of the NRV and to sample multiple generations (possible trajectories) of the NRV from it. These samples capture the uncertainty of the NRV and can be used to calculate confidence intervals.
\\\\
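
Once a model can generate many NRV trajectories for the same day, point estimates and intervals follow directly from the samples. The Python sketch below assumes that the generated trajectories are stored in a NumPy array with one trajectory per row and one column per quarter-hour (96 per day). It computes the per-quarter-hour median together with an empirical 90\% interval. The function name \texttt{sample\_intervals} is hypothetical, and the Gaussian samples only stand in for model output.

\begin{verbatim}
```python
import numpy as np


def sample_intervals(samples, lower_q=0.05, upper_q=0.95):
    """Per-quarter-hour lower bound, median and upper bound of sampled trajectories.

    samples: array-like of shape (n_samples, n_steps), one generated NRV
    trajectory per row.
    """
    samples = np.asarray(samples, dtype=float)
    lower, median, upper = np.quantile(samples, [lower_q, 0.5, upper_q], axis=0)
    return lower, median, upper


# Placeholder for model output: 1000 trajectories of 96 quarter-hours
rng = np.random.default_rng(0)
fake_samples = rng.normal(loc=0.0, scale=100.0, size=(1000, 96))
lower, median, upper = sample_intervals(fake_samples)
print(lower.shape, median.shape, upper.shape)  # (96,) (96,) (96,)
```
\end{verbatim}
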
Generative modeling is a branch of machine learning that aims to generate new data samples resembling the training data. To do so, a generative model learns (an approximation of) the true data distribution, from which new samples can then be drawn. Generative modeling is used in many fields, including image generation and text generation.
\\\\
In this thesis, generative modeling is used to model the NRV of the Belgian electricity market conditioned on input features such as the weather and the load forecast. The trained model can then generate new NRV samples.
\\\\
There exist many different types of generative models. Some of the most popular ones are:
\begin{itemize}
    \item Generative Adversarial Networks (GANs)
    \item Variational Autoencoders (VAEs)
    \item Normalizing Flows
    \item Diffusion models
\end{itemize}

In this thesis, autoregressive models are used to model the NRV. An autoregressive model predicts the next value in a sequence based on the previous values. Here, the model can be trained to predict the next NRV value from the previous NRV values and conditional features such as the weather and the load forecast. If the model outputs a single value, it always generates the same sequence for the same input features. If it is instead trained to predict the distribution of the next value, multiple different NRV generations can be sampled for the same input features. Quantile regression is one way to do this: the model predicts several quantiles of the next value in the sequence, from which new values can be sampled.
\\\\
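
The quantile-regression approach can be sketched as follows. The Python code assumes a model that outputs a fixed set of quantiles of the next NRV value. Such a model is trained with the pinball (quantile) loss. A value is drawn by evaluating a linear interpolation of the predicted quantiles at a uniformly random level. Feeding every sampled value back as input yields one trajectory, and repeating this yields different trajectories for the same input. The quantile levels, the interpolation scheme and the function names are assumptions. \texttt{toy\_model} is a hypothetical stand-in for a trained network, and conditioning features are omitted for brevity.

\begin{verbatim}
```python
import numpy as np

# Quantile levels predicted by the (hypothetical) model
QUANTILES = np.array([0.05, 0.25, 0.5, 0.75, 0.95])


def pinball_loss(y, q_pred, quantiles=QUANTILES):
    """Mean pinball (quantile) loss of predicted quantiles q_pred for target y."""
    diff = y - np.asarray(q_pred, dtype=float)
    return float(np.mean(np.maximum(quantiles * diff, (quantiles - 1.0) * diff)))


def sample_from_quantiles(q_pred, rng, quantiles=QUANTILES):
    """Draw one value by linearly interpolating the predicted quantile function.

    Levels below the lowest or above the highest quantile level are clamped
    to the outermost predicted quantiles (np.interp clamps at the ends).
    """
    u = rng.uniform()
    return float(np.interp(u, quantiles, q_pred))


def generate(history, n_steps, predict_quantiles, rng):
    """Autoregressive sampling: every sampled value is fed back as input."""
    seq = list(history)
    for _ in range(n_steps):
        q_pred = np.sort(predict_quantiles(seq))  # avoid crossing quantiles
        seq.append(sample_from_quantiles(q_pred, rng))
    return seq[len(history):]


# Hypothetical stand-in for a trained network: quantiles of a normal
# distribution centred on the last value with standard deviation 50
Z = np.array([-1.645, -0.674, 0.0, 0.674, 1.645])


def toy_model(seq):
    return seq[-1] + 50.0 * Z


rng = np.random.default_rng(0)
trajectories = [generate([0.0], 4, toy_model, rng) for _ in range(3)]
```
\end{verbatim}
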
This thesis also explores diffusion models, a type of generative model that generates new data samples following the distribution of the training data. Through a structured training process, diffusion models learn to reverse a diffusion process that gradually adds noise to the data. Starting from random noise, the model transforms the noise into a sample from the data distribution using multiple denoising steps.

\subsection{Diffusion models}
The description of diffusion models in this section is based on the ``Denoising Diffusion Probabilistic Models'' (DDPM) paper \cite{ho2020denoising}.
\subsubsection{Overview}
Diffusion models are probabilistic models designed to generate high-quality, diverse samples from complex data distributions. Their training procedure is distinctive: the model learns to reverse an iterative noising process, called the diffusion process, that is applied to the data. During training, a training sample is gradually turned into noise by the diffusion process, and the model learns to undo this corruption step by step, denoising the data in each iteration. The parameters are optimized through a variational bound on the likelihood of the data, so that the model learns to generate samples from the data distribution. Starting from pure noise, the trained model can then generate new samples that resemble the data. The model can also be conditioned on additional information, so that the generated samples follow a conditional distribution.

\subsubsection{Applications}
Diffusion models gained popularity in the field of computer vision, where they are used for inpainting, super-resolution, image generation and image editing. The paper introducing ``Denoising Diffusion Probabilistic Models'' (DDPM) \cite{ho2020denoising} showed that diffusion models can achieve state-of-the-art results in image generation. Diffusion models were subsequently applied to other fields, such as text and audio generation, but image generation remains their most popular application. Many models and products use diffusion models to generate or edit images based on a text description, for example DALL·E 2, Stable Diffusion and Midjourney.
\\\\
In this thesis, diffusion models are explored to model time series data conditioned on additional information.

\subsubsection{Generation process}
The generation process of diffusion models differs from that of other generative models. GANs and VAEs, for example, generate a sample by drawing from a noise distribution and transforming the noise into a sample that resembles the training data in a single step, using a generator or decoder network. Diffusion models instead start from a noise sample and apply a series of denoising steps to it. A diffusion model is defined by a forward process and a reverse process. The items below describe these two processes, together with the training procedure and conditioning.

\begin{itemize}
    \item \textbf{Forward process} \\
    During this process, Gaussian noise is gradually added to the data over $T$ time steps according to a variance schedule $\beta_1, \dots, \beta_T$. \\\\
    $q(\mathbf{x}_{1:T}|\mathbf{x}_0) \coloneqq \prod_{t=1}^{T} q(\mathbf{x}_t|\mathbf{x}_{t-1}) \quad$ with $\quad q(\mathbf{x}_t|\mathbf{x}_{t-1}) \coloneqq \mathcal{N}(\mathbf{x}_t; \sqrt{1-\beta_t}\mathbf{x}_{t-1}, \beta_t\mathbf{I})$
    \\\\
    This formula shows that, given $\mathbf{x}_0$, the joint distribution of the noisy samples $\mathbf{x}_1, \dots, \mathbf{x}_T$ is the product of the transition probabilities of the individual steps, i.e. the forward process is a Markov chain. Each transition is a Gaussian with mean $\sqrt{1-\beta_t}\mathbf{x}_{t-1}$ and covariance $\beta_t\mathbf{I}$: the previous sample is slightly scaled down and Gaussian noise with variance $\beta_t$ is added. The variance schedule $\beta_1, \dots, \beta_T$ can either be kept fixed as a hyperparameter, as in DDPM, or be learned during training.

    \item \textbf{Reverse process} \\
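    To generate data, the diffusion process must be reversed. The model therefore learns the reverse transitions, i.e. the distribution of the less noisy sample $\mathbf{x}_{t-1}$ given the noisier sample $\mathbf{x}_t$ and the timestep $t$. \\\\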
    $p_{\theta}(\mathbf{x}_{0:T}) \coloneqq p(\mathbf{x}_T) \prod_{t=1}^{T} p_{\theta}(\mathbf{x}_{t-1}|\mathbf{x}_t) \quad$ with $\quad p_{\theta}(\mathbf{x}_{t-1}|\mathbf{x}_t) \coloneqq \mathcal{N}(\mathbf{x}_{t-1}; \mu_{\theta}(\mathbf{x}_t, t), \Sigma_{\theta}(\mathbf{x}_t, t))$
    \\\\
    In the reverse process, each step aims to undo the diffusion by estimating what the previous, less noisy state might have been. This is done using a series of conditional Gaussian distributions $p_{\theta}(\mathbf{x}_{t-1}|\mathbf{x}_t)$. For each of these Gaussians, a neural network with parameters $\theta$ estimates the mean $\mu_{\theta}(\mathbf{x}_t, t)$ and the covariance $\Sigma_{\theta}(\mathbf{x}_t, t)$. The joint distribution $p_{\theta}(\mathbf{x}_{0:T})$ is then the product of the distribution of the last timestep, the standard Gaussian prior $p(\mathbf{x}_T) = \mathcal{N}(\mathbf{x}_T; \mathbf{0}, \mathbf{I})$, and the conditional distributions $p_{\theta}(\mathbf{x}_{t-1}|\mathbf{x}_t)$ of all timesteps.

    \item \textbf{Training} \\
    The model is trained by optimizing the usual variational bound on the negative log-likelihood. This is equivalent to maximizing the evidence lower bound (ELBO) on the log-likelihood, as is common for generative models.
    % TODO: add formula and explain?

    \item \textbf{Conditioning} \\
    The model can be conditioned on additional information to guide the generation process. In image generation, for example, conditioning is used to generate images of a certain class or with certain attributes. This requires some changes to the model architecture and the training process.
    % TODO: add more information about conditioning
\end{itemize}
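
The forward process only requires the variance schedule. The Python sketch below assumes a linear schedule from $10^{-4}$ to $0.02$ over $T = 1000$ steps, the choice made in the DDPM paper \cite{ho2020denoising}. Other schedules can be substituted. Besides applying $q(\mathbf{x}_t|\mathbf{x}_{t-1})$ step by step, the sketch uses the closed form $q(\mathbf{x}_t|\mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t; \sqrt{\bar{\alpha}_t}\mathbf{x}_0, (1-\bar{\alpha}_t)\mathbf{I})$ with $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$. This closed form follows from composing the Gaussian transitions and allows $\mathbf{x}_t$ to be sampled for any $t$ in a single step. The function names and the toy data are hypothetical.

\begin{verbatim}
```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Variance schedule beta_1, ..., beta_T, stored at indices 0, ..., T-1."""
    return np.linspace(beta_start, beta_end, T)


def forward_step(x_prev, beta_t, rng):
    """One transition q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise


def q_sample(x0, t, betas, rng):
    """Sample x_t directly from x_0 with the closed form (t is 1-based)."""
    alpha_bar_t = np.prod(1.0 - betas[:t])
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


betas = linear_beta_schedule()
rng = np.random.default_rng(0)
x0 = np.full(20_000, 2.0)  # many copies of one toy data point

x = x0.copy()
for t in range(1, 201):  # 200 explicit forward steps
    x = forward_step(x, betas[t - 1], rng)
x_direct = q_sample(x0, 200, betas, rng)

# Both routes should give approximately the same mean and variance
print(float(x.mean()), float(x_direct.mean()))
print(float(x.var()), float(x_direct.var()))
```
\end{verbatim}
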

Figure \ref{fig:diffusion_process} shows the diffusion process as a directed graphical model. The model is trained to reverse this process: starting from noise, it learns to generate samples that resemble the data.
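
In practice, the DDPM paper \cite{ho2020denoising} trains on a simplified, reweighted version of the variational bound mentioned above. A network $\epsilon_{\theta}(\mathbf{x}_t, t)$ is trained to predict the noise that was added to obtain $\mathbf{x}_t$ from $\mathbf{x}_0$. The loss is the mean squared error between the true and the predicted noise at a uniformly sampled step $t$. The Python sketch below computes this loss for one batch. It assumes a linear variance schedule and uses a hypothetical placeholder \texttt{zero\_model}, which always predicts zero noise, instead of a real (conditional) network. The gradient update itself is left to a deep-learning framework.

\begin{verbatim}
```python
import numpy as np


def ddpm_simple_loss(eps_model, x0_batch, betas, rng):
    """Simplified DDPM objective: mean of || eps - eps_model(x_t, t) ||^2.

    x0_batch: array of shape (batch, n_features), e.g. one day of
    quarter-hourly NRV values per row (an assumption for illustration).
    eps_model: callable (x_t, t) -> predicted noise with the shape of x_t.
    """
    T = len(betas)
    alpha_bars = np.cumprod(1.0 - betas)          # alpha_bar_t for t = 1..T
    batch = x0_batch.shape[0]
    t = rng.integers(1, T + 1, size=batch)        # uniform t in {1, ..., T}
    a = alpha_bars[t - 1][:, None]                # broadcast over features
    eps = rng.standard_normal(x0_batch.shape)
    x_t = np.sqrt(a) * x0_batch + np.sqrt(1.0 - a) * eps
    eps_pred = eps_model(x_t, t)
    return float(np.mean((eps - eps_pred) ** 2))


def zero_model(x_t, t):
    """Hypothetical placeholder network that always predicts zero noise."""
    return np.zeros_like(x_t)


betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
x0_batch = rng.normal(size=(256, 96))
loss = ddpm_simple_loss(zero_model, x0_batch, betas, rng)
```
\end{verbatim}

With the zero placeholder, the loss is close to 1, the variance of the standard Gaussian noise; a trained network should reach a lower value.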

\begin{figure}[h]
    \centering
    \includegraphics[width=0.8\textwidth]{images/diffusion/diffusion_graphical_model.png}
    % TODO: fix citation
    \caption[Diffusion process]{Diffusion process (adapted from \cite{ho2020denoising}).}
    \label{fig:diffusion_process}
\end{figure}
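
Sampling runs the reverse process from pure noise. Consider a network $\epsilon_{\theta}(\mathbf{x}_t, t)$ that predicts the added noise. With such a network, the DDPM paper \cite{ho2020denoising} computes the mean of $p_{\theta}(\mathbf{x}_{t-1}|\mathbf{x}_t)$ as $\mu_{\theta}(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\epsilon_{\theta}(\mathbf{x}_t, t)\right)$, with $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$. It fixes the covariance to $\beta_t\mathbf{I}$, one of the two choices considered in that paper. The Python sketch below implements this ancestral sampling loop. The linear schedule is an assumption, and \texttt{zero\_model} is a hypothetical placeholder for a trained network. The generated arrays therefore only have the right shape and carry no meaningful NRV information.

\begin{verbatim}
```python
import numpy as np


def ddpm_sample(eps_model, shape, betas, rng):
    """Ancestral sampling x_T -> x_0 with fixed variance sigma_t^2 = beta_t."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)                    # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):                # t = T, ..., 1
        i = t - 1                                     # 0-based array index
        eps_pred = eps_model(x, t)
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_pred) / np.sqrt(alphas[i])
        if t > 1:
            x = mean + np.sqrt(betas[i]) * rng.standard_normal(shape)
        else:
            x = mean                                  # no noise on the final step
    return x


def zero_model(x_t, t):
    """Hypothetical placeholder network that always predicts zero noise."""
    return np.zeros_like(x_t)


betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
samples = ddpm_sample(zero_model, (8, 96), betas, rng)  # 8 hypothetical days
```
\end{verbatim}
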