
\section{Policies}
\label{sec:policies}
Organizations that own a battery and are active in the electricity market must decide when to charge and discharge it. These decisions depend on the current state of the battery, the current state of the market, and the expected future state of the market, which can be predicted using generative models such as those discussed in the previous sections. The goal is to maximize profit by buying electricity when it is cheap and selling it when it is expensive. A policy is the rule that determines when the battery is charged and discharged. A policy should also keep the battery in a healthy state: charging and discharging a battery too often can reduce its lifetime, so the policy has to take this into account.

In this thesis, a simple policy is used to optimize the profit made by charging and discharging a battery. The policy is based on imbalance price predictions for the next day, which are reconstructed from the generated full-day NRV samples. This demonstrates the potential of generated NRV samples for optimizing such a policy. In practice, more complex policies can be used, for example policies trained with reinforcement learning or other optimization techniques. Several baseline policies are defined, against which the policy based on NRV predictions is compared.

\subsection{Baselines}
% Baseline fixed thresholds
The simplest baseline policy uses two fixed thresholds, one for charging and one for discharging the battery. These thresholds can be determined from historical imbalance prices: a simple grid search selects the pair of thresholds that maximizes the profit on the historical data. During this optimization, a penalty parameter can be added to the profit function to penalize excessive charging and discharging of the battery.
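
As an illustration, the threshold baseline can be written down as follows. The notation is an assumption introduced here for clarity and is not taken from the implementation: $p_t$ is the imbalance price at time step $t = 1, \dots, T$, $\tau_c \le \tau_d$ are the charge and discharge thresholds, $c_t \ge 0$ and $d_t \ge 0$ are the amounts of energy charged and discharged at time step $t$, and $\lambda \ge 0$ is the penalty parameter. The battery follows the rule
\[
a_t(\tau_c, \tau_d) =
\left\{
\begin{array}{ll}
\mbox{charge}    & \mbox{if } p_t < \tau_c \mbox{ and the battery is not full,} \\
\mbox{discharge} & \mbox{if } p_t > \tau_d \mbox{ and the battery is not empty,} \\
\mbox{idle}      & \mbox{otherwise,}
\end{array}
\right.
\]
which fixes $c_t$ and $d_t$ for a given pair of thresholds. The exact form of the penalty is not fixed by the description above. One plausible choice, assumed here, penalizes the total energy throughput, so that the grid search solves
\[
(\tau_c^{\ast}, \tau_d^{\ast}) = \arg\max_{(\tau_c, \tau_d) \in \mathcal{G}}
\left[ \sum_{t=1}^{T} p_t \, (d_t - c_t) \; - \; \lambda \sum_{t=1}^{T} (c_t + d_t) \right],
\]
where $\mathcal{G}$ is a finite grid of candidate threshold pairs with $\tau_c \le \tau_d$. Setting $\lambda = 0$ recovers pure profit maximization, while larger values of $\lambda$ favor thresholds that lead to fewer charge cycles.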

% Baseline thresholds determined on the previous day
Another baseline policy determines the charge and discharge thresholds from the NRV of the previous day. It assumes that the NRV of the next day will be similar to that of the previous day. The NRV of the previous day is therefore used as the NRV prediction for the next day, from which the imbalance prices are reconstructed. The thresholds are then determined by a simple grid search over the reconstructed imbalance prices. The same penalty parameter can be added to the profit function to reduce the number of charge cycles of the battery.
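
In notation introduced here only for illustration, this baseline amounts to a persistence forecast. Writing $\mathrm{NRV}_{d,t}$ for the NRV on day $d$ at time step $t$, the prediction for the day being planned is
\[
\widehat{\mathrm{NRV}}_{d,t} = \mathrm{NRV}_{d-1,t}, \qquad t = 1, \dots, T,
\]
and the grid search is run on the imbalance prices reconstructed from $\widehat{\mathrm{NRV}}_{d,1}, \dots, \widehat{\mathrm{NRV}}_{d,T}$.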

\subsection{Policies based on NRV generations}
A simple policy can be defined that uses multiple predictions of the NRV of the next day. First, multiple full-day NRV samples are generated using a generative model. Each sample is treated as a prediction of the NRV of the next day, and the imbalance prices are reconstructed for each of them. For each reconstructed imbalance price series, the charge and discharge thresholds are determined using a simple grid search, as in the baseline policies. The final thresholds for the next day are obtained by taking the mean of the optimal thresholds over all samples. This policy also uses the penalty parameter to reduce the number of charge cycles of the battery.
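
The averaging step can be made explicit. In notation assumed here for illustration, let $N$ be the number of generated samples and let $(\tau_c^{(i)}, \tau_d^{(i)})$ be the charge and discharge thresholds found by the grid search on the imbalance prices reconstructed from sample $i$. The thresholds applied on the next day are then
\[
\bar{\tau}_c = \frac{1}{N} \sum_{i=1}^{N} \tau_c^{(i)},
\qquad
\bar{\tau}_d = \frac{1}{N} \sum_{i=1}^{N} \tau_d^{(i)}.
\]
If every per-sample pair satisfies $\tau_c^{(i)} \le \tau_d^{(i)}$, the averaged pair satisfies $\bar{\tau}_c \le \bar{\tau}_d$ as well, so the resulting policy never charges and discharges at the same price.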