In this practical work, you will have to manage a battery in order to recoversubways braking electrical energy and to supply it to a subway station. Your goal isto minimize the cost of energy consumed on the national grid by exploiting optimallythis regenerative braking energy. First you will compute the gains realized whenthe battery is controlled according to an intuitive strategy. Then you will comparetwo methods, Stochastic Dynamic Programming and Model Predictive Control. Bothmethods provide a control for the battey at each time step these methods don’t use thesame amount of information about the past randomness realization. And they don’tuse their information the same way. You will compare throughout this work diﬀerentways to use past information to compute controls aware of the future uncertainties.

1 Problem statement and data

Subway stations represent one third of the Paris subway system energy consumption.
There are unexploited energy ressources in subway stations that could be harvested to
lower this energy consumption. We could for example recover braking energy from the
trains. Currently trains can dissipate their kinetic energy by braking mechanically or
convert this energy into electricity. However supply/demand equilibrium have to be
ensured on the subway line. Hence a train can brake electrically only if there is a train
accelerating/consuming energy nearby. There is therefore a signiﬁcant amount of potential
energy that is currently wasted. We could overcome this problem with an electrical
storage.

We consider a subway station equiped with a battery that can charge braking energy and
national grid energy on one side (subway line side) and discharge energy to power the
subway station which is also powered by the national grid (subway station side). A
battery manager has to control this battery in real time. The station demand and the cost
of electricity are assumed deterministic. Braking energy available at each time step is
uncertain.

Ut−DtUt+Wt+1EtsEt+1l

Figure 1: Electrical problem representation

Power Line Icon made by Freepik from www.ﬂaticon.com is licensed by Creative Commons BY 3.0

We present hereby the stochastic optimization problem we want to solve:

We detail all notations and equations in the following.

1.1 Battery dynamics

The battery can store power coming from the trains or the national grid and supply power to the
subway station. It cannot do both things at the same time. We control its state of charge (soc). We
model the dynamics of the battery at each time step t by a function ft we call the dynamic at time
t giving equation (1b) :

where

t ∈ 𝕋 := {0, 1,…,T − 1} is discrete (minutes in this case) and t denotes the beginning
of the period [t,t + 1[. We call T the horizon of the problem.

St state of charge (kWh) of the battery at the beginning of period [t,t + 1[. It belongs
to a set 𝕏 = [Smin,Smax] representing the minimum and maximum state of charge of
the battery.

Ut energy (kWh) of battery charge or discharge during [t,t+1[, decided at the beginning
of time period [t,t + 1[. It belongs to a space 𝕌 = [Umin,Umax] modeling ramp limits
for the charge or discharge.

Ut− = − min (0,Ut) energy we discharge from the battery

Ut+ = max (0,U(t)) energy we charge into the battery

ρ is the charge and discharge eﬃciency (%)

The initial state of charge of the battery is called x0 and appears in the equality constraint
(1g)

1.1.1 Variables deﬁnition

We deﬁne hereby the remaining variables of the problem presented in ﬁgure1

Ets energy (kWh) we pay to the national grid to power the station during [t,t + 1[

Etl energy (kWh) we pay to the national grid to charge the battery during [t,t + 1[

Dt energy (kWh) consumed by the subway station during [t,t + 1[

Wt braking energy (kWh) available on the subway line during [t,t + 1[. This quantity
is a random variable

We observe that charge/discharge decision Ut is taken at the beginning of [t,t + 1[, beforeknowing the available braking energy Wt. We are in a decision-hazard setting.

1.1.2 Constraints

We have to ensure the supply/demand equilibrium at each node and each time step on the subway
station node which leads to equation:

(2)

On the subway line node we have to ensure supply/demand balance however we are not forced
to recover all the regenerative braking energy Wt. We have then an inequality constraint
(1d):

We prevent the battery manager to sell power to the grid then we have to ensure:

1.2 Criterion: intertemporal payoﬀ

At each time t we pay to the national grid operator the quantity Etl + Ets of energy. We assume
that the price of electricity ct per kWh is known in advance as it is negociated and constant during
[t,t + 1[.

At each time t we pay the instantaneous cost:

Using equation (2) we can eliminate the variable Ets leading to the instantaneous
cost:

At the end of the horizon we have no preference concerning the ﬁnal state of charge of the
battery. We introduce then a ﬁnal cost function K : 𝕏 → ℝ such as:

From 0 to T the payoﬀs sum up to:

(3)

The manager wants to minize these payoﬀs every day without knowing the future availability of
braking energy. He will therefore minimize them on average, hopefully obtaining the best mean
payoﬀ he can obtain after several days. The manager wants to minimize the expected
payoﬀ:

(4)

The manager wants to solve problem (1). We detail the last constraint (1i) in the
following.

1.3 Numerical data and model

1.3.1 Constants

We deﬁne, in ﬁle constants.jl, some numerical data required to model the problem.

Question 1Comment each line by deﬁning the constant after the #. What is the number oftime steps of the problem? The number of time steps in one hour? Include the ﬁle constants.jl
in your script or notebook. Plot the cost of electricity as well as the consumption of thestation during the day. Compute the cost of the station’s energy consumption over the wholetime horizon when it consumes exclusively on the national grid.

1.3.2 Braking energy scenarios

We provide a code function, deﬁned in scenarios_generation.jl , that allows to compute a given
number of braking energy scenarios over the time horizon.

Question 2Include scenarios_generation.jl in your script or notebook. Use this function togenerate 50 regenerative braking scenarios and save the 2 outputs. The ﬁrst output containsthe braking energy scenarios, we leave the second output until the end. Based on this 50scenarios what is the average energy available over a day? We will call these 50 scenariosthe assessment scenarios thoughout the work. We will refer to the second output as theξ-assessment scenarios.

Question 3What is the number of state variables? What is the number of control variables?Verify that the code functions correspond the real functions of the problem.

2 Evaluation with given policies

The manager wants to be able to make a decision that takes into account the future uncertainties
but that use the information about the past uncertainties realization. This is the meaning of the
last constraint (1i) of problem (1) we did not yet explained.

2.1 Non-anticipativity constraint

The constraint Ut = πt(W0,...,Wt−1) states that the decision at time t, Ut, must depend only on
the past uncertainties of the problem W0,...,Wt−1.

When the Wt are time independent we can restrict the search to functions of the current state
of the problem. The non anticipativity constraint becomes Ut = πt(St). When the Wt are not time
independent we can still restrict the search to functions of the state. The controls obtained are
not guaranteed to be optimal but they may be good enough and are often easier to
compute.

An admissible state feedback policy π : 𝕋 × 𝕏 → 𝕌 assigns a charge or discharge
instruction Ut = πt(St) ∈ 𝕌 to any time t ∈ 𝕋 and to any state of charge St∈ 𝕏,
while respecting the problem constraints. Given an admissible policy π and given a
braking energy scenario W(⋅) = W0,…,WT−1, we are able to build a battery state of
charge trajectory S(⋅) := S0,…,ST−1,ST and a charge/discharge control trajectory
U(⋅) := U0,…,UT−1 produced by the “closed-loop” dynamics initialized at the initial time 0
by

We also obtain the payoﬀ associated to the braking energy generation scenario W1,...,WT:

where the expectation 𝔼 is taken with respect to the product probability ℙ. The true expected value (7) is
diﬃcult to compute,1
and we evaluate it by the Monte Carlo method using Ns braking energy scenarios W1(⋅),…,WNs(⋅):

(8)

By the law of large numbers, the mean payoﬀ (8) is a “good” approximation of the expected
payoﬀ(7) if the number of scenarios is “large enough”.

2.2 Simulation of a given heuristic

We propose here a code function providing a heuristic strategy base only on state and time
information. At time t this function return a control ut knowing t and xt. This code function is
then a state feedback. Its idea is to charge during peak hours and discharge during oﬀ-peak hours
while ensuring the state and control constraints.

Question 4What are the peak and oﬀ-peak hours? What is the sign of the control returnedby this heuristic according to the time at which it is computed? Explain the logic of thisheuristic.

We provide a code function, in policy_assessment.jl, to assess a policy along multiple scenarios
knowing the instantaneous cost, the ﬁnal cost, the dynamic, the scenarios, the initial state of the
system and the policy.

"""Assess a policy by simulation of multiple scenarios to compute the expectation of the cost# Arguments*`Array{Float64}`: initial state of the system*`Int`: number of stage in the problem*`Function`: Instantaneous cost function*`Function`: Final cost function*`Function`: Dynamics function*`Function`: Policy, function of the time and state returning a control*`Array{Float64}`: Noises scenarios along which we want to simulate# Return*`Float64`: Mean cost obtained by simulation of the heuristic over multiple scenarios*`Float64`: dim 3 with size (T, dim(X_t), N_scenarios) Array containing the state trajectories"""function assess_policy(initial_state, stages_number, inst_cost, fin_cost, dyn,

Question 5Use this function with the previous heuristic on each on the assessmentscenarios. What is the average cost of energy consumption on the national grid obtained?Compare with the origanl cost when there was no battery. Plot the battery state of charge forsome scenarios. Does it correspond to your previous understanding of the heuristic?

3 Stochastic Dynamic Programming resolution

Stochastic Dynamic Programming is a method to solve Stochastic Optimal Control problems. It is
based on the concept of value functions and the so called Bellman’s principle of optimality. The
value function of the problem at time t is deﬁned as:

With K the ﬁnal cost of the problem which is zero in our case. The principle of optimality
states that, when the problem noises are independent, we can write the following recursion between
value functions:

(9)

In practice we use equation (9) in two phases: an oﬄine stage where we use it to
compute value functions, and an online stage where these functions are used to compute a
control.

Oﬄine phase: computation of value functionsFor t from T to 0 compute

(10)

Online: computation of a controlAt time t, knowing the state of the system St compute Ut:

(11)

In practice we compute the value functions on a discretized state space 𝕏d. As it is not possible
to iterate over a continuous set. Similarly we solve problems (19) and (20) by discretizing 𝕌 and
testing every control in 𝕌d the discrete space obtained.

3.1 Markovian assumption, state and control discretization

In order to compute the expectation in problems (19) and (20) we assume that the noises are time
independent and use 1000 braking energy scenarios to compute the marginal laws of the noises at
each time step. We do not use the assessment scenarios in order not to bias the results. The
discrete marginal laws are obtained by quantization as Monte Carlo should be avoided as much as
possible with Dynamic Programming.

#we build the marginal laws w_laws= build_noises_laws(N_W,10);

We choose the size of the discretization step for the state variable and the control
variable.

3.2 Value functions oﬄine computation

Question 7Plot the value function at time 3 over the state of charge of the battery usingthe function provided in plotting_functions.jl. Why is it decreasing? Plot the value functionat time 4300. Why does it display a plateau?

3.3 Using value functions for online control computation

We provide a function, in ﬁle sdp_policies.jl to generate a control at time t knowing the previous
uncertainty realization, the state of the system and the time step.

"""Obtain a control for the battery management problem at time tknowing state x_t using value functions# Arguments*`Int`: time step t*`Array{Float64}`: state of the system at time t# Return* Array{Float64}:control computed by the policy"""function get_control_sdp_x(t, x_t)

This function is a policy that can be plugged in our simulator.

Question 8Simulate battery control using value functions along the assessment scenarios.What is the average cost obtained? Compare with the previous methods. Plot the batterystate of charge during the day. Do you notice an important discrepancy with the heuristicsimulation?

4 Model Predictive Control

Model Predictive Control is a popular method to solve Stochastic Optimal Control problems
arising in many engineering ﬁelds (electrical, mechanical...). Its popularity comes from its
simplicity and accessibility. It consists in making a forecast to get rid of all the future uncertainties
at each time step. We can then solve a deterministic problem which provides a control trajectory.
We apply the ﬁrst control. At the next time step we compute a new forecast and we
reoptimize.

This algorithm has no oﬄine phase.

Online: computation of a controlAt time t, knowing the state of the system St and the past uncertainties W0,..,Wt−1 compute a
forecast wt,..,wT−1 and compute Ut:

(12)

4.1 Deterministic problem resolution

We provide a function to solve a deterministic problem knowing a scenario, the initial state and the
initial time step. This function is in the ﬁle deterministic.jl.

"""Compute the solution of the problem where noises are replaced by nominal values# Arguments*`Array{Float64}`: initial state of the system*`Int`: initial time step at which we solve the problem*`Array{Float64}`: Noises scenario to replace uncertainties# Return*`JuMPmodel`:JuMP model modelling the deterministic problem* Float64:objective value obtained* Array{Float64}:trajectory of the battery state of charge obtained* Array{Float64}:trajectory of the control obteined"""function solve_anticipative_problem(X_0_loc, t0, stages_horizon, forecast)

Question 9Use this function to solve the deterministic problem over the whole horizonwith initial state x0and initial time step 0 for each of the assessment scenarios. Why is theaverage cost lower than the heuristic and dynamic programming one?

4.2 Model Predictive Control simulation

We provide a function to compute a forecast at time t for the next time steps up to the rolling
horizon length knowing only the state at time t.

This function is used in the MPC policy as it consists in making a forecast of the future time
steps and solve a deterministic problem.

Question 10Compute a simulation with the mpc policy for each of the assessmentscenarios. Compare with the previous results.

5 Taking into account non-Markovian structure

We could improve the Model Predictive Control results. Indeed MPC performances depend
signiﬁcantly on the forecast quality. Here we use only information about the state of the system at
time t to compute a forecast assuming all the noises are time independent. We the use the previous
marginal laws to compute the forecast.

The scenario generator we provided follows the following rules:

there is no braking energy during the night as subways are not circulating then
∀t ∈{Tstop,...,Tstart}, Wt = 0

during operating hours we model regenerative braking randomness by an autoregressive
process of order 1:

(13)

with (ξt)t∈𝕋 a sequence of gaussian white noises.

5.1 Improving MPC forecast

The forecast computed at each time step in the MPC strategy could be based on this model and
on the last uncertainty realization. The MPC policy is not anymore a state feedback it lies in the
class of mappings:

Question 11Compute a simulation with this mpc policy for each of the assessmentscenarios. Compare with the previous results.

We can build such kinds of policies in a dynamic programming framework as well.

5.2 Using past informations online in classical SDP

We consider the value functions computed at section ??. During the online phase we can compute
the expectation with respect to Wt by exploiting the stochastic process structure. We know here
that W follows an autoregressive process (or zero during night hours). We know then that:
𝔼[g(Wt)|Wt−1 = wt−1] = 𝔼[g(awt−1 + ξt)]. Knowing the law of ξt we can therefore estimate the
conditional expectation at time t.

Oﬄine phase: computation of value functionsIgnore the dependence between the noises. We keep the same marginal laws for the random
variables. For t from T to 0 compute

(15)

Online: computation of a controlAt each time step t, observing state St and previous uncertainty realization wt compute
Ut:

(16)

We provide a policy of this form using the value functions we generated earlier.

"""Obtain a control the battery management problem at time tknowing state x_t and noise w_{t-1} using value functions# Arguments*`Int`: time step t*`Array{Float64}`: state of the system at time t*`Array{Float64}`: realization of the noises at time t-1# Return* Array{Float64}:control computed by the policy"""function get_control_sdp(t, x_t, w)

Question 12Compute a simulation with this policy for each of the assessment scenarios.Compare with the previous results.

5.3 State augmentation

Using noises correlation information with value functions computed with time independence
assumption mitgh be ineﬃcient in practice.

When the randomness is an order 1 autoregressive process it is possible to improve the
stochastic dynamic programming results by augmenting the state with 1 state variable. W follows
an AR(1) process. We can then take Wt as a state variable, ξ(⋅) being the noises of the problem.
We just add the following dynamic in the problem (1):

(17)

W(⋅) being now a state variable and ξ(⋅) the exogenous noises process. We call the new dynamic Ft
such that:

(18)

Oﬄine phase: computation of value functionsFor t from T to 0 compute

(19)

Online: computation of a controlAt time t, knowing the state of the system St compute Ut:

(20)

We deﬁne a new instantaneous cost code function, a new dynamic, a new state discretization, a
new control discretization and new noises laws.