Reward Function
A reward is calculated and returned each time citylearn.citylearn.CityLearnEnv.step()
is called. The reward time series is also accessible through the citylearn.citylearn.CityLearnEnv.rewards
property.
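A minimal sketch of stepping the environment and reading back the reward time series. The dataset name, the random actions, and the use of central_agent=True are illustrative only; substitute your own schema and control logic, and note that the exact step() return signature can vary across CityLearn and Gym versions:

from citylearn.citylearn import CityLearnEnv

# 'citylearn_challenge_2022_phase_1' is assumed to be an available dataset name;
# replace it with the path to your own schema.json if needed.
env = CityLearnEnv('citylearn_challenge_2022_phase_1', central_agent=True)
env.reset()

for _ in range(24):  # one simulated day of hourly time steps
    actions = [s.sample() for s in env.action_space]  # random actions, for illustration only
    env.step(actions)  # a reward is calculated and returned at every step

# the reward time series accumulated so far
print(env.rewards)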
CityLearn provides the following built-in reward functions, each listed with its class and equation:

citylearn.reward_function.RewardFunction

\[\textrm{min}(-e, 0)\]

citylearn.reward_function.MARL

\[\textrm{sign}(-e) \times 0.01 e^2 \times \textrm{max}(0, E)\]

citylearn.reward_function.IndependentSACReward

\[\textrm{min}(-e^3, 0)\]

citylearn.reward_function.SolarPenaltyReward

\[\sum_{i=0}^n - \Big( \Big(1 + \frac{e}{|e|} \times \textrm{storage}_{i}^{\textrm{SoC}} \Big) \times |e| \Big)\]

citylearn.reward_function.ComfortReward

\[\begin{split}\begin{cases}
-|T_{in} - T_{spt}|^3, \quad \textrm{if} \ T_{in} < (T_{spt} - T_{b}) \ \textrm{and cooling} \\
-|T_{in} - T_{spt}|^2, \quad \textrm{if} \ T_{in} < (T_{spt} - T_{b}) \ \textrm{and heating} \\
-|T_{in} - T_{spt}|, \quad \textrm{if} \ (T_{spt} - T_{b}) \le T_{in} < T_{spt} \ \textrm{and cooling} \\
0, \quad \textrm{if} \ (T_{spt} - T_{b}) \le T_{in} < T_{spt} \ \textrm{and heating} \\
0, \quad \textrm{if} \ T_{spt} \le T_{in} \le (T_{spt} + T_{b}) \ \textrm{and cooling} \\
-|T_{in} - T_{spt}|, \quad \textrm{if} \ T_{spt} \le T_{in} \le (T_{spt} + T_{b}) \ \textrm{and heating} \\
-|T_{in} - T_{spt}|^2, \quad \textrm{if} \ (T_{spt} + T_{b}) < T_{in} \ \textrm{and cooling} \\
-|T_{in} - T_{spt}|^3, \quad \textrm{otherwise}
\end{cases}\end{split}\]
Where \(e\) is a building's net electricity consumption, \(T_{in}\) is a building's indoor dry-bulb temperature, \(T_{spt}\) is a building's indoor dry-bulb temperature setpoint, \(T_{b}\) is the comfort band around that setpoint, and \(E\) is the district's net electricity consumption. These rewards are defined for a decentralized, single-building application. For a centralized agent controlling all buildings, the reward is the sum of the decentralized values.
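As a purely numeric illustration of the default \(min(-e, 0)\) reward and of how the centralized value is the sum of the decentralized ones (the consumption figures below are made up):

# Hypothetical hourly net electricity consumption per building, in kWh
# (a negative value means the building exported more energy than it consumed).
building_net_consumption = [2.5, -1.0, 0.8]

# decentralized setup: one reward per building, using the default min(-e, 0)
decentralized = [min(-e, 0) for e in building_net_consumption]  # [-2.5, 0, -0.8]

# centralized setup: a single reward equal to the sum of the decentralized values
centralized = [sum(decentralized)]  # [-3.3]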
How to Point to the Reward Function
The reward function to use in a simulation is defined in the reward_function key of the schema:
{
    ...,
    "reward_function": {
        "type": "citylearn.reward_function.RewardFunction",
        ...
    },
    ...
}
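The type value is the fully qualified import path of the reward class, i.e. <module path>.<class name>. A minimal sketch of reading it back from the schema, assuming the schema is saved locally as schema.json:

import json

# hypothetical local schema file containing the block above
with open('schema.json') as f:
    schema = json.load(f)

print(schema['reward_function']['type'])  # citylearn.reward_function.RewardFunction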
How to Define a Custom Reward Function
CityLearn also allows for custom reward functions that inherit from the base citylearn.reward_function.RewardFunction class:
from typing import Any, List, Mapping, Union

from citylearn.reward_function import RewardFunction

class CustomReward(RewardFunction):
    """Calculates a custom user-defined multi-agent reward.

    The reward is the :py:attr:`net_electricity_consumption_emission`
    of the entire district in a central agent setup, otherwise it is the
    :py:attr:`net_electricity_consumption_emission` of each building.

    Parameters
    ----------
    env_metadata: Mapping[str, Any]
        General static information about the environment.
    """

    def __init__(self, env_metadata: Mapping[str, Any]):
        super().__init__(env_metadata)

    def calculate(self, observations: List[Mapping[str, Union[int, float]]]) -> List[float]:
        r"""Calculates the reward.

        Parameters
        ----------
        observations: List[Mapping[str, Union[int, float]]]
            List of all building observations at the current :py:attr:`citylearn.citylearn.CityLearnEnv.time_step`, obtained by calling :py:meth:`citylearn.building.Building.observations`.

        Returns
        -------
        reward: List[float]
            Reward for the transition to the current time step.
        """

        net_electricity_consumption_emission = [o['net_electricity_consumption_emission'] for o in observations]

        if self.central_agent:
            # single reward for the whole district: negative sum of building emissions
            reward = [-sum(net_electricity_consumption_emission)]
        else:
            # one reward per building: negative of that building's emission
            reward = [-v for v in net_electricity_consumption_emission]

        return reward
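A quick, illustrative sanity check of CustomReward outside a full simulation. The minimal env_metadata dictionary and its 'central_agent' key are assumptions about what the base class reads; in a real run the environment supplies this metadata itself:

# illustrative only: the base class is assumed to expose `central_agent` from the
# metadata it is constructed with; a real simulation provides much richer metadata.
reward_function = CustomReward(env_metadata={'central_agent': False})

observations = [
    {'net_electricity_consumption_emission': 1.2},  # building 1
    {'net_electricity_consumption_emission': 0.4},  # building 2
]
print(reward_function.calculate(observations))  # expected: [-1.2, -0.4]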
The schema must then be updated to reference the custom reward function:
{
    ...,
    "reward_function": {
        "type": "custom_module.CustomReward",
        ...
    },
    ...
}
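For the type value to resolve, the module that defines CustomReward (custom_module here, a name used only for illustration) must be importable when the environment is constructed, for example as a custom_module.py on the Python path. A minimal check, assuming the environment exposes the instantiated reward function as reward_function:

from citylearn.citylearn import CityLearnEnv

# hypothetical schema path whose "reward_function" block names custom_module.CustomReward
env = CityLearnEnv('path/to/schema.json')
print(type(env.reward_function).__name__)  # expected: CustomReward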