Reward Function

A reward is calculated and returned each time citylearn.citylearn.CityLearnEnv.step() is called. The reward time series is also accessible through the citylearn.citylearn.CityLearnEnv.rewards property.
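For example, the reward can be read directly from the value returned by step() or from the rewards property afterwards. The snippet below is a minimal sketch; it assumes 'schema.json' is a placeholder path to a valid schema and Gymnasium-style reset()/step() return values (older CityLearn versions return a four-value tuple from step()):

from citylearn.citylearn import CityLearnEnv

# 'schema.json' is a placeholder path to a CityLearn schema.
env = CityLearnEnv('schema.json', central_agent=True)
observations, _ = env.reset()

# Step once with a randomly sampled action for the single central agent.
actions = [env.action_space[0].sample()]
observations, reward, terminated, truncated, info = env.step(actions)

print(reward)       # reward for this transition, as returned by step()
print(env.rewards)  # reward time series accumulated so far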

CityLearn provides the following pre-defined reward functions:

citylearn.reward_function.RewardFunction

\[\textrm{min}(-e, 0)\]

citylearn.reward_function.MARL

\[\textrm{sign}(-e) \times 0.01 \times e^2 \times \textrm{max}(0, E)\]

citylearn.reward_function.IndependentSACReward

\[\textrm{min}(-e^3, 0)\]

citylearn.reward_function.SolarPenaltyReward

\[\sum_{i=0}^n - \Big( \Big(1 + \frac{e}{|e|} \times \textrm{storage}_{i}^{\textrm{SoC}} \Big) \times |e| \Big)\]

citylearn.reward_function.ComfortReward

\[\begin{cases} -|T_{in} - T_{spt}|^3, \quad \textrm{if} \ T_{in} < (T_{spt} - T_{b}) \ \textrm{and cooling} \\ -|T_{in} - T_{spt}|^2, \quad \textrm{if} \ T_{in} < (T_{spt} - T_{b}) \ \textrm{and heating} \\ -|T_{in} - T_{spt}|, \quad \textrm{if} \ (T_{spt} - T_{b}) \le T_{in} < T_{spt} \ \textrm{and cooling} \\ 0, \quad \textrm{if} \ (T_{spt} - T_{b}) \le T_{in} < T_{spt} \ \textrm{and heating} \\ 0, \quad \textrm{if} \ T_{spt} \le T_{in} \le (T_{spt} + T_{b}) \ \textrm{and cooling} \\ -|T_{in} - T_{spt}|, \quad \textrm{if} \ T_{spt} \le T_{in} \le (T_{spt} + T_{b}) \ \textrm{and heating} \\ -|T_{in} - T_{spt}|^2, \quad \textrm{if} \ (T_{spt} + T_{b}) < T_{in} \ \textrm{and cooling} \\ -|T_{in} - T_{spt}|^3, \quad \textrm{otherwise} \end{cases}\]

Where \(e\) is a building's net electricity consumption, \(T_{in}\) is a building's indoor dry-bulb temperature, \(T_{spt}\) is the indoor dry-bulb temperature setpoint, \(T_{b}\) is the comfort band around the setpoint, and \(E\) is the district's net electricity consumption. These rewards are defined for a decentralized, single-building application; for a centralized agent controlling all buildings, the reward is the sum of the decentralized values.
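As an illustration of how these equations map to code, the following is a standalone sketch of the default \(\textrm{min}(-e, 0)\) reward and its centralized sum as described above; it re-implements the math for clarity and is not the library's internal code:

from typing import List

def default_reward(net_electricity_consumption: List[float], central_agent: bool) -> List[float]:
    """min(-e, 0) per building; summed over buildings for a central agent."""
    rewards = [min(-e, 0.0) for e in net_electricity_consumption]
    return [sum(rewards)] if central_agent else rewards

# Two buildings: one importing (positive e) and one exporting (negative e) electricity.
print(default_reward([2.5, -1.0], central_agent=False))  # [-2.5, 0.0]
print(default_reward([2.5, -1.0], central_agent=True))   # [-2.5]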

How to Point to the Reward Function

The reward function to use in a simulation is defined in the reward_function key of the schema:

{
   ...,
   "reward_function": {
      "type": "citylearn.reward_function.RewardFunction",
      ...
   },
   ...
}
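When the environment is constructed from this schema, the class named in type is imported and instantiated for the simulation. A minimal sketch of checking which reward function was loaded, assuming the environment exposes it through the reward_function attribute:

from citylearn.citylearn import CityLearnEnv

# 'schema.json' is a placeholder path to the schema shown above.
env = CityLearnEnv('schema.json')
print(type(env.reward_function))  # e.g. <class 'citylearn.reward_function.RewardFunction'>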

How to Define a Custom Reward Function

CityLearn also allows for custom reward functions, defined by inheriting from the base class citylearn.reward_function.RewardFunction:

from typing import Any, List, Mapping, Union
from citylearn.reward_function import RewardFunction

class CustomReward(RewardFunction):
    """Calculates custom user-defined multi-agent reward.

    Reward is the :py:attr:`net_electricity_consumption_emission`
    for the entire district if using a central agent, otherwise it is the
    :py:attr:`net_electricity_consumption_emission` of each building.

    Parameters
    ----------
    env_metadata: Mapping[str, Any]
        General static information about the environment.
    """

    def __init__(self, env_metadata: Mapping[str, Any]):
        super().__init__(env_metadata)

    def calculate(self, observations: List[Mapping[str, Union[int, float]]]) -> List[float]:
        r"""Calculates reward.

        Parameters
        ----------
        observations: List[Mapping[str, Union[int, float]]]
            List of all building observations at the current :py:attr:`citylearn.citylearn.CityLearnEnv.time_step`,
            obtained by calling :py:meth:`citylearn.building.Building.observations`.

        Returns
        -------
        reward: List[float]
            Reward for transition to current timestep.
        """

        net_electricity_consumption_emission = [o['net_electricity_consumption_emission'] for o in observations]

        if self.central_agent:
            reward = [-sum(net_electricity_consumption_emission)]
        else:
            reward = [-v for v in net_electricity_consumption_emission]

        return reward
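The calculate() method can be sanity-checked outside of a simulation with mock observations. The sketch below assumes that, for this standalone check, the environment metadata only needs a central_agent entry; in a real simulation the environment supplies the full metadata itself:

# Mock metadata and observations for a standalone check of CustomReward.
reward_function = CustomReward(env_metadata={'central_agent': False})
observations = [
    {'net_electricity_consumption_emission': 1.2},
    {'net_electricity_consumption_emission': 0.4},
]
print(reward_function.calculate(observations))  # [-1.2, -0.4]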

The schema must then be updated to reference the custom reward function:

{
   ...,
   "reward_function": {
      "type": "custom_module.CustomReward",
      ...
   },
   ...
}
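Note that the module referenced in type (here, the illustrative custom_module) must be importable when the environment is constructed from the schema, for example a custom_module.py in the working directory or elsewhere on the Python path; otherwise the class named in the schema cannot be resolved.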