citylearn.reward_function module

class citylearn.reward_function.ComfortReward(env_metadata: Mapping[str, Any], band: float = None, lower_exponent: float = None, higher_exponent: float = None)

Bases: RewardFunction

Reward for occupant thermal comfort satisfaction.

The reward is the negative of the absolute difference between the setpoint and the indoor dry-bulb temperature, raised to some exponent, when the temperature is outside the comfort band. Within the comfort band, the reward is the negative difference when in cooling mode and the temperature is below the setpoint, or when in heating mode and the temperature is above the setpoint. The reward is 0 within the comfort band when the temperature is above the setpoint in cooling mode or below the setpoint in heating mode.

Parameters:
  • env_metadata (Mapping[str, Any]) – General static information about the environment.

  • band (float, default = 2.0) – Setpoint comfort band (+/-). If not provided, the comfort band time series defined in the building file, or the default time series value of 2.0 is used.

  • lower_exponent (float, default = 2.0) – Penalty exponent for when in cooling mode but temperature is above setpoint upper boundary or heating mode but temperature is below setpoint lower boundary.

  • higher_exponent (float, default = 2.0) – Penalty exponent for when in cooling mode but temperature is below setpoint lower boundary or heating mode but temperature is above setpoint upper boundary.
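The piecewise rule described above can be sketched for a single building as follows. This is a minimal illustration of the documented behavior, not the library's implementation: the helper name `comfort_reward` and the explicit `cooling` flag are assumptions made for the example (the actual class derives the mode and setpoint from the building observations).

```python
def comfort_reward(temperature, setpoint, cooling, band=2.0,
                   lower_exponent=2.0, higher_exponent=2.0):
    """Illustrative per-building comfort reward (hypothetical helper)."""
    delta = abs(temperature - setpoint)
    lower_bound = setpoint - band
    upper_bound = setpoint + band

    if temperature < lower_bound:
        # Below the band: over-cooled in cooling mode (higher_exponent),
        # under-heated in heating mode (lower_exponent).
        exponent = higher_exponent if cooling else lower_exponent
        return -(delta ** exponent)

    if temperature > upper_bound:
        # Above the band: under-cooled in cooling mode (lower_exponent),
        # over-heated in heating mode (higher_exponent).
        exponent = lower_exponent if cooling else higher_exponent
        return -(delta ** exponent)

    if temperature < setpoint:
        # Inside the band, below the setpoint: penalized only when cooling.
        return -delta if cooling else 0.0

    if temperature > setpoint:
        # Inside the band, above the setpoint: penalized only when heating.
        return 0.0 if cooling else -delta

    # Exactly at the setpoint.
    return 0.0


# Cooling mode, 3 degrees above a 24-degree setpoint (outside the band).
print(comfort_reward(27.0, 24.0, cooling=True))
```

With the default exponents, a temperature outside the band is penalized quadratically, while a temperature inside the band is penalized at most linearly.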

property band: float

calculate(observations: List[Mapping[str, float | int]]) → List[float]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().

Returns:

reward – Reward for transition to current timestep.

Return type:

List[float]

property higher_exponent: float

property lower_exponent: float

class citylearn.reward_function.IndependentSACReward(env_metadata: Mapping[str, Any])

Bases: RewardFunction

Recommended for use with the SAC controllers.

Returned reward assumes that the building-agents act independently of each other, without sharing information through the reward.

Parameters:

env_metadata (Mapping[str, Any]) – General static information about the environment.

calculate(observations: List[Mapping[str, float | int]]) → List[float]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().

Returns:

reward – Reward for transition to current timestep.

Return type:

List[float]

class citylearn.reward_function.MARL(env_metadata: Mapping[str, Any])

Bases: RewardFunction

MARL reward function class.

Parameters:

env_metadata (Mapping[str, Any]) – General static information about the environment.

calculate(observations: List[Mapping[str, float | int]]) → List[float]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().

Returns:

reward – Reward for transition to current timestep.

Return type:

List[float]

class citylearn.reward_function.RewardFunction(env_metadata: Mapping[str, Any], exponent: float = None, **kwargs)

Bases: object

Base and default reward function class.

The default reward is the electricity consumption from the grid at the current time step returned as a negative value.

Parameters:
  • env_metadata (Mapping[str, Any]) – General static information about the environment.

  • **kwargs (dict) – Other keyword arguments for custom reward calculation.

calculate(observations: List[Mapping[str, float | int]]) → List[float]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().

Returns:

reward – Reward for transition to current timestep.

Return type:

List[float]
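As a rough sketch of the default behavior described for this class, the reward can be computed from a list of observation mappings as below. Several details are assumptions made for illustration and are not confirmed by this page: the observation key `net_electricity_consumption`, the clipping of net export to zero (so only grid import is penalized), and the summation into a single value when one central agent controls all buildings. The function name `default_reward` is hypothetical.

```python
from typing import List, Mapping, Union


def default_reward(
    observations: List[Mapping[str, Union[int, float]]],
    central_agent: bool = False,
) -> List[float]:
    """Illustrative default reward: negative grid consumption per building."""
    # Assumed key name; only consumption from the grid (positive values) counts.
    grid_consumption = [
        max(o['net_electricity_consumption'], 0.0) for o in observations
    ]
    rewards = [-c for c in grid_consumption]

    # Assumption: a central agent receives one summed reward for all buildings.
    return [sum(rewards)] if central_agent else rewards


observations = [
    {'net_electricity_consumption': 1.5},   # importing from the grid
    {'net_electricity_consumption': -0.5},  # exporting to the grid
]
print(default_reward(observations))
print(default_reward(observations, central_agent=True))
```

A custom reward would typically follow the same contract: take the list of building observations and return a list of floats.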

property central_agent: bool

Expect 1 central agent to control all buildings.

property env_metadata: Mapping[str, Any]

General static information about the environment.

property exponent: float

reset()

Use to reset variables at the start of an episode.

class citylearn.reward_function.SolarPenaltyAndComfortReward(env_metadata: Mapping[str, Any], band: float = None, lower_exponent: float = None, higher_exponent: float = None, coefficients: Tuple = None)

Bases: RewardFunction

Addition of citylearn.reward_function.SolarPenaltyReward and citylearn.reward_function.ComfortReward.

Parameters:
  • env_metadata (Mapping[str, Any]) – General static information about the environment.

  • band (float, default = 2.0) – Setpoint comfort band (+/-). If not provided, the comfort band time series defined in the building file, or the default time series value of 2.0 is used.

  • lower_exponent (float, default = 2.0) – Penalty exponent for when in cooling mode but temperature is above setpoint upper boundary or heating mode but temperature is below setpoint lower boundary.

  • higher_exponent (float, default = 3.0) – Penalty exponent for when in cooling mode but temperature is below setpoint lower boundary or heating mode but temperature is above setpoint upper boundary.

  • coefficients (Tuple, default = (1.0, 1.0)) – Coefficients for citylearn.reward_function.SolarPenaltyReward and citylearn.reward_function.ComfortReward values respectively.
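The combination described above amounts to a weighted sum of the two per-building reward lists. The sketch below illustrates that idea only; the helper name `combine_rewards` is hypothetical, and the two input lists stand in for the values that the solar penalty and comfort rewards would return.

```python
def combine_rewards(solar_rewards, comfort_rewards, coefficients=(1.0, 1.0)):
    """Illustrative weighted sum of two per-building reward lists."""
    # First coefficient weights the solar penalty, second the comfort reward.
    solar_weight, comfort_weight = coefficients
    return [
        solar_weight * s + comfort_weight * c
        for s, c in zip(solar_rewards, comfort_rewards)
    ]


# Two buildings, equal weighting of both reward terms.
print(combine_rewards([-1.0, -2.0], [-0.5, 0.0]))
```

Raising the second coefficient relative to the first shifts the agent's priority toward thermal comfort.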

calculate(observations: List[Mapping[str, float | int]]) → List[float]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().

Returns:

reward – Reward for transition to current timestep.

Return type:

List[float]

property coefficients: Tuple

property env_metadata: Mapping[str, Any]

General static information about the environment.

class citylearn.reward_function.SolarPenaltyReward(env_metadata: Mapping[str, Any])

Bases: RewardFunction

The reward is designed to minimize electricity consumption and maximize solar generation to charge energy storage systems.

The reward is calculated for each building i and summed to provide the agent with a reward that is representative of the building or buildings (in the centralized case) it controls. It encourages net-zero energy use by penalizing grid load satisfaction when there is energy in the energy storage systems, as well as penalizing net export when the energy storage systems are not fully charged, through the penalty term. There is neither penalty nor reward when the energy storage systems are fully charged during net export to the grid. In contrast, the penalty is maximized when the energy storage systems are charged to capacity and there is net import from the grid.
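One penalty term consistent with this description is -(1 + sign(e) * soc) * |e|, where e is the building's net electricity consumption and soc is a storage system's state of charge as a fraction between 0 and 1. The sketch below applies that term per storage system and sums the results. The exact form, the helper name `solar_penalty`, and its arguments are assumptions for illustration; this page does not state the formula.

```python
def solar_penalty(net_electricity_consumption, storage_socs):
    """Illustrative solar penalty for one building (hypothetical helper).

    net_electricity_consumption: positive for net import, negative for net export.
    storage_socs: state of charge of each storage system, assumed in [0, 1].
    """
    e = net_electricity_consumption
    # Sign of e without external dependencies: 1, 0, or -1.
    sign = (e > 0) - (e < 0)
    return sum(-(1.0 + sign * soc) * abs(e) for soc in storage_socs)


# Net import with a fully charged storage system: the penalty factor is largest.
print(solar_penalty(2.0, [1.0]))
# Net export with a fully charged storage system: neither penalty nor reward.
print(solar_penalty(-2.0, [1.0]))
```

Under this form the multiplier on |e| ranges from 0 (full storage during export) to 2 (full storage during import), which matches the two extreme cases stated above.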

Parameters:

env_metadata (Mapping[str, Any]) – General static information about the environment.

calculate(observations: List[Mapping[str, float | int]]) → List[float]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().

Returns:

reward – Reward for transition to current timestep.

Return type:

List[float]

class citylearn.reward_function.V2GPenaltyReward(env_metadata: Mapping[str, Any], peak_percentage_threshold: float = None, ramping_percentage_threshold: float = None, peak_penalty_weight: int = None, ramping_penalty_weight: int = None, energy_transfer_bonus: int = None, window_size: int = None, penalty_no_car_charging: int = None, penalty_battery_limits: int = None, penalty_soc_under_5_10: int = None, reward_close_soc: int = None, reward_self_ev_consumption: int = None, community_weight: float = None, reward_extra_self_production: int = None)

Bases: MARL

Rewards with considerations for electric vehicle charging behaviours in a V2G setting. Note that this function rewards/penalizes only the electric vehicle part. For a comprehensive reward strategy, use one of the superclasses or write your own.

Parameters:

env_metadata (Mapping[str, Any]) – General static information about the environment.

calculate(observations: List[Mapping[str, float | int]]) → List[float]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().

Returns:

reward – Reward for transition to current timestep.

Return type:

List[float]

calculate_ev_penalty(b: Building, current_reward: RewardFunction) → float

Calculate penalties based on EV specific logic.

property community_weight: float

Return the community_weight

property energy_transfer_bonus: int

Return the energy_transfer_bonus

property peak_penalty_weight: int

Return the peak_penalty_weight

property peak_percentage_threshold: float

Return the peak_percentage_threshold

property penalty_battery_limits: int

Return the penalty_battery_limits

property penalty_no_car_charging: int

Return the penalty_no_car_charging

property penalty_soc_under_5_10: int

Return the penalty_soc_under_5_10

property ramping_penalty_weight: int

Return the ramping_penalty_weight

property ramping_percentage_threshold: float

Return the ramping_percentage_threshold

property reward_close_soc: int

Return the reward_close_soc

property reward_extra_self_production: int

Return the reward_extra_self_production

property reward_self_ev_consumption: int

Return the reward_self_ev_consumption

property window_size: int

Return the window_size