citylearn.reward_function module
- class citylearn.reward_function.ComfortReward(env_metadata: Mapping[str, Any], band: float = None, lower_exponent: float = None, higher_exponent: float = None)[source]
Bases:
RewardFunction
Reward for occupant thermal comfort satisfaction.
The reward is calculated as the negative difference between the setpoint and indoor dry-bulb temperature, raised to some exponent, when outside the comfort band. When within the comfort band, the reward is the negative difference if in cooling mode and the temperature is below the setpoint, or if in heating mode and the temperature is above the setpoint. The reward is 0 when within the comfort band and above the setpoint in cooling mode, or below the setpoint in heating mode (see the sketch at the end of this class entry).
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
band (float, default = 2.0) – Setpoint comfort band (+/-). If not provided, the comfort band time series defined in the building file, or its default value of 2.0, is used.
lower_exponent (float, default = 2.0) – Penalty exponent for when in cooling mode but the temperature is above the setpoint upper boundary, or in heating mode but the temperature is below the setpoint lower boundary.
higher_exponent (float, default = 2.0) – Penalty exponent for when in cooling mode but the temperature is below the setpoint lower boundary, or in heating mode but the temperature is above the setpoint upper boundary.
- property band: float
- calculate(observations: List[Mapping[str, float | int]]) List[float] [source]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- property higher_exponent: float
- property lower_exponent: float
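The following is a minimal, illustrative sketch of the comfort logic described in this class entry, written from the description above rather than from the CityLearn source; the heating/cooling mode flag and the way temperatures are read from observations are assumptions for the example.

```python
# Illustrative sketch of the ComfortReward logic described above; not the
# CityLearn implementation. The `heating` flag is an assumption for the example.
def comfort_reward(indoor_temp: float, set_point: float, heating: bool,
                   band: float = 2.0, lower_exponent: float = 2.0,
                   higher_exponent: float = 2.0) -> float:
    delta = abs(indoor_temp - set_point)

    if delta > band:
        # Outside the comfort band: penalty is -(delta ** exponent).
        # lower_exponent: cooling mode & above the upper boundary, or heating mode & below the lower boundary.
        # higher_exponent: cooling mode & below the lower boundary, or heating mode & above the upper boundary.
        use_lower = (not heating and indoor_temp > set_point) or (heating and indoor_temp < set_point)
        exponent = lower_exponent if use_lower else higher_exponent
        return -(delta ** exponent)

    # Inside the comfort band: linear penalty when below the setpoint in cooling
    # mode, or above the setpoint in heating mode; otherwise the reward is 0.
    if (not heating and indoor_temp < set_point) or (heating and indoor_temp > set_point):
        return -delta

    return 0.0
```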
- class citylearn.reward_function.IndependentSACReward(env_metadata: Mapping[str, Any])[source]
Bases:
RewardFunction
Recommended for use with the SAC controllers.
The returned reward assumes that the building agents act independently of each other, without sharing information through the reward.
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
- calculate(observations: List[Mapping[str, float | int]]) List[float] [source]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- class citylearn.reward_function.MARL(env_metadata: Mapping[str, Any])[source]
Bases:
RewardFunction
MARL reward function class.
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
- calculate(observations: List[Mapping[str, float | int]]) List[float] [source]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- class citylearn.reward_function.RewardFunction(env_metadata: Mapping[str, Any], exponent: float = None, **kwargs)[source]
Bases:
object
Base and default reward function class.
The default reward is the negative of the electricity consumption drawn from the grid at the current time step (see the subclassing sketch at the end of this class entry).
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
**kwargs (dict) – Other keyword arguments for custom reward calculation.
- calculate(observations: List[Mapping[str, float | int]]) List[float] [source]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- property central_agent: bool
Expect 1 central agent to control all buildings.
- property env_metadata: Mapping[str, Any]
General static information about the environment.
- property exponent: float
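The base class is intended to be subclassed for custom rewards by overriding calculate(). A minimal sketch follows, assuming the per-building observation key 'net_electricity_consumption' is available (as in the default reward); the class name and the squared penalty are purely illustrative.

```python
# Minimal sketch of a custom reward built on the RewardFunction interface shown
# above: calculate() receives one observation mapping per building and returns
# one float per building. The observation key and the squared penalty are
# assumptions for the example, not a recommended design.
from typing import Any, List, Mapping

from citylearn.reward_function import RewardFunction


class SquaredConsumptionPenalty(RewardFunction):
    """Hypothetical reward: negative squared grid import per building."""

    def __init__(self, env_metadata: Mapping[str, Any], **kwargs):
        super().__init__(env_metadata, **kwargs)

    def calculate(self, observations: List[Mapping[str, float]]) -> List[float]:
        # Penalize only positive net consumption (grid import); export is ignored.
        return [-max(o['net_electricity_consumption'], 0.0) ** 2 for o in observations]
```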
- class citylearn.reward_function.SolarPenaltyAndComfortReward(env_metadata: Mapping[str, Any], band: float = None, lower_exponent: float = None, higher_exponent: float = None, coefficients: Tuple = None)[source]
Bases:
RewardFunction
Addition of citylearn.reward_function.SolarPenaltyReward and citylearn.reward_function.ComfortReward (see the usage sketch at the end of this class entry).
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
band (float, default = 2.0) – Setpoint comfort band (+/-). If not provided, the comfort band time series defined in the building file, or its default value of 2.0, is used.
lower_exponent (float, default = 2.0) – Penalty exponent for when in cooling mode but the temperature is above the setpoint upper boundary, or in heating mode but the temperature is below the setpoint lower boundary.
higher_exponent (float, default = 3.0) – Penalty exponent for when in cooling mode but the temperature is below the setpoint lower boundary, or in heating mode but the temperature is above the setpoint upper boundary.
coefficients (Tuple, default = (1.0, 1.0)) – Coefficients for the citylearn.reward_function.SolarPenaltyReward and citylearn.reward_function.ComfortReward values, respectively.
- calculate(observations: List[Mapping[str, float | int]]) List[float] [source]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- property coefficients: Tuple
- property env_metadata: Mapping[str, Any]
General static information about the environment.
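A hedged usage sketch showing how the coefficients weight the two component rewards. The parameter values are illustrative, and the empty env_metadata dictionary is a placeholder for the static environment information that the environment normally supplies.

```python
# Hedged usage sketch: weight the comfort term more heavily than the solar
# penalty term. The coefficients tuple follows the documented
# (SolarPenaltyReward, ComfortReward) ordering.
from citylearn.reward_function import SolarPenaltyAndComfortReward

env_metadata = {}  # placeholder: in practice, supplied by the environment
reward_function = SolarPenaltyAndComfortReward(
    env_metadata,
    band=2.0,
    coefficients=(1.0, 2.0),  # (solar penalty weight, comfort weight); illustrative values
)
```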
- class citylearn.reward_function.SolarPenaltyReward(env_metadata: Mapping[str, Any])[source]
Bases:
RewardFunction
The reward is designed to minimize electricity consumption and maximize solar generation to charge energy storage systems.
The reward is calculated for each building, i, and summed to provide the agent with a reward that is representative of the building or buildings it controls (all buildings, in the centralized case). It encourages net-zero energy use by penalizing grid load satisfaction when there is energy in the energy storage systems, and by penalizing net export when the energy storage systems are not fully charged, through the penalty term. There is neither penalty nor reward when the energy storage systems are fully charged during net export to the grid, whereas the penalty is maximized when the energy storage systems are charged to capacity and there is net import from the grid (see the sketch at the end of this class entry).
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
- calculate(observations: List[Mapping[str, float | int]]) List[float] [source]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
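A hedged sketch of the penalty behaviour described above, for a single storage device; the actual implementation evaluates the penalty per building and per storage system and may differ in detail.

```python
# Sketch derived from the behaviour described above, not from the CityLearn
# source. Here e is the building's net electricity consumption (positive = grid
# import, negative = export) and soc is the storage state of charge in [0, 1].
def solar_penalty(e: float, soc: float) -> float:
    sign = 1.0 if e > 0.0 else (-1.0 if e < 0.0 else 0.0)
    # Full storage + net import  -> maximum penalty (-2|e|)
    # Full storage + net export  -> neither penalty nor reward (0)
    # Empty storage + net export -> penalty (-|e|)
    return -(1.0 + sign * soc) * abs(e)
```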
- class citylearn.reward_function.V2GPenaltyReward(env_metadata: Mapping[str, Any], peak_percentage_threshold: float = None, ramping_percentage_threshold: float = None, peak_penalty_weight: int = None, ramping_penalty_weight: int = None, energy_transfer_bonus: int = None, window_size: int = None, penalty_no_car_charging: int = None, penalty_battery_limits: int = None, penalty_soc_under_5_10: int = None, reward_close_soc: int = None, reward_self_ev_consumption: int = None, community_weight: float = None, reward_extra_self_production: int = None)[source]
Bases:
MARL
Reward with considerations for electric vehicle charging behaviour in a V2G setting. Note that this function rewards/penalizes only the electric-vehicle part; for a comprehensive reward strategy, please use one of the super classes or write your own (see the usage sketch at the end of this class entry).
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
- calculate(observations: List[Mapping[str, float | int]]) List[float] [source]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- calculate_ev_penalty(b: Building, current_reward: RewardFunction) float [source]
Calculate penalties based on EV-specific logic.
- property community_weight: float
Return the community_weight
- property energy_transfer_bonus: int
Return the energy_transfer_bonus
- property peak_penalty_weight: int
Return the peak_penalty_weight
- property peak_percentage_threshold: float
Return the peak_percentage_threshold
- property penalty_battery_limits: int
Return the penalty_battery_limits
- property penalty_no_car_charging: int
Return the penalty_no_car_charging
- property penalty_soc_under_5_10: int
Return the penalty_soc_under_5_10
- property ramping_penalty_weight: int
Return the ramping_penalty_weight
- property ramping_percentage_threshold: float
Return the ramping_percentage_threshold
- property reward_close_soc: int
Return the reward_close_soc
- property reward_extra_self_production: int
Return the reward_extra_self_production
- property reward_self_ev_consumption: int
Return the reward_self_ev_consumption
- property window_size: int
Return the window_size
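A hedged instantiation sketch using a few of the keyword parameters from the constructor signature; the values are illustrative rather than recommended settings, and env_metadata is again a placeholder for the static environment information supplied by the environment.

```python
# Hedged usage sketch of V2GPenaltyReward; parameter values are illustrative.
# Omitted parameters fall back to the defaults defined in the source.
from citylearn.reward_function import V2GPenaltyReward

env_metadata = {}  # placeholder: in practice, supplied by the environment
reward_function = V2GPenaltyReward(
    env_metadata,
    peak_percentage_threshold=0.10,
    ramping_percentage_threshold=0.10,
    peak_penalty_weight=20,
    penalty_no_car_charging=-5,
    community_weight=0.2,
)
```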