citylearn.agents.sac module
- class citylearn.agents.sac.SAC(env: CityLearnEnv, **kwargs: Any)[source]
Bases: RLC
- get_encoded_observations(index: int, observations: List[float]) → ndarray[Any, dtype[float64]] [source]
- get_exploration_prediction(observations: List[List[float]]) → List[List[float]] [source]
Return randomly sampled actions from action_space multiplied by action_scaling_coefficient.
- get_normalized_observations(index: int, observations: List[float]) → ndarray[Any, dtype[float64]] [source]
- get_post_exploration_prediction(observations: List[List[float]], deterministic: bool) → List[List[float]] [source]
Sample actions from the policy for time steps after the exploration period has ended.
- predict(observations: List[List[float]], deterministic: bool = None)[source]
Provide actions for current time step.
Will return randomly sampled actions from action_space if time_step <= end_exploration_time_step, else will use the policy to sample actions.
- Parameters:
observations (List[List[float]]) – Environment observations.
deterministic (bool, default: False) – Whether to return purely exploitative deterministic actions.
- Returns:
actions – Action values
- Return type:
List[List[float]]
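A minimal usage sketch, assuming the Gymnasium-style reset/step API of recent CityLearn versions; the dataset name is illustrative, and learn() is inherited from the base agent class:

    from citylearn.citylearn import CityLearnEnv
    from citylearn.agents.sac import SAC

    env = CityLearnEnv('citylearn_challenge_2022_phase_1')  # illustrative dataset name
    model = SAC(env)
    model.learn(episodes=1)

    # Deterministic rollout: skip exploration and return purely exploitative actions.
    observations, _ = env.reset()
    terminated = False

    while not terminated:
        actions = model.predict(observations, deterministic=True)
        observations, _, terminated, _, _ = env.step(actions)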
- set_encoders() → List[List[Encoder]] [source]
Get observation value transformers/encoders for use in agent algorithm.
The encoder classes are defined in the preprocessing.py module and include PeriodicNormalization for cyclic observations, OnehotEncoding for categorical observations, RemoveFeature for observations that are not applicable given the available storage systems and devices, and Normalize for observations with known minimum and maximum boundaries.
- Returns:
encoders – Encoder classes for observations ordered with respect to active_observations.
- Return type:
List[List[Encoder]]
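As a minimal sketch of how these encoders transform raw values (the example observation vector and constructor arguments are illustrative; this assumes the encoders follow the multiplication protocol of citylearn.preprocessing, where each encoder defines __mul__):

    import numpy as np
    from citylearn.preprocessing import Normalize, OnehotEncoding, PeriodicNormalization

    # Illustrative observation vector: [hour, day_type, indoor_temperature].
    observations = [22, 5, 21.5]

    # One encoder per observation, mirroring the structure set_encoders() builds.
    encoders = [
        PeriodicNormalization(x_max=24),                   # cyclic hour -> (sin, cos) pair
        OnehotEncoding(classes=[1, 2, 3, 4, 5, 6, 7, 8]),  # categorical day type
        Normalize(x_min=0.0, x_max=40.0),                  # bounded continuous value
    ]

    # Multiplying an encoder by a value applies the transformation.
    encoded = np.hstack([e * o for e, o in zip(encoders, observations)])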
- update(observations: List[List[float]], actions: List[List[float]], reward: List[float], next_observations: List[List[float]], terminated: bool, truncated: bool)[source]
Update replay buffer.
- Parameters:
observations (List[List[float]]) – Previous time step observations.
actions (List[List[float]]) – Previous time step actions.
reward (List[float]) – Current time step reward.
next_observations (List[List[float]]) – Current time step observations.
terminated (bool) – Indication that episode has ended.
truncated (bool) – Indication that the episode was truncated due to a time limit or another reason that is not defined as part of the task MDP.
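Taken together with predict(), update() fits into a standard off-policy training loop. A minimal sketch, assuming the Gymnasium-style reset/step API of recent CityLearn versions and an illustrative dataset name (in practice, the learn() method inherited from the base agent class wraps this loop):

    from citylearn.citylearn import CityLearnEnv
    from citylearn.agents.sac import SAC

    env = CityLearnEnv('citylearn_challenge_2022_phase_1')  # illustrative dataset name
    model = SAC(env)

    for episode in range(2):
        observations, _ = env.reset()
        terminated = False

        while not terminated:
            actions = model.predict(observations)
            next_observations, reward, terminated, truncated, _ = env.step(actions)

            # Store the transition in the replay buffer; gradient updates run
            # once exploration has ended and the buffer holds enough samples.
            model.update(observations, actions, reward, next_observations,
                         terminated=terminated, truncated=truncated)
            observations = next_observations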
- class citylearn.agents.sac.SACRBC(env: CityLearnEnv, rbc: RBC | str = None, **kwargs: Any)[source]
Bases: SAC
Uses citylearn.agents.rbc.RBC to select actions during exploration before using citylearn.agents.sac.SAC.
- Parameters:
env (CityLearnEnv) – CityLearn environment.
rbc (RBC) – citylearn.agents.rbc.RBC or child class, used to select actions during exploration.
**kwargs (Any) – Other keyword arguments used to initialize super class.
- get_exploration_prediction(observations: List[float]) → List[float] [source]
Return actions using RBC.
- property rbc: RBC
citylearn.agents.rbc.RBC or child class, or a string path to an RBC class, e.g. ‘citylearn.agents.rbc.RBC’, used to select actions during exploration.
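A minimal usage sketch; the dataset name is illustrative, and the rbc argument may also be passed as the string path shown above:

    from citylearn.citylearn import CityLearnEnv
    from citylearn.agents.sac import SACRBC

    env = CityLearnEnv('citylearn_challenge_2022_phase_1')  # illustrative dataset name

    # Rule-based actions during exploration, SAC policy actions afterwards.
    model = SACRBC(env, rbc='citylearn.agents.rbc.RBC')
    model.learn(episodes=2)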