citylearn.agents.marlisa module
- class citylearn.agents.marlisa.MARLISA(*args, regression_buffer_capacity: int = None, start_regression_time_step: int = None, regression_frequency: int = None, information_sharing: bool = None, pca_compression: float = None, iterations: int = None, **kwargs)[source]
Bases:
SAC
- property batch_size: int
Batch size.
- property coordination_variables_history: List[float]
- get_exploration_prediction(observations: List[List[float]]) List[List[float]] [source]
Return randomly sampled actions from action_space multiplied by action_scaling_coefficient.
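As a rough illustration of the sampling described above, the sketch below draws one uniform sample per action-space bound and scales it by the coefficient. The function name, the bounds structure, and the standalone `random.uniform` draw are assumptions for illustration, not CityLearn's actual implementation.

```python
import random
from typing import List, Tuple


def sample_exploration_actions(
    action_bounds: List[List[Tuple[float, float]]],
    action_scaling_coefficient: float,
) -> List[List[float]]:
    """Sample one random action per (low, high) bound for each agent,
    scaled down by ``action_scaling_coefficient``.

    ``action_bounds[i]`` holds the (low, high) pairs for agent i's
    action space (hypothetical layout for this sketch).
    """

    return [
        [action_scaling_coefficient * random.uniform(low, high) for low, high in bounds]
        for bounds in action_bounds
    ]
```

With a coefficient below 1, exploration actions stay within a shrunken copy of each action space, which keeps early random control conservative.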
- get_exploration_prediction_with_information_sharing(observations: List[List[float]]) Tuple[List[List[float]], List[List[float]]] [source]
- get_exploration_prediction_without_information_sharing(observations: List[List[float]]) Tuple[List[List[float]], List[List[float]]] [source]
- get_post_exploration_prediction(observations: List[List[float]], deterministic: bool) List[List[float]] [source]
Sample actions from the policy after the exploration time step.
- get_post_exploration_prediction_with_information_sharing(observations: List[List[float]], deterministic: bool) Tuple[List[List[float]], List[List[float]]] [source]
- get_post_exploration_prediction_without_information_sharing(observations: List[List[float]], deterministic: bool) Tuple[List[List[float]], List[List[float]]] [source]
- get_regression_variables(index: int, observations: List[float], actions: List[float]) List[float] [source]
- property hidden_dimension: int
Hidden dimension.
- property information_sharing: bool
- property iterations: int
- property pca_compression: float
- property regression_buffer_capacity: int
- property regression_frequency: int
- reset()[source]
Reset environment to initial state.
Calls reset_time_step.
Notes
Override in a subclass for a custom implementation when resetting the environment.
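The override pattern the note describes can be sketched as below. The base class here is a minimal stand-in for the agent's reset behaviour, and the `episode_rewards` attribute is a hypothetical piece of custom state, not part of CityLearn's API.

```python
class Agent:
    """Minimal stand-in for the base agent's reset behaviour (hypothetical)."""

    def __init__(self) -> None:
        self.time_step = 0

    def reset_time_step(self) -> None:
        self.time_step = 0

    def reset(self) -> None:
        # Base reset only rewinds the time step counter.
        self.reset_time_step()


class CustomAgent(Agent):
    def reset(self) -> None:
        super().reset()            # keep the base time-step reset
        self.episode_rewards = []  # hypothetical custom per-episode state
```

Calling `super().reset()` first preserves the base behaviour before any subclass-specific state is cleared.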
- set_regression_encoders() List[List[Encoder]] [source]
Get observation value transformers/encoders for use in the MARLISA agent's internal regression model.
The encoder classes are defined in the preprocessing.py module and include PeriodicNormalization for cyclic observations, OnehotEncoding for categorical observations, RemoveFeature for observations that do not apply given the available storage systems and devices, and Normalize for observations with known minimum and maximum boundaries.
- Returns:
encoders – Encoder classes for observations ordered with respect to active_observations.
- Return type:
List[Encoder]
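To make the encoder roles concrete, here are simplified stand-ins for three of the named encoder classes. These are sketches of the transformations the names suggest, not the implementations from citylearn.preprocessing.

```python
import math
from typing import Optional, Tuple


class PeriodicNormalization:
    """Map a cyclic observation x in [0, x_max] to (sin, cos) components,
    so e.g. hour 23 and hour 0 end up close together (simplified sketch)."""

    def __init__(self, x_max: float) -> None:
        self.x_max = x_max

    def transform(self, x: float) -> Tuple[float, float]:
        angle = 2.0 * math.pi * x / self.x_max
        return math.sin(angle), math.cos(angle)


class Normalize:
    """Scale an observation with known bounds into [0, 1] (simplified sketch)."""

    def __init__(self, x_min: float, x_max: float) -> None:
        self.x_min, self.x_max = x_min, x_max

    def transform(self, x: float) -> float:
        return (x - self.x_min) / (self.x_max - self.x_min)


class RemoveFeature:
    """Drop an observation that does not apply to the building (simplified sketch)."""

    def transform(self, x: float) -> Optional[float]:
        return None
```

In this picture, set_regression_encoders would return one such encoder per active observation, matched by observation name.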
- property start_regression_time_step: int
- update(observations: List[List[float]], actions: List[List[float]], reward: List[float], next_observations: List[List[float]], terminated: bool, truncated: bool)[source]
Update replay buffer.
- Parameters:
observations (List[List[float]]) – Previous time step observations.
actions (List[List[float]]) – Previous time step actions.
reward (List[float]) – Current time step reward.
next_observations (List[List[float]]) – Current time step observations.
terminated (bool) – Indication that episode has ended.
truncated (bool) – Indication that the episode was truncated due to a time limit or another reason not defined as part of the task MDP.
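The replay-buffer bookkeeping behind update can be sketched with a fixed-capacity store like the one below. The class name, transition layout, and deque-based eviction are assumptions for illustration; CityLearn's own buffer may differ.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (observations, actions, reward, next_observations, done) for one time step.
Transition = Tuple[List[float], List[float], float, List[float], bool]


class ReplayBuffer:
    """Fixed-capacity experience store; oldest transitions are evicted first."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(
        self,
        observations: List[float],
        actions: List[float],
        reward: float,
        next_observations: List[float],
        done: bool,
    ) -> None:
        self.buffer.append((observations, actions, reward, next_observations, done))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, as in standard off-policy training.
        return random.sample(self.buffer, batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```

Each call to update would push one transition per agent; once the buffer holds at least batch_size transitions, minibatches can be sampled for gradient steps.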
- class citylearn.agents.marlisa.MARLISARBC(env: CityLearnEnv, rbc: RBC = None, **kwargs: Any)[source]
Bases:
MARLISA
Uses
citylearn.agents.rbc.RBC
to select actions during exploration before using
citylearn.agents.marlisa.MARLISA
.
- Parameters:
env (CityLearnEnv) – CityLearn environment.
rbc (RBC) –
citylearn.agents.rbc.RBC
or child class, used to select actions during exploration.
**kwargs (Any) – Other keyword arguments used to initialize super class.
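The exploration-then-policy switch that MARLISARBC implements can be sketched with a toy agent like the one below. Everything here (class name, `predict` signature, the hour-based rule, the placeholder policy) is hypothetical and only mirrors the idea of rule-based control during exploration followed by learned control.

```python
from typing import List


class ExplorationSwitchAgent:
    """Toy agent mirroring the MARLISARBC idea (hypothetical, not the
    CityLearn API): follow a rule-based schedule during exploration,
    then hand control to a learned policy."""

    def __init__(self, end_exploration_time_step: int) -> None:
        self.end_exploration_time_step = end_exploration_time_step
        self.time_step = 0

    def rbc_action(self, hour: int) -> List[float]:
        # Hypothetical rule: charge storage at night, discharge during the day.
        return [0.5] if hour < 7 else [-0.5]

    def policy_action(self, observations: List[float]) -> List[float]:
        return [0.0]  # placeholder for a learned policy's output

    def predict(self, observations: List[float], hour: int) -> List[float]:
        action = (
            self.rbc_action(hour)
            if self.time_step < self.end_exploration_time_step
            else self.policy_action(observations)
        )
        self.time_step += 1
        return action
```

Replacing uniform random exploration with a sensible rule-based controller gives the regression model and replay buffer more realistic early data than pure noise would.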