citylearn.agents.rlc module
- class citylearn.agents.rlc.RLC(env: CityLearnEnv, hidden_dimension: List[float] = None, discount: float = None, tau: float = None, alpha: float = None, lr: float = None, batch_size: int = None, replay_buffer_capacity: int = None, standardize_start_time_step: int = None, end_exploration_time_step: int = None, action_scaling_coefficient: float = None, reward_scaling: float = None, update_per_time_step: int = None, **kwargs: Any)[source]
Bases: Agent
Base reinforcement learning controller class.
- Parameters:
env (CityLearnEnv) – CityLearn environment.
hidden_dimension (List[float], default: [256, 256]) – Hidden layer dimensions.
discount (float, default: 0.99) – Discount factor.
tau (float, default: 5e-3) – Target network soft update decay rate.
alpha (float, default: 0.2) – Temperature; exploration-exploitation balance term.
lr (float, default: 3e-4) – Learning rate.
batch_size (int, default: 256) – Batch size.
replay_buffer_capacity (int, default: 1e5) – Replay buffer capacity.
standardize_start_time_step (int, optional) – Time step to calculate mean and standard deviation, and begin standardization of observations and rewards in replay buffer. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 2.
end_exploration_time_step (int, optional) – Time step to stop random or RBC-guided exploration. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 1.
action_scaling_coefficient (float, default: 0.5) – Action scaling coefficient.
reward_scaling (float, default: 5.0) – Reward scaling.
update_per_time_step (int, default: 2) – Number of updates per time step.
**kwargs (Any) – Other keyword arguments used to initialize super class.
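A minimal construction sketch. Hedged: the dataset name below is an assumption and any valid schema path works; in practice a subclass such as citylearn.agents.sac.SAC is usually instantiated rather than RLC itself, but the constructor signature is shared.

```python
from citylearn.citylearn import CityLearnEnv
from citylearn.agents.rlc import RLC

# 'citylearn_challenge_2022_phase_1' is an assumed bundled dataset name;
# substitute any valid schema path
env = CityLearnEnv('citylearn_challenge_2022_phase_1', central_agent=True)

# override a few documented defaults; omitted arguments fall back to
# the default values listed above
agent = RLC(
    env,
    hidden_dimension=[256, 256],
    discount=0.99,
    tau=5e-3,
    alpha=0.2,
    lr=3e-4,
    batch_size=256,
)
```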
- property action_scaling_coefficient: float
Action scaling coefficient.
- property alpha: float
Temperature; exploration-exploitation balance term.
- property batch_size: int
Batch size.
- property discount: float
Discount factor.
- property end_exploration_time_step: int
Time step to stop random or RBC-guided exploration. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 1.
- property hidden_dimension: List[float]
Hidden layer dimensions.
- property lr: float
Learning rate.
- property observation_dimension: int
Number of observations after applying encoders.
- property random_seed: int
Pseudorandom number generator seed for repeatable results.
- property replay_buffer_capacity: int
Replay buffer capacity.
- property reward_scaling: float
Reward scaling.
- set_encoders() → List[List[Encoder]] [source]
Get observation value transformers/encoders for use in agent algorithm.
The encoder classes are defined in the preprocessing.py module and include PeriodicNormalization for cyclic observations, OnehotEncoding for categorical observations, RemoveFeature for observations that are not applicable given the available storage systems and devices, and Normalize for observations with known minimum and maximum boundaries.
- Returns:
encoders – Encoder classes for observations ordered with respect to active_observations.
- Return type:
List[List[Encoder]]
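A sketch of what these encoders do, assuming the multiply-to-transform interface of the citylearn.preprocessing classes (encoder * value yields the encoded value); the observation names and bounds below are illustrative:

```python
import numpy as np
from citylearn.preprocessing import Normalize, PeriodicNormalization

hour, indoor_temperature = 12, 21.5  # illustrative raw observations

# cyclic observations become a (sin, cos) pair so hour 23 sits next to hour 0
hour_encoded = PeriodicNormalization(x_max=24) * hour

# bounded observations are min-max scaled using known boundaries
temperature_encoded = Normalize(x_min=10.0, x_max=35.0) * indoor_temperature

encoded_observation = np.hstack([hour_encoded, temperature_encoded])
```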
- property standardize_start_time_step: int
Time step to calculate mean and standard deviation, and begin standardization of observations and rewards in replay buffer. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 2.
- property tau: float
Target network soft update decay rate.
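In soft actor-critic style learners built on this class, tau conventionally controls a Polyak soft update of target network weights; a minimal PyTorch sketch under that assumption (the function name is illustrative, not library API):

```python
import torch

def soft_update(target: torch.nn.Module, source: torch.nn.Module, tau: float) -> None:
    # Polyak averaging: theta_target <- tau * theta_source + (1 - tau) * theta_target
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * s_param)
```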
- property update_per_time_step: int
Number of updates per time step.
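Putting the schedule-related hyperparameters together: before end_exploration_time_step actions come from random or RBC-guided exploration, and thereafter the learned policy acts while update_per_time_step gradient updates are performed per environment step. A hedged end-to-end usage sketch, assuming the inherited Agent.learn training entry point:

```python
# continues the construction sketch above; learn is assumed to be the
# inherited training entry point on the Agent base class
agent.learn(episodes=1)

# the documented hyperparameters remain inspectable as read-only properties
print(agent.end_exploration_time_step, agent.update_per_time_step)
```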