Memories

Episodic Memories
EpisodicExperienceReplay

class rl_coach.memories.episodic.EpisodicExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int] = (MemoryGranularity.Transitions, 1000000), n_step=-1)

A replay buffer that stores episodes of transitions. The additional structure allows performing various calculations of total return and other values that depend on the sequential behavior of the transitions in the episode.

Parameters:
- max_size – the maximum number of transitions or episodes to hold in the memory
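A minimal usage sketch, assuming Coach's standard store()/sample() memory interface and the Transition constructor from rl_coach.core_types; the observation shapes and argument values are placeholders to adapt to your setup:

    import numpy as np

    from rl_coach.core_types import Transition
    from rl_coach.memories.episodic import EpisodicExperienceReplay
    from rl_coach.memories.memory import MemoryGranularity

    # hold at most 1M transitions
    memory = EpisodicExperienceReplay(max_size=(MemoryGranularity.Transitions, 1000000))

    # store a short two-step episode; the terminal transition closes the episode
    for step in range(2):
        memory.store(Transition(state={'observation': np.zeros(4)},
                                action=0,
                                reward=1.0,
                                next_state={'observation': np.ones(4)},
                                game_over=(step == 1)))

    batch = memory.sample(2)  # sample a batch of 2 transitions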
EpisodicHindsightExperienceReplay

class rl_coach.memories.episodic.EpisodicHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)

Implements Hindsight Experience Replay, as described in the following paper: https://arxiv.org/pdf/1707.01495.pdf

Parameters:
- max_size – the maximum size of the memory. Should be defined with a granularity of Transitions
- hindsight_transitions_per_regular_transition – the number of artificial hindsight transitions to generate for each actual transition
- hindsight_goal_selection_method – the method used for generating the goals of the hindsight transitions. Should be one of HindsightGoalSelectionMethod
- goals_space – a GoalsSpace which defines the base properties of the goals space
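A constructor sketch, modeled after the HER presets shipped with Coach; the GoalsSpace arguments (the 'achieved_goal' key, the ReachingGoal reward type and its thresholds) are assumptions that need to match your environment:

    from rl_coach.memories.episodic import EpisodicHindsightExperienceReplay
    from rl_coach.memories.episodic.episodic_hindsight_experience_replay import (
        HindsightGoalSelectionMethod)
    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.spaces import GoalsSpace, ReachingGoal

    # goals are read from the state dictionary under the 'achieved_goal' key
    goals_space = GoalsSpace(goal_name='achieved_goal',
                             reward_type=ReachingGoal(distance_from_goal_threshold=0.05,
                                                      goal_reaching_reward=0,
                                                      default_reward=-1),
                             distance_metric=GoalsSpace.DistanceMetric.Euclidean)

    memory = EpisodicHindsightExperienceReplay(
        max_size=(MemoryGranularity.Transitions, 1000000),
        hindsight_transitions_per_regular_transition=4,  # 4 artificial goals per real transition
        hindsight_goal_selection_method=HindsightGoalSelectionMethod.Future,
        goals_space=goals_space)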
EpisodicHRLHindsightExperienceReplay

class rl_coach.memories.episodic.EpisodicHRLHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)

Implements hierarchical RL Hindsight Experience Replay, as described in the following paper: https://arxiv.org/abs/1805.08180

This is the memory to use if you want a hindsight experience replay buffer that is shared between multiple workers.

Parameters:
- max_size – the maximum size of the memory. Should be defined with a granularity of Transitions
- hindsight_transitions_per_regular_transition – the number of artificial hindsight transitions to generate for each actual transition
- hindsight_goal_selection_method – the method used for generating the goals of the hindsight transitions. Should be one of HindsightGoalSelectionMethod
- goals_space – a GoalsSpace which defines the properties of the goals
- do_action_hindsight – whether to replace the action (sub-goal) given to a lower level with the actually achieved goal
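In presets, Coach memories are usually configured through a companion parameters class rather than constructed directly; a sketch in that style, where the EpisodicHRLHindsightExperienceReplayParameters class and its attribute names are assumptions based on the HAC preset convention:

    from rl_coach.memories.episodic import EpisodicHRLHindsightExperienceReplayParameters
    from rl_coach.memories.episodic.episodic_hindsight_experience_replay import (
        HindsightGoalSelectionMethod)
    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.spaces import GoalsSpace, ReachingGoal

    memory_params = EpisodicHRLHindsightExperienceReplayParameters()
    memory_params.max_size = (MemoryGranularity.Transitions, 10000000)
    memory_params.hindsight_transitions_per_regular_transition = 3
    memory_params.hindsight_goal_selection_method = HindsightGoalSelectionMethod.Future
    memory_params.goals_space = GoalsSpace(
        goal_name='achieved_goal',
        reward_type=ReachingGoal(distance_from_goal_threshold=0.1,
                                 goal_reaching_reward=0,
                                 default_reward=-1),
        distance_metric=GoalsSpace.DistanceMetric.Euclidean)

    # the parameters object would then be assigned to agent_params.memory in a preset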
Non-Episodic Memories

BalancedExperienceReplay

class rl_coach.memories.non_episodic.BalancedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True, num_classes: int = 0, state_key_with_the_class_index: Any = 'class')

A replay buffer that balances its sampled batches across a given number of transition classes.

Parameters:
- max_size – the maximum number of transitions or episodes to hold in the memory
- allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
- num_classes – the number of classes in the replayed data
- state_key_with_the_class_index – the class index is assumed to be a value in the state dictionary; this parameter determines the key under which the class index value is stored
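A usage sketch, assuming the standard store()/sample() interface; the key point is that each stored transition carries its class index in the state dictionary under the configured key:

    import numpy as np

    from rl_coach.core_types import Transition
    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import BalancedExperienceReplay

    # balance sampled batches across 3 classes, with the class index stored
    # under the default 'class' key of the state dictionary
    memory = BalancedExperienceReplay(max_size=(MemoryGranularity.Transitions, 100000),
                                      num_classes=3,
                                      state_key_with_the_class_index='class')

    for class_idx in range(3):
        memory.store(Transition(state={'observation': np.zeros(4), 'class': class_idx},
                                action=0,
                                reward=0.0,
                                next_state={'observation': np.ones(4), 'class': class_idx},
                                game_over=True))

    batch = memory.sample(3)  # roughly one transition per class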
QDND
ExperienceReplay

class rl_coach.memories.non_episodic.ExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True)

A regular replay buffer which stores transitions without any additional structure.

Parameters:
- max_size – the maximum number of transitions or episodes to hold in the memory
- allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
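A minimal usage sketch, assuming the standard store()/sample() interface and the Transition constructor from rl_coach.core_types:

    import numpy as np

    from rl_coach.core_types import Transition
    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import ExperienceReplay

    # a flat buffer holding at most 50k transitions
    memory = ExperienceReplay(max_size=(MemoryGranularity.Transitions, 50000))

    memory.store(Transition(state={'observation': np.zeros(4)},
                            action=1,
                            reward=0.5,
                            next_state={'observation': np.ones(4)},
                            game_over=False))

    # with allow_duplicates_in_batch_sampling=True (the default), a batch may
    # contain the same transition more than once
    batch = memory.sample(4)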
PrioritizedExperienceReplay

class rl_coach.memories.non_episodic.PrioritizedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], alpha: float = 0.6, beta: rl_coach.schedules.Schedule = <rl_coach.schedules.ConstantSchedule object>, epsilon: float = 1e-06, allow_duplicates_in_batch_sampling: bool = True)

This is the proportional sampling variant of prioritized experience replay, as described in https://arxiv.org/pdf/1511.05952.pdf.

Parameters:
- max_size – the maximum number of transitions or episodes to hold in the memory
- alpha – the alpha prioritization coefficient, which controls how strongly the priorities bias sampling
- beta – a schedule for the beta parameter used for the importance sampling correction
- epsilon – a small value added to the priority of each transition, so that no transition has zero probability of being sampled
- allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
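A constructor sketch; annealing beta from 0.4 to 1 follows the schedule suggested in the paper, and the LinearSchedule signature (initial value, final value, number of steps) is an assumption about rl_coach.schedules:

    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.memories.non_episodic import PrioritizedExperienceReplay
    from rl_coach.schedules import LinearSchedule

    memory = PrioritizedExperienceReplay(
        max_size=(MemoryGranularity.Transitions, 1000000),
        alpha=0.6,                               # strength of the prioritization
        beta=LinearSchedule(0.4, 1.0, 1000000),  # importance sampling correction, annealed to 1
        epsilon=1e-06)                           # keeps every priority strictly positive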