rlTRPOAgent
Trust region policy optimization reinforcement learning agent
Description
Trust region policy optimization (TRPO) is a model-free, online, on-policy, policy gradient reinforcement learning method. The algorithm prevents the significant performance drops that can occur with standard policy gradient methods by keeping the updated policy within a trust region close to the current policy. The action space can be either discrete or continuous.
For more information on TRPO agents, see Trust Region Policy Optimization Agents. For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.
Creation
Syntax
agent = rlTRPOAgent(observationInfo,actionInfo)
agent = rlTRPOAgent(observationInfo,actionInfo,initOpts)
agent = rlTRPOAgent(actor,critic)
agent = rlTRPOAgent(___,agentOptions)
Description
Create Agent from Observation and Action Specifications
agent = rlTRPOAgent(observationInfo,actionInfo) creates a trust region policy optimization (TRPO) agent for an environment with the given observation and action specifications, using default initialization options. The actor and critic in the agent use default deep neural networks built from the observation specification observationInfo and the action specification actionInfo. The ObservationInfo and ActionInfo properties of agent are set to the observationInfo and actionInfo input arguments, respectively.
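As a minimal sketch of this syntax, assuming the predefined "CartPole-Discrete" environment from Reinforcement Learning Toolbox is available, the agent can be created directly from the environment specifications:

% Create a TRPO agent with default networks from the specifications
% of a predefined cart-pole environment.
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

agent = rlTRPOAgent(obsInfo,actInfo);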
agent = rlTRPOAgent(observationInfo,actionInfo,initOpts) creates a TRPO agent for an environment with the given observation and action specifications. The agent uses default networks configured using the options specified in the initOpts object. TRPO agents do not support recurrent neural networks. For more information on initialization options, see rlAgentInitializationOptions.
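For illustration, a hedged sketch using rlAgentInitializationOptions to request larger hidden layers in the default networks (obsInfo and actInfo are assumed to exist as in the previous sketch; the value 256 is arbitrary):

% Request 256 units in each hidden layer of the default networks
initOpts = rlAgentInitializationOptions(NumHiddenUnit=256);
agent = rlTRPOAgent(obsInfo,actInfo,initOpts);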
Create Agent from Actor and Critic
agent = rlTRPOAgent(actor,critic) creates a TRPO agent with the specified actor and critic, using the default options for the agent.
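A minimal sketch of this syntax, assuming a hypothetical environment with a 4-dimensional observation and two discrete actions; the network layer sizes are illustrative:

% Specifications for a hypothetical environment
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Value-function critic V(s)
criticNet = dlnetwork([
    featureInputLayer(4)
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(1)]);
critic = rlValueFunction(criticNet,obsInfo);

% Stochastic (categorical) actor over the two actions
actorNet = dlnetwork([
    featureInputLayer(4)
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(2)]);
actor = rlDiscreteCategoricalActor(actorNet,obsInfo,actInfo);

agent = rlTRPOAgent(actor,critic);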
Specify Agent Options
agent = rlTRPOAgent(___,agentOptions) creates a TRPO agent and sets the AgentOptions property to the agentOptions input argument. Use this syntax after any of the input arguments in the previous syntaxes.
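As an illustrative sketch, agent options can be set through an rlTRPOAgentOptions object; the specific values below are arbitrary:

% Customize training-related options before creating the agent
agentOpts = rlTRPOAgentOptions( ...
    ExperienceHorizon=1024, ...
    DiscountFactor=0.99);
agent = rlTRPOAgent(obsInfo,actInfo,agentOpts);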
Input Arguments
Properties
Object Functions
train | Train reinforcement learning agents within a specified environment
sim | Simulate trained reinforcement learning agents within a specified environment
getAction | Obtain action from agent or actor given environment observations
getActor | Get actor from reinforcement learning agent
setActor | Set actor of reinforcement learning agent
getCritic | Get critic from reinforcement learning agent
setCritic | Set critic of reinforcement learning agent
generatePolicyFunction | Create function that evaluates trained policy of reinforcement learning agent
Examples
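A sketch of a creation-and-check workflow for a continuous action space, assuming the predefined "DoubleIntegrator1D" environment is available; getAction is used to verify that the agent returns a valid action for a random observation:

% Create the environment and extract its specifications
env = rlPredefinedEnv("DoubleIntegrator1D");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Default-network TRPO agent for the continuous action space
agent = rlTRPOAgent(obsInfo,actInfo);

% Check the agent with a random observation
getAction(agent,{rand(obsInfo.Dimension)})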
Tips
For continuous action spaces, this agent does not enforce the constraints set by the action specification. In this case, you must enforce action space constraints within the environment, as shown in the sketch after these tips.
Tuning the learning rate of the actor network is necessary for PPO agents, but is not necessary for TRPO agents.
For high-dimensional observations, such as images, it is recommended to use PPO, SAC, or TD3 agents.
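As an illustrative sketch of the first tip, the environment code (not the agent) clips an out-of-range continuous action to the limits declared in the action specification; the limits and action value here are arbitrary:

% Action specification with saturation limits
actInfo = rlNumericSpec([1 1],LowerLimit=-2,UpperLimit=2);

% The environment clips the agent's proposed action before using it
rawAction = 3.5;   % action proposed by the agent, outside the limits
clippedAction = min(max(rawAction,actInfo.LowerLimit),actInfo.UpperLimit)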