@rl-js/interfaces

Interfaces

ActionTraces
ActionValueFunction ⇐ FunctionApproximator
AgentFactory
Agent
EnvironmentFactory
Environment
FunctionApproximator
PolicyTraces
Policy
StateTraces
StateValueFunction ⇐ FunctionApproximator

ActionTraces

Kind: global interface

* ActionTraces
* .record(state, action) ⇒ ActionTraces
* .update(error) ⇒ ActionTraces
* .decay(amount) ⇒ ActionTraces
* .reset() ⇒ ActionTraces

actionTraces.record(state, action) ⇒ ActionTraces

Records a trace for the given state-action pair.

Kind: instance method of ActionTraces
Returns: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |

actionTraces.update(error) ⇒ ActionTraces

Updates the value function based on the stored traces and the given error.

Kind: instance method of ActionTraces
Returns: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

actionTraces.decay(amount) ⇒ ActionTraces

Decays the traces by the given amount.

Kind: instance method of ActionTraces
Returns: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

actionTraces.reset() ⇒ ActionTraces

Resets the traces to their starting values.
Usually called at the beginning of an episode.

Kind: instance method of ActionTraces
Returns: ActionTraces - This object
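The four methods above can be sketched as a small tabular implementation. This is only an illustration under stated assumptions: the class name `TabularActionTraces`, the accumulating-trace rule, the JSON-based key scheme, and the stub value function are all hypothetical and not part of @rl-js/interfaces.

```javascript
// Hypothetical tabular implementation of the ActionTraces interface.
// Assumes states and actions can be serialized with JSON.stringify.
class TabularActionTraces {
  constructor(valueFunction) {
    // Assumed to expose update(state, action, error).
    this.valueFunction = valueFunction;
    this.traces = new Map(); // key -> { state, action, value }
  }

  record(state, action) {
    const key = JSON.stringify([state, action]);
    const entry = this.traces.get(key) || { state, action, value: 0 };
    entry.value += 1; // accumulating trace (an assumption of this sketch)
    this.traces.set(key, entry);
    return this;
  }

  update(error) {
    // Each traced pair receives the error scaled by its trace value.
    for (const { state, action, value } of this.traces.values()) {
      this.valueFunction.update(state, action, error * value);
    }
    return this;
  }

  decay(amount) {
    for (const entry of this.traces.values()) entry.value *= amount;
    return this;
  }

  reset() {
    this.traces.clear();
    return this;
  }
}

// A stub value function that only logs the updates it receives.
const updates = [];
const stubValueFunction = {
  update: (state, action, error) => updates.push([state, action, error]),
};

const traces = new TabularActionTraces(stubValueFunction);
traces.record('s0', 'LEFT').decay(0.5).record('s1', 'RIGHT').update(2);
```

Because every method returns the traces object, the calls can be chained in the order an agent would typically issue them within a timestep.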

ActionValueFunction ⇐ `FunctionApproximator`

Kind: global interface
Extends: FunctionApproximator

* ActionValueFunction ⇐ FunctionApproximator
* .call(state, action) ⇒ number
* .update(state, action, error)
* .gradient(state, action) ⇒ Array.<number>
* .getParameters() ⇒ Array.<number>
* .setParameters(parameters)
* .updateParameters(errors)

actionValueFunction.call(state, action) ⇒ number

Estimate the expected value of the returns given a specific state-action pair.

Kind: instance method of ActionValueFunction
Overrides: call
Returns: number - The approximated action value (q)

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |

actionValueFunction.update(state, action, error)

Update the value of the function approximator for a given state-action pair.

Kind: instance method of ActionValueFunction
Overrides: update

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
| error | number | The difference between the target value and the currently approximated value |
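As a rough illustration of how `call` and `update` relate, here is a minimal tabular sketch. The class name `TabularActionValueFunction`, the learning rate `alpha`, and the JSON key scheme are assumptions made for this example; the remaining interface methods are omitted for brevity.

```javascript
// Hypothetical tabular action-value function (only call and update shown).
class TabularActionValueFunction {
  constructor(alpha = 0.1) {
    this.alpha = alpha; // step size, an assumption of this sketch
    this.values = new Map();
  }

  key(state, action) {
    return JSON.stringify([state, action]);
  }

  call(state, action) {
    // Unseen pairs default to a value of 0.
    return this.values.get(this.key(state, action)) || 0;
  }

  update(state, action, error) {
    // Move the estimate a fraction alpha of the way along the error.
    const current = this.call(state, action);
    this.values.set(this.key(state, action), current + this.alpha * error);
  }
}

const q = new TabularActionValueFunction(0.5);
const target = 1;
// The error is the target value minus the current approximation.
q.update('s0', 'LEFT', target - q.call('s0', 'LEFT'));
q.update('s0', 'LEFT', target - q.call('s0', 'LEFT'));
```

With a step size of 0.5 and a fixed target of 1, each update halves the remaining gap between the estimate and the target.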

actionValueFunction.gradient(state, action) ⇒ Array.<number>

Compute the gradient of the function approximator for a given state-action pair,
with respect to its parameters.

Kind: instance method of ActionValueFunction
Overrides: gradient
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |

actionValueFunction.getParameters() ⇒ Array.<number>

Get the differentiable parameters of the function approximator.

Kind: instance method of ActionValueFunction
Returns: Array.<number> - The parameters that define the function approximator

actionValueFunction.setParameters(parameters)

Set the differentiable parameters of the function approximator.

Kind: instance method of ActionValueFunction

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

actionValueFunction.updateParameters(errors)

Update the parameters in some direction given by an array of errors.

Kind: instance method of ActionValueFunction

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction in which to update each parameter |

AgentFactory

Kind: global interface

$3

Kind: instance method of AgentFactory

Agent

Kind: global interface

* Agent
* .newEpisode(environment)
* .act()

agent.newEpisode(environment)

Prepare the agent for the next episode.
The agent should perform any necessary cleanup and setup steps here.
An Environment object is passed in,
which the agent should store each time.

Kind: instance method of Agent

| Param | Type | Description |
| --- | --- | --- |
| environment | Environment | The Environment object for the new episode. |

agent.act()

Perform an action for the current timestep.
Usually, the agent should at least:
1) dispatch an action to the environment, and
2) perform any necessary internal updates (e.g. updating the value function).

Kind: instance method of Agent
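The two Agent methods might fit together as in the sketch below. `RandomAgent`, its `totalReward` field, and the stub environment are hypothetical names introduced for illustration; a learning agent would also update a value function or policy inside `act`.

```javascript
// Hypothetical agent that picks actions uniformly at random.
class RandomAgent {
  constructor(actions) {
    this.actions = actions;
    this.environment = null;
    this.totalReward = 0;
  }

  newEpisode(environment) {
    // Store the new environment and clear per-episode state.
    this.environment = environment;
    this.totalReward = 0;
  }

  act() {
    // 1) Dispatch an action to the environment.
    const index = Math.floor(Math.random() * this.actions.length);
    this.environment.dispatch(this.actions[index]);
    // 2) Perform internal updates (here, only reward bookkeeping).
    this.totalReward += this.environment.getReward();
  }
}

// Stub environment that records dispatched actions and always rewards 1.
const dispatched = [];
const stubEnvironment = {
  dispatch: (action) => dispatched.push(action),
  getReward: () => 1,
};

const agent = new RandomAgent(['LEFT', 'RIGHT']);
agent.newEpisode(stubEnvironment);
agent.act();
agent.act();
```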

EnvironmentFactory

Kind: global interface

$3

Kind: instance method of EnvironmentFactory

Environment

Kind: global interface

* Environment
* .dispatch(action)
* .getObservation() ⇒ \*
* .getReward() ⇒ number
* .isTerminated() ⇒ boolean

environment.dispatch(action)

Apply an action selected by an Agent to the environment.
This could be a string representing the action (e.g. "LEFT"),
or an array representing the force to apply on actuators, etc.

Kind: instance method of Environment

| Param | Type | Description |
| --- | --- | --- |
| action | \* | An action object specific to the environment. |

environment.getObservation() ⇒ \*

Get an environment-specific observation for the current timestep.
This might be a string identifying the current state,
an array representing the current environment parameters,
pixel-data representing the agent's vision, etc.

Kind: instance method of Environment
Returns: \* - An observation object specific to the environment.

environment.getReward() ⇒ number

Get the reward for the current timestep.
Rewards guide the learning of the agent:
Positive rewards should be given when the agent selects good actions,
and negative rewards should be given when the agent selects bad actions.

Kind: instance method of Environment
Returns: number - A scalar representing the reward for the current timestep.

environment.isTerminated() ⇒ boolean

Return whether or not the current episode is terminated, or finished.
For example, this should return true if the agent has reached some goal,
if the maximum number of timesteps has been exceeded, or if the agent has
otherwise failed. Otherwise, this should return false.

Kind: instance method of Environment
Returns: boolean - A boolean representing whether or not the episode has terminated.
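To make the four Environment methods concrete, here is a sketch of a tiny corridor world together with a loop that drives it. The class `CorridorEnvironment`, its reward values, and the step limit are assumptions chosen for the example, not part of the package.

```javascript
// Hypothetical environment: walk right along a corridor to reach the goal.
class CorridorEnvironment {
  constructor(length = 3, maxSteps = 20) {
    this.length = length;
    this.maxSteps = maxSteps;
    this.position = 0;
    this.steps = 0;
    this.reward = 0;
  }

  dispatch(action) {
    if (action === 'RIGHT') this.position += 1;
    else if (action === 'LEFT') this.position = Math.max(0, this.position - 1);
    this.steps += 1;
    // +10 for reaching the goal, -1 for every other step (assumed values).
    this.reward = this.position === this.length ? 10 : -1;
  }

  getObservation() {
    return this.position;
  }

  getReward() {
    return this.reward;
  }

  isTerminated() {
    return this.position === this.length || this.steps >= this.maxSteps;
  }
}

// Drive one episode with a fixed "always go right" behaviour.
const environment = new CorridorEnvironment(3);
let totalReward = 0;
while (!environment.isTerminated()) {
  environment.dispatch('RIGHT');
  totalReward += environment.getReward();
}
```

The episode ends either when the goal is reached or when the step limit is exceeded, matching the two termination cases described above.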

FunctionApproximator

Kind: global interface

* FunctionApproximator
* .call(args) ⇒ number
* .update(args, error)
* .gradient(args) ⇒ Array.<number>
* .getParameters() ⇒ Array.<number>
* .setParameters(parameters)
* .updateParameters(errors)

functionApproximator.call(args) ⇒ number

Call the function approximator with the given arguments.
The FA should return an estimate of the value of the function
at the point given by the arguments.

Kind: instance method of FunctionApproximator
Returns: number - The approximated value of the function at the given point

| Param | Type | Description |
| --- | --- | --- |
| args | \* | Arguments to the function being approximated |

functionApproximator.update(args, error)

Update the value of the function approximator at the given point.

Kind: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| args | \* | Arguments to the function being approximated |
| error | number | The difference between the target value and the currently approximated value |

functionApproximator.gradient(args) ⇒ Array.<number>

Compute the gradient of the function approximator at the given point,
with respect to its parameters.

Kind: instance method of FunctionApproximator
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| args | Array.<number> | Arguments to the function being approximated |

functionApproximator.getParameters() ⇒ Array.<number>

Get the differentiable parameters of the function approximator.

Kind: instance method of FunctionApproximator
Returns: Array.<number> - The parameters that define the function approximator

functionApproximator.setParameters(parameters)

Set the differentiable parameters of the function approximator.

Kind: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

functionApproximator.updateParameters(errors)

Update the parameters in some direction given by an array of errors.

Kind: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction in which to update each parameter |
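One way the six FunctionApproximator methods could fit together is a linear model over a feature vector. This is a sketch under assumptions: the class name `LinearFunctionApproximator`, the step size `alpha`, and the choice to scale `updateParameters` by `alpha` are illustrative and not prescribed by the interface.

```javascript
// Hypothetical linear approximator: value = weights · features.
class LinearFunctionApproximator {
  constructor(size, alpha = 0.1) {
    this.alpha = alpha; // step size, an assumption of this sketch
    this.weights = new Array(size).fill(0);
  }

  call(features) {
    return features.reduce((sum, x, i) => sum + x * this.weights[i], 0);
  }

  gradient(features) {
    // For a linear model, the gradient w.r.t. the weights is the input.
    return features.slice();
  }

  update(features, error) {
    // Step along the gradient, scaled by the error.
    this.updateParameters(this.gradient(features).map((g) => g * error));
  }

  getParameters() {
    return this.weights.slice();
  }

  setParameters(parameters) {
    this.weights = parameters.slice();
  }

  updateParameters(errors) {
    this.weights = this.weights.map((w, i) => w + this.alpha * errors[i]);
  }
}

const f = new LinearFunctionApproximator(2, 0.5);
f.setParameters([1, 2]);
const before = f.call([1, 1]);
f.update([1, 1], 4 - before); // error = target (4) minus current estimate
const after = f.call([1, 1]);
```

Copying the arrays in `getParameters` and `setParameters` keeps callers from mutating the internal weights by accident, which is a design choice of this sketch rather than a requirement.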

PolicyTraces

Kind: global interface

* PolicyTraces
* .record(state, action) ⇒ PolicyTraces
* .update(error) ⇒ PolicyTraces
* .decay(amount) ⇒ PolicyTraces
* .reset() ⇒ PolicyTraces

policyTraces.record(state, action) ⇒ PolicyTraces

Records a trace for the given state-action pair.

Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |

policyTraces.update(error) ⇒ PolicyTraces

Updates the value function based on the stored traces and the given error.

Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

policyTraces.decay(amount) ⇒ PolicyTraces

Decays the traces by the given amount.

Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

policyTraces.reset() ⇒ PolicyTraces

Resets the traces to their starting values.
Usually called at the beginning of an episode.

Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object

Policy

Kind: global interface

* Policy
* .chooseAction(state) ⇒ \*
* .chooseBestAction(state) ⇒ \*
* .probability(state, action) ⇒ number
* .update(state, action, error)
* .gradient(state, action) ⇒ Array.<number>
* .trueGradient(state, action) ⇒ Array.<number>
* .getParameters() ⇒ Array.<number>
* .setParameters(parameters)
* .updateParameters(errors)

policy.chooseAction(state) ⇒ \*

Choose an action given the current state.

Kind: instance method of Policy
Returns: \* - An Action object of type specific to the environment

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |

policy.chooseBestAction(state) ⇒ \*

Choose the best known action given the current state.

Kind: instance method of Policy
Returns: \* - An Action object of type specific to the environment

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |

policy.probability(state, action) ⇒ number

Compute the probability of selecting a given action in a given state.

Kind: instance method of Policy
Returns: number - The probability, a value in the range [0, 1]

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |

policy.update(state, action, error)

Update the probability of choosing a particular action in a particular state.
Generally, a positive error should make choosing the action more likely,
and a negative error should make choosing the action less likely.

Kind: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| state | Array.<number> | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
| error | number | The direction and magnitude of the update |

policy.gradient(state, action) ⇒ Array.<number>

Compute the gradient of the natural logarithm of the probability of
choosing the given action in the given state
with respect to the parameters of the policy.
This can often be computed more efficiently than the true gradient.

Kind: instance method of Policy
Returns: Array.<number> - The gradient of log(π(state, action)) with respect to the policy parameters

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |

policy.trueGradient(state, action) ⇒ Array.<number>

Compute the true gradient of the probability of
choosing the given action in the given state
with respect to the parameters of the policy.
This is in contrast to the log gradient, which is used in most cases.

Kind: instance method of Policy
Returns: Array.<number> - The gradient of π(state, action) with respect to the policy parameters

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
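The difference between the log gradient and the true gradient can be illustrated with a softmax policy, where the two are related by ∇π = π · ∇log π. The sketch below assumes a hypothetical `SoftmaxPolicy` with one parameter per action that ignores the state entirely; this simplification is made only to keep the example short.

```javascript
// Hypothetical state-independent softmax policy (one parameter per action).
class SoftmaxPolicy {
  constructor(actions) {
    this.actions = actions;
    this.parameters = actions.map(() => 0);
  }

  probabilities() {
    // Subtract the maximum before exponentiating for numerical stability.
    const max = Math.max(...this.parameters);
    const exps = this.parameters.map((p) => Math.exp(p - max));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / total);
  }

  probability(state, action) {
    return this.probabilities()[this.actions.indexOf(action)];
  }

  chooseBestAction(state) {
    const probs = this.probabilities();
    return this.actions[probs.indexOf(Math.max(...probs))];
  }

  gradient(state, action) {
    // For softmax: d log π(a) / dθ_i = 1[i = a] - π(i).
    const probs = this.probabilities();
    const chosen = this.actions.indexOf(action);
    return probs.map((p, i) => (i === chosen ? 1 : 0) - p);
  }

  trueGradient(state, action) {
    // ∇π = π · ∇log π
    const p = this.probability(state, action);
    return this.gradient(state, action).map((g) => p * g);
  }

  getParameters() {
    return this.parameters.slice();
  }

  setParameters(parameters) {
    this.parameters = parameters.slice();
  }
}

const policy = new SoftmaxPolicy(['LEFT', 'RIGHT']);
const logGradient = policy.gradient(null, 'LEFT');
const trueGradient = policy.trueGradient(null, 'LEFT');
```

With equal parameters both actions have probability 0.5, so the true gradient is exactly half the log gradient in every component.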

policy.getParameters() ⇒ Array.<number>

Get the differentiable parameters of the policy.

Kind: instance method of Policy
Returns: Array.<number> - The parameters that define the policy

policy.setParameters(parameters)

Set the differentiable parameters of the policy.

Kind: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | The parameters that define the policy |

policy.updateParameters(errors)

Update the parameters in some direction given by an array of errors.

Kind: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction in which to update each parameter |

StateTraces

Kind: global interface

* StateTraces
* .record(state) ⇒ StateTraces
* .update(error) ⇒ StateTraces
* .decay(amount) ⇒ StateTraces
* .reset() ⇒ StateTraces

stateTraces.record(state) ⇒ StateTraces

Records a trace for the given state.

Kind: instance method of StateTraces
Returns: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |

stateTraces.update(error) ⇒ StateTraces

Updates the value function based on the stored traces and the given error.

Kind: instance method of StateTraces
Returns: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

stateTraces.decay(amount) ⇒ StateTraces

Decays the traces by the given amount.

Kind: instance method of StateTraces
Returns: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

stateTraces.reset() ⇒ StateTraces

Resets the traces to their starting values.
Usually called at the beginning of an episode.

Kind: instance method of StateTraces
Returns: StateTraces - This object
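The order in which an agent might call `reset`, `record`, `update`, and `decay` can be sketched with a TD(λ)-style loop over a fixed two-step episode. Everything here is illustrative: the value function `v`, the class `TabularStateTraces`, the step size of 0.5, and the `gamma` and `lambda` values are assumptions, and decaying by `gamma * lambda` is one common convention rather than something the interface mandates.

```javascript
// Hypothetical tabular state-value function with a step size of 0.5.
const values = new Map();
const v = {
  call: (state) => values.get(state) || 0,
  update: (state, error) => values.set(state, v.call(state) + 0.5 * error),
};

// Hypothetical tabular implementation of the StateTraces interface.
class TabularStateTraces {
  constructor(valueFunction) {
    this.valueFunction = valueFunction;
    this.traces = new Map(); // state -> trace value
  }

  record(state) {
    this.traces.set(state, (this.traces.get(state) || 0) + 1);
    return this;
  }

  update(error) {
    for (const [state, trace] of this.traces) {
      this.valueFunction.update(state, error * trace);
    }
    return this;
  }

  decay(amount) {
    for (const [state, trace] of this.traces) {
      this.traces.set(state, trace * amount);
    }
    return this;
  }

  reset() {
    this.traces.clear();
    return this;
  }
}

const gamma = 1; // discount factor (assumed)
const lambda = 0.5; // trace decay rate (assumed)

// Fixed episode: A -> B -> terminal, with reward 1 on the last transition.
const episode = [
  { state: 'A', reward: 0, nextState: 'B', terminal: false },
  { state: 'B', reward: 1, nextState: null, terminal: true },
];

const traces = new TabularStateTraces(v);
traces.reset(); // start of episode
for (const { state, reward, nextState, terminal } of episode) {
  const nextValue = terminal ? 0 : v.call(nextState);
  const tdError = reward + gamma * nextValue - v.call(state);
  traces.record(state).update(tdError).decay(gamma * lambda);
}
```

Because state A still carries a decayed trace when the reward arrives at B, its value is also updated, although by a smaller amount than the value of B.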

StateValueFunction ⇐ `FunctionApproximator`

Kind: global interface
Extends: FunctionApproximator

* StateValueFunction ⇐ FunctionApproximator
* .call(state) ⇒ number
* .update(state, error)
* .gradient(state) ⇒ Array.<number>
* .getParameters() ⇒ Array.<number>
* .setParameters(parameters)
* .updateParameters(errors)

stateValueFunction.call(state) ⇒ number

Estimate the expected value of the returns given a specific state.

Kind: instance method of StateValueFunction
Overrides: call
Returns: number - The approximated state value (v)

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |

stateValueFunction.update(state, error)

Update the value of the function approximator for a given state.

Kind: instance method of StateValueFunction
Overrides: update

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| error | number | The difference between the target value and the currently approximated value |

stateValueFunction.gradient(state) ⇒ Array.<number>

Compute the gradient of the function approximator for a given state,
with respect to its parameters.

Kind: instance method of StateValueFunction
Overrides: gradient
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |

stateValueFunction.getParameters() ⇒ Array.<number>

Get the differentiable parameters of the function approximator.

Kind: instance method of StateValueFunction
Returns: Array.<number> - The parameters that define the function approximator

stateValueFunction.setParameters(parameters)

Set the differentiable parameters of the function approximator.

Kind: instance method of StateValueFunction

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

stateValueFunction.updateParameters(errors)

Update the parameters in some direction given by an array of errors.

Kind: instance method of StateValueFunction

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction in which to update each parameter |
