Core interfaces for rl-js: Reinforcement Learning in JavaScript
npm install @rl-js/interfaces
* ActionTraces
* .record(state, action) ⇒ ActionTraces
* .update(error) ⇒ ActionTraces
* .decay(amount) ⇒ ActionTraces
* .reset() ⇒ ActionTraces
Kind: instance method of ActionTraces
Returns: ActionTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
Kind: instance method of ActionTraces
Returns: ActionTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |
Kind: instance method of ActionTraces
Returns: ActionTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |
Kind: instance method of ActionTraces
Returns: ActionTraces - This object
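Because every ActionTraces method returns the object itself, calls can be chained. The following is a minimal sketch of accumulating eligibility traces, not the package's implementation: `TabularActionTraces`, the `fakeQ` stand-in, and the choice to forward `error * trace` to an action value function's `update` are all assumptions made for illustration.

```javascript
// Hypothetical tabular traces that mirror the documented ActionTraces method shapes.
class TabularActionTraces {
  constructor(actionValueFunction) {
    this.q = actionValueFunction; // anything exposing update(state, action, error)
    this.traces = new Map(); // key -> { state, action, value }
  }

  // Mark a state-action pair as visited (accumulating trace).
  record(state, action) {
    const key = JSON.stringify([state, action]);
    const entry = this.traces.get(key) || { state, action, value: 0 };
    entry.value += 1;
    this.traces.set(key, entry);
    return this;
  }

  // Assumption: distribute the TD error to each pair in proportion to its trace.
  update(error) {
    for (const { state, action, value } of this.traces.values()) {
      this.q.update(state, action, error * value);
    }
    return this;
  }

  // Multiply every trace by `amount`, usually a value less than 1.
  decay(amount) {
    for (const entry of this.traces.values()) entry.value *= amount;
    return this;
  }

  // Forget all traces, for example at the start of an episode.
  reset() {
    this.traces.clear();
    return this;
  }
}

// Stand-in value function that only logs the updates it receives.
const updates = [];
const fakeQ = {
  update: (state, action, error) => updates.push([state, action, error]),
};

const traces = new TabularActionTraces(fakeQ);
traces.record('s0', 'left').decay(0.5).record('s1', 'right').update(2);

// s0/left receives 2 * 0.5 = 1, s1/right receives 2 * 1 = 2.
console.log(updates);
```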
* ActionValueFunction ⇐ FunctionApproximator
* .call(state, action) ⇒ number
* .update(state, action, error)
* .gradient(state, action) ⇒ Array.<number>
* .getParameters() ⇒ Array.<number>
* .setParameters(parameters)
* .updateParameters(errors)
Kind: instance method of ActionValueFunction
Overrides: call
Returns: number - The approximated action value (q)
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
Kind: instance method of ActionValueFunction
Overrides: update
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
| error | number | The difference between the target value and the currently approximated value |
Kind: instance method of ActionValueFunction
Overrides: gradient
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
Kind: instance method of ActionValueFunction
Returns: Array.<number> - The parameters that define the function approximator
Kind: instance method of ActionValueFunction
| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |
Kind: instance method of ActionValueFunction
| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
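As an illustration of the ActionValueFunction method shapes listed above, here is a minimal tabular sketch. It is a plain class, not a subclass of the package's base class; the name `TabularActionValueFunction`, the constructor arguments, and the step size `alpha` are assumptions that do not appear in the interface.

```javascript
// Hypothetical lookup-table action value function: one parameter per (state, action).
class TabularActionValueFunction {
  constructor(states, actions, alpha = 0.1) {
    this.alpha = alpha; // assumed step size, not part of the interface
    this.index = new Map();
    for (const s of states) {
      for (const a of actions) {
        this.index.set(`${s}|${a}`, this.index.size);
      }
    }
    this.parameters = new Array(this.index.size).fill(0);
  }

  indexOf(state, action) {
    return this.index.get(`${state}|${action}`);
  }

  // The approximated action value (q).
  call(state, action) {
    return this.parameters[this.indexOf(state, action)];
  }

  // Move q(state, action) toward the target by alpha * error.
  update(state, action, error) {
    this.parameters[this.indexOf(state, action)] += this.alpha * error;
  }

  // For a lookup table the gradient is one-hot at the queried entry.
  gradient(state, action) {
    const hot = this.indexOf(state, action);
    return this.parameters.map((_, i) => (i === hot ? 1 : 0));
  }

  getParameters() {
    return this.parameters.slice();
  }

  setParameters(parameters) {
    this.parameters = parameters.slice();
  }

  updateParameters(errors) {
    this.parameters = this.parameters.map((p, i) => p + this.alpha * errors[i]);
  }
}

const q = new TabularActionValueFunction(['s0', 's1'], ['left', 'right'], 0.5);
q.update('s0', 'right', 2); // q(s0, right) becomes 0 + 0.5 * 2
console.log(q.call('s0', 'right'), q.gradient('s0', 'right'));
```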
* Agent
* .newEpisode(environment)
* .act()
Kind: instance method of Agent
| Param | Type | Description |
| --- | --- | --- |
| environment | Environment | The Environment object for the new episode. |
Kind: instance method of Agent
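The two Agent methods suggest a simple episode loop: hand the agent an environment, then call `act()` until the episode ends. The sketch below assumes that `act()` is where the agent dispatches an action to the environment, which the reference above does not state. `RandomAgent`, its `totalReward` field, and the stand-in environment object are hypothetical.

```javascript
// Hypothetical agent that picks uniformly among a fixed list of actions.
class RandomAgent {
  constructor(actions, random = Math.random) {
    this.actions = actions;
    this.random = random;
    this.environment = null;
    this.totalReward = 0;
  }

  // Remember the environment for the new episode and clear per-episode state.
  newEpisode(environment) {
    this.environment = environment;
    this.totalReward = 0;
  }

  // Assumption: acting means dispatching one action and reading the reward.
  act() {
    const index = Math.floor(this.random() * this.actions.length);
    this.environment.dispatch(this.actions[index]);
    this.totalReward += this.environment.getReward();
  }
}

// Stand-in environment: reward 1 per step, terminates after three actions.
const environment = {
  steps: 0,
  dispatch() { this.steps += 1; },
  getObservation() { return this.steps; },
  getReward() { return 1; },
  isTerminated() { return this.steps >= 3; },
};

const agent = new RandomAgent(['left', 'right']);
agent.newEpisode(environment);
while (!environment.isTerminated()) {
  agent.act();
}
console.log(agent.totalReward); // 3
```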
* Environment
* .dispatch(action)
* .getObservation() ⇒ \*
* .getReward() ⇒ number
* .isTerminated() ⇒ boolean
Kind: instance method of Environment
| Param | Type | Description |
| --- | --- | --- |
| action | \* | An action object specific to the environment. |
Kind: instance method of Environment
Returns: \* - An observation object specific to the environment.
Kind: instance method of Environment
Returns: number - A scalar representing the reward for the current timestep.
Kind: instance method of Environment
Returns: boolean - A boolean representing whether or not the episode has terminated.
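To make the four Environment methods concrete, here is a small sketch of a one-dimensional corridor. `CorridorEnvironment`, its string actions, and its reward scheme are invented for illustration; the interface itself leaves action and observation types up to the environment.

```javascript
// Hypothetical corridor: start at position 0, the episode ends at position `length`.
class CorridorEnvironment {
  constructor(length = 4) {
    this.length = length;
    this.position = 0;
    this.reward = 0;
  }

  // Apply an action; anything other than 'right' is treated as a step left.
  dispatch(action) {
    if (this.isTerminated()) return;
    const step = action === 'right' ? 1 : -1;
    this.position = Math.max(0, this.position + step);
    this.reward = this.position === this.length ? 1 : 0;
  }

  // The observation is simply the current position.
  getObservation() {
    return this.position;
  }

  // Scalar reward for the current timestep.
  getReward() {
    return this.reward;
  }

  isTerminated() {
    return this.position === this.length;
  }
}

const env = new CorridorEnvironment(2);
env.dispatch('left'); // blocked by the wall, stays at 0
env.dispatch('right'); // moves to 1
env.dispatch('right'); // reaches 2, reward 1, episode over
console.log(env.getObservation(), env.getReward(), env.isTerminated());
```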
* FunctionApproximator
* .call(args) ⇒ number
* .update(args, error)
* .gradient(args) ⇒ Array.<number>
* .getParameters() ⇒ Array.<number>
* .setParameters(parameters)
* .updateParameters(errors)
Kind: instance method of FunctionApproximator
Returns: number - The approximated value of the function at the given point
| Param | Type | Description |
| --- | --- | --- |
| args | \* | Arguments to the function being approximated |
Kind: instance method of FunctionApproximator
| Param | Type | Description |
| --- | --- | --- |
| args | \* | Arguments to the function being approximated |
| error | number | The difference between the target value and the currently approximated value |
Kind: instance method of FunctionApproximator
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point
| Param | Type | Description |
| --- | --- | --- |
| args | Array.<number> | Arguments to the function being approximated |
Kind: instance method of FunctionApproximator
Returns: Array.<number> - The parameters that define the function approximator
Kind: instance method of FunctionApproximator
| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |
Kind: instance method of FunctionApproximator
| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
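One way to read the relationship between `call`, `gradient`, `update`, and `updateParameters` is through a linear approximator, sketched below. This is an assumption-laden example rather than the package's code: `LinearApproximator` and the step size `alpha` are hypothetical, and implementing `update` as a gradient step scaled by the error is one common choice, not something the reference above specifies.

```javascript
// Hypothetical linear approximator: f(x) = w · x, with x an array of numbers.
class LinearApproximator {
  constructor(size, alpha = 0.1) {
    this.alpha = alpha; // assumed step size, not part of the interface
    this.weights = new Array(size).fill(0);
  }

  // The approximated value of the function at the given point.
  call(args) {
    return args.reduce((sum, x, i) => sum + this.weights[i] * x, 0);
  }

  // For a linear function, the gradient with respect to the weights is x itself.
  gradient(args) {
    return args.slice();
  }

  // Assumption: step along the gradient, scaled by (target - current value).
  update(args, error) {
    this.updateParameters(this.gradient(args).map((g) => g * error));
  }

  getParameters() {
    return this.weights.slice();
  }

  setParameters(parameters) {
    this.weights = parameters.slice();
  }

  // Move each parameter in the given direction, scaled by alpha.
  updateParameters(errors) {
    this.weights = this.weights.map((w, i) => w + this.alpha * errors[i]);
  }
}

const f = new LinearApproximator(2, 0.1);
const x = [1, 2];
const target = 3;
for (let i = 0; i < 50; i += 1) {
  f.update(x, target - f.call(x));
}
console.log(f.call(x)); // approaches the target of 3
```

With these numbers each update halves the remaining error, since `alpha` times the squared norm of `x` is 0.5. A larger `alpha` can overshoot and diverge.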
* PolicyTraces
* .record(state, action) ⇒ PolicyTraces
* .update(error) ⇒ PolicyTraces
* .decay(amount) ⇒ PolicyTraces
* .reset() ⇒ PolicyTraces
Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |
Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |
Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object
* Policy
* .chooseAction(state) ⇒ \*
* .chooseBestAction(state) ⇒ \*
* .probability(state, action) ⇒ number
* .update(state, action, error)
* .gradient(state, action) ⇒ Array.<number>
* .trueGradient(state, action) ⇒ Array.<number>
* .getParameters() ⇒ Array.<number>
* .setParameters(parameters)
* .updateParameters(errors)
Kind: instance method of Policy
Returns: \* - An Action object of type specific to the environment
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
Kind: instance method of Policy
Returns: \* - An Action object of type specific to the environment
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
Kind: instance method of Policy
Returns: number - The probability of the action, in the range [0, 1]
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
Kind: instance method of Policy
| Param | Type | Description |
| --- | --- | --- |
| state | Array.<number> | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
| error | number | The direction and magnitude of the update |
Kind: instance method of Policy
Returns: Array.<number> - The gradient of the policy
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
Kind: instance method of Policy
Returns: Array.<number> - The gradient of log(π(state, action))
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| action | \* | Action object of type specific to the environment |
Kind: instance method of Policy
Returns: Array.<number> - The parameters that define the policy
Kind: instance method of Policy
| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | The parameters that define the policy |
Kind: instance method of Policy
| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
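The Policy methods can be sketched with a softmax over one preference per action. To stay short, this hypothetical `SoftmaxPolicy` ignores the state entirely, which a real policy would not do. Following the return descriptions above, `gradient` is taken to be the gradient of π and `trueGradient` the gradient of log(π); using the latter inside `update` is an assumption, as is the step size `alpha`.

```javascript
// Hypothetical state-independent softmax policy over a fixed list of actions.
class SoftmaxPolicy {
  constructor(actions, alpha = 0.1, random = Math.random) {
    this.actions = actions;
    this.alpha = alpha; // assumed step size, not part of the interface
    this.random = random;
    this.preferences = new Array(actions.length).fill(0);
  }

  // Helper (not in the interface): softmax of the preferences.
  probabilities() {
    const max = Math.max(...this.preferences);
    const exps = this.preferences.map((p) => Math.exp(p - max));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / total);
  }

  // Sample an action according to the current probabilities.
  chooseAction(state) {
    const probs = this.probabilities();
    let threshold = this.random();
    for (let i = 0; i < probs.length; i += 1) {
      threshold -= probs[i];
      if (threshold < 0) return this.actions[i];
    }
    return this.actions[this.actions.length - 1];
  }

  // Pick the action with the highest preference.
  chooseBestAction(state) {
    const best = this.preferences.indexOf(Math.max(...this.preferences));
    return this.actions[best];
  }

  probability(state, action) {
    return this.probabilities()[this.actions.indexOf(action)];
  }

  // Gradient of log(pi(action)): indicator of the chosen action minus pi.
  trueGradient(state, action) {
    const chosen = this.actions.indexOf(action);
    return this.probabilities().map((p, i) => (i === chosen ? 1 : 0) - p);
  }

  // Gradient of pi(action): pi(action) times the gradient of its logarithm.
  gradient(state, action) {
    const pi = this.probability(state, action);
    return this.trueGradient(state, action).map((g) => pi * g);
  }

  // Assumption: step along the log-probability gradient, scaled by the error.
  update(state, action, error) {
    this.updateParameters(this.trueGradient(state, action).map((g) => g * error));
  }

  getParameters() {
    return this.preferences.slice();
  }

  setParameters(parameters) {
    this.preferences = parameters.slice();
  }

  updateParameters(errors) {
    this.preferences = this.preferences.map((p, i) => p + this.alpha * errors[i]);
  }
}

const policy = new SoftmaxPolicy(['left', 'right'], 0.5);
policy.update(null, 'right', 1); // a positive error makes 'right' more likely
console.log(policy.chooseBestAction(null), policy.probability(null, 'right'));
```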
* StateTraces
* .record(state) ⇒ StateTraces
* .update(error) ⇒ StateTraces
* .decay(amount) ⇒ StateTraces
* .reset() ⇒ StateTraces
Kind: instance method of StateTraces
Returns: StateTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
Kind: instance method of StateTraces
Returns: StateTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |
Kind: instance method of StateTraces
Returns: StateTraces - This object
| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |
Kind: instance method of StateTraces
Returns: StateTraces - This object
* StateValueFunction ⇐ FunctionApproximator
* .call(state) ⇒ number
* .update(state, error)
* .gradient(state) ⇒ Array.<number>
* .getParameters() ⇒ Array.<number>
* .setParameters(parameters)
* .updateParameters(errors)
Kind: instance method of StateValueFunction
Overrides: call
Returns: number - The approximated state value (v)
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
Kind: instance method of StateValueFunction
Overrides: update
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
| error | number | The difference between the target value and the currently approximated value |
Kind: instance method of StateValueFunction
Overrides: gradient
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point
| Param | Type | Description |
| --- | --- | --- |
| state | \* | State object of type specific to the environment |
Kind: instance method of StateValueFunction
Returns: Array.<number> - The parameters that define the function approximator
Kind: instance method of StateValueFunction
| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |
Kind: instance method of StateValueFunction
| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
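The `error` passed to `update` is the difference between a target value and the current estimate. A one-step temporal-difference (TD(0)) update is one way to produce it, sketched below with a hypothetical `TabularStateValueFunction`. The class name, the step size `alpha`, the discount `gamma`, and the example transition are all assumptions made for illustration.

```javascript
// Hypothetical lookup-table state value function: one parameter per state.
class TabularStateValueFunction {
  constructor(states, alpha = 0.1) {
    this.states = states;
    this.alpha = alpha; // assumed step size, not part of the interface
    this.values = new Array(states.length).fill(0);
  }

  // The approximated state value (v).
  call(state) {
    return this.values[this.states.indexOf(state)];
  }

  // Move v(state) toward the target by alpha * error.
  update(state, error) {
    this.values[this.states.indexOf(state)] += this.alpha * error;
  }

  // For a lookup table the gradient is one-hot at the queried state.
  gradient(state) {
    return this.states.map((s) => (s === state ? 1 : 0));
  }

  getParameters() {
    return this.values.slice();
  }

  setParameters(parameters) {
    this.values = parameters.slice();
  }

  updateParameters(errors) {
    this.values = this.values.map((v, i) => v + this.alpha * errors[i]);
  }
}

const v = new TabularStateValueFunction(['a', 'b'], 0.5);
const gamma = 0.9; // assumed discount factor

// One observed transition: from state 'a' to state 'b' with reward 1.
const reward = 1;
const tdError = reward + gamma * v.call('b') - v.call('a'); // 1 + 0.9 * 0 - 0 = 1
v.update('a', tdError);
console.log(v.call('a')); // 0.5
```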