## Overview

`autodiff-ts` is a TypeScript implementation of automatic differentiation. Automatic Differentiation (AD) [^1] is a technique for computationally determining the gradient of a function with respect to its inputs. It strikes a balance between the precision of symbolic differentiation and the efficiency of numerical differentiation. Differentiation is a crucial part of many machine learning algorithms and optimization problems.

The library provides a simple function factory `makeGradFn` that generates a gradient function for a user-provided pure function. The user-provided function can take multiple scalar inputs and use arithmetic operations, common mathematical functions and constants. The generated gradient function can be used to compute the gradient of the user-provided function at a specific point.
## Installation

```sh
npm install autodiff-ts
```

## Usage

```js
import { makeGradFn } from "autodiff-ts"
const f = (x, y) => y ** x + x
const grad = makeGradFn(f)
```
For the function $f(x, y) = y^x + x$, $\frac{\partial f}{\partial x} = y^x \ln y + 1$ and $\frac{\partial f}{\partial y} = xy^{x - 1}$.
```js
grad(1, 2)
// { value: 3, gradients: [ 2.386294361119891, 1 ] } (gradients field contains [df/dx, df/dy])
grad(3, 4)
// { value: 67, gradients: [ 89.722839111673, 48 ] }
```
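Plugging the first call's inputs $(x, y) = (1, 2)$ into the formulas above confirms the reported value and gradients:

$$f(1, 2) = 2^1 + 1 = 3, \qquad \frac{\partial f}{\partial x}\Big|_{(1,2)} = 2^1 \ln 2 + 1 \approx 2.3863, \qquad \frac{\partial f}{\partial y}\Big|_{(1,2)} = 1 \cdot 2^{0} = 1$$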
## How it works

- `makeGradFn` parses the user-provided function with a JavaScript parser library. This gives us a body expression as well as the parameters that the function accepts (illustrated below).
- `makeGradFn` uses the abstract expression of the original function to create the gradient function.
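As a rough illustration only (not the parser's actual output format), the information extracted for the usage example above might look like:

```ts
// Hypothetical shape of the parsed function: its parameter names plus an
// expression tree for the body. The real parser output is richer than this.
type Expr =
  | { kind: "param"; name: string }
  | { kind: "binary"; op: "+" | "**"; left: Expr; right: Expr }

const parsedF: { params: string[]; body: Expr } = {
  params: ["x", "y"],
  // The tree for y ** x + x
  body: {
    kind: "binary",
    op: "+",
    left: {
      kind: "binary",
      op: "**",
      left: { kind: "param", name: "y" },
      right: { kind: "param", name: "x" },
    },
    right: { kind: "param", name: "x" },
  },
}
```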

1. Reverse mode AD is the default choice (over forward mode AD) for many problems, as it reduces the repeated computation needed when an expression has multiple inputs but only one output.
   - Such expressions are common in many objective functions: e.g. a loss function for a neural network.
   - Rather than computing the gradient for each input variable independently, we compute the gradient of the output with respect to each input variable in a single pass.
1. We traverse the body expression of the user-provided function to build a computational graph, which is an abstract representation of the function (a sketch of these node types and the passes over them appears after this list). The graph is made of:
   - `Variable` nodes, which represent the input variables of the function, as well as any intermediate inputs to any `CompNode`s.
     - These are "singleton" variables: an $x$ term appearing in different parts of the given function will be represented by the same `Variable` object. This will be significant during the gradient accumulation happening in the backward pass later.
   - `CompNode` nodes, which represent the computation nodes that perform arithmetic operations or mathematical functions on the input variables.
     - These take in a list of (`Variable`, elementary gradient) pairs, which represent the inputs to the computation node.
     - They output a `Variable` node, which represents the output of the computation node and a possible input to another computation node.
     - We bind two functions to each `CompNode`:
       - `computeFn`, which evaluates the computation node given the input values.
       - `gradFn`, which returns the elementary gradient of the output variable with respect to the input variables.
   - As we evaluate the expression left-to-right, we also store the topological order of the nodes.
1. We perform a forward pass to evaluate the function given new input values.
   - We use the topological order of the nodes to ensure we evaluate the nodes in the correct order.
   - We use the `computeFn`s at the `CompNode`s to evaluate the output nodes using the given inputs.
   - We use the `gradFn` to calculate the elementary gradient given the inputs for each `CompNode`, storing the values as "edges" between the input `Variable`s and the `CompNode`.
1. Instead of accumulating the gradient on the forward pass using the chain rule, we accumulate the gradient on the backward pass.
   - This requires setting the initial gradient (`gradientAcc`) to the gradient of the output variable with respect to itself, which is 1.
   - We then use the chain rule to accumulate the gradient backward (see diagram), multiplying the gradient of each edge with the current `gradientAcc` and storing the accumulated gradient at the `Variable` nodes.
     ```ts
     inputVariable.gradientAcc += node.gradientAcc * grad // accumulate gradients backward
     ```
   - Accumulating on a variable which already has a `gradientAcc` involves adding the new contribution to the current `gradientAcc`. This accounts for the multiple "branches" that this input variable feeds into, and applies the generalised chain rule. [^2]
1. We return the final output value in the form `{ value, gradients }`.
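To make the steps above concrete, here is a minimal, self-contained sketch of the node types and the two passes, with the graph for the example $f(x, y) = y^x + x$ built by hand. Only the names `Variable`, `CompNode`, `computeFn`, `gradFn` and `gradientAcc` come from the description above; every other name and detail is an assumption for illustration, not `autodiff-ts`'s actual internals (which build the graph from the parsed body expression rather than by hand).

```ts
class Variable {
  value = 0
  gradientAcc = 0 // accumulated d(output)/d(this), filled in by the backward pass
}

interface CompNode {
  inputs: Variable[]
  output: Variable
  computeFn: (xs: number[]) => number // evaluates this node from its input values
  gradFn: (xs: number[]) => number[]  // elementary gradients w.r.t. each input
  edgeGrads: number[]                 // "edges" stored during the forward pass
}

const x = new Variable()
const y = new Variable()

// t = y ** x
const powNode: CompNode = {
  inputs: [y, x],
  output: new Variable(),
  computeFn: ([b, e]) => b ** e,
  gradFn: ([b, e]) => [e * b ** (e - 1), b ** e * Math.log(b)], // [dt/dy, dt/dx]
  edgeGrads: [],
}

// out = t + x
const addNode: CompNode = {
  inputs: [powNode.output, x],
  output: new Variable(),
  computeFn: ([a, b]) => a + b,
  gradFn: () => [1, 1],
  edgeGrads: [],
}

// Topological order recorded while traversing the body expression.
const topoOrder: CompNode[] = [powNode, addNode]

function grad(xVal: number, yVal: number) {
  x.value = xVal
  y.value = yVal

  // Forward pass: evaluate nodes in topological order, storing elementary gradients as edges.
  for (const node of topoOrder) {
    const xs = node.inputs.map((v) => v.value)
    node.output.value = node.computeFn(xs)
    node.edgeGrads = node.gradFn(xs)
  }

  // Backward pass: seed d(out)/d(out) = 1, then accumulate gradients back towards the inputs.
  for (const v of [x, y, powNode.output]) v.gradientAcc = 0
  addNode.output.gradientAcc = 1
  for (const node of [...topoOrder].reverse()) {
    node.inputs.forEach((inputVariable, i) => {
      const edgeGrad = node.edgeGrads[i] // elementary gradient stored on this edge
      inputVariable.gradientAcc += node.output.gradientAcc * edgeGrad // accumulate backward
    })
  }

  return { value: addNode.output.value, gradients: [x.gradientAcc, y.gradientAcc] }
}

console.log(grad(1, 2)) // ≈ { value: 3, gradients: [ 2.386..., 1 ] }
```

Note how `x.gradientAcc` receives two contributions, one per edge that $x$ feeds into; this is exactly the generalised-chain-rule accumulation described above.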
## Forward mode AD

> This forward mode AD implementation has many limitations (see below), and reverse mode AD should be used for its better efficiency.

1. We keep a `Table`, with each entry a value-gradient pair, bound to a key (a small sketch follows).
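A minimal sketch of what such a table could hold, assuming (purely for illustration) string keys and a plain value/gradient record:

```ts
// Hypothetical forward-mode table: each key is bound to a value together with
// its gradient with respect to the variable we are differentiating by.
type Entry = { value: number; gradient: number }
const table = new Map<string, Entry>()

// Seeding the inputs for d/dx at (x, y) = (1, 2): dx/dx = 1, dy/dx = 0.
table.set("x", { value: 1, gradient: 1 })
table.set("y", { value: 2, gradient: 0 })
```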
[^1]: Baydin et al., "Automatic differentiation in machine learning: a survey", JMLR 2018.
[^2]: In essence, the generalised chain rule for multiple variables tells us: if output $L$ depends on functions $u(t)$ and $z(t)$, which depend on $t$, then $$\frac{\partial L}{\partial t} = \frac{\partial L}{\partial u} \frac{\partial u}{\partial t} + \frac{\partial L}{\partial z} \frac{\partial z}{\partial t}$$