Genetic Algorithm
- use when search space is too large to use brute-force search
- e.g. solving equations, automating the process of design and solving
combinatorial problems (timetable scheduling)
- many
problems can be reformulated as exploring an n-dimensional search space
-
adaptive parameters that change linearly as time approaches end and in response to quality of candidate solution
-
elitism (preserves top candidates)
- allows for profiling and debugging (see
EventEmitter API)
-
efficient (built on typed arrays)
For an
alternative heuristic search that may work better when your
problem uses continuous (real) values see my
particle swarm optimization algorithm
that follows a similar API.
Installation
``
sh
$ npm install genetic-algo
`
NPM link.
API
Example:
`
js
const { GeneticAlgorithm: GA } = require('genetic-algo')
// silly fitness function, maximises values of all genes (see below for a better example)
const fitnessFunction = arr => arr.reduce((x, y) => x + y, 0)
// you may also supply an object with options see below DEFAULT OPTS)
const ga = new GA(fitnessFunction, 1000 / nGenes /, 'u32' / dtype /)
// Array
const bestCandidates = Array.from(ga.search() / generator /)
`
In a nutshell:
1. Specify nGenes
(Int > 0, see NGENES section below)
2. Declare a fitness function that accepts a candidate (typed array) and
returns a number. Each candidate is of length nGenes
. The candidates
that score the highest will be favoured in the
selection and will make it to the next gene pool. (see FITNESS FUNCTION section below)
For multiple objectives (e.g. you want the solution to have more than one property: price, time, quality ...) supply an array of fitness functions.
3. Choose dtype
, one of: "f64" | "f32" | "i32" | "i16" | "i8" | "u32" | "u16" | "u8"
(see DTYPE section below)
4. [EXTRA] You might want a decode
function as well (see DECODE FUNCTION section below).
Fitness Function
$3
function(TypedArray): Number
The number it returns may be positive or negative. It may be an integer
or a real number.
$3
The previous example maximised the value of every gene. This example
computes the negative of the distance from roots of an equation:
`
js
// find root of expr
const expr = (x1, x2, x3, x4, x5, x6) => (Math.log2(x1) * x2 x3 / x4) + x5 (Math.log2(x6))
const fitness = xs => {
const val = -(Math.abs(expr(...xs)))
if (Object.is(NaN, val) || Object.is(Infinity, val)) {
return -Infinity
} else {
return val
}
}
`
Fittest candidates score 0 (distance from the root is 0 meaning root has
been found), least fit candidates have a negative value.
Output from this example which uses this fitness function:
`
log2( 98) * 0^ 61 / 209 + 0^log2( 76) = 0
log2( 39) * 0^228 / 209 + 0^log2(160) = 0
log2(100) * 0^ 89 / 202 + 0^log2(151) = 0
log2(124) * 0^163 / 247 + 0^log2( 76) = 0
log2( 31) * 0^166 / 9 + 0^log2(166) = 0
log2(221) * 0^100 / 132 + 0^log2(130) = 0
log2( 2) * 0^157 / 211 + 0^log2(150) = 0
log2( 2) * 0^100 / 132 + 0^log2(130) = 0
... ... ... ... ... ... ... ...
`
It's crucial that you reward candidate solutions for approximating i.e.
getting close to the solution. If they are a bit right -- add some fitness.
$3
For multiple objectives (e.g. you want the solution to have more than one property: price, time, quality ...) supply an array of fitness functions:
`
js
const fitness = [
(cand) => getSpeed(cand),
(cand) => getPrice(cand),
(cand) => getDurability(cand),
(cand) => getUserFriendliness(cand),
]
`
The functions may return numbers in any scale. E.g. getSpeed
can return a number in [0, 100], getPrice
can return a number in [100, 1000000] etc.
You might want to provide weights for each objective (by default each is 1.0 i.e. equally important):
`
js
const opts = {
weights: [0.2, 0.4, 0.1, 0.01] // default [1.0, 1.0, 1.0, 1.0]
}
`
$3
It sometimes makes sense to have a decode(cand)
function.
`
js
function decode(cand) {
return {
price: cand[0],
category: Math.floor(cand[1]),
area: Math.floor(cand[2]),
// etc.
}
}
`
And then it's much easier in the fitness function:
`
js
function fitnessFunction(cand) {
const { price, category, area, ... } = decode(cand)
let fitnessScore = 0
fitnessScore += 1000 - price
fitnessScore += getQualOfCat(category)
fitnessScore -= getCostOfArea(area)
// other vars ...
return fitnessScore
}
`
More examples here.
NGenes
This is how many numbers each array will have. Each gene (number)
corresponds to a dimension in the search space you are exploring. For example:
# |
meaning |
domain |
gene #1 |
time |
00:00 - 24:00 |
gene #2 |
day |
0 - 365 |
gene #3 |
room number |
1 - 128 |
... |
... |
... |
gene #1000 |
building |
A - Z |
For combinatorial problems, it makes sense to store an array of choices
and let genes be indices.
`
js
const deparatment = [
"biology",
"mathematics",
"electical-engineering",
...
]
const room = [
"k21",
"k12",
"w4",
...
]
// etc.
`
then each candidate can be a Uint array [depIdx, roomIdx, ...]
.
A different approach you could take is devote 2 genes to room
and let
the first be the ASCII code of the room (a
..z
) and the second room
number (1..100
or something).
Dtype
You need to set dtype
yourself depending on the problem domain.
data type |
typed array |
min value |
max value |
"u8" |
UInt8Array |
0 |
28 |
"u16" |
UInt16Array |
0 |
216 |
"u32" |
UInt32Array |
0 |
232 |
"i8" |
Int8Array |
-28 - 1 |
28 - 1 |
"i16" |
Int16Array |
-216 - 1 |
216 - 1 |
"i32" |
Int32Array |
-232 - 1 |
232 - 1 |
"f32" |
Float32Array (32-bit IEEE float) |
1.2 * 10-38 |
3.4 * 1038 |
"f64" |
Float64Array (64-bit IEEE float) |
5.0 * 10-324 |
1.8 * 10308 |
You benefit a lot from restricting the search space by choosing e.g. i8
as opposed to i16
.
[OPTIONAL] Default
opts
In addition to required parameters (fitnessFunction
, nGenes
,
dtype
), you can also supply an object with configuration. I encourage
to begin with defaults and then tweak if necessary. Here are the
defaults:
`
js
const { Duration, PopSize, NElite, NRounds, LogLvl } = require('genetic-algo');
const opts = {
logLvl: LogLvl.SILENT,
// stop condition
//
// if you find that the algorithm gets stuck too quickly, increase it
timeOutMS: Duration.seconds(30),
// stop condition
nRounds: NRounds.LARGE, / 1000000 /
// how many candidate solutions to keep track of
//
// it makes sense for it to be 100 - 1500 ish
//
// if you find that the algorithm gets stuck too quickly, increase it
popSize: PopSize.MEDIUM / 300 /,
// number of elite candidates (guaranteed to make it to next gene pool unaltered)
//
// if you find that the algorithm gets stuck too quickly, decrease it
//
// e.g. nElite: NElite.SMALL,
// e.g. nElite: NElite.MEDIUM,
// e.g. nElite: NElite.LARGE,
// e.g. nElite: 0.1
// e.g. nElite: [0.01, 0.1]
// e.g. nElite: { start: 0.1, end: 0.5, whenFit: 'constant' }
// e.g. nElite: { start: 0.01, end: 0.25, whenFit: 'increases' }
// e.g. nElite: { start: 0.1, end: 0.5, whenFit: 'decreases' }
nElite: NElite.ADAPTIVE, / { start: 0.05, end: 0.15 } /
// probability of mutation
//
// e.g. pMutate: PMutate.SMALL,
// e.g. pMutate: PMutate.MEDIUM,
// e.g. pMutate: PMutate.LARGE,
// e.g. pMutate: 0.1
// e.g. pMutate: [0.01, 0.1]
// e.g. pMutate: { start: 0.1, end: 0.5, whenFit: 'constant' }
// e.g. pMutate: { start: 0.01, end: 0.25, whenFit: 'increases' }
// e.g. pMutate: { start: 0.1, end: 0.5, whenFit: 'decreases' }
pMutate: PMutate.ADAPTIVE, / { start: 0.1, end: 0.01, whenFit: 'increases' } /
// when mutating, target at least ? genes
//
// e.g. nMutations: NMutations.SMALL,
// e.g. nMutations: NMutations.MEDIUM,
// e.g. nMutations: NMutations.LARGE,
// e.g. nMutations: 0.1
// e.g. nMutations: [0.01, 0.1]
// e.g. nMutations: { start: 0.1, end: 0.5, whenFit: 'constant' }
// e.g. nMutations: { start: 0.01, end: 0.25, whenFit: 'increases' }
// e.g. nMutations: { start: 0.1, end: 0.5, whenFit: 'decreases' }
nMutations: NMutations.ADAPTIVE, / { start: 10, end: 1, whenFit: 'decreases' } /
// when mutating, the value of a gene is replaced with a random value
// this specifies the range of the random value
//
// when not specified, it's set intelligently based on dtype so not necessary to tweak
//
// set it manually if you have an idea of where in the search space the solution might be
// this might cause it to converge faster
//
// e.g. randGeneVal: [-100, 2000] / lower and upper bounds /
// e.g. randGeneVal: () => -200 + Math.random() 1E4 / custom function */
randGeneVal: undefined,
// when using multi-objective optimisation, you can specify relative weights for every objective
// (measured by each of fitness function from the array)
// see FITNESS FUNCTION
// e.g. weights: [0.2, 0.4, 1] / needs to have the same length as the fitenss function array /
weights: undefined,
}
`
For example:
`
js
const opts = {
timeOutMS: Duration.seconds(30),
nElite: 0.1,
}
const nGenes = 1000
const dtype = 'u32'
const ga = new GA(someFitnessFunct, nGenes, dtype, opts)
`
Theory Behind Genetic Algorithms
Genetic algorithms simulate the process of evolution. You are the
one specifying what each candidate should be good at to survive (fitness
function).
This algorithm uses a nature-inspired heuristic and has the
potential to achieve excellent results but it might not find the
optimal (ideal) solution. That said, for many applications the best
solution is not needed. By sacrificing a bit of quality you drastically
reduce the time needed to find a solution. Without such heuristics some
problems cannot be solved at all. These would NP complete problems to
which we do not have an algorithm which would run in polynomial time.
$3
This is your "DNA" which represents a **complete solution to the
problem** you are trying to solve. The algorithm keeps track of a
population of those DNA strings. Candidates are modified in such a way
that the population approaches a solution. In this implementation
candidate solutions (chromosomes) are typed arrays. Depending on what
type of problem you are trying to solve you will use a different
dtype
. Each candidate corresponds to a point in the search space that
you are exploring.
$3
Measures the value of a candidate solution. The algorithm will perform well if your fitness function is good.
$3
One of the two ways candidate solutions are modified. Crossover is all
about recombination. It is a binary operation that takes two
candidates and selects a portion of genes from one parent and the rest
from the other.
In an ideal scenario, you would inherit genes from fit individuals.
However, if you do that too often, you will loose novelty and you the
algorithm will get stuck very quickly. You can change how often fittest
candidates (elite) are chosen by changing minPElite
and maxPElite
.
NOTE nElite
needs to be non-zero for this to work.
$3
One of the two ways candidate solutions are modified. This is a **unary
operation. It takes a single candidate and randomly alters a single
gene. Mutations introduce novelty**. If your algorithm gets stuck too
quickly it's because there was not enough novelty. In an ideal scenario,
fittest candidates would undergo mutation whereas the least fit would use
crossover. Furthermore, ideally, the algorithm would explore the fitness
landscape more at the beginning and then exploit the discovered peaks at the
end of running the algorithm. This implementation does both for you automatically.
$3
Population is a collection of candidate solutions. An initial population with
popSize = 5
, nGenes = 2
, dtype = 'u8'
might look something like this:
`
js
// gene1 gene2
[23, 0] // candidate 1
[ 1, 41] // candidate 2
[10, 1] // candidate 3
[ 1, 100] // candidate 4
[ 0, 222] // candidate 5
`
Profiling with EventEmitter API
The GeneticAlgorithm
emits signals along with some information
which can be used for profiling.
NOTE data emitted is in sub-bullets.
Emitted Once
1. "start"
after .search()
and all initialisation is complete, before the 1st round
Emitted on Stop Condition Met
1. "timeout"
when timeOutMS
limit is reached.
2. "stuck"
when stuck in a local minimum.
3. "rounds"
when nRounds
limit reached.
4. "end"
when finished.
Emitted Every Round
1. "round"
on every round start (not the same as "rounds"
).
2. "op"
on every selection and mutation / crossover operation application
Example of extracting data from signals:
`
js
ga.on('start', () => console.log('[START] with opts', opts))
ga.on('stuck', () => console.log([END] stuck
))
ga.on('timeout', () => console.log([END] timeout
))
ga.on('end', () => console.log([END] after round #${ga.rIdx} (took ${ga.timeTakenMS / SEC}sec)
))
``
More examples
here.
Performance
- The bottleneck is the fitness function.
- Log less for better performance
Downsides
- single-threaded (but see
parallel example that uses the cluster module from node stdlib).
License
MIT