Genetic Algorithm
- use when search space is too large to use brute-force search
- e.g. solving equations, automating the process of design and solving combinatorial problems (timetable scheduling)
- many problems can be reformulated as exploring an n-dimensional search space
- adaptive parameters that change linearly as time approaches end and in response to quality of candidate solution
- elitism (preserves top candidates)
- allows for profiling and debugging (see EventEmitter API)
- efficient (built on typed arrays)
For an alternative heuristic search that may work better when your problem uses continuous (real) values see my particle swarm optimization algorithm that follows a similar API.
Installation
$ npm install genetic-algo
API
Example:
const GeneticAlgorithm: GA = // silly fitness function, maximises values of all genes (see below for a better example)const fitnessFunction = arr // you may also supply an object with options see below DEFAULT OPTS)const ga = fitnessFunction 1000 /* nGenes */ 'u32' /* dtype */ // Array<TypedArray>const bestCandidates = Array
In a nutshell:
- Specify
nGenes
(Int > 0, see NGENES section below) - Declare a fitness function that accepts a candidate (typed array) and
returns a number. Each candidate is of length
nGenes
. The candidates that score the highest will be favoured in the selection and will make it to the next gene pool. (see FITNESS FUNCTION section below) For multiple objectives (e.g. you want the solution to have more than one property: price, time, quality ...) supply an array of fitness functions. - Choose
dtype
, one of:"f64" | "f32" | "i32" | "i16" | "i8" | "u32" | "u16" | "u8"
(see DTYPE section below) - [EXTRA] You might want a
decode
function as well (see DECODE FUNCTION section below).
Fitness Function
Signature
function(TypedArray): Number
The number it returns may be positive or negative. It may be an integer or a real number.
Example
The previous example maximised the value of every gene. This example computes the negative of the distance from roots of an equation:
// find root of exprconst expr = Math * x2 ** x3 / x4 + x5 ** Math const fitness = { const val = -Math if Object || Object return -Infinity else return val }
Fittest candidates score 0 (distance from the root is 0 meaning root has been found), least fit candidates have a negative value.
Output from this example which uses this fitness function:
log2( 98) * 0^ 61 / 209 + 0^log2( 76) = 0
log2( 39) * 0^228 / 209 + 0^log2(160) = 0
log2(100) * 0^ 89 / 202 + 0^log2(151) = 0
log2(124) * 0^163 / 247 + 0^log2( 76) = 0
log2( 31) * 0^166 / 9 + 0^log2(166) = 0
log2(221) * 0^100 / 132 + 0^log2(130) = 0
log2( 2) * 0^157 / 211 + 0^log2(150) = 0
log2( 2) * 0^100 / 132 + 0^log2(130) = 0
... ... ... ... ... ... ... ...
It's crucial that you reward candidate solutions for approximating i.e. getting close to the solution. If they are a bit right -- add some fitness.
Multiple Objectives
For multiple objectives (e.g. you want the solution to have more than one property: price, time, quality ...) supply an array of fitness functions:
const fitness =
The functions may return numbers in any scale. E.g. getSpeed
can return a number in [0, 100], getPrice
can return a number in [100, 1000000] etc.
You might want to provide weights for each objective (by default each is 1.0 i.e. equally important):
const opts = weights: 02 04 01 001 // default [1.0, 1.0, 1.0, 1.0]
[OPTIONAL] Decode Function
It sometimes makes sense to have a decode(cand)
function.
{ return price: cand0 category: Math area: Math // etc. }
And then it's much easier in the fitness function:
{ const price category area ... = let fitnessScore = 0 fitnessScore += 1000 - price fitnessScore += fitnessScore -= // other vars ... return fitnessScore}
More examples here.
NGenes
This is how many numbers each array will have. Each gene (number) corresponds to a dimension in the search space you are exploring. For example:
# | meaning | domain |
---|---|---|
gene #1 | time | 00:00 - 24:00 |
gene #2 | day | 0 - 365 |
gene #3 | room number | 1 - 128 |
... | ... | ... |
gene #1000 | building | A - Z |
For combinatorial problems, it makes sense to store an array of choices and let genes be indices.
const deparatment = "biology" "mathematics" "electical-engineering" ... const room = "k21" "k12" "w4" ... // etc.
then each candidate can be a Uint array [depIdx, roomIdx, ...]
.
A different approach you could take is devote 2 genes to room
and let
the first be the ASCII code of the room (a
..z
) and the second room
number (1..100
or something).
Dtype
You need to set dtype
yourself depending on the problem domain.
data type | typed array | min value | max value |
---|---|---|---|
"u8" |
UInt8Array |
0 | 28 |
"u16" |
UInt16Array |
0 | 216 |
"u32" |
UInt32Array |
0 | 232 |
"i8" |
Int8Array |
-28 - 1 | 28 - 1 |
"i16" |
Int16Array |
-216 - 1 | 216 - 1 |
"i32" |
Int32Array |
-232 - 1 | 232 - 1 |
"f32" |
Float32Array (32-bit IEEE float) |
1.2 * 10-38 | 3.4 * 1038 |
"f64" |
Float64Array (64-bit IEEE float) |
5.0 * 10-324 | 1.8 * 10308 |
You benefit a lot from restricting the search space by choosing e.g. i8
as opposed to i16
.
opts
[OPTIONAL] Default In addition to required parameters (fitnessFunction
, nGenes
,
dtype
), you can also supply an object with configuration. I encourage
to begin with defaults and then tweak if necessary. Here are the
defaults:
const Duration PopSize NElite NRounds LogLvl = ; const opts = logLvl: LogLvlSILENT // stop condition // // if you find that the algorithm gets stuck too quickly, increase it timeOutMS: Duration // stop condition nRounds: NRoundsLARGE /* 1000000 */ // how many candidate solutions to keep track of // // it makes sense for it to be 100 - 1500 ish // // if you find that the algorithm gets stuck too quickly, increase it popSize: PopSizeMEDIUM /* 300 */ // number of elite candidates (guaranteed to make it to next gene pool unaltered) // // if you find that the algorithm gets stuck too quickly, decrease it // // e.g. nElite: NElite.SMALL, // e.g. nElite: NElite.MEDIUM, // e.g. nElite: NElite.LARGE, // e.g. nElite: 0.1 // e.g. nElite: [0.01, 0.1] // e.g. nElite: { start: 0.1, end: 0.5, whenFit: 'constant' } // e.g. nElite: { start: 0.01, end: 0.25, whenFit: 'increases' } // e.g. nElite: { start: 0.1, end: 0.5, whenFit: 'decreases' } nElite: NEliteADAPTIVE /* { start: 0.05, end: 0.15 } */ // probability of mutation // // e.g. pMutate: PMutate.SMALL, // e.g. pMutate: PMutate.MEDIUM, // e.g. pMutate: PMutate.LARGE, // e.g. pMutate: 0.1 // e.g. pMutate: [0.01, 0.1] // e.g. pMutate: { start: 0.1, end: 0.5, whenFit: 'constant' } // e.g. pMutate: { start: 0.01, end: 0.25, whenFit: 'increases' } // e.g. pMutate: { start: 0.1, end: 0.5, whenFit: 'decreases' } pMutate: PMutateADAPTIVE /* { start: 0.1, end: 0.01, whenFit: 'increases' } */ // when mutating, target at least ? genes // // e.g. nMutations: NMutations.SMALL, // e.g. nMutations: NMutations.MEDIUM, // e.g. nMutations: NMutations.LARGE, // e.g. nMutations: 0.1 // e.g. nMutations: [0.01, 0.1] // e.g. nMutations: { start: 0.1, end: 0.5, whenFit: 'constant' } // e.g. nMutations: { start: 0.01, end: 0.25, whenFit: 'increases' } // e.g. nMutations: { start: 0.1, end: 0.5, whenFit: 'decreases' } nMutations: NMutationsADAPTIVE /* { start: 10, end: 1, whenFit: 'decreases' } */ // when mutating, the value of a gene is replaced with a random value // this specifies the range of the random value // // when not specified, it's set intelligently based on dtype so not necessary to tweak // // set it manually if you have an idea of where in the search space the solution might be // this might cause it to converge faster // // e.g. randGeneVal: [-100, 2000] /* lower and upper bounds */ // e.g. randGeneVal: () => -200 + Math.random() * 1E4 /* custom function */ randGeneVal: undefined // when using multi-objective optimisation, you can specify relative weights for every objective // (measured by each of fitness function from the array) // see **FITNESS FUNCTION** // e.g. weights: [0.2, 0.4, 1] /* needs to have the same length as the fitenss function array */ weights: undefined
For example:
const opts = timeOutMS: Duration nElite: 01const nGenes = 1000const dtype = 'u32' const ga = someFitnessFunct nGenes dtype opts
Theory Behind Genetic Algorithms
Genetic algorithms simulate the process of evolution. You are the one specifying what each candidate should be good at to survive (fitness function).
This algorithm uses a nature-inspired heuristic and has the potential to achieve excellent results but it might not find the optimal (ideal) solution. That said, for many applications the best solution is not needed. By sacrificing a bit of quality you drastically reduce the time needed to find a solution. Without such heuristics some problems cannot be solved at all. These would NP complete problems to which we do not have an algorithm which would run in polynomial time.
Candidate
This is your "DNA" which represents a complete solution to the
problem you are trying to solve. The algorithm keeps track of a
population of those DNA strings. Candidates are modified in such a way
that the population approaches a solution. In this implementation
candidate solutions (chromosomes) are typed arrays. Depending on what
type of problem you are trying to solve you will use a different
dtype
. Each candidate corresponds to a point in the search space that
you are exploring.
Fitness Function
Measures the value of a candidate solution. The algorithm will perform well if your fitness function is good.
Crossover
One of the two ways candidate solutions are modified. Crossover is all about recombination. It is a binary operation that takes two candidates and selects a portion of genes from one parent and the rest from the other.
In an ideal scenario, you would inherit genes from fit individuals.
However, if you do that too often, you will loose novelty and you the
algorithm will get stuck very quickly. You can change how often fittest
candidates (elite) are chosen by changing minPElite
and maxPElite
.
NOTE nElite
needs to be non-zero for this to work.
Mutations
One of the two ways candidate solutions are modified. This is a unary operation. It takes a single candidate and randomly alters a single gene. Mutations introduce novelty. If your algorithm gets stuck too quickly it's because there was not enough novelty. In an ideal scenario, fittest candidates would undergo mutation whereas the least fit would use crossover. Furthermore, ideally, the algorithm would explore the fitness landscape more at the beginning and then exploit the discovered peaks at the end of running the algorithm. This implementation does both for you automatically.
Population
Population is a collection of candidate solutions. An initial population with
popSize = 5
, nGenes = 2
, dtype = 'u8'
might look something like this:
// gene1 gene2 23 0 // candidate 1 1 41 // candidate 2 10 1 // candidate 3 1 100 // candidate 4 0 222 // candidate 5
Profiling with EventEmitter API
The GeneticAlgorithm
emits signals along with some information
which can be used for profiling.
NOTE data emitted is in sub-bullets.
Emitted Once
"start"
after.search()
and all initialisation is complete, before the 1st round
Emitted on Stop Condition Met
"timeout"
whentimeOutMS
limit is reached."stuck"
when stuck in a local minimum."rounds"
whennRounds
limit reached."end"
when finished.
Emitted Every Round
"round"
on every round start (not the same as"rounds"
)."op"
on every selection and mutation / crossover operation application
Example of extracting data from signals:
gagagaga
More examples here.
Performance
- The bottleneck is the fitness function.
- Log less for better performance
Downsides
- single-threaded (but see parallel example that uses the cluster module from node stdlib).
License
MIT