A Curation Algorithm

“Genetic algorithms” are one of those things that sound a lot harder and scarier than they actually are once you start playing with them. Of course, there are ways to make them extremely complicated, but the basics of an algorithm that kinda-sorta acts like genetics are really quite easy to understand.

For those following along at home, all the code is in this gist. I also tried something new and livestreamed the creation of this mini-project on twitch.

Algorithm

Our algorithm is pretty simple. It’s not a fantastic representation of any real-world population, or even of real-world biological principles, but it’s pretty good as an outline of how genetic algorithms can be constructed. The steps are:

Create an initial population
Apply selection pressure based on fitness scores of individuals in the population
Perform gene crossover and mutation
Show population characteristics to the user
Repeat, starting from step 2

Phenotypes

Phenotypes are what the expressed genes cause the organism to look like/behave like/etc. In this case, our phenotype is a 5 by 20 grid of ASCII characters, limited to a range of human-readable characters. Here, our phenotype is also essentially the same thing as our genotype—there’s a one-to-one correspondence between the two.

Initial Population

The initial population is just a grid of entirely random characters. Here’s an example organism:

V^Jx>\_N(4XtxDT%Y,/r
Ao/9|A+4Nx3nAd#kFYWy
=aZFzA4Z]]A`)5uzZOb4
FWaANR\EXy\e"6AZ[@X7
e[LA@QvVU0ABAA2P|Aw7

Selection and Fitness

Our fitness score is extremely simple—just count of the number of As the organism has. We can then sort all the organisms in our population by their fitness. Our selection pressure is a little weird and nonbiological—we take the two top-ranked individuals and have them form the basis for the entire next generation.

Crossover and Mutation

Crossover occurs when parents combine their genes. Because we only have two parents for the entire generation, the crossover algorithm isn’t that complicated. We simply randomly select a parent’s gene for each slot in the chromosome.

However, there’s a slight twist! Every time we select a gene, there’s a small chance that a randomly-chosen gene gets inserted instead, simulating the process of random mutation. In the long run, this allows populations to slowly acquire new genes, and if they happen to increase fitness, distribute them to new generations through crossover.

Example Run

Here’s an example run, except that I’ve only sampled every 10 generations so changes appear faster. The pressure applied is to maximize the number of As in the phenotype. The fitness score is the percentage of the genes which are A as well.

*** Generation 1 ***

"L9x>V_N%4otjDT%YD3r
Io\9cu+D#b3nmd;kFYW%
=aZGzA4Z]60/w#uzGf"n
FWaANzsE1y\e"`?R[]X7
l[LAKQ0,JWmh@s2P|A@a

Fitness: 6%

*** Generation 10 ***

V^JA^\_N(4XQxDT%YA/r
Ao/]AA+4#A3nAd#AFYAy
=aZFzA4AANA`)5uzZSb4
FWaANR\EXyAe"6AA[@X7
l[LAKQvV70ABAA2P|Aw7

Fitness: 22%

*** Generation 20 ***

V^JA^A_NA4XQAATFAA/r
AA/]AA+4#A3nAd#AAaAy
=aZFUA[AANAA)muzZA94
FWaANR\EAAAeA6AAA@X7
lALAKQAVA0ABAA2P|Aw7

Fitness: 38%

*** Generation 30 ***

V^JA^A_NAAAAAA7FAA/A
AA/]A3+4#A3*Ad#AAaAy
=aZF6AAAANAA)muAZAAA
F#aANRAEAAAAA6AAA@X7
lALAKQAVA0AJAA2PAAw'

Fitness: 50%

*** Generation 40 ***

V^JAAAANAAAAAA7FAA/A
AAAKAA+WAA32Av#AAaAy
AaZF6AAAAAAA)muAAAAA
FAAAARAfAAAAAAAAA@s7
lALAKQAAAAAJAA2PAAh'

Fitness: 62%

*** Generation 50 ***

V^JAAAANAAAAAAAFAA/A
AAA^AA+4AA32AAAAAAAA
AaZA6AA1AAAA)mAAAAAA
FAAAARAfAAAAAAAAA[s7
lALAKQAAAAAJAAAPAAh'

Fitness: 71%