Goal of this exercise is to learn and understand genetic algorithms by finding an optimal configuration for highly configurable software systems
Can be for example the Linux-Kernel or SQLite.
suppose we would measure each variant of SQLite:
2^88 variants and 5 minutes per measurement (compile + benchmark)
= 2^88 * 5min / 60 (per h) / 24 (per day) / 365 (per year)
= 2,944,111,585,058,457,655,296 years
this is why we use a meta-heuristic, since we don't want to wait that long ;)
Highly configurable software systems are systems which provide many features to be selected for creating an individual variant of a program.
However, selecting and deselecting specific features for a configuration can result in efficient or an inefficient configurations of the program.
For the exercise, we want to find an optimal configuration for the h264 and bdbc systems.
h264 is a block-oriented motion-compensation-based video compression standard (source).
Berkeley DB (bdb) is a software library intended to provide a high-performance embedded database for key/value data (source). The given dataset is based on the c-implementation of bdb.
bdbc_f = "code/datasets/bdbc_feature.txt"
bdbc_i = "code/datasets/bdbc_interactions.txt"
h264_f = "code/datasets/h264_feature.txt"
h264_i = "code/datasets/h264_interactions.txt"
path = bdbc_f
#path = bdbc_i
#path = h264_f
#path = h264_i
with open(path) as f:
content = f.read()
print(content)
feature file:
root: 4.29699011265453
HAVE_CRYPTO: 0.0120768614910988
HAVE_HASH: 4.62346632789263
HAVE_REPLICATION: 0.274902720516604
...
interaction file:
PS16K#CS512MB: -4.05239323580753
PS1K#HAVE_REPLICATION: -0.732537201572133
HAVE_CRYPTO#PS8K: -0.833090135903526
HAVE_REPLICATION#PS16K#DIAGNOSTIC: 31.7326205862695
...
fitness function:
4.29 + HAVE_CRYPTO 0.01 + HAVE_HASH 4.63 + HAVE_REPLICATION * 0.27 + ... +
PS16K CS512MB (-4.05) + PS1K HAVE_REPLICATION (- .73) + ...
How to build a genetic algorithm? See lecture nodes for that.
How to model an individual candidate solution?
For example when selecting only the feature "root" in bdbc:
[1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
The fitness of this candidate solution is: 4.29699011265453
Compare your Algorithm with these values (my best performing solutions):