Genesis is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, Genesis enables automated exploration and improvement of scientific code.
Note: This implementation is based on and extends Shinka AI, an open-source platform for LLM-driven code evolution. We are grateful to the original authors for their foundational work.
The system is inspired by The AI Scientist, AlphaEvolve, and the Darwin Gödel Machine. It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements.
The framework supports parallel evaluation of candidates locally, on a Slurm cluster, or in cloud sandboxes. It maintains an archive of successful solutions, enabling knowledge transfer between different evolutionary islands. Genesis is particularly well-suited for scientific tasks where a verifier is available and the goal is to optimize performance metrics while maintaining code correctness and readability.
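The loop described above can be pictured with a minimal, self-contained sketch. This is plain Python with no Genesis APIs; a random numeric tweak stands in for the LLM mutation operator, and truncation selection stands in for the archive and island machinery:

```python
import random

def evolve(fitness, init, mutate, pop_size=8, generations=20, seed=0):
    """Toy evolutionary loop: mutate candidates, keep the fittest (elitism)."""
    rng = random.Random(seed)
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        # The "mutation operator" proposes a variant of each candidate
        # (in Genesis this role is played by an ensemble of LLMs).
        children = [mutate(p, rng) for p in population]
        # Keep the fittest pop_size individuals from parents + children.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

# Example: maximize f(x) = -(x - 3)^2, starting far from the optimum at x = 3.
best = evolve(
    fitness=lambda x: -(x - 3.0) ** 2,
    init=lambda: 10.0,
    mutate=lambda x, rng: x + rng.gauss(0, 0.5),
)
```

Because parents compete alongside their children, the best fitness never degrades; Genesis replaces the random tweak with LLM-suggested code patches and the scalar fitness with metrics from your evaluation script.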
| Guide | Description | What You'll Learn |
|---|---|---|
| Getting Started | Installation, basic usage, and examples | Setup, first evolution run, core concepts |
| Tutorial Notebook | Interactive walkthrough of Genesis features | Hands-on examples, configuration, best practices |
| Creating Tasks | Guide to creating custom tasks | File structure, evaluation scripts, configuration |
| E2B Integration | Running evaluations in cloud sandboxes | Setup, configuration, dependencies |
| Configuration | Comprehensive configuration reference | All config options, optimization settings, advanced features |
| WebUI | Interactive visualization and monitoring | Real-time tracking, result analysis, debugging tools |
| Roadmap | Future plans and language support | Supported languages, execution backends, planned features |
```bash
# Clone the repository
git clone https://github.com/GeorgePearse/Genesis

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create environment and install Genesis
cd Genesis
uv venv --python 3.12
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

# Run your first evolution experiment
genesis_launch variant=circle_packing_example
```

For detailed installation instructions and usage examples, see the Getting Started Guide.
| Example | Description | Environment Setup |
|---|---|---|
| Circle Packing | Optimize circle packing to maximize radii. | LocalJobConfig |
| Agent Design | Design agent scaffolds for math tasks. | LocalJobConfig |
| ALE-Bench | Code optimization for ALE-Bench tasks. | LocalJobConfig |
| Novelty Generator | Generate creative, surprising outputs (e.g., ASCII art). | LocalJobConfig |
For the simplest setup with default settings, you only need to specify the evaluation program:
```python
from genesis.core import EvolutionRunner, EvolutionConfig
from genesis.database import DatabaseConfig
from genesis.launch import LocalJobConfig

# Minimal config - only specify what's required
job_config = LocalJobConfig(eval_program_path="evaluate.py")
db_config = DatabaseConfig()
evo_config = EvolutionConfig(init_program_path="initial.py")

# Run evolution with defaults
runner = EvolutionRunner(
    evo_config=evo_config,
    job_config=job_config,
    db_config=db_config,
)
runner.run()
```

EvolutionConfig Parameters (click to expand)
| Key | Default Value | Type | Explanation |
|---|---|---|---|
| task_sys_msg | None | Optional[str] | System message describing the optimization task |
| patch_types | ["diff"] | List[str] | Types of patches to generate: "diff", "full", "cross" |
| patch_type_probs | [1.0] | List[float] | Probabilities for each patch type |
| num_generations | 10 | int | Number of evolution generations to run |
| max_parallel_jobs | 2 | int | Maximum number of parallel evaluation jobs |
| max_patch_resamples | 3 | int | Max times to resample a patch if it fails |
| max_patch_attempts | 5 | int | Max attempts to generate a valid patch |
| job_type | "local" | str | Job execution type: "local", "slurm_docker", "slurm_conda" |
| language | "python" | str | Programming language for evolution |
| llm_models | ["azure-gpt-4.1-mini"] | List[str] | List of LLM models for code generation |
| llm_dynamic_selection | None | Optional[Union[str, BanditBase]] | Dynamic model selection strategy |
| llm_dynamic_selection_kwargs | {} | dict | Kwargs for dynamic selection |
| llm_kwargs | {} | dict | Additional kwargs for LLM calls |
| meta_rec_interval | None | Optional[int] | Interval for meta-recommendations |
| meta_llm_models | None | Optional[List[str]] | LLM models for meta-recommendations |
| meta_llm_kwargs | {} | dict | Kwargs for meta-recommendation LLMs |
| meta_max_recommendations | 5 | int | Max number of meta-recommendations |
| embedding_model | None | Optional[str] | Model for code embeddings |
| init_program_path | "initial.py" | Optional[str] | Path to initial program to evolve |
| results_dir | None | Optional[str] | Directory to save results (auto-generated if None) |
| max_novelty_attempts | 3 | int | Max attempts for novelty generation |
| code_embed_sim_threshold | 1.0 | float | Similarity threshold for code embeddings |
| novelty_llm_models | None | Optional[List[str]] | LLM models for novelty judgment |
| novelty_llm_kwargs | {} | dict | Kwargs for novelty LLMs |
| use_text_feedback | False | bool | Whether to use text feedback in evolution |
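The patch_types / patch_type_probs pair behaves like a categorical distribution over mutation styles. A hedged sketch of that sampling logic (illustrative only, not the actual Genesis internals):

```python
import random

def sample_patch_type(patch_types, patch_type_probs, rng=random):
    """Pick a patch type according to its configured probability."""
    assert len(patch_types) == len(patch_type_probs)
    assert abs(sum(patch_type_probs) - 1.0) < 1e-9, "probs must sum to 1"
    return rng.choices(patch_types, weights=patch_type_probs, k=1)[0]

# Mirroring the defaults above: diffs only, so "diff" is always chosen.
default_choice = sample_patch_type(["diff"], [1.0])

# A mixed setting: mostly diffs, with occasional full rewrites.
mixed_choice = sample_patch_type(["diff", "full"], [0.8, 0.2])
```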
DatabaseConfig Parameters (click to expand)
| Key | Default Value | Type | Explanation |
|---|---|---|---|
| db_path | None | Optional[str] | Database file path (auto-generated if None) |
| num_islands | 4 | int | Number of evolution islands for diversity |
| archive_size | 100 | int | Size of program archive per island |
| elite_selection_ratio | 0.3 | float | Proportion of elite programs for inspiration |
| num_archive_inspirations | 5 | int | Number of archive programs to use as inspiration |
| num_top_k_inspirations | 2 | int | Number of top-k programs for inspiration |
| migration_interval | 10 | int | Generations between island migrations |
| migration_rate | 0.1 | float | Proportion of island population to migrate |
| island_elitism | True | bool | Keep best programs on their original islands |
| enforce_island_separation | True | bool | Enforce full separation between islands |
| parent_selection_strategy | "power_law" | str | Parent selection: "weighted", "power_law", "beam_search" |
| exploitation_alpha | 1.0 | float | Power-law exponent (0 = uniform, 1 = power-law) |
| exploitation_ratio | 0.2 | float | Chance to pick parent from archive |
| parent_selection_lambda | 10.0 | float | Sharpness of sigmoid for weighted selection |
| num_beams | 5 | int | Number of beams for beam search selection |
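For intuition, the power_law strategy with exploitation_alpha can be read as rank-based sampling: parents are chosen with probability proportional to rank^(-alpha). This is an illustrative sketch of that reading, not the actual Genesis implementation:

```python
import random

def power_law_parent(programs, scores, exploitation_alpha=1.0, rng=random):
    """Sample a parent with probability proportional to rank^(-alpha).

    alpha = 0 gives uniform selection; larger alpha concentrates
    probability on the top-ranked programs (more exploitation).
    """
    # Rank programs best-first: position 0 = best score.
    order = sorted(range(len(programs)), key=lambda i: scores[i], reverse=True)
    weights = [(rank + 1) ** -exploitation_alpha for rank in range(len(order))]
    idx = rng.choices(order, weights=weights, k=1)[0]
    return programs[idx]

programs = ["prog_a", "prog_b", "prog_c"]
scores = [0.9, 0.5, 0.1]
# With a large alpha, selection heavily favors the best program.
parent = power_law_parent(programs, scores, exploitation_alpha=2.0)
```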
JobConfig Parameters (click to expand)
LocalJobConfig (for local execution):
| Key | Default Value | Type | Explanation |
|---|---|---|---|
| eval_program_path | "evaluate.py" | Optional[str] | Path to evaluation script |
| extra_cmd_args | {} | Dict[str, Any] | Additional command-line arguments |
| time | None | Optional[str] | Time limit for job execution |
| conda_env | None | Optional[str] | Conda environment to run jobs in |
SlurmDockerJobConfig (for SLURM with Docker):
| Key | Default Value | Type | Explanation |
|---|---|---|---|
| eval_program_path | "evaluate.py" | Optional[str] | Path to evaluation script |
| extra_cmd_args | {} | Dict[str, Any] | Additional command-line arguments |
| image | "ubuntu:latest" | str | Docker image to use |
| image_tar_path | None | Optional[str] | Path to Docker image tar file |
| docker_flags | "" | str | Additional Docker flags |
| partition | "gpu" | str | SLURM partition to use |
| time | "01:00:00" | str | Job time limit |
| cpus | 1 | int | Number of CPUs to request |
| gpus | 1 | int | Number of GPUs to request |
| mem | "8G" | Optional[str] | Memory to request |
SlurmCondaJobConfig (for SLURM with Conda):
| Key | Default Value | Type | Explanation |
|---|---|---|---|
| eval_program_path | "evaluate.py" | Optional[str] | Path to evaluation script |
| extra_cmd_args | {} | Dict[str, Any] | Additional command-line arguments |
| conda_env | "" | str | Conda environment name |
| modules | [] | Optional[List[str]] | Environment modules to load |
| partition | "gpu" | str | SLURM partition to use |
| time | "01:00:00" | str | Job time limit |
| cpus | 1 | int | Number of CPUs to request |
| gpus | 1 | int | Number of GPUs to request |
| mem | "8G" | Optional[str] | Memory to request |
To use EvolutionRunner, you need two key files. The evaluate.py script defines how to test and score your programs: it runs multiple evaluations, validates results, and aggregates them into metrics that guide the Genesis evolution loop. The initial.py file contains your starting solution, with the core algorithm that LLMs will iteratively improve across generations.
```python
from genesis.core import run_genesis_eval

def main(program_path: str, results_dir: str):
    metrics, correct, err = run_genesis_eval(
        program_path=program_path,
        results_dir=results_dir,
        experiment_fn_name="run_experiment",
        num_runs=3,  # Multiple evaluations to aggregate
        get_experiment_kwargs=get_kwargs,
        aggregate_metrics_fn=aggregate_fn,
        validate_fn=validate_fn,  # Optional
    )

def get_kwargs(run_idx: int) -> dict:
    return {"param1": "value", "param2": 42}

def aggregate_fn(results: list) -> dict:
    score = results[0]
    text = results[1]
    return {
        "combined_score": float(score),
        "public": {...},      # Genesis-visible
        "private": {...},     # Genesis-invisible
        "extra_data": {...},  # Stored as pickle
        "text_feedback": text,  # String feedback
    }

if __name__ == "__main__":
    # argparse program path & dir
    main(program_path, results_dir)
```
```python
# EVOLVE-BLOCK-START
def advanced_algo():
    # This will be evolved
    return solution

# EVOLVE-BLOCK-END

def run_experiment(**kwargs):
    """Main entry point called by the evaluator."""
    result = solve_problem(kwargs)
    return result

def solve_problem(params):
    solution = advanced_algo()
    return solution
```
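The aggregate_fn contract can be exercised on its own. Below is a minimal runnable sketch using hypothetical per-run results and no Genesis imports; it shows a metrics dict carrying the keys described above (combined_score drives selection, public is Genesis-visible, private is Genesis-invisible):

```python
def aggregate_fn(results: list) -> dict:
    """Average per-run scores into the metrics dict shape described above."""
    scores = [r["score"] for r in results]
    mean_score = sum(scores) / len(scores)
    return {
        "combined_score": float(mean_score),   # guides the evolution loop
        "public": {"mean_score": mean_score},  # Genesis-visible
        "private": {"all_scores": scores},     # Genesis-invisible
        "text_feedback": f"mean score {mean_score:.2f} over {len(scores)} runs",
    }

# Three hypothetical evaluation runs.
metrics = aggregate_fn([{"score": 0.8}, {"score": 0.6}, {"score": 0.7}])
```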
The Genesis launcher uses Hydra to configure and launch evolutionary experiments. Hydra's powerful override syntax keeps configuration concise, making it easy to manage and iterate on scientific explorations.
```bash
# Run with pre-configured variant
genesis_launch variant=circle_packing_example

# Run with custom parameters
genesis_launch \
    task=circle_packing \
    database=island_large \
    evolution=small_budget \
    cluster=local \
    evo_config.num_generations=20
```

For comprehensive configuration options and advanced usage, see the Configuration Guide.
Monitor your evolution experiments in real-time with Genesis's interactive web interface! The WebUI provides live visualization of the evolutionary process, genealogy trees, and performance metrics.
The Programs view showing evolution results from HNSW optimization, with sortable columns for generation, score, cost, and complexity metrics.
Launch the WebUI alongside your evolution experiment:
```bash
# Start your evolution experiment
genesis_launch variant=circle_packing_example

# In another terminal, start the frontend
cd genesis/webui/frontend
npm install
npm run dev
```

For detailed WebUI documentation, see the WebUI Guide.
- Shinka AI: The original implementation that Genesis is based on - a platform for LLM-driven program evolution
- OpenEvolve: An open-source implementation of AlphaEvolve
- LLM4AD: A Platform for Algorithm Design with Large Language Model
- Scale AgentEx: Automated experimentation and optimization for AI agents
Genesis is built upon the excellent work of the Shinka AI project. We extend our gratitude to the original authors and contributors for creating such a robust foundation for LLM-driven code evolution.
If you use Genesis in your research, please cite:
@misc{genesis2025,
title={Genesis: Platform Experiments for LLM-Driven Program Evolution},
author={Pearse, George},
howpublished={\url{https://genesis.ai}},
year={2025}
}
And please also consider citing the original Shinka AI work that this is based on.