Reproducibility
Overview
Reproducibility is essential for evolutionary algorithms like NEAT in several contexts:
Debugging: Reproduce exact behavior to isolate and fix issues
Scientific Research: Enable others to verify and build upon your results
Algorithm Comparison: Fair evaluation requires consistent random behavior
Development: Test changes without random variation masking bugs
By default, NEAT-Python uses Python’s random module for all stochastic operations (initialization, mutation, crossover, parent selection). Without a seed, each run produces different results because Python initializes its random number generator from an unpredictable source (operating-system entropy, or the current time as a fallback).
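The effect of a seed can be seen with the standard library alone. The sketch below does not touch NEAT-Python; draw is a hypothetical helper that builds a private generator from a seed and returns a few values from it.

```python
import random


def draw(seed, n=5):
    """Return n floats from a generator seeded with `seed` (hypothetical helper)."""
    rng = random.Random(seed)  # private generator; the global one is untouched
    return [rng.random() for _ in range(n)]


same_a = draw(42)
same_b = draw(42)
other = draw(43)

print(same_a == same_b)  # identical seeds give identical sequences
print(same_a == other)   # a different seed gives a different sequence
```

The same principle applies to an evolutionary run: if every stochastic decision is drawn from a generator with a known seed, the whole sequence of decisions can be replayed.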
Basic Usage
Setting a Seed via Config File
The simplest way to enable reproducibility is to add a seed parameter to your configuration file:
[NEAT]
fitness_criterion = max
fitness_threshold = 100.0
pop_size = 150
reset_on_extinction = False
no_fitness_termination = False
# Enable reproducibility
seed = 42
With this configuration, every run will produce identical evolution trajectories (assuming your fitness function is also deterministic or seeded separately).
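To make the optional nature of the seed entry concrete, here is a minimal sketch that reads it with the standard configparser module. It is not NEAT-Python's own parsing code; read_seed and CONFIG_TEXT are names invented for this illustration.

```python
import configparser

CONFIG_TEXT = """
[NEAT]
fitness_criterion = max
pop_size = 150
seed = 42
"""


def read_seed(text):
    """Return the integer seed from the [NEAT] section, or None if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if parser.has_option("NEAT", "seed"):
        return parser.getint("NEAT", "seed")
    return None


seed = read_seed(CONFIG_TEXT)
no_seed = read_seed("[NEAT]\npop_size = 150\n")
print(seed, no_seed)
```

Note that configparser does not strip inline comments by default, so a comment is safest on its own line rather than after the value.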
Setting a Seed via Population Parameter
You can also set the seed programmatically when creating a Population:
import neat
config = neat.Config(neat.DefaultGenome,
                     neat.DefaultReproduction,
                     neat.DefaultSpeciesSet,
                     neat.DefaultStagnation,
                     'config-file')
# With reproducibility
pop = neat.Population(config, seed=42)
winner = pop.run(eval_genomes, 100)
The seed parameter takes precedence over the config file setting:
# Config file has seed=100, but this uses seed=42
pop = neat.Population(config, seed=42)
Note that passing seed=None explicitly to Population disables seeding from the
configuration file even if [NEAT] contains a seed = ... entry; omitting the seed argument entirely
causes Population to use config.seed if present.
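The three cases (explicit value, explicit None, argument omitted) can be told apart with a sentinel default. The following is a sketch of that pattern under the rules stated above, not NEAT-Python's actual implementation; resolve_seed and _UNSET are hypothetical names.

```python
_UNSET = object()  # sentinel: distinguishes "argument omitted" from "seed=None"


def resolve_seed(config_seed, seed=_UNSET):
    """Pick the effective seed following the precedence described above."""
    if seed is _UNSET:
        return config_seed  # argument omitted: fall back to the config value
    return seed             # explicit value, including None, wins


print(resolve_seed(100, 42))    # explicit seed overrides the config file
print(resolve_seed(100, None))  # explicit None disables seeding
print(resolve_seed(100))        # omitted: the config value is used
```

A plain `seed=None` default could not express the second case, which is why a separate sentinel object is used here.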
Random Behavior (Default)
If you omit the seed or set it to None, NEAT-Python behaves non-deterministically:
# No seed - different results each run
pop = neat.Population(config)
winner = pop.run(eval_genomes, 100)
This is the default behavior and is suitable for final testing where you want to evaluate robustness across multiple random initializations.
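If you want varied runs but still need to be able to replay an interesting one, a common approach is to draw a fresh seed yourself and log it. This is a sketch of that idea using only the standard library; fresh_seed is a hypothetical helper, and passing the value on to Population(config, seed=...) is left to the caller.

```python
import random


def fresh_seed():
    """Draw a seed from OS entropy so an otherwise unseeded run can be replayed."""
    return random.SystemRandom().randrange(2**32)


seed = fresh_seed()
print(f"Using seed {seed}")  # record this alongside the results

# Any generator built from the logged seed replays the same sequence.
first = random.Random(seed).random()
replay = random.Random(seed).random()
print(first == replay)
```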
Parallel Mode
Parallel evaluation using ParallelEvaluator also supports reproducibility with a per-genome seeding strategy.
Basic Parallel Reproducibility
import multiprocessing
import neat

def eval_genome(genome, config):
    """Fitness function that may use random numbers."""
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    # Your fitness evaluation here
    # Can use random.random(), random.choice(), etc.
    fitness = 0.0  # Placeholder: replace with your own evaluation of net
    return fitness

config = neat.Config(...)

# Create parallel evaluator with seed
with neat.ParallelEvaluator(multiprocessing.cpu_count(),
                            eval_genome,
                            seed=42) as evaluator:
    pop = neat.Population(config, seed=42)
    winner = pop.run(evaluator.evaluate, 100)
How Parallel Seeding Works
The ParallelEvaluator uses a deterministic per-genome seeding strategy:
Each genome gets a unique seed: base_seed + genome.key
The same genome always gets the same random sequence (given the same base seed)
Different genomes get different but reproducible random sequences
Results are reproducible across runs and across different numbers of worker processes
This ensures that:
Reproducibility: Same seed produces identical results
Independence: Different genomes get different random numbers
Determinism: Same genome evaluated multiple times gets same fitness
Example: Each genome with key 1, 2, 3, … gets seeds 43, 44, 45, … (if base seed is 42).
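The seeding rule above can be sketched without NEAT-Python. How ParallelEvaluator applies the seed inside its worker processes is not shown in this section, so the private random.Random instance below is an assumption made for illustration; genome_seed and evaluate are hypothetical names, and evaluate is only a stand-in for a fitness function.

```python
import random


def genome_seed(base_seed, genome_key):
    """Seed for one genome, following the base_seed + genome.key rule."""
    return base_seed + genome_key


def evaluate(base_seed, genome_key, draws=3):
    """Stand-in fitness: mean of a few draws from a per-genome generator."""
    rng = random.Random(genome_seed(base_seed, genome_key))
    return sum(rng.random() for _ in range(draws)) / draws


keys = [1, 2, 3]
seeds = [genome_seed(42, k) for k in keys]

# Evaluate in two different orders, as differently scheduled workers might.
forward = {k: evaluate(42, k) for k in keys}
backward = {k: evaluate(42, k) for k in reversed(keys)}

print(seeds)
print(forward == backward)
```

Because each genome's random sequence depends only on the base seed and its own key, the order in which genomes are evaluated, and therefore the number of workers, does not affect the result.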
Parallel Limitations
The parallel seeding strategy only works when:
Your fitness function uses Python’s random module
Fitness evaluation is deterministic given the random seed
No external non-deterministic factors (network, hardware timing, etc.)
Checkpointing
The checkpoint system automatically preserves and restores random state, ensuring perfect reproducibility when resuming from checkpoints.
Saving Checkpoints
import neat
config = neat.Config(...)
pop = neat.Population(config, seed=42)
# Save checkpoint every 5 generations
pop.add_reporter(neat.Checkpointer(5))
# Run evolution
pop.run(eval_genomes, 50)
In this example, checkpoints named neat-checkpoint-4, neat-checkpoint-9,
neat-checkpoint-14, … will be created. The numeric suffix N always
refers to the generation that has just been evaluated. In other words, a
checkpoint labeled neat-checkpoint-25 contains the evaluated population
(with fitness values) for generation 25. Restoring this checkpoint will skip
re-evaluation of generation 25 and proceed directly to reproduction, continuing
with generation 26.
Restoring Checkpoints
# Restore from a checkpoint written by the run above
# (an interval of 5 yields suffixes 4, 9, 14, 19, 24, ...)
pop = neat.Checkpointer.restore_checkpoint('neat-checkpoint-24')
# Continue evolution - will produce same results as if uninterrupted
pop.run(eval_genomes, 50)
The restored population continues with the exact random state from the checkpoint, ensuring the continued evolution is identical to what it would have been without interruption.
Checkpoint Reproducibility
Checkpoints preserve:
Population state (all genomes and their attributes)
Species structure
Generation counter
Random number generator state
Innovation tracker state
This means checkpointing maintains reproducibility even without explicitly setting a seed when restoring.
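Preserving the random number generator state is what makes a resumed run continue the same random sequence. The sketch below shows the underlying mechanism with the standard library only (getstate, setstate and pickle); it is an illustration of the idea, not the Checkpointer's actual file format.

```python
import pickle
import random

rng = random.Random(42)
for _ in range(10):
    rng.random()  # work done before the checkpoint

saved = pickle.dumps(rng.getstate())  # what a checkpoint could store
uninterrupted = [rng.random() for _ in range(5)]

restored = random.Random()  # fresh generator with an arbitrary state
restored.setstate(pickle.loads(saved))
resumed = [restored.random() for _ in range(5)]

print(resumed == uninterrupted)
```

The restored generator picks up exactly where the saved one left off, so no seed needs to be supplied again at restore time.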
Note
Checkpoint vs. uninterrupted-run trajectories
Restoring the same checkpoint multiple times produces a deterministic continuation of evolution: given a fixed checkpoint file and fitness function, repeated restores followed by additional generations will take the same evolutionary path.
However, there is one subtle limitation: the sequence of populations you
get when running without checkpointing is not currently guaranteed to be
bit-for-bit identical to the sequence you get when running with
checkpointing and later restoring from a checkpoint at generation N.
The reason is that some helper objects involved in evolution (such as the
reproduction and stagnation helpers) are reconstructed when a checkpoint is
restored rather than being pickled and resumed exactly as-is. Although
their behavior is designed to be equivalent, and key invariants (population,
species, random state, innovation tracking) are preserved, there are still
rare edge cases where the evolutionary trajectory after generation N may
differ slightly between an uninterrupted run and a resumed run.
In practice this means:
Checkpoints are suitable for pausing/resuming long runs and for experiment management.
Checkpoints do not currently provide a strict guarantee that “run-with-checkpoint” and “run-without-checkpoint” will produce identical post-N populations, even when using the same seed.
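To check whether a resumed run matches an uninterrupted one in your own setup, one option is to record a value per generation (for example the best fitness) in both runs and compare the two lists. The helper below is a hypothetical sketch; how the per-generation values are collected is left open, and the sample lists are made-up placeholders rather than real results.

```python
def first_divergence(history_a, history_b):
    """Return the index of the first generation whose values differ, or None."""
    for gen, (a, b) in enumerate(zip(history_a, history_b)):
        if a != b:
            return gen
    if len(history_a) != len(history_b):
        # One run recorded more generations than the other.
        return min(len(history_a), len(history_b))
    return None


# Placeholder values standing in for recorded best-fitness histories.
uninterrupted = [1.0, 1.5, 2.0, 2.5]
resumed_same = [1.0, 1.5, 2.0, 2.5]
resumed_diff = [1.0, 1.5, 2.1, 2.5]

print(first_divergence(uninterrupted, resumed_same))
print(first_divergence(uninterrupted, resumed_diff))
```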
Limitations
Python Random Module Only
NEAT-Python’s seed parameter only controls Python’s random module. If your fitness function uses other random number generators, you must seed them separately.
NumPy Example:
import random
import numpy as np
import neat
# Seed all RNG sources
random.seed(42)
np.random.seed(42)
pop = neat.Population(config, seed=42)
winner = pop.run(eval_genomes, 100)
PyTorch Example:
import random
import torch
import neat
# Seed all RNG sources
random.seed(42)
torch.manual_seed(42)
pop = neat.Population(config, seed=42)
winner = pop.run(eval_genomes, 100)
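When several libraries are involved, it can help to collect the seeding calls in one place. The following is a minimal sketch, assuming only the calls already shown above (random.seed, np.random.seed, torch.manual_seed); seed_everything is a hypothetical helper, and it silently skips libraries that are not installed.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed.

    Returns the names of the generators that were seeded.
    """
    seeded = ["random"]
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # NumPy not available; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        seeded.append("torch")
    except ImportError:
        pass  # PyTorch not available; nothing to seed
    return seeded


seeded = seed_everything(42)
print(seeded)
```

Calling the helper once before creating the Population keeps the seeding logic in a single, documented spot. Libraries with further sources of non-determinism may need additional settings that this sketch does not cover.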
Non-Deterministic Fitness Functions
Some fitness evaluations are inherently non-deterministic:
External simulators with their own RNG
Network communication with variable latency
Hardware timing dependencies
File system operations with non-deterministic ordering
Multi-threaded code with race conditions
In these cases, reproducibility may not be achievable even with proper seeding. Consider:
Using deterministic simulation modes if available
Mocking non-deterministic components during testing
Averaging over multiple evaluations to reduce variance
Documenting the non-deterministic sources in your results
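The suggestion to average over multiple evaluations can be illustrated with a synthetic noisy fitness. This is a sketch with made-up noise (a true value of 1.0 plus Gaussian noise); noisy_fitness and averaged_fitness are hypothetical names, and the number of trials is an arbitrary choice.

```python
import random
import statistics


def noisy_fitness(rng):
    """Stand-in for a non-deterministic evaluation: true value 1.0 plus noise."""
    return 1.0 + rng.gauss(0.0, 0.5)


def averaged_fitness(rng, trials):
    """Mean of `trials` independent evaluations of the same candidate."""
    return statistics.mean(noisy_fitness(rng) for _ in range(trials))


rng = random.Random(0)
single = [noisy_fitness(rng) for _ in range(200)]
averaged = [averaged_fitness(rng, 25) for _ in range(200)]

spread_single = statistics.stdev(single)
spread_averaged = statistics.stdev(averaged)
print(spread_averaged < spread_single)
```

Averaging reduces the spread of the fitness estimate at the cost of proportionally more evaluations per genome; it does not make the run reproducible by itself.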
Python Version Compatibility
Random number sequences may differ between Python versions or implementations (CPython vs PyPy). For complete reproducibility:
Document the Python version used
Use the same Python version for reproduction
Be aware that upgrading Python might change random sequences
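One way to document the interpreter is to store it next to the seed whenever results are saved. The sketch below uses only the standard platform and sys modules; environment_record is a hypothetical helper and the choice of fields is an assumption.

```python
import json
import platform
import sys


def environment_record(seed):
    """Collect the seed and interpreter details needed to repeat a run."""
    return {
        "seed": seed,
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": sys.platform,
    }


record = environment_record(42)
print(json.dumps(record, indent=2))
```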
Best Practices
When to Use Seeds
Development and Debugging:
Use fixed seed during development
Makes behavior predictable and repeatable
Easier to debug issues and compare changes
# Development: use fixed seed
pop = neat.Population(config, seed=42)
Scientific Research:
Use fixed seed for fair algorithm comparison
Document seed in publications and code
Enables others to reproduce your exact results
# Research: fixed seed, document in paper
SEED = 42 # Seed used for all experiments
pop = neat.Population(config, seed=SEED)
Production/Final Testing:
Run with multiple different seeds
Report statistics (mean, std, min, max) across runs
Ensures results generalize beyond single random initialization
# Production: multiple seeds for robustness
import numpy as np

results = []
for seed in [42, 123, 456, 789, 999]:
    pop = neat.Population(config, seed=seed)
    winner = pop.run(eval_genomes, 100)
    results.append(winner.fitness)
print(f"Mean fitness: {np.mean(results):.2f}")
print(f"Std dev: {np.std(results):.2f}")
Recommended Workflow
Develop with fixed seed - Find bugs and tune parameters
# Step 1: Development
pop = neat.Population(config, seed=42)
winner = pop.run(eval_genomes, 100)
print(f"Winner fitness: {winner.fitness}")
Verify reproducibility - Confirm seed works correctly
# Step 2: Verify reproducibility
pop1 = neat.Population(config, seed=42)
winner1 = pop1.run(eval_genomes, 100)
pop2 = neat.Population(config, seed=42)
winner2 = pop2.run(eval_genomes, 100)
assert winner1.fitness == winner2.fitness
print("✓ Results are reproducible!")
Evaluate with multiple seeds - Test robustness
# Step 3: Multiple seeds for final evaluation
import numpy as np
results = []
seeds = [42, 123, 456, 789, 999, 111, 222, 333, 444, 555]
for seed in seeds:
    pop = neat.Population(config, seed=seed)
    winner = pop.run(eval_genomes, 100)
    results.append({
        'seed': seed,
        'fitness': winner.fitness,
        'nodes': len(winner.nodes),
        'connections': len(winner.connections)
    })
fitnesses = [r['fitness'] for r in results]
print(f"Mean fitness: {np.mean(fitnesses):.2f} ± {np.std(fitnesses):.2f}")
print(f"Best fitness: {max(fitnesses):.2f}")
print(f"Worst fitness: {min(fitnesses):.2f}")
Debugging Non-Reproducible Results
If you’re getting different results despite setting a seed:
Check fitness function - Does it use non-seeded randomness?
# Bad: NumPy's generator is not seeded by NEAT-Python
def eval_genomes(genomes, config):
    for gid, genome in genomes:
        genome.fitness = np.random.random()  # Not seeded!

# Good: Python's random module, which NEAT-Python seeds
def eval_genomes(genomes, config):
    for gid, genome in genomes:
        genome.fitness = random.random()  # Uses NEAT's seed
Check external dependencies - Libraries with internal state?
Check Python version - Different versions may produce different sequences
Check multiprocessing - Using custom worker initialization?
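A quick way to work through the first check is to call the suspect code twice with the same seed and compare the results. The helper below is a hypothetical sketch using only the standard library; seeded_eval and leaky_eval are stand-ins for a well-behaved and a misbehaving fitness function.

```python
import random


def is_reproducible(fn, seed, runs=2):
    """Call fn() `runs` times after random.seed(seed); True if all results match."""
    results = []
    for _ in range(runs):
        random.seed(seed)
        results.append(fn())
    return all(r == results[0] for r in results)


_hidden = random.SystemRandom()  # draws from OS entropy; cannot be seeded


def seeded_eval():
    return random.random()   # follows the global seed


def leaky_eval():
    return _hidden.random()  # ignores the global seed


print(is_reproducible(seeded_eval, 42))
print(is_reproducible(leaky_eval, 42))
```

If a function fails this check, look for generators it creates or imports on its own, such as NumPy, a simulator, or a module-level random.Random() instance.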
Complete Examples
Basic Reproducible XOR
Here’s a complete reproducible XOR example:
import os
import neat

# XOR inputs and expected outputs
xor_inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_outputs = [(0.0,), (1.0,), (1.0,), (0.0,)]

def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        genome.fitness = 4.0
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        for xi, xo in zip(xor_inputs, xor_outputs):
            output = net.activate(xi)
            genome.fitness -= (output[0] - xo[0]) ** 2

def run():
    # Load configuration
    local_dir = os.path.dirname(__file__)
    config_path = os.path.join(local_dir, 'config-xor')
    config = neat.Config(neat.DefaultGenome,
                         neat.DefaultReproduction,
                         neat.DefaultSpeciesSet,
                         neat.DefaultStagnation,
                         config_path)
    # Run with seed for reproducibility
    pop = neat.Population(config, seed=42)
    pop.add_reporter(neat.StdOutReporter(True))
    winner = pop.run(eval_genomes, 300)
    print(f"\nWinner fitness: {winner.fitness:.6f}")
    return winner

if __name__ == '__main__':
    # Run twice - should get identical results
    winner1 = run()
    winner2 = run()
    assert winner1.fitness == winner2.fitness
    print("\n✓ Results are reproducible!")
Reproducible Parallel Evaluation
import os
import multiprocessing
import neat

def eval_genome(genome, config):
    """Evaluate single genome - can use randomness."""
    import random  # Import in function for worker processes
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    # Example: fitness depends on random inputs
    fitness = 0.0
    for _ in range(10):
        inputs = [random.random(), random.random()]
        outputs = net.activate(inputs)
        fitness += outputs[0]
    return fitness / 10.0

def run():
    local_dir = os.path.dirname(__file__)
    config_path = os.path.join(local_dir, 'config-neat')
    config = neat.Config(neat.DefaultGenome,
                         neat.DefaultReproduction,
                         neat.DefaultSpeciesSet,
                         neat.DefaultStagnation,
                         config_path)
    # Use parallel evaluation with seed
    num_workers = multiprocessing.cpu_count()
    with neat.ParallelEvaluator(num_workers, eval_genome, seed=42) as evaluator:
        pop = neat.Population(config, seed=42)
        pop.add_reporter(neat.StdOutReporter(True))
        winner = pop.run(evaluator.evaluate, 100)
    return winner

if __name__ == '__main__':
    winner1 = run()
    winner2 = run()
    print(f"Run 1 fitness: {winner1.fitness:.6f}")
    print(f"Run 2 fitness: {winner2.fitness:.6f}")
    assert winner1.fitness == winner2.fitness
    print("✓ Parallel evaluation is reproducible!")
Multi-Seed Statistical Analysis
import os
import numpy as np
import neat

def eval_genomes(genomes, config):
    # Your fitness function here
    pass

def run_experiment(num_runs=10):
    """Run multiple experiments with different seeds."""
    config = neat.Config(...)  # Load your config
    results = []
    for i in range(num_runs):
        seed = 42 + i  # Different seed for each run
        pop = neat.Population(config, seed=seed)
        winner = pop.run(eval_genomes, 100)
        results.append({
            'seed': seed,
            'fitness': winner.fitness,
            'generation': pop.generation,
            'nodes': len(winner.nodes),
            'connections': len(winner.connections)
        })
        print(f"Run {i+1}/{num_runs}: "
              f"fitness={winner.fitness:.2f}, "
              f"gen={pop.generation}")
    return results

def analyze_results(results):
    """Analyze multi-seed results."""
    fitnesses = [r['fitness'] for r in results]
    generations = [r['generation'] for r in results]
    print("\n" + "="*60)
    print("RESULTS SUMMARY")
    print("="*60)
    print(f"Fitness: {np.mean(fitnesses):.2f} ± {np.std(fitnesses):.2f}")
    print(f" Best: {max(fitnesses):.2f}")
    print(f" Worst: {min(fitnesses):.2f}")
    print(f"Generations: {np.mean(generations):.1f} ± {np.std(generations):.1f}")
    # Find best run
    best_idx = np.argmax(fitnesses)
    print(f"\nBest run: seed={results[best_idx]['seed']}, "
          f"fitness={results[best_idx]['fitness']:.2f}")

if __name__ == '__main__':
    results = run_experiment(num_runs=10)
    analyze_results(results)
Example Scripts
The NEAT-Python repository includes comprehensive example scripts demonstrating reproducibility:
Serial Reproducibility Example: XOR Problem
Location: examples/xor/evolve-feedforward-reproducible.py
This script demonstrates:
- Reproducibility verification (same seed produces identical results)
- Different seed comparison (different seeds produce different evolution paths)
- Backward compatibility (evolution works without seed parameter)
Run with:
cd examples/xor
python evolve-feedforward-reproducible.py
Expected output shows all three tests passing with clear verification of reproducibility.
Parallel Reproducibility Example
Location: examples/parallel-reproducible/
This example demonstrates parallel evaluation with reproducibility:
- evolve-parallel.py - Main script with 3 reproducibility tests
- config-parallel - Configuration file with seed parameter
- README.md - Detailed documentation with best practices
The script tests:
1. Parallel reproducibility (same seed + multiple workers → identical results)
2. Seed effects (different seeds → different evolution)
3. Worker count independence (consistent results with different worker counts)
Run with:
cd examples/parallel-reproducible
python evolve-parallel.py
Expected output shows all reproducibility tests passing across different worker configurations.
See Also
Configuration file description - Configuration file format and parameters
Customizing Behavior - Customizing NEAT components
NEAT Overview - Overview of the NEAT algorithm
Module summaries - API reference for Population and ParallelEvaluator