Overview:
.
├── evaluation/
│   ├── benchmark/                  # benchmark set of 16 probabilistic models
│   │   └── generated/              # sub-programs generated by the generate_factorisation.py script
│   ├── custom/                     # implementation of custom factorisations (GMM + LDA)
│   ├── finite/                     # implementation of the fixed-finite factorisation
│   ├── unrolled/                   # programs whose while loop can be syntactically unrolled
│   │   └── generated/              # sub-programs generated by the generate_factorisation.py script
│   ├── models/                     # your models go here
│   ├── lmh_standard.jl             # standard implementation of LMH
│   ├── lmh_factorised.jl           # implementation of LMH that leverages factorisation into sub-programs
│   ├── bench_lmh.jl                # script to run the LMH benchmark for a model
│   ├── bench_smc.jl                # script to run the SMC benchmark for a model
│   ├── bench_vi.jl                 # script to run the VI benchmark for a model
│   ├── paper_lmh_results.csv       # LMH runtime results reported in the paper
│   ├── paper_smc_results.csv       # SMC runtime results reported in the paper
│   ├── paper_vi_results.csv        # VI gradient variance results reported in the paper
│   ├── ppl.jl                      # implementation of the research PPL
│   ├── smc.jl                      # implementation of SMC (naive and iterative versions)
│   └── vi.jl                       # implementation of variational inference (standard and variance-reduced versions)
│
├── compare/
│   ├── gen/                        # benchmark set of 16 probabilistic models re-implemented in Gen.jl
│   │   ├── run_lmh.py              # script to run combinator inference for every model
│   │   ├── paper_gen_results.csv   # Gen runtime results reported in the paper
│   │   └── geometric_recurse.jl    # demonstrates why we could not evaluate the Recurse combinator
│   ├── pyro_bbvi/                  # benchmark set re-implemented in Pyro
│   │   ├── bbvi.py                 # black-box variational inference implementation in Pyro
│   │   ├── run_vi.py               # script to run BBVI for every model
│   │   ├── paper_vi_results.csv    # VI gradient variance results reported in the paper
│   │   └── geometric.jl            # demonstrates why we cannot reduce variance with control dependencies in Pyro
│   └── webppl/                     # benchmark set of 16 probabilistic models re-implemented in WebPPL
│       ├── smc/                    # JavaScript implementation of naive SMC
│       ├── run_lmh.py              # script to run C3 inference for every model
│       ├── run_smc.py              # script to run SMC inference for every model
│       ├── paper_lmh_results.csv   # WebPPL C3 runtime results reported in the paper
│       └── paper_smc_results.csv   # WebPPL SMC runtime results reported in the paper
│
├── src/
│   ├── formal/
│   │   ├── ast.jl                  # obtains the abstract syntax tree of a Julia program
│   │   ├── factorisation_builder.py # uses model_graph.py to write sub-programs for each factor of a probabilistic program
│   │   └── formal_cfg.py           # computes the CFG of a probabilistic program
│   └── static/
│       ├── cfg.py                  # types for building the CFG
│       ├── ir.py                   # intermediate representation of a probabilistic program
│       ├── model_graph.py          # computes dependencies between sample statements as a graph
│       └── rd_bp.py                # functionality to compute reaching definitions and branch parents
│
├── run_all.sh                      # script to run all benchmarks
├── sanity_check.py                 # script to check whether the artifact installation was successful
├── sanity_check_output.txt         # log of the output of sanity_check.py
├── generate_factorisation.py       # script to generate sub-programs for each factor of every benchmark model
├── run_lmh_benchmark.py            # script to run bench_lmh.jl for every benchmark model
├── run_smc_benchmark.py            # script to run bench_smc.jl for every benchmark model
├── run_vi_benchmark.py             # script to run bench_vi.jl for every benchmark model
├── generate_factorisation_for_model.py # script to generate sub-programs for your model
└── run_benchmark_for_model.py      # script to run benchmarks for your model
No special hardware is needed for installation.
Recommendations:
- Hardware: >= 3.5 GHz dual-core CPU, >= 8 GB RAM, and >= 10 GB storage
- OS: tested on macOS (Sequoia 15.2, M2 Pro), 64-bit Linux (Fedora 43, 5950X), and 64-bit Windows 11 (5950X)

Installation with Docker:
Install Docker.
You can download the Docker image provided on Zenodo and load it with
docker load -i pplstaticfactor-amd64.tar
or
docker load -i pplstaticfactor-arm64.tar
depending on your system. The images were saved (with Docker version 28.3.0) using
docker buildx build --platform linux/amd64 -t pplstaticfactor-amd64 .
docker buildx build --platform linux/arm64 -t pplstaticfactor-arm64 .
docker image save pplstaticfactor-amd64 > pplstaticfactor-amd64.tar
docker image save pplstaticfactor-arm64 > pplstaticfactor-arm64.tar
Run those images with
docker run -it --name pplstaticfactor-amd64 --rm pplstaticfactor-amd64
or
docker run -it --name pplstaticfactor-arm64 --rm pplstaticfactor-arm64
Alternatively, build the pplstaticfactor image from scratch (this may take several minutes):
docker build -t pplstaticfactor .
If the build was successful, run the docker image:
docker run -it --name pplstaticfactor --rm pplstaticfactor
The artifact uses the following software versions:

Python 3.10.12 with requirements.txt:
    sexpdata==1.0.2
    pandas==2.2.3
    numpy==2.2.4
    pyro-ppl==1.9.1
    torch==2.7.1
    tqdm==4.67.1

Julia 1.9.2 with Project.toml:
    Distributions = "0.25.112"
    JuliaSyntax = "0.4.10"
    Gen = "0.4.7"

Node.js 24.13
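If you set up the environment manually instead of using Docker, the dependencies can usually be installed from the listed manifests; it is assumed here that requirements.txt and Project.toml sit at the repository root:

pip install -r requirements.txt
julia --project=. -e 'using Pkg; Pkg.instantiate()'

(The Docker image is intended to ship with these dependencies preinstalled, so this step is only needed for a manual setup.)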
To test if your installation was successful, run python3 sanity_check.py and compare output against sanity_check_output.txt.
Time measurements will differ, but you should not see any errors.
This concludes the instructions for the kick-the-tires phase.
For convenience, you may execute bash run_all.sh to run all experiments needed for reproducing the empirical evaluation.
This will take ~12 hours to complete on systems comparable to an M2 Pro.
In the next sections, we describe each of the experiments in detail.
(Re-) Generate the sub-programs:
python3 generate_factorisation.py
This produces factorised versions of the probabilistic programs stored in evaluation/benchmark and writes them to evaluation/benchmark/generated. (It also factorises the unrolled HMM version evaluation/unrolled/hmm.jl into evaluation/unrolled/generated/hmm.jl.)
Expected output: Same programs as currently stored in evaluation/benchmark/generated.
If you would like to verify this, run
mv evaluation/benchmark/generated evaluation/benchmark/generated_existing
python3 generate_factorisation.py
diff -r evaluation/benchmark/generated evaluation/benchmark/generated_existing
This should only complain about a missing .gitkeep file, indicating that there is no difference between the existing and the newly generated Julia files.
To clean up, run
touch evaluation/benchmark/generated/.gitkeep
rm -rf evaluation/benchmark/generated_existing
Run LMH benchmarks N times (we set N = 10):
python3 run_lmh_benchmark.py N
The results are written to evaluation/lmh_results.csv and aggregated in evaluation/lmh_results_aggregated.csv.
The results reported in the paper can be found in evaluation/paper_lmh_results.csv (measured on an M2 Pro CPU).
This script runs bench_lmh.jl, which measures the runtime of the factorised and non-factorised versions of the LMH algorithm and asserts that all LMH implementations produce the same samples.
Expected output: Absolute runtime measurements may differ on your hardware, but the relative speed-up reported in columns rel_static, rel_finite and rel_custom should be similar.
Estimated runtime: 30min (on system comparable to M2 Pro)
To compare our approach against existing methods, run (we set N = 10):
python3 compare/gen/run_lmh.py N
and
python3 compare/webppl/run_lmh.py N
The results are written to compare/gen/lmh_results.csv and compare/webppl/lmh_results.csv and aggregated in compare/gen/lmh_results_aggregated.csv and compare/webppl/lmh_results_aggregated.csv, respectively.
The results reported in the paper can be found in compare/gen/paper_lmh_results.csv and compare/webppl/paper_lmh_results.csv (measured on an M2 Pro CPU).
Expected output: Again, the relative speed-ups reported in the rel column should be similar on your hardware.
Estimated runtime: 1h and 30min, respectively (on system comparable to M2 Pro)
python3 print_table_1.py will generate Table 1 of the manuscript from the paper_lmh_results.csv files.
python3 print_table_2.py will generate Table 2 of the manuscript from the paper_lmh_results.csv files.
Run BBVI benchmarks N times (we set N = 10):
python3 run_vi_benchmark.py N
The results are written to evaluation/vi_results.csv and aggregated in evaluation/vi_results_aggregated.csv.
The results reported in the paper can be found in evaluation/paper_vi_results.csv (measured on an M2 Pro CPU).
This script runs bench_vi.jl, which measures the gradient variance for the factorised and non-factorised versions of the VI algorithm.
Expected output: Runtime may differ, but the gradient variance reported in the none and static columns, as well as the relative improvement in rel_static should be similar to the reported results.
Estimated runtime: 1h30min (on system comparable to M2 Pro)
To compare our approach against existing methods, run (we set N = 1):
python3 compare/pyro_bbvi/run_vi.py N
The results are written to compare/pyro_bbvi/vi_results.csv and aggregated in compare/pyro_bbvi/vi_results_aggregated.csv.
The results reported in the paper can be found in compare/pyro_bbvi/paper_vi_results.csv (measured on an M2 Pro CPU).
Expected output: Again, the gradient variance and relative improvement (columns none, graph, rel_graph) should be similar to the paper results.
Estimated runtime: This may take a long time to complete, ~6-8 hours on a system comparable to an M2 Pro. You may lower N_ITER or L in the script for faster completion.
python3 print_table_3.py will generate Table 3 of the manuscript from the paper_vi_results.csv files.
Run SMC benchmarks N times (we set N = 10):
python3 run_smc_benchmark.py N
The results are written to evaluation/smc_results.csv and aggregated in evaluation/smc_results_aggregated.csv.
The results reported in the paper can be found in evaluation/paper_smc_results.csv (measured on an M2 Pro CPU).
This script runs bench_smc.jl, which measures the runtime for the naive and iterative versions of the SMC algorithm.
Expected output: Absolute runtime measurements may differ on your hardware, but the relative speed-up reported in column rel_static should be similar.
Estimated runtime: 30 min (on system comparable to M2 Pro)
To compare our approach against existing methods, run (we set N = 10):
python3 compare/webppl/run_smc.py N
The results are written to compare/webppl/smc_results.csv and aggregated in compare/webppl/smc_results_aggregated.csv.
The results reported in the paper can be found in compare/webppl/paper_smc_results.csv (measured on an M2 Pro CPU).
Expected output: Absolute runtime measurements may differ on your hardware, but the relative speed-up reported in column cps_rel should be similar.
Estimated runtime: 5min (on system comparable to M2 Pro)
python3 print_table_4.py will generate Table 4 of the manuscript from the paper_smc_results.csv files.
This artifact can be used to reproduce the empirical evaluation of Section 5.
For the interested reader, we briefly list where the sub-program generation described in Section 5 is implemented.
- src/formal/formal_cfg.py: implements a function that translates a Julia program (of the restricted grammar described in Section 2; see also below) into the control-flow graph (CFG) representation (src/static/cfg.py) of Sections 3.1 and 4.1.
- src/static/model_graph.py: implements a function that constructs a dependency graph based on Algorithm 1 (factorisation).
- src/formal/factorisation_builder.py: uses this dependency graph to generate the sub-programs.
The claims in this section are based on the data presented in Table 1 and Table 2.
Table 1 and Table 2 can be reproduced by following the instructions outlined in 2. LMH: Single-site Metropolis-Hastings.
The claims in this section are based on the data presented in Table 3.
Table 3 can be reproduced by following the instructions outlined in 3. BBVI: Black-box Variational Inference.
The claims in this section are based on the data presented in Table 4.
Table 4 can be reproduced by following the instructions outlined in 4. SMC: Sequential Monte Carlo.
The artifact may be reused to test the static factorisation approach on new models, i.e., to generate sub-programs and to run the benchmarks for the new models.
To do so, you may implement a new model in a single file in evaluation/models.
The static analysis only accepts programs written in Julia that follow the formal syntax described in the manuscript, with some minor adaptations, which we summarise in the following.
A probabilistic program is defined by a Julia function annotated with @model, with body statement S and arguments A1,...,An:
modelname = "..."
proposers = Dict{String, Distribution}(...)

@model function model_fn(ctx::SampleContext, A1, ..., An)
    S
end

In general, we allow arbitrary pure (side-effect-free) Julia expressions E in statements.
Further, you may define pure Julia helper functions outside of this model block to be used in expressions.
Also, container types like arrays may only be used in an immutable fashion.
You may also give your model a name with the global variable modelname and custom proposal distributions for LMH with the global proposers dictionary.
For reference, take a look at the models in evaluation/benchmark.
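For illustration only, a complete model file might look like the following. This is a hypothetical sketch that follows the statement syntax described below; the model name, addresses, and distributions are made up, and the actual conventions are best taken from the models in evaluation/benchmark.

modelname = "coin"
proposers = Dict{String, Distribution}("p" => Beta(2.0, 2.0))

@model function coin(ctx::SampleContext, flips::Vector{Int})
    # latent success probability with a custom LMH proposal defined in proposers
    p = sample(ctx, "p", Beta(1.0, 1.0))
    i::Int = 1
    while i <= length(flips)
        # observed coin flips, conditioned via the observed keyword
        sample(ctx, "flip_$i", Bernoulli(p), observed=flips[i])
        i = i + 1
    end
end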
Next, we describe the set of allowed statements S:
- Assignment, where x is a variable:
  x = E
  Importantly, the first use of a variable has to be type-annotated, e.g.
  x::Vector{Float64} = E
  and the type of the variable has to be static throughout the program.
- Sequence of two statements:
  S1
  S2
- If statements:
  if E
      S1
  end

  if E
      S1
  else
      S2
  end
- While statement:
  while E
      S
  end
end- sample statement, where
ctxis theSampleContext,xis a variable andf(E1,...En)is the constructor of a distribution
x = sample(ctx, E, f(E1,...En))We also support to inline observed values with expression C that does not depend on random variables.
x = sample(ctx, E, f(E1,...En), observed=C)You may also omit the assignment in this case:
sample(ctx, E, f(E1,...En), observed=C)Generate the sub-programs with
usage: generate_factorisation_for_model.py [-h] filename
positional arguments:
filename    name of your program in evaluation/models/, without the path but with the file extension
E.g. python3 generate_factorisation_for_model.py test.jl generates evaluation/models/generated/test.jl for evaluation/models/test.jl.
Run a benchmark with
usage: run_benchmark_for_model.py [-h] filename algorithm repetitions
positional arguments:
filename    name of your program in evaluation/models/, without the path but with the file extension
algorithm benchmark to run (lp|is|lmh|smc|vi)
repetitions number of experiment runs
E.g. python3 run_benchmark_for_model.py test.jl lmh 3.
You can configure the number of iterations for is and lmh in evaluation/N_iters.jl based on modelname.
The lmh option runs the benchmark described in 2. LMH: Single-site Metropolis-Hastings, the vi option runs the benchmark described in 3. BBVI: Black-box Variational Inference, and the smc option runs the benchmark described in 4. SMC: Sequential Monte Carlo.
The lp option benchmarks just the density function of the model and the is option benchmarks importance sampling.
Note that in order to compare the factorised SMC implementation to a naive implementation, you have to implement a data-annealed version of your model in evaluation/smc.jl. (This only makes sense if your model can make use of incrementally adding observed data points during SMC inference; for examples, see evaluation/smc.jl.)
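To illustrate the data-annealing idea only (this is a hypothetical sketch, not the actual interface used in evaluation/smc.jl), the t-th annealing step conditions the model on the first t observations, so SMC can add one observed data point per step:

@model function annealed_model(ctx::SampleContext, ys::Vector{Float64}, t::Int)
    # latent location parameter shared by all observations
    mu = sample(ctx, "mu", Normal(0.0, 10.0))
    i::Int = 1
    # condition only on the first t observations; SMC increases t from step to step
    while i <= t
        sample(ctx, "y_$i", Normal(mu, 1.0), observed=ys[i])
        i = i + 1
    end
end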