This repository contains the implementation of the Ananke provenance framework, as well as performance evaluation experiments. Below, we provide the necessary setup steps and instructions on how to run the experiments from our paper and how to visualize their results. All steps have been tested under Ubuntu 20.04.
The following dependencies are not managed by our automatic setup and need to be installed beforehand:
- git
- bc
- wget
- maven
- unzip
- java
- python>=3.7 + pip3
For Ubuntu 20.04: `sudo apt-get install git wget bc maven unzip default-jdk python3-pip`
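To check quickly whether these prerequisites are present, a small script along the following lines can help. It is a minimal sketch and not part of this repository: the helper names `missing_commands` and `python_ok` are hypothetical, and it assumes that the maven package provides the `mvn` command and that Java is detected through the `java` launcher.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository): report which of the
# prerequisites listed above are missing. Assumed command names: the maven
# package provides "mvn", and Java is detected through the "java" launcher.

# Print every command name from the arguments that is not found on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
    fi
  done
}

# Succeed only if python3 exists and is at least version 3.7.
python_ok() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)' >/dev/null 2>&1
}

missing=$(missing_commands git bc wget mvn unzip java python3 pip3)
if [ -n "$missing" ]; then
  # tr joins the one-per-line names into a single space-separated line
  echo "Missing prerequisites: $(echo "$missing" | tr '\n' ' ')"
else
  echo "All prerequisite commands found."
fi
if ! python_ok; then
  echo "python3 >= 3.7 is required"
fi
```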
Some experiments additionally require Docker:
- docker (see here), set up so that it runs without root privileges
- docker-compose (see here)
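Whether Docker is usable without root can be checked before starting those experiments. The sketch below is an assumption-based helper, not part of this repository; `docker_usable` and `compose_available` are hypothetical names. It relies on the fact that `docker info` contacts the daemon and therefore fails if dockerd is not running or the current user lacks access to the Docker socket. Docker's Linux post-installation guide describes adding your user to the `docker` group to enable non-root use.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not part of the repository): check whether
# Docker and docker-compose are usable by the current user without sudo.

# "docker info" talks to the daemon; it fails if dockerd is not running
# or if the current user has no permission on the Docker socket.
docker_usable() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

compose_available() {
  command -v docker-compose >/dev/null 2>&1
}

if docker_usable; then
  echo "docker: OK (reachable without root)"
else
  echo "docker: not reachable; check that dockerd runs and that you are in the 'docker' group"
fi

if compose_available; then
  echo "docker-compose: OK"
else
  echo "docker-compose: not found"
fi
```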
The automatic setup method downloads Apache Flink 1.10 and the input datasets, configures path variables, compiles the Ananke framework, and runs a short demonstrator experiment to check whether the setup was successful.
- Clone this repository.
- From the top-level folder, run `./auto_setup.sh`.
- Done.
Alternatively, to set up everything manually:
- Clone this repository.
- Download Apache Flink 1.10 from here to a folder of your choosing.
- Untar Flink: `tar zxvf flink-1.10.0-bin-scala_2.11.tgz`.
- Open the file `scripts/config.sh` and replace `PATH_HERE` with the location of the untarred Flink folder.
- Download the datasets by running `./standalone_input_data_downloader.sh` from the top-level directory.
- Compile Ananke with our compiler script by running `./scripts/compile.sh` from the top-level directory.
- Install the plotting requirements by running `pip3 install -r python-requirements.txt` in the top-level directory.
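The manual steps above can be condensed into a script. This is a minimal sketch under stated assumptions, not an official script of this repository: the function names `set_flink_path` and `manual_setup` are hypothetical, the Flink archive is assumed to be fetched from the Apache archive URL shown (the README links its own download source), the tarball is assumed to unpack into a folder named `flink-1.10.0`, and `scripts/config.sh` is assumed to contain the literal placeholder `PATH_HERE`.

```shell
#!/usr/bin/env bash
# Sketch of the manual setup steps (hypothetical, not part of the repository).
# Assumptions: run from the top-level folder; Flink comes from the Apache
# archive URL below; scripts/config.sh contains the placeholder PATH_HERE.

FLINK_ARCHIVE=flink-1.10.0-bin-scala_2.11.tgz
# Assumed download location for Flink 1.10.0 (Scala 2.11 build).
FLINK_URL="https://archive.apache.org/dist/flink/flink-1.10.0/${FLINK_ARCHIVE}"

# Replace the PATH_HERE placeholder in a config file with the Flink folder.
set_flink_path() {
  local config_file=$1 flink_dir=$2
  sed -i "s|PATH_HERE|${flink_dir}|g" "$config_file"
}

# Run all manual setup steps; $1 is the folder to install Flink into.
manual_setup() {
  local install_dir=${1:-$HOME/flink}
  mkdir -p "$install_dir" || return 1
  install_dir=$(cd "$install_dir" && pwd)   # absolute path for config.sh
  wget -P "$install_dir" "$FLINK_URL" || return 1
  tar zxvf "${install_dir}/${FLINK_ARCHIVE}" -C "$install_dir" || return 1
  # Assumed: the archive unpacks into a folder named flink-1.10.0
  set_flink_path scripts/config.sh "${install_dir}/flink-1.10.0" || return 1
  ./standalone_input_data_downloader.sh || return 1
  ./scripts/compile.sh || return 1
  pip3 install -r python-requirements.txt
}
```

After sourcing the file from the top-level folder, `manual_setup "$HOME/flink"` would perform all steps in order and stop at the first failing one.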
You are now done with the setup. To check whether the setup was successful, run a short demonstrator experiment:
./scripts/run.sh ./scripts/experiments/setup_exp.sh -d 1 -r 1

Experiment scripts are located in `scripts/experiments` and are executed by calling `scripts/run.sh` from the top-level directory of the project. The run script creates output directories based on the commit hash and the date, and it preprocesses the output once the experiment has finished. Command-line arguments control the maximum duration, the number of repetitions, etc. For example:
# Run lrAnankeCompare (experiment underlying Figure 10) for 10 reps of 10 minutes
./scripts/run.sh ./scripts/experiments/lrAnankeCompare.sh -d 10 -r 10

Result files are stored in the folder `data/output`.
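Several experiments can be chained with the same duration (`-d`, in minutes as in the example above) and number of repetitions (`-r`). The sketch below is hypothetical and not part of the repository: `run_experiments`, `latest_output_dir`, and the `DRY_RUN` variable are illustrative names. Since the exact layout of the folders that `run.sh` creates is not specified here, `latest_output_dir` simply picks the most recently modified direct subfolder of `data/output`.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not part of the repository): run several
# experiment scripts via scripts/run.sh and locate the newest output folder.

# Usage: run_experiments DURATION REPS SCRIPT...
# Prints each run.sh invocation; executes it unless DRY_RUN=1 is set.
run_experiments() {
  local duration=$1 reps=$2 exp
  shift 2
  for exp in "$@"; do
    echo "./scripts/run.sh ./scripts/experiments/${exp} -d ${duration} -r ${reps}"
    if [ "${DRY_RUN:-0}" != 1 ]; then
      ./scripts/run.sh "./scripts/experiments/${exp}" -d "$duration" -r "$reps" || return 1
    fi
  done
}

# Print the most recently modified subfolder of $1 (default: data/output).
latest_output_dir() {
  local base=${1:-data/output}
  ls -1td "$base"/*/ 2>/dev/null | head -n 1
}
```

For instance, `run_experiments 10 10 setup_exp.sh lrAnankeCompare.sh` followed by `latest_output_dir` would run both experiments and print the folder holding the newest results; `DRY_RUN=1` only prints the commands.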
This section describes how to automatically reproduce the results from our paper on your own hardware.
Caution: The dataset used for the Smart Grid queries cannot be published due to privacy regulations; the corresponding experiments therefore cannot be reproduced by third parties.
The folder `reproduce/` contains one bash script per figure, named after the corresponding figure in the paper. Executing such a script runs the experiment automatically, stores the results, and plots them. Reproducing Figure 20 or Table 2 requires a running Docker daemon (`dockerd`). Simply enter the folder and execute the script, e.g.:
# Reproduce Figure 10 in the paper
./figure10.sh

To run variations of the experiments and plot their results, we suggest inspecting the bash scripts in the `reproduce` folder.
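To reproduce all results in one go, the scripts can be run in sequence. This is a sketch under assumptions, not part of the repository: `needs_docker` and `reproduce_all` are hypothetical names, the script names `figure20.sh` and `table2.sh` are assumed from the naming of `figure10.sh`, and each script is executed from inside `reproduce/` as described above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical, not part of the repository): run every script in
# reproduce/ one after another. Assumed names: figure20.sh and table2.sh are
# the scripts that need a running dockerd (per the README).

# Succeed for the scripts assumed to require a running Docker daemon.
needs_docker() {
  case "$1" in
    figure20.sh|table2.sh) return 0 ;;
    *) return 1 ;;
  esac
}

reproduce_all() {
  local script name
  for script in reproduce/*.sh; do
    [ -e "$script" ] || continue   # no scripts matched the glob
    name=$(basename "$script")
    if needs_docker "$name" && ! docker info >/dev/null 2>&1; then
      echo "Skipping $name: dockerd is not running" >&2
      continue
    fi
    echo "Running $name"
    # Each script is executed from inside reproduce/, in a subshell
    (cd reproduce && "./$name") || echo "$name failed" >&2
  done
}
```

Calling `reproduce_all` from the top-level folder skips the Docker-based experiments when the daemon is unavailable instead of letting them fail midway.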
Beware that the hardware the experiments were executed on (as indicated in the paper) may differ from yours.