COLOR is a framework for improving traditional cardinality estimators in graph databases by applying graph colorings to produce a compact summary of the data graph. The project uses the Julia language for its implementation and provides tools to build graphs and run estimations.
To set up a local copy of the project, complete the following steps:
- Download the repository.
$ git clone
- Include the necessary Julia Packages.
$ julia
julia> using Pkg;
julia> Pkg.instantiate();
- Download query graphs, data graphs, and true cardinalities from G-Care and In-Memory Subgraph Matching, also available as zipped files.
Generally, the steps of a cardinality estimation will be:
- Create a
QueryGraph - Create a
DataGraph - Create a
ColorSummaryof theDataGraph - Perform a cardinality estimation using the
QueryGraphand theColorSummary - Find the exact cardinality using the
DataGraphandQueryGraphto evaluate the estimation.
We define PropertyGraphs which contain information about edge and vertex labels. PropertyGraphs have two types: QueryGraphs and DataGraphs.
To create a QueryGraph, an initial graph with default wildcard (-1) labels can be constructed using either the number of vertices or an existing DiGraph, then the labels can be updated after construction.
# initialize a QueryGraph using the number of vertices...
q = QueryGraph(3)
# ... or initialize with an existing DiGraph
q = QueryGraph(DiGraph(3))
# now update the labels using functions from PropertyGraph.jl
add_labeled_edge!(q, (1, 2), 2)
update_data_labels!(q, 1, 3)
update_node_labels!(q, 2, [3,4])DataGraphs are also created the same way with default wildcard (-1) labels. The main difference for a DataGraph is that its vertex data labels are not arbitrary and cannot be changed - instead, data labels are equivalent to vertex_id - 1.
# initialize a DataGraph using the number of vertices...
d = DataGraph(3)
# ... or initialize with an existing DiGraph
d = DataGraph(DiGraph(3))
# now update the labels using functions from PropertyGraph.jl
add_labeled_edge!(d, (1, 2), 2)
update_node_labels!(d, 2, [3,4])Functions to convert graph files formatted like G-Care or In-Memory Subgraph Matching are also included in src/Datasets.jl:
# convert a .txt file (following G-Care format) into a PropertyGraph
d1 = load_dataset("data.txt")
q1 = load_query("query.txt")
# convert a .graph file (following Subgraph-Matching format) into a PropertyGraph
d2 = load_dataset("data.graph",subgraph_matching_data=true)
q2 = load_query("query.graph", subgraph_matching_data=true)A lifted ColorSummary describing the overall DataGraph is necessary before performing cardinality estimation for a specified QueryGraph. To do this, simply use the generate_color_summary function from the ColorSummary.jl file:
d = load_dataset("data.txt")
summary = generate_color_summary(d)Optional parameters for the summary-building can be changed such as the type of coloring to use or the amount of sampling to allow, but more information can be found in the code documentation.
After obtaining a lifted ColorSummary, a cardinality estimation can be achieved for any QueryGraph by using the get_cardinality_bounds function from the QuasiStableCardinalityEstimator.jl file:
d = load_dataset("data.txt")
q = load_queryset("query.txt")
summary = generate_color_summary(d)
estimate = get_cardinality_bounds(q, summary)The resulting estimate will be a singular Float64 value.
Optional parameters for the estimation can be changed such as the sampling strategy or how to handle cycles, but more information can be found in the code documentation.
To calculate the accuracy of an estimation, we compare the result to the exact cardinality of the QueryGraph in the DataGraph. To do this, a get_exact_size function is included in the ExactSizeCalculator.jl file:
d = load_dataset("data.txt")
q = load_queryset("query.txt")
exact_cardinality = get_exact_size(q, d)The resulting cardinality will be a singular Float64 value.
Optional parameters for the calculation can be changed such as the timeout, but more information can be found in the code documentation.
The load_true_cardinality function from src/Datasets.jl also obtains the cardinality results from a given cardinality file (where the only element in the file is a singular cardinality value):
exact_cardinality = load_true_cardinality("cardinality.txt")There are a variety of Julia scripts provided which perform different experiments then save figures presenting the results to the Experiments/Results/Figures folder. These scripts are stored in the Experiments/Scripts folder.
For example, to find how using different maximum cycle sizes in the summary affects the cardinality estimation, the max_cycle_size.jl script can be called from the main directory:
$ julia Experiments/Scripts/max_cycle_size.jl
The bash script run_submitted_experiments.sh in the Experiments folder is also included and will run the experiments described in the submitted paper. The script will run all included experiments then save all the corresponding figures. Any figures that are included in the paper have their file name match their figure number (i.e. the figure presenting the effect of inference sampling on relative error is named fig_10.png). The experiments included in this bash script are:
degree_variance_exps.jl(figure 2)comparison_exps.jl(figures 3, 4, 5, 6, 7, 8)coloring_strategies.jl(figure 9)construction_scaling.jl(figure 10)proportion_updated.jl(figure 11)max_cycle_size.jl(figure 12)query_path_width_build.jl(figure 13)max_inference_paths.jl(figure 14)
The script can be called from the main directory:
$ Experiments/run_submitted_experiments.sh