This folder contains a binary for running DPF microbenchmarks for two-party sparse histogram aggregation. See below for usage and options. Below, we report results on a fixed set of synthetic input files.
We fix the number of non-zero points in the entire histogram to 2^20, and choose three sets of non-zeros using the following three distributions:
We then run the benchmarks in two evaluation modes:
All entries in the tables below record the time needed to expand a single DPF key at one of the two servers in the given setting. The evaluation is single-threaded and runs on an Intel(R) Xeon(R) CPU @ 2.30GHz.
The size of the values being aggregated is fixed to 32 bits.
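To illustrate the input format (this sketch is not part of the benchmark binary), the following Python snippet generates a synthetic input CSV with one non-zero bucket index per row in the first column, as expected by the `--input` flag. The uniform distribution, the function name, and the output file name are illustrative assumptions; the benchmarks above also use other distributions.

```python
import csv
import random

def write_synthetic_input(path, num_nonzeros, log_domain_size, seed=0):
    """Writes `num_nonzeros` distinct, uniformly random bucket indices
    from [0, 2**log_domain_size) to `path`, one index per row."""
    rng = random.Random(seed)
    buckets = set()
    while len(buckets) < num_nonzeros:
        buckets.add(rng.randrange(2 ** log_domain_size))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for bucket in sorted(buckets):
            writer.writerow([bucket])

# Small example; the experiments described above fix the count to 2**20.
write_synthetic_input("uniform_nonzeros.csv", num_nonzeros=1000, log_domain_size=20)
```

A file produced this way can then be passed to the benchmark binary via `--input=uniform_nonzeros.csv`.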
Prefix bit lengths to evaluate for hierarchical evaluation: 21, 23, 25, 27, 29, 31, 32

Levels to evaluate for hierarchical evaluation: 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 128
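In a hierarchical evaluation, each level corresponds to a prefix length: a bucket index contributes to the hierarchy node given by its leading bits at that level. As a rough sketch (my illustration, assuming big-endian prefixes as in standard incremental DPF constructions, not code from this repository):

```python
def prefix(index: int, log_domain_size: int, level: int) -> int:
    """Returns the `level`-bit prefix of `index` in a domain of size
    2**log_domain_size, i.e. the hierarchy node containing `index`."""
    return index >> (log_domain_size - level)

# The bucket 0b10110 in a 5-bit domain lies under prefix 0b101 at level 3.
assert prefix(0b10110, 5, 3) == 0b101
```

At each listed level, the servers evaluate the DPF keys on the prefixes surviving a threshold, so the per-level cost depends on how many distinct prefixes the non-zero buckets induce.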
Usage:
    bazel run --cxxopt=-std=c++17 -c opt --dynamic_mode=off experiments:synthetic_data_benchmarks -- [options]
Options:
--input: CSV file containing non-zero buckets in the first column. Default: "".

--levels_to_evaluate: List of integers specifying the log domain sizes at which to insert hierarchy levels. Default: empty.

--log_domain_size: Logarithm of the domain size. All non-zeros in `input` must be in [0, 2^log_domain_size). Default: 20.

--max_expansion_factor: Limits the maximum number of elements the expansion at any hierarchy level can have to a multiple of the number of unique buckets in the input file. Must be at least 2. Default: 2.

--num_iterations: Number of iterations to benchmark. Default: 20.

--only_nonzeros: Only evaluates at the non-zero indices of the input file passed via --input, instead of performing hierarchical evaluation. If true, all flags related to hierarchy levels are ignored. Default: false.
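For example, a hierarchical evaluation over a 32-bit domain with the prefix bit lengths listed above could be invoked as follows (the input file name is a placeholder for a CSV of non-zero buckets):

```shell
bazel run --cxxopt=-std=c++17 -c opt --dynamic_mode=off experiments:synthetic_data_benchmarks -- \
  --input=uniform_nonzeros.csv \
  --log_domain_size=32 \
  --levels_to_evaluate=21,23,25,27,29,31,32 \
  --num_iterations=20
```

Passing `--only_nonzeros=true` instead would skip the hierarchy and evaluate only at the input file's non-zero indices.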