This project is the result of a club competition to develop the fastest possible implementation of the ID3 decision tree algorithm. The goal was to optimize both the training (tree building) and inference (prediction) phases.
- Optimized ID3 algorithm implementation in C++.
- Uses Google Benchmark for performance measurement.
- Includes test cases using Google Test.
- Uses
csv.hfor efficient CSV data loading.
This project uses Conan for dependency management and CMake for building.
- Install Conan: Follow the instructions on the Conan website.
- Install Dependencies:
conan install . --build=missing - Build Project:
This will create a
make build
builddirectory and compile the project.
To ensure the decision tree implementation is correct, run the test suite:
make testPerformance benchmarks are included using Google Benchmark. To run them:
make benchmarkThe primary goal of this project was speed. Below are the benchmark results on the Car Evaluation dataset. For more detailed performance metric, see perf.md
| Benchmark | Our Time | SciPy time | Speedup |
|---|---|---|---|
| BM_BuildTree | 134632 ns | 944930 ns | 7x |
| BM_TreePredict | 1091 ns | 330020 ns | 302x |
These results clearly show that this tree significantly outperforms the standard SciPy implementation. While the comparison was attempted to be done as fairly as possible, it is important to note that the SciPy implementation has many more features that are almost certainly not free to include, which likely explains the dramatic difference in inference time in particular.