[...Extending an offline discussion...]
Currently, the build requires numpy to generate the C interface. Since there's currently no architecture deduction (microarchitecture, cache sizes, etc.) in the CMake build and all optimizations are handled by #pragma simd etc., it should be possible to distribute pregenerated C source code with each release and avoid the generation step. I'm envisioning something that resembles the release structure generated by Libint, where the "compiler" (code generation) and the exported libs are separate.
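
To make the idea concrete, here is a rough sketch of how the CMake side could pick up pregenerated sources when they are shipped and fall back to the generator otherwise. Every name in it (the `generated/` directory, `c_interface.c`, `generate.py`, its `--output` flag, and the `mylib` target) is a hypothetical placeholder and not the project's actual layout.

```cmake
cmake_minimum_required(VERSION 3.12)  # FindPython3 requires CMake >= 3.12
project(pregenerated_sketch C)

# Hypothetical location of the C sources shipped inside a release tarball.
set(PREGENERATED_DIR "${CMAKE_CURRENT_SOURCE_DIR}/generated")

if(EXISTS "${PREGENERATED_DIR}/c_interface.c")
  # Release tarball: use the shipped sources, so Python/numpy are not needed.
  set(C_INTERFACE_SRC "${PREGENERATED_DIR}/c_interface.c")
else()
  # Development checkout: run the generator at build time.
  # The generator script itself is assumed to import numpy.
  find_package(Python3 REQUIRED COMPONENTS Interpreter)
  set(C_INTERFACE_SRC "${CMAKE_CURRENT_BINARY_DIR}/c_interface.c")
  add_custom_command(
    OUTPUT "${C_INTERFACE_SRC}"
    COMMAND "${Python3_EXECUTABLE}"
            "${CMAKE_CURRENT_SOURCE_DIR}/generate.py"   # hypothetical script
            --output "${C_INTERFACE_SRC}"               # hypothetical flag
    DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/generate.py"
    COMMENT "Generating C interface"
    VERBATIM)
endif()

# Either way, the library is built from plain C.
add_library(mylib "${C_INTERFACE_SRC}")
```

With a layout like this, the release process would run the generator once and package its output, which is roughly the split between the Libint "compiler" and its exported library.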