The folder /examples/fast_exact_attention_kernel/ contains a custom implementation of CUDA C kernel search.
The folder /examples/matrix_low_rank_decomposition/ contains a custom implementation of 3D tensor decomposition, which is equivalent to finding Strassen-like algorithms.
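
To illustrate the equivalence, here is a minimal sketch (not code from this repository) that checks Strassen's classic seven-multiplication scheme for 2x2 matrices as a rank-7 decomposition of the 2x2 matrix multiplication tensor. The names `U`, `V`, `W`, `matmul_tensor`, and `reconstruct` are illustrative assumptions, as is the flattening order [11, 12, 21, 22].

```python
# Factor matrices for Strassen's algorithm, each of shape 4 x 7.
# Rows index the flattened 2x2 entries in the order [11, 12, 21, 22];
# column r holds the coefficients used by the product M_(r+1).
U = [  # coefficients on the entries of A
    [1, 0, 1, 0, 1, -1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, -1],
]
V = [  # coefficients on the entries of B
    [1, 1, 0, -1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 1],
    [1, 0, -1, 0, 1, 0, 1],
]
W = [  # how each product M_(r+1) contributes to the entries of C
    [1, 0, 0, 1, -1, 0, 1],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [1, -1, 1, 0, 0, 1, 0],
]


def matmul_tensor(n):
    """Build the n^2 x n^2 x n^2 tensor T that encodes C = A @ B.

    T[a][b][c] is 1 when entry a of A times entry b of B
    contributes to entry c of C, and 0 otherwise.
    """
    size = n * n
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # C[i][j] += A[i][k] * B[k][j]
                T[i * n + k][k * n + j][i * n + j] = 1
    return T


def reconstruct(U, V, W):
    """Sum the rank-one terms U[:, r] (x) V[:, r] (x) W[:, r] over all r."""
    rank = len(U[0])
    return [
        [
            [sum(U[a][r] * V[b][r] * W[c][r] for r in range(rank))
             for c in range(len(W))]
            for b in range(len(V))
        ]
        for a in range(len(U))
    ]


# True when the seven rank-one terms reproduce the tensor exactly.
strassen_is_exact = reconstruct(U, V, W) == matmul_tensor(2)
print("rank:", len(U[0]), "exact:", strassen_is_exact)
```

Each column of the factor matrices corresponds to one scalar multiplication, so a decomposition with fewer columns yields an algorithm with fewer multiplications.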