Currently this is slower than Matrix QSPA, although it performs slightly better (hahaha). But yeah, optimize it to get it as fast as possible. It's going to be fairly slow regardless, since there is an n^3 term in there, but we'll have a look.