Hey, thanks for building this framework. It's exactly what I need for my project. I was wondering, though: is there a particular reason why flash attention 2 and rotary positional embeddings were dropped from the standard Llama implementation?