Description
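- Align the hot loops (loop_down_1, opt_loop, main_loop_sort) to 64-byte boundaries. On large inputs they can execute millions of times. Starting each one on a fresh 64-byte line keeps the loop head from straddling a fetch / uop-cache window, so the front end is more likely to stream it from the uop cache. Put the sort code in its own 64-byte-aligned segment (`align 64` cannot exceed the alignment of the segment it sits in):
```asm
Z7_SORT_ASM_USE_SEGMENT equ 1
_TEXT$Z7_SORT SEGMENT ALIGN(64) 'CODE'
```
and at each inner loop label:
```asm
align 64
```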
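For comparison, the same idea at the C level: GCC and Clang accept an `aligned` attribute on functions that sets a minimum alignment for the function's first instruction, and GCC's `-falign-loops=64` option does the equivalent for loop heads. This is a minimal sketch, assuming a GCC-compatible compiler. `sum_u32` and `entry_is_64_aligned` are hypothetical names and are not part of 7-Zip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hot function. The GCC/Clang `aligned` attribute requests that
   its first instruction start on a 64-byte boundary; `noinline` keeps a real,
   addressable copy of it in the binary. */
static __attribute__((aligned(64), noinline))
uint32_t sum_u32(const uint32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Returns 1 if the function's entry address is a multiple of 64.
   Converting a function pointer to an integer is implementation-defined,
   but behaves as expected on mainstream x86-64 compilers. */
static int entry_is_64_aligned(uint32_t (*fn)(const uint32_t *, size_t)) {
    return ((uintptr_t)fn % 64) == 0;
}
```

The attribute only covers the function entry. Loop heads inside the function are not affected, which is what `-falign-loops` or an explicit `align 64` in assembly is for.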
- Activate the NUM_PREFETCH_LEVELS logic:
```asm
NUM_PREFETCH_LEVELS equ 4 ; prefetch two 64-byte cache lines
```
Then place the prefetch macro in the critical loops (MOVE_SMALLEST_UP, main_loop_sort) and switch it from the default `byte ptr [p+offs+cur_offset]` form to an unconditional `prefetcht0`. Requesting those lines ahead of use should hide much of the L1 miss latency on large datasets, where the working set no longer fits in cache.
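To illustrate what prefetching several heap levels ahead means, here is a C sketch of a heapsort sift-down. It assumes a 1-based max-heap of 32-bit keys in `p[1..n]`. In that layout, the descendants of node `k` that lie `L` levels down start at index `k << L`. With 4-byte keys and `L = 4`, those 16 nodes span 64 bytes. `PREFETCH_LEVELS`, `sift_down` and `heap_sort` are hypothetical names, and this is not 7-Zip's code. `__builtin_prefetch` is a GCC/Clang builtin. On x86, with locality 3, it typically compiles to `prefetcht0`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many heap levels below the current node to prefetch. */
#define PREFETCH_LEVELS 4

/* Sift the element at 1-based index k down the max-heap p[1..n].
   Each step prefetches the first descendant PREFETCH_LEVELS levels below k,
   so that line is (hopefully) cached by the time the descent reaches it. */
static void sift_down(uint32_t *p, size_t k, size_t n) {
    assert(k >= 1 && k <= n);
    uint32_t v = p[k];
    for (;;) {
        size_t far = k << PREFETCH_LEVELS;
        if (far <= n) /* only form in-bounds addresses */
            __builtin_prefetch(&p[far], 0 /* read */, 3 /* high locality */);
        size_t c = 2 * k;
        if (c > n)
            break;
        if (c < n && p[c + 1] > p[c])
            c++; /* pick the larger child */
        if (v >= p[c])
            break;
        p[k] = p[c]; /* move the child up; v stays in a register */
        k = c;
    }
    p[k] = v;
}

/* Sort p[1..n] ascending; p[0] is unused. */
static void heap_sort(uint32_t *p, size_t n) {
    if (n < 2)
        return;
    for (size_t i = n / 2; i >= 1; i--)
        sift_down(p, i, n);
    while (n > 1) {
        uint32_t t = p[1];
        p[1] = p[n];
        p[n] = t;
        n--;
        sift_down(p, 1, n);
    }
}
```

The prefetch is only a hint. Whether it pays off depends on the heap being much larger than the caches. For small inputs it mostly adds instructions.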
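- Some macros load into t0 only to store it straight back out:
```asm
LOAD t0, s
STORE t0, k
```
The idea is to collapse this pair to cut µops and memory operations. However, x86 `mov` cannot take two memory operands, so a single `mov [p+4k], [p+4s]` will not assemble. A memory-to-memory copy always needs one load and one store. The realistic saving is to skip the LOAD when the value is already live in a register.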
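x86 `mov` has no memory-to-memory form, so each copied element costs a load plus a store. In a heap sift-down, the usual way to reduce memory traffic is therefore to avoid swaps. Instead, keep the sifted value in a register and move each larger child up into the "hole", then write the saved value once at the end. This is a swapped-in technique shown for illustration; it is not necessarily what the LOAD/STORE macros implement. The C sketch assumes a 1-based max-heap `p[1..n]`, uses hypothetical function names, and counts array stores for a swap-based and a hole-based version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap-based sift-down: every level does two loads and two stores into the
   array. *stores counts array writes. */
static void sift_down_swap(uint32_t *p, size_t k, size_t n, size_t *stores) {
    assert(k >= 1 && k <= n);
    for (size_t c = 2 * k; c <= n; c = 2 * k) {
        if (c < n && p[c + 1] > p[c])
            c++; /* pick the larger child */
        if (p[k] >= p[c])
            break;
        uint32_t t = p[k];
        p[k] = p[c];
        p[c] = t;
        *stores += 2;
        k = c;
    }
}

/* Hole-based sift-down: the sifted value stays in a local (a register in
   practice); each level moves one child up with one load/store pair, and v
   is written exactly once at the end. */
static void sift_down_hole(uint32_t *p, size_t k, size_t n, size_t *stores) {
    assert(k >= 1 && k <= n);
    uint32_t v = p[k];
    for (size_t c = 2 * k; c <= n; c = 2 * k) {
        if (c < n && p[c + 1] > p[c])
            c++; /* pick the larger child */
        if (v >= p[c])
            break;
        p[k] = p[c]; /* on x86-64: a load into a register, then a store */
        *stores += 1;
        k = c;
    }
    p[k] = v;
    *stores += 1;
}
```

Both versions leave the same heap. On a descent of depth d, the swap version performs 2d stores, while the hole version performs d + 1.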
necros2k7