
Asm\x86\sort.asm - possible speedup #166

@necros2k7

Description

  1. Hot loops (loop_down_1, opt_loop, main_loop_sort) execute millions of times; aligning each one to a uop-cache boundary helps the front end fetch them cleanly on every iteration. Put the sort code in its own 64-byte-aligned segment:

```asm
Z7_SORT_ASM_USE_SEGMENT equ 1
_TEXT$Z7_SORT SEGMENT ALIGN(64) 'CODE'
```

and at each inner loop label:

```asm
align 64
```
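
For concreteness, here is a minimal assemblable sketch (ml64 syntax) of how the two directives could be wired together; the DemoLoop wrapper and its body are placeholders, while Z7_SORT_ASM_USE_SEGMENT, the segment name, and the main_loop_sort label come from the points above:

```asm
; Sketch only - not the real Sort.asm body.
Z7_SORT_ASM_USE_SEGMENT equ 1

ifdef Z7_SORT_ASM_USE_SEGMENT
_TEXT$Z7_SORT SEGMENT ALIGN(64) 'CODE'
endif

DemoLoop PROC                      ; hypothetical stand-in for the sort routine
        mov     ecx, 1000
        align 64                   ; loop entry lands on a fresh 64-byte line
main_loop_sort:                    ; hot-loop label named in point 1
        dec     ecx
        jnz     main_loop_sort
        ret
DemoLoop ENDP

ifdef Z7_SORT_ASM_USE_SEGMENT
_TEXT$Z7_SORT ENDS
endif
        END
```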

  2. Activate the NUM_PREFETCH_LEVELS logic:

```asm
NUM_PREFETCH_LEVELS equ 4 ; prefetch two 64-byte cache lines
```

Place the prefetch macro in the critical loops (MOVE_SMALLEST_UP, main_loop_sort), switching from the default byte ptr [p+offs+cur_offset] touch to an unconditional prefetcht0. This helps hide the L1 miss penalty on large datasets.
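
Below is one way the unconditional prefetch could be expressed; the PREFETCH_NODE macro name, the r11 scratch register, and the 4-byte element size are assumptions, while prefetcht0 and NUM_PREFETCH_LEVELS come from the proposal above:

```asm
NUM_PREFETCH_LEVELS equ 4

; Hypothetical macro: instead of touching the future node with a plain
; "byte ptr [p + offs + cur_offset]" load, issue a real prefetch for the
; cache line the heap walk will reach NUM_PREFETCH_LEVELS levels later.
PREFETCH_NODE macro base:req, index:req
        mov     r11, index                   ; current node index (scratch reg assumed free)
        shl     r11, NUM_PREFETCH_LEVELS     ; descendant that many levels down is ~index * 2^levels
        prefetcht0 byte ptr [base + r11*4]   ; 4-byte heap elements assumed
endm

; usage inside the hot loop (e.g. MOVE_SMALLEST_UP / main_loop_sort):
;       PREFETCH_NODE rdx, rax
```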

  3. Some macros load a value into t0 only to store it straight back. Replace

```asm
LOAD t0, s
STORE t0, k
```

with a single transfer from [p + 4*s] to [p + 4*k] when registers allow. Since x86 has no memory-to-memory mov, in practice this means keeping the value live in a register and dropping the redundant reload, which cuts the µops and memory ops of the pair in half.
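
A before/after sketch under the assumption that LOAD/STORE expand to 4-byte moves through t0; the registers and expansions shown are guesses, not the actual Sort.asm definitions:

```asm
; Assumed expansions:  LOAD t0, s  ->  mov t0, dword ptr [p + s*4]
;                      STORE t0, k ->  mov dword ptr [p + k*4], t0
; x86 has no memory-to-memory mov, so the win comes from paths where the
; value is already live in t0 from an earlier compare.

; before: redundant reload followed by the store (2 memory ops)
        mov     eax, dword ptr [rdx + rsi*4]    ; LOAD t0, s   (eax = t0, assumed)
        mov     dword ptr [rdx + rdi*4], eax    ; STORE t0, k

; after: the reload is dropped, only the store remains (1 memory op)
        mov     dword ptr [rdx + rdi*4], eax    ; value kept live from the earlier LOAD
```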
