Releases: iree-org/wave
v3.9.1
Release v3.9
Wave v3.9 Release
1. New Ops and Kernel Features
1.1 Fine-grained Pipeline Control using wave.schedule Construct
Introduces a new wave.schedule language feature enabling explicit control of pipeline staging. Authors can group operators into stages and control pipelining behavior. A ScheduleRegionGraph and ScheduleContext were added to support schedule tracing. (#333)
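As an illustration of the staging idea only (not the `wave.schedule` API from #333), the self-contained sketch below shows what explicit two-stage pipelining means: the author decides that loads for the next tile overlap with compute on the current one. The names `pipelined`, `load`, and `compute` are hypothetical.

```python
# Illustrative analogy only; these names are hypothetical, not the
# wave.schedule surface from #333.
def pipelined(n_tiles, load, compute):
    """Two-stage software pipeline: load tile i+1 while computing tile i."""
    tile = load(0)  # prologue: run stage 0 for the first tile
    results = []
    for i in range(n_tiles):
        nxt = load(i + 1) if i + 1 < n_tiles else None  # stage 0, iteration i+1
        results.append(compute(tile))                   # stage 1, iteration i
        tile = nxt
    return results

print(pipelined(3, load=lambda i: i * 10, compute=lambda t: t + 1))  # [1, 11, 21]
```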
1.2 Float Remainder Op (remf)
Adds remf to the Wave dialect, implementing floating-point remainder semantics matching the arith dialect. (#279)
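`arith.remf` (like LLVM's `frem`) follows C `fmod`-style semantics, where the result takes the sign of the dividend; a quick Python illustration of that behavior:

```python
import math

# fmod-style remainder: the result keeps the sign of the dividend (lhs).
assert math.fmod(7.5, 2.0) == 1.5
assert math.fmod(-7.5, 2.0) == -1.5  # sign follows the dividend...
assert math.fmod(7.5, -2.0) == 1.5   # ...not the divisor
```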
1.3 Tensor Load Enhancements
- Basic Tensor Load Op Integration - Adds initial tensor load support with a data mover integrated into single-wave GEMM. (#379)
- Unaligned Shape Support - Tensor load now supports unaligned shapes via computed descriptor dims and refined local bounds logic; stride computation moved entirely to handlers. (#399)
- Shared Memory Padding - Tensor load ops now support padded shared-memory allocations; padding is preserved or dropped based on backend capability. (#408)
- Tensor Load Multicast - Introduces multicast optimization for tensor loads to share a single load across workgroups in a cluster. (#437)
- Tensor Waitcnt - Adds support for tensor-level waitcnt logic required for correct tensor load behavior. (#383)
1.4 Multi-Wave Execution and Support
- Support added for executing a single 16×16×16 MMA across multiple waves and workgroups, with lit and e2e tests. (MI350 supported; MI25x not). (#442)
- Extends TDM (Tensor Data Movement) op support by using `WaveConstraint` to compute per-wave tile sizes. (#463)
2. Compiler & Backend Enhancements
2.1 New ASM Backend (Experimental)
- Adds an experimental assembly backend lowering MLIR → AMD GCN ISA directly. Includes instruction support, expression simplification, tests, and documentation. Currently supports a copy example and is runnable only with the wave runtime (no VMFB). (#356)
- Adds lowering for a 16×16×16 MMA into the ASM backend, staging lhs/rhs through shared memory. Includes lit and e2e tests. (#404)
- Replaces hardcoded loop scheduling with latency-driven scheduling. Adds a dedicated ticketing class for vmcnt/lgkmcnt placement. (#428)
- Documentation added for the ASM backend, its capabilities, and workflow. (#356)
2.2 Optimized Memory Waitcnt for Async BF16 PP GEMM
Adds memory_counter_wait op and integrates it to optimize waitcnt placement in async BF16 pipelined GEMM. (#436)
2.3 Ping-Pong GatherToLDS for F16 GEMM
Adds a GatherToLDS ping-pong pipeline implementation, fixes dot-slicing bugs, and cleans waitcnt emission after upstream LLVM fixes. (#431)
3. Runtime / Integration Improvements
3.1 Wave as a TorchDynamo Custom Backend
Wave kernels can now be selected via:
torch.compile(MyModel, backend="wave")
Currently replaces torch.mm with Wave GEMM kernels; other ops fall back to eager execution. (#396)
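A minimal usage sketch (the `MyModel` module is a hypothetical example; a GPU visible to PyTorch and the backend registration from #396 are assumed):

```python
import torch

class MyModel(torch.nn.Module):
    def forward(self, a, b):
        return torch.mm(a, b)  # currently intercepted and lowered to a Wave GEMM

compiled = torch.compile(MyModel(), backend="wave")

a = torch.randn(128, 256, device="cuda", dtype=torch.float16)
b = torch.randn(256, 64, device="cuda", dtype=torch.float16)
out = compiled(a, b)  # ops other than torch.mm run in eager mode
```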
Change Log
Git History
What's Changed
- Bump version to 3.8.0 by @sa-faizal in #362
- import the second batch of the wave dialect commits by @ftynse in #338
- [Runtime] t.dlpack fallback support for pytorch compatibility by @Megan0704-1 in #363
- Update pytorch rocm requirements from 6.3 to 6.4 by @Megan0704-1 in #364
- Add wave.schedule for more fine grained control by @harsh-nod in #333
- Forward codegen info for read/write to wave attributes by @tyb0807 in #355
- Fix typos in gather_to_shared by @ftynse in #365
- Add asm backend for compiler by @harsh-nod in #356
- Remove vector.splat by @tgymnich in #366
- Switch emitter to use only upstream dialects by @Hardcode84 in #359
- [Water] Fix Include Cycle by @tgymnich in #369
- Backport: Replace deprecated op vector.splat with vector.broadcast (#66) by @tgymnich in #372
- [Synchronization] split barriers support in add_shared_memory_barriers pass by @Megan0704-1 in #351
- Fix documentation by @harsh-nod in #374
- Combine WaveExprAttr and WaveExpressionAttr by @tgymnich in #373
- fix duplicated wave prefix by @tgymnich in #376
- [debugging] debug_log extra_iteration_dimensions by @willghatch in #375
- Broader support for Water emission by @martin-luecke in #371
- Water Diagnostic Serialization by @tgymnich in #378
- improve lit compatibility for sharktank_integration test by @willghatch in #381
- Simplify hardware transpose index calculation by @harsh-nod in #382
- Bump IREE requirement pins to their latest versions by @raikonenfnu in #385
- Bump the github-actions group across 1 directory with 3 updates by @dependabot[bot] in #384
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #387
- Tensor waitcnt support by @Megan0704-1 in #383
- [Wave] Add remf op by @Megan0704-1 in #279
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #392
- Tensor load op support by @Megan0704-1 in #379
- Fix gather-to-lds tail padding calculations by @Hardcode84 in #393
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #397
- Use correct offset for base element of transposed-load operation by @ashay in #388
- Unaligned shapes support with tensor load op by @Hardcode84 in #399
- Standalone examples by @panditsa in #367
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #405
- [Compiler][NFC] Annotate GEMM operation on schedule by @raikonenfnu in #400
- Update wmma codegen for gfx1250 by @Megan0704-1 in #407
- Move wave index attribute representation to DictArrayAttr by @martin-luecke in #386
- Keep all the dims for scan-op by @panditsa in #368
- Updated requirements such that wave-lang picks the right version of iree dependencies by @xintin in #395
- bumped wave-lang to 3.8.1 by @sa-faizal in #413
- Tensor load padding support by @Hardcode84 in #408
- [NFC] xfail CI machine specific failure by @raikonenfnu in #412
- Add sample MMA lowering for asm backend by @harsh-nod in #404
- Add lowering for unary wave ops by @martin-luecke in #424
- Rewrite more reads and writes with gather-to-lds operations by @ashay in #377
- [water] Fix HardwareConstraints verifiers by @tgymnich in #421
- Add Wave as a custom dynamo backend by @nithinsubbiah in #396
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #430
- Add latency based scheduling for asm backend by @harsh-nod in #428
- Manual shared-memory management in GEMM by @panditsa in #391
- Add wave.mma MLIR lowering by @martin-luecke in #429
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #432
- [water] add C-bindings for constraints by @tgymnich in #420
- [water] remove wg_constraint parameter from wave_constraint by @tgymnich in #425
- [water] handle device constraint by @tgymnich in #426
- [Compiler] Add PP for GatherToLDS based F16 GEMM by @raikonenfnu in #431
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #433
- [Compiler][Gemm] Optimize memory waitcnt for Async BF16 PP GEMM by @raikonenfnu in #436
- [water] Lower wave constraints to MLIR by @tgymnich in #422
- gitignore water artifacts by @Hardcode84 in #439
- Add -Wno-macro-redefined to cmake by @ftynse in #447
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #438
- GEMM two_pp_cluster scheduling by @panditsa in #435
- wait tensorcnt support by @Megan0704-1 in #406
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in https://github.com/iree-org/wav...
v3.8.2
Release wheels for v3.8.2.
v3.8.1.post1
Release wheels for v3.8.1.post1.
Release v3.8.0
Wave v3.8 Release
New Website – Launched a new website to help users learn more about Wave and explore its features: https://www.wave-lang.com/
New Ops and Kernels
- TopkOp Implementation — Implements TopkOp, modeled after ReduceOp, using iterative reduction and masking (similar to sglang's moe_topk_softmax kernel); see the PyTorch sketch after this list. #277
- RoundOp Added — Introduced RoundOp for Wave kernels. #283
- Implement Broadcasting for SelectOp — Generalized broadcast handling from binary ops to any-arity ops, used by Topk and similar kernels. #251
- [WMMA] v_wmma_f32_16x16x16_f16 Type Support — Added RDNA MMA type for 16×16×16 mixed-precision matrix operations. #306
- Binary Ops Lowering — Added lowering support for binary operations to expand backend compatibility. ftynse/water#36
- Distributed GEMM Across Multi-GPU Devices — Enables distributed matrix multiplication with per-dimension device partitioning via DeviceConstraint. #302
- Dynamic AtomicOp Indexing — Adds mapping_dynamic_vals for runtime-computed indices in atomic operations (used in MoE alignment). #269
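The reduce-and-mask formulation behind TopkOp can be sketched in plain PyTorch (illustrative only, not the Wave implementation; `iterative_topk` is a hypothetical name): take the running max, record it, then mask the winner out so the next pass finds the next-largest element.

```python
import torch

def iterative_topk(x: torch.Tensor, k: int):
    # Reduce-and-mask: each pass is a max-reduction followed by masking
    # the winner, mirroring the TopkOp formulation described above.
    work = x.clone()
    values, indices = [], []
    for _ in range(k):
        v, i = work.max(dim=-1)
        values.append(v)
        indices.append(i)
        work.scatter_(-1, i.unsqueeze(-1), float("-inf"))
    return torch.stack(values, dim=-1), torch.stack(indices, dim=-1)

vals, idxs = iterative_topk(torch.tensor([[3.0, 1.0, 4.0, 1.5]]), k=2)
# vals == [[4.0, 3.0]], idxs == [[2, 0]]
```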
Kernel Optimization
- Linearize Shared Memory Accesses — Linearizes shared-memory reads/writes to improve register reuse and enable better common subexpression detection (illustrated in the sketch after this list). #275
- Scalarize Packed Math — Added option to scalarize packed addf/mulf near MFMA ops to avoid hardware-induced performance penalties. #274
- Hardware Transpose Support (gfx950+) — Enables amdgpu.transpose_load for native hardware transpose on MI350+ GPUs. #285
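Why linearization helps, in a self-contained sketch (illustrative only, not the compiler pass): folding a 2-D access into one flat offset exposes `row * stride` as a common subexpression that can be computed once and kept in a register across column accesses.

```python
# Flat stand-in for a 16 x 64 shared-memory tile.
STRIDE = 64
smem = list(range(16 * STRIDE))

row, col0, col1 = 3, 5, 9
base = row * STRIDE        # common subexpression, computed once
x0 = smem[base + col0]     # both accesses reuse the same base
x1 = smem[base + col1]
assert (x0, x1) == (smem[row * STRIDE + col0], smem[row * STRIDE + col1])
```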
Compiler Enhancements
- Normal Forms Framework — Introduces WaveNormalFormAttr for enforcing IR invariants and managing pass pre-/post-conditions at fine granularity. ftynse/water#41, ftynse/water#57
- Dataflow-Based Shape Inference — Adds forward/backward dataflow analysis for shape inference, ensuring convergence and conflict detection. ftynse/water#20
- Hyperparameter and Index Mapping Attributes — Adds WaveHyperparameterAttr and WaveIndexMappingAttr for symbol–value mappings and affine index modeling. ftynse/water#23, ftynse/water#30, ftynse/water#42
- Lowering Pipeline Setup + RegisterOp Lowering — Establishes type converter (wave.tensor → vector/memref) and first lowering pattern (RegisterOp). ftynse/water#28
- Wave Dialect GEMM Representation — Adds high-level and lowered GEMM kernels using new MMA type variants. ftynse/water#27
- Wave Dialect Initialization — Created base dialect structure with symbol attributes, tensor types, symbolic shapes, and minimal MMA op. ftynse/water#17
- Wave Dialect MLIR Converter & Emitter Smoketest — Added converter and water emitter to translate Wave kernel traces into MLIR Wave dialect. #273
- Partition Gathers/Scatters Pass — Moved gather/scatter decomposition into a standalone pass for cleaner read/write handlers. #259
Scheduling / Runtime Improvements
- Standalone Multi-Device Runtime Wrapper — Introduces MultiDeviceLaunchable, a minimal API for executing IREE models across multiple GPUs. #222
- Full Trace Preservation After Compilation — WaveKernel now stores full post-pass traces for accurate Wave dialect emission and debugging. #298
Documentation / Developer Experience
- Normal Forms Documentation — Added detailed explanation of normal form concepts, invariants, and pass conditions. ftynse/water#57
- Wave Dialect Python Bindings — Added C API, Python bindings (water_mlir.dialects.wave), and CI integration for build verification. ftynse/water#19
- Debug Env Var for Location Control — Added environment variable to dynamically control debug location levels for easier inspection. #289
Change Log
Git History
What's Changed
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #272
- hip-iree-target parsing non-colon fix by @lamikr in #271
- Remove some unused TK remains. by @Hardcode84 in #268
- Bump the github-actions group with 3 updates by @dependabot[bot] in #276
- Change ROCm7 [docker installation] with TheRock in mi35x CI runner by @sa-faizal in #263
- Replace target-backends with target-device argument by @panditsa in #221
- Scalarize packed math by @Hardcode84 in #274
- [Wave] Linearize shared memory accesses by @raikonenfnu in #275
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #281
- Multibuffer liveness analysis by @Hardcode84 in #250
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #286
- add round op by @saladpalad in #283
- Capture locations for Iterate and Conditional by @willghatch in #290
- Add trace to WaveKernel even when only compile to MLIR by @tyb0807 in #294
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #293
- [Dist.] Standalone multi-device runtime wrapper by @panditsa in #222
- Store full trace after all compilation passes in WaveKernel by @tyb0807 in #298
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #299
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #303
- Bump the github-actions group with 2 updates by @dependabot[bot] in #300
- Add a dedicated pass to partition gathers/scatters by @Hardcode84 in #259
- add debug env var to set default location level by @willghatch in #289
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #305
- Integrate Water sources from a separate repository by @ftynse in #288
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #307
- [CI] Add GitHub Actions workflow: self-hosted RDNA4 by @Megan0704-1 in #295
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #309
- Bump actions/setup-python from 5.6.0 to 6.0.0 in the github-actions group by @dependabot[bot] in #310
- implement broadcasting for SelectOp by @willghatch in #251
- add locations to placeholder and output nodes based on kernel location by @willghatch in #291
- Fix version mismatch after iree-bump by @Megan0704-1 in #319
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #321
- Fixes `group_size_min` in the reordered gemm by @xintin in #316
- [debugging] propagate locations by @willghatch in #297
- [debugging] add location propagation to replace_all_uses_with by @willghatch in #296
- Remove migration notice from README by @harsh-nod in #317
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #324
- add location_check_pass by @willghatch in #292
- [debugging] fix minor nit in test case by @willghatch in #323
- Add support for hardware transpose operation by @ashay in #285
- [CI] Update installation of user space libraries for RDNA4-CI by @Megan0704-1 in #325
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #326
- Added C-style modulo/remainder operator by @nirmie in #322
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #328
- Use dynamic values for AtomicOps by @panditsa in #269
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #331
- [WMMA] v_wmma_f32_16x16x16_f16 type support by @Megan0704-1 in #306
- Bump IREE requirement ...
Wave Release v3.7.0
Highlights in this release
Starting with this release, Wave will adopt a new versioning scheme to align with Shark-AI and IREE release versions. This change is intended to improve cross-project compatibility and simplify dependency management across the ecosystem.
- Previous Wave version: v1.0.2
- New version format: v3.7.0 (aligned with Shark-AI/IREE)
This change does not affect functionality or backwards compatibility, but version numbers going forward will reflect the aligned release cadence.
New Operators and Kernels
- Reciprocal of square root operator (#187)
- Sinh operator (#112)
- RMSNorm Kernel (#100)
- Scatter add operation (#56)
Documentation
- Buffer Loads, Stores, and L1 Cache Swizzling (https://wave-lang.readthedocs.io/en/latest/wave/buffer_and_swizzle.html)
- Debugger use instructions (https://wave-lang.readthedocs.io/en/latest/wave/debugging.html)
- Matrix Addition Example (https://github.com/iree-org/wave/blob/9accd0bc13384c3aecae4154fdd07d04a81069ff/examples/jupyter/matrix_addition.ipynb)
- Convolution 2D docs (https://wave-lang.readthedocs.io/en/latest/wave/conv.html)
- Thread trace documentation (https://wave-lang.readthedocs.io/en/latest/wave/trace.html)
Kernel Optimizations
- Compatibility with tensors from HAL sub-allocation, and scaling in BroadcastOp (#220, #219)
- Changes to Speculative decode kernel (#75, #72)
Compiler Enhancements
- Added conditional barriers to attention schedule (#223)
- Extract dimensions from the input tensors instead of passing them as arguments (#183)
- Compilation time boost (#157)
- Enable async kernel execution with iree runtime and fully switch to Launchable (#11)
- Checks to validate constraints for workgroup and wavefronts (#134)
- New API for IndexMapping (#141, #151)
- Build aplp in setup.py (#103)
- Introduced APIs to support distributed workloads; implementation is ongoing (#191, #245)
Hardware Bring-up
- Introduced FP16 and BF16 CDNA4 (double-rate) MFMA types (#261)
- Unaligned shapes support in gather_to_shared pass (#60)
- Gather to lds swizzling (#149)
- Add scaled_dim/scaled_gemm support for gather_to_lds (#80)
Scheduling Improvements
- Add 4-stage prefetch + multi-buffering schedule for attention (#214)
- Rotating pointers for multi-buffering (#207)
- Add schedule reordering support for BMK,NK->BMN (#20)
- SchedulingType.FOUR_STAGE: GEMM Full Software Pipelining with Initiation Interval 1, Via Multibuffering (#77)
- Include XCD reordering to template and tweaked PingPong Schedule (#76)
Misc/General Updates
- Speculative decode benchmarking (#136)
- Support unaligned and unconstrained shapes in expansion (#155)
- Improve barrier placement pass (#68)
- Enable buffer load for dynamic cases (#79)
Integration
- Wave is now integrated into SGLang as a separate attention backend (sgl-project/sglang#8660)
- MXFP4 GEMM, extend attention kernel, and decode attention kernel integration into SHARK Tank (nod-ai/amd-shark-ai#1777, nod-ai/amd-shark-ai#2140, nod-ai/amd-shark-ai#1957)
Change Log
Git History
- disable writing to cache when there are debug_log operations by @willghatch in #99
- Rename Turbine to Wave by @tgymnich in #110
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #116
- Build with `uv` in GitHub Actions by @paulzzy in #114
- fix signature of debug_log_write by @willghatch in #118
- Fix race condition in release job by @paulzzy in #117
- [Wave] Pad shared memory when total size is not divisible by `GatherToLDS` size by @Hardcode84 in #113
- [Wave] Remove spammy warning by @Hardcode84 in #120
- rename debug_log_write to debug_log by @willghatch in #121
- Migrate MI300 Capacity to new MI325 Capacity. by @deedongala in #131
- Restore build_tools/update_iree_requirement_pins.py by @paulzzy in #130
- modifying prefetch scheduling to have a more appropriate schedule by @bodhisaha in #122
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #132
- Updated usage instructions by @sa-faizal in #73
- Fix logit_cap calculation in paged attention test by @nithinsubbiah in #135
- Fix missing python-version warning by @tgymnich in #66
- Add initial RMSNorm kernel by @adedespirlet in #100
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #137
- [Wave] Add sinh op by @panditsa in #112
- add mappings to debug_log by @willghatch in #128
- [Wave] Cleanup `elements_per_thread` by @Hardcode84 in #138
- Support unaligned and unconstrained shapes in expansion by @nithinsubbiah in #94
- [Wave] Add pass profiling and optimize `expand_graph` pass by @Hardcode84 in #139
- pytorch 2.8 by @tgymnich in #12
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #145
- Re-enable `manylinux` builds with `cibuildwheel` by @paulzzy in #123
- Add devcontainer by @tgymnich in #146
- [WAVE] Convolution 2D docs by @badgerbroch in #107
- NFC move some reordering logic for debug logs by @willghatch in #119
- Support for TK CI manual dispatch. by @xintin in #148
- [Wave] Introduce new API for IndexMapping by @harsh-nod in #141
- [Wave] Build aplp in setup.py by @harsh-nod in #103
- [Wave] More tests cleanup by @Hardcode84 in #30
- Revert "Support unaligned and unconstrained shapes in expansion" by @raikonenfnu in #154
- [Wave] Added bool to float casting by @badgerbroch in #133
- Support and build for Windows by @paulzzy in #115
- [Wave] Optimize `subs_idxc` by @Hardcode84 in #144
- Remove unused code and rename some variables by @harsh-nod in #158
- Update version number to be consistent with pypi by @harsh-nod in #160
- [Wave] More cleanups by @Hardcode84 in #159
- Add clang-format to pre-commit by @Hardcode84 in #156
- add basic checks to validate constraints for workgroup and wavefronts by @ashay in #134
- SchedulingType.FOUR_STAGE: GEMM Full Software Pipelining with Initiation Interval 1, Via Multibuffering by @SourishW in #77
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #161
- Bump pypa/cibuildwheel from 3.1.2 to 3.1.3 in the github-actions group by @dependabot[bot] in #162
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #164
- [Wave] Fix tail-padded memref allocs when `minimize_shared_allocs` is disabled by @Hardcode84 in #150
- [Wave] Gather to lds swizzling by @Hardcode84 in #149
- [debugging] add printer and handler args to debug_log by @willghatch in #153
- [debugging] implement an html generation debug_log viewer by @willghatch in #165
- Install Rust when building `manylinux` wheels by @paulzzy in #167
- [debugging] add dark theme to html_view...
Release v1.0.1
What's Changed
- Warn on leak instead of raising a runtime error by @tgymnich in #65
- Disable large and slow shapes for testAttentionBackward by @tgymnich in #64
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #70
- [Wave] Unaligned shapes support in `gather_to_shared` pass by @Hardcode84 in #60
- [Wave] Update README with quickstart instructions by @harsh-nod in #62
- [Wave] Make thread trace documentation visible in docs by @harsh-nod in #71
- [Wave] fix tutorial ref to tkl by @willghatch in #55
- [Wave] fix gitignore and lit test for gather_to_shared by @raikonenfnu in #78
- added XCD reordering to template and tweaked PingPong Schedule by @bodhisaha in #76
- [Wave] add scaled_dim/scaled_gemm support for gather_to_lds by @raikonenfnu in #80
- [Wave] Enable buffer load for dynamic cases by @raikonenfnu in #79
- [Wave] Improve barrier placement pass by @Hardcode84 in #68
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #81
- [Wave] Remove torch from default requirement by @raikonenfnu in #82
- [Wave] Enable > 4GB bufferOps using resetOffset by @raikonenfnu in #84
- Add scatter_add operation by @adedespirlet in #56
- [Wave] Fix prefetch scheduling for `GatherToLDS` by @Hardcode84 in #85
- [Wave] add debug_log_write op by @willghatch in #74
- [WAVE] Updated wave speculative decode as per the latest flashinfer kernel updates by @xintin in #72
- [WAVE] Merge two kernels into one in wave speculative decode by @xintin in #75
- [Wave] Register CustomOp to wave_lang S.T Wave+Turbine can live together by @raikonenfnu in #86
- Python 3.13 Support by @tgymnich in #89
- NFC: Fix type annotations for return values in attention kernels by @ftynse in #90
- Recover location info from MLIR after Water roundtrip by @ftynse in #83
- MoE kernel by @tyb0807 in #13
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #98
- Temporary fix for math library loading while using cache by @yichiche in #96
- add constraints to the cache key by @ashay in #91
- support dynamic shapes in debug_log_write by @willghatch in #102
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #108
- Deprecate vector_d.splat by @tgymnich in #109
- Bump minimum torch version to 2.6 by @paulzzy in #104
- Remove unused file with unused `def_library` by @paulzzy in #111
New Contributors
- @tgymnich made their first contribution in #65
- @iree-pr-automator[bot] made their first contribution in #70
- @bodhisaha made their first contribution in #76
- @adedespirlet made their first contribution in #56
- @ftynse made their first contribution in #90
- @yichiche made their first contribution in #96
- @paulzzy made their first contribution in #104
Full Changelog: v1.0.0-beta.1...v1.0.1
v1.0.0-beta.1
Version 1.0.0-beta.1
dev-wheels
Automatic nightly release of wave-lang python wheels.