Releases · ROCm/rocMLIR

26 Nov 04:47

rocm-ci

rocm-7.1.1

3d7e854

rocm-7.1.1 Latest

ROCm release v7.1.1

04 Nov 23:50

causten

rocm-7.1

3d7e854

rocm-7.1

What's Changed

Fix multi buffer test on gfx950 by @djramic in #1912
Update Docker image to 6.4.2 rocm version by @stefankoncarevic in #1916
Fix Docker image tag (rocm6.4 instead of rocm6.4.2) by @stefankoncarevic in #1917
Allow softmax type conversion to happen before or after elementwise ops in attention by @umangyadav in #1911
Remove GPUToMIGraphX passes by @umangyadav in #1921
Find first gemm index after fusing linalg.generic ops by @dhernandez0 in #1922
Obtain new Tier1 tuning problems from MIGraphX by @aarushjain29 in #1873
[CI] Improve error handling and validation in Jenkins pipeline, tuna-script and tuningRunner by @dorde-antic in #1913
[CI] Increases the number of lit workers based on the GPU arch. by @stefankoncarevic in #1919
tuna-script: validate tuning file for the presence of data by @dorde-antic in #1929
Fix llvm::SmallVector error on x86 MSVC by @sooknarine in #1932
Remove redundant attributes from Rock ops by @justinrosner in #1910
Update minCU count for MI308 by @umangyadav in #1927
Introduce new quick tune lists based on Tier1 configs and separated by architecture by @mirza-halilcevic in #1907
Fix wrong check in rocmlir-gen and other bugs in perfRunner by @dhernandez0 in #1936
Parameter Sweeps for Attention: Check all outputs, log failures, avoid kernel repeats by @dorde-antic in #1914
[EXTERNAL] Cherry pick fix for const folding of immediate args by @umangyadav in #1939
Use target branch for the premerge checks by @umangyadav in #1942
[DO NOT SQUASH] rock.global_load_to_lds for direct to LDS by @dhernandez0 in #1905
Align parameterSweeps with new layout handling in perfRunner by @dorde-antic in #1940
Update tests to exclude unsupported tests on Navi2x by @umangyadav in #1943
Change SeqLen to 384 from 1 in accuracy checker scripts by @umangyadav in #1948
Add regularization for multiple linalgs in preSoftmaxBody in Attention Ops by @umangyadav in #1950
Fix multi_buffer LIT test and correct lit.cfg files by @justinrosner in #1952
Fix incorrect fusion check by @justinrosner in #1956
Add backwards data convolution op to MIGraphX dialect by @justinrosner in #1946
Changed node selection by @leo-amd in #1881
Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops by @umangyadav in #1960
Fix rocmlir-gen device selection by @djramic in #1964
Refactor BlockwiseGemmAccelOp to take registers as well by @dhernandez0 in #1926
Upstream merge 56 by @dorde-antic in #1934
Jenkins: Robust SCM checkout (handles shallow fetch) + clearer stage layout by @leo-amd in #1963
Update mixr-gemm-gemm tests for unsupported arch by @justinrosner in #1968
Add ninja compile and link pools by @trixirt in #1953
CI: Do not reboot nodes on stage failures by @leo-amd in #1970
CI: Add Retry Logic for SCM Network Failures by @leo-amd in #1971
CI: Enable Fail-Fast for Parallel Pipeline Stages by @leo-amd in #1972
Use migraphx image in migraphx CI stage by @mirza-halilcevic in #1976
Attention: split-kv implementation by @dhernandez0 in #1895
Add/update getEffects for Rock ops by @justinrosner in #1959
Improve GPU results validation for subnorm values by @justinrosner in #1962
gemm+gemm split-k by @dhernandez0 in #1969
Allow split-k for bwd-weight ops by @justinrosner in #1955
CI: Improve resiliency by retrying stages on agent failure by @leo-amd in #1981
Remove irrelevant outdated examples by @umangyadav in #1985
Implement python script for handling new configs by @dorde-antic in #1924
Remove reverse_grid by @dhernandez0 in #1987
Add remove alloc pass by @justinrosner in #1992
Add TosaToRock support for transpose_conv2d by @justinrosner in #1951
Revert workaround for createFirstGemmNegInfPadding on gfx11 by @dhernandez0 in #1993
Use real data type after input fusions in attention using getInputFusionElementType by @pabloantoniom in #1982
Improve tuning-driver by @mirza-halilcevic in #1966
Update conv creation to use prefill flags by @justinrosner in #1949
Python tidy and formatter by @dorde-antic in #1978
Update Dockerfiles to rocm 7.0 by @djramic in #1991
Group Query Attention (GQA) optimization by @dhernandez0 in #1984
Fix recursion error in parameterSweeps by @justinrosner in #1995
September Upstream merge by @umangyadav in #1974
Add verifier for migraphx.reshape by @justinrosner in #1999
[EXTERNAL] Fix v_mov_b16_t16 index in folding pass by @justinrosner in #2011
Fix silent parameterSweeps errors and issues in V4R1 path by @justinrosner in #2009
Update MI350 quick-tune lists by @mirza-halilcevic in #2008
CI: Exclude f32 Attention Configs for Navi by @dorde-antic in #2003
Move CSE out of MIGraphXToTosaPass by @justinrosner in #2012
Add LIT test for gfx1201 backend bug by @justinrosner in #2018
[EXTERNAL] Undo changes in AMDGPUPromoteAlloca in order to unblock our CI by @pabloantoniom in #2028
[7.1][EXTERNAL][SROA] Add Stored Value Size Check for Tree-Structured Merge by @justinrosner in #2044

New Contributors

@sooknarine made their first contribution in #1932

Full Changelog: rocm-7.0.2...rocm-7.1

Contributors

sooknarine, trixirt, and 10 other contributors

14 Oct 23:59

causten

rocm-7.0.2

d0bcd5c

rocm-7.0.2

What's Changed

[7.0][UPSTREAM BACKPORT] Fix runtime unrolling when cascaded GEPs present by @justinrosner in #1996

Full Changelog: rocm-7.0.1...rocm-7.0.2

Contributors

justinrosner

19 Sep 17:29

causten

rocm-7.0.1

ac10652

rocm-7.0.1

What's Changed

Add E2E test for the OCP Fp8 fused kernel with QuantizeLinear and DeQuantizeLinear by @umangyadav in #1747
[TOSA] Set accType to Float16 for the Fp8 types by @umangyadav in #1745
Remove scheduling barrier hack for LDS barrier lowering by @dhernandez0 in #1749
Fixes for group conv emit-key by @dhernandez0 in #1748
Fix performance for non-standard layouts by @dhernandez0 in #1741
[6.4]Fix bug when both A and B are broadcasted (FoldBroadcast pass) by @dhernandez0 in #1744
[TOSA] Fix accType for the Quant Convolutions as well by @umangyadav in #1752
[6.4] Update gfx12 target in AmdArchDB by @TedThemistokleous in #1746
Add Fp8 to quick-tuning by @djramic in #1753
Add bf16 to tuning runner by @djramic in #1739
Enable output swizzle for multiple outputs by @dhernandez0 in #1750
Use AddDim for unit input dimensions to help getMaxVectorization() by @dhernandez0 in #1755
[DO NOT SQUASH] Enable atomic add bf16 reduction and split-k for Navi4x by @dhernandez0 in #1732
Enable bf16 atomic add for gfx950 by @dhernandez0 in #1734
Add test from SWDEV-518130 by @dhernandez0 in #1757
[6.4]fix compilation with HIP SDK 6.3 for Windows by @apwojcik in #1742
Add lookup for more layouts in PerfRunner and Add an option for verifying each perfConfig with tuningRunner by @umangyadav in #1758
Rocmlir tuning driver datatype fix by @dorde-antic in #1761
[CI] Added gfx942 architecture to the 'Tune MLIR kernels' stage by @stefankoncarevic in #1733
Fix dependency graph creation in RockPipeline and not generate loops with negative iterations by @umangyadav in #1760
Fix GlobalLoad 4b lowering by @dhernandez0 in #1764
Improve performance of quantizelinear for int4 by @dhernandez0 in #1706
Add fp8 convolution to the tuning runner by @djramic in #1738
Introduce perfConfig V3 with param to select different schedule by @umangyadav in #1767
Support for causal attention and more strict checks for KV-Cache by @dhernandez0 in #1770
Fix generateMlirDriverCommandLine for attention in perfRunner by @dhernandez0 in #1773
Remove hasValidChip() from ConvGenerator by @dorde-antic in #1771
Use MLIR based kernels for verification in MIGraphX stage by @umangyadav in #1766
[DO NOT SQUASH] March LLVM upstream merge by @dhernandez0 in #1763
Add requirements.txt file and modify Dockerfile by @dorde-antic in #1776
Add checks for uid and devices by @causten in #1777
Fix Dockerfile URL for requirements.txt by @stefankoncarevic in #1778
Adjust Dockerfile for Separate hip-python Installation by @stefankoncarevic in #1781
Skip unsupported datatypes in perfRunner by @djramic in #1780
Fix initialization for split-k by @dhernandez0 in #1784
Use hip-python API instead of rocm_agent_enumerator by @dorde-antic in #1762
Recover split-k fusion tests removed in last upstream merge by @dhernandez0 in #1785
Add hip-python to requirements.txt and update LLVM version by @dhernandez0 in #1787
Fix split-k fusion when there are two or more consecutive linalg.generic ops by @dhernandez0 in #1782
Remove Machine Names Due to Security Team Advisory by @stefankoncarevic in #1788
Remove fp8 check on nightly CI. by @stefankoncarevic in #1789
[DO NOT SQUASH] upstream merge for sprint 48 by @dhernandez0 in #1786
Move requirements.txt -> pip_requirements.txt due to issues with cget by @dhernandez0 in #1792
Python script for testing metrics and plotting correlations by @dorde-antic in #1769
Fix attention bugs (swap thread and iter when Q LDS is bypassed and bf16 tests) by @dhernandez0 in #1797
Sort Dimensions based on Layout in case of input fusion by @umangyadav in #1793
Fix kernel generation when kernelRepeats are more than 1 by @umangyadav in #1799
Workaround issue 1802 by @dhernandez0 in #1800
Add Gemm+Elementwise+Gemm support by @dhernandez0 in #1774
Add dependencies for rocprofv3 by @djramic in #1801
Remove perfTest from Jenkins by @dhernandez0 in #1803
Add Tier1 model configs to rocMLIR by @dorde-antic in #1794
GEMM+GEMM migraphx integration by @dhernandez0 in #1791
Fix for issue 1802 workaround by @dhernandez0 in #1806
Update MI300 quick-tuning list by @mirza-halilcevic in #1765
gemm+gemm: extend allowed types by @dhernandez0 in #1795
Bump Dockerfiles to rocm-6.4 by @dorde-antic in #1808
Disable code coverage on nightly and weekly CI, and expand it to run on WMMA by @mirza-halilcevic in #1813
Fix grep ROCM_VERSION in Docker image build by @djramic in #1814
Add GEMM scheduleV2 by @umangyadav in #1772
Modify Tier1 models tuning problems by @dorde-antic in #1810
Prepare Jenkinsfiles for rocm-6.4 by @dorde-antic in #1809
Remove unused files by @dhernandez0 in #1804
Update AmdArchDb.cpp with gfx950 target info by @mirza-halilcevic in #1802
Add pybind11 to pip_requirements.txt by @mirza-halilcevic in #1816
Use migraphx.greater instead of migraphx.greater_or_equal by @dhernandez0 in #1827
Change rounding mode for FP32 to Fp16 truncation by @umangyadav in #1833
Implement with_attn_bias in AttentionConfiguration by @dorde-antic in #1834
Add rocprofv3 to perfRunner by @djramic in #1779
Fix rocm version in migraphx CI docker image by @djramic in #1837
Upstream merge sprint 50 by @djramic in #1815
[CI] Set 3600s test timeout and update LIT worker configuration by @stefankoncarevic in #1832
Remove hardcoded value for render group id in Dockerfile by @umangyadav in #1839
add back render group but do not assign GID by @umangyadav in #1843
Causal attention by @dhernandez0 in #1829
Correct rocprof invocation in fusion benchmarking path. by @stefankoncarevic in #1841
conv+gemm support by @dhernandez0 in #1820
Problem config for tier 1 models by @aarushjain29 in #1836
conv+gemm migraphx integration by @dhernandez0 in #1823
Separate new Tier1 tuning problems by @dorde-antic in #1849
Disable test temporarily to pass CI by @umangyadav in #1850
Implement GQA in AttentionConfiguration by @dorde-antic in #1847
Correct layout map access in MLIROnlyConfig by @stefankoncarevic in #1855
Add missing LDS barriers to attention by @dhernandez0 in #1853
Causal masking: migraphx integration by @dhernandez0 in #1831
Updated ATTN_TEST_PARAMETERS in reportUtils.py by @stefankoncarevic in #1858
[CLONE] Add CI node checks and retries. Refactored the pipeline to resolve compilation errors and address incorrect syntax by @umangyadav in #1835
Modify CI to use Tier1 and rotate through configs by @dorde-antic in #1840
Allow retries for failing tests / Remove failing tests by @dorde-antic in #1819
Print rocm version and permissions for /dev/dri /dev/kfd by @umangyadav in https://github.com/ROCm/rocM...

Contributors

causten, dhernandez0, and 9 other contributors

07 Aug 14:56

causten

rocm-6.4.3

88b9b7c

rocm-6.4.3

What's Changed

No changes since rocm-6.4.2

21 Jul 19:55

causten

rocm-6.4.2

88b9b7c

rocm-6.4.2

What's Changed

[6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
[6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
[HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
[BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
[BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825

Full Changelog: rocm-6.4.0...rocm-6.4.2

Contributors

umangyadav and mirza-halilcevic

20 May 15:53

causten

rocm-6.4.1

88b9b7c

rocm-6.4.1

What's Changed

[6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
[6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
[HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
[BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
[BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825

Full Changelog: rocm-6.4.0...rocm-6.4.1

Contributors

umangyadav and mirza-halilcevic

11 Apr 14:56

causten

rocm-6.4.0

25f6176

rocm-6.4.0

What's Changed

Fix crash with invalid !migraphx.shaped types by @krzysz00 in #1667
[CI] External CI mainline build support by @amd-jmacaran in #1670
Don't construct Embed{}s for 1x1 filters in convolutions by @krzysz00 in #1669
[CI] Update Dockerfile to use Ubuntu 22.04 by @stefankoncarevic in #1662
Lower minNumCUs for gfx11 as gfx1103 has 12 CUs only by @umangyadav in #1673
Fix alignment constraints not being correctly imposed in certain vect… by @krzysz00 in #1674
[CI] Update Dockerfile to set ONNX version to 1.14.1 by @stefankoncarevic in #1676
Fix not enabling fp8 WMMA on Navi4 by default by @krzysz00 in #1677
Use blockwise_broadcast_reduce in reduction fusions. by @manupak in #1668
Fix gated threadwise_write_all by @dhernandez0 in #1683
Fix int4 loads to be vector typed always by @manupak in #1682
Remove unnecessary pass from a test by @manupak in #1688
navi4x tests fail with mixed types bf8_fp8 by @dhernandez0 in #1684
[DO NOT SQUASH] Move to new-style atomic safety annotations by @krzysz00 in #1678
Fix hardcoded arguments and results ids for prefill by @dhernandez0 in #1687
find BlockArgument from gemm output going through all view-like operations by @dhernandez0 in #1690
Collected small Jenkinsfile improvements by @pcf000 in #1686
[CI] Add support for Navi4x architecture in nightly CI pipeline. by @stefankoncarevic in #1599
Fix conv1d bug and improve MIGraphXToTosa test coverage by @dhernandez0 in #1693
Support signed and unsigned integer types in migraphx dialect by @dhernandez0 in #1692
Fix conversion of quantizelinear for unsigned types by @dhernandez0 in #1694
Replace myself with Chris in CODEOWNERS by @jerryyin in #1698
Fix rocmlir-gen attention i8 verification bug by @dhernandez0 in #1697
Workaround for issue 1661 by @dhernandez0 in #1699
[CI] Refactor MIGraphX model testing with Jenkins credential access. by @stefankoncarevic in #1671
Remove Simon from CODEOWNERS by @dhernandez0 in #1702
Add GQA and KV Cache by @dhernandez0 in #1696
Prepare Dockerfiles for rocm-6.3 by @umangyadav in #1704
[CI] Updated Jenkins files to use rocm-6.3 by @stefankoncarevic in #1705
Move license file to top-level by @darren-amd in #1715
GridwiseGemmParams: fix compile error with LLVM libc++ due to missing const by @LunNova in #1708
Upstream merge Nov 24 by @djramic in #1703
Add a script for generating the quick-tuning perfconfigs list by @djramic in #1689
add fp8_fp8 in perfRunner by @umangyadav in #1707
Removed dummy target from LinalgNamedStructuredOps yamlgen by @stefankoncarevic in #1717
Set rock.prefill type to the blockargument type instead of gemm output type by @dhernandez0 in #1721
[CI] Revert multi-step execution in Navi3x nightly E2E tests by @stefankoncarevic in #1719
Split-k fusions by @dhernandez0 in #1718
Fuse reduce sum with split-k by @dhernandez0 in #1720
Fix usage of llvm::reverse() and remove warnings by @dhernandez0 in #1724
Allow OCP FP8 emulation by @umangyadav in #1716
Sort selected quick-tuning perfconfigs by problem coverage by @djramic in #1723
Enable f16 sum reduction by @dhernandez0 in #1722
[DRAFT]Add support for bf16 attention by @djramic in #1710
Remove kpack from the decision of how many elements to copy per thread by @dhernandez0 in #1714
Fix tuning for split-k fusions by @dhernandez0 in #1725
Support for gfx950 arch by @dhernandez0 in #1731
Upstream Merge [Jan] by @stefankoncarevic in #1728
Explicitly convert to char by @Xeonacid in #1709
Fix build failures by @umangyadav in #1735
Enable e2e fusion bf16 tests on gfx11. by @stefankoncarevic in #1736
KV-cache MIGraphX integration by @dhernandez0 in #1729
Enable dense_output_bf16 test, adjust build functions for Navi3x by @stefankoncarevic in #1737
[6.4][Backport] Backport some bugfixes by @dhernandez0 in #1754
[6.4][BACKPORT] [TOSA] Set accType to Float16 for the Fp8 types by @umangyadav in #1751
[6.4][BACKPORT] Use AddDim for unit input dimensions to help getMaxVectorization() by @umangyadav in #1756
[6.4]fix compilation with HIP SDK 6.3 for Windows (#1742) by @causten in #1759