Releases: ROCm/rocMLIR
rocm-7.1.1
ROCm release v7.1.1
rocm-7.1
What's Changed
- Fix multi buffer test on gfx950 by @djramic in #1912
- Update Docker image to 6.4.2 rocm version by @stefankoncarevic in #1916
- Fix Docker image tag (rocm6.4 instead of rocm6.4.2) by @stefankoncarevic in #1917
- Allow softmax type conversion to happen before or after elementwise ops in attention by @umangyadav in #1911
- Remove GPUToMIGraphX passes by @umangyadav in #1921
- Find first gemm index after fusing linalg.generic ops by @dhernandez0 in #1922
- Obtain new Tier1 tuning problems from MIGraphX by @aarushjain29 in #1873
- [CI] Improve error handling and validation in Jenkins pipeline, tuna-script and tuningRunner by @dorde-antic in #1913
- [CI] Increases the number of lit workers based on the GPU arch. by @stefankoncarevic in #1919
- tuna-script: validate tuning file for the presence of data by @dorde-antic in #1929
- Fix llvm::SmallVector error on x86 MSVC by @sooknarine in #1932
- Remove redundant attributes from Rock ops by @justinrosner in #1910
- Update minCU count for MI308 by @umangyadav in #1927
- Introduce new quick tune lists based on Tier1 configs and separated by architecture by @mirza-halilcevic in #1907
- Fix wrong check in rocmlir-gen and other bugs in perfRunner by @dhernandez0 in #1936
- Parameter Sweeps for Attention: Check all outputs, log failures, avoid kernel repeats by @dorde-antic in #1914
- [EXTERNAL] Cherry pick fix for const folding of immediate args by @umangyadav in #1939
- Use target branch for the premerge checks by @umangyadav in #1942
- [DO NOT SQUASH] rock.global_load_to_lds for direct to LDS by @dhernandez0 in #1905
- Align parameterSweeps with new layout handling in perfRunner by @dorde-antic in #1940
- Update tests to excludes unsupported tests on Navi2x by @umangyadav in #1943
- Change SeqLen to 384 from 1 in accuracy checker scripts by @umangyadav in #1948
- Add regularization for multiple linalgs in preSoftmaxBody in Attention Ops by @umangyadav in #1950
- Fix multi_buffer LIT test and correct lit.cfg files by @justinrosner in #1952
- Fix incorrect fusion check by @justinrosner in #1956
- Add backwards data convolution op to MIGraphX dialect by @justinrosner in #1946
- Changed node selection by @leo-amd in #1881
- Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops by @umangyadav in #1960
- Fix rocmlir-gen device selection by @djramic in #1964
- Refactor BlockwiseGemmAccelOp to take registers as well by @dhernandez0 in #1926
- Upstream merge 56 by @dorde-antic in #1934
- Jenkins: Robust SCM checkout (handles shallow fetch) + clearer stage layout by @leo-amd in #1963
- Update mixr-gemm-gemm tests for unsupported arch by @justinrosner in #1968
- Add ninja compile and link pools by @trixirt in #1953
- CI. Do not reboot nodes on stages failures by @leo-amd in #1970
- CI: Add Retry Logic for SCM Network Failures by @leo-amd in #1971
- CI: Enable Fail-Fast for Parallel Pipeline Stages by @leo-amd in #1972
- Use migraphx image in migraphx CI stage by @mirza-halilcevic in #1976
- Attention: split-kv implementation by @dhernandez0 in #1895
- Add/update getEffects for Rock ops by @justinrosner in #1959
- Improve GPU results validation for subnorm values by @justinrosner in #1962
- gemm+gemm split-k by @dhernandez0 in #1969
- Allow split-k for bwd-weight ops by @justinrosner in #1955
- CI: Improve resiliency by retrying stages on agent failure by @leo-amd in #1981
- Remove irrelevant outdated examples by @umangyadav in #1985
- Implement python script for handling new configs by @dorde-antic in #1924
- Remove reverse_grid by @dhernandez0 in #1987
- Add remove alloc pass by @justinrosner in #1992
- Add TosaToRock support for transpose_conv2d by @justinrosner in #1951
- Revert workaround for createFirstGemmNegInfPadding on gfx11 by @dhernandez0 in #1993
- Use real data type after input fusions in attention using getInputFusionElementType by @pabloantoniom in #1982
- Improve tuning-driver by @mirza-halilcevic in #1966
- Update conv creation to use prefill flags by @justinrosner in #1949
- Python tidy and formatter by @dorde-antic in #1978
- Update Dockerfiles to rocm 7.0 by @djramic in #1991
- Group Query Attention (GQA) optimization by @dhernandez0 in #1984
- Fix recursion error in parameterSweeps by @justinrosner in #1995
- September Upstream merge by @umangyadav in #1974
- Add verifier for migraphx.reshape by @justinrosner in #1999
- [EXTERNAL] Fix v_mov_b16_t16 index in folding pass by @justinrosner in #2011
- Fix silent parameterSweeps errors and issues in V4R1 path by @justinrosner in #2009
- Update MI350 quick-tune lists by @mirza-halilcevic in #2008
- CI: Exclude f32 Attention Configs for Navi by @dorde-antic in #2003
- Move CSE out of MIGraphXToTosaPass by @justinrosner in #2012
- Add LIT test for gfx1201 backend bug by @justinrosner in #2018
- [EXTERNAL] Undo changes in AMDGPUPromoteAlloca in order to unblock our CI by @pabloantoniom in #2028
- [7.1][EXTERNAL][SROA] Add Stored Value Size Check for Tree-Structured Merge by @justinrosner in #2044
New Contributors
- @sooknarine made their first contribution in #1932
Full Changelog: rocm-7.0.2...rocm-7.1
rocm-7.0.2
What's Changed
- [7.0][UPSTREAM BACKPORT] Fix runtime unrolling when cascaded GEPs present by @justinrosner in #1996
Full Changelog: rocm-7.0.1...rocm-7.0.2
rocm-7.0.1
What's Changed
- Add E2E test for the OCP Fp8 fused kernel with QuantizeLinear and DeQuantizeLinear by @umangyadav in #1747
- [TOSA] Set accType to Float16 for the Fp8 types by @umangyadav in #1745
- Remove scheduling barrier hack for LDS barrier lowering by @dhernandez0 in #1749
- Fixes for group conv emit-key by @dhernandez0 in #1748
- Fix performance for non-standard layouts by @dhernandez0 in #1741
- [6.4]Fix bug when both A and B are broadcasted (FoldBroadcast pass) by @dhernandez0 in #1744
- [TOSA] Fix accType for the Quant Convolutions as well by @umangyadav in #1752
- [6.4] Update gfx12 target in AmdArchDB by @TedThemistokleous in #1746
- Add Fp8 to quick-tuning by @djramic in #1753
- Add bf16 to tuning runner by @djramic in #1739
- Enable output swizzle for multiple outputs by @dhernandez0 in #1750
- Use AddDim for unit input dimensions to help getMaxVectorization() by @dhernandez0 in #1755
- [DO NOT SQUASH] Enable atomic add bf16 reduction and split-k for Navi4x by @dhernandez0 in #1732
- Enable bf16 atomic add for gfx950 by @dhernandez0 in #1734
- Add test from SWDEV-518130 by @dhernandez0 in #1757
- [6.4]fix compilation with HIP SDK 6.3 for Windows by @apwojcik in #1742
- Add lookup for more layouts in PerfRunner and Add an option for verifying each perfConfig with tuningRunner by @umangyadav in #1758
- Rocmlir tuning driver datatype fix by @dorde-antic in #1761
- [CI] Added gfx942 architecture to the 'Tune MLIR kernels' stage by @stefankoncarevic in #1733
- Fix dependency graph creation in RockPipeline and not generate loops with negative iterations by @umangyadav in #1760
- Fix GlobalLoad 4b lowering by @dhernandez0 in #1764
- Improve performance of quantizelinear for int4 by @dhernandez0 in #1706
- Add fp8 convolution to the tuning runner by @djramic in #1738
- Introduce perfConfig V3 with param to select different schedule by @umangyadav in #1767
- Support for causal attention and more strict checks for KV-Cache by @dhernandez0 in #1770
- Fix generateMlirDriverCommandLine for attention in perfRunner by @dhernandez0 in #1773
- Remove hasValidChip() from ConvGenerator by @dorde-antic in #1771
- Use MLIR based kernels for verification in MIGraphX stage by @umangyadav in #1766
- [DO NOT SQUASH] March LLVM upstream merge by @dhernandez0 in #1763
- Add requirements.txt file and modify Dockerfile by @dorde-antic in #1776
- Add checks for uid and devices by @causten in #1777
- Fix Dockerfile URL for requirements.txt by @stefankoncarevic in #1778
- Adjust Dockerfile for Separate hip-python Installation by @stefankoncarevic in #1781
- Skip unsupported datatypes in perfRunner by @djramic in #1780
- Fix initialization for split-k by @dhernandez0 in #1784
- Use hip-python API instead of rocm_agent_enumerator by @dorde-antic in #1762
- Recover split-k fusion tests removed in last upstream merge by @dhernandez0 in #1785
- Add hip-python to requirements.txt and update LLVM version by @dhernandez0 in #1787
- Fix split-k fusion when there are two or more consecutive linalg.generic ops by @dhernandez0 in #1782
- Remove Machine Names Due to Security Team Advisory by @stefankoncarevic in #1788
- Remove fp8 check on nightly CI. by @stefankoncarevic in #1789
- [DO NOT SQUASH] upstream merge for sprint 48 by @dhernandez0 in #1786
- Move requirements.txt -> pip_requirements.txt due to issues with cget by @dhernandez0 in #1792
- Python script for testing metrics and plotting correlations by @dorde-antic in #1769
- Fix attention bugs (swap thread and iter when Q LDS is bypassed and bf16 tests) by @dhernandez0 in #1797
- Sort Dimensions based on Layout in case of input fusion by @umangyadav in #1793
- Fix kernel generation when kernelRepeats are more than 1 by @umangyadav in #1799
- Workaround issue 1802 by @dhernandez0 in #1800
- Add Gemm+Elementwise+Gemm support by @dhernandez0 in #1774
- Add dependencies for rocprofv3 by @djramic in #1801
- Remove perfTest from Jenkins by @dhernandez0 in #1803
- Add Tier1 model configs to rocMLIR by @dorde-antic in #1794
- GEMM+GEMM migraphx integration by @dhernandez0 in #1791
- Fix for issue 1802 workaround by @dhernandez0 in #1806
- Update MI300 quick-tuning list by @mirza-halilcevic in #1765
- gemm+gemm: extend allowed types by @dhernandez0 in #1795
- Bump Dockerfiles to rocm-6.4 by @dorde-antic in #1808
- Disable code coverage on nightly and weekly CI, and expand it to run on WMMA by @mirza-halilcevic in #1813
- Fix grep ROCM_VERSION in Docker image build by @djramic in #1814
- Add GEMM scheduleV2 by @umangyadav in #1772
- Modify Tier1 models tuning problems by @dorde-antic in #1810
- Prepare Jenkinsfiles for rocm-6.4 by @dorde-antic in #1809
- Remove unused files by @dhernandez0 in #1804
- Update AmdArchDb.cpp with gfx950 target info by @mirza-halilcevic in #1802
- Add pybind11 to pip_requirements.txt by @mirza-halilcevic in #1816
- Use migraphx.greater instead of migraphx.greater_or_equal by @dhernandez0 in #1827
- Change rounding mode for FP32 to Fp16 truncation by @umangyadav in #1833
- Implement with_attn_bias in AttentionConfiguration by @dorde-antic in #1834
- Add rocprofv3 to perfRunner by @djramic in #1779
- Fix rocm version in migraphx CI docker image by @djramic in #1837
- Upstream merge sprint 50 by @djramic in #1815
- [CI] Set 3600s test timeout and update LIT worker configuration by @stefankoncarevic in #1832
- Remove hardcoded value for render group id in Dockerfile by @umangyadav in #1839
- add back render group but do not assign GID by @umangyadav in #1843
- Causal attention by @dhernandez0 in #1829
- Correct rocprof invocation in fusion benchmarking path. by @stefankoncarevic in #1841
- conv+gemm support by @dhernandez0 in #1820
- Problem config for tier 1 models by @aarushjain29 in #1836
- conv+gemm migraphx integration by @dhernandez0 in #1823
- Separate new Tier1 tuning problems by @dorde-antic in #1849
- Disable test temporarily to pass CI by @umangyadav in #1850
- Implement GQA in AttentionConfiguration by @dorde-antic in #1847
- Correct layout map access in MLIROnlyConfig by @stefankoncarevic in #1855
- Add missing LDS barriers to attention by @dhernandez0 in #1853
- Causal masking: migraphx integration by @dhernandez0 in #1831
- Updated ATTN_TEST_PARAMETERS in reportUtils.py by @stefankoncarevic in #1858
- [CLONE] Add CI node checks and retries. Refactored the pipeline to resolve compilation errors and address incorrect syntax by @umangyadav in #1835
- Modify CI to use Tier1 and rotate through configs by @dorde-antic in #1840
- Allow retries for failing tests / Remove failing tests by @dorde-antic in #1819
- Print rocm version and permissions for /dev/dri /dev/kfd by @umangyadav in https://github.com/ROCm/rocM...
rocm-6.4.3
What's Changed
- No changes since rocm-6.4.2
rocm-6.4.2
What's Changed
- [6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
- [6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
- [HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
- [BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
- [BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825
Full Changelog: rocm-6.4.0...rocm-6.4.2
rocm-6.4.1
What's Changed
- [6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
- [6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
- [HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
- [BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
- [BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825
Full Changelog: rocm-6.4.0...rocm-6.4.1
rocm-6.4.0
What's Changed
- Fix crash with invalid !migraphx.shaped types by @krzysz00 in #1667
- [CI] External CI mainline build support by @amd-jmacaran in #1670
- Don't construct Embed{}s for 1x1 filters in convolutions by @krzysz00 in #1669
- [CI] Update Dockerfile to use Ubuntu 22.04 by @stefankoncarevic in #1662
- Lower minNumCUs for gfx11 as gfx1103 has 12 CUs only by @umangyadav in #1673
- Fix alignment constraints not being correctly imposed in certain vect… by @krzysz00 in #1674
- [CI] Update Dockerfile to set ONNX version to 1.14.1 by @stefankoncarevic in #1676
- Fix not enabling fp8 WMMA on Navi4 by default by @krzysz00 in #1677
- Use blockwise_broadcast_reduce in reduction fusions. by @manupak in #1668
- Fix gated threadwise_write_all by @dhernandez0 in #1683
- Fix int4 loads to be vector typed always by @manupak in #1682
- Remove unnecessary pass from a test by @manupak in #1688
- navi4x tests fail with mixed types bf8_fp8 by @dhernandez0 in #1684
- [DO NOT SQUASH] Move to new-style atomic safety annotations by @krzysz00 in #1678
- Fix hardcoded arguments and results ids for prefill by @dhernandez0 in #1687
- find BlockArgument from gemm output going through all view-like operations by @dhernandez0 in #1690
- Collected small Jenkinsfile improvements by @pcf000 in #1686
- [CI] Add support for Navi4x architecture in nightly CI pipeline. by @stefankoncarevic in #1599
- Fix conv1d bug and improve MIGraphXToTosa test coverage by @dhernandez0 in #1693
- Support signed and unsigned integer types in migraphx dialect by @dhernandez0 in #1692
- Fix conversion of quantizelinear for unsigned types by @dhernandez0 in #1694
- Replace myself with Chris in CODEOWNERS by @jerryyin in #1698
- Fix rocmlir-gen attention i8 verification bug by @dhernandez0 in #1697
- Workaround for issue 1661 by @dhernandez0 in #1699
- [CI] Refactor MIGraphX model testing with Jenkins credential access. by @stefankoncarevic in #1671
- Remove Simon from CODEOWNERS by @dhernandez0 in #1702
- Add GQA and KV Cache by @dhernandez0 in #1696
- Prepare Dockerfiles for rocm-6.3 by @umangyadav in #1704
- [CI] Updated Jenkins files to use rocm-6.3 by @stefankoncarevic in #1705
- Move license file to top-level by @darren-amd in #1715
- GridwiseGemmParams: fix compile error with LLVM libc++ due to missing const by @LunNova in #1708
- Upstream merge Nov 24 by @djramic in #1703
- Add a script for generating the quick-tuning perfconfigs list by @djramic in #1689
- add fp8_fp8 in perfRunner by @umangyadav in #1707
- Removed dummy target from LinalgNamedStructuredOps yamlgen by @stefankoncarevic in #1717
- Set rock.prefill type to the blockargument type instead of gemm output type by @dhernandez0 in #1721
- [CI] Revert multi-step execution in Navi3x nightly E2E tests by @stefankoncarevic in #1719
- Split-k fusions by @dhernandez0 in #1718
- Fuse reduce sum with split-k by @dhernandez0 in #1720
- Fix usage of llvm::reverse() and remove warnings by @dhernandez0 in #1724
- Allow OCP FP8 emulation by @umangyadav in #1716
- Sort selected quick-tuning perfconfigs by problem coverage by @djramic in #1723
- Enable f16 sum reduction by @dhernandez0 in #1722
- [DRAFT]Add support for bf16 attention by @djramic in #1710
- Remove kpack from the decision of how many elements to copy per thread by @dhernandez0 in #1714
- Fix tuning for split-k fusions by @dhernandez0 in #1725
- Support for gfx950 arch by @dhernandez0 in #1731
- Upstream Merge [Jan] by @stefankoncarevic in #1728
- Explicitly convert to char by @Xeonacid in #1709
- Fix build failures by @umangyadav in #1735
- Enable e2e fusion bf16 tests on gfx11. by @stefankoncarevic in #1736
- KV-cache MIGraphX integration by @dhernandez0 in #1729
- Enable dense_output_bf16 test, adjust build functions for Navi3x by @stefankoncarevic in #1737
- [6.4][Backport] Backport some bugfixes by @dhernandez0 in #1754
- [6.4][BACKPORT] [TOSA] Set accType to Float16 for the Fp8 types by @umangyadav in #1751
- [6.4][BACKPORT] Use AddDim for unit input dimensions to help getMaxVectorization() by @umangyadav in #1756
- [6.4]fix compilation with HIP SDK 6.3 for Windows (#1742) by @causten in #1759
New Contributors
- @darren-amd made their first contribution in #1715
- @LunNova made their first contribution in #1708
- @Xeonacid made their first contribution in #1709
Full Changelog: rocm-6.3.3...rocm-6.4.0
rocm-6.3.3
What's Changed
- Workaround for issue 1661 by @dhernandez0 in #1701
Full Changelog: rocm-6.3.0...rocm-6.3.3
rocm-6.3.2
What's Changed
- Workaround for issue 1661 by @dhernandez0 in #1701
Full Changelog: rocm-6.3.0...rocm-6.3.2