125 commits
5d7ae23
add no_std support
jturner314 Sep 19, 2019
97921e9
Merge pull request #51 from vadixidav/no_std
bluss Dec 7, 2020
6243d28
MAINT: Silence unused items warnings (these fire on non-x86)
bluss Dec 7, 2020
79b57a3
0.2.4
bluss Dec 7, 2020
48fdf21
TEST: Add github actions to replace travis
bluss Dec 20, 2020
11ec355
Merge pull request #53 from bluss/gh-actions
bluss Dec 28, 2020
a713a6b
DOC: Remove travis badge in readme
bluss Dec 28, 2020
b81d267
TEST: Add benchmark runner as example binary
bluss Dec 26, 2020
fc30d9d
TEST: Add csv output format to benchmark program and fixup error hand…
bluss Dec 27, 2020
e926f0a
TEST: Skip testing examples on MSRV
bluss Dec 28, 2020
319e49e
Merge pull request #54 from bluss/benchmark
bluss Dec 28, 2020
a3fd081
API: Update to Rust 2018 edition
bluss Dec 19, 2020
cb0ca4b
FIX: Use Ptr wrappers for raw pointers (mark safe to pass across thre…
bluss Dec 19, 2020
eb5582b
Add function that splits a range chunk in parts
bluss Dec 19, 2020
01e8ba2
FEAT: Add threading feature using a hierarchical thread pool
bluss Dec 19, 2020
860ec38
FEAT: Suport nthreads 2, 3, and 4 in parallel loops
bluss Dec 26, 2020
8b39aae
FEAT: Add method for num_pack_a
bluss Dec 27, 2020
a761cfc
TEST: Test threading feature
bluss Dec 28, 2020
e2040fc
TEST: Test from 1.42
bluss Dec 28, 2020
72d036f
FIX: Let the "thread_local" function be FnOnce when threading is disa…
bluss Dec 28, 2020
55ffa7f
FIX: Only use thread local if have std
bluss Dec 28, 2020
a0343ff
TEST: Cleanup in gh actions file
bluss Dec 29, 2020
6b3158c
FIX: Add heuristic to avoid using threads for small matrices
bluss Dec 29, 2020
9879d9e
MAINT: Disable warning for unused macro
bluss Jan 1, 2021
e941ba3
MAINT: Enable debug info in release/bench mode
bluss Jan 1, 2021
04264b0
DOC: Update crate docs for the threading feature
bluss Jan 1, 2021
33951b5
FIX: Put threadpool and nthreads into one combined Lazy
bluss Jan 1, 2021
2ddd0ba
FIX: Use an UnsafeCell for the kernel mask buffer and align it with repr
bluss Jan 1, 2021
5f9b4cd
FIX: Split LoopThreadConfig::new into one non-generic part
bluss Jan 2, 2021
9c58f3c
FEAT: Use num_cpus::get_physical as the fallback thread count
bluss Jan 4, 2021
612781d
MAINT: Set MSRV to Rust 1.41.1 and update Rust version policy
bluss Jan 5, 2021
5e4f356
TEST: Fix cache hit detection for cross test and use opt-level 2
bluss Jan 5, 2021
4104d26
TEST: Add test for benchmark example
bluss Jan 5, 2021
2dfe4f0
FIX: Move range chunk parallel code into threading
bluss Jan 5, 2021
f10f3ae
Merge pull request #52 from bluss/threading
bluss Jan 5, 2021
80c5e2c
FIX: Fix overflow in benchmark gflop computation (32-bit usize)
bluss Jan 5, 2021
a507ff5
FIX: Explicitly limit thread count to supported interval
bluss Jan 7, 2021
b7f46bc
DOC: Release note for 0.3.0 with threading
bluss Jan 7, 2021
bad7c38
0.3.0
bluss Jan 8, 2021
9e4a11f
FIX: Use &[T], not &T for the mask buffer
bluss Feb 7, 2021
d5c994e
FIX: Align mask buffer pointer manually
bluss Apr 8, 2021
77dd2b1
Merge pull request #56 from bluss/align-manually
bluss Apr 8, 2021
d0e1c54
0.3.1
bluss Apr 8, 2021
5d20d85
FIX: Kernel size in assertion
bluss Apr 9, 2021
7b1979a
kernel: Use pub(crate)
bluss Nov 7, 2021
1f6a175
FIX: Typo in `cfg(feature)` in tests
bluss Nov 8, 2021
8b75092
threading: Tweak the threading factor
bluss Nov 7, 2021
5907bf0
kernel: set archparam values as defaults
bluss Nov 9, 2021
ab05f63
complex: Add support for complex
bluss Nov 7, 2021
55888a2
complex: Update CI to use cgemm feature
bluss Nov 8, 2021
64f909f
complex: compute cgemm, zgemm in real parts
bluss Nov 9, 2021
e04e4ba
complex: Compile fallback kernels using fma too
bluss Nov 9, 2021
7535de1
complex: Document crate feature complex
bluss Nov 9, 2021
5fe43c8
complex: Use a different flop factor for complex
bluss Nov 10, 2021
6a813a5
complex: Print nicer type name for complex
bluss Nov 10, 2021
f1b04ea
benchmark: Make a better argument parser
bluss Nov 10, 2021
d904257
benchmark: Allow passing --extra-column for an extra column in csv
bluss Nov 10, 2021
2e8abde
complex: Setup archparams for cgemm/zgemm
bluss Nov 10, 2021
8492b13
tests: Combine repeated generic code in tests and benchmark
bluss Nov 10, 2021
5ce52bd
test: Move common test_a_kernel function into kernel
bluss Nov 10, 2021
d5b13c8
bench: Explain benchmarks in docs
bluss Nov 11, 2021
60fc628
test: Use complex scalars to test alpha/beta
bluss Nov 11, 2021
9ce5e23
test: Use both A I == A and I B == B in test_a_kernel
bluss Nov 11, 2021
ab4a538
Merge pull request #58 from bluss/complex
bluss Nov 13, 2021
efe70f2
Allow tweaking size parameters at compile time
bluss Nov 10, 2021
de0075d
test: Add benchmarking script
bluss Nov 13, 2021
805221d
test: Run miri
bluss Nov 13, 2021
6dc6a76
Fix crates.io badge
atouchet Nov 14, 2021
c9447f3
constconf: Fix usize parsing on 32-bit arch
bluss Nov 14, 2021
58623fb
test: Run the benchmark loop script in ci
bluss Nov 14, 2021
aa0ce95
test: Factor out common matrix compare
bluss Nov 15, 2021
510b9dc
constconf: Add assertions for MC, KC, NC parameters
bluss Nov 16, 2021
ecb8630
benchmark: silence other output in csv mode
bluss Nov 16, 2021
c2562ae
benchmark: Add --sleep argument to benchloop
bluss Nov 16, 2021
88a3c91
test: Run CI on macos too
bluss Nov 17, 2021
38d8f1a
0.3.2
bluss Nov 20, 2021
4c3950d
ptr: Fix Send/Sync impls for future compat warning
bluss May 1, 2022
f8f9d21
Fix Miri error with -Zmiri-tag-raw-pointers
jturner314 Dec 23, 2021
c2cb362
Add more checks to MIRIFLAGS for CI
jturner314 Jan 8, 2022
4ef1bd9
Updated comment in kernel_x86_avx
Tastaturtaste May 2, 2022
4f841fa
ptr: Silence suspicious Send/Sync impls warning
bluss May 3, 2022
1433d63
gemm: request only 16-byte alignment on macos
bluss Apr 14, 2023
1f8d3c7
0.3.3
bluss Apr 20, 2023
15da77c
loopmacros: Use while loop
bluss Apr 18, 2023
5bf5c7c
bench: Test both beta != 0 and 0 in layout benchmarks
bluss Apr 18, 2023
34e740e
sgemm kernel for NEON arm64/aarch64
bluss Apr 18, 2023
fac92b6
dgemm kernel for NEON arm64/aarch64
bluss Apr 18, 2023
d6f7a34
ci: Test aarch64 at its MSRV
bluss Apr 18, 2023
fe2b237
threading: Remove bias for aarch64
bluss Apr 18, 2023
c5d1930
uninline c_to_beta_c
bluss Apr 23, 2023
0fea705
gemm: Use slice for packing buffer
bluss Apr 23, 2023
058d3ef
Use build script to preserve MSRV on aarch64
bluss Apr 22, 2023
35c258d
Merge pull request #73 from bluss/arm64
bluss Apr 26, 2023
5e0aea7
0.3.4
bluss Apr 28, 2023
b85cfa1
gemm: Allow custom packing functions
bluss Apr 29, 2023
9896879
complex: pack real and imag separately
bluss Apr 29, 2023
6f86fd9
cgemm: Setup Avx2 and Fma autovectorized kernels
bluss Apr 29, 2023
2c536f2
x86-64: Specialize pack function for avx2
bluss Apr 29, 2023
18bd827
cgemm: use fma in avx2 kernel
bluss Apr 29, 2023
e6d04e1
cgemm: Add known-answer test
bluss Apr 29, 2023
e84562d
cgemm: enable fma for neon
bluss Apr 29, 2023
84c0baa
ci: Update miri flags
bluss Apr 29, 2023
258a69f
0.3.5
bluss Apr 30, 2023
145f9e8
Fix nostd build
bluss Apr 30, 2023
d88b19e
0.3.6
bluss Apr 30, 2023
496f08a
Remove space from file names
xander-zitara May 2, 2023
836e5ae
0.3.7
bluss May 2, 2023
d6aef69
bench: Add non-contiguous layouts
bluss Apr 30, 2023
c6f86de
gemm: request 8-byte buffer alignment on macos
bluss Sep 20, 2023
86f4432
ci: Drop 1.41 in cross test
bluss Sep 20, 2023
7753f81
gemm: Ensure alignment without repr(align()) on macos
bluss Sep 20, 2023
e8caf74
0.3.8
bluss Sep 20, 2023
a0bf1bb
Remove obsolete lint directive
bluss Mar 9, 2024
29f3d1c
ci: Test with cargo-careful and ThreadSanitizer
bluss Mar 9, 2024
c7ab1ac
ci: Update github action versions
bluss Mar 9, 2024
77ed4e0
Fix alignment in s390x and cross test
bluss Jul 27, 2024
bb3dd0b
0.3.9
bluss Jul 27, 2024
5b7cdcd
kernel: Silence unused method warning
bluss May 11, 2025
adff8c4
debugmacros: Silence unknown cfg warning
bluss May 11, 2025
0aa4593
example/usegemm: Remove unused method
bluss May 11, 2025
39cb02b
ci: Update cache rule for cross builder
bluss May 11, 2025
9126d49
ci: Pin either=1.13 for MSRV
bluss May 11, 2025
301ebc5
Exclude alignment for MaskBuffer for i686-win7-windows-msvc
drewkett Mar 25, 2025
9753008
sgemm: Reduce unnecessary AVX register permutations
SongXiaoXi May 11, 2025
1c91e1c
0.3.10
bluss May 12, 2025
178 changes: 178 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,178 @@
on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

name: Continuous integration

env:
  CARGO_TERM_COLOR: always
  CARGO_INCREMENTAL: 0
  MATMUL_NUM_THREADS: 4
  RUST_BACKTRACE: full

jobs:
  tests:
    runs-on: ${{ matrix.os }}
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      matrix:
        include:
          - rust: 1.41.1 # MSRV
            experimental: false
            os: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            features: cgemm
          - rust: stable
            experimental: false
            os: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            features: threading cgemm
            test_examples: yes_examples
            test_benchmark: yes_bench
          - rust: nightly
            experimental: false
            os: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            mmtest_feature: avx
          - rust: nightly
            os: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            features: threading cgemm
            mmtest_feature: fma
            experimental: false
          - rust: nightly
            os: ubuntu-latest
            target: i686-unknown-linux-gnu
            features: cgemm
            install_deps: |
              sudo apt-get update
              sudo apt-get install -y gcc-multilib
            experimental: false
          - rust: stable
            experimental: false
            os: macos-latest
            target: x86_64-apple-darwin
            features: threading cgemm
            test_examples: yes_examples

    name: tests/${{ matrix.target }}/${{ matrix.rust }}
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: ${{ matrix.rust }}
          targets: ${{ matrix.target }}
      - name: Install dependencies
        if: matrix.install_deps
        run: ${{ matrix.install_deps }}
      - name: Cargo deps locks
        if: ${{ matrix.rust == '1.41.1' }}
        run:
          cargo update -p either --precise 1.13.0
      - name: Tests
        run: |
          rustc -C target-cpu=native --print cfg
          cargo build -v --features "${{ matrix.features }}" --target "${{ matrix.target }}"
          cargo test -v --tests --lib --no-fail-fast --features "${{ matrix.features }}" --target "${{ matrix.target }}"
          cargo test -v --tests --lib --release --no-fail-fast --features "${{ matrix.features }}" --target "${{ matrix.target }}"
      - name: Test examples
        if: matrix.test_examples
        run: |
          cargo test -v --examples --features "${{ matrix.features }}" --target "${{ matrix.target }}"
      - name: Test benchmark
        if: matrix.test_benchmark
        run: |
          cargo bench --no-run -v --features "${{ matrix.features }}" --target "${{ matrix.target }}"
          python3 ./benches/benchloop.py -t f32 f64 c32 c64 --mc 32 -s 32 64 | tee bench.csv
          cat bench.csv
      - name: Test specific feature
        if: matrix.mmtest_feature
        env:
          MMTEST_FEATURE: ${{ matrix.mmtest_feature }}
          MMTEST_ENSUREFEATURE: 1
        run: |
          cargo test -v --no-fail-fast

  nostd-build:
    runs-on: ubuntu-latest
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      matrix:
        include:
          - rust: 1.41.1 # MSRV
            experimental: false
            target: thumbv6m-none-eabi
          - rust: stable
            experimental: false
            target: thumbv6m-none-eabi

    name: nostd-build/${{ matrix.target }}/${{ matrix.rust }}
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: ${{ matrix.rust }}
          targets: ${{ matrix.target }}
      - name: Tests
        run: |
          cargo rustc "--target=${{ matrix.target }}" --manifest-path=ensure_no_std/Cargo.toml

  cross_test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - rust: stable
            target: s390x-unknown-linux-gnu
            features: constconf cgemm threading
          - rust: stable
            target: aarch64-unknown-linux-gnu
            features: constconf cgemm threading
          - rust: 1.65.0
            target: aarch64-unknown-linux-gnu
            features: cgemm

    name: cross_test/${{ matrix.target }}/${{ matrix.rust }}
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          profile: minimal
          targets: ${{ matrix.target }}
      - name: Cache cargo plugins
        uses: Swatinem/rust-cache@v2
      - name: Install cross
        if: steps.cache.outputs.cache-hit != 'true'
        run: cargo install cross
      - name: Tests
        run: cross test --target "${{ matrix.target }}" --features "${{ matrix.features }}"
        env:
          MMTEST_FAST_TEST: 1
      - name: Tests (Release)
        run: cross test --release --target "${{ matrix.target }}" --features "${{ matrix.features }}"
        env:
          MMTEST_FAST_TEST: 1


  cargo-careful:
    runs-on: ubuntu-latest
    name: cargo-careful
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: nightly
      - uses: Swatinem/rust-cache@v2
      - name: Install cargo-careful
        run: cargo install cargo-careful
      - run: cargo careful test -Zcareful-sanitizer=thread --features=threading,cgemm

  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Miri
        run: ci/miri.sh --features cgemm
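
The workflow above exports `MATMUL_NUM_THREADS: 4` to every job, and the commit list mentions limiting the thread count to a supported interval and falling back to `num_cpus::get_physical`. A minimal sketch of how such a setting could be resolved follows. It is an illustration only, not the crate's code: the function names, the `MAX_THREADS` bound of 4 (inferred from the "nthreads 2, 3, and 4" commit message), the choice to treat 0 as unset, and the use of `std::thread::available_parallelism` in place of `num_cpus` are all assumptions.

```rust
use std::env;

/// Upper bound on worker threads. Assumption: taken from the commit
/// message about supporting 2, 3, and 4 threads, not from the crate source.
const MAX_THREADS: usize = 4;

/// Parse a raw `MATMUL_NUM_THREADS` value and clamp it to 1..=MAX_THREADS.
/// Returns `None` when the value is missing, zero, or not an integer
/// (hypothetical policy chosen for this sketch).
fn parse_thread_count(raw: Option<&str>) -> Option<usize> {
    let n: usize = raw?.trim().parse().ok()?;
    if n == 0 {
        return None;
    }
    Some(n.min(MAX_THREADS))
}

/// Resolve the thread count: environment variable first, then a fallback.
/// The fallback here uses the standard library so the sketch has no
/// dependencies; the commit list says the crate uses `num_cpus::get_physical`.
fn resolve_thread_count() -> usize {
    let raw = env::var("MATMUL_NUM_THREADS").ok();
    parse_thread_count(raw.as_deref()).unwrap_or_else(|| {
        std::thread::available_parallelism()
            .map(|n| n.get().min(MAX_THREADS))
            .unwrap_or(1)
    })
}
```

Keeping the parsing separate from the environment lookup makes the clamping rule testable without touching process state; `resolve_thread_count()` is the only function that reads the environment.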

70 changes: 0 additions & 70 deletions .travis.yml

This file was deleted.

35 changes: 33 additions & 2 deletions Cargo.toml
@@ -1,6 +1,7 @@
 [package]
 name = "matrixmultiply"
-version = "0.2.3"
+edition = "2018"
+version = "0.3.10"
 authors = [
   "bluss",
   "R. Janis Goldschmidt"
@@ -11,13 +12,18 @@ license = "MIT/Apache-2.0"
 repository = "https://github.com/bluss/matrixmultiply/"
 documentation = "https://docs.rs/matrixmultiply/"

-description = "General matrix multiplication for f32 and f64 matrices. Operates on matrices with general layout (they can use arbitrary row and column stride). Detects and uses AVX or SSE2 on x86 platforms transparently for higher performance. Uses a microkernel strategy, so that the implementation is easy to parallelize and optimize."
+description = """
+General matrix multiplication for f32 and f64 matrices. Operates on matrices with general layout (they can use arbitrary row and column stride). Detects and uses AVX or SSE2 on x86 platforms transparently for higher performance. Uses a microkernel strategy, so that the implementation is easy to parallelize and optimize.
+
+Supports multithreading."""

 keywords = ["matrix", "sgemm", "dgemm"]
 categories = ["science"]

 exclude = ["docs/*"]

+build = "build.rs"
+
 [lib]
 bench = false

@@ -28,14 +34,39 @@ harness = false
[dependencies]
rawpointer = "0.2"

thread-tree = { version = "0.3.2", optional = true }
once_cell = { version = "1.7", optional = true }
num_cpus = { version = "1.13", optional = true }

[dev-dependencies]
bencher = "0.1.2"
itertools = "0.8"

[features]
default = ["std"]

# support for complex f32, complex f64
cgemm = []

threading = ["thread-tree", "std", "once_cell", "num_cpus"]
std = []

# support for compile-time configuration
constconf = []

[build-dependencies]
autocfg = "1"

[profile.release]
debug = true
[profile.bench]
debug = true

[package.metadata.release]
no-dev-version = true
tag-name = "{{version}}"

[package.metadata.docs.rs]
features = ["cgemm"]
# defines the configuration attribute `docsrs`
rustdoc-args = ["--cfg", "docsrs"]
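
The package description says the crate operates on matrices with general layout, meaning arbitrary row and column strides. The sketch below shows what that means for indexing, using a naive reference loop for C ← αAB + βC. It is not the crate's implementation or API: `gemm_ref` is a hypothetical name, and it takes slices with unsigned strides for safety and brevity, with no microkernel, packing, or SIMD.

```rust
/// Naive reference for C <- alpha * A * B + beta * C on f32 matrices.
/// A is m x k, B is k x n, C is m x n. Element (i, j) of a matrix lives at
/// `i * row_stride + j * col_stride`, so row-major, column-major, and
/// other regular layouts are all handled by the same loop.
/// Hypothetical helper for illustration only.
fn gemm_ref(
    m: usize, k: usize, n: usize,
    alpha: f32,
    a: &[f32], rsa: usize, csa: usize,
    b: &[f32], rsb: usize, csb: usize,
    beta: f32,
    c: &mut [f32], rsc: usize, csc: usize,
) {
    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of A with column j of B.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * rsa + p * csa] * b[p * rsb + j * csb];
            }
            let idx = i * rsc + j * csc;
            // With beta == 0 the old contents of C are ignored entirely.
            c[idx] = if beta == 0.0 {
                alpha * acc
            } else {
                alpha * acc + beta * c[idx]
            };
        }
    }
}
```

For example, a 2×2 row-major matrix uses strides `(2, 1)` and a column-major one uses `(1, 2)`; the same call can mix both layouts without copying or transposing either operand.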
4 changes: 3 additions & 1 deletion LICENSE-MIT
@@ -1,4 +1,6 @@
-Copyright (c) 2016 - 2018 Ulrik Sverdrup "bluss"
+Copyright (c) 2016 - 2023 Ulrik Sverdrup "bluss"
+Copyright (c) 2018 R. Janis Goldschmidt
+Copyright (c) 2021 DutchGhost [constparse.rs]

 Permission is hereby granted, free of charge, to any
 person obtaining a copy of this software and associated