Releases: coreylowman/dfdx
v0.13.0 - `dtypes` module & adds `AMP<F>` dtype
What's Changed
- Make `storage_traits::TensorToArray` pub by @AndrejOrsula in #817
- accurate-gelu by @jcrist1 in #813
 - Fixing examples/04-gradients.rs by @coreylowman in #824
 - Moving optim kernels to tensor ops by @coreylowman in #828
- [Breaking] Adds `AMP<F>` dtype by @coreylowman in #811
- Adding documentation to dtypes module and amp by @coreylowman in #834
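
For orientation, a minimal sketch of what the new `AMP<F>` dtype from #811 might look like in user code. It assumes `AMP` is exported from the new `dtypes` module and that `AMP<f32>` can be sampled and combined like any other dtype on `Cpu`; treat it as an illustration rather than code from the release:

```rust
use dfdx::prelude::*;
use dfdx::dtypes::AMP;

fn main() {
    let dev: Cpu = Default::default();
    // AMP<F> wraps an inner float dtype; AMP<f16> is the intended mixed-precision
    // configuration once the `f16`/`cuda` features are enabled.
    let x: Tensor<Rank2<2, 3>, AMP<f32>, _> = dev.sample_normal();
    let _y = x.clone() + x;
}
```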
 
New Contributors
- @AndrejOrsula made their first contribution in #817
 - @jcrist1 made their first contribution in #813
 
Full Changelog: v0.12.1...v0.13.0
v0.12.1 - Re-export f16 dtype & making more APIs public
What's Changed
- Allow models to be backward compatible through #799 by @nkoppel in #808
 - Various small fixes by @nkoppel in #814
 - Making all shape traits public by @coreylowman in #816
 
Full Changelog: v0.12.0...v0.12.1
v0.12.0 - Adds f16 dtype
Breaking changes
- [Breaking] Adding Tensor::try_realize, and Tensor::realize no longer returns Result by @coreylowman in #758
 - [Breaking] ReshapeTo::reshape_like and ReshapeTo::try_reshape_like now panic instead of returning option by @coreylowman in #766
 - [Breaking] Adding dilation/groups to Conv2D. Adding dilation to Pool2D by @coreylowman in #767
- [Breaking] Use `gemm` for matmul. Removes support for matrixmultiply & MKL by @coreylowman in #776
- [Breaking] Moving storage GAT to trait level generic. Split DeviceStorage into multiple traits by @coreylowman in #782
 - [Breaking] Adding dilation/groups to ConvTranspose2D by @coreylowman in #783
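
A rough illustration of the `realize`/`try_realize` change above. The shape construction and the exact generic signatures here are assumptions for the sketch, not code taken from #758:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();
    // One runtime (usize) dim, one compile-time dim.
    let shape = (2usize, Const::<3>);
    let x: Tensor<(usize, Const<3>), f32, _> = dev.sample_normal_like(&shape);
    // Fallible conversion to a fully-const shape; Err hands the original tensor back.
    let _y: Tensor<Rank2<2, 3>, f32, _> = match x.clone().try_realize() {
        Ok(t) => t,
        Err(_original) => panic!("runtime dim was not 2"),
    };
    // `realize` does the same conversion but panics on a mismatch.
    let _z: Tensor<Rank2<2, 3>, f32, _> = x.realize();
}
```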
 
What's Changed
- Adding f16 as Dtype by @coreylowman in #696
 - Adding example by @sirandreww in #740
 - Adds TryConcatAlong to support Concat along any axis by @coreylowman in #750
 - Changed CUDA_ARCH in compatibility.cuh by @jafioti in #752
- Allow `broadcast_like` to accept tensors OR shapes by @VasanthakumarV in #751
- Removing rerun build.rs for output destination by @coreylowman in #754
 - Fixing compatibility for compute cap 70-75 by @coreylowman in #757
 - Adds TriangleTensor and CmpKernel traits to Device bound by @coreylowman in #760
 - Using Bernoulli distribution in dropout - makes dropout reproducible across dtypes by @coreylowman in #761
 - Fixes bug with f16 mean where number of elements reduced was f16::INF by @coreylowman in #763
 - Placeholder f16 gemm speedups by @coreylowman in #765
 - MultiHeadAttention 3d impl now broadcasts to 4d instead of duplicating logic by @coreylowman in #768
- Moving `cudarc?/f16` behind `f16` feature by @coreylowman in #774
- impl Clone for Adam, SGD, RMSprop by @coreylowman in #775
 - Properly setting read_dst for gemm in forward/backward pass by @coreylowman in #777
- Adds rayon dependency. Using `gemm::Parallelism::Rayon(rayon::current_num_threads())` by @coreylowman in #778
- Add LogSoftmax by @kurnevsky in #769
 - Moving some tests off nightly. Adding docs to conv2d op by @coreylowman in #779
 - Adding better error messages if nvidia-smi/nvcc are not found by @coreylowman in #784
 - Using for loop with gridDim.x * blockDim.x as increment by @coreylowman in #787
 - Removing __hmax and __hmin compat functions by @coreylowman in #788
 - Uses grid striding in fill_with by @coreylowman in #790
 - Exposed NumpyDType publicly by @jafioti in #791
 - Fixing weight shape for grouped Conv2D by @coreylowman in #797
 - Bump half/cudarc versions by @coreylowman in #805
 - Using Groups in conv weight init by @coreylowman in #806
 - Add scalar support to TensorCollection by @nkoppel in #799
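
Since the headline of this release is the f16 dtype (#696), here is a minimal sketch of what it enables, assuming the `f16` feature is on and that f16 sampling and matmul are implemented on `Cpu`; in 0.12.0 the type itself comes from the `half` crate (v0.12.1 re-exports it from dfdx):

```rust
use dfdx::prelude::*;
use half::f16; // re-exported by dfdx itself starting in v0.12.1

fn main() {
    let dev: Cpu = Default::default();
    let a: Tensor<Rank2<2, 4>, f16, _> = dev.sample_normal();
    let b: Tensor<Rank2<4, 3>, f16, _> = dev.sample_normal();
    // Matmul now routes through the `gemm` crate (#776).
    let _c: Tensor<Rank2<2, 3>, f16, _> = a.matmul(b);
}
```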
 
New Contributors
- @sirandreww made their first contribution in #740
 - @kurnevsky made their first contribution in #769
 
Full Changelog: v0.11.2...v0.12.0
v0.11.2 - Tensor caching & other nice features
What's Changed
- Simplify upscale cuda kernels by @coreylowman in #680
 - JIT compiling stack/concat cuda kernels by @coreylowman in #684
 - Initial merging of nvidia-smi and nvcc checks by @quietlychris in #685
- feat: use `Cow` when appropriate by @Alexandcoats in #688
 - Add const generic `NUM_THREADS` arg to launch_cfg by @VasanthakumarV in #691
 - feat: add `Tensorlike` to clean up spooky ghosts by @Alexandcoats in #689
 - Add `contiguous` and `try_contiguous` methods by @VasanthakumarV in #690
 - (feat) add device access method by @ccaven in #692
 - Add examples of runtime dimensions to `examples/02-ops.rs` by @VasanthakumarV in #698
 - Prevent over-allocation for broadcasted outputs of sum_to by @nkoppel in #699
 - Adds caching layer to tensor allocations by @coreylowman in #670
 - Handle \r in build.rs by @ViliamVadocz in #702
 - Disabling cache by default & adds enable_cache() by @coreylowman in #704
 - Typos in feature_flags.rs by @mauvray in #710
- Adds `mat * vec` impl for matmul by @coreylowman in #716
 - Adds better assertion macros for testing by @coreylowman in #714
 - Combining multiple github workflows for reuse by @coreylowman in #717
 - Changing nn ToDtype to use generic on method by @coreylowman in #719
 - Flatten2D now accepts generic batch dim by @coreylowman in #720
- Uses `impl Into<E>` for scalar binary ops when possible by @coreylowman in #722
 - Adds cudnn section to feature flags by @coreylowman in #723
 - Impls for (T,) by @opfromthestart in #725
 - Fixing dependencies for no-std by @coreylowman in #736
 - Adds rust 1.65 as the minimum rust compiler version by @coreylowman in #737
- Moves scalar comparison to use the same method as tensor comparison. Deprecates `try_scalar_*`/`scalar_*`. by @coreylowman in #738
 - Run CI for all kind of pushes - not only pull request related ones by @YannickFricke in #739
 - Adds `Tensor::to_device` to support sending tensors of any shape to any device by @coreylowman in #741
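
Two of the items above, the opt-in tensor cache (#670/#704) and `Tensor::to_device` (#741), can be sketched roughly as follows; the method names come from the PR titles, while the exact signatures (e.g. whether `to_device` borrows or consumes the tensor) are assumptions:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();
    // The new allocation cache is disabled by default (#704); opt in explicitly.
    dev.enable_cache();

    // With the `cuda` feature enabled, the destination could be a `Cuda` device instead.
    let other: Cpu = Default::default();
    let x: Tensor<Rank1<8>, f32, _> = dev.sample_normal();
    let _y = x.to_device(&other);
}
```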
New Contributors
- @VasanthakumarV made their first contribution in #691
 - @ccaven made their first contribution in #692
 - @mauvray made their first contribution in #710
 - @YannickFricke made their first contribution in #739
 
Full Changelog: v0.11.1...v0.11.2
v0.11.1 - cudnn, optimizations, and new ops/nn layers
What's Changed
- Fix bug in gather cuda kernel by @nkoppel in #588
 - feat(device): introduce AutoDevice type by @kakoc in #579
 - Use Recursive Macros to Implement Shape Operation Traits. by @nkoppel in #583
 - Add ToDtype tensor operation by @nkoppel in #582
 - Using 128 threads by default for cuda kernels by @coreylowman in #599
 - Add Slice tensor operation. by @nkoppel in #602
 - Optimizing conv kernels a bit by @coreylowman in #605
 - feat: add upper/lower triangles (tril and triu) allocations by @Alexandcoats in #568
 - Adds Tensor::roll by @coreylowman in #608
 - Using multiple streams for matmul with cuda by @coreylowman in #610
 - Fix no-std support by @Alexandcoats in #615
 - Adds matrixmultiply/std to std feature by @kstavro in #618
 - Implement concat for usize arrays; add concat to Device. by @nkoppel in #621
 - Allow conv2d and pool2d to use dynamic dimensions for width and height. by @nkoppel in #620
 - Switch to using nvcc --list-gpu-code for build.rs compute_cap by @quietlychris in #619
 - Fix bug in reshape on cuda by @nkoppel in #622
 - Don't always do try_min in pool_global.rs by @nkoppel in #623
 - Revert "Switch to using nvcc --list-gpu-code for build.rs compute_cap… by @coreylowman in #624
- Adds `restrided` in favor of `get_unstrided_index` -> `get_strided_index` by @coreylowman in #628
 - Combines multiple calls to get_strided_index into a single loop by @coreylowman in #629
 - Reducing number of buffers sent to cuda for some operations by @coreylowman in #611
 - Optimizing conv2d more by @coreylowman in #631
 - Add ability to include smaller last batch by @nkoppel in #632
 - Upscale2D and ConvTrans2d by @opfromthestart in #603
 - impl Dtype for all Unit types except bool by @coreylowman in #635
 - Allow convtrans2d to use dynamic dimensions by @nkoppel in #639
 - JIT compiling kernel for to_dtype & reshape by @coreylowman in #634
 - Optimize conv transpose kernels to do same thing as conv by @coreylowman in #641
 - Reworking crate level documentation by @coreylowman in #644
 - Adds synchronize to DeviceStorage by @coreylowman in #645
 - adding usize dtype to cuda_kernel by @zojeda in #648
 - Add PReLU and LeakyReLU by @opfromthestart in #586
 - Moving logsumexp normalization off of graph by @coreylowman in #652
 - Adding CmpKernels to Device, more documentation by @coreylowman in #653
 - Removing bounds checking from cpu conv kernel folding by @coreylowman in #650
 - Allow upscale2d to use dynamic dimensions by @nkoppel in #654
 - Adding integration test for resnet18 by @coreylowman in #655
 - Removing some un-necessary blanket impls by @coreylowman in #656
 - Fixes conv transpose stride bug, adds more docs to upscale2d by @coreylowman in #658
 - Some QOL fixes by @opfromthestart in #659
 - Optimizing softmax & log_softmax by @coreylowman in #660
 - Reuse f(x) for unary operations when possible. by @coreylowman in #661
 - Allocating gradients in backward op by @coreylowman in #663
- Adds `Tensor::recip` (1 / x) by @coreylowman in #665
 - Reshape layer by @opfromthestart in #666
 - Re-using tensor storage when possible by @coreylowman in #664
 - Adds cudnn feature flag. Removes "test-cuda" feature flag. Using cuDNN for convolutions by @coreylowman in #651
 - Always attempting allocation reuse during inference by @coreylowman in #673
 - Clarify reshape behavior in docs by @coreylowman in #674
 - Have SplitInto keep tapes of each head separate by @nkoppel in #676
 - Using arch option in nvrtc by @coreylowman in #675
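
A small sketch of the new `ToDtype` tensor op (#582); the `to_dtype` method name and turbofish form are assumptions based on the op's name:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();
    let x: Tensor<Rank2<2, 2>, f32, _> = dev.tensor([[1.0, 2.0], [3.0, 4.0]]);
    // Convert the element type; the cuda kernel for this is JIT compiled (#634).
    let _y: Tensor<Rank2<2, 2>, f64, _> = x.to_dtype::<f64>();
}
```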
 
New Contributors
- @kakoc made their first contribution in #579
 - @quietlychris made their first contribution in #619
 - @opfromthestart made their first contribution in #603
 - @zojeda made their first contribution in #648
 
Full Changelog: v0.11.0...v0.11.1
v0.11.0 - Cuda support, mixed const/runtime tensors, and device rewrite
What's Changed
- AddInto by @Dimev in #256
 - added 5d & 6d tensors by @M1ngXU in #283
 - Remove phantom by @M1ngXU in #282
 - remove tensor bound by @Dimev in #297
 - Adding nightly to cargo-test by @JYudelson1 in #294
 - Devices/Dyn dimensions refactor by @coreylowman in #304
 - Add instructions for running the mnist example. by @infalmo in #310
 - Removes Dyn. Use usize directly by @coreylowman in #315
 - Making f32 default dtype for Tensor, updating examples/docstrings by @coreylowman in #316
 - Only running gha on push by @coreylowman in #317
 - Adding Unit and HasUnitType. Reducing bounds for Dtype by @coreylowman in #313
 - Removing build_test_device. Using TestDevice everywhere by @coreylowman in #324
 - Adding SampleTensor, Removing RandTensor/RandnTensor by @coreylowman in #327
 - Removing usages of tensor aliases by @coreylowman in #328
 - Moving intel-mkl stuff into sub module in build.rs by @coreylowman in #329
 - Adding Cuda device and skeleton cuda kernel impls by @coreylowman in #322
 - Implementing abs/exp/div/sum_to cuda kernels by @coreylowman in #331
 - permute_to and broadcast_to cuda kernels by @coreylowman in #343
 - Add cuda implementations for unary and binary tensor operations in #341 and #334 by @nkoppel in #346
 - Using atomicAdd in binary op backwards to properly handle strides by @coreylowman in #350
 - Resolve #352 and #347 by @nkoppel in #354
 - Implement reshape cuda kernel (resolves #336) by @nkoppel in #356
 - Add missing device generic in transformer test by @ViliamVadocz in #358
 - Add select and gather cuda kernels. by @nkoppel in #359
 - Upgrade to cudarc 0.6.0 by @coreylowman in #361
 - Add tests for binary broadcasted add and fix bugs to allow them to pass. by @nkoppel in #357
 - run GHA on pull_request by @coreylowman in #364
 - matmul cuda kernels by @coreylowman in #342
 - Adding dynamic example. by @Narsil in #368
 - Add cuda kernels for min_to/max_to by @coreylowman in #370
 - Adding dropout cuda kernel by @coreylowman in #372
 - Adding ConstDim and ConstShape for tensor creation by @coreylowman in #373
 - Fixing computation of lda/ldb/ldc with cblas by @coreylowman in #375
 - Modify sum_to cuda kernel to not need atomic adds in backwards by @nkoppel in #367
- Simplifying `trait Conv2DKernel` and Cpu implementation by @coreylowman in #376
 - (#344) Implement cuda kernels for optimizers by @nkoppel in #378
 - Fix max_to and min_to edge case with negative zero by @ViliamVadocz in #380
 - Add cuda kernels for conv2d by @coreylowman in #369
 - Rework pool2d internals & add pool2d cuda kernels by @coreylowman in #384
 - Implement Shape for arrays (#377) by @nkoppel in #385
 - Efficient cuda kernels for reductions by @nkoppel in #382
 - Improving compilation times of deeply nested const generic modules by @coreylowman in #391
 - Fixing remainder of cuda tests & fixing cblas/cublas matmul with strides [1,1] by @coreylowman in #393
 - Adding Cuda device usage to mnist example by @coreylowman in #396
 - Adding GeLU operator (used in Gpt2) by @Narsil in #397
 - Removing codecov from workflows/readme by @coreylowman in #403
 - Reorganize tensor_ops, and add cuda_utils.cuh by @nkoppel in #398
 - Some small optimizations for conv2d on cpu by @coreylowman in #404
 - Removing Device generic from Gradients & optimizers by @coreylowman in #402
 - Add ToDevice and OnDevice to simplify nn api (#388) by @nkoppel in #394
- Removes `ModuleBuilder`, Adds `BuildModule` & `BuildOnDevice` by @coreylowman in #405
 - Enable multi-core matmul by @infalmo in #417
 - Fix GELU CUDA kernel compilation by @ViliamVadocz in #409
 - Adding nn.Embedding layer. by @Narsil in #406
 - Removing defaults for Tensor Dtype & Device generic parameters by @coreylowman in #418
 - Removing Default for optimizers & adding &M to constructors by @coreylowman in #422
- Adding runtime assertion in `try_binary_op` that shapes are equal by @coreylowman in #428
 - Add boolean operations and choose. by @nkoppel in #415
 - Add TensorFrom trait to create tensors from both vectors and arrays. by @nkoppel in #414
 - Adding nn builder structs, dtype generics, and remove device defaults. by @coreylowman in #433
 - Upgrade to cudarc==0.7.0 and use alloc_async instead of alloc_zeros_async by @coreylowman in #440
 - Add comparison tensor operations by @ViliamVadocz in #386
 - Add synchronize method to Cuda device by @ViliamVadocz in #442
 - f64 kernels by @coreylowman in #421
 - Add stack tensors method by @coreylowman in #449
 - cargo check cuda & run f64 tests in CI by @coreylowman in #447
 - Fix bug in #451 by @nkoppel in #453
 - Add more runtime shape checks by @coreylowman in #454
 - Adding ReshapeTo::reshape_like by @coreylowman in #456
 - Adding SampleTensor::sample_uniform_like and SampleTensor::sample_normal_like by @coreylowman in #457
 - Improve examples (add Cuda) by @TimerErTim in #452
 - Dataset iterators - adds batching, collating for iterators by @coreylowman in #462
 - Fixing issue with to_device and broadcasted tensors by @coreylowman in #465
 - Bump cudarc 0.7.2 by @coreylowman in #466
 - Adding index out of bounds checks to select/gather kernels by @coreylowman in #467
- Rename to `add_dim`. by @infalmo in #471
 - impl BuildModule for ZeroSizedModule by @coreylowman in #470
 - Adds TensorCollection by @coreylowman in #469
 - Fixing cargo doc warnings by @coreylowman in #473
- Using `--gpu-architecture native` with nvcc by @coreylowman in #474
 - using TensorFromVec for OneHotEncode and Arange by @coreylowman in #477
 - Small batchnorm optimizations by @coreylowman in #478
 - nvcc: fixed type bug by @M1ngXU in #480
 - Adds fast_alloc feature and binary kernel optimizations by @coreylowman in #481
 - Adding some "benchmarking" scripts by @coreylowman in #483
 - Add try_forward and try_forward_mut to Module and ModuleMut. by @nkoppel in #482
 - Optimizing cpu kernels for reductions by @coreylowman in #484
 - Using alloc_zeros_async and memset_zeros for cuda by @coreylowman in #489
 - Making Conv2D unbiased by default, and adding Bias2D module by @coreylowman in #494
 - Using image/filter stride in cuda kernel for conv by @coreylowman in #495
 - bump cudarc version by @coreylowman in #498
 - Adding attention_reshape (inference only) kernels. by @Narsil in #497
 - Adding lifetime to gat in Exact...
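
Taken together, the device rewrite, `TensorFrom` (#414), `SampleTensor` (#327), and the cuda matmul kernels (#342) make typical usage look roughly like the following minimal sketch on `Cpu` (with `Cuda` available behind the `cuda` feature):

```rust
use dfdx::prelude::*;

fn main() {
    // Devices own allocation and kernels now; with the `cuda` feature, `Cuda` can
    // stand in for `Cpu` here.
    let dev: Cpu = Default::default();
    let a: Tensor<Rank2<2, 3>, f32, _> = dev.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
    let b: Tensor<Rank2<3, 4>, f32, _> = dev.sample_normal();
    let _c: Tensor<Rank2<2, 4>, f32, _> = a.matmul(b);
}
```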
 
v0.10.0
What's Changed
Breaking Changes
- Binary ops (`add`, `sub`, `div`, `mul`, `maximum`, `minimum`) take ownership of rhs by @coreylowman in #268
 - backwards only allows 0d tensors now by @coreylowman in #206
 - Clone now keeps same id, removing Tensor::duplicate by @coreylowman in #249
 - Multi axis reductions
   - See docs
   - #189, #190, #194
   - Reduction functions now can reduce across any axis/axes: `mean`, `sum`, `max`, `min`, `stddev`, `var`, `softmax`, `log_softmax`, and `logsumexp`
   - Remove `-1` from valid axes, add `trait HasLastAxis` to use in generic functions instead
   - Adding `normalize` function that normalizes across any axis
   - Removing single axis reduction functions `fn *_axis()`: `mean_axis`, `sum_axis`, `max_axis`, `min_axis`, `normalize_axis`, `std_axis`, `var_axis`
   - Rename `HasAxis` to `HasAxes`
   - Add `trait BroadcastTo`
     - Remove `trait Broadcast1`, `trait Broadcast2`, `trait Broadcast3`, `trait Broadcast4`
   - Add `trait Reduce`/`trait ReduceTo`
     - Remove `trait Reduce1`
 
 - Batched select & select consistency
   - See docs
   - Renaming SelectTo, using SelectTo for batched select by @coreylowman in #217
   - Add Batched Select for devices and tensor_ops by @coreylowman in #182
 
 - Reduce things in prelude by @coreylowman in #209
 - Renaming FlattenImage to Flatten2D by @coreylowman in #243
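
A rough sketch of the multi-axis reduction change described above, written against the 0.10-era tensor aliases; the creation helper and annotation-driven reduction syntax are assumptions about this old API, not code from the release:

```rust
use dfdx::prelude::*;

fn main() {
    // 0.10-era sketch (assumed): reductions pick their axes from the requested output shape.
    let t: Tensor2D<2, 3> = TensorCreator::zeros();
    // Reduce across every axis down to a scalar tensor.
    let _total: Tensor0D = t.sum();
}
```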
 
New features
- `Arc` in Tensors instead of Rc by @caelunshun in #236
- `powi()` and `powf()` functions by @coreylowman in #167
- `no_std` support
  - See feature flags docs
  - Remove num-traits, no default features on depends by @coreylowman in #200
  - Adding intel-mkl feature and removing the 4 mkl-- features by @coreylowman in #239
  - Adding module that has docs for feature flags by @coreylowman in #240
  - Adding "numpy" feature to make numpy & npz optional by @coreylowman in #241
  - Adding `#![no_std]` support via `no_std_compat` by @coreylowman in #244
  - Adding default-features = false to dependencies by @coreylowman in #257
 
- Adding Axis permutations via `trait PermuteTo`.
- Adding `trait ModuleMut`
  - See docs
  - #225
  - Removing Module super traits by @coreylowman in #223
  - Rework Dropout/DropoutOneIn to use ModuleMut by @coreylowman in #226
 
 - Adding decoupled/l2 weight decay in optimizers:
   - See docs
   - add HasArrayData to GradientProvider by @cBournhonesque in #261
   - Add weight decay to SGD by @cBournhonesque in #258
   - Adding weight_decay to Adam by @coreylowman in #275
   - Adding weight decay to RMSprop by @coreylowman in #276
 
 - Adding `nn::Transformer` #175, #173, #180
   - See docs
 - Adding `nn::MinPool2D`, `nn::MaxPool2D`, `nn::AvgPool2D` by @coreylowman in #214
   - See docs
 - Adding `nn::MinPoolGlobal`, `nn::MaxPoolGlobal`, `nn::AvgPoolGlobal` by @coreylowman in #216
   - See docs
 - Adding `nn::BatchNorm2D` by @coreylowman in #228
   - See docs
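
The optimizer additions above (weight decay for SGD, Adam, and RMSprop) would be configured roughly like this; the config struct, field names, and enum variants are assumptions about this era's `optim` module:

```rust
use dfdx::prelude::*;

fn main() {
    // Assumed 0.10-era config shape: classic L2 or decoupled weight decay per optimizer.
    let _cfg = SgdConfig {
        lr: 1e-2,
        momentum: Some(Momentum::Nesterov(0.9)),
        weight_decay: Some(WeightDecay::Decoupled(1e-4)),
    };
}
```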
 
 
Misc changes
- Add tensor() function as a convenient way to make tensors from arrays by @coreylowman in #161
  - See docs
 
 - Remove allocation in dropout implementation by @coreylowman in #164
 - Removing Tensor::OwnedTape by @coreylowman in #197
 - Revamping examples/ by @coreylowman in #205
 - Conv cleanup
   - Moving conv into device and cleaning up a bit by @coreylowman in #212
   - Minifying conv impls by @coreylowman in #213
   - Changing conv2d and conv2d_batched to methods of tensors by @coreylowman in #221
   - Replacing conv2d implementation with matmuls by @coreylowman in #237
 
 - Fix typos by @cBournhonesque in #235
 - Combining multiple where clauses with const generics into a single one by @coreylowman in #264
 - Checking for null ptr in AllocateZeros by @coreylowman in #271
- Reducing allocations in `map_df_uses_fx` by @coreylowman in #272
 - Adding with_empty_tape and with_diff_tape by @coreylowman in #274
 
New Contributors
- @cBournhonesque made their first contribution in #235
 - @caelunshun made their first contribution in #236
 
Full Changelog: v0.9.0...v0.10.0
v0.9.0
Breaking Changes
- Add broadcast functions, reductions on any axis, and selecting subtensors (#137, #114, #139) by @coreylowman in #138
 - Added normalize axis and removed normalize by @vikigenius in #140
 - #67 `Optimizer::update` now returns `Result<(), UnusedParamsError>` by @coreylowman in #107
New features
- #34 Add Transformers!!! by @jafioti in #120
 - #1 Add Conv2d by @coreylowman in #124
 - #55 Added reshape function by #90 #129 #120
 - #133 Adding FlattenImage layer that uses reshape by @coreylowman in #133
 - #142 Adding Module::forward_mut by @coreylowman in #148
 - #80 Adding nn::Softmax by @coreylowman in #81
 - #79 Adding smooth_l1_loss and huber_loss by @coreylowman in #82
 - #131 matmul now supports batched & broadcasted inputs by @coreylowman in #132
 - add macOS MKL support by @yerke in #73
 - Adding maximum function by @coreylowman in #143
 - Adding min_axis function by @coreylowman in #144
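
Several of the items above (Conv2d, Transformers, Softmax, the new losses) plug into the tuple-of-layers module style; a minimal sketch of that style, with the default initialization and creation helper assumed from this era's examples:

```rust
use dfdx::prelude::*;

fn main() {
    // Layers compose as tuples; `forward` runs them in order.
    type Mlp = (Linear<10, 32>, ReLU, Linear<32, 2>);
    let model: Mlp = Default::default();
    let x: Tensor1D<10> = TensorCreator::zeros();
    let _y: Tensor1D<2> = model.forward(x);
}
```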
 
Additional changes
- Simplifying implementation of BCE loss using binary_map by @coreylowman in #75
 - Miscellaneous updates by @coreylowman in #76
 - Added custom model example by @jafioti in #83
- add Debug and Display support for `NpzError` by @XBagon in #85
 - Added nightly feature by @jafioti in #89
 - Added 2d broadcast_first functions and 3d linear forward by @jafioti in #94
 - #55 reshape, and #87 additional work on nightly feature by @coreylowman in #90
 - #69 adding map_df_uses_fx by @coreylowman in #105
 - Fixed a misleading docstring. by @M1ngXU in #109
 - Fix Issue #110 Fix (Dropout (test) for non-positive values) by @M1ngXU in #113
 - Issue #96 by @M1ngXU in #118
 
New Contributors
- @jafioti made their first contribution in #83
 - @XBagon made their first contribution in #85
 - @yerke made their first contribution in #73
 - @M1ngXU made their first contribution in #109
 - @vikigenius made their first contribution in #140
 
Full Changelog: v0.8.0...v0.9.0