Releases: Eventual-Inc/Daft
v0.1.11
Changes
✨ New Features
- [FEAT] [New Query Plan] Add support for Projection and Coalesce, enable many tests @clarkzinzow (#1256)
- [FEAT] [New Query Planner] Add support for Concat. @clarkzinzow (#1254)
- [FEAT] [New Query Planner] Add support for tabular writes. @clarkzinzow (#1252)
- [FEAT] Multi-partition aggregate; Coalesce @xcharleslin (#1249)
- [FEAT] [New Query Planner] Add support for Sort, Repartition, and Distinct in new query planner. @clarkzinzow (#1248)
- [FEAT] Add Azure Support for Native Downloader @samster25 (#1250)
- [FEAT] Locally unique semantic IDs for Expressions @xcharleslin (#1243)
- [FEAT] Read parquet tables with int96 coercion option @jaychia (#1231)
- [FEAT] [New Query Plan] Add support for CSV scans, JSON scans, in-memory scans and caching materialized results. @clarkzinzow (#1246)
- [FEAT] Native Downloader add Retry Config parameters @samster25 (#1244)
- [FEAT] (Single partition only) DataFrame.sum() via Rust planner @xcharleslin (#1230)
- [FEAT] [New Query Planner] Logical --> physical translation, physical plan execution. @clarkzinzow (#1232)
- [FEAT] native parquet correctness checks @samster25 (#1225)
- [FEAT] add session token as input to io config @samster25 (#1224)
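The int96 entry above (#1231) concerns Parquet's legacy INT96 timestamps. Each value is 12 bytes: an 8-byte little-endian count of nanoseconds since midnight, followed by a 4-byte little-endian Julian day number. An int64 nanosecond timestamp cannot represent dates much past the year 2262. For that reason, Parquet readers such as PyArrow let callers coerce INT96 values to a coarser unit.

The sketch below decodes one raw value into an integer offset from the Unix epoch, in a chosen unit. `int96_to_epoch` and its `unit` argument are hypothetical names used for illustration. They are not Daft's API, and this is not Daft's implementation.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
NANOS_PER_DAY = 86_400 * 1_000_000_000

# Divisors for converting nanoseconds to coarser units (hypothetical unit names).
_UNIT_DIVISORS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def int96_to_epoch(raw: bytes, unit: str = "ns") -> int:
    """Decode one 12-byte INT96 timestamp into an offset from the Unix epoch.

    Hypothetical helper for illustration; not part of Daft.
    """
    if len(raw) != 12:
        raise ValueError(f"INT96 values are 12 bytes, got {len(raw)}")
    if unit not in _UNIT_DIVISORS:
        raise ValueError(f"unknown unit {unit!r}")
    # "<qi": little-endian int64 nanoseconds-of-day, then int32 Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    total_nanos = (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day
    # Python ints do not overflow, so coercing to a coarser unit is a floor division.
    return total_nanos // _UNIT_DIVISORS[unit]


if __name__ == "__main__":
    # One day and one second after the epoch.
    raw = struct.pack("<qi", 1_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
    print(int96_to_epoch(raw))       # 86401000000000
    print(int96_to_epoch(raw, "s"))  # 86401
```

Floor division rounds pre-epoch values toward negative infinity. This matches the usual convention for integer timestamps.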
🚀 Performance Improvements
- [PERF] Native Parquet Bulk Reader @samster25 (#1233)
👾 Bug Fixes
- [BUG] drop native-tls (openssl) for azure which was a default feature @samster25 (#1251)
- [BUG] Fix decimal byte arrays @jaychia (#1247)
- [BUG] correct type when printing incorrect row count @samster25 (#1226)
- [BUG] try manylinux 2 28 @samster25 (#1214)
- [BUG] downgrade ray to 2.6 @samster25 (#1212)
- [BUG] add explicit target for aarch64 linux @samster25 (#1209)
- [BUG] Fix incorrect sign bug for small decimals @xcharleslin (#1204)
- [BUG] Set SSL paths on linux @samster25 (#1203)
📖 Documentation
- [DOCS] Fix daft.read_parquet link @jaychia (#1228)
- [DOCS][CHORE] Add docs for IOConfig and S3Config @jaychia (#1227)
🧰 Maintenance
- [CHORE] Update test to only use store_schema kwarg for pa>=11 @jaychia (#1253)
- [FEAT] (Single partition only) DataFrame.sum() via Rust planner @xcharleslin (#1230)
- [CHORE] [New Query Planner] Introduce LogicalPlanBuilder and QueryPlanner interfaces to hide query planner implementations. @clarkzinzow (#1245)
- [CHORE] LogicalPlan: Add display improvements, and Filter @xcharleslin (#1221)
- [CHORE] Add unit tests for int96 timestamps @jaychia (#1229)
- [DOCS][CHORE] Add docs for IOConfig and S3Config @jaychia (#1227)
- [CHORE] disable mac test for lack of docker @samster25 (#1223)
- [CHORE] Begin integrating Rust Logical Plan with Dataframe API @xcharleslin (#1207)
- [CHORE] integration tests for nightly platform wheels @samster25 (#1219)
- [CHORE] Remove existing LogicalPlan from all execution concepts @xcharleslin (#1208)
- [CHORE] Add endpoints to simulate rate-limiting on AWS S3 buckets @jaychia (#1220)
- [CHORE] Add pytest marker for integration @jaychia (#1211)
- [CHORE] Add s3 fixtures for retrying logic @jaychia (#1206)
- [CHORE] Add developer flag to use Rust query planner @xcharleslin (#1205)
- [CHORE] Rust Logical plan skeleton @xcharleslin (#1192)
⬆️ Dependencies
- Bump tempfile from 3.7.0 to 3.7.1 @dependabot (#1238)
- Bump ray[data,default] from 2.5.1 to 2.6.1 @dependabot (#1200)
- Bump numpy from 1.25.1 to 1.25.2 @dependabot (#1199)
- Bump tempfile from 3.6.0 to 3.7.0 @dependabot (#1198)
- Bump serde_json from 1.0.103 to 1.0.104 @dependabot (#1197)
- Bump num-traits from 0.2.15 to 0.2.16 @dependabot (#1196)
- Bump serde from 1.0.171 to 1.0.179 @dependabot (#1195)
v0.1.10
Changes
✨ New Features
- [FEAT] Enable feature-flagged native downloader in daft.read_parquet @jaychia (#1190)
- [FEAT] parquet reader refactor, add parquet_stats_reader and parquet_schema_reader (1/2) @samster25 (#1191)
🚀 Performance Improvements
- [PERF] native streaming parquet @samster25 (#1193)
🧰 Maintenance
⬆️ Dependencies
- Bump isbang/compose-action from 1.4.1 to 1.5.0 @dependabot (#1178)
- Bump serde_json from 1.0.100 to 1.0.103 @dependabot (#1168)
- Bump pyo3-log from 0.8.2 to 0.8.3 @dependabot (#1167)
- Bump dyn-clone from 1.0.11 to 1.0.12 @dependabot (#1166)
- Bump numpy from 1.25.0 to 1.25.1 @dependabot (#1164)
- Bump lxml from 4.9.2 to 4.9.3 @dependabot (#1163)
v0.1.9
Changes
🏆 Highlights
- [FEAT] [Tensor] Add support for Tensor and FixedShapeTensor types. @clarkzinzow (#1073)
✨ New Features
- [FEAT] Consolidate to list namespace @jaychia (#1180)
- [FEAT] Add .image.crop Expression @jaychia (#1175)
- [FEAT] [Tensor] Add support for Tensor and FixedShapeTensor types. @clarkzinzow (#1073)
- [FEAT] Basic support for Arrow 128-bit Decimal. @xcharleslin (#1129)
- [FEAT] Native Parquet Downloader @samster25 (#1107)
🚀 Performance Improvements
- [PERF] Simple Read Planner and RangeReader for Native Parquet Reader @samster25 (#1172)
👾 Bug Fixes
- [BUG] Fix ownership model of IOClient @samster25 (#1128)
- [BUG] Ownership of Runtime and Clients @samster25 (#1125)
📖 Documentation
- [DOCS] Fix broken link to Ray Datasets docs @jaychia (#1186)
- [FEAT] Consolidate to list namespace @jaychia (#1180)
- [DOCS] Add docs for tensor dtype @jaychia (#1170)
- [DOCS] Add Flyte example @jaychia (#1150)
- [CHORE] Update README.rst typo @jaychia (#1141)
🧰 Maintenance
- [CHORE] Bump cargo version to 0.1.9 @jaychia (#1187)
- [CHORE] Exclude JSON pre-commit fixer for ipynb files @jaychia (#1184)
- [CHORE] New daft-plan crate; trait TreeDisplay @xcharleslin (#1176)
- [CHORE] More Parquet benchmarking @jaychia (#1160)
- [CHORE] Enable Parquet Integration tests for decimal types @samster25 (#1161)
- [CHORE] cache all crates @samster25 (#1158)
- [CHORE] move parquet unit tests under io @samster25 (#1157)
- [CHORE] [CI] use smarter github rust cache action @samster25 (#1156)
- [CHORE] bump profiling timeout @samster25 (#1155)
- [CHORE] Native Parquet Integration Tests @samster25 (#1154)
- [CHORE] Remove use of dirs_exist_ok, which was only added in Py3.8 @jaychia (#1153)
- [CHORE] Add parquet benchmarking @jaychia (#1151)
- [CHORE] Cleans up IO integration test fixtures for re-use @jaychia (#1152)
- [CHORE] Update README.rst typo @jaychia (#1141)
- [CHORE] No-op test for various parquet files @jaychia (#1130)
- [CHORE] Tidy typing for remaining binary ops: logical, comp @xcharleslin (#1124)
- [CHORE] Use workspace for cargo check @samster25 (#1127)
⬆️ Dependencies
- Bump orjson from 3.9.1 to 3.9.2 @dependabot (#1143)
- Bump pandas from 2.0.2 to 2.0.3 @dependabot (#1142)
- Bump snafu from 0.7.4 to 0.7.5 @dependabot (#1146)
- Bump serde_json from 1.0.99 to 1.0.100 @dependabot (#1147)
- Bump opencv-python from 4.7.0.72 to 4.8.0.74 @dependabot (#1117)
- Bump ray[data,default] from 2.4.0 to 2.5.1 @dependabot (#1074)
- Bump chrono-tz from 0.8.2 to 0.8.3 @dependabot (#1119)
- Bump pyo3 from 0.19.0 to 0.19.1 @dependabot (#1122)
- Bump async-trait from 0.1.68 to 0.1.71 @dependabot (#1126)
- Bump tokio from 1.28.2 to 1.29.1 @dependabot (#1120)
v0.1.8
Changes
✨ New Features
- [FEAT] Ranged Get Native Downloader @samster25 (#1113)
- [FEAT] Native S3 Downloader Anonymous Mode @samster25 (#1105)
- [FEAT] Enable reading a list of URLs in read_* APIs @jaychia (#1102)
- [FEAT] Arithmetic with timestamps and durations. @xcharleslin (#1103)
- [FEAT] Automatic Region Retrying for S3 Native Downloader @samster25 (#1098)
- [FEAT] Better styling of large dataframe cells in HTML @jaychia (#1097)
👾 Bug Fixes
- [BUG] S3 Downloader set default region when region not detected @samster25 (#1100)
📖 Documentation
- [CHORE] Update README.rst for image downloading @jaychia (#1109)
- [DOCS] Update image tutorials with .image namespaced expressions @jaychia (#1110)
🧰 Maintenance
- [CHORE] Tidy up typing of binary ops [1/2] @xcharleslin (#1114)
- [CHORE] Pin Pydantic to < 2 @jaychia (#1115)
- [CHORE] Remove rogue print statement @jaychia (#1112)
- [CHORE] Install wheel together with requirements in release build @jaychia (#1111)
- [CHORE] Update README.rst for image downloading @jaychia (#1109)
- [CHORE] Adding more test fixtures for different I/O sources @jaychia (#1083)
- [CHORE] Cache build artifacts in target folder @jaychia (#1104)
- [CHORE] Fix CI caching to cache integration test builds separately @jaychia (#1101)
- [CHORE] Use maturin directly instead of multiplatform build step @jaychia (#1099)
⬆️ Dependencies
- Bump serde_json from 1.0.97 to 1.0.99 @dependabot (#1095)
- Bump pytest from 7.3.2 to 7.4.0 @dependabot (#1089)
v0.1.7
Changes
🏆 Highlights
- [FEAT] Add DataFrame.to_torch_map_dataset and .to_torch_iter_dataset. @xcharleslin (#1086)
- [PERF] Rust based url downloading with error handling @samster25 (#1061)
✨ New Features
- [FEAT] Enable Native Downloader IO Config @samster25 (#1090)
- [FEAT] Add DataFrame.to_torch_map_dataset and .to_torch_iter_dataset. @xcharleslin (#1086)
- [FEAT] DataFrame.__iter__() and .iter_partitions() @xcharleslin (#1062)
- [FEAT] New DataType: Duration (without arithmetic) @xcharleslin (#1051)
- [FEAT] [Images] [9/N] Infer Image type for PIL images on ingress. @clarkzinzow (#1067)
- [FEAT] Automatically cast logical types to Python objects on Series.to_pylist(). @clarkzinzow (#1063)
- [FEAT] [Images] [8/N] Add encoding and resizing support for fixed-shape images. @clarkzinzow (#1052)
- Dataframe Iter 1/n: Physical plan streams results into Runner. @xcharleslin (#1060)
🚀 Performance Improvements
- [PERF] Rust based url downloading with error handling @samster25 (#1061)
👾 Bug Fixes
- [BUG] Fix remote mode typo @xcharleslin (#1092)
- [BUG] Reenable HTML viz hooks for np.ndarray and PIL Images @jaychia (#1078)
- [BUG] Fix string index bug in table repr @xcharleslin (#1079)
- [BUG] pin the version of python used in publishing @samster25 (#1068)
- [BUG] [CI] Fix merge conflict due to out-of-date base. @clarkzinzow (#1066)
📖 Documentation
- [FEAT] Add DataFrame.to_torch_map_dataset and .to_torch_iter_dataset. @xcharleslin (#1086)
- [CHORE] Fix filepath for autogeneration of .list.join docs @jaychia (#1084)
- In CI, limit tutorial to 500 rows @xcharleslin (#1076)
- [DOCS] Embeddings tutorial: Temporarily remove full dataset @xcharleslin (#1039)
- [DOCS] Remove release notes from documentation, link to Github instead @jaychia (#1049)
🧰 Maintenance
- [CHORE] set dependabot schedule to weekly @samster25 (#1085)
- [CHORE] Refactor integration test to use wheel built for release @jaychia (#1087)
- [CHORE] unpin numpy version for py<3.8 @jaychia (#1088)
- [CHORE] Fix filepath for autogeneration of .list.join docs @jaychia (#1084)
- [CHORE] Crate Smash v1 @samster25 (#1080)
- [CHORE] Scheduler cleanup: merge logical_op_runners.py into execution_step @xcharleslin (#1020)
- [CHORE] Inline the label enforcer into the release drafter wf @jaychia (#1057)
- [CHORE] Fix naming of "Release Drafter" workflow in trigger @jaychia (#1055)
- [CHORE] Add new trigger to run PR label enforcement after Release Drafter @jaychia (#1054)
- [CHORE][CI] Use pyarrow Table sort API that's compatible with older pyarrow versions @clarkzinzow (#1053)
⬆️ Dependencies
- Bump hypothesis from 6.79.1 to 6.79.2 @dependabot (#1082)
- [CHORE] Crate Smash v1 @samster25 (#1080)
- Bump hypothesis from 6.78.2 to 6.79.1 @dependabot (#1065)
- Bump numpy from 1.24.3 to 1.25.0 @dependabot (#1064)
v0.1.6
Changes
🏆 Highlights
- [FEAT] Support for Timestamp datatype. @xcharleslin (#1032)
✨ New Features
- [FEAT] Support for Timestamp datatype. @xcharleslin (#1032)
- [FEAT] Thread user-provided schema through to DataFrame reads @jaychia (#1024)
- [FEAT] Daft Image viz support. Remove Tabulate dependency. @xcharleslin (#1027)
- [FEAT] Dataframe Concats @jaychia (#1023)
- [FEAT] Add kernels for .list.join on a list[utf8] column @jaychia (#989)
- [FEAT][Table-Read-Schema 2/3] Add table casting logic @jaychia (#1012)
- [FEAT][Table-Read-Schema 1/3] Split reading tabular file formats into 2 method calls @jaychia (#1010)
- [FEAT][Images] [7/N] Add image encoding support. @clarkzinzow (#1013)
- [FEAT] Visualization cleanup 2/n: Add repr_html to Series, Table, and PyO3 @xcharleslin (#1018)
- [FEAT] Visualization cleanup (1/n): Use Table for repr @xcharleslin (#1011)
📖 Documentation
- [DOCS] Fix links to ray.io latest docs @jaychia (#1038)
- [DOCS] Add initial docs pass, adding lots of cross-reference links. @clarkzinzow (#1009)
- [DOCS][Images] [6/N] Fix image dtype docs. @clarkzinzow (#1008)
- 0.1.5 release notes @samster25 (#1007)
🧰 Maintenance
- [CHORE] Update cargo version to v0.1.6 @jaychia (#1047)
- [CHORE] Add a GitHub action to enforce labels are added to the PR before merging @jaychia (#1045)
- [CHORE] Fix CI TPCH data generation for old deprecated kwarg @jaychia (#1044)
- [CHORE] Fix footer of release-drafter @jaychia (#1043)
- [CHORE] Add release-drafter files @jaychia (#1042)
- [CHORE][CI] Fix flakiness in Datasets integration tests. @clarkzinzow (#1017)
⬆️ Dependencies
- Bump pytest from 7.3.1 to 7.3.2 @dependabot (#1034)
- Bump log from 0.4.18 to 0.4.19 @dependabot (#1036)
- Bump hypothesis from 6.76.0 to 6.78.2 @dependabot (#1040)
- Bump s3fs from 2023.5.0 to 2023.6.0 @dependabot (#1029)
- Bump orjson from 3.9.0 to 3.9.1 @dependabot (#1031)
- Bump dask from 2023.5.0 to 2023.6.0 @dependabot (#1030)
- Bump serde from 1.0.163 to 1.0.164 @dependabot (#1025)
v0.1.5
The Daft 0.1.5 release features better series exporting, bugfixes and improved documentation.
Enhancements
- Enable Cast from Image to Python via Numpy #990
Bug Fixes
- Fix Image Resize/Decode Expressions #1001
Build Changes
- Python script for subprefixing s3 tpch files #997
- Update pyo3-log from 0.8.1 to 0.8.2 #996
- Update hypothesis from 6.75.9 to 6.76.0 #995
Documentation
v0.1.4
Daft 0.1.4 Release Notes
The Daft 0.1.4 release features our Image type columns!
New Features
Image Types
Our first Daft Image types have landed!
You can now construct an Image column with .image.decode() on a Binary column.
See PRs:
- [Images] [1/N] Logical type for variable-shaped and fixed-shaped images. #955
- [Images] [3/N] Add image decoding for uint8 images. #981
- Image Resize for ImageType #967
- [Images] [2/N] Add scaffolding for image decoding and other ops. #965
Documentation
- Fix modin typo; add partial scale numbers; highlight highlights #986
- Add Page on benchmarking #980
- Fix link for broken link checker #972
- Clean up DataType userguide/API docs #966
- Add more complex datatypes to docs #961
- Datatype docs #894
Bug Fixes
- [Scheduler] Fix join performance bug #985
- Table Slice and IntoPartitions Fix #962
- size_bytes fix: Guard against calculating variance of one item #957
Build Changes
- Bump hypothesis from 6.75.8 to 6.75.9 #979
- Bump orjson from 3.8.14 to 3.9.0 #978
- Bump hypothesis from 6.75.6 to 6.75.8 #976
- Add s3fs to dev requirements #975
- Bump python version to 3.9 for profiling and pin Dask version for python 3.8 #973
- Bump log from 0.4.17 to 0.4.18 #971
- Bump dask from 2023.5.0 to 2023.5.1 #970
- Bump pandas from 2.0.1 to 2.0.2 #969
- Bump hypothesis from 6.75.5 to 6.75.6 #968
- Bump orjson from 3.8.13 to 3.8.14 #964
- Bump hypothesis from 6.75.3 to 6.75.5 #963
- finer feature flags for arrow2 for faster compile #960
- Bump pytest-cov from 4.0.0 to 4.1.0 #959
- Bump orjson from 3.8.12 to 3.8.13 #958
v0.1.3
Daft 0.1.3 Release Notes
The Daft 0.1.3 release features fixes for a few performance regressions.
Enhancements
- Very basic s3 parquet microbenchmark #954
Bug Fixes
- [I/O] Change back to random access read for Parquet. #953
- [CI] Fix flaky Ray Datasets integration test. #952
- [Ray Runner] Unfixing batch size for task awaiting #951
- Testing object related performance fixes #949
Build Changes
- [ci] [daft publish] pin urllib to < 2 for conda #950
v0.1.2
Daft 0.1.2 Release Notes
The Daft 0.1.2 release features performance improvements, bugfixes and some of our first Daft logical types!
New Features
Extension Types for Ray Runner and Embedding Logical Type
Adds our first “Logical Type”: Embeddings!
An Embedding is a “Logical Type” whose underlying physical representation is a Fixed Size List. Embeddings are common in Machine Learning and AI applications.
See: #929
Enhancements
- Use PyArrow filesystem for tabular file reads #939
- [I/O] Port to pyarrow filesystems by default. #942
- Memoize ray.get for batch metadata lookup #937
- [I/O] Expose user-provided fsspec filesystem arg in read APIs. #931
- Introduce Logical Arrays and SeriesLike Trait #920
- [Extension Types] Add support for cross-lang extension types. #899
Bug Fixes
- fix concats for extension array for old versions of pyarrow #944
Build Changes
- [ci] enable pyrunner for 310 #946
- Add Pyarrow 6.0 in matrix for CI testing #945
- Update requirement of tabulate to >=0.9.0 #940
- unpin numpy for 3.7 to get dependabot to stop complaining #938
- Bump slackapi/slack-github-action from 1.23.0 to 1.24.0 #936
- Bump hypothesis from 6.75.2 to 6.75.3 #928
- Bump dask from 2023.4.1 to 2023.5.0 #927
- Bump serde from 1.0.162 to 1.0.163 #921