Releases: Eventual-Inc/Daft

v0.1.11

11 Aug 19:36
48b46b3

Changes

✨ New Features

  • [FEAT] [New Query Plan] Add support for Projection and Coalesce, enable many tests @clarkzinzow (#1256)
  • [FEAT] [New Query Planner] Add support for Concat. @clarkzinzow (#1254)
  • [FEAT] [New Query Planner] Add support for tabular writes. @clarkzinzow (#1252)
  • [FEAT] Multi-partition aggregate; Coalesce @xcharleslin (#1249)
  • [FEAT] [New Query Planner] Add support for Sort, Repartition, and Distinct in new query planner. @clarkzinzow (#1248)
  • [FEAT] Add Azure Support for Native Downloader @samster25 (#1250)
  • [FEAT] Locally unique semantic IDs for Expressions @xcharleslin (#1243)
  • [FEAT] Read parquet tables with int96 coercion option @jaychia (#1231)
  • [FEAT] [New Query Plan] Add support for CSV scans, JSON scans, in-memory scans and caching materialized results. @clarkzinzow (#1246)
  • [FEAT] Native Downloader add Retry Config parameters @samster25 (#1244)
  • [FEAT] (Single partition only) DataFrame.sum() via Rust planner @xcharleslin (#1230)
  • [FEAT] [New Query Planner] Logical --> physical translation, physical plan execution. @clarkzinzow (#1232)
  • [FEAT] native parquet correctness checks @samster25 (#1225)
  • [FEAT] add session token as input to io config @samster25 (#1224)

🚀 Performance Improvements

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

  • [CHORE] Update test to only use store_schema kwarg for pa>=11 @jaychia (#1253)
  • [FEAT] (Single partition only) DataFrame.sum() via Rust planner @xcharleslin (#1230)
  • [CHORE] [New Query Planner] Introduce LogicalPlanBuilder and QueryPlanner interfaces to hide query planner implementations. @clarkzinzow (#1245)
  • [CHORE] LogicalPlan: Add display improvements, and Filter @xcharleslin (#1221)
  • [CHORE] Add unit tests for int96 timestamps @jaychia (#1229)
  • [DOCS][CHORE] Add docs for IOConfig and S3Config @jaychia (#1227)
  • [CHORE] disable mac test for lack of docker @samster25 (#1223)
  • [CHORE] Begin integrating Rust Logical Plan with Dataframe API @xcharleslin (#1207)
  • [CHORE] integration tests for nightly platform wheels @samster25 (#1219)
  • [CHORE] Remove existing LogicalPlan from all execution concepts @xcharleslin (#1208)
  • [CHORE] Add endpoints to simulate rate-limiting on AWS S3 buckets @jaychia (#1220)
  • [CHORE] Add pytest marker for integration @jaychia (#1211)
  • [CHORE] Add s3 fixtures for retrying logic @jaychia (#1206)
  • [CHORE] Add developer flag to use Rust query planner @xcharleslin (#1205)
  • [CHORE] Rust Logical plan skeleton @xcharleslin (#1192)

⬆️ Dependencies

7 changes

v0.1.10

31 Jul 18:36
bc11e57

Changes

✨ New Features

  • [FEAT] Enable feature-flagged native downloader in daft.read_parquet @jaychia (#1190)
  • [FEAT] parquet reader refactor, add parquet_stats_reader and parquet_schema_reader (1/2) @samster25 (#1191)

🚀 Performance Improvements

🧰 Maintenance

⬆️ Dependencies

6 changes

v0.1.9

25 Jul 04:16
751b839

Changes

🏆 Highlights

  • [FEAT] [Tensor] Add support for Tensor and FixedShapeTensor types. @clarkzinzow (#1073)

✨ New Features

🚀 Performance Improvements

  • [PERF] Simple Read Planner and RangeReader for Native Parquet Reader @samster25 (#1172)

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

⬆️ Dependencies

10 changes

v0.1.8

05 Jul 18:53
447cb2f

Changes

✨ New Features

👾 Bug Fixes

  • [BUG] S3 Downloader set default region when region not detected @samster25 (#1100)

📖 Documentation

  • [CHORE] Update README.rst for image downloading @jaychia (#1109)
  • [DOCS] Update image tutorials with .image namespaced expressions @jaychia (#1110)

🧰 Maintenance

  • [CHORE] Tidy up typing of binary ops [1/2] @xcharleslin (#1114)
  • [CHORE] Pin Pydantic to < 2 @jaychia (#1115)
  • [CHORE] Remove rogue print statement @jaychia (#1112)
  • [CHORE] Install wheel together with requirements in release build @jaychia (#1111)
  • [CHORE] Update README.rst for image downloading @jaychia (#1109)
  • [CHORE] Adding more test fixtures for different I/O sources @jaychia (#1083)
  • [CHORE] Cache build artifacts in target folder @jaychia (#1104)
  • [CHORE] Fix CI caching to cache integration test builds separately @jaychia (#1101)
  • [CHORE] Use maturin directly instead of multiplatform build step @jaychia (#1099)

⬆️ Dependencies

v0.1.7

26 Jun 18:36
9063395

Changes

🏆 Highlights

  • [FEAT] Add DataFrame.to_torch_map_dataset and .to_torch_iter_dataset. @xcharleslin (#1086)
  • [PERF] Rust based url downloading with error handling @samster25 (#1061)

✨ New Features

  • [FEAT] Enable Native Downloader IO Config @samster25 (#1090)
  • [FEAT] Add DataFrame.to_torch_map_dataset and .to_torch_iter_dataset. @xcharleslin (#1086)
  • [FEAT] DataFrame.__iter__() and .iter_partitions() @xcharleslin (#1062)
  • [FEAT] New DataType: Duration (without arithmetic) @xcharleslin (#1051)
  • [FEAT] [Images] [9/N] Infer Image type for PIL images on ingress. @clarkzinzow (#1067)
  • [FEAT] Automatically cast logical types to Python objects on Series.to_pylist(). @clarkzinzow (#1063)
  • [FEAT] [Images] [8/N] Add encoding and resizing support for fixed-shape images. @clarkzinzow (#1052)
  • Dataframe Iter 1/n: Physical plan streams results into Runner. @xcharleslin (#1060)

🚀 Performance Improvements

👾 Bug Fixes

📖 Documentation

  • [FEAT] Add DataFrame.to_torch_map_dataset and .to_torch_iter_dataset. @xcharleslin (#1086)
  • [CHORE] Fix filepath for autogeneration of .list.join docs @jaychia (#1084)
  • In CI, limit tutorial to 500 rows @xcharleslin (#1076)
  • [DOCS] Embeddings tutorial: Temporarily remove full dataset @xcharleslin (#1039)
  • [DOCS] Remove release notes from documentation, link to Github instead @jaychia (#1049)

🧰 Maintenance

  • [CHORE] set dependabot schedule to weekly @samster25 (#1085)
  • [CHORE] Refactor integration test to use wheel built for release @jaychia (#1087)
  • [CHORE] unpin numpy version for py<3.8 @jaychia (#1088)
  • [CHORE] Fix filepath for autogeneration of .list.join docs @jaychia (#1084)
  • [CHORE] Crate Smash v1 @samster25 (#1080)
  • [CHORE] Scheduler cleanup: merge logical_op_runners.py into execution_step @xcharleslin (#1020)
  • [CHORE] Inline the label enforcer into the release drafter wf @jaychia (#1057)
  • [CHORE] Fix naming of "Release Drafter" workflow in trigger @jaychia (#1055)
  • [CHORE] Add new trigger to run PR label enforcement after Release Drafter @jaychia (#1054)
  • [CHORE][CI] Use pyarrow Table sort API that's compatible with older pyarrow versions @clarkzinzow (#1053)

⬆️ Dependencies

4 changes

v0.1.6

14 Jun 22:12
23f784f

Changes

🏆 Highlights

✨ New Features

  • [FEAT] Support for Timestamp datatype. @xcharleslin (#1032)
  • [FEAT] Thread user-provided schema through to DataFrame reads @jaychia (#1024)
  • [FEAT] Daft Image viz support. Remove Tabulate dependency. @xcharleslin (#1027)
  • [FEAT] Dataframe Concats @jaychia (#1023)
  • [FEAT] Add kernels for .list.join on a list[utf8] column @jaychia (#989)
  • [FEAT][Table-Read-Schema 2/3] Add table casting logic @jaychia (#1012)
  • [FEAT][Table-Read-Schema 1/3] Split reading tabular file formats into 2 method calls @jaychia (#1010)
  • [FEAT][Images] [7/N] Add image encoding support. @clarkzinzow (#1013)
  • [FEAT] Visualization cleanup 2/n: Add repr_html to Series, Table, and PyO3 @xcharleslin (#1018)
  • [FEAT] Visualization cleanup (1/n): Use Table for repr @xcharleslin (#1011)

📖 Documentation

🧰 Maintenance

  • [CHORE] Update cargo version to v0.1.6 @jaychia (#1047)
  • [CHORE] Add a GitHub action to enforce labels are added to the PR before merging @jaychia (#1045)
  • [CHORE] Fix CI TPCH data generation for old deprecated kwarg @jaychia (#1044)
  • [CHORE] Fix footer of release-drafter @jaychia (#1043)
  • [CHORE] Add release-drafter files @jaychia (#1042)
  • [CHORE][CI] Fix flakiness in Datasets integration tests. @clarkzinzow (#1017)

⬆️ Dependencies

7 changes

v0.1.5

14 Jun 02:30
a54c534

The Daft 0.1.5 release features better Series exporting, bug fixes, and improved documentation.

Enhancements

  • Enable Cast from Image to Python via Numpy #990

Bug Fixes

  • Fix Image Resize/Decode Expressions #1001

Build Changes

  • Python script for subprefixing s3 tpch files #997
  • Update pyo3-log from 0.8.1 to 0.8.2 #996
  • Update hypothesis from 6.75.9 to 6.76.0 #995

Documentation

  • Include Dataframe comparison and related projects in readme #1005
  • Include Benchmarks in Readme #1003
  • Add Red Pajamas Tutorial to docs #1002
  • Include Blog in docs #1000
  • Update Datatype docs for complex types #999

v0.1.4

14 Jun 22:34
f74f0cb

Daft 0.1.4 Release Notes

The Daft 0.1.4 release features our Image type columns!

New Features

Image Types

Our first Daft Image types have landed!

You can now construct an Image column with .image.decode() on a Binary column.
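As a rough illustration of the decode step described above, the sketch below builds a tiny PNG using only the standard library and then decodes it into an Image column. The helper names `make_png` and `decode_images` are hypothetical and not part of Daft. The Daft calls (`daft.from_pydict`, `DataFrame.with_column`, and the `.image.decode()` expression) are assumed to be available in your installed version, and decoding may require an image library such as Pillow depending on the release.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_png(width: int, height: int, rgb=(255, 0, 0)) -> bytes:
    """Hypothetical helper: encode a solid-color 8-bit RGB PNG in memory."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB), no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter byte 0, followed by the RGB pixels.
    row = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(row * height)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )


def decode_images(encoded: list):
    """Hypothetical helper: wrap encoded bytes in a DataFrame and decode them."""
    import daft  # imported lazily so the PNG helper works without Daft installed

    df = daft.from_pydict({"bytes": encoded})  # "bytes" is a Binary column
    # .image.decode() turns the Binary column into an Image column.
    return df.with_column("image", df["bytes"].image.decode())
```

Calling `decode_images([make_png(2, 2)])` should return a DataFrame whose `image` column holds the decoded image. Because Daft evaluates lazily, nothing is decoded until the DataFrame is materialized, for example with `.collect()`.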

See PRs:

  1. [Images] [1/N] Logical type for variable-shaped and fixed-shaped images. #955
  2. [Images] [3/N] Add image decoding for uint8 images. #981
  3. Image Resize for ImageType #967
  4. [Images] [2/N] Add scaffolding for image decoding and other ops. #965

Documentation

  1. Fix modin typo; add partial scale numbers; highlight highlights #986
  2. Add Page on benchmarking #980
  3. Fix link for broken link checker #972
  4. Clean up DataType userguide/API docs #966
  5. Add more complex datatypes to docs #961
  6. Datatype docs #894

Bug Fixes

  1. [Scheduler] Fix join performance bug #985
  2. Table Slice and IntoPartitions Fix #962
  3. size_bytes fix: Guard against calculating variance of one item #957

Build Changes

  1. Bump hypothesis from 6.75.8 to 6.75.9 #979
  2. Bump orjson from 3.8.14 to 3.9.0 #978
  3. Bump hypothesis from 6.75.6 to 6.75.8 #976
  4. Add s3fs to dev requirements #975
  5. Bump python version to 3.9 for profiling and pin Dask version for python 3.8 #973
  6. Bump log from 0.4.17 to 0.4.18 #971
  7. Bump dask from 2023.5.0 to 2023.5.1 #970
  8. Bump pandas from 2.0.1 to 2.0.2 #969
  9. Bump hypothesis from 6.75.5 to 6.75.6 #968
  10. Bump orjson from 3.8.13 to 3.8.14 #964
  11. Bump hypothesis from 6.75.3 to 6.75.5 #963
  12. finer feature flags for arrow2 for faster compile #960
  13. Bump pytest-cov from 4.0.0 to 4.1.0 #959
  14. Bump orjson from 3.8.12 to 3.8.13 #958

v0.1.3

14 Jun 22:32
e0140ec

Daft 0.1.3 Release Notes

The Daft 0.1.3 release features fixes for a few performance regressions.

Enhancements

  1. Very basic s3 parquet microbenchmark #954

Bug Fixes

  1. [I/O] Change back to random access read for Parquet. #953
  2. [CI] Fix flaky Ray Datasets integration test. #952
  3. [Ray Runner] Unfixing batch size for task awaiting #951
  4. Testing object related performance fixes #949

Build Changes

  1. [ci] [daft publish] pin urllib to < 2 for conda #950

v0.1.2

14 Jun 22:26
05cd7a3

Daft 0.1.2 Release Notes

The Daft 0.1.2 release features performance improvements, bug fixes, and some of our first Daft logical types!

New Features

Extension Types for Ray Runner and Embedding Logical Type

Adds our first “Logical Type”: Embeddings!

An Embedding is a “Logical Type” backed by a Fixed Size List. Embeddings are common in Machine Learning and AI applications.

See: #929

Enhancements

  1. Use PyArrow filesystem for tabular file reads #939
  2. [I/O] Port to pyarrow filesystems by default. #942
  3. Memoize ray.get for batch metadata lookup #937
  4. [I/O] Expose user-provided fsspec filesystem arg in read APIs. #931
  5. Introduce Logical Arrays and SeriesLike Trait #920
  6. [Extension Types] Add support for cross-lang extension types. #899

Bug Fixes

  1. fix concats for extension array for old versions of pyarrow #944

Build Changes

  1. [ci] enable pyrunner for 310 #946
  2. Add Pyarrow 6.0 in matrix for CI testing #945
  3. Update requirement of tabulate to >=0.9.0 #940
  4. unpin numpy for 3.7 to get dependabot to stop complaining #938
  5. Bump slackapi/slack-github-action from 1.23.0 to 1.24.0 #936
  6. Bump hypothesis from 6.75.2 to 6.75.3 #928
  7. Bump dask from 2023.4.1 to 2023.5.0 #927
  8. Bump serde from 1.0.162 to 1.0.163 #921

Documentation

  1. Add comment to explain future annotations isort rule in dataframe.py #947
  2. [Embedding tutorial] Suggest running on GPU cluster #932
  3. Embeddings tutorial #930