Releases · pola-rs/polars

30 Oct 12:13

github-actions

py-1.35.1

a99ad34

Python Polars 1.35.1 Latest

🚀 Performance improvements

Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
Skip filtering scan IR if no paths were filtered (#25037)
Optimize ipc stream read performance (#24671)
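
The first item above concerns rolling windows from which nulls slide out. As a rough illustration of the general idea (not the Polars implementation), a rolling sum can keep a running total plus a count of the nulls currently inside the window. When a null leaves, only the counter changes and nothing is recomputed. The function name `rolling_sum` and the choice to emit `None` whenever the window is incomplete or contains a null are assumptions made for this sketch.

```python
from collections import deque
from typing import Optional


def rolling_sum(values: list[Optional[float]], window: int) -> list[Optional[float]]:
    """Rolling sum that yields None while the window is incomplete or holds a null."""
    buf: deque = deque()
    total = 0.0      # running sum of the non-null values in the window
    null_count = 0   # number of nulls currently inside the window
    out: list[Optional[float]] = []
    for v in values:
        buf.append(v)
        if v is None:
            null_count += 1
        else:
            total += v
        if len(buf) > window:
            old = buf.popleft()
            if old is None:
                null_count -= 1  # a null left the window: only the counter changes
            else:
                total -= old
        if len(buf) < window or null_count > 0:
            out.append(None)
        else:
            out.append(total)
    return out


result = rolling_sum([1.0, None, 2.0, 3.0, 4.0], window=2)
print(result)  # [None, None, None, 5.0, 7.0]
```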

✨ Enhancements

Support BYTE_ARRAY backed Decimals in Parquet (#25076)
Allow glimpse to return a DataFrame (#24803)
Add allow_empty flag to item (#25048)

🐞 Bug fixes

The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
Fix panic if scan predicate produces 0 length mask (#25089)
Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
Panic in group_by_dynamic with group_by and multiple chunks (#25075)
Minor improvement to internal is_pycapsule utility function (#25073)
Fix panic when using struct field as join key (#25059)
Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
Fix field metadata for nested categorical PyCapsule export (#25052)
Block predicate pushdown when group_by key values are changed (#25032)
Group-By aggregation problems caused by AmortSeries (#25043)
Don't push down predicates past inserted cache nodes (#25042)
Allow for negative time in group_by_dynamic iterator (#25041)
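
The first fix in this list distinguishes logical from bitwise negation. The difference is easy to see in plain Python, used here only as an analogy and not as a model of the Polars SQL engine: bitwise NOT flips every bit of an integer, while logical NOT maps a truth value to its opposite.

```python
# Bitwise NOT flips all bits of the two's-complement integer: ~x == -x - 1.
bitwise_results = [~0, ~1]

# Logical NOT treats the operand as a truth value.
logical_results = [not 0, not 1]

print(bitwise_results)  # [-1, -2]
print(logical_results)  # [True, False]
```

Under bitwise semantics the negation of 1 is -2, which is still truthy, so the two interpretations can disagree on which rows a filter keeps.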

📖 Documentation

Fix typo in public dataset URL (#25044)

🛠️ Other improvements

Disable recursive CSPE for now (#25085)
Change group length mismatch error to ShapeError (#25004)
Update toolchain (#25007)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @Liyixin95, @alexander-beedie, @coastalwhite, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

26 Oct 20:05

github-actions

py-1.35.0

e9fce55

Python Polars 1.35.0

🏆 Highlights

Stabilize decimal (#25020)

🚀 Performance improvements

Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
Lower unique to native group-by and speed up n_unique in group-by context (#24976)
Better parallelize take{_slice,}_unchecked (#24980)
Implement native skew and kurtosis in group-by context (#24961)
Use native group-by aggregations for bitwise_* operations (#24935)
Address group_by_dynamic slowness in sparse data (#24916)
Push filters to PyIceberg (#24910)
Native filter/drop_nulls/drop_nans in group-by context (#24897)
Implement cumulative_eval using the group-by engine (#24889)
Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
Implement native null_count, any and all group-by aggregations (#24859)
Speed up reverse in group-by context (#24855)
Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
Don't check duplicates on streaming simple projection in release mode (#24830)
Lower approx_n_unique to the streaming engine (#24821)
Duration/interval string parsing optimisation (2-5x faster) (#24771)
Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
Implement indexed method for BitMapIter::nth (#24766)
Pushdown slices on plans within unions (#24735)
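
The last item, pushing slices into the inputs of a union, rests on a simple observation: the first n rows of an ordered concatenation can only come from the first n rows of each input. A minimal sketch with plain Python lists follows. The helper names `head_of_union` and `head_of_union_pushed` are hypothetical, only a head slice (offset 0) is covered, and no Polars internals are modelled.

```python
from itertools import chain, islice


def head_of_union(inputs: list[list[int]], n: int) -> list[int]:
    """Concatenate all inputs, then keep the first n rows."""
    return list(islice(chain.from_iterable(inputs), n))


def head_of_union_pushed(inputs: list[list[int]], n: int) -> list[int]:
    """Truncate each input to n rows first, then concatenate and slice."""
    truncated = [inp[:n] for inp in inputs]
    return list(islice(chain.from_iterable(truncated), n))


inputs = [[1, 2], [3, 4, 5, 6, 7], [8, 9]]
plain = head_of_union(inputs, 4)
pushed = head_of_union_pushed(inputs, 4)
print(plain, pushed)  # [1, 2, 3, 4] [1, 2, 3, 4]
```

Both functions return the same rows, but the second never needs more than n rows from any single input.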

✨ Enhancements

Stabilize decimal (#25020)
Support ewm_mean() in streaming engine (#25003)
Improve row-count estimates (#24996)
Remove filtered scan paths in IR when possible (#24974)
Introduce remote Polars MCP server (#24977)
Allow local scans on polars cloud (configurable) (#24962)
Add Expr.item to strictly extract a single value from an expression (#24888)
Add environment variable to roundtrip empty struct in Parquet (#24914)
Fast-count for scan_iceberg().select(len()) (#24602)
Add glob parameter to scan_ipc (#24898)
Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
Add list.agg and arr.agg (#24790)
Implement {Expr,Series}.rolling_rank() (#24776)
Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
Support MergeSorted in CSPE (#24805)
Duration/interval string parsing optimisation (2-5x faster) (#24771)
Recursively apply CSPE (#24798)
Add streaming engine per-node metrics (#24788)
Add arr.eval (#24472)
Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
Improve rolling_(sum|mean) accuracy (#24743)
Add separator to {Data,Lazy}Frame.unnest (#24716)
Add union() function for unordered concatenation (#24298)
Add name.replace to the set of column rename options (#17942)
Support np.ndarray -> AnyValue conversion (#24748)
Allow duration strings with leading "+" (#24737)
Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
Add support for UInt128 to pyo3-polars (#24731)

🐞 Bug fixes

Re-enable CPU feature check before import (#25010)
Implement read_excel workaround for fastexcel/calamine issue loading a column subset from a named table (#25012)
Fix correctness of any(ignore_nulls) and out-of-bounds access in all (#25005)
Streaming any/all with ignore_nulls=False (#25008)
Fix incorrect join_asof on a casted expression (#25006)
Optimize memory on rolling groups in ApplyExpr (#24709)
Fallback Pyarrow scan to in-memory engine (#24991)
Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
Capitalize letters after numbers in to_titlecase (#24993)
Preserve null values in pct_change (#24952)
Raise length mismatch on over with sliced groups (#24887)
Check duplicate name in transpose (#24956)
Follow Kleene logic in any / all for group-by (#24940)
Do not optimize cross join to iejoin if order maintaining (#24950)
Fix typing of scan_parquet partially unknown (#24928)
Properly release the GIL for read_parquet_metadata (#24922)
Broadcast partition_by columns in over expression (#24874)
Clear index cache on stacked df.filter expressions (#24870)
Fix 'explode' mapping strategy on scalar value (#24861)
Fix repeated with_row_index() after scan() silently ignored (#24866)
Correctly return min and max for enums in groupby aggregation (#24808)
Refactor BinaryExpr in group_by dispatch logic (#24548)
Fix aggstate for gather (#24857)
Keep scalars for length preserving functions in group_by (#24819)
Have range feature depend on dtype-array feature (#24853)
Fix duplicate select panic (#24836)
Inconsistency of list.sum() result type with None values (#24476)
Division by zero in Expr.dt.truncate (#24832)
Potential deadlock in __arrow_c_stream__ (#24831)
Allow double aggregations in group-by contexts (#24823)
Series.shrink_dtype for i128/u128 (#24833)
Fix dtype in EvalExpr (#24650)
Allow aggregations on AggState::LiteralScalar (#24820)
Dispatch to group_aware for fallible expressions with masked out elements (#24815)
Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
Fix XOR not following Kleene logic when one side is unit-length (#24810)
Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
Incorrect precision in Series.str.to_decimal (#24804)
Use overlapping instead of rolling (#24787)
Fix iterable on dynamic_group_by and rolling object (#24740)
Use Kahan summation for in-memory groupby sum/mean (#24774)
Release GIL in PythonScan predicate evaluation (#24779)
Type error in bitmask::nth_set_bit_u64 (#24775)
Add Expr.sign for Decimal datatype (#24717)
Correct str.replace with missing pattern (#24768)
Ensure schema_overrides is respected when loading iterable row data (#24721)
Support decimal_comma on Decimal type in write_csv (#24718)
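
One fix in this list switches in-memory group-by sum/mean to Kahan summation (#24774). For readers unfamiliar with the technique, here is a minimal pure-Python sketch of textbook Kahan summation. It is not the Polars code, and `naive_sum` and `kahan_sum` are names chosen for this example. The naive version is written as an explicit loop so its behaviour does not depend on how the built-in `sum` is implemented.

```python
def naive_sum(values: list[float]) -> float:
    """Plain left-to-right accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values: list[float]) -> float:
    """Kahan (compensated) summation."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost part
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# Ten tiny values that are each too small to change 1.0 on their own.
values = [1.0] + [1e-16] * 10
naive = naive_sum(values)
compensated = kahan_sum(values)
print(naive == 1.0)       # True: every tiny addend was rounded away
print(compensated > 1.0)  # True: the compensation term preserved them
```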

📖 Documentation

Introduce remote Polars MCP server (#24977)
Add {arr,list}.agg API references (#24970)
Support LLM in docs (#24958)
Update Cloud docs with correct fn argument order (#24939)
Update name.replace examples (#24941)
Add i128 and u128 features to user guide (#24938)
Add partitioning examples for sink_* methods (#24918)
Add more {unique,value}_counts examples (#24927)
Indent the versionchanged (#24783)
Relax fsspec wording (#24881)
Add pl.field into the api docs (#24846)
Fix duplicated article in SECURITY.md (#24762)
Document output name determination in when/then/otherwise (#24746)
Specify that precision=None becomes 38 for Decimal (#24742)
Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
Fix source mapping (#24736)

📦 Build system

Ensure build_feature_flags.py is included in artifact (#25024)
Update pyo3 and numpy crates to version 0.26 (#24760)

🛠️ Other improvements

Fix benchmark ci (#25019)
Fix non-deterministic test (#25009)
Fix makefile arch detection (#25011)
Make LazyFrame.set_sorted into a FunctionIR::Hint (#24981)
Remove symbolic links (#24982)
Deprecate Expr.agg_groups() and pl.groups() (#24919)
Dispatch to no-op rayon thread-pool from streaming (#24957)
Unpin pydantic (#24955)
Ensure safety of scan fast-count IR lowering in streaming (#24953)
Re-use iterators in set_ operations (#24850)
Remove GroupByPartitioned and dispatch to streaming engine (#24903)
Turn element() into {A,}Expr::Element (#24885)
Pass ScanOptions to new_from_ipc (#24893)
Update tests to be index type agnostic (#24891)
Unset Context in Window expression (#24875)
Fix failing delta test (#24867)
Move FunctionExpr dispatch from plan to expr (#24839)
Fix SQL test giving wrong error message (#24835)
Consolidate dtype paths in ApplyExpr (#24825)
Add days_in_month to documentation (#24822)
Enable ruff D417 lint (#24814)
Turn pl.format into proper elementwise expression (#24811)
Fix remote benchmark by no longer saving builds (#24812)
Refactor ApplyExpr in group_by context on multiple inputs (#24520)
IR text plan graph generator (#24733)
Temporarily pin pydantic to fix CI (#24797)
Extend and rename rolling groups to overlapping (#24577)
Refactor DataType proptest strategies (#24763)
Add union to documentation (#24769)

Thank you to all our contributors for making this release possible!
@EndPositive, @EnricoMi, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @mjanssen, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @thomasjpfan and @williambdean

19 Oct 15:18

github-actions

py-1.35.0-beta.1

a6fa669

Python Polars 1.35.0-beta.1 Pre-release

🚀 Performance improvements

Address group_by_dynamic slowness in sparse data (#24916)
Push filters to PyIceberg (#24910)
Native filter/drop_nulls/drop_nans in group-by context (#24897)
Implement cumulative_eval using the group-by engine (#24889)
Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
Implement native null_count, any and all group-by aggregations (#24859)
Speed up reverse in group-by context (#24855)
Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
Don't check duplicates on streaming simple projection in release mode (#24830)
Lower approx_n_unique to the streaming engine (#24821)
Duration/interval string parsing optimisation (2-5x faster) (#24771)
Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
Implement indexed method for BitMapIter::nth (#24766)
Pushdown slices on plans within unions (#24735)

✨ Enhancements

Add environment variable to roundtrip empty struct in Parquet (#24914)
Fast-count for scan_iceberg().select(len()) (#24602)
Add glob parameter to scan_ipc (#24898)
Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
Add list.agg and arr.agg (#24790)
Implement {Expr,Series}.rolling_rank() (#24776)
Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
Support MergeSorted in CSPE (#24805)
Duration/interval string parsing optimisation (2-5x faster) (#24771)
Recursively apply CSPE (#24798)
Add streaming engine per-node metrics (#24788)
Add arr.eval (#24472)
Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
Improve rolling_(sum|mean) accuracy (#24743)
Add separator to {Data,Lazy}Frame.unnest (#24716)
Add union() function for unordered concatenation (#24298)
Add name.replace to the set of column rename options (#17942)
Support np.ndarray -> AnyValue conversion (#24748)
Allow duration strings with leading "+" (#24737)
Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
Add support for UInt128 to pyo3-polars (#24731)

🐞 Bug fixes

Properly release the GIL for read_parquet_metadata (#24922)
Broadcast partition_by columns in over expression (#24874)
Clear index cache on stacked df.filter expressions (#24870)
Fix 'explode' mapping strategy on scalar value (#24861)
Fix repeated with_row_index() after scan() silently ignored (#24866)
Correctly return min and max for enums in groupby aggregation (#24808)
Refactor BinaryExpr in group_by dispatch logic (#24548)
Fix aggstate for gather (#24857)
Keep scalars for length preserving functions in group_by (#24819)
Have range feature depend on dtype-array feature (#24853)
Fix duplicate select panic (#24836)
Inconsistency of list.sum() result type with None values (#24476)
Division by zero in Expr.dt.truncate (#24832)
Potential deadlock in __arrow_c_stream__ (#24831)
Allow double aggregations in group-by contexts (#24823)
Series.shrink_dtype for i128/u128 (#24833)
Fix dtype in EvalExpr (#24650)
Allow aggregations on AggState::LiteralScalar (#24820)
Dispatch to group_aware for fallible expressions with masked out elements (#24815)
Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
Fix XOR not following Kleene logic when one side is unit-length (#24810)
Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
Incorrect precision in Series.str.to_decimal (#24804)
Use overlapping instead of rolling (#24787)
Fix iterable on dynamic_group_by and rolling object (#24740)
Use Kahan summation for in-memory groupby sum/mean (#24774)
Release GIL in PythonScan predicate evaluation (#24779)
Type error in bitmask::nth_set_bit_u64 (#24775)
Add Expr.sign for Decimal datatype (#24717)
Correct str.replace with missing pattern (#24768)
Ensure schema_overrides is respected when loading iterable row data (#24721)
Support decimal_comma on Decimal type in write_csv (#24718)
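
The XOR fix above (#24810) refers to Kleene (three-valued) logic, in which a null stands for "unknown". A small pure-Python sketch, using `None` for null, shows what that implies for XOR when a unit-length side is broadcast against a longer one. The helpers `kleene_xor` and `broadcast_xor` are hypothetical names for this illustration and do not mirror Polars internals.

```python
from typing import Optional


def kleene_xor(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """XOR under Kleene logic: unknown if either operand is unknown."""
    if a is None or b is None:
        return None
    return a != b


def broadcast_xor(left: list, right: list) -> list:
    """Element-wise Kleene XOR, broadcasting a unit-length side."""
    if len(left) == 1:
        left = left * len(right)
    elif len(right) == 1:
        right = right * len(left)
    return [kleene_xor(a, b) for a, b in zip(left, right)]


with_true = broadcast_xor([True], [True, False, None])
with_null = broadcast_xor([None], [True, False, None])
print(with_true)  # [False, True, None]
print(with_null)  # [None, None, None]
```

Unlike OR and AND, XOR can never resolve an unknown operand, so a null on either side always yields null.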

📖 Documentation

Add partitioning examples for sink_* methods (#24918)
Add more {unique,value}_counts examples (#24927)
Indent the versionchanged (#24783)
Relax fsspec wording (#24881)
Add pl.field into the api docs (#24846)
Fix duplicated article in SECURITY.md (#24762)
Document output name determination in when/then/otherwise (#24746)
Specify that precision=None becomes 38 for Decimal (#24742)
Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
Fix source mapping (#24736)

📦 Build system

Update pyo3 and numpy crates to version 0.26 (#24760)

🛠️ Other improvements

Re-use iterators in set_ operations (#24850)
Remove GroupByPartitioned and dispatch to streaming engine (#24903)
Turn element() into {A,}Expr::Element (#24885)
Pass ScanOptions to new_from_ipc (#24893)
Update tests to be index type agnostic (#24891)
Unset Context in Window expression (#24875)
Fix failing delta test (#24867)
Move FunctionExpr dispatch from plan to expr (#24839)
Fix SQL test giving wrong error message (#24835)
Consolidate dtype paths in ApplyExpr (#24825)
Add days_in_month to documentation (#24822)
Enable ruff D417 lint (#24814)
Turn pl.format into proper elementwise expression (#24811)
Fix remote benchmark by no longer saving builds (#24812)
Refactor ApplyExpr in group_by context on multiple inputs (#24520)
IR text plan graph generator (#24733)
Temporarily pin pydantic to fix CI (#24797)
Extend and rename rolling groups to overlapping (#24577)
Refactor DataType proptest strategies (#24763)
Add union to documentation (#24769)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @nameexhaustion, @orlp, @pavelzw, @ritchie46, @thomasjpfan and @williambdean

02 Oct 18:31

github-actions

py-1.34.0

150a9ed

Python Polars 1.34.0

🏆 Highlights

Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

Optimize gather_every(n=1) to slice (#24704)
Lower null count to streaming engine (#24703)
Native streaming gather_every (#24700)
Pushdown filter with strptime if input is literal (#24694)
Avoid copying expanded paths (#24669)
Relax filter expr ordering (#24662)
Remove unnecessary groups call in aggregated (#24651)
Skip files in scan_iceberg with filter based on metadata statistics (#24547)
Push row_index predicate for all scan types (#24537)
Perform integer in-filtering for Parquet inequality predicates (#24525)
Stop caching Parquet metadata after 8 files (#24513)
Native streaming .mode() expression (#24459)

✨ Enhancements

Implement maintain_order for cross join (#24665)
Add support to output dt.total_{}() duration values as fractionals (#24598)
Avoid forcing a pyarrow dependency in read_excel when using the default "calamine" engine (#24655)
Support scanning from file:/path URIs (#24603)
Log which file the schema was sourced from, and which file caused an extra column error (#24621)
Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)
Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
Use fixed-scale Decimals (#24542)
Add support for unsigned 128-bit integers (#24346)
Add unstable pl.Config.set_default_credential_provider (#24434)
Roundtrip BinaryOffset type through Parquet (#24344)
Add opt-in unstable functionality to load interval types as Struct (#24320)
Support reading parquet metadata from cloud storage (#24443)
Add user guide section on AWS role assumption (#24421)
Support unique / n_unique / arg_unique for array columns (#24406)
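
Among the items above, `file:/path` is the short form of a local file URI, with the empty authority part left out. The standard-library snippet below illustrates only the URI syntax, not how Polars resolves paths: both the short and the triple-slash spelling parse to the same scheme and path. The example path is made up.

```python
from urllib.parse import urlparse

# Two spellings of the same local file URI (hypothetical path).
short_form = urlparse("file:/data/part-0.parquet")
long_form = urlparse("file:///data/part-0.parquet")

print(short_form.scheme, short_form.path)  # file /data/part-0.parquet
print(long_form.scheme, long_form.path)    # file /data/part-0.parquet
```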

🐞 Bug fixes

Removing dots after noqa comments (#24722)
Parse Decimal with comma as decimal separator in CSV (#24685)
Make Categories pickleable (#24691)
Shift on array within list (#24678)
Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
Support reading of mixed compressed/uncompressed IPC buffers (#24674)
Overflow in slice-slice optimization (#24658)
Package discovery for setuptools (#24656)
Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
Remove inclusion of polars dir in runtime sdist/wheel (#24654)
Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
Widen from_dicts to Iterable[Mapping[str, Any]] (#24584)
Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
Raise Exception instead of panic when unnest on non-struct column (#24471)
Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
Newline escaping in streaming show_graph (#24612)
Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
Sink batches early stop on in-memory engine (#24585)
More precisely model expression ordering requirements (#24437)
Panic in zero-weight rolling mean/var (#24596)
Decimal <-> literal arithmetic supertype rules (#24594)
Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
Validate list type for list expressions in planner (#24589)
Fix scan_iceberg() storage options not taking effect (#24574)
Have log() prioritize the leftmost dtype for its output dtype (#24581)
CSV pl.len() was incorrect (#24587)
Add support for float inputs for duration types (#24529)
Roundtrip empty string through hive partitioning (#24546)
Fix potential OOB writes in unaligned IPC read (#24550)
Fix regression error when scanning AWS presigned URL (#24530)
Make PlPath::join for cloud paths replace on absolute paths (#24514)
Correct dtype for cum_agg in streaming engine (#24510)
Restore support for np.datetime64() in pl.lit() (#24527)
Ignore Iceberg list element ID if missing (#24479)
Fix panic on streaming full join with coalesce (#23409)
Fix AggState on all_literal in BinaryExpr (#24461)
Show IR sort options in explain (#24465)
Benchmark CI import (#24463)
Fix schema on ApplyExpr with single row literal in agg context (#24422)
Fix planner schema for dividing pl.Float32 by int (#24432)
Fix panic scanning from AWS legacy global endpoint URL (#24450)
Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
Do not propagate struct of nulls with null (#24420)
Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
Implement approx_n_unique for temporal dtypes and Null (#24417)
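
The "slice-slice optimization" mentioned above (#24658) merges two consecutive slices into one. The arithmetic can be sketched in a few lines. This is an assumption-laden illustration with a hypothetical helper `compose_slices`, restricted to non-negative offsets, and it is not the Polars optimizer code.

```python
def compose_slices(first: tuple[int, int], second: tuple[int, int]) -> tuple[int, int]:
    """Merge slice(offset1, len1) followed by slice(offset2, len2) into one (offset, len)."""
    off1, len1 = first
    off2, len2 = second
    remaining = max(len1 - off2, 0)  # rows of the first slice left after skipping off2
    return off1 + off2, min(len2, remaining)


data = list(range(20))
off, length = compose_slices((3, 10), (2, 4))
combined = data[off:off + length]
stepwise = data[3:3 + 10][2:2 + 4]
print((off, length), combined == stepwise)  # (5, 4) True
```

Python integers are unbounded, so `off1 + off2` cannot overflow here. With fixed-width integers the same addition needs checked or saturating arithmetic, which is one place an overflow could arise in an optimization of this kind.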

📖 Documentation

Add default parquet compression levels (#24686)
Fix syntax error in data-types-and-structures.md (#24606)
Rename avg_birthday -> avg_age in examples aggregation (#23726)
Update Polars Cloud user guide (#24366)
Fix typo in set_expr_depth_warning docstring (#24427)

📦 Build system

Python pre-release 1.34.0b5 (#24699)
Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

Removing dots after noqa comments (#24722)
Make test_multiple_sorting_columns test runnable (#24719)
Remove {Upper,Lower}Bound expressions in IR (#24701)
Fix Makefile uv pip option syntax (#24711)
Add egg-info to gitignore (#24712)
Restructure python project directories again (#24676)
Use IR for polars-expr output field resolution (#24661)
Remove dist/ from release python workflow (#24639)
Escape sed ampersand in release script (#24631)
Remove Pyodide from release for now (#24630)
Fix sed in-place in release script (#24628)
Release script pyodide wheel (#24627)
Release script pyodide wheel (#24626)
Update release script for runtimes (#24610)
Remove unused UnknownKind::Ufunc (#24614)
Use cargo-run to call dsl-schema script (#24607)
Cleanup and prepare to_field for element and struct field context (#24592)
Resolve nightly clippy hints (#24593)
Rename pl.dependencies to pl._dependencies (#24595)
More release scripting (#24582)
Again a minor fix for the setup script (#24580)
Minor fix in release script (#24579)
Correct release python beta version check (#24578)
Python dependency failure (#24576)
Always install yq (#24570)
Deterministic import order for Python Polars package variants (#24531)
Check Arrow FFI pointers with an assert (#24564)
Add a couple of missing type definitions in python (#24561)
Fix quickstart example in Polars Cloud user guide (#24554)
Add implementations for loading min/max statistics for Iceberg (#24496)
Update versions (#24508)
Add additional unit tests for pl.concat (#24487)
Refactor parametric tests for as_struct on aggstates (#24493)
Use PlanCallback in name.map_* (#24484)
Pin xlsxwriter to 3.2.5 or before (#24485)
Add dataclass to hold resolved iceberg scan data (#24418)
Fix iceberg test failure in CI (#24456)
Move CompressionUtils to polars-utils (#24430)
Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

01 Oct 16:25

github-actions

py-1.34.0-beta.5

64eaeff

Python Polars 1.34.0-beta.5 Pre-release

🏆 Highlights

Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

Pushdown filter with strptime if input is literal (#24694)
Avoid copying expanded paths (#24669)
Relax filter expr ordering (#24662)
Remove unnecessary groups call in aggregated (#24651)
Skip files in scan_iceberg with filter based on metadata statistics (#24547)
Push row_index predicate for all scan types (#24537)
Perform integer in-filtering for Parquet inequality predicates (#24525)
Stop caching Parquet metadata after 8 files (#24513)
Native streaming .mode() expression (#24459)

✨ Enhancements

Implement maintain_order for cross join (#24665)
Add support to output dt.total_{}() duration values as fractionals (#24598)
Avoid forcing a pyarrow dependency in read_excel when using the default "calamine" engine (#24655)
Support scanning from file:/path URIs (#24603)
Log which file the schema was sourced from, and which file caused an extra column error (#24621)
Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)
Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
Use fixed-scale Decimals (#24542)
Add support for unsigned 128-bit integers (#24346)
Add unstable pl.Config.set_default_credential_provider (#24434)
Roundtrip BinaryOffset type through Parquet (#24344)
Add opt-in unstable functionality to load interval types as Struct (#24320)
Support reading parquet metadata from cloud storage (#24443)
Add user guide section on AWS role assumption (#24421)
Support unique / n_unique / arg_unique for array columns (#24406)

🐞 Bug fixes

Make Categories pickleable (#24691)
Shift on array within list (#24678)
Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
Support reading of mixed compressed/uncompressed IPC buffers (#24674)
Overflow in slice-slice optimization (#24658)
Package discovery for setuptools (#24656)
Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
Remove inclusion of polars dir in runtime sdist/wheel (#24654)
Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
Widen from_dicts to Iterable[Mapping[str, Any]] (#24584)
Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
Raise Exception instead of panic when unnest on non-struct column (#24471)
Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
Newline escaping in streaming show_graph (#24612)
Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
Sink batches early stop on in-memory engine (#24585)
More precisely model expression ordering requirements (#24437)
Panic in zero-weight rolling mean/var (#24596)
Decimal <-> literal arithmetic supertype rules (#24594)
Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
Validate list type for list expressions in planner (#24589)
Fix scan_iceberg() storage options not taking effect (#24574)
Have log() prioritize the leftmost dtype for its output dtype (#24581)
CSV pl.len() was incorrect (#24587)
Add support for float inputs for duration types (#24529)
Roundtrip empty string through hive partitioning (#24546)
Fix potential OOB writes in unaligned IPC read (#24550)
Fix regression error when scanning AWS presigned URL (#24530)
Make PlPath::join for cloud paths replace on absolute paths (#24514)
Correct dtype for cum_agg in streaming engine (#24510)
Restore support for np.datetime64() in pl.lit() (#24527)
Ignore Iceberg list element ID if missing (#24479)
Fix panic on streaming full join with coalesce (#23409)
Fix AggState on all_literal in BinaryExpr (#24461)
Show IR sort options in explain (#24465)
Benchmark CI import (#24463)
Fix schema on ApplyExpr with single row literal in agg context (#24422)
Fix planner schema for dividing pl.Float32 by int (#24432)
Fix panic scanning from AWS legacy global endpoint URL (#24450)
Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
Do not propagate struct of nulls with null (#24420)
Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
Implement approx_n_unique for temporal dtypes and Null (#24417)

📖 Documentation

Add default parquet compression levels (#24686)
Fix syntax error in data-types-and-structures.md (#24606)
Rename avg_birthday -> avg_age in examples aggregation (#23726)
Update Polars Cloud user guide (#24366)
Fix typo in set_expr_depth_warning docstring (#24427)

📦 Build system

Python pre-release 1.34.0b5 (#24699)
Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

Restructure python project directories again (#24676)
Use IR for polars-expr output field resolution (#24661)
Remove dist/ from release python workflow (#24639)
Escape sed ampersand in release script (#24631)
Remove Pyodide from release for now (#24630)
Fix sed in-place in release script (#24628)
Release script pyodide wheel (#24627)
Release script pyodide wheel (#24626)
Update release script for runtimes (#24610)
Remove unused UnknownKind::Ufunc (#24614)
Use cargo-run to call dsl-schema script (#24607)
Cleanup and prepare to_field for element and struct field context (#24592)
Resolve nightly clippy hints (#24593)
Rename pl.dependencies to pl._dependencies (#24595)
More release scripting (#24582)
Again a minor fix for the setup script (#24580)
Minor fix in release script (#24579)
Correct release python beta version check (#24578)
Python dependency failure (#24576)
Always install yq (#24570)
Deterministic import order for Python Polars package variants (#24531)
Check Arrow FFI pointers with an assert (#24564)
Add a couple of missing type definitions in python (#24561)
Fix quickstart example in Polars Cloud user guide (#24554)
Add implementations for loading min/max statistics for Iceberg (#24496)
Update versions (#24508)
Add additional unit tests for pl.concat (#24487)
Refactor parametric tests for as_struct on aggstates (#24493)
Use PlanCallback in name.map_* (#24484)
Pin xlsxwriter to 3.2.5 or before (#24485)
Add dataclass to hold resolved iceberg scan data (#24418)
Fix iceberg test failure in CI (#24456)
Move CompressionUtils to polars-utils (#24430)
Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

28 Sep 12:46

github-actions

py-1.34.0-beta.4

c65a422

Python Polars 1.34.0-beta.4 Pre-release

🏆 Highlights

Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

Skip files in scan_iceberg with filter based on metadata statistics (#24547)
Push row_index predicate for all scan types (#24537)
Perform integer in-filtering for Parquet inequality predicates (#24525)
Stop caching Parquet metadata after 8 files (#24513)
Native streaming .mode() expression (#24459)

✨ Enhancements

Support scanning from file:/path URIs (#24603)
Log which file the schema was sourced from, and which file caused an extra column error (#24621)
Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)
Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
Use fixed-scale Decimals (#24542)
Add support for unsigned 128-bit integers (#24346)
Add unstable pl.Config.set_default_credential_provider (#24434)
Roundtrip BinaryOffset type through Parquet (#24344)
Add opt-in unstable functionality to load interval types as Struct (#24320)
Support reading parquet metadata from cloud storage (#24443)
Add user guide section on AWS role assumption (#24421)
Support unique / n_unique / arg_unique for array columns (#24406)
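
To illustrate the file:/path URI form named in the list above, here is a minimal sketch using only Python's standard library. It is not Polars' own path handling, which this page does not show; the helper name local_path_from_file_uri is hypothetical. It shows that the single-slash and triple-slash spellings carry the same local path.

```python
from urllib.parse import urlparse


def local_path_from_file_uri(uri: str) -> str:
    """Extract the path component from a file URI (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # Both file:/tmp/x and file:///tmp/x have an empty authority (netloc),
    # so the path component is identical for the two spellings.
    return parsed.path


print(local_path_from_file_uri("file:/tmp/data.parquet"))
print(local_path_from_file_uri("file:///tmp/data.parquet"))
```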

🐞 Bug fixes

Widen from_dicts to Iterable[Mapping[str, Any]] (#24584)
Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
Raise Exception instead of panic when unnest on non-struct column (#24471)
Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
Newline escaping in streaming show_graph (#24612)
Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
Sink batches early stop on in-memory engine (#24585)
More precisely model expression ordering requirements (#24437)
Panic in zero-weight rolling mean/var (#24596)
Decimal <-> literal arithmetic supertype rules (#24594)
Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
Validate list type for list expressions in planner (#24589)
Fix scan_iceberg() storage options not taking effect (#24574)
Have log() prioritize the leftmost dtype for its output dtype (#24581)
CSV pl.len() was incorrect (#24587)
Add support for float inputs for duration types (#24529)
Roundtrip empty string through hive partitioning (#24546)
Fix potential OOB writes in unaligned IPC read (#24550)
Fix regression error when scanning AWS presigned URL (#24530)
Make PlPath::join for cloud paths replace on absolute paths (#24514)
Correct dtype for cum_agg in streaming engine (#24510)
Restore support for np.datetime64() in pl.lit() (#24527)
Ignore Iceberg list element ID if missing (#24479)
Fix panic on streaming full join with coalesce (#23409)
Fix AggState on all_literal in BinaryExpr (#24461)
Show IR sort options in explain (#24465)
Benchmark CI import (#24463)
Fix schema on ApplyExpr with single row literal in agg context (#24422)
Fix planner schema for dividing pl.Float32 by int (#24432)
Fix panic scanning from AWS legacy global endpoint URL (#24450)
Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
Do not propagate struct of nulls with null (#24420)
Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
Implement approx_n_unique for temporal dtypes and Null (#24417)

📖 Documentation

Fix syntax error in data-types-and-structures.md (#24606)
Rename avg_birthday -> avg_age in examples aggregation (#23726)
Update Polars Cloud user guide (#24366)
Fix typo in set_expr_depth_warning docstring (#24427)

📦 Build system

Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

Remove dist/ from release python workflow (#24639)
Escape sed ampersand in release script (#24631)
Remove Pyodide from release for now (#24630)
Fix sed in-place in release script (#24628)
Release script pyodide wheel (#24627)
Release script pyodide wheel (#24626)
Update release script for runtimes (#24610)
Remove unused UnknownKind::Ufunc (#24614)
Use cargo-run to call dsl-schema script (#24607)
Cleanup and prepare to_field for element and struct field context (#24592)
Resolve nightly clippy hints (#24593)
Rename pl.dependencies to pl._dependencies (#24595)
More release scripting (#24582)
Again a minor fix for the setup script (#24580)
Minor fix in release script (#24579)
Correct release python beta version check (#24578)
Python dependency failure (#24576)
Always install yq (#24570)
Deterministic import order for Python Polars package variants (#24531)
Check Arrow FFI pointers with an assert (#24564)
Add a couple of missing type definitions in python (#24561)
Fix quickstart example in Polars Cloud user guide (#24554)
Add implementations for loading min/max statistics for Iceberg (#24496)
Update versions (#24508)
Add additional unit tests for pl.concat (#24487)
Refactor parametric tests for as_struct on aggstates (#24493)
Use PlanCallback in name.map_* (#24484)
Pin xlsxwriter to 3.2.5 or before (#24485)
Add dataclass to hold resolved iceberg scan data (#24418)
Fix iceberg test failure in CI (#24456)
Move CompressionUtils to polars-utils (#24430)
Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @moizescbf, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Contributors

orlp, dsprenkels, and 21 other contributors

27 Sep 13:14

github-actions

py-1.34.0-beta.3

d0914d4

Python Polars 1.34.0-beta.3 Pre-release

Pre-release

🏆 Highlights

Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

Skip files in scan_iceberg with filter based on metadata statistics (#24547)
Push row_index predicate for all scan types (#24537)
Perform integer in-filtering for Parquet inequality predicates (#24525)
Stop caching Parquet metadata after 8 files (#24513)
Native streaming .mode() expression (#24459)

✨ Enhancements

Support scanning from file:/path URIs (#24603)
Log which file the schema was sourced from, and which file caused an extra column error (#24621)
Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)
Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
Use fixed-scale Decimals (#24542)
Add support for unsigned 128-bit integers (#24346)
Add unstable pl.Config.set_default_credential_provider (#24434)
Roundtrip BinaryOffset type through Parquet (#24344)
Add opt-in unstable functionality to load interval types as Struct (#24320)
Support reading parquet metadata from cloud storage (#24443)
Add user guide section on AWS role assumption (#24421)
Support unique / n_unique / arg_unique for array columns (#24406)

🐞 Bug fixes

Widen from_dicts to Iterable[Mapping[str, Any]] (#24584)
Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
Raise Exception instead of panic when unnest on non-struct column (#24471)
Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
Newline escaping in streaming show_graph (#24612)
Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
Sink batches early stop on in-memory engine (#24585)
More precisely model expression ordering requirements (#24437)
Panic in zero-weight rolling mean/var (#24596)
Decimal <-> literal arithmetic supertype rules (#24594)
Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
Validate list type for list expressions in planner (#24589)
Fix scan_iceberg() storage options not taking effect (#24574)
Have log() prioritize the leftmost dtype for its output dtype (#24581)
CSV pl.len() was incorrect (#24587)
Add support for float inputs for duration types (#24529)
Roundtrip empty string through hive partitioning (#24546)
Fix potential OOB writes in unaligned IPC read (#24550)
Fix regression error when scanning AWS presigned URL (#24530)
Make PlPath::join for cloud paths replace on absolute paths (#24514)
Correct dtype for cum_agg in streaming engine (#24510)
Restore support for np.datetime64() in pl.lit() (#24527)
Ignore Iceberg list element ID if missing (#24479)
Fix panic on streaming full join with coalesce (#23409)
Fix AggState on all_literal in BinaryExpr (#24461)
Show IR sort options in explain (#24465)
Benchmark CI import (#24463)
Fix schema on ApplyExpr with single row literal in agg context (#24422)
Fix planner schema for dividing pl.Float32 by int (#24432)
Fix panic scanning from AWS legacy global endpoint URL (#24450)
Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
Do not propagate struct of nulls with null (#24420)
Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
Implement approx_n_unique for temporal dtypes and Null (#24417)

📖 Documentation

Fix syntax error in data-types-and-structures.md (#24606)
Rename avg_birthday -> avg_age in examples aggregation (#23726)
Update Polars Cloud user guide (#24366)
Fix typo in set_expr_depth_warning docstring (#24427)

📦 Build system

Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

Remove dist/ from release python workflow (#24639)
Escape sed ampersand in release script (#24631)
Remove Pyodide from release for now (#24630)
Fix sed in-place in release script (#24628)
Release script pyodide wheel (#24627)
Release script pyodide wheel (#24626)
Update release script for runtimes (#24610)
Remove unused UnknownKind::Ufunc (#24614)
Use cargo-run to call dsl-schema script (#24607)
Cleanup and prepare to_field for element and struct field context (#24592)
Resolve nightly clippy hints (#24593)
Rename pl.dependencies to pl._dependencies (#24595)
More release scripting (#24582)
Again a minor fix for the setup script (#24580)
Minor fix in release script (#24579)
Correct release python beta version check (#24578)
Python dependency failure (#24576)
Always install yq (#24570)
Deterministic import order for Python Polars package variants (#24531)
Check Arrow FFI pointers with an assert (#24564)
Add a couple of missing type definitions in python (#24561)
Fix quickstart example in Polars Cloud user guide (#24554)
Add implementations for loading min/max statistics for Iceberg (#24496)
Update versions (#24508)
Add additional unit tests for pl.concat (#24487)
Refactor parametric tests for as_struct on aggstates (#24493)
Use PlanCallback in name.map_* (#24484)
Pin xlsxwriter to 3.2.5 or before (#24485)
Add dataclass to hold resolved iceberg scan data (#24418)
Fix iceberg test failure in CI (#24456)
Move CompressionUtils to polars-utils (#24430)
Update github template to dispatch to cloud client (#24416)

Contributors

orlp, dsprenkels, and 21 other contributors

23 Sep 12:03

github-actions

py-1.34.0-beta.1

04dbc94

Python Polars 1.34.0-beta.1 Pre-release

Pre-release

🏆 Highlights

Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

Skip files in scan_iceberg with filter based on metadata statistics (#24547)
Push row_index predicate for all scan types (#24537)
Perform integer in-filtering for Parquet inequality predicates (#24525)
Stop caching Parquet metadata after 8 files (#24513)
Native streaming .mode() expression (#24459)

✨ Enhancements

Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)
Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
Use fixed-scale Decimals (#24542)
Add support for unsigned 128-bit integers (#24346)
Add unstable pl.Config.set_default_credential_provider (#24434)
Roundtrip BinaryOffset type through Parquet (#24344)
Add opt-in unstable functionality to load interval types as Struct (#24320)
Support reading parquet metadata from cloud storage (#24443)
Add user guide section on AWS role assumption (#24421)
Support unique / n_unique / arg_unique for array columns (#24406)

🐞 Bug fixes

Add support for float inputs for duration types (#24529)
Roundtrip empty string through hive partitioning (#24546)
Fix potential OOB writes in unaligned IPC read (#24550)
Fix regression error when scanning AWS presigned URL (#24530)
Make PlPath::join for cloud paths replace on absolute paths (#24514)
Correct dtype for cum_agg in streaming engine (#24510)
Restore support for np.datetime64() in pl.lit() (#24527)
Ignore Iceberg list element ID if missing (#24479)
Fix panic on streaming full join with coalesce (#23409)
Fix AggState on all_literal in BinaryExpr (#24461)
Show IR sort options in explain (#24465)
Benchmark CI import (#24463)
Fix schema on ApplyExpr with single row literal in agg context (#24422)
Fix planner schema for dividing pl.Float32 by int (#24432)
Fix panic scanning from AWS legacy global endpoint URL (#24450)
Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
Do not propagate struct of nulls with null (#24420)
Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
Implement approx_n_unique for temporal dtypes and Null (#24417)

📖 Documentation

Rename avg_birthday -> avg_age in examples aggregation (#23726)
Update Polars Cloud user guide (#24366)
Fix typo in set_expr_depth_warning docstring (#24427)

🛠️ Other improvements

More release scripting (#24582)
Again a minor fix for the setup script (#24580)
Minor fix in release script (#24579)
Correct release python beta version check (#24578)
Python dependency failure (#24576)
Always install yq (#24570)
Deterministic import order for Python Polars package variants (#24531)
Check Arrow FFI pointers with an assert (#24564)
Add a couple of missing type definitions in python (#24561)
Fix quickstart example in Polars Cloud user guide (#24554)
Add implementations for loading min/max statistics for Iceberg (#24496)
Update versions (#24508)
Add additional unit tests for pl.concat (#24487)
Refactor parametric tests for as_struct on aggstates (#24493)
Use PlanCallback in name.map_* (#24484)
Pin xlsxwriter to 3.2.5 or before (#24485)
Add dataclass to hold resolved iceberg scan data (#24418)
Fix iceberg test failure in CI (#24456)
Move CompressionUtils to polars-utils (#24430)
Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Contributors

orlp, dsprenkels, and 18 other contributors

16 Sep 08:51

github-actions

rs-0.51.0

400ca33

Rust Polars 0.51.0

💥 Breaking changes

Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

🚀 Performance improvements

Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
Allocate only for read items when reading Parquet with predicate (#24401)
Don't aggregate groups for strict cast if original len (#24381)
Allocate only for read items when reading Parquet with predicate (#24324)
Native streaming int_range with len or count (#24280)
Lower arg_unique natively to the streaming engine (#24279)
Move unordering optimization to end (#24286)
Do ordering simplification step after common sub-plan elimination (#24269)
Always simplify order requirements in IR (#24192)
Basic de-duplication of filter expressions (#24220)
Cache the IR in pipe_with_schema (#24213)
Lower arg_where natively to streaming engine (#24088)
Lower Expr.shift to streaming engine (#24106)
Lower order-preserving groupby to streaming engine (#24053)
Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
Lower top-k to streaming engine (#23979)
Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)

✨ Enhancements

Roundtrip BinaryOffset type through Parquet (#24344)
Add opt-in unstable functionality to load interval types as Struct (#24320)
Add user guide section on AWS role assumption (#24421)
Support unique / n_unique / arg_unique for array columns (#24406)
Support S3 virtual-hosted–style URI (#24405)
Remove explicit file create for local async writes (#24358)
Support Partitioning sinks in cloud (#24399)
User-friendly error message on empty path expansion (#24337)
Add Polars security policy (#24314)
Allow pl.Expr.log to take in an expression (#24226)
Implement diff() in streaming engine (#24189)
Enable Expr.diff(n) for negative n (#24200)
Allow upcasting null-typed columns to nested column types in scans (#24185)
Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
Add a deprecation warning for pl.Series.shift(Null) (#24114)
Improve Debug formatting of DataType (#24056)
Add cum_* as native streaming nodes (#23977)
Add peak_{min,max} support for booleans (#24068)
Add DataFrame.map_columns for eager evaluation (#23821)
Add native streaming for peak_{min,max} (#24039)
IR graph arrows, monospace font, box nodes (#24021)
Add DataTypeExpr.default_value (#23973)
Lower rle to a native streaming engine node (#23929)
Add support for Int128 to pyo3-polars (#23959)
Lower rle_id to a native streaming node (#23894)
Pass endpoint_url loaded from CredentialProviderAWS to scan/write_delta (#23812)
Dispatch scan_iceberg to native by default (#23912)
Lower unique_counts and value_counts to streaming engine (#23890)
Implement dt.days_in_month function (#23119)
Fix errors on native scan_iceberg (#23811)
Reinterpret binary data to fixed size numerical array (#22840)
Make rolling_map serializable (#23848)

🐞 Bug fixes

Fix AggState on all_literal in BinaryExpr (#24461)
Replace unsafe with collect (#24494)
Show IR sort options in explain (#24465)
Benchmark CI import (#24463)
Fix schema on ApplyExpr with single row literal in agg context (#24422)
Fix planner schema for dividing pl.Float32 by int (#24432)
Fix panic scanning from AWS legacy global endpoint URL (#24450)
Emit proper tuple for Log in expression nodes (#24426)
Do not propagate struct of nulls with null (#24420)
Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
Implement approx_n_unique for temporal dtypes and Null (#24417)
Correct sink_ipc overload for compression (#24398)
Enable all integer dtypes for by parameter in join_asof (#24384)
Fix Group-By + filter aggregation performing subsequent operations on all data instead of only filtered data (#24373)
Fix incorrect output ordering for row-separable exprs (#24354)
Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120)
Match output type to engine for Struct arithmetic (#23805)
Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
Incorrect logic in negative streaming slice (#24326)
Do not error on non-list Sequence for columns parameter in read_excel (#23967)
Invalid conversion from non-bit numpy bools (#24312)
Make dt.epoch('s') serializable (#24302)
Make Expr.rechunk serializable (#24303)
Schema mismatch for 'log' operation (#24300)
Incorrect first/last aggregate in streaming engine (#24289)
Fix group offsets in sliced groups (#24274)
Panic in inexact date(time) conversion (#24268)
The index_of feature should not depend on the object feature (#24256)
Keep DSL cache after serialization and deserialization (#24265)
Sanitize and warn about eval usage (#24262)
Unique with keep="none" in new optimization pass (#24261)
Correct size limits for Decimal cast (#24252)
Unordered unions in check order observing pass (#24253)
Fix dtype for slice on Literal in agg context (#24137)
Fix incorrect filter(lit(True)) when scanning hive (#24237)
In-memory group_by on 128-bit integers (#24242)
Fix panic in gather inside groupby with invalid indices (#24182)
Release the GIL in map_groups (#24225)
Remove extra explode in LazyGroupBy.{head,tail} (#24221)
Fix panic in polars cloud CSV scan (#24197)
Fix panic when loading categorical columns from IO plugin (#24205)
Fix engine type for concat_list on AggScalar implode (#24160)
Handle centered weights in rolling_mean when len(values) < window_size (#24158)
Reading is_in predicate for Parquet plain strings (#24184)
Make PyCategories pickleable (#24170)
Remove unused unsound function to_mutable_slice (#24173)
PyO3 extension types giving compat_level errors (#24166)
Allow non-elementwise by in top_k (#24164)
Fix sort_by for group_by_dynamic context (#24152)
Input-independent length aggregations in streaming (#24153)
Release GIL when iterating df in to_arrow (#24151)
Respect non-elementwise join_where conditions (#24135)
Resolve schema mismatch for div on Boolean (#24111)
Keep name when doing empty group-aware aggregation (#24098)
Implode instead of reshape_list (#24078)
Rolling mean with weights incorrect when min_samples < window_size (#23485)
Allow merge_sorted for all types (#24077)
Include datatypes in row_encode expression (#24074)
Include UDF materialized type in serialization (#24073)
Correct .rolling() output type for non-aggregations (#24072)
Correct planner output schema for join_asof (#24071)
Allow %B to work without specifying day (#24009)
Correct output for fold and reduce (#24069)
Expr.meta.output_name for struct fields (#24064)
Ensure upcast operations on pl.Date default to microsecond precision (#23981)
Add peak_{min,max} support for booleans (#24068)
Planner output type for mean with strange input type (#24052)
Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
Scan of multiple sources with null datatype (#24065)
Categorical in nested data in row encoding (#24051)
Missing length update in builder for pl.Array repetition (#24055)
Race condition in global categories init (#24045)
Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
Error when using named functions (#24041)
Don't encode entire CategoricalMapping when going to Arrow (#24036)
Fix cast on arithmetic with lit (#23941)
Incorrect slice-slice pushdown (#24032)
Dedup common cache subplan in IR graph (#24028)
Allow join on Decimal in in-memory engine (#24026)
Fix datatypes for eval.list in aggregation context (#23911)
Allocator capsule fallback panic (#24022)
Accept another zlib "magic header" file signature (#24013)
Fix truediv dtypes so cast in list.eval is not dropped (#23936)
Don't reuse cached return_dtype for expanded map expressions (#24010)
Cache id is not a valid dot node id (#24005)
Align map_elements with and without return_dtype (#24007)
Fix column dtype lifetime for csv_write segfault on Categorical (#23986)
Allow serializing LazyGroupBy.map_groups (#23964)
Correct allocator name in PyCapsule (#23968)
Mismatched types for write function for windows (#23915)
Fix unpivot panic when index= column not found (#23958)
Fix assert_frame_equal with check_dtypes=False for all-null series with different types (#23943)
Return correct python package version (#23951)
Categorical namespace functions fail on Enum columns (#23925)
Properly set sumwise complete on filter for missing columns (#23877)
Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
Group By with filters (#23917)
Fix read_csv ignoring Decimal schema for header-only data (#23886)
Ensure collect() native Iceberg always scans latest when no snapshot_id is given (#23907)
Writing List(Array) columns to JSON without panic (#23875)
Fill Iceberg missing fields with partition values if present in metadata (#23900)
Create file for streaming sink even if unspawned (#23672)
Update cloud testing environment (#23908)
Parquet filtering on multiple RGs with literal predicate (#23903)
Incorrect datatype passed to libc::write (#23904)
Properly feature gate TZ_AWARE_RE usage (#23888)
Improve identification of "non group-key" aggregates in SQL GROUP BY queries (#23191)
Spawning tokio task outside reactor (#23884)
Correctly raise DuplicateError on asof_join with suffix="" (#23864)
Fix errors on native scan_iceberg (#23811)
Fix index ...

Contributors

mrkn, pka, and 46 other contributors

09 Sep 08:38

github-actions

py-1.33.1

1dc7792

Python Polars 1.33.1

🚀 Performance improvements

Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
Allocate only for read items when reading Parquet with predicate (#24401)
Don't aggregate groups for strict cast if original len (#24381)
Allocate only for read items when reading Parquet with predicate (#24324)

✨ Enhancements

Support S3 virtual-hosted–style URI (#24405)
Remove explicit file create for local async writes (#24358)
Add PyCapsule __arrow_c_schema__ interface to pl.Schema (#24365)
Support Partitioning sinks in cloud (#24399)
User-friendly error message on empty path expansion (#24337)
Add unstable pre_execution_query parameter to read_database_uri (#23634)
Add Polars security policy (#24314)
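
For readers unfamiliar with the "virtual-hosted-style" entry in the list above: AWS S3 HTTPS URLs come in two shapes, path-style (bucket in the path) and virtual-hosted-style (bucket in the host name). The following is a minimal standard-library sketch of how the two differ. It is an illustration only, not Polars' parser; the function name split_s3_https_url and the bucket and key values are hypothetical, and it assumes the standard amazonaws.com host layout.

```python
from urllib.parse import urlparse


def split_s3_https_url(url: str) -> tuple[str, str]:
    """Return (bucket, key) for a path-style or virtual-hosted-style S3 URL.

    Hypothetical helper for illustration; assumes hosts of the form
    s3.<region>.amazonaws.com or <bucket>.s3.<region>.amazonaws.com.
    """
    parsed = urlparse(url)
    host = parsed.netloc
    path = parsed.path.lstrip("/")

    # Virtual-hosted style: https://<bucket>.s3.<region>.amazonaws.com/<key>
    if ".s3." in host and not host.startswith("s3."):
        bucket = host.split(".s3.", 1)[0]
        return bucket, path

    # Path style: https://s3.<region>.amazonaws.com/<bucket>/<key>
    bucket, _, key = path.partition("/")
    return bucket, key


print(split_s3_https_url("https://my-bucket.s3.us-east-1.amazonaws.com/data/file.parquet"))
print(split_s3_https_url("https://s3.us-east-1.amazonaws.com/my-bucket/data/file.parquet"))
```

Both calls resolve to the same bucket and key, which is the equivalence a reader needs in order to accept either spelling.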

🐞 Bug fixes

Correct sink_ipc overload for compression (#24398)
Enable all integer dtypes for by parameter in join_asof (#24384)
Fix Group-By + filter aggregation performing subsequent operations on all data instead of only filtered data (#24373)
Wrap deprecated top-level imports in TYPE_CHECKING (#24340)
Fix incorrect output ordering for row-separable exprs (#24354)
Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120)
Match output type to engine for Struct arithmetic (#23805)
Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
Don't throw away type information for NumPy numeric values when using lit() (#24229)
Incorrect logic in negative streaming slice (#24326)
Ensure read_database_uri with ADBC works as expected with DuckDB URIs (#24097)
Do not error on non-list Sequence for columns parameter in read_excel (#23967)
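
The MAP_PRIVATE versus MAP_SHARED entry in the list above concerns whether writes through a memory mapping reach the underlying file. As a rough sketch of that distinction (not Polars' Rust code), Python's mmap module exposes a copy-on-write mode via access=mmap.ACCESS_COPY, which behaves like a private mapping. The helper name write_through_private_mapping is hypothetical.

```python
import mmap
import os
import tempfile


def write_through_private_mapping(data: bytes) -> tuple[bytes, bytes]:
    """Modify a copy-on-write mapping; return (mapped view, file contents)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        with open(path, "r+b") as f:
            # ACCESS_COPY: writes change the in-memory view only,
            # they are never written back to the file.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[0:1] = b"X"
                in_memory = bytes(m[:])
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.remove(path)
    return in_memory, on_disk


view, disk = write_through_private_mapping(b"hello")
print(view, disk)
```

With a shared mapping (access=mmap.ACCESS_WRITE) the same write would be carried through to the file, which is the behaviour a private mapping avoids.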

📖 Documentation

Document newly added is_pure parameter for register_io_source (#24311)
Create a module docstring for the public polars module (#24332)
Update to Polars Cloud user guide (#24187)
Update distributed page (#24323)
Add a note and example about exporting unformatted Excel sheet data (#24145)
Add detail about server-side cursor behaviour for SQLAlchemy in the "iter_batches" parameter of read_database (#24094)
Add Polars security policy (#24314)

🛠️ Other improvements

Bump c-api (#24412)
Add a regression test for #7631 (#24363)
Update cloud test InteractiveQuery to DirectQuery (#24287)
Mark some tests as slow (#24327)
Mark more tests as ready for cloud (#24315)
Add hint to update PYPOLARS_VERSION on version assert test (#24313)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @VictorAtIfInsurance, @alexander-beedie, @coastalwhite, @dsprenkels, @itamarst, @kdn36, @kuril, @mcrumiller, @nameexhaustion, @nesb1, @orlp, @r-brink and @ritchie46

Contributors

orlp, dsprenkels, and 12 other contributors
