Releases: pola-rs/polars

Python Polars 1.35.1

30 Oct 12:13
a99ad34

🚀 Performance improvements

  • Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
  • Skip filtering scan IR if no paths were filtered (#25037)
  • Optimize ipc stream read performance (#24671)

✨ Enhancements

  • Support BYTE_ARRAY backed Decimals in Parquet (#25076)
  • Allow glimpse to return a DataFrame (#24803)
  • Add allow_empty flag to item (#25048)

🐞 Bug fixes

  • The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
  • Fix panic if scan predicate produces 0 length mask (#25089)
  • Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
  • Panic in group_by_dynamic with group_by and multiple chunks (#25075)
  • Minor improvement to internal is_pycapsule utility function (#25073)
  • Fix panic when using struct field as join key (#25059)
  • Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
  • Fix field metadata for nested categorical PyCapsule export (#25052)
  • Block predicate pushdown when group_by key values are changed (#25032)
  • Group-By aggregation problems caused by AmortSeries (#25043)
  • Don't push down predicates past inserted cache nodes (#25042)
  • Allow for negative time in group_by_dynamic iterator (#25041)
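
The distinction behind the unary "NOT" fix (#25091) can be illustrated with plain Python values. This is only a sketch of why bitwise and logical negation differ on integer-encoded truth values; it does not use or reproduce the Polars SQL interface, and both function names are hypothetical.

```python
def bitwise_not(value: int) -> int:
    # Hypothetical helper: flips every bit of the two's-complement integer.
    return ~value


def logical_not(value: int) -> bool:
    # Hypothetical helper: treats any non-zero value as true, then negates it.
    return not bool(value)


print(bitwise_not(1))        # -2
print(bool(bitwise_not(1)))  # True: the bitwise result is still truthy
print(logical_not(1))        # False
```

A bitwise NOT of a "true" integer therefore stays truthy, which is presumably not what a SQL user expects from "NOT".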

📖 Documentation

  • Fix typo in public dataset URL (#25044)

🛠️ Other improvements

  • Disable recursive CSPE for now (#25085)
  • Change group length mismatch error to ShapeError (#25004)
  • Update toolchain (#25007)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @Liyixin95, @alexander-beedie, @coastalwhite, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Python Polars 1.35.0

26 Oct 20:05

🚀 Performance improvements

  • Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
  • Lower unique to native group-by and speed up n_unique in group-by context (#24976)
  • Better parallelize take{_slice,}_unchecked (#24980)
  • Implement native skew and kurtosis in group-by context (#24961)
  • Use native group-by aggregations for bitwise_* operations (#24935)
  • Address group_by_dynamic slowness in sparse data (#24916)
  • Push filters to PyIceberg (#24910)
  • Native filter/drop_nulls/drop_nans in group-by context (#24897)
  • Implement cumulative_eval using the group-by engine (#24889)
  • Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
  • Implement native null_count, any and all group-by aggregations (#24859)
  • Speed up reverse in group-by context (#24855)
  • Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
  • Don't check duplicates on streaming simple projection in release mode (#24830)
  • Lower approx_n_unique to the streaming engine (#24821)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
  • Implement indexed method for BitMapIter::nth (#24766)
  • Pushdown slices on plans within unions (#24735)

✨ Enhancements

  • Stabilize decimal (#25020)
  • Support ewm_mean() in streaming engine (#25003)
  • Improve row-count estimates (#24996)
  • Remove filtered scan paths in IR when possible (#24974)
  • Introduce remote Polars MCP server (#24977)
  • Allow local scans on polars cloud (configurable) (#24962)
  • Add Expr.item to strictly extract a single value from an expression (#24888)
  • Add environment variable to roundtrip empty struct in Parquet (#24914)
  • Fast-count for scan_iceberg().select(len()) (#24602)
  • Add glob parameter to scan_ipc (#24898)
  • Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
  • Add list.agg and arr.agg (#24790)
  • Implement {Expr,Series}.rolling_rank() (#24776)
  • Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
  • Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
  • Support MergeSorted in CSPE (#24805)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Recursively apply CSPE (#24798)
  • Add streaming engine per-node metrics (#24788)
  • Add arr.eval (#24472)
  • Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
  • Improve rolling_(sum|mean) accuracy (#24743)
  • Add separator to {Data,Lazy}Frame.unnest (#24716)
  • Add union() function for unordered concatenation (#24298)
  • Add name.replace to the set of column rename options (#17942)
  • Support np.ndarray -> AnyValue conversion (#24748)
  • Allow duration strings with leading "+" (#24737)
  • Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
  • Add support for UInt128 to pyo3-polars (#24731)
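
As a rough illustration of the duration-string entries above (#24771, #24737), the sketch below parses strings such as "1d12h" and accepts an optional leading "+". It is not the Polars parser: the helper name is hypothetical, only fixed-length units are handled, and treating "d" and "w" as exactly 24 hours and 7 days is an assumption of this sketch.

```python
import re

# Nanoseconds per unit. Calendar units such as "mo", "q" and "y" are left out
# because they have no fixed length.
_UNIT_NS = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
    "w": 7 * 86_400 * 1_000_000_000,
}
_PART = re.compile(r"(\d+)(ns|us|ms|s|m|h|d|w)")


def parse_duration_ns(text: str) -> int:
    """Hypothetical helper: convert a duration string to nanoseconds."""
    sign = 1
    body = text
    # An explicit sign is optional; "+30m" and "30m" mean the same thing.
    if text.startswith(("+", "-")):
        sign = -1 if text[0] == "-" else 1
        body = text[1:]
    if not body:
        raise ValueError(f"invalid duration string: {text!r}")
    total = 0
    pos = 0
    while pos < len(body):
        match = _PART.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration string: {text!r}")
        total += int(match.group(1)) * _UNIT_NS[match.group(2)]
        pos = match.end()
    return sign * total


print(parse_duration_ns("+30m") == parse_duration_ns("30m"))  # True
```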

🐞 Bug fixes

  • Re-enable CPU feature check before import (#25010)
  • Implement read_excel workaround for fastexcel/calamine issue loading a column subset from a named table (#25012)
  • Fix correctness of any(ignore_nulls) and out-of-bounds (OOB) access in all (#25005)
  • Streaming any/all with ignore_nulls=False (#25008)
  • Fix incorrect join_asof on a casted expression (#25006)
  • Optimize memory on rolling groups in ApplyExpr (#24709)
  • Fallback Pyarrow scan to in-memory engine (#24991)
  • Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
  • Capitalize letters after numbers in to_titlecase (#24993)
  • Preserve null values in pct_change (#24952)
  • Raise length mismatch on over with sliced groups (#24887)
  • Check duplicate name in transpose (#24956)
  • Follow Kleene logic in any / all for group-by (#24940)
  • Do not optimize cross join to iejoin if order maintaining (#24950)
  • Fix typing of scan_parquet partially unknown (#24928)
  • Properly release the GIL for read_parquet_metadata (#24922)
  • Broadcast partition_by columns in over expression (#24874)
  • Clear index cache on stacked df.filter expressions (#24870)
  • Fix 'explode' mapping strategy on scalar value (#24861)
  • Fix repeated with_row_index() after scan() silently ignored (#24866)
  • Correctly return min and max for enums in groupby aggregation (#24808)
  • Refactor BinaryExpr in group_by dispatch logic (#24548)
  • Fix aggstate for gather (#24857)
  • Keep scalars for length preserving functions in group_by (#24819)
  • Have range feature depend on dtype-array feature (#24853)
  • Fix duplicate select panic (#24836)
  • Inconsistency of list.sum() result type with None values (#24476)
  • Division by zero in Expr.dt.truncate (#24832)
  • Potential deadlock in __arrow_c_stream__ (#24831)
  • Allow double aggregations in group-by contexts (#24823)
  • Series.shrink_dtype for i128/u128 (#24833)
  • Fix dtype in EvalExpr (#24650)
  • Allow aggregations on AggState::LiteralScalar (#24820)
  • Dispatch to group_aware for fallible expressions with masked out elements (#24815)
  • Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
  • Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
  • Fix XOR not following Kleene logic when one side is unit-length (#24810)
  • Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
  • Incorrect precision in Series.str.to_decimal (#24804)
  • Use overlapping instead of rolling (#24787)
  • Fix iterable on dynamic_group_by and rolling object (#24740)
  • Use Kahan summation for in-memory groupby sum/mean (#24774)
  • Release GIL in PythonScan predicate evaluation (#24779)
  • Type error in bitmask::nth_set_bit_u64 (#24775)
  • Add Expr.sign for Decimal datatype (#24717)
  • Correct str.replace with missing pattern (#24768)
  • Ensure schema_overrides is respected when loading iterable row data (#24721)
  • Support decimal_comma on Decimal type in write_csv (#24718)
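
Several fixes above refer to Kleene logic for any, all and XOR (#25005, #24940, #24810). A minimal sketch of three-valued logic in plain Python follows, with None standing in for null. The function names are hypothetical and this is not the Polars implementation.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    # True wins outright; otherwise a null makes the answer unknown.
    saw_null = False
    for value in values:
        if value is True:
            return True
        if value is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    # False wins outright; otherwise a null makes the answer unknown.
    saw_null = False
    for value in values:
        if value is False:
            return False
        if value is None:
            saw_null = True
    return None if saw_null else True


def kleene_xor(left: Optional[bool], right: Optional[bool]) -> Optional[bool]:
    # XOR has no dominating value, so any null input yields null.
    if left is None or right is None:
        return None
    return left != right


print(kleene_any([False, None]))  # None
print(kleene_all([False, None]))  # False
print(kleene_xor(True, None))     # None
```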

📖 Documentation

  • Introduce remote Polars MCP server (#24977)
  • Add {arr,list}.agg API references (#24970)
  • Support LLM in docs (#24958)
  • Update Cloud docs with correct fn argument order (#24939)
  • Update name.replace examples (#24941)
  • Add i128 and u128 features to user guide (#24938)
  • Add partitioning examples for sink_* methods (#24918)
  • Add more {unique,value}_counts examples (#24927)
  • Indent the versionchanged (#24783)
  • Relax fsspec wording (#24881)
  • Add pl.field into the api docs (#24846)
  • Fix duplicated article in SECURITY.md (#24762)
  • Document output name determination in when/then/otherwise (#24746)
  • Specify that precision=None becomes 38 for Decimal (#24742)
  • Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
  • Fix source mapping (#24736)

📦 Build system

  • Ensure build_feature_flags.py is included in artifact (#25024)
  • Update pyo3 and numpy crates to version 0.26 (#24760)

🛠️ Other improvements

  • Fix benchmark ci (#25019)
  • Fix non-deterministic test (#25009)
  • Fix makefile arch detection (#25011)
  • Make LazyFrame.set_sorted into a FunctionIR::Hint (#24981)
  • Remove symbolic links (#24982)
  • Deprecate Expr.agg_groups() and pl.groups() (#24919)
  • Dispatch to no-op rayon thread-pool from streaming (#24957)
  • Unpin pydantic (#24955)
  • Ensure safety of scan fast-count IR lowering in streaming (#24953)
  • Re-use iterators in set_ operations (#24850)
  • Remove GroupByPartitioned and dispatch to streaming engine (#24903)
  • Turn element() into {A,}Expr::Element (#24885)
  • Pass ScanOptions to new_from_ipc (#24893)
  • Update tests to be index type agnostic (#24891)
  • Unset Context in Window expression (#24875)
  • Fix failing delta test (#24867)
  • Move FunctionExpr dispatch from plan to expr (#24839)
  • Fix SQL test giving wrong error message (#24835)
  • Consolidate dtype paths in ApplyExpr (#24825)
  • Add days_in_month to documentation (#24822)
  • Enable ruff D417 lint (#24814)
  • Turn pl.format into proper elementwise expression (#24811)
  • Fix remote benchmark by no-longer saving builds (#24812)
  • Refactor ApplyExpr in group_by context on multiple inputs (#24520)
  • IR text plan graph generator (#24733)
  • Temporarily pin pydantic to fix CI (#24797)
  • Extend and rename rolling groups to overlapping (#24577)
  • Refactor DataType proptest strategies (#24763)
  • Add union to documentation (#24769)

Thank you to all our contributors for making this release possible!
@EndPositive, @EnricoMi, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @mjanssen, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @thomasjpfan and @williambdean

Python Polars 1.35.0-beta.1

19 Oct 15:18

Pre-release

🚀 Performance improvements

  • Address group_by_dynamic slowness in sparse data (#24916)
  • Push filters to PyIceberg (#24910)
  • Native filter/drop_nulls/drop_nans in group-by context (#24897)
  • Implement cumulative_eval using the group-by engine (#24889)
  • Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
  • Implement native null_count, any and all group-by aggregations (#24859)
  • Speed up reverse in group-by context (#24855)
  • Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
  • Don't check duplicates on streaming simple projection in release mode (#24830)
  • Lower approx_n_unique to the streaming engine (#24821)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
  • Implement indexed method for BitMapIter::nth (#24766)
  • Pushdown slices on plans within unions (#24735)

✨ Enhancements

  • Add environment variable to roundtrip empty struct in Parquet (#24914)
  • Fast-count for scan_iceberg().select(len()) (#24602)
  • Add glob parameter to scan_ipc (#24898)
  • Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
  • Add list.agg and arr.agg (#24790)
  • Implement {Expr,Series}.rolling_rank() (#24776)
  • Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
  • Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
  • Support MergeSorted in CSPE (#24805)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Recursively apply CSPE (#24798)
  • Add streaming engine per-node metrics (#24788)
  • Add arr.eval (#24472)
  • Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
  • Improve rolling_(sum|mean) accuracy (#24743)
  • Add separator to {Data,Lazy}Frame.unnest (#24716)
  • Add union() function for unordered concatenation (#24298)
  • Add name.replace to the set of column rename options (#17942)
  • Support np.ndarray -> AnyValue conversion (#24748)
  • Allow duration strings with leading "+" (#24737)
  • Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
  • Add support for UInt128 to pyo3-polars (#24731)

🐞 Bug fixes

  • Properly release the GIL for read_parquet_metadata (#24922)
  • Broadcast partition_by columns in over expression (#24874)
  • Clear index cache on stacked df.filter expressions (#24870)
  • Fix 'explode' mapping strategy on scalar value (#24861)
  • Fix repeated with_row_index() after scan() silently ignored (#24866)
  • Correctly return min and max for enums in groupby aggregation (#24808)
  • Refactor BinaryExpr in group_by dispatch logic (#24548)
  • Fix aggstate for gather (#24857)
  • Keep scalars for length preserving functions in group_by (#24819)
  • Have range feature depend on dtype-array feature (#24853)
  • Fix duplicate select panic (#24836)
  • Inconsistency of list.sum() result type with None values (#24476)
  • Division by zero in Expr.dt.truncate (#24832)
  • Potential deadlock in __arrow_c_stream__ (#24831)
  • Allow double aggregations in group-by contexts (#24823)
  • Series.shrink_dtype for i128/u128 (#24833)
  • Fix dtype in EvalExpr (#24650)
  • Allow aggregations on AggState::LiteralScalar (#24820)
  • Dispatch to group_aware for fallible expressions with masked out elements (#24815)
  • Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
  • Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
  • Fix XOR not following Kleene logic when one side is unit-length (#24810)
  • Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
  • Incorrect precision in Series.str.to_decimal (#24804)
  • Use overlapping instead of rolling (#24787)
  • Fix iterable on dynamic_group_by and rolling object (#24740)
  • Use Kahan summation for in-memory groupby sum/mean (#24774)
  • Release GIL in PythonScan predicate evaluation (#24779)
  • Type error in bitmask::nth_set_bit_u64 (#24775)
  • Add Expr.sign for Decimal datatype (#24717)
  • Correct str.replace with missing pattern (#24768)
  • Ensure schema_overrides is respected when loading iterable row data (#24721)
  • Support decimal_comma on Decimal type in write_csv (#24718)
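
For readers unfamiliar with the technique named in #24774, the following is a textbook sketch of Kahan (compensated) summation in plain Python. It is not the Polars code, and the function names are hypothetical.

```python
def naive_sum(values):
    # Plain left-to-right accumulation; rounding error is simply dropped.
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    # Hypothetical helper: carries the rounding error of each addition
    # forward in `compensation` and feeds it back into the next addend.
    total = 0.0
    compensation = 0.0
    for x in values:
        y = x - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total


# Each 1e-16 addend is below half an ulp of 1.0, so the naive loop
# never moves off 1.0, while the compensated loop keeps the small terms.
data = [1.0] + [1e-16] * 100_000
print(naive_sum(data))  # 1.0
print(kahan_sum(data))
```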

📖 Documentation

  • Add partitioning examples for sink_* methods (#24918)
  • Add more {unique,value}_counts examples (#24927)
  • Indent the versionchanged (#24783)
  • Relax fsspec wording (#24881)
  • Add pl.field into the api docs (#24846)
  • Fix duplicated article in SECURITY.md (#24762)
  • Document output name determination in when/then/otherwise (#24746)
  • Specify that precision=None becomes 38 for Decimal (#24742)
  • Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
  • Fix source mapping (#24736)

📦 Build system

  • Update pyo3 and numpy crates to version 0.26 (#24760)

🛠️ Other improvements

  • Re-use iterators in set_ operations (#24850)
  • Remove GroupByPartitioned and dispatch to streaming engine (#24903)
  • Turn element() into {A,}Expr::Element (#24885)
  • Pass ScanOptions to new_from_ipc (#24893)
  • Update tests to be index type agnostic (#24891)
  • Unset Context in Window expression (#24875)
  • Fix failing delta test (#24867)
  • Move FunctionExpr dispatch from plan to expr (#24839)
  • Fix SQL test giving wrong error message (#24835)
  • Consolidate dtype paths in ApplyExpr (#24825)
  • Add days_in_month to documentation (#24822)
  • Enable ruff D417 lint (#24814)
  • Turn pl.format into proper elementwise expression (#24811)
  • Fix remote benchmark by no-longer saving builds (#24812)
  • Refactor ApplyExpr in group_by context on multiple inputs (#24520)
  • IR text plan graph generator (#24733)
  • Temporarily pin pydantic to fix CI (#24797)
  • Extend and rename rolling groups to overlapping (#24577)
  • Refactor DataType proptest strategies (#24763)
  • Add union to documentation (#24769)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @nameexhaustion, @orlp, @pavelzw, @ritchie46, @thomasjpfan and @williambdean

Python Polars 1.34.0

02 Oct 18:31
150a9ed

🏆 Highlights

  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

  • Optimize gather_every(n=1) to slice (#24704)
  • Lower null count to streaming engine (#24703)
  • Native streaming gather_every (#24700)
  • Pushdown filter with strptime if input is literal (#24694)
  • Avoid copying expanded paths (#24669)
  • Relax filter expr ordering (#24662)
  • Remove unnecessary groups call in aggregated (#24651)
  • Skip files in scan_iceberg with filter based on metadata statistics (#24547)
  • Push row_index predicate for all scan types (#24537)
  • Perform integer in-filtering for Parquet inequality predicates (#24525)
  • Stop caching Parquet metadata after 8 files (#24513)
  • Native streaming .mode() expression (#24459)
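
The gather_every(n=1) optimization above (#24704) rests on a simple equivalence: taking every element from an offset onward is the same as slicing from that offset. A sketch on plain Python lists, with a hypothetical function that mirrors the idea rather than the Polars internals:

```python
def gather_every(values, n, offset=0):
    # Hypothetical list-based stand-in: take every n-th element from `offset`.
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        # A step of 1 selects everything from `offset`, so a plain slice suffices.
        return values[offset:]
    return values[offset::n]


items = list(range(6))
print(gather_every(items, 1, offset=2))  # [2, 3, 4, 5]
print(gather_every(items, 2, offset=1))  # [1, 3, 5]
```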

✨ Enhancements

  • Implement maintain_order for cross join (#24665)
  • Add support to output dt.total_{}() duration values as fractionals (#24598)
  • Avoid forcing a pyarrow dependency in read_excel when using the default "calamine" engine (#24655)
  • Support scanning from file:/path URIs (#24603)
  • Log which file the schema was sourced from, and which file caused an extra column error (#24621)
  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
  • Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
  • Use fixed-scale Decimals (#24542)
  • Add support for unsigned 128-bit integers (#24346)
  • Add unstable pl.Config.set_default_credential_provider (#24434)
  • Roundtrip BinaryOffset type through Parquet (#24344)
  • Add opt-in unstable functionality to load interval types as Struct (#24320)
  • Support reading parquet metadata from cloud storage (#24443)
  • Add user guide section on AWS role assumption (#24421)
  • Support unique / n_unique / arg_unique for array columns (#24406)
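
The difference between whole-unit and fractional duration totals (#24598) can be shown with the standard library alone. The helper and its `fractional` flag are assumptions of this sketch, not the Polars signature, and the example uses a positive duration only.

```python
from datetime import timedelta


def total_hours(duration: timedelta, fractional: bool = False):
    # Hypothetical helper; the `fractional` flag name is an assumption.
    one_hour = timedelta(hours=1)
    if fractional:
        # True division keeps the partial hour.
        return duration / one_hour
    # Floor division reports completed hours only.
    return duration // one_hour


ninety_minutes = timedelta(minutes=90)
print(total_hours(ninety_minutes))                   # 1
print(total_hours(ninety_minutes, fractional=True))  # 1.5
```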

🐞 Bug fixes

  • Removing dots after noqa comments (#24722)
  • Parse Decimal with comma as decimal separator in CSV (#24685)
  • Make Categories pickleable (#24691)
  • Shift on array within list (#24678)
  • Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
  • Support reading of mixed compressed/uncompressed IPC buffers (#24674)
  • Overflow in slice-slice optimization (#24658)
  • Package discovery for setuptools (#24656)
  • Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
  • Remove inclusion of polars dir in runtime sdist/wheel (#24654)
  • Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
  • Widen from_dicts to Iterable[Mapping[str, Any]] (#24584)
  • Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
  • Raise Exception instead of panic when unnest on non-struct column (#24471)
  • Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
  • Newline escaping in streaming show_graph (#24612)
  • Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
  • Sink batches early stop on in-memory engine (#24585)
  • More precisely model expression ordering requirements (#24437)
  • Panic in zero-weight rolling mean/var (#24596)
  • Decimal <-> literal arithmetic supertype rules (#24594)
  • Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
  • Validate list type for list expressions in planner (#24589)
  • Fix scan_iceberg() storage options not taking effect (#24574)
  • Have log() prioritize the leftmost dtype for its output dtype (#24581)
  • CSV pl.len() was incorrect (#24587)
  • Add support for float inputs for duration types (#24529)
  • Roundtrip empty string through hive partitioning (#24546)
  • Fix potential OOB writes in unaligned IPC read (#24550)
  • Fix regression error when scanning AWS presigned URL (#24530)
  • Make PlPath::join for cloud paths replace on absolute paths (#24514)
  • Correct dtype for cum_agg in streaming engine (#24510)
  • Restore support for np.datetime64() in pl.lit() (#24527)
  • Ignore Iceberg list element ID if missing (#24479)
  • Fix panic on streaming full join with coalesce (#23409)
  • Fix AggState on all_literal in BinaryExpr (#24461)
  • Show IR sort options in explain (#24465)
  • Benchmark CI import (#24463)
  • Fix schema on ApplyExpr with single row literal in agg context (#24422)
  • Fix planner schema for dividing pl.Float32 by int (#24432)
  • Fix panic scanning from AWS legacy global endpoint URL (#24450)
  • Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
  • Do not propagate struct of nulls with null (#24420)
  • Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
  • Implement approx_n_unique for temporal dtypes and Null (#24417)
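
The reshape rule in #24591 (only the first dimension may be inferred with -1) can be sketched as a small shape resolver. This is an illustration under that stated rule, not the Polars implementation; the function name and error messages are hypothetical.

```python
from math import prod


def resolve_shape(n_elements: int, shape: tuple) -> tuple:
    # Hypothetical helper: replace a leading -1 with the inferred size.
    if not shape:
        raise ValueError("shape must have at least one dimension")
    if any(dim == -1 for dim in shape[1:]):
        raise ValueError("only the first dimension may be inferred with -1")
    if shape[0] != -1:
        if prod(shape) != n_elements:
            raise ValueError("shape does not match the number of elements")
        return tuple(shape)
    rest = prod(shape[1:])  # prod of an empty sequence is 1
    if rest == 0 or n_elements % rest != 0:
        raise ValueError("cannot infer the first dimension")
    return (n_elements // rest, *shape[1:])


print(resolve_shape(6, (-1, 2)))  # (3, 2)
```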

📖 Documentation

  • Add default parquet compression levels (#24686)
  • Fix syntax error in data-types-and-structures.md (#24606)
  • Rename avg_birthday -> avg_age in examples aggregation (#23726)
  • Update Polars Cloud user guide (#24366)
  • Fix typo in set_expr_depth_warning docstring (#24427)

📦 Build system

  • Python pre-release 1.34.0b5 (#24699)
  • Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

  • Removing dots after noqa comments (#24722)
  • Make test_multiple_sorting_columns test runnable (#24719)
  • Remove {Upper,Lower}Bound expressions in IR (#24701)
  • Fix Makefile uv pip option syntax (#24711)
  • Add egg-info to gitignore (#24712)
  • Restructure python project directories again (#24676)
  • Use IR for polars-expr output field resolution (#24661)
  • Remove dist/ from release python workflow (#24639)
  • Escape sed ampersand in release script (#24631)
  • Remove Pyodide from release for now (#24630)
  • Fix sed in-place in release script (#24628)
  • Release script pyodide wheel (#24627)
  • Release script pyodide wheel (#24626)
  • Update release script for runtimes (#24610)
  • Remove unused UnknownKind::Ufunc (#24614)
  • Use cargo-run to call dsl-schema script (#24607)
  • Cleanup and prepare to_field for element and struct field context (#24592)
  • Resolve nightly clippy hints (#24593)
  • Rename pl.dependencies to pl._dependencies (#24595)
  • More release scripting (#24582)
  • Again a minor fix for the setup script (#24580)
  • Minor fix in release script (#24579)
  • Correct release python beta version check (#24578)
  • Python dependency failure (#24576)
  • Always install yq (#24570)
  • Deterministic import order for Python Polars package variants (#24531)
  • Check Arrow FFI pointers with an assert (#24564)
  • Add a couple of missing type definitions in python (#24561)
  • Fix quickstart example in Polars Cloud user guide (#24554)
  • Add implementations for loading min/max statistics for Iceberg (#24496)
  • Update versions (#24508)
  • Add additional unit tests for pl.concat (#24487)
  • Refactor parametric tests for as_struct on aggstates (#24493)
  • Use PlanCallback in name.map_* (#24484)
  • Pin xlsxwriter to 3.2.5 or before (#24485)
  • Add dataclass to hold resolved iceberg scan data (#24418)
  • Fix iceberg test failure in CI (#24456)
  • Move CompressionUtils to polars-utils (#24430)
  • Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Python Polars 1.34.0-beta.5

01 Oct 16:25
64eaeff

Pre-release

🏆 Highlights

  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

  • Pushdown filter with strptime if input is literal (#24694)
  • Avoid copying expanded paths (#24669)
  • Relax filter expr ordering (#24662)
  • Remove unnecessary groups call in aggregated (#24651)
  • Skip files in scan_iceberg with filter based on metadata statistics (#24547)
  • Push row_index predicate for all scan types (#24537)
  • Perform integer in-filtering for Parquet inequality predicates (#24525)
  • Stop caching Parquet metadata after 8 files (#24513)
  • Native streaming .mode() expression (#24459)

✨ Enhancements

  • Implement maintain_order for cross join (#24665)
  • Add support to output dt.total_{}() duration values as fractionals (#24598)
  • Avoid forcing a pyarrow dependency in read_excel when using the default "calamine" engine (#24655)
  • Support scanning from file:/path URIs (#24603)
  • Log which file the schema was sourced from, and which file caused an extra column error (#24621)
  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
  • Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
  • Use fixed-scale Decimals (#24542)
  • Add support for unsigned 128-bit integers (#24346)
  • Add unstable pl.Config.set_default_credential_provider (#24434)
  • Roundtrip BinaryOffset type through Parquet (#24344)
  • Add opt-in unstable functionality to load interval types as Struct (#24320)
  • Support reading parquet metadata from cloud storage (#24443)
  • Add user guide section on AWS role assumption (#24421)
  • Support unique / n_unique / arg_unique for array columns (#24406)

🐞 Bug fixes

  • Make Categories pickleable (#24691)
  • Shift on array within list (#24678)
  • Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
  • Support reading of mixed compressed/uncompressed IPC buffers (#24674)
  • Overflow in slice-slice optimization (#24658)
  • Package discovery for setuptools (#24656)
  • Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
  • Remove inclusion of polars dir in runtime sdist/wheel (#24654)
  • Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
  • Widen from_dicts to Iterable[Mapping[str, Any]] (#24584)
  • Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
  • Raise Exception instead of panic when unnest on non-struct column (#24471)
  • Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
  • Newline escaping in streaming show_graph (#24612)
  • Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
  • Sink batches early stop on in-memory engine (#24585)
  • More precisely model expression ordering requirements (#24437)
  • Panic in zero-weight rolling mean/var (#24596)
  • Decimal <-> literal arithmetic supertype rules (#24594)
  • Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
  • Validate list type for list expressions in planner (#24589)
  • Fix scan_iceberg() storage options not taking effect (#24574)
  • Have log() prioritize the leftmost dtype for its output dtype (#24581)
  • CSV pl.len() was incorrect (#24587)
  • Add support for float inputs for duration types (#24529)
  • Roundtrip empty string through hive partitioning (#24546)
  • Fix potential OOB writes in unaligned IPC read (#24550)
  • Fix regression error when scanning AWS presigned URL (#24530)
  • Make PlPath::join for cloud paths replace on absolute paths (#24514)
  • Correct dtype for cum_agg in streaming engine (#24510)
  • Restore support for np.datetime64() in pl.lit() (#24527)
  • Ignore Iceberg list element ID if missing (#24479)
  • Fix panic on streaming full join with coalesce (#23409)
  • Fix AggState on all_literal in BinaryExpr (#24461)
  • Show IR sort options in explain (#24465)
  • Benchmark CI import (#24463)
  • Fix schema on ApplyExpr with single row literal in agg context (#24422)
  • Fix planner schema for dividing pl.Float32 by int (#24432)
  • Fix panic scanning from AWS legacy global endpoint URL (#24450)
  • Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
  • Do not propagate struct of nulls with null (#24420)
  • Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
  • Implement approx_n_unique for temporal dtypes and Null (#24417)

📖 Documentation

  • Add default parquet compression levels (#24686)
  • Fix syntax error in data-types-and-structures.md (#24606)
  • Rename avg_birthday -> avg_age in examples aggregation (#23726)
  • Update Polars Cloud user guide (#24366)
  • Fix typo in set_expr_depth_warning docstring (#24427)

📦 Build system

  • Python pre-release 1.34.0b5 (#24699)
  • Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

  • Restructure python project directories again (#24676)
  • Use IR for polars-expr output field resolution (#24661)
  • Remove dist/ from release python workflow (#24639)
  • Escape sed ampersand in release script (#24631)
  • Remove Pyodide from release for now (#24630)
  • Fix sed in-place in release script (#24628)
  • Release script pyodide wheel (#24627)
  • Release script pyodide wheel (#24626)
  • Update release script for runtimes (#24610)
  • Remove unused UnknownKind::Ufunc (#24614)
  • Use cargo-run to call dsl-schema script (#24607)
  • Cleanup and prepare to_field for element and struct field context (#24592)
  • Resolve nightly clippy hints (#24593)
  • Rename pl.dependencies to pl._dependencies (#24595)
  • More release scripting (#24582)
  • Again a minor fix for the setup script (#24580)
  • Minor fix in release script (#24579)
  • Correct release python beta version check (#24578)
  • Python dependency failure (#24576)
  • Always install yq (#24570)
  • Deterministic import order for Python Polars package variants (#24531)
  • Check Arrow FFI pointers with an assert (#24564)
  • Add a couple of missing type definitions in python (#24561)
  • Fix quickstart example in Polars Cloud user guide (#24554)
  • Add implementations for loading min/max statistics for Iceberg (#24496)
  • Update versions (#24508)
  • Add additional unit tests for pl.concat (#24487)
  • Refactor parametric tests for as_struct on aggstates (#24493)
  • Use PlanCallback in name.map_* (#24484)
  • Pin xlsxwriter to 3.2.5 or before (#24485)
  • Add dataclass to hold resolved iceberg scan data (#24418)
  • Fix iceberg test failure in CI (#24456)
  • Move CompressionUtils to polars-utils (#24430)
  • Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Python Polars 1.34.0-beta.4

28 Sep 12:46

Pre-release

🏆 Highlights

  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

  • Skip files in scan_iceberg with filter based on metadata statistics (#24547)
  • Push row_index predicate for all scan types (#24537)
  • Perform integer in-filtering for Parquet inequality predicates (#24525)
  • Stop caching Parquet metadata after 8 files (#24513)
  • Native streaming .mode() expression (#24459)

✨ Enhancements

  • Support scanning from file:/path URIs (#24603)
  • Log which file the schema was sourced from, and which file caused an extra column error (#24621)
  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
  • Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
  • Use fixed-scale Decimals (#24542)
  • Add support for unsigned 128-bit integers (#24346)
  • Add unstable pl.Config.set_default_credential_provider (#24434)
  • Roundtrip BinaryOffset type through Parquet (#24344)
  • Add opt-in unstable functionality to load interval types as Struct (#24320)
  • Support reading parquet metadata from cloud storage (#24443)
  • Add user guide section on AWS role assumption (#24421)
  • Support unique / n_unique / arg_unique for array columns (#24406)

🐞 Bug fixes

  • Widen from_dicts to Iterable[Mapping[str, Any]] (#24584)
  • Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
  • Raise Exception instead of panicking when calling unnest on a non-struct column (#24471)
  • Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
  • Newline escaping in streaming show_graph (#24612)
  • Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
  • Sink batches early stop on in-memory engine (#24585)
  • More precisely model expression ordering requirements (#24437)
  • Panic in zero-weight rolling mean/var (#24596)
  • Decimal <-> literal arithmetic supertype rules (#24594)
  • Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
  • Validate list type for list expressions in planner (#24589)
  • Fix scan_iceberg() storage options not taking effect (#24574)
  • Have log() prioritize the leftmost dtype for its output dtype (#24581)
  • CSV pl.len() was incorrect (#24587)
  • Add support for float inputs for duration types (#24529)
  • Roundtrip empty string through hive partitioning (#24546)
  • Fix potential OOB writes in unaligned IPC read (#24550)
  • Fix regression error when scanning AWS presigned URL (#24530)
  • Make PlPath::join for cloud paths replace on absolute paths (#24514)
  • Correct dtype for cum_agg in streaming engine (#24510)
  • Restore support for np.datetime64() in pl.lit() (#24527)
  • Ignore Iceberg list element ID if missing (#24479)
  • Fix panic on streaming full join with coalesce (#23409)
  • Fix AggState on all_literal in BinaryExpr (#24461)
  • Show IR sort options in explain (#24465)
  • Benchmark CI import (#24463)
  • Fix schema on ApplyExpr with single row literal in agg context (#24422)
  • Fix planner schema for dividing pl.Float32 by int (#24432)
  • Fix panic scanning from AWS legacy global endpoint URL (#24450)
  • Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
  • Do not propagate struct of nulls with null (#24420)
  • Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
  • Implement approx_n_unique for temporal dtypes and Null (#24417)

📖 Documentation

  • Fix syntax error in data-types-and-structures.md (#24606)
  • Rename avg_birthday -> avg_age in examples aggregation (#23726)
  • Update Polars Cloud user guide (#24366)
  • Fix typo in set_expr_depth_warning docstring (#24427)

📦 Build system

  • Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

  • Remove dist/ from release python workflow (#24639)
  • Escape sed ampersand in release script (#24631)
  • Remove Pyodide from release for now (#24630)
  • Fix sed in-place in release script (#24628)
  • Release script pyodide wheel (#24627)
  • Release script pyodide wheel (#24626)
  • Update release script for runtimes (#24610)
  • Remove unused UnknownKind::Ufunc (#24614)
  • Use cargo-run to call dsl-schema script (#24607)
  • Cleanup and prepare to_field for element and struct field context (#24592)
  • Resolve nightly clippy hints (#24593)
  • Rename pl.dependencies to pl._dependencies (#24595)
  • More release scripting (#24582)
  • Again a minor fix for the setup script (#24580)
  • Minor fix in release script (#24579)
  • Correct release python beta version check (#24578)
  • Python dependency failure (#24576)
  • Always install yq (#24570)
  • Deterministic import order for Python Polars package variants (#24531)
  • Check Arrow FFI pointers with an assert (#24564)
  • Add a couple of missing type definitions in python (#24561)
  • Fix quickstart example in Polars Cloud user guide (#24554)
  • Add implementations for loading min/max statistics for Iceberg (#24496)
  • Update versions (#24508)
  • Add additional unit tests for pl.concat (#24487)
  • Refactor parametric tests for as_struct on aggstates (#24493)
  • Use PlanCallback in name.map_* (#24484)
  • Pin xlsxwriter to 3.2.5 or before (#24485)
  • Add dataclass to hold resolved iceberg scan data (#24418)
  • Fix iceberg test failure in CI (#24456)
  • Move CompressionUtils to polars-utils (#24430)
  • Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @moizescbf, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Python Polars 1.34.0-beta.3

27 Sep 13:14

Pre-release

🏆 Highlights

  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

  • Skip files in scan_iceberg with filter based on metadata statistics (#24547)
  • Push row_index predicate for all scan types (#24537)
  • Perform integer in-filtering for Parquet inequality predicates (#24525)
  • Stop caching Parquet metadata after 8 files (#24513)
  • Native streaming .mode() expression (#24459)

✨ Enhancements

  • Support scanning from file:/path URIs (#24603)
  • Log which file the schema was sourced from, and which file caused an extra column error (#24621)
  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
  • Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
  • Use fixed-scale Decimals (#24542)
  • Add support for unsigned 128-bit integers (#24346)
  • Add unstable pl.Config.set_default_credential_provider (#24434)
  • Roundtrip BinaryOffset type through Parquet (#24344)
  • Add opt-in unstable functionality to load interval types as Struct (#24320)
  • Support reading parquet metadata from cloud storage (#24443)
  • Add user guide section on AWS role assumption (#24421)
  • Support unique / n_unique / arg_unique for array columns (#24406)

🐞 Bug fixes

  • Widen from_dicts to Iterable[Mapping[str, Any]] (#24584)
  • Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
  • Raise Exception instead of panicking when calling unnest on a non-struct column (#24471)
  • Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
  • Newline escaping in streaming show_graph (#24612)
  • Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
  • Sink batches early stop on in-memory engine (#24585)
  • More precisely model expression ordering requirements (#24437)
  • Panic in zero-weight rolling mean/var (#24596)
  • Decimal <-> literal arithmetic supertype rules (#24594)
  • Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
  • Validate list type for list expressions in planner (#24589)
  • Fix scan_iceberg() storage options not taking effect (#24574)
  • Have log() prioritize the leftmost dtype for its output dtype (#24581)
  • CSV pl.len() was incorrect (#24587)
  • Add support for float inputs for duration types (#24529)
  • Roundtrip empty string through hive partitioning (#24546)
  • Fix potential OOB writes in unaligned IPC read (#24550)
  • Fix regression error when scanning AWS presigned URL (#24530)
  • Make PlPath::join for cloud paths replace on absolute paths (#24514)
  • Correct dtype for cum_agg in streaming engine (#24510)
  • Restore support for np.datetime64() in pl.lit() (#24527)
  • Ignore Iceberg list element ID if missing (#24479)
  • Fix panic on streaming full join with coalesce (#23409)
  • Fix AggState on all_literal in BinaryExpr (#24461)
  • Show IR sort options in explain (#24465)
  • Benchmark CI import (#24463)
  • Fix schema on ApplyExpr with single row literal in agg context (#24422)
  • Fix planner schema for dividing pl.Float32 by int (#24432)
  • Fix panic scanning from AWS legacy global endpoint URL (#24450)
  • Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
  • Do not propagate struct of nulls with null (#24420)
  • Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
  • Implement approx_n_unique for temporal dtypes and Null (#24417)

📖 Documentation

  • Fix syntax error in data-types-and-structures.md (#24606)
  • Rename avg_birthday -> avg_age in examples aggregation (#23726)
  • Update Polars Cloud user guide (#24366)
  • Fix typo in set_expr_depth_warning docstring (#24427)

📦 Build system

  • Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

  • Remove dist/ from release python workflow (#24639)
  • Escape sed ampersand in release script (#24631)
  • Remove Pyodide from release for now (#24630)
  • Fix sed in-place in release script (#24628)
  • Release script pyodide wheel (#24627)
  • Release script pyodide wheel (#24626)
  • Update release script for runtimes (#24610)
  • Remove unused UnknownKind::Ufunc (#24614)
  • Use cargo-run to call dsl-schema script (#24607)
  • Cleanup and prepare to_field for element and struct field context (#24592)
  • Resolve nightly clippy hints (#24593)
  • Rename pl.dependencies to pl._dependencies (#24595)
  • More release scripting (#24582)
  • Again a minor fix for the setup script (#24580)
  • Minor fix in release script (#24579)
  • Correct release python beta version check (#24578)
  • Python dependency failure (#24576)
  • Always install yq (#24570)
  • Deterministic import order for Python Polars package variants (#24531)
  • Check Arrow FFI pointers with an assert (#24564)
  • Add a couple of missing type definitions in python (#24561)
  • Fix quickstart example in Polars Cloud user guide (#24554)
  • Add implementations for loading min/max statistics for Iceberg (#24496)
  • Update versions (#24508)
  • Add additional unit tests for pl.concat (#24487)
  • Refactor parametric tests for as_struct on aggstates (#24493)
  • Use PlanCallback in name.map_* (#24484)
  • Pin xlsxwriter to 3.2.5 or before (#24485)
  • Add dataclass to hold resolved iceberg scan data (#24418)
  • Fix iceberg test failure in CI (#24456)
  • Move CompressionUtils to polars-utils (#24430)
  • Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @moizescbf, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Python Polars 1.34.0-beta.1

23 Sep 12:03
04dbc94

Pre-release

🏆 Highlights

  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

  • Skip files in scan_iceberg with filter based on metadata statistics (#24547)
  • Push row_index predicate for all scan types (#24537)
  • Perform integer in-filtering for Parquet inequality predicates (#24525)
  • Stop caching Parquet metadata after 8 files (#24513)
  • Native streaming .mode() expression (#24459)

✨ Enhancements

  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
  • Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
  • Use fixed-scale Decimals (#24542)
  • Add support for unsigned 128-bit integers (#24346)
  • Add unstable pl.Config.set_default_credential_provider (#24434)
  • Roundtrip BinaryOffset type through Parquet (#24344)
  • Add opt-in unstable functionality to load interval types as Struct (#24320)
  • Support reading parquet metadata from cloud storage (#24443)
  • Add user guide section on AWS role assumption (#24421)
  • Support unique / n_unique / arg_unique for array columns (#24406)

🐞 Bug fixes

  • Add support for float inputs for duration types (#24529)
  • Roundtrip empty string through hive partitioning (#24546)
  • Fix potential OOB writes in unaligned IPC read (#24550)
  • Fix regression error when scanning AWS presigned URL (#24530)
  • Make PlPath::join for cloud paths replace on absolute paths (#24514)
  • Correct dtype for cum_agg in streaming engine (#24510)
  • Restore support for np.datetime64() in pl.lit() (#24527)
  • Ignore Iceberg list element ID if missing (#24479)
  • Fix panic on streaming full join with coalesce (#23409)
  • Fix AggState on all_literal in BinaryExpr (#24461)
  • Show IR sort options in explain (#24465)
  • Benchmark CI import (#24463)
  • Fix schema on ApplyExpr with single row literal in agg context (#24422)
  • Fix planner schema for dividing pl.Float32 by int (#24432)
  • Fix panic scanning from AWS legacy global endpoint URL (#24450)
  • Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
  • Do not propagate struct of nulls with null (#24420)
  • Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
  • Implement approx_n_unique for temporal dtypes and Null (#24417)

📖 Documentation

  • Rename avg_birthday -> avg_age in examples aggregation (#23726)
  • Update Polars Cloud user guide (#24366)
  • Fix typo in set_expr_depth_warning docstring (#24427)

🛠️ Other improvements

  • More release scripting (#24582)
  • Again a minor fix for the setup script (#24580)
  • Minor fix in release script (#24579)
  • Correct release python beta version check (#24578)
  • Python dependency failure (#24576)
  • Always install yq (#24570)
  • Deterministic import order for Python Polars package variants (#24531)
  • Check Arrow FFI pointers with an assert (#24564)
  • Add a couple of missing type definitions in python (#24561)
  • Fix quickstart example in Polars Cloud user guide (#24554)
  • Add implementations for loading min/max statistics for Iceberg (#24496)
  • Update versions (#24508)
  • Add additional unit tests for pl.concat (#24487)
  • Refactor parametric tests for as_struct on aggstates (#24493)
  • Use PlanCallback in name.map_* (#24484)
  • Pin xlsxwriter to 3.2.5 or before (#24485)
  • Add dataclass to hold resolved iceberg scan data (#24418)
  • Fix iceberg test failure in CI (#24456)
  • Move CompressionUtils to polars-utils (#24430)
  • Update github template to dispatch to cloud client (#24416)

Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Rust Polars 0.51.0

16 Sep 08:51
400ca33

💥 Breaking changes

  • Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

🚀 Performance improvements

  • Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
  • Allocate only for read items when reading Parquet with predicate (#24401)
  • Don't aggregate groups for strict cast if original len (#24381)
  • Allocate only for read items when reading Parquet with predicate (#24324)
  • Native streaming int_range with len or count (#24280)
  • Lower arg_unique natively to the streaming engine (#24279)
  • Move unordering optimization to end (#24286)
  • Do ordering simplification step after common sub-plan elimination (#24269)
  • Always simplify order requirements in IR (#24192)
  • Basic de-duplication of filter expressions (#24220)
  • Cache the IR in pipe_with_schema (#24213)
  • Lower arg_where natively to streaming engine (#24088)
  • Lower Expr.shift to streaming engine (#24106)
  • Lower order-preserving groupby to streaming engine (#24053)
  • Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
  • Lower top-k to streaming engine (#23979)
  • Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)

✨ Enhancements

  • Roundtrip BinaryOffset type through Parquet (#24344)
  • Add opt-in unstable functionality to load interval types as Struct (#24320)
  • Add user guide section on AWS role assumption (#24421)
  • Support unique / n_unique / arg_unique for array columns (#24406)
  • Support S3 virtual-hosted–style URI (#24405)
  • Remove explicit file create for local async writes (#24358)
  • Support Partitioning sinks in cloud (#24399)
  • User-friendly error message on empty path expansion (#24337)
  • Add Polars security policy (#24314)
  • Allow pl.Expr.log to take in an expression (#24226)
  • Implement diff() in streaming engine (#24189)
  • Enable Expr.diff(n) for negative n (#24200)
  • Allow upcasting null-typed columns to nested column types in scans (#24185)
  • Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
  • Add a deprecation warning for pl.Series.shift(Null) (#24114)
  • Improve Debug formatting of DataType (#24056)
  • Add cum_* as native streaming nodes (#23977)
  • Add peak_{min,max} support for booleans (#24068)
  • Add DataFrame.map_columns for eager evaluation (#23821)
  • Add native streaming for peak_{min,max} (#24039)
  • IR graph arrows, monospace font, box nodes (#24021)
  • Add DataTypeExpr.default_value (#23973)
  • Lower rle to a native streaming engine node (#23929)
  • Add support for Int128 to pyo3-polars (#23959)
  • Lower rle_id to a native streaming node (#23894)
  • Pass endpoint_url loaded from CredentialProviderAWS to scan/write_delta (#23812)
  • Dispatch scan_iceberg to native by default (#23912)
  • Lower unique_counts and value_counts to streaming engine (#23890)
  • Implement dt.days_in_month function (#23119)
  • Fix errors on native scan_iceberg (#23811)
  • Reinterpret binary data to fixed size numerical array (#22840)
  • Make rolling_map serializable (#23848)

🐞 Bug fixes

  • Fix AggState on all_literal in BinaryExpr (#24461)
  • Replace unsafe with collect (#24494)
  • Show IR sort options in explain (#24465)
  • Benchmark CI import (#24463)
  • Fix schema on ApplyExpr with single row literal in agg context (#24422)
  • Fix planner schema for dividing pl.Float32 by int (#24432)
  • Fix panic scanning from AWS legacy global endpoint URL (#24450)
  • Emit proper tuple for Log in expression nodes (#24426)
  • Do not propagate struct of nulls with null (#24420)
  • Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
  • Implement approx_n_unique for temporal dtypes and Null (#24417)
  • Correct sink_ipc overload for compression (#24398)
  • Enable all integer dtypes for by parameter in join_asof (#24384)
  • Fix Group-By + filter aggregation performing subsequent operations on all data instead of only the filtered data (#24373)
  • Fix incorrect output ordering for row-separable exprs (#24354)
  • Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120)
  • Match output type to engine for Struct arithmetic (#23805)
  • Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
  • Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
  • Incorrect logic in negative streaming slice (#24326)
  • Do not error on non-list Sequence for columns parameter in read_excel (#23967)
  • Invalid conversion from non-bit numpy bools (#24312)
  • Make dt.epoch('s') serializable (#24302)
  • Make Expr.rechunk serializable (#24303)
  • Schema mismatch for 'log' operation (#24300)
  • Incorrect first/last aggregate in streaming engine (#24289)
  • Fix group offsets in sliced groups (#24274)
  • Panic in inexact date(time) conversion (#24268)
  • The index_of feature should not depend on the object feature (#24256)
  • Keep DSL cache after serialization and deserialization (#24265)
  • Sanitize and warn about eval usage (#24262)
  • Unique with keep="none" in new optimization pass (#24261)
  • Correct size limits for Decimal cast (#24252)
  • Unordered unions in check order observing pass (#24253)
  • Fix dtype for slice on Literal in agg context (#24137)
  • Fix incorrect filter(lit(True)) when scanning hive (#24237)
  • In-memory group_by on 128-bit integers (#24242)
  • Fix panic in gather inside groupby with invalid indices (#24182)
  • Release the GIL in map_groups (#24225)
  • Remove extra explode in LazyGroupBy.{head,tail} (#24221)
  • Fix panic in polars cloud CSV scan (#24197)
  • Fix panic when loading categorical columns from IO plugin (#24205)
  • Fix engine type for concat_list on AggScalar implode (#24160)
  • Handle centered weights in rolling_mean when len(values) < window_size (#24158)
  • Reading is_in predicate for Parquet plain strings (#24184)
  • Make PyCategories pickleable (#24170)
  • Remove unused unsound function to_mutable_slice (#24173)
  • PyO3 extension types giving compat_level errors (#24166)
  • Allow non-elementwise by in top_k (#24164)
  • Fix sort_by for group_by_dynamic context (#24152)
  • Input-independent length aggregations in streaming (#24153)
  • Release GIL when iterating df in to_arrow (#24151)
  • Respect non-elementwise join_where conditions (#24135)
  • Resolve schema mismatch for div on Boolean (#24111)
  • Keep name when doing empty group-aware aggregation (#24098)
  • Implode instead of reshape_list (#24078)
  • Rolling mean with weights incorrect when min_samples < window_size (#23485)
  • Allow merge_sorted for all types (#24077)
  • Include datatypes in row_encode expression (#24074)
  • Include UDF materialized type in serialization (#24073)
  • Correct .rolling() output type for non-aggregations (#24072)
  • Correct planner output schema for join_asof (#24071)
  • Allow %B to work without specifying day (#24009)
  • Correct output for fold and reduce (#24069)
  • Expr.meta.output_name for struct fields (#24064)
  • Ensure upcast operations on pl.Date default to microsecond precision (#23981)
  • Add peak_{min,max} support for booleans (#24068)
  • Planner output type for mean with strange input type (#24052)
  • Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
  • Scan of multiple sources with null datatype (#24065)
  • Categorical in nested data in row encoding (#24051)
  • Missing length update in builder for pl.Array repetition (#24055)
  • Race condition in global categories init (#24045)
  • Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
  • Error when using named functions (#24041)
  • Don't encode entire CategoricalMapping when going to Arrow (#24036)
  • Fix cast on arithmetic with lit (#23941)
  • Incorrect slice-slice pushdown (#24032)
  • Dedup common cache subplan in IR graph (#24028)
  • Allow join on Decimal in in-memory engine (#24026)
  • Fix datatypes for eval.list in aggregation context (#23911)
  • Allocator capsule fallback panic (#24022)
  • Accept another zlib "magic header" file signature (#24013)
  • Fix truediv dtypes so cast in list.eval is not dropped (#23936)
  • Don't reuse cached return_dtype for expanded map expressions (#24010)
  • Cache id is not a valid dot node id (#24005)
  • Align map_elements with and without return_dtype (#24007)
  • Fix column dtype lifetime for csv_write segfault on Categorical (#23986)
  • Allow serializing LazyGroupBy.map_groups (#23964)
  • Correct allocator name in PyCapsule (#23968)
  • Mismatched types for write function on Windows (#23915)
  • Fix unpivot panic when index= column not found (#23958)
  • Fix assert_frame_equal with check_dtypes=False for all-null series with different types (#23943)
  • Return correct python package version (#23951)
  • Categorical namespace functions fail on Enum columns (#23925)
  • Properly set sumwise complete on filter for missing columns (#23877)
  • Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
  • Group By with filters (#23917)
  • Fix read_csv ignoring Decimal schema for header-only data (#23886)
  • Ensure collect() native Iceberg always scans latest when no snapshot_id is given (#23907)
  • Writing List(Array) columns to JSON without panic (#23875)
  • Fill Iceberg missing fields with partition values if present in metadata (#23900)
  • Create file for streaming sink even if unspawned (#23672)
  • Update cloud testing environment (#23908)
  • Parquet filtering on multiple RGs with literal predicate (#23903)
  • Incorrect datatype passed to libc::write (#23904)
  • Properly feature gate TZ_AWARE_RE usage (#23888)
  • Improve identification of "non group-key" aggregates in SQL GROUP BY queries (#23191)
  • Spawning tokio task outside reactor (#23884)
  • Correctly raise DuplicateError on asof_join with suffix="" (#23864)
  • Fix errors on native scan_iceberg (#23811)
  • Fix index ...

Python Polars 1.33.1

09 Sep 08:38
1dc7792

🚀 Performance improvements

  • Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
  • Allocate only for read items when reading Parquet with predicate (#24401)
  • Don't aggregate groups for strict cast if original len (#24381)
  • Allocate only for read items when reading Parquet with predicate (#24324)

✨ Enhancements

  • Support S3 virtual-hosted–style URI (#24405)
  • Remove explicit file create for local async writes (#24358)
  • Add PyCapsule __arrow_c_schema__ interface to pl.Schema (#24365)
  • Support Partitioning sinks in cloud (#24399)
  • User-friendly error message on empty path expansion (#24337)
  • Add unstable pre_execution_query parameter to read_database_uri (#23634)
  • Add Polars security policy (#24314)

🐞 Bug fixes

  • Correct sink_ipc overload for compression (#24398)
  • Enable all integer dtypes for by parameter in join_asof (#24384)
  • Fix Group-By + filter aggregation performing subsequent operations on all data instead of only the filtered data (#24373)
  • Wrap deprecated top-level imports in TYPE_CHECKING (#24340)
  • Fix incorrect output ordering for row-separable exprs (#24354)
  • Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120)
  • Match output type to engine for Struct arithmetic (#23805)
  • Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
  • Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
  • Don't throw away type information for NumPy numeric values when using lit() (#24229)
  • Incorrect logic in negative streaming slice (#24326)
  • Ensure read_database_uri with ADBC works as expected with DuckDB URIs (#24097)
  • Do not error on non-list Sequence for columns parameter in read_excel (#23967)

📖 Documentation

  • Document newly added is_pure parameter for register_io_source (#24311)
  • Create a module docstring for the public polars module (#24332)
  • Update to Polars Cloud user guide (#24187)
  • Update distributed page (#24323)
  • Add a note and example about exporting unformatted Excel sheet data (#24145)
  • Add detail about server-side cursor behaviour for SQLAlchemy in the "iter_batches" parameter of read_database (#24094)
  • Add Polars security policy (#24314)

🛠️ Other improvements

  • Bump c-api (#24412)
  • Add a regression test for #7631 (#24363)
  • Update cloud test InteractiveQuery to DirectQuery (#24287)
  • Mark some tests as slow (#24327)
  • Mark more tests as ready for cloud (#24315)
  • Add hint to update PYPOLARS_VERSION on version assert test (#24313)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @VictorAtIfInsurance, @alexander-beedie, @coastalwhite, @dsprenkels, @itamarst, @kdn36, @kuril, @mcrumiller, @nameexhaustion, @nesb1, @orlp, @r-brink and @ritchie46