Releases: pola-rs/polars
Python Polars 1.35.1
🚀 Performance improvements
- Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
- Skip filtering scan IR if no paths were filtered (#25037)
- Optimize ipc stream read performance (#24671)
✨ Enhancements
- Support BYTE_ARRAY backed Decimals in Parquet (#25076)
- Allow `glimpse` to return a `DataFrame` (#24803)
- Add `allow_empty` flag to `item` (#25048)
🐞 Bug fixes
- The `SQL` interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
- Fix panic if scan predicate produces 0 length mask (#25089)
- Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
- Panic in `group_by_dynamic` with `group_by` and multiple chunks (#25075)
- Minor improvement to internal `is_pycapsule` utility function (#25073)
- Fix panic when using struct field as join key (#25059)
- Allow broadcast in `group_by` for `ApplyExpr` and `BinaryExpr` (#25053)
- Fix field metadata for nested categorical PyCapsule export (#25052)
- Block predicate pushdown when `group_by` key values are changed (#25032)
- Group-By aggregation problems caused by `AmortSeries` (#25043)
- Don't push down predicates passed inserted cache nodes (#25042)
- Allow for negative time in `group_by_dynamic` iterator (#25041)
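The unary "NOT" fix (#25091) turns on the difference between logical and bitwise negation. The pure-Python sketch below only illustrates that difference on plain integers; it does not use Polars or reproduce the original bug, and the helper names `logical_not` and `bitwise_not` are hypothetical.

```python
def logical_not(value: int) -> bool:
    """Logical negation: invert the truthiness and return a boolean."""
    return not value


def bitwise_not(value: int) -> int:
    """Bitwise negation: flip every bit (equal to -value - 1 for Python ints)."""
    return ~value


for v in (0, 1):
    # The two operators disagree: bitwise NOT of 1 is -2, which is still truthy.
    print(v, logical_not(v), bitwise_not(v))
```

Under bitwise semantics a "negated" 1 stays truthy, which is why a SQL `NOT` is expected to behave logically.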
📖 Documentation
- Fix typo in public dataset URL (#25044)
🛠️ Other improvements
- Disable recursive CSPE for now (#25085)
- Change group length mismatch error to `ShapeError` (#25004)
- Update toolchain (#25007)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @Liyixin95, @alexander-beedie, @coastalwhite, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.35.0
🏆 Highlights
- Stabilize decimal (#25020)
🚀 Performance improvements
- Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
- Lower `unique` to native group-by and speed up `n_unique` in group-by context (#24976)
- Better parallelize `take{_slice,}_unchecked` (#24980)
- Implement native `skew` and `kurtosis` in group-by context (#24961)
- Use native group-by aggregations for `bitwise_*` operations (#24935)
- Address `group_by_dynamic` slowness in sparse data (#24916)
- Push filters to PyIceberg (#24910)
- Native `filter`/`drop_nulls`/`drop_nans` in group-by context (#24897)
- Implement `cumulative_eval` using the group-by engine (#24889)
- Prevent generation of copies of `Dataframe`s in `DslPlan` serialization (#24852)
- Implement native `null_count`, `any` and `all` group-by aggregations (#24859)
- Speed up `reverse` in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for `first`/`last` on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for `BitMapIter::nth` (#24766)
- Pushdown slices on plans within unions (#24735)
✨ Enhancements
- Stabilize decimal (#25020)
- Support `ewm_mean()` in streaming engine (#25003)
- Improve row-count estimates (#24996)
- Remove filtered scan paths in IR when possible (#24974)
- Introduce remote Polars MCP server (#24977)
- Allow local scans on polars cloud (configurable) (#24962)
- Add `Expr.item` to strictly extract a single value from an expression (#24888)
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Fast-count for `scan_iceberg().select(len())` (#24602)
- Add `glob` parameter to `scan_ipc` (#24898)
- Prevent generation of copies of `Dataframe`s in `DslPlan` serialization (#24852)
- Add `list.agg` and `arr.agg` (#24790)
- Implement `{Expr,Series}.rolling_rank()` (#24776)
- Don't require PyArrow for `read_database_uri` if ADBC engine version supports PyCapsule interface (#24029)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add `arr.eval` (#24472)
- Drop PyArrow requirement for non-batched usage of `read_database` with the ADBC engine and support `iter_batches` with the ADBC engine (#24180)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add `separator` to `{Data,Lazy}Frame.unnest` (#24716)
- Add `union()` function for unordered concatenation (#24298)
- Add `name.replace` to the set of column rename options (#17942)
- Support `np.ndarray -> AnyValue` conversion (#24748)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on `DataFrame` load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
🐞 Bug fixes
- Re-enable CPU feature check before import (#25010)
- Implement `read_excel` workaround for fastexcel/calamine issue loading a column subset from a named table (#25012)
- Correctness `any(ignore_nulls)` and OOB in `all` (#25005)
- Streaming any/all with ignore_nulls=False (#25008)
- Fix incorrect `join_asof` on a casted expression (#25006)
- Optimize memory on rolling groups in `ApplyExpr` (#24709)
- Fallback `Pyarrow` scan to in-memory engine (#24991)
- Make `Operator::swap_operands` return correct operators for `Plus`, `Minus`, `Multiply` and `Divide` (#24997)
- Capitalize letters after numbers in to_titlecase (#24993)
- Preserve null values in `pct_change` (#24952)
- Raise length mismatch on `over` with sliced groups (#24887)
- Check duplicate name in transpose (#24956)
- Follow Kleene logic in `any`/`all` for group-by (#24940)
- Do not optimize cross join to iejoin if order maintaining (#24950)
- Fix typing of `scan_parquet` partially unknown (#24928)
- Properly release the GIL for `read_parquet_metadata` (#24922)
- Broadcast `partition_by` columns in `over` expression (#24874)
- Clear index cache on stacked `df.filter` expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated `with_row_index()` after `scan()` silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor `BinaryExpr` in `group_by` dispatch logic (#24548)
- Fix aggstate for `gather` (#24857)
- Keep scalars for length preserving functions in `group_by` (#24819)
- Have `range` feature depend on `dtype-array` feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in `EvalExpr` (#24650)
- Allow aggregations on `AggState::LiteralScalar` (#24820)
- Dispatch to `group_aware` for fallible expressions with masked out elements (#24815)
- Fix error for `arr.sum()` on small integer Array dtypes containing nulls (#24478)
- Fix regression on `write_database()` to Snowflake due to unsupported string view type (#24622)
- Fix XOR did not follow kleene when one side is unit-length (#24810)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use `overlapping` instead of `rolling` (#24787)
- Fix iterable on `dynamic_group_by` and `rolling` object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in `bitmask::nth_set_bit_u64` (#24775)
- Add `Expr.sign` for `Decimal` datatype (#24717)
- Correct `str.replace` with missing pattern (#24768)
- Ensure `schema_overrides` is respected when loading iterable row data (#24721)
- Support `decimal_comma` on `Decimal` type in `write_csv` (#24718)
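Several fixes in this list mention Kleene (three-valued) logic for `any`, `all` and XOR. As a rough illustration of what that means, here is a minimal pure-Python sketch in which `None` stands in for a null value. It is not the Polars implementation, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """True if any value is True; otherwise null if a null was seen; else False."""
    saw_null = False
    for v in values:
        if v is True:
            return True
        if v is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """False if any value is False; otherwise null if a null was seen; else True."""
    saw_null = False
    for v in values:
        if v is False:
            return False
        if v is None:
            saw_null = True
    return None if saw_null else True


def kleene_xor(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """XOR is unknown as soon as either operand is unknown."""
    if a is None or b is None:
        return None
    return a != b


print(kleene_any([False, None]))  # null: the unknown value could still be True
print(kleene_all([False, None]))  # False: one False already decides the result
print(kleene_xor(True, None))     # null: XOR can never be decided by one operand
```

The key property is that a null only propagates when it could change the answer: a single `True` settles `any`, and a single `False` settles `all`, regardless of nulls.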
📖 Documentation
- Introduce remote Polars MCP server (#24977)
- Add `{arr,list}.agg` API references (#24970)
- Support LLM in docs (#24958)
- Update Cloud docs with correct fn argument order (#24939)
- Update `name.replace` examples (#24941)
- Add i128 and u128 features to user guide (#24938)
- Add partitioning examples for `sink_*` methods (#24918)
- Add more `{unique,value}_counts` examples (#24927)
- Indent the versionchanged (#24783)
- Relax fsspec wording (#24881)
- Add `pl.field` into the api docs (#24846)
- Fix duplicated article in SECURITY.md (#24762)
- Document output name determination in when/then/otherwise (#24746)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
📦 Build system
- Ensure `build_feature_flags.py` is included in artifact (#25024)
- Update pyo3 and numpy crates to version 0.26 (#24760)
🛠️ Other improvements
- Fix benchmark ci (#25019)
- Fix non-deterministic test (#25009)
- Fix makefile arch detection (#25011)
- Make `LazyFrame.set_sorted` into a `FunctionIR::Hint` (#24981)
- Remove symbolic links (#24982)
- Deprecate `Expr.agg_groups()` and `pl.groups()` (#24919)
- Dispatch to no-op rayon thread-pool from streaming (#24957)
- Unpin pydantic (#24955)
- Ensure safety of scan fast-count IR lowering in streaming (#24953)
- Re-use iterators in `set_` operations (#24850)
- Remove `GroupByPartitioned` and dispatch to streaming engine (#24903)
- Turn `element()` into `{A,}Expr::Element` (#24885)
- Pass `ScanOptions` to `new_from_ipc` (#24893)
- Update tests to be index type agnostic (#24891)
- Unset `Context` in `Window` expression (#24875)
- Fix failing delta test (#24867)
- Move `FunctionExpr` dispatch from `plan` to `expr` (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in `ApplyExpr` (#24825)
- Add `days_in_month` to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn `pl.format` into proper elementwise expression (#24811)
- Fix remote benchmark by no-longer saving builds (#24812)
- Refactor `ApplyExpr` in `group_by` context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename `rolling` groups to `overlapping` (#24577)
- Refactor `DataType` `proptest` strategies (#24763)
- Add `union` to documentation (#24769)
Thank you to all our contributors for making this release possible!
@EndPositive, @EnricoMi, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @mjanssen, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @thomasjpfan and @williambdean
Python Polars 1.35.0-beta.1
🚀 Performance improvements
- Address `group_by_dynamic` slowness in sparse data (#24916)
- Push filters to PyIceberg (#24910)
- Native `filter`/`drop_nulls`/`drop_nans` in group-by context (#24897)
- Implement `cumulative_eval` using the group-by engine (#24889)
- Prevent generation of copies of `Dataframe`s in `DslPlan` serialization (#24852)
- Implement native `null_count`, `any` and `all` group-by aggregations (#24859)
- Speed up `reverse` in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for `first`/`last` on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for `BitMapIter::nth` (#24766)
- Pushdown slices on plans within unions (#24735)
✨ Enhancements
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Fast-count for `scan_iceberg().select(len())` (#24602)
- Add `glob` parameter to `scan_ipc` (#24898)
- Prevent generation of copies of `Dataframe`s in `DslPlan` serialization (#24852)
- Add `list.agg` and `arr.agg` (#24790)
- Implement `{Expr,Series}.rolling_rank()` (#24776)
- Don't require PyArrow for `read_database_uri` if ADBC engine version supports PyCapsule interface (#24029)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add `arr.eval` (#24472)
- Drop PyArrow requirement for non-batched usage of `read_database` with the ADBC engine and support `iter_batches` with the ADBC engine (#24180)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add `separator` to `{Data,Lazy}Frame.unnest` (#24716)
- Add `union()` function for unordered concatenation (#24298)
- Add `name.replace` to the set of column rename options (#17942)
- Support `np.ndarray -> AnyValue` conversion (#24748)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on `DataFrame` load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
🐞 Bug fixes
- Properly release the GIL for `read_parquet_metadata` (#24922)
- Broadcast `partition_by` columns in `over` expression (#24874)
- Clear index cache on stacked `df.filter` expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated `with_row_index()` after `scan()` silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor `BinaryExpr` in `group_by` dispatch logic (#24548)
- Fix aggstate for `gather` (#24857)
- Keep scalars for length preserving functions in `group_by` (#24819)
- Have `range` feature depend on `dtype-array` feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in `EvalExpr` (#24650)
- Allow aggregations on `AggState::LiteralScalar` (#24820)
- Dispatch to `group_aware` for fallible expressions with masked out elements (#24815)
- Fix error for `arr.sum()` on small integer Array dtypes containing nulls (#24478)
- Fix regression on `write_database()` to Snowflake due to unsupported string view type (#24622)
- Fix XOR did not follow kleene when one side is unit-length (#24810)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use `overlapping` instead of `rolling` (#24787)
- Fix iterable on `dynamic_group_by` and `rolling` object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in `bitmask::nth_set_bit_u64` (#24775)
- Add `Expr.sign` for `Decimal` datatype (#24717)
- Correct `str.replace` with missing pattern (#24768)
- Ensure `schema_overrides` is respected when loading iterable row data (#24721)
- Support `decimal_comma` on `Decimal` type in `write_csv` (#24718)
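The list above mentions Kahan summation for group-by sum/mean (#24774). For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of compensated summation next to a naive loop. It is an illustration only, not the Polars implementation; the function names and the test data are assumptions chosen to make the rounding loss visible.

```python
def naive_sum(values):
    """Plain left-to-right float accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: track the rounding error of each addition."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # re-inject the previously lost bits
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# One large value followed by many values too small to change it individually.
values = [1.0] + [1e-16] * 10_000

print(naive_sum(values))  # every tiny addend is rounded away
print(kahan_sum(values))  # the tiny addends accumulate in the compensation term
```

In the naive loop each `1e-16` is below half the spacing of floats near 1.0, so the total never moves; the compensated loop carries those lost bits forward until they are large enough to register.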
📖 Documentation
- Add partitioning examples for `sink_*` methods (#24918)
- Add more `{unique,value}_counts` examples (#24927)
- Indent the versionchanged (#24783)
- Relax fsspec wording (#24881)
- Add `pl.field` into the api docs (#24846)
- Fix duplicated article in SECURITY.md (#24762)
- Document output name determination in when/then/otherwise (#24746)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
📦 Build system
- Update pyo3 and numpy crates to version 0.26 (#24760)
🛠️ Other improvements
- Re-use iterators in `set_` operations (#24850)
- Remove `GroupByPartitioned` and dispatch to streaming engine (#24903)
- Turn `element()` into `{A,}Expr::Element` (#24885)
- Pass `ScanOptions` to `new_from_ipc` (#24893)
- Update tests to be index type agnostic (#24891)
- Unset `Context` in `Window` expression (#24875)
- Fix failing delta test (#24867)
- Move `FunctionExpr` dispatch from `plan` to `expr` (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in `ApplyExpr` (#24825)
- Add `days_in_month` to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn `pl.format` into proper elementwise expression (#24811)
- Fix remote benchmark by no-longer saving builds (#24812)
- Refactor `ApplyExpr` in `group_by` context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename `rolling` groups to `overlapping` (#24577)
- Refactor `DataType` `proptest` strategies (#24763)
- Add `union` to documentation (#24769)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @nameexhaustion, @orlp, @pavelzw, @ritchie46, @thomasjpfan and @williambdean
Python Polars 1.34.0
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Optimize gather_every(n=1) to slice (#24704)
- Lower null count to streaming engine (#24703)
- Native streaming `gather_every` (#24700)
- Pushdown filter with `strptime` if input is literal (#24694)
- Avoid copying expanded paths (#24669)
- Relax filter expr ordering (#24662)
- Remove unnecessary `groups` call in `aggregated` (#24651)
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Implement maintain_order for cross join (#24665)
- Add support to output `dt.total_{}()` duration values as fractionals (#24598)
- Avoid forcing a `pyarrow` dependency in `read_excel` when using the default "calamine" engine (#24655)
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Removing dots after noqa comments (#24722)
- Parse `Decimal` with comma as decimal separator in CSV (#24685)
- Make `Categories` pickleable (#24691)
- Shift on array within list (#24678)
- Fix handling of `AggregatedScalar` in `ApplyExpr` single input (#24634)
- Support reading of mixed compressed/uncompressed IPC buffers (#24674)
- Overflow in slice-slice optimization (#24658)
- Package discovery for `setuptools` (#24656)
- Add type assertion to prevent out-of-bounds in `GenericFirstLastGroupedReduction` (#24590)
- Remove inclusion of polars dir in runtime sdist/wheel (#24654)
- Method `dt.month_end` was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Add default parquet compression levels (#24686)
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
🛠️ Other improvements
- Removing dots after noqa comments (#24722)
- Make `test_multiple_sorting_columns` test runnable (#24719)
- Remove `{Upper,Lower}Bound` expressions in IR (#24701)
- Fix Makefile `uv pip` option syntax (#24711)
- Add egg-info to gitignore (#24712)
- Restructure python project directories again (#24676)
- Use IR for `polars-expr` output field resolution (#24661)
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.34.0-beta.5
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Pushdown filter with `strptime` if input is literal (#24694)
- Avoid copying expanded paths (#24669)
- Relax filter expr ordering (#24662)
- Remove unnecessary `groups` call in `aggregated` (#24651)
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Implement maintain_order for cross join (#24665)
- Add support to output `dt.total_{}()` duration values as fractionals (#24598)
- Avoid forcing a `pyarrow` dependency in `read_excel` when using the default "calamine" engine (#24655)
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Make `Categories` pickleable (#24691)
- Shift on array within list (#24678)
- Fix handling of `AggregatedScalar` in `ApplyExpr` single input (#24634)
- Support reading of mixed compressed/uncompressed IPC buffers (#24674)
- Overflow in slice-slice optimization (#24658)
- Package discovery for `setuptools` (#24656)
- Add type assertion to prevent out-of-bounds in `GenericFirstLastGroupedReduction` (#24590)
- Remove inclusion of polars dir in runtime sdist/wheel (#24654)
- Method `dt.month_end` was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Add default parquet compression levels (#24686)
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
🛠️ Other improvements
- Restructure python project directories again (#24676)
- Use IR for `polars-expr` output field resolution (#24661)
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.34.0-beta.4
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
- Use cargo-run to call dsl-schema script (#24607)
🛠️ Other improvements
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @moizescbf, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.34.0-beta.3
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
- Use cargo-run to call dsl-schema script (#24607)
🛠️ Other improvements
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @moizescbf, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.34.0-beta.1
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
🛠️ Other improvements
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Rust Polars 0.51.0
💥 Breaking changes
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
🚀 Performance improvements
- Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
- Allocate only for read items when reading Parquet with predicate (#24401)
- Don't aggregate groups for strict cast if original len (#24381)
- Allocate only for read items when reading Parquet with predicate (#24324)
- Native streaming `int_range` with `len` or `count` (#24280)
- Lower `arg_unique` natively to the streaming engine (#24279)
- Move unordering optimization to end (#24286)
- Do ordering simplification step after common sub-plan elimination (#24269)
- Always simplify order requirements in IR (#24192)
- Basic de-duplication of filter expressions (#24220)
- Cache the IR in `pipe_with_schema` (#24213)
- Lower `arg_where` natively to streaming engine (#24088)
- Lower Expr.shift to streaming engine (#24106)
- Lower order-preserving groupby to streaming engine (#24053)
- Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
- Lower top-k to streaming engine (#23979)
- Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)
✨ Enhancements
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for `array` columns (#24406)
- Support S3 virtual-hosted–style URI (#24405)
- Remove explicit file create for local async writes (#24358)
- Support Partitioning sinks in cloud (#24399)
- User-friendly error message on empty path expansion (#24337)
- Add Polars security policy (#24314)
- Allow pl.Expr.log to take in an expression (#24226)
- Implement diff() in streaming engine (#24189)
- Enable Expr.diff(n) for negative n (#24200)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
- Add a deprecation warning for pl.Series.shift(Null) (#24114)
- Improve Debug formatting of DataType (#24056)
- Add `cum_*` as native streaming nodes (#23977)
- Add peak_{min,max} support for booleans (#24068)
- Add `DataFrame.map_columns` for eager evaluation (#23821)
- Add native streaming for `peaks_{min,max}` (#24039)
- IR graph arrows, monospace font, box nodes (#24021)
- Add `DataTypeExpr.default_value` (#23973)
- Lower `rle` to a native streaming engine node (#23929)
- Add support for `Int128` to pyo3-polars (#23959)
- Lower `rle_id` to a native streaming node (#23894)
- Pass `endpoint_url` loaded from `CredentialProviderAWS` to `scan/write_delta` (#23812)
- Dispatch `scan_iceberg` to native by default (#23912)
- Lower `unique_counts` and `value_counts` to streaming engine (#23890)
- Implement `dt.days_in_month` function (#23119)
- Fix errors on native `scan_iceberg` (#23811)
- Reinterpret binary data to fixed size numerical array (#22840)
- Make `rolling_map` serializable (#23848)
🐞 Bug fixes
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Replace unsafe with collect (#24494)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Emit proper tuple for Log in expression nodes (#24426)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
- Correct `sink_ipc` overload for compression (#24398)
- Enable all integer dtypes for `by` parameter in `join_asof` (#24384)
- Fix Group-By + filter aggregation performs subsequent operations on all data instead of only filtered data (#24373)
- Fix incorrect output ordering for row-separable exprs (#24354)
- Fix `Series.__arrow_c_stream__` for Decimal and other logical types (#24120)
- Match output type to engine for `Struct` arithmetic (#23805)
- Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
- Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
- Incorrect logic in negative streaming slice (#24326)
- Do not error on non-list `Sequence` for `columns` parameter in `read_excel` (#23967)
- Invalid conversion from non-bit numpy bools (#24312)
- Make `dt.epoch('s')` serializable (#24302)
- Make `Expr.rechunk` serializable (#24303)
- Schema mismatch for 'log' operation (#24300)
- Incorrect first/last aggregate in streaming engine (#24289)
- Fix group offsets in sliced groups (#24274)
- Panic in inexact date(time) conversion (#24268)
- The `index_of` feature should not depend on the `object` feature (#24256)
- Keep DSL cache after serialization and deserialization (#24265)
- Sanitize and warn about eval usage (#24262)
- Unique with keep="none" in new optimization pass (#24261)
- Correct size limits for Decimal cast (#24252)
- Unordered unions in check order observing pass (#24253)
- Fix dtype for `slice` on `Literal` in agg context (#24137)
- Fix incorrect `filter(lit(True))` when scanning hive (#24237)
- In-memory group_by on 128-bit integers (#24242)
- Fix panic in `gather` inside groupby with invalid indices (#24182)
- Release the GIL in map_groups (#24225)
- Remove extra explode in `LazyGroupBy.{head,tail}` (#24221)
- Fix panic in polars cloud CSV scan (#24197)
- Fix panic when loading categorical columns from IO plugin (#24205)
- Fix engine type for `concat_list` on AggScalar `implode` (#24160)
- Rolling_mean handle centered weights with len(values) < window_size (#24158)
- Reading `is_in` predicate for Parquet plain strings (#24184)
- Make PyCategories pickleable (#24170)
- Remove unused unsound function `to_mutable_slice` (#24173)
- PyO3 extension types giving compat_level errors (#24166)
- Allow non-elementwise by in top_k (#24164)
- Fix `sort_by` for `group_by_dynamic` context (#24152)
- Input-independent length aggregations in streaming (#24153)
- Release GIL when iterating df in to_arrow (#24151)
- Respect non-elementwise join_where conditions (#24135)
- Resolve schema mismatch for div on Boolean (#24111)
- Keep name when doing empty group-aware aggregation (#24098)
- Implode instead of `reshape_list` (#24078)
- Rolling mean with weights incorrect when min_samples < window_size (#23485)
- Allow `merge_sorted` for all types (#24077)
- Include datatypes in `row_encode` expression (#24074)
- Include UDF materialized type in serialization (#24073)
- Correct `.rolling()` output type for non-aggregations (#24072)
- Correct planner output schema for `join_asof` (#24071)
- Allow %B to work without specifying day (#24009)
- Correct output for `fold` and `reduce` (#24069)
- Expr.meta.output_name for struct fields (#24064)
- Ensure upcast operations on `pl.Date` default to microsecond precision (#23981)
- Add peak_{min,max} support for booleans (#24068)
- Planner output type for `mean` with strange input type (#24052)
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
- Scan of multiple sources with `null` datatype (#24065)
- Categorical in nested data in row encoding (#24051)
- Missing length update in builder for pl.Array repetition (#24055)
- Race condition in global categories init (#24045)
- Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
- Error when using named functions (#24041)
- Don't encode entire CategoricalMapping when going to Arrow (#24036)
- Fix cast on arithmetic with `lit` (#23941)
- Incorrect slice-slice pushdown (#24032)
- Dedup common cache subplan in IR graph (#24028)
- Allow join on Decimal in in-memory engine (#24026)
- Fix datatypes for `eval.list` in aggregation context (#23911)
- Allocator capsule fallback panic (#24022)
- Accept another zlib "magic header" file signature (#24013)
- Fix `truediv` dtypes so `cast` in `list.eval` is not dropped (#23936)
- Don't reuse cached `return_dtype` for expanded map expressions (#24010)
- Cache id is not a valid dot node id (#24005)
- Align `map_elements` with and without `return_dtype` (#24007)
- Fix column dtype lifetime for `csv_write` segfault on `Categorical` (#23986)
- Allow serializing `LazyGroupBy.map_groups` (#23964)
- Correct allocator name in `PyCapsule` (#23968)
- Mismatched types for `write` function for windows (#23915)
- Fix `unpivot` panic when `index=` column not found (#23958)
- Fix `assert_frame_equal` with `check_dtypes=False` for all-null series with different types (#23943)
- Return correct python package version (#23951)
- Categorical namespace functions fail on `Enum` columns (#23925)
- Properly set sumwise complete on filter for missing columns (#23877)
- Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
- Group By with filters (#23917)
- Fix `read_csv` ignoring Decimal schema for header-only data (#23886)
- Ensure `collect()` native Iceberg always scans latest when no `snapshot_id` is given (#23907)
- Writing List(Array) columns to JSON without panic (#23875)
- Fill Iceberg missing fields with partition values if present in metadata (#23900)
- Create file for streaming sink even if unspawned (#23672)
- Update cloud testing environment (#23908)
- Parquet filtering on multiple RGs with literal predicate (#23903)
- Incorrect datatype passed to libc::write (#23904)
- Properly feature gate TZ_AWARE_RE usage (#23888)
- Improve identification of "non group-key" aggregates in SQL `GROUP BY` queries (#23191)
- Spawning tokio task outside reactor (#23884)
- Correctly raise DuplicateError on asof_join with suffix="" (#23864)
- Fix errors on native `scan_iceberg` (#23811)
- Fix index ...
Python Polars 1.33.1
🚀 Performance improvements
- Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
- Allocate only for read items when reading Parquet with predicate (#24401)
- Don't aggregate groups for strict cast if original len (#24381)
- Allocate only for read items when reading Parquet with predicate (#24324)
✨ Enhancements
- Support S3 virtual-hosted–style URI (#24405)
- Remove explicit file create for local async writes (#24358)
- Add PyCapsule `__arrow_c_schema__` interface to `pl.Schema` (#24365)
- Support Partitioning sinks in cloud (#24399)
- User-friendly error message on empty path expansion (#24337)
- Add unstable `pre_execution_query` parameter to `read_database_uri` (#23634)
- Add Polars security policy (#24314)
🐞 Bug fixes
- Correct `sink_ipc` overload for compression (#24398)
- Enable all integer dtypes for `by` parameter in `join_asof` (#24384)
- Fix Group-By + filter aggregation performs subsequent operations on all data instead of only filtered data (#24373)
- Wrap deprecated top-level imports in TYPE_CHECKING (#24340)
- Fix incorrect output ordering for row-separable exprs (#24354)
- Fix `Series.__arrow_c_stream__` for Decimal and other logical types (#24120)
- Match output type to engine for `Struct` arithmetic (#23805)
- Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
- Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
- Don't throw away type information for NumPy numeric values when using lit() (#24229)
- Incorrect logic in negative streaming slice (#24326)
- Ensure `read_database_uri` with ADBC works as expected with DuckDB URIs (#24097)
- Do not error on non-list `Sequence` for `columns` parameter in `read_excel` (#23967)
📖 Documentation
- Document newly added `is_pure` parameter for `register_io_source` (#24311)
- Create a module docstring for the public `polars` module (#24332)
- Update to Polars Cloud user guide (#24187)
- Update distributed page (#24323)
- Add a note and example about exporting unformatted `Excel` sheet data (#24145)
- Add detail about server-side cursor behaviour for SQLAlchemy in the "iter_batches" parameter of `read_database` (#24094)
- Add Polars security policy (#24314)
🛠️ Other improvements
- Bump c-api (#24412)
- Add a regression test for #7631 (#24363)
- Update cloud test `InteractiveQuery` to `DirectQuery` (#24287)
- Mark some tests as slow (#24327)
- Mark more tests as ready for cloud (#24315)
- Add hint to update `PYPOLARS_VERSION` on version assert test (#24313)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @VictorAtIfInsurance, @alexander-beedie, @coastalwhite, @dsprenkels, @itamarst, @kdn36, @kuril, @mcrumiller, @nameexhaustion, @nesb1, @orlp, @r-brink and @ritchie46