Rust Polars 0.51.0
💥 Breaking changes
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
🚀 Performance improvements
- Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
 - Allocate only for read items when reading Parquet with predicate (#24401)
 - Don't aggregate groups for strict cast if original len (#24381)
 - Allocate only for read items when reading Parquet with predicate (#24324)
 - Native streaming `int_range` with `len` or `count` (#24280)
 - Lower `arg_unique` natively to the streaming engine (#24279)
 - Move unordering optimization to end (#24286)
 - Do ordering simplification step after common sub-plan elimination (#24269)
 - Always simplify order requirements in IR (#24192)
 - Basic de-duplication of filter expressions (#24220)
 - Cache the IR in `pipe_with_schema` (#24213)
 - Lower `arg_where` natively to streaming engine (#24088)
 - Lower Expr.shift to streaming engine (#24106)
 - Lower order-preserving groupby to streaming engine (#24053)
 - Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
 - Lower top-k to streaming engine (#23979)
 - Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)
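
The entries on lowering `.sort(maintain_order=True).head()` and top-k to the streaming engine rest on a general idea: the first k rows of a stable sort can be computed batch by batch while holding only k candidate rows plus the current batch. The following is a minimal pure-Python sketch of that idea only. It is not the Polars implementation, and `streaming_top_k`, `sort_head`, and the sample rows are hypothetical names and data chosen for illustration.

```python
import heapq
from operator import itemgetter


def sort_head(rows, key, k):
    """Reference result: stable full sort, then keep the first k rows."""
    return sorted(rows, key=key)[:k]


def streaming_top_k(batches, key, k):
    """Consume batches one at a time, holding at most k candidates
    plus the current batch."""
    best = []
    for batch in batches:
        # heapq.nsmallest is documented as equivalent to sorted(...)[:k],
        # so ties keep their original relative order (earlier rows first).
        best = heapq.nsmallest(k, best + list(batch), key=key)
    return best


# Hypothetical sample data: (name, score) rows arriving in three batches.
rows = [("a", 3), ("b", 1), ("c", 3), ("d", 1), ("e", 2)]
batches = [rows[:2], rows[2:4], rows[4:]]
by_score = itemgetter(1)

top = streaming_top_k(batches, by_score, 3)
```

In this sketch the batched result matches `sort_head(rows, by_score, 3)`, including the order of tied rows, which is the property that `maintain_order=True` asks for.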
 
✨ Enhancements
- Roundtrip `BinaryOffset` type through Parquet (#24344)
 - Add opt-in unstable functionality to load interval types as `Struct` (#24320)
 - Add user guide section on AWS role assumption (#24421)
 - Support `unique`/`n_unique`/`arg_unique` for `array` columns (#24406)
 - Support S3 virtual-hosted–style URI (#24405)
 - Remove explicit file create for local async writes (#24358)
 - Support Partitioning sinks in cloud (#24399)
 - User-friendly error message on empty path expansion (#24337)
 - Add Polars security policy (#24314)
 - Allow pl.Expr.log to take in an expression (#24226)
 - Implement diff() in streaming engine (#24189)
 - Enable Expr.diff(n) for negative n (#24200)
 - Allow upcasting null-typed columns to nested column types in scans (#24185)
 - Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
 - Add a deprecation warning for pl.Series.shift(Null) (#24114)
 - Improve Debug formatting of DataType (#24056)
 - Add `cum_*` as native streaming nodes (#23977)
 - Add peak_{min,max} support for booleans (#24068)
 - Add `DataFrame.map_columns` for eager evaluation (#23821)
 - Add native streaming for `peaks_{min,max}` (#24039)
 - IR graph arrows, monospace font, box nodes (#24021)
 - Add `DataTypeExpr.default_value` (#23973)
 - Lower `rle` to a native streaming engine node (#23929)
 - Add support for `Int128` to pyo3-polars (#23959)
 - Lower `rle_id` to a native streaming node (#23894)
 - Pass `endpoint_url` loaded from `CredentialProviderAWS` to `scan/write_delta` (#23812)
 - Dispatch `scan_iceberg` to native by default (#23912)
 - Lower `unique_counts` and `value_counts` to streaming engine (#23890)
 - Implement `dt.days_in_month` function (#23119)
 - Fix errors on native `scan_iceberg` (#23811)
 - Reinterpret binary data to fixed size numerical array (#22840)
 - Make `rolling_map` serializable (#23848)
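
`Expr.diff(n)` now accepts a negative `n` (#24200). Assuming the usual shift-based definition, in which each output is the value minus the value `n` positions earlier, a negative `n` compares each value with a later one and leaves the missing entries at the end. The pure-Python function below is a hypothetical illustration of that assumed semantics, not Polars code; the Polars documentation remains the authoritative source for the actual behavior.

```python
def diff(values, n=1):
    """Assumed semantics: out[i] = values[i] - values[i - n], else None."""
    out = []
    for i, value in enumerate(values):
        j = i - n  # index of the element being subtracted
        in_range = 0 <= j < len(values)
        if in_range and value is not None and values[j] is not None:
            out.append(value - values[j])
        else:
            out.append(None)  # no partner element: result is missing
    return out


squares = [1, 4, 9, 16]
forward = diff(squares, 1)    # compares with the previous element
backward = diff(squares, -1)  # compares with the next element
```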
🐞 Bug fixes
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
 - Replace unsafe with collect (#24494)
 - Show IR sort options in `explain` (#24465)
 - Benchmark CI import (#24463)
 - Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
 - Fix planner schema for dividing `pl.Float32` by int (#24432)
 - Fix panic scanning from AWS legacy global endpoint URL (#24450)
 - Emit proper tuple for Log in expression nodes (#24426)
 - Do not propagate struct of nulls with null (#24420)
 - Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
 - Implement `approx_n_unique` for temporal dtypes and Null (#24417)
 - Correct `sink_ipc` overload for compression (#24398)
 - Enable all integer dtypes for `by` parameter in `join_asof` (#24384)
 - Fix Group-By + filter aggregation performing subsequent operations on all data instead of only filtered data (#24373)
 - Fix incorrect output ordering for row-separable exprs (#24354)
 - Fix `Series.__arrow_c_stream__` for Decimal and other logical types (#24120)
 - Match output type to engine for `Struct` arithmetic (#23805)
 - Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
 - Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
 - Incorrect logic in negative streaming slice (#24326)
 - Do not error on non-list `Sequence` for `columns` parameter in `read_excel` (#23967)
 - Invalid conversion from non-bit numpy bools (#24312)
 - Make `dt.epoch('s')` serializable (#24302)
 - Make `Expr.rechunk` serializable (#24303)
 - Schema mismatch for 'log' operation (#24300)
 - Incorrect first/last aggregate in streaming engine (#24289)
 - Fix group offsets in sliced groups (#24274)
 - Panic in inexact date(time) conversion (#24268)
 - The `index_of` feature should not depend on the `object` feature (#24256)
 - Keep DSL cache after serialization and deserialization (#24265)
 - Sanitize and warn about eval usage (#24262)
 - Unique with keep="none" in new optimization pass (#24261)
 - Correct size limits for Decimal cast (#24252)
 - Unordered unions in check order observing pass (#24253)
 - Fix dtype for `slice` on `Literal` in agg context (#24137)
 - Fix incorrect `filter(lit(True))` when scanning hive (#24237)
 - In-memory group_by on 128-bit integers (#24242)
 - Fix panic in `gather` inside groupby with invalid indices (#24182)
 - Release the GIL in map_groups (#24225)
 - Remove extra explode in `LazyGroupBy.{head,tail}` (#24221)
 - Fix panic in polars cloud CSV scan (#24197)
 - Fix panic when loading categorical columns from IO plugin (#24205)
 - Fix engine type for `concat_list` on AggScalar `implode` (#24160)
 - Rolling_mean handle centered weights with len(values) < window_size (#24158)
 - Reading `is_in` predicate for Parquet plain strings (#24184)
 - Make PyCategories pickleable (#24170)
 - Remove unused unsound function `to_mutable_slice` (#24173)
 - PyO3 extension types giving compat_level errors (#24166)
 - Allow non-elementwise by in top_k (#24164)
 - Fix `sort_by` for `group_by_dynamic` context (#24152)
 - Input-independent length aggregations in streaming (#24153)
 - Release GIL when iterating df in to_arrow (#24151)
 - Respect non-elementwise join_where conditions (#24135)
 - Resolve schema mismatch for div on Boolean (#24111)
 - Keep name when doing empty group-aware aggregation (#24098)
 - Implode instead of `reshape_list` (#24078)
 - Rolling mean with weights incorrect when min_samples < window_size (#23485)
 - Allow `merge_sorted` for all types (#24077)
 - Include datatypes in `row_encode` expression (#24074)
 - Include UDF materialized type in serialization (#24073)
 - Correct `.rolling()` output type for non-aggregations (#24072)
 - Correct planner output schema for `join_asof` (#24071)
 - Allow %B to work without specifying day (#24009)
 - Correct output for `fold` and `reduce` (#24069)
 - Expr.meta.output_name for struct fields (#24064)
 - Ensure upcast operations on `pl.Date` default to microsecond precision (#23981)
 - Add peak_{min,max} support for booleans (#24068)
 - Planner output type for `mean` with strange input type (#24052)
 - Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
 - Scan of multiple sources with `null` datatype (#24065)
 - Categorical in nested data in row encoding (#24051)
 - Missing length update in builder for pl.Array repetition (#24055)
 - Race condition in global categories init (#24045)
 - Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
 - Error when using named functions (#24041)
 - Don't encode entire CategoricalMapping when going to Arrow (#24036)
 - Fix cast on arithmetic with `lit` (#23941)
 - Incorrect slice-slice pushdown (#24032)
 - Dedup common cache subplan in IR graph (#24028)
 - Allow join on Decimal in in-memory engine (#24026)
 - Fix datatypes for `eval.list` in aggregation context (#23911)
 - Allocator capsule fallback panic (#24022)
 - Accept another zlib "magic header" file signature (#24013)
 - Fix `truediv` dtypes so `cast` in `list.eval` is not dropped (#23936)
 - Don't reuse cached `return_dtype` for expanded map expressions (#24010)
 - Cache id is not a valid dot node id (#24005)
 - Align `map_elements` with and without `return_dtype` (#24007)
 - Fix column dtype lifetime for `csv_write` segfault on `Categorical` (#23986)
 - Allow serializing `LazyGroupBy.map_groups` (#23964)
 - Correct allocator name in `PyCapsule` (#23968)
 - Mismatched types for `write` function for windows (#23915)
 - Fix `unpivot` panic when `index=` column not found (#23958)
 - Fix `assert_frame_equal` with `check_dtypes=False` for all-null series with different types (#23943)
 - Return correct python package version (#23951)
 - Categorical namespace functions fail on `Enum` columns (#23925)
 - Properly set sumwise complete on filter for missing columns (#23877)
 - Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
 - Group By with filters (#23917)
 - Fix `read_csv` ignoring Decimal schema for header-only data (#23886)
 - Ensure `collect()` native Iceberg always scans latest when no `snapshot_id` is given (#23907)
 - Writing List(Array) columns to JSON without panic (#23875)
 - Fill Iceberg missing fields with partition values if present in metadata (#23900)
 - Create file for streaming sink even if unspawned (#23672)
 - Update cloud testing environment (#23908)
 - Parquet filtering on multiple RGs with literal predicate (#23903)
 - Incorrect datatype passed to libc::write (#23904)
 - Properly feature gate TZ_AWARE_RE usage (#23888)
 - Improve identification of "non group-key" aggregates in SQL `GROUP BY` queries (#23191)
 - Spawning tokio task outside reactor (#23884)
 - Correctly raise DuplicateError on asof_join with suffix="" (#23864)
 - Fix errors on native `scan_iceberg` (#23811)
 - Fix index out of bounds panic filtering parquet (#23850)
 - Fix error on empty range requests (#23844)
 - Fix handling of hive partitioning `hive_start_idx` parameter (#23843)
📖 Documentation
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
 - Update Polars Cloud user guide (#24366)
 - Update to Polars Cloud user guide (#24187)
 - Update distributed page (#24323)
 - Add Polars security policy (#24314)
 - Fix few typos (#24305)
 - Add missing reference to `LazyFrame.pipe_with_schema()` on the website (#24285)
 - Fix formatting of Series.value_counts examples (#24245)
 - Add `DataFrame.map_columns` to API (#24128)
 - Update multiple pages in the Polars Cloud user guide (#23661)
 - Improve StackOverflow links in contributing guide (#23895)
 - Fix `pyo3` documentation page link (#23839)
 - Document the pureness requirements of udfs (#23787)
 
📦 Build system
🛠️ Other improvements
- Use `PlanCallback` in `name.map_*` (#24484)
 - Replace unsafe with collect (#24494)
 - Move dataset expansion to end and refactor not to use stack optimizer (#24457)
 - Pin `xlsxwriter` to `3.2.5` or before (#24485)
 - Add methods to `EnumUnitVec` and shorten name (#24415)
 - Move CompressionUtils to polars-utils (#24430)
 - Update github template to dispatch to cloud client (#24416)
 - Bump c-api (#24412)
 - Add a regression test for #7631 (#24363)
 - Update cloud test `InteractiveQuery` to `DirectQuery` (#24287)
 - Mark some tests as slow (#24327)
 - Mark more tests as ready for cloud (#24315)
 - Remove unnecessary stable_features for AVX512 (#24321)
 - Remove PDS-H code (#24301)
 - Get ready for even more cloud tests (#24292)
 - Add tests for slices with caches (#24288)
 - Readd ordering tests (#24284)
 - Expand BitRepr to u8/u16 and use in in_memory group_by (#24248)
 - Fix Makefile venv path (#24251)
 - Remove unnecessary parentheses (#24244)
 - Remove some transmutes (#24246)
 - Wrap Py* data structures in polars-python in locks (#24209)
 - Make non-nested shift{,_and_fill} ops generic (#24224)
 - Remove unused `Wrap` (#24214)
 - Propagate some python feature flags (#24201)
 - Allow upcasting null-typed columns to nested column types in scans (#24185)
 - Automatically label a few more types of PR (#24147)
 - Update toolchain (#24156)
 - InMemoryJoin should be coloured as InMemoryFallback (#24154)
 - Fool-proof retrieve_error_msg (#24132)
 - Add `order_sensitive` property for `AExpr` (#24116)
 - Mark more tests as not possible on cloud (#24103)
 - Turn `AggExpr::Count` from tuple to struct (#24096)
 - Mark tests that may fail in cloud (#24067)
 - Make CI perf failures more lenient (#24066)
 - Fix hive partition string encoding in CI by upgrading `deltalake` (#24018)
 - Avoid unreachable if dtype feature is not enabled (#24062)
 - Make tests with sinks run on cloud again (#24048)
 - Update pyo3-polars versions (#24031)
 - Remove insert_error_function (#24023)
 - Remove cache hits, clean up in-mem prefill (#24019)
 - Use .venv instead of venv in pyo3-polars examples (#24024)
 - Fix test failing mypy (#24017)
 - Remove outdated comment (#23998)
 - Add a `_plr.pyi` to remove `mypy` issues (#23970)
 - Don't define CountStar as dyn OptimizationRule (#23976)
 - Rename `atol` and `rtol` to `abs_tol` and `rel_tol` (#23961)
 - Introduce `Row{Encode,Decode}` as FunctionExpr (#23933)
 - Dispatch through `pl.map_batches` and `AnonymousColumnsUdf` (#23867)
 - Ensure `clippy` and `rustfmt` run in CI when changing `pyo3-polars` (#23930)
 - Split `column_selector.rs` (#23921)
 - Fix pyo3-polars proc-macro re-exports (#23918)
 - Make `GetBatchState` polling functions unsafe (#23795)
 - Rewrite `evaluate_on_groups` for `.gather`/`.get` (#23700)
 - Remove `Context` from logical layer (#23863)
 - Add `proptest` strategy for Polars `DataType` schemas (#23854)
 - Move Python C API to `python-polars` (#23876)
 - Refactor directory structure of streaming multi-scan (#23865)
 - Add subphase and query task spawning to StreamingExecState (#23725)
 - Update Rust Polars versions (#23861)
 - Make polars-parquet optional (#23860)
 - Relax constraint on maximum Python version for `numba` (#23838)
Thank you to all our contributors for making this release possible!
@Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NeejWeej, @VictorAtIfInsurance, @agossard, @alexander-beedie, @aparna2198, @borchero, @c-peters, @camriddell, @cgevans, @cmdlineluser, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @etiennebacher, @gab23r, @gfvioli, @henryharbeck, @iishutov, @itamarst, @jarondl, @jimmmmmmmmmmmy, @jjurm, @joshuamarkovic, @juansolm, @kdn36, @kuril, @math-hiyoko, @mcrumiller, @mpasa, @mrkn, @mroeschke, @nameexhaustion, @nesb1, @orlp, @pka, @pomo-mondreganto, @r-brink, @rawhuul, @ritchie46, @stijnherfst, @vdrn and @wence-