
Contributor

@singhpk234 singhpk234 commented Dec 18, 2025

About the change

Currently, a serialized expression does not capture the types of its literals, so even a binary literal is deserialized as a string, which fails later. Parsers need the schema in order to deserialize values correctly. The SDK handles part of this during response deserialization via the parser context. The client can set that context because it is the one making the call, but the same cannot be assumed of the server, which has to perform the same deserialization on the request.

There are three ways to solve this problem:

  1. Defer deserialization until the required inputs are available. Until then, keep the serialized form in memory and deserialize it on demand. For example, an expression requires a schema, so the response stores the JSON, exposes filter(schema), and passes the schema to ExpressionParser at deserialization time.
  2. Capture the type of each literal in the serialized form so that it is available during deserialization (this changes the serialization format, so it is a spec change).
  3. Send the schema as part of the response and wire it to the parser (also a spec change).

Considering where we are, I implemented Approach 1.
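A minimal sketch of the deferred-deserialization idea behind Approach 1. Everything here is a hypothetical stand-in for illustration: `DeferredFilter`, `ColumnType`, and `decodeHex` are not Iceberg classes, the schema is reduced to a map from column name to type, and the sketch assumes binary literals travel as hexadecimal strings. It only shows that the raw literal is kept as received and decoded once the schema is supplied.

```java
import java.util.Map;

/** Hypothetical holder that keeps a literal in wire form until a schema is available. */
final class DeferredFilter {
  /** Simplified stand-in for column types; not Iceberg's Type API. */
  enum ColumnType { STRING, BINARY }

  private final String column;
  private final String rawLiteral; // kept exactly as it appeared in the JSON

  DeferredFilter(String column, String rawLiteral) {
    this.column = column;
    this.rawLiteral = rawLiteral;
  }

  /** Decodes the literal according to the column's type in the supplied schema. */
  Object literal(Map<String, ColumnType> schema) {
    ColumnType type = schema.get(column);
    if (type == null) {
      throw new IllegalArgumentException("Unknown column: " + column);
    }
    switch (type) {
      case BINARY:
        return decodeHex(rawLiteral); // assumption: binary is hex-encoded on the wire
      case STRING:
      default:
        return rawLiteral;
    }
  }

  /** Converts a hexadecimal string such as "0102FF" into bytes. */
  static byte[] decodeHex(String hex) {
    if (hex.length() % 2 != 0) {
      throw new IllegalArgumentException("Odd-length hex string: " + hex);
    }
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      int hi = Character.digit(hex.charAt(2 * i), 16);
      int lo = Character.digit(hex.charAt(2 * i + 1), 16);
      if (hi < 0 || lo < 0) {
        throw new IllegalArgumentException("Invalid hex string: " + hex);
      }
      out[i] = (byte) ((hi << 4) | lo);
    }
    return out;
  }
}
```

With this shape, the same stored literal yields a `byte[]` when the column is binary and a `String` when it is a string column, which is the distinction that is lost when the literal is parsed eagerly without a schema.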

Testing

Covered by a new test and the existing tests.

Expression filter = null;
if (jsonNode.has(RESIDUAL_FILTER)) {
-  filter = ExpressionParser.fromJson(jsonNode.get(RESIDUAL_FILTER));
+  filter = ExpressionParser.fromJson(jsonNode.get(RESIDUAL_FILTER), spec.schema());
Contributor Author

This was caught during Spark's execution phase; the schema needs to be passed when parsing the residual filter.

Comment on lines 171 to 172
} else if (defaultValue.isIntegralNumber() && defaultValue.canConvertToLong()) {
return defaultValue.longValue();
Contributor Author

This is mostly required now because we are binding with the schema; I still need to think this through.

Contributor Author

@singhpk234 singhpk234 Dec 18, 2025

Now: Integer → SingleValueParser tries to parse it as DATE → expects a string
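To illustrate the failure path above, here is a minimal sketch of a date conversion that accepts both an ISO-8601 string and an integral day count. `DateValueSketch` and `toDaysFromEpoch` are hypothetical names used only for illustration; this is not SingleValueParser's actual code, and it relies only on `java.time`.

```java
import java.time.LocalDate;

/** Hypothetical helper showing a date value accepted in either of two encodings. */
final class DateValueSketch {
  /** Returns days since 1970-01-01 for an ISO date string or an integral day count. */
  static int toDaysFromEpoch(Object value) {
    if (value instanceof String) {
      // ISO-8601 date such as "1970-01-02"
      return Math.toIntExact(LocalDate.parse((String) value).toEpochDay());
    } else if (value instanceof Integer) {
      // already expressed as days since the epoch
      return (Integer) value;
    } else if (value instanceof Long) {
      // reject values that do not fit in an int instead of truncating
      return Math.toIntExact((Long) value);
    }
    throw new IllegalArgumentException("Cannot interpret as a date: " + value);
  }
}
```

A conversion that handles only the string branch would reject an integer default once the value is bound to a DATE column, which matches the sequence described in the comment.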

@sfc-gh-prsingh sfc-gh-prsingh force-pushed the binary-type-fix branch 3 times, most recently from f5ae0f6 to 7d7dcaa Compare December 18, 2025 23:51
@singhpk234 singhpk234 marked this pull request as ready for review December 19, 2025 00:01
}
if (request.filter() != null) {
configuredScan = configuredScan.filter(request.filter());
Expression filter = request.filter(schema);
Contributor

nit: this change is probably not needed

* @param schema the table schema to use for type-aware deserialization of filter values
* @return the filter expression, or null if no filter was specified
*/
public Expression filter(Schema schema) {
Contributor

@nastra nastra Dec 19, 2025

Let me think about this a bit more. I think there are a few more places across the codebase where we ser/de an Expression without a Schema, and in theory those have the same issue.
Whichever approach we pick, we'd want to follow up in those other places too.

Contributor

The other thing we might need to consider is how this would be lazily bound in other client implementations. @Fokko, does pyiceberg have examples of late binding similar to this one?
The issue here is that we deserialize an Expression that can only be deserialized correctly once it is bound to a Schema.

private static Object parseDateValue(Type type, JsonNode value) {
if (value.isTextual()) {
return DateTimeUtil.isoDateToDays(value.textValue());
} else if (value.isIntegralNumber() && value.canConvertToInt()) {
Contributor

I think these changes are probably not required to fix the underlying issue, so we might want to separate them out and test them individually
