
@kcharkseliani kcharkseliani commented Jul 8, 2025

This PR improves the JSON buffering logic in ArrowDeserializer so that incomplete or invalid JSON input is handled more robustly and decoding state is preserved across message boundaries.

🧩 Key Changes

✨ Enhancements:

  • Added handling of partial/incomplete JSON messages via a buffered_incomplete queue.
  • Added documentation for the reset_buffer_decoder and is_complete_json functions.

🛠 Fixes:

  • Fixed the unrecoverable 'invalid JSON' error that occurred when bad data drop is enabled: when a corrupted or poisoned buffer is detected, the reset_buffer_decoder method now creates an empty BufferDecoder (Continue pipeline after invalid data #891).
  • JSON messages are now pre-validated in the is_complete_json function using serde_json deserialization, so invalid messages never reach the tape decoder, where they would linger as incomplete messages and poison the buffer (Continue pipeline after invalid data #891).
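
For readers who want a feel for the pre-validation step, here is a minimal, std-only sketch of sorting a chunk into complete, incomplete, or invalid. It is not the code in this PR: JsonStatus and classify_json are hypothetical names, and the scanner only tracks brackets and string literals as a stand-in for the real serde_json check. With serde_json itself, one option would be to treat an error whose classify() returns Category::Eof as "incomplete" and any other error as "invalid"; whether is_complete_json does exactly that is an assumption.

```rust
/// Outcome of pre-validating one chunk (hypothetical type, not from the PR).
#[derive(Debug, PartialEq)]
enum JsonStatus {
    Complete,
    Incomplete,
    Invalid,
}

/// Structural stand-in for a real JSON parser: it only tracks brackets and
/// string literals, and accepts only objects/arrays at the top level.
fn classify_json(input: &str) -> JsonStatus {
    let mut stack: Vec<char> = Vec::new(); // closing delimiters still expected
    let mut in_string = false;
    let mut escaped = false;
    let mut seen_value = false; // at least one top-level value has closed

    for c in input.chars() {
        if in_string {
            // Inside a string only escapes and the closing quote matter.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => {
                // A closer that does not match the innermost opener is invalid.
                if stack.pop() != Some(c) {
                    return JsonStatus::Invalid;
                }
                if stack.is_empty() {
                    seen_value = true;
                }
            }
            '"' if !stack.is_empty() => in_string = true,
            _ if c.is_whitespace() => {}
            // Anything else outside an object/array is rejected.
            _ if stack.is_empty() => return JsonStatus::Invalid,
            _ => {}
        }
    }

    if stack.is_empty() && seen_value {
        JsonStatus::Complete
    } else {
        // Input ended while a value was still open (or nothing arrived yet).
        JsonStatus::Incomplete
    }
}
```

In this sketch, classify_json("{\"a\": [1, 2") is Incomplete because the input ends with brackets still open, while classify_json("{\"a\": 1]") is Invalid because the closer does not match. Only the first kind is worth keeping in a queue such as buffered_incomplete.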

📦 Dependencies

Unchanged

📝 Notes

⚠️ Because of how arrow_json's TapeDecoder works, when an invalid JSON message arrives after one or more incomplete messages, the whole buffer has to be reset. This shouldn't be a problem if the buffer is flushed frequently.
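
To illustrate why flush frequency matters, here is a rough, std-only model of the reset behavior described in the note above. It is only a sketch under stated assumptions: JsonBuffer, JsonStatus, push, and flush are hypothetical names, the status is supplied by the caller instead of a real validator, and the model assumes a reset also discards rows decoded since the last flush, which the note implies but does not spell out.

```rust
use std::collections::VecDeque;

/// Pre-validation result supplied by the caller (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum JsonStatus {
    Complete,
    Incomplete,
    Invalid,
}

/// Hypothetical model of the buffering state, not the PR's actual struct.
#[derive(Debug, Default)]
struct JsonBuffer {
    /// Records decoded since the last flush.
    decoded: Vec<String>,
    /// Fragments of a message that has not closed yet.
    buffered_incomplete: VecDeque<String>,
}

impl JsonBuffer {
    /// `status` describes the pending fragments followed by `msg`.
    /// Returns how many buffered items were discarded by this call.
    fn push(&mut self, msg: &str, status: JsonStatus) -> usize {
        match status {
            JsonStatus::Complete => {
                // Pending fragments plus this message form one record.
                let mut record: String = self.buffered_incomplete.drain(..).collect();
                record.push_str(msg);
                self.decoded.push(record);
                0
            }
            JsonStatus::Incomplete => {
                self.buffered_incomplete.push_back(msg.to_string());
                0
            }
            // Nothing pending: the bad message is simply skipped.
            JsonStatus::Invalid if self.buffered_incomplete.is_empty() => 0,
            JsonStatus::Invalid => {
                // Assumption: the reset throws away everything still buffered.
                let lost = self.decoded.len() + self.buffered_incomplete.len();
                *self = JsonBuffer::default();
                lost
            }
        }
    }

    /// Hands over the decoded records and leaves the buffer empty of them.
    fn flush(&mut self) -> Vec<String> {
        std::mem::take(&mut self.decoded)
    }
}
```

In this model, pushing one complete record, then an incomplete fragment, then an invalid message without flushing in between discards two items. Flushing right after the complete record limits the loss to the single pending fragment.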

closes #891

@kcharkseliani kcharkseliani force-pushed the bugfix/issue_891_json_buffering_bad_data branch from 06a9f2a to 22ebb6f Compare July 8, 2025 09:28
