[connector][fix] Properly handle projection clause for partition deletes #189
What does this patch do
This patch fixes an issue that occurs when the primary key contains a clustering key and a user issues a partition delete.
Take the following schema as an example:
If a user issues a DELETE FROM test WHERE pk=<pk>, the agent pushes an event with pk only (c1 will not exist), and the connector then performs the read-back using a faulty query (missing a projection clause), as is evident from the debug logs. That causes the C* driver to throw an exception, which is logged by the Pulsar function (i.e. the connector). As a result, the connector cannot make progress and events build up in the events topic.
How does it fix the issue
A condition introduced in 2ec124f compares the length of the where condition with the length of the pk (the latter derived from the table schema, not from the events topic) and falls back to the static columns projection clause when the lengths do not match. It was introduced to handle static columns: C* allows partition-level updates if the updated columns are exclusively static columns (also note that the events topic has no first-class property indicating whether the table change was a delete or an update). If there are no static columns, it makes sense to just return the projection clause, because the mismatch then indicates a delete by partition on a table with clustering keys; otherwise the where condition length would match the pk length from the schema.
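The selection logic described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the connector's actual code: the function names `choose_projection` and `build_select`, their parameters, and the value column `v` are all hypothetical, and the real query building may differ.

```python
from typing import Sequence


def choose_projection(
    where_values: Sequence[object],
    primary_key: Sequence[str],
    projection: Sequence[str],
    static_projection: Sequence[str],
) -> Sequence[str]:
    """Pick the columns to read back for one event (hypothetical helper)."""
    # Full primary key in the event: a row-level change, read the normal columns.
    if len(where_values) == len(primary_key):
        return projection
    # Partial key and the table has static columns: assume a partition-level
    # update of static columns and read only those.
    if static_projection:
        return static_projection
    # Partial key and no static columns: this can only be a partition delete.
    # Falling back to the empty static projection here is what produced the
    # invalid "SELECT FROM ..." query, so return the normal projection instead.
    return projection


def build_select(
    table: str,
    where_values: Sequence[object],
    primary_key: Sequence[str],
    projection: Sequence[str],
    static_projection: Sequence[str],
) -> str:
    """Build the read-back query text with bind markers (hypothetical helper)."""
    columns = choose_projection(where_values, primary_key, projection, static_projection)
    # Only the key columns present in the event are bound in the where condition.
    bound = primary_key[: len(where_values)]
    where = " AND ".join(f"{name}=?" for name in bound)
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {where}"


# Partition delete on a table keyed by (pk, c1) with no static columns.
partition_delete_query = build_select("test", ["some-pk"], ["pk", "c1"], ["v"], [])
```

With these assumed inputs, the partition delete case yields a query with a non-empty projection, while a table that does have static columns still takes the static projection path on a partial key.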
A new test is added that, without the proposed fix, would fail with no viable alternative at input 'FROM' (SELECT [FROM]...).