-
Notifications
You must be signed in to change notification settings - Fork 17
Improved IN handling
#250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Improved IN handling
#250
Conversation
|
|
||
| defp expr({:in, _, [_, {:^, _, [_ix, 0]}]}, _sources, _params, _query), do: "0" | ||
|
|
||
| defp expr({:in, _, [left, {:^, _, [ix, len]}]}, sources, params, query) when len > 0 do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notably this functionality only happens with ^ values, literals maintain their current functionality, which helps this remain compatible with complex uses of IN such as with subqueries.
| def dumpers(:uuid, Ecto.UUID), do: [&__MODULE__.hex_uuid/1] | ||
| def dumpers(:uuid, type), do: [type, &__MODULE__.hex_uuid/1] | ||
| def dumpers(:binary_id, type), do: [type, &__MODULE__.hex_uuid/1] | ||
| def dumpers({:in, sub}, {:in, sub}), do: [{:array, sub}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line here is what tells Ecto itself that in ^val should result in a single parameter for val
|
Thank you! I'd like to convince myself that it's indeed equivalent to the previous approach (lists with nulls, different types, etc.) before merging so I'm taking some time :) Btw, just in case, https://clickhouse.com/docs/sql-reference/operators/in suggests
But I guess this change is not about millions of values but rather maybe hundreds to thousands The CI is failing for unrelated issues, "fixed" in #252. |
|
Nice! We also started recently using IN queries with fairly large payloads for the new consolidated views feature. We currently limit it to 14k sites so we have at maximum 14k IDs in the We also tested creating dictionaries and using these for lookups so the list of IDs does not need to be part of the query payload. It was more efficient as expected but for now we didn't go for that solution to reduce maintenance burden. tagging @aerosol. I don't think we have problems with the limit at the moment but this is interesting in case we need to bump the limit in the future. |
|
Just on a quick glance, it looks like this is equivalent to the workaround we're applying to avoid hitting ClickHouse limits due to too many bindings: https://github.com/plausible/analytics/blob/fa09b73ff1acff2248a664d652f818c999c7dcca/lib/plausible/stats/sql/where_builder.ex#L54 |
INimprovements for Ecto.CHHey folks, thanks for the library! We ran into a problem using the
inoperator over at CargoSense and we're hoping you're open to a change in the SQL those queries produce to make more efficient use of Clickhouse params.Problem
Currently when you have an ecto query like:
This results in clickhouse SQL:
This becomes an issue when
idsor any other array parameter becomes very large, which is common when using Ecto CH to perform bulk operations in Clickhouse.Solution
Now in this PR the same Ecto query produces this Clickhouse SQL:
We have just one parameter no matter the size of the array.