Add query limit for inactive users query #69178
base: staging
Conversation
carl-codeorg
left a comment
Now that I look at it, having two different limit parameters feels confusing to me. What do you think about simplifying it to just having the single limit and removing the safety constraint limit? It also seems to me that if we're artificially limiting the query that would trigger the safety constraint to a number lower than what would actually trigger it, then the safety check becomes useless. What do you think?
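For illustration only, a minimal sketch of what that simplification might look like (the constructor shape is taken from this PR's diff; the body and the nil-means-uncapped convention are assumptions):

```ruby
# Hypothetical simplification: a single cap on accounts processed per run,
# replacing both query_limit and the ACCOUNT_DELETION_LIMIT safety check.
def initialize(dry_run: false, inactive_since: nil, limit: nil)
  @dry_run = dry_run
  @inactive_since = inactive_since
  @limit = limit # nil means no cap
end
```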
carl-codeorg
left a comment
Also, if we were already calling `account_batch = inactive_users.limit(BATCH_SIZE)`, I wonder if the limit in the query itself would even help? Starting to second-guess whether the problem is the query, or whether the logic in the loop needs to be refactored.
Yeah, I agree.
I think batching could still potentially be useful. I was more getting at the safety constraint not being beneficial if we have another limit on the query prior to batching. Before we process any users, the whole query would return millions of rows: if we use a reasonable safety limit, it'll always fail, and if we artificially limit the query to something much lower, say 100k per run, it'll never hit that safety limit. I was thinking that, at least for the initial large group of users, it'd be more useful to just have one upper limit on the number of records processed per run of the job, kept low enough to avoid the resource consumption issue from the dry run last week. But now I'm second-guessing whether the limit approach is actually going to fix that issue.
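To make that interplay concrete, a rough sketch (the limit values are illustrative, and `inactive_users` stands in for the relation the job builds):

```ruby
SAFETY_LIMIT = 500_000 # hypothetical "reasonable" safety constraint
QUERY_LIMIT  = 100_000 # hypothetical per-run query cap

# Uncapped, the query matches millions, so any reasonable safety limit
# always trips; capped at 100k, the count can never reach the safety
# limit, so the check never trips. Either way, one of the two is dead.
count = inactive_users.limit(QUERY_LIMIT).count
raise "safety limit exceeded" if count > SAFETY_LIMIT # never true here
```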
artem-vavilov
left a comment
Could you please share more details about this change? Was it made due to query timeouts or another reason?
```diff
-# @param limit [Integer] The maximum number of accounts to delete in a single run.
-#   This is a safety limit to prevent accidental deletion of too many accounts.
-def initialize(dry_run: false, inactive_since: nil, limit: ACCOUNT_DELETION_LIMIT)
+def initialize(dry_run: false, inactive_since: nil, query_limit: nil)
```
What is the reason for renaming limit to query_limit?
We renamed `limit` to `query_limit` because we wanted to apply the limit directly to the query rather than as a separate safety constraint after fetching all inactive users. Previously, the safety limit (`limit`) was meant to prevent accidentally deleting too many accounts, but in practice it wasn't very effective: the initial query could still return millions of records before that limit was enforced.
This change also makes the safety constraint somewhat redundant, since we're now controlling how many users can even be considered for deletion in the first place. I like the idea of having an upper limit on how many records are processed per run of the job.
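For contrast with the query shown below, roughly how the old constraint behaved, reconstructed from this discussion (a sketch, not the actual removed code):

```ruby
# Old shape (assumed): the query itself was unbounded, and the limit was
# only consulted afterwards, so the count still scanned millions of rows.
inactive_users = Queries::User::Inactive.call(inactive_since: inactive_since)
count = inactive_users.size # COUNT(*) over every matching row
raise "safety limit exceeded: #{count}" if count > limit
```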
```ruby
Queries::User::Inactive.
  call(inactive_since: inactive_since).
  where.not(id: processed_user_ids).
  limit(@query_limit)
```
The `query_limit` will be overridden by the `BATCH_SIZE` limit, meaning it has no real effect, and all matched accounts will be deleted in batches of 1,000 records:
code-dot-org/dashboard/lib/inactive_user_deleter.rb
Lines 59 to 72 in f62dce7
```ruby
loop do
  account_batch = inactive_users.limit(BATCH_SIZE)
  account_batch.each do |user|
    delete_user(user)
    self.num_accounts_deleted += 1
  rescue StandardError => exception
    self.num_errors += 1
    Honeybadger.notify(exception, context: {user_id: user.id})
    log_message("Error deleting user_id #{user.id}: #{exception.message}")
  ensure
    processed_user_ids << user.id
  end
  break if account_batch.size < BATCH_SIZE
end
```
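This follows from how ActiveRecord composes relations: a later `.limit` replaces an earlier one rather than taking the minimum of the two. A minimal illustration (model and values hypothetical):

```ruby
relation = User.where(active: false).limit(100_000) # query_limit
batch    = relation.limit(1_000)                    # BATCH_SIZE wins
batch.to_sql # => "SELECT ... LIMIT 1000" (the 100k cap is discarded)
```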
We need to add another break condition based on `num_accounts_deleted` in the loop:

```ruby
break if num_accounts_deleted >= query_limit
```
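One way the loop could incorporate that guard, sketched here (assumes `query_limit` is readable inside the loop and that `nil` means uncapped; error handling elided for brevity). Note the check runs between batches, so a run can overshoot the cap by up to `BATCH_SIZE - 1` accounts:

```ruby
loop do
  # Stop once this run has processed its allotted number of accounts.
  break if query_limit && num_accounts_deleted >= query_limit

  account_batch = inactive_users.limit(BATCH_SIZE)
  account_batch.each do |user|
    delete_user(user)
    self.num_accounts_deleted += 1
  ensure
    processed_user_ids << user.id
  end
  break if account_batch.size < BATCH_SIZE
end
```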
Yeah, because the initial batch of inactive users is so large, running this query locked up the machine it ran on. We need a way, at least for this first go-around, to limit the number of users we process per run, since
This PR changes the use of the limit parameter in the InactiveUserDeleter job to prevent a large number of users from being deleted.
Previously, a dry run of the job caused a total lockup of the machine it ran on: we have a lot of users considered inactive, and the query behind `inactive_users.size` had to count all of them. Removing that safety constraint and instead adding a break within our loop ensures we no longer run this expensive query, and it enforces a limit within the loop on how many users can be deleted. This will only really matter for this first batch of inactive users, since we have so many.
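A usage sketch under those assumptions (the class name and `query_limit` parameter come from this PR's diff; the `call` entry point and the 100k figure are illustrative assumptions):

```ruby
# Hypothetical invocation: dry run, capped at 100k users for this first pass.
InactiveUserDeleter.new(dry_run: true, query_limit: 100_000).call
```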
Follow-up work: bring back the safety constraint?