- 
                Notifications
    
You must be signed in to change notification settings  - Fork 347
 
fix: properly rewrite errors for watch api, part 2 #2656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
          Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@            Coverage Diff             @@
##             main    #2656      +/-   ##
==========================================
- Coverage   79.47%   79.41%   -0.06%     
==========================================
  Files         455      455              
  Lines       47156    47158       +2     
==========================================
- Hits        37471    37444      -27     
- Misses       6934     6954      +20     
- Partials     2751     2760       +9     ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
  | 
    
aa150c8    to
    552a3c9      
    Compare
  
    552a3c9    to
    f6447fa      
    Compare
  
            
          
                internal/services/shared/errors.go
              
                Outdated
          
        
      | return status.Errorf(codes.ResourceExhausted, "watch disconnected: %s", err) | ||
| case errors.As(err, &datastore.WatchRetryableError{}): | ||
| // FailedPrecondition is safe to retry | ||
| return status.Error(codes.FailedPrecondition, err.Error()) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would seem like Unavailable is a better error condition
From https://grpc.io/docs/guides/status-codes/
The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations.
Whereas FailedPrecondition says this
The operation was rejected because the system is not in a state required for the operation’s execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between FAILED_PRECONDITION, ABORTED, and UNAVAILABLE: (a) Use UNAVAILABLE if the client can retry just the failing call. (b) Use ABORTED if the client should retry at a higher level (e.g., when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence). (c) Use FAILED_PRECONDITION if the client should not retry until the system state has been explicitly fixed. E.g., if an “rmdir” fails because the directory is non-empty, FAILED_PRECONDITION should be returned since the client should not retry unless the files are deleted from the directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i only had one reason for using FailedPrecondition, which was that we didn't alarm on it 😆 i've updated to use Unavailable now.
f6447fa    to
    2e70602      
    Compare
  
    2e70602    to
    722bd3d      
    Compare
  
    722bd3d    to
    8cb6c91      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a continuation of #2640