feat: Implement Robust Network Timeout Handling with Retries #88


**Enhance Network Timeout Handling and Error Recovery for Production Reliability**

Description:

Currently, tlock applies a fixed 5-second timeout to network operations (`const timeout = 5 * time.Second` in `networks/http/http.go`) and does not retry failed requests. This is fragile in production, where network conditions vary and a single transient failure aborts the whole operation.

### Current Limitations:

1. Fixed timeout duration:
```go
// networks/http/http.go
const timeout = 5 * time.Second
```

2. Basic error handling without retries:
```go
// networks/http/http.go
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()

result, err := n.client.Get(ctx, roundNumber)
if err != nil {
    return nil, err
}
```

3. Limited error context in network operations (every failure from `Signature` is reported as `ErrTooEarly`, regardless of the underlying cause):
```go
// tlock_age.go
signature, err := t.network.Signature(roundNumber)
if err != nil {
    return nil, fmt.Errorf(
        "%w: expected round %d > %d current round",
        ErrTooEarly,
        roundNumber,
        t.network.Current(time.Now()))
}
```

### Proposed Changes:

1. **Configurable Timeout Settings**
- Add configuration options for different timeout types:
  ```go
  type NetworkConfig struct {
      DialTimeout      time.Duration
      RequestTimeout   time.Duration
      KeepAliveTimeout time.Duration
      RetryTimeout     time.Duration
  }
  ```
- Allow timeout configuration through environment variables and CLI flags
- Implement reasonable defaults for different network operations

2. **Retry Mechanism with Exponential Backoff**
- Implement a retry mechanism for transient failures:
  ```go
  type RetryConfig struct {
      MaxAttempts       int
      InitialDelay      time.Duration
      MaxDelay          time.Duration
      BackoffMultiplier float64
  }
  ```
- Add exponential backoff for failed requests
- Distinguish between retryable and non-retryable errors

3. **Enhanced Error Context**
- Create custom error types for different failure scenarios:
  ```go
  type NetworkError struct {
      Op          string
      RoundNumber uint64
      Attempt     int
      Timeout     time.Duration
      Err         error
  }
  ```
- Add detailed error messages with:
  - Network endpoint information
  - Request timing details
  - Retry attempt count
  - Specific failure reason

4. **Monitoring and Logging**
- Add structured logging for network operations
- Include metrics for:
  - Request latencies
  - Retry counts
  - Failure rates
  - Timeout occurrences

### Implementation Details:

1. Create a new network client configuration structure:
```go
type NetworkClientConfig struct {
    Timeouts NetworkConfig
    Retries  RetryConfig
    Logging  LogConfig
}
```

2. Implement retry logic with context:
```go
func (n *Network) getWithRetry(ctx context.Context, roundNumber uint64) (*Result, error) {
    var lastErr error
    for attempt := 0; attempt < n.config.Retries.MaxAttempts; attempt++ {
        // Back off before every attempt except the first, and stop
        // waiting as soon as the context is cancelled.
        if attempt > 0 {
            timer := time.NewTimer(n.calculateBackoff(attempt))
            select {
            case <-ctx.Done():
                timer.Stop()
                return nil, &NetworkError{
                    Op:          "get_signature",
                    RoundNumber: roundNumber,
                    Attempt:     attempt,
                    Err:         ctx.Err(),
                }
            case <-timer.C:
            }
        }

        result, err := n.client.Get(ctx, roundNumber)
        if err == nil {
            return result, nil
        }
        lastErr = err

        // Non-retryable errors are returned immediately.
        if !isRetryableError(err) {
            return nil, err
        }
    }
    return nil, fmt.Errorf("max retries exceeded: %w", lastErr)
}
```

3. Add configuration validation:
```go
func validateConfig(config NetworkClientConfig) error {
    if config.Timeouts.RequestTimeout < minTimeout {
        return fmt.Errorf("request timeout %v below minimum %v", 
            config.Timeouts.RequestTimeout, minTimeout)
    }
    // Add other validation rules
    return nil
}
```

### Benefits:

1. Improved reliability in unstable network conditions
2. Better error handling and recovery
3. More detailed error reporting for debugging
4. Configurable behavior for different deployment environments
5. Better monitoring and observability

### Testing:

Add new test cases:
- Test retry behavior with simulated network failures
- Verify timeout configurations
- Test error handling with different network conditions
- Validate monitoring metrics

### Acceptance Criteria:

- [ ] Configurable timeout settings implemented
- [ ] Retry mechanism with exponential backoff working
- [ ] Enhanced error messages with context
- [ ] Monitoring and logging improvements
- [ ] Test coverage for new functionality
- [ ] Documentation updated with new configuration options
- [ ] Backward compatibility maintained

