Two phase shutdown process

Our production application has 3 main _entrypoints_:

1. HTTP Routes (Koa)
2. Jobs/Crons (pg-boss)
3. Events (event-emitter2)

All three run within the same process in a Docker container.

To shut down the application gracefully, we devised three steps:

1. Stop route handlers and job workers and drain them
2. Drain event listeners
3. Close connections to application DB, Redis, Sentry

Route handlers, job workers and event listeners may want to publish jobs or dispatch events at any time. As the event bus is in-memory, it needs to be available until all handlers/workers are drained.
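
The ordering above can be sketched as follows. This is a minimal sketch rather than our actual code: `Drainable`, `Closable`, `ShutdownTargets` and `shutdown` are hypothetical names, and each real component (Koa server, pg-boss, event-emitter2) would need its own adapter.

```typescript
// Hypothetical abstractions; none of these names come from Koa, pg-boss or event-emitter2.
interface Drainable {
  // Stop accepting new work and resolve once in-flight work is done (or the timeout hits).
  drain(timeoutMs: number): Promise<void>;
}

interface Closable {
  close(): Promise<void>;
}

interface ShutdownTargets {
  entrypoints: Drainable[]; // e.g. HTTP server, job workers
  eventBus: Drainable; // in-memory bus, must outlive the entrypoints
  connections: Closable[]; // e.g. application DB, Redis, Sentry
}

async function shutdown(targets: ShutdownTargets, timeoutMs: number): Promise<void> {
  // Step 1: route handlers and job workers drain in parallel.
  await Promise.all(targets.entrypoints.map((e) => e.drain(timeoutMs)));
  // Step 2: only now drain event listeners; they may still publish jobs here.
  await targets.eventBus.drain(timeoutMs);
  // Step 3: nothing can produce work any more, so connections can close.
  await Promise.all(targets.connections.map((c) => c.close()));
}
```

The key constraint is that anything needed for publishing jobs has to stay open until step 3.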

Currently, in step 1, we call pg-boss `stop({ wait: true, close: true, graceful: true })`.

This causes issues if, in step 2, an event listener wants to publish a job: by then, the pg-boss DB connection is already closed.

## Proposed solution

Expose a `drain(timeout)` method that stops workers (internally `offWork()` + wait with timeout) but keeps the DB open so new jobs may be published.

Allow `stop()` to close the DB connection even if the workers are already stopped.
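
To make the intended semantics concrete, here is a small in-memory model. It is only a sketch of the behaviour we are asking for: `drain` does not exist in pg-boss today, and `FakeBoss`, `publish` and `run` are hypothetical stand-ins rather than the real API.

```typescript
// In-memory model of the proposed semantics. Not pg-boss code; all names are stand-ins.
class FakeBoss {
  private working = true; // workers are accepting jobs
  private dbOpen = true; // connection pool is open
  private inFlight = new Set<Promise<void>>();
  readonly published: string[] = [];

  // Publishing only needs the DB, not the workers.
  async publish(queue: string): Promise<void> {
    if (!this.dbOpen) throw new Error('DB connection is closed');
    this.published.push(queue);
  }

  // Simulates a worker picking up a job.
  run(job: () => Promise<void>): void {
    if (!this.working) throw new Error('workers are stopped');
    const p: Promise<void> = job()
      .catch(() => undefined)
      .then(() => {
        this.inFlight.delete(p);
      });
    this.inFlight.add(p);
  }

  // Proposed: stop workers and wait for in-flight jobs (up to the timeout), keep the DB open.
  async drain(timeoutMs: number): Promise<void> {
    this.working = false;
    await new Promise<void>((resolve) => {
      const timer = setTimeout(resolve, timeoutMs);
      Promise.all(Array.from(this.inFlight)).then(() => {
        clearTimeout(timer);
        resolve();
      });
    });
  }

  // Proposed: stop() closes the DB even when drain() has already stopped the workers.
  async stop(timeoutMs = 0): Promise<void> {
    if (this.working) await this.drain(timeoutMs);
    this.dbOpen = false;
  }
}
```

With this shape, step 1 would call `drain(timeoutMs)`, event listeners could still call `publish` during step 2, and step 3 would call `stop()`.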

## Considered alternatives

We are going with alternative 3 for now, but would love to have your thoughts on this workflow!

### 1. Calling `stop` 2 times

```typescript
// Drain
await this.boss.stop({
  close: false, // Keep DB connection open so we can still add jobs
  wait: true,
  graceful: true,
  timeout: timeoutMs,
});
```

```typescript
// Close connections
await this.boss.stop({
  close: true,
  wait: true,
  graceful: true,
  timeout: timeoutMs,
});
```

This does not work because `stop` checks whether the `this.stopped` flag is set and short-circuits, so the second call never closes the connection.
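
The failure mode can be modelled in a few lines. This is a simplified illustration of the behaviour described above, not pg-boss's actual source; `StoppableModel`, `dbOpen` and `demo` are made up for the example.

```typescript
// Simplified model of a stop() guarded by a `stopped` flag. Not pg-boss source.
class StoppableModel {
  private stopped = false;
  dbOpen = true;

  async stop(options: { close: boolean }): Promise<void> {
    // Guard: once stopped, any later call returns immediately...
    if (this.stopped) return;
    this.stopped = true;
    // ...so this branch is only ever evaluated with the first call's options.
    if (options.close) this.dbOpen = false;
  }
}

async function demo(): Promise<boolean> {
  const model = new StoppableModel();
  await model.stop({ close: false }); // drain, keep DB open
  await model.stop({ close: true }); // meant to close the DB, but short-circuits
  return model.dbOpen; // still true: the DB was never closed
}
```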

### 2. Stopping via `getDb().stop()`

```typescript
// Drain
await this.boss.stop({
  close: false, // Keep DB connection open so we can still add jobs
  wait: true,
  graceful: true,
  timeout: timeoutMs,
});
```

```typescript
await this.boss.getDb().stop();
```

The issue here is that the `Db` interface should not expose `stop()`, as it is an implementation detail. We could patch it on our side, but that seems shady.

### 3. Implement custom Db

Copy the pg-boss implementation of `Db` into our code and manage the pool manually.
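
A rough sketch of what this could look like. It assumes pg-boss accepts a `db` constructor option whose object exposes `executeSql(text, values)`, which should be checked against the pg-boss version in use. `PoolLike` and `ManagedDb` are hypothetical names; `PoolLike` mirrors the small part of a `pg` pool the sketch needs, so the example stays self-contained.

```typescript
// The subset of a connection pool this sketch relies on (modelled on pg's Pool).
interface PoolLike {
  query(text: string, values?: unknown[]): Promise<{ rows: unknown[] }>;
  end(): Promise<void>;
}

// Hypothetical adapter: we own the pool, so we decide when it closes.
class ManagedDb {
  private readonly pool: PoolLike;
  private closed = false;

  constructor(pool: PoolLike) {
    this.pool = pool;
  }

  // Assumed to be the method pg-boss calls on a custom `db` object.
  async executeSql(text: string, values?: unknown[]): Promise<{ rows: unknown[] }> {
    if (this.closed) throw new Error('ManagedDb is closed');
    return this.pool.query(text, values);
  }

  // Called by our own shutdown code in step 3, after event listeners have drained.
  async close(): Promise<void> {
    if (this.closed) return; // idempotent: closing twice is harmless
    this.closed = true;
    await this.pool.end();
  }
}
```

The idea is that pg-boss no longer owns the pool, so stopping the workers in step 1 leaves it open (assuming pg-boss does not close a pool it did not create), and we call `close()` ourselves in step 3.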

### 4. `offWork()` + `stop()`

Call `offWork()` with each queue name (step 1), and then `stop()` after the event listeners have drained (step 3).

New jobs won't be processed during shutdown, but this does not wait for running jobs to drain. Those jobs may therefore depend on event listeners that have already been shut down.

### 5. Custom worker tracking

We could track running jobs on our side and offWork/wait for them.
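
A minimal sketch of such tracking, assuming every job handler can be wrapped before it is registered. `JobTracker`, `track` and `waitForIdle` are hypothetical names, not pg-boss API.

```typescript
// Hypothetical helper: counts running handlers and lets shutdown wait for them.
class JobTracker {
  private running = 0;
  private idleWaiters: Array<() => void> = [];

  // Wrap a job handler so its start and end are recorded.
  track<T, R>(handler: (job: T) => Promise<R>): (job: T) => Promise<R> {
    return async (job: T) => {
      this.running++;
      try {
        return await handler(job);
      } finally {
        this.running--;
        if (this.running === 0) {
          // Wake everyone waiting for the tracker to become idle.
          const waiters = this.idleWaiters;
          this.idleWaiters = [];
          waiters.forEach((resolve) => resolve());
        }
      }
    };
  }

  // Resolves true once no handler is running, or false if the timeout elapses first.
  waitForIdle(timeoutMs: number): Promise<boolean> {
    if (this.running === 0) return Promise.resolve(true);
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.idleWaiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}
```

During shutdown we would call `offWork()` for each queue, then `await tracker.waitForIdle(timeoutMs)` before draining event listeners.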


