Puppeteer-Pool is a lightweight and efficient library for managing multiple Puppeteer contexts with ease. Designed for scalable web scraping and automation, it helps you handle concurrency and resource management without the hassle.

Install it together with `puppeteer`:

- npm: `npm i puppeteer @hoplin/puppeteer-pool`
- yarn: `yarn add puppeteer @hoplin/puppeteer-pool`
- pnpm: `pnpm install puppeteer @hoplin/puppeteer-pool`

Key features:

- Per-request tab lifecycle – every task opens a fresh page, clears it, and closes it, which keeps renderer memory predictable even on small hosts.
- Handler registry API – register once with `PuppeteerPool.enrollTask(id, handler)` and enqueue typed payloads anywhere in your app. Queue payloads contain only `{ handlerId, payload }`, so RabbitMQ/SQS can drive the pool safely across restarts.
- Pluggable queue providers – pick between `MEMORY`, `RABBITMQ`, or `SQS` via `queueProvider` or the matching environment variable. Priority mode is kept for the in-memory queue.
- Static singleton client – all public APIs are static (`start`, `stop`, `runTask`, `getPoolMetrics`), simplifying integration in workers, daemons, or HTTP handlers.

[ Client API ]

- `PuppeteerPool`
  - `PuppeteerPool` is a singleton class. Call `PuppeteerPool.start` to initialize the pool manager.
- `PuppeteerPool.start`
  - Static method
  - Description: Initializes the pool manager. You must call this function to start the Puppeteer pool. If you invoke it multiple times with different arguments, it returns the first initialized instance.
  - Args: `PuppeteerPoolStartOptions`
  - Defaults:
    - `concurrencyLevel`: `3`
    - `taskQueueType`: `QueueMode.DEFAULT`
    - `queueProvider`: `process.env.PUPPETEER_POOL_QUEUE_PROVIDER` or `QueueProvider.MEMORY`
    - `contextMode`: `ContextMode.SHARED`
    - `enableLog`: `true`
    - `logLevel`: `LogLevel.DEBUG`
  - Return: `Promise<PuppeteerPool>`, resolving to the `PuppeteerPool` instance.

`PuppeteerPoolStartOptions` is defined as:

    type PuppeteerPoolStartOptions = {
      /**
       * Number of concurrency
       * Default is 3
       */
      concurrencyLevel?: number;
      /**
       * Task queue type (DEFAULT | PRIORITY)
       * Default is QueueMode.DEFAULT
       */
      taskQueueType?: QueueMode;
      /**
       * Queue provider (MEMORY | RABBITMQ | SQS)
       * Default is envQueueProvider() or MEMORY
       */
      queueProvider?: QueueProvider;
      /**
       * Context mode
       * Default is ContextMode.SHARED
       */
      contextMode?: ContextMode;
      /**
       * Puppeteer launch options
       * Default is {}
       */
      options?: puppeteer.LaunchOptions;
      /**
       * Custom config path
       */
      customConfigPath?: string;
      /**
       * Enable log
       * Default is true
       */
      enableLog?: boolean;
      /**
       * Log level
       * Default is LogLevel.DEBUG
       */
      logLevel?: LogLevel;
    };

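
The "first call wins" behavior of `start` and the way defaults fill in missing options can be sketched without the library. Everything below is a simplified stand-in: `MiniPool`, `StartOptions`, and the string values used in place of the `QueueMode`, `QueueProvider`, `ContextMode`, and `LogLevel` enums are assumptions for illustration, and the real `start` is asynchronous.

```typescript
// Hypothetical stand-in for PuppeteerPoolStartOptions (string unions replace the enums).
type StartOptions = {
  concurrencyLevel?: number;
  taskQueueType?: "DEFAULT" | "PRIORITY";
  queueProvider?: "MEMORY" | "RABBITMQ" | "SQS";
  contextMode?: "SHARED" | "ISOLATED";
  enableLog?: boolean;
  logLevel?: string;
};

// Default values as listed in the documentation above.
const DEFAULTS: Required<StartOptions> = {
  concurrencyLevel: 3,
  taskQueueType: "DEFAULT",
  queueProvider: "MEMORY",
  contextMode: "SHARED",
  enableLog: true,
  logLevel: "DEBUG",
};

class MiniPool {
  private static instance: MiniPool | null = null;
  readonly options: Required<StartOptions>;

  private constructor(options: Required<StartOptions>) {
    this.options = options;
  }

  // The first call creates the instance; later calls ignore their arguments.
  static start(options: StartOptions = {}): MiniPool {
    if (MiniPool.instance === null) {
      MiniPool.instance = new MiniPool({ ...DEFAULTS, ...options });
    }
    return MiniPool.instance;
  }
}

const first = MiniPool.start({ concurrencyLevel: 6 });
const second = MiniPool.start({ concurrencyLevel: 12 }); // arguments are ignored here
```

In this sketch `first` and `second` are the same object, and its `concurrencyLevel` stays at 6 because the second call does not reconfigure the pool.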
- `PuppeteerPool.restart`
  - Static method
  - Description: Stops the running dispatcher, tears down active contexts, and starts a fresh pool. When `options` is omitted, it reuses the options from the last `start()` call, which makes threshold-triggered resets or config reloads simple.
  - Args: `PuppeteerPoolStartOptions`
  - Return: `Promise<PuppeteerPool>`
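
As a rough illustration of "reuse the last `start()` options when none are given", here is a minimal synchronous model. `RestartablePool`, `PoolOptions`, and the `generation` counter are hypothetical names used only for this sketch; the real method is asynchronous and also tears down browser contexts.

```typescript
type PoolOptions = { concurrencyLevel?: number };

class RestartablePool {
  private static lastOptions: PoolOptions = {};
  private static running = false;
  static generation = 0; // how many pools have been created so far

  static start(options: PoolOptions = {}): void {
    if (RestartablePool.running) return; // already started: keep the first pool
    RestartablePool.lastOptions = options;
    RestartablePool.running = true;
    RestartablePool.generation += 1;
  }

  static stop(): void {
    RestartablePool.running = false;
  }

  // Falls back to the remembered options when the caller passes none.
  static restart(options?: PoolOptions): void {
    const next = options ?? RestartablePool.lastOptions;
    RestartablePool.stop();
    RestartablePool.start(next);
  }

  static currentOptions(): PoolOptions {
    return RestartablePool.lastOptions;
  }
}

RestartablePool.start({ concurrencyLevel: 6 });
RestartablePool.restart(); // no options: the pool is rebuilt with concurrencyLevel 6
const reused = RestartablePool.currentOptions().concurrencyLevel;
RestartablePool.restart({ concurrencyLevel: 2 }); // explicit options replace the old ones
const replaced = RestartablePool.currentOptions().concurrencyLevel;
```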
- `PuppeteerPool.stop`
  - Static method
  - Description: Stops the pool manager. It closes all sessions and terminates the pool manager.
  - Return: `Promise<void>`
- `PuppeteerPool.enrollTask`
  - Static method
  - Description: Registers a page handler under a unique string id. It returns a `symbol` token that you use when enqueuing tasks.
  - Args:
    - `id`: string identifier persisted inside queue payloads
    - `handler`: `(page, payload) => Promise<any>` function
  - Return: `symbol`
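
To make the relationship between the string id, the returned symbol, and the queue payload more concrete, the following sketch models a handler registry in plain TypeScript. The names `enroll`, `toMessage`, and `resolveHandler` are hypothetical, and rejecting duplicate ids is an assumption rather than documented library behavior.

```typescript
type Handler<P, R> = (page: unknown, payload: P) => Promise<R>;
type TaskMessage = { handlerId: string; payload: unknown };

const handlersById = new Map<string, Handler<any, any>>();
const idsByKey = new Map<symbol, string>();

// Registers a handler and returns a symbol that stands for it inside this process.
function enroll<P, R>(id: string, handler: Handler<P, R>): symbol {
  if (handlersById.has(id)) {
    throw new Error(`Handler "${id}" is already enrolled`); // assumed policy
  }
  const key = Symbol(id);
  handlersById.set(id, handler);
  idsByKey.set(key, id);
  return key;
}

// Only the string id and the payload are serialized, so the message survives restarts.
function toMessage(key: symbol, payload: unknown): string {
  const handlerId = idsByKey.get(key);
  if (handlerId === undefined) throw new Error("Unknown task key");
  const message: TaskMessage = { handlerId, payload };
  return JSON.stringify(message);
}

// A consumer looks the handler up again by id after deserializing.
function resolveHandler(raw: string): { handler: Handler<any, any>; payload: unknown } {
  const message = JSON.parse(raw) as TaskMessage;
  const handler = handlersById.get(message.handlerId);
  if (handler === undefined) throw new Error(`No handler for "${message.handlerId}"`);
  return { handler, payload: message.payload };
}

const visitKey = enroll("visit-url", async (_page, payload: { url: string }) => payload.url.length);
const wire = toMessage(visitKey, { url: "https://github.com" });
```

Because a symbol cannot be serialized, a design like this keeps the symbol as a process-local convenience and puts only the string id on the wire.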
- `PuppeteerPool.runTask`
  - Static method
  - Description: Enqueues a payload for a previously enrolled handler. The handler runs once a page slot is available.
  - Args:
    - `taskKey`: symbol returned by `enrollTask`
    - `payload`: serializable object passed to the handler
    - `priority`: optional number (effective only when `queueProvider` is `MEMORY` and `taskQueueType` is `PRIORITY`)
  - Return: `Promise<any>`, resolving to the task result (the same type the handler returns).
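
The idea that a task "runs once a page slot is available" is a concurrency limit. A minimal sketch of such a limiter follows; `SlotDispatcher` and `maxObserved` are hypothetical names, and the library's own dispatcher may work differently.

```typescript
type Job<T> = () => Promise<T>;

class SlotDispatcher {
  private readonly limit: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  maxObserved = 0; // highest number of jobs seen running at once

  constructor(limit: number) {
    this.limit = limit;
  }

  run<T>(job: Job<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      // Frees the slot and wakes the next waiting job, if any.
      const release = () => {
        this.active -= 1;
        const next = this.waiting.shift();
        if (next) next();
      };
      const start = () => {
        this.active += 1;
        this.maxObserved = Math.max(this.maxObserved, this.active);
        Promise.resolve()
          .then(job)
          .then(
            (value) => { release(); resolve(value); },
            (error) => { release(); reject(error); },
          );
      };
      // Start right away if a slot is free, otherwise wait in FIFO order.
      if (this.active < this.limit) start();
      else this.waiting.push(start);
    });
  }
}

const dispatcher = new SlotDispatcher(2);
const results = Promise.all(
  [1, 2, 3, 4, 5].map((n) =>
    dispatcher.run(async () => {
      await Promise.resolve(); // stands in for page work
      return n * 10;
    }),
  ),
);
```

Each call receives its own promise, which mirrors how `runTask` hands back the result of the individual task even though execution is queued.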
- `PuppeteerPool.getPoolMetrics`
  - Static method
  - Description: Returns the metrics of the pool manager.
  - Return: an object with the following fields
    - `memoryUsageValue`: memory usage in megabytes
    - `memoryUsagePercentage`: memory usage as a percentage
    - `cpuUsage`: CPU usage as a percentage

Example:

    import { ContextMode, PuppeteerPool, QueueMode } from '@hoplin/puppeteer-pool';

    // Register the handler once; the returned symbol identifies it when enqueuing tasks.
    const visitTaskID = PuppeteerPool.enrollTask(
      'visit-url',
      async (page, payload: { url: string }) => {
        await page.goto(payload.url);
        return page.title();
      },
    );

    async function main() {
      await PuppeteerPool.start({
        concurrencyLevel: 6,
        contextMode: ContextMode.ISOLATED,
        customConfigPath: `./puppeteer-pool-configs.json`,
        taskQueueType: QueueMode.PRIORITY,
      });

      const baseUrls = [
        'https://www.google.com',
        'https://www.bing.com',
        'https://github.com',
        'https://www.naver.com',
        'https://www.daum.net',
        'https://www.youtube.com',
        'https://www.amazon.com',
        'https://www.netflix.com',
      ];

      // Build 50 tasks that cycle through the base URLs, each with a random priority from 1 to 10.
      const urls = Array.from({ length: 50 }, (_, index) => {
        const baseUrl = baseUrls[index % baseUrls.length];
        const priority = Math.floor(Math.random() * 10) + 1;
        return { url: baseUrl, priority: priority };
      });

      // Enqueue every task; each runTask call resolves with the handler's return value.
      const promises = urls.map(({ url, priority }) => {
        console.log(`Enqueue task: ${url}`);
        return PuppeteerPool.runTask(visitTaskID, { url }, priority);
      });

      const titles = await Promise.all(promises);
      console.log('[ Result length ] :', titles.length);
      console.log('[ Expected Tasks ] :', urls.length);
      console.log('[ Metrics ] :', await PuppeteerPool.getPoolMetrics());
    }

    main();

- Pool Managing
  - Config
    - Supports custom configuration
  - Threshold Watcher
    - Memory
    - Supports safe pool instance reset at runtime
  - Metrics
    - Supports metrics per pool
      - CPU usage of pool
      - Memory usage of pool
    - Managing session count at runtime
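
The threshold watcher ties the metrics to the pool reset. A minimal sketch of that decision is shown below, using the metric fields returned by `getPoolMetrics` and the `threshold` fields from the config file. `shouldReset` and `checkThreshold` are hypothetical helpers, and comparing with `>=` is an assumption; a real watcher would run such a check every `interval` seconds.

```typescript
type PoolMetrics = {
  memoryUsageValue: number;      // megabytes
  memoryUsagePercentage: number; // percent
  cpuUsage: number;              // percent
};

type ThresholdConfig = {
  activate: boolean;
  interval: number; // seconds between checks
  memory: number;   // megabytes
};

// True when the watcher is active and memory usage has reached the limit.
function shouldReset(metrics: PoolMetrics, threshold: ThresholdConfig): boolean {
  return threshold.activate && metrics.memoryUsageValue >= threshold.memory;
}

// Runs one check and calls `reset` (for example a pool restart) when needed.
function checkThreshold(
  metrics: PoolMetrics,
  threshold: ThresholdConfig,
  reset: () => void,
): boolean {
  if (!shouldReset(metrics, threshold)) return false;
  reset();
  return true;
}

const threshold: ThresholdConfig = { activate: true, interval: 5, memory: 2048 };
let resetCount = 0;
const reset = (): void => {
  resetCount += 1;
};

const calm = checkThreshold(
  { memoryUsageValue: 512, memoryUsagePercentage: 6, cpuUsage: 12 },
  threshold,
  reset,
);
const hot = checkThreshold(
  { memoryUsageValue: 3000, memoryUsagePercentage: 37, cpuUsage: 80 },
  threshold,
  reset,
);
```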

Queue provider setup:

- Configure `PUPPETEER_POOL_QUEUE_PROVIDER` (`MEMORY`, `RABBITMQ`, or `SQS`) or set `queueProvider` in `PuppeteerPool.start`.
- RabbitMQ provider
  - Install `amqplib` only when you enable this provider.
  - Provide the `PUPPETEER_POOL_RABBITMQ_URL` and `PUPPETEER_POOL_RABBITMQ_QUEUE` environment variables.
- SQS provider
  - Install `@aws-sdk/client-sqs` only when you enable this provider.
  - Provide `PUPPETEER_POOL_SQS_QUEUE_URL` and either `PUPPETEER_POOL_SQS_REGION` or `AWS_REGION` for the client.
- Priority queue mode is only available when `queueProvider` is `MEMORY`.
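
The selection rules above can be written down as two small functions. This is a sketch under assumptions: `resolveProvider` and `priorityIsEffective` are hypothetical names, the environment is passed in as a plain object instead of reading `process.env`, and falling back to `MEMORY` for an unrecognized value is a guess about the behavior.

```typescript
type QueueProviderName = "MEMORY" | "RABBITMQ" | "SQS";
type QueueModeName = "DEFAULT" | "PRIORITY";
type Env = Record<string, string | undefined>;

const PROVIDERS: readonly QueueProviderName[] = ["MEMORY", "RABBITMQ", "SQS"];

// An explicit option wins, then the environment variable, then MEMORY.
function resolveProvider(explicit: QueueProviderName | undefined, env: Env): QueueProviderName {
  if (explicit !== undefined) return explicit;
  const fromEnv = env["PUPPETEER_POOL_QUEUE_PROVIDER"];
  const match = PROVIDERS.find((provider) => provider === fromEnv);
  return match ?? "MEMORY"; // assumed fallback for unset or unknown values
}

// Priority ordering only applies to the in-memory queue in PRIORITY mode.
function priorityIsEffective(provider: QueueProviderName, mode: QueueModeName): boolean {
  return provider === "MEMORY" && mode === "PRIORITY";
}

const fromDefault = resolveProvider(undefined, {});
const fromEnvironment = resolveProvider(undefined, { PUPPETEER_POOL_QUEUE_PROVIDER: "SQS" });
const fromOption = resolveProvider("RABBITMQ", { PUPPETEER_POOL_QUEUE_PROVIDER: "SQS" });
```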

Memory provider:

- Uses the in-process `Queue` or `PriorityQueue` implementations from `src/queue`.
- `runTask` enqueues and immediately tries to dispatch if a context slot is free, so it is the lowest-latency option for single-host workers.
- Priority ordering is honored only when `taskQueueType` is set to `QueueMode.PRIORITY`.
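
For readers unfamiliar with priority ordering, here is a small stand-alone priority queue. It is not the `PriorityQueue` from `src/queue`; `SimplePriorityQueue` is a hypothetical class, and it assumes that a larger number means higher priority and that equal priorities keep their insertion order, neither of which is stated in this document.

```typescript
type Entry<T> = { item: T; priority: number };

class SimplePriorityQueue<T> {
  private readonly entries: Entry<T>[] = [];

  enqueue(item: T, priority = 0): void {
    // Insert before the first entry with a strictly lower priority,
    // so entries with equal priority stay in arrival order.
    const index = this.entries.findIndex((entry) => entry.priority < priority);
    if (index === -1) this.entries.push({ item, priority });
    else this.entries.splice(index, 0, { item, priority });
  }

  dequeue(): T | undefined {
    return this.entries.shift()?.item;
  }

  get size(): number {
    return this.entries.length;
  }
}

const queue = new SimplePriorityQueue<string>();
queue.enqueue("a", 1);
queue.enqueue("b", 5);
queue.enqueue("c", 5);
queue.enqueue("d", 10);

// Drain the queue to observe the resulting order.
const order: string[] = [];
let next = queue.dequeue();
while (next !== undefined) {
  order.push(next);
  next = queue.dequeue();
}
// order is ["d", "b", "c", "a"]
```

The linear insert keeps the sketch short; a binary heap would be the usual choice for large queues.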

RabbitMQ provider:

- `runTask` publishes serialized `TaskMessage` objects to the configured queue. A long-lived consumer drains the queue and hands work to the dispatcher.
- Messages are acknowledged only after the handler completes, allowing RabbitMQ to redeliver work if the worker process crashes before finishing.
- Prefetch, durability, or other channel-level tuning can be layered on top of this default behavior.
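
Why acknowledging only after completion protects against crashes can be shown with a tiny in-memory model of a broker. `FakeBroker` and `processNext` are hypothetical and deliberately simplified (synchronous handlers, a single consumer); they do not use `amqplib` and only imitate the redelivery of unacknowledged messages.

```typescript
type Delivery = { id: number; body: string };

class FakeBroker {
  private readonly ready: Delivery[] = [];
  private readonly unacked = new Map<number, Delivery>();
  private nextId = 1;

  publish(body: string): void {
    this.ready.push({ id: this.nextId++, body });
  }

  // Hands out the next message; it stays unacknowledged until ack() is called.
  receive(): Delivery | undefined {
    const delivery = this.ready.shift();
    if (delivery) this.unacked.set(delivery.id, delivery);
    return delivery;
  }

  ack(id: number): void {
    this.unacked.delete(id);
  }

  // Models a dead consumer: every unacknowledged message becomes ready again.
  consumerCrashed(): void {
    this.unacked.forEach((delivery) => this.ready.push(delivery));
    this.unacked.clear();
  }

  get pending(): number {
    return this.ready.length + this.unacked.size;
  }
}

function processNext(broker: FakeBroker, handler: (body: string) => void): void {
  const delivery = broker.receive();
  if (!delivery) return;
  handler(delivery.body);  // if this throws, the ack below is never sent
  broker.ack(delivery.id); // acknowledge only after the handler has finished
}

const broker = new FakeBroker();
broker.publish('{"handlerId":"visit-url","payload":{"url":"https://github.com"}}');

let attempts = 0;
try {
  processNext(broker, () => {
    attempts += 1;
    throw new Error("worker crashed mid-task");
  });
} catch {
  broker.consumerCrashed(); // the broker notices the lost consumer and requeues
}
const pendingAfterCrash = broker.pending;

processNext(broker, () => {
  attempts += 1;
});
const pendingAfterSuccess = broker.pending;
```

The message is still pending after the failed attempt and disappears only once a handler run completes and the acknowledgement is sent.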

SQS provider:

- Uses `@aws-sdk/client-sqs` with long polling (`ReceiveMessageCommand`) to fetch batches from the queue specified by `PUPPETEER_POOL_SQS_QUEUE_URL`.
- After a task succeeds, the worker issues `DeleteMessageCommand`; failed handlers rely on the queue visibility timeout for retries.
- Regional configuration is resolved from `PUPPETEER_POOL_SQS_REGION` (or `AWS_REGION`), so you can run the pool in multiple AWS regions.

| Variable | Description | Default |
|---|---|---|
| `PUPPETEER_POOL_QUEUE_PROVIDER` | Selects the queue backend: `MEMORY`, `RABBITMQ`, or `SQS`. | `MEMORY` |
| `PUPPETEER_POOL_RABBITMQ_URL` | RabbitMQ connection URI, e.g. `amqp://user:pass@host/vhost`. | required when using RabbitMQ |
| `PUPPETEER_POOL_RABBITMQ_QUEUE` | Queue name for RabbitMQ tasks. | required when using RabbitMQ |
| `PUPPETEER_POOL_SQS_QUEUE_URL` | Full SQS queue URL. | required when using SQS |
| `PUPPETEER_POOL_SQS_REGION` / `AWS_REGION` | AWS region for the SQS client. `PUPPETEER_POOL_SQS_REGION` takes precedence. | required when using SQS |

If you need more control (prefetch counts, SQS wait time, etc.), pass provider-specific options directly to PuppeteerPool.start.
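
A startup check for the variables in the table could look like the sketch below. `missingVariables` and `resolveSqsRegion` are hypothetical helpers, and the environment is passed in as an object so the example stays self-contained; in an application you would typically pass `process.env`.

```typescript
type Env = Record<string, string | undefined>;
type Provider = "MEMORY" | "RABBITMQ" | "SQS";

// PUPPETEER_POOL_SQS_REGION takes precedence over AWS_REGION.
function resolveSqsRegion(env: Env): string | undefined {
  return env["PUPPETEER_POOL_SQS_REGION"] || env["AWS_REGION"] || undefined;
}

// Returns the names of required variables that are unset for the chosen provider.
function missingVariables(provider: Provider, env: Env): string[] {
  const missing: string[] = [];
  const need = (name: string): void => {
    if (!env[name]) missing.push(name);
  };

  if (provider === "RABBITMQ") {
    need("PUPPETEER_POOL_RABBITMQ_URL");
    need("PUPPETEER_POOL_RABBITMQ_QUEUE");
  }
  if (provider === "SQS") {
    need("PUPPETEER_POOL_SQS_QUEUE_URL");
    if (resolveSqsRegion(env) === undefined) {
      missing.push("PUPPETEER_POOL_SQS_REGION or AWS_REGION");
    }
  }
  return missing; // MEMORY needs no extra variables
}

const rabbitProblems = missingVariables("RABBITMQ", {
  PUPPETEER_POOL_RABBITMQ_URL: "amqp://user:pass@host/vhost",
});
const region = resolveSqsRegion({
  PUPPETEER_POOL_SQS_REGION: "ap-northeast-2",
  AWS_REGION: "us-east-1",
});
```

Failing fast on a non-empty result gives a clearer error than a connection failure later in the queue client.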
┌────────────────────┐ ┌────────────────────────┐
│ User Application │ │ Puppeteer Pool │
│ (enroll/run) │──────────▶│ Registry saves handler │
└────────┬───────────┘ │ + payload metadata │
│ └─────────┬──────────────┘
│3. TaskMessage enqueued │
▼ ▼
┌────────────────────┐ ┌────────────────────────┐
│ Queue Provider │◀─────────│ Task Dispatcher │
│ (Memory/RMQ/SQS) │ │ (drains queue) │
└────────┬───────────┘ └─────────┬──────────────┘
│ │
│4. Assign handler+payload │
▼ ▼
┌────────────────────┐ ┌────────────────────────┐
│ Shared/Isolated │◀────────▶│ Context Slot (per tab) │
│ Browser Context │ │ │
└────────┬───────────┘ └─────────┬──────────────┘
│ │
│5. Page task executes │6. Metrics monitor
▼ ▼
┌────────────────────┐ ┌────────────────────────┐
│ Target Website │ │ Metrics Watcher │
└────────────────────┘ └────────────────────────┘

The default config file is `puppeteer-pool-configs.json` in the project root directory. If no config file is given or the path is invalid, the manager falls back to its built-in default configuration. To use a config file in a different location, pass its path to `start()` through the `customConfigPath` option.

    {
      "session_pool": {
        "width": 1080,
        "height": 1024
      },
      "context": {
        "timeout": 10
      },
      "threshold": {
        "activate": true,
        "interval": 5,
        "memory": 2048
      }
    }

`session_pool`:

- `width`: Width of session pool
- `height`: Height of session pool
- Integer validation
  - `width` should be greater than or equal to 50
  - `height` should be greater than or equal to 50
  - `width` should be an integer
  - `height` should be an integer

`context`:

- `timeout`: Timeout of context
- Integer validation
  - `timeout` should be at least 1
  - `timeout` should be an integer

`threshold`:

- `activate`: Activate threshold watcher
- `interval`: Interval of threshold watcher (seconds)
- `memory`: Memory threshold value (MB)
- Integer validation
  - `interval` should be at least 3
  - `interval` should be an integer
  - `memory` should be at least 500
  - `memory` should be an integer
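
The validation rules listed above translate directly into a small checker. This is only a sketch of those rules: `PoolConfig`, `checkInt`, and `validateConfig` are hypothetical names, and the error messages are made up for illustration.

```typescript
type PoolConfig = {
  session_pool: { width: number; height: number };
  context: { timeout: number };
  threshold: { activate: boolean; interval: number; memory: number };
};

// Records an error when the value is not an integer or is below the minimum.
function checkInt(errors: string[], name: string, value: number, min: number): void {
  if (!Number.isInteger(value)) errors.push(`${name} should be an integer`);
  else if (value < min) errors.push(`${name} should be at least ${min}`);
}

// Returns a list of problems; an empty list means the config passes every rule.
function validateConfig(config: PoolConfig): string[] {
  const errors: string[] = [];
  checkInt(errors, "session_pool.width", config.session_pool.width, 50);
  checkInt(errors, "session_pool.height", config.session_pool.height, 50);
  checkInt(errors, "context.timeout", config.context.timeout, 1);
  checkInt(errors, "threshold.interval", config.threshold.interval, 3);
  checkInt(errors, "threshold.memory", config.threshold.memory, 500);
  return errors;
}

// The sample configuration from this section.
const sample: PoolConfig = {
  session_pool: { width: 1080, height: 1024 },
  context: { timeout: 10 },
  threshold: { activate: true, interval: 5, memory: 2048 },
};

const sampleErrors = validateConfig(sample);
const brokenErrors = validateConfig({
  session_pool: { width: 49.5, height: 10 },
  context: { timeout: 0 },
  threshold: { activate: true, interval: 2, memory: 100 },
});
```

Collecting all problems instead of stopping at the first one makes it easier to fix a config file in a single pass.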
