This is a library for generating workloads against HTTP servers. Implement the required interfaces, create a Runner instance, and you're ready to go.
Note: This project is in an early stage of development and is probably not ready for production use.
Basic features:
- Creating concurrent requests to HTTP servers
- Saving results to a database (currently, only ClickHouse is supported, for historical reasons)
Advanced features:
- Gradually increasing the number of concurrent requests (also known as "warm-up")
- Circuit breaker pattern, which allows requests to be stopped temporarily
- No data loss on shutdown (except for requests that are in progress)
- Timestamp correction (for more info, see "Timestamp correction")
- Retries with backoff, timeouts
- Continuous mode (for more info, see "Continuous mode")
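The "retries with backoff" feature from the list above can be illustrated with a minimal sketch. This is not the library's implementation: the names `backoffDelay` and `retry`, the doubling strategy, and the injected `sleep` function are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoffDelay returns the wait before the next attempt: minWait doubled
// once per previous retry and capped at maxWait (assumed strategy).
func backoffDelay(retry int, minWait, maxWait time.Duration) time.Duration {
	d := minWait
	for i := 0; i < retry && d < maxWait; i++ {
		d *= 2
	}
	if d > maxWait {
		return maxWait
	}
	return d
}

// retry runs op once and then up to numRetries more times, sleeping
// between attempts. It returns nil on the first success.
func retry(numRetries int, minWait, maxWait time.Duration, sleep func(time.Duration), op func() error) error {
	var err error
	for attempt := 0; ; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt == numRetries {
			return fmt.Errorf("giving up after %d retries: %w", numRetries, err)
		}
		sleep(backoffDelay(attempt, minWait, maxWait))
	}
}

func main() {
	calls := 0
	err := retry(3, 2*time.Millisecond, 16*time.Millisecond, time.Sleep, func() error {
		calls++
		if calls < 3 {
			return errors.New("temporary failure")
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

With a minimum wait of 2s and a maximum of 16s, this sketch would wait 2s, 4s, 8s, and then 16s for every further retry.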
Continuous mode:
Sometimes you need to run your workload for a long time. In our case, a single
ClickHouse table serves as both the input and the output. For this, you set the
Freshness parameter: records older than now() - Freshness are selected and
used as input again.
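The freshness check can be sketched in a few lines of Go. This is only an illustration under assumptions: `isInput`, `referenceNow`, and `day` are hypothetical names, and whether the real comparison is strict at the exact boundary is not stated in this document. The value "168h" is the freshness used in the sample configuration.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// referenceNow is a fixed "current time" so the example is reproducible.
var referenceNow = time.Date(2024, time.January, 10, 0, 0, 0, 0, time.UTC)

// isInput reports whether a record stored at savedAt is older than
// now - freshness and would therefore be picked up as input again.
func isInput(savedAt, now time.Time, freshness time.Duration) bool {
	return savedAt.Before(now.Add(-freshness))
}

func main() {
	// "168h" (7 days) is parsed the same way Go parses any duration string.
	freshness, err := time.ParseDuration("168h")
	if err != nil {
		panic(err)
	}
	fmt.Println(isInput(referenceNow.Add(-8*day), referenceNow, freshness)) // true
	fmt.Println(isInput(referenceNow.Add(-6*day), referenceNow, freshness)) // false
}
```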
Timestamp correction:
Sometimes you need to adjust the timestamps that are stored in the database.
For example, imagine you run a web scraper. Some requests will inevitably time out, which can cause a sudden drop in RPS. One way to keep RPS stable is to shuffle the timeouts: adding a random duration to the timestamp of each request that finished with a timeout helps to achieve that.
Also, when running in continuous mode, you may want to retry requests that
failed because of client or server errors. This is also done by correcting
the timestamp: current timestamp - freshness + correction value. For example,
if you treat all results older than 7 days as input and want to retry
failed requests after 1 day, the timestamp should be set to
current timestamp - 7 days + 1 day. The record then looks 6 days old and
crosses the 7-day threshold one day later.
The correction value is set via the correction.error_correction field in the config.
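Both corrections can be written down as a short Go sketch. It is an illustration under assumptions, not the library's code: the function names, the uniform random shift, and the direction of the timeout shift (forward from the current time) are guesses based on the description above.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const day = 24 * time.Hour

// referenceNow is a fixed "current time" so the example is reproducible.
var referenceNow = time.Date(2024, time.January, 10, 0, 0, 0, 0, time.UTC)

// errorTimestamp returns now - freshness + correction, so that a failed
// request becomes eligible for selection again once correction has elapsed.
func errorTimestamp(now time.Time, freshness, correction time.Duration) time.Time {
	return now.Add(-freshness + correction)
}

// timeoutTimestamp adds a random duration in [0, maxCorrection) to now.
// randN must return a value in [0, n), like rand.Int63n.
func timeoutTimestamp(now time.Time, maxCorrection time.Duration, randN func(n int64) int64) time.Time {
	if maxCorrection <= 0 {
		return now
	}
	return now.Add(time.Duration(randN(int64(maxCorrection))))
}

func main() {
	// Freshness of 7 days, retry errors after 1 day.
	ts := errorTimestamp(referenceNow, 7*day, 1*day)
	fmt.Println(referenceNow.Sub(ts)) // 144h0m0s

	// Spread timed-out requests over up to 504h (21 days).
	shifted := timeoutTimestamp(referenceNow, 504*time.Hour, rand.Int63n)
	fmt.Println(shifted.Sub(referenceNow) < 504*time.Hour) // true
}
```

Passing the random source as a function keeps the sketch deterministic in tests while `rand.Int63n` can be used directly in normal runs.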
To run the service, you have several options for configuration:
- Using YAML configuration file:
    go run ./cmd/main.go -config config.yaml
- Using environment variables only:
    go run ./cmd/main.go -env
- Default behavior (environment variables):
    go run ./cmd/main.go
You can also build and run the binary:
    go build -o barash ./cmd/main.go
    ./barash -config config.yaml
The configuration system supports both YAML files and environment variables. Environment variables take precedence over YAML configuration, allowing you to override specific settings without modifying the configuration file.
The configuration is organized into the following sections:
Settings for the external API endpoints:
    api:
      type: "rest"
      host: "api.example.com"
      port: "443"
      scheme: "https"
      endpoint: "/v1/data"
      method: "POST"
      api_timeout: "3m"
      num_retries: 3
      min_wait_time: "2s"
      max_wait_time: "16s"
      extra_params:
        format: "json"
      body_file_path: "request_body.json"
Settings for data retrieval from the database:
    provider:
      sleep_time: "1m"
      select_batch_size: 40000
      select_table: "source_table"
      select_retries: 5
      select_sql_path: "select.sql"
      source:
        backend: "ch" # "ch" or "pg"
        credentials:
          database: "source_db"
          username: "user"
          password: "password"
          host: "127.0.0.1"
          port: "9000"
      continuous_mode:
        freshness: "168h" # 7 days
Settings for API request execution with circuit breaker:
    fetcher:
      min_fetcher_workers: 400
      max_fetcher_workers: 800
      duration: "60s"
      enable_warmup: false
      idle_time: "10s"
      timeout: "40s"
      circuit_breaker:
        enabled: true
        max_requests: 10
        consecutive_failure: 10
        total_failure_per_interval: 900
        interval: "60s"
        timeout: "360s"
Settings for saving results to the database with correction logic:
    writer:
      insert_batch_size: 10000
      insert_table: "results_table"
      insert_sql_path: "insert.sql"
      save_tag: "production"
      sink:
        backend: "ch" # "ch" or "postgres"
        credentials:
          database: "sink_db"
          username: "user"
          password: "password"
          host: "127.0.0.1"
          port: "9000"
      correction:
        enable_errors_correction: false
        error_correction: "24h"
        enable_timeouts_correction: true
        max_timeout_correction: "504h"
Logging settings:
    log:
      level: "info" # debug, info, warn, error
      encoding: "json" # json or console
Graceful shutdown settings:
    shutdown:
      grace_period: "60s"
      db_save_timeout: "30s"
You can find a complete example configuration in config.example.yaml. Copy this file and modify it according to your needs:
    cp config.example.yaml config.yaml
    # Edit config.yaml with your settings
All configuration options can be overridden using environment variables. The variable names follow this pattern:
- RUN_MODE for the run mode
- API_HOST, API_PORT, etc. for API configuration
- PROVIDER_SLEEP_TIME, PROVIDER_SELECT_BATCH_SIZE, etc. for provider configuration
- FETCHER_MIN_WORKERS, FETCHER_MAX_WORKERS, etc. for fetcher configuration
- WRITER_INSERT_BATCH_SIZE, WRITER_INSERT_TABLE, etc. for writer configuration
- CB_ENABLED, CB_MAX_REQUESTS, etc. for circuit breaker configuration
- CONTINUOUS_FRESHNESS for continuous mode configuration
- CORRECTION_ENABLE_ERRORS, etc. for correction configuration
- LOG_LEVEL, LOG_ENCODING for logging configuration
- SHUTDOWN_GRACE_PERIOD, SHUTDOWN_DB_SAVE_TIMEOUT for shutdown configuration
The configuration loading follows this precedence (highest to lowest):
- Environment variables
- YAML configuration file
- Default values
This means environment variables will always override YAML settings, allowing for flexible deployment configurations.
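The precedence rule can be expressed as a small Go sketch for a single setting. This is an assumption-laden illustration rather than the project's loader: `resolve` is a hypothetical helper, the "warn" default is invented for the example, and an empty YAML value is treated here as "not set".

```go
package main

import (
	"fmt"
	"os"
)

// resolve picks the value for one setting using the documented precedence:
// environment variable, then YAML value, then default.
// lookup has the same signature as os.LookupEnv.
func resolve(lookup func(string) (string, bool), envKey, yamlValue, defaultValue string) string {
	if v, ok := lookup(envKey); ok {
		return v
	}
	if yamlValue != "" {
		return yamlValue
	}
	return defaultValue
}

func main() {
	os.Setenv("LOG_LEVEL", "debug")
	fmt.Println(resolve(os.LookupEnv, "LOG_LEVEL", "info", "warn")) // debug

	os.Unsetenv("LOG_LEVEL")
	fmt.Println(resolve(os.LookupEnv, "LOG_LEVEL", "info", "warn")) // info
	fmt.Println(resolve(os.LookupEnv, "LOG_LEVEL", "", "warn"))     // warn
}
```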
Requirements:
- Go 1.21 or later
- Access to a ClickHouse or PostgreSQL database
Build, test, and maintenance commands:
    go build ./cmd/main.go
    go test ./...
    go mod tidy
    go fmt ./...