High-Frequency Trading Stats Service

This project is a high-performance RESTful service built in Rust, designed to handle the rigorous demands of high-frequency trading (HFT) systems. It allows for the real-time ingestion of trading data and provides near-instantaneous statistical analysis on variable-sized windows of that data.


Core Design & Technology

The service was engineered with a primary focus on performance, safety, and robustness.

Language: Rust 🦀

Rust was chosen as it provides a unique combination of strengths ideal for HFT systems:

  • Peak Performance: Offers C-level performance with zero-cost abstractions, ensuring minimal and predictable latency.
  • Guaranteed Memory Safety: The ownership model and borrow checker eliminate entire classes of bugs (e.g., data races, null-pointer dereferences) at compile time.
  • Fearless Concurrency: Rust's safety guarantees make it easier to write correct, efficient, and highly parallel code.

Data Structure: Segment Tree

To meet the requirement of performing statistical analysis (min, max, avg, var) in better than O(n) time, this service uses a Segment Tree.

This data structure fits the use case well: it can compute all required statistics for any given range in O(log N) time, where N is the total number of data points for a symbol. For maximum performance, this service uses an iterative (non-recursive) implementation, avoiding function-call overhead and any risk of stack overflow.
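
To make this concrete, here is a minimal sketch of an iterative segment tree (illustrative only, not the repo's exact code). Each node stores min, max, and sum, so one O(log N) query yields min, max, and avg for any window; variance is handled separately (see Welford's algorithm below).

// Minimal iterative segment tree sketch: each node aggregates
// min/max/sum over its range.
#[derive(Clone, Copy)]
struct Node { min: f64, max: f64, sum: f64 }

const IDENTITY: Node = Node { min: f64::INFINITY, max: f64::NEG_INFINITY, sum: 0.0 };

impl Node {
    fn merge(a: Node, b: Node) -> Node {
        Node { min: a.min.min(b.min), max: a.max.max(b.max), sum: a.sum + b.sum }
    }
}

struct SegmentTree { n: usize, nodes: Vec<Node> }

impl SegmentTree {
    fn new(values: &[f64]) -> Self {
        let n = values.len();
        let mut nodes = vec![IDENTITY; 2 * n];
        for (i, &v) in values.iter().enumerate() {
            nodes[n + i] = Node { min: v, max: v, sum: v };
        }
        // Build parents bottom-up; node i's children are 2i and 2i + 1.
        for i in (1..n).rev() {
            nodes[i] = Node::merge(nodes[2 * i], nodes[2 * i + 1]);
        }
        SegmentTree { n, nodes }
    }

    // Aggregate over the half-open range [l, r) iteratively: no recursion,
    // no stack-overflow risk, O(log N) merges.
    fn query(&self, mut l: usize, mut r: usize) -> Node {
        let (mut left, mut right) = (IDENTITY, IDENTITY);
        l += self.n;
        r += self.n;
        while l < r {
            if l & 1 == 1 { left = Node::merge(left, self.nodes[l]); l += 1; }
            if r & 1 == 1 { r -= 1; right = Node::merge(self.nodes[r], right); }
            l >>= 1;
            r >>= 1;
        }
        Node::merge(left, right)
    }
}

For a window of length k, avg is simply query(l, r).sum / k.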

Concurrency Model: DashMap

To handle a high volume of concurrent requests, the service uses DashMap as its central data store. Unlike a standard HashMap protected by a single global lock, DashMap provides fine-grained, sharded locking. This allows requests for different symbols to be processed in parallel, dramatically increasing throughput.
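
For illustration, a hedged sketch of the pattern (SymbolStore is a placeholder here, not the repo's actual type):

use dashmap::DashMap;

// Placeholder for the real per-symbol state (segment tree, running stats, ...).
struct SymbolStore { values: Vec<f64> }

fn add_batch(store: &DashMap<String, SymbolStore>, symbol: &str, batch: &[f64]) {
    // entry() write-locks only the shard that owns this key, so batches for
    // different symbols usually proceed in parallel.
    let mut entry = store
        .entry(symbol.to_string())
        .or_insert_with(|| SymbolStore { values: Vec::new() });
    entry.values.extend_from_slice(batch);
}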

Numerical Stability: Welford's Algorithm

Standard variance calculation using the E[X²] - (E[X])² formula can suffer from catastrophic cancellation—a significant loss of precision when subtracting two large, nearly-equal numbers.

To guarantee high precision even with millions of data points, this service implements Welford's online algorithm. This is a numerically stable, single-pass method that computes variance by tracking a running mean and the sum of squared differences from that mean (M2). This approach avoids catastrophic cancellation entirely, ensuring the statistical results are always accurate.
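
A minimal sketch of the update step (the repo's actual struct may differ): a single pass with O(1) state, tracking the running mean and M2.

// Welford's online algorithm: numerically stable running mean and variance.
#[derive(Default)]
struct Welford { count: u64, mean: f64, m2: f64 }

impl Welford {
    fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;      // deviation from the old mean
        self.mean += delta / self.count as f64;
        let delta2 = x - self.mean;     // deviation from the updated mean
        self.m2 += delta * delta2;      // sum of squared differences from the mean
    }

    // Population variance: M2 / n (use M2 / (n - 1) for the sample variance).
    fn variance(&self) -> f64 {
        if self.count == 0 { 0.0 } else { self.m2 / self.count as f64 }
    }
}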


Production-Ready Features

This service includes several features essential for deployment in a production environment.

  • Configuration Management: All aspects of the service (server, logging, memory management) are configured via Config.toml and can be overridden with environment variables (e.g., APP_SERVER__PORT=9090, APP_STORE__MAX_DATA_RETENTION=200000000), managed by the figment crate; a loading sketch follows this list.
  • Structured Logging: Uses the tracing framework to emit structured (JSON) logs to both the console and a daily rotating file (logs/app.log), making them easy to analyze.
  • Graceful Shutdown: Listens for termination signals (Ctrl+C or SIGTERM) and shuts down gracefully, allowing in-flight requests to complete.
  • Health Check: Provides a GET /health endpoint for load balancers and container orchestrators (like Kubernetes) to verify service health.
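
As a hedged example, loading this layered configuration with figment might look like the following (the struct fields shown are illustrative, not the repo's full config):

use figment::{Figment, providers::{Env, Format, Toml}};
use serde::Deserialize;

#[derive(Deserialize)]
struct ServerConfig { host: String, port: u16 }

#[derive(Deserialize)]
struct AppConfig { server: ServerConfig }

fn load_config() -> Result<AppConfig, figment::Error> {
    Figment::new()
        .merge(Toml::file("Config.toml"))
        // "__" splits APP_SERVER__PORT into the nested key server.port.
        .merge(Env::prefixed("APP_").split("__"))
        .extract()
}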

Memory Management

The service includes sophisticated memory management to handle massive data volumes while maintaining predictable performance.

Automatic Data Cleanup

  • Retention Limit: Each symbol retains up to 150 million data points before cleanup is triggered
  • Cleanup Strategy: When the limit is reached, the system keeps only the most recent 100 million values
  • Cleanup-Before-Insertion: Data cleanup occurs before adding new batches, preventing temporary memory spikes (see the sketch after this list)
  • Bounded Memory: Memory usage never exceeds configured limits, ensuring predictable resource consumption
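
A hedged sketch of the cleanup-before-insertion idea (field and method names are illustrative, not the repo's actual API):

struct SymbolData {
    values: Vec<f64>,
    max_data_retention: usize, // cleanup trigger, e.g. 150_000_000
    cleanup_keep_count: usize, // values kept after cleanup, e.g. 100_000_000
}

impl SymbolData {
    fn add_batch(&mut self, batch: &[f64]) {
        // Trim before appending, so the buffer never spikes past the limit:
        // keep only the most recent `cleanup_keep_count` existing values.
        if self.values.len() + batch.len() > self.max_data_retention {
            let excess = self.values.len().saturating_sub(self.cleanup_keep_count);
            self.values.drain(..excess);
        }
        self.values.extend_from_slice(batch);
    }
}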

Key Benefits

  • No Memory Leaks: Automatic cleanup prevents unbounded memory growth
  • Predictable Performance: Memory usage remains within configured bounds
  • Statistical Accuracy: All cleanup preserves the integrity of statistical calculations
  • High Availability: No service interruption during cleanup operations

Memory Configuration

The cleanup behavior is fully configurable via Config.toml:

[store]
# Maximum data retention per symbol before cleanup is triggered (150M values)
max_data_retention = 150000000

# Number of values to keep after cleanup (100M values)
cleanup_keep_count = 100000000

# Maximum number of unique symbols that can be tracked simultaneously
max_symbols = 10

# Initial capacity for the segment tree (per symbol)
starting_capacity = 1000000

These settings can also be overridden with environment variables:

export APP_STORE__MAX_DATA_RETENTION=200000000
export APP_STORE__CLEANUP_KEEP_COUNT=120000000
export APP_STORE__MAX_SYMBOLS=20

Testing Strategy

The project employs a comprehensive, multi-layered testing strategy to ensure reliability and correctness.

  • Unit Tests: Located alongside the source code in src/, these test individual components like the SegmentTree in isolation.
  • Integration Tests: Located in the tests/ directory, these validate the entire service's API, including error handling and edge cases.
  • Stress Test: A dedicated, resource-intensive integration test (marked as #[ignore]) verifies correctness and memory management under extreme load with 150 million data points.
  • Memory Management Tests: Comprehensive tests verify cleanup behavior, memory bounds, and data integrity during high-volume operations.
  • Performance Benchmarks: Located in the benches/ directory, these use the Criterion framework to provide statistically rigorous performance measurements of key API endpoints.

Setup and Usage

Prerequisites

  • A recent Rust toolchain (rustc and cargo), e.g. installed via rustup.

Configuration

The service uses Config.toml for configuration. Key settings include:

[server]
host = "127.0.0.1"
port = 8080

[log]
level = "info"

[store]
# Memory management settings
max_data_retention = 150000000    # Cleanup trigger (150M values)
cleanup_keep_count = 100000000    # Values kept after cleanup (100M)
max_symbols = 10                  # Maximum tracked symbols
starting_capacity = 1000000       # Initial segment tree capacity

All settings can be overridden with environment variables using the APP_ prefix:

export APP_SERVER__PORT=9090
export APP_STORE__MAX_DATA_RETENTION=200000000

Build & Run

  1. Clone the repository and navigate to the root directory.
  2. Build the service in release mode for maximum optimization:
    cargo build --release
  3. Run the compiled binary:
    ./target/release/hft-service
    The service will start on the port specified in Config.toml (default 8080).

Running Tests & Benchmarks

# Run all standard unit and integration tests
cargo test --release

# Run the ignored, resource-intensive stress test (150M data points)
# Warning: This test requires significant memory and may take several minutes
cargo test --release test_large_data_and_memory_cleanup -- --ignored

# Run memory management tests (smaller scale, runs with regular tests)
cargo test test_large_data_volumes_memory_management --release

# Run the performance benchmarks
cargo bench

Memory Stress Testing

The service includes a comprehensive stress test that:

  • Adds 150 million data points to trigger automatic cleanup
  • Verifies memory limits are never exceeded
  • Confirms statistical accuracy after cleanup operations
  • Tests both small and large batch operations

To run the full stress test:

# This test uses ~1.2GB of memory and takes 5-10 minutes
cargo test --release test_large_data_and_memory_cleanup -- --ignored --nocapture

API Reference

1. Health Check

Verifies that the service is running and ready to accept traffic.

  • Endpoint: GET /health
  • Success Response (200 OK):
    {
      "status": "ok"
    }

2. Add Data Batch

Adds a batch of consecutive trading prices for a specific symbol. All prices must be non-negative.

  • Endpoint: POST /add_batch/
  • Body: A JSON object containing a symbol and an array of values.
  • Example curl:
    curl -X POST http://localhost:8080/add_batch/ \
    -H "Content-Type: application/json" \
    -d '{"symbol": "ABC-USD", "values": [150.1, 150.5, 151.0, 149.8, 150.2, 151.1, 151.2, 152.0, 151.5, 151.9]}'

3. Get Statistics

Provides statistical analysis on the last 1e{exponent} data points for a given symbol.

  • Endpoint: GET /stats/
  • Query Parameters:
    • symbol (string): The financial instrument's identifier.
    • exponent (integer): A number from 1 to 8.
  • Example curl: Get stats for the last 1e1 (10) data points of "ABC-USD" added in the previous example.
    curl "http://localhost:8080/stats/?symbol=ABC-USD&exponent=1"
  • Success Response (200 OK): The following values are calculated from the 10 data points in the add_batch example above.
    {
      "min": 149.8,
      "max": 152.0,
      "last": 151.9,
      "avg": 150.93,
      "var": 0.5380099999999984
    }
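
For reference, the window size is simply 10^exponent, here assumed to be clamped to the available history (clamping behavior is not documented above); an illustrative helper, not the repo's actual code:

// exponent = 1..=8 selects the last 10^exponent points (or all, if fewer exist).
fn window_len(total_points: usize, exponent: u32) -> usize {
    10usize.pow(exponent).min(total_points)
}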
