This project is a high-performance RESTful service built in Rust, designed to handle the rigorous demands of high-frequency trading (HFT) systems. It allows for the real-time ingestion of trading data and provides near-instantaneous statistical analysis on variable-sized windows of that data.
The service was engineered with a primary focus on performance, safety, and robustness.
Rust was chosen as it provides a unique combination of strengths ideal for HFT systems:
- Peak Performance: Offers C-level performance with zero-cost abstractions, ensuring minimal and predictable latency.
- Guaranteed Memory Safety: The ownership and borrow checker eliminates entire classes of bugs (e.g., data races, null pointer dereferencing) at compile time.
- Fearless Concurrency: Rust's safety guarantees make it easier to write correct, efficient, and highly parallel code.
To meet the requirement of performing statistical analysis (min, max, avg, var) in better than O(n) time, this service uses a Segment Tree.
This data structure is optimal for this use case, as it can calculate all required statistics for any given range in O(log N) time, where N is the total number of data points for a symbol. For maximum performance, this service uses a non-recursive iterative implementation, avoiding function call overhead and any risk of stack overflow.
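As a hedged illustration only (the service's actual implementation is not shown here, and all names below are hypothetical), an iterative segment tree whose nodes merge `(min, max, sum)` might look like:

```rust
// Minimal sketch of a non-recursive segment tree over f64 prices.
// Each internal node aggregates (min, max, sum) of its two children,
// so any range query resolves in O(log N) with a simple loop.

#[derive(Clone, Copy)]
struct Node {
    min: f64,
    max: f64,
    sum: f64,
}

impl Node {
    // Identity element for the merge operation.
    const IDENTITY: Node = Node { min: f64::INFINITY, max: f64::NEG_INFINITY, sum: 0.0 };

    fn merge(a: Node, b: Node) -> Node {
        Node { min: a.min.min(b.min), max: a.max.max(b.max), sum: a.sum + b.sum }
    }
}

struct SegmentTree {
    n: usize,
    nodes: Vec<Node>,
}

impl SegmentTree {
    fn new(values: &[f64]) -> Self {
        let n = values.len();
        let mut nodes = vec![Node::IDENTITY; 2 * n];
        // Leaves occupy indices n..2n.
        for (i, &v) in values.iter().enumerate() {
            nodes[n + i] = Node { min: v, max: v, sum: v };
        }
        // Build parents bottom-up: node i covers children 2i and 2i+1.
        for i in (1..n).rev() {
            nodes[i] = Node::merge(nodes[2 * i], nodes[2 * i + 1]);
        }
        SegmentTree { n, nodes }
    }

    /// Query the half-open range [l, r) iteratively, with no recursion.
    fn query(&self, mut l: usize, mut r: usize) -> Node {
        let (mut left, mut right) = (Node::IDENTITY, Node::IDENTITY);
        l += self.n;
        r += self.n;
        while l < r {
            if l & 1 == 1 { left = Node::merge(left, self.nodes[l]); l += 1; }
            if r & 1 == 1 { r -= 1; right = Node::merge(self.nodes[r], right); }
            l /= 2;
            r /= 2;
        }
        Node::merge(left, right)
    }
}

fn main() {
    let tree = SegmentTree::new(&[150.1, 150.5, 151.0, 149.8, 150.2]);
    let s = tree.query(1, 4); // covers 150.5, 151.0, 149.8
    println!("min={} max={} avg={}", s.min, s.max, s.sum / 3.0);
}
```

Because the merge visits at most two nodes per tree level, min, max, and avg for any window come from a single O(log N) pass; variance needs one extra aggregated term (e.g., a sum of squares) per node.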
To handle a high volume of concurrent requests, the service uses DashMap as its central data store. Unlike a standard HashMap protected by a single global lock, DashMap provides fine-grained, sharded locking. This allows requests for different symbols to be processed in parallel, dramatically increasing throughput.
Standard variance calculation using the E[X²] - (E[X])² formula can suffer from catastrophic cancellation—a significant loss of precision when subtracting two large, nearly-equal numbers.
To guarantee high precision even with millions of data points, this service implements Welford's online algorithm. This is a numerically stable, single-pass method that computes variance by tracking a running mean and the sum of squared differences from that mean (M2). This approach avoids catastrophic cancellation entirely, ensuring the statistical results are always accurate.
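The update step described above can be sketched as follows (a minimal illustration of Welford's algorithm, not the service's exact code):

```rust
// Welford's online algorithm: a single-pass, numerically stable way to
// compute variance by tracking a running mean and M2, the running sum of
// squared deviations from the current mean.
#[derive(Default)]
struct Welford {
    count: u64,
    mean: f64,
    m2: f64,
}

impl Welford {
    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;          // deviation from the old mean
        self.mean += delta / self.count as f64;
        let delta2 = x - self.mean;         // deviation from the updated mean
        self.m2 += delta * delta2;          // M2 accumulates delta * delta2
    }

    /// Population variance: M2 / n.
    fn variance(&self) -> f64 {
        if self.count == 0 { 0.0 } else { self.m2 / self.count as f64 }
    }
}

fn main() {
    let mut w = Welford::default();
    for x in [1.0, 2.0, 3.0, 4.0] {
        w.update(x);
    }
    println!("mean={} var={}", w.mean, w.variance()); // mean=2.5 var=1.25
}
```

Because every term in `m2` is a product of small deviations rather than a difference of two large sums, the subtraction that causes catastrophic cancellation in the naive formula never occurs.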
This service includes several features essential for deployment in a production environment.
- Configuration Management: All aspects of the service (server, logging, memory management) are configured via `Config.toml` and can be overridden with environment variables (e.g., `APP_SERVER__PORT=9090`, `APP_STORE__MAX_DATA_RETENTION=200000000`), managed by the `figment` crate.
- Structured Logging: Uses the `tracing` framework to emit structured (JSON) logs to both the console and a daily rotating file (`logs/app.log`), making them easy to analyze.
- Graceful Shutdown: Listens for termination signals (`Ctrl+C` or `SIGTERM`) and shuts down gracefully, allowing in-flight requests to complete.
- Health Check: Provides a `GET /health` endpoint for load balancers and container orchestrators (like Kubernetes) to verify service health.
The service includes sophisticated memory management to handle massive data volumes while maintaining predictable performance.
- Retention Limit: Each symbol retains up to 150 million data points before cleanup is triggered.
- Cleanup Strategy: When the limit is reached, the system keeps only the most recent 100 million values.
- Cleanup-Before-Insertion: Cleanup runs before a new batch is inserted, preventing temporary memory spikes.
- Bounded Memory: Usage never exceeds the configured limits, so resource consumption stays predictable and memory growth is never unbounded.
- Statistical Accuracy: Cleanup preserves the integrity of statistical calculations.
- High Availability: Cleanup runs without interrupting the service.
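The cleanup-before-insertion policy above can be sketched as follows (function and parameter names are hypothetical; the service's real store operates on its segment tree rather than a plain `Vec`):

```rust
// Sketch: before appending a batch, if the buffer would exceed the
// retention limit, drop the oldest values so that only `keep_count`
// remain, then insert. Trimming first avoids a temporary spike above
// the configured bound.
fn insert_batch(buf: &mut Vec<f64>, batch: &[f64], max_retention: usize, keep_count: usize) {
    if buf.len() + batch.len() > max_retention {
        // Drop everything except the most recent `keep_count` values.
        let excess = buf.len().saturating_sub(keep_count);
        buf.drain(..excess);
    }
    buf.extend_from_slice(batch);
}

fn main() {
    let mut buf: Vec<f64> = (0..10).map(|i| i as f64).collect();
    // With max_retention = 10 and keep_count = 5, the 5 oldest values
    // are evicted before the 2-element batch is appended.
    insert_batch(&mut buf, &[100.0, 101.0], 10, 5);
    println!("{:?}", buf);
}
```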
The cleanup behavior is fully configurable via `Config.toml`:

```toml
[store]
# Maximum data retention per symbol before cleanup is triggered (150M values)
max_data_retention = 150000000
# Number of values to keep after cleanup (100M values)
cleanup_keep_count = 100000000
# Maximum number of unique symbols that can be tracked simultaneously
max_symbols = 10
# Initial capacity for the segment tree (per symbol)
starting_capacity = 1000000
```

These settings can also be overridden with environment variables:
```sh
export APP_STORE__MAX_DATA_RETENTION=200000000
export APP_STORE__CLEANUP_KEEP_COUNT=120000000
export APP_STORE__MAX_SYMBOLS=20
```

The project employs a comprehensive, multi-layered testing strategy to ensure reliability and correctness.
- Unit Tests: Located alongside the source code in `src/`, these test individual components like the `SegmentTree` in isolation.
- Integration Tests: Located in the `tests/` directory, these validate the entire service's API, including error handling and edge cases.
- Stress Test: A dedicated, resource-intensive integration test (marked as `#[ignore]`) verifies correctness and memory management under extreme load with 150 million data points.
- Memory Management Tests: Comprehensive tests verify cleanup behavior, memory bounds, and data integrity during high-volume operations.
- Performance Benchmarks: Located in the `benches/` directory, these use the `Criterion` framework to provide statistically rigorous performance measurements of key API endpoints.
- The Rust toolchain (install via [rustup.rs](https://rustup.rs))
The service uses `Config.toml` for configuration. Key settings include:

```toml
[server]
host = "127.0.0.1"
port = 8080

[log]
level = "info"

[store]
# Memory management settings
max_data_retention = 150000000   # Cleanup trigger (150M values)
cleanup_keep_count = 100000000   # Values kept after cleanup (100M)
max_symbols = 10                 # Maximum tracked symbols
starting_capacity = 1000000      # Initial segment tree capacity
```

All settings can be overridden with environment variables using the `APP_` prefix:
```sh
export APP_SERVER__PORT=9090
export APP_STORE__MAX_DATA_RETENTION=200000000
```

- Clone the repository and navigate to the root directory.
- Build the service in release mode for maximum optimization:

```sh
cargo build --release
```

- Run the compiled binary:

```sh
./target/release/hft-service
```

The service will start on the port specified in `Config.toml` (default `8080`).
```sh
# Run all standard unit and integration tests
cargo test --release

# Run the ignored, resource-intensive stress test (150M data points)
# Warning: This test requires significant memory and may take several minutes
cargo test --release test_large_data_and_memory_cleanup -- --ignored

# Run memory management tests (smaller scale, runs with regular tests)
cargo test test_large_data_volumes_memory_management --release

# Run the performance benchmarks
cargo bench
```

The service includes a comprehensive stress test that:
- Adds 150 million data points to trigger automatic cleanup
- Verifies memory limits are never exceeded
- Confirms statistical accuracy after cleanup operations
- Tests both small and large batch operations
To run the full stress test:
```sh
# This test uses ~1.2GB of memory and takes 5-10 minutes
cargo test --release test_large_data_and_memory_cleanup -- --ignored --nocapture
```

Verifies that the service is running and ready to accept traffic.

- Endpoint: `GET /health`
- Success Response (`200 OK`):

```json
{ "status": "ok" }
```
Adds a batch of consecutive trading prices for a specific symbol. All prices must be non-negative.
- Endpoint: `POST /add_batch/`
- Body: A JSON object containing a `symbol` and an array of `values`.
- Example `curl`:

```sh
curl -X POST http://localhost:8080/add_batch/ \
  -H "Content-Type: application/json" \
  -d '{"symbol": "ABC-USD", "values": [150.1, 150.5, 151.0, 149.8, 150.2, 151.1, 151.2, 152.0, 151.5, 151.9]}'
```
Provides statistical analysis on the last 1e{exponent} data points for a given symbol.
- Endpoint: `GET /stats/`
- Query Parameters:
  - `symbol` (string): The financial instrument's identifier.
  - `exponent` (integer): A number from 1 to 8.
- Example `curl`: Get stats for the last 1e1 (10) data points of "ABC-USD" added in the previous example.

```sh
curl "http://localhost:8080/stats/?symbol=ABC-USD&exponent=1"
```

- Success Response (`200 OK`): The following values are calculated from the 10 data points in the `add_batch` example above.

```json
{
  "min": 149.8,
  "max": 152.0,
  "last": 151.9,
  "avg": 150.93,
  "var": 0.5380099999999984
}
```