A complete data pipeline for analyzing scamper network measurement data using ClickHouse and Grafana.
This AIMS-18 Hackathon project demonstrates:
- Data Loading: Convert scamper warts files to ClickHouse format
- Data Analysis: SQL-based network measurement analytics
- Data Visualization: RTT and DNS robustness analysis with Grafana
pip install -r requirements.txt
./setup.sh
Option A: Mock Data (Quick Demo)
python data/generate_mock_data_simple.py
Option B: Real Scamper Data
# Requires scamper daemon running
python Scamper/generate_scamper_data.py /var/run/scamper 8.8.8.8
python Clickhouse/warts2clickhouse.py *.warts
- Grafana Dashboard: http://localhost:3000 (admin/admin)
- ClickHouse Query Interface: http://localhost:8123/play
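Port 8123 also serves ClickHouse's HTTP interface, so the same data can be queried from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_query_url` and `run_query` are illustrative and not part of this project. The SQL is percent-encoded because spaces in a raw URL are not valid.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: ClickHouse is reachable on the local port used in this README.
CLICKHOUSE_URL = "http://localhost:8123/"


def build_query_url(sql: str, base_url: str = CLICKHOUSE_URL) -> str:
    """Return a URL carrying the SQL, percent-encoded, in the `query` parameter."""
    return base_url + "?" + urlencode({"query": sql}, quote_via=quote)


def run_query(sql: str, base_url: str = CLICKHOUSE_URL) -> str:
    """Send a SELECT over HTTP GET and return the raw text response."""
    with urlopen(build_query_url(sql, base_url), timeout=10) as response:
        return response.read().decode("utf-8")
```

With the containers running, `print(run_query("SELECT count() FROM ping_measurements"))` would print the row count as plain text.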
AIMS-18/
├── Clickhouse/
│ ├── clickhouse-config.xml # ClickHouse configuration
│ └── schema.sql # Table schema definitions
│
├── data/
│ ├── generate_mock_data_simple.py # Generate mock test data and insert into ClickHouse
│ ├── ping_192.172.226.122.json # Sample JSON (converted with sc_warts2json)
│ └── ping_192.172.226.122.warts # Sample warts file
│
├── docker-compose.yml # Docker services configuration (ClickHouse, Grafana, etc.)
│
├── Grafana/
│ └── provisioning/
│     └── datasources/
│         └── clickhouse.yml # Grafana ClickHouse datasource setup
│
├── README.md # Project overview and usage instructions
├── requirements.txt # Python dependencies
│
├── Scamper/
│ ├── warts2clickhouse.py # Core script: parses warts and inserts into ClickHouse
│ └── generate_scamper_data.py # Generate real Scamper measurement data
│
└── setup.sh # One-click environment setup script
- ping_measurements: RTT statistics and packet loss
- traceroute_measurements: Path discovery results
- traceroute_hops: Per-hop detailed information
- dns_measurements: DNS query performance
Scamper → .warts files → warts2clickhouse.py → ClickHouse → Grafana
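The conversion step in the middle of this flow can be sketched as follows. This is not the code in warts2clickhouse.py; it is a hypothetical illustration that assumes a ping record shaped like sc_warts2json output (fields `dst`, `start.sec`, `ping_sent`, and `responses[].rtt` are assumptions) and produces the columns used by the queries in this README (`timestamp`, `destination`, `rtt_avg`, `packet_loss`). Packet loss is kept as a fraction between 0 and 1.

```python
from datetime import datetime, timezone


def ping_record_to_row(record: dict) -> dict:
    """Map one ping record (assumed JSON layout) to a ping_measurements-style row."""
    sent = record["ping_sent"]
    # Keep only the responses that actually carry an RTT value (milliseconds).
    rtts = [r["rtt"] for r in record.get("responses", []) if "rtt" in r]
    return {
        # Store the start time as an explicit UTC timestamp.
        "timestamp": datetime.fromtimestamp(record["start"]["sec"], tz=timezone.utc),
        "destination": record["dst"],
        "rtt_avg": sum(rtts) / len(rtts) if rtts else None,
        # Fraction of probes that received no reply.
        "packet_loss": 1 - len(rtts) / sent if sent else None,
    }


# Hypothetical record for illustration only; not taken from the sample files.
example = {
    "dst": "192.0.2.1",
    "start": {"sec": 0},
    "ping_sent": 4,
    "responses": [{"rtt": 10.0}, {"rtt": 20.0}, {"rtt": 30.0}],
}
row = ping_record_to_row(example)
```

Here three of the four probes were answered, so `row["rtt_avg"]` is 20.0 and `row["packet_loss"]` is 0.25.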
RTT Time Series Analysis:
SELECT
toStartOfMinute(timestamp) AS time,
avg(rtt_avg) AS value
FROM ping_measurements
WHERE timestamp >= today() - 2
GROUP BY time
ORDER BY time
Packet Loss by Target:
SELECT
IPv6NumToString(destination) as target,
avg(rtt_avg) as avg_rtt,
avg(packet_loss) * 100 as loss_percent
FROM ping_measurements
WHERE timestamp >= today() - 1
GROUP BY target
./warts2clickhouse.py your_file.warts
# View imported data
curl -G "http://localhost:8123/" --data-urlencode "query=SELECT * FROM ping_measurements LIMIT 10"
# For Hackathon demonstration
./generate_mock_data_simple.py
# Check data count
curl -G "http://localhost:8123/" --data-urlencode "query=SELECT count() FROM ping_measurements"
- Complete Pipeline: End-to-end solution from data generation to visualization
- Real-world Application: Based on CAIDA scamper for actual network measurements
- High Performance: ClickHouse columnar storage for large-scale analytics
- Docker Ready: One-click deployment with Docker Compose
- Production Ready: Supports both IPv4 and IPv6 and handles time zones correctly
- Data Collection: Scamper (CAIDA)
- Database: ClickHouse (Columnar storage)
- Visualization: Grafana
- Deployment: Docker + Docker Compose
- Language: Python 3.8+
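The example query that calls `IPv6NumToString(destination)` suggests that addresses are stored in a 16-byte form. One common way to support both IPv4 and IPv6 in such a column is to store IPv4 addresses as IPv4-mapped IPv6 (`::ffff:a.b.c.d`). The sketch below shows that normalization with the standard `ipaddress` module; the function name is hypothetical, and whether the project's schema uses exactly this representation is an assumption.

```python
import ipaddress

# Prefix of an IPv4-mapped IPv6 address: 80 zero bits followed by 16 one bits.
IPV4_MAPPED_PREFIX = b"\x00" * 10 + b"\xff\xff"


def to_ipv6_bytes(addr: str) -> bytes:
    """Return the 16-byte form of an IPv4 or IPv6 address string."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # Embed the 4 IPv4 bytes behind the ::ffff:0:0/96 prefix.
        return IPV4_MAPPED_PREFIX + ip.packed
    return ip.packed


mapped = ipaddress.IPv6Address(to_ipv6_bytes("8.8.8.8"))
```

`mapped.ipv4_mapped` recovers the original `8.8.8.8`, while a native IPv6 address passes through unchanged.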
Clear data:
docker exec scamper-clickhouse clickhouse-client --query "TRUNCATE TABLE ping_measurements"
Check ClickHouse status:
curl http://localhost:8123/ping
View Grafana logs:
docker logs scamper-grafana
Created for AIMS-18 Hackathon - A modern network measurement data analysis solution.