Commit d1fb8c9

Update minute replication job
1 parent 8f0541d commit d1fb8c9

3 files changed: 622 additions and 205 deletions

images/replication-job/Dockerfile

Lines changed: 42 additions & 44 deletions
@@ -1,56 +1,54 @@
-FROM debian:bookworm
+FROM debian:bookworm AS builder
 
+# Setup PostgreSQL repository and install build dependencies
 RUN apt-get update \
+&& apt-get install -y --no-install-recommends ca-certificates curl gnupg \
+&& . /etc/os-release \
+&& echo "deb http://apt.postgresql.org/pub/repos/apt ${VERSION_CODENAME}-pgdg main" > /etc/apt/sources.list.d/pgdg.list \
+&& curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor -o /etc/apt/trusted.gpg.d/postgresql.gpg \
+&& apt-get update \
 && apt-get install -y --no-install-recommends \
-ca-certificates \
-git \
-curl \
-gzip \
-bash \
-procps \
-openjdk-17-jre-headless \
-osmosis \
-postgresql-client \
-# Build deps for osmdbt (keep if you really need osmdbt built in this image)
-build-essential \
-cmake \
-gettext-base \
-libboost-program-options-dev \
-libbz2-dev \
-libexpat1-dev \
-libosmium2-dev \
-libprotozero-dev \
-libyaml-cpp-dev \
-libpqxx-dev \
-pandoc \
-postgresql-common \
-postgresql-server-dev-17 \
-zlib1g-dev \
-&& update-ca-certificates \
-&& rm -rf /var/lib/apt/lists/*
-
-RUN git clone https://github.com/openstreetmap/osmdbt.git /osmdbt && cd /osmdbt && git checkout v0.9
-WORKDIR /osmdbt
-RUN mkdir -p build && cd build && cmake .. && make
-RUN cd /osmdbt/build && find . -type f -executable -name "osmdbt*" -exec cp {} /usr/local/bin/ \; || true
-ENV PATH="/osmdbt/build:/usr/local/bin:$PATH"
+git build-essential cmake pandoc \
+libosmium2-dev libprotozero-dev libyaml-cpp-dev \
+libpqxx-6.4 libpqxx-dev \
+libboost-program-options-dev libboost-system-dev libboost-filesystem-dev \
+postgresql-server-dev-17 libbz2-dev libexpat1-dev zlib1g-dev \
+&& rm -rf /var/lib/apt/lists/*
 
-# --- Build and store the osm_logical.so plugin for PostgreSQL ---
-RUN cd /osmdbt/postgresql-plugin && \
-mkdir -p build && \
-cd build && \
-cmake .. && \
-make && \
-# Copy any .so file produced to a neutral path that is easy to extract later
-mkdir -p /usr/local/lib/osmdbt-plugin && \
-find . -name "*.so" -exec cp {} /usr/local/lib/osmdbt-plugin/ \;
+# Build osmdbt
+RUN git clone https://github.com/openstreetmap/osmdbt.git /osmdbt \
+&& cd /osmdbt && git checkout v0.9 \
+&& mkdir -p build && cd build \
+&& cmake -DBUILD_PLUGIN=OFF .. && make
+
+RUN mkdir -p /tmp/runtime-libs \
+&& find /usr/lib/x86_64-linux-gnu /usr/local/lib -name "libosmium*.so*" -exec cp {} /tmp/runtime-libs/ \; 2>/dev/null || true \
+&& find /usr/lib/x86_64-linux-gnu /usr/local/lib -name "libprotozero*.so*" -exec cp {} /tmp/runtime-libs/ \; 2>/dev/null || true
+
+# ============================================================================
+# Runtime stage
+# ============================================================================
+FROM debian:bookworm
 
+# Install runtime dependencies
 RUN apt-get update \
 && apt-get install -y --no-install-recommends \
-awscli \
+ca-certificates curl gzip bash procps postgresql-client \
+openjdk-17-jre-headless osmosis awscli \
+libpqxx-6.4 libyaml-cpp0.7 \
+libboost-program-options1.74.0 libboost-system1.74.0 libboost-filesystem1.74.0 \
+libbz2-1.0 libexpat1 zlib1g \
 && rm -rf /var/lib/apt/lists/*
 
-COPY start.sh /start.sh
+# Copy binaries and third-party libraries (libosmium, libprotozero from builder)
+COPY --from=builder /osmdbt/build /osmdbt/build
+COPY --from=builder /usr/local/bin /usr/local/bin
+COPY --from=builder /tmp/runtime-libs/ /usr/local/lib/
+
+RUN ldconfig
 
+ENV PATH="/osmdbt/build:/usr/local/bin:$PATH"
+
+COPY start.sh /start.sh
 ENTRYPOINT ["/bin/bash","-c"]
 CMD ["/start.sh"]
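
The runtime stage receives the osmdbt build output and a few libraries copied from the builder, so a binary could end up with unresolved shared libraries if a runtime package name drifts. A minimal sketch of a check that could run at the end of the build, assuming `ldd` is available in the runtime image; the helper names `missing_libs` and `check_binary` are hypothetical and not part of this commit:

```sh
#!/bin/sh
# Hypothetical helpers, not part of the commit.

# Read `ldd` output on stdin and print the name of every library
# that the dynamic linker could not resolve.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Return non-zero if `ldd` reports unresolved shared libraries
# for the binary given as the first argument.
check_binary() {
  missing=$(ldd "$1" 2>/dev/null | missing_libs)
  if [ -n "$missing" ]; then
    echo "unresolved libraries for $1:" >&2
    echo "$missing" >&2
    return 1
  fi
  return 0
}
```

For example, `check_binary "$(command -v osmdbt-get-log)"` in a final `RUN` step would fail the build if one of the runtime packages such as `libpqxx-6.4` were missing.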

images/replication-job/README.md

Lines changed: 86 additions & 19 deletions
@@ -1,29 +1,96 @@
-### Delta replications job container
+# OSM Replication Job Container
 
-This contain is responsible for creating the delta replication files it may be set up by minute, hour, etc. Those replications files will be uploaded to a repository in AWS or Google Storage, depends on what you are using.
+This container is responsible for creating OSM delta replication files using [osmdbt](https://github.com/openstreetmap/osmdbt) tools. It generates replication files every minute and uploads them to S3 with integrity verification.
 
-### Configuration
+## What it does
 
-In order to run this container we need environment variables, these can be found in the following files👇:
+The container runs a continuous replication process that:
 
-- [.env.db.example](./../../envs/.env.db.example)
-- [.env.db-utils.example](./../../envs/.env.db-utils.example)
-- [.env.cloudprovider.example](./../../envs/.env.cloudprovider.example)
+1. **Executes replication cycle every minute**:
+- Runs `osmdbt-get-log` to fetch changes from PostgreSQL logical replication
+- Runs `osmdbt-create-diff` to generate `.osc.gz` files and `state.txt` files
+
+2. **Uploads files to S3**:
+- Uploads `.osc.gz` files (e.g., `870.osc.gz`)
+- Uploads general `state.txt` (controls replication sequence)
+- Uploads specific `state.txt` files (e.g., `870.state.txt` for `870.osc.gz`)
+- All files are verified for integrity before upload
+
+3. **Manages replication state**:
+- Uses local `state.txt` to control replication sequence
+- Recovers `state.txt` from S3 on container restart
+- Continues from the last known sequence automatically
+
+4. **Error handling and monitoring**:
+- Verifies file integrity (gzip test, size checks)
+- Sends Slack notifications for errors and corruption
+- Automatically cleans up incomplete/orphaned files (`.lock`, `.log`, corrupted files)
+- Removes orphaned state files without matching `.osc.gz` files
+
+5. **Self-healing**:
+- Detects and removes corrupted files
+- Regenerates files if corruption is detected
+- Maintains sequence continuity
+
+## Technology Stack
+
+- **osmdbt v0.9**: OSM Database Replication Tools from OpenStreetMap
+- **Base Image**: Debian Bookworm (for library version consistency)
+- **PostgreSQL**: Uses logical replication slot for change tracking
+- **AWS S3**: For storing replication files
+- **Slack**: Optional notifications for errors
+
+## Configuration
+
+### Required Environment Variables
+
+The container requires environment variables from these files:
+
+- [.env.db.example](../../envs/.env.db.example) - Database connection
+- [.env.db-utils.example](../../envs/.env.db-utils.example) - Database utilities
+- [.env.cloudprovider.example](../../envs/.env.cloudprovider.example) - Cloud storage configuration
 
 **Note**: Rename the above files as `.env.db`, `.env.db-utils` and `.env.cloudprovider`
 
-#### Running replication-job container
+### Key Environment Variables
+
+#### Database Configuration
+- `POSTGRES_HOST` - PostgreSQL hostname
+- `POSTGRES_PORT` - PostgreSQL port (default: 5432)
+- `POSTGRES_DB` - Database name
+- `POSTGRES_USER` - Database user
+- `POSTGRES_PASSWORD` - Database password
+- `REPLICATION_SLOT` - Logical replication slot name (default: `osm_repl`)
+
+#### S3 Configuration
+- `CLOUDPROVIDER` - Cloud provider (default: `aws`)
+- `AWS_S3_BUCKET` - S3 bucket name for replication files
+- `REPLICATION_FOLDER` - Folder path in S3 bucket (default: `replication`)
+
+#### Slack Notifications (Optional)
+- `ENABLE_SEND_SLACK_MESSAGE` - Enable Slack notifications (default: `false`)
+- `SLACK_WEBHOOK_URL` - Slack webhook URL for notifications
+- `ENVIROMENT` - Environment name for notifications (e.g., `production`, `staging`)
+
+#### Working Directory
+- `WORKING_DIRECTORY` - Working directory path (default: `/mnt/data`)
+
+## Running the Container
+
+### Using Docker Compose
+
+```sh
+docker-compose run replication-job
+```
+
+### Using Docker
 
 ```sh
-# Docker compose
-docker-compose run replication-job
-
-# Docker
-docker run \
---env-file ./envs/.env.db \
---env-file ./envs/.env.replication-job \
---env-file ./envs/.env.cloudprovider \
--v ${PWD}/data/replication-job-data:/mnt/data \
---network osm-seed_default \
--it osmseed-replication-job:v1
+docker run \
+--env-file ./envs/.env.db \
+--env-file ./envs/.env.replication-job \
+--env-file ./envs/.env.cloudprovider \
+-v ${PWD}/data/replication-job-data:/mnt/data \
+--network osm-seed_default \
+-it osmseed-replication-job:v1
 ```
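
The README added in this commit says files are verified with a gzip test and size checks before upload, but the script that performs the check is not part of the hunks shown here. A minimal sketch of such a check, assuming a hypothetical helper `verify_osc_gz` and a hypothetical threshold `MIN_OSC_BYTES`; the actual `start.sh` may do this differently:

```sh
#!/bin/sh
# Hypothetical sketch; the function name and the threshold are assumptions.

# Smallest size in bytes accepted for a compressed diff file.
MIN_OSC_BYTES="${MIN_OSC_BYTES:-20}"

# Return 0 only if the file exists, is at least MIN_OSC_BYTES long,
# and passes `gzip -t` (a full decompression test that writes no output).
verify_osc_gz() {
  file="$1"
  [ -f "$file" ] || return 1
  size=$(wc -c < "$file" | tr -d ' ')
  [ "$size" -ge "$MIN_OSC_BYTES" ] || return 1
  gzip -t "$file" 2>/dev/null
}
```

A caller could run `verify_osc_gz 870.osc.gz` and upload only on success; on failure, the behavior the README describes is to remove the file and regenerate it.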
