This document outlines some general guidelines for running a secure Docker server, and specific notes regarding the containers hosted by d.rymcg.tech. This guide is not comprehensive. The standard disclaimer from the LICENSE is worth repeating:
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
You should start by reading the official Docker security guide.
You can install Docker on pretty much any Linux server, but some hosts are better than others.
In addition to the standard criteria of location, cost, performance, etc., you should consider whether your host has the following features:
- Hosted Firewall
  - A Docker server will manage the iptables of the entire host machine, and this can have unintended consequences when it comes to protecting the open ports of the Docker server.
  - You should not use a firewall manager like ufw or firewalld on the same host with Docker. These tools will deceive you, and Docker may publish ports you thought were blocked.
  - It is best to think of iptables on the Docker server as a routing table, and not a security device.
  - Instead of relying on the local server firewall, you should use the external firewall that is provided by your hosting provider. Usually this is a web dashboard to select the incoming ports and default rules.
  - A basic firewall ruleset should allow:
    - Port 22 for admin SSH access
    - Port 80 for HTTP redirection to HTTPS
    - Port 443 for all web (HTTPS) traffic
    - Deny all other ports, unless you choose to open something else.
  - If you're stuck with a host without an external firewall, consider using _docker_vm or, if you want to stick with native Docker, use chaifeng/ufw-docker.
- Nested Virtualization
  - Although not necessary for a normal Docker installation, you may want to run virtual machines with KVM, or microVMs in containers with Firecracker. If your VPS is already virtualized, it needs the Nested Virtualization kernel feature to run VMs inside of VMs.
# Check if nested virtualization is supported:
## Intel:
cat /sys/module/kvm_intel/parameters/nested
## AMD:
cat /sys/module/kvm_amd/parameters/nested
  - Nested virtualization can be useful for creating multiple Docker VMs inside of one VPS.
  - This could be useful on a VPS that doesn't have an external firewall: you can run ufw on the host VPS, and then run Docker inside of a nested KVM virtual machine.
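The two checks above can be folded into one helper. This is only a sketch: the function name nested_virt_status and its optional sysfs-root argument are hypothetical, added so the logic can be tried against a scratch directory. It assumes the parameter file contains Y/N or 1/0, which varies by module and kernel version.

```shell
#!/bin/sh
# Sketch: report nested virtualization support for whichever KVM module is loaded.
# nested_virt_status and its optional root argument are hypothetical names.
nested_virt_status() {
    root="${1:-/sys/module}"
    for mod in kvm_intel kvm_amd; do
        param="${root}/${mod}/parameters/nested"
        if [ -r "$param" ]; then
            # Kernels report either Y/N or 1/0 here.
            case "$(cat "$param")" in
                Y|1) echo "${mod}: nested virtualization enabled"; return 0 ;;
                *)   echo "${mod}: nested virtualization disabled"; return 1 ;;
            esac
        fi
    done
    # Neither module exposes the parameter (KVM not loaded, or not a KVM host).
    echo "no KVM module parameters found"
    return 2
}
```

Calling nested_virt_status with no arguments reads the real /sys/module tree. The exit status is 0 when nesting is enabled, 1 when it is disabled, and 2 when neither module is present.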
- Size
  - Don't get a server that is too big for your needs. Get a VPS that is sized exactly for one instance of Docker, where you can choose exactly how much RAM and disk you need. Otherwise you may be tempted to install other things on your server besides Docker and this complicates the security of the server.
  - To quote the Docker security guide:
    "If you run Docker on a server, it is recommended to run exclusively Docker on the server, and move all other services within containers controlled by Docker. Of course, it is fine to keep your favorite admin tools (probably at least an SSH server), as well as existing monitoring/supervision processes, such as NRPE and collectd."
- Configure /etc/ssh/sshd_config, or add files in /etc/ssh/sshd_config.d, and ensure the following options are set:
  - PermitRootLogin prohibit-password
  - PubkeyAuthentication yes
  - PasswordAuthentication no
- Consider disabling host keys you do not need.
  - Prefer using rsa or ed25519 keys.
  - Disable older dsa and ecdsa key types.
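As an illustration of the drop-in approach, the three options can be written to a single file under /etc/ssh/sshd_config.d. This is a minimal sketch: the function name write_sshd_dropin and the file name 10-hardening.conf are hypothetical, and it assumes your main sshd_config includes *.conf files from that directory (check for an Include line).

```shell
#!/bin/sh
# Sketch: write the three recommended options into an sshd drop-in file.
# write_sshd_dropin and 10-hardening.conf are hypothetical names.
# The optional argument overrides the target directory (handy for a dry run).
write_sshd_dropin() {
    dir="${1:-/etc/ssh/sshd_config.d}"
    cat > "${dir}/10-hardening.conf" <<'EOF'
PermitRootLogin prohibit-password
PubkeyAuthentication yes
PasswordAuthentication no
EOF
}
```

Run it as root, then validate the configuration with sshd -t before reloading the SSH service. Keep an existing session open until you have confirmed that key-based login still works.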
- Install Docker on Debian or Ubuntu.
  - Follow the exact instructions from docker.com. Do not install Docker from your regular package manager repository; use the docker.com package repository instead.
  - If you are proficient with a different distro, go ahead.
Docker has two non-default settings that could improve the security of your containers by mapping nonexistent host UID ranges to the container space. These two settings are User Namespace mode and Rootless mode.
However, the current Traefik configuration is set up to use the host network, which is incompatible with such settings, so they will not be considered here.
Nonetheless, you can experiment by turning User Namespace mode on, using the root Makefile:
# cd ~/git/vendor/enigmacurry/d.rymcg.tech
# turn it on:
make userns-remap
# turn it off:
make userns-remap-off
# check the current setting:
make userns-remap-check
These commands will automatically edit the Docker server's
/etc/docker/daemon.json and restart Docker.
Docker Engine runs as root (in the default configuration). By default, all containers run as root too. Root inside the container is the same UID as root outside the container (UID=0). Docker tries to do some minimal sandboxing, but the fact remains that if you run a Docker container without any consideration for limiting the default privileges, your attack surface is far larger than necessary.
When creating Docker containers, you should limit the privileges and capabilities granted to them.
In a Dockerfile you can add an unprivileged user to use as the default user:
FROM alpine:3
ARG USER_UID=54321
# Create an unprivileged user with no password (-D) and a fixed UID (-u):
RUN adduser -D -u ${USER_UID} foo
USER foo
Or you can run an image with an alternate UID:
# Example docker command:
docker run --rm -it --user 1234:1234 alpine:3 id -u
# Example docker-compose.yaml
version: "3.9"
services:
  thing:
    image: alpine:3
    user: ${USER_UID:-54321}:${USER_GID:-54321}
By default, containers are given only a limited set of default capabilities (see the capabilities(7) man page for the full list of Linux capabilities). This means that root inside the container is not quite as powerful as root outside the container, but still has more privileges than necessary.
You can tell a container to start with a different list of privileges than the default via the --cap-drop, --cap-add, and --security-opt flags.
Here is an example that drops ALL privileges:
services:
  thing:
    # .....
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
cap_drop: ['ALL'] tells the container to drop all privileges, even the default ones that Docker normally gives. The no-new-privileges:true flag disallows acquiring any privileges not granted at the start. For example, a binary could use setcap to enable a root capability for a non-root user, for that program only (similar in concept to setuid, but with more fine-grained permission control); no-new-privileges disallows access to the capability that setcap requested.
Unless your container works entirely without root access, this list is likely too restrictive. You will need to use cap_add to add some of the capabilities back. A good strategy is to drop ALL capabilities, and then add all of them back, explicitly. Then you can comment out the capabilities that you don't need, testing by process of elimination whether the container behaves properly without them.
NOTE: the following example, with all capabilities added, is essentially the same as running your container with privileged: true:
services:
  thing:
    # .....
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      ### ALL capabilities back explicitly:
      ## Try to comment most of these out and see how your container behaves
      - SETGID
      - SETUID
      - CHOWN
      - DAC_OVERRIDE
      - SYS_CHROOT
      - AUDIT_WRITE
      - FOWNER
      - AUDIT_CONTROL
      - AUDIT_READ
      - BLOCK_SUSPEND
      - DAC_READ_SEARCH
      - FSETID
      - IPC_LOCK
      - IPC_OWNER
      - KILL
      - LEASE
      - LINUX_IMMUTABLE
      - MAC_ADMIN
      - MAC_OVERRIDE
      - MKNOD
      - NET_ADMIN
      - NET_BIND_SERVICE
      - NET_BROADCAST
      - NET_RAW
      - SETFCAP
      - SETPCAP
      - SYS_ADMIN
      - SYS_BOOT
      - SYSLOG
      - SYS_MODULE
      - SYS_NICE
      - SYS_PACCT
      - SYS_PTRACE
      - SYS_RAWIO
      - SYS_RESOURCE
      - SYS_TIME
      - SYS_TTY_CONFIG
      - WAKE_ALARM
You should never run a container with --privileged or privileged: true. Instead, you should figure out the exact capabilities it needs by process of elimination.
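For quick experiments outside of a compose file, the same drop-everything-then-add-back approach can be expressed as docker run flags. The helper below is a sketch, and hardened_flags is a hypothetical name. It only prints flags, so you can shorten the capability list between test runs.

```shell
#!/bin/sh
# Sketch: print docker run flags that drop ALL capabilities, forbid new
# privileges, and add back only the capabilities named as arguments.
# hardened_flags is a hypothetical helper name.
hardened_flags() {
    printf '%s' '--security-opt no-new-privileges:true --cap-drop ALL'
    for cap in "$@"; do
        printf ' --cap-add %s' "$cap"
    done
    printf '\n'
}

# Example (requires Docker; left commented out so the snippet has no side effects):
# docker run --rm $(hardened_flags CHOWN SETUID SETGID) alpine:3 id
```

The command substitution is deliberately left unquoted in the example so that the shell splits the output into separate arguments.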
Here is a description of the default capabilities granted to Docker containers and recommendations on when to drop them:
- CHOWN
  - Allows the container to change file ownership.
  - Drop this if the container doesn't need to modify file ownership.
- DAC_OVERRIDE
  - Bypasses file read, write, and execute permission checks, which is useful for some kinds of containers that need to process files created by other users (e.g., backup processes).
  - Drop this unless the container needs to access restricted files or directories.
- FOWNER
  - Allows bypassing permission checks on operations that affect file ownership.
  - Drop this if file permission changes are not required.
- FSETID
  - Allows setting file system IDs on a file (e.g., setting the set-user-ID or set-group-ID).
  - Drop this unless the container manages file system IDs.
- KILL
  - Allows the container to send signals to other processes (including killing them).
  - Drop this if the container doesn't need to manage other processes.
- NET_RAW
  - Allows the container to use raw sockets, which can be used for networking tools like ping.
  - Drop this if the container doesn't need to create or manipulate raw network packets.
- SETGID
  - Allows the container to set group IDs.
  - Drop this if the container doesn't need to change its group permissions.
- SETUID
  - Allows the container to set user IDs.
  - Drop this if the container doesn't need to change its user permissions.
- SETPCAP
  - Allows the container to modify process capabilities.
  - Drop this if the container doesn't need to modify its capabilities.
  - The docker flag --no-new-privileges supersedes this capability.
- NET_BIND_SERVICE
  - Allows non-root users to bind to low-numbered ports (below 1024, or whatever is set by sysctl: net.ipv4.ip_unprivileged_port_start).
  - Drop this if the container doesn't need to bind to privileged ports (e.g., running a service on port 80).
- SYS_CHROOT
  - Allows the container to use the chroot system call, which changes the root directory.
  - Drop this unless your container uses chroot to restrict the filesystem view.
- MKNOD
  - Allows the container to create special files (e.g., device files).
  - Drop this unless your container needs to create special device files.
- AUDIT_WRITE
  - Allows the container to write to the audit logs.
  - Drop this unless the container is involved in auditing operations.
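To see which capabilities a running process actually holds, you can read the CapEff bitmask from /proc/PID/status and decode it. The decoder below is a sketch using the capability bit numbers from the Linux headers (bit 0 is CHOWN, through bit 40 for CHECKPOINT_RESTORE). The function name decode_caps is hypothetical. The sample mask 00000000a80425fb is an assumption: it is a value often seen in containers started with Docker's defaults, and yours may differ.

```shell
#!/bin/sh
# Sketch: decode a capability bitmask (hex, as shown in /proc/PID/status)
# into capability names. decode_caps is a hypothetical helper name.
decode_caps() {
    mask=$(( 0x$1 ))
    bit=0
    out=""
    # Names are listed in kernel bit order, starting at bit 0.
    for name in CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID \
        SETUID SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN \
        NET_RAW IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE \
        SYS_PACCT SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME \
        SYS_TTY_CONFIG MKNOD LEASE AUDIT_WRITE AUDIT_CONTROL SETFCAP \
        MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM BLOCK_SUSPEND AUDIT_READ \
        PERFMON BPF CHECKPOINT_RESTORE; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

# Sample mask (assumed typical of Docker's default capability set):
decode_caps 00000000a80425fb
```

The sample mask decodes to the capabilities described above plus SETFCAP. Inside a container, you could feed it the real value with decode_caps "$(awk '/^CapEff/ {print $2}' /proc/1/status)".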
In the root directory of d.rymcg.tech, the Makefile has a target to
audit all containers. make audit will find all services, and print a
report of the privileges each service has, containing the following
information:
- CONTAINER: the container name.
- USER: the user and/or UID the container runs as.
- CAP_ADD: which system capabilities to add.
- CAP_DROP: which system capabilities to drop.
- SEC_OPT: which security options to enable.
- BIND_MOUNTS: the list of all bind (host) mounted paths.
- PORTS: the list of open ports.
(Scroll right, the output is very wide .....)
$ make audit | less -S
CONTAINER USER CAP_ADD CAP_DROP SEC_OPT BIND_MOUNTS PORTS
bitwarden root __ __ ["no-new-privileges:true"] [] {"80/tcp":[{"HostIp":"127.0.0.1","HostPort":"8888"}]}
cryptpad root __ __ ["no-new-privileges:true"] ["/etc/localtime:/etc/localtime:ro","/etc/timezone:/etc/timezone:ro"] {}
debian root __ __ __ ["shell-shared:/shared"] {}
drawio-drawio-1 root __ __ ["no-new-privileges:true"] [] {}
sftp-sftp-1 root ["CHOWN","DAC_OVERRIDE","SYS_CHROOT","AUDIT_WRITE","SETGID","SETUID","FOWNER"] ["ALL"] ["no-new-privileges:true"] [] {"2000/tcp":[{"HostIp":"","HostPort":"2223"}]}
syncthing root __ __ ["no-new-privileges:true"] [] {"21027/udp":[{"HostIp":"","HostPort":"21027"}],"22000/tcp":[{"HostIp":"","HostPort":"22000"}],"8384/tcp":[{"HostIp":"127.0.0.1","HostPort":"8384"}]}
thttpd-thttpd-1 54321:54321 __ __ ["no-new-privileges:true"] [] {}
traefik-traefik-1 traefik ["NET_BIND_SERVICE"] ["ALL"] __ ["/var/run/docker.sock:/var/run/docker.sock:ro"] {}
websocketd-app-1 root __ __ ["no-new-privileges:true"] [] {}
whoami_foo-whoami-1 54321:54321 __ ["ALL"] ["no-new-privileges:true"] [] {}
whoami-whoami-1 54321:54321 __ ["ALL"] ["no-new-privileges:true"] [] {}
A well-behaved process should:
- Not run as root (if it can be avoided).
- Only add the specific capabilities it needs.
- Drop ALL other capabilities.
- Set the "no-new-privileges:true" security option (assuming it does not need to acquire new privileges via a setcap or setuid binary).
You should only run Docker on a dedicated server machine (or VM). If you ignore this advice and run Docker natively on a server/workstation hybrid (a "sworkstation"), then you must take special care to avoid the following scenario:
- By default, the host network has access to every private Docker network on the system. This means that every single user on the system can bypass Traefik and access any private Docker container port. On a sworkstation with various userspace programs running, this presents a security problem.
To mitigate this issue, you can create firewall rules to prevent unauthorized users from accessing these private ports:
- Identify the Traefik user id: id -u traefik (example: 1001)
- Identify the IP address of a Docker container to test with (whoami):
docker inspect whoami-whoami-1 | jq -r '.[0].NetworkSettings.Networks.whoami_default.IPAddress'
- As a normal user, test ping to that private container address (it should work by default).
- Create /etc/nftables/isolate-docker-containers.nft with the following config, ensuring that you update all of the set configs according to your environment:
  - allowed_uids
  - container_ifaces
  - allow_from_containers_tcp
  - allow_from_containers_udp
# /etc/nftables/isolate-docker-containers.nft
table inet hostguard {
    # Set of users that should have unfiltered access to all docker containers:
    # This SHOULD include the root user (e.g., 0)
    # This MUST include the traefik user (e.g., 1001) !
    set allowed_uids { type uid; elements = { 0, 1001 } } ## <----- set your actual traefik UID here
    # All interfaces that carry container traffic to/from the host
    set container_ifaces {
        type ifname;
        flags interval;
        elements = { "docker0", "br-*", "macvlan*", "ipvlan*" };
    }
    # Set of allowed TCP/UDP host ports that the containers may access:
    # Start with an empty list, so all ports are blocked:
    set allow_from_containers_tcp { type inet_service; }
    set allow_from_containers_udp { type inet_service; }
    ## OPTIONAL: to enable certain TCP/UDP host ports to be allowed access from containers,
    ## uncomment the next two lines, and set the port elements:
    # set allow_from_containers_tcp { type inet_service; elements = { 53, 443 } }
    # set allow_from_containers_udp { type inet_service; elements = { 53 } }
    # EGRESS: block host -> containers for non-allowed UIDs
    chain egress {
        type filter hook output priority 0; policy accept;
        meta oifname @container_ifaces meta skuid != @allowed_uids drop;
    }
    # INGRESS: block containers -> host (any local IP: VPN/LAN/WAN/lo)
    chain ingress_local {
        type filter hook input priority 0; policy accept;
        ct state established,related accept;
        iifname @container_ifaces fib daddr type local tcp dport @allow_from_containers_tcp accept;
        iifname @container_ifaces fib daddr type local udp dport @allow_from_containers_udp accept;
        iifname @container_ifaces fib daddr type local drop;
    }
}
- Add this table to the main nftables config (check your OS for the proper file path):
# /etc/sysconfig/nftables.conf
include "/etc/nftables/isolate-docker-containers.nft"
- Restart nftables
sudo systemctl restart nftables
- As a normal user, test pinging the IP address of the private Docker container again (this should be blocked now).
- As the root (or traefik) user, test pinging the same address (this should not be blocked).