Introduction
Docker has profoundly impacted software development and deployment, offering containerization capabilities that simplify application management. The Linux kernel features that support Docker containers create isolated, resource-controlled, and efficient environments for applications. This blog post delves into the essential Linux kernel features at the heart of Docker containers: namespaces, cgroups, and union file systems. We’ll also examine Linux commands that provide insights into these features, giving readers a comprehensive understanding of Docker’s inner workings and the technologies that support it.
Namespaces: The Foundation of Isolation
Namespaces are a Linux kernel feature responsible for isolating various resources, such as process IDs, network interfaces, file systems, user IDs, IPC objects, and more. Docker takes advantage of namespaces to create unique environments for each container, ensuring that container processes cannot access resources or processes outside their designated namespace.
Docker uses several types of namespaces to achieve isolation:
a. PID (Process ID) Namespace: Isolates process IDs, giving each container its own process ID numbering in which the container’s first process is PID 1.
b. NET (Network) Namespace: Isolates network devices, interfaces, and routing tables, allowing each container to have its own network stack.
c. IPC (Interprocess Communication) Namespace: Isolates System V IPC objects and POSIX message queues, preventing interference between containers’ IPC mechanisms.
d. MNT (Mount) Namespace: Isolates file system mount points, enabling each container to have its own file system hierarchy.
e. UTS (UNIX Time-Sharing) Namespace: Isolates the hostname and domain name, allowing each container to have its own hostname.
f. USER Namespace: Isolates user and group IDs, so that a container can have its own user and group ID mappings. Docker does not create one by default; it is enabled through the daemon’s userns-remap option.
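The kernel exposes the namespaces of every process as symbolic links under /proc/PID/ns, and each link target has the form type:[inode]. The following is a minimal sketch, assuming a Linux host with /proc mounted; `ns_inode` and `list_ns` are helper names made up for this illustration, not standard tools.

```shell
#!/bin/sh
# Extract the inode number from a namespace link target
# such as "pid:[4026531836]" (the number here is only an example value).
ns_inode() {
    printf '%s\n' "$1" | sed 's/^[^:]*:\[\([0-9]*\)\]$/\1/'
}

# Print "type inode" for every namespace of the given PID.
# Without an argument, the current shell ($$) is inspected.
list_ns() {
    for link in /proc/"${1:-$$}"/ns/*; do
        [ -L "$link" ] || continue   # skip if the glob matched nothing
        printf '%s %s\n' "${link##*/}" "$(ns_inode "$(readlink "$link")")"
    done
}
```

Calling `list_ns` with no argument prints one line per namespace type for the current shell. Two processes are in the same namespace of a given type when the inode numbers match.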
To examine namespaces, let’s run a Docker container:
$ docker run -d --name test_container ubuntu:18.04 sleep 3600
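Now, we can list the namespaces using the `lsns` command. Run it with sudo, because an unprivileged user only sees the namespaces of their own processes:
$ sudo lsns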
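You’ll see the namespaces created for test_container. To view the processes within the container’s PID namespace, use the `nsenter` command. The -p flag enters the PID namespace and -m enters the mount namespace, so that ps reads the container’s own /proc (ps is resolved inside the container, so it must exist in the image):
$ sudo nsenter -t CONTAINER_PID -p -m ps -ef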
Replace CONTAINER_PID with the container’s process ID on the host, which can be obtained using `docker inspect`:
$ docker inspect --format '{{.State.Pid}}' test_container
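One way to confirm the isolation is to compare the namespace links of the container’s main process with those of a host process. This is a sketch, assuming it runs as root on the Docker host (reading another process’s /proc/PID/ns entries usually needs elevated privileges); `same_ns` and `report_ns` are illustrative helpers, not part of Docker or util-linux.

```shell
#!/bin/sh
# same_ns PID1 PID2 TYPE
# Exit status 0 if both processes share the namespace of the given type,
# 1 if they differ, 2 if one of the links cannot be read.
same_ns() (
    a=$(readlink "/proc/$1/ns/$3") || exit 2
    b=$(readlink "/proc/$2/ns/$3") || exit 2
    [ "$a" = "$b" ]
)

# report_ns PID1 PID2: print one line per namespace type.
report_ns() {
    for type in pid net ipc mnt uts user; do
        same_ns "$1" "$2" "$type" && rc=0 || rc=$?
        case $rc in
            0) echo "$type shared" ;;
            1) echo "$type separate" ;;
            *) echo "$type unreadable" ;;
        esac
    done
}
```

For example, `report_ns 1 CONTAINER_PID` compares the host’s init process with the container. With Docker’s default settings one would expect most types to be reported as separate, and user to be shared unless user namespace remapping is enabled.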
Control Groups (cgroups): Resource Management and Limitations
Control groups, or cgroups, are a vital Linux kernel feature that allows Docker to manage and limit resource usage by containers. Cgroups enable the allocation and monitoring of resources such as CPU, memory, and disk I/O to containers. This resource management is essential for maintaining system stability and preventing resource hogging by containers.
Cgroups consist of various subsystems (also called controllers), each responsible for managing a specific type of resource:
a. cpu: Manages CPU usage and allocation among containers.
b. memory: Monitors and limits memory usage by containers.
c. blkio: Controls and limits block device (disk) I/O usage for containers.
d. devices: Manages access to devices by containers.
e. freezer: Pauses and resumes container processes.
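The controller names above are those of cgroup v1; on hosts running cgroup v2 some of them differ (for example, blkio is replaced by io). A quick way to tell which version a host uses is to look at the file system type mounted on /sys/fs/cgroup. The sketch below assumes a `stat` that supports `-f -c %T` (GNU coreutils); the function names are made up for illustration.

```shell
#!/bin/sh
# Map the file system type of /sys/fs/cgroup to a cgroup version label.
# cgroup2fs means the unified v2 hierarchy; tmpfs means v1 (or a hybrid setup).
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Report the cgroup version for a mount point (default: /sys/fs/cgroup).
cgroup_version() {
    cgroup_version_from_fstype "$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)"
}
```

Running `cgroup_version` on the Docker host prints v1, v2, or unknown.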
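To explore cgroups, we’ll use the `cgcreate`, `cgset`, `cgexec`, and `cgclassify` commands from the libcgroup tools (packaged as cgroup-tools on Debian and Ubuntu). The parameter names used below are those of cgroup v1.
First, let’s create a control group named test_cgroup with limited CPU and memory resources: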
$ sudo cgcreate -g cpu,memory:test_cgroup
$ sudo cgset -r cpu.shares=512 test_cgroup
$ sudo cgset -r memory.limit_in_bytes=256M test_cgroup
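To make these two values concrete: the memory limit accepts K, M, and G suffixes as powers of 1024, and cpu.shares is a relative weight (the default for a group is 1024) that only matters when groups compete for CPU time. The arithmetic can be sketched as follows; `to_bytes` and `cpu_share_percent` are hypothetical helpers written for this post.

```shell
#!/bin/sh
# Convert a size with an optional K/M/G suffix (powers of 1024) to bytes.
to_bytes() {
    num=${1%[KkMmGg]}
    case "$1" in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Percentage of CPU time a group receives under full contention,
# given its own shares and the sum of shares of all competing groups
# (integer division, so the result is rounded down).
cpu_share_percent() {
    echo $(( $1 * 100 / $2 ))
}

to_bytes 256M                # prints 268435456
cpu_share_percent 512 1536   # prints 33
```

The second call models test_cgroup (512 shares) competing with one sibling group that has the default 1024 shares: 512 out of 1536 total shares is roughly a third of the CPU time. When the CPU is otherwise idle, the group can still use more.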
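Now, let’s run a Docker container within the `test_cgroup`. Wrapping `docker run` in `cgexec` would only place the Docker CLI client in the group, because the container’s processes are started by the Docker daemon. Instead, we ask the daemon to create the container’s cgroup under `test_cgroup` with the `--cgroup-parent` flag (this path form assumes the daemon uses the cgroupfs cgroup driver):
$ docker run --rm -it --cgroup-parent=/test_cgroup ubuntu:18.04
This command runs a Docker container based on the Ubuntu 18.04 image interactively, with its cgroup nested under the test_cgroup control group, which has been pre-configured with CPU and memory limitations. Once the container exits, it will be automatically removed.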
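The kernel enforces the CPU and memory limits imposed by the control group on the container’s processes, although tools such as free inside the container may still report the host’s totals. To move a running container’s main process to a different control group (which must already exist), use the `cgclassify` command: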
$ sudo cgclassify -g cpu,memory:new_cgroup CONTAINER_PID
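To check that a move took effect, you can read /proc/CONTAINER_PID/cgroup, where each line has the form hierarchy-ID:controller-list:path. The helper below is a small sketch for cgroup v1 output; `cgroup_path_for` is a name invented for this example.

```shell
#!/bin/sh
# cgroup_path_for CONTROLLER
# Read /proc/PID/cgroup-formatted lines on stdin and print the cgroup path
# of the first hierarchy that lists the named controller.
cgroup_path_for() {
    awk -v ctrl="$1" '{
        rest = $0
        sub(/^[^:]*:[^:]*:/, "", rest)   # drop "ID:controllers:" to keep the path
        split($0, field, ":")
        n = split(field[2], list, ",")   # controllers may be comma-joined
        for (i = 1; i <= n; i++)
            if (list[i] == ctrl) { print rest; exit }
    }'
}
```

For instance, `cgroup_path_for memory < /proc/CONTAINER_PID/cgroup` should print /new_cgroup after the `cgclassify` call above. On cgroup v2 the file has a single line with an empty controller list, so this helper prints nothing there.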
Union File Systems (UnionFS): Building Efficient Layered File Systems
A union file system (UnionFS) stacks several directories, called layers, and presents them as a single merged file system. Docker uses this mechanism to build images as a series of read-only layers, with a thin writable layer added on top for each container. Because shared layers are stored only once, this improves storage efficiency and reduces container creation times. Docker implements layering through storage drivers: overlay2, built on OverlayFS, is the usual default, and the older AUFS is another union file system, while the Btrfs driver achieves a similar result with copy-on-write snapshots rather than a union mount. To explore UnionFS, we’ll inspect the layers of a Docker image using the `docker history` command:
$ docker history ubuntu:18.04
You’ll see the layers and their corresponding sizes. To dive deeper, let’s inspect the container’s file system layers using the `docker inspect` command:
$ docker inspect --format '{{ json .GraphDriver }}' test_container
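This will show you the container’s file system layers. With the overlay2 driver, the output includes the read-only layers (LowerDir), the writable layer (UpperDir), and the merged view (MergedDir). Now, let’s list what has changed in the container’s file system using the `docker diff` command: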
$ docker diff test_container
This command lists the files and directories that have been added (A), changed (C), or deleted (D) in the container’s writable layer since the container was created.
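The layering idea in this section can be imitated with plain directories: a file is looked up from the top layer down, and anything present in the writable layer is either new or shadows a file below it. This is only a simulation for illustration, not how OverlayFS is implemented; `lookup` and `layer_diff` are hypothetical helpers, and deletions (whiteouts) are not modeled.

```shell
#!/bin/sh
# lookup FILE LAYER...
# Print the path of the first layer (topmost first) that contains FILE.
lookup() (
    file=$1; shift
    for layer in "$@"; do
        if [ -e "$layer/$file" ]; then
            echo "$layer/$file"
            exit 0
        fi
    done
    exit 1
)

# layer_diff UPPER LOWER
# Classify each file in the writable layer as added (A) or changed (C),
# loosely imitating the first column of `docker diff`.
layer_diff() {
    find "$1" -type f | sort | while read -r path; do
        rel=${path#"$1"/}
        if [ -e "$2/$rel" ]; then
            echo "C /$rel"
        else
            echo "A /$rel"
        fi
    done
}

# Demo with throwaway directories standing in for image and container layers.
demo=$(mktemp -d)
mkdir -p "$demo/lower/etc" "$demo/upper/etc"
echo "from image"     > "$demo/lower/etc/hostname"
echo "from container" > "$demo/upper/etc/hostname"
echo "new file"       > "$demo/upper/etc/extra"
lookup etc/hostname "$demo/upper" "$demo/lower"   # resolves to the upper copy
layer_diff "$demo/upper" "$demo/lower"
rm -rf "$demo"
```

In the demo, etc/hostname exists in both layers, so the lookup resolves to the copy in the upper layer and `layer_diff` reports it as changed, while etc/extra is reported as added.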
Conclusion
In this blog post, we have unraveled the essential Linux kernel features that power Docker containers: namespaces, cgroups, and union file systems. We have also explored Linux commands that provide insights into these features, giving readers a comprehensive understanding of Docker’s inner workings and the technologies that support it. A deep understanding of these concepts is invaluable for anyone looking to effectively harness the power of Docker and containerization.