Unraveling Docker: An In-Depth Exploration of Linux Kernel Features and Commands

Introduction

Docker has profoundly impacted software development and deployment, offering containerization capabilities that simplify application management. The Linux kernel features that support Docker containers create isolated, resource-controlled, and efficient environments for applications. This blog post delves into the essential Linux kernel features at the heart of Docker containers: namespaces, cgroups, and union file systems. We’ll also examine Linux commands that provide insights into these features, giving readers a comprehensive understanding of Docker’s inner workings and the technologies that support it.

Namespaces: The Foundation of Isolation

Namespaces are a Linux kernel feature that isolates resources such as process IDs, network interfaces, mount points, user IDs, and IPC objects. Docker gives each container its own set of namespaces, so that processes in a container see only the resources that belong to those namespaces.

Docker uses several types of namespaces to achieve isolation:

a. PID (Process ID) Namespace: Isolates process IDs, so each container has its own PID numbering, starting at 1 for the container’s main process.

b. NET (Network) Namespace: Isolates network devices, interfaces, and routing tables, allowing each container to have its own network stack.

c. IPC (Interprocess Communication) Namespace: Isolates System V IPC objects and POSIX message queues, preventing interference between containers’ IPC mechanisms.

d. MNT (Mount) Namespace: Isolates file system mount points, enabling each container to have its own file system hierarchy.

e. UTS (UNIX Time-Sharing) Namespace: Isolates the hostname and domain name, allowing each container to have its own hostname.

f. USER Namespace: Isolates user and group IDs, so a container can have its own user and group ID mappings. Docker does not enable this one by default; it has to be switched on in the daemon configuration.
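The kernel exposes a process’s namespace memberships as symbolic links under /proc/<pid>/ns, one per namespace type. The following is a minimal sketch that prints them; the function name list_ns and its optional second argument (an alternative /proc root, handy for experiments) are assumptions made for this example, not part of any tool.

```shell
# Print "<type><TAB><identifier>" for every namespace of a process.
# Each link target looks like "pid:[4026531836]"; two processes share a
# namespace exactly when the identifiers are equal.
#   $1  PID to inspect (defaults to the current shell)
#   $2  hypothetical override for the /proc root (defaults to /proc)
list_ns() (
    pid="${1:-$$}"
    proc_root="${2:-/proc}"
    for link in "$proc_root/$pid/ns"/*; do
        # Skip anything that is not a symlink, such as an unmatched glob.
        [ -L "$link" ] || continue
        printf '%s\t%s\n' "${link##*/}" "$(readlink "$link")"
    done
)

# Example: list_ns "$$"
# Inspecting another user's process usually requires root.
```

Running list_ns against a container’s main process and against your own shell should show different identifiers for the namespace types the container does not share with the host.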

To examine namespaces, let’s run a Docker container:

$ docker run -d --name test_container ubuntu:18.04 sleep 3600

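Now, we can list the namespaces on the host using the lsns command. Run it as root so that namespaces owned by other users are included:

$ sudo lsns

The namespaces created for test_container appear in the list, with sleep 3600 as their command. To view the processes inside the container’s PID namespace, use the nsenter command. The -p option enters the PID namespace and -m enters the mount namespace, so that ps reads the container’s own /proc. Because the command is resolved inside the container’s file system, this assumes the image provides ps:

$ sudo nsenter -t CONTAINER_PID -p -m ps -ef

Replace CONTAINER_PID with the host PID of the container’s main process, which can be obtained using docker inspect: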

$ docker inspect --format '{{.State.Pid}}' test_container
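Once you have the container’s PID, you can check the isolation directly by comparing namespace identifiers. This is a rough sketch: same_ns and its optional fourth argument are names invented for this example, and it relies only on the /proc/<pid>/ns symlinks provided by the kernel.

```shell
# Succeed (exit 0) when two processes share the namespace of a given type.
#   $1  namespace type: pid, net, ipc, mnt, uts, user, ...
#   $2  first PID
#   $3  second PID
#   $4  hypothetical override for the /proc root (defaults to /proc)
# Exit status: 0 = same namespace, 1 = different, 2 = link not readable.
same_ns() (
    ns_type="$1"
    pid_a="$2"
    pid_b="$3"
    proc_root="${4:-/proc}"
    id_a=$(readlink "$proc_root/$pid_a/ns/$ns_type") || exit 2
    id_b=$(readlink "$proc_root/$pid_b/ns/$ns_type") || exit 2
    [ "$id_a" = "$id_b" ]
)

# Example (run as root, with CONTAINER_PID taken from docker inspect):
#   same_ns net 1 CONTAINER_PID && echo shared || echo isolated
```

For a container started with default options, the comparison against PID 1 on the host should report the pid, net, ipc, mnt, and uts namespaces as isolated.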

Control Groups (cgroups): Resource Management and Limitations

Control groups, or cgroups, are a vital Linux kernel feature that allows Docker to manage and limit resource usage by containers. Cgroups enable the allocation and monitoring of resources such as CPU, memory, and disk I/O to containers. This resource management is essential for maintaining system stability and preventing resource hogging by containers.

Cgroups consist of various subsystems (also called controllers), each responsible for managing a specific type of resource:

a. cpu: Manages CPU usage and allocation among containers.

b. memory: Monitors and limits memory usage by containers.

c. blkio: Controls and limits block device (disk) I/O usage for containers.

d. devices: Manages access to devices by containers.

e. freezer: Pauses and resumes container processes.
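The controller names above follow cgroup v1; under cgroup v2, for instance, block I/O is handled by a controller called io rather than blkio. Before experimenting, it helps to know which layout your host uses. The sketch below is a simple heuristic, with cgroup_version being a name made up for this example: v2 places a cgroup.controllers file at the top of its single hierarchy, while v1 mounts one directory per controller.

```shell
# Guess which cgroup layout is mounted under a given root directory.
#   $1  mount point to examine (defaults to /sys/fs/cgroup)
# Prints v2, v1, or unknown. Hosts running a hybrid setup are reported
# as v1 because the per-controller directories are present.
cgroup_version() (
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2
    elif [ -d "$root/memory" ] || [ -d "$root/cpu" ]; then
        echo v1
    else
        echo unknown
    fi
)

# Example: cgroup_version
```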

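To explore cgroups, we’ll use the libcgroup command-line tools: cgcreate, cgset, cgexec, and cgclassify (packaged as cgroup-tools on Debian and Ubuntu).

First, let’s create a control group named test_cgroup with limited CPU and memory resources. The parameter names below are the cgroup v1 names: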

$ sudo cgcreate -g cpu,memory:test_cgroup
$ sudo cgset -r cpu.shares=512 test_cgroup
$ sudo cgset -r memory.limit_in_bytes=256M test_cgroup
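To make these two values concrete, here is a small arithmetic sketch. The memory controller reads the M suffix as a binary multiple, and cpu.shares is a relative weight that only matters when sibling groups compete for CPU time. The helper names to_bytes and cpu_share_percent are invented for this example.

```shell
# Convert a size such as 256M into bytes, using binary multiples
# (K = 1024, M = 1024^2, G = 1024^3).
to_bytes() (
    value="$1"
    number="${value%[KkMmGg]}"
    case "$value" in
        *[Kk]) echo $((number * 1024)) ;;
        *[Mm]) echo $((number * 1024 * 1024)) ;;
        *[Gg]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;
    esac
)

# Whole-number percentage of CPU time a group can expect when it and
# all of its siblings are busy at the same time.
#   $1     this group's cpu.shares
#   $2...  cpu.shares of the competing sibling groups
cpu_share_percent() (
    mine="$1"
    total=0
    # "$@" includes this group's own shares as the first element.
    for shares in "$@"; do
        total=$((total + shares))
    done
    echo $((100 * mine / total))
)

# to_bytes 256M               prints 268435456
# cpu_share_percent 512 1024  prints 33
```

In other words, test_cgroup is capped at 268435456 bytes of memory, and with 512 shares it would receive about a third of the CPU time when competing against a single group that has the default weight of 1024. When the CPU is idle, shares impose no limit at all.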

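Now, let’s run a Docker container within the `test_cgroup`. Wrapping `docker run` in `cgexec` would confine only the Docker client, because the container’s processes are started by the Docker daemon, not by the client. Instead, ask Docker to create the container’s cgroup under `test_cgroup` with the `--cgroup-parent` option (this form assumes the cgroupfs cgroup driver; with the systemd driver the parent must be a slice name):

$ sudo docker run --rm -it --cgroup-parent=/test_cgroup ubuntu:18.04

This command runs a container based on the Ubuntu 18.04 image interactively, with its cgroup nested under test_cgroup, so the CPU and memory settings configured above apply to it. Once the container exits, it is automatically removed.

The limits are enforced by the kernel rather than displayed inside the container: tools such as free and top still report the host’s totals. To move a running process, such as a container’s main process, to a different control group, use the `cgclassify` command. It moves only the PIDs you list, and new_cgroup must already exist: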

$ sudo cgclassify -g cpu,memory:new_cgroup CONTAINER_PID
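To confirm where a process ended up, you can read /proc/<pid>/cgroup, which lists one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. The following sketch extracts the path for one controller; cgroup_of is a hypothetical helper name, and its second argument is simply the file to parse.

```shell
# Print the cgroup path of a process for one controller.
#   $1  controller name, e.g. memory or cpu
#   $2  file to parse (defaults to /proc/self/cgroup); pass
#       /proc/CONTAINER_PID/cgroup to inspect a container process
# On cgroup v2 the file holds a single "0::/path" line with an empty
# controller list, which is used as a fallback when no v1 line matches.
cgroup_of() (
    controller="$1"
    file="${2:-/proc/self/cgroup}"
    awk -F: -v want="$controller" '
        {
            # Everything after the second colon is the cgroup path.
            path = $0
            sub(/^[^:]*:[^:]*:/, "", path)
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == want) { print path; found = 1 }
            if ($2 == "") unified = path
        }
        END { if (!found && unified != "") print unified }
    ' "$file"
)

# Example: cgroup_of memory /proc/CONTAINER_PID/cgroup
```

After a successful cgclassify, the path printed for the cpu and memory controllers should be that of the new group.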

Union File Systems (UnionFS): Building Efficient Layered File Systems

A union file system mounts several directories, or layers, on top of one another so that they appear as a single file system. Docker uses this to build images in layers: read-only image layers are shared between containers, and each container adds a thin writable layer on top, which improves storage efficiency and reduces container creation times. Docker’s storage drivers implement this in different ways. The overlay2 driver, based on OverlayFS, is the default on current Linux installations; AUFS was used by older releases; and drivers such as btrfs achieve a similar effect with copy-on-write snapshots rather than a true union mount.

To explore the layers, we’ll inspect a Docker image using the `docker history` command:

$ docker history ubuntu:18.04

You’ll see the layers and their corresponding sizes. To dive deeper, let’s inspect the container’s file system layers using the `docker inspect` command:

$ docker inspect --format '{{ json .GraphDriver }}' test_container

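This shows the storage driver in use and the directories behind the container’s file system. With overlay2, LowerDir lists the read-only image layers, UpperDir is the container’s writable layer, and MergedDir is the unified view the container sees. Now, let’s see what the container has changed using the `docker diff` command: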

$ docker diff test_container

This command lists changes made to the file system since the container was created.
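The layered model can be imitated with two ordinary directories, which makes it easier to see where the output of `docker diff` comes from. The sketch below is a simplified simulation, not how the kernel or Docker implements it: resolve_layered and layer_diff are names invented here, only regular files are considered, and deletions (which a real union mount records as whiteout entries) are left out. It borrows the A (added) and C (changed) prefixes that `docker diff` prints.

```shell
# Resolve a relative path the way a union mount would: the writable
# upper layer wins, otherwise fall through to the read-only lower layer.
#   $1  relative path   $2  upper directory   $3  lower directory
resolve_layered() (
    rel="$1"
    upper="$2"
    lower="$3"
    if [ -e "$upper/$rel" ]; then
        echo "$upper/$rel"
    elif [ -e "$lower/$rel" ]; then
        echo "$lower/$rel"
    else
        exit 1
    fi
)

# Classify every regular file in the upper layer:
#   C  the file also exists in the lower layer (it was changed)
#   A  the file exists only in the upper layer (it was added)
#   $1  upper directory   $2  lower directory
layer_diff() (
    # Make the lower path absolute before changing directory.
    lower=$(cd "$2" && pwd) || exit 1
    cd "$1" || exit 1
    find . -type f | sort | while IFS= read -r file; do
        rel="${file#./}"
        if [ -e "$lower/$rel" ]; then
            echo "C /$rel"
        else
            echo "A /$rel"
        fi
    done
)

# Example: layer_diff ./upper ./lower
```

With overlay2, the UpperDir and LowerDir values reported by `docker inspect` play the roles of the two directories here, although LowerDir is usually a colon-separated list of several layers.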

Conclusion

In this blog post, we have unraveled the essential Linux kernel features that power Docker containers: namespaces, cgroups, and union file systems. We have also explored Linux commands that provide insights into these features, giving readers a comprehensive understanding of Docker’s inner workings and the technologies that support it. A deep understanding of these concepts is invaluable for anyone looking to effectively harness the power of Docker and containerization.