iliana
@iliana

i worked professionally as a linux distro maintainer from 2014 to 2021. this is a tale of what i learned about docker, and the ecosystem that grew up around it, over those years. treat this as folklore, not as a proper secondary source, because i am not wasting my time googling for open source drama

docker essentially does two things:

  1. it lets you build, layer, and share OS images in a standard[1] format.
  2. it lets you run linux containers that use those layered images as a filesystem.

the idea of a linux container is that the kernel creates separate namespaces for all of the features userspace programs use. there's a lot of them, but the one we'll talk about today is the PID namespace, which keeps track of all the processes on a system with a process ID.

i'm glossing over the details, but when you create a new PID namespace and put a process in it, that process becomes PID 1 inside the namespace. PID 1 is special; on normal systems it is usually the init system, which is primarily responsible for starting all of the other processes you care about running.
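if you want to poke at this yourself, the whole trick is basically one clone(2) flag. here's a rough C sketch (mine, not anything docker actually runs) that makes a new PID namespace and shows the child believing it's PID 1; you'll need root or CAP_SYS_ADMIN:

    /* a sketch: clone(2) into a new PID namespace and watch the child
     * become PID 1 inside it. needs root or CAP_SYS_ADMIN.
     * build: gcc -o pidns pidns.c && sudo ./pidns */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static char stack[1024 * 1024];

    static int child(void *arg)
    {
        /* inside the new namespace this prints 1, even though the parent
         * namespace sees this same process under an ordinary PID */
        printf("pid inside the namespace: %d\n", getpid());
        return 0;
    }

    int main(void)
    {
        pid_t pid = clone(child, stack + sizeof(stack),
                          CLONE_NEWPID | SIGCHLD, NULL);
        if (pid == -1) {
            perror("clone");
            return 1;
        }
        printf("pid as seen from outside: %d\n", pid);
        waitpid(pid, NULL, 0);
        return 0;
    }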

the foreshadowing zone

ok well this definition was correct like 13 years ago. since that time some folks decided (imo correctly) that an init system should not just run a series of shell scripts in lexicographical order on boot and shutdown, and should instead actually understand the concept of a long-running process that does things and should be restarted if it crashes. this led to upstart and later systemd, which do many more things, like not running a process at all until someone asks for it over a network socket. (or things like sandboxing, which look a whole lot like some features of docker!)

this box is called "the foreshadowing zone" because we will come back to the complexity of init in a sec.

because of implementation details it also does a half dozen other tiny things, like becoming the new parent of any processes whose parents die, and being responsible for [stares at notecard] reaping zombie processes? listen, it's complicated, and this blog post about docker PID 1 zombies or whatever goes into detail just fine if you're interested.
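the short version of the reaping thing: when a process's parent dies, the kernel reparents the orphan to PID 1, and when the orphan later exits, somebody has to wait() on it or it hangs around in the process table as a zombie. the loop an init runs for this is tiny. here's a sketch of the idea, not any particular init's real code:

    /* a sketch of the zombie-reaping duty: collect the exit status of any
     * child that has already terminated so it doesn't linger as a zombie.
     * if this were PID 1, the same call would also collect orphans the
     * kernel reparented to it. build: gcc -o reaper reaper.c */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void reap_zombies(void)
    {
        for (;;) {
            /* -1 = "any child"; WNOHANG = "don't block if none have exited" */
            pid_t pid = waitpid(-1, NULL, WNOHANG);
            if (pid <= 0)
                break; /* 0: children still running; -1: no children at all */
            printf("reaped pid %d\n", pid);
        }
    }

    int main(void)
    {
        /* make a few children that exit immediately... */
        for (int i = 0; i < 3; i++)
            if (fork() == 0)
                _exit(0);
        sleep(1);        /* ...give them a moment to become zombies... */
        reap_zombies();  /* ...then do the thing an init is supposed to do */
        return 0;
    }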

docker was originally designed as a relatively lightweight system for assembling a filesystem and making a container on top of it, so it just exec'd whatever you asked it to — for example, mysqld — as the first PID in a new PID namespace: PID 1. mysqld doesn't expect to be run as PID 1 and does not know how to perform the responsibilities of PID 1, because mysqld is not an init system. in effect, docker has placed an unlicensed four-year-old in the driver's seat of a multimodal semi truck and said "ok buddy you can do this". this manifests itself as all sorts of weird problems, but the most notable is that if you docker run that image and then hit ^C, it will not exit: the process group hasn't exited, because it is not mysqld's job to terminate all its children before exiting.

hang on. why is mysqld in its own container anyway? the web app we want to containerize uses the database, and we're supposed to be able to package all our dependencies into one container, right? it's not like real world shipping containers, the entire metaphor upon which docker invented itself, have little pipes that go between them so the containers can talk to each other, right?

maybe what we need is an init system for our containers. that'll solve all two of the problems we know about so far: running multiple processes, and reaping their zombies.

except that's not what happened:

  • upstart and systemd are extremely complex (foreshadowing payoff), were never designed to run inside containers[2], and didn't seem to care to change fundamental design decisions to do so, instead focusing on the part of the system outside your containers.
  • separate docker images can't be combined in any meaningful way; a stock mysql image might have different libraries in it than your web server because you used different distros as your base. so docker did the pragmatic thing, and decided they would make it so that you could link containers together and let them talk to each other.

these days, lots of docker containers do have an init process. it's called tini, and it runs only one process, but does so correctly (blah blah zombies). if you want to run multiple services inside a single container, you have to do it yourself, so it's not a surprise the entire ecosystem assumes you won't do that.
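if you're wondering what a tini-shaped init actually has to do, it's roughly: fork and exec the real command, forward termination signals to it, reap everything, and pass the child's exit code along. here's a sketch of the shape of it (the name mini-init is made up, and this is emphatically not tini's real code; the real thing also deals with process groups, subreapers, ttys, and a pile of edge cases):

    /* a sketch of a tini-shaped init: run one command as our only child,
     * forward termination signals to it, reap every zombie, and exit with
     * the child's status.
     * build: gcc -o mini-init mini-init.c ; run: ./mini-init mysqld */
    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pid_t child_pid = -1;

    /* PID 1 in a namespace doesn't get default signal handling, so we
     * install handlers and pass the signal along to the real workload */
    static void forward_signal(int sig)
    {
        if (child_pid > 0)
            kill(child_pid, sig);
    }

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return 1;
        }

        child_pid = fork();
        if (child_pid < 0) {
            perror("fork");
            return 1;
        }
        if (child_pid == 0) {
            execvp(argv[1], &argv[1]); /* become the real workload */
            perror("execvp");
            _exit(127);
        }

        signal(SIGTERM, forward_signal);
        signal(SIGINT, forward_signal);

        /* reap children until the one we exec'd exits; orphans that got
         * reparented to us get collected here too */
        for (;;) {
            int status;
            pid_t pid = wait(&status);
            if (pid == child_pid)
                return WIFEXITED(status) ? WEXITSTATUS(status)
                                         : 128 + WTERMSIG(status);
            if (pid < 0 && errno == EINTR)
                continue; /* interrupted by a signal we just forwarded */
            if (pid < 0)
                return 1; /* no children left; shouldn't normally happen */
        }
    }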

so, the answer: sometimes something is so groundbreaking that it changes the world without actually being ready yet, and we get stuck using a half-finished ecosystem for a decade or more.


  1. by "standard" we of course mean "made-up and then retconned into a standard", which is how all standards are made; as you'd expect there are some rough edges even to this day.

  2. if you want to run systemd in a container you can — some folks from red hat were even touting it at the time — but good lord is it a pain in the ass. it didn't catch on for a reason.



in reply to @iliana's post:

Thanks for the explanation! It's cool to know the history behind it :)

Are there things nowadays that work like what Nire described, where it's one environment with all the dependencies (without the weight of a whole VM)? Or does it seem like we'll keep chugging ahead with one service per container and tie them all together with Kubernetes and Helm or similar?

My background has mainly been working at big tech companies that have their own weird stuff for handling distributed systems, so I'm honestly unfamiliar with stuff beyond the Tech Island I'm on.

There's a lot that could be said about this, but really what Docker provides is process isolation. You could deliver a big ball of Docker with a full LAMP stack or similar today, and people do, usually solving the init problem with runit or upstart, but you'll still need a way to deliver your application as well. If you also bundle your application in... well, what's the benefit of using Docker, with all the pain and expense that entails, instead of just delivering on metal?

well, docker does get you pretty good process isolation, but in my experience a lot of people don't care about that compared to being able to distribute a pile of files without having to think too much about it if they really don't want to. (compare docker commit to the creation of an RPM or dpkg archive, for instance.)

honestly filesystem isolation (that is, the mount namespace) is probably a lot more core to docker's thesis. after all, you can disable making a new PID namespace when you start a container.

(i am of course biased as someone who maintained packages for a living and knew how awful that shit was.)

in terms of things nowadays that let you create an environment of all your dependencies, nix is probably that thing, but unfortunately as you may already know, nix.

Sure but the same issue applies. Filesystem isolation isn't new at all either, people have been using chroot jails (or more advanced lightweight isolation schemes like FreeBSD jails or Solaris Zones) since the day before forever and you end up struggling with the same issues as soon as you start bundling your applications together in the same container because then you're basically just doing a full paravirtualized OS.

To be fair, OSes also implemented process isolation before then too, such as with cgroups. I guess a lot of those things you could do, provided your OS was always in a state you expected (which it often isn’t)

I guess when I see companies adopting things like docker, from my limited outside view of things, it seems like they want an easy solution to deploy reproducible environments right? Plus a way to “horizontally scale” (to use an old buzzword), e.g. some IPC mechanism like how docker does with sockets and volumes?

I have a specific company in mind that’s looking at the vaguely cloud-container-microservices end of things because they were acquired and their new corpo overlords cringed at their ancient tech stack that’s tied to some on-prem windows servers that get rained on periodically and some mysql databases that are being used in horrific ways to act as a message broker

I was kind of just curious how distributed computing will continue to evolve, given a lot of (valid!) gripes about containers (and the very real problems they seem to want to solve)

I don't think Docker is a bad solution is the thing. I don't think any of the solutions people have used since the 60s to solve these same basic issues are inherently bad but the landscape is always changing and there's always tradeoffs. Solving a bunch of these issues is why IBM developed CP/CMS and "virtualization" as we know it today, which has its own set of issues. Docker is at its safest and most effective when there's one major application per container but then you have the problem of installing container groups. Probably the next "solution" is going to be a packed virtualization solution where you push whole OS images around, which brings us back to CP/CMS and its modern counterparts in the "hosted cloud" arena. Nix and Guix are pursuing another direction of whole reproducible system builds, which puts us back at scripted installs from the 90s or prepackaged system tapes from the 70s and 80s. Cycle of reincarnation; there's no perfect solution.