I was reading Julia Evans' container/networking/BGP post and this
bit stood out to me:
So, I told you in a last section that processes in containers listen on
ports. This is not true in the normal way though. Processes in containers
live in a "network namespace". What does that look like? I got off a
plane today and was like "I will figure out how all this works! I know
networking! this will be easy!"
It's not totally easy.
It might be a bit easier for someone to understand containers if they
know where the tech underlying them came from. That's usually true for
The network namespace she mentions is one of a bunch of namespaces that
make Linux containers work. There are also filesystem (mount) namespaces,
IPC namespaces, user namespaces, and process (PID) namespaces. Although
these namespaces are all fairly new, the namespace itself is one of the
fundamental abstractions of UNIX that makes it the system we know and
tolerate today.
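
To make that list concrete, here is a minimal sketch that asks a Linux
kernel which namespaces the current process belongs to by reading the
symlinks under /proc/self/ns. The function name list_namespaces is made
up for this example, and on a system without /proc it simply returns an
empty dict.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} read from /proc/<pid>/ns.

    Returns an empty dict when that directory cannot be read, for
    example on a non-Linux system or for a process that does not exist.
    """
    ns_dir = f"/proc/{pid}/ns"
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return {}

    namespaces = {}
    for name in names:
        try:
            # Each entry is a symlink whose target names the namespace
            # type and an identifier for that particular namespace.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            namespaces[name] = None
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20} {ident}")
```

Two processes that print the same identifier for an entry share that
namespace; a containerized process will usually show different ones.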
UNIX traditionally has everything available to the system represented by
files and in one namespace rooted at /. This was a pretty revolutionary
thing, and to some extent, still is: there's no equivalent of /dev/sd?
on Windows.
Of course, this rule was broken almost immediately:
show up until UNIX v8 but processes existed the whole time. Advances
in how we used computers also broke this rule; BSD bolted sockets on in
a way that makes them almost, but not quite, entirely unlike files. In
fact, remember back a few sentences ago where I threw shade on Windows
for not representing things in the filesystem? Well, how many lines does
ls /dev/eth? return on your Linux box?
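
If you'd rather check from code than from a shell, here is a small
sketch that compares the two views: the interfaces the kernel reports
through the socket API, and whatever matches /dev/eth? in the
filesystem. The helper names interface_names and device_nodes are
invented for this example.

```python
import glob
import socket


def interface_names():
    """Names of network interfaces as reported by the socket API."""
    try:
        return [name for _index, name in socket.if_nameindex()]
    except OSError:
        # Some restricted environments refuse the query.
        return []


def device_nodes(pattern="/dev/eth?"):
    """Filesystem entries matching the pattern, if there are any."""
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    print("interfaces via sockets:", interface_names())
    print("entries under /dev:    ", device_nodes())
```

On a typical Linux box the first list should have at least a loopback
entry and the second should be empty, which is the point being made.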
Plan 9 From Bell Labs was a re-thinking of UNIX in the face of these
changes. The concept of a single type of namespace is the key idea in
Plan 9: everything is represented underneath / somewhere, and everyone
gets their own view of it. (How this is accomplished is both interesting
and a story for another time.) Everything is a file, for real this time.
The reason this is relevant, and I didn't realize this at first either,
is that it's trivial to cordon off processes: if the process can't talk
to the internet, unmount /net. There are a few other quirks of Plan 9
that make this work correctly, namely users and per-process namespaces,
but that's the central conceit.
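
Here is a toy model of that idea, not how Plan 9 actually implements
it. Each process carries its own table of mount points, a child starts
with a copy of its parent's table, and unmounting in the child changes
only the child's view. The Namespace class, its methods, and the labels
"root file server" and "network stack" are all made up for the sketch.

```python
class Namespace:
    """Toy per-process mount table: mount point -> what is served there."""

    def __init__(self, mounts=None):
        self.mounts = dict(mounts or {})

    def fork(self):
        # The child gets its own copy, so later changes do not leak
        # back into the parent's view.
        return Namespace(self.mounts)

    def unmount(self, point):
        self.mounts.pop(point, None)

    def lookup(self, path):
        """Return whatever serves `path`: the longest matching mount wins."""
        best = None
        for point in self.mounts:
            prefix = point.rstrip("/") + "/"
            if path == point or path.startswith(prefix):
                if best is None or len(point) > len(best):
                    best = point
        return None if best is None else self.mounts[best]


if __name__ == "__main__":
    parent = Namespace({"/": "root file server", "/net": "network stack"})
    child = parent.fork()
    child.unmount("/net")
    print("parent sees /net/tcp via:", parent.lookup("/net/tcp"))
    print("child sees /net/tcp via: ", child.lookup("/net/tcp"))
```

In the child, /net/tcp no longer leads to the network stack; the path
falls through to the root file server, where there is nothing to talk
to. The parent's view is untouched.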
Linux cribbed this idea from Plan 9, along with /proc. And there was
much rejoicing because, hey, we have this cool way to section off parts
of the filesystem now! But if, like Linux, your system has multiple
types of namespaces, it now needs support for manipulating (read:
hiding) each of them. To contain a process, you now need to have all
those namespaces set up correctly. And their namespacing needs to work
right. And they need to interact correctly. And if your entire userspace
is built on the assumption that there's only one / namespace, you can't
really give everyone their own view of it. This is where a big chunk of
the complexity of Linux containers comes from: all the bits of your OS
that are hidden, their interactions, and half-applied concepts.
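
A small taste of those interactions, as a hedged sketch: on Linux, an
unprivileged process generally can't just ask for a fresh network
namespace; it has to ask for a new user namespace at the same time. The
code below tries exactly that in a throwaway child interpreter, using
os.unshare (available in Python 3.12 and later), and reports which
interfaces the child can see afterwards. The function name
interfaces_in_fresh_namespaces is invented here, and it returns None
wherever the kernel, the sandbox, or the Python version says no.

```python
import subprocess
import sys

# Runs in a separate interpreter so the calling process keeps its own
# namespaces. Needs Linux and Python 3.12+ for os.unshare.
CHILD = """
import os, socket
os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWNET)
print(",".join(name for _index, name in socket.if_nameindex()))
"""


def interfaces_in_fresh_namespaces():
    """Interface names seen inside new user + network namespaces.

    Returns None if the namespaces could not be created.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None
    output = proc.stdout.strip()
    return output.split(",") if output else []


if __name__ == "__main__":
    print(interfaces_in_fresh_namespaces())
```

Where it works, the child should see far fewer interfaces than the
parent does. Where it doesn't, the failure is itself a decent
illustration of how many pieces have to line up.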
So, that's what underlies all these shiny new containers: retrofitting
a useful idea and all the jank that comes with that.