In this lesson, you’ll learn Linux kernel constructs that make up a “container,” and see a demonstration of how each construct works.
Staff Field Engineer at VMware
Hi! My name is John and I'm a Staff Field Engineer @ VMware where I work with customers to help design and implement Kubernetes solutions, in addition to contributing back to upstream open source projects.
Hey everyone and welcome to KubeAcademy. My name is John Harris and I'm a senior cloud native architect at VMware, and in today's episode we're going to look at one of the foundational building blocks of Kubernetes, and that's the container.
So when we talk about a container, what we really mean is just an application or a process that we can run with some kind of isolation around it, so that it only sees its own networking stack and process tree and has its own root file system, and we can optionally put some CPU and memory limits around that container so it doesn't stomp on other processes.
So the first part that we care about is the root file system. This is really represented by the docker image itself, and that gets run by the container runtime. And this is where you have all the supporting artifacts for your application, so the binaries, anything that you need. Sometimes it's a large OS like Ubuntu, or if you have a statically compiled Go binary, for instance, you might just use the scratch image.
So let's take a look at an image I've got on my machine. So I'm going to do a docker image inspect on this image I have here, Netshoot, and we can see at the bottom here we have a root file system and we have a number of layers. These layers are all [00:01:09], and they represent different stages or layers in the build process for the image, and we could do a docker history on this image to see what commands were run. There we go, where this image was created. So all of these layers are then merged together by docker, and this represents the root file system of our container when we actually run it.
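To make the layer list concrete, here is a minimal Python sketch that pulls the layer digests out of the JSON that docker image inspect prints. The function name rootfs_layers and the sample digests are hypothetical, made up for illustration; only the RootFS and Layers keys come from the real output format.

```python
import json


def rootfs_layers(inspect_json):
    """Return the layer digests from `docker image inspect` JSON output.

    The command prints a JSON array with one object per image; each
    object carries a "RootFS" entry whose "Layers" list holds the digests.
    """
    images = json.loads(inspect_json)
    return images[0]["RootFS"]["Layers"]


# Hypothetical, shortened sample standing in for real inspect output.
sample = json.dumps([
    {
        "RepoTags": ["example:latest"],
        "RootFS": {
            "Type": "layers",
            "Layers": ["sha256:aaa", "sha256:bbb", "sha256:ccc"],
        },
    }
])

if __name__ == "__main__":
    for index, digest in enumerate(rootfs_layers(sample), start=1):
        print(index, digest)
```

On a machine with docker installed, the same function could be fed the captured output of docker image inspect for a real image instead of the sample.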
So the second part that we care about is we want to run this with some isolation. We want to give it its own networking stack, we want to give it its own process tree, we want to make sure this root file system is running encapsulated from everything else on the machine. And the way we do that is using a Linux kernel construct called namespaces. Now there are seven namespaces, most of which are used by container runtimes. And each one of them is responsible for different things: there's the network namespace, which controls the networking stack, there's the PID namespace for processes, there's the UTS namespace for host name. So we're going to run a container and see what the effect of those namespaces is on our process.
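One way to see which namespaces a process belongs to is to read the symbolic links under /proc/PID/ns, where each link target names a namespace type and an identifier. The sketch below is a hedged illustration, not something shown in the video; the function names list_namespaces and shared_namespaces are hypothetical.

```python
import os


def list_namespaces(ns_dir="/proc/self/ns"):
    """Map each namespace name (net, pid, uts, ...) to its link target.

    On Linux every entry in /proc/<pid>/ns is a symlink whose target
    looks like "uts:[4026531838]". Two processes share a namespace
    exactly when their targets for that entry are equal.
    """
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        path = os.path.join(ns_dir, name)
        if os.path.islink(path):
            result[name] = os.readlink(path)
    return result


def shared_namespaces(first, second):
    """Return the namespace names that two processes have in common."""
    return sorted(
        name for name in first
        if name in second and first[name] == second[name]
    )


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/self/ns exists.
    if os.path.isdir("/proc/self/ns"):
        for name, target in list_namespaces().items():
            print(name, target)
```

Comparing the result for the host shell with the result for a container's process (reading that process's ns directory usually needs root) would show which namespaces differ between the two.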
So let's do a docker run. I'm going to run it in the background, and importantly, remember this host name I'm going to give this container, which is C1. And we give it the name of test, and we're going to run the same image that I used before, and because this is going to die if I don't give it a process to run, I'm going to give it the process sleep. So now we can see our container is running. I can do a ps ux and then grep for sleep, and we can see that it has the PID of two one eight zero. I can also do a docker inspect on this container and grep for PID, and it's going to show us the same thing. Okay, so docker shows us this, and we can get it from our ps as well, so this process is actually running on our kernel, there's no additional layer of virtualization here. So PID two one eight zero, process ID two one eight zero.
So this time what we're going to do is we're going to exec into this container. So let's exec [00:02:53] test. And now if we do an ls, we can see that we've got our own root file system. If I go into home, we can see this obviously isn't my machine. I can do a host name and I can see that it's C1, and if I do an ifconfig I can see that it has its own networking stack. So if I exit out of here and now do a host name, now I'm back on my laptop, and we can see this is different. We can see that my networking stack's completely different, it has way more interfaces than inside the container.
What I can also do is use a tool called nsenter, or namespace enter, to go into a target process and optionally inherit some of its namespaces. So let's go back and take a look at our PID, so it's two one eight zero. So this time I'm going to use nsenter to target two one eight zero. I'm going to inherit the UTS namespace, which controls the host name. So now I've got a shell and I've run that using nsenter and I've taken two one eight zero's host name. So if we do an ifconfig again we can see it has still got my host's networking stack, but if I do a host name we can see I'm C1. So I'm inside that container's namespace for C1.
So let's exit back out of here, now I'm back on the host, so I do a host name again, there we go. This time I'm going to do nsenter but I'm going to choose some different namespaces. So I'm not going to inherit the host name but I am going to inherit the net and the mount namespaces. So this time if I do a host name, it still says jpersonal, so I didn't take that namespace, but if I do an ifconfig we can see that it has inherited the network namespace, the interfaces for my container's namespace. If I do an ls, I've got the root file system for my container. So I'm going to exit back out of here, now I'm on my host again so I can do a host name. Okay, so that was namespaces, and that allowed us to give some isolation around our process in terms of networking stack, process tree, host name and those kinds of things.
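The two nsenter invocations from the demo differ only in which namespace flags are passed. As a rough sketch, the Python helper below builds the argument list for each case; nsenter_command is a hypothetical name, and running sh as the program is an assumption, since the transcript does not say which shell was started.

```python
import shlex

# Long-form nsenter options for the namespaces it can join.
NS_FLAGS = {
    "mount": "--mount",
    "uts": "--uts",
    "ipc": "--ipc",
    "net": "--net",
    "pid": "--pid",
    "user": "--user",
}


def nsenter_command(target_pid, namespaces, program=("sh",)):
    """Build an nsenter argument list that joins only the chosen namespaces.

    Any namespace not listed stays that of the caller, which is why the
    host name or the network interfaces can remain those of the host.
    """
    args = ["nsenter", "--target", str(target_pid)]
    for name in namespaces:
        args.append(NS_FLAGS[name])
    args.extend(program)
    return args


if __name__ == "__main__":
    # First demo: take only the container's host name (UTS namespace).
    print(shlex.join(nsenter_command(2180, ["uts"])))
    # Second demo: keep the host's host name, take network and mounts.
    print(shlex.join(nsenter_command(2180, ["net", "mount"])))
```

The resulting list could be handed to subprocess.run on a Linux host; that typically requires root privileges.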
But what happens if we want to run multiple containers on a machine and we don't want them to stumble over each other in terms of resource utilization? We want to be able to put some CPU and memory limits around our containers. And we can do that using a construct called control groups, or C groups. So let's take a look at that now. So I'm going to do a docker ps and we can see that this container is already running, and it has a container ID of F7. So I'm going to take a look at its C groups. So I'm going to go cat [00:04:58] it's two one eight zero is our PID and we're going to look at C group. And we can see here all the C groups for this process, and if you take a look at our memory here, it starts with this docker parent, so C groups are hierarchical.
So everything comes under this docker parent. And underneath here I have this particular C group. And we can see that this ID is actually the same as the ID for our container. So let's take a look at what's in there, and let's look at the memory limit that our container currently has. So I'm going to go cat proc two one eight zero. Nope, no, I'm going into sysfs C group memory, and this is where we go into the exact C group we care about, so I'm going to go into docker and then F7 is my container ID, then I do memory limit in bytes. [00:05:44] That's a lot, right? I haven't put any constraint around this container when I ran it, so it basically has a whole ton of memory available to it.
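The lookup just described goes from the process's C group listing to a file under sysfs. Here is a minimal Python sketch of that path, assuming the cgroup v1 layout seen in the demo, where each line of /proc/PID/cgroup has the form hierarchy-ID:controllers:path. The function names and the shortened container ID f7example are hypothetical.

```python
def parse_cgroup_v1(text):
    """Map each controller name to its cgroup path.

    Every line has three colon-separated fields: a hierarchy ID, a
    comma-separated controller list, and the cgroup path.
    """
    controllers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hierarchy_id, names, path = line.split(":", 2)
        for name in names.split(","):
            if name:
                controllers[name] = path
    return controllers


def memory_limit_file(cgroup_path, root="/sys/fs/cgroup/memory"):
    """Return the cgroup v1 file that holds the memory limit in bytes."""
    return root + cgroup_path.rstrip("/") + "/memory.limit_in_bytes"


# Hypothetical sample; a real container ID is much longer.
sample = "9:memory:/docker/f7example\n4:cpu,cpuacct:/docker/f7example\n"

if __name__ == "__main__":
    paths = parse_cgroup_v1(sample)
    print(memory_limit_file(paths["memory"]))
```

The hierarchy shows up directly in the path: the docker parent comes first, then the C group named after the container ID.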
So this time let's run another container with a memory limit. So let's do docker run, and we're going to run it in the background. I'm not going to give it a host name this time, but I am going to give it a memory limit of four megabytes, so let's remember that. Name it test two, and I'm going to use the same image for the sake of argument. And again I need to give it a process to run, so it's just going to go to sleep again. Okay, so now if we do a docker ps, we can see we have two containers running, so F7 was our first one, we can see it's been up four minutes, and then F9 was our second one, we can see that has been up one second.
So let's go take a look at the C group for this in the memory. Let's do a cat, sysfs, C group, memory, docker, this time it's F9, memory limit in bytes. Okay cool, so we can see it has actually put a memory limit in bytes around this container of four megabytes, which is exactly what we asked for. Pretty cool, so basically what docker or other container runtimes are doing under the hood is taking the root file system you give them in the form of a docker image, running that with a whole bunch of namespaces around it, and optionally putting some of these CPU and memory constraints around it as well. We can specify those when we run through docker, or we can specify those in our Kubernetes manifest in terms of resource requests and limits. So I hope this video has been super useful in helping you understand exactly what a container is. Thanks for following and hope to see you in the next video.
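To connect the four megabyte limit to the number that appears in the memory limit file, the sketch below converts a docker-style memory string into bytes. It assumes the limit was passed as something like 4m, which the transcript does not spell out, and memory_to_bytes is a hypothetical helper covering only the b, k, m and g suffixes.

```python
# Binary multipliers for the single-letter memory suffixes.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def memory_to_bytes(value):
    """Convert a memory string such as "4m" into a byte count.

    A value without a suffix is treated as already being in bytes.
    """
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


if __name__ == "__main__":
    # Four megabytes is 4 * 1024 * 1024 bytes.
    print(memory_to_bytes("4m"))
```

So a four megabyte limit should read back as 4194304 in the container's memory limit in bytes file.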