Kubernetes in Depth
Kubernetes is made up of a number of components across control and workload planes. This lesson will walk through what each component does and how they work together.
John Harris
Staff Field Engineer at VMware
Hi! My name is John and I'm a Staff Field Engineer @ VMware where I work with customers to help design and implement Kubernetes solutions, in addition to contributing back to upstream open source projects.
Hey, everyone. Welcome to KubeAcademy. My name is John Harris, and I'm a Senior Cloud Native Architect at VMware. And in this episode we're going to look at the architecture of Kubernetes. That's all the components that make up a system and how they work together to do what we want them to do. All right, let's dive in.
So in this first slide you can see the architecture of a cluster. We have three main groups that we care about here. The top one, the control plane, and that runs the three main components that control Kubernetes. The API server, the scheduler, and the controller manager. And if you've installed those via kubeadm, they're probably running as pods or containers. So we also need a kubelet on those nodes, as well. And we'll talk about what all of these components do later on in the video.
On the bottom left, we have our nodes. This is where your actual workloads are going to run. Again, they're running as pods. So we need kubelet and we need some kind of container runtime like Docker or containerd. And you can have any number of those connected to your cluster within reason. On the bottom right is our data or persistence layer. This is etcd, which is a distributed key-value store. And we usually run three of those for high availability. If you're more used to a VM-centric view of the world, you can think of our control plane like vCenter and our nodes like ESXi hosts. That's where all the actual work happens.
So the first component that we want to look at is our API server. We're going to look at the architecture of Kubernetes through the lens of doing a deployment, so we can see how this all works. The API server is stateless. You usually run three, one on each control plane node. And this is the main entry point to the cluster, whether that's via kubectl or via any other tooling like client libraries in different languages, plugins, etc. And it has a number of different responsibilities.
Firstly, it serves the Kubernetes API, obviously. It also does a little bit of validation on the resources that you deploy to it. kubectl and client-side libraries also do client-side validation to make sure they're not just sending garbage to the API server, but the server does a little bit of validation, too. Then there's authentication, to make sure we are who we say we are, using one of the various authentication methods.
It'll then do authorization. Once I've authenticated, do I actually have the permission to do what I want to do? So if I have authenticated as John, can John do deployments in the namespace that I care about? It does some admission control, which is maybe doing some additional validation or mutation of the request before it persists it. It'll then do serialization of that request, so for our deployment, it's going to serialize it into a particular format and then it's going to write that to etcd. So it's important to note that the API server is the only thing that talks to etcd. It does reads from etcd and does writes to etcd. So everything talks to etcd via the API server.
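To make that order of operations concrete, here is a minimal sketch of the request path: authenticate, authorize, admit, serialize, persist. Everything in it is an assumption made for illustration: the token table, the permission tuples, the default of one replica, the storage key layout, and the `handle_create` function are hypothetical and are not the real API server's code or data model. A plain dictionary stands in for etcd.

```python
import json

# Hypothetical credentials and permissions, invented for this sketch.
TOKENS = {"token-john": "john"}
PERMISSIONS = {("john", "create", "deployments", "team-a")}

store = {}  # stands in for etcd


def authenticate(token):
    """Authentication: map a credential to a user, or reject the request."""
    if token not in TOKENS:
        raise PermissionError("401: unknown credentials")
    return TOKENS[token]


def authorize(user, verb, resource, namespace):
    """Authorization: may this user do this verb on this resource here?"""
    if (user, verb, resource, namespace) not in PERMISSIONS:
        raise PermissionError(f"403: {user} cannot {verb} {resource} in {namespace}")


def admit(obj):
    """Admission control: mutate (fill in a default), then validate."""
    obj = dict(obj)
    obj.setdefault("replicas", 1)
    if obj["replicas"] < 0:
        raise ValueError("replicas must not be negative")
    return obj


def handle_create(token, namespace, obj):
    """Run a create request through each stage in order, then persist it."""
    user = authenticate(token)
    authorize(user, "create", "deployments", namespace)
    obj = admit(obj)
    key = f"/deployments/{namespace}/{obj['name']}"  # made-up key layout
    store[key] = json.dumps(obj, sort_keys=True)  # serialize, then write
    return key


if __name__ == "__main__":
    print(handle_create("token-john", "team-a", {"name": "web"}))
    print(store)
```

The point of the sketch is only the ordering: a request that fails an earlier stage never reaches the store.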
So the second piece of our puzzle is etcd. Like I said, these usually run on three nodes separate from the control plane. They could run colocated with the control plane in what we call a stacked configuration, but we like to run them separately because they have a slightly different backup, restore, and performance profile. Etcd is a distributed key-value data store. It uses the Raft algorithm to do leader election and log replication. And that's really just a fancy way of saying it keeps all the data in sync between the three nodes. So if you lose one, you're still okay.
Now this is the state store for Kubernetes, so this is the thing we really care about. There's a really great article called Secret Lives of Data on how the Raft algorithm, and then etcd, actually works. I'm going to put that link in the show notes.
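The "lose one of three and you're still okay" point comes from majority arithmetic: a Raft cluster keeps working as long as more than half of its members are available. The sketch below is just that arithmetic, with hypothetical helper names; it does not talk to etcd.

```python
def quorum(members):
    """Smallest majority of a cluster with this many members."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"members={n} quorum={quorum(n)} tolerated={failures_tolerated(n)}")
```

With three members the quorum is two, so one failure is tolerated. Going from three to four members does not help, because the quorum rises to three and still only one failure is tolerated.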
So once we've deployed ... Once we've done our kubectl deploy to our API server, the API server does its serialization, it writes that data to etcd. Now what actually happens? This is where the second part of our control plane comes in, the controller manager. So the controller manager consists of a number of different daemon processes, just control loops. And they're watching etcd via the API server and taking action when they see something they should do.
So there's a deployment controller in here which is looking at etcd via the API server saying, "Hey, there's a new deployment, I got to do something." Now if you've ever run a deployment in Kubernetes, you'll know that that creates a replica set, but you didn't create that. So why does that happen? Well the controller for deployments creates the replica set, writes that information back into etcd, and then there's a replica set controller which looks at that information via the API server, pulls it out, and then does something else. In this case probably create pods.
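Here is a minimal sketch of that chain of controllers, assuming an in-memory dictionary in place of the API server and etcd. The function names, the `-rs` naming scheme, and the record fields are hypothetical simplifications: real controllers watch for changes rather than scanning, and real ReplicaSet names carry a hash. Scale-down is left out to keep it short.

```python
def deployment_controller(store):
    """For every Deployment, make sure a matching ReplicaSet record exists."""
    for name, dep in store["deployments"].items():
        rs_name = f"{name}-rs"  # simplified naming, not the real scheme
        store["replicasets"][rs_name] = {"owner": name, "replicas": dep["replicas"]}


def replicaset_controller(store):
    """For every ReplicaSet, create pod records until the count matches."""
    for rs_name, rs in store["replicasets"].items():
        owned = [p for p, pod in store["pods"].items() if pod["owner"] == rs_name]
        for i in range(len(owned), rs["replicas"]):
            # A new pod has no node yet; assigning one is the scheduler's job.
            store["pods"][f"{rs_name}-{i}"] = {"owner": rs_name, "node": None}


if __name__ == "__main__":
    store = {"deployments": {"web": {"replicas": 3}}, "replicasets": {}, "pods": {}}
    deployment_controller(store)   # writes the ReplicaSet
    replicaset_controller(store)   # reads the ReplicaSet, writes the pods
    print(sorted(store["replicasets"]))
    print(sorted(store["pods"]))
```

Neither controller calls the other. Each one only reads records and writes records, and running a loop a second time changes nothing once the state already matches.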
Now there are a number of different controllers running within the controller manager, and this system is actually explained really well by my colleague Scott Lowe in his video on Kubernetes concepts and control loops. So you should definitely check that out if you want more information. And if you want to see all the controllers that run in the controller manager, you can just head to the pkg/controller directory of the kubernetes/kubernetes repository on GitHub. And you can see all of them listed out. They're all in separate directories. So we can see we've got certificates, cron jobs, daemon sets, deployments, replica sets, and these controllers control the life cycle of all of those different resources.
Okay, so now we've written our replica set, we've written our pod information back into etcd. We still haven't actually done anything. So this is where the scheduler comes in, the third part of our control plane. The scheduler is looking at etcd, again via the API server, to say, "Hey, do you have any pods that haven't been scheduled yet? Do you have any pods that are waiting to go somewhere?" It will read that out and it'll say, "Hey, yeah, I've got three pods, maybe, that need to be scheduled." It'll take things into account like taints and tolerations, where the pods should be scheduled, availability zones maybe, and then it'll write the name of a node into that pod and it will again write it back to etcd. So it's not actually telling a node to do anything at this point. It's just changing information then writing things back into etcd. So we can really see this flow of everything going to etcd via the API server, reads and writes.
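A minimal sketch of that scheduling step, under heavy assumptions: taints and tolerations are reduced to plain strings, and "fewest pods already assigned" stands in for the real scheduler's scoring, which is far richer. The function names and record fields are hypothetical. What the sketch does preserve is that the only effect of scheduling is writing a node name into the pod record.

```python
def tolerates(pod, node):
    """A pod fits a node only if it tolerates every taint on that node."""
    tolerations = pod.get("tolerations", [])
    return all(taint in tolerations for taint in node.get("taints", []))


def schedule(pods, nodes):
    """Write a node name into each unscheduled pod that has a feasible node."""
    load = {name: 0 for name in nodes}
    for pod in pods.values():
        if pod["node"] is not None:
            load[pod["node"]] += 1
    for pod in pods.values():
        if pod["node"] is not None:
            continue  # already scheduled
        feasible = [n for n, node in nodes.items() if tolerates(pod, node)]
        if not feasible:
            continue  # nothing fits, so the pod stays pending
        chosen = min(feasible, key=lambda n: (load[n], n))
        pod["node"] = chosen  # nothing is started here, only a record changes
        load[chosen] += 1


if __name__ == "__main__":
    nodes = {"node-a": {}, "node-b": {"taints": ["gpu"]}}
    pods = {
        "web-0": {"node": None},
        "web-1": {"node": None},
        "train-0": {"node": None, "tolerations": ["gpu"]},
    }
    schedule(pods, nodes)
    print({name: pod["node"] for name, pod in pods.items()})
```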
Now we actually need to pick something up and run it. And this is where our worker node components start to come in. So the runtime, kubelet and Docker, right. We need a container runtime, so Docker or containerd. There are other CRI-compatible runtimes as well. That needs to be running on our worker node. And kubelet's talking back to the API server and saying, "Hey, I am node A," let's say, "What pods are running on, or what pods should be scheduled on node A?" It gets that information from etcd via the API server and then it compares that with what it actually has running. So if etcd via the API server tells kubelet, "Hey, there's three pods which are scheduled to you," kubelet will look at itself and say, "Hey, I don't have any pods running, so I need to start those."
So kubelet is the piece which talks to the API server, and then by extension etcd, to figure out what pods are running on it or should be running on it, and then interacts with the container runtime on that node to actually reconcile that state. So you can kind of think of the kubelet a little bit as a control loop as well.
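That comparison between what should be running and what is running can be sketched as a set difference. This is only an illustration: the `reconcile` function and the record shapes are hypothetical, and the real kubelet works through the container runtime rather than returning lists.

```python
def reconcile(node_name, pods_from_api, running):
    """Compare pods assigned to this node with what is actually running.

    pods_from_api: pod name -> record with a "node" field (the desired state)
    running: set of pod names the local runtime reports (the actual state)
    Returns (to_start, to_stop) as sorted lists of pod names.
    """
    desired = {name for name, pod in pods_from_api.items() if pod["node"] == node_name}
    to_start = sorted(desired - running)
    to_stop = sorted(running - desired)
    return to_start, to_stop


if __name__ == "__main__":
    pods = {
        "web-0": {"node": "node-a"},
        "web-1": {"node": "node-a"},
        "web-2": {"node": "node-b"},
    }
    print(reconcile("node-a", pods, running=set()))
    print(reconcile("node-a", pods, running={"web-0", "old-job"}))
```

In the first call nothing is running yet, so both pods assigned to node-a need to be started. In the second call one pod is already up and a leftover pod that is no longer assigned shows up in the stop list.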
Now there's one other piece of the puzzle. There's one other component here which runs on all of our nodes, actually, if you're running kubeadm, because it needs to run everywhere we need networking, and that's kube-proxy. So kube-proxy is a piece which runs as a daemon set on all of our nodes. And all it's really responsible for doing is programming iptables, in most cases. Iptables is used to satisfy services in Kubernetes. And if you're interested in services in more detail, you should check out my colleague Timmy Carr's video on services. But kube-proxy really watches the Kubernetes API server, and by extension etcd, for new services and it programs iptables rules on all of the nodes so that pods can talk to each other via their IP addresses. And it will do some magic with NAT and conntrack and things like that. And that's all inside iptables. But kube-proxy is the component that's responsible for programming it, and that's why it has to run on every single node.
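One small part of that iptables "magic" can be modeled without touching iptables at all. In iptables mode, traffic for a service is spread over its backends by rules that are tried in order, each matching with some probability. The sketch below is a simplified model of that idea, not kube-proxy's code: the function names and the endpoint addresses are made up, and it only checks that the per-rule probabilities give every backend an equal share.

```python
from fractions import Fraction


def rule_probabilities(endpoints):
    """Per-rule match probability: 1/n for the first rule, 1/(n-1) for the next,
    and so on, so the last rule always matches."""
    n = len(endpoints)
    return [(ep, Fraction(1, n - i)) for i, ep in enumerate(endpoints)]


def overall_share(rules):
    """Chance that a connection ends up at each endpoint when the rules are
    tried in order and the first match wins."""
    remaining = Fraction(1)
    shares = {}
    for ep, p in rules:
        shares[ep] = remaining * p
        remaining *= 1 - p
    return shares


if __name__ == "__main__":
    # Hypothetical pod addresses behind one service.
    rules = rule_probabilities(["10.0.1.5:80", "10.0.2.7:80", "10.0.3.9:80"])
    for ep, p in rules:
        print(ep, p)
    print(overall_share(rules))
```

The rule probabilities differ from one rule to the next, but the share each backend ends up with is the same, which is why the ordering of the rules matters.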
So I hope this introduction was useful in explaining what all the pieces of the Kubernetes architecture are. Thanks for joining us and we hope to see you in another video.