KubeAcademy by VMware
Demo: Prometheus & Grafana

In this lesson we will show you how to deploy a simple Prometheus stack on a Kubernetes cluster using the Prometheus Operator, and show you how to visualize your metrics using Grafana.

Hart Hoover

Senior Field Engineer at Kong

Hart Hoover is a Senior Field Engineer at Kong. His expertise lies in technical training, consulting, community building, Linux-based operating systems, computing automation, and cloud application architecture.

Hello. My name is Hart Hoover, manager of Kubernetes Education at VMware. In this video, we're going to look at deploying the Prometheus stack inside a Kubernetes cluster. So let's take a look at all the components we need to do that. First, we're going to use a tool called kube-prometheus. It is an open-source project, part of the Prometheus Operator, that gives you a good starting point for the things you'll need to deploy Prometheus in your cluster.

It comes with the Prometheus Operator itself, a highly available Prometheus, a highly available Alertmanager, and the Prometheus node-exporter to get metrics out of the nodes in your cluster. It also includes a Prometheus adapter for the Kubernetes metrics API, which allows you to query custom metrics out of your applications, and kube-state-metrics, which gets metrics about objects in your cluster like pods, deployments, and replica sets. And finally there's Grafana itself, a dashboarding tool that allows us to graph all of our Prometheus data.

What's great about kube-prometheus is that it gives you all of this out of the box, along with an easy way to get started. You would probably want to customize these things if you were going to deploy them in production, but the project comes with a manifests directory containing all of the YAML that deploys the entire stack. So as a way to get started, you can apply everything in this directory to your cluster and it will deploy to Kubernetes. I've applied all of this, so setting aside the role-based access pieces, what do I get here?
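As a rough sketch of the "apply everything in this directory" step, the Python below only builds the `kubectl` invocations and does not run them. The three-step order (a `setup` subdirectory first, then a wait for the CRDs, then the rest) follows the kube-prometheus quickstart as I understand it and is an assumption here, not something shown in the video. The function name `deploy_commands` is hypothetical.

```python
import posixpath
from typing import List


def deploy_commands(manifests_dir: str = "manifests") -> List[List[str]]:
    """Return the kubectl invocations, in order, without executing them."""
    # Assumption: the namespace and CRDs live in a "setup" subdirectory.
    setup_dir = posixpath.join(manifests_dir, "setup")
    return [
        # 1. Create the namespace and the operator's CustomResourceDefinitions.
        ["kubectl", "apply", "--server-side", "-f", setup_dir],
        # 2. Wait until the API server has registered those CRDs.
        ["kubectl", "wait", "--for", "condition=Established",
         "--all", "CustomResourceDefinition", "--namespace=monitoring"],
        # 3. Apply the rest of the stack (Prometheus, Alertmanager, Grafana, ...).
        ["kubectl", "apply", "-f", manifests_dir],
    ]


if __name__ == "__main__":
    for cmd in deploy_commands():
        print(" ".join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where `kubectl` points at the target cluster.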

So I'm running a Kubernetes cluster here, and let's see what we've got. I've applied everything in a monitoring namespace. Here we can see that we've got two StatefulSets, one for Prometheus and one for Alertmanager. As a reminder, Prometheus stores its time series database in persistent storage, so it's managed by a StatefulSet and the data is not lost if Prometheus itself is restarted. The same goes for Alertmanager, which keeps its data in persistent storage as well. We have several ReplicaSets here that are managed by Deployments. So we've got Grafana, the graphing tool I talked about a second ago, kube-state-metrics, the Prometheus adapter for the metrics API, and then finally the Prometheus Operator itself, which provides Prometheus as a service to my cluster.

And then as part of those replica sets and deployments, several pods have been deployed. So again, like it says on the label, I have a highly available Alertmanager, a highly available Prometheus, and a node-exporter for each node in my cluster. This is a three-node cluster, so I have three of those. Then there's kube-state-metrics, the Prometheus adapter, and finally the operator. And then I have services that expose those, right? So I can access them remotely.

I'm also using port forwarding from my local terminal to bring up these services in a browser. So now that we've seen what's deployed in the cluster using kube-prometheus, let's take a look at what it looks like from a web browser. First, this is Prometheus itself. There's not much to look at when you first open it, but the power is in this expression box here, where you can execute your PromQL queries to get data out of Prometheus.
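The port forwarding mentioned here can be sketched as a small helper that builds the `kubectl port-forward` command for each UI. The service names and ports in `SERVICES` are the kube-prometheus defaults as I understand them, not values confirmed in the video, so check them against `kubectl get services` in your own cluster.

```python
from typing import Dict, List

# Assumption: default service names and ports from kube-prometheus.
SERVICES: Dict[str, int] = {
    "prometheus-k8s": 9090,     # Prometheus UI
    "alertmanager-main": 9093,  # Alertmanager UI
    "grafana": 3000,            # Grafana UI
}


def port_forward_command(service: str, namespace: str = "monitoring") -> List[str]:
    """Build (but do not run) the kubectl command that forwards one service."""
    port = SERVICES[service]
    # A single port number forwards the same port locally and remotely.
    return ["kubectl", "--namespace", namespace,
            "port-forward", f"svc/{service}", str(port)]


if __name__ == "__main__":
    for name in SERVICES:
        print(" ".join(port_forward_command(name)))
```

With the Prometheus forward running, the UI would be reachable at `http://localhost:9090` in a browser.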

If I look at all of the metrics that are available to me, there are quite a lot. I'm going to look at something with node memory; let's do active bytes. Here it shows all three nodes that I have running. This data comes from the node-exporter. As you can see, each result has the labels associated with the query and then a value, which can be graphed over time, right? So you can build your PromQL queries by putting these things together. You can also pull up a graph of the query over time to see what's been going on, which is very nice.
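The same query can be issued outside the browser through the Prometheus HTTP API. The sketch below builds the instant-query URL and parses a response of the shape that API returns. It assumes the metric picked in the demo is `node_memory_Active_bytes` and that Prometheus is port-forwarded to `localhost:9090`. The instance names and values in `SAMPLE_RESPONSE` are made up for illustration.

```python
import json
from typing import Dict
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local port-forward


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def values_by_instance(payload: dict) -> Dict[str, float]:
    """Map each series' `instance` label to its current sample value."""
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    out = {}
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        out[series["metric"].get("instance", "<none>")] = float(value)
    return out


# Hypothetical response for a three-node cluster (illustrative values only).
SAMPLE_RESPONSE = json.loads("""
{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__name__": "node_memory_Active_bytes", "instance": "node-1", "job": "node-exporter"},
   "value": [1600000000.0, "1073741824"]},
  {"metric": {"__name__": "node_memory_Active_bytes", "instance": "node-2", "job": "node-exporter"},
   "value": [1600000000.0, "2147483648"]},
  {"metric": {"__name__": "node_memory_Active_bytes", "instance": "node-3", "job": "node-exporter"},
   "value": [1600000000.0, "536870912"]}
]}}
""")

if __name__ == "__main__":
    print(instant_query_url("node_memory_Active_bytes"))
    print(values_by_instance(SAMPLE_RESPONSE))
```

Each entry in `result` pairs a label set with a `[timestamp, value]` sample, which mirrors the labels-plus-value rows shown in the expression browser.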

Also in Prometheus, you can get a list of the rules that come out of the box with kube-prometheus, so you can see how these PromQL queries are structured based on some examples. For each rule, you can see when it was last evaluated and how long it took to evaluate. You can also see any active alerts in Prometheus. I have several alerts configured, and these will show up in Alertmanager as well. I will always have this Watchdog alert firing, because it is basically testing that the alert pipeline is functional. So while this alert is firing, I can test that my alert message will get to Slack, a webhook, or an email, before removing it from the system if I wanted to.
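One way to check the pipeline from a script is to look for the Watchdog alert in the response from the Prometheus `/api/v1/alerts` endpoint. This is a minimal sketch: the helper name `watchdog_is_firing` is hypothetical, and `SAMPLE_ALERTS` is an invented payload in the shape that endpoint returns, not output captured from the demo cluster.

```python
def watchdog_is_firing(payload: dict) -> bool:
    """Return True if an alert named Watchdog is in the firing state."""
    alerts = payload.get("data", {}).get("alerts", [])
    return any(
        alert.get("labels", {}).get("alertname") == "Watchdog"
        and alert.get("state") == "firing"
        for alert in alerts
    )


# Hypothetical payload; label values are illustrative only.
SAMPLE_ALERTS = {
    "status": "success",
    "data": {
        "alerts": [
            {
                "labels": {"alertname": "Watchdog", "severity": "none"},
                "annotations": {},
                "state": "firing",
                "activeAt": "2020-01-01T00:00:00Z",
                "value": "1e+00",
            }
        ]
    },
}

if __name__ == "__main__":
    print(watchdog_is_firing(SAMPLE_ALERTS))
```

If this ever returns `False` against a live cluster, the always-on alert is missing, which suggests the alerting pipeline itself needs attention.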

Clicking over to Alertmanager, it mostly deals with alerts, right? It filters those alerts and sends them to the different services that you've configured. Again, you can see the Watchdog alert is firing, and it has no severity because it's not a real alert, just an alert to test your alerting system. And finally, we have Grafana. I have a dashboard configured here that shows metrics about my Kubernetes cluster. It shows all of the CPU and memory use in the cluster. It keeps track of my deployment replicas and how many are unavailable, in case I need to check a deployment. All of my deployments are listed here. It also shows some information about nodes. So if one of my nodes were down, this box would be red and it would show me the node that was down.

Also, if I am deploying things to this cluster a lot, this shows me how many pods are running and how many are in a failed state or a pending state. It also shows me the containers in those pods: how many are waiting or terminated, how much memory they're using, and their requests for CPU and memory. And then finally at the bottom, you can see a section for jobs. I'm not running any jobs in this cluster, so those are not applicable. Grafana has a lot of great dashboards. You can build them by hand if you want to handcraft your dashboards, or you can use one from the Grafana community, which is where this one is from.
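A pod-phase panel like this one could be backed by a PromQL query over the kube-state-metrics series `kube_pod_status_phase`. The sketch below shows one plausible query and how its result might be read. It is not necessarily the query the community dashboard uses. The helper names are hypothetical and the counts in `SAMPLE_RESULT` are invented.

```python
from typing import Dict, Optional


def pods_by_phase_query(namespace: Optional[str] = None) -> str:
    """Build a PromQL expression counting pods per phase.

    Passing a namespace narrows the count to that namespace only.
    """
    selector = f'{{namespace="{namespace}"}}' if namespace else ""
    return f"sum by (phase) (kube_pod_status_phase{selector})"


def counts_by_phase(payload: dict) -> Dict[str, int]:
    """Turn an instant-query response into a {phase: pod count} mapping."""
    return {
        series["metric"]["phase"]: int(float(series["value"][1]))
        for series in payload["data"]["result"]
    }


# Hypothetical response; the numbers are illustrative only.
SAMPLE_RESULT = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"phase": "Running"}, "value": [1600000000.0, "14"]},
        {"metric": {"phase": "Pending"}, "value": [1600000000.0, "0"]},
        {"metric": {"phase": "Failed"}, "value": [1600000000.0, "0"]},
    ]},
}

if __name__ == "__main__":
    print(pods_by_phase_query("monitoring"))
    print(counts_by_phase(SAMPLE_RESULT))
```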

You can also filter this information by namespace, which is really neat. So this is just going to show me everything in the monitoring namespace. I'll go back to all, which shows everything. I can also filter by node if I wanted to. So this has been a quick look at a Prometheus stack deployed in the cluster. Just as a reminder, if you want to get started with this really quickly, a great way to do that is to use the kube-prometheus project, start from there, and build out your monitoring stack as you need. Thank you.
