KubeAcademy by VMware
Logging in Kubernetes

In this lesson, you will learn about logging in a Kubernetes cluster. What kind of data should I collect? What tools are available? How are applications expected to emit log data in a Kubernetes cluster?

Hart Hoover

Senior Field Engineer at Kong

Hart Hoover is a Senior Field Engineer at Kong. His expertise lies in technical training, consulting, community building, Linux-based operating systems, computing automation, and cloud application architecture.


Hello. My name is Hart Hoover, Manager of Kubernetes Education at VMware, and in this course I'm going to go through some logging strategies in Kubernetes and discuss options you have for aggregating your logs.

As a review, containers should be logging to standard out and standard error. Every node in a Kubernetes cluster keeps these logs in /var/log/containers, and these logs are what you read when you use the kubectl logs command. Given that containers should be logging here, what strategies can we use to collect and aggregate those logs?
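As a quick sketch (this example Pod is hypothetical and not part of the lesson's demo), a container that writes to standard out has its output stored on the node under /var/log/containers, and kubectl logs reads it back:

```yaml
# Hypothetical example: a Pod whose container writes to standard out.
# The node stores this output under /var/log/containers, and
# "kubectl logs counter" returns the same lines.
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    args: [/bin/sh, -c, 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
```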

Option one is that your applications in Kubernetes all ship logs directly to some logging backend, whether that backend is vRealize Log Insight, Elasticsearch, Splunk, or some managed log service. Basically, the logic for connecting and shipping logs lives in every application. Option two utilizes what's known as the sidecar container pattern, the name coming from a sidecar attached to a motorcycle. Teams ship a container image, included with every Kubernetes deployment, that contains the logic to connect to the logging backend. The sidecar shares some type of connection with the main application container and only handles logging data.
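As a rough sketch of the sidecar pattern (the image names below are placeholders, not tools from this lesson), the application and the logging sidecar run in the same Pod and share a volume, and only the sidecar knows how to reach the logging backend:

```yaml
# Hypothetical sidecar pattern: the app writes a log file to a shared volume,
# and the sidecar container reads that file and forwards it to the backend.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging-sidecar
spec:
  volumes:
  - name: app-logs
    emptyDir: {}
  containers:
  - name: app
    image: registry.example.com/my-app:1.0        # placeholder application image
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: log-shipper
    image: registry.example.com/log-shipper:1.0   # placeholder sidecar with the backend logic
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
```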

Option three is the node agent pattern. Since container logs are already collected on each host in /var/log/containers, a Kubernetes DaemonSet runs a log collector agent that gathers data from that directory on every node. The logging agent then manages connections to the logging backend. This is ideal, as it decouples logging logic from your main applications, as long as those applications are configured to log to standard out and standard error.
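A minimal sketch of the node agent pattern (the agent image is just an example): a DaemonSet mounts the host's log directory so that one agent per node can collect and forward everything written there:

```yaml
# Hypothetical node-agent DaemonSet: one logging agent per node reads
# container logs from the host's /var/log/containers directory.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: logging
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: log-agent
        image: fluent/fluent-bit:1.9      # any node-level log collector works here
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```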

If your applications use different logging strategies and some do not log to standard out or standard error, you can combine a sidecar container and a node agent. The sidecar for these applications streams logs from the application and writes them to standard out, which ends up in /var/log/containers, where the node agent is looking for data. This is a good strategy if your applications are not yet writing to standard locations, but you want to migrate them to Kubernetes.
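Here is a sketch of that combined approach (paths and images are placeholders): the sidecar does nothing but tail the application's log file and echo it to its own standard out, where the node agent already collects it:

```yaml
# Hypothetical streaming sidecar: the legacy app writes to a file, and the
# sidecar tails that file to standard out so the node agent picks it up
# from /var/log/containers like any other container log.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  volumes:
  - name: app-logs
    emptyDir: {}
  containers:
  - name: app
    image: registry.example.com/legacy-app:1.0   # placeholder; writes /var/log/app/app.log
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: stdout-streamer
    image: busybox
    args: [/bin/sh, -c, 'tail -n +1 -F /var/log/app/app.log']
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
```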

Of your four options, the best answer of course is: it depends. The ideal solution is to have all of your containers writing to standard out and standard error, with a node agent collecting and shipping logs, so that logic is decoupled from your applications. However, that might not fit with the current state of your applications as you migrate them to Kubernetes.

So given that we know the ideal strategy for aggregating logs, what tooling can we use? First, it's no secret that logs for an application are no longer contained within the application itself. You may depend on many outside cloud services and infrastructure platforms. You may have legacy applications you don't plan to touch right away as you migrate to Kubernetes, or applications running elsewhere. You may even talk to social media platforms. Fluentd was written as a log aggregation tool to solve this problem. It's a CNCF project written in a combination of C and Ruby, and it uses JSON to unify log communication. It has a very large ecosystem of community plugins that are installed with Fluentd as Ruby gems and can be used for log aggregation as well as log data transformation. Fluentd provides middleware for your services, no matter where they are, to handle log aggregation and use a common interface for log data.

You may have also heard of Fluent Bit, which is designed as a lightweight log forwarder for edge environments. Fluent Bit and Fluentd, while sharing similar names, are a bit different. Fluent Bit is a sub-project of Fluentd rather than a standalone CNCF project, but it is open source and written entirely in C. Fluent Bit provides the same unified logging with JSON, but has a tighter focus. Both Fluentd and Fluent Bit can communicate with the Kubernetes API to add metadata to logs, such as the node they came from, or their namespace or pod.

Since Fluent Bit is smaller than Fluentd, it's much better suited to running in Kubernetes as a DaemonSet, running on every node as that logging agent. Again, our containers are logging to standard out and standard error, which lands in /var/log/containers; Fluent Bit watches this path on every host and forwards the logs to Fluentd for any transformation of the data we wish to perform.
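As a rough sketch of that Fluent Bit configuration (the ConfigMap name and the Fluentd address are assumptions), the tail input watches /var/log/containers, the kubernetes filter adds node, namespace, and pod metadata, and the forward output ships everything to Fluentd:

```yaml
# Hypothetical Fluent Bit configuration, mounted into a DaemonSet like the
# one sketched earlier. Names and addresses are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [INPUT]
        Name   tail
        Path   /var/log/containers/*.log
        Tag    kube.*

    [FILTER]
        Name   kubernetes
        Match  kube.*

    [OUTPUT]
        Name   forward
        Match  *
        Host   fluentd.logging.svc.cluster.local
        Port   24224
```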

In this example, we're using Elasticsearch for long-term storage and Kibana for visualization. You can also use node agents like Fluentd or Fluent Bit to forward logs to managed services. In that case, Fluent Bit is still used as the collection mechanism and Fluentd is used for transformation before shipping the data to vRealize Log Insight. For some services, if you require no transformation at all, Fluent Bit can ship directly to outside services without passing through Fluentd first, reducing the complexity of your logging infrastructure.
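For that direct-shipping case, only the output section changes; here is a sketch (host and settings are assumptions) that swaps the forward output for Fluent Bit's Elasticsearch output so logs skip Fluentd entirely:

```yaml
# Hypothetical output snippet: Fluent Bit writing straight to Elasticsearch.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-es-output
  namespace: logging
data:
  output-elasticsearch.conf: |
    [OUTPUT]
        Name             es
        Match            *
        Host             elasticsearch.logging.svc.cluster.local
        Port             9200
        Logstash_Format  On
        Replace_Dots     On
```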

Once your log data is aggregated, you can start to visualize it and get context from it. Structure queries around a particular node or Kubernetes namespace, or even a Kubernetes pod. You can also use Kubernetes labels to query data across services for a complete view of an application. Hopefully you've learned about different log aggregation strategies and some tools that can give your teams visibility into your log data. In the next video, we'll show a demo of Fluent Bit, Elasticsearch, and Kibana to see how this data is collected and can be queried. Thank you.
