Having a solution for backing up and restoring objects in your Kubernetes cluster is important for resilience. In this lesson we look at using the popular OSS tool Velero to do just that.
Hi! My name is John and I'm a Senior Cloud Native Architect @ VMware where I work with customers to help design and implement Kubernetes solutions, in addition to contributing back to upstream open source projects.
Hey everyone, welcome to KubeAcademy. My name is John Harris and I’m a senior cloud native architect at VMware. In today’s episode we’re going to look at backup and restore in your Kubernetes cluster. Hopefully you’re already familiar with backing up and restoring things and where that might be useful. We may have a case where our application goes down, or a cluster gets destroyed for some reason, or some things get corrupted. You might want disaster recovery, right? We want to take a backup of that stuff, and then we want to be able to restore it in another cluster. We also might want to do something like cluster migration, if you want to move from one cluster to another. Say we’ve got a production cluster and we want to take a copy of that and move it to a staging cluster so we can do some testing.
And you might also have regulatory reasons why you need to keep backups of certain applications or certain clusters for a specific amount of time. If you’ve watched some of the other KubeAcademy videos, especially the ones on the architecture and layout of Kubernetes, you might already know that all of Kubernetes’ state is stored in etcd. That’s the stateful store that backs the API server. Back in the long ago, what we really wanted to do was back up etcd, because we know that everything stateful lives there. So we could go into our etcd node, take a snapshot of the etcd data store, and then keep that somewhere. We could replicate it, we could make sure it’s safe. And then if we ever wanted to do a restore or migration, we could spin up a new cluster, go back into our etcd node and restore from that snapshot backup.
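To make the etcd snapshot step concrete, here is a minimal sketch that only builds and prints the `etcdctl snapshot save` command line; it does not run anything. The endpoint, the certificate directory (the layout typically seen on kubeadm clusters) and the snapshot path are assumptions for illustration, not values from the video.

```python
import shlex


def etcd_snapshot_command(snapshot_path,
                          endpoint="https://127.0.0.1:2379",
                          pki_dir="/etc/kubernetes/pki/etcd"):
    """Build the etcdctl argv that saves a snapshot of the etcd data store.

    endpoint and pki_dir are assumed defaults; adjust them for your cluster.
    """
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
        "snapshot", "save", snapshot_path,
    ]


# Hypothetical destination path for the snapshot file.
command = etcd_snapshot_command("/var/backups/etcd-snapshot.db")
print(shlex.join(command))
```

The printed command would be run on the etcd node itself, which is exactly the kind of access that is needed for this approach.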
That’s still a fine and valid way of backing up Kubernetes. There are a couple of issues with it though. The first is that it’s kind of tricky to operate etcd, and there are a couple of tricky steps to take that snapshot and restore it properly, so you do need to know what you’re doing, which is fine. But one of the bigger ones is that there’s now a big rise in managed clusters, right? GKE, EKS, AKS: they just don’t allow you to go in and fiddle around with etcd. The data store for Kubernetes is abstracted behind the cloud provider’s interface. You don’t have access to tweak API server flags, things like that. So you just don’t have access to etcd to take a backup of it. Today we’re going to talk about a different tool, Velero, which used to be called Ark. It originally came out of Heptio and is now developed by VMware.
It’s free and open source. Anyone can download it, anyone can use it for free, anyone can contribute. It’s on GitHub. I’ll put the links to Velero and the documentation down in the video description afterwards. What Velero can do is take backups of everything in your Kubernetes cluster, but it doesn’t talk directly to etcd. It talks through the API server, so it actually pulls out all of those Kubernetes native objects. You can do partial backups: you can tell it to only pick things in a particular namespace, or only pick things with a particular label or set of labels. You can be quite granular about it. It also works in managed clusters, because it’s just talking to the API server like everything else; it doesn’t need access to etcd. It works great on things like EKS and GKE.
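The idea of a partial backup, picking objects by namespace or by label, can be illustrated with a small filter over object metadata. This is only a sketch of the selection concept, not Velero’s implementation; the object list, namespaces (`team-a`, `team-b`) and labels are hypothetical.

```python
def selected(obj, namespaces=None, labels=None):
    """Return True if obj is in one of the namespaces and carries all labels.

    A namespaces value of None means "any namespace"; labels of None means
    "no label requirement".
    """
    meta = obj["metadata"]
    if namespaces is not None and meta.get("namespace") not in namespaces:
        return False
    have = meta.get("labels", {})
    return all(have.get(key) == value for key, value in (labels or {}).items())


# Hypothetical objects, shaped like what the API server returns.
objects = [
    {"kind": "Deployment",
     "metadata": {"name": "web", "namespace": "team-a", "labels": {"app": "web"}}},
    {"kind": "Service",
     "metadata": {"name": "db", "namespace": "team-a", "labels": {"app": "db"}}},
    {"kind": "Deployment",
     "metadata": {"name": "web", "namespace": "team-b", "labels": {"app": "web"}}},
]

by_label = [o for o in objects if selected(o, labels={"app": "web"})]
by_both = [o for o in objects
           if selected(o, namespaces={"team-a"}, labels={"app": "web"})]
print(len(by_label), len(by_both))
```

Filtering by label alone keeps the `web` objects from both namespaces, while adding a namespace narrows the selection further.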
We can also back up things at a different cadence, right? If one application needs to be backed up every night and another application needs to be backed up every month, we can set up different backup objects to pull things from different places, and we can store them in different areas. It’s pretty flexible. Another thing Velero can do is back up your persistent volumes. Let’s say I’ve got an application running in Amazon with a persistent volume on EBS. Velero will go and look at that EBS volume, take a snapshot of it, and back that up for you as well. It has some native cloud provider integrations. If there isn’t a native provider integration for the cloud you’re in or the thing that you want to back up, Velero can use a tool called Restic, an open source backup tool, to back up those volumes as well.
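Backing up different applications at different cadences maps onto Velero’s `Schedule` custom resource, which pairs a cron expression with a backup template. The sketch below builds two such objects as plain dictionaries and prints them as JSON, which `kubectl apply -f` also accepts. The schedule names, labels and cron times are hypothetical, and the `velero` namespace assumes a default installation.

```python
import json


def schedule_manifest(name, cron, selector, ttl="720h0m0s"):
    """Build a Velero Schedule object: a cron expression plus a backup template."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},  # assumed install namespace
        "spec": {
            "schedule": cron,  # standard five-field cron expression
            "template": {
                "labelSelector": {"matchLabels": selector},
                "ttl": ttl,  # how long each resulting backup is kept
            },
        },
    }


# Hypothetical applications: one backed up nightly, one monthly.
nightly = schedule_manifest("orders-nightly", "0 1 * * *", {"app": "orders"})
monthly = schedule_manifest("reports-monthly", "0 1 1 * *", {"app": "reports"})

print(json.dumps([nightly, monthly], indent=2))
```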
So it’s pretty flexible. Let’s dive in and look at this cluster I have running right here. I have Velero already installed, so I can do a get all in my velero namespace, and we can see I’ve got a Velero deployment. It’s an operator that just runs in the cluster, and it sets up a few CRDs. One of those is backups. If I do a get backups, nothing comes back, which means I haven’t submitted any backups yet. What I’m going to do is apply a new application to this cluster. I’m going to use kuard, the Kubernetes Up and Running demo application, which came out of the book. We can see it’s just two replicas here, and it uses this label, app equals kuard. I’m going to create a new namespace called my-app first, and then apply this to my cluster.
I have a kubectl alias, k, here. I apply kuard and put it in the namespace my-app. We apply that, and then we can get deployments in the my-app namespace, and we can see that’s up and running, great. Now what I’m going to do is tell Velero to create a backup of everything with that app equals kuard label. Let’s do a velero backup create. Here we go. What this is doing: Velero also comes as a CLI, which is going to talk … It’s just going to create the custom resource objects and submit them to the cluster using your kubeconfig under the hood. You could do all of this with native Kubernetes objects and kubectl, but the Velero CLI is a convenience for some of those things. Here we’re going to create a backup, kuard-backup, and use the selector app equals kuard.
If I put a -o yaml at the end of this, it’s just going to print out what it would submit rather than actually doing it. We can see it’s got a TTL in there, which is how long it’s going to keep the backup data for. We can see it’s going to include all namespaces. It’s super interesting. So, right, let’s go ahead and submit that. Now it’s telling us we can look at the logs for this backup. Let’s do a logs for this, and we see all the logs of what’s going on. It’s finding everything that matches that selector and pulling it all down, so it’s kind of difficult to read. But now if I do a get backups, I can see that there’s a backup called kuard-backup and the age is 19 seconds. When I set Velero up, I told it to use an S3 bucket as the place to keep all of my backups. I can use the S3 CLI. I’m just going to do a listing of my Velero testing bucket, and I can see in there I’ve got a folder called backups.
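The object that the CLI submits is a Velero `Backup` custom resource. As a rough sketch of its shape, the snippet below builds one with a label selector, an all-namespaces wildcard and a TTL, and prints it as JSON. The backup name `kuard-backup` is an assumed spelling of the name used in the demo, the `velero` namespace assumes a default installation, and the real CLI output carries additional fields that are omitted here.

```python
import json


def backup_manifest(name, selector, namespaces=("*",), ttl="720h0m0s"):
    """Build a Velero Backup object selecting resources by label."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        "metadata": {"name": name, "namespace": "velero"},  # assumed install namespace
        "spec": {
            "includedNamespaces": list(namespaces),  # "*" means every namespace
            "labelSelector": {"matchLabels": selector},
            "ttl": ttl,  # how long the backup data is kept
        },
    }


# Assumed spelling of the backup name and label from the demo.
backup = backup_manifest("kuard-backup", {"app": "kuard"})
print(json.dumps(backup, indent=2))
```

Because this is an ordinary Kubernetes object, it could equally be saved to a file and submitted with `kubectl apply -f`, which is what the narration means by doing it all with kubectl.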
So if I take a look inside there, we can see my kuard-backup. We’ll take an ls of that, and we can see this kuard-backup.tar.gz in here. This is a backup of all the resources that matched that selector, app equals kuard. What we’re going to do now is delete the namespace my-app. I’m going to do that now actually, because it takes a while with the finalizers. Right, that’s deleting. Once that namespace is deleted, I’m going to use Velero to do a restore of that backup object back into my cluster. My kuard pods are gone; the replica set, the deployment, everything that was in that YAML has gone, along with the entire my-app namespace. Now if I do a get namespaces, my-app is completely gone, right? What I need to do is a restore. I’m going to do a velero restore.
Yeah, let’s do that so we can see it. Okay. I’m going to tell Velero, “Hey, do a new restore and create it from this backup, kuard-backup.” It’s going to look and find that, pull it out of S3, and restore those things as well. So I hit that, and we can see that Velero has submitted that request. Again, Velero works like anything else in Kubernetes. It’s a declarative system: the CLI just submits a request object, and then the operator is going to go look at those objects, same as with backups, and go do some action. Again, I could look at the logs. I could also describe the restore here, so let’s do that.
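The request object for a restore is a Velero `Restore` custom resource that points at a backup by name. A minimal sketch follows; the restore name `kuard-restore` is hypothetical (the CLI generates its own name), and the `velero` namespace again assumes a default installation.

```python
import json


def restore_manifest(name, backup_name):
    """Build a Velero Restore object that restores from the named backup."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": "velero"},  # assumed install namespace
        "spec": {"backupName": backup_name},
    }


# Hypothetical restore name; the backup name is the assumed spelling from the demo.
restore = restore_manifest("kuard-restore", "kuard-backup")
print(json.dumps(restore, indent=2))
```

Once an object like this exists in the cluster, the Velero operator picks it up and performs the restore, which is the declarative pattern described above.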
Oh my goodness. There we go. We can see phase completed. Okay, so this kuard, there’s not much in it, right? I can do a get restore, and we can see there’s one, from kuard-backup. Now if I do a k get namespaces, we can see that my-app is back, and if I do a get deploy in the my-app namespace, kuard is back up and available, 39 seconds old, so restored by Velero. This is just a really brief introduction to backing up and restoring in your Kubernetes cluster using Velero. There are lots more things you can do, and it’s very granular. Like I said before, we’ll put the links to the documentation and the tool, which is free and open source so anyone can use it, down below the video. I hope you enjoyed this and that it was useful. Let us know your feedback and we’ll see you in the next video.
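As a recap of the demo, the sketch below lists the commands in the order they were run and prints them. It is a dry run by default and only executes anything if `execute=True` is passed on a machine with `kubectl` and `velero` configured. The namespace `my-app`, the backup name `kuard-backup` and the manifest file name `kuard.yaml` are assumed spellings, since the narration does not spell them out.

```python
import shlex
import subprocess

NAMESPACE = "my-app"     # assumed spelling of the demo namespace
BACKUP = "kuard-backup"  # assumed spelling of the demo backup name

STEPS = [
    ["kubectl", "create", "namespace", NAMESPACE],
    ["kubectl", "apply", "-f", "kuard.yaml", "-n", NAMESPACE],  # hypothetical file name
    ["velero", "backup", "create", BACKUP, "--selector", "app=kuard"],
    ["velero", "backup", "logs", BACKUP],
    ["velero", "backup", "get"],
    ["kubectl", "delete", "namespace", NAMESPACE],
    ["velero", "restore", "create", "--from-backup", BACKUP],
    ["velero", "restore", "get"],
    ["kubectl", "get", "deployments", "-n", NAMESPACE],
]


def run_demo(execute=False):
    """Print each step; run it too only when execute is True."""
    for step in STEPS:
        print("$", shlex.join(step))
        if execute:
            subprocess.run(step, check=True)


run_demo()
```

If the steps are ever executed rather than printed, the backup should be allowed to finish before the namespace is deleted, since backups are processed asynchronously by the operator.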
Have questions about the material in this lesson?
We’ve got answers!
Post your questions in the Kubernetes community Slack. Questions about this lesson are best suited for the #velero channel.
Have feedback about this course or lesson? We want to hear it!
Send your thoughts to KubeAcademy@VMware.com.