Manifold Blogcast 002: The Future of Kubernetes
Jelmer: Yeah. So blogcast number two, it’s titled The Future of Kubernetes, and we got James with us, and me. Who’ll go first? James, maybe tell us a bit about yourself, what you do at Manifold, and stuff.
James: Okay. I am James. I am the technical lead at Manifold. Right now that means a lot of time developing applications on top of Kubernetes using the Kubernetes APIs. It also means writing a lot of documents, and reviewing documents, and reviewing pull requests, and doing ops work too when I have the chance. And yeah, hopefully coding more often than not, but usually not.
Jelmer: Cool. And just as an interest question for the audience, how long have you been working with Kubernetes software?
James: Since August of last year. Almost a year, I guess I’ve been playing around with it. Not too long in the grand scheme of things really, but in the life of Kubernetes, things move fast and things change quickly, so you can jump in and as long as you pay attention, you can catch up pretty fast.
Jelmer: Yeah. Definitely. I think the core’s been stable for quite a bit as well. I think that’s a pretty great feature about Kubernetes. There’s not much changing anymore, so yeah. I guess it’s a little bit the same with Go as well, and once you have that API, you can be pretty confident that you can go to the next version without many issues.
James: Yeah. You want to tell us about yourself, Jelmer? Or did you do that in the — ?
Jelmer: I did it in the last one, but I can do it over again. So I’m a tech lead at Manifold, more specifically for the OX squad, which is the Operations Experience squad, which focuses on Kubernetes and other operations, integrations, stuff like that. Another one is Terraform, for example, and I’ve been playing with Kubernetes for roughly the same time. I guess we started at around the same time when we started discussing these things. So, yeah, that’s my very brief introduction to all of this.
So yeah, this blogcast was aimed to speak about the future of Kubernetes, but we came up with some questions that also speak about the history and how we did things at Manifold, and yeah, we’ll just have a little chat about a few topics and just go over things and see how it goes from there.
So the first one that’s on the list is, Manifold migrated — I think it was January — from our legacy system on AWS to a Kubernetes solution, and we’ve got a few questions around that. And I think the first one is actually, that came from me, and I’m very interested in that. Would you have done something differently looking back at all the stuff we’ve done, is there anything you would have changed in that perspective?
James: So our actual cutover from the legacy infrastructure to Kubernetes was really fantastic. You can read more about that on our blog (blog.manifold.co) in the post that Jelmer wrote, “Migrating to Kubernetes with zero downtime: the why and how.” But we were able to migrate without any downtime, so to run things concurrently and send some traffic to the new infrastructure, and then migrate everything over. So that part was fantastic. I wouldn’t change anything about that at all, and we were kind of hamstrung in our use of RDS and whatever the Amazon encryption thing, so moving providers was more difficult. At the time, maybe I would have pushed our AWS account manager a little more aggressively for access to EKS.
Jelmer: I think we’d released, and then two weeks later they announced it, that it was becoming privately available, but yeah. I think yeah, the migration was actually pretty successful, I guess. Well, we didn’t have any downtime, and so far, we’ve only had one issue, which was two-fold: a memory leak, and then also something unfortunately with kops not pinning down versions of images. So I think, yeah, in general, that migration was pretty smooth. I don’t know if you’ve got more to add on that, or — ?
James: Yeah. I’m just thinking about it now. We went from a system that was basically a single container per VM to higher density of however many containers would fit. We could have taken an intermediate step and increased the container density on some of our hosts first, and also looked at defining sort of some resource constraints, even just in Docker, before moving over.
Yeah. Sort of things to prepare us for the increased density. That said, it’s not like it was that large of a problem. It was only sort of one service that was having issues with its aggressive consumption of memory, or a leak.
Jelmer: Yeah. I think that the migration itself didn’t really lead to that because I think we were already aware of that memory leak before we actually moved to Kubernetes as well. We just chose to not really do much about it until it actually hurt us. So I think, yeah, the server density is actually quite interesting.
If you looked at our graphs or metrics from AWS, we chopped down so many servers, which is pretty interesting. Obviously, our servers are bigger now, but yeah, we cut down on a lot of overhead. Which also led to obviously the speed up of deployments, and I think that’s one of the main benefits that we got out of that.
James: Yeah. There’s other things too. Like when you increase instance sizes, if you stay within the same class, you sort of double your resources and you double the cost, so it’s a pretty fair trade-off, but once you get to some level and you jump to the next instance size in a class, then you go from sort of normal networking into enhanced networking, and then you’ve got actually a better network connection.
Effectively it works like a single interface that goes to your networking as well as your block storage, and then you switch to having a dedicated interface for block storage, so there’s sort of add-on benefits to increasing instance size that are pretty good.
So while before it was like we didn’t want to have massive instances for a single container, now we can sort of go up to wherever we want in that listing of instance sizes, get the extra goodness of networking, and not sort of waste the compute and memory resources.
Jelmer: Yeah. Definitely. Actually, I hadn’t thought about that yet because we get that, obviously. I think we started out on a low networking system, and now we’re on a medium I think. I don’t know how they name it, but yeah, that’s definitely something that you move towards, which is pretty interesting.
But what I found really interesting, actually, about the whole migration, is the monitoring and how easy Datadog makes it, because they have this whole prebuilt dashboard for Kubernetes, which is actually pretty cool, and you just install their DaemonSet in the cluster, and it just appears in your account. That’s pretty good.
James: Yeah. Yeah. It’ll be interesting to see how that, since we’re just starting to bring in Prometheus a bit, to see how we can connect up our Prometheus metrics with Datadog. I haven’t even looked at it. I’m sure they have an integration.
Jelmer: Actually, I looked into it a little bit because we are running StatsD. So I looked at, can we actually use StatsD to put that in Prometheus and stuff like that, and there are some connectors that do it, and I did see something mentioning Datadog as an integration for that. So we can definitely do something with that. And yeah, instead of what we do now with StatsD of pulling it out of the application, if we have all our proxies in place and stuff like that, that would be quite interesting to just use that instead of the custom bits. So yeah, that’s definitely something to look forward to.
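The StatsD-to-Prometheus bridging mentioned here can be sketched roughly as follows. This is a minimal illustration and not one of the actual connectors: the function names are hypothetical, and it only handles StatsD counters (`c`) and gauges (`g`) in the basic `name:value|type` line format, ignoring sample rates, gauge deltas, and every other metric type.

```python
import re


def parse_statsd_line(line):
    """Split a basic StatsD line 'name:value|type' into its parts."""
    name, rest = line.strip().split(":", 1)
    value, metric_type = rest.split("|")[:2]
    return name, float(value), metric_type


def to_prometheus_name(name):
    """Replace characters (dots, dashes, ...) that Prometheus names disallow."""
    return re.sub(r"[^a-zA-Z0-9_:]", "_", name)


def aggregate(lines):
    """Fold StatsD lines into a dict of current values.

    Counters accumulate; gauges keep the last value seen.
    """
    values = {}
    for line in lines:
        name, value, metric_type = parse_statsd_line(line)
        key = to_prometheus_name(name)
        if metric_type == "c":
            values[key] = values.get(key, 0.0) + value
        elif metric_type == "g":
            values[key] = value
        # Timers, sets, and other types are skipped in this sketch.
    return values


def render(values):
    """Render the values as 'name value' lines, one metric per line."""
    return "\n".join(f"{name} {value}" for name, value in sorted(values.items()))
```

The point of the sketch is the change in model: StatsD pushes individual events, while Prometheus scrapes current totals, so something in between has to keep the running state.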
James: Like the migration never ends, really. There’s always a — yeah.
Jelmer: Yeah. And I think that’s a big thing. I mentioned that in the blog post as well. Or not in the blog post, I think I actually mentioned in the previous blogcast. A lot of people don’t realize that. They go from their current infrastructure and what they currently have, and then they see Kubernetes, and then they expect that they have to go full on with everything in Kubernetes, and they have to have everything in place to have the perfect solution.
But what we did as well, we rolled out gradually. We didn’t start with resource limits, for example, we just added that sporadically on. And first thing we did is just make sure that we have our infrastructure mimicked from what we had before, and even by mimicking it, it already had a lot of improvements.
Like we had automatically, the DNS set up and the SSL certificates that we didn’t really have to think about anymore. So yeah, I think a lot of people don’t seem to realize that, that you don’t need to start out with everything in place yet. If you can just mimic what you have, it’s good enough to start with. So yeah, I think that was pretty interesting to see as well.
James: Yeah. For sure.
Jelmer: We touched on it a little bit earlier, but how excited are you about Amazon EKS, and do you think we should migrate or get that prioritized at one point?
James: I mean we’re done now. We already have our own control plane, so. No, I think having someone else handle this — comparing all of these sort of managed control planes from the big cloud providers, there’s some differences and there are some limitations on what you can do. They sort of dictate the number of nodes that you can have in your cluster, sort of minimum amounts, and definitely the size. So you are required to have sort of some minimum instance size in the reduced selection of the instance types, but if it fits, then why wouldn’t we? If it means that we have to run and take care of three fewer systems, right? And usually you also don’t have to pay for them, so the reduction in cost is good.
James: And it may be different if this was some other kind of orchestration system, like if we were looking at Mesos and Marathon here, and your control plane for Mesos was managed, and you weren’t able to add in Chronos to do periodic jobs or something like that. Then that wouldn’t be great because you want to be able to actually use it how you want to use it. But as far as I can tell, you’ve got etcd, and API server, and the controller manager and scheduler going in the sort of managed control plane, but if you want to add in extra API service for API server aggregation, or just additional controllers that are listening, you can do that no problem. And you’re able to quite easily add on additional functionality into any of the managed providers. So it seems like as long as the pricing is right for the instances, there’s sort of all the benefits and no real downsides.
Jelmer: Yeah. Definitely. I actually haven’t looked at it. Is the pricing any different for EKS for the nodes than it is for regular nodes, or — ?
James: The prices are usually the same. I was able to see the pricing for it, I think back in February, but for your nodes, you pay the same price. You usually don’t pay for the control plane, but they limit the instance classes that you can use. So even though the nodes that we use now are, say, I think they’re medium level M4s — actually, they might be M4 large, so we might fit into that — but you wouldn’t be able to use the small node instances, even though they could quite happily run Kubernetes, run the kubelet.
Jelmer: Yeah. They just limit it. That kind of means that you can’t really do very small clusters, I guess. Like you would be able to do with kops and stuff like that. What you mentioned earlier with the whole extra controllers and stuff like that, it’s obviously an interesting point, and it also brings us to another question on the list. I don’t know if that was why you mentioned it, but the whole custom resources, and personally, I think that’s obviously one of the best things about Kubernetes so far for me. Because I’ve been working with custom resources since, well, January now as well, that’s the only thing I’ve been doing. And it is quite interesting, and the capabilities that you have with it are pretty amazing, and I know you wrote a blog post about that as well, what’s still missing and stuff like that. So maybe if you can go over that quickly in this one as well. I can bring it up if you want.
James: No. I’m good. Yeah. Custom resources and custom resource definitions are — it’s something that on the surface is, I remember when I first looked at them, I was like, “Well, why do you want this? Why would a REST API let you define a new type of object within that API?” And even when you look at the documentation for it, it’s really just like, “Hey, I’m adding this new type to the API.” End of story. What do you do with it, right? But it turns out that what you do with it is what you can do with any other resource in the system. So you can define any logic you want that sort of listens for the creation and modification and deletion of these custom resources. So that’s where we do things like, have one that is a composite of other resources and you can define one thing that’s tiny and have it sort of spit out a deployment, and a matching service, and network policies, and everything else, which is going to be pretty powerful. But you can also use it to do things like add configuration on how you might modify other resources in the system, like policies for the horizontal pod autoscaler are their own resources, right? I don’t think they’re custom resources, but they could be.
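The "one tiny thing that spits out a deployment, a matching service, and network policies" idea can be sketched as a pure expansion function. This is only an illustration under stated assumptions: the `WebApp`-style input shape and its `image`, `port`, and `replicas` fields are hypothetical, and a real controller would additionally watch the API for create, update, and delete events and apply the results.

```python
def expand_composite(resource):
    """Expand a small, hypothetical composite resource into the manifests
    a controller would create on its behalf."""
    name = resource["metadata"]["name"]
    namespace = resource["metadata"].get("namespace", "default")
    spec = resource["spec"]
    labels = {"app": name}
    meta = {"name": name, "namespace": namespace, "labels": labels}

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": dict(meta),
        "spec": {
            "replicas": spec.get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": spec["image"],
                    "ports": [{"containerPort": spec["port"]}],
                }]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": dict(meta),
        "spec": {
            "selector": labels,
            "ports": [{"port": spec["port"], "targetPort": spec["port"]}],
        },
    }
    # Only allow ingress traffic to the one port the app declares.
    network_policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": dict(meta),
        "spec": {
            "podSelector": {"matchLabels": labels},
            "policyTypes": ["Ingress"],
            "ingress": [{"ports": [{"port": spec["port"]}]}],
        },
    }
    return [deployment, service, network_policy]
```

For example, `expand_composite({"metadata": {"name": "api"}, "spec": {"image": "example/api:1.0", "port": 8080}})` returns three manifests that all share the same `app: api` label.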
Jelmer: I don’t think they’re custom because I think the autoscaler is built in, so it might be a built-in thing. Yeah. But yeah, I think that’s all pretty interesting, especially the modification as well. We’ve seen, and we’re experimenting with, what you can do with all of that. And yeah, just the fact that you have this amazing tool that you can actually ship to different cloud providers and stuff like that, and then actually build your own custom logic, or business logic on top of that, I find very, very powerful. And yeah, we’ve seen some great implementations. The one I’m most excited about is the whole Let’s Encrypt cert-manager from Jetstack. The work they’ve been doing on that I find pretty amazing because it just enables so many people to just — they don’t have to think about it, and they can get SSL certificates in their cluster, which is pretty cool. Is there anything that you’re excited about within the whole CRD scope that you’re thinking like, “Ah. This is going to be cool”?
James: So the big thing that I’m most excited about, and it’s also still the largest piece that’s missing, is an upgrade path for multiple versions. So in Kubernetes, there’s a very strong backwards and forwards compatibility guarantee for resources. So you can sort of ask for the v1beta1 version of a deployment, and it might be perhaps v1 in your cluster, and you’ll still be able to read it right now. So it’s nice there’s sort of mechanisms for translating between versions. So any clients consuming things, which would be like kubectl as well as other controllers in the system, don’t need to all be updated in one go. It also means you can make containers and ship them around and not necessarily worry about the API that’s on the cluster they’re going in. So having that upgrade mechanism, or sort of translation layer, for CRDs has been a somewhat long-standing ask, but also when it’s done it will be really awesome. Then you kind of don’t have to worry about, “Oh. What if I made a mistake in how I defined the CRD?” Well, it doesn’t matter. You can put in the upgrade path later.
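The translation layer between versions can be sketched as a pair of conversion functions. This is a sketch under assumptions, not how Kubernetes implements it: the `example.com` group, the `Backend` kind, and its fields are all hypothetical. Here `v1beta1` stores `host` and `port` separately, while `v1` merges them into a single `address` field.

```python
import copy

GROUP = "example.com"  # hypothetical API group


def _v1beta1_to_v1(obj):
    out = copy.deepcopy(obj)
    spec = out["spec"]
    spec["address"] = f'{spec.pop("host")}:{spec.pop("port")}'
    out["apiVersion"] = f"{GROUP}/v1"
    return out


def _v1_to_v1beta1(obj):
    out = copy.deepcopy(obj)
    spec = out["spec"]
    host, port = spec.pop("address").rsplit(":", 1)
    spec["host"], spec["port"] = host, int(port)
    out["apiVersion"] = f"{GROUP}/v1beta1"
    return out


CONVERTERS = {
    ("v1beta1", "v1"): _v1beta1_to_v1,
    ("v1", "v1beta1"): _v1_to_v1beta1,
}


def convert(obj, target_version):
    """Return a copy of obj expressed in target_version."""
    current = obj["apiVersion"].split("/", 1)[1]
    if current == target_version:
        return copy.deepcopy(obj)
    return CONVERTERS[(current, target_version)](obj)
```

The property that matters is that the round trip is lossless: a client that only understands `v1beta1` can read and write an object stored as `v1` without anything being dropped along the way.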
Jelmer: Yeah. That’s something I noticed with the work we’re doing now. It’s like, “Well, if we mess up the definition now, then we’ve got to change it all again.” Which for us isn’t really that big of a deal. The new things that we’re making are all just running in development and stuff like that, so it’s not that critical. If we need to wipe it and redo it, it’s fine, but yeah, that upgrade path for critical bits is — yeah, it would be quite cool.
James: So with that, and there’s already some validation with OpenAPI schema, and then if you need complicated validation logic, you can use admission webhooks. Oh, and then there’s also something new that’s like still sort of on the development branch as a way to annotate fields in a CRD as things that should be shown when you do a kubectl kind of list. So that’s pretty awesome. The difference between a CRD and a sort of intrinsic resource is shrinking all the time, and you could really see a point where many of the resources that are defined within the Kubernetes core just become implemented as CRDs instead. So that you get to the point where the only actual resource defined in Kubernetes is the custom resource definition, and everything else is just done as custom resources.
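To make the schema validation and the list columns concrete, here is a sketch that builds a CRD manifest as plain data. It uses the layout of the later `apiextensions.k8s.io/v1` API, which postdates this conversation, and the `WebApp` kind, the `example.com` group, and the fields are hypothetical.

```python
import json


def build_crd():
    """Build a CRD manifest with an OpenAPI schema and extra list columns."""
    schema = {
        "type": "object",
        "properties": {
            "spec": {
                "type": "object",
                "required": ["image", "port"],
                "properties": {
                    "image": {"type": "string"},
                    "port": {"type": "integer", "minimum": 1, "maximum": 65535},
                    "replicas": {"type": "integer", "minimum": 0},
                },
            },
        },
    }
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # The name must be <plural>.<group>.
        "metadata": {"name": "webapps.example.com"},
        "spec": {
            "group": "example.com",
            "scope": "Namespaced",
            "names": {"plural": "webapps", "singular": "webapp", "kind": "WebApp"},
            "versions": [{
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": schema},
                # Fields to show as columns when the resources are listed.
                "additionalPrinterColumns": [
                    {"name": "Image", "type": "string", "jsonPath": ".spec.image"},
                    {"name": "Replicas", "type": "integer", "jsonPath": ".spec.replicas"},
                ],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_crd(), indent=2))
```

Because the columns live in the resource definition itself, any client that lists the resource can show them without being modified.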
Jelmer: Then it would actually be possible to say, “I’m just going to deploy a cluster that only knows about pods, and nothing else.” And then people can only deploy pods and nothing else anymore.
James: Yeah. It might be dangerous but conceptually it’s really neat.
Jelmer: Yeah. I don’t remember what exact — I think it might have been the auto. We were thinking about moving it into a custom resource. I don’t want to definitely say that was the one, but I was reading some documentation, and they were considering it, moving one of the core bits already to a custom resource, or one of the alpha extension parts, and moving that to a custom resource, and I think it definitely makes sense. If you have something that gets developed by the core maintainers, and then they keep maintaining that, and then they see, “Oh, well, this custom resource bit needs some adjusting,” that will make it easier for everyone else as well. And I’m very excited about what you mention as well, the listing bit where you can actually show values, yeah. That’s going to be pretty dope to see some more state information on deployments, for example.
James: Yeah. Yeah. It’s really cool to have a way to influence kubectl without having to actually modify it itself, so you don’t have to worry about people installing plug-ins or something on every machine they might use.
Jelmer: Yeah. Definitely. Sorry, I’m just blanking out a bit. I see that you’ve also written down the API server aggregation for external APIs. I don’t know if you want to go through that a little bit?
James: Yeah. Yeah. Super related, right, the two paths for adding new resources into a cluster are either to use a custom resource definition, which is stored within the API, so the main API server, or the API server can also act as a proxy to other API servers with their resources under an API group. So like everything’s namespaced.
And I always looked at this and thought, “Wow. Okay. So now I’ve got to build a complete API server infrastructure with — “ Since you’re probably duplicating Kubernetes code, you’d have your own etcd cluster behind it. You’d kind of have to depend on having one because you may not have access to the control plane, like in the case of EKS, so then you’ve got to worry about what you want your availability guarantees to be, and durability of data, and everything. It sounded like a lot of work, but just looking at the metric server recently, right, which is a very small little server that only handles core metrics for pods and nodes, and it’s just done in memory, and it just sort of delegates to essentially scraping from the kubelets and their sort of metrics information.
And if you have something that is sort of on demand for viewing this data using an API server that’s then aggregated into the main API server, it’s kind of neat for that.
James: So you could see using that for something like, I don’t know. Just if you wanted to have on-call information in your cluster for whatever reason, you could have an API server that is aggregated into the core one, and any time someone asks for some of these on-call resources, it can actually go out and talk to PagerDuty and say, “Who’s currently in the schedule?” And then return their resources for that. Likewise, you could also have a CRD that defines that, and then have a controller that periodically synchronizes it in. But there’s some interesting use cases, I think, for just having a sort of a stateless or a sort of non-persistent API server that’s aggregated.
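The two options for the on-call example can be contrasted in a short sketch. Everything here is hypothetical: `OnCallSource` stands in for an external scheduling service, the `oncall.example.com` group is made up, and neither class talks to a real API server. It only shows where the state lives in each design.

```python
class OnCallSource:
    """Stand-in for an external scheduling service (hypothetical)."""

    def __init__(self, schedule):
        self.schedule = schedule  # team name -> person currently on call

    def current(self):
        return dict(self.schedule)


def list_on_demand(source):
    """Aggregated-API style: build each response from a live lookup and
    store nothing."""
    return {
        "apiVersion": "oncall.example.com/v1",
        "kind": "OnCallList",
        "items": [
            {"metadata": {"name": team}, "status": {"person": person}}
            for team, person in sorted(source.current().items())
        ],
    }


class SyncedStore:
    """CRD-plus-controller style: a periodic sync copies the external data
    into stored objects, and reads are served from that copy."""

    def __init__(self, source):
        self.source = source
        self.objects = {}

    def sync(self):
        self.objects = self.source.current()

    def list(self):
        return dict(self.objects)
```

The on-demand version is always current but depends on the external service for every read. The synced version keeps answering when the service is down, at the cost of being stale between syncs.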
Jelmer: Yeah. Definitely. So I don’t know, is that something towards a virtual kubelet, or is that a totally different topic? Because I got a feeling it might be related, but I’m not sure.
James: No. Not really.
Jelmer: Okay. Never mind then. Well, we can go into the virtual kubelet anyway. Because I have no idea what it is, although I mentioned it before.
James: Sure. Yeah. So if you look at like EKS, or AKS, or GKE, or any other TLA or KaaS, you’ve got these managed systems where the control plane is — you can kind of assume what it will be.
It’s probably stock Kubernetes, and it’s probably running on three different hosts, right? And they’ve probably spread them across different availability zones. But it doesn’t have to be that way. What matters is the API that’s exposed to you for your control plane. You want to be able to register, you want to be able to list objects and watch changes, and you expect that the kubelets you’re running on your virtual machines will be able to connect and figure out what’s been scheduled for them and go like that, and you expect some parts of the control plane may poke into the kubelets themselves and see what’s going on.
But that could be implemented as — let’s say there’s three super beefy machines that actually appear to be separated per user, but there’s actually some kind of multi-tenancy going on that one of the cloud providers has written themselves. It’s obviously not in Kubernetes, but they’re using it to sort of increase the density that they have for users of it, instead of having to spin up three instances for every customer. So you’ve effectively got a virtual control plane, or at least in most of these managed cases, they’d say it’s an opaque control plane. You’re not really sure what’s in there.
Jelmer: Yeah. Okay. Interesting.
James: So if you tie to that a virtual kubelet in that case, you’re probably setting it up so you kind of know what’s going on behind the scenes, but this is like a process that runs somewhere, and connects to your control plane, and says, “Hey. I’m a kubelet, I’m a node here, and I’ve got infinite resources. So just schedule stuff to me, and I’m probably always going to be able to run it.” And then it sees when pods are scheduled to it, and for writing one of these there’s some ridiculous small number of API endpoints you have to implement. Like five or less, or something.
And you just need to say, “All right. So I’ve got a pod here. Now I’ll go and actually execute it.” And then there’s some methods for shutting it down or getting status on it, but the implementation of this might have it go and run on I think sort of the two starting ones for Hyper.sh and Azure’s sort of control container thing, I always forget the name of it, but basically you can just run containers in this sort of big amorphous blob of compute resources.
And so hooking up a virtual kubelet into Kubernetes means that you don’t really know where or what your control plane is, but you have an API that you can access. And then you don’t really know what or where your nodes are, you’ve just got one big pool of resources available. So you put this together and you’ve just got an API for running whatever you want and that API matches Kubernetes’ API, but the implementation underneath it sort of doesn’t matter and could be infinitely scalable.
So it’s really neat. Also, in some cases, you’d probably want to actually know where things are running so that you can have good data locality. But maybe you don’t care and it’s helpful.
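The shape of what James describes, a process that claims to be a node and hands scheduled pods to some other backend, can be sketched like this. The method names are illustrative and are not the actual virtual kubelet provider interface. The in-memory provider stands in for a real container service, and `reconcile` stands in for watching the API server.

```python
class InMemoryProvider:
    """Pretends to run pods on some external pool of compute."""

    def __init__(self):
        self._pods = {}

    def create_pod(self, pod):
        self._pods[_key(pod)] = "Running"

    def delete_pod(self, pod):
        self._pods.pop(_key(pod), None)

    def get_pod_status(self, namespace, name):
        return self._pods.get((namespace, name), "Unknown")

    def capacity(self):
        # "Infinite resources", expressed as very large numbers.
        return {"cpu": "10000", "memory": "100Ti", "pods": "100000"}


def _key(pod):
    meta = pod["metadata"]
    return (meta.get("namespace", "default"), meta["name"])


def reconcile(provider, node_name, scheduled_pods, known):
    """Start pods newly bound to node_name and stop pods that went away.

    `known` maps (namespace, name) to the pods seen on the previous pass;
    the updated mapping is returned for the next pass.
    """
    wanted = {
        _key(p): p for p in scheduled_pods
        if p["spec"].get("nodeName") == node_name
    }
    for key, pod in wanted.items():
        if key not in known:
            provider.create_pod(pod)
    for key in set(known) - set(wanted):
        provider.delete_pod(known[key])
    return wanted
```

Swapping `InMemoryProvider` for something that calls a hosted container service is the whole idea: the scheduler sees one very large node and never learns what is behind it.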
Jelmer: Yeah. It sounds very interesting for on-demand things, or serverless, and stuff like that, but I don’t know. I thought that Amazon had something similar to Lambda, but more for containers. Forgot the name, maybe like —
Jelmer: Yeah. That one. Sounds pretty interesting for things like that as well. Yeah. I’ll definitely have to look further in it. Is there anything else you want to talk about?
James: I would like to hear your thoughts on federation, Jelmer.
Jelmer: Oh, do you? So yeah, I think federation is actually quite an interesting one. Yeah. It’s been tried before, there was this whole topic about federation, I think a year and a half ago that I started noticing things about federation. These things really interest me, like how do you get things from one cluster to another? Or how do you go from one cloud provider to another, right? I think that’s the dream, and especially what we do at Manifold as well, like the marketplace app we run, is to become cloud independent. And at the moment if you host yourself at Amazon, you’re still kind of in the Amazon environment. Obviously, if you have Kubernetes, well, you take that whole infrastructure away, right. You don’t have to think about EC2 instances anymore at the Kubernetes level. You still have to think about it when you set up Kubernetes, but what if you can take all of that away as well and actually just move over from one cloud to the other. For example, if there’s an outage on Amazon — I think we saw an issue like a year and a half ago as well, Amazon had issues with S3 and a lot of people had issues with that. But what if you take everything that you have then and just run it on another cloud, you don’t have issues. And that just takes high availability from multi-region to multi-provider basically, which is pretty interesting.
Jelmer: So, yeah. I’m very interested in that, and yeah, it’s just there’s so much work to do it. And even for us when we did the migration, we started looking at moving to another provider as well, and then well, we started noticing — I think our biggest blocker was basically a database, and also related to that is our KMS, or encryption layer, and how can we tackle that?
I think we did a good job with how we set up our encryption layers and stuff like that, where we still have to scale that across different providers, and I think for people that’s going to be the biggest blocker. Plus how do you do load balancing across different providers, as well, right? I know that Google recently announced an interesting project as well, where they load balance across multiple regions. I forgot what it was called. I’ll try and find it to link it in the post, but yeah, it’s things like that. How will you do that across Azure, AWS, or DigitalOcean, stuff like that? Yeah. I think that will be interesting to see how this gets solved. So yeah.
James: I saw, I believe it was a talk that was — I mean it doesn’t strictly handle multi-cluster federation, but their recommendation for doing multi-cloud was to run your control plane in one cloud and actually connect nodes from another cloud, another cloud provider, completely different region to that control plane. And then you could set up taints on your nodes to actually say like, “This one is on this provider and it’s in this region.” It’s certainly an interesting approach, and if you lose connection to the control plane, these nodes are going to just keep going, doing what they were doing and running their workloads. But it does still feel a little frightening.
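Marking nodes by provider with taints means a pod only lands on them if it carries a matching toleration. The matching rule can be sketched as below. This is a simplified model of the scheduler's check, and the `provider` taint key and its values are hypothetical.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if operator == "Exists":
        # An Exists toleration without a key tolerates every taint.
        return not key or key == taint["key"]
    same_value = (toleration.get("value") or "") == (taint.get("value") or "")
    return key == taint["key"] and same_value


def can_schedule(pod_tolerations, node_taints):
    """A pod fits a node if every hard taint on the node is tolerated.

    PreferNoSchedule is only a preference, so it never blocks here.
    """
    blocking = [
        t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")
    ]
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in blocking
    )
```

With a node tainted `provider=gcp:NoSchedule`, a pod with no tolerations stays off it, while a pod that tolerates `provider=gcp` may be placed there. Taints only keep pods away; steering a pod toward a particular provider or region would additionally need labels with a node selector or affinity rule.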
Jelmer: Yeah. Definitely. It’s that thing of I mean high availability with just how much effort do you want to put into it? And I think for most people, you don’t need to put that much effort into it. I think a lot of people overestimate how important it is to have X amount of nines. But if you are at that scale, and if you have to reach that, then it does feel scary of like, “Oh, we’ll just put that control plane in one region, or one provider, and then just hope that everything else is fine.” Yeah. That still sounds scary, but it’s the way forward, I guess, and it’s somewhere to start.
James: I wonder if you could sort of re-parent nodes. That would be interesting. So you could start by just bringing up nodes in a new provider and then, later on, bring in their own control plane, and just sort of adopt the nodes.
Jelmer: Yeah. Basically, move the masters from one provider to another basically.
James: Well, still keep two control planes, just say that you had control plane here with some nodes, and then nodes over here, and you brought up a new control plane and put the nodes onto that one.
Jelmer: Oh, yeah. Like that. Huh. Interesting.
James: Probably a lot of work for very little benefit.
Jelmer: Yeah. Well, that’s again the question I think you mentioned at the beginning is you have to think about all these things and if you want to go to that scale it just becomes harder and harder to solve. So yeah. It’s all very exciting though, and I think we’ll see a lot more exciting stuff in the future. I don’t know if there was anything else that you wanted to touch on.
James: I don’t think so.
Jelmer: I think we covered most of what we wanted to talk about anyway.
Jelmer: Okay. Cool. Well, thank you for your time, and thanks for the people listening and reading the blog post as well, and hopefully, we can do this more.
Jelmer: All right.
James: All right.
James: Bye, Jelmer.