Over the last seven years, containers and Kubernetes have gone from hipster technology to core pieces of enterprise digital transformation.
But this revolution is incomplete, and it’s likely that even more change will come in the next seven years than came in the previous seven.
We’ll take a look at where we are and where we might be going in the cloud-native journey.
Hi there, I’m Brendan Burns, Corporate Vice President at Microsoft Azure and co-founder of the Kubernetes open source project. I’m here today to share some prognostications about the future of containers, and I’m really excited to have this opportunity to give you my perspective on where we’re headed in the world of Kubernetes and containers.
Well, first I wanted to start out with a warning: these are really just going to be my opinions. I don’t want them to necessarily become things that you run out and do immediately, or things you start planning on having in the next couple of years. But I wanted to give you a sense of where I think the world of containers needs to go, and also, perhaps, to inspire you a little bit to think about your own ideas about where containers should go in the next five to ten years. Because over the last eight years, we’ve seen a real revolution in DevOps, containers, and Kubernetes. But it’s an incomplete one; it really hasn’t achieved everything that I think is possible in the world of DevOps.
I wanted to start by just taking a look at what Kubernetes is today. And fundamentally, what Kubernetes is today is what it was at the very beginning. Sure, we’ve added a ton of APIs, many new cloud providers, and many new contributors over the years. But fundamentally, Kubernetes was designed to be an application-oriented API that lived on top of machines, whether virtual or physical, and that was intended to abstract you away from the idea of a machine and give you an API that was useful for defining your application and the characteristics that a developer or an operator of an application thinks about. But in reality, what we’re starting to see is that the idea of a machine is perhaps not something people want to think about at all. And in fact, we’ve already seen people start to integrate things like serverless containers with Kubernetes. So today, if you run a cluster in the Azure Kubernetes Service, for example (and there are similar capabilities in many other cloud providers), you have the ability to run your containers not just on machines, virtual machines in this particular case, but also on serverless container infrastructure like Azure Container Instances. And when I say serverless container infrastructure, what I mean is this: all you do is hand us a container, and it runs with CPU and with memory, but without any core operating system underneath it.
Of course, there actually is an operating system. But it’s transparent to you; it’s not something that you think about. And I think that this is the first part of thinking about the future of Kubernetes, which is that I believe the future of Kubernetes is entirely serverless.
Those machines that we talked about abstracting you from, the idea that there’s an operating system in a particular configuration, and a collection of them brought together by the Kubernetes API: that’s an old idea. It’s really not addressing the main value that most people find in Kubernetes, which is the higher-level APIs like Deployments, Services, Ingress, and beyond. And more importantly, it doesn’t interface well with where the cloud wants to go, either.
Because the cloud is thinking about how to maximize the usage of the resources in the data center and expose them to customers in the most useful primitives possible. And what we’re seeing more and more is that the useful primitive you can expose to your customer is a container, and specifically a serverless container. And I think one thing to mention here is that the whole idea of serverless gets conflated with programming models like Functions as a Service. It’s important to note that when we say serverless, what we really mean is just that there’s no virtual machine there.
Right? We’re not saying, hey, it’s all going to be Functions as a Service or HTTP web apps. We’re not really saying anything about an application development pattern; that’s traditionally the domain of a PaaS.
What we’re really saying is serverless infrastructure: it looks and feels, at some level, like the kind of infrastructure you expect with virtual machines, but it’s a container instead. It’s that application-oriented primitive that Kubernetes was trying to deliver from the beginning. But there are a lot of really open questions as we think about how Kubernetes lives on in this serverless future.
There’s a whole bunch of really important questions that need to be reasoned about. I think in order to start thinking about them, it’s important to ask why serverless containers under Kubernetes are such an attractive thing to an end user. Well, in a cloud-native world, in a net-new world of developing with containers, why are you even interested in the VM in the first place? There’s not a whole lot that the VM brings to the table, other than the CPU and the memory that it supplies to the Kubernetes cluster.
The fact that you’ve packaged up four cores and ten gigabytes of RAM, or a hundred cores and a terabyte of RAM, into one little box and then effectively donated that box to Kubernetes is pretty arbitrary. If Kubernetes is already providing this abstract API that spans on top of these virtual machines, why are we even bothering to model those machines? It has a lot to do with the legacy of where Kubernetes came from, rather than with the future of where Kubernetes ought to be. Because the thing about that virtual machine is that it brings in all kinds of concerns, in particular around things like security and compliance. If you have a virtual machine sitting in your cloud, or sitting in your data center somewhere, there’s a whole compliance regime associated with keeping that operating system patched and making sure there’s no vulnerable software.
In many cases, the software that you’re updating isn’t even software that you care about for your application; you’re updating SSH so that somebody can log into the machine. That’s not something your application cares about. Your application cares about the stuff that’s inside the container, and obviously you need to keep that stuff very secure and compliant. But all of the work that you’re doing to ensure the security and compliance of the operating system on a machine is effectively wasted effort in a cloud-native application world.
In addition to security and compliance, another thing the VM brings in is reliability problems. That operating system can crash, systemd can go down, the kernel can panic; there are all sorts of things that can happen inside the operating system itself. A daemon can go out of control and spin up and use all of the CPU, or run the operating system out of memory. Kubernetes tries to protect you a little bit from this sort of stuff, and of course it enables recovery by moving your pods between machines if it does happen. But the very fact that the machine is there introduces reliability problems that, again, are not something you care about; they actually just make your life worse. So in addition to the security and compliance problems that a machine brings in, it brings reliability problems to that cluster as well.
Obviously, the cloud makes it easier; obviously, Kubernetes makes it easier. But the easiest thing would be to just remove the machines entirely.
Finally, the machine actually adds cost, and there are really two reasons to think about it this way.
One is that no matter how good a job Kubernetes does packing containers onto your machines, there’s always going to be space left over due to internal fragmentation: you cut up each of the machines, and there’s a little bit left over that you can’t use for any particular container. Well, you’re still paying for that, even though you’re not using it, and so obviously that increases your costs. Additionally, it takes a little bit of time to spin up a virtual machine, so you’re generally going to keep a little bit of spare capacity on the side, even if you turn on autoscaling, in order to be able to respond to scale events. So again, there’s this extra capacity, either internal fragmentation space that you can’t use, or buffer capacity that you’re keeping in order to scale, that adds cost without adding any value. For all of those reasons, the fact that there are virtual machines underlying Kubernetes really doesn’t provide much end-user value at the end of the day. So hopefully, by this point, you believe that having the machines under Kubernetes is not really something that’s particularly useful.
A really interesting and important question is: how do we get to that new future? And the reason is that, at the code level, at the lowest levels of the Kubernetes API, Kubernetes itself is actually quite addicted to machines. It was built, at the bottom layers, thinking of itself as a machine manager. So consider something as simple as: how do I get the logs for a particular pod?
Well, the code actually says: where’s that pod running? Okay, I’ll go to that machine, I’ll find the logs on that machine, and I’ll return the logs for that pod.
From an end-user perspective, you’ve just asked for the logs for a particular container. But from Kubernetes’ perspective, it has used its knowledge of where that container is running, and the particularities of that particular machine, in order to provide those logs to you. And so the evolution of Kubernetes to a serverless future is an evolution that is going to have to take time and a great deal of care. What’s more important is that there is a big, broad community out there that is running production workloads today.
Some of them are running on bare metal, where this beautiful serverless container thing is not really feasible, or where Kubernetes itself is the thing that supplies the serverless containers. So as we take things forward, it’s going to be a real community evolution to see how we can get from where we are to where we need to go. There are definitely some big issues, some big rocks, that we need to reason about in order to get from a world where we have VMs and something like Azure Container Instances underneath Kubernetes to a world where there is just serverless underneath Kubernetes. And the first thing to think about in this integration is failures.
Right. So when Kubernetes thinks about spreading your containers across multiple machines, it’s thinking about each machine as a unit of failure. Because of those reliability problems I talked about earlier, the machine is a unit of failure, and Kubernetes, in order to try to deliver reliability for you, is going to spread things out; it has this notion of spreading containers inside of its scheduler. But when we move to a world of serverless containers, that becomes something the cloud worries about. In a world of serverless containers, machines don’t fail, because there are no machines. The machine has been hidden from you by the abstraction of the serverless container. And suddenly Kubernetes, which has previously thought a lot about spreading containers across a bunch of nodes, no longer has to think about spreading, and we need to teach Kubernetes, effectively, to forget that spreading was something it needed to worry about. But of course, there actually are failure domains inside the serverless infrastructure.
At the end of the day, there are physical machines underlying serverless containers, and they can fail. So it is important for spreading to occur somewhere, and it is also important for Kubernetes to be able to reason about the reliability of the applications. So there is a question of the exact interplay here: perhaps the cloud exposes failure domains or upgrade domains, where scheduled maintenance will occur, up to Kubernetes, so that Kubernetes can then request certain scheduling characteristics down from the serverless container infrastructure. How does that interplay work out?
That’s still a very open question, and an important area that we’re going to need to reason about as we think about moving Kubernetes toward serverless. I think another interesting thing is that a node in Kubernetes represents a unit of capacity. You think about autoscaling your cluster by adding or removing nodes to add or remove capacity. But in the serverless container world, like Azure Container Instances, the capacity is limitless, and you only pay for what you use. So suddenly the notion of autoscaling the cluster becomes irrelevant.
The cluster is exactly the size that it needs to be. But again, Kubernetes has thought a lot about capacity. And it’s thought a lot about each individual machine supplying a certain number of cores and a certain amount of memory.
It hasn’t thought a lot about having effectively infinite capacity underlying it. So as we think about integrating serverless containers into Kubernetes, we have to think about how we again enable Kubernetes to forget about something like capacity.
The truth is that the old statement, “the future is here, it’s just not evenly distributed,” is true in this case as well. This future is being explored in the Virtual Kubelet project, an effort to take serverless container technology, whether it’s Azure Container Instances, AWS Fargate, or many other ways of delivering serverless containers, and integrate it with the Kubernetes API. So if you’re interested in how this intersection comes to be, if you’re interested in exploring these questions going forward, I encourage you to go check out Virtual Kubelet on GitHub, maybe even run it up on Azure and see what it looks like. And you too can help us move forward into the world of Kubernetes and serverless containers.
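To make the Virtual Kubelet idea a little more concrete, here is a sketch of a pod spec that opts in to running on a virtual node. This is hypothetical: the node label and toleration key below are illustrative, since each Virtual Kubelet provider registers its own node name, labels, and taints, so check your provider’s documentation for the exact values.

```yaml
# Hypothetical pod targeting a Virtual Kubelet node rather than a VM.
apiVersion: v1
kind: Pod
metadata:
  name: hello-serverless
spec:
  containers:
  - name: hello
    image: nginx   # any container image
  # Ask the scheduler for the virtual node (label is provider-specific).
  nodeSelector:
    type: virtual-kubelet
  # Virtual Kubelet nodes are typically tainted so that only pods that
  # explicitly tolerate the taint land on serverless infrastructure.
  tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists
    effect: NoSchedule
```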
Now, I’ve talked a lot about, effectively, the shape of the cluster and the infrastructure providing the cluster. But I haven’t talked very much about what I think is the more interesting part of where the future of Kubernetes lies. If we go back to that picture of Kubernetes, with the APIs sitting on top of serverless containers, I think the even more interesting part of the future is what happens above Kubernetes.
Kubernetes has provided a great low-level API for defining and deploying your applications. But the truth of the matter is, it hasn’t made it any easier to build those applications. And I think a lot of what we need to do in the future is take steps to make it easier for people to construct the applications that run on top of Kubernetes. And that’s going to involve adding more capabilities and adding more APIs.
Hopefully, though, not adding more complexity. I know it’s kind of scary sometimes to hear about adding more APIs to Kubernetes; you think, oh my god, yet another API that I’m going to have to learn, yet another CRD that I’m going to have to worry about. But I hope that as we start developing some of these additional layers, we’re actually hiding some of the complexity. We’re not just adding resources; we’re adding abstraction and encapsulation. And that will, at the end of the day, make your lives easier as developers of applications, and perhaps even as operators of applications.
To motivate this discussion of how we make developers’ lives easier, I want to start with the idea of accessing cloud storage from a user application inside of a Kubernetes pod. When you think about it, it’s not actually all that complicated.
It’s something that we do all the time: inside of that user application, we figure out how to talk to the cloud storage, we talk to the cloud storage, and we get on with our lives.
But I think that in that sort of simplification, we’re actually hiding a lot of stuff that makes it harder to build distributed applications. There are three aspects I think about when I consider why it makes things harder. The first is that whenever you’re integrating with a new piece of storage, there are questions, and one of the first is: which client library should I use? So let’s say we’re using TypeScript, we’re going to use NPM, and we’re going to talk to Redis.
That’s not exactly rocket science; it’s something people have been doing for many, many years. And yet, if you go to NPM and search for Redis, there’s more than one client library out there. There may be one that has way more downloads than the others, but you may not actually decide to use that one.
We’ve actually had customers who developed applications and, for whatever reason, decided to use a different one. Maybe the search terms just didn’t quite match; maybe they cut and pasted from some project they did a long time ago, without even really thinking about the library they were using.
When you’re taking a dependency on a library, you’re asking yourself which is the best library to use. And the truth of the matter is, if we let a thousand humans make a thousand decisions, they’re all going to make different decisions. And while that’s great for letting a thousand flowers bloom, it tends not to be that great for developing the best applications that we can.
I think one of the other things that people don’t necessarily think about when they take that dependency on a third-party library is that they’re also introducing risk. It’s become very fashionable today to talk about the risks of the software supply chain, but it is a very, very real risk. I think we need to move from a world where we think, hey, it’s awesome, look at all these different libraries I can pull into my application that make my development easier, to: oh my god, every single library that I pull into my application is a potential state-sponsored attacker trying to get inside of my application. And that might seem a little pessimistic.
But the truth of the matter is that the software supply chain in open source is the Wild West. If you pull in an NPM package, you’re trusting the entire set of people who developed that NPM package, and not just that package, but every single transitive package that it depends on. So I think we need to move to a world where we think of every single dependency we take as a potential risk, and in that world, we should be really concerned about taking dependencies. That act of talking to Redis using a library off of NPM suddenly becomes a choice that introduces risk. And then finally, there’s the question of how quickly you can get up to speed on a particular library, and of whether the library is highly performant.
Right. Our main job in life is not to connect to Redis; our main job is to build some sort of application that happens to need to use Redis. So you have to learn how to use that client, and maybe you’re going to switch from Redis to Cassandra to Mongo to Postgres, and every time you switch, you have to learn a new client library.
Every time you change languages, you have to learn a new client library too, and they’re all going to look a teeny bit different. I’ve developed a number of different client libraries for Kubernetes, and they all look a little bit different in the different languages. Some of that is idiom.
Some of that is making the library feel like the language. But a lot of it is just little choices: do I say get, or do I say read? Do I say delete, or do I say destroy?
All of those little choices mean that the knowledge I learned in one particular area doesn’t necessarily apply when I switch providers, when I switch storage backends, or when I switch languages. All of that adds cognitive overload that slows me down. So when we take a dependency on storage, we’re actually opening ourselves up to having to give the right answers to a bunch of questions.
It opens us up to the risk of software dependencies, and it slows down our development process. So now maybe you’re feeling a little bit bad, thinking: well, I can’t not take dependencies on storage libraries; I need storage, and I don’t want to write it myself. But the fortunate thing is that this problem has been solved before. And the way that I like to illustrate that is this: none of us implements sort, and yet many of us use sort daily, or at least in some project. The reason none of us implements sort, at least since we took undergraduate algorithms, is because there’s a standard library. In every modern programming language, there’s not just a runtime that runs your stuff.
There’s a standard library that provides a bunch of useful implementations of things you’re going to use every day. You would tell someone they were crazy if they said: hey, I’m going to write this application, and by the way, on the way to writing it, I’m going to reimplement sorting, I’m going to reimplement HTTP serving, I’m going to do it all from scratch. For all the same reasons I just listed, having to answer all the questions, introducing security problems, and slowing yourself down, you’d say: no, no, no, there’s a standard library. Use the standard HTTP server, use the standard sorting algorithms, don’t bother with that; focus on the business logic.
And that’s what Dapr is. Dapr is an open source project, up on GitHub, and it’s trying to provide that standard library for distributed applications.
Right. So just as every programming language has a standard library, we’re trying to provide a standard library for containerized applications. And the way to think about this is that we’re actually using the sidecar model.
Now, you may have seen the sidecar model with service meshes and things like that. I think the service mesh is a great example of the broader pattern that Dapr falls into, which is to say that Dapr provides a bunch of useful co-processes that can make your life easier. I mentioned earlier how a lot of programming languages have standard libraries. One of the things that’s obvious in the container world is that it’s a polyglot world; you can’t affinitize to any particular language’s standard library. There are some libraries out there that have already been developed, but they’re very fixed to a single language.
Dapr takes the opposite approach. Dapr says: hey, we’re just going to talk HTTP or gRPC. And that means any language can talk to Dapr, and we can use the exact same implementation whether you’re coming from Go, Python, TypeScript, Java, Rust, .NET, or really any other language that can speak HTTP. And all of those concerns around designing things right, around taking the right dependencies, around worrying about security?
Dapr takes responsibility for all of those. Because HTTP is built into all of these languages, there are no external dependencies; accessing HTTP is a standard way of accessing things, and there aren’t really any open questions about how to do it. So let’s take a look at what this looks like in Java.
Sorry, the Java example will come in just a second; first I want to talk about the value of this Dapr co-process. It helps you incorporate the best practices that everybody has developed; it encapsulates the best practices of how to access storage. You no longer have to answer all the questions the right way yourself, because Dapr is a community project, and all of the people in that community have come together to develop a single best-practices implementation that you can take advantage of. It also takes care of the security of dependencies, as I mentioned earlier.
The Dapr project is the thing taking dependencies, not your application code. The dependencies that are taken are chosen by the community, very carefully, very mindfully, and also responsibly, with attention to CVEs and things like that. So you can have a single dependency on Dapr and trust that the Dapr project will take care of the security of those dependencies for you. And then finally, as you’ll see in a second in the examples of how to use Dapr, it actually reduces complexity, because the interface to Dapr is standardized: whether you’re talking to Redis, or Cassandra, or Cosmos DB in Azure, or any other cloud-based NoSQL store, the interface is the same. So there’s one thing to learn. And the interface itself is just vanilla HTTP, which is very, very familiar to most developers in most programming languages. So, cycling back to what I mentioned earlier, let’s take a look at what it looks like to access storage through Dapr from Java. What you’ll see here is code storing a JSON blob under a particular key in a NoSQL store. It is three lines of code; one of them didn’t quite fit on the slide, so I’ve split it into two, but it’s three lines of code creating a new POST. And you can see I’m talking to localhost. localhost isn’t chosen here arbitrarily for the demo.
localhost is used because Dapr runs inside of the same pod as your application, shares the same network namespace, and is only exposed on localhost. And the nice thing about that is you don’t have to worry about security; you can use plain HTTP, not HTTPS, because the network traffic is restricted exclusively to the inside of that pod. Dapr, in fact, is the thing that does HTTPS, does encryption, and handles authentication out to the storage backend. Once you’ve created your POST, you set the body of the POST, you execute the POST, and you’ve stored data to your NoSQL store. And again, whether you’re talking to Redis, or Cassandra, or Mongo, or Cosmos DB, or any other cloud NoSQL store, this code operates the same. So not only is Dapr useful for making it simpler to write code, it actually makes it easier to write portable code as well. And that’s also a good value of a standard library.
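The slide code was Java; here is an equivalent sketch in Python, assuming Dapr’s default sidecar HTTP port (3500) and a state store component named "statestore" (the default from a local Dapr install; both are configurable, so treat these as placeholder values).

```python
import json
import urllib.request

# Dapr's state API (v1.0) accepts a POST of key/value pairs at
# http://localhost:<dapr-port>/v1.0/state/<store-name>.
DAPR_PORT = 3500

def build_save_state_request(store_name, key, value):
    """Build the (url, json_body) pair for a Dapr save-state call."""
    url = f"http://localhost:{DAPR_PORT}/v1.0/state/{store_name}"
    body = json.dumps([{"key": key, "value": value}])
    return url, body

def save_state(store_name, key, value):
    """POST the key/value to the Dapr sidecar over plain HTTP on
    localhost; Dapr handles encryption and auth out to the real store."""
    url, body = build_save_state_request(store_name, key, value)
    req = urllib.request.Request(
        url,
        data=body.encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

To run this for real you need the Dapr sidecar alongside your process (for example, in the same pod); swapping Redis for Cosmos DB is then purely a component-configuration change, with no change to this code.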
Now, you may think at this point that Dapr is only for storage, because those are the examples I’ve drawn so far.
But in fact, Dapr is intended to be a standard library for a number of different cloud-native application patterns. And one of the more obvious ones, a corollary to storage, is events.
Well, if storage is sort of an outbound thing, events are an inbound thing. And this is how you listen for events in Dapr: you just create an HTTP server.
Or, if you already have an HTTP server, you just add a new handler to it, and that handler will be called whenever an event occurs. And that’s it: there was no client library to pull in, and no client library to learn.
There were no best practices to figure out around the differences between listening to Kafka versus Event Grid versus any other event store; you just set up an HTTP handler, and the right things happen. So again, this is the same picture we had for storage, but the arrows have been reversed. From the cloud, or from some other event store, the event comes into the Dapr co-process; the Dapr co-process handles the details of understanding Kafka, or RabbitMQ, or whatever else, and translates that event into an HTTP call into your code. And again, it’s polyglot, because literally any programming language in the modern world knows how to implement an HTTP server. So I hope that gives you a good, crisp understanding of what we mean when we say Dapr is a standard library for cloud-native applications. And it’s not just restricted to storage and events.
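The event-handling side can be sketched in Python using only the standard library. This assumes Dapr’s pub/sub pattern, where the sidecar asks your app at `/dapr/subscribe` which topics it wants and then POSTs each event (wrapped in a CloudEvents envelope) to the declared route; the component name `pubsub`, topic `orders`, and port below are illustrative placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def extract_event_data(raw_body):
    """Dapr delivers events in a CloudEvents envelope; the application
    payload lives under the "data" field."""
    return json.loads(raw_body).get("data")

class EventHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # On startup, Dapr asks /dapr/subscribe which topics we want.
        if self.path == "/dapr/subscribe":
            subs = [{"pubsubname": "pubsub", "topic": "orders",
                     "route": "/orders"}]
            self._reply(200, json.dumps(subs))
        else:
            self._reply(404, "{}")

    def do_POST(self):
        # Dapr POSTs each event to the route declared above.
        if self.path == "/orders":
            length = int(self.headers.get("Content-Length", 0))
            data = extract_event_data(self.rfile.read(length))
            print("got event:", data)
            self._reply(200, "{}")
        else:
            self._reply(404, "{}")

    def _reply(self, code, body):
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

def serve(port=6000):
    """Run the handler; Dapr's app-port setting would point here."""
    HTTPServer(("127.0.0.1", port), EventHandler).serve_forever()
```

Whether the events originate in Kafka, RabbitMQ, or a cloud event service, this handler code stays the same; only the Dapr component configuration changes.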
In fact, Dapr provides storage, events, secrets, authentication, and much more. And they’re all done through the notion of an interface and an implementation. So you can have a secrets implementation that is backed by Azure Key Vault, by HashiCorp Vault, or by any other cloud secrets provider, and you can deal with the abstract idea of a Dapr secret without having to worry about the concrete details of a specific implementation on a specific cloud or other secret provider.
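The secrets building block follows the same localhost-HTTP shape. A minimal sketch, again assuming the default sidecar port; the store name `vaultstore` and secret name `db-password` are hypothetical, standing in for whatever component (Key Vault, HashiCorp Vault, and so on) you have configured.

```python
import json
import urllib.request

def secret_url(store, name, port=3500):
    """Dapr's secrets API: GET /v1.0/secrets/<store-name>/<secret-name>."""
    return f"http://localhost:{port}/v1.0/secrets/{store}/{name}"

def get_secret(store, name):
    """Plain HTTP to the sidecar; Dapr authenticates to the real vault
    and returns the secret as JSON."""
    with urllib.request.urlopen(secret_url(store, name)) as resp:
        return json.loads(resp.read())
```

Example: `get_secret("vaultstore", "db-password")` would return the secret's key/value pairs, with the application never holding vault credentials itself.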
If this seems interesting or exciting, or you just want to play around, I really invite you to come out and join the Dapr community.
It really is intended to be a place where we come together to build the standard library, because it can only become a standard library if the community rallies around it. If it’s only three people writing things down, we don’t gain the benefit of everybody’s expertise, everybody’s opinions, and all of their use cases. So please come join us on GitHub.
It has hit a 1.0 release, so it is production-ready today. And I really think it is a big part of the future of containers and Kubernetes. And that’s it; I hope you have enjoyed the talk.
I put a question mark there at the end because, of course, this isn’t really the end; it’s perhaps the end of the beginning. I hope the ideas around how serverless and serverless containers integrate with Kubernetes, and around how we build a standard library for Kubernetes, resonate with you, and that you help us move forward as a community into the next ten years of what containers, cloud, and Kubernetes mean.
Thank you, Brendan, that was a great keynote opener. I was excited to hear all about the future of containers and about what you’re doing in the Dapr project.
Obviously, if anybody’s interested, we’ll be in the chat and can give you links to where you can find that project and contribute to it. With that, we’re about to start our partner showcase day, where we’re going to have two tracks and over 25 sponsors, with lots of things for us to learn. Again, it’s going to be very hands-on, very in-depth, with very real-world, real-life scenarios that we’ll be walking you through.
These are going to be deep developer walkthroughs; you’re going to see everything about how you can integrate with the JFrog platform from many of our partners. And if there’s a track going at the same time as another one you really want to be at, we’re going to put all of these up online after we’re done with swampUP. So with that, I’d like to welcome you to join us on one of the two tracks, or both, and bounce around between them, and learn as much as you can today at our partner showcase day. Give us feedback; let us know how things are going.
There’ll be a survey at the end. So if you’d like to get a T shirt, please fill out our survey and let us know how we’re doing.
With that, thank you and have a great day.