Kubernetes Meets Real World: The DevOps Edition

Jessica Deen

Kubernetes is the de facto open source container orchestration system that supercharges applications. We know this to be true after nearly 6 years, but what comes after the 101 course? This demo-fueled session takes you on a journey demonstrating not only how you can easily deploy a production-ready Kubernetes cluster with best practices, but also how leveraging open source, industry-standard DevOps tooling can streamline and enhance the process. This session covers everything from scaling and pod security to debugging a live application, testing those changes, and gaining confidence during the review process.

VIDEO TRANSCRIPT

What’s up everyone. My name is Jessica Deen. I’m super excited to be back here at SwampUp.
And today we’re going to learn about Kubernetes in the real world. So let’s go ahead and dive right in and do a brief overview of an agenda so that you know what you can expect from me in this session.

First off, we’re going to start with an introduction to the architecture behind running production-level Kubernetes. There are some considerations you want to make in advance, beyond the basic getting-started steps. Then we’re going to move into resiliency: how do you handle failure when it comes to production-level Kubernetes? How do you handle security, maybe network security? How do you handle things when there’s abstraction and distributed systems, and there are microservices all over the place?

We’re going to talk about how we handle that. Then we’re going to talk about scalability. There are several different ways you can scale production Kubernetes, and sometimes seconds matter, so we’ll talk about how we can handle that as well. And then at the very end, we’re going to learn how we can have confidence in this entire process from a DevOps perspective.

So let’s go ahead and dive right in with the architecture review. Let’s recap how Kubernetes actually works. If you’re not using something that’s managed, meaning something provided by Azure, Google, Amazon, Oracle, or whatever your cloud provider is, and you’re taking care of your master node, or what is commonly referred to as the control plane, you probably have something like this: your master node or nodes run the API server, etcd (that’s your database), your scheduler, your cloud controller manager, and your controller manager, and you can have multiple schedulers. Either way, in this master node scenario, this is what’s actually communicating with what’s called your agent pool or your worker pool. This is how many nodes you have. Typically, when you’re getting started and playing around with a basic command, you might have three nodes. But as you start scaling, you’re going to have significantly more nodes with more resources, more CPU, and more memory. That’s your agent pool. Then there’s a Kubernetes API endpoint that communicates with your control plane, and the control plane schedules things accordingly onto your agent pool. And then of course, there’s you. This is where you use some sort of workload definition, be it a JSON manifest file, maybe produced by a bake step. Or maybe you’re using Helm, which is the de facto Kubernetes package manager. It’s powered by a template engine, so it gives you more control, especially for large microservice deployments. Either way, you’re handing something declarative off to the Kubernetes API endpoint. That API endpoint communicates with your control plane, which then schedules things accordingly onto your agent pool or worker pool. Now, in a managed scenario via Azure, Google, or Amazon, those cloud providers usually take care of the control plane for you. 
That way, all you have to worry about, as the developer or engineer that you are, is taking that workload definition and handing it off to the API endpoint; everything else gets scheduled and managed for you. Okay, so now that we have a brief overview of the architecture and how this works, what considerations do we have to make when it comes to basic versus production? First off, regardless of whether you’re using Azure, Google, Amazon, or whatever your cloud is, chances are you start with a simple, basic command like this. It’s some sort of create command. In Azure, everything goes into a resource group. This is where we put our resource objects: our Kubernetes cluster, our network, our virtual machine scale sets, whatever we’re using. We give the cluster a name, we describe how many nodes we want (typically three when getting started), and we either generate SSH keys or provide our own SSH key pair. But production-level Kubernetes tends to look a little bit different: you actually have a lot more parameters to consider. You’ll still give your resource group, your name, and your node count, but then, in the case of Azure, you might also define a service principal and client secret. This is essentially the authorization that you’re giving to this cluster. By creating a dedicated authorization account, you can attach it to your container registry, or ultimately to your pods if you’re using managed pod identity. So there’s a lot of flexibility from a scalability perspective when you consider things like that. There are also things on this slide that you need to be aware of in advance, when you’re creating your cluster. For example, the network plugin, if you’re using something other than kubenet, which is the basic network within Kubernetes. Every cloud will have something more advanced. 
In our case in Azure, if you’re using something more advanced, chances are you’re going to have to define that advanced network at the time of cluster creation. You can’t go back and add it retroactively. So if you’re with a cloud provider other than Azure, you’re going to want to ask them whether this is something you can change retroactively. Chances are you can’t, because that’s the underlying networking, but you’re going to want to know that information in advance. You’re also going to want to know which virtual machine set type you want. In Azure’s case, you can choose virtual machine scale sets or availability sets. That leaves the last two flags on this particular slide: zones and network policy. Zones are what we’re going to talk about in our first demo. This is how we handle failure: it controls which availability zone, meaning which separate data center within the region, each node gets put into. If you don’t specify that, all of your nodes might be put in the same data center, and if that data center goes down, your entire workload and application go down. Zones also need to be configured at the time of cluster creation; you’re going to want to ask your cloud provider if that’s the case for them as well. Next, network policy. Calico is a little unique because it’s an open source project by a company called Tigera. Using the flag is probably the easiest way to get started with Calico, especially on AKS, but that network policy has to be defined at the time of cluster creation. Since Calico is open source, you can also install it manually and configure it after the fact, but that’s considerably more time-consuming than just setting everything up now. Just because you have flags enabled doesn’t mean you have to use them immediately, but now you’re setting your cluster up to scale as your needs and your production use cases scale. 
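
As a rough illustration of the production flags discussed above, the following Python sketch assembles an az aks create argument list without running it. The resource names, service principal values, and the choice of three zones are placeholders; the flags themselves are Azure CLI options, but check az aks create --help for your CLI version before relying on this.

```python
import shlex
from typing import List


def production_aks_create_args(resource_group: str, cluster_name: str,
                               node_count: int, sp_app_id: str,
                               sp_secret: str) -> List[str]:
    """Build (but do not run) an `az aks create` command with the
    creation-time options discussed in the talk."""
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--node-count", str(node_count),
        # Dedicated identity for the cluster (e.g. for registry access).
        "--service-principal", sp_app_id,
        "--client-secret", sp_secret,
        # Advanced networking (Azure CNI) instead of kubenet; set at creation.
        "--network-plugin", "azure",
        # Back the node pool with virtual machine scale sets.
        "--vm-set-type", "VirtualMachineScaleSets",
        # Spread nodes across availability zones 1, 2 and 3.
        "--zones", "1", "2", "3",
        # Calico network policy, chosen at creation time with this flag.
        "--network-policy", "calico",
        "--generate-ssh-keys",
    ]


if __name__ == "__main__":
    # Placeholder values only; substitute your own before running the command.
    args = production_aks_create_args("myResourceGroup", "myAKSCluster", 3,
                                      "<appId>", "<password>")
    print(shlex.join(args))
```

Keeping the command as data makes it easy to review in a pull request, though the talk’s own recommendation is to move this into declarative infrastructure as code.
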
Now let’s go into how we handle resiliency, and specifically how we handle failure. That’s where availability zones come in. Right now I have a picture of AKS architecture; you can substitute Google Cloud, Amazon, or whatever you want. Typically, you’re going to have nodes in a data center, right? Let’s say that data center goes down, and now your cluster goes down along with all of the nodes. But if you’re using availability zones, you can put those nodes, and therefore your pods and your containers, in different zones, meaning different data centers. So let’s take a look in the context of Kubernetes architecture. You still have your user, your workload definition, and your control plane, but now each node gets put into an availability zone. Let’s say that you have a larger cluster with six nodes. As we can see in this example, nodes one and two are in availability zone one, three and four are in zone two, and five and six are in zone three. And let’s say that you have three replicas; your replicas are equal to the number of zones that you have. So each copy of your pod or your application is balanced across zone one, zone two, and zone three. Now zone two goes down. You still have two other zones, with nodes in different data centers, to make sure that your application stays up. So now we can handle failure with a little more planning. There are some things to know when it comes to handling application failure. As I’ve already mentioned, you need to make sure that the replicas in your deployments (your ReplicaSets) are set equal to the number of zones that you’re using. 
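
The six-node, three-zone scenario above can be simulated in a few lines. This is a simplified sketch, not how the Kubernetes scheduler actually places pods: it assigns replicas round-robin across zones to show why one replica per zone survives a single-zone outage. Node and zone names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Six nodes across three zones, mirroring the slide (names are hypothetical).
NODES: Dict[str, str] = {
    "node-1": "zone-1", "node-2": "zone-1",
    "node-3": "zone-2", "node-4": "zone-2",
    "node-5": "zone-3", "node-6": "zone-3",
}


def spread_replicas(replicas: int, nodes: Dict[str, str]) -> Dict[str, str]:
    """Place replicas round-robin over zones so each zone gets one in turn."""
    zones = sorted(set(nodes.values()))
    by_zone = {z: sorted(n for n, nz in nodes.items() if nz == z) for z in zones}
    placement = {}
    for i in range(replicas):
        zone_nodes = by_zone[zones[i % len(zones)]]
        # Within a zone, rotate through its nodes on later passes.
        placement[f"replica-{i}"] = zone_nodes[(i // len(zones)) % len(zone_nodes)]
    return placement


def survivors(placement: Dict[str, str], nodes: Dict[str, str],
              failed_zone: str) -> List[str]:
    """Replicas still running after every node in `failed_zone` is lost."""
    return sorted(r for r, node in placement.items() if nodes[node] != failed_zone)


placement = spread_replicas(3, NODES)
print(Counter(NODES[n] for n in placement.values()))  # one replica per zone
print(survivors(placement, NODES, "zone-2"))  # ['replica-0', 'replica-2']
```

If all three replicas had landed in zone 2 instead, survivors would return an empty list, which is exactly the outage the talk warns about.
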
So if you have three zones, use three replicas. I’ll show you an example deployment YAML so you can see it’s the same replicas setting we’ve been using; you just might not have been balancing it across different zones. You also want to make sure that you use an Ingress controller that’s highly available and isn’t going to go down when a zone goes down, so maybe something that sits outside your cluster, as opposed to an Ingress controller that lives on the node or in the zone that fails. And even though you now have nodes in different zones, understand that your disks still mount directly into your pods: a managed disk lives in a single zone, so a pod that depends on it can’t simply be rescheduled into another zone. The other thing you’re going to want to consider is how you check whether zones are actually enabled. All you have to do at creation time is pass the --zones flag, but you’re going to want to know how to verify that after the fact. You can do that in your terminal: simply run k get nodes -o wide. In this example, I have three different nodes in a virtual machine scale set. Then I can run a describe on those nodes and search for the failure domain, and I see I have three different zones. If I want an easier, more visual way, I can log into the Azure portal and see that my virtual machine scale set has zones one, two, and three available in East US. Obviously, this will change depending on your cloud provider, but you’re going to want to be able to check it from the command line, because you can add that in as a failsafe check in your CI/CD. And if you’re still learning and playing around, you can have that visual confirmation as well. 
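
For the CI/CD failsafe check mentioned above, one option is to read node labels from kubectl get nodes -o json instead of grepping describe output. This sketch assumes nodes carry the well-known topology.kubernetes.io/zone label (or the older failure-domain.beta.kubernetes.io/zone label) and that, on AKS, zonal nodes are labeled like eastus-1; the region filter exists because non-zonal nodes may carry a plain fault-domain number instead. The sample data is hypothetical.

```python
import json
import subprocess
from typing import Dict, List, Optional

# Current and legacy (deprecated) well-known zone labels on Kubernetes nodes.
ZONE_LABELS = ("topology.kubernetes.io/zone",
               "failure-domain.beta.kubernetes.io/zone")


def node_zones(nodes_json: dict) -> Dict[str, Optional[str]]:
    """Map node name -> zone label value, from `kubectl get nodes -o json`."""
    zones = {}
    for item in nodes_json.get("items", []):
        labels = item["metadata"].get("labels", {})
        zones[item["metadata"]["name"]] = next(
            (labels[key] for key in ZONE_LABELS if key in labels), None)
    return zones


def distinct_zones(nodes_json: dict, region: Optional[str] = None) -> List[str]:
    """Distinct zones; with `region`, count only values like '<region>-1'."""
    values = {z for z in node_zones(nodes_json).values() if z}
    if region is not None:
        values = {z for z in values if z.startswith(region + "-")}
    return sorted(values)


def fetch_nodes() -> dict:
    """Query the current kubectl context (not called in the demo below)."""
    out = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)


# Hypothetical sample shaped like kubectl output for a three-zone cluster.
SAMPLE = {"items": [
    {"metadata": {"name": f"aks-nodepool1-vmss00000{i}",
                  "labels": {"topology.kubernetes.io/zone": f"eastus-{i + 1}"}}}
    for i in range(3)
]}

if __name__ == "__main__":
    # In CI you might call fetch_nodes() instead and exit non-zero when
    # fewer than three zones are found.
    print(distinct_zones(SAMPLE, region="eastus"))
```
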
Alright, so now we’ve seen how to check and verify that zones are properly configured. Remember that zones have to be configured at the time of cluster creation with --zones. This flag might differ depending on your cloud provider and the offering they have, but you’re going to want to plan for it in advance, because it’s also going to matter when you start considering your infrastructure as code and how that fits into your CI/CD. Now, we talked about deploying to availability zones, and I mentioned ReplicaSets or replicas, and I told you I would show you an example deployment YAML. Under spec, there’s replicas. Here it’s hard-coded: this is a declarative definition where I just said replicas equals three. If you’re using something like Helm, it might instead be replicas: {{ .Values.replicas }}, or however your Helm chart works, but the answer is the same: set replicas equal to the number of zones. All right, so now that we’ve handled failure, let’s talk about how we can handle security, specifically network policy, in the midst of abstraction. By default, if you’re using an advanced network, you’re going to have what’s called a flat network, meaning pod A over here can communicate with pod B over here. Or more descriptively, a production service or a canary service, however you have your cluster set up, can communicate with a service over there that might not even be related to it, even if it’s in a different namespace. So let’s take a look at this example. You have two worker nodes here, each with its own kubelet to communicate with the control plane, and you have two different applications deployed. By default, if you don’t have something in the middle, the left is going to communicate with the right and vice versa. 
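
The deployment YAML described earlier boils down to setting replicas to the zone count. The sketch below generates such a manifest as JSON (kubectl apply also accepts JSON). It goes one step beyond the talk by adding a topologySpreadConstraints entry, available in recent Kubernetes versions, which asks the scheduler to keep per-zone counts balanced; treat that part, plus the name and image, as assumptions rather than the demo’s actual chart.

```python
import json
from typing import List


def zonal_deployment(name: str, image: str, zones: List[str]) -> dict:
    """Deployment manifest whose replica count equals the number of zones."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # One replica per zone, as recommended in the talk.
            "replicas": len(zones),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Keep per-zone replica counts within 1 of each other
                    # (an addition, not shown in the talk).
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image.
    manifest = zonal_deployment("stock-api", "example.azurecr.io/stock-api:1.0",
                                ["1", "2", "3"])
    print(json.dumps(manifest, indent=2))
```

In a Helm chart the same idea would come from a values entry, so changing the zone count means changing one value.
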
That’s where Calico comes in. Calico, again, is an open source project, and you can put it in the middle of that flat network, where you can choose to allow traffic and that communication across the network, or choose to deny that traffic. And you can do that very easily, still with a declarative workload YAML file or manifest. All you have to do is say that you want to either allow or deny communication based on some parameter; in our demo, we’re going to use a label. When the policy is applied, traffic is denied; when you delete the network policy, traffic is allowed again. Again, in Azure’s case, the network policy flag does need to be set at the time of creation. If you’re using a different cloud, you’re going to want to understand whether it’s something that can be applied retroactively. Once you have that enabled, let’s talk about how easy it is to actually apply a network policy. All you really have to do is apply YAML. At the bottom, you can see I have 11 different microservices running in a specific namespace. At the top, I’m just going to run an Alpine image to test with. I can do a wget against one of my APIs, and I’m going to query my stock API. Just like that, I can communicate and download the index.html. Now I’m going to apply a network policy that matches a label. Everything in prod has a label equal to twt app; that’s my Tailwind Traders app. You can see that my pods still exist, and none of them had to restart. But now when I try to communicate with that service and that pod, I can’t anymore. This is accomplished through a label: if I run k get pods and search again, but this time by label, I see those same services pop up. 
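
A label-based deny policy like the one in the demo can be expressed as a standard Kubernetes NetworkPolicy, which Calico enforces. This is a minimal sketch: the policy name, namespace, and the app: twt label are hypothetical stand-ins for the demo’s actual label. Listing Ingress under policyTypes with no ingress rules denies all inbound traffic to the selected pods.

```python
import json
from typing import Dict


def deny_ingress_policy(name: str, namespace: str,
                        match_labels: Dict[str, str]) -> dict:
    """NetworkPolicy that blocks all inbound traffic to the selected pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Only pods carrying these labels are affected; others stay open.
            "podSelector": {"matchLabels": match_labels},
            # Declaring Ingress with no "ingress" rules means: allow nothing in.
            "policyTypes": ["Ingress"],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and label. Save the output as policy.json, then
    # `kubectl apply -f policy.json` denies traffic and
    # `kubectl delete -f policy.json` allows it again.
    policy = deny_ingress_policy("deny-twt", "twt", {"app": "twt"})
    print(json.dumps(policy, indent=2))
```

Because the policy selects pods by label, applying or deleting it never restarts or relabels the pods, which matches what the demo shows.
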
Now I can easily delete this policy and allow that communication to happen again by running the same YAML, only this time with the delete command, which removes the policy from my cluster. Nothing in my pods changes, and I’m not deleting the label. But now when I run wget again, I’m able to redownload that index.html file. That’s really how easy it is: once you have Calico enabled, you just apply your declarative syntax to allow or deny traffic. All right, now that we’ve learned about network policy, let’s talk about scaling, and specifically scaling by adding in serverless, for when seconds matter in scaling and in your application. There are three different types of scaling in Kubernetes. First off, you can add or remove nodes. These are virtual machines, and adding a node typically takes about three minutes. Let’s say that you have something highly transactional that’s running 10,000 transactions a minute. The three minutes it takes to spin up a node means you’ve just lost the ability to process 30,000 transactions. Of course, you can also add or remove pod replicas, but those replicas can only spin up based on how much capacity you have underneath. Another option is using what’s called a virtual node, which is kind of a fake node that inserts itself into your cluster and starts scheduling things elsewhere, in this instance onto something called Azure Container Instances. This is like docker run in the cloud. We’re accomplishing this with an open source project called Virtual Kubelet. Virtual Kubelet doesn’t only work with Azure; it works with other clouds too, so you can check out what you can leverage with Virtual Kubelet in your scenario. But the virtual nodes I’m going to show you in today’s demo currently only work in Azure. 
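
The 30,000-transaction figure is simply the arrival rate multiplied by the time new capacity takes to come online. The helper below makes that arithmetic explicit using the talk’s numbers; any other startup time you plug in is your own measurement, not something from the talk.

```python
def backlog(rate_per_minute: float, startup_seconds: float) -> float:
    """Transactions arriving while new capacity is still starting up."""
    return rate_per_minute * startup_seconds / 60


# Figures quoted in the talk: ~10,000 transactions/minute, ~3 minutes per node.
print(backlog(10_000, 3 * 60))  # 30000.0
```
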
Now, let’s talk about how the networking works and how we handle this. Let’s say that we have our cluster: we have our Kubernetes control plane, a highly available Ingress controller, and our nodes and pods running. And then we have a virtual node. The virtual node is really just a Golang binary that’s kind of sitting in the control plane, telling it: hey, I’m available to schedule things to, if you specifically tell me you want me to run jobs. Once that virtual node gets the jobs, it starts scheduling them onto docker run in the cloud, or Container Instances. Looking at it more closely, you can see the traffic coming into our Ingress, which realizes it needs to go over to our virtual node, which then fires off container instances. And because they’re still part of our cluster, the container instances can still communicate with the remaining nodes within the cluster. Because this is networking, and it’s scheduling things on its own channel, it also needs its own subnet. So you’ll have to consider that when it comes to your infrastructure as code and your architecture. Ideally, regardless of which cloud provider you’re using, you’re not just running basic commands. Ideally, you’re using something declarative, some sort of JSON template or manifest file, something you can repeat and stand up so that you have confidence in your infrastructure. We’ll talk about that in the confidence stage. Here’s an example of creating your own dedicated subnet, and then you tell your virtual node to use that particular subnet. To add the virtual node, at least in Azure’s case, you would just use the add-on, or better yet, define it in your declarative infrastructure as code. 
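
The two steps above (a dedicated subnet, then the virtual node add-on pointed at it) can be sketched as Azure CLI calls built in Python. The resource group, virtual network, subnet name, CIDR range, and cluster name are all placeholders, and this assumes the cluster already uses Azure CNI advanced networking, which virtual nodes require. As the talk suggests, a declarative template is the better long-term home for this.

```python
import shlex
from typing import List


def virtual_node_commands(resource_group: str, vnet_name: str,
                          subnet_name: str, address_prefix: str,
                          cluster_name: str) -> List[List[str]]:
    """Build (not run) the Azure CLI calls for a dedicated virtual node subnet."""
    create_subnet = [
        "az", "network", "vnet", "subnet", "create",
        "--resource-group", resource_group,
        "--vnet-name", vnet_name,
        "--name", subnet_name,
        # Must not overlap the subnet the regular nodes already use.
        "--address-prefixes", address_prefix,
    ]
    enable_addon = [
        "az", "aks", "enable-addons",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--addons", "virtual-node",
        # Point the virtual node at the subnet created above.
        "--subnet-name", subnet_name,
    ]
    return [create_subnet, enable_addon]


if __name__ == "__main__":
    # All names and the CIDR range are placeholders.
    for cmd in virtual_node_commands("myResourceGroup", "myVnet",
                                     "myVirtualNodeSubnet", "10.241.0.0/16",
                                     "myAKSCluster"):
        print(shlex.join(cmd))
```
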
Now I want to make sure we’re also clear on what virtual node supports. First off, it supports Linux containers, which is pretty common in all things Kubernetes, but it also supports Windows containers, and it supports GPUs. Even cooler, that Golang binary I mentioned is actually deployed via Helm. So if you’ve been on the fence about using Helm, or maybe you’re still using bake or your own manifest files, Helm, especially Helm 3 right now, is probably something you want to consider as a way to manage all of your microservices. It’s even the way we deploy the virtual node binary in production across the board. I mentioned this briefly earlier, but I want to make sure we cover it: you need to explicitly tell your pods to use the virtual node. Similar to a hybrid cluster with Windows and Linux workloads, where you declare a node selector to say whether you want a Windows or a Linux node, here you have to be specific and say that you want to use the virtual kubelet node. If you don’t specify that, it won’t work. You set your tolerations accordingly and move on, and again, you can add that to your Helm chart or whatever your definition is. So now that we understand how virtual node works, let me show you it in action. To do this, I’m going to use an application called Tailwind Traders. Tailwind Traders has 11 different microservices; it’s a highly abstracted application. We’ve recently added RabbitMQ to start processing messages. The unique thing is that once we send those messages over to RabbitMQ, which is running in our Kubernetes cluster, our message processors are going to run on the virtual node, meaning docker run in the cloud, or Azure Container Instances. 
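
Opting a workload in to the virtual node comes down to a node selector plus tolerations on the pod spec. The values below follow the form used in the AKS virtual node documentation as best as can be stated here; verify them against the current docs for your cluster. The container name and image are hypothetical.

```python
import copy
import json

# Selector and tolerations in the form the AKS virtual node docs use;
# verify against the documentation for your cluster version.
VIRTUAL_NODE_SELECTOR = {
    "kubernetes.io/role": "agent",
    "beta.kubernetes.io/os": "linux",
    "type": "virtual-kubelet",
}
VIRTUAL_NODE_TOLERATIONS = [
    {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
    {"key": "azure.com/aci", "effect": "NoSchedule"},
]


def pin_to_virtual_node(pod_spec: dict) -> dict:
    """Return a copy of a pod spec that opts in to the virtual node."""
    spec = copy.deepcopy(pod_spec)
    spec["nodeSelector"] = {**spec.get("nodeSelector", {}), **VIRTUAL_NODE_SELECTOR}
    spec["tolerations"] = (spec.get("tolerations", [])
                           + copy.deepcopy(VIRTUAL_NODE_TOLERATIONS))
    return spec


if __name__ == "__main__":
    # Hypothetical container name and image for the message processor.
    base = {"containers": [{"name": "processor",
                            "image": "example.azurecr.io/processor:1.0"}]}
    print(json.dumps(pin_to_virtual_node(base), indent=2))
```

The function copies its input so the same base spec can be reused for pods that should stay on regular nodes, much like toggling the setting through a Helm value.
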
In this setup, RabbitMQ will send those messages over to the message processors, and the message processors will spin up based on that demand, so we don’t have to wait for an extra node to spin up. To do this, we’re going to use event-driven scaling, and this is where serverless starts to tie in: we’re going to scale Kubernetes based on events. This works with a wide variety of Azure services, like queues, Event Hubs, and blobs, but it also works with other clouds, like AWS and Google Cloud, and it doesn’t only work with RabbitMQ. You could also use it with Kafka or Prometheus. You can learn more about the KEDA project, which again is open source, at keda.sh. By the way, it was deployed to this cluster through a Helm chart, as was RabbitMQ. But now I really want to show you it in action, and how we would handle an influx of messages in production. First, let’s look at the middle box, where I’m running k get hpa. That’s for horizontal pod autoscaler, and you’ll notice that under targets it says <unknown>/5. At the bottom, I’m going to get pods with -o wide so I can see the node each pod is running on. At the top, I’m going to apply a batch job that says, hey, send 300 messages all the way through, using a super secret username and password. That batch job is now created, and almost instantly you can see jobs slowly starting to come in. It does take a few seconds to get traffic going. But if I cancel that watch command and restart it, you’ll notice that the jobs are getting scheduled on my virtual node, virtual-node-aci-linux. 
You can see some are waiting, some are pending, and some are creating. They’re getting created every few seconds: seven seconds, 20 seconds, 16 seconds, and now they’re coming in a little bit faster. The job picks up as the queue gets inundated with messages. Now we’re getting zero seconds, zero seconds, zero seconds. If I kill my watch command on the horizontal pod autoscaler, I can see that the target metric is now at almost 35,000. That puts more and more demand on the system, so I need to start spinning up docker run in the cloud, or ACI, literally by the second. This is when something’s highly transactional. You can see that all of these are now spinning up on my virtual node, virtual-node-aci-linux. Now, the same way I started this, and the same way I ran the Calico demo, I can also delete this job. All I have to do is go back to the YAML and change the command to delete. You can see that my targets have already started to drop. Let’s delete this batch job. Now when I query the HPA, I can see that I’m back to 0/5 on my targets, because it scaled all the way back down as soon as the job was deleted. And if we take a look at the pods, all of them are now terminating. So we can also confirm that, yes, all of these jobs were being scheduled from RabbitMQ, and the RabbitMQ message processors were running on the virtual node that lives in that Kubernetes cluster. So now we’ve learned how to handle scaling, even scaling by the second. How can we gain DevOps confidence? This is the part I just love talking about. We’ve covered a lot here, right? 
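
The KEDA setup in this demo can be sketched as a ScaledObject plus the replica arithmetic the horizontal pod autoscaler applies. This assumes KEDA 2.x syntax for the RabbitMQ scaler (mode and value; older releases used a queueLength field), a hypothetical deployment named message-processor, a hypothetical queue named orders, and an environment variable RABBITMQ_HOST holding the connection string. Reading the /5 target in the demo as five messages per replica is an inference, not something the talk states.

```python
import json
import math


def rabbitmq_scaled_object(deployment: str, queue: str,
                           messages_per_replica: int, max_replicas: int) -> dict:
    """KEDA 2.x ScaledObject that scales `deployment` on RabbitMQ queue length."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-scaler"},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            # KEDA itself handles scaling to and from zero.
            "minReplicaCount": 0,
            "maxReplicaCount": max_replicas,
            "triggers": [{
                "type": "rabbitmq",
                "metadata": {
                    "queueName": queue,
                    "mode": "QueueLength",
                    "value": str(messages_per_replica),
                    # Env var on the target container holding the AMQP URL.
                    "hostFromEnv": "RABBITMQ_HOST",
                },
            }],
        },
    }


def desired_replicas(queue_length: int, messages_per_replica: int,
                     max_replicas: int) -> int:
    """HPA-style estimate for an average-value target: ceil(total / target),
    capped at max_replicas (ignores HPA tolerance and stabilization)."""
    return min(max_replicas, math.ceil(queue_length / messages_per_replica))


if __name__ == "__main__":
    obj = rabbitmq_scaled_object("message-processor", "orders", 5, 100)
    print(json.dumps(obj, indent=2))
    print(desired_replicas(300, 5, 100))  # 60
```

The cap matters in practice: with a small maxReplicaCount, a burst of 300 messages is drained by fewer processors over a longer time instead of fanning out to one container per five messages.
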
We’ve talked about things you have to be aware of: zones, KEDA, different applications, all of that. How do you recreate that reliably? Well, in this demo, you’ll notice that I’m using GitHub Actions. If we click on deploy infra, you’ll notice that everything I went through is redeployable in an infrastructure stage, or infrastructure job, down to creating special subnets, creating namespaces, creating things for my JFrog environment, and certificate manager. Everything else in my pipeline doesn’t change: I can still use JFrog to build my Maven packages, or NuGet, or whatever language I’m programming in. I can still push that information over to Artifactory, and I can trigger Slack notifications. I can even label my pods for network policy so that I can apply Calico, in both my dev and production environments. In fact, the last three stages you see here, build images, deploy to dev, and deploy to prod, are almost identical to the demo I ran last year. All I had to do was add the infrastructure stage. Here’s another visualization using a different CI/CD system, Codefresh, which is Kubernetes native: every single step runs in its own container, with Kubernetes on the back end. You can see that I’m still able to use JFrog to build my JAR file if I’m using Maven. If I were using a different language (Tailwind Traders, by the way, was actually written in both .NET Core and Node.js), it doesn’t matter; the process stays the same. I just want to make sure that I have JFrog as a private package feed, or something set up that can scan binaries and host things. I like JFrog because not only can I host my packages, I can also host my Helm charts and my Docker images. And as you can see, right now it’s on the Xray security scan. I can scan all of that and have it in one location. 
In fact, I can even send that information back over to Slack, so I’m adding ChatOps into this process as well. Part of running Kubernetes in production is making sure it’s as repeatable as possible, so you have confidence in what you’re doing. If we drill down into any of these, like the security scan or the Slack notifications, you can see that I can promote right from Slack because I have that plugin added in. I can also scroll up and drill down into the JFrog notifications, which take me right over to my dashboard, where I can see not only the build information, with the Helm charts and Docker images attached, but also the Xray status. You can see the build ID right here, and I can see which CI server ran it. It doesn’t matter if it’s Jenkins, GitHub Actions, Travis CI, or whatever you’re using; I’m using several to show you it’s not about the product, it’s about what you do with it. And there’s my Xray status of medium, and you can decide whether that’s a risk you’re willing to take or something you need to mitigate. We can also scroll up and see the Xray scan report. If a scan failed, and I’m going to find one right now that did fail, I would be able to make that decision: okay, this is not a healthy scan, or it has some security risks, and I obviously don’t want that released into production. This goes back to security, but on the DevOps side. You can see right here in my modules that the single artifact that gets scanned is my Helm chart. I can see my manifest.json, which is my Docker image, and I can see the Xray data and all the different violations. And I can set watch policies and security policies based on whether I’m willing to accept a given problem or want to fail the build automatically. 
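
In Xray, the accept-or-fail decision described above is configured through watches and policies rather than code. Purely to illustrate the logic, here is a hypothetical severity gate a pipeline step might apply to scan results; the report shape and the Low, Medium, High, Critical ordering are assumptions, not Xray’s API.

```python
from typing import Iterable, Mapping

# Assumed severity levels, ordered from least to most severe.
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]


def should_fail_build(violations: Iterable[Mapping[str, str]],
                      fail_at: str = "High") -> bool:
    """True if any violation's severity is at or above `fail_at`.
    Unknown severities raise ValueError rather than passing silently."""
    threshold = SEVERITY_ORDER.index(fail_at)
    return any(SEVERITY_ORDER.index(v["severity"]) >= threshold
               for v in violations)


if __name__ == "__main__":
    # Hypothetical scan results; a real pipeline would read them from the scanner.
    report = [{"component": "helm-chart", "severity": "Medium"}]
    print(should_fail_build(report))                    # False
    print(should_fail_build(report, fail_at="Medium"))  # True
```

A Medium finding like the one in the demo passes under a High threshold and fails under a Medium one, which is the "accept or mitigate" choice expressed as a single setting.
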
And obviously, I can choose to approve or deny promotion, whether to dev or test. Just to prove it really doesn’t matter what your CI/CD system is, here’s Azure Pipelines with the exact same setup. The only extra thing the GitHub Actions pipeline had was the infrastructure as code; otherwise this matches the last three stages in Codefresh and GitHub Actions. In Pipelines, it’s still using JFrog, I’m still able to publish my triggers, and I’m still able to send that build information over and publish it through a webhook to my Slack channel. So I still have that confidence of whether or not the build was clean and secure. But there’s one more area of confidence we need to talk about. We’ve seen some of these tools on the screen: I mentioned Travis CI, Jenkins, and Codefresh, which is DevOps but Kubernetes native, plus JFrog Artifactory and Xray, and you can tie in TeamCity or WhiteSource if you’re using them. It’s not about these products. It’s about how you can gain confidence from these products. One of the biggest questions I get asked is: okay, I’m running this application with 11 microservices, or however many, in production; how do I debug something when it’s failing? Since we’re in quarantine, I thought it’d be fun to show you that, and to add additional confidence into our DevOps process, with a bike application where I can rent a bike by the hour. So here we have AdventureWorks, which is for cyclists. I’m going to log in as a customer and click around to find a bike I might want to rent, but I can’t see any pictures. And I’m a millennial, I would say, but really, come on, everyone wants pictures, right? I want to see the bike that I’m renting before I commit to, I don’t know, $1 an hour, however much it is. 
So I can go over to Visual Studio Code. I have several different APIs or services here, and I’m going to focus specifically on the bike service, which is what I have open in Visual Studio Code. You can see that I’ve deployed it with a Helm chart, and I’m going to go down to the JavaScript file for my server. The cool thing is that I can set a breakpoint right here on line 231. This part is specific to Azure, but I believe something similar exists for other clouds. I’m going to run the debugger with Node, but in a Kubernetes cluster. This runs the service locally and reroutes the traffic for that one API from my dev environment over to my local system, so I can do live debugging, hit a breakpoint, and play around with whether a fix will work. For example, if I go back and search for another bike now that I’ve set that breakpoint, I’m forced back over to VS Code because I hit the breakpoint. So I know the image problem, or its resolution, is in this block of code. In fact, the image URL was hard-coded to a static placeholder. So I comment those lines out and restart my debugger, which redeploys that container and reruns that service, redirecting it back to my system, so I can see whether that resolves the issue. This is all happening in a dev environment. I can refresh, and okay, there’s my bike. Just to make sure it’s not specific to this one cruiser, we can click around and find other cruisers. So just through that interaction, I now have confidence in the fix I’ve made, even for something that’s abstracted. But I want to make sure everyone on my team has the same level of confidence. We can have confidence that our binaries are being scanned and that we have network policy, but how can we have confidence when it comes to human errors and fixes? 
So I'm going to push this to a special branch in my repo, and then I'm going to create a pull request. To do this, I'm actually going to use GitHub Actions; in fact, there's a pull request workflow that we can integrate with Kubernetes. So you can see that I'll create the pull request right here, and we can go take a look at the files that were changed. You can see that I commented out lines 232 and 233. We can go back to the Conversation tab, and you'll see that my bikes API PR, or pull request, workflow has started. Now what's happening in this workflow is that it's going to take all the changes I made, or the two in this instance, and build a new Docker image. It's going to push that Docker image over into Artifactory. It's going to create a child namespace in my Kubernetes cluster, and it's going to release the Helm chart for only that one API. And it's actually going to create a prefix, or a special test URL, for everyone involved in this change to be able to go see this private version. So you can see the GitHub Actions bot commented with a special prefix based on my pull request username. I can drill down into this, and now I'm taken over to a special URL that shows me that change live. So now everyone on the team can have confidence in the same fix, and we've added that to our DevOps process. We've added it with our existing tooling; we still have Artifactory and Xray in play. But we have additional confidence that not only does our infrastructure exist, but we also have this fix implemented. Now notice that when I remove that prefix, I still see the placeholder on the main URL, because I haven't merged anything yet, right? The only way that people can see that private version is with the private prefix that GitHub Actions added. So this gives us an additional layer of confidence.
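The per-pull-request flow above boils down to a few naming decisions: a child namespace per PR, a Helm release of just the one API into it, and a private URL prefix. The sketch below is a hypothetical Python illustration of those decisions, not the workflow used in the demo. It assumes the namespace name must be a valid DNS-1123 label (lowercase alphanumerics and `-`, at most 63 characters). The release name `bikes`, the `image.tag` value key, and the `https://<namespace>.<host>` URL scheme are all made up for illustration; the Helm 3 flags `upgrade --install`, `--namespace`, `--create-namespace`, and `--set` are real.

```python
import re

MAX_LABEL_LEN = 63  # Kubernetes namespace names must be valid DNS-1123 labels


def preview_namespace(username: str, pr_number: int) -> str:
    """Derive a per-PR child namespace name from the PR author's username."""
    suffix = f"-pr-{pr_number}"
    # Lowercase and collapse any run of characters outside [a-z0-9-] into "-".
    slug = re.sub(r"[^a-z0-9-]+", "-", username.lower()).strip("-")
    # Leave room for the suffix so the PR number is never truncated away.
    slug = slug[: MAX_LABEL_LEN - len(suffix)].rstrip("-")
    return f"{slug}{suffix}" if slug else f"pr-{pr_number}"


def preview_url(namespace: str, base_host: str) -> str:
    """Hypothetical private URL: the namespace is used as a host prefix."""
    return f"https://{namespace}.{base_host}"


def helm_command(namespace: str, chart_path: str, image_tag: str) -> list[str]:
    """Build (but do not run) a Helm 3 command that releases only one API."""
    return [
        "helm", "upgrade", "--install", "bikes",  # release name is hypothetical
        chart_path,
        "--namespace", namespace,
        "--create-namespace",
        # "image.tag" is an assumed value key; it depends on the chart.
        "--set", f"image.tag={image_tag}",
    ]
```

Keying everything off the namespace keeps each PR isolated: deleting that one namespace when the PR closes would clean up the whole preview.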
And it includes people who were previously never able to participate in the review process. For example, it allows designers and project managers to be included, people who don't speak code. Everyone gets to be included, and everyone gets to have the same level of confidence. All right, so we've really been on a journey and we've covered a lot of material, a lot of what we need to consider when it comes to production Kubernetes: abstraction, infrastructure, confidence. So what are some key takeaways that we can go home with, specifically when running Kubernetes in production? First off, you're going to want to make sure that you have a plan for handling failure, whatever your cloud provider is, whatever your application is, whatever language it's written in. Today, we used availability zones as an example. You're also going to want to make sure you have a plan for network policy, right? Something that lets you handle your traffic and networking accordingly, even though you're dealing with abstraction and distributed systems. We used Calico, again by Tigera, and we used Azure CNI. Another thing, along the lines of security, is you're going to want to make sure that you have a plan for scanning your application binaries, your Helm charts, your Dockerfiles, everything. And ideally, it's going to be best if everything is in one place, right? You tie ChatOps into it, you have it as part of your CI/CD. In today's demo, we used JFrog Artifactory and Xray. Now, when it comes to scaling, you're going to want to make sure that you have a plan for scaling even in seconds, especially as your workload and your needs start to grow. It's great that you can rely on pod replicas and nodes. But what happens if you have to do more? In today's demo, we used KEDA and RabbitMQ. And then you're going to want to make sure that you embrace infrastructure as code.
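To show how event-driven scaling turns a queue into a replica count, here's a simplified Python sketch of the KEDA plus RabbitMQ idea. KEDA feeds the queue metric to a Horizontal Pod Autoscaler with an average-value target, so the desired count is roughly the queue length divided by the per-replica target, clamped between the minimum and maximum replica counts (KEDA's `minReplicaCount` and `maxReplicaCount`, which default to 0 and 100). This is an approximation that ignores the HPA's tolerance band, stabilization windows, and KEDA's activation settings, and the parameter names are illustrative.

```python
import math


def desired_replicas(queue_length: int, messages_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Simplified estimate of the replica count a queue-length target aims for."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    if queue_length <= 0:
        # With nothing to process, KEDA can deactivate the workload (down to min).
        return min_replicas
    # Average-value target: enough replicas that each handles ~messages_per_replica.
    wanted = math.ceil(queue_length / messages_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(wanted, max_replicas))
```

With 45 messages waiting and a target of 20 messages per replica, this gives `ceil(45 / 20) = 3` replicas; a sudden burst of 10,000 messages with `max_replicas=30` is capped at 30.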
Don't just say that you have CI/CD because you can deploy your application and you have all the tools and checks and balances in place. If your infrastructure falls over and dies, you need to have a plan in place so that you can make sure that the application deployment is still successful. I recommend adding infrastructure as part of your CI/CD process and checking in your infrastructure as code right alongside your application code, so you can cross-check any changes that you make. Today, we used an infrastructure as code job in a GitHub Actions workflow, but I could have just as easily added that into Codefresh or Azure Pipelines. If you're using Jenkins, you can do the same thing as well; you just want to have a plan for it. You're also going to want to make sure that you have a solution in place for debugging Kubernetes applications, whatever that looks like. In previous demos, I've used Helm and Draft and done local development. Today, we actually did real-time development using Visual Studio Code and Azure Dev Spaces, so I could redirect that one service back over to my system and still debug the bikes API in the context of the full application. And finally, you're going to want to make sure that you have a plan to tie all of this confidence together. You're going to want to make sure that you can include everyone possible in the review process so that you can confidently approve changes. Now you have confidence in the infrastructure. You have confidence in the application security, the health, the integrity, the binaries, and you also have confidence in the debugging, pull requests, and changes as you're moving from sprint to sprint. Today, we used a GitHub Actions pull request bot workflow; you can use something that works for you and your environment. Once again, my name is Jessica Deen. I'm here, frankly, because I love technology. I love community. I love all things Linux, open source, DevOps, containers, Kubernetes.
Feel free to reach out to me on Twitter, Instagram, or GitHub. No relation to James Dean, so my last name does have two E's. And you can talk to me about really anything; I'd love to hang out with you. Finally, all the resources for everything we walked through today are already available online. You can head on over to aka.ms/JLDeen/K8sMeetsWorld. Thank you very much.
