Kubernetes meets Real World: The DevOps Edition [swampUP 2020]
Jessica Deen, Senior Cloud Advocate
July 7, 2020
What’s up everyone. My name is Jessica Deen.
I’m super excited to be back here at SwampUp.
And today we’re going to learn about Kubernetes in the real world.
So let’s go ahead and dive right in and
do a brief overview of an agenda
so that you know what you can expect from me in this session.
First off, we’re going to start off
with an introduction into the architecture
behind running production level Kubernetes.
There’s some considerations
you want to make in advance
rather than just the basic getting started.
Then we’re going to move into covering resiliency:
how do you cover or handle failure
when it comes to production level Kubernetes?
How do you handle security,
maybe network security?
How do you handle things when there’s
abstraction and distributed systems and
there’s kind of microservices all over the place?
We’re going to talk about how we handle that.
Then we’re going to talk about scalability.
There’s several different ways you can scale in production
Kubernetes, and sometimes seconds matter.
So we’ll talk about how we can handle that as well.
And then at the very end,
we’re going to have confidence and
we’re going to learn how we can have confidence in this entire process
from a DevOps perspective.
So let’s go ahead and dive right in with the architecture review.
Let’s recap on how Kubernetes actually works.
So if you’re not using something that’s managed,
meaning something that’s provided by Azure,
Google, Amazon, Oracle,
whatever your cloud provider is,
and you’re taking care of your master node,
or what is commonly referred to as a control plane,
you probably have something like this,
where your master node or nodes has the API server, etcd,
that’s your database,
your scheduler, your cloud controller and
your controller manager,
and you can have multiple schedulers.
But either way, in this master node scenario,
this is what’s actually communicating with
what’s called your agent pool or your worker pool.
This is how many nodes you have.
Typically, when you’re getting started
and you’re playing around and you do a basic command,
you might have three nodes.
But as you start scaling,
you’re going to have significantly more with more resources,
more CPU and more memory.
That’s your agent pool.
And then there’s a Kubernetes API endpoint
that communicates to your control plane.
And then the control plane then goes and schedules things
accordingly over into your agent pool.
And then of course, there’s you.
And this is where you would use some sort of
workload definition being a manifest JSON file,
and maybe you’re doing a bake.
Or maybe you’re using Helm,
which is the de facto Kubernetes package manager.
And it’s powered by a template engine,
so it gives you more control, especially for
large microservice deployments.
Either way, you’re doing something declarative
that you’re handing off to the Kubernetes API endpoint.
That API endpoint is then communicating with your control plane
and then scheduling things accordingly
with your agent pool or your worker pool.
Now in a managed scenario via Azure, Google or Amazon,
usually those cloud providers are taking care of the control plane for you.
That way, all you have to worry about
as the developer engineer that you are,
is actually taking that workload definition,
handing it off to the API endpoint,
and then everything else gets scheduled and managed for you.
Okay, so now that we kind of have
the architecture and a brief overview of how this works,
what kind of considerations
do we have to make when it comes to basic versus production?
First off, regardless of whether or not you’re using
Azure, Google, Amazon, whatever your cloud is,
chances are you have a simple basic command with something like this.
It’s some sort of create command.
And in Azure, we call it a resource group.
This is where we put our resource objects
our Kubernetes cluster, our network, our virtual machine scale sets,
whatever we’re using.
We put that into this resource group.
We give it a name for a cluster,
we describe how many nodes we want.
Typically again, getting started is three,
and we either generate SSH keys or we provide our own SSH key pair.
But production level Kubernetes tends to look a little bit different,
you actually have a lot more parameters that you have to consider.
For one, you’ll still give your resource group your name,
your node count,
but then you might also, in the case of Azure,
define a service principal and client secret.
And this is essentially authorization that
you’re giving to this cluster.
And by creating a dedicated authorization account,
you can actually attach that over to your Container Registry,
or ultimately attach that over to your pods
if you’re using managed pod identity.
So there’s a lot of flexibility you can do from a scalable perspective
when you consider things like that.
There’s also things that are on this slide that
you need to be aware of in advance
when you’re creating your cluster.
For example, a network plugin,
if you’re using something aside from kubenet,
and that’s just the basic network within Kubernetes.
Every cloud will have something more advanced.
In our case, in Azure’s, if you’re using something more advanced,
chances are you’re going to have to define
that advanced network at the time of cluster creation.
You can’t go back and retroactively add it.
So if you’re with another cloud provider other than Azure,
you’re going to want to ask them, is this something that
I can do retroactively and change?
Chances are you can’t, because that’s the underlying networking.
But you’re going to want to know that information in advance.
You’re also going to want to know
what type of virtual machine set type you want.
In Azure’s case you can choose virtual machine scale sets,
or you can choose availability sets.
The last two flags on this particular slide are zones and network policy.
Zones is what we’re going to talk about in our first demo.
And this is how we handle failure.
This is how we handle which region each node gets put into.
If you don’t specify that,
all of your nodes might be put in the same region.
And if that region or that data center goes down,
now your entire workload and application goes down.
Now, zones is something that needs to be configured at the time of cluster creation.
You’re going to want to ask your cloud provider
if that’s still the case as well.
Network policy.
Now Calico’s a little unique because
Calico is actually an open source project by a company called Tigera.
And this is probably the easiest way
to get started with Calico, especially on AKS,
because you just use a flag,
but that network policy has to be defined at the time of cluster creation.
Now you can install Calico manually since it’s open source,
you can go and deploy it and configure after the fact.
But it’s considerably more time consuming than just setting everything up now.
Just because you have flags enabled
doesn’t mean that you have to use them immediately.
But now you’re setting your cluster up
to be able to scale as your needs and
your production use cases scale as well.
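As a rough sketch of the production-style create command described above, the snippet below only assembles an argument list for `az aks create`; it does not run anything. The resource group and cluster names are hypothetical, the flag values are one plausible combination rather than what was on the slide, and the service principal flags are left out so no secret appears in the example.

```python
def aks_create_args(resource_group, name, node_count=3, zones=(1, 2, 3)):
    """Build (but do not execute) an `az aks create` argument list.

    The network plugin, VM set type, network policy and zones are all
    passed up front because they are decided when the cluster is created.
    """
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--node-count", str(node_count),
        "--generate-ssh-keys",
        "--network-plugin", "azure",            # advanced networking instead of kubenet
        "--vm-set-type", "VirtualMachineScaleSets",
        "--network-policy", "calico",
        "--zones", *[str(z) for z in zones],    # one entry per availability zone
    ]


# Hypothetical names, for illustration only.
args = aks_create_args("demo-rg", "demo-cluster")
print(" ".join(args))
```

Keeping the command as data like this makes it easy to review in a pull request before it ever reaches a pipeline.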
Now let’s go into
how we handle resiliency and specifically how we handle failure.
That’s where we’re going to talk about availability zones.
So right now I have a picture of an AKS architecture,
you can sub AKS for GCloud or Amazon or whatever you want.
But typically, you’re going to have nodes in a data center, right?
And they’re going to be sitting in different regions,
whatever your cloud provider is.
Let’s say that one of those regions goes down
and now your cluster goes down right with all of the nodes.
But if you’re using something that has availability zones,
you can actually put those nodes and
therefore your pods and your containers in different zones
or different areas, different data centers.
So let’s take a look in the context of Kubernetes architecture,
you still have your user, your workload definition,
your control plane,
but now each node gets put into an availability zone.
So let’s say that you have a larger cluster,
you have six nodes,
as we can see, in this example,
nodes one and two are in availability zone one, three
and four are in zone two, five and six are in zone three.
And let’s say that you have three replicas,
your replicas are going to be equal to the number of zones that you have.
So each copy of your pod, of your application, is going to be balanced
across zone one, zone two and zone three.
And now zone two goes down.
That means you still have two other zones, or two other nodes,
that are in different data centers,
to make sure that your application is still up.
Right, so now we can handle failure with a little bit more planning in mind.
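The zone example above can be sketched as a toy simulation. This is a simplification under stated assumptions: replicas are placed round-robin, which is not how the real Kubernetes scheduler decides placement, and the zone names are hypothetical.

```python
from collections import Counter


def spread_replicas(replicas, zones):
    """Place replicas across zones round-robin (a toy model, not the real scheduler)."""
    return [zones[i % len(zones)] for i in range(replicas)]


def surviving(placement, failed_zone):
    """Return the placements that are still running after one zone fails."""
    return [zone for zone in placement if zone != failed_zone]


zones = ["zone-1", "zone-2", "zone-3"]
placement = spread_replicas(3, zones)   # three replicas, three zones
left = surviving(placement, "zone-2")   # zone two goes down

print(Counter(placement))
print(len(left), "replicas still serving traffic")
```

With replicas equal to the number of zones, losing any single zone in this model still leaves two copies of the application running.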
Now there are some things to know
when it comes to handling application failure.
As I’ve already mentioned,
you need to make sure that your deployments
with replication sets or replicas are set equal to the
number of zones that you’re using. If you have three zones,
three replicas. And I’ll show you an example
definition or deployment YAML,
so that you can kind of see it’s the same replicas we’ve been using,
but you might not have been balancing that across different zones.
You also want to make sure that
you use an Ingress controller that’s highly available,
that’s not just going to go down because those zones go down,
right, so maybe something that sits outside your cluster,
as opposed to an Ingress controller that’s in that node
or in that region that fails.
And then regardless of the fact
that you still have nodes in different regions,
you want to understand that your disk still mounts directly into your pods.
The other thing you’re going to want to consider,
and you’re going to see that in availability zones itself, is
how you check whether or not you have zones added in.
Now all you have to do is use that --zones flag,
but you’re going to want to know
how you can actually verify that after the fact.
So you can do that once in your actual terminal,
you can simply do k get nodes -o wide.
In this example,
I have three different nodes in a virtual machine scale set.
And then I can do a describe on those nodes
and search for failure domain.
And I see I have three different zones.
If I want an easier, more visual way,
I can just log into my Azure Portal.
And I can see that under my virtual machine scale set
that I have zones one, two and three available in East US.
Now obviously, this will change depending on your cloud provider,
but you’re going to want to make sure that
you can check it both from a command line
provisioning state standpoint,
because you can add that in as a fail-safe check to your CI/CD.
But then also if you’re still learning and you’re still playing around,
you can have that visual confirmation as well.
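For the command-line check, here is a minimal sketch of a script that could sit in a pipeline and read zone labels out of `kubectl get nodes -o json` output. The node names and zone values in the sample are hypothetical. The two label keys are the well-known Kubernetes zone labels: the older `failure-domain.beta.kubernetes.io/zone`, which matches the "failure domain" search above, and the newer `topology.kubernetes.io/zone`.

```python
import json

# Newer label first, older "failure domain" label as a fallback.
ZONE_LABELS = (
    "topology.kubernetes.io/zone",
    "failure-domain.beta.kubernetes.io/zone",
)


def zones_by_node(node_list):
    """Map node name -> zone label value (None if the node has no zone label)."""
    result = {}
    for node in node_list.get("items", []):
        labels = node["metadata"].get("labels", {})
        zone = next((labels[key] for key in ZONE_LABELS if key in labels), None)
        result[node["metadata"]["name"]] = zone
    return result


# Hypothetical, trimmed-down sample of `kubectl get nodes -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "node-0", "labels": {"failure-domain.beta.kubernetes.io/zone": "eastus-1"}}},
  {"metadata": {"name": "node-1", "labels": {"failure-domain.beta.kubernetes.io/zone": "eastus-2"}}},
  {"metadata": {"name": "node-2", "labels": {"failure-domain.beta.kubernetes.io/zone": "eastus-3"}}}
]}
""")

zones = zones_by_node(sample)
distinct = {zone for zone in zones.values() if zone is not None}
print(zones)
print("zones in use:", len(distinct))
```

A pipeline step could fail the run when the number of distinct zones is lower than expected.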
Alright, so now that we have seen how that works,
and how you check and verify that zones
and everything is properly configured,
remember that that has to be configured at the time of cluster creation
with --zones, and this flag might change depending, again,
on what cloud provider and what offering they have,
but you’re going to want to make sure that you plan for that in advance,
because it’s going to be important
also when you start considering your infrastructure as code
and how that fits into your CI/CD.
Now, again, we talked about deploying to availability zones,
and I mentioned replication sets or replicas.
And I told you that
I would show you an example deployment YAML.
Let’s pay attention: under spec, there’s replicas.
Now this is hard coded.
This is obviously a declared definition where
I just said replicas equals three. If you’re using something like Helm,
that might change, that might be replicas
is equal to .Values.replicas,
or however your Helm chart works,
but the answer is the same: however many zones
you have should equal the number of replicas.
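A minimal sketch of that rule, written as a Python function that builds a Deployment manifest with `spec.replicas` tied to the zone count. The application name and image are hypothetical; the manifest fields are the standard `apps/v1` Deployment fields, and printing the result as JSON gives something that could be handed to the API endpoint in place of hand-written YAML.

```python
import json


def deployment(name, image, zone_count):
    """Build a Deployment manifest whose replica count equals the number of zones."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": zone_count,  # one replica per availability zone
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical name and image, three zones as in the example above.
manifest = deployment("stock-api", "example.invalid/stock-api:1.0", zone_count=3)
print(json.dumps(manifest, indent=2))
```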
All right, so now that we’ve handled failure,
let’s talk about how we can actually handle security
and specifically network policy in the midst of abstraction.
Now, by default, if you’re using an advanced network,
you’re going to have what’s called a flat network,
meaning pod A over here can communicate with pod B over here.
Or more descriptively service – production service Canary service,
however you have your cluster set up over here –
can communicate with this service over here
that might not even be connected to it,
even if it’s in a different namespace.
So let’s take a look at this example.
You have two worker nodes here,
each has their little kubelet on it
to communicate with the control plane,
and then you have two different applications deployed.
Now by default, if you don’t have something in the middle,
the left is going to communicate with the right and vice versa.
So that’s where Calico comes in.
Calico again, is an open source project.
And you can put it in the middle
where it can choose or you define sorry, not it,
you can choose to either allow traffic and
allow that communication across that network.
Or you can also choose to deny that traffic.
And you can do that very easily.
Still with a declarative workload or YAML file or manifest,
all you have to do is say that you want to
either allow traffic based on whatever parameter,
in our example, in our demo,
we’re going to use a label, or then deny that communication.
So when the policy is applied,
that’s gonna go ahead and deny traffic
when you delete the network policy,
then it’s going to go ahead and allow traffic.
So again, in Azure’s case, that network policy flag does need to be
set at the time of creation.
If you’re using a different cloud,
you’re going to want to make sure that you understand
if that’s something that can be retroactively applied.
So once you have that enabled,
let’s talk about how easy it is to actually apply that network policy.
First off, all you really have to do is apply YAML.
So you can see right here,
I have 11 different microservices.
That’s at the bottom, and that’s a search in a specific namespace.
At the top, I’m just going to run an Alpine image.
Right? I’m just going to use this to test.
I can do a wget on one of my APIs.
And I’m going to query my stock API,
Just like that, I could communicate and download the index.html.
So now I’m going to actually apply a network policy
and I’m actually going to have it match a label.
So everything in prod is going to have a label that’s equal to Tw t app.
That’s my Tailwind Traders app.
You can see that my pod still exists,
none of them had to restart.
But now when I try to communicate to that service and to that pod,
I can’t anymore.
Now this is accomplished through a label.
If I do k get pods, and I again search,
but this time for label, now I see those same services pop up.
Now I can easily delete this policy…
and allow that communication to happen by running the same YAML
only this time using the delete command,
and I delete that policy from my cluster.
Nothing in my pods changes,
I’m not deleting the label.
But now when I run wget again,
I’m able to redownload that index.html file.
That’s really how easy it is.
Once you have Calico enabled,
you’re just going to go ahead and actually apply
your declarative syntax to allow or deny traffic.
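The deny behavior from the demo can be sketched as a label-matched NetworkPolicy, built here as a Python dictionary and printed as JSON. The policy name, namespace and label are hypothetical and are not the ones used in the demo. The structure is the standard `networking.k8s.io/v1` NetworkPolicy: selecting pods by label, declaring the `Ingress` policy type, and listing no ingress rules means no inbound traffic is allowed to those pods while the policy exists.

```python
import json


def deny_ingress_policy(name, namespace, match_labels):
    """Build a NetworkPolicy that blocks all inbound traffic to pods with the given labels."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": match_labels},
            "policyTypes": ["Ingress"],
            "ingress": [],  # no allow rules, so all ingress to the selected pods is denied
        },
    }


# Hypothetical names and label, for illustration only.
policy = deny_ingress_policy("deny-prod-ingress", "prod", {"role": "demo-app"})
print(json.dumps(policy, indent=2))
```

Applying a manifest like this denies the traffic, and deleting it restores the default flat network, without touching the pods or their labels.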
All right, so now that we’ve learned about network policy,
let’s talk about scaling.
And specifically scaling with adding in serverless,
in the interest of when seconds matter in scaling and
in your application.
There’s three different types of scaling in Kubernetes.
First off, you can add or remove nodes.
These are virtual machines.
But typically adding a node can take about three minutes.
And let’s say that you have something
that’s highly transactional.
And it’s running 10,000 transactions a minute.
Now that three minutes it takes to spin up
means that you’ve just lost the ability to process 30,000 transactions.
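The arithmetic behind that number, as a tiny sketch using the figures from the talk (10,000 transactions a minute, about three minutes to add a node):

```python
def transactions_not_processed(rate_per_minute, scale_up_minutes):
    """Transactions that arrive while waiting for new capacity to come online."""
    return rate_per_minute * scale_up_minutes


backlog = transactions_not_processed(rate_per_minute=10_000, scale_up_minutes=3)
print(backlog)  # 30000
```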
Of course, you could also add or remove pod replicas,
but those replicas are only going to be able to spin up
based on the architecture and how much infrastructure you have underneath.
Another option is actually using what’s called a virtual node,
which is kind of it’s a fake node that inserts itself
in your cluster that will start scheduling things.
In this instance, something called Azure Container Instances.
This is like Docker run in the cloud.
And we’re accomplishing this with an open source project called Virtual Kubelet.
Virtual Kubelet itself does not only work with Azure,
it works with other clouds.
So you can check out to see what you can leverage with Virtual Kubelet.
But today’s demo that I’m going to show you, running with virtual nodes,
at this time is only going to work in Azure itself.
Now, let’s talk about how the networking works
and how we handle this.
First off, let’s say that we have our cluster,
right, we have our Kubernetes control plane,
we have a highly available Ingress controller,
we have our nodes and our pods running.
And then we have a virtual node.
And the virtual node is actually just really a Golang binary
that’s sitting kind of in the control plane,
telling the control plane,
hey, I’m available to schedule things, too.
If you specifically tell me you want me to run jobs.
Now once that virtual node gets the jobs,
it’s going to start scheduling them in that Docker run in the cloud
or container instances.
So a better way of looking at it is you can actually see the traffic coming into our Ingress,
realizing it needs to get scheduled over into our virtual node,
which is then going to fire off container instances.
But because it’s still part of our cluster,
the container instances can still communicate with the remaining nodes
within our cluster itself.
Now because this is networking,
and it’s actually scheduling things and it’s kind of running on its own channel,
it needs to also have its own subnet.
So you’ll just have to consider that when it comes down
to your infrastructure as code and your architecture.
And ideally, you’re not just running basic commands again,
regardless of what cloud provider you’re using.
Ideally, you’re using something that’s declarative,
it’s some sort of JSON template or manifest file,
something that you can repeat and stand up
so you have confidence in your infrastructure.
And we’ll talk about that in the confidence stage.
So here’s just an example of creating your own dedicated subnet.
And then you would tell your virtual node or your virtual instance
to use that particular subnet.
And to add the virtual node, at least in Azure’s case,
you would just use the add-on, or better yet, you would actually define it again
in your declared infrastructure as code.
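Since the virtual node needs its own subnet, a small pre-flight check in the infrastructure stage can catch address-range mistakes before anything is created. This is a sketch using Python's standard `ipaddress` module; the CIDR ranges are hypothetical, and the check itself is an illustration rather than anything the cloud provider requires you to run.

```python
import ipaddress


def subnets_are_usable(vnet_cidr, cluster_cidr, virtual_node_cidr):
    """True if both subnets sit inside the virtual network and do not overlap each other."""
    vnet = ipaddress.ip_network(vnet_cidr)
    cluster = ipaddress.ip_network(cluster_cidr)
    vnode = ipaddress.ip_network(virtual_node_cidr)
    inside_vnet = cluster.subnet_of(vnet) and vnode.subnet_of(vnet)
    return inside_vnet and not cluster.overlaps(vnode)


# Hypothetical address ranges, for illustration only.
ok = subnets_are_usable("10.0.0.0/8", "10.240.0.0/16", "10.241.0.0/16")
clash = subnets_are_usable("10.0.0.0/8", "10.240.0.0/16", "10.240.1.0/24")
print(ok, clash)
```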
Now I want to make sure that we also are clear on what virtual node supports.
First off, it supports Linux containers.
It’s pretty common in all things Kubernetes,
but it also supports Windows containers.
And it supports GPU,
and even cooler, as I mentioned, that Golang binary,
that Golang binary is actually being deployed via Helm.
So if you have been on the fence about using Helm,
or maybe you’re still using bake or you have your own manifest files,
Helm, especially Helm 3 right now,
is probably going to be something you want to consider
just as a way to manage your all up microservices.
It’s even the way that we handle deploying out again,
a virtual node binary in production across the board.
Now I mentioned briefly earlier,
but I just want to make sure we cover this,
you need to be explicit to tell your pods to use virtual node,
similar to how if you have a hybrid cluster,
and you have Windows workloads and Linux workloads,
you have to declare the node selector to say
which node you want, Windows or Linux.
In this instance, you have to do a similar thing where you have to be specific
and say that you want to use the Virtual Kubelet node.
So you have to specify that otherwise it won’t work.
You set your toleration accordingly and then you move on.
And you can again,
add that into your Helm chart or whatever your definition is.
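As a sketch of what "being explicit" can look like, the function below adds a node selector and tolerations to a pod spec. The selector and toleration values are recalled from the AKS virtual nodes documentation of roughly this period and should be treated as assumptions: verify them against the labels and taints on your own virtual node. The container name and image are hypothetical.

```python
import json


def run_on_virtual_node(pod_spec):
    """Return a copy of a pod spec that opts in to scheduling on the virtual node.

    The label and taint keys below are assumptions based on the AKS docs;
    check them with `kubectl describe node` on your own cluster.
    """
    spec = dict(pod_spec)
    spec["nodeSelector"] = {
        "kubernetes.io/role": "agent",
        "beta.kubernetes.io/os": "linux",
        "type": "virtual-kubelet",
    }
    spec["tolerations"] = [
        {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
        {"key": "azure.com/aci", "effect": "NoSchedule"},
    ]
    return spec


# Hypothetical container, for illustration only.
base = {"containers": [{"name": "processor", "image": "example.invalid/processor:1.0"}]}
virtual = run_on_virtual_node(base)
print(json.dumps(virtual, indent=2))
```

Without both pieces, the selector and the tolerations, the scheduler keeps the pod on the regular nodes.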
So now I want to show you now that we understand
how virtual node kind of works.
Let’s show you it in action.
To do this, I’m going to use an application called Tailwind Traders,
Tailwind Traders has 11 different microservices,
it’s a highly abstracted application.
And we’ve recently added in RabbitMQ to start processing messages.
But the unique thing is,
once we send those messages over to RabbitMQ,
which is running in our Kubernetes cluster,
we’re gonna have our message processors actually run on the virtual nodes,
or in the Docker run in the cloud,
in Azure Container Instances.
So RabbitMQ will send that over to message processors
and message processors will start spinning up based on that demand,
so we don’t have to wait for that extra node to spin up okay.
Now to do this, we’re actually going to use
event driven scaling and this is where we start
having serverless kind of tie in.
So we’re going to be able to scale Kubernetes based on events.
This works with a wide variety of Azure services,
queues, Event Hubs and blobs,
but it also works with other clouds,
AWS, Google Cloud, and it doesn’t only work with RabbitMQ,
you could also use it with Kafka or Prometheus.
You can check out more about the KEDA project,
which again is open source.
And by the way, it was deployed to this cluster through a Helm chart as well,
as was RabbitMQ,
but you can check it out at keda.sh.
But now I really want to show you it in action
and how we would handle that in production with an influx of messages.
So first off, let’s take a look at the middle box right here
where I’m doing k get hpa. That’s for horizontal pod autoscaler.
And you’ll notice under targets, it says unknown out of five.
Now on the bottom, I’m actually going to get pods wide
so that I can see the node that the pods are running on.
At the top, I’m just going to apply a batch job
and this is actually going to go ahead and say hey,
send this job, send it 300 messages all the way through,
keep it going super secret username and password,
and that batch job is now going to be created.
Almost instantly, you can see jobs slowly starting to come in.
And this does take a few seconds, right, to kind of start getting traffic going.
But if I cancel that watch command and restart it,
you’ll notice that the jobs are starting to get scheduled
on my virtual node, or virtual-node-aci-linux, right,
you can see some are waiting, some are pending,
some are creating, they’re starting to get created every,
you see, seven seconds, 20 seconds,
16 seconds, and now they’re starting to come in a little bit faster, a little bit faster.
This job is going to slowly start to pick up
as it starts to slowly inundate with X amount of messages, right.
Now we’re starting to get zero seconds, zero seconds, zero seconds.
If I kill my watch command on horizontal pod autoscalers,
I can see that I actually have almost 35,000 metrics
that are now coming in, right.
So now that’s going to start putting more and more demand where I need to start spinning up
Docker run in the cloud, or ACI,
literally by the second. This is when something’s highly transactional.
You can see that all of these now are starting to spin up
with my virtual node, or virtual node ACI.
Now, the same way that I started this,
and the same way that I started the Calico demo,
I can also delete this job.
All I have to do is go back into the YAML.
And I can go back in and actually change that to delete.
You can see right now my targets have slowly started to drop.
But let’s change this now to delete and we’re going to delete
this batch job.
Now when I change or do a query on HPA,
now I can see that I’m back to zero to five targets because
it’s scaled all the way back down instantly, and it deleted the job.
So if we go take a look at the pods,
all of the pods are now terminating.
So that’s what we can also confirm that yes,
all of these jobs were being scheduled by RabbitMQ
and RabbitMQ message processors were taking place
in the virtual node that’s living again in that Kubernetes cluster.
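To make the "out of five" target in the HPA output more concrete, here is a simplified sketch of how a queue-length target can translate into a replica count. It is an approximation of the average-value calculation the horizontal pod autoscaler performs, not KEDA's or Kubernetes' actual code: it ignores tolerance, stabilization windows and polling intervals, and the minimum and maximum replica bounds are assumptions.

```python
import math


def desired_replicas(queue_length, target_per_replica, min_replicas=0, max_replicas=100):
    """Replicas needed so that each one handles roughly `target_per_replica` messages."""
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


busy = desired_replicas(queue_length=300, target_per_replica=5)
idle = desired_replicas(queue_length=0, target_per_replica=5)
capped = desired_replicas(queue_length=35_000, target_per_replica=5)
print(busy, idle, capped)
```

An empty queue maps back to zero replicas in this model, which matches the scale-down seen once the batch job is deleted.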
So now we’ve learned how we can also handle scaling
and scaling by the second. Right?
How can we gain DevOps confidence?
And this is the part that I just I love talking about.
There’s been a lot more we’ve highlighted here, right?
We’ve talked about things to consider
that you have to be aware of, zones and KEDA
and different applications, all of that.
And how do you recreate that reliably?
Well, in this demo, you’ll notice that I’m using GitHub Actions.
And if we click on deploy infra,
you’ll notice that every single thing I went through is actually
redeployable in an infrastructure stage or infrastructure job,
including down to creating special subnets,
creating namespaces,
creating things for my JFrog environment,
certificate manager.
And anything else that I had in my pipeline doesn’t change,
I can still use JFrog to build my Maven packages,
or NuGet, or whatever your language and programming.
And I can still push that information over to Artifactory.
And I can trigger Slack notifications.
I can even label my pods for network policy.
So that I can apply Calico,
I can do that in both my dev environments
and my production environments.
In fact, the last three stages you see here,
build images, deploy to Dev and deploy to prod
is almost identical to the demo I ran
last year.
All I had to do was to add in the infrastructure stage.
Here’s another visualization using a different CI/CD system,
Codefresh, which is Kubernetes native.
And actually, every single step runs in a container itself,
it’s Kubernetes on the back end.
You can see that I’m still able to use JFrog to build my jar file
if I’m doing Maven.
If I were using a different language,
which Tailwind Traders, by the way, actually
was written in both .NET Core
and Node or JavaScript, but you can,
it doesn’t matter, the process is still gonna stay the same.
I just want to make sure that I have JFrog as a
private package feed or have something set up
that can actually scan binaries and be able to host things.
I like JFrog, because not only can I host my packages,
I can also host my Helm charts, I can host my Docker images.
And as you can see, right now it’s on the Xray security scan.
I can scan all of that and have it in one location.
In fact, I can even send that information back over to Slack.
So now I also add ChatOps into this process.
Part of making sure that you’re running Kubernetes in production
is making sure it’s as repeatable as possible,
you have confidence in what you’re doing.
So if we actually
drill down into any of these,
you can see the security scan right now or the Slack notifications,
you can see that I can actually promote right from Slack
because I have that plugin added in.
But I can also scroll up and drill down into the JFrog notifications,
it will take me right over into my dashboard
where I can see not only the build information,
the Helm charts attached, the Docker files attached,
but I will also be able to verify the Xray status.
So you can see the build ID right here,
I can see which CI server, doesn’t matter if it’s Jenkins, GitHub Actions,
Travis CI, whatever you’re using,
I’m using several to show you it’s not about the product.
It’s about what you do with it.
But you can see there’s my Xray status of medium
and you can decide whether or not that’s a risk
you’re willing to take or whether it’s something you need to mitigate here.
Now we can also scroll up and see the Xray scan report.
Now in the event that it failed, and I’m going to find right now
one that did fail, I would also be able to make that decision like,
okay, this is not a healthy
scan, or it has some security risks,
I obviously don’t want to allow that to be released into production.
This kind of goes back over into security,
but it’s on the DevOps side. You can see right here,
here in my modules,
the one artifact, a single artifact, is my Helm chart
right here that gets scanned,
I can see my manifest.json,
that’s my Docker image right there,
I can see the Xray data and all the different violations.
And I can set watch policies and security policies
that are based on if I’m willing to accept
this problem, or if I want to automatically fail the build accordingly.
And obviously, I can choose to approve or deny,
and I would have dev tests.
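The accept-or-fail decision can be sketched as a simple severity gate. This is purely illustrative: it is not the Xray API or its policy format, and the severity names, threshold and sample violations are all hypothetical.

```python
# Ordered from least to most severe (hypothetical scale, not Xray's schema).
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def should_fail_build(violations, fail_at="high"):
    """True if any violation is at or above the chosen severity threshold."""
    threshold = SEVERITY_ORDER.index(fail_at)
    return any(SEVERITY_ORDER.index(v["severity"]) >= threshold for v in violations)


# Hypothetical scan results, for illustration only.
medium_only = [{"id": "example-1", "severity": "medium"}]
with_high = medium_only + [{"id": "example-2", "severity": "high"}]

print(should_fail_build(medium_only))  # a medium status is accepted at this threshold
print(should_fail_build(with_high))    # a high violation fails the build
```

Where the threshold sits is the team's call: a medium status might be an accepted risk in dev and a blocker in production.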
And just to prove it really doesn’t matter what your CI/CD system is,
here’s Azure Pipelines with, again,
the exact same setup. The only thing that GitHub Actions had
was the infrastructure as code,
but this is the exact same as Codefresh,
the last three stages in GitHub Actions.
And now we have it in Azure Pipelines.
It’s still using JFrog, I’m still able to publish my triggers,
I’m still able to send that build information over.
And I still can publish that through a webhook over into
my Slack channel, right. So I still have that confidence of
whether or not it was clean, whether or not it was secure.
But there’s one more area of confidence that we need to talk about.
We’ve talked about some of these things that you’ve seen on the screen,
I mentioned Travis CI, I mentioned Jenkins,
Codefresh, which is DevOps, but Kubernetes native,
JFrog Artifactory and Xray.
You can tie in, if you’re using TeamCity, or WhiteSource.
It’s not about these products.
It’s about how you can gain confidence from these products.
So one of the biggest questions I get asked is okay,
I’m running this application with 11 microservices,
or however many in production.
How do I debug something when it’s failing?
So since we’re in quarantine,
I thought it’d be fun to show you
something like that, and to add additional confidence into our DevOps process,
but I’m gonna show you that with a bike application.
This is where I can actually rent a bike by the hour.
So here we have Adventure Works.
And this is a cyclists thing.
So I’m going to log in as a customer,
and I can click around and find a bike that maybe I want to rent,
but I can’t see any pictures.
And since I am, I would say I’m a millennial. But…
Really, I mean, come on, everyone wants pictures, right?
We read picture books or whatever,
I want people to see the bike that I’m renting before I commit to,
I don’t know, $1 an hour, however much it is.
So I can go over to Visual Studio Code.
And I have several different APIs or services here.
And I’m going to focus specifically on the bike service.
And that’s what I have open in Visual Studio Code.
You can see that I’ve deployed it with a Helm chart,
and I’m going to go down to my JavaScript file for my server.
Now the cool thing is, is I can actually set a breakpoint right here on line 231.
And this is going to be something specific to Azure,
but I believe it also exists for other clouds.
I’m going to run the debugger with Node but in a Kubernetes cluster,
so this is actually going to deploy out everything here locally.
But it’s going to reroute the traffic from
my dev environment over into my local system,
but only for that one API, so I can do live debugging.
And I can hit a breakpoint and kind of play around with
whether or not this fix will work.
So for example,
if I go back and I search for another bike,
now that I set that breakpoint,
you’ll see that I’ll be forced back
over to VS code because I hit the breakpoint.
So I know that the image problem or resolution is in this block of code.
In fact, the image URL was hard coded to a static placeholder.
So I’ll comment those lines out, and
I will restart my debugger here. Right,
this is going to redeploy
that container and rerun that service to redirect it back over to my system.
So I can see if that resolves the issue.
Now this is doing it in a dev environment.
So I can refresh and I can see okay, there’s my bike.
I just want to make sure that it’s not specific to this one cruiser.
We can click around and we can find other cruisers.
So just by kind of having that interaction,
I have confidence now in the fix that
I’ve made for something that’s even abstracted.
But I want to make sure everyone on my team has the same level of confidence.
We can have confidence that our binaries are being scanned
and that we have network policy,
but how can we have confidence in human errors
and fixes?
So I’m going to push this on a special branch over to my repo,
and then I’m going to create a pull request.
And to do this, I’m actually going to use GitHub Actions.
In fact, there’s a pull request workflow
that we can actually integrate with Kubernetes.
So you can see that I’ll create the pull request right here,
we can go and take a look at the files that were changed.
You can see that I commented out 232 and 233.
We can go back to the conversation tab,
and you’ll see that my bikes API PR, or pull request, workflow has started.
Now what’s happening in this workflow
is it’s actually going to take the changes that
I made, all the changes, or the two in this instance,
and it’s going to build a new Docker container.
It’s gonna push that image, or part of the Docker image,
it’s going to push that over into Artifactory.
It’s going to create a child namespace over in my Kubernetes cluster,
it’s going to release the Helm chart for only that one API.
And it’s actually going to create a prefix, or a special
test URL, for everyone involved in this change,
to be able to go see this private version.
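As a rough sketch of the steps just described (build the image, push it, create a child namespace, release the Helm chart for the one service), the ordering could look something like the following. This is not the workflow from the demo: the function name, the registry host, the `pr<number>` naming scheme, and the `image.tag` chart value are all assumptions made for illustration, and the sketch only assembles the commands rather than running them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PreviewPlan:
    """Everything needed to stand up a private preview of one service for one PR."""
    image: str
    namespace: str
    commands: List[List[str]]


def plan_pr_preview(registry: str, service: str, pr_number: int,
                    commit_sha: str, chart_dir: str) -> PreviewPlan:
    """Return the ordered CLI commands for a pull request preview.

    Nothing is executed here; the caller decides how to run the commands.
    The naming scheme (pr<number>) and the image.tag value are hypothetical.
    """
    tag = f"pr{pr_number}-{commit_sha[:7]}"
    image = f"{registry}/{service}:{tag}"
    namespace = f"pr{pr_number}"
    commands = [
        # 1. Build a new container image from the PR's changes.
        ["docker", "build", "-t", image, "."],
        # 2. Push the image to the registry (Artifactory in the talk).
        ["docker", "push", image],
        # 3. Create an isolated namespace for this pull request.
        ["kubectl", "create", "namespace", namespace],
        # 4. Release the Helm chart for only the changed service.
        ["helm", "upgrade", "--install", service, chart_dir,
         "--namespace", namespace, "--set", f"image.tag={tag}"],
    ]
    return PreviewPlan(image=image, namespace=namespace, commands=commands)


if __name__ == "__main__":
    plan = plan_pr_preview("example.jfrog.io/docker-local", "bikes", 42,
                           "0123456789abcdef", "./charts/bikes")
    for cmd in plan.commands:
        print(" ".join(cmd))
```

Keeping the plan as plain data makes it easy to review the exact commands a pull request would trigger before anything touches the cluster.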
So you can see the GitHub Actions bot
commented with a special name of my pull request username.
I can drill down into this, and now I’m taken over into a special URL
that’s going to show me that change live.
So now everyone in the team is going to have confidence in the same fix.
And we’ve added that to our DevOps process.
We’ve added that with our existing tooling,
we still have Artifactory in play and Xray in play.
But we have additional confidence that
not only does our infrastructure exist,
but now we also have
this fix implemented.
Now notice that even when I remove that prefix,
I haven’t actually changed the overall…
I haven’t merged anything yet, right?
So I still have the placeholder on the overall URL.
The only way that people can see that private version is with that
private prefix that you see that was added by GitHub Actions.
So this gives us an additional layer of confidence.
And it includes people that were previously never able to
participate in the review process.
For example, it allows designers to be included, project managers,
people who don’t speak code,
everyone gets to be included,
and everyone gets to have the same level of confidence.
All right, so we’ve really been on a journey
and we’ve covered a lot of material,
a lot of what we need to consider when it comes to production
Kubernetes: abstraction, infrastructure, confidence.
So what are some key takeaways that
we can kind of go home with, okay?
Specifically, when running Kubernetes in production,
first off, you’re going to want to make sure that
you have a plan for handling failure, whatever your cloud provider is,
whatever your application is, whatever language it’s written in.
Today, we used an example of using availability zones.
Now you’re also going to want to make sure
you have a plan for network policy,
right, something that you can handle your traffic
and handle networking accordingly.
Even though you’re dealing with abstraction and distributed systems,
We used Calico, again by Tigera, and we used Azure CNI.
Now, another thing you’re going to want to make sure,
along the lines of security, is you’re going to want to make sure that
you have a plan for scanning your application binaries,
your Helm charts, your Docker files, everything.
And ideally, it’s going to be best if everything is in one place, right?
You tie ChatOps into it, you have it as part of your CI/CD.
In today’s demo, we used JFrog Artifactory and Xray.
Now, when it comes to scaling,
you’re going to want to make sure that
you have a plan for scaling even in seconds,
especially as your workload and your needs start to grow.
It’s great that you can rely on pod replicas and nodes.
But what happens if you have to do more?
Okay, in today’s demo we used KEDA and RabbitMQ.
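To make the idea of scaling on queue depth concrete, here is a minimal sketch of the arithmetic behind it: aim for a fixed number of waiting messages per replica and clamp the result. It is a simplified model, not KEDA’s implementation, and the function name, parameters, and default limits are assumptions for illustration.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Estimate how many consumers are needed for the current backlog.

    queue_length:       messages currently waiting in the queue
    target_per_replica: backlog each replica is expected to absorb
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if queue_length <= 0:
        # Nothing is waiting, so fall back to the configured floor.
        return min_replicas
    wanted = math.ceil(queue_length / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for backlog in (0, 1, 23, 1000):
        print(backlog, "messages ->", desired_replicas(backlog, 5), "replicas")
```

With a floor of zero, an empty queue means no consumers are running at all, which is the main difference from scaling on CPU or memory alone.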
And then you’re going to want to make sure that
you embrace infrastructure as code.
Don’t just say that you have CI/CD because you can
deploy your application and you have all the tools and
balances in place. If your infrastructure falls over and dies,
you need to have a plan in place so that you can make sure that
the application deployment is successful.
I recommend adding infrastructure in as part of your CI/CD process
and checking in your infrastructure as code right alongside your application code.
So now you can cross check any changes that you make.
Now today, we used an infrastructure as code job in a GitHub Actions workflow.
I could have just as easily added that into Codefresh or Azure Pipelines.
If you’re using Jenkins, you can do the same thing as well,
you’re just gonna want to have a plan for that.
And you’re also gonna want to make sure that
you have a solution in place for debugging Kubernetes applications.
However it is.
In previous demos, I’ve used Helm and Draft and done local development.
Today, we actually did real-time development using Visual Studio Code
and Azure Dev Spaces. So I could actually redirect that one service
back over to my system and still debug the bikes API
in the context of the full application.
And then finally, you’re going to want to make sure that
you have a plan to tie up all of this confidence.
You’re going to want to make sure that
you can include everyone possible in the review process
so that you can confidently approve changes.
Now you have confidence in the infrastructure.
You have confidence in the application security,
the health, the integrity, the binaries,
and you also have confidence in the debugging
and pull requests and changes as you’re moving from sprint to sprint.
Today, we used a GitHub Actions
pull request bot workflow; you can use something
that works for you and your environment.
Finally, my name is Jessica Deen.
I’m here, frankly, because I love technology. I love community.
I love all things Linux, open source, DevOps, containers,
Kubernetes. Feel free to reach out to me on Twitter,
Instagram, GitHub. No relation to James Dean,
so my last name does have two E’s.
And you can talk to me about really anything,
I’d love to hang out with you.
Finally, all the resources for everything we walked through today are already available online.
You can head on over to
[READS OUT WEBSITE LINK*] aka.ms/JLDean/[keights?]meetsworld.
Thank you very much.