The JFrog DevOpsing Journey

Batel Zohar | Developer Advocate, Anastasia Grinman | DevOps Engineer

As DevOps engineers, we truly believe that automation is essential. We are humans, and humans make mistakes, so we want to make sure we automate our processes as much as possible.

In this talk, we are going to show you how we use K8s in our organization, combining automation built on DevOps tools with a secure process, all based on our experience.

Video transcript

Hello everybody, thank you for joining us. Batel, how are you today? I’m good, thank you, how are you? Thank you, Batel, I’m good too. Want to talk about DevOps a little? Sure. Let’s share a brief look at our DevOps team’s journey here at JFrog. Yeah, we hear a lot about DevOps and implementing DevOps in an organization. We tend to think that DevOps is all about tools. You know, a lot of people say let’s bring Kubernetes to our organization, let’s work in the cloud, let’s move to microservices and so on. But on our DevOps journey at JFrog, we understood that DevOps is more than just tools and work environments.

DevOps is about processes, tools, and people. Alright, so let’s talk a bit about processes, tools, and people. When we’re talking about processes, to deliver things faster we would like to use automation that helps us reduce manual tasks and processes. So every step from development to release should be automated as much as possible. When talking about tools, we mean CI/CD tools, cloud provider tools, distribution, installation, or just monitoring of our deployment. And of course, people, like you mentioned.

Implementing DevOps means changing the mindset of all the people in the organization. So we all agree that the combination of people, processes and technology enables continuous delivery of value to you and your users and customers. We would like to take you through our DevOps journey at JFrog, moving to Kubernetes orchestration to scale and increase our productivity. So just like other organizations, at JFrog we need… to release… fast. To scale… fast. To maintain… fast. To support customer needs and the business… fast, of course, I love our customers. To have a fast self-service and to integrate the JFrog applications… fast.

 Of course between each other, you know, we have a huge platform. So we want to make sure that everything can communicate very, very fast. Or in other words, we would like to release fast. No way. Alright, so let’s start with a quick quiz.

Do you like Kubernetes? Let’s see your answers in the chat. Let’s wait a bit. You can answer whether you use Kubernetes, whether you like Kubernetes, how you feel about Kubernetes… and whether you think you know how to work with Kubernetes. I think we are good, we can actually start. Sure. Let’s talk a bit about us, and we’ll go back to the answers later. So my name is Batel Zohar. I’m part of the developer advocate team at JFrog.

I’ve been at JFrog for four years now; before this role I was part of the support and solutions teams at JFrog, and before that I was an embedded engineer. And I have an amazing doggie, as you can see in the picture; his name is Banjo. Feel free to reach out to me after the session, I’m always happy to help. So please do not hesitate to send me an email or a tweet or whatever you want. And I have an amazing partner today, Anastasia, thank you very much for joining.

 Thank you Batel, thank you.

My name is Anastasia, and I’ve been working at JFrog for the last five years as a DevOps engineer, and I challenge myself each and every day by making our life easier with automation and infrastructure adaptations. Feel free to reach out to me if you have questions, with pleasure. Thank you very much. So let’s start from the beginning: deploying the JFrog application.

So yeah, the JFrog application is prepared to be distributed in various deployment package types, like you see in the picture, including Helm, RPM, Debian, Docker and much more. Each of these application versions needs to be built and tested, and to include all the distribution package dependencies.

Today we have nine different distributions, and there are more to come, so we know that the release cycle will be much more challenging. To adopt a more proactive approach to delivering and verifying trusted code, and to enhance the process by adding more functionality, we all understand that we must use automation as much as possible. Maybe it’s obvious, and we all know this, but it’s important to mention: humans make mistakes.

Of course, we all make mistakes and it’s totally fine. Rather than spending 10 minutes manually triggering some semi-automated tests to get what we actually needed, we instead focus on enhancing application protection layers and adapting the different resources, like CPU and memory, per application or distribution type.

We’re improving our security program depending on our organization’s needs, configuring the service for the customers’ needs, and we would like to ease the support for adding new applications and services. Right. So why Kubernetes? Using automation, we increased both the scale and the effectiveness of our deployment preparations. So here we get into microservices. Exactly.

It enabled us to focus more on the details of the code rather than the infrastructure the code runs on. Kubernetes is about seven years old and has become one of the most loved platforms and the de facto standard for orchestrating container workloads. In order to move to running microservices in a Kubernetes cluster, we had to prepare some infrastructure changes and make sure we knew what we were getting into.

Let’s go over a brief, high-level list of four to-dos for the application changes that need to be done before moving to Kubernetes. Alright, so we’ll start with application security, then move to stateful versus stateless applications, then graceful shutdown, and finally cover infrastructure as code.

Okay, cool. So let’s start. Yeah. So the first item is, as I mentioned, application security. A Kubernetes cluster behaves as a protective wrapper, isolating the application from other containers and the host operating system.

We run the application as a dedicated user without root privileges and permissions, scan our containers for vulnerabilities, and protect the APIs that interconnect with the system, like network applications and devices. We also store sensitive information in Secrets, which provide sensitive environment variables to the application, and much more.
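As a rough sketch of those points (the image name, user ID and Secret name below are assumptions for the example, not our actual charts), a pod spec can run the application as a dedicated non-root user, drop extra privileges and read sensitive values from a Secret:

```yaml
# Illustrative sketch of a hardened pod spec; names and values are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1030                    # dedicated application user, not root
    fsGroup: 1030
  containers:
    - name: app
      image: example.jfrog.io/app:1.0.0   # hypothetical image reference
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]                # drop extra Linux capabilities
      env:
        - name: DB_PASSWORD            # injected from a Secret, not hard-coded
          valueFrom:
            secretKeyRef:
              name: app-db-credentials # hypothetical Secret name
              key: password
```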

Yeah, so the second item we would like to cover is stateful versus stateless applications. Before you actually start running an application in a production Kubernetes cluster, it’s important to understand the application architecture. So let’s talk about the best practices for stateless applications. The first best practice for a stateless application is: do not rely on local storage.

Storage can change in different environments and can behave differently with different applications. So do not rely on local storage and do not store state information locally, because your application can disappear at any time, you know, or just be restarted. We also want to avoid server-specific information.

Docker is also very lightweight, so we would like to use environment variables or system properties to make sure that the application can always recover, to make sure that we can easily, you know, recover it, change the configuration and just run it once again. And replicas: when several replicas are running, the application needs to recover and stay in sync across all the replicas, because we want to scale out very easily and make sure that every replica behaves the same way.
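A minimal sketch of what that can look like (all names here are illustrative assumptions): configuration comes from environment variables backed by a ConfigMap rather than local files, and the Deployment runs several identical replicas with no local state:

```yaml
# Illustrative sketch of a stateless Deployment; names are assumptions for the example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3                              # identical replicas, any of them can serve a request
  selector:
    matchLabels: { app: example-app }
  template:
    metadata:
      labels: { app: example-app }
    spec:
      containers:
        - name: app
          image: example.jfrog.io/app:1.0.0
          envFrom:
            - configMapRef:
                name: example-app-config   # configuration lives outside the image and the pod
          # no volumes or local storage: the pod can be restarted or rescheduled at any time
```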

So can we summarize? In the context of a Kubernetes cluster, we can talk about the CAP theorem: consistency, availability and partition tolerance. What does it mean? Consistency means that every read gets the most recent write; if a write request was done on pod A, then pod B must show the latest data or give you an error message. Availability, in the context of an application, means that it must function as close to perfection as possible. And when we are talking about partition tolerance, it’s about the ability to survive the failure of a node running the application. So the third item was graceful shutdown: we start watching for signals of application interruption, like a kill signal, a keyboard interrupt and much more. We want to make sure that once the application is redeployed, restarted or just crashes, its runs and processes will gracefully shut down and the information will not be lost. We want to make sure that everything is kept safe and nothing will crash one day and disappear.

We need to set a timeout for the task to be completed, so it’s important to track it. You know, we can say it takes 40 seconds to start the server, for example, which is very optimistic, but let’s say it takes 40 seconds to start the server and make sure that our application is up. Moreover, we want to add more output about the application state, to be able to debug it whenever it’s needed. So again, if I know that it takes 30 seconds to start my application, I need a log showing me that it actually ran and the server was started; I want to add more errors or debugging output to make sure that I can find the solution as soon as possible. To have visibility into what’s actually going on. Exactly.
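Here is a minimal sketch of how both ideas can be expressed in a pod spec (the timings, paths and ports are assumptions for the example): a termination grace period and a preStop hook for graceful shutdown, plus probes that make the startup state visible:

```yaml
# Illustrative sketch of graceful shutdown and startup visibility; all values are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  terminationGracePeriodSeconds: 60        # time allowed for in-flight work to finish
  containers:
    - name: app
      image: example.jfrog.io/app:1.0.0
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]   # give the app a moment to drain before SIGTERM takes effect
      startupProbe:                        # "it takes ~40 seconds to start the server"
        httpGet: { path: /health, port: 8080 }
        failureThreshold: 8
        periodSeconds: 5
      readinessProbe:                      # keeps traffic away until the app reports it is up
        httpGet: { path: /health, port: 8080 }
        periodSeconds: 10
        timeoutSeconds: 3
```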

So last but not least is infrastructure as code. In fact, infrastructure as code is the key and foundation for DevOps practices such as version control, code review, continuous integration and continuous deployment. Provisioning new infrastructure can take a long time: for example, network layout, servers, databases, security rules. This is done carefully, but mistakes happen, and configuration drift happens occasionally.

Moreover, there can be uniformity between regions and cloud providers. For example, the provisioning time of a new region can be reduced from a three-week manual task to a 30-minute automated provision, and even less. Yeah, so whenever we’re talking about three weeks, it’s taking, you know… like you told me, the nodes and the configuration… and servers, databases, security rules… everything that we need to do manually can be easily automated, and it reduces the time dramatically. So preparing to scale must come together with unified configuration management that acts as the source of truth. It means that no manual changes can be made, only changes from the configuration management tool. Any change has to be documented and permitted, and we want to make sure the team is aware of these processes; it’s super important to document them for better understanding in the future, to correct the changes, and to have full, detailed information for every change. This allows us to change and version infrastructure safely and efficiently. And when the code reaches master, it means that it’s production ready.

The version released from master should be tested and promoted for production rollout. Immutable infrastructure avoids configuration drift: the code is compared to the current state of the environment, and when the plan is applied, resources are added or removed. So we tend to think that the journey to microservices often comes with containerization, but these paths are not necessarily bound to each other. Yeah, I agree. Many want to benefit from the Docker toolset, but that doesn’t mean Docker dictates the usage of microservices.

Some organizations implement microservices without Docker, or run a monolith application via Docker. Every organization is different, so no one can tell you if you are doing it right or wrong; just do it according to the needs of your organization. This always depends on your organization’s needs. Exactly. So the larger the application scale, the more system administration and control is needed.

Even though we started with a managed cluster, we still need to change and adapt the cluster to our organization’s policy and needs. We have some examples here to share with you from our journey. Yes. So let’s start with the first item, the NGINX Ingress controller, and why we recommend it. So actually, it’s very easy: we recommend applying a presentation tier and a multi-tier architecture for separation between the public endpoint and the application.

What does it mean? It means that the application will be running behind a web server or load balancer that can be the first line of defense against attacks, and that can protect, prevent and offload load from the applications. Access to the application will be TLS configured; moreover, unencrypted, plain-text communication is not recommended at all. Yeah, for sure. We want to keep it safe.
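As a minimal sketch, a TLS-terminated NGINX Ingress in front of the application could look like this (the host, Secret and Service names are assumptions for the example):

```yaml
# Illustrative sketch: NGINX Ingress terminating TLS in front of the application.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"   # refuse plain-text access
spec:
  ingressClassName: nginx
  tls:
    - hosts: ["app.example.com"]
      secretName: app-example-com-tls    # certificate stored as a TLS Secret
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app        # hypothetical backend Service
                port:
                  number: 8080
```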

So we can also talk a bit about the DNS server. By default, the cluster comes with kube-dns already deployed, and you may want to migrate to CoreDNS; I have several reasons why I prefer CoreDNS. CoreDNS is multi-threaded and written in Go, instead of the single thread running in kube-dns, which is written in C. CPU and memory consumption grows as the number of services and endpoints scales up, and with the memory requirements of each DNS pod at the default settings, CoreDNS should be expected to use less memory than kube-dns. That’s also due to the overhead of the extra containers kube-dns runs, against only one container for CoreDNS. Yeah, the next item is plugins: there are a lot of plugins for a Kubernetes cluster that we can start using and recommend setting up. Let’s just examine a plugin like Calico. It’s enabled by default on Azure. It’s a plugin for network policy.

The main motivation for Calico, and that’s why we recommend using it, is to protect and isolate the applications from other services and third parties running in the same cluster. Moreover, it can be used inside a namespace to have control and isolation over a particular pod or service. Of course.
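For instance, with a network policy plugin like Calico enabled, a policy similar to this sketch (the namespace, labels and port are assumptions for the example) allows only the frontend pods to reach the database pods in the same namespace:

```yaml
# Illustrative sketch: only pods labelled app=frontend may reach the database pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-frontend-only
  namespace: example              # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: database               # the pods being protected
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend       # the only allowed caller
      ports:
        - protocol: TCP
          port: 5432
```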

So we also have logging and monitoring, where we can decide if we want to manage the stack ourselves, like installing an ELK stack or Prometheus, or we can use a managed Elastic service and ship the pods’ stdout to it for further analysis and monitoring. And here we get to the NFS provisioner. Why? Sometimes you might want to use it for storage for some application, for sharing common data, or to keep information even after the pod is restarted or redeployed. Each cloud provider offers an NFS provisioner, like AWS, or an NFS client provisioner, like GCP, so we are free to use dynamic provisioning in Kubernetes. On Azure, this layer is unnecessary, since you can use a storage class directly to connect to your Azure Files.
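A minimal sketch of that shared-storage pattern (the storage class name is an assumption; it might be azurefile on AKS, or the class created by an NFS client provisioner elsewhere):

```yaml
# Illustrative sketch: a PersistentVolumeClaim backed by a shared-storage class,
# so data survives pod restarts and can be shared between pods.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes: ["ReadWriteMany"]   # multiple pods can mount the same volume
  storageClassName: azurefile      # hypothetical; replace with your provisioner's class
  resources:
    requests:
      storage: 10Gi
```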

Yeah, but again, you can always add more services to deploy on your cluster; each organization uses its own tools, and there are so many open source tools and plugins that you can use, so feel free to adopt them according to your needs. Yeah. Let’s also cover the security enhancements we made on our Kubernetes cluster, and as part of the journey, we can recommend the following.

First of all, we always recommend securing the connection to the database with TLS. Also, we always use a private network to run our Kubernetes cluster nodes for a better protected cluster. Any external access can be done via managed NAT or port forwarding where it’s needed. But again, try to keep it simple, safe and private.

We can’t talk about security without encryption, right? While working with specific cloud providers, you can use cloud provider services like KMS for encrypting sensitive data. In addition, we use a private Docker registry, which is a trusted source for the Docker images pulled as part of the deployment in the Kubernetes cluster. Moreover, we must keep the Kubernetes cluster version updated to get new feature support as well as security vulnerability fixes. Also, you can run different services on dedicated node pools to avoid sharing the container runtime with other microservices. So for example, we will run a front-end service application in a different node pool to isolate it from a malicious application that could run on other nodes, and the same for, you know, the database and so on. That protects our core services from being damaged; we want to isolate them onto different nodes and try to keep it simple. Yeah, sounds good.
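Here is a minimal sketch of that node pool isolation (the node pool label, taint and image are assumptions for the example): the front-end Deployment is pinned to its own pool and tolerates a taint that keeps other workloads away:

```yaml
# Illustrative sketch: pin a front-end service to its own node pool so it does not
# share nodes with core services like the database. Label and taint names are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels: { app: frontend }
  template:
    metadata:
      labels: { app: frontend }
    spec:
      nodeSelector:
        nodepool: frontend             # hypothetical node pool label
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "frontend"
          effect: "NoSchedule"         # only pods with this toleration land on the tainted pool
      containers:
        - name: frontend
          image: example.jfrog.io/frontend:1.0.0
```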

Now, when our cluster is ready, we can step in with our application deployment. So when we trust the code, we can call it iron code, or “barzel” code in Hebrew. For that we must have some business deployment service, we can call it a self-service, that runs the business logic of the deployment, whether an installation or an upgrade. It means that it doesn’t matter in which state the application, service or Helm release is running: the redeploy action will bring the application or service to the desired state from the perspective of application version, Helm chart version and environment configuration setup, whether it’s a development, staging or production release.

This code comes with business logic support and the ability to have feature flags enabled or disabled on demand, and much more. Alright, so we’ll talk about feature flags in the next slides. For now, let’s talk a bit about how we can make the process faster. You know, when we’re releasing or rolling out any new version to production, what if we need to redeploy some server application with or without the default configuration? Or how can we make sure that the configuration is not deleted, you know, or overwritten, or that something happened by mistake?

Because, as we said, people make mistakes. A rollout usually requires a testing cycle in a staging environment, which is usually painful and takes time, and the deployment process can rely on external configuration. So it’s better to rely on external configuration for what changes frequently. This configuration can be loaded, like checked out, and used as part of the runtime deployment process. It really shortens the process time and delivers the code when it’s already tested, by using different configuration management.

Okay, cool. So let’s show some examples. Whenever we’re using configuration as code, we have the deployment process as iron code, so we can redeploy the application using a new configuration loaded from the external source, with or without changing the automation code. From the example above, you can see the application resources initialization inside the Helm values YAML, but it doesn’t change. This parameter is included inside the deployment, and this is the static initialization. It gets the actual resources from the external source, managed in Git, and this lets the changes be made frequently, like you see in the example below. Here’s another example where you can see the Java options configuration that can be changed for the application’s needs. The Helm values YAML can stay persistent, using a loop over each Java option.
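As a hedged sketch of the idea (the keys resources and javaOpts below are assumptions, not the actual JFrog chart values), the Helm values file keeps a static structure while the actual values are managed in Git and checked out at deploy time:

```yaml
# Illustrative sketch of externalized configuration in a Helm values file.
# Key names are assumptions; real values would be checked out from Git at deploy time.
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "4Gi"
javaOpts:                  # a list the chart template can loop over, e.g.
  - "-Xms512m"             #   {{- range .Values.javaOpts }} ... {{- end }}
  - "-Xmx2g"
  - "-Dfeature.cache=true"
```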

It gives us flexibility and fluidity in process automation and delivery; it gives us continuous updates without releasing a new version because of configuration changes, and actually it gives us confidence in the process. Yeah, we can just change this specific configuration and make sure that everything will run exactly like it used to before, and it provides us the ability to trust our code, to trust our product, actually.

Because actually we change the configuration and not the business layer of the product deployment. Yes, exactly. We are just changing the Java options; we don’t change anything in our application. Right.

We also would like to have the ability to enable a specific feature, right? Like we said before, we want to add a specific feature for a specific region, or change the default for a specific customer, or just allow a new feature for everyone, right? But sometimes you create a specific new feature that will only be enabled at some point, I don’t know when. So in order to enable it automatically, we need to add some code changes that run silently until the feature is enabled; it will require a redeployment, which is provided by the initial startup of our “barzel” code. So actually, that redeploy will always be the source of truth in all cases: broken configuration, missing objects, other miscommunication, a broken connection between the application services, application version updates or deployment change delivery, or even a restart of the application running for some customer, and much more.
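A minimal sketch of such a feature flag (the key name and the Helm override command are illustrative assumptions): the flag ships disabled in the values file, the code runs silently, and a redeploy with an override turns it on per region or customer:

```yaml
# Illustrative sketch of a feature flag in a Helm values file; key names are assumptions.
featureFlags:
  newSearchIndex: false    # default for everyone, code paths stay dormant

# Example override applied for a specific region or customer at redeploy time:
#   helm upgrade my-release ./chart --reuse-values --set featureFlags.newSearchIndex=true
```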

So we would like to make sure that our code is reliable, that we can trust it and that we can just redeploy again, like you said. Feel free to ask any questions; we hope you enjoyed the session. Yeah, thank you for listening to us. Thank you very much for being here today, it was a pleasure to meet all of you. And have a great day.

 Bye bye. -Bye bye guys.

 Have a great day, bye bye.

 
