DevSecOps at Scale Using Amazon Elastic Kubernetes Service (EKS) and the JFrog Platform
In this talk, we will simulate a real world scenario by building a shared services platform using the JFrog platform on Amazon Elastic Kubernetes Service (EKS) using AWS Cloud Development Kit (CDK) to deploy a containerized workload into test and production AWS accounts, using deployment pipelines.
Along the way, we will cover topics such as static code analysis, image scanning and deploying a Helm chart from a management EKS cluster to clusters in Test and Production AWS accounts, and implement DevSecOps best practices using the JFrog Platform.
Video transcript
Hello and welcome to the session to learn about implementing DevSecOps practices at scale using Amazon EKS and JFrog platform. This session is pre-recorded. However, when it is streamed live, I’m available to chat with you, so feel free to ping me at any time during the session when it is getting played.
A quick introduction about me: I’m a specialist Solutions Architect at Amazon Web Services and have over 15 years of hands-on experience in application and infrastructure development. I’m really passionate about building large, scalable solutions for developers and enterprise customers.
I have won multiple awards in roles ranging from software developer, architect, and engineering manager to product manager, in a variety of industries ranging from financial services to retail. When not building solutions for my customers and partners, I love spending time running behind my three-year-old daughter, indoors and outdoors, whenever Seattle weather permits.
Let’s have a quick look at the topics that we will cover today. We will first talk about what a shared services platform is and why you should build one within your organization. We will then go through, in detail, the architectural topology of a shared services platform on AWS, and then see how to create a shared services platform on Amazon EKS using the AWS Cloud Development Kit, commonly referred to as CDK. We will then shift gears to understand in depth how to implement DevSecOps practices for the JFrog Platform on AWS and EKS, and then go through how to use the platform that we created to deliver code to an application EKS cluster running in a separate account, implementing DevSecOps practices all the way using Xray.
We will also go through the different components of implementing DevSecOps practices on JFrog platform and [inaudible].
Towards the end, we will wrap up with a quick demo, next steps and then open up for Q&A.
We have a lot to cover in the next 25 minutes or so, so let’s get started.
To level set what we will talk about in the session today and make good use of it, let’s first understand what a shared services platform, commonly referred to as an SSP, is, and why an organization should invest in building one.
A shared services platform is an organization’s internal platform that is used by multiple teams. Because the platform is leveraged by multiple teams, it should be secure by default.
A shared services platform allows the separation of concerns in software delivery, very similar to the AWS shared responsibility model, which relieves customers of operational burden: AWS operates, manages, and controls the components from the host operating system and virtualization layer down, and the customer assumes responsibility for and management of the guest operating system.
The shared services model lets a team within the organization manage a portion of the shared services, be it the tooling for continuous integration and deployment or identity and access management. This allows application teams to build software that delivers business value to their customers, leveraging the services from shared services.
A typical shared services architecture is provisioned using AWS Control Tower and AWS Organizations, allowing different AWS accounts to be provisioned at scale for each purpose.
In an organization there are typically at least four separate AWS accounts; there could be more depending on the size, the nature, and the use cases of the organization.
The first category of AWS account is used, as an example, for hosting a CI/CD platform or tooling leveraged by multiple application teams. In this talk, as we will see later in the session, we provision the JFrog Platform in a totally separate AWS account. The second category is generally used for identity and access management or for network management. The third is a master payer account, which is used for billing purposes. And the fourth is typically used for ensuring the desired state of your security.
This is an account where all the security logs from multiple sources and multiple accounts, like AWS CloudTrail, Amazon CloudWatch, and Amazon GuardDuty, are aggregated to perform analysis, forensics, and remediation.
Managing and optimizing the software lifecycle is often a disjointed effort, with developers and IT operations teams working in silos. This lack of coordination can introduce inconsistencies, errors, and vulnerabilities. Continuous integration and continuous deployment help to avoid these challenges.
According to a recent CNCF survey report, there has been an increase in enterprises using containers for CI/CD as companies realize the need to shift to a cloud-like development and deployment model.
A shared services platform on containers helps companies achieve the benefits of agility, portability, security, and speed with the practicality of CI/CD methodologies. A well-architected platform provides automated integration of workflows and solutions, as well as the functionality for developers to quickly scale on demand. In addition, from day one, it should be architecturally designed for immutability to limit the potential for cyber attacks.
Shifting gears a little bit, let’s now dive deep into building a shared services CI/CD platform using Artifactory and Xray. But let’s first understand the various components and the ecosystem, as there are different moving pieces all the way around. I promise to make it as easy as possible to understand.
The shared services platform is typically provisioned in a separate AWS account, for the reasons that we mentioned before.
In order to be highly available, the VPC spans at least two different private subnets. There are no public-facing subnets involved, as the load balancer through which the platform is accessed does not need to be accessed publicly. So there is no point in creating a public subnet, and everything is accessed within the organization itself.
We then create the Amazon EKS cluster spanning these subnets, and it is probably best to use managed nodes to join the EKS control plane. A quick introduction to what managed nodes are.
With EKS managed node groups, you don’t need to separately provision or register the EC2 instances that provide compute capacity to run your Kubernetes applications, which in our case is the entire JFrog Platform. You can create, update, or terminate nodes for your cluster with a single operation.
Nodes run using the latest EKS-optimized AMIs in your AWS account, while node updates and terminations gracefully drain nodes to ensure that your application stays up and available. All managed nodes are provisioned as part of an Auto Scaling group that is managed for you by EKS.
All resources, including the instances and Auto Scaling groups, run within your AWS account. Each node group uses the EKS-optimized AMI and can run across multiple Availability Zones that you define. You can also provision the platform on Bottlerocket. Now a quick segue, as we are touching on the topic of the operating system.
Let’s talk more about what the Bottlerocket operating system is. Bottlerocket is a Linux-based operating system that is purpose-built by AWS for running containers on virtual machines or bare metal hosts. Most customers today run containerized applications on general-purpose operating systems that are updated package by package, which makes operating system updates difficult to automate. Updates to Bottlerocket, however, are applied in a single step rather than on a package-by-package basis.
This single-step update process helps reduce management overhead by making OS updates easy to automate using a container orchestration service such as EKS itself. The single-step updates also improve uptime for container applications by minimizing update failures and enabling easy update rollbacks. Additionally, and most importantly in my opinion, Bottlerocket includes only the essential software to run containers, which further improves resource utilization and greatly reduces the attack surface.
We also use an Amazon RDS MySQL database as the database tier to which the JFrog Platform connects.
When you provision a Multi-AZ database instance, Amazon RDS automatically creates a primary database instance and synchronously replicates the data to a standby instance in a different Availability Zone. In case of an infrastructure failure, Amazon RDS performs an automatic failover to the standby instance. Since the endpoint for your database remains the same after the failover, your application can resume database operations without the need for manual administrative intervention.
We also configure RDS to store the admin credentials in AWS Secrets Manager so that users and applications retrieve the secrets with a call to the Secrets Manager APIs, essentially eliminating the need to hard-code sensitive information in plain text. Secrets Manager offers secret rotation with built-in integration for Amazon RDS. We use Amazon S3 for the persistence layer, and Artifactory uses a cache file system as a read buffer to cache frequent requests, thereby significantly reducing the number of hits going to S3 and drastically improving performance.
Artifactory also uses eventual cache as a write buffer, which means uploads are asynchronous and early and the cache is local on the disk.
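To illustrate the read-buffer idea, here is a minimal sketch of a read cache sitting in front of a slow object store. It assumes least-recently-used eviction and a synchronous fetch function. The ReadCache class and every name in it are hypothetical and are not Artifactory's actual implementation.

```typescript
// Minimal LRU read cache in front of a slow backing store (a stand-in for S3).
// All names are illustrative; this is not how Artifactory is implemented.
type Fetch = (key: string) => string;

class ReadCache {
  private entries = new Map<string, string>(); // Map preserves insertion order
  private capacity: number;
  private fetchFromStore: Fetch;
  public backendHits = 0; // how many reads had to go to the backing store

  constructor(capacity: number, fetchFromStore: Fetch) {
    this.capacity = capacity;
    this.fetchFromStore = fetchFromStore;
  }

  get(key: string): string {
    const cached = this.entries.get(key);
    if (cached !== undefined) {
      // Re-insert so the entry becomes the most recently used one.
      this.entries.delete(key);
      this.entries.set(key, cached);
      return cached;
    }
    this.backendHits += 1;
    const value = this.fetchFromStore(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return value;
  }
}

const artifactCache = new ReadCache(2, (key) => `blob:${key}`);
artifactCache.get("app-1.0.jar");
artifactCache.get("app-1.0.jar"); // served locally, no second trip to the store
console.log(artifactCache.backendHits);
```

Repeated reads of the same artifact only reach the backing store once, which is the effect described above for frequent requests.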
In the end, we front the pods that power Artifactory HA and Xray with an Elastic Load Balancer. As security is of prime importance and a day-zero responsibility, we also use AWS Certificate Manager and link it with the Elastic Load Balancing service to seamlessly manage and rotate the certificates. Now, as we just saw, there are multiple moving pieces in this setup, and provisioning this entire end-to-end setup can be challenging.
As we need to install the JFrog Platform in a Kubernetes cluster, traditional approaches require context switching between at least two different APIs: one for provisioning and managing AWS resources, and the Kubernetes API to deploy the workloads in the cluster.
The workloads in this case are the different moving pieces of the JFrog Platform, including the Artifactory pods and Xray pods. Developers and administrators need to be aware of at least two different CLIs, syntaxes, and technologies in order to build the platform in its entirety.
Not to mention, AWS and JFrog always strive to make the installation of the platform as seamless as possible, as this is an end-to-end infrastructure. This specific challenge is largely solved by using the AWS Cloud Development Kit, as I said, commonly referred to as AWS CDK. Before we go further, a quick introduction to AWS CDK. CDK is an open source software development framework to define your cloud application resources using familiar programming languages such as Go, Java, TypeScript, and Python.
Provisioning cloud applications can be a challenging process, as we saw earlier, as it requires you to perform manual actions, write custom scripts, maintain templates, or even learn domain-specific languages. AWS CDK uses the familiarity and expressive power of programming languages for modeling your applications.
It also provides you with high-level components called constructs that preconfigure cloud resources with proven defaults, so that you can focus on building cloud applications without needing to be an expert.
AWS CDK provisions your resources in a safe, repeatable manner through AWS CloudFormation, using the same CloudFormation engine to provision the resources. It also enables you to compose and share your own custom constructs that incorporate your organization’s requirements, helping you to start new projects faster.
Behind the scenes, the JFrog CDK construct also uses the CDK library to install the platform, using Helm as the package manager for Kubernetes, with smart defaults and optional overrides. With that, I’m happy to share the developer preview of the new JFrog CDK L3 construct, which is an end-to-end abstraction of the entire setup that we just saw, and which we developed in partnership with JFrog. This CDK construct will empower you to quickly install the entire JFrog Platform on Amazon EKS with the recommended architecture that we just saw.
We strongly encourage you to try out this L3 construct and let us know any feedback so that we can further iterate on it for improvement.
You can download it, effective immediately, from the npm and PyPI repositories. The constructs are available under the package name jfrog-cdk-constructs.
Now let’s talk about DevSecOps. But before we go further, let’s first define what exactly is DevSecOps. DevSecOps is the combination of cultural philosophies, practices and tools that exploits the advances made in IT automation to achieve a state of production immutability, frequent delivery of business value, and automated enforcement of security policy.
DevSecOps is achieved by integrating and automating the enforcement of preventive, detective, and responsive security controls into a pipeline.
Working backwards from achieving automation, there are three different components of DevSecOps that we target: security of the pipeline, security in the pipeline, and finally, enforcement of the pipeline.
Let’s dive deep into each of these. First up, security of the pipeline. The first step in achieving security of the pipeline is essentially treating the pipeline and the related infrastructure as a workload, with the same regard as any other piece of application code. We provision the entire JFrog Platform with AWS CDK, and any deviation to it must be code reviewed as well. Moreover, JFrog Pipelines, or even for that matter AWS pipelines, can be saved as YAML DSLs, and it is strongly recommended that each team manages its pipeline as code. Next up, have a monthly review of the permissions attached to the JFrog pipelines and rotate the keys on a regular basis. Grant access to AWS resources based on the principle of least privilege.
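As one way to make the periodic permission review concrete, here is a small sketch that flags wildcard grants in an IAM-style policy document. The PolicyDocument type models only a subset of the IAM JSON policy grammar, and the findWildcards helper, the sample policy, and the bucket name are all assumptions made for illustration.

```typescript
// A subset of the IAM JSON policy grammar (real policies also allow
// single-string Action/Resource values, conditions, principals, and more).
interface Statement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: Statement[];
}

// Hypothetical helper: describes every Allow statement that uses a wildcard.
function findWildcards(policy: PolicyDocument): string[] {
  const problems: string[] = [];
  policy.Statement.forEach((stmt, index) => {
    if (stmt.Effect !== "Allow") return;
    if (stmt.Action.some((a) => a === "*" || a.endsWith(":*"))) {
      problems.push(`statement ${index}: wildcard action`);
    }
    if (stmt.Resource.some((r) => r === "*")) {
      problems.push(`statement ${index}: wildcard resource`);
    }
  });
  return problems;
}

// Hypothetical policy attached to a pipeline role; the bucket is a placeholder.
const pipelinePolicy: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: ["s3:GetObject"],
      Resource: ["arn:aws:s3:::example-artifacts/*"],
    },
    { Effect: "Allow", Action: ["eks:*"], Resource: ["*"] },
  ],
};

console.log(findWildcards(pipelinePolicy));
```

A check like this could run as part of the monthly review, so that overly broad grants are surfaced rather than discovered after an incident.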
Use role-based access control to identify who can access the pipelines, because that becomes a key element in delivering the software products. Configure access and system logging and metrics in analytics services like Amazon Elasticsearch Service, Amazon CloudWatch, Amazon Managed Grafana, or Prometheus using the Fluentd plugin. And lastly, practice the operations required to patch and update not only the pipelines, but also the entire platform.
Next up, let’s talk about security in the pipeline and how to achieve it. Security in the pipeline needs to be considered throughout the lifecycle of the application, be it in the form of containers or other binaries.
It starts with static code analysis at rest, in version control software like AWS CodeCommit. Now, how would you actually go about doing the code analysis when your code is stored at rest in a service like AWS CodeCommit? The answer is to use a service like Amazon CodeGuru, a developer tool that not only provides intelligent recommendations to improve code quality and identify an application’s most expensive lines of code, but also uses machine learning and automated reasoning to identify critical issues, security vulnerabilities, and hard-to-find bugs during application development. Next up, during the build process, let’s consider a container application build.
A typical lifecycle of a containerized application involves doing an image build. When the image is pushed, trigger image vulnerability scanning using Xray and fail the build if a vulnerability is found. The build is then marked as failed and the pipeline execution stops, so the pipeline cannot deploy further to a target environment.
Lastly, let’s talk about enforcement of the pipeline. First and foremost, have a multi-account strategy to separate the test and prod workloads from the shared services platform.
The JFrog Platform can deploy to these test and prod workloads by assuming the necessary roles in the respective accounts.
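The multi-account layout can be sketched as a mapping from environment to the IAM role the pipeline would assume in that account. The account IDs and the role name below are placeholders, not values from the talk; only the ARN format itself is standard.

```typescript
// Hypothetical deployment targets; account IDs are placeholders.
interface Target {
  environment: "test" | "prod";
  accountId: string;
}

// Assumed name of the role the pipeline is allowed to assume in each account.
const DEPLOY_ROLE_NAME = "pipeline-deployer";

// Builds the ARN of the cross-account role for a target account.
function deployRoleArn(target: Target): string {
  // AWS account IDs are 12-digit numbers.
  if (!/^\d{12}$/.test(target.accountId)) {
    throw new Error(`invalid AWS account id: ${target.accountId}`);
  }
  return `arn:aws:iam::${target.accountId}:role/${DEPLOY_ROLE_NAME}`;
}

const targets: Target[] = [
  { environment: "test", accountId: "111111111111" },
  { environment: "prod", accountId: "222222222222" },
];

for (const target of targets) {
  console.log(`${target.environment}: ${deployRoleArn(target)}`);
}
```

Keeping one narrowly scoped role per account means the shared services platform never needs long-lived credentials for the test or prod accounts.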
Secondly, as a principle, enforce that pipelines do the deployments and ensure there are no humans involved in the deployment process, other than clicking a button in your pipeline, and only if required. It will be hard to enforce this to begin with, but start by identifying why a human would need access to the production account and the workload inside it, and then use tooling to address that specific requirement.
A common requirement is “I need to view the production logs.”
You can eliminate this by identifying what logs are required and where they are sourced from, and using tools like Fluentd, the CloudWatch agent, or the AWS Distro for OpenTelemetry to output the telemetry data to an external aggregator like CloudWatch.
Alright, let’s do a quick demo. We start the sample application by creating a CDK application and declaring the L3 construct that we created, jfrog-cdk-constructs, as a dependency.
Once we declare the dependency, you just need to create a new instance of the class and fill in the properties for the cluster, the nodes, and RDS, as we’ll see in this case, specifying properties such as the Kubernetes version to be used and the endpoint access, that is, the Kubernetes cluster-specific endpoints.
Next is the kind of nodes, in this case I’m using Bottlerocket OS nodes, and the specifics of the RDS database: I want PostgreSQL version 12.5, plus the database name and the username that I should use. Once I do that, the next step is simply to compile the TypeScript to JavaScript. Once that is done, use the cdk deploy command to deploy to the specific AWS account. You’ll notice that CDK will tell you the list of changes that it is trying to deploy, and it will start creating a new stack in the AWS account that you have specified.
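The properties walked through in the demo, combined with the "smart defaults and optional overrides" idea, could be modeled roughly as follows. This is only a sketch: the PlatformProps shape, the resolveProps helper, and the default values for the Kubernetes version, database name, and username are assumptions, and none of it is the real API of the JFrog construct.

```typescript
// Hypothetical property shape; NOT the real construct's API.
interface DatabaseProps {
  engineVersion: string;
  name: string;
  username: string;
}

interface PlatformProps {
  kubernetesVersion: string;
  endpointAccess: "private" | "public";
  nodeOs: "bottlerocket" | "amazon-linux";
  database: DatabaseProps;
}

interface PlatformOverrides {
  kubernetesVersion?: string;
  endpointAccess?: "private" | "public";
  nodeOs?: "bottlerocket" | "amazon-linux";
  database?: Partial<DatabaseProps>;
}

// Defaults: Bottlerocket nodes and PostgreSQL 12.5 come from the demo;
// the remaining values are placeholders chosen for this sketch.
const DEFAULTS: PlatformProps = {
  kubernetesVersion: "1.19",
  endpointAccess: "private",
  nodeOs: "bottlerocket",
  database: { engineVersion: "12.5", name: "artifactory", username: "artifactory" },
};

// Merges caller overrides on top of the defaults, including nested database fields.
function resolveProps(overrides: PlatformOverrides = {}): PlatformProps {
  return {
    ...DEFAULTS,
    ...overrides,
    database: { ...DEFAULTS.database, ...overrides.database },
  };
}

console.log(resolveProps({ database: { name: "jfrog" } }));
```

A caller only states what differs from the recommended architecture, and everything else falls back to the defaults.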
If you want to see how it looks on the AWS console, simply log in to the console, and you can see the CloudFormation stack. In a few minutes, a new stack will come up, which will have the entire setup deployed end to end, including the EKS cluster, the RDS database, secrets, load balancers, and pretty much everything else.
Alright, so you’ll see that it’s getting deployed. I’ll take a quick pause and resume once the deployment is complete.
All right, so once the stack deployment is over, you can see the entire output in the console itself, something like this. If you want to log in to the Kubernetes cluster, you can run the update-kubeconfig command in order to connect to the cluster that was created. At the same time, if you look at the CloudFormation console, it will look something like this: the stack is in CREATE_COMPLETE status. So once the stack is created and the platform is deployed onto the EKS cluster, you can open the URL that you have created, and then create the pipelines.
Something like this. The entire pipeline DSL can be saved as code in your version control so that you can work on it collaboratively, and as a good DevSecOps practice, it’s probably best to save the pipelines as code as well.
The steps in the pipeline code look something like this for a containerized application, as an example: you do the Docker build, then the Docker push to the repository, which is Artifactory itself. This triggers the Xray scan, and once the Xray scan is complete, the image is promoted to the repository that is then used to deploy to the EKS cluster in test and then in prod. So, in a nutshell, this is what the entire application lifecycle using DevSecOps looks like. As you see, the Docker promotion step here will not proceed unless and until the Xray scan, which does the vulnerability scanning, is complete, because the policies for Xray are configured that way.
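The ordering and gating just described can be sketched as a tiny sequential runner in which a failed scan prevents promotion and deployment. This is an illustration only, not JFrog Pipelines syntax: the step names, the severity levels, and the scanGate policy are all assumptions.

```typescript
// Result of a single pipeline step.
type StepResult = { ok: true } | { ok: false; reason: string };

interface Step {
  name: string;
  run: () => StepResult;
}

interface PipelineOutcome {
  completed: string[];
  failedAt?: string;
  reason?: string;
}

// Runs steps in order and stops at the first failure.
function runPipeline(steps: Step[]): PipelineOutcome {
  const completed: string[] = [];
  for (const step of steps) {
    const result = step.run();
    if (!result.ok) {
      return { completed, failedAt: step.name, reason: result.reason };
    }
    completed.push(step.name);
  }
  return { completed };
}

// Hypothetical scan policy: fail when any finding has a blocking severity.
type Severity = "low" | "medium" | "high" | "critical";

function scanGate(findings: Severity[], blockOn: Severity[]): StepResult {
  const blocking = findings.filter((s) => blockOn.indexOf(s) !== -1);
  return blocking.length === 0
    ? { ok: true }
    : { ok: false, reason: `${blocking.length} blocking finding(s)` };
}

const succeed = (): StepResult => ({ ok: true });

const pipelineSteps: Step[] = [
  { name: "docker-build", run: succeed },
  { name: "docker-push", run: succeed },
  { name: "xray-scan", run: () => scanGate(["low", "critical"], ["high", "critical"]) },
  { name: "promote", run: succeed },
  { name: "deploy-test", run: succeed },
  { name: "deploy-prod", run: succeed },
];

console.log(runPipeline(pipelineSteps));
```

Because the runner stops at the scan step, the promote and deploy steps are never reached when a blocking vulnerability is present.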
All right, so switching back, that was all I had for the demo. As a quick next step, give the AWS CDK construct for JFrog a try.
It is available as the jfrog-cdk-constructs package in the npm and PyPI repositories. We have also created a hands-on lab at jfrog.awsworkshop.io.
Give the workshop a try so that you can try out the end-to-end experience on your own, and reach out to us if you have any questions.
We’re happy to hear the feedback at any point of time. With that, thank you so much for listening to this session.
I hope you find it useful. And again, I’m available here to answer any questions that you may have around the sessions or anything in general.
Happy SwampUP, and I’m looking forward to your feedback.
Thank you.