Cloud Engineering – The Final Frontier in Application Delivery

Paul Stack
Staff Software Engineer

DevOps has changed how teams work together to ensure value is delivered faster.

As teams become more mature, infrastructure and security practices are embedded into the delivery of these systems.

In this talk, Paul will talk about the Cloud Engineering team and how application delivery is driven by infrastructure as software. The practice of infrastructure as software allows infrastructure testing and compliance to be driven through CI/CD pipelines, so that we can deliver customer value not only faster but also more reliably.

Video transcript

Hi everyone, my name is Paul Stack. I’m here to talk to you about why I believe cloud engineering is the final frontier in application delivery. I’m a staff software engineer at Pulumi and I’m delighted to be speaking at JFrog’s conference this year. We’re in the middle of the evolution of the cloud right now. Not only are we finding people moving to the cloud, but we’re also finding that different people are moving to different variants of the cloud, and at different speeds.

Unfortunately, the COVID events of the past year have accelerated the movement to the cloud. So we’re starting to see a lot more activity, and a lot more need for tooling and for understanding how things work in the cloud. Here’s a quick guide to the different variants of how people are actually moving. At Pulumi, we’ve identified these as V1, V2 and V3.

V1 is mostly N-tier architectures that are picked up from a physical data center and moved to the cloud. Think of it as probably the company’s first foray into the cloud: they’re not really sure how to get started, or they’re taking a risk in order to get there. It’s very much private based. They have a couple of application servers and a couple of web servers that they’re probably managing with a configuration management tool, or even by hand. And this is very much an experiment within the company itself.

Then you’ve got people in what we classify as the V2 evolution, where people are starting to be a wee bit more experimental. They could potentially be taking a hybrid approach between virtual machines, or compute, and containers. And they’re starting to mix in SaaS or platform-as-a-service products. Maybe they’re even starting to talk about tools like Datadog, or JFrog, or New Relic. It’s a little bit more dynamic, and it’s more experimental in the sense that different teams can take a different approach, and it gives people an understanding of what they can potentially do in the cloud.

And then we have people who are very much in what we classify as the modern cloud transition. You’ve probably heard terms like cloud native, SOA and microservices, all the different terminology that people use here. It’s a very dynamic infrastructure with a lot of hyper-connected services. It’s most likely using a container orchestration system, possibly Kubernetes or HashiCorp Nomad. It could be taking advantage of functions such as Lambda, and of analytics and machine learning. And it’s starting to bring multi-cloud into the mix.
These types of teams are definitely not going to be able to manage things by hand, and they need to understand where the different pieces come together and how they actually talk to each other.

Now, of course, that means there are different ways of managing the cloud. First, there’s the web console. Clicking around in a portal makes it easy to provision a few services, but it’s really hard to scale and replicate.

We’re humans. We make a lot of mistakes and we cut corners, and because of that we’re not designed for these types of jobs. So teams have realized that the console is probably not the most effective tool for them, and they’ve started taking advantage of CLI tooling, scripts and templates. They can start to improve the automation of their workflows, but that still brings in some reusability problems, and updates are hard. Here we’re looking at tools like CloudFormation, Azure Resource Manager and Google Cloud Deployment Manager, which were really prominent in driving people towards infrastructure as code. And then we have people who are a little more mature again, who have started to realize that building their infrastructure in the cloud is very much like writing and building the software that runs in it. So they’ve adopted code, and by code I mean that they’re starting to take advantage of the best practices of the software ecosystem in order to build their infrastructure.

Now, DevOps has transformed everything we do in this part of the ecosystem.

It has transformed communication patterns, it has transformed work patterns, and I believe that it is starting to transform us a little further still. It used to be that the operations teams were the gatekeepers to everything that went into production. They were the people who thought about security, performance and the operability of the system. As we’ve seen in other talks, developers would create packages and throw them across the wall of confusion to the operations team, and that created contention. DevOps broke down those barriers and made this conversation happen between all of these different teams. Then we had DevSecOps and a number of other terms for it, but ultimately at Pulumi we’re trying to create an umbrella term for all of these different teams working together, and we’re calling it cloud engineering. What we’re starting to see is that the operations or SRE teams, and the infrastructure or platform teams, whatever you call them within your organization, are actually the enablers of application delivery. Not only do they give the application development teams the guardrails, the policies and the infrastructure to deploy their systems into, but they also work very closely with the security teams, communicating between the application teams and the security teams. That creates a real understanding that they enable the delivery of systems, and it has started to become more front of mind as we strive to move faster and deliver better systems for our customers.

Now, before we go any further, I think it’s very important to get a little terminology across to make sure that everyone understands what I’m talking about. I mentioned that people can manage their infrastructure via code. So what is infrastructure as code? It’s a way of eliminating manual, error-prone changes.

It’s a way of bringing best practices into your infrastructure management. It’s also a way of gaining visibility of your changes, not only through code reviews but also through previews. You can understand what the apply, or the deployment, is going to look like, because the tool will tell you what it’s going to do before it does it.
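At its core, a preview compares the state declared in code with the state recorded from the last deployment and reports the difference before anything is changed. The sketch below shows that idea in TypeScript. It is a simplified, hypothetical model (resources as flat string properties, an in-memory record of current state, invented resource names), not how Pulumi actually computes its previews.

```typescript
// Minimal model of a resource's configuration: property name -> value.
type Props = Record<string, string>;

type Change =
  | { op: "create"; name: string }
  | { op: "update"; name: string; changed: string[] }
  | { op: "delete"; name: string };

// Compare the desired state (declared in code) with the last recorded state
// and report what an apply would do, without calling any cloud API.
function preview(desired: Map<string, Props>, current: Map<string, Props>): Change[] {
  const changes: Change[] = [];
  desired.forEach((props, name) => {
    const existing = current.get(name);
    if (existing === undefined) {
      changes.push({ op: "create", name });
      return;
    }
    // Union of property names on both sides, so removed properties count too.
    const keys = Object.keys(props).concat(
      Object.keys(existing).filter((k) => !(k in props)),
    );
    const changed = keys.filter((k) => props[k] !== existing[k]).sort();
    if (changed.length > 0) changes.push({ op: "update", name, changed });
  });
  // Anything recorded but no longer declared would be deleted.
  current.forEach((_props, name) => {
    if (!desired.has(name)) changes.push({ op: "delete", name });
  });
  return changes;
}

// Hypothetical example: one new resource, one changed, one removed from code.
const desired = new Map<string, Props>([
  ["web-sg", { ingress: "22,80" }],
  ["web", { instanceType: "t3.micro", ami: "ami-0123456789abcdef0" }],
]);
const current = new Map<string, Props>([
  ["web", { instanceType: "t2.micro", ami: "ami-0123456789abcdef0" }],
  ["old-bucket", { acl: "private" }],
]);
const plan = preview(desired, current);
```

Here plan lists a create for web-sg, an update to web's instanceType, and a delete for old-bucket. Nothing touches the cloud until the plan is accepted.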

Now, why should we care about infrastructure as code? Or, if you’re not doing it today, why should you start? One, it’s automated and repeatable. It’s not clicking around in a portal, and it’s not trying to second-guess or skipping steps to make things faster. It’s also faster at getting your applications deployed, because things can be done in parallel.

 The tooling out there actually understands the order in which operations need to happen.

It can make the changes that are independent of each other side by side. And lastly, and most importantly, as I said before, you get the preview. That gives you much safer, much more predictable changes than clicking around in the console, or just running bash scripts or CI tools. At Pulumi, we’re driving this a little further again. Not only are we infrastructure as code, we’re modern infrastructure as code. We’re taking the infrastructure as code ecosystem and driving it with the adoption of real programming languages. That means you can create, share and reuse abstractions to hide complexity from those who don’t need to know about it. You can use and integrate with your favorite tooling, your IDEs and your testing tools, but still stay within the guardrails of your CLI, your Dev and Ops familiarity, your integration with your CI/CD workflows and, importantly, the audit trail of all the changes that are going on. So we’re starting to change how the creation of your infrastructure happens, not just the deployment itself.
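The point that the tooling knows the order of operations, and can therefore run independent changes in parallel, can be sketched as a topological sort that groups resources into waves. This is a hypothetical TypeScript illustration with made-up resource names. It shows the general technique, not Pulumi’s engine.

```typescript
// Dependencies between resources: each key lists the resources it needs first.
type DependencyGraph = Record<string, string[]>;

// Group resources into "waves". Everything in a wave depends only on resources
// in earlier waves, so the members of one wave can be created in parallel.
function planWaves(graph: DependencyGraph): string[][] {
  for (const name of Object.keys(graph)) {
    for (const dep of graph[name]) {
      if (!Object.prototype.hasOwnProperty.call(graph, dep)) {
        throw new Error(`${name} depends on unknown resource ${dep}`);
      }
    }
  }
  const done = new Set<string>();
  let remaining = Object.keys(graph);
  const waves: string[][] = [];
  while (remaining.length > 0) {
    // Ready = every dependency was created in an earlier wave.
    const ready = remaining.filter((name) => graph[name].every((dep) => done.has(dep))).sort();
    if (ready.length === 0) {
      throw new Error(`dependency cycle among: ${remaining.join(", ")}`);
    }
    ready.forEach((name) => done.add(name));
    remaining = remaining.filter((name) => !done.has(name));
    waves.push(ready);
  }
  return waves;
}

// Hypothetical example: the VPC and the bucket have no dependencies.
const waves = planWaves({
  vpc: [],
  bucket: [],
  securityGroup: ["vpc"],
  instance: ["securityGroup"],
  dnsRecord: ["instance"],
});
```

With this graph the first wave is bucket and vpc together, followed by securityGroup, instance and dnsRecord one at a time. A cycle is reported as an error rather than silently ordered.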

Now, as we bring software development practices to infrastructure management, we remove these manual changes and enforce best practices around abstractions and reusable functions. We can then start to introduce things like semantic versioning, code reviews and even testing. Maybe we’re at the point where infrastructure is not just code but is actually software. In the most mature teams there is an understanding that without good infrastructure your applications cannot run, and without applications your infrastructure is just costing you money in the cloud. So there is a real, direct relationship between the two concepts, and bringing them together becomes very important. Pulumi is a CLI-based tool, and it works with your cloud, in your language and with your workflow. Today, in 2021, we support over 55 providers.

We have Amazon, Azure, Google Cloud and Kubernetes as our main providers. We also support tools like Fastly, Equinix Metal, DigitalOcean, OpenStack, Aiven, Docker and Rancher. We of course support Node.js, TypeScript, Go, Python and any of the .NET Core languages, which means that you can bring in your favorite IDEs, like VS Code or JetBrains IntelliJ. And because it’s a CLI tool, you can embed it in your CI/CD pipelines with tools like CircleCI, GitHub Actions, Spinnaker and so on. So it gives you the flexibility to choose your way of creating your infrastructure, and also where to deploy it to.

I feel that we’re on the journey of infrastructure as code right now, and I think that a lot of people are just at the beginning of that journey, or just starting to accelerate along it. Up until this point in time, we had a declarative syntax. You declare the end state of your resources, and you have a consistent object model for your resources without needing to know the underlying cloud API types or operations. This is a sample in Pulumi, in TypeScript, where you declare your infrastructure in TypeScript, but it is still extremely declarative. First we import the Pulumi AWS package and declare a couple of constants. Then we declare a security group that has two ingress rules, one for port 22 and one for port 80. Then we declare an instance. The instance refers to the size constant, the AMI constant and the ID of the security group declared above. Pulumi will understand how to deploy this application. It knows that it needs to create the security group before it creates the EC2 instance, because there is an implicit relationship between these two pieces of information.
And then lastly, of course, we can export some information.
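The implicit relationship in that example comes from passing one resource’s output (the security group’s ID) as an input to another. The following sketch models that with a hypothetical Ref class standing in for an output. In an actual Pulumi program you would pass group.id to the instance’s vpcSecurityGroupIds argument, and the engine would track the dependency for you. The instance size and AMI ID below are placeholders, not values from the talk, and the talk’s own code is not reproduced here.

```typescript
// Stand-in for a value such as `group.id` that only exists once another
// resource has been created. Passing one as an input creates a dependency.
class Ref {
  readonly resource: string;
  readonly property: string;
  constructor(resource: string, property: string) {
    this.resource = resource;
    this.property = property;
  }
}

type InputValue = string | number | Ref | InputValue[];

interface ResourceDecl {
  name: string;
  inputs: Record<string, InputValue>;
}

// Collect every resource referenced anywhere in a declaration's inputs.
function dependenciesOf(decl: ResourceDecl): string[] {
  const found: string[] = [];
  const visit = (value: InputValue): void => {
    if (value instanceof Ref) {
      if (found.indexOf(value.resource) < 0) found.push(value.resource);
    } else if (Array.isArray(value)) {
      value.forEach((item) => visit(item));
    }
  };
  Object.keys(decl.inputs).forEach((key) => visit(decl.inputs[key]));
  return found;
}

// Depth-first ordering: a resource is emitted only after everything it references.
function creationOrder(decls: ResourceDecl[]): string[] {
  const byName = new Map<string, ResourceDecl>();
  decls.forEach((d) => byName.set(d.name, d));
  const state = new Map<string, "visiting" | "done">();
  const order: string[] = [];
  const visit = (resourceName: string): void => {
    const decl = byName.get(resourceName);
    if (decl === undefined) throw new Error(`unknown resource: ${resourceName}`);
    if (state.get(resourceName) === "done") return;
    if (state.get(resourceName) === "visiting") throw new Error(`dependency cycle at ${resourceName}`);
    state.set(resourceName, "visiting");
    dependenciesOf(decl).forEach((dep) => visit(dep));
    state.set(resourceName, "done");
    order.push(resourceName);
  };
  decls.forEach((d) => visit(d.name));
  return order;
}

// Placeholder values standing in for the example's size and AMI constants.
const size = "t2.micro";
const ami = "ami-0123456789abcdef0";
const decls: ResourceDecl[] = [
  // Declared first, but it references the group's ID, so it is created second.
  { name: "server", inputs: { instanceType: size, ami, vpcSecurityGroupIds: [new Ref("group", "id")] } },
  { name: "group", inputs: { ingressPorts: [22, 80] } },
];
const order = creationOrder(decls);
```

Declaration order does not matter here. The reference alone puts the group ahead of the server, which mirrors why the security group is created before the EC2 instance.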

Then we started to be able to introduce conditionals and loops. With Pulumi this becomes a little easier, because we can use the programming language itself to do it. You can see that if there is a variable holding the public subnet CIDRs, and that variable is not null, we can take that variable, split it, and run a forEach using split and map. Then, for each item in that array, we create a new subnet. So it has become a little more complex, and you’re able to take advantage of more of the functionality of the programming language, which gives you more reusability going forward.

Then you can start to introduce multi-provider workflows, which allow you to mix your Kubernetes, your cloud, your serverless and your Docker containers all together in the same workflow. In this case, we have an S3 bucket that is used to store our NGINX configuration. We then use that bucket as the location to download the NGINX configuration from as part of our Kubernetes deployment. Pulumi will understand that the S3 bucket has to exist before it can be used in the Kubernetes deployment. Notice also that the Kubernetes Deployment object follows the Kubernetes API spec quite closely. We have the spec, selectors, replicas and a template, and the template also has metadata and a spec, and so on. So you can really declare and understand what’s happening, but within the confines of type validation.

And then teams started to understand that because it’s code, they can pull in packages and libraries that exist in the ecosystem today in order to create more sophisticated deployments and applications.
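The conditional-and-loop pattern for public subnets described above can be sketched in plain TypeScript. The function name, the comma-separated input format and the subnet naming scheme are assumptions for illustration. The sketch only builds subnet specifications rather than creating resources.

```typescript
interface SubnetSpec {
  name: string;
  cidrBlock: string;
}

// Loose IPv4 CIDR shape check (e.g. "10.0.1.0/24"); octet ranges are not validated.
const CIDR = /^\d{1,3}(\.\d{1,3}){3}\/\d{1,2}$/;

// If a comma-separated list of public subnet CIDRs is configured, split it and
// map each entry to one subnet. In a real Pulumi program each spec would be
// passed to a subnet resource (for example aws.ec2.Subnet) instead of returned.
function publicSubnets(publicSubnetCidrs: string | null | undefined): SubnetSpec[] {
  if (publicSubnetCidrs == null) return []; // not configured: create nothing
  return publicSubnetCidrs
    .split(",")
    .map((cidr) => cidr.trim())
    .filter((cidr) => cidr.length > 0)
    .map((cidr, i) => {
      if (!CIDR.test(cidr)) throw new Error(`not a CIDR block: ${cidr}`);
      return { name: `public-${i}`, cidrBlock: cidr };
    });
}

// Hypothetical configuration value with three public subnets.
const subnets = publicSubnets("10.0.1.0/24, 10.0.2.0/24,10.0.3.0/24");
```

Returning plain specifications keeps the branching and looping logic separate from resource creation, so it can be checked without any cloud credentials.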
What would usually happen for a canary release is that you would run your infrastructure as code tool to apply, then go off and check your application metrics to make sure everything is working as expected. If it is, you continue with the rest of the deployment.

With Pulumi, you can say: let’s create the first set of deployments, say three replicas. Then let’s pull in the Prometheus SDK.

Let’s run some queries with the Prometheus SDK, wrapped away in this case in a function called check app metrics. Once we’re happy that everything works as expected for a predetermined amount of time, we continue with the rest of the deployment to the production environment. This lets you encode everything within the same flow and the same deployment, so you have end-to-end visibility of what’s going on. With correct logging, you can then work out where a deployment has gone wrong.
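The canary flow above (deploy a few replicas, check metrics for a set period, then either continue or stop) can be sketched as a small gating function. All of the hooks and thresholds are hypothetical. The talk’s check app metrics function is not shown, so here the metric check is simply an injected function returning an error rate, which also makes the gating logic easy to test without a cluster or Prometheus.

```typescript
// Hypothetical hooks. In the talk's flow, deployCanary and deployRemainder
// would create Kubernetes resources and checkAppMetrics would query Prometheus;
// here they are injected so the gating logic can run without a cluster.
interface CanaryHooks {
  deployCanary(replicas: number): Promise<void>;
  checkAppMetrics(): Promise<number>; // observed error rate, 0..1
  deployRemainder(): Promise<void>;
  rollback(): Promise<void>;
  wait(ms: number): Promise<void>;
}

interface CanaryOptions {
  replicas: number; // size of the first deployment (three in the talk)
  checks: number; // how many consecutive metric checks must pass
  intervalMs: number; // pause between checks
  maxErrorRate: number; // highest acceptable error rate
}

type CanaryResult = "promoted" | "rolled-back";

async function runCanary(hooks: CanaryHooks, opts: CanaryOptions): Promise<CanaryResult> {
  await hooks.deployCanary(opts.replicas);
  // The metrics must stay healthy for the whole predetermined window.
  for (let i = 0; i < opts.checks; i++) {
    const errorRate = await hooks.checkAppMetrics();
    if (errorRate > opts.maxErrorRate) {
      await hooks.rollback();
      return "rolled-back";
    }
    await hooks.wait(opts.intervalMs);
  }
  // Only reached when every check passed.
  await hooks.deployRemainder();
  return "promoted";
}
```

Rolling back on the first bad reading is one possible design. Requiring several bad readings in a row would make the gate less sensitive to noisy metrics, at the cost of reacting more slowly.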

 We’ve been able to take that concept even further.

Not only do we want to test that our applications are deployed correctly, we also want to test that our code is written correctly. It’s not just about testing whether the cloud has created the resources that we want. We want to test the infrastructure code itself. If I wanted to test an AWS EKS deployment, what would happen in the ecosystem today is that I would write the Pulumi code, or whatever code you create your infrastructure in, then deploy it, and then run smoke tests against the deployed environment.

That takes about 20 minutes, which is a slow feedback loop for development, and that’s before you have to fix anything. But we can mock these requests and responses to the cloud.

We know that if you give Amazon, Azure or Google a well-formed request, it will do what it’s supposed to do and return a well-known response. So let’s mock that away and test the internals of our infrastructure code. You can see that testing an AWS instance this way takes 17 milliseconds, rather than the two minutes it would take to spin one up in Amazon itself. And as we start to have testing and advanced workflows with a lot of logic in them, we want to be able to create and share reusable components. This is extremely important as we move towards self-service deployment and self-service infrastructure within your organization.
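The mocking idea can be shown without any cloud account. Put the cloud calls behind an interface, then substitute a mock that returns a well-formed, canned response and records the requests it receives. This is a hypothetical sketch. Pulumi’s own hook for this is pulumi.runtime.setMocks, which is deliberately not used here so the example stays self-contained. The helper, instance sizes, ports and tags are illustrative.

```typescript
interface InstanceRequest {
  instanceType: string;
  ingressPorts: number[];
  tags: Record<string, string>;
}

interface InstanceResponse {
  id: string;
  publicIp: string;
}

// The boundary the infrastructure code talks to; a real implementation
// would call the cloud provider's API.
interface CloudClient {
  createInstance(req: InstanceRequest): InstanceResponse;
}

// Code under test: a hypothetical helper that applies company rules.
function provisionWebServer(client: CloudClient, env: string): InstanceResponse {
  const req: InstanceRequest = {
    instanceType: env === "production" ? "t3.large" : "t3.micro",
    ingressPorts: [80, 443], // no SSH
    tags: { environment: env, managedBy: "infrastructure-code" },
  };
  return client.createInstance(req);
}

// A mock that returns a well-formed canned response and records what was asked for.
class MockCloud implements CloudClient {
  readonly requests: InstanceRequest[] = [];
  createInstance(req: InstanceRequest): InstanceResponse {
    this.requests.push(req);
    return { id: `i-mock${this.requests.length}`, publicIp: "203.0.113.10" };
  }
}

const mock = new MockCloud();
const response = provisionWebServer(mock, "staging");
```

The test then asserts on what the code asked the cloud for (sizes, ports, tags) rather than on real infrastructure, which is what makes this kind of check fast.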

You know, we’ve heard for a long time about the mythical platform: a platform where people can self-serve across multi-cloud without caring about the internals of what’s going on. They just say, give me a Kubernetes cluster. And this is where we’re going today.

Okay, in this example, you will see that we wrap away all of the complexity of creating a Jenkins cluster. You, as an operations team or an operations person, would create all of the logic inside the Jenkins cluster package. You would then package it up and publish it to PyPI, npm or NuGet, or as a Go module, and the teams that need it can just download it and declare it in this way, without needing to understand what happens inside. So as an infrastructure developer, you can encode the best security practices for your infrastructure and your company, and your application developers will use the guardrails that you set.
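A reusable component in this sense is mostly a narrow interface plus fixed defaults. The sketch below is hypothetical (the names, instance sizes and limits are invented) and returns a plain specification. In a Pulumi program the same idea would usually be a class extending pulumi.ComponentResource that creates the underlying resources itself.

```typescript
// What an application team is allowed to choose.
interface JenkinsClusterArgs {
  name: string;
  agentCount?: number;
}

// What the platform team decides on everyone's behalf.
interface JenkinsClusterSpec {
  name: string;
  controller: { instanceType: string; encryptedVolume: boolean };
  agents: { count: number; instanceType: string };
  network: { publicIngress: boolean; allowedPorts: number[] };
}

const MAX_AGENTS = 10; // hypothetical company limit

// Hypothetical reusable component: callers pass a name and optionally an agent
// count; encryption, private networking and instance sizes are fixed guardrails.
function jenkinsCluster(args: JenkinsClusterArgs): JenkinsClusterSpec {
  if (!/^[a-z][a-z0-9-]{0,30}$/.test(args.name)) {
    throw new Error(`invalid cluster name: ${args.name}`);
  }
  const count = args.agentCount ?? 2;
  if (!Number.isInteger(count) || count < 1 || count > MAX_AGENTS) {
    throw new Error(`agentCount must be an integer between 1 and ${MAX_AGENTS}`);
  }
  return {
    name: args.name,
    controller: { instanceType: "m5.large", encryptedVolume: true },
    agents: { count, instanceType: "m5.xlarge" },
    network: { publicIngress: false, allowedPorts: [443] },
  };
}

// All an application team has to write:
const ci = jenkinsCluster({ name: "team-ci" });
```

Keeping the argument list small is the design choice that matters. Anything not exposed in the arguments cannot be misconfigured by consumers of the package.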

I see these steps as a sign of engineering maturity.

As you go through, you start to create all these different things. You don’t have to go through them step by step. You can skip steps, come back to steps, or take them in a different order. But understanding that these steps all exist is a real sign of engineering maturity. One thing that we’re still not very good at today, even as mature engineers, is secret management. So it’s important to tell you that, whether you like it or not, Pulumi creates a secrets provider on your behalf when you start to create infrastructure. It lets you configure secrets that are encrypted not only using passphrases but also through integration with AWS KMS, Azure Key Vault, GCP KMS and HashiCorp Vault. We try to make it so that you have to opt out of secret management, rather than opt into it.
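Two parts of that description can be sketched. The first is choosing a secrets provider, where an unset value still falls back to an encrypting default. The second is keeping secret values masked when printed. To our understanding, Pulumi’s --secrets-provider option accepts settings such as passphrase, or URLs beginning awskms://, azurekeyvault://, gcpkms:// and hashivault://. The parser and the Secret class below are hypothetical illustrations of the behaviour and do not perform any encryption.

```typescript
type ProviderKind = "default" | "passphrase" | "awskms" | "azurekeyvault" | "gcpkms" | "hashivault";

interface SecretsProviderConfig {
  kind: ProviderKind;
  key?: string; // key identifier for KMS- or Vault-backed providers
}

// Opt-out by design: an empty setting falls back to the default (encrypting)
// provider rather than to plain text. Unknown settings are rejected.
function parseSecretsProvider(setting?: string): SecretsProviderConfig {
  if (setting === undefined || setting === "" || setting === "default") return { kind: "default" };
  if (setting === "passphrase") return { kind: "passphrase" };
  const match = /^(awskms|azurekeyvault|gcpkms|hashivault):\/\/(.+)$/.exec(setting);
  if (match === null) throw new Error(`unsupported secrets provider: ${setting}`);
  return { kind: match[1] as ProviderKind, key: match[2] };
}

// Wraps a sensitive value so it is masked when printed or serialized.
class Secret {
  private readonly value: string;
  constructor(value: string) {
    this.value = value;
  }
  reveal(): string {
    return this.value;
  }
  toString(): string {
    return "[secret]";
  }
  toJSON(): string {
    return "[secret]";
  }
}

const dbPassword = new Secret("hunter2"); // placeholder value
const secretsConfig = parseSecretsProvider("awskms://alias/ExampleAlias?region=us-east-1");
```

Masking in toString and toJSON covers the two common ways a value leaks into logs or state files. The plain value is only available through an explicit reveal call.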

I think this opt-out approach to secrets distinguishes us, and it’s why we’re trying to be that mature tool driving forward in this space. For us, Pulumi empowers cloud engineers to deliver quickly by creating reusable infrastructure and integrating with CI/CD, to deploy confidently using infrastructure testing, to operate securely using organization policies and, of course, to scale easily, with what we think is the easiest way to do Kubernetes and serverless these days. So let’s have a look at Pulumi in action.

I’m not going to show you the CLI today. What I’m going to show you is a small sample of code that I have. So if I look at my kubeconfig [sentence was inaudible], you’ll see that I have a kube configuration right here, and if I run kubectl get nodes, you’ll see that I pre-created a Kubernetes cluster just before this talk. And lastly, if I run kubectl get namespaces…

If I can spell it right, I apologize. You’ll see that I just have the basic namespaces and nothing else within the system. I talked about creating reusable and shareable packages, and one of my colleagues, Lee Briggs, has created an example of this: a tool called ploy. This is my favorite thing to show people in this area, because ploy hides away all the complexity behind a simple CLI tool. First, ploy builds the image by interacting with Docker on your local machine. It then creates an ECR repository and uploads that Docker image to it. Then it creates a Kubernetes namespace, a Kubernetes deployment and, lastly, a Kubernetes service.

 Now, of course, as an application developer, I don’t know a lot about Kubernetes but what I do know how to do is run simple tools in order to deploy and test that my code works as expected. So if I run the command ploy up…

Well, first, let me show you what ploy is. ploy is a CLI tool, and you can see that it has up, get, destroy and help commands. If I just say ploy up, ploy will use a Dockerfile that is available on my local machine. Think of it as the application that I’m going to deploy. You’ll see that it first creates the Docker image.

It then creates the ECR repository, the Kubernetes namespace and the Kubernetes service that goes with that namespace, and then it deploys the image that I created so that we can see what happens when it’s deployed. It takes about 100 seconds to make this deployment, which is actually quite fast, and in the meantime I can quickly show you all the logic that is encoded within it. If I go to my Pulumi program, you can see that it creates a new ECR repository and understands how to get the credentials for that repository. It creates a new Docker image with a specific name and uploads that image to the registry it created. Then it creates a new Kubernetes namespace and a new Kubernetes deployment, using the kubeconfig on my local machine, so you don’t need to pass one in. Of course, you can pass it into the program if you want to control where developers deploy to. You can see it’s creating a deployment based on the image I created before, and lastly it puts this behind a load balancer, so that you get a service in front of it.
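For the Kubernetes half of what ploy does, the objects involved are a Namespace, a Deployment and a Service of type LoadBalancer. The sketch below builds them as plain objects following the Kubernetes API shapes. The helper, app name, image URI and port are hypothetical, ploy’s actual source is not reproduced here, and the ECR repository and Docker build steps are left out.

```typescript
interface K8sObject {
  apiVersion: string;
  kind: string;
  metadata: { name: string; namespace?: string; labels?: Record<string, string> };
  spec?: Record<string, unknown>;
}

// Hypothetical helper: the three Kubernetes objects ploy is described as
// creating for one app (namespace, deployment, load-balanced service).
function appManifests(app: string, image: string, replicas = 3, port = 80): K8sObject[] {
  const labels = { app };
  return [
    { apiVersion: "v1", kind: "Namespace", metadata: { name: app } },
    {
      apiVersion: "apps/v1",
      kind: "Deployment",
      metadata: { name: app, namespace: app, labels },
      spec: {
        replicas,
        // The selector must match the pod template's labels.
        selector: { matchLabels: labels },
        template: {
          metadata: { labels },
          spec: { containers: [{ name: app, image, ports: [{ containerPort: port }] }] },
        },
      },
    },
    {
      apiVersion: "v1",
      kind: "Service",
      metadata: { name: app, namespace: app, labels },
      // LoadBalancer asks the cloud provider for an externally reachable address.
      spec: { type: "LoadBalancer", selector: labels, ports: [{ port, targetPort: port }] },
    },
  ];
}

// The image URI is a placeholder for the image pushed to the ECR repository.
const manifests = appManifests("demo-app", "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-app:latest");
```

Deriving all three objects from one app name is what lets a tool like this hide Kubernetes from the developer. The namespace, labels and selectors can never drift apart.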

What’s available at the end is this specific URL, which is the URL of my application. If I take this and paste it in, you can see it’s an NGINX container, and the NGINX container changes the time every time it refreshes. And lastly, we can say ploy get.

ploy will show us all of the applications that are in the environment. We can see it’s called terminally star frog, and if I run kubectl get namespaces, I can see that there is a namespace called terminally star frog. And if I run kubectl get pods in the terminally star frog namespace,

you will see that it has three replicas of my application, put behind a load balancer for simplicity. So we’re allowing people to encode these best practices and give others the ability to deploy this way.

One of the other things we can do builds on this. One of our developers created the ability to do a similar thing, hidden behind an Electron app, which takes a few seconds to start. Again, it hides complexity away. He was able to give this tool to the marketing team, who could point it at a local website and deploy that website into the cloud in an S3 bucket, and it would just give them back a URL where they could test things. So I could say my SwampUP demo, the stack here will be production, and the path to my website can be whatever it needs to be on my machine. From there, it lets you preview, update, destroy and so on. So this gives you a lot of flexibility in the types of things you can do. One last thing I can show you, because I’m running out of time, is that my colleague Kemal created a very simple UI for a self-service platform, which they called the Pulumipus Self Service Platform, where you can deploy static websites, virtual machines, or even databases or VPCs. It encodes that logic and hides it away from the teams that don’t need to care about it, as I talked about before, and it was very simple and very easy to create.

Everything I’ve shown here today is in our open source repositories, which you can go and look at and try. If you go to github.com/pulumi/automation-api-examples, you can find all of these examples there.

One of the coolest examples in this repository shows how to use Pulumi as part of Jupyter notebooks. Because it’s code, it integrates with code, and that gives you the flexibility to do this. This is exactly the kind of system we want to give people: a way to create infrastructure in a way they know, and in a way where they know what’s going on. So for us, it’s a case of building cloud engineering for everyone. We very much believe that this is the next evolution of the deployment of systems: giving people the tooling to create it themselves, to create these layers of abstraction and these shareable packages. We’re very excited about the Pulumi ecosystem, where people create and share these different things. So thank you so much, and enjoy the rest of the conference.

 My contact details are Paul@Pulumi.com, or @stack72 on Twitter, if you’re interested in speaking about this any further, and I will always be glad to talk more on this subject.

 Thank you so much.
