Today’s cloud environments present novel challenges in scaling and security.
Hear how Microsoft’s open source contributions in the Kubernetes ecosystem inform Azure products, enabling customers to adopt strategic new technologies with confidence.
Hi, I’m Steve Lasker. I’m a PM architect in Azure, and I work on our container registries such as ACR and MCR, as well as a bunch of the open source projects that we’ll be talking about today, related to securing the supply chain.
I’m Bridget Kromhout, I am a PM for Upstream open source at Azure, and I am excited to learn more about supply chain security.
So one of the things that we wanted to focus on is, how do you secure that supply chain from the point where it’s built, and make sure that when you’re deploying it, it’s actually signed by entities that you trust?
Now, this one is about securing public content that I might be consuming. A big focus of what we’ve been doing here is not a cloud-specific solution, but open source solutions, so that Microsoft content might run on other clouds or on-prem, and open source content can be consumed into various clouds.
So the technologies that we’re using must be open source because they need to be cloud agnostic, vendor agnostic so that anybody could use them.
And I think to your point, Steve, a lot of customers are building solutions for themselves that consume content from many disparate locations, and they need a vendor-agnostic, open source solution to meet their security, compliance, or just implementation needs.
Exactly. The idea is that people don’t build everything themselves anymore; the way we’re getting productive is by using all these additional projects that are out there. So I want to be able to securely bring them into my environment, and then continue to build on them, enhance them, deploy them, and verify them.
One of the things, if you think about how much content is out there… sure, we have package managers for things that I’ve built, but for things that I’m releasing and deploying, registries have served that main purpose.
Every cloud provider has one, and there are many projects you can run on-prem, in air-gapped environments, VNet environments, IoT environments, and so forth.
So that’s been a piece that we’ve been leaning into: leveraging those container registries that people have already configured and managed, where they’ve got private networks set up, [inaudible 00:02:43] replication. We feel that core infrastructure is key to leverage, because we don’t want customers to have to run yet another service; we want them to use the services they already have.
I think it’s an interesting point, too, to talk about this in the context of the Kubernetes ecosystem, because we get a lot of messages out there: you should have an SBOM, you should verify your software supply chain, you should verify all sorts of things. But how exactly do you glue that to your Kubernetes? Today, we’re going to talk about exactly how that works.
And not only how that works, but how people are securing it because it’s not just that, “Hey, I can pull something from a registry and so forth.”
Is it secured in the environment that people are trying to lock down? So let’s dig in a little bit more. The idea is you check into an airport, and in this case it’s the Transportation Security Administration, not a timestamp authority. You’re checking into the airport, and there’s somebody there; you hand them a [kube 00:03:52] deploy file and say, “I want to be deployed on that plane, that Kubernetes cluster…”
And that agent is sitting there, “Great, you give me this Pod spec, but who are you, do I trust who you are?”
Now I can give them a signature, a document that says, “This is who I am.” But the document has to be something more than a note from my family, right? It has to be a document that can be trusted. That agent is configured with a policy that says which documents, which entities, they trust to allow you in.
So great, I give that TSA agent the document, one of the many that they allow, and it might be in digital form, right? We sometimes have them on our phones.
In either case, what’ll happen is, after that TSA agent has checked you against the policy they have for how you’re allowed into the zone, they stamp it with another signature, right?
We get that little blue dot or the green dot, or digitally it’s done. So now when I do enter the airport, guess what, I’m going through the scanner, right? It’s just like our software: it should be put through a scanner, even though it was verified.
So that entity, in this case the container, and we’re going to introduce that, but the person is now allowed into the airport, has been scanned, and they still have that pod spec.
So now they’re in the staging area, right? The staging area of the airport, I say, “Hey, great it’s time for me to get on the plane.”
And now there’s another admission controller that’s saying, “Hold on, you’re not just getting on the plane, I need you to prove who you are.”
And in this case, agent 44 doesn’t care about your driver’s license or passport, or any other document that you gave; all they care about is: were you allowed into this environment? Do you have one of our TSA agent stamps on it? And if you do, great, then we will let you on. And of course, agent 44 is going to give it another stamp that says you have been promoted.
And now you can board your Kubernetes plane. The analogy here is each one of those promotion stages was isolated. When I went from the public into the airport, I wasn’t going back out of airport security to get some document to prove it; everything was traveling with me. And that’s what we’re seeing customers doing in their environments as well. They’re trying to lock down their environments into VNets, and any hole in the VNet… the analogy is it’s a hole in the submarine. You only need one; it doesn’t matter if you have patched five or 10, if there’s just one hole left in the submarine, then it’s not secure.
So we want to make sure that you can literally shut down all public ingress and egress, and those signatures, SBOMs, whatever, can be validated holistically inside of that environment, with no external connectivity.
When you say independent, I think, well, doesn’t some data have to pass between them? So how do you make sure that handoff is secure?
Yeah, so that’s what’s interesting. The content that gets promoted, that’s part of what we want to do with a registry, and what we’re enabling with the ORAS artifacts work, is that you can actually attach everything together and it can travel as a unit.
Now just because it’s a traveling unit doesn’t necessarily mean that it didn’t get manipulated. So each one of those things, whether it be the image, the SBOM, the scan results, each one of those things is signed as well. So you can verify each and every one of those artifacts as they’re promoted, because we have this concept of location independence, and that is, when I checked into the airport, it didn’t really matter where I came from. I’m Steve Lasker, I’ve got an identity, doesn’t matter whether I’m in Washington or Texas or New York, my identity is independent of my location in each one of those, but I can validate that through signatures and the keys that are associated with those.
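That location independence comes from content addressing. A minimal sketch of the idea, with hypothetical manifest data (real registries compute the digest over the raw manifest bytes, not a reconstructed JSON):

```python
# Sketch: an artifact's identity is its content digest, not its location.
import hashlib
import json

def digest_of(manifest: dict) -> str:
    # Serialize deterministically, then hash; hypothetical canonicalization.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

manifest = {"schemaVersion": 2, "layers": [{"digest": "sha256:abc"}]}

# The same bytes yield the same digest wherever they are stored,
# so a signature over the digest stays valid after promotion.
d_public = digest_of(manifest)   # as pulled from a public registry
d_private = digest_of(manifest)  # after promotion into a private registry
assert d_public == d_private
```

This is why it doesn’t matter whether the image came from Washington, Texas, or New York: the digest, and the signatures over it, travel with the content.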
You’ve said, SBOM a few times, you’ve dropped some SBOMs here and there through this talk.
I want to make sure that we clarify in the stages of supply chain, exactly what is an SBOM. I know we’re going to go into it in some detail, but can you explain in this context, where does the SBOM fit in here, and what is it exactly?
So if you think of the supply chain, so there’s the creation of content, I’m building those images… I’m building the packages that go into those images, and then I want to deploy them.
An SBOM is a software bill of materials, or a systems bill of materials to be fair, because the groups want to be able to use SBOMs for hardware devices as well. So it tends to be more “systems bill of materials,” just like container registries aren’t really just for containers anymore.
So the idea is… I like to use the analogy of a bowl of soup. If you look at a bowl of soup, especially if it’s all liquid, without chunks or vegetables, you really don’t know what went into it, unless you’re doing some DNA analysis.
What you really want is the list of ingredients: the build system that was used, the packages that went in, the compiler flags, the environment that was configured.
Because, by the time… especially with native binaries, by the time everything gets assembled, it’s the details of how it got assembled that are really important. Security scanners can look at something after the fact; it’s the systems bill of materials, what went into it, that provides the unique insight that’s important.
But hopefully an artifact would still… spoiler alert, we’re going to talk about verification of some sort. Hopefully you wouldn’t just believe the recipe; you would actually be able to look at the artifact and see what came out the other end.
Well, it’s a combination of both, honestly. A lot of us want speed of deployment. If you take the airport analogy, the reason that agent at the gate doesn’t re-verify you is about performance, right? It’s security, but also: how do I get things on the plane really quickly? How do I get images deployed on a Kubernetes cluster really quickly? We don’t want to re-scan the entire image every time it gets deployed or updated.
We want to do some checks that say, “Hey, did I scan that within whatever the reasonable policy window was?”
Because what I know on Monday is different from what I know on Wednesday, but an SBOM is still the static analysis of what was built. Now, later on… to use my food example, peanuts weren’t in the SBOM for the soup, but it turns out that somebody discovered that the equipment that was used was making peanut butter the day before, and they didn’t do a good cleaning of it.
So on Monday they thought everything was fine, and on Wednesday it turns out the build was done on a particular build node that was making peanut butter the day before, so it has been tainted. Now there’s a vulnerability that was found after the fact, in that example.
So SBOMs are critical because they tell you what went in, and the security scan results are the things that you’re constantly updating with the new information that’s discovered. So that’s the creation of content, all of that building and so forth.
What we think about as Notary v2 is, how do I distribute that content? Because I might build in a private environment and I might put it on a public registry. I might pull it from that public registry into my private environment, and then I want to promote it again and again and again, so that I can get it across [dev staging in prod 00:00:12:04].
We want to make sure that everything that you need to be verified can travel with it, and it can be signed, so that’s how we think of Notary v2, it’s really securing the distribution of that content from the time it’s built, to the time it’s consumed as it’s promoted.
Now, of course, with Dockerfiles you have the [from statement 00:12:25], so there’s a little bit of overlap with the build: you want to be able to validate that the image you’re building from, in your FROM statement, is also one you trust.
So I think people might have a question right here, when they say: how do I know I have Notary, and is my Dockerfile good enough? How do I get the stuff you’re talking about here?
Yeah, so the idea is that you are using the Notary client to validate your signatures. We’re hoping that there are not 15 ways to sign things in the industry, right? That’s why we’re really trying to get to a common signature format. So whether I’m pulling something from Docker Hub or NVIDIA or Oracle or MCR, all these public registries and software distributors, I can use a client that says, “Yeah, that is signed with a key that I have decided that I trust.”
Because that’s what we’re enabling with Notary: to be the tool, one of the tools, that you can use to validate that the signature that was put on it is from an entity that you trust. That’s the really important part, right? It’s not that it’s signed, because bad code is going to be signed as well; it’s that it’s signed by the entities that you trust. And you might decide that we’re going to sign all of the Microsoft software with root keys that are from Microsoft, but there’ll probably be separate keys for things that are officially supported products versus, let’s say, samples. So you can say, “I trust the official products, but I don’t want samples.” You want to know that that sample is produced by and comes from Microsoft, and you want to delegate the difference there, as an example, even for the [public consume 00:14:10] content.
And then is that a differentiation people could make in their different environments? In a development environment, I’m fine with pulling in example code, and I want to make sure that does not get green-lit for my production environment.
Yeah, exactly, that’s actually what you see right here in this example, right? The TSA agent was trusting a bunch of public things that come in: all 50 states, any number of countries. By the time it got to agent 44, agent 44 says, “I don’t care what you had up there, you’re not getting through unless you’ve got a TSA agent sticker.”
Have you ever done that by the way? I have tried to get onto the wrong flight, I think it was because my gate got changed and I was not paying attention and I was on a call, and I was trying to scan my way into literally the wrong flight, and they were like, “Beep!”
You don’t belong here; you’re in the right staging zone, but that’s not the right cluster you’re supposed to deploy to.
It’s mortifying, by the way. Just like, what is happening?
Aren’t you glad you didn’t wind up on the wrong plane, going the wrong direction?
I’m glad agent 44 was on it.
And that is part of this, right? You want to set it up as a policy, because you’ll do validations in your deployment environment as well. But if your deployment environment is validating what you’ve built, they’ve done a lot of [checksums 00:15:33] in there, you want to say that your cluster has been configured as the high-criticality security environment.
It must be signed, not just with the ACME key, but the ACME secure… the critical key. And if it comes from Docker Hub or Microsoft? Not good enough, because it didn’t pass the internal checks that you want inside that environment.
So I’m glad you brought up all those policies, because that’s exactly what I think about. The airport analogy is interesting, but let’s take a look at something we would actually deploy. So we have Wabbit Networks, the small software vendor that puts something out that nobody’s really heard of. It’s not as big as Microsoft, Adobe, or other software companies. And of course, you have your traditional company that’s just trying to get stuff done.
They want to take some [inaudible 00:16:19] software and deploy it, and they might do some quick policy management and it’s fine, and they say, “Hey, has it got some key?”
And yeah, fine, whatever, and they deploy. There’s a bunch of details there. Part of it is they actually want that in a secure environment, right? This is the submarine thing. They want to make sure that environment’s locked down: no public ingress or egress. It’s the same private environment they had when they were on-prem; they just want to run it in the cloud. The reason we want all those things to travel with the image is that if they don’t, then that policy management wants to reference that systems bill of materials, and how does it get in, since it’s outside the firewall? Same thing with scan results, or source, or whatever other things you decide are important to validate with that image before you deploy it.
So the way customers get around this is they have a private registry inside that environment, available inside the VNet, and then they have a promotion process that they run. And of course they’re repeating this across not just dev, staging, and production, but different groups inside the company. So what they’re doing is setting up that content, those registries, but they don’t want to pull the same public content into each one of those environments. What they set up is another registry inside the company that becomes the shared place everybody in the company can pull from. You can think of each one of those boxes as the airport, and you have the staging area.
I’m looking at ACME Rockets and I’m thinking, wow, when you say air gap, there might not even be air in this gap, if we’re sending this stuff into space. Basically, I’m suddenly thinking, hey, if we’re talking interplanetary expeditions here, there might be latency issues as well.
So it’s not just worrying about making sure you’ve got the image you want to have. I imagine that, especially when we’re talking about IoT and other such use cases, some of it is: we can’t be pulling things constantly. How does this apply?
I love it, I hadn’t really thought about it in that way. Air gap is truly… there is no air. They are running code on the space station, they are running it in satellites; how do they get it there? It’s not just the cruise liner going across the ocean with expensive satellite connectivity, or the oil platforms… yeah, there are true physical air-gapped environments as well.
And that’s always been the point, right? I want to consume public content, but I want it within my environment, because I need it when I need it. It wasn’t as big a deal for package managers, because if my build fails, yeah, that stinks and I have to go fix it, but there are humans involved to some extent. When my production environment needs to scale or self-heal, it can’t be wondering if the internet is there, or if it’s on the dark side of the moon, right?
So you need to make sure that you bring the content you depend on, as close as possible within a trust boundary that is available for you.
So if I go back to the lots of sources: it’s not just Docker, it’s not just Wabbit Networks, it’s NVIDIA, it’s Oracle. There are lots of public registries out there.
So you might pull some stuff from Docker Hub because Wabbit networks is too small, so you decide, “I trust the certified content from Docker Hub.”, just as an example.
So Docker Hub will put another signature on it that says, “This is certified content.”
So if you trust the certified content from Docker, you can know that because it’s got a Docker certified signature on it, you don’t care about the Wabbit network signature as an example.
And other content might be there from Spacely Sprockets, that’s not certified or whatever. Now you want to bring that into your environment, and you’re going to stamp it with a signature for ACME Rockets, right?
This is the TSA agents that… “You are allowed in my zone.” Now everything has to have ACME Rocket’s key. And you configure that trust policy, right? This is exactly what you were just asking about. I will say these are the sources I trust, and I will configure the keys from those sources that I trust, and if you’re not in that trust list, you’re not getting through.
Now somebody in the company says, “But I like company X, can you please add that?” And somebody will decide yes, update that policy, get them configured, and then they can come in as well.
But those are the boundaries customers keep asking us about: how do I control the content that comes in and put some verification on it? And not just once, right? Every time an update to Debian and its [inaudible 00:21:29] comes in, you want to run a security scan on it, and maybe even do functional testing validation, because every update has some changes. Do those changes break you? Do those changes introduce some insecurity? So we want to make sure that everything can be checked, especially in this automated world where everything is updating all the time.
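The promotion gate being described can be sketched as a small check-and-stamp step. Everything here, the key names and the artifact shape, is hypothetical:

```python
# Sketch: verify an inbound artifact against the allow-list of trusted
# sources, then add the organization's own signature ("the TSA stamp").
TRUSTED_SOURCES = {"wabbit-networks-key", "docker-certified-key"}

def promote(artifact: dict) -> dict:
    # Reject anything not signed by at least one trusted source.
    if not TRUSTED_SOURCES & set(artifact["signatures"]):
        raise PermissionError("no signature from a trusted source")
    # Stamp with the internal key; downstream checks look only for this.
    artifact["signatures"].append("acme-rockets-key")
    return artifact

img = {"name": "net-monitor:v1", "signatures": ["wabbit-networks-key"]}
assert "acme-rockets-key" in promote(img)["signatures"]
```

Adding “company X” to the trust list, as in the dialogue above, is just adding one entry to that allow-list.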
What happens when Cogswell Cogs gets acquired by someone, and we really don’t want to consume their content anymore?
Or we’re not happy with the way they’ve changed their verifications or whatever, and so we don’t want to include them anymore, but we already had them in our trust policy. What does changing your mind or revocation, or this continuous trust you were talking about, look like?
There are a couple of ways to achieve that, right? So first of all, let’s take the example that it was acquired, as opposed to a vulnerability that was found six months ago.
In this case it was acquired by Evil Code. The software that you got before it was acquired, you’re still running, and you don’t want to revoke it and have it fail, because it was fine yesterday. What you’re saying is, “I don’t want to allow anything new coming in.”
And let’s just say that Evil Code doesn’t change the key, and it’s still signed with the Cogswell Cogs keys. So what you could do in that case is say, “I don’t want any updates from Cogswell Cogs anymore.”
And with this trust policy on your ingress, as you’re promoting stuff into your shared internal registry, you no longer allow Cogswell Cogs’ updates to come in. But in these internal environments, you weren’t just checking the Cogswell Cogs key, right? That content was checked with the ACME key, because you stamped it and said, “I brought in this content, I scanned it, and I now say this content meets my policy inside the ACME Rockets environment.”
So everything internal isn’t even looking at the Cogswell key, it’s only looking at the ACME Rockets’ keys. You get that nice clean separation, of when and how you want to trust various entities.
That’ll make the entire software supply chain, as it were, a little easier to unpick, if there’s anything that you look at and say, “We can’t do this particular part anymore.”
I feel like that can seem intimidating to untangle it, this makes it easier.
Yeah, that’s why the promotion policy is so important, because you’re choosing when and how you want your content to be moved in, and it gives you those dials to change things as you want. We originally wanted to just start with signing container images; we realized that the infrastructure for detached signatures was really important and applicable to other things.
So if you think about that TSA agent, when you go to the airport, they don’t let you into the secured zone first; you’re standing next to it: “Hey, how you doing, here’s my passport,” right?
If you think about how exploits are done: I have a path to execute code, and I have a way to get code onto an environment; usually those two have to combine. If I ask the Kubernetes deployment, for instance, to run an image from Evil Code, and the signature is tied into the image, then the image has to go to the node to be evaluated, and the node would kick it off and say, “Oh, you’re Evil Code, I don’t trust you.”
But that code’s already there.
Yeah, possibly, the damage is already done.
It’s too late, right? You’ve got some hooks somewhere; hey, a push event happens, something happens, and the exploit has already run. So a very key piece was making sure your Helm charts and your kube deploy files, if you’re using digests or tags, don’t change just because you’ve signed them.
So that’s one reason we use detached signatures, but it was also because we want to make sure that we could promote the content but validate it independently.
So it turns out that the infrastructure we use for detached signatures was applicable to other things as well. We weren’t thinking about SBOMs when we started this; it just so happens that you can put an SBOM in the registry and make it an attached reference too. You can put a scan result, [n 00:25:50] scan results, into the registry.
By the way, you probably want to sign your SBOMs and scan results so you can trust those too. So it turns out that whole graph of detached objects is really valuable: I can promote this content across environments, and I get nice lifecycle management around it as well. If I delete that Net Monitor image, I don’t need all that other stuff; I want it to go away. If you’re trying to archive stuff, then archive it. But having an SBOM for something irrelevant isn’t super interesting, so you want to keep that content together.
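That lifecycle point can be sketched as a tiny reference graph: SBOMs, scan results, and signatures all point at a subject digest, so deleting the subject cleans up the whole graph. The shapes below are illustrative, not the ORAS manifest format:

```python
# Sketch: artifacts attached to a subject image by reference.
# Digests and media types are hypothetical.
referrers = {
    "sha256:netmon": [
        {"type": "application/sbom", "digest": "sha256:sbom1"},
        {"type": "application/scan", "digest": "sha256:scan1"},
        {"type": "application/signature", "digest": "sha256:sig1"},
    ]
}

def delete_subject(subject: str) -> list:
    # Removing the image also removes everything attached to it.
    return referrers.pop(subject, [])

removed = delete_subject("sha256:netmon")
assert len(removed) == 3 and "sha256:netmon" not in referrers
```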
You mentioned here an OCI distribution-based registry; you’re talking about capabilities in a registry.
So do you want to clarify exactly what a registry needs to have, in order to make the policy changes you’re describing possible?
So one of the things we looked at here was: what is that core infrastructure? Every customer, certainly in a cloud-native environment, has a registry already.
How do we leverage that infrastructure, so you don’t have to build yet another storage solution? And we looked at it going, all right, do I need to build more sidecar services, other services, or can we just extend the infrastructure we already have?
So that’s the piece we’ve been focusing on with the ORAS artifacts work: that you can actually establish that graph in a truly native way. We really just didn’t want to hack things on. These are long-term investments. We felt this was the right design to go do, so that we’re not just bolting things onto a very fragile house.
We’ll have fallback support; there’ll be a bunch of things related to that. But we felt like this is the beginning of a long-term ecosystem, and we wanted to make sure the right designs and infrastructure were in place for all of that.
So it is OCI distribution registries, with the ORAS artifact manifest support that gives you the ability to do reference types.
Now we talked about location independence, right? It doesn’t matter where it came from, it’s here, is it signed by an entity I trust? That’s really key to what we’re trying to do.
So we talked about… signatures are associated with what we call a subject artifact, they’re promoted, and you can do multiple signatures. This is just reinforcement of what we all just talked about, right? There are multiple signatures because I want to trust the Docker Hub signature or the ACME signature; those are the ones I want to trust, and it doesn’t matter about the other ones.
So when you say signatures are separable, can you talk a little bit about how Notary v2 is protecting against Trojan horse attacks?
Oh yes. The way we support Trojan horse mitigation is… because that signature is not physically coupled, you do not have to download the image; the signature is not a layer on the container image. In fact, the way we’ve done Notary, we don’t know or care what you attach the signature to. You just say, “I want to sign another thing in the registry.” It could be a Helm chart, could be an OCI image, could be an ISEC module, could be a Wasm module.
The whole idea is we want to make sure that you can sign things in a registry by reference: when you put a signature into the registry, you say, “What am I pointing to that I’m signing?”
So there’s a pointer, and then the signature itself has the encoding that says, “All right, it’s not only the pointer; the details of that container image are what’s in the signed payload.”
That’s… you can pull the signature and the references. The reverse reference is what’s super important too. Your deployment chart says Net Monitor v1, maybe with a digest for Net Monitor v1; that’s what your deployment script says.
Your deployment script doesn’t have something special that says, “This is where the signature is.”
So the ORAS artifacts work has this concept of a reverse lookup, and it says, “What are the things that reference that Net Monitor v1 image?”
And the Notary client will validate the Notary signatures, so there is infrastructure there that says: I want to look up signatures based on the thing I’m deploying, and I can get that back; I can get the SBOMs back, I can get the scan results back, right? You’re basically asking what type of reference you want back. So that’s the detached signature verification, or detached SBOM validation.
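The reverse lookup being described can be sketched as a query over the reference graph, filtered by artifact type. Field names here are illustrative:

```python
# Sketch: given the digest your deployment references, ask which artifacts
# point back at it, filtered by type. Hypothetical data and field names.
referrers = [
    {"subject": "sha256:netmon", "type": "signature", "digest": "sha256:sig1"},
    {"subject": "sha256:netmon", "type": "sbom", "digest": "sha256:sbom1"},
    {"subject": "sha256:other", "type": "signature", "digest": "sha256:sig2"},
]

def lookup(subject: str, artifact_type: str) -> list:
    return [r for r in referrers
            if r["subject"] == subject and r["type"] == artifact_type]

# The deployment spec names only the image; signatures are found by reference.
sigs = lookup("sha256:netmon", "signature")
assert [s["digest"] for s in sigs] == ["sha256:sig1"]
```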
I’m thinking that when people are deploying artifacts to their environment, they want to know that the artifact is what they expect, and they also want to know nothing is getting interfered with, and then I guess what you referenced earlier, the idea of what if we thought something was fine, and then later we find out it’s less fine than it used to be? How do we make sure that we don’t experience great sadness later?
Well, look, this is where no one thing is the perfect solution, right?
Signing something doesn’t prevent you from finding out later that there were peanuts in the ingredients because the equipment didn’t get cleaned very well.
When we consumed the good package that went into our build, we thought everything was fine; the SBOM signature doesn’t change that. In fact, if you look at SolarWinds, it was signed by the company, so they knew there wasn’t a man-in-the-middle attack later on; that helped them quickly identify, going back, that it was a build system problem.
This is where signatures say it was built by the entity you trust, it was deployed and validated. And then on Monday I thought it was fine, and Wednesday I find out about a problem; that’s where the security scanning comes in, right?
The security scanning is going to be continually looking, continually learning about things. And if they find that a particular package was tainted on Wednesday, then all they need to know is what the package IDs are, including who it was signed by. And now that other security system, the process that you’re running on your production nodes, on your registries, that is continually running, says, “Oh, I just learned that anything that was built with this package is actually bad.”
It turns out the stuff that you have in your registry was built from one of those sources. So Notary doesn’t necessarily solve that; no signing solution solves that. That’s where the security scanners come back in, providing another level of security around it.
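That “learned on Wednesday” flow is really a join between new scanner findings and the SBOMs already in the registry. A minimal sketch, with hypothetical image and package names:

```python
# Sketch: when a package is newly flagged, the stored SBOMs tell you
# which images contain it.
sboms = {
    "net-monitor:v1": {"openssl-1.1.1", "zlib-1.2"},
    "hello-world:v2": {"musl-1.2"},
}

def affected_images(bad_package: str) -> list:
    # Cross-reference the flagged package against every image's SBOM.
    return [img for img, pkgs in sboms.items() if bad_package in pkgs]

assert affected_images("openssl-1.1.1") == ["net-monitor:v1"]
```

This is why the SBOM is static while the scan results keep changing: the SBOM is the fixed index that new findings are joined against.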
So let’s jump into a demo. What we’re going to show in this demo is… I’ve got my keys secured in a key vault: the private keys that I use to sign my company’s content, and it might even be the public keys that I trust. Remember, we don’t want to take public keys from the public; we want to store them privately, to make sure I can validate the policies that I want. I don’t want to be dependent on public sources.
So you are keeping all your keys in the key vault that you’re using today. We’re not replacing them. We want to make sure-
Whatever your key vault of choice is.
Whatever you’re using: we’ll have Azure Key Vault, there’ll be AWS KMS, there’ll be HashiCorp Vault. Whatever key vault you’re using, you’ll be fine.
Then we’ll use the notation CLI for signing, and we support remote signing, because you don’t want to take the private key that you’ve got locked up in the vault, bring it to your build environment to sign, and then have some code steal your private key, and now everybody’s signing with your key.
So we want to send the content to be signed to a remote key vault; it’s a standard practice. Once things are signed, we’ll store them in the registry. We’ll store the image in the registry, and you can sign the image. You can create the SBOM in your build environment and sign the SBOM. You might have a scan result; you’re going to sign the scan result, so you know those results were created by a policy that you trust. And now the registry becomes that central location for all the things that you need.
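The remote-signing shape can be sketched as: the build machine sends only the payload (a descriptor of what is being signed) to the vault, and only a signature comes back. HMAC stands in here for real asymmetric signing, and every name is hypothetical:

```python
# Sketch: the private key material never leaves the vault boundary.
import hashlib
import hmac
import json

VAULT_SECRET = b"held-only-inside-the-key-vault"  # never sent to the build host

def vault_sign(payload: bytes) -> str:
    # Imagine this function running inside the key vault service.
    return hmac.new(VAULT_SECRET, payload, hashlib.sha256).hexdigest()

# The build machine constructs the payload: a descriptor of the subject.
descriptor = {"digest": "sha256:netmon",
              "mediaType": "application/vnd.oci.image.manifest.v1+json"}
payload = json.dumps(descriptor, sort_keys=True).encode()

signature = vault_sign(payload)  # only the signature crosses the boundary
assert hmac.compare_digest(signature, vault_sign(payload))
```

If build-environment code is compromised, it can request signatures, but it cannot exfiltrate the key itself, which is the threat described above.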
And that can be whatever registry you’re choosing to use?
So now I want to deploy to my Kubernetes cluster. The Kubernetes cluster doesn’t necessarily get locked down to a registry; it might, and that is a good policy to apply, but what you really want to be able to say is that your Kubernetes cluster isn’t necessarily locked down to a registry, but it’s locked down to things that are signed by the entities you trust.
I don’t know how you got into the staging area of the airport, but you’ve got to have a signature to board that plane. That Kubernetes cluster says, “I don’t care that it was a passport or whatever, unless you have an ACME Rockets key, you’re not getting into the cluster.”
And it might be a specific ACME Rockets key, right? Might be the high security environment versus a lower security environment, let’s just say.
We implement that with Gatekeeper. Good name, right? It’s the gatekeeper: you’re not allowed in unless you match some requirements. Now we want to be able to verify Notary signatures. It turns out there are a couple of things you might want to verify: it might be a collection of signatures, might be an SBOM or other things.
So we have the RATIFY project, which plugs into Gatekeeper to do those validations. Now I can set up a policy on my cluster: content may come from one or ten different registries, and as long as it’s signed by the entities you trust, it can get in; otherwise it can’t.
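A Gatekeeper constraint scoped to pods in one namespace might look roughly like this. The `RatifyVerification` kind is a hypothetical ConstraintTemplate name, not the exact one the demo installs:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RatifyVerification        # hypothetical ConstraintTemplate kind
metadata:
  name: require-trusted-signature
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["demo"]
  # Gatekeeper consults RATIFY for each image in the matched pods;
  # anything not signed by a trusted key is denied admission.
```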
When you’re verifying, you’re not just verifying your signatures are set up the way you intend, you’re also verifying those public ones?
You’re verifying whatever ones you want, right? That’s part of that policy. When you’re importing content into your shared registry inside your company, you’re going to give it a list of the keys that you trust: might be Wabbit Networks, might be Docker Hub, but not Cogswell Cogs, because we’ve decided we don’t like them anymore, so we don’t allow their keys. That’s the verification you’d do there through a policy. By the time it gets to your production Kubernetes cluster, the only thing you trust is ACME prod; it has to be signed with an ACME prod key.
So RATIFY is a way to verify against a list of keys that you give it, and then that policy is enforced by Gatekeeper with that RATIFY plugin.
And on an environment specific basis?
Correct yeah, exactly-
According to the needs.
Right, production A might require certain things, production B might require different things. Dev, you might be a little looser in what you allow in there. So it’s an environment specific policy that you set up.
So what I’m going to do, just to show we’ve got nothing up our sleeves: we’re basically going to take a Kubernetes instance and create a namespace. We’re going to deploy a public image, which happens to come from MCR, and I’m going to deploy it into that free zone namespace, right?
So things are just going to work, because we haven’t put any policy on it, and we can see the free zone is running there.
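That unrestricted step could be sketched like this; the namespace name and MCR image are illustrative stand-ins for what the demo uses:

```shell
# Create an unrestricted namespace with no admission policy on it.
kubectl create namespace freezone

# Run a public image from MCR; it's admitted because nothing is enforced yet.
kubectl run hello --image=mcr.microsoft.com/azuredocs/aci-helloworld -n freezone

# Confirm the pod comes up.
kubectl get pods -n freezone
```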
So I see Gatekeeper, I see Gatekeeper stuff in there already. Is that coming from policy, or where is that coming from?
So this is… What I’ve got here is basically just a Kubernetes instance set up with Gatekeeper; I actually haven’t configured it with Notary or RATIFY yet.
This is just that baseline infrastructure. So now I want to sign and verify with the keys that are inside of our company. What I want to do is create a policy for a key that I want to create as part of this flow. So I’ve got a set of configurations; this is how you set up policies for a particular certificate in Key Vault.
If you already have a CA-issued cert, which a lot of our large customers do, you can use your CA, right? There are multiple ways of doing this; we just depend on X.509 certs, because we believe that’s what customers have told us they need for their production environments.
So now, with that key in the Key Vault, whether I created it or I used one I’ve already got in my key vault, I want to get the key ID out; this is the reference to where it’s stored in Azure Key Vault.
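With the Azure CLI, that step might look roughly like this; the vault and certificate names are illustrative, and a CA-issued cert would be referenced the same way:

```shell
# Create a self-signed signing certificate in Azure Key Vault
# using the default certificate policy.
az keyvault certificate create \
  --vault-name wabbitnetworks-kv \
  --name wabbit-networks-io \
  --policy "$(az keyvault certificate get-default-policy)"

# Pull out the key id; this is the reference Notation will be given.
KEY_ID=$(az keyvault certificate show \
  --vault-name wabbitnetworks-kv \
  --name wabbit-networks-io \
  --query kid -o tsv)
```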
And then I want to configure the Notation binaries. This is that policy we talked about: I want to tell Notation, add the Wabbit Networks key to your list of validations, and we’ll add the cert as well, since these are the public and private portions of it [inaudible 00:38:57].
So now Notation knows how to sign and verify with the Wabbit Networks key. But if I look at this, because we configured it with the Azure Key Vault plugin, what you’re seeing is a path to the Azure Key Vault, and it’s implemented through the Azure Key Vault plugin.
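A minimal sketch of that configuration, assuming `$KEY_ID` holds the Key Vault key identifier and the key name is illustrative (exact Notation flags have shifted between releases):

```shell
# Register the Key Vault-backed key with Notation through the
# Azure Key Vault plugin; the private key never leaves the vault.
notation key add --plugin azure-kv --id "$KEY_ID" wabbit-networks-io

# Listing keys shows the entry pointing at Key Vault, not a local file.
notation key ls
```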
We felt it was really important, even for us as maintainers of Notary, to make sure that Azure can rev its plugin without any dependency on the Notation maintainers. If some other cloud or vendor wants to build a plugin, they shouldn’t have to come to the Notary project and check in their code, for the initial release or any servicing event. So we built a very clear plugin capability so that they can be completely self-service.
That sounds useful as well for people who might be in more classified environments, or just development environments, the secret project inside company X, where they don’t necessarily want to be leaking anything they’re working on to the public internet. So this is a capability with this plugin, where they can have it be completely behind closed doors.
We… The Notation project will never know about a bunch of these key vault providers, by design, right? As long as we have the right extensibility, they can plug in, and we shouldn’t have to do anything about it. And by we, I mean the project maintainers.
So now I want to be able to configure RATIFY to say, well, what keys should I trust to be able to allow for deployment?
So I’m just going to poke around a little bit here; we’re going to make this experience a little better. What I’m doing is asking the key vault, in this case Azure Key Vault, using the Azure CLI, and I’m just going to download the public key to a PEM file, and then I’m going to export that to an environment variable.
So now I’ve got the public key available to me. So let’s go and create a namespace that we will secure. I’ve got a demo namespace, and I’m going to install RATIFY with that key, to secure that demo namespace.
So now I don’t have access to that. If I apply the constraint, and I’ll look at the constraint here, if I display it… you’ll notice there is a deny constraint for anything that does not match the policy we’ve set up for pods in the demo namespace. So that’s how we’ve secured that environment.
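Sketched end to end, that setup might look like the following. The vault name, Helm chart value, and constraint file are illustrative assumptions, not the demo’s exact names:

```shell
# Download the public certificate that RATIFY should trust.
az keyvault certificate download \
  --vault-name wabbitnetworks-kv \
  --name wabbit-networks-io \
  --file wabbit-networks-io.pem

# Create the namespace to secure, then install RATIFY with that cert.
kubectl create namespace demo
helm install ratify ratify/ratify \
  --set-file notaryCert=wabbit-networks-io.pem   # value name is illustrative

# Apply the Gatekeeper constraint that denies unsigned pods in "demo".
kubectl apply -f constraint.yaml
```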
So now that I’ve got that policy, let’s see if we can run the same public image.
No, right? We set up a policy that unless it’s signed by the entities we trust, it can’t be deployed. So the failure is intended, so that’s great-
Fails as intended.
Fails as intended. So now let’s go and build an image, so that we can build, sign, and deploy something that we do trust. In this case, I’m just using ACR Tasks to do our remote build in Azure, and it streams the content to the client.
So I’m doing the build: wabbitnetworks.azurecr.io, net-monitor:v1… and it’s done.
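That remote build could be sketched with `az acr build`; the registry name and source repository here are illustrative:

```shell
# Build remotely with ACR Tasks; the build log streams back to the client,
# and the resulting image lands in the registry as net-monitor:v1.
az acr build \
  --registry wabbitnetworks \
  --image net-monitor:v1 \
  https://github.com/wabbit-networks/net-monitor.git#main
```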
That’s great, I might do a security scan on it, whatever I decide that is the gate, before I sign it. And then I decide, “It’s good, now let me sign it.”
Now, if I look closely here, what’s happening is I said notation sign and gave it the key name, wabbit-networks-io, right? That’s the entry we added. But if you remember, when we configured Notation, I said that key is off in Key Vault, so that private key is nowhere on this machine.
It said, “Hey, Key Vault, can you please sign this image with the key you’ve got in there?” So all of that happened in that detail there.
Now, if I ask kubectl to run that image, it will just work, because it’s signed with a key that I trust.
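The sign-then-deploy pair might be sketched like this, with an illustrative image reference and key name:

```shell
IMAGE=wabbitnetworks.azurecr.io/net-monitor:v1   # illustrative reference

# Sign against the remote key; Key Vault performs the signing operation.
notation sign --key wabbit-networks-io $IMAGE

# This time admission succeeds, because the signature chains to a trusted key.
kubectl run net-monitor --image=$IMAGE -n demo
```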
So it’s nice; the demo is probably uneventful because it worked, right? The failure was intended and the success was intended, right?
We enabled registries to understand that graph of content. We wanted to be able to put container images in there and put a signature as a detached reference; we wanted to be able to put an SBOM as a detached reference, because it turns out that capability is generally applicable, and we want to be able to put a signature on the SBOM, itself a detached reference of the image.
And then, just like file systems were set up, where the Windows and Linux file systems don’t know all the different file types, there’s a core infrastructure there; you can just save things on it. We’re doing the same thing here.
Registries don’t need to know whether it’s a signature, an SBOM, or other things; they just store it. There’s a piece of data that states its artifact type, so you can determine that this is an SBOM versus a container image versus a signature, because that is important.
You want your security scanners to know when they’re scanning a container image versus a Helm chart versus a Wasm module; they’re going to look at them differently. So that’s what that piece of data is about.
And all of this is set up and enabled through the ORAS Artifacts spec, which enables this additional capability.
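The manifest shape the ORAS Artifacts spec defines looks roughly like this: an `artifactType` the registry stores without interpreting, and a `subject` descriptor pointing at the image the artifact refers to. The SBOM type and the digests here are placeholders, not real values:

```json
{
  "mediaType": "application/vnd.cncf.oras.artifact.manifest.v1+json",
  "artifactType": "application/vnd.example.sbom.v1",
  "blobs": [
    { "mediaType": "application/json", "digest": "sha256:…", "size": 1024 }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:…"
  }
}
```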
And you have a link to the Artifacts spec here. Is that itself an open source project then? What happens if someone thinks, “oh, I would like the artifacts spec to behave in a different way, I’d like the specification to be clarified in a distinct way”?
What does that community look like?
So the ORAS project as a whole, OCI Registry As Storage, is a CNCF project, so all of that is there. We take contributions; you can implement it, you can make PRs, however it makes sense to you.
It is a completely open source project under the CNCF. So these are the projects we used to assemble this. We felt it was important not to bundle everything together as one, and to keep the pieces separable, because they tend to have more general applicability. Notary v2 is all about signing; to put detached signatures into a registry, we needed a way to do that in a scalable, long-term way. OCI Artifacts is how you can store anything in the registry. ORAS Artifacts is how you know there are references, so you can manage their life cycle, and we actually have a way to manage the content in registries.
Then the ORAS CLI implementation and the ORAS Go library are how you can implement the ORAS capabilities in your own CLI. We wanted to make sure you don’t have to shell out to the ORAS CLI; in fact, the Notation libraries use the ORAS Go libraries to implement the registry capabilities.
There’s our implementation of distribution that has the reference types, and you can run it standalone; this is what we were using early on. Now Azure has support for ORAS Artifacts, the zot project has support for ORAS Artifacts, and AWS and Docker are both committed to supporting it, so you’re starting to see registries implement this rich capability.
And then RATIFY is our newest addition: how we can make sense of validating all these different artifact types, whether it be Notary signatures, REST validations, or others.
There is a lot here. The end result we want is to make sure it’s easy to use. So the multiple projects are aimed at enabling a wide swath of scenarios with the right level of usability, and that’s another great place for feedback: if you’re finding it too complex, that’s good to know too.