DevOps Automated Governance [swampUP 2020]

John Willis, Senior Director, Red Hat

July 7, 2020


Achieving centralized visibility and governance around open-source security vulnerabilities and license infringements: https://jfrog.com/webinar/devsecops-b…

This presentation is intended to guide organizations on implementing an automated process for tracking governance throughout the deployment pipeline by providing a reference architecture. A sample use case is also provided to reinforce these best practices. Ultimately, a DevOps automated governance process can give organizations assurance that the delivery of their software and services is trusted.

Video Transcript

Hello, it’s John Willis. We’re gonna cover a topic called automated governance,
and it will be an overview.
So I work for Red Hat. About six months ago I
joined a team called the Global Transformation Office with
Andrew Clay Shafer, he’s the guy on the left, then Kevin Behr,
and then that’s me, the short guy as we go toward the right, and then Jay Bloom.
We’re trying to figure out the next
ten years in terms of what sort of
transformation should look like.
We’ve written a number of books. I co-wrote the DevOps Handbook and
“Beyond the Phoenix Project.” Kevin was the co-author
of The Phoenix Project.
Andrew was one of the authors of Web Operations and
did some work on _ reliability… so, as Andrew likes to say, we wrote some books.
So that’s enough of my intro.
So I wanted to talk about this idea of
automated governance and how I…
how I got involved in this
and it’s really a number of things but I think I’ll start off with
the place where I was able to at least galvanize
this idea into a paper
and then I’ve done some further papers here but…
Gene Kim, part of IT Revolution, the author of The Phoenix Project,
invites about 40 of us to Portland every year,
and we’ve been doing this for seven years. We work on these
papers that become forum papers, little eBooks,
and I’ve been doing it since, I think,
2014, which was the first year, so I’ve done it every year.
Overall I counted, including this year —
we just finished up about four or five books
that will be coming out this summer —
I think it will be about 30 books over seven years,
and so I’ve been involved. Some years I’ve worked deeply on one project,
and many years I was just a floater across several of the books,
going back and forth between the different projects.
But the reason I’m pointing this out is
that starting back in 2015,
there was a working group
that I worked with indirectly on
this eBook —
and all of this is Creative Commons, you can download any of these
from the IT Revolution forum papers.
It was called “An Unlikely Union: DevOps and Audit,”
so that was, at least from the sort-of DevOps,
IT Revolution perspective,
the first conversation about
how audit and what we do with DevOps
should be a tighter conversation:
segregation of duties, those types of things to
sort-of prove out, and we talk about this in the DevOps Handbook too.
We talked about how certain patterns, if you use DevOps to deliver software,
could actually apply to some of the
compliance requirements as well.
And then in 2018, there was a
great paper — I didn’t write this but I was
sort-of an advisor and was working around it —
called “Dear Auditor,” and it was great.
It was really an apology letter to auditors.
Like, “Hey, we’re really sorry, you know, we
should have done this, we should have talked to you about this,”
and
it’s not only the apology letter, but then it has a whole
checklist of things that we promised to do.
There’s actually — if you Google
“Dear Auditor” —
a GitHub project on this, and this was really good,
and the thread was about audit and DevOps.
A couple years ago,
I started getting this idea of how we could do
a better job,
from the pipeline perspective,
of audit, and… so last year,
in 2019 —
these forums usually start in about April
and we usually wind up publishing the work sometime around
the end of the third quarter
or the fourth quarter — I got a team together
to really focus in on what we call DevOps automated governance
and its reference architecture,
and I’ll explain that in a little more detail.
So, a couple of things
in the prior two years got me really interested in this.
Capital One has been heavily involved in these forum papers for years and
over the years we’ve had these discussions about
pipelines, gating in the pipeline,
and in 2017, Capital One wrote a really interesting paper,
you know, focusing on DevOps pipelines,
and there was a subsection in there called “Creating Better Pipelines,”
and what they talked about is
this idea of gates — they call them the 16 gates.
And these were things that allowed
service teams
to sort-of bypass centralized
authority like CABs.
In other words, they can get a sort-of auto-deploy allowance
if they could evidence these things like that it
indeed came from source control,
it had optimal branching strategies,
static analysis, all these things.
I think I saw a presentation last year, they said they are up to like 30 now.
But in conversations about this, it’s great to have these gates.
But, since you have these gates anyway,
couldn’t we turn these gates into evidence? So over
a year or so, we kept having this conversation
and that sorta started me thinking about this and then
around that same time as that article,
Google announced an open source project
called Grafeas, and it turns out this was a
project that they had actually been using for internal
audit and governance,
and it had a lot of features — I think a lot of features
that haven’t really been utilized —
but one in particular was
attestations: a metadata and attestation store, attestation being evidence.
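To make that concrete, here is a minimal, illustrative sketch of recording a piece of pipeline evidence as an attestation occurrence against a Grafeas server. It assumes a Grafeas instance at localhost:8080 and an already-created attestation note; the exact field names vary by Grafeas API version, so treat the payload shape as an approximation rather than the definitive schema.

```python
import base64
import json
import requests

GRAFEAS = "http://localhost:8080/v1beta1"   # assumed local Grafeas server
PROJECT = "projects/pipeline-governance"     # hypothetical project name

# The evidence itself: what the pipeline observed at the build stage.
evidence = {"stage": "build", "commit": "abc123", "unit_tests": "passed"}
payload = base64.b64encode(json.dumps(evidence).encode()).decode()

occurrence = {
    # The resource the evidence is about, e.g. the container image that was built.
    "resource": {"uri": "https://registry.example.com/payments-api@sha256:deadbeef"},
    # An attestation "note" (the attestor) is assumed to exist already.
    "noteName": f"{PROJECT}/notes/build-attestor",
    "attestation": {
        "attestation": {
            "genericSignedAttestation": {
                "serializedPayload": payload,
                # In a real pipeline this signature comes from a KMS/HSM-held key.
                "signatures": [{"publicKeyId": "build-key-1", "signature": "..."}],
            }
        }
    },
}

resp = requests.post(f"{GRAFEAS}/{PROJECT}/occurrences", json=occurrence)
print(resp.status_code, resp.text)
```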
I started thinking about it, and actually Kit Merker,
a good friend and a Frog — he worked for JFrog —
had approached me about why I wasn’t thinking about using
Grafeas, and we had a great conversation. So
that all led me up to
sort-of last year,
trying to get a group of people together, saying
“Hey, could we actually create a reference architecture
around this automated governance idea?” you know,
these terms are very overloaded but we had a specific charter
that we wanted to accomplish
in this particular case. So we actually
published it — it’s out, it’s Creative Commons,
and, as you saw earlier, it was published
last year in September.
It was Mike Nygard from Sabre —
you might know Mike Nygard from “Release It!,”
the inventor of the Circuit Breaker pattern — Tapabrata Pal over at Capital One,
Steve Magill, Sam Guckenheimer, who has been heavily involved in
most of Microsoft’s infrastructure, myself,
John Rzeszotarski of PNC, Dwayne Holmes, who runs
large Kubernetes infrastructure at Marriott,
and Courtney Kissler over at Nike.
So we all got together for a couple days and we sort-of tried to hash out:
could we actually
put this to paper as a reference architecture? Like,
could the end model be a Java microservice, in a container,
that gets a go/no-go
into Kubernetes,
using Grafeas and the _ that goes along with it?
And so one of my goals coming into this was
changing the sort of language
of how evidence is created for auditors.
You know, typically
auditors come on-site and they spend
somewhere in the neighborhood of 30 days
working with the organization, looking at changes,
and the evidence they really work from
is actually subjective.
So evidence and attestation,
as far as this conversation goes, mean the same thing.
So, the idea of changing subjective evidence, so subjective attestations
into objective evidence or objective attestations.
So currently people create change records in most large enterprises.
And then it’s a human
discussion about like Sue is going to do these things,
Bob will read these things,
maybe Sally wants Sue to add a couple more lines.
And these are all complex systems, so it is…
it’s usually a human telephone game
trying to describe the complexity of a change.
And then the auditor comes in and sees this record —
this subjective discussion —
as the evidence,
and then tries to make sense of it,
and it’s a lot of toil, it’s a
lot of disconnectedness…
and it just doesn’t have high efficacy.
And so, could we actually change that —
could we make this evidence
built-in automation,
with no human intervention,
automated and built into
the pipeline itself
with a digitally signed mechanism?
So it’s a set of signatures
that basically becomes one immutable linked
list of signatures.
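As an illustration of that idea — this is a toy sketch, not the reference architecture’s actual implementation — each stage of the pipeline could append a signed entry that hashes the previous one, so an auditor can later verify that nothing in the chain was altered. The key handling here (a shared HMAC key in a constant) is deliberately simplified; in practice you would sign with a key held in a KMS or HSM.

```python
import hashlib, hmac, json, time

SIGNING_KEY = b"pipeline-signing-key"  # illustrative only; use a KMS/HSM-held key in practice

def add_attestation(chain, stage, evidence):
    """Append a signed attestation that links back to the previous entry."""
    prev_hash = chain[-1]["entry_hash"] if chain else "0" * 64
    body = {
        "stage": stage,           # e.g. "source", "build", "package", "deploy"
        "evidence": evidence,     # the objective evidence captured by the pipeline
        "timestamp": time.time(),
        "prev_hash": prev_hash,   # links the entries into an immutable chain
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    body["entry_hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(body)

def verify(chain):
    """An auditor replays the chain; any tampering breaks a hash or a signature."""
    prev = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k not in ("signature", "entry_hash")}
        payload = json.dumps(body, sort_keys=True).encode()
        assert entry["prev_hash"] == prev
        assert hmac.compare_digest(
            entry["signature"], hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest())
        assert entry["entry_hash"] == hashlib.sha256(payload).hexdigest()
        prev = entry["entry_hash"]
    return True

chain = []
add_attestation(chain, "source", {"commit": "abc123", "pull_request_approved": True})
add_attestation(chain, "build", {"unit_tests": "passed", "static_analysis": "clean"})
print(verify(chain))  # True — and flipping any field makes verification fail
```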
And so, when we sat down and we
thought about writing this thing, the objectives
were really threefold. One was
to shorten audit time: could we turn 30-day audits into half-day audits?
The idea was that, instead of the subjective discussions and
comparing screen prints,
could we actually
just show an immutable list —
we don’t really wanna call it blockchain, but it’s sort-of
based on a blockchain model —
could you just show an immutable list of evidence
of a change where no human had…
actually, there was no human interaction…
so literally, you just look at this immutable list and go next, next, next.
The second was, could we increase the efficacy?
I mean, the truth of the matter is, I’ve spent a lot of time with CISOs, interviewing
a lot of people in organizations over the last three or four years,
and most of the audits they have are what I call Security and Compliance Theater —
and that’s before you even get into modernization with
cloud native and microservices, where it’s even way worse.
The risk profiles and the attestations they
sort-of think they need are just completely disconnected from this
rapid deployment structure.
So you find in most organizations
the efficacy of an audit
is extremely low.
So could we increase the efficacy of an audit
from like 20 percent into the high 90s?
And then last but not least, if we could do this,
we could make a viable argument
for moving away from
a change advisory board, or CAB, or centralized authority —
you know, if you think about the original
Capital One article.
And so what we did is we sat down and,
I won’t go into all of this in gory detail
because it’s in the reference architecture. I just wanna expose you to the ideas; if you’re interested,
you can download the Creative Commons copy of it,
it’s on ITrevolution.com.
And so we broke this down into seven stages,
and the idea was to not really focus on
how people perceive the pipeline
but to create boundaries
for attestation. Remember every time I say attestations, I mean evidence.
So, what were the logical blocks or boundaries
for attestations?
We came up with source code, build,
dependency management, packaging and the artifact repository as their own stages,
and non-prod/prod deployment.
And the reason we called out dependency and artifact management:
they sit in the sort of life cycle of
the traditional CI/CD path,
but dependencies and artifacts have their own life cycles,
sorta asynchronous
to it. So we wanted to make sure that that was understood.
And then we converged on what we called
common controls and common actors,
and the controls were, basically, the attestations.
So if you remember what I said earlier, it was Nike, Capital One, PNC, Marriott,
and Sabre.
When it was all said and done we had about
75 attestations. Now I don’t think
one company or one organization or one service
would use all of these,
but it was a reference artifact to show what
could be accomplished. So you could look,
now again I’m not gonna go into it, but
if you take the source code stage,
things like a peer review on a
pull request, or unit test coverage,
clean dependency scanning;
if you get to the build stage,
unit testing, linting,
immutability from the input/output perspective,
and again I’m going fast on purpose because, if you’re interested, this is all in the paper —
every one of these is actually spelled out,
there’s like two or three pages on every one of the control
points in the aggregate.
Dependency management: license checking,
approved external sources, security checks,
aging — so
not allowing
stale artifacts — and approved versions.
The package stage: things like
Notary, or signing with a signature, or —
if you’re going to apply metadata from something like ZooKeeper
at operation time —
making sure that the metadata can’t be hacked
or man-in-the-middled. So again,
a lot more here,
like I said, 75 in all.
The artifact stage: retention period, immutable artifacts.
And then if you look at the
prod and non-prod stages, there are a couple of subtle differences, but
things like the allowed configuration. So,
when you’re dealing with Kubernetes and containers,
there are a lot of opportunities for adversaries —
or for creating openings for adversaries — through misconfigured definitions.
So you want to make sure you’re capturing those types of things as
artifacts for evidence.
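As a purely illustrative sketch of what automating one of these controls might look like — the sources, version lists, and 90-day aging threshold below are hypothetical, not values from the paper — a pipeline step could evaluate each dependency and emit the result as evidence:

```python
from datetime import datetime, timedelta

# Hypothetical policy values; in practice these would come from your
# dependency-management and artifact-repository policy.
MAX_ARTIFACT_AGE = timedelta(days=90)
APPROVED_SOURCES = {"registry.internal.example.com"}
APPROVED_VERSIONS = {"libfoo": {"1.4.2", "1.5.0"}}

def check_dependency(name, version, source, published):
    """Return the list of control failures for one dependency (illustrative only)."""
    failures = []
    if source not in APPROVED_SOURCES:
        failures.append("not from an approved external source")
    if version not in APPROVED_VERSIONS.get(name, set()):
        failures.append("version not on the approved list")
    if datetime.utcnow() - published > MAX_ARTIFACT_AGE:
        failures.append("stale artifact (aging control)")
    return failures

# The result (empty or not) itself becomes an attestation in the evidence chain.
print(check_dependency("libfoo", "1.3.0",
                       "registry.internal.example.com",
                       datetime(2020, 1, 1)))
```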
And then what we tried to do is go through and identify,
as an example — not a recommendation, just an example —
where the control points would be and where they would come from,
things like SonarQube or Checkmarx, and then of course
JFrog XRay,
but again none of these were really recommendations, it was really just a…
it was a sense-making exercise for us to say,
“Does this make sense?” and then,
“Could we go through a quick list of where all these attestations might come from?”
And as the project went, we didn’t get through all of the things we wanted to accomplish.
We finally ended up with a sorta reference architecture for a _ architecture
and Kubernetes, and
we wanted to do a simple _ which is
in the admission controller; it was really very simple in notation.
But what got really interesting is
when we were having these conversations,
when we were writing the paper,
you know at night we go to dinner and you know all the people working on the paper would
sorta hang out, and then we started thinking about like, if you could create
this DevOps automated governance architecture,
then could we start thinking about templates,
so like, advisable templates for these things.
And if you could do that,
could you actually create human readable code
to apply those templates to? And so
it was really sorta just a dinner conversation, but one of the banks went out
and went full into this. They put a bunch of resources into it,
what we’re calling now Policy as Code.
So we’re actually going to do this this summer, start a second
version of this document where we’re gonna really focus in on policy
and I’m going to show you some of this stuff — me as an advisor — but one of the banks has really taken
this to a whole new level, sorta past the reference architecture.
And so, here’s some of the sort of
principles for governance, human readable, platform agnostic, durable,
and again I’ll let you read this with the slides, condition parameters.
But
here’s the thing: what this one company did is
they went in and created
what they call pack files — policy-as-code files.
And here’s the interesting thing,
these are human readable
files that the policy people now are engaged in.
So the actual policy people… here’s what happens now: service owners
that wanna go through this sorta newly defined way of processing stuff —
and there’s a lot of advantages, so people want to do this —
have to go to the policy people, and this was built
by design with the policy people,
and then they will actually
come up with a human-readable pack file definition
that will be associated with that service,
and then the things we talked about with attestations
will actually be defined in it. So if you look at pipeline versioning:
every
application or service has to have
a mnemonic for the service, a component ID,
and a version.
And then later down you can see things like unit test coverage,
and a pull request review — so these things actually get written
in collaboration between the service owner and the
actual policy people,
and by the way, now the policy people help design the requirements.
And these become immutable because they actually get stored
in source control
along with the artifacts used to deliver the service.
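To give a feel for the shape of this — the field names, thresholds, and the Python representation below are purely illustrative, not the actual pack-file format the bank uses — a pack file pairs the service’s identity with the controls it must evidence, and the pipeline checks its captured facts against it:

```python
# A hypothetical "pack file" for one service, shown as a Python dict for brevity;
# the real artifact is a human-readable file agreed with the policy team and
# versioned in source control alongside the service.
pack_file = {
    "service": {"mnemonic": "payments", "component_id": "payments-api", "version": "1.7.3"},
    "controls": {
        "unit_test_coverage_min": 80,   # percent
        "pull_request_reviews_min": 2,
        "static_analysis_required": True,
    },
}

def evaluate(pack, pipeline_facts):
    """Compare the pipeline's captured evidence against the pack file's controls."""
    c = pack["controls"]
    checks = {
        "coverage": pipeline_facts["coverage"] >= c["unit_test_coverage_min"],
        "reviews": pipeline_facts["pr_reviews"] >= c["pull_request_reviews_min"],
        "static_analysis": pipeline_facts["static_analysis_clean"]
                           or not c["static_analysis_required"],
    }
    return all(checks.values()), checks

ok, detail = evaluate(pack_file,
                      {"coverage": 84, "pr_reviews": 2, "static_analysis_clean": True})
print(ok, detail)   # both the decision and the detail become evidence
```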
So there’s no gap — you have the
DevOps automated governance architecture; I’ll show you how it’s been advanced here in a minute.
So what happens is, every time you do sort of a merge,
you’re gonna pull in all the evidence of the commit,
you’re gonna pull in the pack file
as it was at that time,
with all the other artifacts,
and that’s gonna be the immutable evidence
that will end up, in this case, in Grafeas as an attestation store.
So now you’ve got the best of both worlds. You don’t have policy people trying to give spreadsheets to
infrastructure people, service people,
and service people having to interpret those spreadsheets into things
that wind up being
maybe gated,
with very little of it evidenced.
Now you have everything — you have the
gating and the evidence all built into one.
And this is where it gets really cool; I’ll show you the architecture one bank is using.
They’re actually now using OPA and Rego,
and if you remember, in that pack file there was
a pipeline versioning policy, right, so that all
services had to have a mnemonic, a component, and a version.
So basically, the way to think about it is the pack files
are sort of the interface definitions — the collaboration between
the risk people and the service owners —
and then the implementation
could be something like Rego.
So here’s an example of
actually using the pack file and Rego
to control in Kubernetes
whether something is allowed from a policy perspective. This gets really, really cool.
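Here is a minimal sketch of that split, assuming a local OPA server started with `opa run --server` on its default port; the Rego policy and the input shape are illustrative stand-ins for the pack-file-to-Rego mapping, not the bank’s actual policies.

```python
import requests

# An illustrative Rego implementation of the "pipeline versioning" control:
# a deployment is allowed only if the service declares a mnemonic,
# a component ID, and a version.
REGO = """
package pipeline.versioning

default allow = false

allow {
    input.service.mnemonic != ""
    input.service.component_id != ""
    input.service.version != ""
}
"""

OPA = "http://localhost:8181"

# Load the policy into OPA.
requests.put(f"{OPA}/v1/policies/pipeline-versioning", data=REGO).raise_for_status()

# Ask OPA whether a particular deployment request satisfies the policy.
decision = requests.post(
    f"{OPA}/v1/data/pipeline/versioning/allow",
    json={"input": {"service": {"mnemonic": "payments",
                                "component_id": "payments-api",
                                "version": "1.7.3"}}},
).json()
print(decision.get("result"))  # True only when all three fields are present
```

In a Kubernetes setting the same decision would typically be enforced through an admission controller (for example OPA Gatekeeper) rather than called directly like this.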
I mean, we’re even talking about, in this next version:
if you have all this, you could do
policy error budgeting,
right? So here’s sort of a sample architecture — it’s a little more complicated than this, but
you basically put Kafka on both ends. Actually, right now we’re working
on trying to figure out whether we could put in —
and I say “we” in an advisor mode —
a serverless architecture
in between, so when you get to some level of volume
it’s Kafka, then a serverless implementation,
Knative, then Kafka, and then into this enforcer evidence engine,
and the enforcer can integrate with OPA. It’s just really cool.
Hopefully in the next reference architecture definition we’ll get the open source for some of this stuff but either way we will have
a pretty robust reference architecture of how this works.
Another thing that’s really cool about this is one of the problems you have in most enterprises today is
that,
in my experience, very few enterprises… let me step back here.
One of the problems that you have with sort-of audit
in the enterprise is that
most of
the audits are based on a traditional service management model,
where every change
has to be associated with a service owner,
and service owners traditionally are associated with
a CMDB CI, a configuration item.
I can’t tell you that I’ve visited, in the last, you know,
three or four years,
any large institution that told me
honestly that their CMDB was more accurate than 25 percent…
So if the whole model of your
evidence
in an audit is based on this sort of false idea,
and not based on what’s happening in, like, Git, you know,
GitHub, across to sort of a Jenkins, or a build model…
Like,
the beauty of this model is that,
in order to play,
you have to actually define things,
so you really start creating an emergent configuration management database —
because you have to define a mnemonic, a component, and a version,
and by definition all the services that are sort of building and gaining
more value in this process
are actually becoming the emergent CMDB.
Not only that, you have this continuous
audit replay you can look at,
because remember I said earlier these artifacts are immutable, and the fact that
you know exactly what version of a pack file got pushed —
and in fact the Rego files as well
got pushed at that time — means you can always go back and sort of replay.
And the other thing — remember what we talked about with error budgeting —
now you can actually sort of analyze the
continuous evidence, or continuous compliance if you want,
or what we’ve been calling automated governance,
for the things that fail,
right?
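A toy sketch of what that could look like — the 1% budget and the pass/fail counts are made-up numbers, just to show the arithmetic of a policy error budget computed over the immutable evidence:

```python
def error_budget_report(control_results, budget=0.01):
    """Summarize how much of a policy-failure budget a service has burned."""
    total = len(control_results)
    failures = sum(1 for r in control_results if not r["passed"])
    burn = failures / total if total else 0.0
    return {"failures": failures, "total": total,
            "burn_rate": burn, "within_budget": burn <= budget}

# 200 control evaluations replayed from the evidence store, 5 of which failed.
results = ([{"control": "unit_test_coverage", "passed": True}] * 195
           + [{"control": "dependency_aging", "passed": False}] * 5)
print(error_budget_report(results))  # 2.5% burn rate against a 1% budget
```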
So, one more subject I wanna talk about:
that paper came out, I’ve been talking about it
for about a year or so, and then
there’s a group out in New York called the Open Networking User Group, ONUG,
run by a guy named Nick Lippis,
and their board members are like the largest banks
in New York and really in the world.
One of their focuses over the years has been software-defined networking, SD-WAN,
and they’ve been moving more into
DevOps, and
some of the board members had seen this paper that we did,
and Nick Lippis knew me, so he reached out and asked me if I would want to help
drive a cloud automated governance effort based on
that paper that we did.
So we got together, and
this time we created —
and this is really cool, because the people that were involved in this one were
FedEx, Kaiser Permanente,
Cigna,
and JPMorgan Chase —
and I think… yes, JPMorgan Chase — and then,
indirectly, Don Duet, who was the VP of engineering for Goldman
and is independent now —
and we focused on the relationship of
attestations from the cloud providers to the tenant.
Now we’re going through this really quick. Again, this is a
you know,
Creative Commons book,
a paper that’s available from ONUG. You know,
it’s pretty easy to find, you can download it. You do have to fill out
some names and stuff like that, but it’s a free book.
Actually, about a month ago now —
a little less than a month ago — the Wall Street
Journal wrote an article about this paper.
It was actually sponsored by
FedEx, Cigna, and Kaiser Permanente, so
the three CSOs who sponsored this
were interviewed by the Wall Street Journal,
talking about the work we did in this paper.
So really, really significant work.
I’ll try to summarize it — again, I’m not going to go into gory detail;
like I said, I’ll leave that to you, the listener, the reader, if you’re interested. Obviously,
my contact information will be all over this presentation
if you wanna talk about it. If anybody knows me, I love talking about this stuff
and exploring, but… our goal in this paper was,
we had three goals,
and they were really focused on the cloud providers showing evidence back to the tenants,
or the consumers of those clouds.
So one of the things we wanted to make sure of was whether we could ask the community —
and one thing I wanna be clear about too:
initially when we sat down, everybody said, “You’re crazy if you think all three cloud providers —
or all five cloud providers, or however many there are —
are actually going to take your advice.” Even though
we had about 25 billion
in spend
represented on that team,
I said from the get-go, “Let’s not worry about
trying to convince Amazon or Google
or Microsoft —
that shouldn’t be the focus of our paper. What our focus should be
is convincing a hundred other companies
that represent a trillion dollars in asset buying power,”
and if they all agree with this paper, then maybe we’ll…
by the way, we’re doing a second version this summer
and already two of the top three cloud providers are in,
so mission accomplished.
So here are the three things.
One is,
could we get a unified format?
So that means, each cloud provider
agrees to create a unified…
normalized version of some type of signature that tells you
that nothing’s changed
from the way they did sort of _.
So it’s a signature that tells you
that you know the known state;
they don’t have to give away any of the IP of how they do it.
To keep it real simple — I know it’s not this simple —
imagine a checksum
that told you that the
posture of anything you’re looking at hasn’t changed. At their scale,
a change to that boot sequence, or boot infrastructure,
could actually create
new opportunities for adversaries.
Or, for example, sort of
the first principle of every incident review
is: go back to the last change. So imagine if we could get,
normalized across all the providers,
an event that tells us,
every time we do a new deploy or once a day,
that nothing has changed in that sequence —
or, when something goes wrong, we can check that sequence.
So again, a lot more detail.
The second point is,
since all
cloud providers see all Ingress
requests from a tenant, a consumer,
all the API calls,
could they actually expose that back to the tenant
in some normalized format?
And the reason why you’d want this is,
today a lot of consumers of cloud will scrape logs —
they scrape different logs, different logs for different providers,
there’s no normalized way even in large organizations —
and having to scrape logs and do all the things like that
is brittle, you know, the logs change.
Could we just say:
since you’re seeing everything, couldn’t you just spit that back through an event gateway,
something that we could process through a sort of
Knative or some type of gateway process,
so that we could get a firsthand look at what you see?
And then for things like server-side request forgery, or
some runaway activity,
or somebody that’s sort-of
not following the rules and didn’t put the right metadata on, it would be much easier
to have a single point of control to identify
any sort of anomalous behavior that actually could get us in trouble.
Last but not least is
to look for a normalized structure for security frameworks.
All the different products have security frameworks,
but they speak different languages
and they don’t really talk to security professionals, so
again, there’s a little more on that in the paper.
This is the model for normalization. Just winding down here,
the last thing is, one of the things we tried to do is create pseudo-code —
it was based on _ — and I wanted to try to explore
a different model, seeing if we could try to do something with _.
Here I’ve got a couple of examples based on those two models.
One is the boot integrity:
this is basically just a _ checking to see if
that sort of checksummed
boot sequence hash has changed.
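A minimal sketch of that first idea — the boot-sequence fields and the event shape here are hypothetical, just to show the comparison against a known-good hash:

```python
import hashlib, json

def boot_posture_hash(boot_sequence):
    """Hash a normalized description of the boot sequence/posture."""
    payload = json.dumps(boot_sequence, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

# The last attested known-good state (fields are illustrative).
KNOWN_GOOD = boot_posture_hash({
    "firmware": "v2.1",
    "bootloader": "grub-2.04",
    "kernel": "5.4.0",
})

def check_boot_event(event_hash):
    """Compare a provider-published posture hash against the known-good one."""
    if event_hash != KNOWN_GOOD:
        print("ALERT: boot posture changed since the last attested state")
    else:
        print("boot posture unchanged")

check_boot_event(KNOWN_GOOD)  # "boot posture unchanged"
```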
And then this is one where, if we’re receiving all the
ingress traffic to a provider
and we were to listen in on it,
we could look for metadata
that should have been there, and if it’s not,
then we could go looking for sorta nefarious actors like a crypto miner and things like that.
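And a sketch of that second idea — the normalized event shape and the required tags are made up for illustration, since no such normalized feed exists from the providers today:

```python
# Tags every workload is supposed to carry under the pack-file/CMDB model.
REQUIRED_TAGS = {"service_mnemonic", "component_id", "cost_center"}

def flag_anomalies(events):
    """Yield API-call events that are missing required metadata."""
    for event in events:
        missing = REQUIRED_TAGS - set(event.get("tags", {}))
        if missing:
            yield {"caller": event.get("caller"), "api": event.get("api"),
                   "missing_tags": sorted(missing)}

events = [
    {"caller": "payments-api", "api": "RunInstances",
     "tags": {"service_mnemonic": "pay", "component_id": "payments-api",
              "cost_center": "42"}},
    {"caller": "unknown", "api": "RunInstances", "tags": {}},  # e.g. a rogue crypto miner
]
print(list(flag_anomalies(events)))  # only the unknown, untagged caller is flagged
```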
And I just wanted to end with — as always, I talked about Kit Merker, I talked about JFrog,
I really do love the JFrog family,
they invite me to speak, I guess that’s why I like them,
and I guess they like me —
but this book, Liquid Software, talks about
a lot of the principles behind automated governance and this sorta creating trust in the component pipeline.
That was another “I think I’m doing the right thing” moment a couple of years ago:
when I read this book and it talked about Grafeas,
it helped me see that more than one person was telling me that this is needed,
and everything just fell into place.
It’s just a fabulous book. I always try to end these presentations saying
that it’s a quick read, and it really sets the mindset perfectly.
Anyway, thank you so much for listening, I hope you enjoyed and please,
if anything of this interests you, I’m pretty easy to find,
reach out to me. I love to have discussions about this stuff.