Spear-ited Guidance: Learning About DevSecOps
Let’s talk about security in an organization. Most commonly, security sits at or after the last phase of the software development life cycle (SDLC) and can make or break the decision to release into production.
Unfortunately, waiting on such decisive feedback until after something has been built frequently results in needing to make changes after it’s been marked as ‘complete’, which is costly and inefficient.
Instead, let’s learn from how we created shorter development cycles – instead of making Big Decisions at the very end, make smaller, iterative decisions throughout the entire journey that are easier to implement or reverse.
One way to do that is by implementing DevSecOps, which adjusts the workflows of development, operations, and security so that security decisions are made on smaller scales at every phase of the SDLC.
As with development and operations, even with preparation there can still be incidents – in this case, security incidents – so I’ll also be reviewing our 14 Step Secure Incident Response process, including the what and why of each step.
Video transcript
Hello, my name is Quintessence and this is Spear-ited Guidance: Learning About DevSecOps. Don’t panic, there will be a few text-heavy slides, but whenever I use something that’s going to be referenced in my resource materials, I have this handy little link on the slide so you will be able to actually click on anything that I’m talking about, and it’ll all be made available to you. So let’s talk a little bit about the current state of the industry.
Usually, when you have some sort of issue or request, there’s a ticketing system involved. So you have a human who puts through a request, it tells you something that you need to fix or add, and it gets handled in the normal way. And then it goes through to issues. This is all pretty straightforward; we’ve seen it a lot. And every time we actually build out these features and fixes and so on, we’re using the software development lifecycle, right?
Plan, analyze, design, implement, test, integrate, maintenance mode. But what’s missing is the security wall. Things get vaulted over the wall to do this last step of the software development lifecycle, for lack of a better way to say it, a step that’s not actually even in that diagram. So what happens is you get everything basically all ready to go.
It’s in some sort of finalized state, or it’s considered to be finalized, and then it gets vaulted over to security. And maybe it wasn’t so done: maybe there are things that are known to be missing, exploits that are not being taken care of, other things like that. And so when that happens, security vaults it back over and says, hi, here are the items that you need to fix. And it can be a little contentious, right? Because when you have something like this going on, one side had the mentality of “this is done” and the other side had the mentality of “not yet, I didn’t even see it.” And then one of the other things that can result from this is people viewing one group or another as a roadblock or an unnecessary process.
We’re not feeling listened to, we’re not feeling appropriately communicated with. Or you can have a situation where there’s not enough information, or not the correct information, right? When you’re sending a request and you don’t necessarily know what to include, you might be including too much or too little of the wrong or mismatched information, and so on. So this is very frustrating. How do we resolve these issues?
Well, one way is to implement DevSecOps. And what is DevSecOps? DevSecOps stands for development, security, and operations, and it seeks to integrate security across the software development lifecycle and streamline the workflows of those three groups.
To be very clear about what DevSecOps is not: it is not replacing security with Dev and/or Ops, it is not expecting Dev and/or Ops to become security specialists, and it is not expecting security to become development or operations. Essentially, DevSecOps is trying to do for security what DevOps did for development and operations.

If we think about pre-DevOps life, or the pre-DevOps transformation, what you saw happening was the same kind of communication pattern between Dev and Ops: things were getting vaulted over, communication wasn’t efficient, groups were getting frustrated, grudges were being held. So when we’re trying to wrap security into this process, so that they’re included across the lifecycle, we’re trying to resolve these issues the same way we did before. But more specifically than “the same way we did before,” how exactly are we trying to do this? There’s the secure software development lifecycle, and then there’s shifting left. So first, we need to talk about the secure software development lifecycle, which is about breaking down barriers: you want to wrap security around the DevOps lifecycle, and very briefly, it can look like this.
No detail whatsoever, just know that there are security steps at each step. Or you can have something a bit heavier like this, which starts to discuss what type of security activity can be done at each stage and here you can see different names for activities. So for example, you have secure architecture and design. So going back to the features and fixes or even entire, you know, services, applications, etc.
If you’re bringing security in at the design phase, and you’re having a conversation with them along with any other architects that are involved, you can say, oh, I think I want to build it out with this, and I want to rely on these things. And then they can say, oh, did you know there’s a vulnerability with this one that you can resolve with this? They can be a part of that conversation so you don’t find out later that you’ve chosen something that had vulnerabilities or other issues you weren’t aware of.

Similarly, you can do threat modeling, which is another design phase activity. That is when you mentally model, and then literally model, what types of attacks you’re at risk for and what you can do about them. With security, again, you try to involve all of the experts that need to be involved at this stage, right? Depending on what you’re designing this will vary a little bit, but you’re going to have a conversation where you say, I intend to do this.
If someone were going to be malicious, what would they do? Simply put. And then there’s also testing, which is static and dynamic application security tests (SAST and DAST). Some of these things can be pipelined, and some of them cannot and should not be due to their duration, but the idea is to implement security tests against images, dependencies, and so forth in the pipeline. So instead of vaulting back and forth between these groups, you can say: just like I have a CI/CD pipeline to do QA tests and so forth, I can put these security tests at their appropriate stages in the pipeline. Then they’re running automatically, and the person who needs to review them will have visibility at those stages.
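To make that concrete, here is a minimal sketch of what a pipeline security gate can look like. It assumes a Python codebase and two open source tools, bandit for static analysis and pip-audit for dependency checks; it is an illustration of the pattern, not a prescribed toolchain.

```python
#!/usr/bin/env python3
"""Minimal sketch of a pipeline security gate.

Assumes a Python codebase and the open source `bandit` (SAST) and
`pip-audit` (dependency audit) tools; swap in whatever scanners fit
your stack. Illustration only, not a prescribed pipeline.
"""
import subprocess
import sys

def run_gate(name: str, cmd: list[str]) -> bool:
    """Run one security check; a non-zero exit code means findings."""
    print(f"--- running {name}: {' '.join(cmd)}")
    result = subprocess.run(cmd)
    return result.returncode == 0

checks = {
    # Static analysis of our own source tree (fast enough to run per commit).
    "SAST (bandit)": ["bandit", "-r", "src/", "-ll"],
    # Known-vulnerability audit of installed dependencies.
    "dependency audit (pip-audit)": ["pip-audit"],
}

if __name__ == "__main__":
    failed = [name for name, cmd in checks.items() if not run_gate(name, cmd)]
    if failed:
        print(f"security gate failed: {', '.join(failed)}")
        sys.exit(1)  # fail the pipeline stage so findings surface early
    print("security gate passed")
```

The point is only that the checks run on every change and fail loudly, so findings surface at the stage they belong to instead of at the very end.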
Instead of finding out at the end, when you thought this was done, you can actually find out right here or right here, which is earlier, and you can have better outcomes with that.

And fuzzing I just wanted to mention, partly because it’s fun to say, but also because it’s a good idea to make sure you’re checking all of your potential inputs for what happens to your application if someone tries to throw garbage at it, or legitimate code at it. This isn’t just to make sure that you’re sanitizing your input. Realistically, if someone tries to take down your application by putting a binary file in a text input, or something like that, do you fail gracefully? Do you just reject the input? What happens? Those types of things are very important; there’s a small sketch of a fuzz harness below.

And of course, as you can see in the diagram, there are a lot more than just these activities, but these are some things to get you started thinking about what to do at the different phases in the software development lifecycle. And to the point of why this is called shifting left: if you think of that diagram, and you look at what’s on the screen here, you’re shifting left because you’re literally shifting earlier in the way the diagram is laid out. So instead of waiting until the end, you’re shifting earlier, shifting left in that diagram.
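Here is that fuzz harness sketch: a crude loop that throws random bytes at an input handler and flags anything that is not a graceful rejection. The `parse_user_input` function is a hypothetical stand-in for whatever handles untrusted input in your application; real fuzzing tools are far smarter about how they generate inputs.

```python
"""Crude fuzz harness sketch: throw garbage at an input handler and make
sure it fails gracefully (rejects the input) instead of crashing.
`parse_user_input` is a hypothetical stand-in for illustration."""
import random

def parse_user_input(data: bytes) -> str:
    # Stand-in handler: accept only short, printable UTF-8 text.
    text = data.decode("utf-8")  # raises UnicodeDecodeError on garbage
    if len(text) > 256 or not text.isprintable():
        raise ValueError("rejected input")
    return text

random.seed(1)  # reproducible garbage
for trial in range(10_000):
    garbage = bytes(random.randrange(256) for _ in range(random.randrange(512)))
    try:
        parse_user_input(garbage)
    except ValueError:
        pass  # graceful rejection (UnicodeDecodeError is a ValueError too)
    except Exception as exc:  # anything else is a finding worth filing
        print(f"trial {trial}: unhandled {type(exc).__name__}: {exc!r}")
```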
Something that’s important to mention is that you do not have to do this yourself. And I do not just mean please hire security people, although also please hire and work with security people. But you do not have to design every test by yourself, right?
If you’re thinking to yourself, oh no, how do I design a static application security test? Or where should I use it? Or when should I use it? Or you’re hearing things like threat modeling and aren’t sure what conversation to have: there are frameworks, there are things you can use that describe all the activities, and you can see how mature your organization is compared to certain standardized benchmarks. A few that are commonly referenced are BSIMM, DSOMM, and SAMM, and I’m providing links to those later.
But the idea here is that, again, you don’t have to start from scratch, and your security people don’t have to start from scratch. You can pull from this existing tooling and these frameworks and say, okay, how do I want to customize this for what I know we’re doing here? And all of this is great, right? How do I get this done, though? Because once I have all of the tooling, how do I get people to implement the tooling, the tests, and all the conversations? You build cultural support, which means that this section of the conversation is about humans.
DevSecOps is supported by humans, and humans are supported by technology that is supported by another group of humans. It’s a very human experience, and I covered the basics of what can go into this, but let’s talk a little bit more about humans. So part of the reason I named this talk Spear-ited Guidance has to do with a literal sphere, spear…
I can’t apparently say it correctly: it is a spear. And the idea is that you have what’s called the blunt and sharp end, and we’re not unique in using this metaphor, but basically it’s about who has the power and who implements the decision. Usually you have a decision-maker layer and then you have an implementation or individual contributor layer, and they’re usually not the same layer. Although, of course, depending on what decisions are being made, there can be overlaps in different areas.
But the idea is you’re guiding the spear from the back or you’re poking with it at the front. So blunt versus sharp, and it’s all relative. And because it’s all relative, you need to have the appropriate level of buy-in, and this builds on who’s making the decision versus who’s implementing it. Whoever is implementing it might have a really good idea about what they think should be implemented, how it should be implemented, the timelines it should be implemented on, and so forth, but they need to get the buy-in from a manager, skip levels, and so forth in order to say: we need to make time for this project, we need to dedicate people and focus to get this done. That’s the purpose of getting buy-in. Once you have that buy-in, you can also get a certain amount of support from other groups as well, because then you can say, okay, the reason our group isn’t doing this other project, which is also important, is because this project is important for these reasons.
Pivoting a little bit, something that I always want to mention is: never trick staff. Ever. This goes back to common exploits that you sometimes see pop up on Twitter or elsewhere, where you hear people talking about, oh, I clicked the phishing link, and now I have to take a 30-minute module or something like that. And it’s not ideal, right? There are lots of other threads where you can see people talking about some legitimate emails looking fake, because email design is hard, and some fake emails looking legitimate, because design takes time and attackers can do it correctly. Because of this, it’s easier, and it builds more trust, to show common exploits rather than deploy them on people. And the main benefit of this is that people will actually trust you when they have accidentally clicked something that they shouldn’t have, or when maybe they’re using an email client that downloads an attachment automatically, right?
That’s just how it happened, and now they have a compromised machine, and you want to teach them that they can reach out to you. If you have a contentious relationship, the ‘I gotcha’ relationship, then that’s not the type of relationship that fosters these conversations. So it is just so very important:
Don’t trick staff, but show staff. If you’re doing your annual security training, show them what those common exploits are. And speaking of that training, it means you can customize that training. So you can say, oh, actually, we do really well on phishing, people rarely click the links, if we happen to deploy test ones just to check, or whatever you’re doing with that information.
You can say, okay, people do really well with this, but actually people have a really hard time understanding something like multi-factor authentication, or why they should enable it, or why we have forced it to be enabled, depending on the situation. So let’s teach them about that. When you get to have customized trainings, you can actually take the time to talk more about where everyone’s not doing so great, talk less about where people are doing really awesome, and maybe give them props for that.
It also really helps if you make the training somewhat engaging. So show a cross-site scripting attack; show how easy it is to implement some of these, we’ll say, easier types of exploits, so that when people are thinking, oh, I didn’t think people would do that because I thought it was harder to do than it apparently is, they can keep that in mind when they’re trying to keep themselves secure. It’s also really helpful to have slightly separate trainings for staff in general versus the non-security engineers who are actually working on code, infrastructure, and everything else. This is also something, by the way, that you don’t have to start from scratch on. We have a guide on this at sudo.pagerduty.com, also linked in my materials. When you take a look at that guide, you can see that we have some templated staff training and security-for-engineers training that you can use, customize, and build off of, so that you start from a good starting point.
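As an example of how little it takes to demonstrate that in a training session, here is a toy reflected-XSS illustration: the same attacker-controlled string rendered into a page with and without escaping. The template strings are invented purely for the demo.

```python
"""Toy training demo of why unsanitized input is dangerous: reflected XSS.
The page templates here are hypothetical, just for illustration."""
import html

user_input = '<script>alert("gotcha")</script>'  # attacker-controlled string

# Rendered raw, the script tag would execute in a victim's browser.
unsafe_page = f"<p>Hello, {user_input}!</p>"

# Escaped, the same string is rendered as inert, visible text.
safe_page = f"<p>Hello, {html.escape(user_input)}!</p>"

print("unsafe:", unsafe_page)
print("safe:  ", safe_page)
```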
Something that can help build a certain amount of rapport between security and the other groups is something called full-service ownership. This is also something I’ll be linking to in my references and resources. Full-service ownership is the idea that if you’re maintaining it, or you’re building it, you own it, so that there’s a clear path for who do I contact if something’s down, who do I contact if I have a question, and that sort of thing. The reason this can be beneficial for security as well is that you can have a security specialist or a security team own a service relevant to them, like Vault, right? They can get an understanding of what the production requirements are for development and operations, and they can use that understanding to help build empathy. It’s bidirectional, right? It builds empathy backwards, too: okay, so this is why you make the requests that you do, or are working on the timelines that you are.
Circling back to development and operations, but also security too, because they might do this for fun: there are these games that you can play, for lack of a better way to say it, called capture the flag. This is kind of a digital equivalent of the game on the ground where you try to capture the flag, but in this type of setup, you might have a server that you need to grab a file from, or maybe you need to access something owned by root but not as the root account, and you have to use certain exploits to get there. The idea is to improve security posture, not only if you’re a security specialist, sure, but also if you’re not: your other engineers. This goes back to a comment I made earlier about, oh, I didn’t realize it was this quick and easy to do this certain thing. If you make it into a game like this, where you say, try to access this file, or try to log in without logging in, you can actually have people realize, oh, this is why you make these requirements the way that you do.
A conversation for both groups to have together is something about threat modeling. And I touched on this a little bit before.
So threat modeling is the idea of trying to figure out what you’ve designed and how it’s potentially at risk, assessing that risk, and seeing what things you can mitigate and what things you can’t. The idea of having not just security in this conversation, but other groups as well, is that they can bring their domain expertise in. So when security is talking about, oh, you need to protect against this because of this type of vulnerability, you can have other groups saying, well, can we design around it this way, and have it be an actual conversation instead of a bulleted list that they receive. This again helps improve the communal security posture, so that everyone involved in the conversation can actually learn something as they’re trying to build out whatever the common goal is. This is also one of the many things that has a framework you can jump off of instead of building from scratch, like STRIDE, also linked in the resource materials.

And after all of that, there will still be security incidents, right? You do your best effort, you assess your risks, but ultimately you’re going to get surprised by something. Because of that, I want to talk a little bit about secure incident response. The next slide is text heavy, but I’m going to be going through the steps one by one, so don’t panic.
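Before walking through those steps, here is a small, hypothetical sketch of what writing down that threat modeling conversation can look like, using the STRIDE categories. The component and threats below are invented purely for illustration.

```python
"""Illustrative sketch of capturing a threat modeling conversation as data,
using the STRIDE categories (Spoofing, Tampering, Repudiation, Information
disclosure, Denial of service, Elevation of privilege). The component and
threats are hypothetical examples."""
from dataclasses import dataclass

@dataclass
class Threat:
    stride_category: str  # which STRIDE category this maps to
    description: str      # what an attacker could do
    mitigation: str       # what the group agreed to do, or "accepted risk"

login_service_threats = [
    Threat("Spoofing", "attacker replays a stolen session token",
           "short token TTL plus MFA"),
    Threat("Information disclosure", "verbose errors reveal which usernames exist",
           "uniform 'invalid credentials' response"),
    Threat("Denial of service", "credential stuffing floods the login endpoint",
           "rate limiting per source IP"),
]

for t in login_service_threats:
    print(f"[{t.stride_category}] {t.description} -> {t.mitigation}")
```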
This is the 14 step process that we abstracted out for how a security incident generally runs. And you’re not necessarily going to do every single step for every single incident but you will mostly be ordering them in this way for the ones that you do actually implement.
The first thing to talk about is stopping the attack in progress. The idea is that you need to balance the need to bring in others against the need to stop the attack quickly. Some of the things in the secure incident response process, you may notice, deviate from a non-security but still engineering production incident. In another kind of incident, you might try to get all your subject matter responders in one place and then triage and handle it that way. But with a security incident, you could have data that’s vulnerable, or other situations you don’t want to keep escalating through your environment, so it’s very important to stop it early. If you can quickly stop it early, do so.

Then cut off the attack vector, whatever it is. The idea is, if you have a compromised server and you come across a spot in the network where you can block a port, block that port. This is cutting off the attack vector, which is slightly different from stopping the attack in progress. The way to mentally model these is: let’s say someone broke into your house. Step one would be to get them out, and step two would be locking the doors and windows. This is the doors and windows step.
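For a flavor of what “block that port” can look like in practice, here is a hedged sketch. It assumes a Linux host with iptables and root privileges; the port number is an arbitrary example, and your environment may use security groups or another firewall entirely.

```python
"""Hedged sketch of 'cut off the attack vector': block a port on a
compromised Linux host. Assumes iptables is available and the script runs
with root privileges; the port number is a made-up example."""
import subprocess

SUSPECT_PORT = "445"  # hypothetical port the attacker is using

# Drop new inbound TCP traffic to the suspect port.
subprocess.run(
    ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", SUSPECT_PORT, "-j", "DROP"],
    check=True,  # raise if the rule could not be added
)
print(f"inbound tcp/{SUSPECT_PORT} is now dropped; record this in the incident timeline")
```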
After this, assemble a response team, again with the caveat of balancing the severity of the attack against the expertise of the person who noticed the attack in progress. Once you’re at this step, you need to prioritize everyone who has the appropriate expertise to take action over stakeholders, so this is not a communication step at all.
This is solely to bring in people who can continue to respond to the incident and handle the next step I’m going to be talking about in a moment here.
When you’re at this stage, it’s important to note that you’re probably going to need to assume hostile intent while you’re stopping an attack in progress, but you can mentally shift into a neutral or positive intent once it’s resolved. The idea with this mind shift is that when you don’t know the intent, you have to assume hostile, because you have to map out the highest risk and where someone with hostile intent would go. But after that’s taken care of, you can say, you know, people do things by accident all the time, download attachments from spam, or whatever they’re doing. That’s when you can shift into a more neutral stance, once you hopefully don’t have reason to continue presuming a hostile one.
You also want to make sure you’re isolating anything that’s affected if there’s a breach. The idea here is that if something’s trying to access a database or any other data source, you want to make sure those are isolated and contained, and not continuing to be accessible in this way. You also need to, at this point, identify the timeline of the attack.
So you’ve noticed the attack now, roughly, for some definition of now: it might be 30 minutes ago, it might be an hour ago, but the total duration of the attack might be longer. We might recall the Equifax breach from some time ago. What happened there is they noticed an attack at some point, but when it was dug into, it had been going on for a long time. The same can be true of any security incident: you know that somebody is in there now, but they may have been coming in and out without tripping alarms or anything like that. So establish that timeline; figure out how long they’ve been around, or how long this attack has been going on.
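As a toy illustration of establishing that timeline, here is a sketch that scans one log file for a known indicator of compromise and reports the first and last time it appears. The file path, the indicator, and the assumption that each line starts with an ISO-8601 timestamp are all hypothetical; a real investigation spans many log sources.

```python
"""Sketch of establishing an attack timeline from logs: find the first and
last occurrence of a known indicator of compromise. Path and indicator are
hypothetical; real investigations span many log sources."""
from datetime import datetime

INDICATOR = "203.0.113.7"             # example attacker IP (documentation range)
LOG_PATH = "/var/log/app/access.log"  # hypothetical log file

hits = []
with open(LOG_PATH) as log:
    for line in log:
        if INDICATOR in line:
            # Assumes each matching line starts with an ISO-8601 timestamp.
            hits.append(datetime.fromisoformat(line.split()[0]))

if hits:
    print(f"first seen: {min(hits)}  last seen: {max(hits)}  events: {len(hits)}")
else:
    print("indicator not found in this log; widen the search window and sources")
```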
You want to make sure you’re handling both scope and severity: what did they have access to, and how sensitive was it? While you’re looking at how sensitive it was, try identifying the compromised data, if any. There might not be any; that might not be what they were doing. But if there is, make sure you’re identifying it and, again, assessing how sensitive that compromised data is. Then you need to look at other systems. The idea is, if someone comes through entry point A, where can they go? Or can they use that same exploit on entry points B and C, and so forth? What is the risk to these other systems in your environment? Then you need to assess the risk of re-attack: were you able to permanently fix this, or did you only temporarily fix it when you cut things off and handled the incident earlier?

Is there an unpatched CVE, for example: a critical vulnerability where you were able to patch around it or architect around it, but at its core you need an update for a dependency or something to truly fix it? This goes into your assessment. You also want to make sure that you’re placing any additional mitigations. I made the example of Equifax earlier, but with anything like that, if something intruded on your system and your monitors didn’t catch it, then while you’re assessing how that happened, you can set up monitors to capture that behavior in the future. You can also do key rotations, or change anything that was compromised where it makes sense to, to prevent it from being reused later.
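Here is a minimal sketch of that rotation step: mint a cryptographically strong replacement secret and record when the old one was revoked. The key ID is invented, and in practice this would go through your secrets manager rather than a plain dictionary.

```python
"""Sketch of the 'rotate anything compromised' step: mint a replacement
secret and record when the old one was revoked. Storage is hypothetical;
in practice this would go through your secrets manager."""
import secrets
from datetime import datetime, timezone

def rotate_api_key(old_key_id: str) -> dict:
    new_key = secrets.token_urlsafe(32)  # cryptographically strong replacement
    return {
        "revoked_key_id": old_key_id,
        "revoked_at": datetime.now(timezone.utc).isoformat(),
        "new_key": new_key,  # deliver via the secrets manager, never chat/email
    }

record = rotate_api_key("api-key-2023-01")  # hypothetical key ID
print(record["revoked_key_id"], "revoked at", record["revoked_at"])
```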
If you set aside the affected systems so that you could analyze them later, then now that everything is resolved and scoped, this is when you’re going to do a forensic analysis of those compromised systems. And again, you do this only once everything is solved as best as possible, right?
This is when you want to see what actually happened and find any new information to add; it’s to augment everything you have thus far. You might be using a third party for this, depending on a variety of factors, like the knowledge you have in house and that sort of thing.
Internal communication is one of the last steps, and that’s intentional. Unlike a non-security, but still production or engineering, incident, you don’t want to announce that there’s an active incident and then do 30-minute update rotations with security, because there’s not much to say and it mostly just produces anxiety; there’s no action for anybody to take, right? The action that’s being taken is by the security team and any other subject matter experts they brought in. So when you send out this internal communication, it’s towards the end, once you have all the information in hand, and you only send it out if it’s necessary to do so, because again: what action do people need to take? This is kind of equivalent to skipping those 30-minute updates I made reference to and only sending out a postmortem, like you might for a more regular, we’ll say, engineering incident. The idea is that now people can read through it and say, oh, this is what happened, this is how it was resolved, this is what I need to do, if anything, and then they can move on.
Now, after that, you might need to involve law enforcement. This, again, depends on what type of security incident you’ve just experienced.
That might be a requirement for your company regardless, so this is really going to rely on your internal knowledge and documented process. If you are involving law enforcement, you also want to reach out to external parties that may have been used as an attack vector. Think of it like this: if you’re company A, and company B is the origin of the attack, it might just be that someone there accidentally got malware or whatever on their system, and that’s why you’re seeing an attack from that direction. So you want to make sure you reach out to them, but don’t just send a ‘Hello, domain-name’ email; reach out to the appropriate contact if you have one, and you might have to go through law enforcement if you don’t, just so that the notification is legitimate and not ignored as spam. And then, only if necessary, do external communication. This is if you need to send any sort of notification of breached data, or of any action that needs to be taken on the part of any customers, users, etc. of your application, people who have submitted data to you.
Now, as a quick recap slide: we just reviewed all 14 of these steps, and I do have a link to them in my resources and materials, which are on this slide. I have everything up on my noti.st account, so you can take a look at everything I’ve linked: the sudo guide for training, the DevSecOps guide, which is the origin of this talk, and the security incident response page, which outlines that 14-step process.
There are also links to frameworks like STRIDE and full-service ownership and some other things, so I definitely recommend taking a look at all of those.
Next month, we’re going to be hosting our PagerDuty Summit, so I really think it’d be great if you registered to join.
It’s a free event and registration opens on the 4th of May. And I’ll be around.
If you have any questions, please let me know.
Again, my name is Quintessence and I’m a developer advocate at PagerDuty.