Software Supply Chain Security for Open Source Projects – it’s time to prepare!

Attacks on the open-source value chain (OS supply chain) are becoming more sophisticated, and we, as software developers, are becoming the focus of these attacks. So what are the essential first steps, and what should you focus on? This raises the question of suitable methods and tools. At the same time, the company’s strategic orientation must be considered in this security strategy.

In the recent past, we have also learned that attacks are increasingly targeting individual infrastructure elements of software development, such as the classic CI/CD pipeline.

In this webinar, we address the following questions:

  • What potential threats are there in general?
  • What are classic attack points in software development, from source code to binary?
  • What tools are there, and where should they be used?
  • How can I arm myself against the challenges of tomorrow’s cyber attacks?

Resources:

JFrog Platform on AWS to Manage Your Software Bill of Materials Solution Sheet

JFrog Xray Solution Sheet

JFrog Artifactory Solution Sheet  

Video Transcript

Sven Ruppert:
Hello and welcome to this new video. It’s a pleasure for me to see you here. So today we want to talk about software supply chain security. What are the key points, what are the different mechanics? You can see here some open source projects from the Linux Foundation I want to highlight. And we want to have a look at vulnerabilities, malicious code packages and, in the end, what you can do against all these attacks.
If you’re interested in this, stay here. By the way, my name is Sven Ruppert. I’m a Developer Advocate for JFrog. And as you can see, mostly I’m out in the woods. Yeah, so if you are watching one of my videos for the first time, then it’s a big pleasure for me and a big welcome from my side. And if you want to have more videos like this about Java or DevSecOps topics, then have a look at my YouTube channel; you will see a bunch of them there. And if you like a video, give me a thumbs up, and it would be a pleasure for me to see you as my new subscriber. If you subscribe to my channel, you won’t miss any other video. And now it’s time to start.
Okay, let’s start talking a little bit about supply chain security. Supply chain security is a very broad topic, and software supply chain security is just a part of supply chain security. What does it mean? The supply chain is everything that is used in terms of humans, machines, material, third-party components and processes; everything that is involved in producing something is in the supply chain.
So everything that is disturbing it or compromising it is something against supply chain security. And supply chain security is focusing exactly on this topic: how to make everything smooth so that you can work, or so that this process is running without any interruptions or disruptions.
Software supply chain security is just a part that is focusing on how to create software. Okay, what changed over the years with the supply chain attacks? Long time ago it was more or less an individual or a group of hackers that tried to break into the supply chains, and it was more or less a financial-oriented aspect, so they wanted to get some money somehow.
But over the last years, and especially right now, at the beginning of 2022, we have a not-so-nice global political situation in the East. And then it could happen that if you’re working directly or indirectly for a company that’s working for a government, or you are working inside the supply chain for a company that is a target of a different government, let’s say it like this, then it could be that you’re not attacked by individuals or a hacker group; you’re attacked by a government. And this is a completely different thing because they have completely different resources and different possibilities. And even if you’re a small or medium-sized company, if you’re part of the supply chain, it could be that you are now attacked by governments instead of individual hackers. And this is a completely different beast.
And what we can see is, the big companies are improving their security day by day. They have a huge amount of resources in terms of manpower, money, infrastructure and so on. And this means that the attacks are more and more not against the big companies, but against the small companies around these big companies. And this means that over time this pressure will increase, and even if you have a small or medium-sized company with, let’s say, 10 or 15 employees, you will now get the full amount of attacks on your part of the supply chain, because it is way cheaper to attack a few of the small and medium-sized companies instead of attacking the big company. So the pressure is increasing step by step, and as much as the big companies are increasing their protection, the pressure against the small and medium-sized companies will increase as well.
And one of the biggest questions is, what is a key factor against all these attacks, or what is a fundamental thing you should have in mind? Whatever supply chain you have, traceability is one of the key points against all these different attacks; it’s a fundamental thing to protect against compromised elements inside it. Traceability means knowing all parts: what you’re doing, at what time, who’s involved, what material is used, what the output is, where the output is going, and so on and so on. If you are able to have this traceability throughout the whole supply chain, this is one of the key factors. But now we want to narrow this from general supply chain security to software supply chain security. And here we are focusing on the way from source code to binary.
Okay, let’s talk a little bit about the software supply chain security. It’s a subset of the supply chain security. It’s focusing just on software, and I have two open source projects from the Linux Foundation I really would like to highlight here. And one is Project SLSA, and the other one is Project Pyrsia.
Let’s start with SLSA. SLSA is a documentation project, and it means that a bunch of different individuals or cyber security experts, or security experts try to create documentation, first of all, to give advice to you so that you know at what level you are with your security, what are the next steps, what you can do to increase your security, and the description of all the different common attacks against the software supply chain. So really what are the attacks from source code up to the binaries in production?
So, part of Project SLSA are these levels. These levels are more or less so that you know where you are, what you can do, and what the next steps are to increase your security. And the first level, Level Zero, is just that you have to document everything that is used inside your software development process. So it’s a full documentation of everything so that you know what is going on, where something is involved, what components you are using, and so on.
So, level one then is describing that you have to create an SBOM, a software bill of materials, so that for this binary, which is depending on all the other components or dependencies, you have a full dependency list of your created binaries. Don’t worry, we will focus on this SBOM, what it is, how to create it, and where you can get this stuff, later on in this video. But yeah, so this is level one.
Level two is that you start using a Git repository or source code version server, a CI/CD environment, as well as a repository for the binaries, and making sure that everything is automated as much as possible.
Level three is introducing security audits. So it means that external parties are checking what security level you have, what you have done right and wrong, and what you should do better. And if you have done this, then level four is describing the definition of immutable and reproducible builds. So it means that you know what is part of the build, that you can reproduce it, and that you’re creating binaries just once and then using them, so you never recreate a binary.
So all this is a very, very short overview of this project, but I have a video focusing just on this project, SLSA on my YouTube channel. There I will explain all the mechanics, the details and all different flavors, what you can see and get there. And have a look there on my YouTube channel, and search for the video about project SLSA.
So the next part from this project SLSA is the documentation about most common attacks against software supply chain. And this means that we are focusing now from source code to binary, what could happen, and we have a few things.
The first thing is, for example, that no source code modification will be done without any review. If you have done this, what could be the next attack? The next attack could be that you are compromising the source code repository itself. It means that you’re just sneaking in with bad commits or changing source code at this point, so you have to harden this source code repository. From the source code repository, it’s now going to the build or CI environment. And here again, the build could be changed, or it could be compromised in a way that it’s fetching the original sources but overlaying them with some other or additional sources, or that you’re just compromising the build itself.
A very prominent example of this is the SolarWinds hack, where with a rebuild inside the CI environment, the binary was corrupted inside the CI environment. So hardening the CI environment is one thing. From the CI environment, it’s now pushed to a repository, and here what you can do is bypass the CI environment, saying, “I’m the CI environment,” and pushing a compromised binary into the repository, or you can attack the repository itself. But there are a few more things. So we are talking about bad dependencies: during the build, what you can try is to provide bad dependencies so that they are used during the build, and you can change or promote binaries from outside that are grabbed by the repository.
Long story short, we have several hotspots here, and the main hotspots are, first of all, the three components: the source code repository, the CI/CD environment, as well as the binary repository. Here you just have to harden this environment.
This is the operational part, but for the software developer there are two things left: your own source code and what’s going on with the source code, and then all this stuff with the binaries, so all dependencies. And I think it’s worth having a look at exactly this part: source code and binaries.
Okay, let’s talk about the next project, and the project is called Pyrsia. Pyrsia is an open source project from the Linux Foundation, and it was initially created by the company JFrog. In this project we want to focus on the part from where the binary is built until the binary is delivered.
It could be delivered as a dependency, or it could be for production, but this is the part that project Pyrsia is focusing on, and all the other parts are external to or not included in this project. So it means we are focusing here from the moment we are building a binary up to when it is delivered.
How is Pyrsia securing the software supply chain of this build process? So you want to take once again this build infrastructure, so all these build threats. And here what Pyrsia is doing is providing a decentralized P2P network, or peer-to-peer package manager. What you’re doing is you’re sending the URL where the source code is and a commit, then different nodes will grab the source code, build it locally, and then share the information about the binary itself. And then if all binaries are the same, then the build infrastructure is not compromised.
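The comparison step described here can be sketched in a few lines. This is only an illustration of the idea, not Pyrsia’s actual implementation: the function names and the in-memory build outputs are hypothetical, and the check assumes that builds are reproducible, so that honest nodes produce byte-identical binaries.

```python
import hashlib


def digest(binary: bytes) -> str:
    """Return the SHA-256 hex digest of a build output."""
    return hashlib.sha256(binary).hexdigest()


def builds_agree(binaries_by_node: dict[str, bytes]) -> bool:
    """True only if every node produced a byte-identical binary."""
    if not binaries_by_node:
        return False
    digests = {digest(binary) for binary in binaries_by_node.values()}
    return len(digests) == 1


# Hypothetical outputs of three nodes that built the same commit.
honest = {
    "node-a": b"binary-from-commit-abc",
    "node-b": b"binary-from-commit-abc",
    "node-c": b"binary-from-commit-abc",
}
# One node returns something different, e.g. because its build was tampered with.
tampered = {**honest, "node-b": b"binary-from-commit-abc-plus-backdoor"}

print(builds_agree(honest))
print(builds_agree(tampered))
```

A real network would also have to decide what to do on disagreement (reject, rebuild, or go with a majority); that policy is left out of this sketch.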
So you can create different Pyrsia nodes, but you have no control over whether your nodes are selected for building something. So it’s really randomly selected, in a way that makes it very hard to provide these nodes in a compromised way so that you can bring compromised binaries into this platform. So now we have this binary inside this P2P network, and then it will be delivered to the distribution layer of Pyrsia.
By the way, this is a very short description of Pyrsia, so if you want to have more detail, then check out my YouTube channel. I have a video just about this project Pyrsia, and there I’m going through every step in detail so that I’m providing all the information about the internals. Here, it’s just a very short overview.
Okay, how is Pyrsia delivering this binary? So it’s a peer-to-peer network. So if you have now, for example, a Maven dependency inside the Pyrsia network, you are asking these Pyrsia nodes with a Maven coordinate, then it will be determined where this binary is, and then you can fetch it from several points. So if you have bigger binaries, you have all the advantages of a P2P network: it can be delivered partially from different nodes to use the bandwidth as much as possible.
On the other side, we have gateways to Docker Hub and Maven Central, for example; these are authorized nodes. So if something’s not inside the Pyrsia network, it will be fetched from this authorized node and then stored inside the P2P network as well. So if these nodes are going down once, or maybe for a few minutes or whatever, you can always ask the Pyrsia network and it will still be there. On the other side, just have a look at this project website. It’s a young project and, yeah, I can say try it out.
Okay, we saw that we have different parts, and that one is done by the organizational or operational part, and the other thing is source code and binaries. And I want to highlight four main areas of cyber defense, cyber security or the [inaudible 00:14:58], whatever you want to call it or wherever you want to place it. And it’s first of all [inaudible 00:15:03] static application security testing. It is a testing mechanism where you’re testing each component while it’s not running. So from the first line of code, you can say these are my dependencies, and then you’re scanning.
If something is running already, you have dynamic application security testing, DAST, which means the application is running and you’re looking at it from outside, and you have more of this hacker approach. The combination of both is IAST, Interactive Application Security Testing, which means you’re ramping up the environment, you attack from outside, and you’re looking inside and modifying the attack vector and all this stuff. And then the last part is RASP, runtime application self-protection, and as the name says already, it is just for production. It means inside your production environment, you’re analyzing what’s going on and trying to identify if there is an ongoing attack.
The last one, I just kind of added this part; it’s just for production. If you’re focusing on IAST, it means you need someone who’s highly skilled. So this is mostly something that you’re doing later, if you have experience with the security stuff already. The DAST part is quite late inside the production line because you need something that’s running already, and you can’t really scan a hundred percent of the components because you’re just looking from outside. So you should definitely focus on the static application security part, because with the first line of code you can start scanning all the included components and identifying if there are malicious packages or vulnerabilities.
So we saw that we have source code and binaries left over if you want to start with SAST, and I highly recommend focusing first only on the binaries, because if you’re comparing how much code you write with how many dependencies you’re adding and how many lines of code they contain, then for most projects, by far the biggest part is the dependencies. So focus there and scan there for vulnerabilities and malicious code packages. It’s the low-hanging fruit if you start with cyber defense or cyber security topics or inside [inaudible 00:17:16].
So, the next question is, “What’s the best source for vulnerability information?” And here I can say, whatever single database you’re using, make sure that you are building a superset out of different databases, because single databases are mostly lacking some vulnerabilities, because this market is so huge and you never know to which provider this information is sold.
So, this is exactly what we at JFrog have done and are doing already. We are aggregating different vulnerability databases, commercial ones and free ones, and we have a dedicated research team on top of this vulnerability database that is enriching this data with mitigation and remediation information, and we are adding the knowledge about our own zero days as well. So whatever you’re choosing, make sure that you have this superset, as we have done at JFrog.
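As a rough sketch of what building a superset out of several sources can mean, the snippet below unions hypothetical feeds that map a component to a set of vulnerability identifiers. The feed layout, the component names and the identifiers are made up for illustration; real databases use richer schemas and need de-duplication across different ID systems.

```python
def merge_vulnerability_feeds(*feeds: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union several feeds that map 'package@version' to vulnerability IDs."""
    merged: dict[str, set[str]] = {}
    for feed in feeds:
        for component, ids in feed.items():
            # setdefault creates a fresh set, so the input feeds are not mutated.
            merged.setdefault(component, set()).update(ids)
    return merged


# Hypothetical feeds with made-up identifiers.
feed_a = {"libfoo@1.2.0": {"VULN-0001"}}
feed_b = {"libfoo@1.2.0": {"VULN-0002"}, "libbar@0.9.1": {"VULN-0003"}}

superset = merge_vulnerability_feeds(feed_a, feed_b)
for component, ids in sorted(superset.items()):
    print(component, sorted(ids))
```

Each single feed misses something the other one knows; only the merged view lists both entries for `libfoo@1.2.0`.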
Let’s talk about malicious packages. We have different aspects I want to highlight here, and the first one is the infection methods. So, what is an infection method? It is the way this malicious code is provided so that you’re consuming it. I will start now with typosquatting. So what does typosquatting mean? Typosquatting means that you have common, very well-known packages, and they have a name, and you’re grabbing this code and then you are changing the name of this dependency based on common typos. And then you’re just providing this package in a regular official repository. And then with this typo you’re just referencing a corrupted package, and inside this package you can do whatever you want.
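One simple defensive check against this kind of name-based attack is to compare every requested package name with a list of well-known names and flag near misses. The sketch below uses `difflib` from the Python standard library; the list of popular names and the cutoff of 0.85 are assumptions chosen for the example, not values from any real scanner.

```python
import difflib

# Illustrative list; a real check would use the registry's most-downloaded packages.
POPULAR = ["requests", "numpy", "pandas", "urllib3"]


def possible_typosquat(name: str, known: list[str] = POPULAR, cutoff: float = 0.85):
    """Return the well-known package that `name` closely resembles, or None."""
    if name in known:
        return None  # exact match: this is the real package name
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for requested in ["requests", "reqeusts", "panda", "leftpad"]:
    print(requested, "->", possible_typosquat(requested))
```

A similarity check like this only catches names that are close to a known one; it says nothing about whether the package content is malicious.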
The next thing is masquerading. Masquerading is focusing on imitating the whole environment around a dependency. So you are duplicating the code and the metadata, you’re adding small pieces of malicious code inside, and you’re building clone packages. So, you’re building something that looks exactly the same, that maybe has the same name but is provided at different places, and then something is inside so that you are infected. That’s it. So the only difference from the original package is maybe one line with obfuscated code that is doing something, calling something, sending something, and so on.
The next thing is a Trojan package. A Trojan package is more or less like the historical Trojan horse. You have a package that’s doing something, say a PDF library, and you can print and all this stuff. So everything you need is there, but inside you have additional functionality that’s obfuscated, hidden somewhere, and it’s just activated while you’re using this really well-working library. So sometimes these Trojan packages are good libraries in themselves; they’re giving you good value, but an additional “value” as well.
Another way is dependency confusion. Dependency confusion means that you have internal packages and external packages. So, for example, I know the name of an internal dependency inside your company, because I know someone who’s working there or because this information is leaking out. Then what I can do is create a dependency with exactly the same name and a higher version number, or a very high version number, and put this one in official repositories.
So I’m really using exactly the same definition, but my CI environment is maybe looking first at Maven Central and grabbing this wrong dependency there, with the same name, maybe the same functionality, and a slightly different version number. So automatic version increases will grab this one from outside, and then you have this dependency confusion in a way that you see: it’s my dependency, but it’s grabbed from outside.
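To make the mechanism concrete, here is a toy resolver, assuming a deliberately naive rule of “take the highest version found in any repository”. The repository names, the package name `acme-billing` and the resolver functions are hypothetical; real package managers have their own resolution rules, and the second function only illustrates one possible mitigation (resolving internal names from the internal repository only).

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.5.0' into (1, 5, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_naive(name: str, repos: dict[str, dict[str, list[str]]]) -> tuple[str, str]:
    """Pick the highest version of `name` across all repositories."""
    candidates = [
        (parse(version), repo)
        for repo, index in repos.items()
        for version in index.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found in any repository")
    version, repo = max(candidates)
    return repo, ".".join(str(part) for part in version)


def resolve_pinned(name, repos, internal_names, internal_repo="internal"):
    """Resolve known internal names from the internal repository only."""
    if name in internal_names:
        repos = {internal_repo: repos[internal_repo]}
    return resolve_naive(name, repos)


repos = {
    "internal": {"acme-billing": ["1.4.0", "1.5.0"]},
    "public": {"acme-billing": ["99.0.0"]},  # uploaded by an attacker
}

print(resolve_naive("acme-billing", repos))
print(resolve_pinned("acme-billing", repos, internal_names={"acme-billing"}))
```

The naive resolver happily picks the attacker’s very high version from the public repository, while the pinned variant stays with the internal one.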
Another infection method is hijacking. Hijacking means that you have access to the infrastructure of this project. You are taking over the ownership in some way. It could be in an aggressive way, where you’re really hacking the page, or if there’s a non-maintained project and you see it’s used, then you’re just taking this free domain and building something around it again, or you’re taking over the open source project that nobody’s maintaining anymore.
So with hijacking, yeah, more or less you are the maintainer of the project, but with a different intention. So these are the common infection methods. And now the question is, what are common payloads?
Common payloads are more or less what they’re doing. So what is this code that is in these malicious packages? One of the bigger things is sensitive data stealers. So it means they want to have credit card numbers, user tokens, environment variables, passwords, usernames, whatever. They want to steal this data and send it somewhere. So you have this: okay, look if you have this environment variable, check this name, and with the next request send an additional request to the attacker’s server. So that is one thing.
The other thing is that you have something like a connect-back shell; it’s like a remote shell. There is malicious code that’s waiting and connecting back to the attacker’s server so that the attacker knows, okay, this malicious code is there, I can connect, then I’m sending commands, they will be executed on the other side and the result will be sent back.
So this is just whatever you can do on this system. And another thing that is very popular these days is download and execute. So you have a malicious code snippet, it’s connecting to a remote server, downloading a binary and starting to execute it. And this is quite often used, for example, for crypto mining, so just to mine cryptocurrency with other people’s energy and money and send it back. So these are the most common payloads that you have in malicious code. But how do you hide this malicious code? Then we are talking about obfuscation techniques.
Now, talking about obfuscation techniques. Obfuscation can be done with a publicly available obfuscator or a custom-made obfuscator. But mostly you can really just search for obfuscators and then use one. What they’re doing is, more or less, they’re renaming variables, they’re encoding commands in different encodings and all this stuff, so that you are not able to read it immediately. So you have to decode it to see what’s going on.
So this is one thing, but a little bit more interesting is control flow flattening. The control flow is more or less this: it’s running A, B, C, D, E, F, G, and then in the middle you are bringing some IF and ELSE stuff in. And then, depending on the value of some code or variable or whatever, different code will be executed. And this is implemented in the code so that it’s not obvious, or that you’re not immediately seeing it. But if you’re analyzing the whole control flow, you will see that there is something that is around this main logic.
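A harmless illustration of one common form of control flow flattening: the same small function written once in the straightforward way and once with its steps broken up and driven by a state variable inside a dispatcher loop. Both functions are made up for this example; obfuscators generate far less readable variants, and the point is only that the readable structure disappears while the behavior stays the same.

```python
def checksum_plain(data: bytes) -> int:
    """Straightforward version: the loop structure is visible at a glance."""
    total = 0
    for byte in data:
        total = (total + byte) % 256
    return total


def checksum_flattened(data: bytes) -> int:
    """Same logic, flattened into a state machine driven by `state`."""
    state, index, total = 0, 0, 0
    while state != 3:  # state 3 means "done"
        if state == 0:  # loop condition
            state = 1 if index < len(data) else 3
        elif state == 1:  # loop body
            total = (total + data[index]) % 256
            state = 2
        elif state == 2:  # loop increment
            index += 1
            state = 0
    return total


sample = b"supply chain"
print(checksum_plain(sample) == checksum_flattened(sample))
```

In a real obfuscated package the state values would typically be opaque numbers and the branches would be shuffled, which is why analyzing the whole control flow is needed to see what is wrapped around the main logic.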
The next is homoglyph characters. Homoglyph characters means you have different Unicode characters that look like plain ASCII Latin characters, but they will be different if you’re comparing strings and all this stuff. So with this, you can make sure that some comparisons always succeed or always fail. So for you it looks like the regular ASCII sign, but it is a different Unicode sign. And with this you can do a little bit more, and these are the bidirectional control characters, and this is cool.
So, we are reading left to right, right to left, whatever, and we can tell this to the machine as well. And we can use these control characters inside source code so that a human is reading maybe from left to right and seeing, okay, this is some source code, it has some commands, it starts, it ends, and so on. And the compiler will see it completely differently. It will come to this character, switching from left-to-right to right-to-left, reading and interpreting this stuff, and it’s completely different. And this is something that is quite cool in terms of getting an understanding of this, but it’s not so easy to detect.
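Both tricks can at least be made visible with a simple scan of the source text. The following sketch flags the Unicode bidirectional embedding, override and isolate control characters as well as any non-ASCII letter; the function name and the sample snippet are made up for illustration. It is intentionally noisy: legitimate non-ASCII identifiers, strings and comments will be reported too, so the output is a list for a human to review, not a verdict.

```python
import unicodedata

# Bidirectional embedding, override and isolate controls (U+202A-202E, U+2066-2069).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def suspicious_characters(source: str):
    """Yield (line, column, description) for bidi controls and non-ASCII letters."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char in BIDI_CONTROLS:
                yield line_no, column, "bidi control " + unicodedata.name(char, "UNKNOWN")
            elif ord(char) > 127 and char.isalpha():
                yield line_no, column, "non-ASCII letter " + unicodedata.name(char, "UNKNOWN")


# The first 'a' of "admin" is a Cyrillic letter; line 2 hides a right-to-left override.
sample = 'if user == "\u0430dmin":\n    pass  # \u202e comment'

print("\u0430dmin" == "admin")
for finding in suspicious_characters(sample):
    print(finding)
```

The string comparison in the usage part shows the homoglyph effect directly: the two values look the same on screen but are not equal.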
So we heard so much about vulnerabilities, malicious packages, masking techniques and all this stuff. But the first question is, inside the software development chain, where is the right place to put security in? And the answer is quite easy: everywhere. So every single step should be involved or should be hardened with a security approach. So security is like quality; it is really part of every dedicated step.
On the other side, what makes it so unique is the combination of [inaudible 00:27:01], so this dependency management and vulnerability scanning. Keep in mind that all dependencies of all tech layers have some metadata around them. So, for example: is it a dynamically linked dependency, is it in compile scope, is it in test scope, is it a version range, [inaudible 00:27:22], is it statically linked, dynamically linked, and so on. So all this information is available inside these different dependency managers. If you are grabbing all dependencies of this artifact and you have the whole metadata and the knowledge of this, then you can use this metadata to analyze, for example, remediation and mitigation information.
You can use this for defining the whole attack vector across different technology borders or inside the whole tech stack. So, having this information from the dependency managers of all tech layers, and the possibility to scan these binaries, is a huge plus compared to “I’m just scanning one technology” or “I’m just scanning one binary,” because mostly this information is not part of the binary, and then you can’t use this information. So dependency management and scanning for vulnerabilities is a very good combination.
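One way to picture why the dependency manager’s view matters: with the full dependency graph you can trace every path through which a vulnerable component reaches your artifact, including transitive ones. The graph, the component names and the helper below are hypothetical and only sketch the idea.

```python
def paths_to(graph: dict[str, list[str]], root: str, target: str) -> list[list[str]]:
    """Return every dependency path from `root` to `target` (depth-first)."""
    found: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        path = path + [node]
        if node == target:
            found.append(path)
            return
        for dependency in graph.get(node, []):
            if dependency not in path:  # guard against cyclic declarations
                walk(dependency, path)

    walk(root, [])
    return found


# Hypothetical dependency graph as a dependency manager would report it.
graph = {
    "my-app": ["web-framework", "pdf-lib"],
    "web-framework": ["json-parser"],
    "pdf-lib": ["json-parser", "image-codec"],
}

for path in paths_to(graph, "my-app", "json-parser"):
    print(" -> ".join(path))
```

Here the vulnerable `json-parser` is never declared by `my-app` itself, yet it arrives through two different direct dependencies, which is exactly the information a scan of a single binary would not show.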
The next question is, “Is shifting left to the CI environment enough?” Well, shifting left to the CI is good because this is a fully automated gate where everything is passing through. It’s the place where you can implement this security border that everything must pass before it’s going to the next steps. But shifting left to the CI is not enough, because if it has reached the CI, then you have spent so much time already, and maybe you can do it a little bit earlier.
So the only things a little bit earlier than the CI environment are the IDE as well as the command line interface. So we have both of them. We have the command line interface, where you can work straight on the command line, you can script it, you can use it to see what vulnerabilities are there. And we have this IDE plugin. So if you’re typing the first line of a dependency, then you see immediately if there are vulnerabilities in the dependency tree or if there are some compliance issues.
Why does this make sense? First of all, if you’re spending time on creating something, pushing it to CI, and getting back that it’s not possible and you have to rewrite it, then you’re getting bored, and the quality of the second solution is maybe not so good because you are under time pressure now, you’ve wasted a lot of time already, and you are bored because you’re doing things twice. So it makes sense to have this information available as early as possible so that you are focusing on the core things once and doing this right without being bored or wasting time.
Why should you use a CLI command, and how do you use it? First of all, you should use a command because with this you can work without any other tool. If you’re cloning a repository, like a Maven project, then you can just go to a terminal shell, on the command line, go inside this project and call the audit command, JFrog audit minus mvn. With audit minus mvn, the command line interface knows there is a project and it’s a Maven project. It really extracts the whole of [inaudible 00:30:26] and gives you all the vulnerabilities and compliance issues that are defined here. So you can configure it with watches and all that stuff, but the plain one is just audit, and you get all the information that is available.
So with this, you’re not wasting time. You can check it without starting any workflows or opening the IDE and all this stuff. On the other side, you can script it so that all other tools and existing infrastructure on your side are able to use those capabilities as well.
On the other side, you can do a little bit more in terms of analyzing, for example, Docker images. This is called on-demand scanning. And on-demand scanning means that you’re creating a Docker image on your machine, you add stuff, you grab different things, and you want to know if this Docker image is good enough in terms of compliance issues, or if you have any vulnerabilities you should get rid of.
So you can extract this Docker image so that you have the entire image on your disk, and then you can use the JFrog command line to analyze this Docker image. Or if you have Docker Desktop, you can use the Docker Desktop plugin to analyze this Docker image. You immediately have the information about what’s inside in terms of vulnerabilities and compliance issues, and you can send this information to Artifactory, and then this on-demand scan is available there, so it’s documented. And if you’re changing the Docker image, then you see the difference between different scans.
So you can work together with your colleagues and have it documented without pushing anything of this Docker image to Artifactory. So nothing is leaking in while you are composing stuff on your side, and without using the CI resources and waiting there.
So using the CLI makes sense for integration, it makes sense for being fast, and it makes sense for analyzing what’s going on, and it gives you the flexibility to work straight on your challenge in the environment where you are. A little bit earlier I talked about the SBOM.
So what is an SBOM? An SBOM is a software bill of materials, meaning a full list of all dependencies that are used to create this binary. This is quite popular these days because the executive order on cybersecurity from the US president, Mr. Biden, explicitly says that everything that is used, owned, run, whatever, by the US government must provide this SBOM, the software bill of materials.
So you have to provide the full list of all dependencies of all tech layers. We have known this for a long time. We called it, in earlier days, build info. And build info is a superset of the SBOM. So during the time you’re creating a binary, you can add here not only the whole dependencies, but also all the meta information you want to add, like environment variables, date, time, machine, agent name, whatever. So everything that’s necessary and that you want to push there can be stored in the build info.
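The relationship between the two can be sketched with a small data structure: a build record that holds the dependency list plus extra build metadata, and a projection that keeps only the dependency list. The class and field names below are invented for this illustration; they are neither the JFrog build-info schema nor a standard SBOM format such as CycloneDX or SPDX, and the hash value is a placeholder.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    sha256: str


@dataclass
class BuildRecord:
    artifact: str
    components: list[Component]
    # Everything below goes beyond a plain dependency list.
    environment: dict[str, str] = field(default_factory=dict)
    built_at: str = ""
    agent: str = ""

    def sbom(self) -> dict:
        """Project the record down to the artifact and its dependency list."""
        return {
            "artifact": self.artifact,
            "components": [asdict(component) for component in self.components],
        }


record = BuildRecord(
    artifact="shop-service-1.0.0.jar",
    components=[Component("libfoo", "1.2.0", "0" * 64)],  # placeholder hash
    environment={"JAVA_HOME": "/opt/jdk"},
    built_at="2022-03-01T08:00:00Z",
    agent="ci-agent-7",
)

print(json.dumps(record.sbom(), indent=2))
```

The projection drops the environment, timestamp and agent, which is the sense in which the richer record is a superset of the dependency list.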
For this, you have this tab about Xray, and inside Xray you have the current knowledge of vulnerabilities. So it means if you’re passing a build today and it’s green, because today we don’t have any knowledge about vulnerabilities that are inside, and you’re passing this to production, maybe tomorrow, with an update of the vulnerability database, we know, “Oh, we found a new vulnerability,” and then we know that this binary has a new vulnerability. So we have this information without scanning production.
On the other side, it’s a good thing, on a daily basis in the morning, just to check the important binaries that I created yesterday, for example whether there is a new vulnerability entry for them. So with this, you can maintain production without scanning it. And on the other side, in this Xray tab you can go to this action menu, click there, and then you can create an SBOM and export the SBOM in different variations. So it depends on what demands you have to fulfill, and then you have everything to be compliant with this executive order on cyber security.
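The idea of checking yesterday’s builds against today’s vulnerability knowledge, without touching production, comes down to matching stored component lists against an updated set of known-vulnerable components. The sketch below is a simplified model with made-up build and component names; it does not represent how Xray works internally.

```python
def affected_builds(
    builds: dict[str, set[str]], vulnerable: set[str]
) -> dict[str, set[str]]:
    """Map each build to the components now known to be vulnerable."""
    hits: dict[str, set[str]] = {}
    for build_id, components in builds.items():
        overlap = components & vulnerable
        if overlap:
            hits[build_id] = overlap
    return hits


# Component lists recorded at build time (hypothetical data).
builds = {
    "shop-1.0.0": {"libfoo@1.2.0", "libbar@0.9.1"},
    "shop-1.1.0": {"libfoo@1.3.0", "libbar@0.9.1"},
}

known_yesterday: set[str] = set()
known_today = {"libfoo@1.2.0"}  # added by a database update overnight

print(affected_builds(builds, known_yesterday))
print(affected_builds(builds, known_today))
```

The build that was green yesterday shows up as affected today, based only on stored metadata and the refreshed database.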
So it means we now have the possibility to create this SBOM. We have the possibility to see vulnerabilities in binaries created in the past, without rescanning everything. And we can use this to actively maintain stuff that is running in production. By the way, you will find a lot of CVSS values in certain places, and if you’re clicking on the CVSS values, you see all the basic metrics.
I don’t have enough time now to explain all these different metric values, but I have a dedicated YouTube talk that is going through all the CVSS metrics, and you will see that you have the possibility to scale the CVSS metric to your environment. And for this, you need environmental metrics. How to use this and how to deal with the CVSS calculator is shown in one of my videos there.
And it’s good advice to see and play around with the CVSS as well because it will give you some advice, which one is more important for your environment and you can adjust the order in which you want to fight against vulnerabilities.
Okay, we had a lot of stuff now. So we talked about supply chain security, about software supply chain security, about these two Linux Foundation projects. We saw what the difference is between vulnerabilities and malicious packages, different obfuscation techniques, and so on. I showed you how to extract this [inaudible 00:36:04], what you can do on the command line, and why shifting left to the CI is not enough and you should shift further left.
So the IDE and command line interface together are a package, but there are a lot more detailed videos on these topics on my YouTube channel, so just check it out. And if you are interested in trying it, we have a free tier. There you can register and try all this stuff by yourself, or attend one of my webinars or workshops where we have hands-on practice on exactly these topics. Other than that, my day is done. I found this lake, I will make my camp here for tonight and enjoy it. And whatever time you are seeing this, I hope you have a good rest of your day and, well, stay safe, and see you.
