Pyrsia – Securing your OSS Supply Chain
With OSS, not knowing where all your software comes from means hard-to-spot risks to the integrity of your services. Without constant identity checks and safety protocols for keys and secrets, open-source dependencies can open the door to breaches, exploits, and supply chain attacks.
Enter Pyrsia — your torch that lights up the open-source supply chain!
Learn from our product engineering team how this new OSS tool enables you to:
Assure package provenance (e.g. Signed commit, Build log attestations, Non-repudiation of publisher)
Create immutable history (e.g. transparency log of every package in its original state and its metadata as it changes over time)
Distribute securely and efficiently (e.g. verifiable integrity of the package and its source)
Verify builds from open source repositories with an independent build network
Video Transcript
Speaker 1:
Hi everyone, my name is Sudhindra Rao, and thanks for joining the webinar we are hosting to showcase Pyrsia and help you secure your OSS supply chain. Specifically, today we are talking about your software supply chain, focused largely on open source software. We are focusing on open source software because that is where we find good automation is lacking, and Pyrsia tries to solve that problem. If you want to find me, I’m Sudhindra Rao on Twitter, and you can find the Pyrsia team at JFrog. I’ll share the website where Pyrsia is hosted so that you can take a look at it. But first, let’s look at the state of affairs today.
So let me show you how dark the state of the supply chain is today. When I talk about the supply chain, we are all aware of what happened to the supply chain during COVID and how badly it hurt us as consumers. But I’m not talking about that. I’m talking about something that hurts me more and that I care about more dearly: the software supply chain. And when I talk about the software supply chain, the most recent example that comes to my mind, and that resonates with most people, is the SolarWinds attack, which began right before COVID and continues to hurt organizations and systems around the world. This was a classic supply chain attack: the attackers compromised the build process, slipped malicious code into a trusted software update, and used it to cause irreparable damage to systems.
Another famous one is the Equifax breach, around 2017, where the personal data of millions and millions of people was compromised, and we still see repercussions from that compromised data in people’s credit histories. That happened because Apache Struts, the open source framework their web application ran on, was not patched in time. And there is no easy way to keep up, because open source software is everywhere. From our research, and from others who have looked at this problem, about 75% of software built today contains open source components, and that’s huge. So we need solutions to fix this situation. Log4Shell, which happened very recently, is on our minds, and it also shows how hard fixing these problems is: typically one or two people maintain and manage a project, it’s up to them to actually fix it, and that’s a burden. We need better tools and systems to provide the fixes.
And as recently as May 10th, even the Rust community, around what is often called the most secure programming language, faced similar issues. So this problem is not going away. There are even people writing recipes on how to take over open source software and act maliciously, in case anyone needed a recipe. That’s how deep this problem goes. We put our bread and butter on this software, our healthcare systems depend on the same software, and so on, and we don’t have a good way to manage it. And the examples I have given you so far are just the tip of the iceberg. These are the ones that were talked about the most and received the most press coverage, feedback, and attention. There are tons of libraries that are direct or transitive dependencies of other open source libraries, which people simply overlook because they don’t have the tooling in place to address them. This is a time bomb waiting to explode.
Just to keep things easy for us, we implicitly trust these systems. We trust npm because it is hosted by Microsoft, PyPI because it’s hosted by the Python community, and so on. We rely on their systems to verify that whatever they have published, the binaries and packages, is trustworthy. But that trust rests on very meager controls, such as multifactor authentication, which tells us that a particular committer is publishing the binary but gives us no way to verify that the binary actually came from the source the committer claims. So there is a gap between what is claimed and what might actually happen, and those gray areas are areas of concern and exploitation.
So essentially, given the state of affairs, this is what we are doing: we are picking up software we found on the street and plugging it into the production systems that manage our finances, our health, our traffic automation, our satellites, all kinds of communications, and even our response to climate change. All of them depend on software that we cannot prove, even to ourselves, has gone through some sanity tests and is trustworthy. So what do we do about this?
So given the recent attacks, even the White House has sprung into action: there is an executive order which says that you need to act diligently, figure out what your software bill of materials contains, publish it, and keep it up to date. They have published some documentation on how to do it. There’s also activity in the open source world to build tools that make this really easy, because, as we know, if it’s a manual process, it takes hours and people lose interest and do a poor job. If they have an automated tool, they’re more likely to run it over and over and even produce reports that help them improve their security posture.
There’s also research going on within large organizations to solve this problem. One such effort is SLSA, Supply-chain Levels for Software Artifacts. What it aims to do is hold a mirror up to how we build software. One of its artifacts is a very simple CD system diagram, and even in this simple diagram they show about nine gates that can be attacked. We know from practice that typical CD systems are far more complicated and involved, with many more steps and hence more gates, or attack vectors. What the SLSA architecture shows is that we need to put controls in all these places instead of relying only on where the binary comes from. Today, controls typically exist only where people commit their code to GitHub or publish their binary to a package registry like RubyGems, PyPI, or npm.
But everywhere in between, there are so many ways the process can be attacked and misused. When we at JFrog were discussing this problem, we realized that for the steps from B to H, where binaries actually get built, we know where the binaries have gone, where the dependencies came from, and how the package is built, because we manage all of that with Artifactory. So we thought this is the right place for us to come in and bring the technology we know to the open source world, with the same effect it has had on closed-source and homegrown software, so that open source can benefit from the same rigor. That’s where JFrog found itself when thinking about this problem. Within JFrog, we have a vision that in the future, software is going to be liquid.
And if you look at that vision, what do we need? We need a supply chain that is a hundred percent automated, like Iron Man. It’s trustworthy, like Wonder Woman. And it is dependable: we have to be able to rely on it at all times, like Black Panther, and put our money behind it, so that the software we deliver at the end is worthwhile and traceable, and we can produce an SBOM for it.
So allow us to present Pyrsia, which is actually multiple things. It’s a consensus-based build network, where you can build binaries from your Git commits independently; we’ll talk a little bit about that. Pyrsia will also have a provenance log: at every point in time, where a particular piece of open source software came from, what happened to it, what were [inaudible 00:10:18] discord, what actions were taken, what the fixes were, and so on. So there’ll be one central place you can go to ask those questions and print your SBOM. And Pyrsia is meant to be a decentralized offering from the ground up. It is a decentralized package registry that will protect you against the single points of failure we have seen when AWS systems or npm itself went down for hours, hampering the continuous delivery of your production software.
With Pyrsia, these were the tenets we were building on. We wanted something that is secure from the get-go, that cannot be compromised. We wanted something that is reliable, hence the decentralized nature. And we wanted to build it in the open, because it is built for open source software, it is meant to be used by everybody who consumes open source, and it is meant to strengthen this community, which is hurting. We think building in the open is the best way to bring trust: when it is in the open, people can comment, critique, and help build better software tooling like Pyrsia.
So if you’re wondering where the name Pyrsia came from: pyrsia was a distributed communication mechanism the ancient Greeks used to send warnings of impending danger, or impending doom, from mountaintop to mountaintop. We thought that was a good metaphor for the impending doom we are facing in the supply chain, and since we are building a distributed mechanism, Pyrsia seemed like a good name. If you want to find out more, here are a few links to learn about how they actually did it. It’s a pretty interesting history lesson.
So let’s talk a little bit about Pyrsia and how it is similar to the ancient pyrsia. Pyrsia is based on peer-to-peer technology, because we know from the centralized internet that there are many single points of failure that hurt us when we try to do continuous delivery across networks, regions, and geographies. So from the get-go, it is peer-to-peer. There will be trusted nodes that hook into package registries we already trust, like Docker Hub and npm, but the network itself will be resilient to their failures, because it downloads and caches all that information and all those binaries, giving you the resilience you need. Think about it the same way you think of the distributed nature of Git: Git is for code, and Pyrsia will be similar for binaries. Pyrsia will also contain a consensus-based build network, where an open source committer can simply come and submit a commit hash.
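The Git comparison rests on content addressing: Git names every object by a hash of its contents, so a copy can be checked no matter where it came from. A minimal Python sketch of the same idea for binaries follows, assuming artifacts are named by Docker-style `sha256:<hex>` digests. `digest_of` and `accept_from_peer` are hypothetical helpers for illustration, not Pyrsia’s actual API.

```python
import hashlib


def digest_of(blob: bytes) -> str:
    """Return a Docker-style content digest ("sha256:<hex>") for a blob."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def accept_from_peer(expected_digest: str, blob: bytes) -> bytes:
    """Accept bytes from any peer only if they hash to the digest we asked for.

    Because the identifier *is* the hash, it does not matter which peer served
    the bytes: a tampered or truncated copy fails this check.
    """
    actual = digest_of(blob)
    if actual != expected_digest:
        raise ValueError(f"digest mismatch: expected {expected_digest}, got {actual}")
    return blob


if __name__ == "__main__":
    layer = b"example layer contents"
    wanted = digest_of(layer)
    print(accept_from_peer(wanted, layer) == layer)  # True
    try:
        accept_from_peer(wanted, layer + b"!")
    except ValueError as err:
        print("rejected:", err)
```

Because verification depends only on the bytes, a node can accept a layer from an untrusted peer as readily as from the original registry.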
The prerequisite for the build network is that the software actually is open source, meaning it has to be in, say, a GitHub repository where Pyrsia can access it. Pyrsia will pull the source at the commit hash the committer gave us and pick random nodes on the network to independently build the same software. Pyrsia will bring up infrastructure so that those builds really are independent, and in the end the nodes will verify that they produced the same result. Once they have verified it, the result will be committed to the network and become available to all consumers. That way we know it was not built by one developer on one machine, which could have its own quirks or even its own malicious software. We are building it independently. That’s what Pyrsia does.
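The build flow just described can be sketched as follows. This is a simplified model under stated assumptions, not Pyrsia’s implementation: each build node is modeled as a Python function from a commit hash to artifact bytes, `consensus_build` is a hypothetical name, and agreement means every randomly chosen node produced byte-identical output (compared via SHA-256).

```python
import hashlib
import random
from typing import Callable, Dict, Optional

# Hypothetical model: a build node is a function from a commit hash to the
# artifact bytes it produced. In a real network these are separate machines.
BuildNode = Callable[[str], bytes]


def consensus_build(
    commit_hash: str,
    nodes: Dict[str, BuildNode],
    k: int,
    rng: random.Random,
) -> Optional[str]:
    """Build `commit_hash` on k randomly chosen nodes.

    Returns the agreed sha256 digest if every selected node produced
    byte-identical output, otherwise None (nothing gets committed).
    """
    chosen = rng.sample(sorted(nodes), k)  # raises ValueError if k > len(nodes)
    digests = {hashlib.sha256(nodes[name](commit_hash)).hexdigest() for name in chosen}
    return digests.pop() if len(digests) == 1 else None


if __name__ == "__main__":
    def honest(commit: str) -> bytes:
        return f"binary built from {commit}".encode()

    def backdoored(commit: str) -> bytes:
        return f"binary built from {commit} + backdoor".encode()

    network = {"node-a": honest, "node-b": honest, "node-c": honest, "node-d": backdoored}
    # Sampling all four nodes guarantees the backdoored build is compared.
    print(consensus_build("3f2c1e9", network, k=4, rng=random.Random(0)))  # None
    print(consensus_build("3f2c1e9", {"a": honest, "b": honest}, k=2, rng=random.Random(0)))
```

Requiring unanimity is the simplest rule; a production network would more likely use a fault-tolerant quorum, and byte-for-byte agreement only works in ecosystems with reproducible builds.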
The other thing missing today is a single place for you to go and ask questions like: where did this binary come from? Who actually built it? We can glean that information from various sources, but that adds a manual step, and as soon as you add a manual step, there is less enthusiasm for getting that information. Pyrsia aims to provide all of it in one place. It’ll tell you where the source came from, where the binary came from, and how it was built. It’ll be connected to vulnerability scanning, so it’ll tell you if vulnerabilities were discovered in the software, or the version of it, that you’re trying to pull. It’ll also tell you if a vulnerability was fixed in a later release so that you don’t have to download and order it.
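The “was this fixed in a later release?” question can be answered mechanically once provenance and scan results live in one place. The sketch assumes a hypothetical `ReleaseRecord` shape, plain numeric `major.minor.patch` versions, and placeholder vulnerability IDs; it treats a vulnerability as fixed in the first later release that no longer lists it, which is a simplification of how real advisories record fixed versions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so that versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


@dataclass
class ReleaseRecord:
    """Hypothetical provenance entry for one released version."""
    version: str
    source_repo: str  # where the source came from
    commit: str       # which commit was built
    known_vulns: List[str] = field(default_factory=list)


def vuln_report(records: List[ReleaseRecord], version: str) -> Dict[str, Optional[str]]:
    """Map each vulnerability affecting `version` to the earliest later release
    that no longer lists it, or None if no such release exists yet."""
    ordered = sorted(records, key=lambda r: parse_version(r.version))
    current = next(r for r in ordered if r.version == version)
    later = [r for r in ordered if parse_version(r.version) > parse_version(version)]
    return {
        vuln: next((r.version for r in later if vuln not in r.known_vulns), None)
        for vuln in current.known_vulns
    }


if __name__ == "__main__":
    records = [
        ReleaseRecord("1.9.0", "https://example.com/lib.git", "aaa", ["VULN-1"]),
        ReleaseRecord("1.10.0", "https://example.com/lib.git", "bbb", ["VULN-1", "VULN-2"]),
        ReleaseRecord("1.10.1", "https://example.com/lib.git", "ccc", ["VULN-2"]),
    ]
    print(vuln_report(records, "1.10.0"))  # {'VULN-1': '1.10.1', 'VULN-2': None}
```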
That single place is what we are calling the provenance log. This provenance log is the crux of the system, so it needs to be immutable and easily distributable, and that’s where we are leveraging the base technology that cryptocurrency made super popular: the blockchain, an immutable ledger, so that this information stays intact and cannot be modified. Any tampering with this information will flag malicious activity against the Pyrsia network itself, and the network can then discard those updates and make decisions accordingly. That’s how trust will keep growing, because the immutable ledger protects the network against such attacks. And from the get-go, Pyrsia needs to be really easy to install. As you’ll see, we can use the Pyrsia command line, which we have already started building, to fetch images and work with them.
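The tamper-evidence property of the provenance log comes from chaining hashes: each entry’s hash covers the previous entry’s hash, so editing any past entry invalidates every link after it. The toy, single-node Python sketch below illustrates only that property; it leaves out distribution and consensus (which the talk attributes to AlephBFT), and `ProvenanceLog` and its payload fields are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash an entry's payload together with its predecessor's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Append-only, hash-chained log kept on a single node (a toy ledger)."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, payload: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = entry_hash(prev, payload)
        self.entries.append({"prev": prev, "payload": payload, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute every link; an edited entry breaks the chain from there on."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or entry_hash(prev, e["payload"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True


if __name__ == "__main__":
    log = ProvenanceLog()
    log.append({"package": "example/image:1.0", "event": "built", "commit": "abc123"})
    log.append({"package": "example/image:1.0", "event": "scanned", "result": "clean"})
    print(log.verify())                               # True
    log.entries[0]["payload"]["commit"] = "evil999"  # rewrite history
    print(log.verify())                               # False
```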
But we are also making sure you don’t have to change the tooling you already have in place. If you are doing docker pull today, we don’t want you to change that, because that is the hard part. CI/CD systems run many more things and are harder to change than a developer machine, so we are less worried about what happens on a single developer machine. But if your CI/CD system uses a particular open source Docker image and you have to switch from Docker to the Pyrsia command line, that is a no-go. So Pyrsia will be transparent to you when you use it in your CI systems. On top of that, the Pyrsia command line will give you more information, including the provenance log and the intelligence you need to build other automation.
So it’s going to be really easy for you to use. We have started building the minimum viable product, as they say, and are starting on the first and second integrations. We started with Docker, and we have a working demo that goes like this. We bring up two Pyrsia nodes, one of which connects to Docker Hub and acts as a proxy. In your CI system you continue to use docker pull, after configuring Docker to use Pyrsia. When you pull, Pyrsia acts as a cache, so the next time you use that image, or on subsequent builds, you don’t have to go to Docker Hub. If Docker Hub is down, that’s okay for the time being, and you can still continue to run your CI/CD system.
If another system needs to run the same image and is on the same network, or can connect to the same Pyrsia node peer-to-peer, it doesn’t need to depend on Docker Hub either. This especially helps when you have to download really huge images and network traffic is a challenge. The demo is on YouTube, so you can watch it and make sure you’re comfortable with it. You can also follow the steps on our website to run the demo yourself, and I encourage you to do that and give us feedback on how it works. Over the last two or three months, we have made changes to the demo so that it works more and more smoothly. Many other people have tried it, so it’s a very feedback-driven process, and I have realized that the documentation I wrote keeps changing from time to time.
So we appreciate any feedback you want to give; tell us if we built the demo well or did a bad job on the documentation. We want this to be a community effort, instead of us saying this is how you use it, and more and more community driven.
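The Docker demo described above boils down to a pull-through cache with a peer lookup. Here is a rough Python model of that behavior, with a hypothetical `CachingNode` class and an upstream fetch function standing in for Docker Hub: a pull checks the local store, then peers, and only then goes upstream, so a second node keeps working during an upstream outage. It is an assumption-laden illustration, not how Pyrsia is configured or implemented.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical upstream fetch: returns image bytes, or None if the registry is down.
Fetch = Callable[[str], Optional[bytes]]


class CachingNode:
    """Toy model of a node that proxies and caches an upstream registry."""

    def __init__(self, upstream: Optional[Fetch] = None,
                 peers: Optional[List["CachingNode"]] = None) -> None:
        self.upstream = upstream
        self.peers = peers or []
        self.store: Dict[str, bytes] = {}
        self.upstream_calls = 0

    def pull(self, image: str) -> bytes:
        # 1. Local cache: repeat pulls never leave the node.
        if image in self.store:
            return self.store[image]
        # 2. Peers on the network that already hold the image.
        for peer in self.peers:
            if image in peer.store:
                self.store[image] = peer.store[image]
                return self.store[image]
        # 3. Only then go upstream (e.g. Docker Hub), and cache the result.
        blob = None
        if self.upstream is not None:
            self.upstream_calls += 1
            blob = self.upstream(image)
        if blob is None:
            raise LookupError(f"{image}: not cached and upstream unavailable")
        self.store[image] = blob
        return blob


if __name__ == "__main__":
    hub_up = {"ok": True}

    def docker_hub(image: str) -> Optional[bytes]:
        return f"layers of {image}".encode() if hub_up["ok"] else None

    proxy = CachingNode(upstream=docker_hub)      # node connected to the hub
    ci_runner = CachingNode(peers=[proxy])        # second node, no hub access
    proxy.pull("library/alpine:3.16")
    hub_up["ok"] = False                          # simulate a Docker Hub outage
    print(ci_runner.pull("library/alpine:3.16"))  # b'layers of library/alpine:3.16'
    print(proxy.upstream_calls)                   # 1
```

For brevity the sketch trusts whatever a peer returns; in a real network each blob would also be checked against its content digest before being cached.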
A little bit about what we are building, what the guts are, and what the architecture will look like. Like I said, it’ll be based on a provenance log where you can ask those difficult questions about your SBOM. There’ll be a command line interface, and we are also planning a desktop client, but that’s a little further in the future. We started with the Docker integration, which is ready. I highlighted the Conan integration in the same blue because, as JFrog, we know Conan and the C/C++ community, and we think we can build it.
Everything else may look great in our minds, but for it we need support from the community. We know the Java world, but we are not experts, so we need community members like yourself to join us and tell us how we should build it. We have started building the basics of Maven and Gradle integration so that the Java community can integrate, but we really need your help. So if you’re passionate about any of these languages, or if you don’t see a language you are passionate about, please join Pyrsia in the ways I’m going to show in a few minutes and tell us how you would like to help.
In some of the talks we did, people asked us what the security model is, and there are some questions we have to answer. There are language ecosystems that can produce reproducible builds, and in that case it is very easy to build consensus among the build nodes and prove that a certain binary is correct. For languages where reproducible builds are not possible, we rely on trusted registries like Docker Hub. In the case of Docker, we fall back on Docker and say, “Hey Docker, we built this image; is it similar to yours?” We then plan to do some comparison so that we know it is similar and can provide the same binary as a result. We also rely on proven systems like GitHub, which require multifactor authentication and the SSH and GPG keys you sign in with, so that the source itself is verified. We plan to add to that, but nothing that will significantly change that side of things; we will add more stringent security requirements on top of what already exists.
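The talk leaves the “is it similar to yours?” comparison open, so the following is only one possible approach, labeled as an assumption: treat byte-identical builds as consensus, and otherwise compare per-file content hashes against the trusted registry’s copy while ignoring timestamps, a common source of non-reproducibility. The `Artifact` model and `verify_build` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical artifact model: file path -> (file bytes, modification time).
Artifact = Dict[str, Tuple[bytes, float]]


def content_manifest(artifact: Artifact) -> Dict[str, str]:
    """Path -> sha256 of the file contents, deliberately ignoring timestamps."""
    return {path: hashlib.sha256(data).hexdigest() for path, (data, _mtime) in artifact.items()}


def verify_build(builds: List[Artifact], registry_copy: Artifact) -> str:
    """Classify independent builds of one commit.

    "consensus"        - builds are byte-identical (reproducible ecosystem)
    "matches-registry" - builds differ only in ignored metadata and their
                         contents match the trusted registry's copy
    "rejected"         - anything else
    """
    if not builds:
        raise ValueError("need at least one build")
    if all(b == builds[0] for b in builds):
        return "consensus"
    reference = content_manifest(registry_copy)
    if all(content_manifest(b) == reference for b in builds):
        return "matches-registry"
    return "rejected"


if __name__ == "__main__":
    node_a = {"bin/app": (b"\x7fELF app", 100.0)}
    node_b = {"bin/app": (b"\x7fELF app", 200.0)}  # same bytes, different mtime
    registry = {"bin/app": (b"\x7fELF app", 300.0)}
    print(verify_build([node_a, node_b], registry))  # matches-registry
```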
So to get started, you can install Pyrsia; a release is available on our GitHub repo. You’ll get some basic commands you can use with the Pyrsia command line interface. You will need to configure Docker Desktop a little; the instructions are in the demo documentation, but you do not need to change your CI/CD scripts. You can just keep doing docker pull, it’ll magically work, and you’ll get the efficiencies that Pyrsia provides.
A little bit about what is inside Pyrsia. We chose Rust as the development language because, from the get-go, we wanted to support multiple operating systems and we wanted compact binaries with a small attack surface, and Rust seemed to be the right language for that. Having said that, we are learning a lot about Rust.
So if you don’t care about what else we are building but are passionate about Rust, we would appreciate help in that direction as well, and if you care about both, that is even more amazing. We are hiring people to help us with how we are building things, so if you’re interested in joining the team in different ways, let us know. We have built the integration with Docker, and we are doing a similar one for Java, which is just beginning. Pyrsia builds on libp2p, the peer-to-peer networking library from IPFS, a project that already exists and has been successful. libp2p is open source, we are using its Rust implementation, and we have found that we are pushing its boundaries, so we are making contributions back to libp2p through this effort.
For the immutable ledger implementation, we are using AlephBFT as the consensus mechanism, so that we can prove that whatever we commit to the network is trustworthy. What is coming up next? We are working really hard on making the provenance log usable so that you can use it to build your SBOM. You’ll be able to query elements, make security decisions based on them, and write automation on top of the provenance log to make release decisions. That’s where we are going.
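As an illustration of the kind of release automation the provenance log is meant to enable, here is a hypothetical gate in Python. It assumes each dependency’s provenance has been summarized into a `Provenance` record (an invented shape) and blocks the release if any dependency lacks a verified build or has open vulnerabilities. The `sbom` helper returns a bare `name@version` list; a real SBOM would use a standard format such as SPDX or CycloneDX.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Provenance:
    """Hypothetical per-dependency summary derived from a provenance log."""
    name: str
    version: str
    verified_build: bool              # independently rebuilt and agreed on
    open_vulns: Tuple[str, ...] = ()  # known, still-unfixed vulnerabilities


def sbom(deps: List[Provenance]) -> List[str]:
    """Bare-bones component list: sorted 'name@version' strings."""
    return sorted(f"{d.name}@{d.version}" for d in deps)


def release_gate(deps: List[Provenance]) -> Tuple[bool, List[str]]:
    """Return (ok, problems); ok is False if any dependency is unverified
    or carries open vulnerabilities."""
    problems: List[str] = []
    for d in deps:
        if not d.verified_build:
            problems.append(f"{d.name}@{d.version}: no verified build")
        for vuln in d.open_vulns:
            problems.append(f"{d.name}@{d.version}: open {vuln}")
    return (not problems, problems)


if __name__ == "__main__":
    deps = [
        Provenance("libalpha", "1.2.0", verified_build=True),
        Provenance("libbeta", "0.4.1", verified_build=False, open_vulns=("VULN-7",)),
    ]
    print(sbom(deps))
    ok, problems = release_gate(deps)
    if ok:
        print("release allowed")
    else:
        print("release blocked:", problems)
```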
We are also working on delivering the high throughput we are promising, leveraging the peer-to-peer network to stream large binaries to you quickly.
And we are working on the build node side of things, where we will build binaries for different languages on Pyrsia.
In terms of collaborating with the community, we are already using the libp2p Rust implementation. We are talking to the group that builds it, and we have actually ported a couple of changes from the Go implementation to the Rust implementation because they were missing there. We are using the AlephBFT Rust implementation as well and working with that community to understand how it fits with Pyrsia. And we are closely looking at Sigstore and Notary v2 for future integration, because we don’t want to reinvent that side of things. Sigstore and Notary are doing what is right for signing things on the source side, and we don’t want to ignore that. We want to leverage it and put Pyrsia’s rigor on top of it, so that you can use the automation you have with Sigstore together with Pyrsia to build your supply chains.
So just a shout-out: Pyrsia is open source. We have had way more than 25 public meetings, all published under the OpenSSF, under the Linux Foundation, and you can find all of that information on our Slack channel. We have a bunch of contributing organizations, and the roster has grown: recently Huawei, Futurewei, and Oracle joined us, and they’re already starting to contribute to how we build integrations with different languages. We are pretty active on GitHub, so if you want to come chat with us or send a PR, we welcome all those interactions. To get involved, just go to our website. At the bottom of the website you’ll find links to YouTube, Twitter, Slack, and Google Groups so that you can join us in the right way. Download and install Pyrsia, and give us feedback through our Twitter handle or any other means.
Join team meetings, and listen to past recordings to learn how we are doing things. We have marked some good first issues if you want to start coding as well. We welcome everyone, in whatever way you want to participate.
And to summarize, supply chain attacks are still here. With or without COVID, the attackers haven’t stopped, and many of them, I would say the majority, take advantage of vulnerabilities in the open source landscape. So remember that. Even NSA hackers can’t get enough sleep, because supply chain attacks are happening every day. We need to secure the supply chain now, and for that we need every single one of you. At JFrog, we believe that every one of us is a super frog, hence this pretty picture. So we want all of you super frogs to come join us and help us build a better tomorrow.
Thank you very much. Find us on our website, pyrsia.io, very easy, or find us on Twitter at PyrsiaOSS. Thank you very much; I appreciate your time coming here to listen to me and humor us on the idea of Pyrsia and the open source software supply chain. Thanks.