Bringing Continuous Delivery to Open Source @ Continuous Delivery Summit

November 28, 2022

Bringing Continuous Delivery to Open Source – Sudhindra Rao, JFrog

Open-source software plays an essential role in the supply chain of modern software development. Proprietary software is typically composed of 75% or more open-source dependencies. In open source software we rely on ad-hoc methods of software process and quality control.

A few of those ad-hoc methods have received much attention in the last few years: the need for MFA on source repositories, the need to sign every binary, and the need to verify such signatures and build trust in open source packages.

In this talk we want to cover different tools that make these methods easy to implement and help you decide which ones fit your way of working. We will talk about the recent attacks on open source software, the SLSA framework, Sigstore, Notary, and Pyrsia. We will also highlight how the Continuous Delivery of open source often does not receive the same attention and rigor as that of proprietary software. We discuss how to apply this rigor and enjoy the same benefits with open source.

For more Continuous Delivery Foundation content, check out our blog: https://cd.foundation/blog/

Speakers

    Sudhindra Rao

    Development Manager

    Sudhindra Rao currently works at JFrog as Development Manager, helping build communities and partnerships to provide visibility into JFrog's liquid software mission. He has worked as a developer/architect on critical business applications, developing in multiple languages including Go, Ruby, and Java. After working in traditional application development, Sudhindra became part of the Pivotal team and built their Kubernetes (k8s) platform offering. Sudhindra's diverse project experiences include building an application for the largest publishing company in Chicago, a large datacenter automation effort, a large auctioning system, and a voter campaigning application for US national elections.

    Video Transcript

    While you read this, a quick trivia question: how many of us have played badminton? Quick raise of hands. Does anyone know the original name of badminton? This is going somewhere. No? I come from the town where badminton started. The original name of badminton is Poona, and that’s the place I come from. The reason I mention that is that during this talk you will also see me sharing some pictures and imagery of that city, which I love, and hopefully that will add to the context of what I’m going to talk about.
    I’m Sudhindra. I work at JFrog. Currently I am the development manager for Pyrsia, the project that we just announced as part of CDF. I wrote this talk when we were researching the reasons for securing the open source supply chain. We were exploring what is happening in the open source world and how supply chains are being managed, and that’s where some of these ideas came to mind.
    I need your help here. Instead of just writing an agenda, I thought I should phrase it in a way that is more participative. During this talk I’ll try to share some of my experiences. Hopefully I’ll be able to share what I have learned about how open source software is built and distributed, and you will also learn about some of the community efforts. But I need your help to make it a really great talk, and it can be a great talk if you are able to participate, take this message back, and come back next year bringing contributions to the projects that I’m going to talk about.
    Let’s go back a little bit into my experiences. I came from a world where I was working on Ruby projects, where two-minute builds were all the rage and people used to talk about how they shaved three seconds or five seconds off a test. More recently I was building a Kubernetes platform, with a name that ends in KS. The task of that Kubernetes platform was to be able to run on any cloud of your choice and also on-prem. Let’s walk through the journey that we took: taking the newest release of Kubernetes, packaging it, adding integrations, and what that looked like. Hopefully this will give you an insight, or it may be similar to what you are using to build your proprietary software.
    We had some source code and some components that came together. There were integrations on that Kubernetes platform, like networking and observability, that are not part of the Kubernetes platform itself. All of these are built, and each takes some number of minutes to build individually. The really heavy task comes when we start building these on the variety of configurations they need to be deployed on. The reason I stopped at 49 is that that was the number of configurations we were building on this platform. Each of them then went through some acceptance testing, some of which was manual, and then we used to publish a downloadable binary that people could install on the cloud of their choice. To put this into context, each of these pipelines took eight hours, and that added up. It used to be a whole three-day effort to bring up a new version and then figure out whether things broke, and so on. This is the kind of rigor we put into developing proprietary software.
    This is a dedicated team trying to build things. Everyone knows everyone, or at least what they are doing can be traced back. Every commit can be traced back to the system it came from, and if a component build failed in integration, we could go back to the component team and say, hey, this is the PR it comes down to. And this took about three months of effort. This is the continuous delivery world.
    But it didn’t go to production after it went to a public download. Our customers used to download it onto their staging, because it’s a platform-level deployment, and then bugs would be found and we would do some work going through the integration loop. Then people used to manually go through and figure out the provenance of every single thing that became part of that build: every single piece of open source software, every small library that had been used, all the way to the top. This is manual because there is no infrastructure that will give you all of this information in a trustworthy way. Some of this was required for security, like identifying CVEs, and some of it was required for legal purposes, to ensure that we didn’t violate copyright.
    Are we done? We’re still not done, because there is more process: some marketing needs to happen, some legal documents need to be signed, if you are in a HIPAA or similar environment then some signatures have to happen, and then general availability happens. So this is the rigor that we put in. I know this is an extreme example, but this is the rigor we put in to bring continuous delivery to proprietary software.
    Now let’s reflect on what it means, or what we do, to get to the same level in open source, and whether we are even there in open source.
    The reason that these are good practices is that you have a dedicated team, all of them know each other, and there’s inbuilt trust. How do we do that today in open source?
    Here is some of the research data I found in my references: 78% of the code that is currently shipped is open source, meaning that out of all the code bases that are shipped, 78% of the code in those code bases is open source.
    Almost all projects are dependent on some open source somewhere. We have come a long way in adding automation at various levels to help with releasing it, and the Linux project itself is a very good example. But how does it scale?
    When we decide to use open source, we just think, oh, it worked for Linux, so it must work for other projects. So we do a search and pick the top result, whether by downloads or GitHub stars or whatnot, or the most used. That is our quick rubric to identify a good project. We find badges like that, and if it is green, or shows some SLSA level, or something flashy, we go after it and pick that one. Or we look at the GitHub stars, like I was saying, or if there are more forks, that means there’s more community around it. But is that enough?
    This is what I think of when I think about open source: open source is infrastructure. This is a picture of the Mumbai-Pune Expressway, which I love to ride. It is really beautiful during the rainy season, with rain coming down and clouds, and it’s amazing. It works really well when infrastructure is in good shape like this. What happens when that is not the case?
    This is another picture from Pune, of a really good locality with famous restaurants. It survived all the rains for the last 50 years, but last week it rained for an hour or two and the whole system just choked, because there was no obvious drainage for the water from such torrential rain. This is similar to what we see in open source. Log4j is hacked, and that’s it: the thing that worked for us for years and years is now in trouble, and we don’t have a way to protect it.
    To improve the infrastructure, some solutions are obvious: you need to put in the right drainage system so that this doesn’t happen in the future. We need similar things in open source. If we look deeper in a similar way, this is what I see in Log4j. I know that people are working on this, so the build workflow is disabled right now, maybe while they work through some things, and the last few builds are red. This is what my software depends on, but this is the trust level I have.
    The Rascal REST client, the one I showed, had a name-squatting incident last week. And the third thing I showed you, the SLSA level 3 project, is actually not a project. It’s a no-code ping server that was demonstrated by Kelsey Hightower in a recent talk, so I just picked that. It doesn’t even exist, because he was demonstrating how you can fake SLSA levels just by putting up badges, and that’s what it is. We rely on those kinds of trust mechanisms, and that is very weak.
    Some more emphasis on how open source is ingrained: if you build a web app, that’s how much open source software is part of the web app.
    Look at very famous libraries that everyone here knows, like pandas, which people probably know or have used. It shows that it has 1,400 committers, but the actual work is being done by four people. How can you expect four people to have a complex, mature CD system? You can’t. They need support.
    And there is this inbuilt trust: oh, what is this person going to do wrong if I give them access? Well, this person can write a script that steals Bitcoin. That’s what happened on a recent project, the event-stream library, which was very popular. The original maintainer said, hey, I’m getting help, maybe I should just trust them, what could they do wrong? This is what happens if you put trust in people who really don’t deserve it.
    This is another picture, of floods that happened in Bangalore. This is not to critique the infrastructure but to showcase the parallels. This was luxury housing built on top of lakes and reclaimed land, and this is what happened after three days of rain: people had to be rescued by boat. This is the kind of thing that is happening with our software: somebody just giving the keys of the city to people who don’t know how to handle infrastructure. And here we are.
    This is the most used slide, I think, in the talks we had today: dependencies, these small dependencies that we blindly use. If you run a hello world application in npm, it seems there are more than a thousand dependencies that come with it, and we don’t pay attention to those. These are just the first-level dependencies, and there are transitive dependencies too.
    The other day I was assessing why my application was using OpenSSL. I found out that OpenSSL is required by native-tls, and that is required by Tokio, which is an async library for Rust. I only picked Tokio, and I never noticed that OpenSSL is now required. It’s important in this context because OpenSSL is also difficult to compile for different operating systems, which is what we were trying to do. Dependencies are so hidden that we really don’t know where they came from, and we don’t pay attention to how they’re being used.
    I think this is a well-known story, where a bridge in Pennsylvania was on its knees already, and the infrastructure bill was the impetus to fix these kinds of problems. We need that kind of emphasis to improve the CD etiquette for open source.
    Generally people talk about GitHub when we talk about open source, but here is research that compared GitHub with other places where open source software lives. It found that the number of committers per project stays more or less the same, between two and ten. It’s not a big team. They are doing it on more of a philanthropic basis, on the side. It’s not their full-time job, but they’re still trying to support it. So we need to put more people and more energy there.
    So, after all that doom and gloom, what can we do about fixing the CD etiquette for our open source? There must be some things we can do, and I want to share some of the work that has been happening before the pandemic and even through the pandemic.
    At JFrog we believe that in our future we will have liquid software: software with automatic updates, continuous in that fashion. For that we need a supply chain that is automated as much as possible, 100% automated if possible, removing the human parts from the build pipeline that I showed. It needs to be trustworthy; it can’t depend on badges and stars. And it needs to be dependable: I need to know every single detail about every single dependency that I’m using. Also, this system cannot go down. You cannot have a situation where npm or us-east-1 is down and it stops you going to production. It’s not so much about continuing development, but about continuing to release to production.
    In the last few years there has been more emphasis on protecting the supply chain. There are tools that allow you to write a software bill of materials. There is also government emphasis and regulation on how to improve cybersecurity etiquette, and some of this regulation is required so that the practices start flowing. But this is not going to be everything: regulation is going to operate at a higher level, and it’s not going to flow into practice as fast as we want, so there needs to be action from our end.
    This is the most talked about research on the software supply chain, and the first thing it does is lay out the land, to showcase the places where software is vulnerable. This gives us something to play with, to figure out whether our tools are capable of handling all this. If you compare this with my original CD flow, this is much less complicated, and in a real-life situation there may be more attack vectors that you will need to handle, but it is a good reference. So we’ll use this reference to understand what has been happening in this field.
    There has been some work on signing the stuff that goes into production: sign your images, sign the Docker images that go into your Kubernetes deployment. Notary is one such project. It is now in its second phase, and the project is called Notary v2. The idea is to make sure that any image you put into production is already signed, cannot be tampered with, and so on. It allows you to do continuous delivery on that, because a lot of those signing and verification processes are now automatable. That tries to solve the very end of the supply chain, where the images are being shipped to your Kubernetes clusters. So that’s a good start.
    There’s another project called Sigstore, and it tries to do the same thing, signing, but for a much wider variety of ecosystems. Now you can sign your PyPI packages, your Ruby gems, your Java packages, and so on, and you can take it all the way through your supply chain into Kubernetes.
    So if you haven’t looked at these projects, go look at them, and at least find out how you can use them.
    Sigstore and the projects underneath it do a little more. Like I said, they sign, and you can verify all of those signatures at the end of the supply chain, and also at the beginning, when you start consuming: you can verify that the thing you’re trying to download is signed and that it came from a certain developer. Is that enough? There are all these things in between that need to be addressed, though.
    Tekton Chains is another project, which allows you to take that all the way through the chain for Kubernetes deployments that are on Tekton. Same thing: it tries to address one end of the supply chain, and we need that. We need to know what goes into our production.
    When we looked at the landscape, what we found was that we need something that allows you to build those dependencies better, and that’s when we started investing time in building this. We called it Pyrsia. What Pyrsia does is allow you to build from source: you can take any open source project and build it from source, you can figure out how it was built, and you can verify that it was built the way you wanted.
    So Pyrsia tries to help with the situation where the supply chain needs help with building from source, and with verifying and approving all the binaries that you’re going to use in production.
    One of the other things that Pyrsia does is that it doesn’t trust just one build system. It has an inbuilt mechanism to run the same build on multiple machines, randomly chosen across the network. Pyrsia runs on a peer-to-peer network, so these builds are sent across to different random nodes. They build, then they verify that they got the same result, and only those builds are submitted to the network, so that they can be downloaded by anyone who is looking for a binary. This is similar to what people are doing to ensure that there are no single points of attack, like the one that happened in the SolarWinds case, so this is something that they are investing in.
    Pyrsia also has a provenance log, so you can write automation to figure out where a certain thing came from. It’s not just the stars: it tells you that this binary was built from this Git SHA, these were the steps that were run in building it, and here was the end result. It is then stamped with the vulnerabilities known at that time. You can ask these kinds of questions, and what this allows you to do is write automation on top to actually make release decisions, so you don’t have to wait for somebody to go and do due diligence on whether this binary is something you can use.
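A release gate of the kind described here might look like the following minimal sketch. The record fields (`git_sha`, `build_steps`, `known_vulnerabilities`), the `release_decision` function, and the example artifact names and vulnerability ID are all hypothetical, chosen for illustration; this is not Pyrsia's actual provenance log format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry; not Pyrsia's real log format."""
    artifact: str
    git_sha: str
    build_steps: tuple = ()
    known_vulnerabilities: tuple = ()


def release_decision(record):
    """Return (ok, reasons): release only if provenance is complete and clean."""
    reasons = []
    if not record.git_sha:
        reasons.append("no source commit recorded")
    if not record.build_steps:
        reasons.append("no build steps recorded")
    for vuln_id in record.known_vulnerabilities:
        reasons.append(f"known vulnerability: {vuln_id}")
    return (not reasons, reasons)


# Illustrative records only: names, SHAs, and the vulnerability ID are made up.
old = ProvenanceRecord("example-lib:1.0", "abc123",
                       ("compile", "package"), ("EXAMPLE-VULN-1",))
new = ProvenanceRecord("example-lib:1.1", "def456", ("compile", "package"))

for record in (old, new):
    ok, reasons = release_decision(record)
    print(record.artifact, "release" if ok else "hold", reasons)
```

Because the decision depends only on the record, anyone reading the same log entry would reach the same release or hold outcome, which is the property the talk is after.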
    There are a few more projects that I wanted to talk about, because as I showed, it’s not enough to have this simplistic supply chain covered. To work well, Pyrsia needs reproducible builds, because only reproducible builds can be verified against each other. There is a project that the Google open source team is, I think, doing around reproducible builds. It’s a heavy lift, because people have a wide variety of ways to get to the same result, or they have very complex ways of getting to the build and do many other things in between. What the Google team is trying to figure out is the one script that you can derive to make that build reproducible.
    Another thing that we overlook is that GitHub stars and SLSA badges don’t tell you what is happening in real time with the libraries that you’re using. Let’s say you download an Alpine image 351: what vulnerabilities are you facing? Is that vulnerability in your development, staging, or production, and how are you going to handle it? So there is more happening around integrating vulnerabilities into the CD pipeline itself, and Pyrsia is going to do that. This is another project where you will be able to find out about the vulnerabilities right when you start downloading, and the Go team is trying to bring that into one platform. The Go team tries to address this problem for one ecosystem; the Pyrsia team tries to do it for many more ecosystems, not just Golang. But that’s a start, and this is something we need to put some emphasis on.
    To recap, I shared some projects that allow us to address some of these attack vectors. It’s not a complete list. There are many more things that need to be done, and there is automation that needs to be built on top of this, so that you can have a continuous delivery pipeline that looks more continuous and not as broken as it is today.
    As you see, there are still some gaps. How do we make sure that the source coming out of a source control system is secure? How do we make sure that the source control system is not compromised? There are some unsolved problems that other teams are working on, and we need many more tools to cover the entire supply chain.
    So here I want to summarize with a call to action. You should go back and find out whether you have a good continuous delivery plan for your open source. Have you used any of the tools that I mentioned? If not, do you find any opportunity to do that? Let’s do this as a community from the get-go. We found that people are doing these one-off solutions, and some of them work for them and some do not. But this is open source: we need to invest in this together and make sure that we are trying the same things and building the same trust mechanism, instead of bank A trying to invent their own and company B trying to invent their own. The time to act is now. The software supply chain is under attack, and we need to do it now.
    This is also an opportunity to wish a Happy Diwali to all of you. We are celebrating Diwali this week, and today happens to be the day when businessmen in India open new books and start new projects. So if you’re waiting for the right timing, today is a very good, very auspicious day to start contributing to any of these projects. Thank you.
    Good question. Actually, yes, Xray is very good at going through your registry or repositories and identifying what vulnerabilities exist. What it doesn’t do is tell you how the software was built, who built it, where it came from, or whether it was built on multiple systems. It is very much focused on vulnerability.
    yes yes
    Good question. So the question is, why would you not just use Xray instead of what we just talked about, instead of Pyrsia? The answer is that Xray is meant for your proprietary systems, and it works as a product once you have configured it for all the things that are important from your proprietary perspective. What we are saying is that we need that Xray etiquette to be used by everyone here. We can’t just have companies who have purchased Xray doing that for their open source individually and trying one-off solutions. What we are saying is that we need something that we can all trust, and trust in the same way.
    Pyrsia provides that, because then you are looking at the same provenance log that I am looking at, and we are making the same decisions. The provenance log is also telling me that my Log4j has a vulnerability and the new version doesn’t have that vulnerability, so it’s okay for me to release it, and you have the same information. That gives us much more control over how we deal with our open source. And for open source, we think the solution has to be open source; it cannot depend on the tools that companies are able to buy. There are some companies that do critical stuff but are not able to invest, like I said. Even the Log4j maintainers don’t have access to Xray, for example. We are trying to provide that via various means, but they won’t have that expertise.
    [Audience question, partly inaudible: beyond just Log4j, are you looking at all its transitive dependencies?]
    We will have the ability to look at those, yes, because we will build Log4j, and then we’ll build its dependencies, and the dependencies’ dependencies. Correct, yeah. So we will be able to, like with an effective POM. What we just released is the first level, and then you will see the nested structures come out of that as well. So you can ask questions about that nested structure: am I using OpenSSL? We can’t ask that question today, but you could ask that question of Pyrsia, and you will know.
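The "am I using OpenSSL?" question amounts to a search over a nested dependency graph. Below is a minimal sketch of that search, assuming the graph is already available as a plain dictionary. The `find_dependency_path` function, the `my-app` root, and the `serde` entry are hypothetical; the `tokio` to `native-tls` to `openssl` chain simply mirrors the example given earlier in the talk and is not taken from real crate metadata or from Pyrsia.

```python
from collections import deque


def find_dependency_path(graph, root, target):
    """Breadth-first search: return the shortest dependency path or None."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for dep in graph.get(node, ()):
            if dep not in seen:  # guard against cycles and repeated work
                seen.add(dep)
                queue.append(path + [dep])
    return None


# Illustrative graph only; it follows the chain described in the talk.
GRAPH = {
    "my-app": ["tokio", "serde"],
    "tokio": ["native-tls"],
    "native-tls": ["openssl"],
    "serde": [],
}

path = find_dependency_path(GRAPH, "my-app", "openssl")
print(" -> ".join(path) if path else "not a dependency")
```

Returning the whole path, not just a yes or no, shows which direct dependency pulled the library in, which is what the speaker had to work out by hand.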
    Any other questions? Hi.
    [Audience question, partly inaudible, about configuration files that are shipped with open source libraries or applications, and policies such as running as root or running as privileged.]
    Yeah, so the question is whether we are thinking about configurations and artifacts that are not traditional binaries. At JFrog we use the term binaries to cover any type of software that goes into production. So we consider a configuration file equivalent to a binary: it needs to be stamped, it needs to be verified, and it needs to be run on multiple machines to make sure that it produces the same result as well.
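The "same result on multiple machines" check can be sketched as comparing content digests of each node's output and accepting the artifact only when every node agrees. This is an illustration under assumptions, not Pyrsia's actual protocol: the `agreed_digest` function, the `min_nodes` threshold, and the node names are hypothetical, and requiring unanimous agreement is a choice made for this sketch.

```python
import hashlib


def artifact_digest(data: bytes) -> str:
    """SHA-256 hex digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def agreed_digest(outputs, min_nodes=3):
    """Return the common digest if enough nodes built identical bytes, else None.

    `outputs` maps a node name to the bytes that node produced.
    """
    if len(outputs) < min_nodes:
        return None  # too few independent builds to trust the result
    digests = {artifact_digest(data) for data in outputs.values()}
    if len(digests) == 1:
        return digests.pop()
    return None  # at least one node produced a different artifact


# Illustrative build outputs from three hypothetical nodes.
matching = {"node-a": b"config-v1", "node-b": b"config-v1", "node-c": b"config-v1"}
tampered = {"node-a": b"config-v1", "node-b": b"config-v1", "node-c": b"config-v1!"}

print(agreed_digest(matching) is not None)
print(agreed_digest(tampered) is not None)
```

A check like this only works when the build is reproducible, since any non-determinism in the output bytes would look the same as tampering.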
    yeah yeah
    We started small with Pyrsia: we started with Docker images and Java builds, but we are expanding to other ecosystems that go into binaries. So even implementing these kinds of policies can become part of Pyrsia.
    [Audience, partly inaudible] ...rather than waiting until it’s already been published.
    I think the question is that most of this should be caught when we are trying to build, instead of when we are trying to publish, and I completely agree with you. Most of this should be caught when we start to ingest it, but we don’t have tools that allow that deep an analysis, and that is what we are saying here.
    Right, right, yeah. Exactly, and that’s the problem we see. Some ecosystems do have mature tools, like Maven and what you mentioned, Kyra. They do have some tools that allow you deeper insight, but there is no good singular platform that allows you to answer that. Awesome.
    Okay, thank you. Thanks. [Applause]