The Software Extinction Event That Wasn’t

Think CrowdStrike’s recent update impact… x1000

Note: This blog post was previously published on DevOps.com

Imagine if the world’s most pervasive programming language, used in the majority of organizations, services, websites and infrastructure today, were itself made malicious.

Cybersecurity researchers from JFrog recently discovered a GitHub Personal Access Token in a public Docker container… due to the popularity of Python, inserting malicious code that would eventually end up in Python’s distributables could mean spreading your backdoor to tens of millions of machines worldwide! 

 Bruce Schneier, Cybersecurity Guru, NY Times best-selling author

Picture this: you wake up in the morning, and notice your mobile device has no connectivity. You check your wifi. It’s also out. You turn on the TV to see if there’s any big news – only a couple stations are even functioning and only over the air via antenna. Before you can get oriented, the power goes out.

It feels like the beginning of every disaster movie, except it’s happening in real life. What’s to blame? An EMP? Terrorism? Nuclear war?

No, but something potentially just as devastating.

A widespread cyberattack that derails nearly every aspect of billions of lives – even digital services based in space are vulnerable.

What enabled this unimaginable disaster?

For the slightly more technical: one wayward Docker container hosted on a public hub, which contained an access token granting administrator access to the GitHub repositories of Python, PyPI and the Python Software Foundation. In the wrong hands, this leaked token would have provided access to the entire Python infrastructure, allowing a nefarious actor to potentially – very sneakily and quietly – take control of systems running Python.

In even simpler terms: an engineer accidentally posted login/access information online that would allow someone to get a “back door” into many of the world’s computer systems.

It’s not science fiction. It almost happened.

And it could have taken down some of the most critical systems in the developed world. At JFrog we are not prone to hyperbole, but this wasn’t an exaggerated risk. It was a very scary reality. In fact, if not for the dedicated work and investment from security researchers at JFrog, you may have tragically experienced it.

What is Python?

Python is a multi-purpose programming language that is designed to be flexible and easy to use. As such, it is a default language and framework for a wide variety of computing systems from website development to software applications to workflow automation to data analysis – even as the default language that enables many of the AI technologies used today. Due to its flexibility and simplicity, it is often one of the first languages used by developers as they learn to code. Software libraries and other packages written in Python are some of the most prevalent open source components used by modern developers.

Who relies on Python?

To understand the potential scope of this attack, it’s important to point out the prevalence of Python in digital systems. It is not a niche use case. The vast majority of critical computing systems utilize Python in some form and could be targets for such an attack. Think CrowdStrike’s recent update impact… x1000.

  • YouTube, Instagram, Facebook, Reddit, Pinterest and other popular sites and services are written largely or entirely in Python (NIT Academy, Probytes). Say goodbye to the majority of services that run and display social media as they black out in an attack.
  • Nearly all machine learning models and AI-enabling software utilize Python to execute. (NIT Academy) Imagine if suddenly everything that runs on AI is taken over by a nefarious actor and trained or retrained with any data they wish.
  • Amazon, Google, Microsoft – All the cloud services and infrastructure from the major cloud players could be blacked out, including all applications and services that run on them. This alone would bring global commerce and interactivity to a halt. (Learnenough, Manpreet Singh)
  • P2P financial services like Venmo, and traditional firms with TRILLIONS of dollars under management (such as JPMorganChase and Goldman Sachs) utilize Python heavily. This attack would have crashed global financial markets, as well as crippling most of the world’s major stock exchanges. (Planeks)
  • Governmental organizations and election systems rely on Python, creating a very real possibility of complete government shutdowns and democratic disruptions. (NIT Academy)
  • Space exploration, satellite-based networks and critical research missions (including NASA, space telescopes, SpaceX and more) all depend on Python, making it very likely all advances in these areas would be orphaned for an extended period (perhaps permanently), leaving a very lonely Mars Rover on the red planet, even dangerously trapping astronauts in space. (Builescu Daniel)

In this disastrous case, the “end of the world” would come about not with a whimper, but a cyber-bang that crashed most global systems. Identities and information all stolen or missing. Companies destroyed, fortunes wiped out, records deleted, medical care unobtainable, workers unpaid or without access to funds. It could take decades to recover fully.

Thankfully it didn’t happen.

But it was a close call.

How was the potential attack prevented?

JFrog invests heavily not just in products and product development; we also take our responsibilities to the community very seriously through our security research team’s efforts. Part of that ongoing effort is to proactively download and examine some of the world’s most popular packages for potential vulnerabilities. Some discoveries, while important, have a limited blast radius. Some affect specific operating systems, as with the CrowdStrike issue, or certain solution sets, as with SolarWinds. In rare cases, issues are more widespread, as with Log4Shell. This Python takeover could have made them all look like child’s play.

As described in our technical blog on the discovery, JFrog’s secrets scanning engines detected a “classic” GitHub token in a compiled binary file inside a Docker container hosted in the public Docker Hub repository. The risk with “classic” GitHub tokens is that, unlike newer, more fine-grained tokens, they grant permissions across all repositories the user has access to.
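
The difference between the two token types is visible in the token string itself. The snippet below is a minimal sketch, not JFrog’s scanning engine: it assumes GitHub’s published prefixes (`ghp_` for classic personal access tokens, `github_pat_` for fine-grained ones), the length bound used for fine-grained tokens is a deliberately loose assumption, and the helper name `classify_github_token` is hypothetical.

```python
import re

# Assumed formats, based on GitHub's published token prefixes:
#   classic personal access token: "ghp_" followed by 36 alphanumerics
#   fine-grained token: "github_pat_" followed by a longer body
# The fine-grained length bound below is a loose assumption for illustration.
CLASSIC_PAT = re.compile(r"ghp_[A-Za-z0-9]{36}")
FINE_GRAINED_PAT = re.compile(r"github_pat_[A-Za-z0-9_]{20,}")


def classify_github_token(candidate: str) -> str:
    """Return 'classic', 'fine-grained', or 'unknown' for a candidate string."""
    if CLASSIC_PAT.fullmatch(candidate):
        return "classic"
    if FINE_GRAINED_PAT.fullmatch(candidate):
        return "fine-grained"
    return "unknown"


if __name__ == "__main__":
    # Obviously fake placeholder values, not real credentials.
    print(classify_github_token("ghp_" + "a" * 36))
    print(classify_github_token("github_pat_" + "A" * 30))
```

A scanner that flags the classic format can prioritize those findings, since a classic token is the kind that reaches every repository its owner can access.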

[Figure: PyPI Supply Chain Attack Vector]

The discovered token had permissions to change any piece of code related to Python, from the simple Python executable to any Python package that’s hosted on the official PyPI repository.

[Figure: The leaked Python admin token, inside a compiled binary file]

This showcases the “daisy chain” of consequences that could have come about, as an infection moves from one system to the next.

Why only scanning source code guarantees blind spots

In the case of this secret leak, the token was found ONLY in the compiled binary file. It was not present in the source code for this Docker container, which suggests the developer may have temporarily used the credentials to build the binary, then cleaned up the code before shipping it alongside the binary in the Docker container, forgetting to scrub the artifact.
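
To see how a secret can outlive the source it came from, consider the following minimal sketch. It is an illustration under stated assumptions, not a reconstruction of the actual incident: the file names, the `AUTH_TOKEN` variable, the `find_tokens` and `demo` helpers, and the placeholder token are all hypothetical. It compiles a script containing a fake token with the standard library’s `py_compile`, scrubs the source, and then scans both files as raw bytes.

```python
import py_compile
import re
import tempfile
from pathlib import Path

# Byte-level pattern for the assumed classic GitHub token format.
CLASSIC_PAT_BYTES = re.compile(rb"ghp_[A-Za-z0-9]{36}")


def find_tokens(path: Path) -> list:
    """Scan a file as raw bytes so compiled artifacts are covered too."""
    return CLASSIC_PAT_BYTES.findall(path.read_bytes())


def demo() -> tuple:
    """Return (hits in scrubbed source, hits in stale compiled artifact)."""
    fake_token = "ghp_" + "x" * 36  # obviously fake placeholder
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "build.py"
        compiled = Path(tmp) / "build.pyc"

        # Step 1: the secret is briefly present in the source and gets compiled.
        source.write_text(f'AUTH_TOKEN = "{fake_token}"\n')
        py_compile.compile(str(source), cfile=str(compiled), doraise=True)

        # Step 2: the source is cleaned up, but the artifact is not rebuilt.
        source.write_text('AUTH_TOKEN = ""\n')

        return find_tokens(source), find_tokens(compiled)


if __name__ == "__main__":
    source_hits, artifact_hits = demo()
    print("source hits:", source_hits)
    print("artifact hits:", artifact_hits)
```

In this sketch the scrubbed source comes back clean while the stale `.pyc` still carries the string constant, which is why a scan limited to source files would report nothing.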

Why does this matter? According to JFrog’s recently published Software Supply Chain State of the Union report, a staggering 27% of companies use ONLY code scanning for software supply chain security, and only 56% use both code and binary scanning together. This means that 44% of companies, nearly half, have a glaring blind spot that might have missed this leak.

[Figure: Software Supply Chain State of the Union Report]

What is the community saying?

Thankfully, the gravity of this scenario has not been lost on the community, spurring all of us not only to provide even better tools, but also to be more diligent in our approaches.

What is most interesting, however, is how JFrog found the token. They discovered it in a compiled Python binary file! It turns out that searching for secrets in the code itself may not be sufficient. Artifacts in the binaries themselves can also pose a serious threat to supply chains!

Maciej Markiewicz, Product Security at Egnyte

Interestingly, this secret wasn’t exposed directly in the code base. That’s something developers, especially those mindful about security, are well aware of. But you might not have known that when .pyc files are generated, they can inadvertently store sensitive information, including secrets. Jfrog (sic) notes that many secrets they uncover are found within binary artifacts. FWIW, I’m not calling out the admin for this. It’s an honest mistake; they wrote an excellent summary and investigated what they could, identifying no indicators of malicious usage.

 Kyle Kelly, Security Researcher

The full blog post by the Python engineer who inadvertently made the error may be found here. As you see in his (very transparent and authentic) blog recapping the issue, it is very easy to make an honest mistake that has potentially devastating consequences. And with 20+ million developers globally, mistakes will inevitably happen and secrets will be leaked if proper guardrails are not applied.

Key takeaways

The majority of industries and most digital infrastructure rely on Python in some way, making threats to this core framework highly consequential and possibly more devastating than recent software supply chain attacks.

The digital world may have just avoided a landmark Python event thanks to researchers’ diligence. Many thanks to them for saving us from consequences we will hopefully never realize.

Scanning both source AND binaries is essential to ensure your company doesn’t accidentally reveal secrets or introduce vulnerabilities. If you are not currently doing this, the risk is too great to delay (if you need a place to start, JFrog powers all of our security products with this approach, led by the same research team that found this secret).

Supporting the community is key. Earth-shaking discoveries need to be shared with the community and fixed quietly before nefarious actors can take advantage. We should learn from the close calls to ensure none of us is next in the headlines, and build better tools and automation to safeguard our companies and strengthen our security posture.

Huge thank you to the JFrog team for finding this before it could have been much much worse, and as always thank you to the volunteer team running Python’s infrastructure for handling this so well. These things can and do happen to anyone, and their response was world-class.

Dan Lorenc, Founder & CEO, Chainguard

Let’s find, fix and fortify together.