Code Cleanup: The How and Why
By John Cabaniss, Strategic Solutions Architect
November 18, 2022
Between August and September 2022, JFrog helped a major U.S. telecommunications carrier host a series of sessions about developer best practices. Some 1,300 developers attended at least one session on package-specific examples covering Docker, Maven, Gradle, npm, NuGet, and PyPI. One intriguing takeaway from these sessions was that the same questions about conducting code cleanup arose in every one of them.
What is Code Cleanup?
For this article, we define code cleanup as how we should approach the removal of unused code. A significant amount of the code that is written is useful and makes it into the final releases. However, some code that is written should no longer be included in the final work product. Deciding what, when, and how to remove code should be a cooperative effort among developers, storage management, and stakeholders.
Purpose for Code Cleanup
For the average developer, the focus is on speed and quality. Velocity is the key concept, and passing tests is a must. Every development team knows that storage is limited, but that is usually not a pressing concern. That is, until storage runs out and an administrator starts a code cleanup. Administrators want to ensure, for example, that a cluster does not freeze up and cause an outage because it ran out of storage. And here is the actual problem: fundamentally, all the approaches and lessons learned come from reacting to storage running out. A purely reactive process is self-defeating, because storage will always run out eventually (or, in the case of cloud storage, cost too much). What no development team wants is for the elements it depends on to be disrupted. These elements vary from team to team, and with organization and structure. Each company has its own process for removing stored data, but a process designed without developer input is unlikely to serve developers well.
In general, an administrator will ask developers two questions regarding stored data: 1) What binaries have you used lately? and 2) How old is the binary? Age is simple for administrators to find. Usage and distribution data are needed for any remediation of code, so tracking them is a process that normally involves security teams and is handled by administrators. For example, after the Log4j vulnerability was announced, every company quickly became aware of the importance of traceability.
But these two data points alone aren’t enough to do the desired job. They are just the metrics a storage administrator can easily get.
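As a rough illustration of what those two metrics can and cannot do, the following Python sketch selects cleanup candidates using only age and last-download time. The `Binary` record, its field names, and the 180-day and 90-day thresholds are assumptions made for this example and are not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Binary:
    # Hypothetical record; the field names are illustrative only.
    name: str
    created: datetime
    last_downloaded: Optional[datetime]  # None means never downloaded


def cleanup_candidates(binaries, now, max_age_days=180, max_idle_days=90):
    """Return names of binaries that are both old and not recently used."""
    age_cutoff = now - timedelta(days=max_age_days)
    idle_cutoff = now - timedelta(days=max_idle_days)
    candidates = []
    for b in binaries:
        is_old = b.created < age_cutoff
        is_idle = b.last_downloaded is None or b.last_downloaded < idle_cutoff
        if is_old and is_idle:
            candidates.append(b.name)
    return candidates


# Made-up inventory for demonstration purposes.
now = datetime(2022, 11, 18)
inventory = [
    Binary("app-1.0.jar", datetime(2021, 1, 5), datetime(2021, 3, 1)),
    Binary("app-2.0.jar", datetime(2021, 6, 1), datetime(2022, 11, 1)),
    Binary("app-3.0-snapshot.jar", datetime(2022, 10, 30), None),
]
print(cleanup_candidates(inventory, now))
```

Nothing in this filter knows whether `app-1.0.jar` was shipped to a customer, which is the kind of gap that age and usage alone leave open.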
Planning for storage cleanup isn’t normally a consideration when planning a repository or team architecture. The entire concept is an afterthought, and to build a really good cleanup process, it needs to be part of the fundamental plan.
How to Clean Up Code
A good system for preserving code covers the entire lifecycle of a binary used by a company and is worth some thought at a high level. In an ideal scenario, there will be a process to move, or promote, code through a series of repositories. Each gate represents a different value to the company and a different storage policy. The final stage is normally “cold storage” of some kind, which preserves binaries given out to customers indefinitely. The JFrog platform has a native solution for this outlined here.
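One way to picture the promotion gates is as an ordered list of stages, each with its own retention period. The sketch below is only a model under assumed values: the stage names and day counts are hypothetical, and `None` stands in for the indefinite retention of cold storage.

```python
from datetime import datetime, timedelta

# Hypothetical gates and retention periods; None means "keep indefinitely".
RETENTION_DAYS = {
    "dev": 30,
    "staging": 90,
    "release": 365,
    "cold-storage": None,
}
STAGES = list(RETENTION_DAYS)  # dicts preserve insertion order


def promote(stage):
    """Return the next gate after `stage`, or raise if already at the last one."""
    index = STAGES.index(stage)
    if index == len(STAGES) - 1:
        raise ValueError(f"{stage} is the final stage")
    return STAGES[index + 1]


def expires_on(stage, stored_on):
    """Return the date a binary becomes eligible for cleanup, or None if never."""
    days = RETENTION_DAYS[stage]
    if days is None:
        return None
    return stored_on + timedelta(days=days)


stored = datetime(2022, 1, 1)
print(promote("dev"), expires_on("dev", stored), expires_on("cold-storage", stored))
```

Writing the policy down per stage, in whatever form, gives developers and administrators a shared artifact to review before storage runs low.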
At the start of this process, there is a lot of variability, often with one “developer” repository for all binaries, irrespective of usage or importance. The first step is to identify the binaries in common use. Afterward, move them to a common location. Some companies prefer to rename binaries to note their importance instead of moving them.
Both approaches can break pipelines. “Virtual repositories” can help here: the system searches multiple locations for a binary, so the user is unaware that a move happened.
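The idea behind a virtual repository can be modeled in a few lines of Python. This is a simplified sketch of the concept, not a description of how any product implements it, and the repository names `release-local` and `dev-local` are hypothetical.

```python
def resolve(name, repositories):
    """Return (repo_name, payload) from the first repository that holds `name`.

    `repositories` is an ordered list of (repo_name, contents) pairs,
    where `contents` maps binary names to their payloads.
    """
    for repo_name, contents in repositories:
        if name in contents:
            return repo_name, contents[name]
    raise KeyError(name)


# Two underlying repositories behind one virtual search order.
dev = {"lib-1.2.tgz": b"payload"}
release = {}
virtual = [("release-local", release), ("dev-local", dev)]

before = resolve("lib-1.2.tgz", virtual)

# An administrator moves the binary; callers keep using the same lookup.
release["lib-1.2.tgz"] = dev.pop("lib-1.2.tgz")
after = resolve("lib-1.2.tgz", virtual)

print(before[0], "->", after[0])
```

The pipeline asks for the same name before and after the move and receives the same payload, so the move does not break it.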
If virtual repositories aren’t an option, use metadata to provide the same feedback. Here are some examples of how to do this:
- A well-controlled versioning system conveys importance without an actual move
- If a system allows a simple key-value pair to be added, this can be set to note importance (e.g., Artifactory provides “properties” that can be searched on)
- If one considers RBAC as a form of metadata, the delete permission can be removed, so that deleting requires deliberate effort
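The first two options above can be sketched as a simple protection check. Everything specific here is an assumption for illustration: the property key `retention`, its value `keep`, and the convention that a plain `MAJOR.MINOR.PATCH` version marks a release worth preserving.

```python
import re

# Assumed convention: a bare MAJOR.MINOR.PATCH version denotes a release.
RELEASE_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def is_protected(artifact):
    """Return True if metadata marks the artifact as important."""
    props = artifact.get("properties", {})
    if props.get("retention") == "keep":  # hypothetical key-value pair
        return True
    return bool(RELEASE_VERSION.match(artifact.get("version", "")))


def deletable(artifacts):
    """Return names of artifacts that no metadata rule protects."""
    return [a["name"] for a in artifacts if not is_protected(a)]


# Made-up artifacts for demonstration purposes.
artifacts = [
    {"name": "svc-2.1.0.jar", "version": "2.1.0"},
    {"name": "svc-2.2.0-rc1.jar", "version": "2.2.0-rc1"},
    {"name": "svc-nightly.jar", "version": "nightly",
     "properties": {"retention": "keep"}},
]
print(deletable(artifacts))
```

A check like this can run before any cleanup job, so that binaries flagged by developers are skipped even when they look old or unused.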
Whatever method is used, the key is for developers and administrators to have an active discussion about how cleanup workflows should work. The factors should be discussed and well known to all parties, along with the plan for cleaning up when disk space runs out.
Part of the plan should include actions on both sides as disk space runs low. A cooperative relationship is best, where developer groups are given routine notices from automated reports on disk space usage. Ideally, a shift-left mentality will extend into cleanup activities.
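A routine notice on disk space usage can start out very small. The sketch below uses Python's standard `shutil.disk_usage` to read the current volume; the 75% and 90% thresholds and the wording of the message are assumptions, and delivering the notice to a team (by email, chat, or otherwise) is left out.

```python
import shutil


def usage_notice(used_bytes, total_bytes, warn_at=0.75, critical_at=0.90):
    """Build a one-line notice describing how full the storage is."""
    fraction = used_bytes / total_bytes
    if fraction >= critical_at:
        level = "CRITICAL"
    elif fraction >= warn_at:
        level = "WARNING"
    else:
        level = "OK"
    return f"{level}: {fraction:.0%} of storage in use"


# Report on the volume that holds the current directory.
usage = shutil.disk_usage(".")
print(usage_notice(usage.used, usage.total))
```

Run on a schedule, a report like this gives developer groups time to act before an administrator has to.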