Last Mile Delivery at Scale

Navin Ramieni
Director of Infrastructure Engineering

This session covers how Salesforce uses JFrog Artifactory to meet the growing needs of its CI systems. It describes how artifacts are transferred securely from development to production using Artifactory’s built-in replication, and what Salesforce has learned about replication in a large-scale deployment architecture.


Hello everyone. I hope you are doing well and staying safe. I’m Navin from Salesforce. My colleague Chekri and I are here to discuss JFrog Artifactory and how it is used at Salesforce. At Salesforce, Artifactory is not just a binary repository supporting our CI systems and developers at scale; we also use Artifactory’s replication as our primary distribution mechanism to transfer artifacts across the globe to all our production data centers. But before getting into the details, let me talk about what Salesforce is.

Salesforce is the number 1 CRM platform in the market today. Salesforce was started 20 years ago in a small apartment in San Francisco, and over those 20 years it has expanded to 67 offices across 28 countries. Salesforce was founded on three core differentiators: one, a cloud-based multi-tenant solution; two, a subscription business model; and three, a focus on philanthropy.
Over the last 20 years, Salesforce has done well, is doing well, and is doing good in the communities we belong to. Our core values have propelled our revenue to $17.1B in FY20.
According to Forbes, we are the world’s most innovative company. We are number 1 on People’s list of the top 50 companies that care, and we are also on Barron’s list of the most sustainable companies. We have been a leader in philanthropy, in culture, and in innovation. Our 1-1-1 model means donating 1% of our time, 1% of our equity, and 1% of our product to the communities we serve. Over 20 years, that has translated into volunteer hours, $310 million in grants, and more than 45,000 nonprofit and educational institutions using our products for free. Now, let’s look at 24 hours in the life of Salesforce.
Salesforce powers trillions of B2B and B2C interactions. From financial services to manufacturing, from healthcare to retail and beyond, Salesforce transforms businesses. At the center of this transformation is an army of engineers who make it possible. My team’s mission is to provide the best experience for this army of engineers. The persona my team supports is the developer.
The developer can be from product engineering, infrastructure engineering, database engineering, quality engineering, and/or security engineering. For all of these individuals, we wanted a tool that could serve as a single source of truth, manage the lifecycle of an artifact, let us deploy on multiple substrates, and provide the flexibility to accommodate multiple rollout strategies. Those are the needs for the tool, but what factors define the tool we pick? The first is the growth and scale at which Salesforce developers build artifacts and deploy multiple times a day across the globe. Developer experience is very important to us: developers should focus on core functionality and shouldn’t have to worry about deployment strategies or the substrate. Security, compliance, and secure transmission of artifacts are also very important.
The availability of artifacts and their consistency across the globe are critical for all of Salesforce to be successful. With all these factors and the needs of our development community in mind, we chose JFrog Artifactory as our solution. Artifactory’s performance is critical in our R&D environment, and Artifactory’s replication capability is critical for our deployments across the globe.
Chekri will go into detail on the scale at which we are operating and give a system architecture overview of how Artifactory is set up at Salesforce. OK, now let’s get into the details of the factors that mattered to us while picking a tool like JFrog Artifactory to support our use case. Across Salesforce, we have more than 200 Artifactory instances. These 200 instances support the R&D environment, the DMZ environment, and the production environment. The production instances are distributed across the globe, supporting Salesforce deployments in all of our data centers. Across all of these 200 instances, we see 92 million requests per day.
These requests include PUTs, GETs, HEADs, and every other request type; all of them are counted in the 92 million. We support Docker, RPM, and generic repository types. On a daily basis, we transfer 4TB of data across the globe through Artifactory’s replication. About 20,000 builds happen per day at Salesforce to support the engineering ecosystem. Of the artifacts from these 20,000 builds, roughly 150 get promoted and consumed in production every single day. Replication is critical when an artifact is transported across the globe: the end-user experience should be consistent whether the user is in North America or in Asia Pacific. If replication doesn’t work, or its latency depends on location, that is not a solution that would work for us. This is the scale at which Salesforce operates today. Another important factor is security. According to our security requirements, R&D environments and production environments must be isolated; they cannot communicate with each other.
The R&D environment can only initiate one-way communication to another environment, and that is controlled. On the other side, the production environment is also isolated and can only communicate one way, with the connection initiated from production to the other environment. So both R&D and production must initiate their own calls, but they can’t talk to each other. That’s why we have a DMZ environment, which I will talk about in the next slide. Another aspect of security is the support we provide for government customers. Government environments must be physically and logically isolated. So we have an R&D data center where artifacts are generated, and those artifacts need to reach the government environment. Artifacts get pushed to the government environment only through Artifactory replication. That is the main goal and requirement from security. Next, we have multiple rollout strategies. Beyond the strategies themselves, we are highly distributed across the globe and across different substrates: we have bare-metal first-party data centers, and we have public cloud data centers as well. So our rollout strategy must accommodate both bare-metal first-party data centers and public cloud data centers.
That’s one requirement. The second requirement is canary deployments. Canary deployments are interesting: they apply only to the infrastructure changes we manage as code. Our infrastructure artifact gets deployed to a canary environment, where it is baked and tested for a week or two before that change rolls out to all the other production data centers.
These canary data centers should be isolated and shouldn’t impact our customers, but they should serve a similar volume of traffic. Another aspect is staggered releases. Salesforce doesn’t want to do a big-bang release of anything across all data centers; teams want to try a particular feature or product in certain data centers first. That data center can be in Asia Pacific, in North America, or in both. So we need a mechanism to roll out an artifact only to the data centers the product team requires.
This is supported by the repo-level replication that Artifactory provides, which helps us with staggered releases. Another aspect is geo-location-based releases. These are important because we don’t want to impact customers by releasing during their busy hours, so we release during off-peak hours: for Asia Pacific, we release during Asia’s off-peak hours, and for North America, during North America’s off-peak hours. That covers rollout strategy; next is developer experience. It’s very important to us that our developers focus on core functionality and aren’t bothered by rollout strategies, deployment substrates, and so on. We want to provide the best experience for our developers by taking these concerns off their plate. We also have a varied customer base, so we need to support different package types. We were looking for a tool that could do that, and one thing we really liked about Artifactory was its extensive support for different package types.
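Staggered and geo-based releases together can be sketched as a small scheduling check: of the data centers a product team selected, deploy only to those currently inside their local off-peak window. The data-center names, the region offsets, and the 01:00 to 05:00 window are all hypothetical, and fixed UTC offsets ignore daylight saving time for simplicity.

```python
from datetime import datetime, time, timedelta, timezone
from typing import List

# Hypothetical fixed UTC offsets per region (simplification: DST is ignored).
REGION_TZ = {
    "apac": timezone(timedelta(hours=9)),
    "na": timezone(timedelta(hours=-8)),
}

# Hypothetical mapping of data centers to regions.
DC_REGION = {"apac-dc-1": "apac", "na-dc-1": "na", "na-dc-2": "na"}

# Hypothetical off-peak window in local time: 01:00 (inclusive) to 05:00 (exclusive).
OFF_PEAK_START = time(1, 0)
OFF_PEAK_END = time(5, 0)


def is_off_peak(region: str, now_utc: datetime) -> bool:
    """True if the current local time in `region` falls inside the off-peak window."""
    local_time = now_utc.astimezone(REGION_TZ[region]).time()
    return OFF_PEAK_START <= local_time < OFF_PEAK_END


def deployable_now(selected_dcs: List[str], now_utc: datetime) -> List[str]:
    """Of the data centers a team selected (staggered release), keep those now off-peak."""
    return [dc for dc in selected_dcs if is_off_peak(DC_REGION[dc], now_utc)]


now = datetime(2024, 1, 15, 17, 30, tzinfo=timezone.utc)  # 02:30 in APAC, 09:30 in NA
print(deployable_now(["apac-dc-1", "na-dc-1"], now))
```

Keeping the team's data-center selection separate from the time check mirrors the two concerns in the talk: repo-level replication decides where an artifact may go, and the geo-based schedule decides when.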
At Salesforce, the last 6-7 years have seen exponential growth in microservices. Everybody started moving towards microservices: we had a monolith, but new features are built as microservices. The 20,000 builds per day are coming from these microservices, and as I mentioned, there are 150 promotions per day. So how we continuously integrate and continuously deliver is an important factor for us.
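To put the figures quoted in this talk in perspective, here is a back-of-the-envelope calculation. It assumes 4TB means decimal terabytes and that load is spread evenly over 24 hours, which real traffic is not, so these are averages rather than peaks.

```python
# Back-of-the-envelope averages from the figures quoted in the talk.
# Assumptions: decimal units (1 TB = 10**12 bytes) and load spread evenly over a day.
SECONDS_PER_DAY = 24 * 60 * 60

REQUESTS_PER_DAY = 92_000_000
BYTES_REPLICATED_PER_DAY = 4 * 10**12
BUILDS_PER_DAY = 20_000
PROMOTIONS_PER_DAY = 150

avg_requests_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY
avg_replication_mb_per_second = BYTES_REPLICATED_PER_DAY / SECONDS_PER_DAY / 10**6
promotion_ratio = PROMOTIONS_PER_DAY / BUILDS_PER_DAY

print(f"~{avg_requests_per_second:,.0f} requests/s on average")
print(f"~{avg_replication_mb_per_second:.1f} MB/s replicated on average")
print(f"{promotion_ratio:.2%} of builds reach production")
```

This works out to roughly 1,065 requests per second, about 46 MB/s of sustained replication traffic, and 0.75% of builds reaching production; actual peak rates will be higher than these daily averages.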
Those are the factors that helped us decide which tool was right for us. We chose JFrog Artifactory primarily because it is not only a binary repository but also a tool that could help us with our rollout strategy, our security needs, and our scale, along with replication to distribute our artifacts across the globe. The next slide is a functional overview of how we have set up Artifactory to meet all the needs we had to adhere to. On the left side of the screen, you can see that the first component is the engineers.
Engineers both read from and write to Artifactory. They write code from which artifacts are built, and they also read artifacts that have been built by other platform teams. In a typical use case, an engineer builds on top of an existing artifact and checks in code; the CI systems pick up that code, build a new artifact containing the change, and push that artifact back into Artifactory. Roughly 20,000 builds happen on a single day, reading from and writing to Artifactory in the R&D environment. After an artifact is built, the team decides when to release it. If a team decides to release an artifact, we regenerate the artifact so that it can be signed by our security systems.
Our security systems perform basic validation and verification checks, and once validation and verification are complete, the artifact is signed. Now the artifact is ready. Teams might not be ready to push that artifact into production yet: they may want to time the release, they may have a specific rollout strategy, or they may want to deploy only to canary first to see whether the change behaves as expected. That’s where the promotion process comes into play. Once an artifact is generated and signed, the product infrastructure team creates a PR, which triggers the promotion process.
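The release gate described here (sign only after validation and verification pass, promote only signed artifacts) can be modeled as a tiny state machine. This is an illustrative sketch: `Stage`, `sign`, and `open_promotion_pr` are hypothetical names, not part of Salesforce's tooling or any Artifactory API, and the returned dict merely stands in for the PR.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    BUILT = auto()
    SIGNED = auto()
    PROMOTION_REQUESTED = auto()


@dataclass
class Artifact:
    name: str
    version: str
    stage: Stage = Stage.BUILT


def sign(artifact: Artifact, validated: bool, verified: bool) -> None:
    """Sign a built artifact, but only if security validation and verification both passed."""
    if artifact.stage is not Stage.BUILT:
        raise ValueError(f"{artifact.name} is not in the BUILT stage")
    if not (validated and verified):
        raise PermissionError(f"{artifact.name} failed security checks")
    artifact.stage = Stage.SIGNED


def open_promotion_pr(artifact: Artifact, strategy: str) -> dict:
    """Record a promotion request (standing in for the PR) for a signed artifact."""
    if artifact.stage is not Stage.SIGNED:
        raise PermissionError(f"{artifact.name} must be signed before promotion")
    artifact.stage = Stage.PROMOTION_REQUESTED
    return {"artifact": f"{artifact.name}:{artifact.version}", "strategy": strategy}


svc = Artifact("example-svc", "1.0.0")  # hypothetical artifact
sign(svc, validated=True, verified=True)
print(open_promotion_pr(svc, strategy="canary"))
```

Encoding the order of steps in the artifact's state means an unsigned artifact simply cannot enter the promotion process, regardless of which team triggers it.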
The promotion process determines which rollout strategy the team has chosen, and based on that strategy the artifact gets moved to different repos within Artifactory. I’ll talk about all the different types of repos we use within Artifactory in the next slide, but for now, think of the promotion process as moving an artifact within the R&D environment from one repo to another. Each repo within Artifactory has its own set of replication properties, and we use them very heavily at Salesforce.
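The mapping from rollout strategy to repository, and from repository to the production data centers its replication settings eventually reach, could look like the sketch below. The strategy names, repo names, and data-center IDs are hypothetical (the talk does not name them), and in practice the targets would live in Artifactory's per-repo replication configuration rather than in a Python dict.

```python
from typing import Dict, List, Tuple

# Hypothetical rollout strategies mapped to hypothetical promotion repos.
PROMOTION_REPOS: Dict[str, str] = {
    "canary": "release-canary",
    "staggered-apac": "release-apac",
    "staggered-na": "release-na",
    "global": "release-global",
}

# Hypothetical: production data centers that end up receiving each repo's contents.
REPO_TARGETS: Dict[str, List[str]] = {
    "release-canary": ["canary-dc"],
    "release-apac": ["apac-dc-1", "apac-dc-2"],
    "release-na": ["na-dc-1", "na-dc-2"],
    "release-global": ["apac-dc-1", "apac-dc-2", "na-dc-1", "na-dc-2"],
}


def promote(artifact: str, strategy: str) -> Tuple[str, List[str]]:
    """Pick the target repo for a rollout strategy and report where it will replicate."""
    if strategy not in PROMOTION_REPOS:
        raise ValueError(f"unknown rollout strategy: {strategy!r}")
    repo = PROMOTION_REPOS[strategy]
    return f"{repo}/{artifact}", REPO_TARGETS[repo]


print(promote("example-svc-1.0.0.rpm", "staggered-apac"))
```

Because the destination is a property of the repo rather than of the artifact, choosing a rollout strategy reduces to choosing which repo the artifact is moved into.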
Before getting too deep into replication, I want to recap what I said in the last slide about security. As I mentioned, the R&D environment must not communicate with the production environment, and the production environment must not communicate with the R&D environment. Each of them can only initiate a conversation outward. So we need different types of replication to support this use case. The environment that listens to both R&D and production is our DMZ environment. The R&D environment uses per-repo replication, and the mechanism is push replication.
It pushes the artifact into the DMZ Artifactory instance. Once an artifact reaches the DMZ environment, it sits there until a production data center invokes a call. How does production invoke that call? Currently, we use replication with cron-based settings: periodically, each production data center connects and pulls artifacts from the Artifactory instance in the DMZ. So, as I mentioned, repo-level replication settings in the R&D environment help with the rollout strategy, push replication moves artifacts from R&D to the DMZ, and cron-based replication in the production data centers lets us pull artifacts from the DMZ periodically. Once an artifact reaches a production data center, a process triggers all the application servers to fetch the artifact from Artifactory and deploy it. That is the whole end-to-end process: how an artifact is generated in R&D, how it moves to the DMZ, and how it moves to production.
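The one-way initiation rules and the push-then-pull flow through the DMZ can be simulated in a few lines. This is a hypothetical in-memory model (`Env`, `Instance`, `on_promotion`, and `cron_pull` are illustrative names, not Artifactory APIs); it shows why a direct R&D-to-production transfer has no allowed path while a hop through the DMZ does.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Env(Enum):
    RND = "rnd"
    DMZ = "dmz"
    PROD = "prod"


# Hypothetical policy: (initiator, target) pairs allowed to open a connection.
ALLOWED_INITIATIONS = {
    (Env.RND, Env.DMZ),   # R&D may call out to the DMZ
    (Env.PROD, Env.DMZ),  # production may call out to the DMZ
}


def transfer_mode(src: Env, dst: Env) -> Optional[str]:
    """How artifacts can move from src to dst, given who is allowed to initiate."""
    if (src, dst) in ALLOWED_INITIATIONS:
        return "push"  # the sending side initiates
    if (dst, src) in ALLOWED_INITIATIONS:
        return "pull"  # the receiving side initiates
    return None        # no allowed connection in either direction


@dataclass
class Instance:
    env: Env
    artifacts: Set[str] = field(default_factory=set)


def replicate(src: Instance, dst: Instance, artifact: str) -> None:
    if artifact not in src.artifacts:
        raise KeyError(f"{artifact} not found in {src.env.value}")
    if transfer_mode(src.env, dst.env) is None:
        raise PermissionError(f"{src.env.value} -> {dst.env.value} is not allowed")
    dst.artifacts.add(artifact)


def on_promotion(rnd: Instance, dmz: Instance, artifact: str) -> None:
    # Event-based push: fires as soon as an artifact is signed and promoted.
    replicate(rnd, dmz, artifact)


def cron_pull(dmz: Instance, prod: Instance) -> Set[str]:
    # Scheduled pull: production fetches whatever the DMZ has that it lacks.
    missing = dmz.artifacts - prod.artifacts
    for artifact in sorted(missing):
        replicate(dmz, prod, artifact)
    return missing
```

In this model a push is legal only when the sending side initiates and a pull only when the receiving side initiates, so R&D and production never need a connection to each other.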
There is a lot of logic and strategy around deployments at Salesforce, and there are many engineering teams that need different flexibilities and different rollout strategies. So Artifactory pull replication, push replication, event-based replication, and cron-based replication are all used heavily in our infrastructure. The scale at which Artifactory is helping us has been amazing, and we really appreciate all the support we have received, as well as the constant upgrades and feature enhancements JFrog is making to the product to help it scale for customers like Salesforce. The next slide shows the system architecture of our deployment across the globe. On the right-hand side, you see the production data centers.
We have multiple variations of production data centers and multiple substrates, for example bare-metal instances in our first-party data centers. We also have a public cloud presence in some geo-locations, where we need to respect the security compliance requirements of that location. We run Artifactory in both HA and non-HA mode. One reason is government data centers: they require isolation at the application level as well, which prevents us from having a common database and common storage. At the same time, we also have aging data centers that do not support certain infrastructure, and we need to be able to serve those data centers as well.
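The choice between HA and non-HA mode per data center reduces to a simple rule based on the constraints mentioned here. The `DataCenter` fields are hypothetical; the sketch only encodes that HA relies on a common database and shared storage, which government data centers (app-level isolation) and aging data centers (missing infrastructure) rule out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCenter:
    name: str
    government: bool = False       # hypothetical flag: requires app-level isolation
    has_shared_infra: bool = True  # hypothetical flag: aging DCs may lack shared DB/storage


def artifactory_mode(dc: DataCenter) -> str:
    """Return "HA" only where a common database and shared storage are permitted and available."""
    if dc.government:
        # App-level isolation rules out a common database and common storage.
        return "non-HA"
    if not dc.has_shared_infra:
        # Aging data centers without the required infrastructure fall back to non-HA.
        return "non-HA"
    return "HA"


for dc in (DataCenter("gov-dc-1", government=True),
           DataCenter("legacy-dc-1", has_shared_infra=False),
           DataCenter("na-dc-1")):
    print(dc.name, artifactory_mode(dc))
```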
So Artifactory’s support for both HA and non-HA mode was also a critical factor when we were looking for a tool to solve these problems. On the top left of the diagram, you have the different CI systems: microservices, monolith, infra CI, database CI, the package process, and the promotion process. All of these interact with both the P2P instance and the Dev instance of Artifactory. P2P stands for path to production: any artifact that is generated and signed goes into the P2P Artifactory.
The P2P Artifactory is set up in HA mode with a common database and shared storage. Developers interact with the R&D Dev instances, and their primary use case is reading. We use remote repositories, local repositories, and virtual repositories. We use remote repositories so that any artifact that was built for release is available for developer use cases.
When the package process pushes an artifact to P2P, that artifact becomes available to our engineers through the R&D development environment. We also use local repositories, which are the repositories we can write to; most of the repositories in P2P are local. And we have virtual repositories. We use virtual repositories so that consumers can query an artifact without knowing its actual path or the specific repository that contains it.
We group repositories together so that consumers don’t have to worry about where to find a particular artifact. Now, once the R&D environment (P2P) has an artifact that is ready for deployment, it gets pushed to the DMZ environment based on the rollout strategy. That push is event-based: once the artifact is signed and promoted, it gets pushed. On the production side, we use pull replication, which is cron-based: it triggers periodically and pulls artifacts from the DMZ Artifactory. Then the application servers, infrastructure servers, and database servers in each production data center pull artifacts from the local Artifactory and deploy them periodically, based on the promotion process.
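A virtual repository can be thought of as an ordered list of member repositories that is searched until the requested path is found. The sketch models that idea with hypothetical repository names and paths; it loosely mirrors how an Artifactory virtual repository aggregates local and remote repositories, but it does not reproduce Artifactory's exact resolution order, caching, or include/exclude patterns.

```python
from typing import Dict, List, Optional, Set

# Hypothetical contents: a writable local repo and a remote repo proxying P2P releases.
REPOS: Dict[str, Set[str]] = {
    "team-local": {"libs/foo-1.0.jar"},
    "p2p-remote": {"libs/foo-1.0.jar", "release/svc-2.3.rpm"},
}

# Hypothetical virtual repository: an ordered list of member repositories.
VIRTUAL_REPOS: Dict[str, List[str]] = {"all-virtual": ["team-local", "p2p-remote"]}


def resolve(virtual: str, path: str) -> Optional[str]:
    """Return the first member repository that contains `path`, or None if none does."""
    for member in VIRTUAL_REPOS[virtual]:
        if path in REPOS[member]:
            return member
    return None


print(resolve("all-virtual", "release/svc-2.3.rpm"))
```

The consumer only ever names `all-virtual`; which member actually serves the file is an ordering decision made once, in the virtual repository's definition.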
Supporting the scale of Salesforce’s needs was a huge undertaking. Artifactory has been key to improving our developer experience and to supporting our rollout strategy across different substrates around the globe, and Artifactory replication really supports our needs from a security standpoint.
Overall, Artifactory has been widely adopted within Salesforce, and it is one of the main tools for delivering artifacts to our production data centers. The support we got from JFrog throughout our journey has been amazing; we look forward to many upcoming features, and we are pushing some boundaries on the roadmap as well. We hope our partnership will continue and that we will be successful together. Thank you all for listening, and I hope some of this sparks new ideas about how you deploy Artifactory in your own environments. Thank you so much; we are happy to take any questions.
