How are deployed artifacts stored in S3? [Video]

Scott Mosher
2021-01-28 08:58

Uploading to S3 flow

Video Transcription

This is Scott from JFrog Support. Today in this video, we're going to take a quick look at the workflow when uploading an artifact using S3 as the backend. We'll quickly discuss a standalone instance and then focus on an HA cluster, because there are slight differences that I'll discuss.

So with this, I'm currently on the primary node of a two-node HA cluster, and I want to be hitting this primary node directly. The binarystore template that I'm making use of is cluster-s3: the cluster-s3 template pertains to an HA cluster, and the s3 template pertains to a standalone instance. We can take a look at the standalone binarystore configuration first. You can see here, this template contains this chain: a cache-fs, the eventual, the retry mechanism, and the S3 endpoint itself.
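For reference, this is roughly what that standalone chain looks like when the s3 template in binarystore.xml is expanded. The provider IDs follow JFrog's documented template; the endpoint, bucket, and credentials below are placeholders, not values from the recording:

    <config version="2">
        <chain> <!-- equivalent to template="s3" -->
            <provider id="cache-fs" type="cache-fs">         <!-- local checksum-based cache -->
                <provider id="eventual" type="eventual">     <!-- temporary holding area -->
                    <provider id="retry" type="retry">       <!-- retries failed S3 operations -->
                        <provider id="s3" type="s3"/>        <!-- the S3 endpoint itself -->
                    </provider>
                </provider>
            </provider>
        </chain>
        <provider id="s3" type="s3">
            <endpoint>s3.amazonaws.com</endpoint>
            <bucketName>my-artifactory-bucket</bucketName>   <!-- placeholder -->
            <identity>ACCESS_KEY</identity>                  <!-- placeholder -->
            <credential>SECRET_KEY</credential>              <!-- placeholder -->
        </provider>
    </config>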

In an HA cluster, you have a similar setup, but in this case the nodes need to be aware that other nodes exist. So you have a cache-fs, the sharding-cluster, the eventual-cluster, the retry mechanism, and the S3 endpoint. If we take a look in the Artifactory data directory, you can see there's the cache and the eventual. Within this eventual, we see the _pre and the _queue. If we were on a standalone instance, within this eventual directory we would instead see an _pre, an _add, and a _delete. The idea is that this _queue folder ultimately combines both the add and the delete queues.
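The expanded cluster-s3 template looks roughly like this, again following the provider IDs in JFrog's documented template:

    <chain> <!-- equivalent to template="cluster-s3" -->
        <provider id="cache-fs-eventual-s3" type="cache-fs">
            <provider id="sharding-cluster-eventual-s3" type="sharding-cluster">
                <sub-provider id="eventual-cluster-s3" type="eventual-cluster">
                    <provider id="retry-s3" type="retry">
                        <provider id="s3" type="s3"/>
                    </provider>
                </sub-provider>
                <dynamic-provider id="remote-s3" type="remote"/> <!-- lets nodes reach binaries on other cluster nodes -->
            </provider>
        </provider>
    </chain>

And the data-directory layout being described is, approximately:

    $ARTIFACTORY_HOME/data/
    ├── cache/        # cache-fs storage, sharded by checksum
    └── eventual/
        ├── _pre/     # binaries still being streamed in
        └── _queue/   # cluster: combined add/delete queue
                      # (a standalone instance has _add and _delete instead)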

So we'll perform an upload, and the reason we make use of this eventual is performance. Say you're uploading a five-gig file or a ten-gig file, and there's quite a bit of network latency between the Artifactory server and the S3 bucket. That upload may take an extended period of time, and other users, other builds, other applications trying to make use of this binary wouldn't be able to until the upload is complete. The eventual works as temporary storage for a binary that is in the process of being uploaded to S3.
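Put schematically, the write path looks like this. This is a sketch based on the transcript and the directory names above, not an exact trace from official documentation:

    client upload ──> _pre/              # binary while it is still streaming in
                  ──> cache/ + _queue/   # entry queued; artifact is now downloadable
    background worker ──> S3 upload      # retried on failure via the retry provider
                      ──> _queue/ entry removed once S3 confirms the write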

So this _queue, we can see, is currently empty. What we're going to do is perform an upload to this primary node, so let's hit the Artifactory node. We'll upload to this generic local repository; we'll call it something.txt, and I have a temporary file to make use of, art.txt, with a good amount of content in it. So we're going to perform this upload, and while we do that, we might as well watch the eventual on this same node: we're going to navigate into this _queue directory, which should be empty, since nothing has been uploaded and nothing's in the process of being uploaded, and we're going to run a watch. So now, when we kick off this upload, we can start to explain... and we get a 404, because I misspelled "artifactory" in the URL.
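The two commands in play look roughly like this. The hostname, repository name, and credentials are placeholders, not the exact ones from the recording, and $ARTIFACTORY_HOME is assumed to be set in the environment:

    # terminal 1: upload the file to a generic local repository on the primary node
    curl -u admin:<password> -T art.txt \
         "http://<primary-node>:8081/artifactory/generic-local/something.txt"

    # terminal 2: live view of the eventual's queue while the upload runs
    watch -n 1 'ls -l $ARTIFACTORY_HOME/data/eventual/_queue'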

So let's just kick it off one more time, and now we can see the upload is in progress. If you navigate over here, we're doing a live watch on that _queue directory. What we'll see once this upload completes is the binary, along with a timestamp and the operation this binary is queued for. So we see the checksum starting with d04, we see a timestamp, and we see that "-add", which indicates an upload. The idea is, as soon as this artifact is available in the eventual, I could pull this artifact from another node, other users could pull this artifact, and right now the upload to S3 is still going on. Once the S3 upload completes, that binary will disappear from the eventual; it's just being used as a temporary holding place.
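A hypothetical frame of that watch output while the S3 upload is still in flight. The entry combines the artifact's SHA-1 checksum, a timestamp, and the queued operation; the exact naming and ordering of those fields varies by Artifactory version, so treat this as illustrative only:

    Every 1.0s: ls -l $ARTIFACTORY_HOME/data/eventual/_queue

    d04a1f...<full sha1>-1611824289000-add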

Now, as far as performance goes, we can also see that it was a successful upload, and the binary should now be stored in the cache. We saw it as that d04 checksum, and here it is in the cache. So if another user requests this artifact, it's going to be pulled from local storage, or possibly an NFS, and Artifactory won't need to reach out to S3 every time; that's the purpose of the cache. And that's the video. I hope that clears up the use of the eventual, and also the differences between a standalone Artifactory instance making use of the S3 filestore and an HA cluster doing the same. Thanks again, and let me know if you have any questions.
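If you want to verify that last step yourself: the cache-fs stores binaries named by their checksum, sharded into directories named after the first two characters of the SHA-1, so an artifact whose checksum starts with d04 would land somewhere like this (assuming the default cache location):

    ls -l $ARTIFACTORY_HOME/data/cache/d0/
    # d04a1f...<full sha1>   <- the cached binary, named by its checksum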