What’s the difference between sharding cluster and filestore sharding? [Video]

Patrick Russell
2021-02-02 19:35

Although they share the word “Sharding”, these two filestore types behave very differently. The precise difference is in how the two filestore types distribute binaries.


Video Transcription

Hello. My name is Patrick from JFrog support. And in this short video, I will be explaining the difference between sharding cluster and filestore sharding. Let’s start. To begin with, there is sharding cluster, also known as cluster file system. This setup is used mainly in high availability, or HA, configurations. The main purpose is to make the setup of such HA clusters simpler. As you can see, the mechanism here treats each node’s local Artifactory disk as a shard, and the HA nodes pass each other files as they’re requested by external clients. So for example, if node two receives a download request for a file kept on node one’s disk, node one will stream the file to node two, which then passes it to the client. This setup is much simpler on purpose. As you can see, the binary store configuration is just three lines to achieve this complex behavior.
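The node-to-node handoff described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Artifactory code: the `Node` class, the `serve_download` function, and the checksum key `abc123` are all hypothetical names, and the assumption that the receiving node checks its own disk before asking its peers is mine, not something stated in the video.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical HA node whose local disk acts as one shard."""
    name: str
    local_disk: dict[str, bytes] = field(default_factory=dict)


def serve_download(receiver: Node, cluster: list[Node], key: str) -> tuple[bytes, str]:
    """Return the file contents and the name of the node whose disk held them."""
    # Assumption: the node that received the request looks at its own disk first.
    if key in receiver.local_disk:
        return receiver.local_disk[key], receiver.name
    # Otherwise a peer holding the file streams it to the receiver,
    # which then passes it on to the client.
    for peer in cluster:
        if peer is not receiver and key in peer.local_disk:
            return peer.local_disk[key], peer.name
    raise FileNotFoundError(key)


# Node two receives a request for a file kept on node one's disk.
node1 = Node("node1", {"abc123": b"artifact bytes"})
node2 = Node("node2")
data, source = serve_download(node2, [node1, node2], "abc123")
print(source)  # node1
```

The client only ever talks to node two here; which disk actually holds the file is hidden behind the cluster.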

On the other side, we have filestore sharding, which is more complex to implement. It involves mounting disks to the Artifactory host to expand the filestore space. It can be used with both standalone and high availability setups. The main difference between standalone and high availability is that in the high availability configuration, the mounts need to be network file systems. That way the HA nodes are able to access the same files.

So the other difference is there is a lot more complexity in configuring this. This is just a part of the binary store configuration you would need to implement. You can check our wiki for the full configuration. The main point is that each mount path needs to be directly specified to have this sort of setup work.

So how are these systems the same? Both of them have a redundancy factor. This is mainly implemented for safety purposes. So in the event that a disk or a node should fail and its artifacts are not recoverable, there is still a copy of the file floating about. So for example, if there is a redundancy of two, that means that there are two copies of the file, two artifacts in total, spread across multiple disks. Both of them also share the benefit that adding storage is now easy. So if you are running out of disk space, you can add an additional HA node to the cluster in the case of a cluster file system, or just mount an additional disk in the case of filestore sharding.
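To make the redundancy factor concrete, here is a small Python sketch of what a redundancy of two buys you. It is a toy model under stated assumptions: `place_copies`, the `/mnt/shard…` paths, and the placement rule (a simple checksum-based rotation) are hypothetical stand-ins, not the way Artifactory actually chooses shards.

```python
def place_copies(key: str, shards: list[str], redundancy: int) -> list[str]:
    """Pick `redundancy` distinct shards to hold copies of one file."""
    if redundancy > len(shards):
        raise ValueError("redundancy cannot exceed the number of shards")
    # Hypothetical placement rule: start at a shard derived from the key,
    # then take the following shards in order, wrapping around.
    start = sum(key.encode()) % len(shards)
    return [shards[(start + i) % len(shards)] for i in range(redundancy)]


shards = ["/mnt/shard1", "/mnt/shard2", "/mnt/shard3"]
copies = place_copies("abc123", shards, redundancy=2)

# Simulate losing one of the disks that holds a copy.
failed = copies[0]
survivors = [shard for shard in copies if shard != failed]
print(len(copies), len(survivors))  # 2 1
```

With a redundancy of two, one copy still exists after a single disk is lost, which is the safety property the redundancy factor is there to provide.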

Do keep in mind there’s an upper bound on this kind of configuration, though. Artifactory has to search through each disk to try and find the file, so if there are more than 20 shards, the system will perform poorly.

That was my video on the difference between cluster file system and filestore sharding. Thank you for watching. I hope you enjoyed the video. Please feel free to leave your questions, comments and feedback below. Thanks, bye for now.