Create Scalable JFrog Ecosystems Using a NetApp Cloud-native Solution [swampUP 2021]

Anuj Kumar, SVP Cloud Sales, NetApp; Fabian Duarte, Strategic Cloud Sales, NetApp; Shankar Hariharan, Senior Product Manager, JFrog

June 30, 2021

< 1 min read

Join the JFrog & NetApp technical team to learn how you can reduce/eliminate developer community downtime for your JFrog ecosystems and enable true disaster recovery across cloud regions in GCP. Get started with your instance today: https://jfrog.co/35OKwXW


With NetApp’s cloud-native data services solution, Cloud Volumes Service (CVS), we will show you how to:

1. Reduce/eliminate outages
2. Protect against regional failures
3. Enable increased capacity & performance on demand
4. Manage NetApp and IaC solutions through Terraform

Please join us in this session, which includes a step-by-step walk-through demo that can be easily recreated in your own environments without needing to call a storage administrator!

Video Transcript

Hi, everybody, I’m Fabian Duarte. I work for NetApp on the cloud data services side,
I specialize in Google Cloud Platform.
I am a go-to market specialist at NetApp,
talking about all sorts of different solutions,
especially in the DevOps space.
We have this awesome partnership with JFrog
and in today’s session here at SwampUP, we want to talk to you about how you can build these
highly scalable environments with Cloud Volumes Service right into your JFrog Artifactory ecosystems.
And so without any further ado, I want to introduce Shankar.
Thank you for being here. My name is Shankar Hariharan, and I’m a senior product manager at JFrog.
The past 15 years or so, I’ve held several roles in product management and engineering.
I’m really passionate about building products and services to help
enterprises accelerate their digital transformation initiatives,
particularly really passionate about DevOps, and observability.
But today, we are here to have an awesome session on scaling DevOps with JFrog HA and NetApp.
Now, a lot of you might have been using Artifactory for a while now,
and I want to say that you are already familiar with a ton of rich features within Artifactory,
including unlimited scalability, universal compatibility,
virtual and remote repos,
flexible searchability, and extensive automation.
So with all these capabilities within artifactory,
we are here to focus on a different aspect of the JFrog HA architecture,
which is about the storage layer.
Now, those of you who have been using Artifactory for a while know that the volume and size of your binaries
grow exponentially over time, right?
So as you know, today, Docker images can be one gigabyte or more,
and Java applications can be 100 megabytes or more, right?
Now, in terms of storage, what we have seen is like different customers have different storage needs
and this really depends on their specific use case.
For example, like some of them might need in the order of tens of terabytes, whereas
others might need in the order of hundreds of terabytes, right?
So now, when you’re dealing with such a large scale of data,
you often have requirements on how you want to increase or scale your storage needs
as your demand increases
and this is one of the aspects that we will be talking about today with cloud volume service.
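To make that concrete, here is a back-of-the-envelope sizing sketch using the rough per-artifact figures from the talk; the artifact counts are made-up placeholders, so substitute your own inventory:

```shell
# Rough storage sizing from the talk's per-artifact figures.
# The counts below are hypothetical placeholders -- plug in your own numbers.
docker_images=50000;  docker_mb=1024   # ~1 GB or more per Docker image
java_apps=200000;     java_mb=100      # ~100 MB or more per Java application

total_mb=$(( docker_images * docker_mb + java_apps * java_mb ))
total_tb=$(( total_mb / 1024 / 1024 ))
echo "~${total_tb} TB before snapshots and replication copies"
```

Even modest per-artifact sizes land you in the tens of terabytes quickly, which is why the ability to grow storage on demand matters.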
The other thing when you’re dealing with such a large scale of data,
is that localization matters a lot, right? Because
where your physical server is present is really important,
because that correlates to performance
and how fast you are able to resolve your dependencies and upload and download artifacts, right?
At the end of the day, it really impacts developer productivity.
Every minute your developer is waiting for an artifact to download
is a minute not spent writing awesome code, right?
So data architecture matters
and this is what we will see in our next slide.
What does this architecture for JFrog HA look like
and how does storage come into play here?
Now, when you look at this diagram,
you see that the JFrog HA architecture
for artifactory has a load balancer in the front,
and it has several stateless artifactory nodes that are serving the traffic.
In the backend, we have a shared storage that could be either S3 based or NFS based
and there is a database that is really holding the metadata for all of the software binaries.
When you look at this JFrog HA architecture,
one of the goals it aims to serve is really maximizing your uptime.
There is no single point of failure here, right?
And your system can continue to operate as long as at least one of the Artifactory nodes is operational.
This architecture, as you can see, also allows for larger load bursts with no compromise to performance, right?
So you can easily scale your servers horizontally,
and also increase your capacity to meet any load requirement as your organization grows.
This architecture also allows you to minimize your maintenance downtime.
Now with all of these different aspects of this architecture,
we will focus on how we can benefit from cloud volumes service as one of the storage solutions
in this JFrog HA architecture.
I will now hand over to Fabian, he is going to take over from here.
Absolutely Shankar and thanks for sharing.
Let’s actually go back to that image that we were looking at where
it begins to show us how we can leverage that storage for scalability.
So when you look at this,
you’re beginning to see that you really want to build out these massively scalable environments
and if you have a lot of developers that are coming in here and beginning to
generate a lot of load in these artifactory environments,
you want to have them be able to scale out horizontally to meet your demand,
and if the demand shrinks, you probably want, you know, some of those instances to go away.
So you can again, keep up and keep to scale with the demand that you’re having
and that’s where that NFS external storage really becomes valuable
as you’re trying to grow, because it’ll grow with you,
not just in terms of the portability,
but it will also grow with you in terms of the size,
and in terms of the features that you’re pretty much going to need
on these day to day operations.
And so that’s why I want to talk to you in particular about cloud volume service.
So Cloud Volumes Service, in all fairness,
is an actual storage array living in a Google Cloud or near-Google Cloud data center
that we’ve peered into your Google Cloud environments.
And so you can get it through the Marketplace; it then requires just three commands to peer
our network with your network, and then you’re off to the races, ready to use Cloud Volumes Service.
And the beauty of it is you need zero storage management experience.
You don’t have to know the intricacies of managing these environments.
If you know how to use your Cloud Console,
you can already use cloud volume service, it doesn’t get any more native
and simple than that.
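For reference, the “three commands” look roughly like the sketch below, based on GCP’s private services access workflow. The VPC name, range name, and peering name here are assumptions you would replace with your own, and the `run` wrapper only prints each command so you can review it before executing for real:

```shell
# Sketch of the three peering steps for Cloud Volumes Service in GCP.
# VPC, RANGE, and the peering name are placeholders -- substitute your own.
VPC=my-vpc
RANGE=netapp-cvs-range
run() { echo "+ $*"; CMDS="$CMDS $*"; }   # swap the body for "$@" to execute

# 1. Reserve an allocated IP range for the service peering.
run gcloud compute addresses create "$RANGE" \
    --global --purpose=VPC_PEERING --prefix-length=24 --network="$VPC"

# 2. Peer your VPC with NetApp's service producer network.
run gcloud services vpc-peerings connect \
    --service=cloudvolumesgcp-api-network.netapp.com \
    --ranges="$RANGE" --network="$VPC"

# 3. Exchange custom routes over the new peering (peering name is an assumption).
run gcloud compute networks peerings update netapp-cvs-nw-customer-peer \
    --network="$VPC" --import-custom-routes --export-custom-routes
```

Check the exact flags and the peering name against the current NetApp and Google Cloud documentation before running.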
Again, the workloads that we’re looking for are these Linux-based workloads
and SMB-based workloads, and it’s very quick connectivity.
So it’ll provide you a very simple provisioning UI.
And if you don’t want to use a UI, like myself, you can use Terraform,
or you can use the API that we enable, or Ansible,
for you to be able to go and deploy these environments yourself.
The benefit of getting this at the data plane layer for storage,
is that you’re going to be able to get features on demand, which will help you scale out for performance
on the storage side, scale out for accessibility of your data,
scale out in terms of growth as well.
So if your artifactory and your binaries are growing,
we’re able to keep up with that growth as well.
If it shrinks, we can shrink it as well.
And not just that, we’re going to actually go into how you can replicate the content
at the storage layer to move it from one region to another.
Again, since this is a cloud native solution,
you’re going to be able to purchase it from the marketplace,
which means that you’ll transact all your billing
through your Google accounts or through your AWS account or Azure,
you’ll be able to monitor it with their existing tool sets,
you’ll be able to provision storage with existing IAM capabilities, and much more.
We’ll go into all of that in the next slide.
Now, one of the nice things about this is that it’s available in Google Cloud
and in Amazon Web Services under the name of Cloud Volumes Service;
we’re also on Azure under Azure NetApp Files.
And we have another tool that is not going to be covered here, but it’s called Cloud Volumes ONTAP.
This is an actual virtual storage array
that consumes cloud resources,
and allows you to provision these exact same file shares
from a single location.
What’s cool about this is that that ONTAP portion
is the exact same operating system that’s running inside of your data centers today,
if you’re an existing customer,
which means that if you have a big repo
that’s living in the data center, and it has an NFS mount point from NetApp
in the data center and you want to move that bad boy up to cloud,
it’s just a very simple SnapMirror,
you’ll get all the content up there,
once the content is up there,
build your compute nodes in cloud, and you should be able to be up and running in no time.
So focusing back on cloud volume service,
like we had mentioned before, Cloud Volumes Service is a complete NAS tool,
an NFS mount that exists inside of the Google Cloud ecosystem.
It’s highly available, so you don’t have to worry about
your availability in the region.
It has multiple storage classes, which is also a great tool.
So if your performance demands go up, because you have more developers checking in code,
you can very easily change from one tier to another without migrating your data.
It’s just a simple Terraform, API, or GUI call,
the same way with growing and shrinking volumes.
It’s all secure, right?
We integrate with LDAP,
we integrate with AD if you’re on the Windows side,
the content sits encrypted in flight and at rest.
We monitor our environments, you can go into Google Cloud Stackdriver and build your own
consoles to monitor the environment.
You could go and
subscribe to PagerDuty as well, just in case there are any incidents you need to be alerted of,
and on top of that, we provide the capability to do instant clones and snapshots,
they literally are instant.
So if you want to create a copy of your entire
artifact environment or any other environment that’s using it,
you just simply enact the snapshot through an API or through the UI,
and you already have it available for use
and using that exact same concept, we can actually replicate your content from one region
and move it to another region in cloud as well.
So it’s very much a useful tool.
So we’re going to talk about how we can actually begin to implement it.
And so that’s the next slide where Shankar and I are going to walk you through more or less
how we took our initial setup,
grew that setup into a highly available setup and ultimately wound up
with a true business continuity design.
And so here, we’re going to show you
what looks like an initial deployment.
So Shankar, why don’t you walk us through what we’re seeing on the right hand side?
Sure. One of the things that you will notice about this architecture is that the
application is really decoupled from the data, right?
So this is an example of an architecture where you’re
spinning up a single artifactory instance using cloud volumes.
Now, as you can see here, we have the region US West 1,
where you have a Linux server
with some persistent disk provided by Google
and the operating system, along with the JFrog DB and the application.
Now, the JFrog DB can also rely on a different persistent disk,
this would be an external database,
which is a specific use case for some customers.
And on the right side, you have NetApp cloud volumes,
which is essentially your storage solution,
which is hosting all of your JFrog binaries.
Exactly. And so you can see, it’s a very easy setup, right?
So you continue to use native cloud storage to house the OS and application,
and let NetApp handle your data, right?
We’re experts at handling it,
we know how to keep this data secure, how to move it from one location
to another to help you build a very scalable environment.
So great, you’ve done this work,
we’ve already set up the initial environment.
Let’s take it to the next step.
So Shankar, explain what we’ve done here.
Sure. What we’ve done here is really spun up two JFrog instances in two different zones.
As you can see, here, US East 4A and US East 4B,
with the familiar architecture that you had in the previous slide.
And now, both of these instances are sharing the same cloud volume service,
which is hosting all of the JFrog binaries.
Also, you would notice that you have the external database, which is PostgreSQL,
in a different subnet.
The reason why it is in a different subnet is
we want to make sure that only the JFrog application is able to access the external database
and the database has no access to the public network.
So with this,
what we have achieved is a JFrog HA architecture using NetApp’s Cloud Volumes Service.
Exactly. And if you notice, when we go from one standalone environment running NetApp
to a highly available environment, we’re not duplicating any data on the cloud volume side,
we’re just simply re-pointing the new server to that NetApp storage system
and we’re making sure that it accesses it day in day out.
And we’re going to be able to do some real nice stuff with it,
as you’re going to be able to see.
So what we’re going to talk about is an environmental setup that looks something like this.
So let’s talk about what we built up in our environment.
So if we come in, and we start looking at what looks like our Google Cloud Console,
this is actually cloud volume service,
you can access cloud volume service to the marketplace, as you can see here,
simply go into your Marketplace, search for NetApp,
and NetApp Cloud Volumes Service or Cloud Volumes ONTAP will appear,
or Azure NetApp Files, depending on which hyperscaler you’re in.
Like we said, once you enable the service, you’re going to peer it and you’re off to the races.
The beauty of using cloud volume service is that it’s a completely cloud built tool, right?
To make sure that you’re accessing your environment exactly like it is in the cloud.
So if you’re going to provision let’s say from the console,
it’s a very easy setup, you’re just going to come in here and say,
this is the name of my file share,
I’m going to come in here and provision performance storage.
If you’re going to create a copy in a net new region, and this is your copy,
just go ahead and select this. If not feel free to ignore it.
Determine the region where you’re going to build this in.
Keep your path name.
But here’s what we were talking about, the service levels.
Notice how you can go from 16 megabytes per second
to 64 megabytes per second to 128 megabytes per second
from the same capacity you’ve provisioned,
which is pretty amazing, because you don’t need to actually migrate your data
to get more performance from your storage.
So we’ve built it in there,
you can come in here and determine what capacity you need,
you can come in here and then pick the protocol that’s your favorite protocol to operate from
and then, of course, the network.
All these things you’re already familiar with as a cloud engineer,
as an SRE, as a DevOps guy,
and then, you know, decide which servers can access the storage, and
if you want to keep a snapshot schedule,
you enable it all from here.
So you can see it’s pretty straightforward to be able to use it.
Now we’ve gone in, and we’ve talked about an environment that we set up
so just to show you, here’s our load balancer that’s up and running,
we have our Compute Engine instances
that are also up and running, in both US East 4 and US East 2.
And here’s that awesome Cloud SQL database we’ve taken from Google that’s running Postgres
to make sure that it keeps the metadata in line
with your environments.
And so we’re going to come in, I want to take the moment as well, to show something.
I took the liberty to come in here and create this entire environment
with Terraform.
And so you can see here we have our main.tf file…
sorry, wrong place.
Our main.tf file is here,
and you can see that we’ve called the netapp-gcp provider,
so we’re a provider inside of terraform,
simply add it to your main TF file,
you’re ready to go.
And then we’ve come in here and started creating resources.
Since this is sitting on my laptop, and it is a local deployment,
I’ve called a whole bunch of local variables,
if you’re more comfortable carrying global variables,
go ahead and add it in there.
But as you can see here, I’ve given my volume a source name,
I’ve picked the region, I’ve picked the zone where I want it to be in,
the size of the volume, and the type of storage that it’s running.
I’ve done the exact same thing for my destination environment
and here’s the module that calls the environment, as you can see,
just like the console, you give it a name, a region, a protocol, pick the network, the size,
and the speed that you want of your environment.
I’ve exported everything here; you probably want to be more selective as to which IP address range
can access it. Add a snapshot schedule, and we were good to go.
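As a sketch, the volume resource in that main.tf looks something like the following, using the NetApp/netapp-gcp Terraform provider. The project, credentials file, network, and snapshot times are placeholders, and the attribute names should be verified against the provider documentation for your version:

```hcl
terraform {
  required_providers {
    netapp-gcp = {
      source = "NetApp/netapp-gcp"
    }
  }
}

# Project ID and service-account key path are placeholders.
provider "netapp-gcp" {
  project         = "my-gcp-project"
  service_account = "cvs-terraform-key.json"
}

# Source volume for the JFrog binaries (names and sizes roughly match the demo).
resource "netapp_gcp_volume" "jfrog_cvs_vol" {
  name           = "jfrog-cvs-vol"
  region         = "us-east4"
  protocol_types = ["NFSv3"]
  network        = "my-vpc"
  size           = 4096       # GiB, i.e. the 4 TB volume shown in the demo
  service_level  = "extreme"

  snapshot_policy {
    enabled = true
    daily_schedule {
      hour   = 2
      minute = 0
    }
  }
}
```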
At the bottom, you’ll also find the servers that we built out in each region.
We’ve also gone the route of creating a bootstrap to make sure that as we deploy the servers,
they come in with the JFrog built already in them,
and that it automatically mounts that NetApp shared storage so we can scale as we need it.
So one of the…
So this, I’m assuming that the bootstrap
includes all the instructions for installing artifactory.
Correct, it’s the instructions for installing artifactory
minus the Cloud SQL portion, right?
We went and did that manually.
Again, you can put that into terraform,
create a single turnkey environment.
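A bootstrap along those lines might look like the sketch below for a Debian-based image. The mount IP, export path, and package edition are assumptions to adapt, and the repo line should be verified against JFrog’s current installation docs; the `run` wrapper prints each command instead of executing it:

```shell
# Hedged sketch of a startup-script for an Artifactory node on Debian/Ubuntu.
# The NFS server IP and export path are placeholders from the demo's naming.
run() { echo "+ $*"; CMDS="$CMDS $*"; }   # swap the body for "$@" to execute

# Mount the shared Cloud Volumes Service export that holds the binaries.
run sudo apt-get install -y nfs-common
run sudo mkdir -p /jfrog-cvs-vol
run sudo mount -t nfs -o rw,hard,vers=3 10.0.0.4:/jfrog-cvs-vol /jfrog-cvs-vol

# Install Artifactory from JFrog's public Debian repository.
run sudo sh -c 'echo "deb https://releases.jfrog.io/artifactory/artifactory-pro-debs xenial main" > /etc/apt/sources.list.d/jfrog.list'
run sudo apt-get update
run sudo apt-get install -y jfrog-artifactory-pro
run sudo systemctl enable --now artifactory
```

Because every node mounts the same export, the same bootstrap works for each server you add behind the load balancer.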
Alright, so let’s come back over here
and show something really quick.
You can see here that I’ve created my primary volume in US East 4.
It’s very easy to use. Like I said, I walked you through how to provision it.
Once you get up here and you’re ready to use it,
we even give you the mount instructions on how to mount it into your system.
So you see here, we were doing Debian:
it’s that sudo apt install nfs-common,
I made a directory and then I mounted that directory with these instructions.
I actually took these, slightly modified them because I wanted to give my particular fileshare
a name that I liked.
So I made that directory, jfrog-cvs-vol,
but they were the exact same instructions. I just needed to modify one little thing there.
I copy-pasted. I mean, it can’t get any easier than this.
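To make that mount survive a reboot, the equivalent /etc/fstab entry would look something like this; the server IP and export path are placeholders following the demo’s naming, and the NFSv3 options reflect common guidance for CVS rather than anything shown on screen:

```
# Placeholder IP/export; hard mount, 64 KiB read/write sizes, NFSv3 over TCP.
10.0.0.4:/jfrog-cvs-vol  /jfrog-cvs-vol  nfs  rw,hard,rsize=65536,wsize=65536,vers=3,tcp  0  0
```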
Now Shankar like you were mentioning,
how can you grow these volumes? How can you shrink these volumes?
Well, as you can see, here, my volume is running on extreme,
and it’s sitting inside of a four terabyte volume.
I’ve already updated my terraform
so if we come into our terraform here in the resources,
you’ll see that I updated it to actually be twice the capacity.
And I’m going to actually move it down from extreme down to premium.
So I’ve already gone ahead and I’ve done my terraform plan
and I’m going to do terraform apply.
I’m going to kick this off,
and let’s just wait for that yes to come back.
And then we’re going to go out there and just basically change the size of the volume.
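The resize-and-retier change amounts to editing two attributes on the volume resource and re-applying. A sketch, assuming the hypothetical resource shown earlier and the netapp-gcp provider’s GiB-based sizing:

```hcl
resource "netapp_gcp_volume" "jfrog_cvs_vol" {
  # ...other attributes unchanged...
  size          = 8192       # was 4096: double the capacity, no data migration
  service_level = "premium"  # was "extreme": step the performance tier down
}
```

Then `terraform plan` to review the diff and `terraform apply` to make the change in place.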
Now, again, if we were doing this in our environments,
let’s say we took this guy here,
and we decided that we wanted to click Edit.
We’re going to actually go through and see how this environment can be changed as well.
So we’ll change the performance tier here as well,
we’re going to change, this is a completely new volume.
So bear with us, we’re going to actually change it to premium
and we’re going to drop the size down by half,
4096. I click Save.
And this stuff now begins to happen in the background.
So when we come in here,
one of the interesting things to be able to get us into that cross regional replication
that we’ll talk about in a few minutes,
is that it’s actually very easy.
Within that same cloud volume service console,
you’re going to be able to come in here and say, I’m going to create a relationship
and after I create that relationship,
I’m going to go through and select my source volume,
which is that JFrog volume source,
I’m going to replicate my content over to the west
and that destination volume will appear here shortly.
And so you can see here, I named it JFrog CVS Destination,
I’m going to have it replicate every 10 minutes to make sure that it stays up to date,
if you’d favor doing it hourly, daily, weekly, or monthly.
Again, it’s your choice how frequently you want to replicate this,
but it is a continuous replication.
So I’m going to come in here, I need to give my…
…replication a name,
and basically just let it run in the background.
This stuff is all being taken care of in the background.
While this is all being done in the background,
let’s go back and talk about what we were actually doing here.
So we come in here,
and we have a conversation in terms of what was going on.
We basically came in and replicated this entire environment,
we’ve then gone back and stood up the exact same environment in a net new region.
So Shankar, why don’t you walk us through what we’re seeing here in terms of replication?
Yeah, sure. So this is basically the reference architecture that we came up with
using cloud volume service to enable cross region replication.
As you can see here, the storage layer replication happened
at the NetApp layer, which is across the cloud volume service.
But the metadata level replication will happen across the JFrog artifactory application.
And that is how we can really take advantage of
both the storage based replication as well as the metadata replication with JFrog.
Exactly, and one of the nice things here is that the actual data all these binaries
that are being copied from one region to another
is being done at the storage layer,
meaning that you’re freeing up all of your CPU cycles at the very top of this diagram
for what you actually want the CPU cycles for, which is to
increase your deployment, make sure your pipelines are running smoothly,
and that it’s keeping up with the ever-growing demand that you’re going to have
with your applications.
One of the great things when you look at this particular architecture
is that if you ever need to take out one of your nodes for maintenance,
you can take off any nodes without causing an impact,
so your developers will have all the uptime
that is necessary for them to continue to do their day to day operations.
If you need to add more servers, you just simply add them into
this existing configuration.
And you’re also protected against zonal and regional failures, right?
So if you’re having a particular region go offline,
you can very easily suspend that replication from the NetApp side,
and then you’ll be up and running on the secondary region ready to go.
Once you’re ready to pull it back into that previous region,
it’s very simple, you’ll just reverse that replication, make sure it’s all synced up,
and then you’re back up to operating like you were prior to your outage.
So again, we’re showing you an infrastructure that can be very easily built in a cloud native fashion.
It takes a few minutes to stand up cloud volume service,
and a little bit more time to just stand up this configuration
that’ll get you being completely
independent, having high availability and such.
But more importantly than that,
what we’re showing you here is an actual customer use case.
And that’s how Shankar and I connected initially right?
By building out an actual use case for a customer of ours.
And this customer is Dexcom.
They’re a fantastic medical device company specializing in
devices for diabetics,
and their team is a high-performing team. They’re fantastic to work with.
They actually worked with us in order for us to help them build that environment
that was going to be scalable for them. And you can see the use case here,
you can see that everything that we’ve presented in this particular scenario
is an actual use case that’s been implemented in our production environments.
We welcome you to go do much the same with your Artifactory environments.
Thanks Fabian for the awesome demo, as well as working through the
use case for Dexcom.
I hope everyone learned a bit in this session,
really about how you scale up your DevOps environment with JFrog HA.
Absolutely, and Shankar thank you for the time.
I’ve had a lot of fun working with your team.
JFrog is amazing, they’re so much fun.
And we hope that you guys onboard this.
If you have any questions you can reach out to us here in the chat stream.
If you have any questions that go offline,
look for me on LinkedIn, I’m under Fabian Duarte
from NetApp, you should be able to find me pretty easily.
Shankar, any way that they can get ahold of you?
Yeah, same here, you can get ahold of me on LinkedIn, as well.
And thank you again, everyone for listening to the session.
Alright guys, and in closing: may the force be with you,
and may the DevOps be with you here. Thank you.