Depending on your use case, managing access in Artifactory through Users, Groups, Tokens and Permissions can be done any number of ways.
I would like to share some of the scenarios we ran into and how we optimized for better performance and where we’re looking to migrate with some of the upcoming changes to the platform.
Hello everybody, welcome to my pre-recorded session on Artifactory access patterns, and how to optimize for unique use cases. My name is Hank Hudgens. I’m a master software engineer at Capital One. And so let’s get started here. A little bit about me. I’ve worked at Capital One as a developer for five years. And I’ve been working with Artifactory for four of those five years, actually.
Our team is actively supporting Artifactory as the enterprise solution for binary storage and we’re ensuring the stability of the platform on AWS, as well as providing support for our internal developer community as they use Artifactory. So the agenda for today, first thing I want to do is a brief overview of the Artifactory security management entities, just so we’re all on the same page. And then go through two different problems that we had to solve for, with their requirements, how we ended up implementing them and what challenges we ran into with each of these. And then we’ll close out with some tips and learnings when designing access patterns, and also look at where Artifactory is heading next, which is actually pretty exciting stuff. So let’s get started here. So the Artifactory security management entities, these might look really familiar to you if you’ve been working with Artifactory. But I just want to go through these, like I said, just make sure everybody’s on the same page and understands what we’re going to be talking about here. So first off, we have users.
Users are the lowest level identifier for who is accessing the platform. And so all requests are tied to a user, including non-authenticated requests, which are actually tied to the default anonymous user. And admin users have global access to the platform.
They supersede all other access. And so if you mark a user as an admin user, that’s all you need, they will have global access to the entire platform. And so, after users, we have groups, which are a collection of multiple users. And it just provides a way to better organize access for larger sets of users.
They can be internally defined, or you can actually import groups from an external source like Active Directory, for instance. And same with users, you can have admin groups and all users in admin groups will have global access, just like an admin user would. And then after groups, we have probably one of the more interesting security management entities, permissions, this is where all the magic happens. And so this is where all the non admin access rules are defined.
If you define… First thing you’re going to do with the permission is actually define the resources that it applies to, which can be repositories, builds, release bundles, destinations for those bundles, and pipeline sources. And then, once you’ve defined what resources the permission applies to, you’re going to be defining which users and/or groups have access to the resources, and then you can modify what level of access they have: read, annotate, deploy, delete, overwrite, and manage access. And then last here, down at the bottom, we have access tokens.
Access tokens can be used to grant temporary access, defined by whatever groups you’re creating the token for.
For the purposes of this session, we’re going to be focused mostly on users, groups and permissions, not so much on tokens. But tokens are very interesting and definitely worth a look if you are looking to grant temporary access in your organization. So, for the first problem here, we wanted to create a self service Docker namespace Access Management situation. So what does that mean?
We need to provide users with the ability to manage access on individual Docker namespace folders within a large central Docker repository.
The requirements for this: we’re actually going to use an external stateless web application for self service. Users can log into it, manage access on their namespace, and all this state is actually kept within Artifactory. So it’s going to be completely using REST APIs interacting with Artifactory to view and to edit these accesses. And one important thing we wanted to have in this application was right as the user logs in, they need to be able to see a list of all of the namespaces that they have access to. So we need to be able to populate that list of Docker namespace folders.
As soon as they log in, fetch it and display it to them.
That was something that we really needed in that application, so it’s a pretty important requirement to call out here. And then, we’ll have two levels of access per namespace folder. So we have namespace admins, which have read, annotate, deploy, and delete access.
Delete is actually the same level of access as overwrite. So namespace admins can delete Docker images, or just re upload and overwrite Docker images as they see fit. And then we also have namespace members, which are very similar to the admin just without the Delete access. So they can read, annotate and deploy. And the access itself should be granted on the namespace folder, and anything under it within the Docker repository. So for the purposes of this demonstration, we have a repository called Docker local. And we have a couple namespace folders in there, namespace 1, namespace 2. Great names I know. So that’s kind of what we’re looking at the top level folder is the namespace, so they will have access to anything under that namespace if they have access to the namespace. So how do we implement this?
Here’s our nice little repository over here. Obviously, we’re going to need a permission tied to that namespace. And the naming convention we decided to go with was actually a prefix of a capital N dash, just to kind of separate away our namespace permissions from all the other permissions that we had in Artifactory. And it’s just the namespace with that prefix on there and that’s how we named our permissions. And then we also threw in a group with the exact same name. And this group is a little special.
It doesn’t grant any access, and I’ll explain why we have it in just a second. For now, it’s just a secret. So let’s take a close look at this permission.
You might be able to guess what we were trying to go with here. But if you look on the left side, obviously, we’re going to point to the Docker local repository with the include pattern, whatever the namespace was, so namespace1/**. The double star means they have access to anything under namespace 1, any number of folders deep. If we had just one star, it would just be whatever is directly under that namespace 1 folder, and so we had to have a double star in there. And then the individual users, if they’re an admin user, we give them read, annotate, deploy, delete, if they’re a member, read, annotate, deploy. And so here we have a couple examples of two admin users, and a member user also with some very creative names.
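A minimal sketch of what such a namespace permission could look like as a request body. The field names follow the shape of Artifactory's v2 permission-target JSON as best understood here, and the action name "write" is assumed to correspond to deploy; the repository key `docker-local`, the user names, and the helper `namespace_permission` are hypothetical. Check the exact fields against your Artifactory version before relying on them.

```python
from typing import Dict, List

# Assumed action names for the permission JSON ("write" stands in for deploy).
ADMIN_ACTIONS = ["read", "annotate", "write", "delete"]
MEMBER_ACTIONS = ["read", "annotate", "write"]


def namespace_permission(namespace: str, admins: List[str], members: List[str],
                         repo: str = "docker-local") -> Dict:
    """Build the permission body for one namespace folder (hypothetical helper)."""
    users = {name: list(ADMIN_ACTIONS) for name in admins}
    users.update({name: list(MEMBER_ACTIONS) for name in members})
    return {
        "name": f"N-{namespace}",  # "N-" prefix keeps namespace permissions apart
        "repo": {
            "repositories": [repo],
            # Double star: everything under the namespace folder, at any depth.
            "include-patterns": [f"{namespace}/**"],
            "exclude-patterns": [],
            "actions": {"users": users, "groups": {}},
        },
    }


permission = namespace_permission("namespace1", ["admin1", "admin2"], ["member1"])
```

The body only builds the data structure; sending it to Artifactory (URL, authentication, error handling) is left out because those details depend on the installation.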
There we go. And so why do we have this group? Why is this here? The group itself has the same users that the permission has. The reason we created these groups was actually for populating that list of namespaces. So at the time, we could not get a list of permissions for a user but we could get a list of groups for a user with the get user details API. And so these groups really serve more as a label to say, this user has access to this namespace, specifically for populating that list of namespaces. And it seems a little dirty to create a whole bunch of groups specifically for this but you know, that’s what we went with, and it worked out. So that’s what we did. So what are some of the challenges we ran into with this approach? So obviously, challenge one is exactly what I just said, we were missing this effective permissions API, which would have given us a list of permissions for a user so we wouldn’t need all these groups.
We didn’t have that API. So we had to create all these groups, just to have an efficient way to list out all the namespaces a user would have access to.
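The group-as-label trick can be sketched as follows. This assumes the user details response is JSON with a `groups` list of group names; the helper name `namespaces_for_user` and the sample data are hypothetical, and the HTTP call itself is omitted.

```python
from typing import Dict, List

NAMESPACE_PREFIX = "N-"


def namespaces_for_user(user_details: Dict) -> List[str]:
    """Derive the namespace list from the user's group names.

    `user_details` is assumed to be the parsed JSON of a user details
    response containing a "groups" list.
    """
    groups = user_details.get("groups", [])
    return sorted(
        name[len(NAMESPACE_PREFIX):]
        for name in groups
        if name.startswith(NAMESPACE_PREFIX)
    )


# Hypothetical response for one user.
example = {"name": "member1", "groups": ["readers", "N-namespace2", "N-namespace1"]}
namespaces = namespaces_for_user(example)
```

One request per login is enough to populate the list, which is why the extra groups were worth their cost at the time.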
Challenge two, some of the namespaces went over the allowed name length for a group. So there’s a 64 character limit on the names of groups. And some of our users had very, very long namespaces. So we ended up having to truncate some of our group names to make their long namespaces work. And then eventually, we ended up just enforcing shorter namespaces because if you start truncating the group names, you could run into some situations where you have two very long namespace names with maybe a little difference in the suffix and if you truncate them at 64 characters, they could be the same group name, and then you’d have a problem, so this is where we had to just start enforcing it, because we couldn’t maintain that. And then challenge three, we actually ran into some performance issues.
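The truncation problem is easy to reproduce. The sketch below assumes the 64-character limit applies to the full group name including the `N-` prefix; the helper names and the sample namespaces are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

MAX_GROUP_NAME = 64  # group name limit mentioned in the talk
PREFIX = "N-"


def group_name(namespace: str) -> str:
    """Group name for a namespace, truncated to the allowed length."""
    return (PREFIX + namespace)[:MAX_GROUP_NAME]


def find_collisions(namespaces: Iterable[str]) -> Dict[str, List[str]]:
    """Map each truncated group name to the namespaces that share it."""
    by_group = defaultdict(list)
    for namespace in namespaces:
        by_group[group_name(namespace)].append(namespace)
    return {name: owners for name, owners in by_group.items() if len(owners) > 1}


# Two long namespaces that differ only in their suffix.
long_base = "team-with-a-very-long-namespace-name-" + "x" * 40
collisions = find_collisions([long_base + "-dev", long_base + "-prod", "short"])
```

Running a check like this before creating groups shows which namespaces would end up sharing a group, which is the situation that led to enforcing shorter names.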
As the repository started growing, we ended up having thousands, I think, over 20,000 or more permissions on this single Docker repository. And that uncovered some issues in the platform itself, that we actually needed to get patched in order to resolve. So luckily, JFrog was happy to work with us, and we got that worked out.
But definitely, this solution worked for us, it just caused us a little bit of issues. And we still use it to this day, actually. So problem two, where we needed to allow our self service Docker namespaces to be either public or private. So what does this mean? So we need to provide users with the ability to create either publicly accessible Docker namespaces or private ones. And by public, I mean, all internal developers. Artifactory is not accessible from the internet. So it’s just within Capital One, all developers would have access to a public namespace. So they could pull images from this public namespace. And then the private namespace is only the users that would have deploy access, would have access to pull images, or even see that namespace in Artifactory. So what are the requirements for this one?
Like I just said, public Docker namespaces need to be readable anonymously, or by any authenticated user. And then private Docker namespaces should only be readable by the namespace admins and namespace members. And then, finally, you know, all the previous requirements from problem one still apply.
We still have that self service management application running out there. So all of those things need to continue to work. And so less requirements with this one, but got to keep in mind everything from the first one too. So what do we do? How do we implement this? So we already have, you know, a permission and a group for each of these namespaces. So we thought to ourselves, you know, maybe we could just work within the bounds of what we already have, why add more stuff?
Let’s just stick with what we got. So that’s what we did. So for private namespaces there’s really no difference with the permission, it’s got the same Docker local repository, got the same include pattern, and the users the admin and member two users. And you know, as long as no other permission is granting access to this namespace it’s effectively private, all the only ones that have access are these two users here. So what makes public permission?
It’s almost the same, just with an anonymous user added in there with read access. And also a group, we’re just calling all users with also read access. And so you can create a group that will automatically add users as new users are created. So you can have a group that encompasses all users.
It’s, I think, for our use case, though, we actually had an Active Directory group that we imported that had all the users in it, so we ended up just using that group instead. But you can definitely have a group created within Artifactory, that encompasses all users as well. Unfortunately, we released this to production and as you can see from this nice graphic, it didn’t go very well.
We had users reporting large amounts of latency with their Docker pulls, both authenticated and not, so we had to quickly recover from what we had done, and go back to the drawing board and figure out how to do this better. So what happened, what was the problem? Obviously, the new things that we added, you know, the anonymous user and the All Users Group being added to all the existing permissions, actually increased latency quite a bit. And so we wanted to understand why did this happen? So we worked with JFrog. And they actually produced some very interesting information. And that is what is going to lead us into talking about Artifactory security map caching.
So what is this? In order for Artifactory to check access faster for incoming requests, it actually caches maps of security objects. And so in this little graphic here, the arrows are representing a cache mapping. So users map to repositories, repositories map to permissions, and it checks these cache mappings very fast when these requests come in, but if you are doing something that doesn’t align to the way that this cache mapping flows, you can actually run into some high latency. And that’s exactly what happened to us. So here, we have users mapping to repositories, which map to permissions, and we have groups that map to repositories that map to permissions. So what does this mean? Let’s go through an example. An example request, as Artifactory is checking the access through these mappings. So the first thing it’s going to do is check the repositories that the user has access to. So let’s say I’m anonymously requesting something from the Docker local repository. So first thing it’s going to check is, does the anonymous user have access to the Docker local repository? If so, then it’s going to check through each of the permissions that apply to the Docker local repository that have the anonymous user in them.
One by one, it’ll check through all of those, it will iterate through. So I think you could start to see where our problem lay. This is exactly what happened to us.
We had thousands of permissions with the anonymous user on them for this single Docker local repository. So the cache mapping itself didn’t actually provide much benefit to us. Because it had to iterate through each permission individually that applied to that repository with that user, which is all of them. So it had to go through 20,000 permissions to check to see if the anonymous user had access to that specific object in the Docker local repository. So it had to basically go through and check the include patterns on each one.
Does this apply to this namespace? Just keep going and it slowed down all the Docker requests pretty dramatically. And so let’s say that the user, you know, it didn’t find that the user had access to either the repository, or if it did have access to the repository, the permissions didn’t grant access to the specific thing it was requesting, the next thing it would do is check all the groups that that user is in. So it will iterate through all the groups that that user has, and check each group and see if the group has access to the repository, it will go through all permissions that applied to that repository and to that group. And so definitely, the fastest thing that you can do is have the user directly on a permission for a repository, that is the fastest thing it will look up with this cache mapping, it’s not too much slower if it’s granted by a group, it’s just a little bit slower. And if the user has a lot of groups, this can also cause a problem because it has to iterate through each group and check if that group has access to what you’re trying to request. So you will definitely want to limit how many groups you have. And you definitely want to limit how many permissions you have per repository. So those are two very important things we learned from Artifactory security map caching. So we’ll look back at our situation here.
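The lookup order described here can be illustrated with a toy model. This is not Artifactory's implementation, only a sketch of the behavior as explained in the talk: permissions naming the user are checked first, then each of the user's groups in turn, and every candidate permission costs one include-pattern evaluation. The `Permission` class, `ant_to_regex`, and `can_read` are all hypothetical names, and the pattern rules (double star crosses folders, single star stays within one) are an assumption based on the description above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def ant_to_regex(pattern: str) -> re.Pattern:
    """Translate an include pattern: ** crosses folders, * stays in one folder."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


@dataclass
class Permission:
    repo: str
    include: str
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)


def can_read(user: str, user_groups: List[str], repo: str, path: str,
             permissions: List[Permission]) -> Tuple[bool, int]:
    """Return (allowed, number of include patterns evaluated)."""
    checks = 0
    # Pass 1: permissions on this repository that name the user directly.
    for p in permissions:
        if p.repo == repo and user in p.users:
            checks += 1
            if ant_to_regex(p.include).fullmatch(path):
                return True, checks
    # Pass 2: fall back to the user's groups, one group at a time.
    for group in user_groups:
        for p in permissions:
            if p.repo == repo and group in p.groups:
                checks += 1
                if ant_to_regex(p.include).fullmatch(path):
                    return True, checks
    return False, checks


# Every namespace permission on one repository lists the anonymous user.
N = 2000
permissions = [
    Permission("docker-local", f"namespace{i}/**", users={"anonymous"})
    for i in range(N)
]
allowed, checks = can_read(
    "anonymous", [], "docker-local", f"namespace{N - 1}/app/manifest.json", permissions
)
```

In this model the request for the last namespace evaluates every one of the N patterns before it finds a match, which mirrors the iteration cost described above.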
What happened? Yeah, these two, these two guys. By having anonymous user on all of these permissions, like I was explaining before, it had to iterate through all the permissions to see if the anonymous user had access to that specific namespace that you’re trying to request. And same thing with all users group, so even if you are authenticated, if you’re trying to request a Docker image from Docker local, it would have to do the exact same thing, it would have to go through I mean, the user itself, assuming the user itself didn’t have access to that namespace.
You just wanted to pull an image from someone else’s namespace, a public namespace, you’d have to do the same thing, it would have to go through and iterate through all the permissions. So both of these were a problem. So we had to figure out how we could do this differently, to dramatically speed up these resolutions. And so what we ended up doing was actually breaking our repository into two. So the Docker local, we decided would serve all of our public namespaces. And then we had a separate Docker private local repository that would have our private namespaces.
The namespace permissions themselves are going to be very similar in this situation, the only difference is an extra permission, we’re just going to call it public read, that’s on Docker local and grants that public read access to the repository as a whole. And this approach is definitely much more efficient. And we’ll see what this looks like in a little bit more detail here. So like I said, the public permission, exactly the same as what we saw on problem one.
It’s on the same Docker local repo, same include pattern, users got the same access on there. So the only difference with a private permission is the repository. So instead of Docker local, it’s going to be applying to the Docker private local repository.
Same kind of include pattern, same user access stuff. And there we go, I highlighted the difference, in case you missed it. And then we have the public read permission, which is on the Docker local repository, no include pattern needed, it applies to everything. And that is where we stick our anonymous read and all users read access.
So you see how this can be a lot faster now: if an anonymous request comes in for a public namespace, it’s going to check, does it have access to Docker local? Yes. And then it will iterate through each permission on Docker local that has the anonymous user. And guess what? Now it’s only one. So boom, done instantly.
Same with all users group, same situation there. So way faster from that cache mapping perspective. And then, you know, if you requested, let’s say, you’re trying to anonymously request a private, something from a private namespace, first thing, it’s going to check, does the anonymous user have access to this private repository? Nope. So instantly, 403 forbidden on that.
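A rough way to see the effect of the split is to count how many permissions an anonymous request has to consider in each layout. The repository keys, the dictionary shape, and the helper `anonymous_candidates` are hypothetical, and the count only models the behavior as described in the talk.

```python
from typing import Dict, List


def anonymous_candidates(permissions: List[Dict], repo: str) -> int:
    """Count permissions on `repo` that list the anonymous user."""
    return sum(
        1 for p in permissions
        if p["repo"] == repo and "anonymous" in p["users"]
    )


N = 1000

# Original layout: one repository, anonymous added to every namespace permission.
single_repo = [
    {"repo": "docker-local", "include": f"namespace{i}/**", "users": {"owner", "anonymous"}}
    for i in range(N)
]

# Split layout: namespace permissions carry only their own users, and a single
# repository-wide public read permission carries the anonymous user.
split = (
    [{"repo": "docker-local", "include": f"namespace{i}/**", "users": {"owner"}}
     for i in range(N // 2)]
    + [{"repo": "docker-private-local", "include": f"namespace{i}/**", "users": {"owner"}}
       for i in range(N // 2, N)]
    + [{"repo": "docker-local", "include": "**", "users": {"anonymous"}}]
)

before = anonymous_candidates(single_repo, "docker-local")
after_public = anonymous_candidates(split, "docker-local")
after_private = anonymous_candidates(split, "docker-private-local")
```

With these inputs the candidate count drops from one per namespace to a single permission on the public repository and none on the private one, so a private request can be rejected without evaluating any patterns.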
There you go. So it’s way faster. And you can definitely tell Artifactory was designed with breaking up your repositories in mind, having a bunch of permissions on a single repository is definitely… it’s an approach that it allows, but it doesn’t seem like it’s the way Artifactory was designed, it really is encouraging you to break up your repositories. So let’s look at the challenges we ran into.
Obviously, we caused user impacts with our original design and that is something we should have foreseen, we should have tested a bit better.
Effective performance testing is super important to do. And you definitely want to make sure that what you’re about to do to production, you have thoroughly made sure that’s not going to cause any severe impacts. And I feel like we definitely could have done a better job testing that out. So that was definitely the biggest challenge that we ran into with this one. Also, though, I didn’t really mention, but our rollback process was slow.
We released this to production, obviously, we have thousands of permissions now with this anonymous user, and this all users group on there.
In order to resolve the problem, we had to iteratively go through and remove those from each of these thousands of permissions. And it just, it took a while. So you know, obviously we want to grant immediate relief to our users, or at least as fast as we can. And in this case, it took a while to get through, it’s not a fast process to update thousands of permissions. And so that was another issue that we had. And actually, I think we had to write the script also to go through and revert the change.
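The core of a revert script like that can be prepared ahead of time. The sketch below only transforms a permission body and assumes the same JSON shape used for permission targets (users and groups under `repo.actions`); the group name `all-users` and the helper `strip_public_read` are hypothetical.

```python
import copy
from typing import Dict


def strip_public_read(permission: Dict, user: str = "anonymous",
                      group: str = "all-users") -> Dict:
    """Return a copy of the permission without the public-read entries."""
    cleaned = copy.deepcopy(permission)  # leave the caller's data untouched
    actions = cleaned.get("repo", {}).get("actions", {})
    actions.get("users", {}).pop(user, None)
    actions.get("groups", {}).pop(group, None)
    return cleaned


# Hypothetical permission as it looked after the public/private release.
released = {
    "name": "N-namespace1",
    "repo": {
        "repositories": ["docker-local"],
        "include-patterns": ["namespace1/**"],
        "actions": {
            "users": {"admin1": ["read", "annotate", "write", "delete"],
                      "anonymous": ["read"]},
            "groups": {"all-users": ["read"]},
        },
    },
}
reverted = strip_public_read(released)
```

A full rollback would fetch each affected permission, apply a function like this, and write the result back; with thousands of permissions that loop is the slow part, so having it written and tested before the release shortens the recovery.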
We didn’t have that ready to go. So that would have been good too. And then challenge three, we had to break out of this mono repo mindset we had… from the beginning of setting up Artifactory, we really had just one repo for each package type.
It just made it a lot easier for us to manage and so we had gone with that for a while and unfortunately, as our Docker repository grew in size, it’s… it became just unmanageable, especially with this new permission model we’re trying to enforce on it.
It just wasn’t going to be feasible going forward, so that’s when we made the decision, we’re going to break this into pieces, we’re going to have a public side and a private side. And honestly, at this point, we’re definitely moving more in the direction of breaking our repositories up into smaller pieces.
More and more now. And we’re hoping to get more down to a team level. So repository per team. And although it will dramatically increase number of repositories we have, it’s definitely the right direction to be going in the access management side of it definitely reinforces that. So tips and learnings when designing access patterns.
Obviously, the first one I want to call out is the very interesting thing we learned from JFrog support: Artifactory security map caching is very important to keep in mind, and if you can get used to thinking of the way Artifactory is going to resolve these access entities, you can apply it to any situation that you’re trying to plan out. So whatever your use case is that you’re trying to solve for, I would keep this in mind, it’s definitely those two tips that I gave earlier, you don’t want to have a user with too many groups on it, and you don’t want to have a repository with too many permissions on it.
Those are definitely two things to really keep in mind also.
That kind of came out of this. And secondly, work with the JFrog solutions team whenever possible, you know, they may not be able to foresee every situation that might arise but they can definitely draw on past experience helping other customers, sometimes very large customers. So they have experience dealing with solutions at scale. And so they definitely have some knowledge when it comes to how best to implement certain things. And so I would definitely, definitely work with JFrog solutions, especially if you’re going to be making some wholesale changes to your platform.
Super, super important. And then also understand all of the Artifactory constraints that can come into play, such as max group name, length, for instance.
What we ran into, there are things like that in Artifactory, that can definitely hit you and you don’t think about them initially, but it’s so important that you plan for these things because if you implement, you know, 95% of your solution and then you realize 5% of our customers, stuff that we need to migrate over, we can’t, because we just hit this constraint that we didn’t even think about, you know, that’s something you don’t want to have to run into because it can push back your timelines, it can, you know, cause headaches for your users, it’s just something to really, really be careful with. So definitely something to keep in mind.
Also performance test effectively, with long term expected data scale. So even though your scale may be, you know, a few hundred Docker namespaces now, down the road, you’re going to have like tens of thousands. So if you can test at that scale, you can see that you’re not going to hit any situations down the road.
That is very, very helpful. So definitely something to think about also. And then finally, have a rollback plan.
Always good to have a rollback plan, when you’re releasing new things. And also, you know, not even with a rollback plan, also have a plan B, if there’s something you can do to immediately alleviate issues if they arise, maybe not a full rollback, but you know, some intermediate solution that will help alleviate issues. It’s always good to have as well. So those are some of the tips and learnings that we took out of these two problems that we had to solve here. So where is Artifactory heading next?
Projects, that’s where. Version 7.17 released these new projects, which is a new way to organize repositories, builds, release bundles, and pipelines, more on the scale of like a team, you know, a team’s resources, just gathering them all together and then giving them the flexibility to control access to their own resources. And along with these projects, you get these global and project roles, which are actually really similar to permissions but they apply specifically to projects. And they can also be applied by resource environments. So you can label your resources, I guess, with dev, prod environments, and then you can have these roles apply specifically to these different environments, which is really interesting.
Something that we definitely want to get into.
We haven’t quite gotten into doing too much with projects, but it’s definitely a direction we want to try to go. So what makes projects great?
They are shifting, granular access management away from platform administrators, and shifting them more on to the teams to manage themselves. So while it’s nice to have more of the Access Management distributed out, we’re probably going to be careful with how much flexibility that we’re giving.
We have a lot of security audit concerns we need to think about, we also need to make sure there’s a segregation of duties kept in there.
You know, you don’t want to give your users the ability to hurt themselves. So it’s something you have to be careful with when distributing out this kind of control. And like I was saying before, this is yet another example of reinforcing breaking up those mono repositories into smaller repositories, by application team and environment. So yeah, it’s definitely the direction Artifactory is trying to get us to go in. And it’s the way that we’re probably going to be heading here in the next few months, hopefully. So yeah, definitely good to go in that direction. And something along with these projects, it’s important to keep in mind that you need to understand how the new role based access control is being cached to optimize performance like… so the caching that we looked at before, that was before projects came out, there may be some new caching structure that they have with these projects.
I have a feeling it might be a little bit similar to what they had before, but maybe just with roles instead of permissions.
But it’s something we can talk to JFrog about and make sure we keep that in mind as we implement our new project structure. And that’s it. That’s it for today.
I really appreciate everybody watching my session here. And hopefully you learned something.
I’ll definitely be in the chat here for any questions you might have, and I hope you have a great rest of your SwampUP. Thanks again.