Use Case – Artifactory As The Backbone For A Continuous Delivery Tool Chain
Abstract:
Craig Vosburgh / CA Technologies, May 2016: CA Technologies has been on a journey over the past few years as we adopt Agile methodologies to better meet our customers' demand for high-quality software at a faster delivery cadence. In this session we'll discuss the tool chain and architecture that has evolved to support the challenges of Agile development in distributed development organizations and how Artifactory has become a core component in that success.
Talk Transcription:
Thanks for joining me this afternoon. My name's Craig Vosburgh. I work for CA Technologies, out of the office of the CTO, and I've been with the company for seven years. Came through acquisition; you'll find that's how many of our companies come into CA. My personal background: I've been doing software development for 25-plus years, about half and half in terms of being on the management side or on the individual contributor tech side of things. I'm currently back as an individual contributor working for the CTO right now, doing a bunch of stuff around what I'll be talking about today, around tools transformation and that kind of stuff inside the company. My prior gig to this was running a turnaround on one of our business units with a specific product we have, which I'll use as one of the examples as we go through this.
So I figured I'd open up by setting the stage a bit, right. Cause we maybe aren't quite the same as some of the other folks; we're sort of mid-sized to large — whoops, I'm throwing things up here. So I wanted to set a bit of the stage in terms of what the environment's like for us.
So. Start off with: Who is CA Technologies and where are we? We're a company of about 11,000, something like that. We've got about 3,500 people from a development standpoint. We're spread across about twenty sites; those are the ones you see up on the map, the big ones. We've actually got a bunch more. We tend to grow through acquisition, and so we end up with a bunch of little sites as we pull the companies in. We don't have a model of strictly trying to pull people in, you know, like the old Microsoft model used to be, where everybody, when they got acquired, got moved up to Redmond or something like that. We don't have that kind of model, so from a development standpoint we need something that works in a very distributed, geographic environment.
From a revenue standpoint, we're a little north of four billion in terms of what we do. So, you know, again, we're a decent sized company for what we're working with. Multiple products, multiple different BUs. We work in the security space, the infrastructure management space, the application performance management space. Over the last few years we've gotten into a bunch of things around Agile and DevOps and started to move into being able to do a lot of the collaboration aspects that are required for that kind of area.
Moving on, the way we think of Agile. Right. So, the bottom of the slide, we'll talk here a bit. The bottom of the slide is basically standard Agile, the SAFe kind of stuff, if you can see it; it's probably a little small for you. You know, product owners, scrum masters, all those kinds of things are down there towards the bottom. If you're, you know, a small-to-medium sized company, you're running at a single location, and you don't have multiple products, the answer is that's probably as far up as you have to go. Right.
For us, we spend a lot of time up there because we've got multiple different business units and we've got them spread across the world. So a single product line — something like security — will have a dozen different products that are spread across the world. So we have a dozen different products that are in the portfolio. Those products typically are not at the same site, so we're spread out geographically for any given product. And so that adds some extra challenges.
So a lot of people look at that slide and go, oh my god, that's a lot of process, right. So something I would say is we use this like a backdrop, right. It's not process for process's sake. We basically pick and choose the pieces out of this stuff that solve our problem. Some of our smaller BUs don't have some of this upper stuff when you get into the portfolio management, but some of our bigger BUs spend a lot of time up there trying to figure out how they're going to invest their money into the pie. I'm only giving this as a bit of a backdrop; this is how we think of Agile and scaling Agile and that kind of stuff. We went down the path, initially, of trying to sort of build this process ourselves. And over the course of a couple years we figured out, wow, it's a lot of work. It's a lot of training materials, it's a lot of getting out in front of people, and we got plugged into the SAFe stuff around the 2.0 point and did the, wow, that's actually pretty close to the stuff that we're thinking about wanting to do. And so we jumped onto their bandwagon at the beginning of the 3.0 stuff, and they're now up to 4.0. So again, just a bit of a backdrop for the way we think of the problem as we go through it.
Almost anybody who's done anything with acquisition and has a large product portfolio has got a picture that looks something like that: where you find yourself sitting in front of this board going, oh my lord, right. How in the world do we think of this thing? How do we integrate this stuff together, how do we get this stuff built? Well, we have some new products, right. Like our Rally product that we acquired last year: very much a DevOps kind of model; they release to production every day. We have other products that are 20 years old. We have some products that are 50 years old. We have a chunk of mainframe business, right. So back in the day we didn't have any of these DevOps tools, so the answer is we've got some pictures that look like that and we've got to figure out how to get that to be something that looks more Agile. How do we actually get to a common tool chain that can actually have leverage across the organization as we go forward?
So what's the thing that's being asked of us in the context of those pictures? We're geographically distributed, we're scaling it up, we've got a bunch of different organizations that are having to scale geographically. We've got this big spaghetti code base in some cases, because the products have been around for so long, but we're being asked to move from delivering these things on month-to-year kinds of boundaries down to delivering them on hour, week, maybe month kinds of boundaries. So we've got to figure out how we skin this problem. Right.
So for us, that all started with. Oops, sorry. Before I go into that, let's walk through an example that we have and then we'll drop into how we're going about it.
So this is an example product that we have. I removed the names to protect the innocent on this one. This slide is from circa 2014, a couple years ago, before they started the move. So this is how they were. Right. They were inconsistent and dated in their tools. I would have said their tools were sort of stuck in the late nineties, early 2000s, in terms of what they did and how they did it.
Their build artifacts were all stored in SCM, the source control management tool, right, as opposed to putting it into something like Artifactory. That was actually one of our first steps.
They were monolithic from a build standpoint, so everything was all packed into one build, and the build took a long time to get crunched through. Release cycle: 18-24 months. That's where this was coming from. This was an On Prem product, of course. So they were coming from an environment where they were churning releases out over very long durations, with a very waterfall kind of process associated with it.
Limited automation, right. So 2,300 ideal days' worth of testing — manual testing — went into testing this product. Right. That means if I had 100 people working on it, I could basically get it out in a month. You couldn't get 100 people working on the testing, so that becomes the long pole, right, as we were working through this thing trying to decrease the back end on this.
We had no definition of Done. So as you start moving to Agile, right, you start hearing a lot about DoD, that sort of stuff. These guys had adopted Agile at the point that this snapshot was taken, but what they did was sort of read the book and go, okay, we're just going to throw it out there and put it together, kind of thing. They didn't do a number of things that actually make Agile successful in the model, and definition of Done was one of the things they hadn't done. So they were basically working in sprints, but they were just throwing code in whenever it was ready. They didn't have any formal process for getting it promoted.
No code coverage, right. I'm a big believer in checks and balances, right: I set up the definition of Done for what I expect the team to execute to, and then I want something in the tool chain that can actually give me a sanity check in terms of what the code quality is and whether the stuff's being followed.
A lot of technical debt; you'll see we added a tool in to help track some of those things. We had a lot of dead code in that specific code base, where people had done poor things years ago: rather than using a source control management tool for what it's designed for and actually revisioning things, they literally made copies of chunks of the source base, right, to sort of set them to the side while they were working on something, because they didn't have any concept of feature branches or anything like that at the time. And then priorities came up, and those copies never got cleaned up, so there was a lot of extra code hiding in the code base. It took time to find that stuff and get it eradicated out of there, so we wanted to make sure to have a tool in place to be able to do that.
We had a separate sustaining model which is something I’m not wild about. I don’t like having a separate team sustain my code from the team that actually does the development work. I want that back in the main line cause otherwise you get into branching issues, you get into a whole bunch of ownership issues, that kind of stuff.
No isolation of features, as I've already mentioned. We had a skills mismatch. I'll talk more about this when we get further into the presentation. All of this is mostly about the tools and the process and how we use Artifactory and a bunch of the other tools to enable stuff, but I just want to make sure people understand: tools can only make things better in terms of automation and that kind of stuff, right. They don't solve some of these organizational problems. And so in the case of the skills mismatch, the poor adoption of Agile, and this thing called the golden hour, those are things where I actually had to go back and work with the actual owners of the business units so that we could restructure the way they were doing things. In the case of this specific product, they were actually, to begin with, at about 23 different sites, going down to sites that only had a couple of people in them. We ended up collapsing them down to about five sites total when it was all said and done, and had to rebalance skills across things. So I just wanted to throw that out there: you're going to see a bunch of stuff around tools and it all looks great, but the answer is a bunch of the stuff that happened in here was predicated on having to make a bunch of these organizational changes.
So that's the backdrop: an example of what CA's about, right, and an example of one of the more legacy products that we have that we had to get fixed up. So, what did we need to do? Well, the first thing we had to do was get some kind of replacement CI/CD tool chain put out there so we could get this thing working in a more relevant way. Our first step on that was, basically, what you see here: a centralized solution. So I'll walk through the different components up there.
So Artifactory at the top. Initially, we used Artifactory really just for artifact management for all of our builds, right. So we pulled the stuff out of source control, so it wasn't being revisioned there and that kind of stuff, and basically used Artifactory as the place to put artifacts post-build, right. So we have a place to go get them, to version them, to bring them back in. We'll talk more about that; we're using Artifactory for a lot more as we've moved forward, but that's where we started with the initial stuff.
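As a rough illustration of that starting point (publishing a build artifact into Artifactory rather than checking it into source control), here's a minimal sketch using Artifactory's REST deploy API from a Node/TypeScript script. The server URL, repository name, credentials, and artifact path are hypothetical placeholders, it assumes Node 18+ for the built-in fetch, and it isn't the exact tooling CA used.

    // publish-artifact.ts - minimal sketch: deploy one build output to Artifactory
    // (hypothetical server, repo, and path; auth shown as basic auth via env vars)
    import { readFile } from "node:fs/promises";

    const ARTIFACTORY = "https://artifactory.example.com/artifactory"; // placeholder
    const REPO = "product-builds-local";                               // placeholder repo key
    const TARGET = "my-product/1.2.3/my-product-1.2.3.zip";            // placeholder path

    async function publish(localFile: string): Promise<void> {
      const body = await readFile(localFile);
      const auth = Buffer.from(`${process.env.ART_USER}:${process.env.ART_PASS}`).toString("base64");
      // Artifactory deploys artifacts with a plain HTTP PUT to <repo>/<path>
      const res = await fetch(`${ARTIFACTORY}/${REPO}/${TARGET}`, {
        method: "PUT",
        headers: { Authorization: `Basic ${auth}` },
        body,
      });
      if (!res.ok) throw new Error(`Deploy failed: ${res.status} ${await res.text()}`);
      console.log(`Published ${localFile} to ${REPO}/${TARGET}`);
    }

    publish("dist/my-product-1.2.3.zip").catch((err) => {
      console.error(err);
      process.exit(1);
    });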
GitHub. Enterprise. Specifically. In our case we had a big discussion — we're a pretty stodgy company when it comes to IP and that kind of stuff. And so there was a lot of discussion on, you know, could we just go to GitHub on the outside and do private repos, and after a bunch of conversations the answer was no, we can't. Right. You know, it's a four billion dollar business, we've got fiduciary responsibilities around this stuff, and so we did the next best thing, and for us that was pulling in GitHub Enterprise. We specifically went after GitHub Enterprise, as opposed to a variety of the other tools that were out there, because we wanted, one, this whole concept of social coding. Right. We wanted to figure out how we started to get that into our DNA inside the company. GitHub's been up and running for about 18 months, I think, at this point. We're just starting what is an inner-source program; we're sort of riding on PayPal's coattails on some of that. And the selection of GitHub has actually worked out really well for us because it's got a bunch of those things already in there. We're also getting a fair amount of lift off of the fact that we've got a bunch of kids coming out of school, and what do they have experience with? They've got experience with a whole bunch of GitHub and that kind of stuff. So it transitions easily from a skill set standpoint.
TeamCity and Jenkins. Those are our two predominant CI environments. TeamCity came first; Jenkins is more grassroots kind of stuff. The approach I've taken with the CIs is that it's like religion when it comes to editors. Right. You have some discussion about, is Vi better, is Emacs better, is Eclipse better, and the answer is, I don't want to argue about that, right. You figure out what's best for you from a CI standpoint, and the answer is we want to enable you in that environment. So we took very much an open approach to that. We're starting to see a bit more Travis and that kind of stuff starting to come in now. But, you know, of the two predominant ones, I'd probably say 60 to 70 percent is on TeamCity right now and then the remainder is on Jenkins, with small percentages on any of the other tools that are out there.
Everything so far, the top ones there, all color coded in green, means basically they're On Prem. We run those things, we host those things ourselves.
The next two are CA Agile Central and Flowdock. Agile Central is the rebranded name for Rally, so this is where we do our planning and defect tracking and that kind of stuff. It is a SaaS based solution, so we consume it just like our customers consume it, in a SaaS based offering. Flowdock, if you don't know what that is: if you are familiar with HipChat or Slack, right, it's a ChatOps tool. Flowdock's claim to fame is a couple of pieces. One, tighter integration with the Agile Central product, such that we're trying to meet the developers where the developers work. So instead of the developers having to come out of whatever is their browser or — excuse me — whatever is their IDE environment or their chat environment to do an update for a task, or defect, or whatever it happens to be, we're trying to meet them where they're at. We're putting that functionality into Flowdock so they can actually make updates that flow through into the Rally product, into the CA Agile Central product. You know, so the developers don't have to back out. You also get all the integration in the other direction, so all of these tools, right, in terms of the build, in terms of promotions, any of that kind of stuff, all of that stuff flows back in as a […] back into it. So it allows our developers to sort of have one-stop shopping to figure out what's going on for the different products that we have.
The next thing we added in was something called SonarQube. This was specifically to solve some of those code quality issues, to be able to see them without necessarily having to send people down to hunt for them. It's currently hosted internally. For us it's been a good tool. It does two major things for us. The first one is around its concept of technical debt and the calculation it does. Don't take the calculation as gospel; it actually does it in ideal days, and it's, you know, opinion in terms of how the calculation works. It's more a trend; I use it more like a barometer. Right. So you get a baseline for the environment, and if you get somebody who's cutting, copying, and pasting a bunch of code in, the answer is you'll see it happening in the environment. Right. You can see it actually start to creep up and see where you're picking up the extra debt.
The other piece you get out of SonarQube is the ability to do scanning for code coverage, with multi-language support. And so that gives us a check and a balance. The one product that I was referencing earlier, you know, came in with either zero or about 10 percent, depending on which piece of the puzzle it was. Normally, for a healthy project, I like to see somewhere between 65 and 85 percent code coverage. That's sort of my rule of thumb, right, that we tend to use for these kinds of things.
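To give a feel for the kind of check and balance being described (pulling a project's coverage number out of SonarQube so a pipeline can compare it against a threshold), here's a small sketch against SonarQube's measures web API. The server URL, project key, and the 65 percent floor are hypothetical, and this is just one way such a gate could be wired up, not necessarily how CA did it.

    // coverage-gate.ts - minimal sketch: fail a build if SonarQube coverage is below a floor
    // (hypothetical server and project key; token passed via an environment variable)
    const SONAR = "https://sonarqube.example.com";   // placeholder
    const PROJECT = "my-product";                    // placeholder project key
    const MIN_COVERAGE = 65;                         // example floor from the rule of thumb above

    async function checkCoverage(): Promise<void> {
      // SonarQube accepts a user token as the basic-auth username with an empty password
      const auth = Buffer.from(`${process.env.SONAR_TOKEN}:`).toString("base64");
      const url = `${SONAR}/api/measures/component?component=${PROJECT}&metricKeys=coverage`;
      const res = await fetch(url, { headers: { Authorization: `Basic ${auth}` } });
      if (!res.ok) throw new Error(`SonarQube request failed: ${res.status}`);
      const data: any = await res.json();
      const measure = data.component.measures.find((m: { metric: string }) => m.metric === "coverage");
      const coverage = measure ? parseFloat(measure.value) : 0;
      console.log(`Coverage for ${PROJECT}: ${coverage}%`);
      if (coverage < MIN_COVERAGE) {
        throw new Error(`Coverage ${coverage}% is below the ${MIN_COVERAGE}% floor`);
      }
    }

    checkCoverage().catch((err) => { console.error(err); process.exit(1); });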
The last one down here that we'll talk about is Black Duck. That's what we use for our third-party scanning. We had, and still actually do use, an in-house built solution for doing a bunch of this stuff. But the one thing it can't do that Black Duck can do, which is pretty darn slick, is actually go find third-party code where the attribution header didn't come over. So you have some developer that went out, found a hunk of code, said oh I like that, cut, copy, paste, and dropped it into my source base. If that's coming from something GPLed, right, the answer is that you've just encumbered your code. Right. And those are really hard to find. It's easy to find something that has copyright headers and that kind of stuff; Black Duck has the ability to go find those other kinds of things, right. And so that's another one we added into the bag of tricks that we have. What I would say in terms of Black Duck is it can take a fair amount of work to do the initial remediation against an older code base, right. It will go through and find lots of stuff, and you have to then go sort through it, and tune it, and that kind of stuff. The good part is once you get all of that stuff out of the way, get the debt out of the way, then the answer is, much like SonarQube, it gives you a nice baseline. You can see how your code is doing, basically day to day, as the stuff's getting built through the environment.
So that is what we basically put together as part of our core solution that we rolled out about 18 months ago, something like that. We rolled it out in a central environment and then basically started onboarding people onto the product.
Let's take another example here to walk through. So this is one of our business units. Of the 20-plus sites that we have, they boil down to about nine. Right. So they got their code moved over into the GitHub environment, they got themselves set up in the tool chain, they got their build working, all that stuff. They saw some very good improvements in terms of performance and that kind of stuff — in terms of build time. But they were still having problems with the fact that, effectively, it's a whole bunch of spokes all going back to a hub across, in some cases, relatively high latency and low bandwidth links. So if you're having to move artifacts around and that kind of stuff, the answer is that can be really painful.
And so that's pretty much what we found. The two major issues that we hit after we made this move: the first one was around build artifacts. It's not uncommon for some of our On Prem products to have multi-gigabyte, like 20 gigs' worth of, install material that gets generated out of the build. If you then have to pull that back to your local site, right, the answer is that it can be really prohibitive. So the first thing people did was, oh, well, we can solve this ourselves: we'll do a little shadow IT, we'll stick a little server box underneath somebody's developer desk, and we'll go ahead and do the builds locally.
Well, that leads to the second problem up there. Right. Which is that now what you have is a whole bunch of people all doing pulls of source across that network. And that becomes problematic because as you start having that many more people hitting the source base, right, in an automated way, you actually start impacting the folks that are trying to get their day job done. They're trying to do pull requests and get stuff integrated back in, and we're putting a lot of load against the system as we're running through that. So we backed away and spent a little time going, hmm, what can we do to try and solve this problem, given the bag of tricks we have, the tool sets we have, that kind of stuff. And the approach we ended up going with was: we needed to put some spokes on our hubs. I need to get wheels that I can actually ride with as we go through this.
The general […] concept here was sort of like a CDN, right. We wanted to be able to provide some of the build functionality out at the edge. So we wanted people that are in Fort Collins, or people that are in Prague, to be able to have their builds be local, but we also wanted to actually get the code out to where they're at, so that they're doing pulls across 10 gigabit networks as opposed to across WANs. And we wanted to get artifact management out there to the edge, so that we can actually check in the artifacts that are getting built, so that if somebody else at that site wants to build against those items, they can do that across the 10-gig network as well. So we came up with this idea that we're going to build ourselves some kind of a spoke that's going to be deployable. And we spent some time talking about what we would want for goals in terms of a spoke kind of environment.
We definitely wanted something that could replicate the artifacts as necessary. Making it easy. That’s definitely the sweet spot of Artifactory, so that’s one of the reasons that we’ve been using it and we just continued to add new features or enable new features in the product.
We wanted to be able to turn up new edge environments in minutes, as opposed to hours, days, weeks, sometimes months, to be able to get stuff propped up, because, you know, this is potentially our IT department trying to do things in Prague or in Fort Collins, where they don't necessarily have all the same resources that they have in other environments. So we wanted something that was going to be relatively well packaged. And we wanted to minimize the configuration that had to happen out at the edge. Right. So we wanted something where, ideally, a team comes along and says, I need an environment for this new product, and the answer is, you know, within minutes to an hour: there you go, you have the environment for you to work with.
We wanted a solution that would auto-magically update, right. We needed something where we don't want our IT folks to have to touch it. That becomes onerous for them if they've got to keep reaching out and upgrading all these boxes and keeping them up to patch versions and everything else. So we needed something that became more of a push model. Something where we actually started treating our own infrastructure a lot more like a SaaS based kind of a model instead of a typical On Prem deploy, you know, stick it into Puppet or Chef or something like that and push it out that way, where you're trying to evolve the environment.
We wanted something that would run on our internal IT infrastructure, right. We had a lot of history of doing stuff that we would call shadow IT, and it became problematic. Because what you end up with is a whole bunch of boxes that aren't being managed, aren't being kept up on patch revisions, that don't have break-fix on them, so then you end up with, you know, something dying on a weekend before the final build's got to get done, and the answer is: sorry, it's not actually being managed by anybody, that's your problem, you're the one that built it up. And so we ended up getting into finger pointing. So part of what we wanted out of the solution was clear roles and responsibilities in terms of who was going to do what against this environment.
And then we wanted the ability to — um, I'm sorry, dynamically scale. There it is; I was missing my bullet item there. We wanted the ability to dynamically scale the environment. What we tend to find happens is people will get the thing initially set up, they get it going, and they're like, this is pretty cool, now I want to put more stuff against it. I want more projects running against it. That means we need additional builder capacity, whether those are Jenkins builders or TeamCity builders. So we wanted a mechanism to be able to automatically scale those things on demand as we worked our way through it.
We also had some boundary conditions in terms of some of the technology we wanted to make sure got stirred into the problem. So, starting from the right and going to the left there: Nutanix. If you talk to our internal IT group, a Nutanix box is sort of like a cloud in a box. Right. It's got compute, it's got network, it's got storage. It's easy for them to manage. They can get it configured up and deployed out to the edge. So we wanted it to be Nutanix because, again, we wanted something that was going to work well with our internal IT group.
It's going to have VMware on top of that. We'll talk about that here in a sec. But it's VMware because that's how they manage that infrastructure, right. So it's a fair amount of physical capacity, but they stick VMware over the top of it and use it to slice it up to run a bunch of different VMs.
It turns out our solution doesn't need any of that specifically, but again, back to trying to get something that would plug in easily. From a development standpoint we did all our work on top of CentOS, and that's because we're predominantly a Red Hat shop, right. So it gets us binary compatibility, but we didn't have to do licensing for that set of items, so that was the reason for CentOS.
We wanted to use Docker, and this is where it sort of segues into the conversation about VMware. Docker because it solves a bunch of those problems I talked about: getting something packaged up for deployment, being able to upgrade it while it's out in the field, being able to dynamically scale it, right. Those are all things that Docker actually does pretty well for us. So we wanted Docker to be in the mix, but, you know, people would then ask questions like, why do you have Docker over the top of VMware, right? That seems inefficient. And the answer is, yeah, it is, a little bit, but for what we're trying to do right now it's good enough. Our plan, over time, for at least the edge nodes, is to take the VMware component out, because we're going to recoup a fair amount in terms of licensing costs, right, for that layer of things. But we didn't want to start there, because again we're trying to work well with the existing IT infrastructure that we had.
And then Artifactory was going to be the, sort of, backbone for what we're doing. We're going to be pushing out a whole bunch of these spokes off of hubs, and we need a mechanism now for replicating the different components that are getting generated at the different sites for the other people that need to consume them. So those were the major five technologies we stirred into the mix as we started trying to figure out what the solution was going to look like for this thing.
But, back to this: we weren't going to be successful unless we pushed the business through some change. Right. Because I can put the greatest tools out there, but if it's still a big monolithic thing and everybody has to touch everything all the time, it's not going to work. So we worked with the architecture team to basically start skinning some of these hard problems. Right. If you have a picture that looks like that, the answer is you're going to have to do the hard lifting to actually solve that and get it down to a bunch of components. If you have something that looks like that and it's only at one site, you probably don't need to worry about it too much, right, because everybody's at that one location doing the work. But as soon as you take that and spread it across a bunch of geographies, the answer is you need charters. You need to have sites that own components of the environment, so that they can typically hide behind an API: they're going to make a whole bunch of revision changes for whatever it is they're working on, but they're going to hold the external API consistent.
That then allows the other folks who have dependencies on them to use Artifactory as the mechanism for replication. So you'll have a group that's in Ditton Park and they're working on agent tree. They're going to get their stuff up and working, they're going to get it qualified, tested, everything else, and check it in as an artifact, as the final binary, and everybody else just simply pulls that binary in. That binary might only move on a monthly basis, based on the work that they're doing. Which means we don't get a lot of network traffic, and it's a very efficient approach for getting things done. But it doesn't work if you haven't gotten that problem skinned. So in parallel with turning up a bunch of this hardware, we started skinning this problem with the different product teams as we were working our way through.
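The consuming side of that pattern, where a downstream team pulls the published binary out of Artifactory instead of rebuilding it, could look roughly like the following Node/TypeScript sketch. The server, repository, and artifact path are hypothetical placeholders (and, again, Node 18+ is assumed for fetch); it just shows a plain HTTP GET against the repository layout.

    // fetch-dependency.ts - minimal sketch: pull a published binary out of Artifactory
    // (hypothetical server, repo, and artifact path)
    import { writeFile } from "node:fs/promises";

    const ARTIFACTORY = "https://artifactory.example.com/artifactory";       // placeholder
    const ARTIFACT = "product-builds-local/agent/2.4.0/agent-2.4.0.tar.gz";  // placeholder

    async function fetchDependency(destination: string): Promise<void> {
      // Published artifacts are addressable by plain HTTP paths, so a GET is all we need
      const res = await fetch(`${ARTIFACTORY}/${ARTIFACT}`);
      if (!res.ok) throw new Error(`Download failed: ${res.status}`);
      const bytes = Buffer.from(await res.arrayBuffer());
      await writeFile(destination, bytes);
      console.log(`Pulled ${ARTIFACT} (${bytes.length} bytes) to ${destination}`);
    }

    fetchDependency("deps/agent-2.4.0.tar.gz").catch((err) => {
      console.error(err);
      process.exit(1);
    });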
So now we're back to that same picture we had, where we still have nine sites, but the difference is that rather than having this big monolithic thing that was just peanut-buttered across all the sites with everybody working on everything, we actually now started to have charters for those different sites. They were working on pieces of the puzzle. Right. So their component could get checked back in and then be consumed by other people in the environment.
So then, moving on to what it actually looks like. This is what our current spoke looks like, and I'll walk through the components that are up there. Nutanix at the bottom; we've already talked about that. That's basically our cloud-in-a-box infrastructure that we deploy out. VMware running on top of that. Right now, VMware could run multiple different things, but it's just running a Docker environment because that's how we're using it for our solution. We're running Docker and we're running CentOS. We went with Docker, as opposed to adding in something like Kubernetes, or going with an OpenShift, or adding in Mesos, or any of these layers on top, because we were looking for simple. And, you know, we didn't have a really big need for a lot of scaling aspects to this other than turning up builders, and it turns out we solved that problem a little differently. So a Docker solution, using the Swarm functionality if we need to get into some of the automatic scaling but just using Compose as our mechanism to prop these things up, was sufficient and sort of kept the problem reasonable in terms of complexity.
If you move up above that, the right three boxes there: Jenkins is just standard Jenkins. We actually use Jenkins off the shelf. Listening to the gentleman this morning that was talking about some of the best practices, I found myself nodding my head at a lot of our war wounds and lessons learned as we went through it. For us, we basically pulled the base Jenkins image, we added in a few extra plugins that we specifically want to have at each one of these sites, and then we basically checked that in.
It turns out we checked that into Artifactory. We do have a bit of a chicken-and-egg kind of problem here, where Artifactory is not running at the point that you're trying to set this thing up, so what we're doing is going back to the hub and using the hub's Artifactory to maintain all these images and to pull them down. But otherwise Jenkins is pretty much just standard Jenkins. From a builder standpoint, we have a couple of base images. Same kind of thing: we try to build up an image that everybody can use so you don't get an immense sprawl out of those images. So in our current bag of tricks, for a couple of the products, we have an Ubuntu image and we've got a CentOS image for doing our builds. What we've added in is a little bit of instrumentation, plus the specific libraries we need to plug into our environment, and the nice part then is other people can just pick it up, right. They basically just do a FROM picking up that one, add in the other pieces they need for their build environment, and they're off to the races.
So those are all pretty generic, same standard functionality you would normally see.
Moving into the next one, there's this thing up there called Git Cache. The thing we found was that we needed a CDN for Git, but Git doesn't have anything like that. Right. So we needed a mechanism to replicate source out to the edge and keep it up to date as a read-only cache, so we could allow people to do clones. Predominantly it's builders doing these clones at a high frequency, but developers, if they need to pull over a large workspace, can make use of the same thing. What we built up there is a CentOS 7 image that sits at the base of this; you basically stick standard Git over the top of that, and now we have a mechanism for managing the environment.
Then you go into that center section, which is called RepoSync, and that's a little NodeJS application that we wrote. Our plan is that we're going to be open sourcing that, so other people can make use of it. I happen to be on the customer advisory board with the GitHub guys, and so we're trying to find other folks who've had interest in this functionality. So we're going to get this out to the outside.
What this thing does is basically listen for webhooks and then do a little bit of magic. So we tried to, back to that premise of trying to have zero configuration on the edge nodes that we were putting in, have all the configuration for this thing actually happen back in GitHub. So you go into GitHub and you say, I want to replicate a given workspace, by putting in a webhook and pointing it at whatever this hub's address or this spoke's address is. As soon as you put that in and hit save, it's going to send out a ping, right. That ping request gets consumed by this, and it turns around and runs back and says, okay, give me all the information about that repo.
The first thing it does is clone the repo, pulling it down and getting it onto the local machine. It then turns around and figures out who all the users are that have access to this repo and what their privs are. Then it figures out what SSH keys those users have, and it copies all those things down and puts them onto the local machine. The webhook then keeps it up to date. Right. So if somebody comes in and changes out an SSH key, or somebody changes out, you know, a piece of code or a pull request or something like that, a webhook fires and this thing will automatically pull the information back down. It keeps in sync within seconds. Right. It's not a two-phase commit, so it's not that you put it up into the main repository and it's instantaneously down, but even across our slow links it's usually seconds in terms of the amount of time it takes to propagate the changes, once you get the initial clone out of the way.
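To make that flow concrete, here's a heavily simplified sketch of what a webhook-driven, read-only cache along the lines of RepoSync might look like in Node/TypeScript. It is not CA's actual RepoSync code: it only handles GitHub's ping and push events and keeps a bare mirror clone up to date, the user, privilege, and SSH key syncing that RepoSync delegates to GitoLite is left out, and the port, cache directory, and use of the system git binary are assumptions.

    // reposync-sketch.ts - simplified webhook listener that mirrors repos into a local cache
    import { createServer } from "node:http";
    import { execFile } from "node:child_process";
    import { existsSync } from "node:fs";
    import { join } from "node:path";

    const CACHE_DIR = "/var/cache/git";   // assumed location for the read-only mirrors
    const PORT = 8080;                    // assumed port configured in the GitHub webhook

    // Run git and report failures; a bare mirror keeps every branch and tag in sync.
    function git(args: string[], cwd?: string): void {
      execFile("git", args, { cwd }, (err, _out, stderr) => {
        if (err) console.error(`git ${args.join(" ")} failed: ${stderr}`);
      });
    }

    function syncRepo(fullName: string, cloneUrl: string): void {
      const mirror = join(CACHE_DIR, `${fullName}.git`);
      if (!existsSync(mirror)) {
        git(["clone", "--mirror", cloneUrl, mirror]);  // first webhook: pull the whole repo down
      } else {
        git(["remote", "update", "--prune"], mirror);  // later webhooks: fetch only what changed
      }
    }

    createServer((req, res) => {
      let body = "";
      req.on("data", (chunk) => (body += chunk));
      req.on("end", () => {
        const event = req.headers["x-github-event"];
        if (event === "ping") {
          console.log("Webhook registered; ping received");
        } else if (event === "push") {
          const payload = JSON.parse(body);
          syncRepo(payload.repository.full_name, payload.repository.clone_url);
        }
        res.statusCode = 200;
        res.end();
      });
    }).listen(PORT, () => console.log(`Listening for webhooks on :${PORT}`));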
Technology-wise, that's just a little NodeJS application that's running. We did Node because it handles a lot of concurrent requests well and it's trivial to set up RESTful interfaces inside of it to consume stuff. What it then does is make use of Apache and GitoLite.
So Apache serves one of the protocols you get out of Git, right, which is being able to do a clone across HTTPS. And so that's what Apache is there for: RepoSync is pulling stuff down into the repository for storage and then serving it up, effectively, two ways. It serves it up through the Apache path for people to do clones via that mechanism, or, more typically in our environment, it serves it up through GitoLite for SSH access into those environments.
GitoLite's an off-the-shelf product that's out there; it's open source. What it does is manage all of the access privileges, against the set of SSH keys you have, against the repo. So all the magic around making sure that the appropriate privileges were being kept as stuff came down, all of that gets delegated to GitoLite, because it handles that for us.
So with, you know, a couple of weeks' worth of work, we ended up with effectively a read-only cache that runs out at the edge and allows our builders out there to pull stuff down directly from local storage on 10 gigabit networks. You can imagine that makes a big difference in terms of performance, right. In terms of what their build times are and that kind of stuff. We'll talk about some of the numbers here in a sec.
I think that’s pretty much it on that one so we’ll move on.
So let's talk a bit about Artifactory, cause it's still just base Artifactory here, right; we're just starting to use more and more features out of it. So we have this hub and spoke architecture, we've now taken a big, monolithic chunk of code and broken it up into a variety of repos, and we now have different sites that are owning their piece of the puzzle and turning it at whatever frequency they need to turn it at. Builds are happening continuously through the course of the day, but, you know, if they want to publish a new version and get it qualified once a day, great. If they want to do it once a week, great. If it's once a month, that's fine too. Right. The idea is not to try and mandate to everybody how to work; it's to provide a set of interfaces that allows us to stay insulated. And that's where Artifactory comes in as the glue for us.
So everybody checks into a local Artifactory instance, and then their dependencies basically get mapped over to the other instances that need them. So we can set up caching, we can set up automatic pushing, for the environments as stuff gets brought up to date. It becomes our one-stop-shopping place for Docker images. So, hey, I make the point to the guys back in the office that it seems like every single time I find myself going, gee, I wish, you know, I had a blah, in terms of some kind of repo or capability or something like that, I go take a poke inside of Artifactory and, look at that, it's already got it. Right.
We hadn't done much in the way of NodeJS functionality. Right. NodeJS requires either Bower or npm. Went in there, sure enough, it had it. Right. You know, we're a CentOS environment, so we're YUM based for our repositories. All these things, if you're pulling them from the outside network, you're slowing down all your build times, right. So now we've got it propped up inside of Artifactory such that it caches all of this stuff for us. The comment that was made this morning about the best practice of setting up repos for Docker is that you have a dev environment and a prod, right, so two separate repos that you promote between. That's the same model we took. The image we're actually running for Artifactory is Artifactory's own image. Right. I love it. I don't have to worry about, you know, building it, maintaining it, all these kinds of things. All the patches come down for me, right. All I have to worry about is basically externalizing the storage and backing it up on this node. And we can then take the updates that come down from Artifactory, much like the same model that we have going on with the GitHub Enterprise functionality, although in that case it's a VM that we use.
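As an illustration of that dev-to-prod promotion model, here's a small sketch that calls Artifactory's Docker image promotion endpoint to copy an image from a dev repository into a prod repository once it has passed whatever gates you care about. The server URL, repository names, image name, and tag are hypothetical, it assumes an Artifactory version that exposes the api/docker/<repoKey>/v2/promote endpoint, and it's one possible wiring rather than necessarily how CA implemented promotion.

    // promote-image.ts - minimal sketch: promote a Docker image from a dev repo to a prod repo
    // (hypothetical server, repos, image, and tag; basic auth via env vars)
    const ARTIFACTORY = "https://artifactory.example.com/artifactory"; // placeholder
    const DEV_REPO = "docker-dev-local";                               // placeholder source repo
    const PROD_REPO = "docker-prod-local";                             // placeholder target repo

    async function promote(image: string, tag: string): Promise<void> {
      const auth = Buffer.from(`${process.env.ART_USER}:${process.env.ART_PASS}`).toString("base64");
      const res = await fetch(`${ARTIFACTORY}/api/docker/${DEV_REPO}/v2/promote`, {
        method: "POST",
        headers: {
          Authorization: `Basic ${auth}`,
          "Content-Type": "application/json",
        },
        // copy: true leaves the image in the dev repo as well instead of moving it
        body: JSON.stringify({ targetRepo: PROD_REPO, dockerRepository: image, tag, copy: true }),
      });
      if (!res.ok) throw new Error(`Promotion failed: ${res.status} ${await res.text()}`);
      console.log(`Promoted ${image}:${tag} from ${DEV_REPO} to ${PROD_REPO}`);
    }

    promote("my-product/app", "1.2.3").catch((err) => { console.error(err); process.exit(1); });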
The other thing to mention on here is Bintray. This one's new for us; we're just in the process of getting it out there. We as a company are starting to package our applications up in Docker containers for easy try-and-buy type deployments. Right. So to be able to get something out there and have a customer easily download it, prop it up, and get it running. Many of our On Prem products have a lot of knobs you can turn. Right. I can order it on this OS with that flavor of database with that flavor of webserver. Right. And so it makes for a fair amount of complexity in the installation. The cool part with the Docker stuff is we can basically fix all of those variables on the edge for a try-and-buy kind of environment. Right. If you just want to take it out for a spin, if you're just looking to see what the functionality's like, then the answer is that we can fix all those dependencies and basically provide you with an environment.
So then the question becomes, how do we provide these to our customers? Right. We could put these out into, you know, open environments for people to download, but for us we wanted something that was tied into our internal systems to be able to assure that this person really is a customer and has access to whatever product they're looking at. And so Bintray is going to be our mechanism for doing that. So that's another capability that we're just leveraging out of the box. For us, Artifactory has become the backbone, basically, that we use for managing this distributed environment that we've built up, and so far we've been very happy with the functionality that's been provided.
So. Impacts of the changes that we have here. I'll walk through, talk through, some of the goals here. We're already starting to see value, but not all of this stuff is rolled out yet. Right. Everything that was in the hub, all that stuff's good; it's all rolled out there. A bunch of the things that are in the spokes, those are getting rolled out right now. But we are already starting to see value, even though it's not fully implemented right now.
So the things that have check boxes against them, those pieces are all up and working. The couple that you see in there that are work in progress: one of them is around the ability to automatically evolve in place. We haven't gotten to that one yet. We think we know how to do it, because we picked Docker as the technology that underpins this thing, because we've got Artifactory acting as our repo, and because we've got a mechanism for testing these things. We believe it's going to be a relatively straightforward thing for us to do: set up more or less a cron job, right, that's going to go out and compare, on a regular basis, the hash associated with the image, and if that hash changes, then the answer is we're going to go through a bounce and bring it back up. That will happen at whatever is the right time, you know, typically something in the middle of the night for whatever is the given geo that we happen to be running in.
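A rough sketch of that cron-style check, assuming the spoke services are run with Docker Compose as described earlier, could look like the following. The image names, compose file path, and the use of the docker and docker-compose CLIs are assumptions for illustration; the idea is just to pull, compare the repo digest before and after, and recreate the service if it changed.

    // evolve-in-place.ts - sketch of a scheduled check that bounces a service when its image changes
    // (hypothetical image names and compose file; relies on the docker and docker-compose CLIs)
    import { execSync } from "node:child_process";

    const COMPOSE_FILE = "/opt/spoke/docker-compose.yml";  // placeholder
    const SERVICES: Record<string, string> = {
      artifactory: "docker.bintray.io/jfrog/artifactory-pro:latest", // placeholder image tags
      jenkins: "jenkins/jenkins:lts",
    };

    function repoDigest(image: string): string {
      try {
        // The repo digest identifies the exact image content pulled from the registry
        return execSync(`docker inspect --format '{{index .RepoDigests 0}}' ${image}`).toString().trim();
      } catch {
        return ""; // image not present locally yet
      }
    }

    for (const [service, image] of Object.entries(SERVICES)) {
      const before = repoDigest(image);
      execSync(`docker pull ${image}`, { stdio: "inherit" });
      const after = repoDigest(image);
      if (before !== after) {
        // docker-compose recreates the container because the underlying image changed
        console.log(`${service}: image updated, bouncing (${before} -> ${after})`);
        execSync(`docker-compose -f ${COMPOSE_FILE} up -d ${service}`, { stdio: "inherit" });
      } else {
        console.log(`${service}: already up to date`);
      }
    }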
So we think we know how to skin it; it just hasn't made it to the top of the list yet, because we've only been rolling this out into a couple of different areas so far. The other one that we have there is the ability to dynamically scale. Right now we've gone with the more traditional model, right, which is you basically prop up builders as necessary in that environment to provide the capacity. That works perfectly fine, right, but it doesn't give us what we're asking for, which is the ability to dynamically scale on the fly.
We looked at a couple of different ways to skin this problem, and the one we're going to settle on, it looks like, is basically on-demand builders. Right. So the idea is you're going to come into Jenkins, or you're going to come into TeamCity, and it's going to say, I need to go off and schedule something. And the answer is we're going to basically prop up the builder just in time, right, for it to do its work. As soon as the build is done, we tear it back down, right. So it gives us a very dynamic environment for managing stuff. Again, it's one of these ones where we've looked at a Swarm mechanism, a Kubernetes mechanism; there are a bunch of different ways of going about trying to skin the problem, and we came away thinking that's going to be the one that's best for the set of needs that we've specifically got at our company. So those are coming.
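A bare-bones version of that just-in-time builder idea, where each build runs in a throwaway container that Docker cleans up when the job finishes, might look something like this. The builder image name, workspace path, and build command are hypothetical, and it shells out to the docker CLI rather than using whatever CI plugin would actually orchestrate it.

    // on-demand-builder.ts - sketch: spin up an ephemeral build container, then tear it down
    // (hypothetical image, workspace, and build command)
    import { execSync } from "node:child_process";

    const BUILDER_IMAGE = "builders/centos7-java:latest"; // placeholder base builder image
    const WORKSPACE = process.cwd();                       // checked-out source to build

    function runBuild(command: string): void {
      // --rm removes the container as soon as the build exits, so capacity is only used during the build
      execSync(
        `docker run --rm -v ${WORKSPACE}:/workspace -w /workspace ${BUILDER_IMAGE} ${command}`,
        { stdio: "inherit" },
      );
    }

    try {
      runBuild("./gradlew build");   // placeholder build command
      console.log("Build finished; container already torn down");
    } catch {
      console.error("Build failed");
      process.exit(1);
    }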
So here's some data about how this has helped us. Right. That original product I was talking about that was legacy, right: it was, you know, really big, spread out across a bunch of sites, that kind of stuff. The original build time was around 16 hours. Right. To get a build taken care of. Their worst case now is about an hour. Right. Now this is the worst case; it's the biggest component after it got broken up, right. Their biggest component takes about an hour. So, you know, for us that's about a 93 percent improvement in performance. If you think about how we look at this, right, we're trying to make our developers just as productive as we can. We want them spending as much time at the keyboard writing code as they possibly can, and we just dropped the cycle time, right, for those people by 93 percent or better.
A couple of the components that we have running in that environment now take, like, a minute and a half to build in the build environment, and so that means we're now getting to where we have a far more agile approach to doing things. Where it's, you know, build a little, test a little, build a little, test a little, and work your way through that in a very tight time frame. So it's made a big change in there for us.
The original time to copy the build result, for this one specifically, right: it was about a 12 gigabyte image that got generated, and it was taking on the order of about 400 minutes to get that sucked back from the main builders, back across the WAN, over into the local site where people wanted to do some spot testing with things. Now that we have stuff being built locally and pulled across the local network, we're down to about 12 minutes. Right. So again, another one where, from a cycle time standpoint, there's a huge difference in performance. It's actually even better than that, because with some of the new stuff that we're doing with Artifactory that's going out right now, we think we're going to get those down to a few minutes, only because a bunch of the artifacts are already there, right. So it's really going to be that worst-case-sized artifact that's the only one we have to worry about as we start pulling the stuff over, cause everything else will already be staged.
The original time to sync the source was around 10 minutes, something like that. Right. In the new model we're down to about 30 seconds on that worst case component. So, you know, again you're seeing about a 95 percent improvement in terms of cycle time and performance there. And then the last one: the feature branch concept, right. Before, it took us about 48 hours to go through and manually configure and set up a feature branch and get it going and everything. We're now down to where it takes us about 12, right. So, you know, that's 75 percent better, but that's still a lot. Well, it turns out that 11 and a half of those hours are qualification, right. So that's after we spun it up, got it running, everything was working; we then basically threw the regression suite at it to make sure it's going to work, and that takes 11 and a half hours. So that one we're still working on, cause we've got to cram it down a bit more.
So for us, right, we've been at this now for four years in terms of the journey of starting to move to Agile and those kinds of things. We've been at this piece of the puzzle for 18 months, something like that, and we're starting to see really big dividends coming out of it. We're now starting to take this exact same pattern and roll it out across the broader organization. We're getting the hub — spoke, excuse me — functionality now plugged in on a couple of the smaller ones, getting the proof case proven out there, and then starting to work with our IT group to get everything replicated out to the edge.
So the big thing where this then plays in as we go forward: again, as I say, we often grow through acquisition. It's not uncommon for us to do two to three acquisitions a year. And the way we've done that in the past is, you know, we bring them in, they have their own IT, they have their own tool chain, they have their own way of doing all these kinds of things, and it used to be really difficult for us to get them integrated into our organization from a development standpoint. And so now what we're finding is that with the one relevant set of tools we're making use of in the central portion of things, as well as the ability to turn up the spokes in minutes as opposed to weeks and months and that kind of stuff, it's much easier for us to digest a new company as they come in, to get them onto our tool chain, get the standardization, and get the bang for the buck for being the size of company that we are.
I think that’s about it for the major content. I wanted to leave a chunk of time, I think I still have about 10 minutes here at the end.