Unpacking the Container: A Deep Dive into Virtualized Container Technology
Containers have become integral to every phase in the lifecycle of application development. Production-grade orchestration tools such as Kubernetes have been built to manage them, and container platforms like Docker are becoming commonplace in both testing and development. Web tutorials on how to build and manage simple Docker images abound! But what are containers exactly, and why have they become so essential to the DevOps ecosystem? This talk is for those curious minds who want to look below the surface and really understand the mechanics of a technique that has actually been around longer than you may think. Where did Docker come from? What about other projects in the container ecosystem – are there alternatives? What does a Docker image actually look like on the filesystem? How do Docker image layers work? What are cgroups? How are system resources allocated and managed, and are there any gotchas that you should be aware of? What about security? How can JFrog Container Registry help me manage my Docker images? After this talk, you will have a solid understanding of the what, how & why of virtualized container technology.
VIDEO TRANSCRIPT
Hi everyone, welcome to swampUP and welcome to this talk on unpacking the container. I want to get started quickly because I’m excited to share with you some of the things that I’ve learned about containers, which of course are still all the rage right now. My hope is that you will come away from this talk with a better understanding of how containers actually work on your system, and some of what’s really going on under the covers. Q&A for this session will happen at the same time in the chat, so take advantage of the next 30 minutes and get your questions out there. If I’m not able to get to them during the session, I’ll make a point to follow up with you afterward. So a little bit about me, my name is Melissa McKay.
I just started with JFrog as a developer advocate in February of this year. I come from a developer background and I’ve been a developer in some way, shape, or form over the last 20 years. Most of my experience has been in server-side development and in Java, but I’ve had the privilege of working on many different teams over the years with a variety of different technologies, languages, and tool sets. As I think most of you know, it’s not very easy these days to just stay in one language, so getting a polyglot experience has been a wild ride for me. I’ve been on large teams, small teams, in big companies, small companies. I’ve had my share of frustrations and successes, and along the way I discovered a passion for diving in deep to understand things, sharing what I’ve learned, and hopefully improving processes along the way. I started speaking a few years ago and decided that was something that I wanted to do more of, so I threw my hat into the ring to become a dev advocate with JFrog. Clearly that all worked out and I’m here with you today at swampUP. So I feel privileged to be here with you, and I’m super excited to talk with all of you today about virtualized containers. So, let’s go ahead and get started. I’ll start out with a brief history to give some background context. Hopefully it won’t be too boring, but there are definitely some milestones from the past that we should go over so it better explains how we got to where we are today. Then we’ll take a look at the container market; it’s interesting to see what’s been going on over the past few years, and it’ll be interesting to see what the next few years bring us. Then we’ll move into getting a real understanding of what Docker actually is. Docker is a very overloaded term that gets thrown around quite a bit, so we’ll clear that up. After that, we’ll be in an excellent place to talk about what a container actually is, and then we’ll review a few common container gotchas, just a few simple things to avoid, stuff that I experienced right away when I started. Finally, I’ll leave you with an option for managing your images. So without further ado, let’s jump in and start learning about containers. Now I know some of you are already wondering if you’re in the right place, because that’s not the picture that you were expecting, I’m sure. I know the classic shipping container photo is pretty much expected, but there are actually a couple of reasons I chose to show bananas here. First and foremost, I’m tired of seeing shipping containers on every presentation about Docker, or containerization in general. So, I started a rebellion. You’re not going to see any shipping containers in this presentation.
Second, this is really a story about how our industry has adapted to dealing with limited resources over time, and bananas remind me of a story that my grandfather would repeatedly tell me when I was growing up. It went like this: things were a lot different for him as a kid than for me. I think every generation says that to the next. He continued on to share that when he was a kid, he would get a banana once a year, at Christmas. This must have been during the ’20s and ’30s; bananas were such a treat at that time that none of that banana would go to waste. He and his siblings would take a fork and scrape the banana peel to get every last bit of banana off, because there likely wasn’t going to be another one until next year. So maybe it isn’t the best analogy, but I liken that story to how computing resources were in the 1960s and 1970s.
I know that’s reaching, but hey. Computing resources back then were very limited and very expensive, and on top of that it took forever to get stuff done. Often a computer would be dedicated for a long period of time to a single task, for a single user. Obviously, the limits on time and resources created bottlenecks and inefficiency. Just being able to share was not enough either; there needed to be a way to share without getting in each other’s way, or having one person inadvertently cause the entire system to crash for everyone. So, once again, necessity is the mother of invention, and the need for better strategies for sharing compute resources started a path of innovation that we see massive benefits from today. There are some key points in time that brought us to the state we are in today. I’m going to begin this lesson with chroot. Chroot was born in 1979, during the development of the 7th edition of Unix, and was added to BSD in 1982. Being able to change the apparent root directory for a process and its children results in a bit of isolation, in order to provide an environment for, say, testing a different distribution. So chroot was a great idea and solved some specific problems, but more than that was needed. In 2000, the jail command was introduced by FreeBSD.
Jail is a little more sophisticated than chroot in that it includes additional features to help with further isolation of file systems, users, and networks, with the ability to assign an IP address to each jail. In 2004, Solaris Zones brought us ahead even further by giving an application a full user, process, and file system space, and access to system hardware. Solaris Zones also introduced the idea of being able to snapshot a file system, which you’ll see is pretty important. In 2006, Google jumped in with their process containers. Those were later renamed to cgroups; these centered around isolating and limiting the resource usage of a process. This is huge. Moving right along: in 2008, cgroups were merged into the Linux kernel, which, along with Linux namespaces, led to IBM’s development of Linux Containers (LXC).
2013 was a big year. Docker came on the scene, bringing the ability to package containers and move them from one environment to another. The same year, Google open sourced their “Let Me Contain That For You” (lmctfy) project, which provided applications the ability to create and manage their own subcontainers. From here, we saw the use of containers, and Docker specifically, absolutely explode. In 2014, Docker chose to swap out their use of the LXC toolset for launching containers with libcontainer, in order to utilize a native Golang solution. That was something I didn’t know when I first started: Docker is actually written in Go.
Almost done with this history lesson, because from here on out you would just start seeing a ton of names of projects, organizations, specs, etc. that are just confusing if you don’t have a better understanding of how containers work, which is the point of this session. This last event, however, in June 2015, is important enough to bring up; it’s included because it will give you some insight into some of the activity and motivations behind shifts in the market. The Open Container Initiative (originally the Open Container Project) was established. This is an organization under the Linux Foundation, and it includes members from many major stakeholders, including Docker, which was really important, with the goal of creating open standards for container runtimes and image specification. So that’s it for the history lesson. Let’s take a look at what’s been going on in the market recently concerning container runtimes. I did a little hunting, and I found that for the last three years Sysdig, a company that provides a really powerful monitoring and troubleshooting tool for Linux, has put out a container report based on the analysis of their own users. Part of the report includes data on container runtimes that are in use. In 2017, they analyzed data from 45,000 containers. There’s no graph available for that, because 99% of those were Docker, so they didn’t feel the need to split up the results. In 2018, however, they doubled their sample size to 90,000 containers.
As you can see, 83% is Docker, 12% is CoreOS rkt, 4% Mesos Containerizer, and 1% LXC. It looks like other container runtimes may be encroaching a little bit on Docker. So, moving on to 2019. This is the latest Sysdig container report, and it included stats from over 2 million containers. They did state that it included data from both their SaaS and on-prem users; I’m not really sure whether the last two years were just on-prem, or why they felt the need to put that information in there. But 2 million is a huge number, so here we go. Docker is still holding relatively strong at 79%, and 18% is containerd. It’s worth noting that containerd is a runtime that Docker actually builds on top of; I’ll tell you more about that one later. The last 4% is CRI-O. So, I don’t know that there’s enough data here to determine whether Docker is going to stay on top in the future or whether something completely different will prevail; that remains to be seen. But it’s interesting, especially because of what’s been happening over the last few years, which we’ll get to later. Now that I’ve introduced a few of these other container runtimes that exist out there besides Docker, it’s time to start talking about what a container actually is, and what Docker actually provides, in order to appreciate the differences between them. So, what exactly is Docker anyway? This is key. What Docker had over the other players in the container game was a steadfast focus on commoditizing a complete, end-to-end solution that made it easy for developers to package and deploy their applications.
Once containers became easy to use, we all witnessed the explosion of tools and resources around containers, and the Docker image format rose to become a de facto standard in the market. The stats I showed you from Sysdig are specific to container runtimes, and that terminology is important to remember. I’ll explain the pieces and parts involved in working with containers, and you’ll immediately understand why Docker sucked up the market so fast. So as users, let’s think about what we actually need to get our apps out there and running. Every innovation coming out of this space is purely based on what users need or want, whatever the motivation behind it. If a user needs or wants something badly enough, there’s a huge opportunity for solution providers. That seems like such a common-sense thing to say, and maybe not worth saying, but so often we can find ourselves getting so far down into the nitty-gritty details that we lose sight of the actual problem we’re trying to solve. And that of course leads to a ton of missed and overlooked opportunities. So, here’s a list of needs broken up into discrete features.
First and foremost, we need the container itself, and some of you might be asking right now about virtual machines. Those are already out there, but discussing them is out of the scope of this session, so I’m not going to go deep into the differences between VMs and containers, or why you would use one over the other. The one thing I’ll say is that a virtual machine is not synonymous with a container; the biggest difference is that a VM includes an entire OS all to itself, while containers share the OS of the system they’re running on. The point of the container is to be lightweight and have the ability to move from one environment to another, seamlessly and quickly. That’s that. I know that there are developments happening in the VM space, but that’s a topic for another time.
So, for the rest of this list: we need an image format to define a container, a way to build an image of a container, a way to manage images, a way to distribute and share container images, a way to create, launch, and run a container environment, and a way to manage the lifecycle of the running containers. I didn’t even get to orchestration or anything, but this is plenty to prove my point. So, Docker was ready with an answer for everything. You want to start using containers? Use Docker Engine. Oh, you need an image format? Here’s the Docker image format. You need a way to build an image? Use a Dockerfile and call docker build. You want to manage images? Call docker images or docker rmi. You want to share your images or use an image from someone else? Call docker push or docker pull. Oh, and by the way, we have Docker Hub where you can store and share your images. You need a way to launch, run, and manage your containers and their lifecycle? You can call docker run, docker stop, and docker ps. The sketch below pulls that whole workflow together.
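As a rough sketch of that end-to-end workflow (the image name, tag, and registry hostname here are just placeholders, not anything from the talk):

    docker build -t myapp:1.0 .                         # build an image from a Dockerfile
    docker images                                       # list and manage local images
    docker tag myapp:1.0 myregistry.example.com/myapp:1.0
    docker push myregistry.example.com/myapp:1.0        # share it via a registry (or Docker Hub)
    docker run -d --name myapp myapp:1.0                # create and launch a container
    docker ps                                           # see the running container
    docker stop myapp && docker rm myapp                # manage the container lifecycle
    docker rmi myapp:1.0                                # remove the image when you're done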
Docker succeeded in quickly meeting the immediate needs of a hungry container market, and on top of that, the tool sets that Docker provided made it all so easy. It was enough to walk away with a tremendous share of the market.
And by the way, it was really difficult finding a relevant banana picture for this slide, so I hope you appreciate this one. Remember in our history lesson when I spoke about the Open Container Initiative? Out of all of those features that we just talked about, there are two that were taken up for the cause by the OCI: the image format and the container runtime. Docker did quite a bit of reorganizing of their code base, developing abstractions and pulling out discrete pieces of functionality. They are a heavy contributor to the OCI, giving the Docker V2 image spec as a basis for the OCI image spec, and runC, which was contributed as a reference implementation of the OCI runtime spec. There are quite a few other container runtimes out there making waves, including containerd, rkt, CRI-O, and Kata, all with various levels of features for specific use cases. It’s worth pointing out that containerd was actually contributed by Docker to the Cloud Native Computing Foundation (CNCF) and internally uses runC. containerd has also been integrated into Docker and has been in use there since version 1.11, so quite a while now. So, the next few years will be interesting to observe what happens with these specs and how the OCI moves forward. It doesn’t seem that they’re done yet.
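Just to make runC a little more concrete, here is a rough sketch of launching a container with runc directly, with no Docker daemon involved at runtime (this assumes runc is installed; the image, directory, and container names are just examples):

    # build an OCI bundle: a rootfs directory plus a config.json
    mkdir -p mycontainer/rootfs
    docker export $(docker create alpine:3.18) | tar -C mycontainer/rootfs -xf -
    cd mycontainer
    runc spec                  # generates a default config.json for the bundle
    sudo runc run demo         # launches a container named "demo" from this bundle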
And there is quite a range of differing opinions about what should and should not be in the standard for a container runtime; lots of discussions going on about that. I’ve added a couple of links here that are excellent starting places to learn more about container runtimes, if you’re curious. The second one is the beginning of a blog series by Ian Lewis, a Google dev advocate. The first subtitle in that blog is literally “Why are container runtimes so confusing?”, which I chimed in right along with: yes, yes, why is that? Anyway, he does a really good job of explaining some of the issues there. So, now that we understand all that Docker entails, and some of what’s going on in the market, let’s focus on just the container itself and what that actually looks like on your system. I’ll show you how it’s stored and what is actually happening under the covers. You’ll discover pretty quickly that images and containers aren’t really all that magical. My first experience with containers was as a new developer on a project with a tight deadline.
Of course, I could argue that’s a good description of most projects. The best course of action for me was to just jump in and start getting something up and running on my local machine. I learn best by doing, and the Docker documentation is actually really good. So, if you find yourself in a similar position, I recommend going through their Get Started docs; I share a link for that here. Going through that guide will get you somewhat comfortable with a lot of the Docker commands that you’re going to need. The first thing to note is that a Docker image is just a tarball of a complete file system. When an image is actually unpacked on your system, it’s just thrown into its own directory, which becomes its root file system. You can see that for yourself with docker save, as in the sketch below. The second thing to note is that the processes involved in running a container are just regular Linux processes.
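A minimal sketch of that first point (the image and file names are just examples):

    docker pull alpine:3.18
    docker save alpine:3.18 -o alpine.tar
    tar -tf alpine.tar      # a manifest, a config JSON, and one tarball of filesystem content per layer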
There’s really nothing special about those processes. You could technically, you know, create and run containers without anything other than just calling the appropriate Linux commands. Namespaces are worth pointing out here. They’re a really important ingredient, because this is what is used to provide virtual separation between containers; this is how the processes inside one container don’t interfere with the processes inside another container. Here you can see some of the namespaces that were set up for a postgres container that I have running on my box. The cgroups functionality is integral to constraining how much of things like CPU, memory, and network bandwidth a container can use. I can set these constraints by including options on the docker run command when I’m launching an image; you can see that on this slide, and there’s a sketch of it below.
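As a rough sketch (the container name, image, and limit values are just examples; the /proc path assumes you are on the Linux host where the containers actually run):

    # launch a container with cgroup-backed resource limits
    docker run -d --name limited --memory=512m --cpus=1.5 nginx:alpine

    # the container's processes are ordinary Linux processes with their own namespaces
    pid=$(docker inspect --format '{{.State.Pid}}' limited)
    sudo ls -l /proc/$pid/ns            # mnt, uts, ipc, pid, net, ... namespace links

    # the memory limit shows up in the stats
    docker stats --no-stream limited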
I’ve constrained the memory usage limit on one of my containers. This didn’t use to show the correct limit, so it looks like that’s been fixed in a later version of Docker. Now I want to quickly go over some file system details: where containers and images are actually stored on your file system. First off, after you’ve installed Docker, running the command docker info will spit out a bunch of information about your installation, including your Docker root directory, which I have noted here. This is where most everything you’re going to care about regarding your Docker images and containers will be stored.
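For example, a quick way to find it (the path shown is just the typical Linux default):

    docker info --format '{{.DockerRootDir}}'    # typically /var/lib/docker on Linux
    sudo ls /var/lib/docker                      # containers/, image/, overlay2/, volumes/, ...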
Note that if you’re on a Mac, your containers are actually running in a tiny Linux VM, so you’re going to need to use screen or something similar to get in there and actually get to the Docker root directory to check it out. And if you’re not familiar with how to use the screen command, definitely Google that and get familiar with it first. It’ll mess up your text display pretty good if you don’t enter and exit screen the right way; a little frustrating if you haven’t done it before. This slide shows how you can get information about the images that you have stored on your system. First, I listed my available images using the docker images command. I actually have several installed, but I only list the first couple here. Using the docker inspect command, I can inspect any image I like using its image ID, as in the sketch below.
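A rough sketch of those two commands (the image name here is just an example):

    docker images                                                 # list local images and their IDs
    docker inspect --format '{{json .GraphDriver}}' postgres:15
    # with the overlay2 storage driver, this lists LowerDir, UpperDir, MergedDir,
    # and WorkDir paths under the Docker root directory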
This will spit out a ton of information, but what I want to highlight here is the GraphDriver section, which contains the paths to the directories where all of the layers that belong to this image live. So, Docker images are composed of layers, which represent instructions in the Dockerfile that was used to build the image originally. These layers actually translate into directories, and the layers can be shared across images in order to save space. The lowerdir, mergeddir, and upperdir sections are important. The lowerdir contains all of the directories, or layers, that were used to build the original image, and these are all read-only. The upperdir contains all of the content that has been modified while the container is running. It’s important to remember that this is ephemeral data and only lives as long as the container lives. In fact, if you have data that you intend to keep, you should utilize the volume features of Docker and mount a location that will stick around after the container dies.
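A minimal sketch of that, using a postgres container as an example (the volume name and password are just placeholders):

    docker volume create pgdata
    docker run -d --name mydb -e POSTGRES_PASSWORD=example \
      -v pgdata:/var/lib/postgresql/data postgres:15
    docker rm -f mydb       # the container and its upperdir are gone...
    docker volume ls        # ...but the pgdata volume and the database files remain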
That’s how most containers running a database are run. The mergeddir is like a virtual directory that combines everything from the lowerdir and the upperdir, and there’s a workdir as well, which is like an internal working directory. So, this slide shows that I actually have a few containers running on my system. Two of them are my local JFrog Container Registry installation, which includes a container for Artifactory and another container for a postgres database. The other is a simple test container that I was playing around with. Actually, I cleaned my system up quite a bit just so that I could get some clean screenshots here; I had a ton of images and containers running before this. Anyway, note that the container IDs of the running containers match up with the subdirectory names under the containers directory.
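You can check that yourself with something like this (the path assumes the default Linux Docker root directory):

    docker ps -q --no-trunc               # full IDs of the running containers
    sudo ls /var/lib/docker/containers    # one subdirectory per container, named by ID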
Something else to remember here: if you stop a container, the corresponding directory doesn’t go away until the container is actually removed with the docker rm command. So, if you have stopped containers lying around that never get cleaned up, you might see your available space start to dwindle. There’s a docker system prune command that you can use every now and then to help clean things up. So, the tool sets around building and running images and containers have made things so easy that it’s also easy to shoot yourself in the foot in a few places. Here are three of the most common gotchas, ones that I ran into almost immediately when I first started working with containers.
The first is running a containerized application as the root user. I’ll be honest here: when I was initially getting containers up and running, I was so excited about how well it was working that I didn’t take this very seriously. I heard it, but it didn’t really land. Now that you know that the processes inside a running container are just like any other processes on the system, albeit with a few constraints, it should be a little scary to run as root inside a container. Doing that opens up the possibility of a process escaping the intended confines of the container and gaining access to the host’s resources, exactly what we don’t want to happen. The best thing to do is to create a user, and use the USER instruction inside the Dockerfile when the image is built, in order to run processes as that user, as in the sketch below.
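A minimal Dockerfile sketch of that (the base image, user name, and command are just examples, not anything from the talk):

    FROM node:20-alpine
    RUN addgroup -S app && adduser -S app -G app    # create an unprivileged user and group
    WORKDIR /app
    COPY --chown=app:app . .
    USER app                                        # everything from here on runs as "app", not root
    CMD ["node", "server.js"]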
There is a way to specify a user when the docker run command is used (the --user option), but that leaves open the possibility of forgetting to do it. It’s kind of nice if the image is just set up by default not to run as root. Also pay attention to official images you pull from Docker Hub: whether or not they run as root, or whether they leave that up to you to figure out. The second gotcha: even though Docker provides you with the ability to set resource limits on your containers, it doesn’t automatically do it for you. In fact, the default settings are a free-for-all, with no limits anywhere. So make sure you understand the resource needs of your application: too little and your container will die from starvation; too much and the container could smother others on the system. The resource usage of your containers is also something that you’re going to want to monitor over time and adjust as needed. It’s a good way to determine if something is going wrong, or if load on your system has changed for some reason. The third gotcha is a pretty big security issue.
It’s easy to get complacent and not pay attention to what is actually getting pulled in when you build images. Not only do you need to beware of outdated versions that you specify in the Dockerfile, but you also need to pay attention to what’s in the base image and where it’s coming from. Not updating packages and libraries inside your container can lead to some embarrassing results, especially when there are tools available now to alert you when security issues have been discovered with specific artifacts. Speaking of tools that help manage images: JFrog Artifactory supports Docker registries and images. You can use it just like you do for other types of artifacts, both as a cache for third-party base images and as your own internal registry. After uploading your Docker images, you have the ability to gather statistics and even drill down into each layer of an image for more information. If you’ve integrated a CI/CD solution, you can also determine what build produced a particular image or what build used it. We discussed earlier the problems with not updating images regularly; JFrog Xray is a security scanning tool that will alert you if there are any known security vulnerabilities, or even licensing issues, with your artifacts. For Docker images it’s especially useful, because it has the ability to drill down into the layers of the image to find out exactly which library or package has been flagged as a problem. You have control over how sensitive to make these alerts and what actions to take when they’re triggered, whether to fail a build, prevent a download, or simply send a notification about the problem. All right, we’ve come to the end of our time here.
So thank you for coming, everyone. I hope you enjoyed this session and that you got something out of it to take back to your teams, and feel free to reach out to me with any questions you have. Enjoy the rest of swampUP online.