Unpacking the Container: A Deep Dive into Virtualized Container Technology [swampUP 2020]

July 20, 2020

2 min read

What are containers? Check out: https://jfrog.com/knowledge-base/what…

What is Docker? Check out: https://jfrog.com/knowledge-base/what…

Containers have become integral to every phase in the lifecycle of application development. Production grade orchestration tools such as Kubernetes have been built to manage them and container platforms like Docker are becoming commonplace in both testing and development. Web tutorials on how to build and manage simple Docker images abound! But what are containers exactly and why have they become so essential to the DevOps ecosystem? This talk is for those curious minds who want to look below the surface and really understand the mechanics of a technique that has actually been around longer than you may think. Where did Docker come from? What about other projects in the container ecosystem – are there alternatives? What does a Docker image actually look like on the filesystem? How do Docker image layers work? What are cgroups? How are system resources allocated and managed and are there any gotchas that you should be aware of? What about security? How can JFrog Container Registry help me manage my Docker images? After this talk, you will have a solid understanding of the what, how & why of virtualized container technology.

Speakers

Melissa McKay

    Melissa McKay

    Melissa is a long-time developer/software engineer turned international speaker and is currently a Developer Advocate on the JFrog Developer relations team, sharing in the mission to improve the developer experience with DevOps methodologies. Her background and experience as a software engineer span a slew of languages, technologies, and tools used in the development and operation of enterprise products and services. She is a mom, Java Champion, Docker Captain, co-author of the upcoming book DevOps Tools for Java Developers, a huge fan of UNconferences, and is always on the lookout for ways to grow and learn. She has spoken at Kubecon, DockerCon, CodeOne, JFokus, Java Dev Day Mexico, the Great International Developer Summit, and is part of the JCrete and JAlba UNconference teams. Given her passion for teaching, sharing, and inspiring fellow practitioners, you are likely to cross paths with her in the conference circuit — both online and off!

    Video Transcript

    Hi everyone, welcome to swampUP
    and welcome to this talk on unpacking the container.
    I want to get started quickly
    because I’m excited to share with you
    some of the things that I’ve learned about containers,
    which of course are still all the rage right now.
    My hope is that you will come away from this talk with a better understanding
    of how containers actually work on your system,
    and some of what’s really going on under the covers.
    A Q&A for this session will be going on at the same time in the chat.
    So take advantage of the next 30 minutes and get your questions out there.
    If I’m not able to get to them during the session,
    I’ll make a point to follow up with you afterward.
    So a little bit about me, my name is Melissa McKay.
    I just started with JFrog as a developer advocate in February of this year.
    I come from a developer background
    and I’ve been a developer in some way,
    shape, or form over the last 20 years.
    Most of my experience has been in server-side development and in Java,
    but I’ve had the privilege of working on many different teams
    over the years in a variety of different technologies,
    languages, and tool sets.
    I think most of you know;
    it’s not very easy these days to just stay in one language.
    So getting a polyglot experience has been a wild ride for me.
    I’ve been on large teams, small teams, in big companies, small companies.
    I’ve had my share of frustrations and successes
    and along the way I discovered a passion for diving in deep to understand things,
    sharing what I’ve learned,
    and hopefully improving processes along the way.
    I started speaking a few years ago
    and I decided that was something that I wanted to do more of.
    So, I threw my hat into the ring to become a dev advocate with JFrog.
    Clearly that all worked out and I’m here with you today at swampUP.
    So I feel privileged to be here with you
    and I’m super excited to talk with all of you today about virtual containers.
    So let’s go ahead and get started.
    I’ll start out with a brief history to give some background context.
    Hopefully, it won’t be too boring.
    But there are definitely some milestones
    that have happened in the past that we should go over,
    so that it better explains how we got to where we are today,
    then we’ll take a look at the container market,
    that’s interesting to see what’s been going on over the past few years,
    and it’ll be interesting to see what the next few years brings us.
    Then we’ll move into getting a real understanding of what Docker actually is;
    Docker is a very overloaded term
    that is used and thrown around quite a bit.
    So we’ll clear that up.
    After that, we’ll be in an excellent place to talk about
    what a container actually is,
    and then we’ll review a few common container gotchas:
    just a few simple things to avoid,
    stuff that I experienced right away when I started.
    Finally, I’ll leave you with an option for managing your images.
    So without further ado.
    Let’s jump in and start learning about containers.
    Now I know some of you are already wondering
    if you’re in the right place because that’s not the picture
    that you were expecting I’m sure.
    I know the classic shipping container photo is pretty much expected,
    but there’s actually a couple reasons I chose to show bananas here.
    First and foremost,
    I’m tired of seeing shipping containers on every presentation about Docker,
    or containerization in general.
    So I started a rebellion;
    you’re not going to see any shipping containers in this presentation.
    Second, this is really a story about how our industry has adapted
    to dealing with limited resources over time,
    and bananas remind me of a story
    that my grandfather would repeatedly tell me when I was growing up,
    and it went like this:
    things were a lot different for him as a kid than for me.
    I think every generation says that to the next.
    He continued on to share that when he was a kid,
    he would get a banana once a year on Christmas.
    This must have been during the 20s and 30s,
    bananas were such a treat at that time
    that none of that banana would go to waste.
    He and his siblings would take a fork
    and scrape the banana peel to get every last bit of banana off
    because there likely wasn’t going to be another one until next year.
    So maybe it isn’t the best analogy,
    but I liken that story to how computing resources were in the 1960s and 1970s.
    I know that’s reaching, but hey:
    very limited and very expensive,
    and on top of that it took forever to get stuff done.
    Often a computer would be dedicated for a long period of time to a single task,
    for a single user.
    Obviously the limits on time and resources created bottlenecks, and inefficiency.
    Just being able to share was not enough either,
    there needed to be a way to share without getting in each other’s way,
    or having one person inadvertently causing the entire system to crash for everyone.
    So, once again necessity is the mother of invention
    and the need for better strategies
    for sharing compute resources actually started a path of innovation
    that we see massive benefits from today.
    There are some key points in time
    that brought us to this state we are in today.
    I’m going to begin this lesson with Chroot.
    So chroot was born in 1979,
    during the development of the 7th Edition of Unix,
    and was added to BSD in 1982.
    Being able to change the apparent root directory
    for a process and its children
    results in a bit of isolation, in order to provide an environment
    for, say, testing a different distribution.
    So chroot was a great idea and solved some specific problems,
    but more than that was needed.
    So in 2000, the jail command was introduced by FreeBSD.
    Jail is a little more sophisticated
    than chroot in that it includes additional features
    to help with further isolation of file systems,
    users, and networks, with the ability to assign an IP address to each jail.
    In 2004, Solaris Zones brought us ahead even further by giving an application
    a full user, process, and file system space, and access to system hardware.
    Solaris Zones also introduced the idea of being able to snapshot a file system,
    which you’ll see is pretty important.
    In 2006 Google jumped in with their process containers.
    Those were later renamed to cgroups;
    these centered around isolating and limiting the resource usage of a process.
    This is huge.
    Moving right along: in 2008,
    cgroups were merged into the Linux kernel, which along with Linux namespaces
    led to IBM’s development of Linux Containers (LXC).
    And 2013 was a big year.
    Docker came on the scene bringing their ability
    to package containers and move them from one environment, to another.
    The same year, Google open sourced their “Let Me Contain
    That For You” (lmctfy) project, which provided applications the ability to create
    and manage their own sub-containers.
    From here, we saw the use of containers and Docker specifically,
    absolutely explode.
    In 2014 Docker chose to swap out their use of the LXC toolset
    for launching containers with libcontainer, in order to utilize a native Golang solution.
    And that was something that I didn’t know when I first started is that,
    Docker’s actually written in Go.
    Almost done with this history lesson,
    because from here on out you would just start seeing a ton of names of projects,
    organizations, specs, etc.
    that are just confusing if you don’t have a better understanding of how containers work,
    which is the point of this session.
    And this last event, however,
    from June 2015, is important enough to bring up;
    it’s included because it will give you some insight
    into some of the activity and motivations behind shifts in the market.
    The Open Container Initiative (originally the Open Container Project) was established.
    This is an organization under the Linux Foundation,
    and it includes members from many major stakeholders,
    including Docker, which was really important,
    with the goal of creating open standards for the container runtime
    and image specification.
    So that’s it for the history lesson.
    Let’s take a look at what’s been going on in the market recently
    concerning container runtimes.
    So I did a little hunting,
    and I found that for the last three years,
    Sysdig, a company that provides a really powerful monitoring
    and troubleshooting tool for Linux,
    has put out a container report based on the analysis of their own users.
    Part of the report includes data on container runtimes that are in use.
    In 2017, they analyzed data from 45,000 containers.
    There’s no graph available for that because 99% of those were Docker
    so they didn’t feel the need to split up the results.
    In 2018, however,
    they doubled their sample size to 90,000 containers,
    and as you can see, 83% is Docker,
    12% is CoreOS rkt, 4% Mesos containerizer,
    and 1% LXC.
    It looks like other container runtimes
    maybe are encroaching a little bit on Docker.
    So moving on to 2019.
    This is the latest Sysdig container report
    and this included stats from over 2 million containers.
    They did state they included data from both their SaaS and on-prem users.
    I’m not really sure about the last two years,
    whether it was just on-prem, or why they felt the need to put that information in there.
    But 2 million is a huge number, so here we go.
    Docker is still holding relatively strong at 79%,
    and 18% is containerd.
    But it’s worth noting that containerd is a runtime that Docker actually builds on top of;
    I’ll tell you more about that one later.
    And the last 4% is CRI-O.
    So I don’t know that there’s enough data here to determine
    whether Docker is going to stay on top in the future,
    or whether something completely different will prevail;
    that remains to be seen.
    But it’s interesting especially because of what’s been happening over the last few years,
    which we’ll get to later.
    But now that I’ve introduced a few of these other container runtimes
    that exist out there besides Docker;
    it’s time to start talking about what a container actually is,
    and what Docker actually provides
    in order to appreciate the differences between them.
    So what exactly is Docker anyway?
    This is key.
    What Docker had over the other players in the container game
    was this steadfast focus on commoditizing a complete,
    end-to-end solution that made it easy for developers to package
    and deploy their applications.
    Once containers became easy to use we all witnessed the explosion of tools
    and resources around containers,
    and the Docker image format rose to become a de facto standard in the market.
    The stats I showed you from Sysdig are specific to container runtimes,
    and that terminology is important to remember.
    I’ll explain the pieces and parts involved in working with containers,
    and you’ll immediately understand why Docker sucked up the market so fast.
    So as users,
    let’s think about what we actually need to get our apps out there and running.
    Every Innovation that is coming out of this space
    is purely based on what users need or want.
    Whatever the motivation is behind it.
    If a user needs or wants something bad enough,
    there’s a huge opportunity for solution providers.
    That seems like such a common-sense thing to say,
    and maybe not even worth saying,
    but so often we can find ourselves getting so far down into the nitty-gritty details
    that we lose sight of the actual problem we’re trying to solve.
    And that of course leads to a ton of missed and overlooked opportunities.
    So, here’s a list of needs broken up into discrete features.
    First and foremost, we need that container itself,
    and some of you might be asking right now about virtual machines.
    Those are already out there,
    but discussing them in depth is out of the scope of this session.
    So I’m not going to go deep into the differences between VMs and containers,
    or why you would use one over the other.
    The one thing I’ll say is virtual machine is not synonymous with container;
    the biggest difference being that a VM includes an entire OS all to itself,
    and containers share the OS of the system that they’re running on.
    The point of the container is to be lightweight
    and have the ability to move from one environment to another,
    seamlessly and quickly.
    That said, I know that there are developments happening in the VM space,
    but that’s a topic for another time.
    So, for the rest of this list: we need an image format to define a container.
    We need a way to build an image of a container,
    a way to manage images,
    a way to distribute and share container images,
    a way to create, launch, and run a container environment,
    and a way to manage the lifecycle of the running containers.
    I didn’t even get to orchestration
    or anything but this is plenty to prove my point.
    So Docker was ready with an answer for everything.
    You want to start using containers, use Docker engine.
    Oh you need an image format, here’s the Docker image format.
    You need a way to build an image, use a Dockerfile and call Docker build.
    You want to manage images, call Docker images or Docker rmi.
    You want to share your images or use an image from someone else,
    call Docker push or Docker pull.
    Oh and by the way, we have Docker Hub where you can store, and share your images.
    You need a way to launch,
    run, and manage your containers and their lifecycle?
    You can call Docker run, Docker stop, or Docker ps.
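    For example, here is a minimal sketch of that end-to-end workflow (the image and container names are just examples):

        docker build -t my-app .                # build an image from the Dockerfile in the current directory
        docker images                           # list local images
        docker run -d --name my-app-1 my-app    # launch a container from the image
        docker ps                               # see it running
        docker stop my-app-1                    # stop it
        docker push my-app                      # share it (assumes the tag points at a registry you can push to)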
    Docker succeeded in quickly meeting the immediate needs
    of a hungry container market, and on top of that, the tool sets
    that Docker provided made it all so easy.
    It was enough to walk away with a tremendous part of the market share.
    And by the way,
    it was really difficult finding a relevant banana picture for this slide.
    So I hope you appreciate this one.
    So, remember in our history lesson when I spoke about the Open Container Initiative?
    Out of all of those features that we just talked about,
    there are two that were taken up for the cause by the OCI.
    The image format and the container runtime.
    Docker did quite a bit of reorganizing of their code base,
    developing abstractions, and pulling out discrete pieces of functionality.
    They are a heavy contributor to the OCI,
    giving the Docker V2 image spec as a basis for the OCI image spec, and runC,
    which was contributed as a reference implementation of the OCI runtime spec.
    There are quite a few other container runtimes out there making waves,
    including containerd, rkt, CRI-O, and Kata,
    all with various levels of features for specific use cases.
    It’s worth pointing out that
    containerd was actually contributed by Docker to the Cloud Native Computing Foundation (CNCF)
    and internally uses runC. containerd has also been integrated into Docker,
    and it’s been in use there since version 1.11, so quite a while now.
    So the next few years will be interesting
    to observe what happens with these specs and how the OCI moves forward.
    It doesn’t seem that they’re done yet.
    And there is quite a range of differing opinions
    about what should, and should not be in the standard for a container runtime.
    Lots of discussions going on about that.
    I’ve added a couple links here that are excellent
    starting places to learn more about container runtimes, if you’re curious.
    The second one, is the beginning of a blog series by Ian Lewis.
    He is a Google dev advocate.
    The first subtitle in that blog is literally,
    “Why are container runtimes so confusing?”
    Which I shamelessly nodded along with: yes, yes.
    Why is that?
    Anyway, he does a really good job of explaining some of the issues there.
    So,
    now that we understand all that Docker entails,
    and some of what’s going on in the market,
    let’s focus on just the container itself
    and what that actually looks like on your system.
    I’ll show you how it’s stored and what is actually happening under the covers.
    You’ll discover pretty quickly
    that images and containers aren’t really all that magical.
    So, my first experience
    with containers was as a new developer on a project with a tight deadline.
    Of course, I could argue that’s a good description of most projects.
    The best course of action for me, was to just jump in and start getting something up
    and running on my local machine.
    I learn best by doing
    and the Docker documentation is actually really good.
    So if you find yourself in a similar position,
    I recommend going through their get started docs,
    which I share a link here for that.
    Going through that guide will get you
    somewhat comfortable with a lot of the Docker commands that you’re going to need.
    The first thing to note is that a Docker image
    is just a tarball of a complete file system.
    When an image is actually unpacked on your system,
    it’s just thrown into its own directory,
    which becomes its root file system.
    The second is that the processes that are involved in running the containers
    are just regular Linux processes.
    There’s really nothing special about them.
    You could technically, you know,
    create and run containers without anything other than just
    calling the appropriate Linux commands.
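    For example, a small sketch of both of those points, assuming a Linux host and using alpine purely as an example image:

        docker pull alpine
        docker save alpine -o alpine.tar        # the image is literally a tar archive of a file system plus metadata
        tar -tf alpine.tar | head               # manifest, config, and the layer contents
        docker run -d --name sleeper alpine sleep 300
        docker top sleeper                      # the container’s process...
        ps aux | grep "sleep 300"               # ...shows up on the host as a regular Linux process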
    So namespaces are worth pointing out here.
    They’re a really important ingredient,
    because this is what is used to provide virtual separation between containers.
    This is how the processes inside one container
    don’t interfere with the processes inside another container.
    Here you can see some of the namespaces that were set up for a postgres container
    that I have running on my box.
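    If you want to see this on your own machine, a sketch like the following works (assuming a Linux host; “postgres” is just the example container name):

        PID=$(docker inspect --format '{{.State.Pid}}' postgres)
        ls -l /proc/$PID/ns                     # one symlink per namespace: mnt, uts, ipc, pid, net, ...
        lsns -p $PID                            # the same view via util-linux, if you have it installed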
    The cgroups functionality is integral to
    constraining how much a container can use of things
    like CPU, memory, network bandwidth, etc.
    I can set these constraints by including options on the Docker run command
    when I’m launching an image.
    You can see that in this particular one.
    I’ve constrained the memory usage limit on one of my containers.
    And this didn’t used to show the correct limit,
    so it looks like that’s been fixed in a later version of Docker here.
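    A quick sketch of setting and checking a memory limit (the image name and limit are arbitrary examples; the cgroup path shown assumes cgroup v1):

        docker run -d --name limited -m 512m alpine sleep 300
        docker inspect --format '{{.HostConfig.Memory}}' limited      # 536870912 bytes
        docker exec limited cat /sys/fs/cgroup/memory/memory.limit_in_bytes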
    So, I want to quickly gloss over some file system details,
    where containers and images are actually stored on your file system.
    First off, after you’ve installed Docker, running the command docker info
    will spit out a bunch of information about your installation,
    including your Docker root directory,
    which I have noted here.
    This is where most everything you’re going to care about regarding your Docker images
    and containers will be stored.
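    For example, you can pull out just that path directly (the /var/lib/docker location is typical on Linux, but yours may differ):

        docker info --format '{{.DockerRootDir}}'
        sudo ls /var/lib/docker                 # image/, containers/, overlay2/, volumes/, ...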
    Note that if you’re on a Mac,
    your containers are actually running in a tiny Linux VM.
    So you’re going to need to use screen or something to get in there
    and actually get to the Docker root directory to check it out.
    And if you’re not familiar with how to use the screen command,
    definitely Google that and get familiar with that first.
    It’ll mess up your text display pretty good if you don’t
    enter and exit screen the right way.
    A little frustrating if you haven’t done it before.
    This slide shows how you can get information about the images
    that you have stored on your system.
    First, I listed my available images using the Docker images command.
    I actually have several installed
    but I only list the first couple here to display.
    Using the Docker inspect command,
    I can inspect any image I like using its image ID.
    Excuse me. This will spit out a ton of information.
    But what I want to highlight here is the graph driver section,
    which contains the paths to the directories where all of the layers
    that belong to this image live.
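    As a sketch, the relevant commands look like this (the image ID is a placeholder; use one from your own docker images output):

        docker images
        docker inspect --format '{{json .GraphDriver}}' <image-id>    # LowerDir, UpperDir, MergedDir, WorkDir paths
        docker history <image-id>               # each row corresponds to an instruction that built a layer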
    So, Docker images are composed of layers,
    which represent instructions in the Dockerfile
    that was used to build the image originally.
    These layers actually translate into directories,
    and the layers can be shared across images in order to save space.
    The lowerdir, mergedir, and upperdir sections are important.
    The lowerdir contains all of the directories or layers
    that were used to build the original image.
    And these are all read only.
    The upperdir contains all of the content
    that has been modified while the container is running.
    It’s important to remember that this is ephemeral data,
    and only lives as long as the container lives.
    In fact, if you have data that you intend to keep,
    you should utilize the volume features of Docker
    and mount a location that will stick around after the container dies.
    This is how most containers running a database are run.
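    A minimal sketch of that pattern, using the official postgres image and an example volume name:

        docker volume create pgdata
        docker run -d --name db -e POSTGRES_PASSWORD=example \
          -v pgdata:/var/lib/postgresql/data postgres:12
        docker rm -f db                         # the container and its upperdir are gone...
        docker volume ls                        # ...but the volume, and the data in it, remain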
    The mergedir is like a virtual directory
    that combines everything from the lowerdir and the upperdir, with the workdir involved as well.
    The workdir is like an internal working directory.
    So this slide shows..
    I actually have a few containers running on my system.
    Two of them are my local JFrog Container Registry installation,
    and that includes a container for Artifactory
    and another container for a postgres database.
    The other is a simple test container that I was playing around with.
    I actually cleaned my system up quite a bit
    just so that I could get some clean screenshots here.
    I had a ton of images and containers running before this.
    Anyway, note that the container IDs of the running containers
    match up with the subdirectory names under the containers directory,
    and then something else to remember here:
    if you stop a container,
    the corresponding directory doesn’t go away
    until the container is actually removed
    with a Docker rm command.
    So if you have stopped containers lying around that never get cleaned up,
    you might see your available space start to dwindle.
    There’s a Docker prune command
    that you can use every now and then to help clean things up.
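    For example, a sketch of spotting and cleaning up that leftover state:

        docker ps -a                            # includes exited containers, whose directories are still on disk
        docker rm <container-id>                # remove a single stopped container
        docker system prune                     # or sweep up stopped containers, dangling images, unused networks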
    So the tool sets around building and running images and containers
    have made things so easy,
    that it’s also easy to shoot yourself in the foot in a few places.
    Here are 3 of the most common gotchas that I run into,
    and that I ran into almost immediately
    when I first started working with containers.
    The first is,
    running a containerized application as the root user.
    I’ll be honest here,
    when I was initially getting containers up and running.
    I was so excited about how well it was working
    that I didn’t take this very seriously.
    I heard it but I didn’t really, you know, it didn’t land.
    So now that you know that processes inside a running container
    are just like any other processes on the system,
    albeit with a few constraints,
    it should be scary now to run as root inside a container.
    Doing that opens up the possibility of
    a process escaping the intended confines of the container,
    and gaining access to the host resources.
    Exactly what we don’t want to happen.
    The best thing to do is to create a user,
    and use the USER command inside of the Dockerfile when the image is built,
    in order to run processes as that user.
    There is a way to specify a user when the Docker run command is used,
    but that leaves open the possibility of forgetting to do that.
    It’s kind of nice if the image is just set up by default,
    not to run as root.
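    As a sketch, a Dockerfile set up that way might look like this (written out with printf so the whole thing is runnable from a shell; the base image and user name are just examples):

        printf '%s\n' \
          'FROM alpine:3.12' \
          'RUN addgroup -S app && adduser -S app -G app' \
          'USER app' \
          'CMD ["sleep", "300"]' > Dockerfile
        docker build -t nonroot-demo .
        docker run --rm nonroot-demo id         # runs as "app", not root
        # the run-time alternative, which is easy to forget: docker run --user 1000:1000 <image>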
    Also pay attention to official images you pull from Docker Hub,
    whether or not they run as root,
    or if they leave that up to you to figure out.
    So even though Docker provides you with the ability
    to set resource limits on your container,
    it doesn’t automatically do it for you.
    In fact, the default settings are a free for all with no limits anywhere.
    So make sure you understand the resource needs of your application:
    too little, and your container will die from starvation;
    too much, and the container could smother others on the system.
    The resource usage of your containers is also something
    that you’re going to want to monitor over time and adjust as needed.
    It’s a good way to determine if something is going wrong,
    or if load on your system has changed for some reason.
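    A sketch of that monitoring and adjusting, reusing the example container name from earlier:

        docker stats --no-stream                # point-in-time CPU and memory usage per container
        docker update --memory 1g --memory-swap 1g limited    # adjust limits without recreating the container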
    So this is a pretty big security issue.
    It’s easy to get complacent,
    and not pay attention to what is actually getting pulled in when you build images.
    Not only do you need to beware of outdated versions
    that you specify in the Dockerfile,
    but you also need to pay attention to what’s in the base images you’re building from.
    Not updating packages and libraries inside your container
    can lead to some embarrassing results,
    especially, when there are tools available now
    to alert you when security issues have been discovered with specific artifacts.
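    For example, a sketch of two small habits that help here (the tags shown are examples):

        docker build --pull --no-cache -t my-app:1.2.3 .      # re-pull the base image and skip stale cached layers
        # and in the Dockerfile, prefer a pinned tag or digest over "latest":
        #   FROM alpine:3.12    rather than    FROM alpine:latest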
    So, speaking of tools that help manage images:
    JFrog Artifactory supports Docker registries and images.
    You can use it just like you do for other types of artifacts,
    both as a cache for third-party base images
    and as your own internal registry.
    After uploading your Docker images, you have the ability to gather statistics,
    and even drill down into each layer of an image for more information.
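    Pushing an image there is a sketch like pushing to any other Docker registry; the registry host and repository path below are hypothetical placeholders:

        docker login artifactory.example.com
        docker tag my-app:1.2.3 artifactory.example.com/docker-local/my-app:1.2.3
        docker push artifactory.example.com/docker-local/my-app:1.2.3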
    If you’ve integrated a CI/CD solution,
    you can also determine what build produced a particular image or what build used it.
    We discussed earlier the problems with not updating images regularly.
    JFrog Xray is a security scanning tool
    that will alert you if there are any known security vulnerabilities,
    or even licensing issues, with your artifacts.
    For Docker images it’s especially useful,
    because it has the ability to drill down
    into the layers of the image to find out exactly what library,
    or package has been flagged as a problem.
    You have control over how sensitive to make these alerts,
    and what actions to take when they’re triggered:
    whether to fail a build, prevent a download, or simply send a notification about the problem.
    All right.
    We’ve come to the end of our time here.
    So thank you for coming everyone.
    I hope you enjoyed this session,
    that you got something out of it to take back to your teams,
    and feel free to reach out to me with any questions you have.
    Enjoy the rest of swampUP online.