Unpacking the Container: A Deep Dive into Virtualized Container Technology [swampUP 2020]

July 20, 2020

2 min read

What are containers? Check out: https://jfrog.com/knowledge-base/what…

What is Docker? Check out: https://jfrog.com/knowledge-base/what…

Containers have become integral to every phase in the lifecycle of application development. Production grade orchestration tools such as Kubernetes have been built to manage them and container platforms like Docker are becoming commonplace in both testing and development. Web tutorials on how to build and manage simple Docker images abound! But what are containers exactly and why have they become so essential to the DevOps ecosystem? This talk is for those curious minds who want to look below the surface and really understand the mechanics of a technique that has actually been around longer than you may think. Where did Docker come from? What about other projects in the container ecosystem – are there alternatives? What does a Docker image actually look like on the filesystem? How do Docker image layers work? What are cgroups? How are system resources allocated and managed and are there any gotchas that you should be aware of? What about security? How can JFrog Container Registry help me manage my Docker images? After this talk, you will have a solid understanding of the what, how & why of virtualized container technology.

Speakers

Melissa McKay

    Melissa McKay

    Melissa is a long-time developer/software engineer turned international speaker and is currently a Developer Advocate on the JFrog Developer relations team, sharing in the mission to improve the developer experience with DevOps methodologies. Her background and experience as a software engineer span a slew of languages, technologies, and tools used in the development and operation of enterprise products and services. She is a mom, Java Champion, Docker Captain, co-author of the upcoming book DevOps Tools for Java Developers, a huge fan of UNconferences, and is always on the lookout for ways to grow and learn. She has spoken at Kubecon, DockerCon, CodeOne, JFokus, Java Dev Day Mexico, the Great International Developer Summit, and is part of the JCrete and JAlba UNconference teams. Given her passion for teaching, sharing, and inspiring fellow practitioners, you are likely to cross paths with her in the conference circuit — both online and off!

    Video Transcript

    Hi everyone, welcome to swampUP
    and welcome to this talk on unpacking the container.
    I want to get started quickly
    because I’m excited to share with you
    some of the things that I’ve learned about containers,
    which of course are still all the rage right now.
    My hope is that you will come away from this talk with a better understanding
    of how containers actually work on your system,
    and some of what’s really going on under the covers.
    A Q&A for this session will be going on at the same time in the chat.
    So take advantage of the next 30 minutes and get your questions out there.
    If I’m not able to get to them during the session,
    I’ll make a point to follow up with you afterward.
    So a little bit about me, my name is Melissa McKay.
    I just started with JFrog as a developer advocate in February of this year.
    I come from a developer background
    and I’ve been a developer in some way,
    shape, or form over the last 20 years.
    Most of my experience has been in server-side development and in Java,
    but I’ve had the privilege of working on many different teams
    over the years in a variety of different technologies,
    languages, and tool sets.
    I think most of you know;
    it’s not very easy these days to just stay in one language.
    So getting a polyglot experience has been a wild ride for me.
    I’ve been on large teams, small teams, in big companies, small companies.
    I’ve had my share of frustrations and successes
    and along the way I discovered a passion for diving in deep to understand things,
    sharing what I’ve learned,
    and hopefully improving processes along the way.
    I started speaking a few years ago
    and I decided that was something that I wanted to do more of.
    So, I threw my hat into the ring to become a dev advocate with JFrog.
    Clearly that all worked out and I’m here with you today at swampUP.
    So I feel privileged to be here with you
    and I’m super excited to talk with all of you today about virtual containers.
    So let’s go ahead and get started.
    I’ll start out with a brief history to give some background context.
    Hopefully, it won’t be too boring.
    But there are definitely some milestones
    that have happened in the past that we should go over,
    so that it better explains how we got to where we are today,
    then we’ll take a look at the container market,
    that’s interesting to see what’s been going on over the past few years,
    and it’ll be interesting to see what the next few years brings us.
    Then we’ll move into getting a real understanding of what Docker actually is;
    Docker is a very overloaded term
    that is used and thrown around quite a bit.
    So we’ll clear that up.
    After that, we’ll be in an excellent place to talk about
    what a container actually is,
    and then we’ll review a few common container gotchas:
    just a few simple things to avoid,
    stuff that I experienced right away when I started.
    Finally, I’ll leave you with an option for managing your images.
    So without further ado.
    Let’s jump in and start learning about containers.
    Now I know some of you are already wondering
    if you’re in the right place because that’s not the picture
    that you were expecting I’m sure.
    I know the classic shipping container photo is pretty much expected,
    but there’s actually a couple reasons I chose to show bananas here.
    First and foremost,
    I’m tired of seeing shipping containers on every presentation about Docker,
    or containerization in general.
    So I started a rebellion;
    you’re not going to see any shipping containers in this presentation.
    Second, this is really a story about how our industry has adapted
    to dealing with limited resources over time,
    and bananas remind me of a story
    that my grandfather would repeatedly tell me when I was growing up,
    and it went like this:
    things were a lot different for him as a kid than for me.
    I think every generation says that to the next.
    He continued on to share that when he was a kid,
    he would get a banana once a year on Christmas.
    This must have been during the 20s and 30s,
    bananas were such a treat at that time
    that none of that banana would go to waste.
    He and his siblings would take a fork
    and scrape the banana peel to get every last bit of banana off
    because there likely wasn’t going to be another one until next year.
    So maybe it isn’t the best analogy,
    but I liken that story to how computing resources were in the 1960s and 1970s.
    I know that’s reaching, but hey:
    very limited and very expensive,
    and on top of that it took forever to get stuff done.
    Often a computer would be dedicated for a long period of time to a single task,
    for a single user.
    Obviously the limits on time and resources created bottlenecks, and inefficiency.
    Just being able to share was not enough either,
    there needed to be a way to share without getting in each other’s way,
    or having one person inadvertently causing the entire system to crash for everyone.
    So, once again necessity is the mother of invention
    and the need for better strategies
    for sharing compute resources actually started a path of innovation
    that we see massive benefits from today.
    There are some key points in time
    that brought us to this state we are in today.
    I’m going to begin this lesson with Chroot.
    So chroot was born in 1979,
    during the development of the 7th Edition of Unix,
    and was added to BSD in 1982.
    Being able to change the apparent root directory
    for a process and its children
    results in a bit of isolation, in order to provide an environment
    for, say, testing a different distribution.
    So chroot was a great idea and solved some specific problems,
    but more than that was needed.
    So in 2000, the jail command was introduced by FreeBSD.
    Jail is a little more sophisticated
    than chroot in that it includes additional features
    to help with further isolation of file systems,
    users, and networks, with the ability to assign an IP address to each jail.
    In 2004, Solaris Zones brought us ahead even further by giving an application
    a full user, process, and file system space, and access to system hardware.
    Solaris Zones also introduced the idea of being able to snapshot a file system,
    which you’ll see is pretty important.
    In 2006 Google jumped in with their process containers.
    Those were later renamed to cgroups;
    these centered around isolating and limiting the resource usage of a process.
    This is huge.
    Moving right along: in 2008,
    cgroups were merged into the Linux kernel, which along with Linux namespaces
    led to IBM’s development of Linux Containers (LXC).
    And 2013 was a big year.
    Docker came on the scene bringing their ability
    to package containers and move them from one environment, to another.
    The same year, Google open sourced their “Let Me Contain
    That For You” (lmctfy) project, which provided applications the ability to create
    and manage their own sub-containers.
    From here, we saw the use of containers and Docker specifically,
    absolutely explode.
    In 2014 Docker chose to swap out their use of the LXC toolset
    for launching containers with libcontainer, in order to utilize a native Golang solution.
    And that was something that I didn’t know when I first started is that,
    Docker’s actually written in Go.
    Almost done with this history lesson,
    because from here on out you would just start seeing a ton of names of projects,
    organizations, specs, etc.
    that are just confusing if you don’t have a better understanding of how containers work,
    which is the point of this session.
    And this last event, however,
    from June 2015, is important enough to bring up;
    it’s included because it will give you some insight
    into some of the activity and motivations behind shifts in the market.
    The Open Container Initiative (originally the Open Container Project) was established.
    This is an organization under the Linux Foundation,
    and it includes members from many major stakeholders,
    including Docker, which was really important,
    with the goal of creating open standards for the container runtime
    and image specification.
    So that’s it for the history lesson.
    Let’s take a look at what’s been going on in the market recently
    concerning container runtimes.
    So I did a little hunting,
    and I found that for the last three years,
    Sysdig, a company that provides a really powerful monitoring
    and troubleshooting tool for Linux,
    has put out a container report based on the analysis of their own users.
    Part of the report includes data on container runtimes that are in use.
    In 2017, they analyzed data from 45,000 containers.
    There’s no graph available for that because 99% of those were Docker
    so they didn’t feel the need to split up the results.
    In 2018, however,
    they doubled their sample size to 90,000 containers,
    and as you can see, 83% is Docker,
    12% is CoreOS rkt, 4% Mesos containerizer,
    and 1% LXC.
    It looks like other container runtimes
    maybe are encroaching a little bit on Docker.
    So moving on to 2019.
    This is the latest Sysdig container report
    and this included stats from over 2 million containers.
    They did state they included data from both their SaaS and on-prem users.
    I’m not really sure about the last two years,
    whether it was just on-prem, or why they felt the need to put that information in there.
    But 2 million is a huge number, so here we go.
    Docker is still holding relatively strong at 79%,
    and 18% is containerd.
    But it’s worth noting that containerd is a runtime that Docker actually builds on top of;
    I’ll tell you more about that one later.
    And the last 4% is CRI-O.
    So I don’t know that there’s enough data here to determine
    whether Docker is going to stay on top in the future,
    or whether something completely different will prevail;
    that remains to be seen.
    But it’s interesting especially because of what’s been happening over the last few years,
    which we’ll get to later.
    But now that I’ve introduced a few of these other container runtimes
    that exist out there besides Docker;
    it’s time to start talking about what a container actually is,
    and what Docker actually provides
    in order to appreciate the differences between them.
    So what exactly is Docker anyway?
    This is key.
    What Docker had over the other players in the container game
    was this steadfast focus on commoditizing a complete,
    end-to-end solution that made it easy for developers to package
    and deploy their applications.
    Once containers became easy to use we all witnessed the explosion of tools
    and resources around containers,
    and the Docker image format rose to become a de facto standard in the market.
    The stats I showed you from Sysdig are specific to container runtimes,
    and that terminology is important to remember.
    I’ll explain the pieces and parts involved in working with containers,
    and you’ll immediately understand why Docker sucked up the market so fast.
    So as users,
    let’s think about what we actually need to get our apps out there and running.
    Every Innovation that is coming out of this space
    is purely based on what users need or want.
    Whatever the motivation is behind it.
    If a user needs or wants something bad enough,
    there’s a huge opportunity for solution providers.
    That seems like such a common-sense thing to say,
    and maybe not even worth saying,
    but so often we can find ourselves getting so far down into the nitty-gritty details
    that we lose sight of the actual problem we’re trying to solve.
    And that of course leads to a ton of missed and overlooked opportunities.
    So, here’s a list of needs broken up into discrete features.
    First and foremost, we need that container itself,
    and some of you might be asking right now about virtual machines.
    Those are already out there,
    but discussing them in depth is out of the scope of this session.
    So I’m not going to go deep into the differences between VMs and containers,
    or why you would use one over the other.
    The one thing I’ll say is virtual machine is not synonymous with container;
    the biggest difference being that a VM includes an entire OS all to itself,
    and containers share the OS of the system that they’re running on.
    The point of the container is to be lightweight
    and have the ability to move from one environment to another,
    seamlessly and quickly.
    That said, I know that there are developments happening in the VM space,
    but that’s a topic for another time.
    So, for the rest of this list: we need an image format to define a container.
    We need a way to build an image of a container,
    a way to manage images,
    a way to distribute and share container images,
    a way to create, launch, and run a container environment,
    and a way to manage the lifecycle of the running containers.
    I didn’t even get to orchestration
    or anything but this is plenty to prove my point.
    So Docker was ready with an answer for everything.
    You want to start using containers, use Docker engine.
    Oh you need an image format, here’s the Docker image format.
    You need a way to build an image, use a Dockerfile and call Docker build.
    You want to manage images, call Docker images or Docker rmi.
    You want to share your images or use an image from someone else,
    call Docker push or Docker pull.
    Oh and by the way, we have Docker Hub where you can store, and share your images.
    You need a way to launch,
    run, and manage your containers and their lifecycle?
    You can call Docker run, Docker stop, or Docker ps.
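    For example, here is a minimal sketch of that end-to-end workflow (the image and container names are just examples):

        docker build -t my-app .                # build an image from the Dockerfile in the current directory
        docker images                           # list local images
        docker run -d --name my-app-1 my-app    # launch a container from the image
        docker ps                               # see it running
        docker stop my-app-1                    # stop it
        docker push my-app                      # share it (assumes the tag points at a registry you can push to)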
    Docker succeeded in quickly meeting the immediate needs
    of a hungry container market, and on top of that, the tool sets
    that Docker provided made it all so easy.
    It was enough to walk away with a tremendous part of the market share.
    And by the way,
    it was really difficult finding a relevant banana picture for this slide.
    So I hope you appreciate this one.
    So, remember in our history lesson when I spoke about the Open Container Initiative?
    Out of all of those features that we just talked about,
    there are two that were taken up for the cause by the OCI.
    The image format and the container runtime.
    Docker did quite a bit of reorganizing of their code base,
    developing abstractions, and pulling out discrete pieces of functionality.
    They are a heavy contributor to the OCI,
    giving the Docker V2 image spec as a basis for the OCI image spec, and runC,
    which was contributed as a reference implementation of the OCI runtime spec.
    There are quite a few other container runtimes out there making waves,
    including containerd, rkt, CRI-O, and Kata,
    all with various levels of features for specific use cases.
    It’s worth pointing out that
    containerd was actually contributed by Docker to the Cloud Native Computing Foundation (CNCF)
    and internally uses runC. containerd has also been integrated into Docker,
    and it’s been in use there since version 1.11, so quite a while now.
    So the next few years will be interesting
    to observe what happens with these specs and how the OCI moves forward.
    It doesn’t seem that they’re done yet.
    And there is quite a range of differing opinions
    about what should, and should not be in the standard for a container runtime.
    Lots of discussions going on about that.
    I’ve added a couple links here that are excellent
    starting places to learn more about container runtimes, if you’re curious.
    The second one, is the beginning of a blog series by Ian Lewis.
    He is a Google dev advocate.
    The first subtitle in that blog is literally,
    “Why are container runtimes so confusing?”
    Which I shamelessly nodded along with: yes, yes.
    Why is that?
    Anyway, he does a really good job of explaining some of the issues there.
    So,
    now that we understand all that Docker entails,
    and some of what’s going on in the market,
    let’s focus on just the container itself
    and what that actually looks like on your system.
    I’ll show you how it’s stored and what is actually happening under the covers.
    You’ll discover pretty quickly
    that images and containers aren’t really all that magical.
    So, my first experience
    with containers was as a new developer on a project with a tight deadline.
    Of course, I could argue that’s a good description of most projects.
    The best course of action for me, was to just jump in and start getting something up
    and running on my local machine.
    I learn best by doing
    and the Docker documentation is actually really good.
    So if you find yourself in a similar position,
    I recommend going through their get started docs,
    which I share a link here for that.
    Going through that guide will get you
    somewhat comfortable with a lot of the Docker commands that you’re going to need.
    The first thing to note is that a Docker image
    is just a tarball of a complete file system.
    When an image is actually unpacked on your system,
    it’s just thrown into its own directory,
    which becomes its root file system.
    The second is that the processes that are involved in running the containers
    are just regular Linux processes.
    There’s really nothing special about them.
    You could technically, you know,
    create and run containers without anything other than just
    calling the appropriate Linux commands.
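    For example, a small sketch of both of those points, assuming a Linux host and using alpine purely as an example image:

        docker pull alpine
        docker save alpine -o alpine.tar        # the image is literally a tar archive of a file system plus metadata
        tar -tf alpine.tar | head               # manifest, config, and the layer contents
        docker run -d --name sleeper alpine sleep 300
        docker top sleeper                      # the container’s process...
        ps aux | grep "sleep 300"               # ...shows up on the host as a regular Linux process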
    So namespaces are worth pointing out here.
    They’re a really important ingredient,
    because this is what is used to provide virtual separation between containers.
    This is how the processes inside one container
    don’t interfere with the processes inside another container.
    Here you can see some of the namespaces that were set up for a postgres container
    that I have running on my box.
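    If you want to see this on your own machine, a sketch like the following works (assuming a Linux host; “postgres” is just the example container name):

        PID=$(docker inspect --format '{{.State.Pid}}' postgres)
        ls -l /proc/$PID/ns                     # one symlink per namespace: mnt, uts, ipc, pid, net, ...
        lsns -p $PID                            # the same view via util-linux, if you have it installed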
    The cgroups functionality is integral to
    constraining how much a container can use of things
    like CPU, memory, network bandwidth, etc.
    I can set these constraints by including options on the Docker run command
    when I’m launching an image.
    You can see that in this particular one.
    I’ve constrained the memory usage limit on one of my containers.
    And this didn’t used to show the correct limit,
    so it looks like that’s been fixed in a later version of Docker here.
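    A quick sketch of setting and checking a memory limit (the image name and limit are arbitrary examples; the cgroup path shown assumes cgroup v1):

        docker run -d --name limited -m 512m alpine sleep 300
        docker inspect --format '{{.HostConfig.Memory}}' limited      # 536870912 bytes
        docker exec limited cat /sys/fs/cgroup/memory/memory.limit_in_bytes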
    So, I want to quickly gloss over some file system details,
    where containers and images are actually stored on your file system.
    First off, after you’ve installed Docker, running the command docker info
    will spit out a bunch of information about your installation,
    including your Docker root directory,
    which I have noted here.
    This is where most everything you’re going to care about regarding your Docker images
    and containers will be stored.
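    For example, you can pull out just that path directly (the /var/lib/docker location is typical on Linux, but yours may differ):

        docker info --format '{{.DockerRootDir}}'
        sudo ls /var/lib/docker                 # image/, containers/, overlay2/, volumes/, ...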
    Note that if you’re on a Mac,
    your containers are actually running in a tiny Linux VM.
    So you’re going to need to use screen or something to get in there
    and actually get to the Docker root directory to check it out.
    And if you’re not familiar with how to use the screen command,
    definitely Google that and get familiar with that first.
    It’ll mess up your text display pretty good if you don’t
    enter and exit screen the right way.
    A little frustrating if you haven’t done it before.
    This slide shows how you can get information about the images
    that you have stored on your system.
    First, I listed my available images using the Docker images command.
    I actually have several installed
    but I only list the first couple here to display.
    Using the Docker inspect command,
    I can inspect any image I like using its image ID.
    Excuse me. This will spit out a ton of information.
    But what I want to highlight here is the graph driver section,
    which contains the paths to the directories where all of the layers
    that belong to this image live.
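    As a sketch, the relevant commands look like this (the image ID is a placeholder; use one from your own docker images output):

        docker images
        docker inspect --format '{{json .GraphDriver}}' <image-id>    # LowerDir, UpperDir, MergedDir, WorkDir paths
        docker history <image-id>               # each row corresponds to an instruction that built a layer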
    So, Docker images are composed of layers,
    which represent instructions in the Dockerfile
    that was used to build the image originally.
    These layers actually translate into directories,
    and the layers can be shared across images in order to save space.
    The lowerdir, mergedir, and upperdir sections are important.
    The lowerdir contains all of the directories or layers
    that were used to build the original image.
    And these are all read only.
    The upperdir contains all of the content
    that has been modified while the container is running.
    It’s important to remember that this is ephemeral data,
    and only lives as long as the container lives.
    In fact, if you have data that you intend to keep,
    you should utilize the volume features of Docker
    and mount a location that will stick around after the container dies.
    This is how most containers running a database are run.
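    A minimal sketch of that pattern, using the official postgres image and an example volume name:

        docker volume create pgdata
        docker run -d --name db -e POSTGRES_PASSWORD=example \
          -v pgdata:/var/lib/postgresql/data postgres:12
        docker rm -f db                         # the container and its upperdir are gone...
        docker volume ls                        # ...but the volume, and the data in it, remain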
    The mergedir is like a virtual directory
    that combines everything from the lowerdir and the upperdir, with the workdir involved as well.
    The workdir is like an internal working directory.
    So this slide shows..
    I actually have a few containers running on my system.
    Two of them are my local JFrog Container Registry installation,
    and that includes a container for Artifactory
    and another container for a postgres database.
    The other is a simple test container that I was playing around with.
    I actually cleaned my system up quite a bit
    just so that I could get some clean screenshots here.
    I had a ton of images and containers running before this.
    Anyway, note that the container IDs of the running containers
    match up with the subdirectory names under the containers directory,
    and then something else to remember here:
    if you stop a container,
    the corresponding directory doesn’t go away
    until the container is actually removed
    with a Docker rm command.
    So if you have stopped containers lying around that never get cleaned up,
    you might see your available space start to dwindle.
    There’s a Docker prune command
    that you can use every now and then to help clean things up.
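    For example, a sketch of spotting and cleaning up that leftover state:

        docker ps -a                            # includes exited containers, whose directories are still on disk
        docker rm <container-id>                # remove a single stopped container
        docker system prune                     # or sweep up stopped containers, dangling images, unused networks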
    So the tool sets around building and running images and containers
    have made things so easy,
    that it’s also easy to shoot yourself in the foot in a few places.
    Here are 3 of the most common gotchas that I run into,
    and that I ran into almost immediately
    when I first started working with containers.
    The first is,
    running a containerized application as the root user.
    I’ll be honest here,
    when I was initially getting containers up and running.
    I was so excited about how well it was working
    that I didn’t take this very seriously.
    I heard it but I didn’t really, you know, it didn’t land.
    So now that you know that processes inside a running container
    are just like any other processes on the system,
    albeit with a few constraints,
    it should be scary now to run as root inside a container.
    Doing that opens up the possibility of
    a process escaping the intended confines of the container,
    and gaining access to the host resources.
    Exactly what we don’t want to happen.
    The best thing to do is to create a user,
    and use the USER command inside of the Dockerfile when the image is built,
    in order to run processes as that user.
    There is a way to specify a user when the Docker run command is used,
    but that leaves open the possibility of forgetting to do that.
    It’s kind of nice if the image is just set up by default,
    not to run as root.
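    As a sketch, a Dockerfile set up that way might look like this (written out with printf so the whole thing is runnable from a shell; the base image and user name are just examples):

        printf '%s\n' \
          'FROM alpine:3.12' \
          'RUN addgroup -S app && adduser -S app -G app' \
          'USER app' \
          'CMD ["sleep", "300"]' > Dockerfile
        docker build -t nonroot-demo .
        docker run --rm nonroot-demo id         # runs as "app", not root
        # the run-time alternative, which is easy to forget: docker run --user 1000:1000 <image>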
    Also pay attention to official images you pull from Docker Hub,
    whether or not they run as root,
    or if they leave that up to you to figure out.
    So even though Docker provides you with the ability
    to set resource limits on your container,
    it doesn’t automatically do it for you.
    In fact, the default settings are a free for all with no limits anywhere.
    So make sure you understand the resource needs of your application:
    too little, and your container will die from starvation;
    too much, and the container could smother others on the system.
    The resource usage of your containers is also something
    that you’re going to want to monitor over time and adjust as needed.
    It’s a good way to determine if something is going wrong,
    or if load on your system has changed for some reason.
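    A sketch of that monitoring and adjusting, reusing the example container name from earlier:

        docker stats --no-stream                # point-in-time CPU and memory usage per container
        docker update --memory 1g --memory-swap 1g limited    # adjust limits without recreating the container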
    So this is a pretty big security issue.
    It’s easy to get complacent,
    and not pay attention to what is actually getting pulled in when you build images.
    Not only do you need to beware of outdated versions
    that you specify in the Dockerfile,
    but you also need to pay attention to what’s in the base images you’re building from.
    Not updating packages and libraries inside your container
    can lead to some embarrassing results,
    especially, when there are tools available now
    to alert you when security issues have been discovered with specific artifacts.
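    For example, a sketch of two small habits that help here (the tags shown are examples):

        docker build --pull --no-cache -t my-app:1.2.3 .      # re-pull the base image and skip stale cached layers
        # and in the Dockerfile, prefer a pinned tag or digest over "latest":
        #   FROM alpine:3.12    rather than    FROM alpine:latest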
    So, speaking of tools that help manage images:
    JFrog Artifactory supports Docker registries and images.
    You can use it just like you do for other types of artifacts,
    both as a cache for third-party base images
    and as your own internal registry.
    After uploading your Docker images, you have the ability to gather statistics,
    and even drill down into each layer of an image for more information.
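    Pushing an image there is a sketch like pushing to any other Docker registry; the registry host and repository path below are hypothetical placeholders:

        docker login artifactory.example.com
        docker tag my-app:1.2.3 artifactory.example.com/docker-local/my-app:1.2.3
        docker push artifactory.example.com/docker-local/my-app:1.2.3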
    If you’ve integrated a CI/CD solution,
    you can also determine what build produced a particular image or what build used it.
    We discussed earlier the problems with not updating images regularly.
    JFrog Xray is a security scanning tool
    that will alert you if there are any known security vulnerabilities,
    or even licensing issues, with your artifacts.
    For Docker images it’s especially useful,
    because it has the ability to drill down
    into the layers of the image to find out exactly what library,
    or package has been flagged as a problem.
    You have control over how sensitive to make these alerts,
    and what actions to take when they’re triggered:
    whether to fail a build, prevent a download, or simply send a notification about the problem.
    All right.
    We’ve come to the end of our time here.
    So thank you for coming everyone.
    I hope you enjoyed this session,
    that you got something out of it to take back to your teams,
    and feel free to reach out to me with any questions you have.
    Enjoy the rest of swampUP online.