OK everyone, it's time for the fourth and final installment of the DevOps Detour series. We're already several months out from when these events actually took place, but nonetheless, I want to close out this series.
Today, the matter concerning my existence is how one goes about making their Docker images more secure — and the fallout from trying to do that.
The third part of the DevOps Detour was all concerned with upgrading ingress-nginx, and how I thought the cert-manager upgrade would take forever when it ended up being the easy one of the two.
We're staying on the theme of "upgrading" today, but just shifting it slightly from upgrading services to upgrading how we handle images themselves. The topics of today are distroless images, multi-stage builds (and how caching them sucks), and Docker BuildKit (and how it makes caching suck).
For the longest time, all of the services for uFincs used node:14-alpine images. Alpine images have long been my default image type of choice (as they have for a lot of people) mainly due to their smaller sizes.
However, that all changed when the Distroless attacked.
I don't remember where I first learned about distroless images (probably just one of the GCP security docs), but these things are damn cool. Whereas Alpine images run... well, Alpine Linux, and are generally just smaller, Distroless images run Debian but with basically nothing included. Like, there's no shell tools like ls. Heck, there's not even a shell, period.
The only things the image includes are the runtime dependencies necessary for the app. In my case, using the Node images, that means node itself. Notably, it doesn't include npm ("lol wut", you might be thinking, but we'll get back to this in a bit).
And why might one want to use these severely limited images? As you might have guessed, it's all in the name of security.
By drastically reducing the attack surface area, attackers have much less room to work with. Think about it: if an attacker were to directly compromise one of our services, what on Earth are they going to do if they don't even have a shell?? They'd have better luck trying to social engineer their way into my GCP account. And good luck with that!
Having our services this locked down is important because our greatest security risk is if the service serving the uFincs app files (aka our Frontend service) becomes hacked and starts serving files that compromise the cryptographic integrity of a user's data. Combined with using non-root users and a read-only file system (among other things), it would take a major security vulnerability to directly compromise our Frontend (or any of our services).
"This is all well and good (you can never have enough security, right?), but what about npm? How do you install anything without it?"
A very good question. The answer? Multi-stage builds.
Multi-Stage Docker Builds
At this point in Docker's life, multi-stage builds are nothing new or innovative. However, I hadn't yet adopted them, so they were pretty "new" to me (not because I didn't know what they were — just that I hadn't found a use for them yet).
And why hadn't I adopted them yet? Well, in most of the documentation/articles I'd read around multi-stage builds, it was usually in the context of compiled languages. It was examples like Go where you could create a compiled binary file in the first Docker stage and then just copy it over and execute it in the second stage.
With Node, though, you'd still need the node executable to actually run anything. And since you'd need node, you'd need all the tooling that is required to install node in the first place (e.g. a shell), which just seemed like too much work for too little benefit.
It only 'clicked' once I read through some of the Distroless examples. By using a multi-stage build, you could use a regular Node image in the first stage (in my case, still node:14-alpine) to install node_modules and do any compilation/linting/testing steps (in my case, 'compilation' because of TypeScript), and then copy over the code and node_modules into a Distroless image in the second stage, where the app server would then be directly executed by node. Since we're copying over the installed node_modules as well, that solves the problem of not having npm in the Distroless image!
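To make that a bit more concrete, here's a rough sketch of what such a two-stage Dockerfile could look like, wrapped in a small shell function that writes it out. This is not the actual uFincs Dockerfile: the function name, the gcr.io/distroless/nodejs:14 tag, the /app path, and the "build" npm script are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: image tags, paths, and the "build" npm script are assumptions.

# Writes a two-stage Dockerfile to the path given as $1.
write_multistage_dockerfile() {
    cat > "$1" <<'EOF'
# Stage 1: a regular Node image that still has npm and a shell.
FROM node:14-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Hypothetical npm script that compiles the TypeScript sources.
RUN npm run build

# Stage 2: Distroless image; its entrypoint is already the node binary.
FROM gcr.io/distroless/nodejs:14
WORKDIR /app
# Bring over the code and the installed node_modules from the first stage.
COPY --from=build /app /app
CMD ["index.js"]
EOF
}
```

Something like `write_multistage_dockerfile Dockerfile` followed by a regular `docker build` would then produce the Distroless image. One caveat with this kind of setup: anything in node_modules with native addons gets compiled against Alpine's musl libc in the first stage, and it might not load on the Debian-based second stage.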
Well, mostly. The only slightly annoying thing with not having npm in the final Distroless image is that this also meant not being able to run our npm scripts. Whereas the app server is just an index.js file that can be directly executed by node, our npm scripts were... slightly more complicated, since they called out to installed dependencies in node_modules.
This was really only noteworthy because our database migration script was an npm script, and it had to be run using the production (i.e. Distroless) image because it was run during our CI/CD pipeline.
So what'd I have to do? Well, I couldn't write a shell script to replace it since... there was no shell. So I ended up having to write a node script that would call out to the other node_modules executables. It was... not pretty. But it worked!
Another side effect of using multi-stage builds (and distroless images and not having npm...) was that we now had different images for running in development and running in production. We kept a node:14-alpine image for development since we still needed all of our dev tooling available for... development. This is obviously less than ideal as far as a "prod/dev" mirroring strategy goes, but since we have the wonder that is per-branch deployments (courtesy of Kubails), we at least still have a 'staging' environment that damn near mirrors the production environment for testing.
And with that, everything was great! We now had much more secure production images so that no one would ever hack us. There was absolutely nothing wrong with this scheme. Nope, none at all...
Caching Multi-Stage Builds
In hindsight, this is plainly obvious, but in the moment, it was a real pain in the ass.
See, with multi-stage Docker builds, each stage is itself an image. But if you want to use regular image caching (i.e. specify an already-built image to use as the basis for building a new image), then you have to store images from every stage. Unfortunately, you can't just store the final (distroless) image and expect that it somehow has the layers from the first stage and will speed up its build. And since the first stage is the one that takes 99% of the build time, it's doubly unfortunate.
As such, I had to modify our build pipeline to push and pull the first-stage images as well, so that the slow first stage would actually get the cache speedup. Otherwise, each build would end up taking 30+ minutes.
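For the curious, the push/pull dance could be sketched roughly like this. It assumes the classic (pre-BuildKit) builder and a first stage named "build" in the Dockerfile; the function name and the image names are placeholders, not what our pipeline actually uses.

```shell
#!/usr/bin/env bash
# Sketch only: the image names and the "build" stage name are placeholders.

# Usage: build_with_stage_cache <build-stage-image> <final-image> [context]
build_with_stage_cache() {
    local build_image="$1" final_image="$2" context="${3:-.}"

    # Pull earlier images so their layers are available locally as cache;
    # on the very first build they do not exist yet, so failures are ignored.
    docker pull "$build_image" || true
    docker pull "$final_image" || true

    # Build and tag only the first stage so that it can be stored on its own.
    docker build --target build \
        --cache-from "$build_image" \
        -t "$build_image" "$context" || return 1

    # Build the full image, offering both images as cache sources.
    docker build \
        --cache-from "$build_image" \
        --cache-from "$final_image" \
        -t "$final_image" "$context" || return 1

    # Push both, so the next pipeline run can pull them again.
    docker push "$build_image" && docker push "$final_image"
}
```

A pipeline step would then call something like `build_with_stage_cache example/app-build:latest example/app:latest`. The key bit is the `--target build` invocation: without tagging and pushing that intermediate stage separately, there's nothing for the next build to pull its expensive layers from.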
However, this still wasn't the end of it. As part of this overall "upgrading" process, I also ended up upgrading parts of Kubails — one of which was Docker itself. And it seemed like Docker decided to introduce a new "build engine" that would throw me for a loop.
Let me introduce to you: Docker BuildKit.
As Docker puts it, BuildKit is a new "builder backend". Among other things, they say they introduced it because it improves image build times through the use of parallelism.
Quite frankly, there's a whole bunch of other changes that BuildKit introduces to the Docker ecosystem, but I want to cut straight through all of that. What's most important for me, here and now, is the fact that enabling BuildKit is done by adding the DOCKER_BUILDKIT=1 environment variable to any Docker commands (e.g. DOCKER_BUILDKIT=1 docker build ...) and that images built this way can't be used as cache images by default.
I don't know who decided it or why (it was frustrating enough to figure out how this worked the first time that I didn't want to look into it any further), but if you want an image to be usable as a cache image in a future build (i.e. as a --cache-from argument), you need to add --build-arg=BUILDKIT_INLINE_CACHE=1 when building the image. That is, a minimal build command now looks like this:
DOCKER_BUILDKIT=1 docker build --build-arg=BUILDKIT_INLINE_CACHE=1 .
As I understand it, you need to explicitly include the 'cache information' as part of the image, otherwise using an image built with BuildKit (but without the build arg) will do literally nothing when specified under --cache-from.
Something else worth noting is that, normally, you can just specify the --cache-from image and Docker will handle pulling the image for you. However (and I don't know if this was just a bug at the time or something), it would not work for me. Running in GCP Cloud Build with GCP Container Registry (notably, not Artifact Registry), I had to explicitly pull each --cache-from image before it could be used as a cache image; the build would just error out otherwise (I don't remember what the error was, sorry). So that was also a royal pain to debug.
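Putting the BuildKit pieces together (inline cache build arg, --cache-from, and the explicit pull), a build step might look something like the sketch below. The function name and image name are made up for illustration, and whether the explicit pull is still needed probably depends on your Docker version and registry.

```shell
#!/usr/bin/env bash
# Sketch only: the image name is a placeholder.

# Usage: buildkit_cached_build <image> [context]
buildkit_cached_build() {
    local image="$1" context="${2:-.}"

    # Pull the cache image explicitly instead of relying on --cache-from to
    # fetch it; ignore the failure on a first build when nothing exists yet.
    docker pull "$image" || true

    # BUILDKIT_INLINE_CACHE=1 embeds the cache metadata in the image, which is
    # what lets a later build use it via --cache-from.
    DOCKER_BUILDKIT=1 docker build \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --cache-from "$image" \
        -t "$image" "$context" || return 1

    # Push the result so the next build can pull it as its cache image.
    docker push "$image"
}
```

With multi-stage builds, the same idea would apply once per stage image: each one gets pulled, built with the inline cache build arg, and pushed again.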
Hopefully, I just saved you many hours of Googling and rebuilding images.
Questions you might be having:
"Why don't you just not enable BuildKit? Won't images work as cache images like before you upgraded Docker?"
I thought so, and you would think so, but they didn't seem to. I don't know if it was just because I happened to upgrade Docker at the same time that I started using multi-stage builds, and that I was just doing something wrong, but I couldn't get the old image caching behaviour to work.
Anywho, once I had BuildKit somewhat figured out, the multi-stage builds were finally being cached properly, which meant we could finally enjoy all the wonderful security benefits of Distroless images.
Honestly, if I had known how much tinkering it would take to get the whole multi-stage build process working, I probably would have just settled for non-root users and read-only filesystems. I feel like that would have gotten us like 80% of the security benefits without having to completely retool a large portion of our build pipeline. But you know what they say: defense in depth!
And with that, the DevOps Detour series has finally concluded. We've improved our backup system, improved the security of our Kubernetes cluster and services, and got all fancy with our Docker images. These improvements give just that little bit of peace of mind that maybe things won't horribly blow up in the middle of the night. Maybe.
Till the next (non-DevOps) time.