Solving supply chain attacks once and for all

Ronald Chen, January 17th 2022

Software dependency supply chain issues are becoming more frequent. Some notable examples are:

There are various upstream root causes at play here, but all of these issues can be mitigated with a single idea. That idea can be applied today to any project one controls, and it does not require any changes to upstream dependencies.

Key idea: Mitigate all supply chain issues with a reproducible build

Why reproducible builds matter

A reproducible build is one where, given a source repository revision, we always produce the same binary-identical build. This must hold across different computers, across platforms and even across time.

This allows us to track down bugs and avoids the dreaded “works on my computer”. With a reproducible build any bug report should be replicable on any other computer given the same replication steps and data.

A reproducible build prevents supply chain issues because it becomes impossible for upstream dependency changes to affect the build.

How to make a build reproducible

All build instability is caused by inexact specification of dependencies. What does this actually look like? Let’s take a look at an example: building a website. Modern websites have a build process, mainly to bundle JavaScript files together. Even the simplest React app built with Webpack has two direct dependencies: the source code together with its npm dependencies, and the execution environment.

Reproducible npm dependencies

For a bare minimum React app built with Webpack, we would have the following npm dependencies:

It’s kinda sad I even need to write this section, but there is nuance in something as simple as specifying a version in npm. The problem is that version ranges based on semver (semantic versioning) are inexact.

The original intent of semver was to make dependency upgrades easier by using the version number to define what is compatible. By default, when a package is installed with npm, the version recorded is something like ^2.2.1, which means “any version of the form 2.x.y that is greater than or equal to 2.2.1”. On the next npm install, ^2.2.1 allows a newer release such as 2.9.0 to be downloaded. By definition this makes the build irreproducible, as a new release of any dependency specified with ^ will likely result in a different binary output.
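To make the caret rule concrete, here is a minimal shell sketch of the check that ^2.2.1 implies. It assumes plain MAJOR.MINOR.PATCH versions with a major version of 1 or higher and no pre-release tags; npm's real resolver handles more cases (it treats ^0.x.y more narrowly, for instance). The name satisfies_caret is made up for illustration and is not part of npm.

```shell
# Sketch: does CANDIDATE satisfy the caret range ^BASE?
# Assumption: both are plain MAJOR.MINOR.PATCH with MAJOR >= 1.
satisfies_caret() {
  base=$1
  candidate=$2

  # Split "2.2.1" into its three numeric fields.
  base_major=${base%%.*};      base_rest=${base#*.}
  base_minor=${base_rest%%.*}; base_patch=${base_rest#*.}
  cand_major=${candidate%%.*}; cand_rest=${candidate#*.}
  cand_minor=${cand_rest%%.*}; cand_patch=${cand_rest#*.}

  # A caret range never crosses a major version.
  [ "$cand_major" -eq "$base_major" ] || return 1

  # Within the same major version, the candidate must be >= the base.
  if [ "$cand_minor" -ne "$base_minor" ]; then
    [ "$cand_minor" -gt "$base_minor" ]
  else
    [ "$cand_patch" -ge "$base_patch" ]
  fi
}

for v in 2.2.0 2.2.1 2.9.0 3.0.0; do
  if satisfies_caret 2.2.1 "$v"; then
    echo "$v satisfies ^2.2.1"
  else
    echo "$v does not satisfy ^2.2.1"
  fi
done
```

Every version that passes this check is a version npm may pick, which is exactly why a caret range cannot describe one fixed build.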

Unfortunately, even removing the ^ does not make the build reproducible. The silly thing with npm is that even if one pins a direct dependency to an exact version, its transitive dependencies are still unstable if they are specified using ^. For example, here are the dependencies Webpack defines, which are transitive dependencies of our app. Every time one runs npm install, one could be installing a different version of a transitive dependency! Who knows if this will subtly change the output or not.

This is the first problem: we need to ensure npm install is stable. We can do this in one of two ways: either run npm install only once and copy node_modules around (some projects check in node_modules, but this causes other problems with downloaded binaries for different architectures), or use package-lock.json.

package-lock.json works like this: when a new dependency is installed, it records exactly what was downloaded. When npm install is run, it first checks whether package-lock.json exists. If it does, npm downloads what is recorded there rather than whatever is latest. This ensures a stable, reproducible set of dependencies!

Note, we kinda have a chicken and egg problem here. If npm install updates package-lock.json for new dependencies and package-lock.json is used in the next npm install, isn’t that circular? How are we going to get a stable build with a circular reference? This is where npm ci comes into play. It installs dependencies and does not update package-lock.json. In fact, it goes one step further and checks that the downloaded files match the hash saved in package-lock.json. This proves the dependencies are identical and can detect if dependencies have been changed (potentially catching malicious acts).
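The hash check can be illustrated by hand. The integrity field in package-lock.json is typically the string sha512- followed by the base64-encoded SHA-512 digest of the package tarball. The sketch below recomputes that string for a local file and compares it. npm does this internally, so this is only an illustration of the idea; it assumes the openssl command-line tool is installed, and integrity_of and verify_integrity are made-up helper names.

```shell
# Sketch: recompute an npm-style integrity string for a file.
# Assumption: the openssl CLI is available on the path.
integrity_of() {
  printf 'sha512-%s\n' "$(openssl dgst -sha512 -binary "$1" | openssl base64 -A)"
}

# Succeeds only if FILE hashes to the EXPECTED integrity string.
verify_integrity() {
  file=$1
  expected=$2
  [ "$(integrity_of "$file")" = "$expected" ]
}

# Example use (hypothetical file name; the expected value is copied from
# the matching "integrity" entry in package-lock.json):
# verify_integrity some-package.tgz "sha512-..." || echo "tarball changed!"
```

If even one byte of the tarball differs from what was recorded, the comparison fails, which is how a silently replaced upstream package gets caught.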

The vast majority of build instability is solved with proper use of package-lock.json. But it still does not guarantee a reproducible build. We need to ensure the build is run in a stable execution environment.

Reproducible execution environment

The execution environment is where the build script runs. Unfortunately, the execution environment does affect the output of a build, and to make the build binary identical we need to remove all variability from it. In our example we are running Webpack. Here are some of the dependencies one might have in an execution environment when running Webpack.

  • nodenv? Homebrew? nvm? apt-get? NuGet? manual download?
  • bash? tcsh? ksh? zsh? fish? cmd? powershell?
  • Which binaries are expected on the path?
  • Mac? Windows? Linux?
  • Which libraries are available?
  • What updates have been applied?
  • x86? ARM? RISC-V? 32-bit? 64-bit?
  • Bare metal? GitHub Actions? Cloud? Virtual machine? Docker?

And again, which versions and variants of each? While possible, it is very difficult to make an execution environment reproducible. There are simply a lot of moving parts. There have been many efforts with configuration management tools like Ansible, Chef, Puppet, Salt and Terraform. These tools allow one to declaratively define how an operating system is configured, including all the settings, packages and even files.
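Short of full configuration management, a build can at least record the environment it ran in, so that two differing outputs can be traced back to a differing tool. The following is a minimal sketch under that assumption; env_fingerprint is a made-up name, and the tool list (node, npm, git) is only illustrative and covers a small part of the list above.

```shell
# Sketch: print the cheaply observable parts of the execution environment.
# The tool list is an assumption; extend it to match your own build.
env_fingerprint() {
  echo "os=$(uname -s)"
  echo "arch=$(uname -m)"
  for tool in node npm git; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Keep only the first line of each tool's version output.
      echo "$tool=$("$tool" --version 2>/dev/null | head -n 1)"
    else
      echo "$tool=missing"
    fi
  done
}

# Example use (hypothetical output file stored next to the build output):
# env_fingerprint > build-env.txt
```

Recording the environment does not make it reproducible, but comparing two such files is a quick first step when two builds of the same revision differ.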

Those tools are run directly on a bare metal machine to reproduce a configuration. Alternatively, a configuration can be saved as an image with tools like Packer or Docker.

All this effort to pin down the execution environment simply to run Webpack? This is indeed overkill for most projects, as the execution environment is an unlikely source of build variability. It needs to be considered on a case-by-case basis.

Bringing it all together

Given all the options, it’s hard to decide what to do. What I would recommend as a starting point is Docker + package-lock.json. Docker is widely supported and package-lock.json is essential. I would build a Docker image to an exact specification that is able to run Node.js, then bring it all together with a single command:

docker run --volume /users/Me/build-dist:/var/build/dist my-image@sha256:abcdef0123456789 build [git repo url] [git hash]

This uses the docker run command. The first part binds the local directory /users/Me/build-dist to /var/build/dist inside the Docker container. The image is then specified exactly using a content-addressable identifier (the sha256 bit). This is necessary because a tag like my-image:v4 is unstable: it is possible to re-publish a Docker image to an existing tag, which would violate reproducibility.
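A build wrapper can refuse to run with an image that is only identified by a tag. The sketch below checks that an image reference ends in @sha256: followed by a full digest of 64 lowercase hexadecimal characters. is_digest_pinned is a made-up helper, not a Docker command, and the check is purely textual: it does not contact a registry.

```shell
# Sketch: succeed only if an image reference is pinned by a sha256 digest.
is_digest_pinned() {
  ref=$1
  digest=${ref##*@sha256:}

  # No "@sha256:" in the reference: it is a tag or a bare name.
  [ "$digest" != "$ref" ] || return 1

  # A sha256 digest is exactly 64 hexadecimal characters.
  [ "${#digest}" -eq 64 ] || return 1
  case $digest in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}

if is_digest_pinned "my-image:v4"; then
  echo "my-image:v4 is pinned"
else
  echo "my-image:v4 is not pinned: a tag can be re-published"
fi
```

Note that the digest in the docker run command above is abbreviated for readability, so it would not pass this check as written; a real digest is much longer.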

The actual command build is just a shell script:

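set -xe
# Arguments from: docker run ... build [git repo url] [git hash]
gitRepoUrl="$1"
gitHash="$2"
mkdir -p /var/build/dist
cd /var/build
# /var/build already holds the mounted dist directory, so git clone cannot
# target it; initialise the repository in place and fetch instead.
git init
git remote add origin "$gitRepoUrl"
git fetch origin
git checkout "$gitHash"
npm ci
npm run build
# npm run build outputs to /var/build/dist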

After the command completes we can find the build output at /users/Me/build-dist since that was mounted to /var/build/dist inside Docker.

Functional programming aside: If one takes a step back, we can see a reproducible build is just a pure function. By definition of a pure function, given the same input we always get the same output. In this case, the input is a single revision of a source tree, the pure function is the build script and the output is the build. This also gives us a hint on what to watch out for. A function is no longer pure when it has side-effects. Look for side-effects in the build script to find sources of irreproducibility. For example, npm resolution logic that attempts to “download the latest” is a side-effect. If all dependencies can be defined exactly and retrieved/constructed exactly, then it is pure.
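The pure function view also suggests a simple test: run the build twice on the same revision and check that the outputs are identical. Here is a minimal sketch, assuming GNU coreutils (sha256sum) and a find/sort/xargs that support NUL-separated names. tree_digest and same_build are made-up helper names, and only file paths and contents are compared, not permissions or timestamps.

```shell
# Sketch: print one digest for a whole directory tree by hashing every
# file in a stable order, then hashing the resulting list.
tree_digest() {
  (
    cd "$1" || exit 1
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 sha256sum \
      | sha256sum | cut -d' ' -f1
  )
}

# Succeeds only if both directories hold the same files with the same bytes.
same_build() {
  first=$(tree_digest "$1") || return 1
  second=$(tree_digest "$2") || return 1
  [ "$first" = "$second" ]
}

# Example use (hypothetical directories holding two builds of one revision):
# same_build /users/Me/build-dist-1 /users/Me/build-dist-2 && echo "reproducible"
```

If the two digests differ, something in the build has a side-effect, and diffing the two output trees is the fastest way to find out which file it touched.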

Do you want to practise making reproducible builds? You’re in luck, Battlefy is hiring.

