Today was an extremely productive day at work because I got incremental Haskell builds working using Nix/Nixpkgs. I still need to polish up what I have a little bit, but this is pretty close to completion.
The background for this is that there were two main approaches we were considering at work:
- Approach 1: Add Nixpkgs support for incremental builds, as documented here: https://jade.fyi/blog/nixcon2022-retrospective/
- Approach 2: Use ghc-nix
I initially tried the second approach (ghc-nix) since it seemed promising and generalized better, which I covered in previous posts.
However, the performance bottleneck ended up being a real problem, so I switched to approach 1 (Nixpkgs support) and I got that working today.
This consisted of two branches, the first of which is a branch that adds the Nixpkgs support for incremental builds:
https://github.com/NixOS/nixpkgs/compare/master...MercuryTechnologies:nixpkgs:gabriella/incremental
This one was fairly easy: I just cleaned up what Harry Garrood and @leftpaddotpy had already implemented.
If you haven't read the post I mentioned above, the basic way it works is that you create two builds of your Haskell package:
- An older full build
- A newer incremental build (that uses the build products from the older build as a starting point)
The change to Nixpkgs adds a new .dist output that packages the older build's dist/build directory. The newer build uses that as a starting point, so it only has to build what changed since the older build.
The idea is that the older build is updated infrequently, but often enough that the "diff" between the old and new builds doesn't grow too large.
However, there is a huge gaping hole in this user experience: there isn't a great way to automatically specify what the older build should be. For example, suppose that you just pin the older build to a specific revision: the "diff" between the old and new build will eventually grow so large that you no longer benefit from reusing the old build products, and the incremental build ends up costing about as much as a full build.
You could add some out-of-band automation to automatically update the reference to the old build, which is what this blog post attempted to do. The idea is that you can add the old build as a Nix flake input and then use Nix's support for updating/re-locking flake inputs to periodically bump the older build.
However, this was not satisfactory for me because I'm not a fan of out-of-band automation (especially when it comes to CI); I like to push as much logic into Nix as possible.
The user experience I actually wanted was something like this:
pkgs.haskell.lib.incremental
{ duration = 7 * 24 * 60 * 60; }
pkgs.haskellPackages.foo
… which would do a full build of the foo package once a week and then incremental rebuilds after that point relative to the last full build.
So the idea I had for implementing that was to do something like this:
- Assume the existing src input for the package is a git repository (and fail otherwise)
- Replace it with a snapshot of the same repository at an earlier point in time, truncated to a certain time interval (e.g. a weekly boundary or daily boundary)
- Use the latest snapshot for the full build
- Use that as the input to the incremental build
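The "truncated to a certain time interval" step is just integer arithmetic on a Unix timestamp. Here is a minimal sketch of that step in Nix. The truncate name and the sample timestamps are mine, chosen only for illustration, and this is not necessarily how either branch computes the boundary:

```nix
let
  # Length of the full-build interval in seconds (one week).
  duration = 7 * 24 * 60 * 60;

  # Round a Unix timestamp down to the most recent multiple of `duration`.
  # `/` on two Nix integers is integer division, so dividing and then
  # multiplying again discards the remainder.
  # (Keep the spaces around `/`; without them Nix may parse a path.)
  truncate = timestamp: (timestamp / duration) * duration;

  # Two arbitrary sample timestamps that are one hour apart.
  earlier = truncate 1700000000;
  later = truncate (1700000000 + 60 * 60);
in
{
  inherit earlier later;

  # Both samples fall within the same week-long interval, so they
  # truncate to the same boundary.
  sameBoundary = earlier == later;
}
```

Any two timestamps inside the same interval map to the same boundary, so the snapshot (and therefore the full build) only changes when the clock crosses into the next interval. In practice the timestamp would have to come from something like builtins.currentTime, which is impure and not available in pure evaluation mode.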
However, this is difficult to do using Nix/Nixpkgs in their present state. Specifically, the second step (replacing a git repository with an earlier snapshot) is technically possible but requires doing a whole bunch of undesirable stuff (like disabling the sandbox and using import-from-derivation), and even when it "works" it is still brittle. Basically, it would be extremely unlikely that Nixpkgs would accept a PR for the evil things that this would entail.
However, there is a simpler and more principled solution to this, which generalizes better: extend builtins.fetchGit to support an optional date argument that accepts anything that git accepts (e.g. 1 week ago, 2000-01-01, or a unix timestamp). If you have that then it becomes much easier to replace a git source with an earlier snapshot, plus you make use of Nix's native support for locking and caching git fetches, so it's more efficient.
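To make that concrete, here is roughly what a call might look like. This is only a sketch: the date attribute does not exist in upstream Nix's builtins.fetchGit (it is the extension proposed here), the repository URL is a placeholder, and I'm assuming the date is passed as a string:

```nix
# Sketch only: `date` is NOT an attribute of upstream builtins.fetchGit;
# it is the optional argument proposed in this post.
let
  # Placeholder repository URL, used purely for illustration.
  url = "https://example.com/foo.git";
in
builtins.fetchGit {
  inherit url;

  # Any date expression that git understands, e.g. a relative date,
  # a calendar date such as "2000-01-01", or a Unix timestamp.
  date = "1 week ago";
}
```

A relative date like "1 week ago" shifts on every evaluation, so for the weekly full build you would presumably pass a date pinned to the interval boundary instead, which keeps the fetched snapshot stable until the next boundary.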
That's what I did for my second branch, which extends the builtins.fetchGit utility:
https://github.com/NixOS/nix/compare/master...Gabriella439:nix:gabriella/fetchGit
… and when you combine those two branches then everything just works[1] and any Haskell package that uses a git source automatically does a full rebuild every interval with incremental builds in between.
Not only that, but the new builtins.fetchGit functionality could conceivably be used to power the same feature for other languages, too, so this might pave the way for incremental builds for package managers that are Nix agnostic.
[1] You also have to use GHC 9.4 or newer for reasons covered in the original blog post.