This repository has been archived by the owner on Mar 11, 2024. It is now read-only.

Granular build time sources

Silvan Mosberger edited this page Oct 27, 2022 · 1 revision

This is an idea for how we could reuse the lib.sources functions to also work with build-time sources (derivation files). This then allows creating filtered derivations containing only a subset of files, while ensuring that the store path only changes when the included files are updated.

The implementation effort would be low (perhaps one week, or two including tests and docs), but the use cases aren't very clear. It also might not be fully applicable to nixpkgs, since full use introduces many extra derivations, which adds to the evaluation overhead. RFC 92 might change the landscape a bit, though I am not sure whether it would speed up this use case.

Potential use cases

  • Accessing individual files of derivations without requiring a full download (nerdfonts is a good example)
  • GitHub fetcher for individual files
  • File-incremental build tooling like snack
  • With the above, being able to patch sources without requiring a full rebuild
  • More generally it could allow something like lazy trees for build-time

Overview

The general idea is that we can implement a variant of builtins.path (which supports file filtering) that works on derivations instead of eval-time paths. The main problem is that if you just filter files from one derivation into another, the resulting derivation changes every time any file of the original changes, not just the files that pass the filter.

To get around this, we need to know the hash of each individual file in advance, so that we can create a fixed-output derivation for each file. While these fixed-output derivations still depend on the original derivation we're trying to filter, their store paths won't change when unrelated files change, because the store path of a fixed-output derivation is determined by its name and declared output hash rather than by its inputs.
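The per-file fixed-output derivation could be sketched roughly as follows. This is only an illustration: `fileFromDrv` and its argument names are hypothetical, and it assumes `hash` is the SRI hash of the file's flat contents, recorded in advance.

```nix
# Sketch only: fileFromDrv is a hypothetical helper, not an existing nixpkgs function.
{ pkgs ? import <nixpkgs> { } }:
{
  # src:  derivation (or store path) that contains the file
  # file: path of the file relative to the root of src
  # hash: SRI hash of the file contents, known in advance
  fileFromDrv = { src, file, hash }:
    pkgs.runCommand "granular-file" {
      # Declaring the output hash makes this a fixed-output derivation.
      outputHashMode = "flat";
      outputHash = hash;
    } ''
      cp "${src}/${file}" "$out"
    '';
}
```

Since the output path is derived from the name and the declared hash, a change to any other file in `src` leaves this output path untouched, and the build fails if the copied file does not match `hash`.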

The three main primitives that are introduced for this are:

  • pkgs.granularSource.pin, which hashes all the individual files of a derivation and writes the hashes to a JSON file. That file can then either be used directly with import-from-derivation (IFD), or committed locally to be used without IFD.
  • pkgs.granularSource.create, which takes a derivation and a JSON file created by pin and annotates that derivation with the file hashes from the JSON file.
  • pkgs.granularSource._path, which takes a derivation with annotated file information created by create and returns a store path that contains only the files that were selected with a filter.

Finally, pkgs.granularSource.lib re-exports all the functions from pkgs.lib.sources, but swapping the underlying builtins.path to be pkgs.granularSource._path instead. This essentially allows all of the pkgs.lib.sources functions to be used for both eval-time and build-time sources.
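Put together, the three steps might be used roughly like this. This is a sketch of the proposed interface only: nothing under `pkgs.granularSource` exists in nixpkgs, and `upstream`, `./pin.json` and the Markdown filter are assumptions made for illustration.

```nix
# Sketch of the proposed API; pkgs.granularSource is not implemented anywhere.
# upstream: any derivation producing a source tree, e.g. a fetchFromGitHub result.
{ pkgs, upstream }:
let
  gs = pkgs.granularSource;

  # Step 1 (run once): build this and copy the resulting JSON file to ./pin.json.
  pinDrv = gs.pin { src = upstream; hashAlgo = "sha256"; };

  # Step 2: annotate the derivation with the pinned per-file hashes.
  # Passing pinDrv instead of ./pin.json would also work, at the cost of IFD.
  src = gs.create { src = upstream; pinFile = ./pin.json; };
in
{
  inherit pinDrv;

  # Step 3: use the familiar lib.sources interface on the build-time source.
  # Only the selected files would influence the resulting store path.
  docs = gs.lib.cleanSourceWith {
    inherit src;
    filter = path: type:
      type == "directory" || pkgs.lib.hasSuffix ".md" path;
  };
}
```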

API

pkgs.granularSource.pin args

Pins the files of a derivation by writing the hashes and types of all files to a JSON file in a derivation. The resulting file is suitable to be passed to pkgs.granularSource.create, either as a derivation (which then leads to IFD, disallowed in nixpkgs), or as a file path when copied locally.

This function isn't made for local eval-time sources because in that case the builtins.path primitive can be used without requiring such pinned hashes in advance.

Arguments:

  • src: The derivation whose files to pin.
  • hashAlgo: The hashing algorithm to use, either sha256 or sha512.

Returns a derivation for a JSON file with the following format:

{
  "treeHashes": {
    "<someFile>": {
      "file": {
        "hash": "sha256-1rCVS1wK2D9lL22rbmdE6Wg2PwXRZxgFw8CVqnw3txM="
      }
    },
    "<someDir>": {
      "directory": {
        "entries": {
          "<someNestedFile>": {
            "file": {
              "hash": "sha256-5rAeSHaY8qfcdxHvb9XWZCYX35hYBssalFbZiYjoJV0="
            }
          }
        }
      }
    },
    "<someSymlink>": {
      "symlink": {
        "target": "<somePath>"
      }
    }
  }
}

The hashes use the SRI hash format.
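To illustrate how this format could be consumed at evaluation time, here is a minimal sketch that flattens the tree into a list of file paths with their hashes. The function name `collectFiles` and the file names in the sample data are made up for the example; the hashes are the ones shown above.

```nix
let
  # Sample pin data in the format described above (file names are hypothetical).
  pin = builtins.fromJSON ''
    {
      "treeHashes": {
        "README.md": {
          "file": { "hash": "sha256-1rCVS1wK2D9lL22rbmdE6Wg2PwXRZxgFw8CVqnw3txM=" }
        },
        "src": {
          "directory": {
            "entries": {
              "main.c": {
                "file": { "hash": "sha256-5rAeSHaY8qfcdxHvb9XWZCYX35hYBssalFbZiYjoJV0=" }
              }
            }
          }
        },
        "latest": { "symlink": { "target": "src" } }
      }
    }
  '';

  # Recursively collect { path, hash } for every regular file in the tree.
  collectFiles = prefix: entries:
    builtins.concatLists (builtins.attrValues (builtins.mapAttrs (name: node:
      let path = if prefix == "" then name else "${prefix}/${name}";
      in
      if node ? file then [ { inherit path; inherit (node.file) hash; } ]
      else if node ? directory then collectFiles path node.directory.entries
      else [ ] # symlinks carry a target instead of a hash
    ) entries));
in
collectFiles "" pin.treeHashes
```

Each entry of the resulting list corresponds to one file that could be turned into its own fixed-output derivation.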

pkgs.granularSource.create args

This function turns a derivation and associated granular file information generated using pkgs.granularSource.pin into a source value that can be used with the pkgs.granularSource.lib functions.

Implementation note: This can simply call pkgs.granularSource.{_path,_pathSymlinks} with a filter that always returns true, which returns the derivation's files unchanged.

Arguments:

  • src: Derivation whose files to use as the source
  • pinFile: Path to file generated using pkgs.granularSource.pin. This file is imported at evaluation time, meaning that if this file is a derivation path, import-from-derivation is necessary. To prevent that, copy the pregenerated file to a project-local path.
  • symlink: Whether files should be symlinked instead of copied. Defaults to false. Enabling this requires less store space, but increases access time and might break tools that don't handle symlinks well.

In nixpkgs this can be used for builders that can benefit from file-level build granularity, such as c2nix.buildCPP like this:

c2nix.buildCPP {
  src = granularSource.create {
    src = fetchFromGitHub { ... };
    pinFile = ./pinFile.json;
  };
}

Implementation note: buildCPP needs to use the pkgs.granularSource.lib functions with src to make use of the additional granularity.

TODO: Add a way to validate the hashes.

(internal) pkgs.granularSource._path args

Like builtins.path, but for derivation paths. Only files not removed by the filter will have an influence on the output hash. Falls back to builtins.path if path is not a derivation.

All arguments are optional except path:

  • path: The underlying derivation. Needs to be a value returned from pkgs.granularSource.create.
  • name (optional): The name of the derivation.
  • filter (optional): A function of the type expected by builtins.filterSource, with the same semantics.

The result is a derivation containing the files from path but filtered according to filter.

Note: The recursive and sha256 arguments of builtins.path are not implemented because they aren't needed for the lib.sources interface.

Implementation note: This function needs to implement validation of the hashes.
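The filtering step itself can be sketched in pure Nix on the pinned tree. This is an assumption about how it might work, not a confirmed design: `typeOf` and `filterTree` are hypothetical names, the sample tree is invented, and the filter is called with a path and a type string the way builtins.filterSource calls its filter.

```nix
let
  # Map a pinned tree node to the type string that filterSource-style filters expect.
  typeOf = node:
    if node ? file then "regular"
    else if node ? directory then "directory"
    else if node ? symlink then "symlink"
    else "unknown";

  # Drop every entry rejected by the filter. Rejected directories are not
  # descended into, matching the behaviour of builtins.filterSource.
  filterTree = filter: root: entries:
    let
      keep = name: filter "${root}/${name}" (typeOf entries.${name});
      dropped = builtins.filter (name: !(keep name)) (builtins.attrNames entries);
    in
    builtins.mapAttrs (name: node:
      if node ? directory then {
        directory.entries =
          filterTree filter "${root}/${name}" node.directory.entries;
      } else node
    ) (builtins.removeAttrs entries dropped);

  # Hypothetical pinned tree, using the hashes from the format example.
  tree = {
    "main.c".file.hash = "sha256-1rCVS1wK2D9lL22rbmdE6Wg2PwXRZxgFw8CVqnw3txM=";
    docs.directory.entries = {
      "guide.md".file.hash = "sha256-5rAeSHaY8qfcdxHvb9XWZCYX35hYBssalFbZiYjoJV0=";
    };
  };
in
# Keep directories and C files only.
filterTree
  (path: type: type == "directory" || builtins.match ".*\\.c" path != null)
  ""
  tree
```

The output derivation would then be assembled only from the per-file fixed-output derivations of the entries that remain, which is why files removed by the filter cannot influence the output hash.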

(internal) pkgs.granularSource._pathSymlinks args

Like pkgs.granularSource._path, but it creates symlinks to the original source instead of copying the files.

TODO: Somehow export the result as a bash script doing the symlinking

pkgs.granularSource.lib

Same functions as lib.sources but acting on granular build-time sources created using pkgs.granularSource.create.

Implementation note: Allow lib.sources to be generic over the builtins.path used.
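One way to read this note is to parameterise the source functions over the path primitive. The following is a heavily simplified sketch: `mkSources` is a hypothetical name, and the real lib.sources.cleanSourceWith does more (for example, composing nested filters) than the one-line version here.

```nix
let
  # Build a lib.sources-like set on top of an arbitrary builtins.path-compatible function.
  mkSources = pathImpl: {
    # Simplified stand-in for cleanSourceWith: forwards directly to pathImpl.
    cleanSourceWith = { src, filter ? (_path: _type: true), name ? "source" }:
      pathImpl { path = src; inherit filter name; };
  };
in
{
  # Eval-time sources: backed by the real builtin.
  evalTime = mkSources builtins.path;

  # Build-time sources would swap in the proposed primitive (not implemented):
  # buildTime = mkSources pkgs.granularSource._path;
}
```

With this shape, pkgs.granularSource.lib would just be the same function set instantiated with a different primitive.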

lib.sources.{setSubpath,limitToSubpath}

These functions are from my proposal in the source combinators PR. They would be useful for getting individual files from a granular source, since the resulting store path is only influenced by the files it actually contains.