From: sterni <sternenseemann@systemli•org>
To: depot@tvl.su
Subject: Custom Continuous Benchmarking for depot
Date: Mon, 10 Feb 2025 16:46:43 +0100
Message-ID: <ca85ee9e-558b-4123-8eef-b94c48048ddf@systemli.org>
Hello everyone,
# Overview
Below I sketch out an idea for a continuous benchmarking service built
on top of our existing CI/CD infrastructure, i.e. Buildkite. This would
be suitable to replace our existing use of aspen's [windtunnel.ci].
# Status
I'm looking for feedback on the general idea (i.e. this document)
before implementing a prototype (to prove that it is actually feasible).
# Motivation
- [windtunnel.ci] does not work at the moment.
  - The domain has expired.
  - I do not know whether the backing service is still running.
- aspen probably does not have time to work on it / maintain it in
  the near future (which is understandable).
- We have benchmarks in depot which are only executed manually at the
  moment (not exactly many, though: just //tvix and
  //third_party/lisp/mime4cl. There is, of course, no telling how
  this may change if we have a proper continuous benchmarking service
  built into our CI).
- Using a separate service, be it windtunnel.ci or a self-hosted
  [bencher] instance, is annoying because it needs to be configured
  separately from our existing CI infrastructure and can't leverage
  readTree for automatically discovering benchmarks.
# Proposal
I've recently stumbled over gipeda ("GIt PErformance DAshboard"), a
static site generator which used to back perf.haskell.org/ghc (which no
longer exists). It was not hard to [revive] the code. The nice
thing about it is that it's not concerned with running benchmarks at
all. It just takes a directory of runs (matched to git revisions) and
generates a [dashboard] which visualizes changes in performance and
integrates with a preexisting git viewer.
I think we can set up relatively simple continuous benchmarking for
depot using gipeda:
1. Add a simple helper which allows easily creating an extraStep that
executes a benchmark, e.g.:

    meta.ci.extraSteps = {
      hyperfine-bench = depot.ops.benchmarking.mkBench {
        run = run-hyperfine-bench;
        converter = depot.ops.benchmarking.hyperfine-json-to-csv;
      };
      cargo-bench = depot.ops.benchmarking.mkBench {
        run = run-cargo-bench;
        converter = depot.ops.benchmarking.criterion-to-csv;
      };
    };

The arguments (tentatively) have the following meaning:
run
: should run the benchmark and dump the result to stdout in some
machine-readable format.
converter
: is a program that reads the output of `run` and converts it to
the CSV format gipeda uses. This is a separate attribute to
allow reusing such programs. Derivations are used so that
new converters can also be defined in an ad hoc way
(i.e. inline).
The converted benchmark result would be uploaded as a Buildkite
[artifact].
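
To make the shape of this helper more concrete, here is a minimal,
untested sketch of what mkBench and one of the converters could look
like. The extraStep schema (in particular whether pointing `command`
at an executable is enough), the argument handling and the fixed
result file name are all assumptions, not existing depot code:

    # ops/benchmarking/default.nix (hypothetical)
    { pkgs, ... }:

    {
      # Wrap a benchmark runner and a converter into an extraStep-shaped
      # attrset whose command uploads the CSV result as an artifact.
      mkBench = { run, converter }: {
        command = pkgs.writeShellScript "run-benchmark" ''
          set -euo pipefail
          mkdir -p benchmarks
          # run the benchmark and convert its output to gipeda's CSV
          ${run} | ${converter} > benchmarks/result.csv
          # make the result available to a later collection step
          buildkite-agent artifact upload "benchmarks/*"
        '';
      };

      # Example converter: read hyperfine's JSON export on stdin and
      # emit "benchmark-name,seconds" lines.
      hyperfine-json-to-csv = pkgs.writeShellScript "hyperfine-json-to-csv" ''
        ${pkgs.jq}/bin/jq -r '.results[] | "\(.command),\(.mean)"'
      '';
    }

Naming the per-step result file is glossed over here; that is part of
the labeling question discussed under Open Questions below.
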
If we want usable results, we'll have to figure out how to
constrain the execution of these `extraSteps`:
- They always need to be executed on the same machine, so times
are comparable between runs.
- Benchmark execution may not be parallelized at all.
- The executing machine should otherwise be idle, i.e. no other
pipeline runs, no other Nix builds etc.
I'm not sure whether it is even possible to express this in
Buildkite's step configuration in a way that lets these extraSteps be
part of the normal depot pipeline (which would be pretty cool, though)
so that they are executed for refs/heads/canon. We'd essentially need
some kind of super-low-priority, but exclusive, step; a rough sketch
of step attributes that might get us part of the way follows below.
This may be easier to achieve if we either have a dedicated machine
for benchmarking or a separate pipeline (see Alternatives section)
which also doesn't run as often.
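
For illustration, the following is a rough sketch of Buildkite step
attributes that could approximate these constraints: a dedicated agent
queue pins execution to one machine, and a concurrency group of size
one serializes benchmark steps across builds. The attribute names are
existing Buildkite step options, but whether extraSteps can set them,
and the queue name, are assumptions; this also does not guarantee that
the machine is otherwise idle:

    # hypothetical extra attributes for a generated benchmark step
    {
      # only run on the dedicated benchmarking agent(s)
      agents.queue = "benchmark";
      # never run two benchmark steps at the same time, across all builds
      concurrency = 1;
      concurrency_group = "depot/benchmarks";
      # only benchmark canon, not CLs
      branches = "canon";
    }
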
2. Have a step in the pipeline that collects all benchmark results
and merges them into a single CSV file named after the git revision
of the run for gipeda. The step would either upload this merged
collection as an artifact or directly move it into gipeda's data
directory.
Buildkite allows the use of globs when downloading
artifacts, so we could probably just use e.g. `benchmarks/*`.
This way the merge step would not need to have a full list
of all benchmarks available to it.
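
A sketch of what such a collection step could run; the script name is
made up, but `buildkite-agent artifact download` does support globs
and `BUILDKITE_COMMIT` contains the revision being built:

    # hypothetical command of the collection step
    pkgs.writeShellScript "collect-benchmarks" ''
      set -euo pipefail
      # fetch every CSV the individual benchmark steps uploaded
      buildkite-agent artifact download "benchmarks/*" .
      # merge them into a single per-revision CSV for gipeda
      cat benchmarks/*.csv > "$BUILDKITE_COMMIT.csv"
      # upload the merged file (alternatively: move it straight into
      # gipeda's data directory on the machine serving the dashboard)
      buildkite-agent artifact upload "$BUILDKITE_COMMIT.csv"
    ''

Note that this assumes the per-step CSVs have distinct names, which
again runs into the labeling question from the Open Questions section.
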
3. After such a pipeline run, trigger a rebuild of the dashboard
(or just run gipeda regularly on a timer).
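
For the timer variant, a NixOS-style sketch could look as follows; the
package attribute, the data directory and the assumption that invoking
gipeda in its working directory regenerates the static site are all
unverified:

    { pkgs, ... }:

    {
      systemd.services.gipeda = {
        description = "regenerate the benchmark dashboard";
        serviceConfig = {
          Type = "oneshot";
          # assumed layout: gipeda settings, logs and site live here
          WorkingDirectory = "/var/lib/gipeda";
          ExecStart = "${pkgs.haskellPackages.gipeda}/bin/gipeda";
        };
      };

      # regenerate the dashboard once an hour
      systemd.timers.gipeda = {
        wantedBy = [ "timers.target" ];
        timerConfig.OnCalendar = "hourly";
      };
    }
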
# Alternatives
- Use a separate pipeline for benchmark execution
  - Benchmarks would be identified by e.g. a `meta.ci.benchmarks`
    attribute
  - Benchmark execution could be implemented in a single step or
    possibly be controlled by pipeline parallelism.
  - If artifacts are used, disambiguating them from other artifacts
    would be trivial.
  - We could use a separate [cluster] if we had dedicated agents for
    benchmark execution.
- Revive [windtunnel.ci]
- Set up [bencher] (which, at first glance, looks relatively
  complicated)
# Open Questions
## How to Identify Benchmarks
I think this points to some gaps in our Bazel-inspired target syntax,
which we should eventually work on filling (see also b/438). Basically,
gipeda uses strings to identify specific benchmarks. In some places,
globbing can be used (which is very simple, i.e. it just expands a `*`
at the end of the string to anything). The question, then, is how to
express a specific benchmark as a single string. We have the following
components:
1. The readTree target (which, due to `extraSteps`, may not be a
subtarget)
2. The extraStep name, i.e. the benchmarking script.
3. The named benchmark results the script returns (of which there may
be multiple).
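
Purely for illustration (not a proposal for the actual syntax), a
composed identifier could look something like

    //third_party/lisp/mime4cl:hyperfine-bench/parse-sample-message

i.e. readTree target, extraStep name and result name joined by
separators, with the made-up result name standing in for component 3.
Something along these lines would also keep gipeda's trailing-`*`
globbing useful, e.g. for selecting all benchmarks of one target.
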
We would also need to figure out which part of the code is in charge
of labeling. Probably the benchmark result merge operation should add
the first two components to the raw results, which only have the third
component. However, if we use globbed artifact downloads, the merge
step may not have the necessary information for this.
## Gipeda Frontend
The gipeda frontend has quite a few JS dependencies, probably pinned
to ancient versions. I haven't tried to get this to work or to
modernize it yet. It's probably feasible; in the worst case we'd have
to redo the graph rendering or pin the dependencies indefinitely.
## CSV as the Canonical Format
With the current proposal, CSV would become the canonical output
format, which is a relatively simple key-value map from benchmark name
to numerical result. The source output from e.g. criterion would be a
lot richer.
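
Concretely, the per-revision input would boil down to a handful of
name/value pairs (benchmark names made up; whether gipeda wants exactly
this two-column layout would need to be checked against its
documentation), roughly:

    # <revision>.csv
    tvix/eval-nixpkgs-hello,1.234
    mime4cl/parse-sample-mbox,0.042
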
I don't think this is a huge concern since we would be in a position to
change this later if necessary.
[revive]: https://github.com/nomeata/gipeda/pull/65
[bencher]: https://bencher.dev
[windtunnel.ci]: https://web.archive.org/web/20240926214808/https://windtunnel.ci/
[dashboard]: https://perf.haskell.org/gipeda/
[artifact]: https://buildkite.com/docs/pipelines/configure/artifacts
[cluster]: https://buildkite.com/docs/pipelines/clusters