TVL depot development (mail to depot@tvl.su)
From: sterni <sternenseemann@systemli•org>
To: Vincent Ambo <mail@tazj•in>, depot@tvl.su
Subject: Re: [tvix] string contexts vs. reference scanning
Date: Wed, 11 Jan 2023 12:49:12 +0100
Message-ID: <ecb3ceb6-eb07-7e96-6aa3-28b4b640d445@systemli.org>
In-Reply-To: <CANHrikpS8YXVPUzmqE-yWBse85j3RRjTu0tR4yg9Cqz89CpeNw@mail.gmail.com>

On 1/9/23 23:07, Vincent Ambo wrote:
> Any other input people might have on string contexts is also welcome!

One thing I'd want to see answered is how to handle import from
derivation (IFD). In current C++ Nix this is handled as follows:
`import` calls `coerceToPath` on the value it is passed. `coerceToPath`
looks at the string context and realises any derivation found within
it. Finally, the actual file is retrieved from the store / disk.
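
The flow above can be sketched in Rust, the language Tvix is written in. This is a minimal illustration under stated assumptions, not C++ Nix's or Tvix's actual code. `ContextElement`, `NixString` and `coerce_to_path_for_import` are hypothetical names. Building a derivation is abstracted as a `realise` callback.

```rust
use std::collections::BTreeSet;

/// Hypothetical string context element (illustrative, not a real API).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum ContextElement {
    /// A plain store path, assumed to exist already.
    Plain(String),
    /// A single derivation output: (drv path, output name).
    Output(String, String),
    /// `=<drv_path>`: the .drv file plus all of its outputs.
    AllOutputs(String),
}

/// Hypothetical Nix string value: text plus context.
struct NixString {
    text: String,
    context: BTreeSet<ContextElement>,
}

/// Sketch of the `import` flow: coerce the value to a path, realise every
/// derivation named in its context, then hand the path to the file reader.
/// `realise(drv, output)` stands in for building a derivation; `None` means
/// "all outputs".
fn coerce_to_path_for_import<F>(s: &NixString, mut realise: F) -> Result<String, String>
where
    F: FnMut(&str, Option<&str>) -> Result<(), String>,
{
    if !s.text.starts_with('/') {
        return Err(format!("string '{}' is not an absolute path", s.text));
    }
    for elem in &s.context {
        match elem {
            // Plain store paths need no build step in this sketch.
            ContextElement::Plain(_) => {}
            ContextElement::Output(drv, out) => realise(drv.as_str(), Some(out.as_str()))?,
            ContextElement::AllOutputs(drv) => realise(drv.as_str(), None)?,
        }
    }
    // Evaluation would now read the file from the store / disk.
    Ok(s.text.clone())
}
```

Taking the builder as a callback keeps the sketch independent of any particular store or build backend. A failed build aborts the import with that build's error.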

There are also other occasions besides reading / importing where
things get realised during evaluation (interactively?), but I don't
have a very good handle on those yet.

> [1]: I'm not actually sure about this. It's possible that all these
> use-cases that exist right now (e.g. string context discarding in TVL's
> :llama: step) actually go away with the Tvix model of starting builds
> immediately, but strongly ordered. Thoughts?

Currently you can work with string context in the following ways:

1. You can discard part of it using `builtins.unsafeDiscardOutputDependency`.

    As far as I can tell, this was added to combat the oddity of the
    string context of the `drvPath` attribute. Apparently, disnix ran
    into the problem in 2009 that the `drvPath` of a derivation would
    cause all of its outputs to be built. Subsequently,
    `builtins.unsafeDiscardOutputDependency` was [introduced].

    My _unconfirmed_ theory is that this was a quick and easy
    workaround that was implemented without considering the underlying
    problem. In my view, there is no reason why `drvPath` should
    incur a reference to all outputs of the derivation in addition to
    the derivation file itself (I think this is thanks to the reference
    scanner, which the store runs after the fact to determine whether
    the derivation and/or any of its outputs are actually referenced).

    I would be interested in any theories about why `drvPath` behaved,
    and maybe even should behave, that way (maybe it is useful for
    recursive Nix?). In my experience, `drvPath` either never enters a
    derivation or is closely accompanied by
    `builtins.unsafeDiscardOutputDependency`. Maybe there was confusion
    about what `=<drv_path>` should mean when string contexts were
    originally implemented, but this was never fixed when disnix came
    around.

2. You can discard all of it using `builtins.unsafeDiscardStringContext`.

    I think the uses of this builtin fall into two categories:

    - To drop wrongfully retained string context. All string
      operations retain string context, even though some of them
      destroy any reference that was present in the string.

      Classic examples would be
      `builtins.substring 0 3 ">>> ${pkgs.hello}"` or
      `builtins.baseNameOf "${pkgs.hello}/bin/hello"`.

    - As an escape hatch from references to the derivations
      in question. We use this in //nix/buildkite: we use
      derivation paths so that we can skip re-evaluating targets,
      but discard any references to those files. Since buildkite
      doesn't know about nix-copy-closure(1), it would be difficult
      to copy the required derivation files to an executing machine
      even if we had the correct references. Instead, we impurely
      access the store and re-evaluate the target if the derivation
      file is missing.

    With reference scanning, wrongfully retained string context should
    basically disappear, but so would the escape hatch. I think we'd
    need to invent a new mechanism entirely, maybe even as ugly as
    an `allowedImpureReferences = [ … ];`.

    A third use is described under item 5 (`builtins.appendContext`).

3. You can check whether a string has any context using `builtins.hasContext`.

4. You can query it using `builtins.getContext`.

    In C++ Nix 2.3 this is not particularly useful, since you can
    only inspect the root of the dependency graph (i.e. `outPath`
    will just give you its `drvPath` as context). In Nix >= 2.6 you can,
    however, use this in conjunction with `builtins.readFile`
    to query the references of store paths. This was of course already
    possible before, but required writing a reference scanner and/or
    derivation parser in Nix. We have a hacky [prototype] of this
    in depot.

    I think it should be possible to emulate this using reference
    scanning as well, i.e. tvix-eval would need to run the reference
    scanner on the given string for getContext. This would actually
    be pretty nice for doing dependency analysis.

5. You can add it using `builtins.appendContext`.

    The main use case for `builtins.appendContext` I can think of
    is to restore string context after it has been discarded via
    `builtins.unsafeDiscardStringContext`. This is required due
    to technical limitations in C++ Nix that affect some algorithms:
    if you, for example, want to use (parts of) input strings
    as keys in an attribute set, you need to make sure they have no
    string context. A function that has such a step in its algorithm
    would then use `builtins.getContext` to save the context,
    run the actual algorithm after applying
    `builtins.unsafeDiscardStringContext`, and finally restore the
    string context using `builtins.appendContext` before returning.
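
The five operations above can be made concrete with a toy Rust model of a string with context. All type and function names are hypothetical. The three context element kinds loosely mirror C++ Nix's plain paths, single derivation outputs and `=<drv_path>` entries. `scan_references` is a simplified stand-in for a reference scanner: it only looks for the hash part of each candidate store path in the string.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of one string context element.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum ContextElement {
    /// A plain store path (e.g. a source file copied to the store).
    Plain(String),
    /// A single derivation output: (drv path, output name).
    Output(String, String),
    /// `=<drv_path>`: the .drv file plus all of its outputs.
    AllOutputs(String),
}

/// Hypothetical Nix string value: text plus context.
#[derive(Clone, Debug)]
struct NixString {
    text: String,
    context: BTreeSet<ContextElement>,
}

impl NixString {
    /// Like `builtins.substring`: the result keeps the full context,
    /// even if the reference itself was cut away.
    fn substring(&self, start: usize, len: usize) -> NixString {
        let text: String = self.text.chars().skip(start).take(len).collect();
        NixString { text, context: self.context.clone() }
    }

    /// Item 1: turn `=<drv_path>` into a plain reference to the .drv file.
    fn unsafe_discard_output_dependency(&self) -> NixString {
        let context = self
            .context
            .iter()
            .map(|e| match e {
                ContextElement::AllOutputs(drv) => ContextElement::Plain(drv.clone()),
                other => other.clone(),
            })
            .collect();
        NixString { text: self.text.clone(), context }
    }

    /// Item 2: drop all context.
    fn unsafe_discard_string_context(&self) -> NixString {
        NixString { text: self.text.clone(), context: BTreeSet::new() }
    }

    /// Item 3: does the string carry any context?
    fn has_context(&self) -> bool {
        !self.context.is_empty()
    }

    /// Item 5: add context elements.
    fn append_context(&self, extra: &BTreeSet<ContextElement>) -> NixString {
        let mut context = self.context.clone();
        context.extend(extra.iter().cloned());
        NixString { text: self.text.clone(), context }
    }
}

/// Item 4 via reference scanning instead of tracked context: report the
/// candidate store paths whose hash part occurs in `text`. Simplification:
/// the hash is taken to be everything between `/nix/store/` and the first `-`.
fn scan_references(text: &str, candidates: &[&str]) -> BTreeSet<String> {
    candidates
        .iter()
        .filter(|path| {
            path.strip_prefix("/nix/store/")
                .and_then(|rest| rest.split('-').next())
                .map_or(false, |hash| !hash.is_empty() && text.contains(hash))
        })
        .map(|path| (*path).to_string())
        .collect()
}

/// The round trip from item 5: save the context (`getContext`), run a step
/// that needs context-free strings, then restore it (`appendContext`).
fn with_context_preserved<F: FnOnce(&str) -> String>(s: &NixString, step: F) -> NixString {
    let saved = s.context.clone();
    let plain = s.unsafe_discard_string_context();
    let result = NixString { text: step(&plain.text), context: BTreeSet::new() };
    result.append_context(&saved)
}
```

The model also shows the tension discussed in item 2. Taking `substring(0, 3)` of `">>> /nix/store/…-hello"` keeps context that no longer corresponds to any reference. Scanning the resulting `">>>"` finds nothing.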

[introduced]: https://github.com/NixOS/nix/commit/437077c39dd7abb44b2ab02cb9c6215d125bef04
[prototype]: https://code.tvl.fyi/tree/nix/dependency-analyzer/default.nix?id=805219a2fad0edac10d046fc5ad5820edb4482ee#n10

