TVL depot development (mail to depot@tvl.su)
From: Florian Klink <flokli@flokli•de>
To: Vincent Ambo <tazjin@tvl.su>
Cc: Adam Joseph <adam@westernsemico•com>, depot@tvl.su
Subject: Re: [tvix] string contexts vs. reference scanning
Date: Thu, 16 Mar 2023 13:00:39 +0100
Message-ID: <20230316120039.j4fkp3puzrtbjcpi@tp>
In-Reply-To: <585fa6e7-f4f1-5b0a-1bef-e46b422fcec5@tvl.su>

[…]
>Let me preface by saying that despite this problem, reference-scanning 
>for inputs yields perfectly functional, *equivalent* but not 
>*identical* derivations to Nix. We do currently consider it a problem 
>because we want to be fully hash-equal with C++ Nix, both to prove 
>that our implementation is correct and to make use of Hydra's cache.

[…]

>There's an orthogonal problem which made this confusing to understand, 
>where C++ Nix has some special logic for how it hashes derivations 
>that use fixed-output paths, which we haven't fully replicated yet. 
>This led to hash differences which masked this underlying problem (the 
>differences still exist, and are a separate issue).
>
>I discussed this with Adam yesterday and he suggested an approach 
>similar to `builtins.placeholder`, which would only be in effect for 
>fixed-output derivations. I think this is feasible but haven't 
>sketched anything yet.

I'm not sure this will be a problem in practice, at least when it comes
to "being able to substitute from binary caches".

For fixed-output derivations with either recursive or flat hashes, the
ATerm of the derivation we end up picking does not feed into the data
that's used to calculate the output hash.

In the case of a recursive sha256 FOD, a
`source:sha256:$narDigest:$storeDir:$outputName` is used as a
fingerprint:

https://cs.tvl.fyi/depot@def2d32319e5d70b21a799d463d07d880f5ff207/-/blob/tvix/nix-compat/src/derivation/mod.rs?L256#tab=references

In all other cases, a `derivation_or_fod_hash` is used, and baked into
a `output:$outputName:$derivation_or_fod_hash:$storeDir:$outputName`
fingerprint:

https://cs.tvl.fyi/depot@def2d32319e5d70b21a799d463d07d880f5ff207/-/blob/tvix/nix-compat/src/derivation/mod.rs?L269#tab=def
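
To make the two fingerprints above concrete, here is a minimal Python
sketch (not the nix-compat code). The fingerprint templates are copied
from this mail; the exact encoding of each field (hex digests, any
algorithm prefix) is an assumption, so check the linked mod.rs for the
real thing. The function names and the placeholder digest are made up
for illustration. Turning a fingerprint into a store path (sha256,
XOR-fold to 20 bytes, Nix's base32 alphabet) follows the usual Nix
scheme.

```python
import hashlib

# Nix's base32 alphabet (no e, o, u, t).
NIXBASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """XOR-fold a digest down to `size` bytes (the "compressed hash")."""
    out = bytearray(size)
    for i, byte in enumerate(digest):
        out[i % size] ^= byte
    return bytes(out)


def nixbase32(data: bytes) -> str:
    """Encode bytes in Nix's base32 flavour (5-bit groups, last group first)."""
    n_chars = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(n_chars - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)
        chars.append(NIXBASE32_ALPHABET[c & 0x1F])
    return "".join(chars)


def source_fingerprint(nar_digest: str, store_dir: str, output_name: str) -> str:
    # Template from the mail; `nar_digest` is assumed to be a hex string.
    return f"source:sha256:{nar_digest}:{store_dir}:{output_name}"


def output_fingerprint(output_name: str, derivation_or_fod_hash: str,
                       store_dir: str) -> str:
    # Template from the mail, reproduced literally.
    return (f"output:{output_name}:{derivation_or_fod_hash}"
            f":{store_dir}:{output_name}")


def store_path(fingerprint: str, store_dir: str, name: str) -> str:
    """Hash a fingerprint and render the resulting store path."""
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"{store_dir}/{nixbase32(compress_hash(digest))}-{name}"


if __name__ == "__main__":
    placeholder_nar_digest = "00" * 32  # made-up value, not a real NAR hash
    fp = source_fingerprint(placeholder_nar_digest, "/nix/store", "example")
    print(fp)
    print(store_path(fp, "/nix/store", "example"))
```

Note that neither fingerprint contains the ATerm itself; only digests
and names go in.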

However, for /all/ FODs, `derivation_or_fod_hash` is calculated by the
`fod_digest()` function:

https://cs.tvl.fyi/depot@def2d32319e5d70b21a799d463d07d880f5ff207/-/blob/tvix/nix-compat/src/derivation/mod.rs?L172#tab=def

… which internally feeds a
`fixed:out:$hashWithModeAsNixHashString:$outputPath` string into the
sha256 function.

This means, even if we end up mapping the "wrong" fixed-output
derivation somewhere, the output hashes will still be the same.
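
A rough Python sketch of why that holds, assuming the fingerprint
format quoted above. `FixedOutputDrv`, its fields and all values are
hypothetical placeholders, not the nix-compat types: the point is only
that the digest is a function of the hash-with-mode string and the
output path, and nothing else from the derivation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FixedOutputDrv:
    """Hypothetical stand-in for a fixed-output derivation."""
    hash_with_mode: str          # e.g. a "r:sha256:..." style string
    output_path: str
    input_drvs: list = field(default_factory=list)  # ignored by fod_digest


def fod_digest(drv: FixedOutputDrv) -> str:
    # Only these two fields are part of the fingerprint.
    fingerprint = f"fixed:out:{drv.hash_with_mode}:{drv.output_path}"
    return hashlib.sha256(fingerprint.encode()).hexdigest()


# Placeholder values, made up for illustration.
placeholder_hash = "r:sha256:" + "00" * 32
placeholder_out = "/nix/store/" + "a" * 32 + "-source"

with_inputs = FixedOutputDrv(placeholder_hash, placeholder_out,
                             input_drvs=["/nix/store/...-fetcher.drv"])
without_inputs = FixedOutputDrv(placeholder_hash, placeholder_out)

# Different inputs (and hence different ATerms), same digest.
same = fod_digest(with_inputs) == fod_digest(without_inputs)
```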

And as lookups from the binary cache ask for the "compressed hash" /
`StorePath.digest` [^1] only, and the path of the .drv doesn't matter,
it shouldn't be a problem when it comes to substituting from the binary
cache.
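
As an illustration of that lookup, a small Python sketch that derives
the narinfo URL from an output path alone. The helper names and the
example path (with a placeholder digest) are made up; the
`<digest>.narinfo` layout is what HTTP binary caches such as
cache.nixos.org serve.

```python
def store_path_digest(store_path: str, store_dir: str = "/nix/store") -> str:
    """Return the 32-character digest part of a store path."""
    prefix = store_dir.rstrip("/") + "/"
    if not store_path.startswith(prefix):
        raise ValueError(f"not inside {store_dir}: {store_path}")
    digest, sep, name = store_path[len(prefix):].partition("-")
    if len(digest) != 32 or not sep or not name:
        raise ValueError(f"malformed store path: {store_path}")
    return digest


def narinfo_url(cache_url: str, store_path: str) -> str:
    """Build the narinfo URL; no .drv path is involved anywhere."""
    return f"{cache_url.rstrip('/')}/{store_path_digest(store_path)}.narinfo"


# Placeholder path, not a real store object.
example_path = "/nix/store/" + "a" * 32 + "-example-1.0"
example_url = narinfo_url("https://cache.nixos.org", example_path)
```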

So I'm not sure if all this placeholder logic is really necessary to
implement at all.

We can check that evaluation is equivalent by simply calculating the
resulting output paths, both for Tvix and Nix, and use this to validate
that the evaluator does the same.
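
Such a check could be as simple as the following Python sketch. How
the two mappings from attribute name to output path are obtained is
left open, and the function name and sample values are hypothetical.

```python
def diff_output_paths(tvix: dict, nix: dict) -> dict:
    """Return {attr: (tvix_path, nix_path)} for every disagreement.

    A missing attribute on either side shows up as None.
    """
    mismatches = {}
    for attr in sorted(tvix.keys() | nix.keys()):
        a, b = tvix.get(attr), nix.get(attr)
        if a != b:
            mismatches[attr] = (a, b)
    return mismatches


# Placeholder paths, made up for illustration.
tvix_paths = {"hello": "/nix/store/" + "a" * 32 + "-hello"}
nix_paths = {"hello": "/nix/store/" + "a" * 32 + "-hello"}
evaluators_agree = not diff_output_paths(tvix_paths, nix_paths)
```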

>Either way, for me this also raises the thought of whether we should 
>decouple Tvix's internal representation of a derivation from that of 
>C++ Nix and only "materialise" C++ Nix derivations (and accompanying 
>hashes) where needed. Something to think about ...

I personally don't think we want to, or should, materialize .drv files
in /nix/store at all. The use cases for this that I've heard about so
far are mostly attempts to schedule /around the Nix scheduler/, or
somehow extract "build closures".

We haven't recently discussed much what we want a Tvix scheduler/builder
interface to look like, but I personally would like it to treat some of
the Nix-specific details in a very opaque fashion.

It doesn't need to know how, e.g., the Nix sandbox env vars are
populated, or how some desired output paths have been calculated. It can
treat these as opaque strings to pass along to the build environment, or
to set up mountpoints from, without understanding how they are
produced.

These values can be calculated/produced by the thing that still has all
the context (pun intended) about Derivation construction.

That way, the builder would be more generic, and we could also play with
different path calculation modes without having to touch its
implementation.
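
To illustrate the idea (and only that, nothing like this exists yet),
a builder-facing request could look roughly like the Python sketch
below. Every name and field here is hypothetical. The builder copies
the strings through to the sandbox setup and never interprets them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRequest:
    """Hypothetical request; all values are opaque to the builder."""
    command: tuple       # argv to run inside the sandbox
    env: dict            # env vars, already computed by the caller
    input_paths: tuple   # paths to make available (mountpoints)
    output_paths: tuple  # paths the build is expected to produce


def plan_build(req: BuildRequest) -> dict:
    """Describe the sandbox without knowing how any value was derived."""
    return {
        "argv": list(req.command),
        "env": dict(req.env),
        "mounts": sorted(req.input_paths),
        "expected_outputs": list(req.output_paths),
    }


# Placeholder values; whoever constructs the derivation fills these in.
out = "/nix/store/" + "a" * 32 + "-example"
request = BuildRequest(
    command=("/bin/sh", "-c", "echo hi > $out"),
    env={"out": out},
    input_paths=("/nix/store/" + "b" * 32 + "-bash",),
    output_paths=(out,),
)
plan = plan_build(request)
```

Swapping in a different path calculation mode would then only change
how the caller fills in `env` and `output_paths`.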

-- 
flokli

[^1]: https://cs.tvl.fyi/depot@def2d32319e5d70b21a799d463d07d880f5ff207/-/blob/tvix/nix-compat/src/store_path.rs?L40
