* [tvix] string contexts vs. reference scanning [not found] ` <20221202152213.3a59e629@ostraka> @ 2023-01-09 22:07 ` Vincent Ambo 2023-01-11 11:49 ` sterni 2023-03-16 9:41 ` Vincent Ambo 2023-01-10 20:20 ` reference-scanning inputDrvs/inputSrcs Adam Joseph 1 sibling, 2 replies; 7+ messages in thread From: Vincent Ambo @ 2023-01-09 22:07 UTC (permalink / raw) To: Adam Joseph, depot; +Cc: Lukas Epple, Griffin Smith, Florian Klink [this mail thread is started out of another mail conversation] We are currently implementing `builtins.derivationStrict`, which translates the string context in evaluator values to the inputDrvs and inputSrcs fields in a derivation struct in C++ Nix. Florian and I have been "documenting"[0] our understanding of the code that does this in Nix as we go along. As amjoseph has previously said, string contexts are likely unnecessary: Adam Joseph <adam@westernsemico•com> writes: > Related to that second reason: I've written a tool (using the awesome > libnixstore crate) that calculates drv.inputDrvs using scanForReferences > instead of string contexts, and compares the calculated results to what > cppnix produced. I ran it on the 144,586 derivations in my buildfarm's > store, and (after fixing a lot of bugs in my scanner) there was only a > single disagreement in the entire store, which IMHO is better described > as a revealed bug in nixpkgs Because of this, We plan to use reference scanning over derivation input attributes instead of string contexts, but may need to address the problem of implementing unsafeDiscardStringContext to support certain use cases[1]. Adam, would you be up for pushing a WIP CL (or a commit straight to //users/amjoseph!) with the scanner you've written (and esp. with your bugfixes!) that we could then figure out how to integrate? It might help us avoid some overhead of running into the same bugs again :) Any other input people might have on string contexts is also welcome! [0]: https://github.com/tvlfyi/nix/commit/wtf-are-contexts [1]: I'm not actually sure about this. It's possible that all these use-cases that exist right now (e.g. string context discarding in TVL's :llama: step) actually go away with the Tvix model of starting builds immediately, but strongly ordered. Thoughts? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [tvix] string contexts vs. reference scanning 2023-01-09 22:07 ` [tvix] string contexts vs. reference scanning Vincent Ambo @ 2023-01-11 11:49 ` sterni 2023-01-11 12:20 ` Vincent Ambo 2023-03-16 9:41 ` Vincent Ambo 1 sibling, 1 reply; 7+ messages in thread From: sterni @ 2023-01-11 11:49 UTC (permalink / raw) To: Vincent Ambo, depot On 1/9/23 23:07, Vincent Ambo wrote: > Any other input people might have on string contexts is also welcome! One thing I'd want to see answered is how to handle import from derivation. In current C++ Nix this is handled in the following way: import calls coerceToPath on the value it gets passed. coerceToPath looks at the string context and realises any derivation found within it. Finally the actual file is retrieved from the store / disk. There are also similar occasions where things get realised while evaluating (interactively?) except reading / importing, but I don't have a very good handle on those yet. > [1]: I'm not actually sure about this. It's possible that all these > use-cases that exist right now (e.g. string context discarding in TVL's > :llama: step) actually go away with the Tvix model of starting builds > immediately, but strongly ordered. Thoughts? Currently you can work with string context in the following ways: 1. You can discard some using builtins.unsafeDiscardOutputDependency. This has been, as far as I can tell, been added to combat the oddity of the string context of the `drvPath` attribute. Apparently disnix ran into the problem that the `drvPath` of a derivation would cause all of its outputs to be built in 2009. Subsequently, `builtins.unsafeDiscardOutputDependency` was [introduced]. My _unconfirmed_ theory is that this was a quick and easy workaround that was implemented without considering the underlying problem. In my view, there is no reason why `drvPath` should incur a reference to all outputs of the derivation as well as the derivation file itself (I think this is thanks to the reference scanner the store runs after the fact which determines if the derivation and/or any of its outputs are actually referenced). I would be interested in any theories why `drvPath` behaved and maybe even should behave that way (maybe useful for recursive Nix?). In my experience `drvPath` either never enters a derivation or is closely accompanied by `builtins.unsafeDiscardOutputDependency`. Maybe when implementing string contexts there was a confusion what "=<drv_path>" should mean originally, but this was never fixed when disnix came around. 2. You can discard all using builtins.unsafeDiscardStringContext. I think the uses of this builtin fall into two categories: - To drop wrongfully retained string context. All string operations retain string context, even though some actually destroy any reference that was present in the string. Classic examples would be `builtins.substring 0 3 ">>> ${pkgs.hello}"` or `builtins.baseNameOf "${pkgs.hello}/bin/hello". - As an escape hatch from references to the derivations in question. We use this in //nix/buildkite: We use derivation paths, so we can skip re-evaluating targets, but discard any references to those files. Since buildkite doesn't know about nix-copy-closure(1), it'd be difficult to copy the required derivation files to an executing machine even if we had the correct references. Instead we impurely access the store and re-evaluate the target if the derivation file is missing. With reference scanning, wrongfully retained string context should basically disappear, but so would the escape hatch. I think we'd need to invent a new mechanism entirely, maybe even as ugly as an `allowedImpureReferences = [ … ];`. A third use is described in the item for appendContext. 3. You can check if there's any using builtins.hasContext. 4. You can query it using builtins.getContext. In C++ Nix 2.3 this is not particularly useful, since you can only inspect the root of the dependency graph (i.e. outPath will just give you drvPath as context). In Nix >= 2.6 you can, however, use this in conjunction with builtins.readFile to query the references of store paths. This is of course already possible before, but requires writing a reference scanner and/or derivation parser in Nix. We have a hacky [prototype] of this in depot. I think it should be possible to emulate this using reference scanning as well, i.e. tvix-eval would need to run the reference scanner on the given string for getContext. This would actually be pretty nice for doing dependency analysis. 5. You can add it using builtins.appendContext. The main use case for `builtins.appendContext` I can think of, is to restore string context after it has been discarded via `builtins.unsafeDiscardStringContext`. This is required due to technical limitations in C++ Nix that affect some algorithms: If you, for example want to use (parts of) input strings as keys in an attribute set, you need to make sure they have no string context. A function that has such a step in its algorithm would then use `builtins.getContext` to store the context, run the actual algorithm after `builtins.unsafeDiscardStringContext` is applied and finally return after restoring the string context using `builtins.appendContext`. [introduced]: https://github.com/NixOS/nix/commit/437077c39dd7abb44b2ab02cb9c6215d125bef04 [prototype]: https://code.tvl.fyi/tree/nix/dependency-analyzer/default.nix?id=805219a2fad0edac10d046fc5ad5820edb4482ee#n10 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [tvix] string contexts vs. reference scanning 2023-01-11 11:49 ` sterni @ 2023-01-11 12:20 ` Vincent Ambo 0 siblings, 0 replies; 7+ messages in thread From: Vincent Ambo @ 2023-01-11 12:20 UTC (permalink / raw) To: sterni, depot sterni <sternenseemann@systemli•org> writes: > One thing I'd want to see answered is how to handle import from > derivation. Files enter an evaluation through the `EvalIO` interface, which is passed the path(s) to read and returns the contents or store path. In the actual implementation of `EvalIO` that supports store interactions, store path access is detected and blocked/suspended until build completion. At the time that this calculation occurs, we already called builtins.derivation or equivalent on the thing yielding that path, so the build is already running. > There are also similar occasions where things get realised while > evaluating (interactively?) except reading / importing, but I don't have > a very good handle on those yet. I'm not sure what those would be, but there's nothing the evaluator can do to read from disk that wouldn't go through EvalIO. > 1. You can discard some using builtins.unsafeDiscardOutputDependency. This doesn't really discard the context, it changes the _type_ of the context entry. See cl/7807. I believe that this is unnecessary, all uses of it in nixpkgs are a hack. > My _unconfirmed_ theory is that this was a quick and easy workaround > that was implemented without considering the underlying problem. In my > view, there is no reason why `drvPath` should incur a reference to all > outputs of the derivation as well as the derivation file itself (I > think this is thanks to the reference scanner the store runs after the > fact which determines if the derivation and/or any of its outputs are > actually referenced). Well, the drv file brings everything inside of it into scope (that includes its output paths, which are part of the serialised representation). Otherwise referencing a drv file would make "dead paths" available in the builder. It's conceptionally reasonable that this closure is yielded, I think. It seems odd to me that it involves store database queries, though. All the information should be *in* the drv file. In Tvix, if a drv path is referenced then that drv was also part of the evaluation, so we *should* already know all of that information without running any queries. > - To drop wrongfully retained string context. All string > operations retain string context, even though some actually > destroy any reference that was present in the string. This is basically a workaround for how contexts work in C++ Nix. It's not a problem if contexts are replaced by scanning. > - As an escape hatch from references to the derivations in question. > We use this in //nix/buildkite: Yes, but we use it to work around a problem that we only have because of the model of C++ Nix. In Tvix this whole thing would work differently (the evalution itself drives the creation of the build targets). We'd essentially have a buildkite-driver that uses Tvix to evaluate and yields the build targets as necessary, using a store (local or remote) as the synchronisation point. The other uses are covered in https://cl.tvl.fyi/7807 and I think fairly uncontroversial. My gut feeling is that we can get away with not implementing any of the destructive string context operations, and if we _really_ end up needing unsafeDiscardStringContext, it's not terribly difficult to implement (could be a `Value` variant containing another Value that just acts as a marker). //V ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [tvix] string contexts vs. reference scanning 2023-01-09 22:07 ` [tvix] string contexts vs. reference scanning Vincent Ambo 2023-01-11 11:49 ` sterni @ 2023-03-16 9:41 ` Vincent Ambo 2023-03-16 12:00 ` Florian Klink 1 sibling, 1 reply; 7+ messages in thread From: Vincent Ambo @ 2023-03-16 9:41 UTC (permalink / raw) To: Adam Joseph, depot; +Cc: Florian Klink Okay, there's some (for me) new information on string contexts that will require some careful handling if we want to proceed with reference scanning. Let me preface by saying that despite this problem, reference-scanning for inputs yields perfectly functional, *equivalent* but not *identical* derivations to Nix. We do currently consider it a problem because we want to be fully hash-equal with C++ Nix, both to prove that our implementation is correct and to make use of Hydra's cache. Now, for the problem: There are some real-life cases, for example during nixpkgs bootstrapping, where multiple different fixed-output derivations are written to produce the same hash. For example, bootstrap sources that are downloaded early are fetched using a special "builder hack", in which the `builder` field of the derivation is populated with the magic string `builtins:fetchurl` and the builder itself will perform a fetch, with everything looking like a normal derivation to the user. These bootstrap sources are later on defined *again*, once `curl` is available, to be downloaded using the standard pkgs.fetchtarball mechanism, but yielding the *same* outputs (as the same files are being fetched). In our reference scanning implementation, this output scanning of FOD will yield whatever the *first* derivation was that produced the given path as the drv to be stored in the `inputDrvs` field of the derivation. There's an orthogonal problem which made this confusing to understand, where C++ Nix has some special logic for how it hashes derivations that use fixed-output paths, which we haven't fully replicated yet. This led to hash differences which masked this underlying problem (the differences still exist, and are a separate issue). I discussed this with Adam yesterday and he suggested an approach similar to `builtins.placeholder`, which would only be in effect for fixed-output derivations. I think this is feasible but haven't sketched anything yet. Either way, for me this also raises the thought of whether we should decouple Tvix's internal representation of a derivation from that of C++ Nix and only "materialise" C++ Nix derivations (and accompanying hashes) where needed. Something to think about ... //V ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [tvix] string contexts vs. reference scanning 2023-03-16 9:41 ` Vincent Ambo @ 2023-03-16 12:00 ` Florian Klink 0 siblings, 0 replies; 7+ messages in thread From: Florian Klink @ 2023-03-16 12:00 UTC (permalink / raw) To: Vincent Ambo; +Cc: Adam Joseph, depot […] >Let me preface by saying that despite this problem, reference-scanning >for inputs yields perfectly functional, *equivalent* but not >*identical* derivations to Nix. We do currently consider it a problem >because we want to be fully hash-equal with C++ Nix, both to prove >that our implementation is correct and to make use of Hydra's cache. […] >There's an orthogonal problem which made this confusing to understand, >where C++ Nix has some special logic for how it hashes derivations >that use fixed-output paths, which we haven't fully replicated yet. >This led to hash differences which masked this underlying problem (the >differences still exist, and are a separate issue). > >I discussed this with Adam yesterday and he suggested an approach >similar to `builtins.placeholder`, which would only be in effect for >fixed-output derivations. I think this is feasible but haven't >sketched anything yet. I'm not sure this will be a problem in practice, at least when it comes to "being able to substitute from binary caches". For both fixed-output derivations with recursive and flat hashes, the ATerm of the derivation we finally ended up picking is not ending up in the info that's used to calculate the output hash. In the case of a recursive sha256 FOD, a `source:sha256:$narDigest:$storeDir:$outputName` is used as a fingerprint: https://cs.tvl.fyi/depot@def2d32319e5d70b21a799d463d07d880f5ff207/-/blob/tvix/nix-compat/src/derivation/mod.rs?L256#tab=references In all other cases, a `derivation_or_fod_hash` is used, and baked into a `output:$outputName:$derivation_or_fod_hash:$storeDir:$outputName` fingerprint: https://cs.tvl.fyi/depot@def2d32319e5d70b21a799d463d07d880f5ff207/-/blob/tvix/nix-compat/src/derivation/mod.rs?L269#tab=def However, for /all/ FODs, `derivation_or_fod_hash` is calculated by the `fod_digest()` function: https://cs.tvl.fyi/depot@def2d32319e5d70b21a799d463d07d880f5ff207/-/blob/tvix/nix-compat/src/derivation/mod.rs?L172#tab=def … which internally uses a `fixed:out:$hashWithModeAsNixHashString:$outputPath` as inputs into the sha256 function. This means, even if we end up mapping the "wrong" fixed-output derivation somewhere, the output hashes will still be the same. And as lookups from the binary cache ask for the "compressed hash" / `StorePath.digest` [^1] only, and the path of the .drv doesn't matter, it shouldn't be a problem when it comes to substitute from the binary cache. So I'm not sure if all this placeholder logic is really necessary to implement at all. VWe can compare evaluation to be equivalent by simply calculating the resulting output paths, both for Tvix and Nix, and use this to validate the evaluator does the same. >Either way, for me this also raises the thought of whether we should >decouple Tvix's internal representation of a derivation from that of >C++ Nix and only "materialise" C++ Nix derivations (and accompanying >hashes) where needed. Something to think about ... I personally don't think we want to, or should materialize .drv files in /nix/store at all. The usecases for this that I've heard about so far are mostly attempts to schedule /around the Nix scheduler/, or somehow extract "build closures". We didn't recently discuss much on how we want a Tvix scheduler/builder interface to look like, but I personally would like it to treat some of the Nix-specific details in a very opaque fashion. It doesn't need to know how, e.g., the Nix sandbox env vars are populated, or how some desired output paths have been calculated. It can treat these as opaque strings to pass along to the build environment, or to set up mountpoints as such, but doesn't understand how they are produced. These values can be calculated/produced by the thing that still has all the context (pun intended) about Derivation construction. That way, the builder would be more generic, and we could also play with different path calculation modes without having to touch its implementation. -- flokli [^1]: https://cs.tvl.fyi/depot@def2d32319e5d70b21a799d463d07d880f5ff207/-/blob/tvix/nix-compat/src/store_path.rs?L40 ^ permalink raw reply [flat|nested] 7+ messages in thread
* reference-scanning inputDrvs/inputSrcs [not found] ` <20221202152213.3a59e629@ostraka> 2023-01-09 22:07 ` [tvix] string contexts vs. reference scanning Vincent Ambo @ 2023-01-10 20:20 ` Adam Joseph 2023-01-10 20:48 ` Vincent Ambo 1 sibling, 1 reply; 7+ messages in thread From: Adam Joseph @ 2023-01-10 20:20 UTC (permalink / raw) To: depot Quoting Adam Joseph (2022-12-02 15:22:13) > I've written a tool (using the awesome libnixstore crate) that calculates > drv.inputDrvs using scanForReferences instead of string contexts, and compares > the calculated results to what cppnix produced.... In a week or two I'm > going to clean it up and write a post about it on the nixos discourse Just want to mention that I haven't forgotten about this, but it is on hold for a while because of two things: 1. Much as I loathe C++, I should implement this in cppnix first rather than in tvix because: a. If I don't get it exactly right on the first try, tvix will get the blame for any behavioral differences, and that is not cool. The implementation is straightforward and obvious except for placeholders and unsafeDiscardOutputDependency -- those are subtle, and I can't be sure I'll get them right on the first try. b. Cppnix is already in production use. It can calculates scanned-inputDrvs *in addition to* string-context inputDrvs, and emit a warning if they disagree. Having no warnings on a large production codebase (nixpkgs, maybe even Hydra) for a few months is the sign that it ready for an RFC. 2. Nix will need scanned-inputDrvs anyways in order for Ericson2314's "EBB" derivations-that-emit-derivations-without-using-nixlang to be practically useful. So I am going to wait for nix#3959 to merge before I try to implement scanned-inputDrvs for cppnix. I will be much easier to convince people of its usefulness at that point. https://github.com/NixOS/nix/pull/3959 https://github.com/NixOS/nix/pull/3959#issuecomment-1375508978 You may not hear much from me for the next month or two. I have been dragged into a really enormous project that I have been wanting to do for a long time: https://github.com/NixOS/nixpkgs/pull/209870 Getting such a major change approved only happens if it resolves some kind of excruciating pain. Fortunately that excruciating pain is present right now: https://github.com/NixOS/nixpkgs/issues/108305 So #209870 is my top priority right now because at the moment it is very easy to convince people that it is useful. - a ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: reference-scanning inputDrvs/inputSrcs 2023-01-10 20:20 ` reference-scanning inputDrvs/inputSrcs Adam Joseph @ 2023-01-10 20:48 ` Vincent Ambo 0 siblings, 0 replies; 7+ messages in thread From: Vincent Ambo @ 2023-01-10 20:48 UTC (permalink / raw) To: Adam Joseph, depot On 1/10/23 23:20, Adam Joseph wrote: > If I don't get it exactly right on the first try, tvix will get the blame > for any behavioral differences, and that is not cool. I think we have thick enough skin to deal with this ;) My plan is to go ahead anyways and implement this string-scanning based solution. We're fairly close to producing arbitrary derivation hashes, and as we're serialising derivations into the same format as C++ Nix we'll be able to use existing diff tools to figure out discrepancies when computing hashes over the whole package set. I think this will allow us to figure out problems quite quickly! We can always resort to implementing string contexts if necessary. > The implementation is straightforward and obvious except for placeholder > and unsafeDiscardOutputDependency placeholders should fit into how we're envisioning the rest of the machinery pretty well, we'll see. context discarding builtins are interesting, I have a working theory that the use-cases in which they are currently used simply do not apply in Tvix (and we can implement them as no-ops that emit warnings), and will probably write up another post about that. > I have been dragged into a really enormous project that I have been wanting > to do for a long time: > > https://github.com/NixOS/nixpkgs/pull/209870 Cool to see work on bootstrap infrastructure. Hoping we can (in the not so far future) also collaborate with the Guix folks on that ... Good luck with the project! //V ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-03-16 12:00 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CANHrikrEDPkH1raGDGAGETeATrWOJ=sBQCUXr6=pHJm1ajbd0A@mail.gmail.com> [not found] ` <20221202152213.3a59e629@ostraka> 2023-01-09 22:07 ` [tvix] string contexts vs. reference scanning Vincent Ambo 2023-01-11 11:49 ` sterni 2023-01-11 12:20 ` Vincent Ambo 2023-03-16 9:41 ` Vincent Ambo 2023-03-16 12:00 ` Florian Klink 2023-01-10 20:20 ` reference-scanning inputDrvs/inputSrcs Adam Joseph 2023-01-10 20:48 ` Vincent Ambo
Code repositories for project(s) associated with this public inbox https://code.tvl.fyi This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).