GardenerAether
@GardenerAether

ive been doing a bit of thinking about nixos errors. and like. to be clear, im sure that if nixos devs could make more useful error messages that they obviously would. and i myself am a pretty big fan of nixos, to the point of straight up evangelization in private circles

im interested though in finding out why nix's error messages dont end up working most of the time (especially for beginners)

i have a handful of observations: errors are very sparse, seem largely unrelated to the actual mistake, expect too much knowledge of the underlying libraries, stack traces are quite large, individual points in a stack trace seem to vary wildly in relatedness, and theres an abundance of cryptic error messages

but that doesnt do much to explain why thats the case. its not helpful to just point out "look! bad thing about language!". at the same time though, this is kind of where i have to turn to other people; i cant come up with a single clear or concise explanation as to why

so in terms of solutions, my analysis kind of runs dry here. the right answer is probably the unsatisfyingly complicated bunch of different only-slightly-related answers; occam's razor may simply not be designed to shave these hairs. again, im sure that if nix devs could make the errors communicate more useful information, they already would have, and attempts have been made.

like i already said, i love nix and nixos; out of all the possible distros i could have used based on asahi linux, i chose nixos, basically knowing that itd probably be the most difficult choice. but my experience with nixos previously had told me it was worth it, and i still think it was. i dont regret my decision whatsoever. theres a lot that nix does really well. nix is, to use some deeply professional phrasing ive learned over the years, really fucking swagger

but i think that having discussions like this- talking about what languages and tools do wrong- is as important as discussing what they do right, so long as each is framed from the perspective of why

so im interested. anybody else with experience with nix, or any other programming languages that either produce immediately useful or decidedly useless error messages wanna chime in?

what makes a good error system? what do you think makes an error system bad, and what do you think can be done to make a good error system?

ill definitely be taking any discussion into consideration when working on my own language, and im sure plenty of others will too, so dont think that your comments are getting thrown into the void, opinionated or not! lets try and make better error systems that programmers can better rely on when they make the sillies :3

(i pinkie promise one day ill bother to learn css and youll have prettier posts to look at)



in reply to @GardenerAether's post:

I can't comment about nix specifically, but one thing I'd like to see revived in error systems is the ability to handle errors interactively and without unwinding. Back in the DOS days, if you had a filesystem error, DOS would pause the program, report the error on the console, and ask the user to "Abort, Retry, Fail?" We all made fun of that back in the day, but if you could insert the right floppy disk, plug the serial cable back in, or whatever, you had a chance to retry the operation successfully without the rest of the program knowing anything even happened.

Now, with multitasking systems you can't realistically stop the whole system every time an error occurs, and Unix generally has a focus on scriptability and batch processing that makes stopping and waiting for user input undesirable a lot of the time. But I still think there are a lot of benefits to that model where error handling notionally occurs in the context where the error condition occurred. Exception-style error handling, as well as its value encodings such as returning an error code, Maybe, or Either, immediately loses the context of where the error occurred since you pass the error condition upwards trying to find someone willing to handle or report it. That makes a lot of intuitive error handling approaches, such as retrying the operation after the error condition goes away, more difficult because not only do you need to manually write the code path that restarts the operation, but that code path needs to begin the operation again from scratch, since the original attempt has been unwound.

In terms of current mainstream language design trends, I think it'd be valuable to treat an error handling situation more like an await than a throw, offering the error up to a handler to address the condition and potentially resume the failed operation in-place, unwinding only in a situation where the error couldn't be addressed. Common Lisp's condition system is often cited as a gold standard for that sort of in-place error handling paradigm. In a more forward-looking language design, I think algebraic effect handlers could represent this sort of error handling well too.
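
To make that a bit more concrete, here is a minimal Python sketch of the "offer the error to a handler and resume in place" shape. It is only an illustration under made-up assumptions, not how Common Lisp's condition system or any effect-handler implementation actually works: `handling`, `signal`, `read_sector`, and the `"retry"` restart name are all hypothetical, and the drive is just a dict.

```python
from contextlib import contextmanager

# Hypothetical names throughout; this is a sketch, not a real library.
handlers = []  # innermost handler is last


@contextmanager
def handling(handler):
    """Install `handler` for the duration of the with-block."""
    handlers.append(handler)
    try:
        yield
    finally:
        handlers.pop()


class Unhandled(Exception):
    """Raised only when no handler could address the condition."""


def signal(condition):
    """Offer `condition` to the handlers, innermost first, without unwinding.

    A handler returns a restart name such as "retry", or None to decline.
    """
    for handler in reversed(handlers):
        choice = handler(condition)
        if choice is not None:
            return choice
    raise Unhandled(condition)


def read_sector(drive, sector):
    """Read a sector, asking the handlers what to do if the disk is missing."""
    while True:
        if drive["disk_inserted"]:
            return f"data from sector {sector}"
        # The loop is still live here, so "retry" resumes in place.
        if signal("no disk in drive") != "retry":
            raise Unhandled("unknown restart")


def insert_disk_and_retry(drive):
    """Build a handler that fixes the drive and asks for a retry."""
    def handler(condition):
        if condition == "no disk in drive":
            drive["disk_inserted"] = True  # stands in for the user fixing it
            return "retry"
        return None
    return handler


drive = {"disk_inserted": False}
with handling(insert_disk_and_retry(drive)):
    result = read_sector(drive, 7)
print(result)
```

On the recoverable path `read_sector` never unwinds: the handler runs while the failed call is still on the stack, so only the failed step is repeated. `Unhandled` is raised only when nobody claims the condition.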

!!

absolutely this. more reasons for me to hate unix i guess, lol. being able to more interactively debug programs and figure out errors probably wouldnt be a major boon for users, but done in enough depth it could be massive for the actual developers behind them!!

you bring up a really good point here about how often times we can't just consider good error systems provided by the language, because very often we use custom error mechanisms during runtime as well

i really wanna try sketching up something a bit more like what you described in haskell now. free monads arent exactly fast, but probably a good place to experiment with smth like this in since free monads basically encapsulate the idea of a language. ill let ya know what i can find!

I think part of the issue is that some nix devs would not make error messages better if they could. I had a friend try to make UI improvements and one of the maintainers nitpicked it to death and then reverted the changes after it was merged and retroactively changed the contributor rules to justify the behavior. They're actively driving away contributors who want to work on this improvement.

while i generally try to avoid cynicism, its of course disheartening every time stories like this come up. maintainers often turn out to be shitty people, or have some unfathomable god complex that makes them impossible to work with and throws other devs under the bus

best case scenario the community splinters between one group which (usually for good reasons) wants to do its thing anyways and another that (usually for bad reasons) tries to salvage whatever's left of the project's original developer and maintainer base (prism launcher fits this type of situation to a Tee²)

worst case scenario, absolutely nothing happens except maintainers show they dont actually care that much about the contributions of their fellow developers and post-talk-justify the whole thing (sounds like what you mentioned)

oss maintainers, while not universally, often sort of behave like an owning class

i guess part of the problem then is actually a social issue rather than a technical one. even if we wanted to improve upon existing projects, development can often be stunted by things like this

culture has to be very delicately cultivated, but i feel like, in a great twist of irony, even that too is much easier to do with startup projects. in the same way that python, haskell, scala, rust, and so on have all formed their own little programming subcultures, hopefully future language designers (including myself) can weigh the social aspect a bit more and try to guide that community to one more receptive and open to change

thanks for engaging!! :3

another more philosophical take i have on error management is that it's often beneficial to treat error conditions as part of the normal domain of an operation rather than as a side channel to be "handled". i had written a blog post a while back about this in the context of compiler diagnostics: https://duriansoftware.com/joe/constructing-human-grade-parsers

if you treat a parser result in the classic Either ParseError AST manner, then that tends to encourage the classically awful "expected TOKEN_1 or TOKEN_2" sorts of parser errors. but if you can incorporate the error condition into the AST itself, then you can collect multiple errors in one pass, letting the developer fix them all in one go before running the compiler again, and those errors also have more context in the compiler's program state, making it easier to do more advanced error reporting such as offering fix-it notes.
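
a tiny python sketch of the shape i mean, with everything in it made up for illustration (`Num`, `ErrorNode`, `parse_items`, and a toy grammar of comma-separated integers). a real parser would need much smarter recovery than "skip to the next comma":

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class ErrorNode:
    """An error stored in the tree, at the spot where parsing failed."""
    text: str
    position: int  # offset where the bad item starts in the source
    message: str


def parse_items(source):
    """Parse comma-separated integers, carrying on past bad items."""
    nodes = []
    position = 0
    for raw in source.split(","):
        text = raw.strip()
        try:
            nodes.append(Num(int(text)))
        except ValueError:
            message = f"expected an integer, got {text!r}"
            nodes.append(ErrorNode(text, position, message))
        position += len(raw) + 1  # +1 for the comma
    return nodes


def diagnostics(nodes):
    """Collect every error node found in the tree."""
    return [node for node in nodes if isinstance(node, ErrorNode)]


tree = parse_items("1, two, 3, 4x")
for err in diagnostics(tree):
    print(f"offset {err.position}: {err.message}")
```

both bad items get reported from one parse, and the good items on either side are still in the tree for later passes to use.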

you could also look at this approach in other situations. if the result of a "download file" operation is just Either NetworkError [Byte], then there's not much you can do in response to the NetworkError case besides try the download all over again. but if you produce a PartialFile type, whose representation includes the ability to represent missing or partial bytes along with error conditions associated with them, then your program has the ability to process the partial bytes you did receive, or resume the download to retry getting only the missing parts instead of restarting the download completely.
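
roughly, and again only as a sketch: the `PartialFile` below, its fields, the fixed chunk size, and the `flaky_fetch` stand-in for a network call are all assumptions of mine, not any real download library's api.

```python
from dataclasses import dataclass, field


@dataclass
class PartialFile:
    """Bytes received so far, plus errors for the chunks that failed."""
    size: int
    chunks: dict = field(default_factory=dict)  # offset -> bytes
    errors: dict = field(default_factory=dict)  # offset -> error message

    def missing(self, chunk_size):
        """Offsets of the chunks we do not have yet."""
        return [off for off in range(0, self.size, chunk_size)
                if off not in self.chunks]

    def is_complete(self, chunk_size):
        return not self.missing(chunk_size)


def download(fetch_chunk, partial, chunk_size):
    """Fetch only the missing chunks, recording failures in-band."""
    for offset in partial.missing(chunk_size):
        try:
            partial.chunks[offset] = fetch_chunk(offset, chunk_size)
            partial.errors.pop(offset, None)
        except ConnectionError as exc:
            partial.errors[offset] = str(exc)
    return partial


# A fake server whose second chunk fails on the first attempt.
DATA = b"hello world!"
failed_once = set()


def flaky_fetch(offset, length):
    if offset == 4 and offset not in failed_once:
        failed_once.add(offset)
        raise ConnectionError("connection reset")
    return DATA[offset:offset + length]


partial = download(flaky_fetch, PartialFile(size=len(DATA)), 4)
missing_after_first = partial.missing(4)   # only the failed chunk
partial = download(flaky_fetch, partial, 4)  # resumes, fetching just that chunk
```

after the first pass the program can already work with the two chunks it has, and the second call only asks for the one that failed.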

i don't have many ideas or prior art though to point to as far as precedent for systematizing this sort of in-band error management approach, since it's (or at least to me it seems) inherently domain-specific how and to what extent you can incorporate error states into your "normal" values

thanks for the link to your post! ill definitely be reading through whenever i get a moment

ive thought about this sort of thing a couple times. even just the idea of a stack trace always felt weird to me. we really should be using more sophisticated tools by now, at least for functional programming languages. even just embedding the error information within a tree gives a lot more away, from both the context its placed within and the context surrounding it

i feel like this kind of approach to designing error systems integrates really well with what you said previously about treating errors more with an await. treating errors less as an alternative failure state, and instead more as a partial result is a lot more practical. within a runtime context, such as an error system built into a game engine, treating errors more as not-yet-fulfilled obligations could make it a lot easier to interactively explore the underlying problem spaces

curiously, and this might look more like deranged ramblings, but i actually think that the multiplicative form of disjunction within MLL type theories gets close. if haskell had negation types, some type of the form \a -> Free ((->) (Not a)) could be worth investigating. its not quite right, namely in that it sort of assumes that there is always a component of error (which i guess isnt entirely unreasonable either), but its something worth mentioning i think, as it could at least give a hint or two as to what the error systems we're wanting should actually look like

I've been thinking about errors for typecheckers for a few years now

I wrote a lot of words a year or so ago, I uhh don't want to read it myself and don't necessarily think you should either: https://cofree.coffee/~verity/comprehensive_errors.html

the gist was to try to provide parallel errors when possible, and to go about it algebraically to provide a bigger picture
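
One possible reading of "parallel errors", as a small Python sketch that is not taken from the linked write-up: check both operands independently and combine their error lists, instead of stopping at the first failure. The toy language (ints, bools, and `("add", left, right)`) and the `check` function are hypothetical.

```python
def check(expr):
    """Return (type, errors) for a toy expression language.

    Expressions are ints, bools, or ("add", left, right) tuples.
    """
    if isinstance(expr, bool):  # test bool first: bool is a subclass of int
        return "Bool", []
    if isinstance(expr, int):
        return "Int", []
    op, left, right = expr  # only ("add", left, right) in this sketch
    left_type, left_errors = check(left)
    right_type, right_errors = check(right)
    # Combine the errors from both sides rather than stopping at the first.
    errors = left_errors + right_errors
    for side, found in (("left", left_type), ("right", right_type)):
        if found != "Int":
            errors.append(f"{side} operand of {op} should be Int, found {found}")
    # Recover by claiming the intended result type, so the parent keeps going.
    return "Int", errors


_, errors = check(("add", True, ("add", 1, False)))
for message in errors:
    print(message)
```

The errors form a list under concatenation, which is about the simplest algebraic structure for this. Both the outer and the inner mistake get reported from a single check.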

but that's for a typechecker – for a runtime, things are different

and on a totally different note: I'd love if all errors indicated what program/software they originated from, so you at least know what source code to grep to find it

(there should really be a service that looks at an error message and points you to the code that produced it, if it is OSS)

also obviously you want to separate form and content in errors … an error may be thrown by a framework, but with a message from the user of that framework …

I'll try to read the other comments when I have time! (/attention)

thanks for the link! ill give it a proper read through whenever i get a moment. i have Much to read >~<

i feel like better package manager integration with programming languages would probably help with reporting the origins of a particular error. ironically, this is something that nix technically has more potential in, but its not really usable in its pure form for software development

i think type checking could help in the case of a lot of silly bugs. dynamic typing tends to let errors propagate a long way through a system before they are caught...

as an example, "1st argument to stdenv.mkDerivation is missing attr pname" is a lot more informative than "error calculating attr name: missing attribute pname" followed by a 1000 line long stack trace that includes the module system itself
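
heres the same contrast as a python sketch. to be clear this isnt nix and isnt how nixpkgs works: `mk_derivation`, `build_name`, and the `REQUIRED` list are all made up, just to show a check at the boundary versus a failure deep inside

```python
# Hypothetical required attrs, made up for this sketch (not nixpkgs' rules).
REQUIRED = ("pname", "version")


def build_name(attrs):
    # Without the boundary check, a missing key would surface here as a
    # bare KeyError, a call away from the caller's actual mistake.
    return f"{attrs['pname']}-{attrs['version']}"


def mk_derivation(attrs):
    """Stand-in for a derivation builder that checks its argument up front."""
    missing = [name for name in REQUIRED if name not in attrs]
    if missing:
        raise TypeError(
            "1st argument to mk_derivation is missing attr " + ", ".join(missing)
        )
    return build_name(attrs)


try:
    mk_derivation({"version": "1.0"})
except TypeError as exc:
    message = str(exc)
print(message)
```

a static type checker would catch this before anything runs at all, but even the runtime check names the function and the argument instead of some internal lookup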

that ship has kind of sailed for nix though. integrating type checking now would require rewriting basically the entirety of nixpkgs, a task that, judging by how the community has handled flakes, would never end