Basically, how do we signal something went wrong when an integer overflow happens or some other assertion fails?
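For concreteness, this is what Rust currently does for the integer-overflow case specifically: arithmetic panics in debug builds and wraps in release builds unless configured otherwise, and the checked/overflowing methods report the problem as a value instead. A quick sketch:

```rust
fn main() {
    let a: u8 = 200;
    let b: u8 = 100;

    // Signal overflow as a value rather than a panic:
    println!("{:?}", a.checked_add(b));     // None
    println!("{:?}", a.overflowing_add(b)); // (44, true)

    // let c = a + b; // this would panic in a debug build (and wrap in release)
}
```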
Simple choice: exit the program without running destructors. This means we leave file handles, network sockets, etc. in bad states, though. Rust's `panic='abort'`, basically.

Next choice: run destructors on the main thread, then exit the program. This just leaves other threads hanging, though. Probably not worth considering.
That leads to doing what Rust does: Run destructors and exit thread. Other threads can watch for it, pull the panic value out of the thread's handle, etc. That gets complicated, aieeeeee. Might be worth thinking about more; what if "panic in thread" works differently from "panic in main thread?" You can pass arbitrary code into a thread though. I dunno!
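For reference, the "pull the panic value out of the thread's handle" part as it exists in Rust today, as a minimal sketch:

```rust
use std::thread;

fn main() {
    // The worker panics; the panic unwinds and kills only that thread.
    let handle = thread::spawn(|| {
        panic!("worker exploded");
    });

    // The parent observes it: join() returns Err carrying the panic payload.
    match handle.join() {
        Ok(()) => println!("worker finished fine"),
        Err(payload) => {
            // panic!("literal") carries a &'static str payload.
            if let Some(msg) = payload.downcast_ref::<&str>() {
                println!("worker panicked with: {}", msg);
            } else {
                println!("worker panicked with a non-string payload");
            }
        }
    }
    println!("main thread keeps going");
}
```

Part of the complication alluded to above is that nothing forces anyone to actually call join(), so a panicked worker can also just silently disappear.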
Either way we can't always promise to run destructors, but it's pretty nice to always run them when we can.
Panicking across FFI boundaries, as always, gets wacky. It would be nice for a function to be able to assert that it does not panic though, now that I think of it...
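For the FFI case, the usual trick on the Rust side is to catch the unwind at the boundary and report failure as an error code, so the panic never propagates into C. A sketch with a made-up function name and a made-up error convention:

```rust
use std::panic;

#[no_mangle]
pub extern "C" fn ffi_do_work(x: i32) -> i32 {
    // (Function name and the "-1 means failure" convention are invented for this example.)
    let result = panic::catch_unwind(|| {
        // ...the real work, which might panic...
        x.checked_mul(2).expect("overflow")
    });
    match result {
        Ok(v) => v,
        Err(_) => -1, // report failure instead of unwinding into the C caller
    }
}
```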
Interesting conversation and design stuff in this article: https://lobste.rs/s/8mqgal/different_ways_handle_errors_c
Zig has some interesting approaches on handling error traces and unwinding: https://ziglang.org/documentation/0.10.1/#Errors
One approach from the Rust world: https://bluxte.net/musings/2023/01/08/improving_failure_messages_rust_tests/
Isn't it expected that the OS will clean up any handles after a crashed application?
Also: https://lwn.net/Articles/191059/ ("Crash-only software...")
As to Zig's errors, they have one long-standing unresolved ticket that for now is mostly being kept swept under the rug: https://github.com/ziglang/zig/issues/2647 (see e.g.: https://github.com/ziglang/zig/issues/2647#issuecomment-1444790576)
And now it also reminds me that Araq (Nim creator) has some blogpost musings about error handling: https://nim-lang.org/araq/quirky_exceptions.html
> Isn't it expected that the OS will clean up any handles after a crashed application?
It will, but... hmmm, things like databases are the things that generally want to Really Maintain Consistency even in the face of errors, but they tend to do it not by cleaning up when an error occurs, but by structuring the data in such a way that it's guaranteed to be recoverable no matter when an error happens. Journals to be replayed, detection of partial writes, etc.
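A toy illustration of that "structure the data so it's recoverable" idea in Rust, with a made-up file name and record format: append a length-prefixed record to a journal and sync it before touching the real data, so a crash at any point leaves either a complete record or a detectably torn one for recovery to skip.

```rust
use std::fs::OpenOptions;
use std::io::Write;

fn append_journal_entry(entry: &[u8]) -> std::io::Result<()> {
    let mut journal = OpenOptions::new()
        .create(true)
        .append(true)
        .open("journal.log")?; // hypothetical journal file
    // Length prefix lets a recovery pass detect a torn write at the tail.
    journal.write_all(&(entry.len() as u32).to_le_bytes())?;
    journal.write_all(entry)?;
    journal.sync_all()?; // make it durable before mutating the "real" data
    Ok(())
}
```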
So it's worth considering what kinds of things can and/or should be reasonable to guarantee in the face of an error. The main concerns for the language itself, rather than an application, are things that maintain internal consistency. Allocator state/memory leaks for example, or unlocking mutexes.
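For the mutex case specifically, Rust's current answer is lock poisoning: the guard's destructor still unlocks during unwinding, but the lock remembers that a panic happened while it was held. A small sketch:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let data = Arc::new(Mutex::new(0_i32));

    // A worker panics while holding the lock...
    let d = Arc::clone(&data);
    let _ = thread::spawn(move || {
        let mut guard = d.lock().unwrap();
        *guard += 1;
        panic!("died while holding the lock");
    })
    .join();

    // ...so the next user is told about it instead of silently seeing
    // possibly half-updated state.
    match data.lock() {
        Ok(guard) => println!("value: {}", *guard),
        Err(poisoned) => println!("poisoned, value was: {}", *poisoned.into_inner()),
    }
}
```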
So, something to think about for aborting is that one key difference between Rust's `panic=unwind` and `panic=abort` is that unwinds can be caught (i.e. at thread boundaries), and aborts cannot. Aborting is a one-way trip; once the panic has happened there's no way for control flow to enter back into the original program. But it doesn't necessarily have to be an "immediately ask the OS to kill this process" trip; it could in fact preserve the call stack, walk down it, and print out a backtrace. However, it can't run destructors, because that would make it no longer a one-way trip. Hell, this "diverge control flow without touching state" is basically what the OS already does when it saves a core dump.

So really, the hard part is maintaining consistent state. If you abort, you don't have to care about maintaining consistent state anymore, because you're about to blow it up. If you have a way to catch unwinds, or otherwise deal with threads that may panic without letting them take down the entire process, then maintaining consistent state (i.e. trying to unlock mutexes, not leak memory, etc.) becomes Problematic.
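In Rust terms, the catchability difference looks like this; it only does anything under `panic=unwind`, since under `panic=abort` the process is gone before there is anything to catch:

```rust
use std::panic;

fn main() {
    // Under panic=unwind this reifies the panic and control flow resumes here.
    // Under panic=abort the closure's panic kills the process instead.
    let result = panic::catch_unwind(|| {
        panic!("some assertion failed");
    });
    println!("caught a panic: {}", result.is_err());
}
```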
I guess I'm kinda restating what has already been said. Panics are simple and nice because they are unrecoverable; once you start trying to recover from them, shit gets complicated. Erlang deals with it by not sharing state, but making it easy to encapsulate state. Part of the whole appeal of threads is being able to isolate state, but... threads also really suck at that, because they don't have the guard rails that exist around OS processes. So maybe the answer is just "threads aren't the right tool for isolating state".
Maybe "panic program" and "panic thread" can be separate operations, somehow? That has some appeal. From their own point of view, each diverges, but something outside of it (the OS or the parent thread(???)) is free to do some kind of cleanup.
Zig looks worth learning from: https://andrewkelley.me/post/zig-stack-traces-kernel-panic-bare-bones-os.html
Hi ~icefox, not sure if it is at all in the design space you are going for, but exceptions as algebraic effects is definitely another strategy. I saw somewhere on your wiki that you had looked into Koka - I think Koka is neat, but kind of hard to learn how algebraic effects work from (definitely for the reasons you outlined there, wrt. Koka's model being non-local control flow and side-effect tracking smushed together). If you're interested in getting a handle on them (if you don't have one already), I really recommend checking out the Effekt language and their language design papers; I found them way easier to read than the Koka ones: https://effekt-lang.org/publications.html
The concrete benefit of an algebraic effects approach to exceptions is that, because algebraic effects are a unification of all non-local control flow, you get reasonable semantics around their interactions with threading, async, and freeing things for free. You would very likely lose out on some performance though, and how to make syntax around algebraic effects reasonably understandable still seems to be a work in progress for sure. This would also let you convert between values as errors (Result[T, E] etc.) and exceptions as errors (raise E) fairly easily, but you could do that without an algebraic effects system - I think Swift has some interesting work in the design space around this with their `try` stuff, which I think Rust has been working on stealing.
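For what it's worth, both directions of that value-vs-raise conversion already exist in today's Rust, just without any effect tracking around them; a rough sketch:

```rust
use std::panic;

fn main() {
    // panic -> value: catch_unwind reifies a panic as a Result.
    let as_value = panic::catch_unwind(|| -> i32 { panic!("raise-style error") });
    println!("got Err? {}", as_value.is_err());

    // value -> panic: expect/unwrap turn an error value back into a panic.
    let r: Result<i32, &str> = Err("value-style error");
    let _ = panic::catch_unwind(move || r.expect("re-raised as a panic"));
}
```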
Also, algebraic effects would allow you to assert that a function does not panic. In the system of Effekt, all effects aside from those on first-class functions are inferred, but you can assert a function to be pure by appending `/ ()` (or something along those lines - I forget the exact syntax) after the type. You could also definitely do this with just a side-effect tracking system, though.
It's certainly in the general region of the design space, though I kinda want to avoid them if I can simply for the sake of Garnet's "don't be a research project" goal -- which I may have already failed by now, admittedly. My thoughts on effects in general are in the first couple sections on https://man.sr.ht/~icefox/garnet/properties.md, so that's probably something like what you saw. I haven't looked much at Effekt though, that sounds like something I should do.
My main beef with effects as usually discussed in Academic Things is that they really like their non-local control flow, and I really would like to avoid that as much as I can for Garnet. It makes borrow checking hard, it makes reasoning about programs hard (or litters your programs with annotations), it makes your compiler more complicated (a la Rust async), and so on. If I wanted a unification of non-local control flow I'd already have Lua-like coroutines, but decided not to for basically those reasons. It's a great solution for a managed language that is not Garnet.
However... I might see what you're getting at. Threads already involve non-local control flow, so using property-like things to manage their effects could be interesting. Beyond just applying to functions, you could make threads that have properties like "doesn't panic" or "can't leak resources" or such, which could be pretty useful...