~icefox/garnet#47: 
ABI thoughts

Being able to interoperate with Rust libs would be awesome tbh. My current thoughts for this are "maybe we can define the ABI and submit patches to rustc to conform to it".

This is... uh, you know, insane, but I don't see a better way around this. If we define a Garnet ABI for each platform, then make LLVM able to target it, then you could have Rust and C compilers that generate code that links to Garnet code.

Related: #18 #14

Status: REPORTED
Submitter: ~icefox
Assigned to: No-one
Submitted: 1 year, 6 months ago
Updated: 18 days ago
Labels: T-THOUGHTS

Simon Heath (edited) 11 months ago

So reading through https://thephd.dev/binary-banshees-digital-demons-abi-c-c++-help-me-god-please again, part of the Surprise there is things unexpectedly having different ABIs. In one example:

extern int fll (long long value); 
extern int fi128 (__int128_t value);

These generate different asm code. In the C++ example, it's something cursed involving iterators and copy constructors.

So really, an ABI is fundamentally trying to define the answer to "what asm code do you generate to do X?" This also explains why it's inherently an optimization malus, because an optimizer wants to be able to answer that question with "whatever I feel like right now".

There's more subtle shit tho. Is making a struct bigger an ABI break? Yes, if you inline the smaller struct into registers and pass the bigger one via pointer. Ok, is making a struct bigger an ABI break if you only ever touch it through pointers? Yes, if you ever copy it and need to know how much to copy. Ok, you give it something like Swift's witness table that tells you how to copy it, what its initializer and destructor are, etc... now every ABI boundary involves dynamic dispatch. Annoying, right?
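To make the witness table idea concrete, here's a rough C sketch. Everything in it (witness_table, point, clone_value) is a made-up name for illustration, not Swift's actual layout or anything Garnet has defined. The caller only ever sees the table, so the struct can grow without the caller being recompiled, at the cost of an indirect call on every copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical witness table: what a caller needs in order to handle
   a value whose layout it cannot see. */
typedef struct {
    size_t size;
    size_t align;
    void (*copy)(void *dst, const void *src);
    void (*destroy)(void *val);
} witness_table;

/* Library side. Pretend `z` was added in a later release; callers that
   go through the table never hard-code sizeof(point). */
typedef struct {
    int x;
    int y;
    int z;
} point;

static void point_copy(void *dst, const void *src) {
    memcpy(dst, src, sizeof(point));
}

static void point_destroy(void *val) {
    (void)val; /* point owns no resources, so nothing to release */
}

const witness_table point_witness = {
    sizeof(point), _Alignof(point), point_copy, point_destroy
};

/* Caller side: duplicate a value using only the table.
   malloc is aligned enough for fundamental types; an over-aligned type
   would need an aligned allocator that honors wt->align. */
void *clone_value(const witness_table *wt, const void *src) {
    assert(wt != NULL && src != NULL);
    void *dst = malloc(wt->size);
    if (dst != NULL) {
        wt->copy(dst, src); /* the dynamic dispatch mentioned above */
    }
    return dst;
}
```

The caller frees the result itself after calling wt->destroy on it; a real design would probably put allocation behind the table too.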

So an incomplete list of what changes might break an ABI is:

  • Function names and mangling
  • How function args are passed
  • How objects are copied/moved
  • The memory allocation properties of an object (size, alignment, stride)
  • Details of creation and destruction (and gods know what else when it comes to allocators)
  • Inlining will totally fucker it because, well, it takes out any indirection -- so any change to an inlined function may break the ABI
  • Secret representation changes of closures, which Rust would hide behind impl Trait
  • Adding members to a sum type maybe?
  • Changing struct layout and members, naturally
  • Prolly changing the Garnet properties a function has, since the compiler may optimize things differently based on them, or may change calling conventions (for example, exception handling or leaf function calls)
  • Virtual function calls, if those become implicit a la Rust's dyn: adding or removing functions changes the vtable layout, and if you allow inheritance it also breaks anything that inherits from the changed type.
  • Heckin macros, of course
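For the "size, alignment, stride" item in the list above, here's a small C sketch of how those three numbers fall out of a field list, and how adding one field changes all of them. It assumes Swift-style definitions (size excludes tail padding, stride is size rounded up to alignment) and fields laid out in declaration order; Garnet's real layout rules aren't defined anywhere in this thread, and the names field, layout and layout_of are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Size and alignment of one field, in bytes. */
typedef struct { size_t size; size_t align; } field;

/* The allocation properties a caller would bake into its own code. */
typedef struct { size_t size; size_t align; size_t stride; } layout;

static size_t round_up(size_t n, size_t align) {
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* Lay fields out in order, padding each one up to its own alignment. */
layout layout_of(const field *fields, size_t count) {
    layout out = {0, 1, 0};
    for (size_t i = 0; i < count; i++) {
        out.size = round_up(out.size, fields[i].align) + fields[i].size;
        if (fields[i].align > out.align) {
            out.align = fields[i].align;
        }
    }
    /* Stride is the distance between consecutive array elements. */
    out.stride = round_up(out.size, out.align);
    return out;
}
```

With a 4-byte int followed by a 1-byte flag you get size 5, align 4, stride 8. Append an 8-byte field and it becomes size 16, align 8, stride 16, so anything that indexed into an array of the old type is now reading garbage.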

So part of the stuff from that post is making major language versions occasionally allowed to break the ABI. So, stdlib would be versioned, you'd have 2024 stdlib and 2030 stdlib and 2040 stdlib and such, and they are allowed to not only break the contents of the stdlib but break the ABI. This is a scary AF idea but the benefits, if done carefully, might be real handy.

Cripes this is giving me stress hives. What can we do?

Well, for one, ABIs only matter for DLLs. For static libs and exes, if the compiler knows what it wants to do, it can heckin do it; there's no real reason to worry too hard about distribution and reuse of static libs, and for compile time artifacts the only thing that really matters is that the compiler knows what to do with it. If you really can't build a static lib from source then it can follow the same ABI as a DLL, though I am increasingly convinced that distributing a proprietary static lib is always evil. For embedded systems that don't load code at runtime, none of this matters.

Ok, well that's something.

Simon Heath 11 months ago

After reading https://thephd.dev/to-save-c-we-must-save-abi-fixing-c-function-abi , if we're having mangled symbols anyway, is there any real reason we can't just include an ABI version number in them too? Doesn't seem like something to do lightly, and I haven't thought through all the implications of it, and the main time I have to think about versioned symbols is when glibc fucking breaks for some stupid reason.

But... Right now at 1 am it seems like a pretty neat idea.
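A rough sketch of what "ABI version in the mangled name" could look like, in C. The scheme here (_G<version>_<len><module>_<len><name>) is invented purely for illustration; it is not Garnet's mangling, and it is a lot dumber than ELF symbol versioning as used by glibc. The point is just that two ABI versions of the same function get different symbols, so a mismatch fails at link time instead of at 3 am in production.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scheme: _G<abi version>_<len><module>_<len><name>.
   Returns 0 on success, -1 if the buffer is too small. */
int mangle(char *out, size_t out_len, unsigned abi_version,
           const char *module, const char *name) {
    assert(out != NULL && module != NULL && name != NULL);
    int n = snprintf(out, out_len, "_G%u_%zu%s_%zu%s",
                     abi_version, strlen(module), module,
                     strlen(name), name);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Read the ABI version back out of a symbol.
   Returns -1 if the symbol does not start with the _G<number> prefix. */
int abi_version_of(const char *symbol) {
    unsigned version;
    if (sscanf(symbol, "_G%u_", &version) != 1) {
        return -1;
    }
    return (int)version;
}
```

Under this made-up scheme, mangle(buf, sizeof buf, 2, "vec", "push") writes _G2_3vec_4push, and a loader or linker could call abi_version_of on each import to decide whether it knows how to call it.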

~icefox a month ago

~akavel a month ago

I distinctly remember recently stumbling upon some page (or reddit comment?) elaborating on why ABI stability in C++ is not a good thing (AFAIR, one of the things listed was that they can't upgrade the data structures used in stdlib containers, e.g. hash map). Possibly in context of some question like "why Rust no haz stable ABI???". But I seem unable to find it in any of my half-broken bookmarking systems, so, sorry :( anyway, just wanted to put it here for completeness, not that I want to sway the decision in any direction. But again, anyway, can't find it at this point, so, uh... :( 🤷‍♂️

~icefox 18 days ago

Yeah an ABI is always gonna be a backwards compat issue, one way or another. Just look at Windows and all the things in it that can never be fixed. But it's also a gateway to... well, everything; just look at all the Rust libraries out there that have to present a C-like API to let other languages use them, and all the C++ libraries out there that can never be used from other languages 'cause they can't instantiate templates. Saying "we don't need an ABI" is one of those things where if you don't have it then people will re-invent it anyway, badly.

I vaguely remember reading something along the lines of "we don't need a stable ABI" as well, maybe it was this? https://blaz.is/blog/post/we-dont-need-a-stable-abi/ Upon actually reading that though, it's a lot more moderate than the title makes it sound. I more or less agree that "runtime type info" is the way to go.

More references from Rust:

By now I've been sorta low-key assuming that no language more complicated than C will have a perfect mapping to any ABI (and C isn't exactly low-complexity anyway). But instead of making a "Garnet ABI" that expresses everything Garnet does perfectly, it might be best to have a sorta least-common-denominator that is nonetheless more powerful than the existing least-common-denominator of C. The circle-lang link kinda takes a similar approach, trying to describe the most useful things instead of everything. It can have rules for expressing things like "can this function unwind or not", "is this value moved or borrowed", "is this value Copy", stuff like that, so a compiler and linker have more information to work with about how to lego together bits of code.
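As a sketch of what that extra information might look like on the wire, here's a C description of a per-function record carrying the three properties mentioned above. All the names (fn_info, param_info, pass_mode) are hypothetical, and where such a record would live (a symbol table section, a sidecar file, whatever) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How an argument is handed over: ownership transfer or a loan. */
typedef enum { PASS_MOVED, PASS_BORROWED } pass_mode;

typedef struct {
    pass_mode mode;
    bool is_copy; /* value can be duplicated bitwise, like Rust's Copy */
} param_info;

/* Hypothetical per-function ABI record a compiler would emit. */
typedef struct {
    const char *name;
    bool can_unwind;
    size_t param_count;
    const param_info *params;
} fn_info;

/* Does the caller need cleanup/unwind handling around this call? */
bool call_needs_cleanup(const fn_info *f) {
    assert(f != NULL);
    return f->can_unwind;
}

/* Is argument i still usable by the caller after the call returns?
   Borrowed values are; moved values are only if they are Copy. */
bool arg_usable_after_call(const fn_info *f, size_t i) {
    assert(f != NULL && i < f->param_count);
    const param_info *p = &f->params[i];
    return p->mode == PASS_BORROWED || p->is_copy;
}
```

A C or Rust compiler targeting this ABI wouldn't need to understand Garnet's type system, only these few bits per function and parameter, which is roughly the "more powerful least-common-denominator" idea.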
