~icefox/garnet#47: 
ABI thoughts

Being able to interoperate with Rust libs would be awesome tbh. My current thoughts for this are "maybe we can define the ABI and submit patches to rustc to conform to it".

This is... uh, you know, insane, but I don't see a better way around this. If we define a Garnet ABI for each platform, then make LLVM able to target it, then you could have Rust and C compilers that generate code that links to Garnet code.

Related: #18 #14

Status
REPORTED
Submitter
~icefox
Assigned to
No-one
Submitted
1 year, 2 months ago
Updated
7 months ago
Labels
T-THOUGHTS

Simon Heath (edited) 7 months ago

So reading through https://thephd.dev/binary-banshees-digital-demons-abi-c-c++-help-me-god-please again, part of the Surprise there is things unexpectedly having different ABIs. In one example:

extern int fll (long long value); 
extern int fi128 (__int128_t value);

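Calls to these generate different asm code, because the argument is passed differently: on x86-64 System V, for example, a long long fits in one register while an __int128_t takes two. In the C++ example, it's something cursed involving iterators and copy constructors.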

So really, an ABI is fundamentally trying to define the answer to "what asm code do you generate to do X?" This also explains why it's inherently an optimization malus, because an optimizer wants to be able to answer that question with "whatever I feel like right now".

There's more subtle shit tho. Is making a struct bigger an ABI break? Yes, if you inline the smaller struct into registers and pass the bigger one via pointer. Ok, is making a struct bigger an ABI break if you only ever touch it through pointers? Yes, if you ever copy it and need to know how much to copy. Ok, you give it something like Swift's witness table that tells you how to copy it, what its initializer and destructor are, etc... now every ABI boundary involves dynamic dispatch. Annoying, right?
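To make the witness-table idea concrete, here is a minimal sketch in C. Everything in it (`witness_table`, `point`, `clone_opaque`) is a made-up name for illustration, not Swift's actual layout or anything Garnet defines. The point is only that the caller never hardcodes the struct's size, so the struct can grow without breaking the caller, at the cost of an indirect call for every copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical witness table: what a caller needs to know to handle a
   value whose layout it cannot see. */
typedef struct {
    size_t size;
    size_t align;
    void (*copy)(void *dst, const void *src);
    void (*destroy)(void *obj);
} witness_table;

/* Library side. Pretend version 1 had only x and y, and z was added later. */
typedef struct { int x; int y; int z; } point;

static void point_copy(void *dst, const void *src) {
    memcpy(dst, src, sizeof(point));
}

static void point_destroy(void *obj) {
    (void)obj; /* nothing to release for a plain struct */
}

const witness_table point_witness = {
    sizeof(point), _Alignof(point), point_copy, point_destroy
};

/* Caller side: never mentions sizeof(point), so it keeps working when the
   struct grows. malloc's alignment is assumed to be enough here; a real
   implementation would have to honor wt->align. */
void *clone_opaque(const witness_table *wt, const void *src) {
    assert(wt != NULL && src != NULL);
    void *dst = malloc(wt->size);
    if (dst != NULL) {
        wt->copy(dst, src); /* dynamic dispatch on every copy */
    }
    return dst;
}
```

A caller would do something like `point *q = clone_opaque(&point_witness, &p);` and later `point_witness.destroy(q); free(q);`. Note that the table itself is now part of the ABI: reordering its fields breaks everything again.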

So an incomplete list of what changes might break an ABI is:

  • Function names and mangling
  • How function args are passed
  • How objects are copied/moved
  • The memory allocation properties of an object (size, alignment, stride)
  • Details of creation and destruction (and gods know what else when it comes to allocators)
  • Inlining will totally fucker it because, well, it takes out any indirection -- so any change to an inlined function may break the ABI
  • Secret representation changes of closures, which Rust would hide behind impl Trait
  • Adding members to a sum type maybe?
  • Changing struct layout and members, naturally
  • Prolly changing the Garnet properties a function has, since the compiler may optimize things differently based on them, or may change calling conventions (for example, exception handling or leaf function calls)
  • Virtual function calls, if those become implicit a la Rust's dyn: adding or removing functions changes the vtable layout, and if you allow inheritance it also breaks anything that inherits from the changed type.
  • Heckin macros, of course
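The vtable item in the list above can be shown with plain structs of function pointers. This is a sketch under assumptions: the `shape_vtable_*` types are hypothetical, and it assumes all function pointers have the same size, which holds on common platforms but is not promised by the C standard. Inserting a method in the middle moves the slot that old callers use; appending one at the end leaves the existing slots where they were.

```c
#include <assert.h>
#include <stddef.h>

/* Version 1 of a hypothetical interface. */
typedef struct {
    int (*area)(const void *self);
    int (*perimeter)(const void *self);
} shape_vtable_v1;

/* Version 2 inserts a method in the middle, so later slots shift. */
typedef struct {
    int (*area)(const void *self);
    const char *(*name)(const void *self);
    int (*perimeter)(const void *self);
} shape_vtable_v2;

/* Version 3 appends the new method instead. */
typedef struct {
    int (*area)(const void *self);
    int (*perimeter)(const void *self);
    const char *(*name)(const void *self);
} shape_vtable_v3;

/* Nonzero if code compiled against v1 would read the wrong slot in v2. */
int perimeter_slot_moved(void) {
    return offsetof(shape_vtable_v1, perimeter)
        != offsetof(shape_vtable_v2, perimeter);
}

/* Appending keeps every v1 slot at its old offset. */
void check_append_only_is_compatible(void) {
    assert(offsetof(shape_vtable_v1, area)
        == offsetof(shape_vtable_v3, area));
    assert(offsetof(shape_vtable_v1, perimeter)
        == offsetof(shape_vtable_v3, perimeter));
}
```

Append-only still breaks under inheritance, since a subtype that appended its own methods after the v1 slots would collide with the new one.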

So part of the stuff from that post is making major language versions occasionally allowed to break the ABI. So, stdlib would be versioned, you'd have 2024 stdlib and 2030 stdlib and 2040 stdlib and such, and they are allowed to not only break the contents of the stdlib but break the ABI. This is a scary AF idea but the benefits, if done carefully, might be real handy.

Cripes this is giving me stress hives. What can we do?

Well, for one, ABIs only matter for DLLs. For static libs and exes, if the compiler knows what it wants to do, it can heckin do it; there's no real reason to worry too hard about distribution and reuse of static libs, and for compile time artifacts the only thing that really matters is that the compiler knows what to do with it. If you really can't build a static lib from source then it can follow the same ABI as a DLL, though I am increasingly convinced that distributing a proprietary static lib is always evil. For embedded systems that don't load code at runtime, none of this matters.

Ok, well that's something.

Simon Heath 7 months ago

After reading https://thephd.dev/to-save-c-we-must-save-abi-fixing-c-function-abi , if we're having mangled symbols anyway, is there any real reason we can't just include an ABI version number in them too? Doesn't seem like something to do lightly, and I haven't thought through all the implications of it, and the main time I have to think about versioned symbols is when glibc fucking breaks for some stupid reason.

But... Right now at 1 am it seems like a pretty neat idea.
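As a rough sketch of what a version number in a mangled symbol could look like: the `_G<version>_<length><name>` scheme and the `mangle_versioned` function below are invented for illustration and are not Garnet's mangling (none is defined in this thread). The effect is that code built against a different ABI version fails to link instead of linking and misbehaving.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scheme: "_G" + ABI version + "_" + name length + name.
   Returns 0 on success, -1 if the output buffer is too small. */
int mangle_versioned(char *out, size_t out_len,
                     unsigned abi_version, const char *name) {
    assert(out != NULL && name != NULL);
    int n = snprintf(out, out_len, "_G%u_%zu%s",
                     abi_version, strlen(name), name);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

With this scheme, `draw` under ABI version 1 and `draw` under ABI version 2 become two distinct symbols, so a library could in principle export both during a transition.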
