~sircmpwn/hare#863: Unbounded array struct members aren't specified

> The type of each struct or union field shall have a definite, non-zero size.

harec allows the final member of a struct (or union) to be an unbounded array, for C compatibility. However, this isn't specified, and lots of details are unclear:

  • harec also allows this for unions (again, only for the last member), which I imagine is a bug and should be disallowed. (Then again, maybe we want to allow types with undefined size in unions generally? Such unions could only be used through pointers.)
  • harec allows the unbounded array member to be the only member of the struct, whereas C requires at least one other named member to be present. I don't think there's a technical reason for us to require this (but see below); either way, we should decide exactly what we want here.
  • C's semantics for flexible structs differ from Hare's. In Hare, a struct with an unbounded array member has undefined size, whereas in C, the struct's size is computed as though the flexible array weren't present, with the caveat that the struct can't be used as a member of another struct or as an array element. Put another way, stack-allocated flexible structs are allowed in C. (This is why at least one other member needs to be present: C doesn't allow zero-size types.) It would be great not to implement this, since it's unnecessarily complicated, but C functions may return a flexible struct by value (i.e. stack-allocated), so maybe we should? tbh I'm very slightly leaning towards still disallowing this.
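For reference, here is a minimal C sketch of the C semantics described in the last bullet (C99 or later). `struct msg` and its helpers are hypothetical names used only for illustration. It shows that `sizeof` ignores the flexible array member, that the full object is allocated as the header size plus the payload, and that C still accepts a header-only object on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flexible struct: a length plus a flexible array member. */
struct msg {
	size_t len;
	char data[]; /* contributes nothing to sizeof(struct msg) */
};

/* sizeof is computed as if data were absent (plus any trailing padding). */
size_t msg_header_size(void)
{
	return sizeof(struct msg);
}

/* Heap-allocate a msg with room for n bytes of trailing data. */
struct msg *msg_new(const char *src, size_t n)
{
	assert(src != NULL || n == 0);
	struct msg *m = malloc(sizeof(struct msg) + n);
	if (m == NULL)
		return NULL;
	m->len = n;
	if (n > 0)
		memcpy(m->data, src, n);
	return m;
}

/* C also accepts a flexible struct on the stack. Only the header
 * exists, so m.data has no storage and must not be accessed. */
size_t msg_stack_len(void)
{
	struct msg m = { .len = 0 };
	return m.len;
}
```

The last function is exactly the case Hare currently rejects: the declaration compiles in C, but nothing stops later code from writing into `m.data` past the end of the object.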
Status: REPORTED
Submitter: ~sebsite
Assigned to: No-one
Submitted: 1 year, 4 months ago
Updated: 9 months ago
Labels: acceptance, design, harec, spec

~sebsite 9 months ago

On the note of stack-allocated flexible structs: I just helped someone in #hare with a problem involving flexible structs (specifically rt::inotify_event), and the rule disallowing stack bindings of them prevented a stack corruption bug. The code used to compile, because rt::inotify_event's layout was incorrect and gave it a defined size; that has since been fixed, so the code now fails to compile. This is a great real-world example of our current behavior catching and preventing a bug at compile time, so I think we should continue to disallow this.

The one downside is that you can't take the size() of such a struct, e.g. to compute the length of a buffer. For example, inotify(7) says that the minimum buffer size that can hold any event is sizeof(struct inotify_event) + NAME_MAX + 1. The equivalent Hare code fails to compile, since you can't take the size of rt::inotify_event. This is kinda annoying, but I think it's still fine?

~sebsite 9 months ago

Hm, if we change offset() to take in a type and a field name rather than a field access expression (or add some other builtin expression for this), then C's sizeof() behavior can be replicated with offset(rt::inotify_event, name). I actually think this is better than allowing size(), since it's more explicit.
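For rt::inotify_event, the proposed offset(rt::inotify_event, name) would give the same number as C's sizeof, because that header ends without padding. In general, though, C's sizeof for a flexible struct can be larger than the offset of the flexible array, since it may include trailing padding. A small C illustration follows; `struct padded` and its helpers are hypothetical, and the numbers in the comments are what common ABIs such as x86-64 and AArch64 produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flexible struct: int32_t forces 4-byte alignment, but
 * the flexible array of char can start right after `c`. */
struct padded {
	int32_t n;
	char c;
	char d[];
};

/* Compile-time sanity check; holds on common ABIs. */
static_assert(sizeof(struct padded) >= offsetof(struct padded, d),
              "sizeof covers everything before the flexible array");

/* Where the flexible array begins: 5 on common ABIs. */
size_t padded_data_offset(void)
{
	return offsetof(struct padded, d);
}

/* C's sizeof: rounded up to the struct's alignment, 8 on common ABIs,
 * so it overshoots the start of d by 3 bytes of trailing padding. */
size_t padded_size(void)
{
	return sizeof(struct padded);
}
```

So an offset-based builtin describes where the trailing data starts, which is not always the same as what C's sizeof reports.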

~turminal 9 months ago

Every value constructed in Hare lives on the stack at some point in its lifetime, except for globals initialized at compile time and, I guess, the expressions in some of the fields of append, insert, and alloc (flexible structs don't make sense in the first two). So unless we make flexible structs pretty much unconstructable in Hare (for example, you couldn't reassign the struct once you've allocated it), we can't really prevent them from being stack-allocated.

I guess a relevant question here is what exactly we are trying to achieve with Hare flexible structs. If it's just C compatibility, then making them awful and hard to construct is kinda fine I guess, because the target use case is mostly structs that are passed in from C libraries. But I don't think having them just for C compatibility is worth it. If we have them, we should make them actually useful.

Oh and regarding size, if we're having flexible structs just for C compat, size should behave the way it does in C, and if not, probably still the same, but I'm less sure about that.

And fwiw, I also think we should consider allowing stack allocated unbounded arrays, since I don't think there are any technical reasons not to, and not having them is sometimes annoying to deal with.

~sebsite 9 months ago

> Every value constructed in Hare lives on the stack at some point in its lifetime, except for globals initialized at compile time and, I guess, the expressions in some of the fields of append, insert, and alloc (flexible structs don't make sense in the first two). So unless we make flexible structs pretty much unconstructable in Hare (for example, you couldn't reassign the struct once you've allocated it), we can't really prevent them from being stack-allocated.

Objects in general are underspecified, and I plan to address this alongside the mutability overhaul, since there's some overlap.

That said, I don't understand why we can't prevent stack-allocating them without making them unconstructable. They're difficult to construct and to use, but I think that's a good thing (see below).

> I guess a relevant question here is what exactly we are trying to achieve with Hare flexible structs. If it's just C compatibility, then making them awful and hard to construct is kinda fine I guess, because the target use case is mostly structs that are passed in from C libraries. But I don't think having them just for C compatibility is worth it. If we have them, we should make them actually useful.

Not just C libraries; also low-level APIs like inotify. I've considered in the past whether we should just get rid of them entirely, but I think that they're used often enough elsewhere that we should have at least rudimentary support for them.

That being said: flexible structs (and unbounded arrays in general) are a low-level unsafe feature, by design. And as such, I don't think we should make them easier to use, since that also makes them much easier to misuse (even if you know what you're doing). The behavior that we have right now is, for the most part, correct IMO. It makes logical sense (see below) and does a good job of preventing bugs. They serve their intended purpose.

> Oh and regarding size, if we're having flexible structs just for C compat, size should behave the way it does in C, and if not, probably still the same, but I'm less sure about that.

The size isn't really accurate though. The type still semantically has undefined size, so I don't think size should lie to you, even if it makes some operations slightly more convenient.

The semantics don't exactly match up with C because our type system is different from C's. C has a notion of "complete" and "incomplete types"; all complete types have a definite size. Struct types whose final member is an incomplete array are themselves complete types, albeit with some restrictions on their use (special cases which Hare currently doesn't need to worry about). The incomplete array at the end just acts as a way to read/write memory which extends beyond the rest of the struct.

Hare has no such distinction: types may either have a definite size or an undefined size. A struct whose final member has undefined size must, by extension, have undefined size itself.
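To make the C side of this concrete: since a C flexible struct is a complete type, C lets you copy one by value, and the copy silently leaves the flexible array behind (the C standard's own flexible-array example makes the same point about assignment). Under Hare's rule that such a struct has undefined size, a by-value copy like this is not allowed at all. A minimal sketch; `struct blob` and its helpers are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flexible struct used for illustration. */
struct blob {
	size_t len;
	unsigned char bytes[];
};

/* Heap-allocate a blob with room for n trailing bytes (zeroed). */
struct blob *blob_new(size_t n)
{
	struct blob *b = calloc(1, sizeof(struct blob) + n);
	if (b != NULL)
		b->len = n;
	return b;
}

/* Because struct blob is a complete type in C, it can be copied by
 * value. The copy is only sizeof(struct blob) bytes: the contents of
 * bytes[] are left behind. Returns the size of the copy. */
size_t blob_copy_size(const struct blob *src)
{
	assert(src != NULL);
	struct blob copy = *src; /* copies `len` only */
	assert(copy.len == src->len);
	return sizeof copy;
}
```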

> And fwiw, I also think we should consider allowing stack allocated unbounded arrays, since I don't think there are any technical reasons not to, and not having them is sometimes annoying to deal with.

Unbounded arrays are inherently unsafe, and so I think it's good that working with them isn't really easy. The stack corruption bug from my previous comment, which our current rules prevented, is enough to convince me of this.

I also don't really see the logic in allowing them to be stack-allocated tbh. It'd be similar to allowing stack-allocated opaque objects. By definition, the size of the object is unknown, and you can't allocate an unknown amount of data. You can allocate a fixed-size buffer on the stack and then interpret its contents as opaque, but the buffer itself still isn't an opaque object. Allowing an opaque (or unbounded array) object to be allocated directly would require changing the type system to accommodate it, and I really don't think that's worth it.

You also still wouldn't be able to pass such objects around, like as function arguments, without having them convert to a different fixed-size type, so the behavior would be inconsistent.
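For comparison, the "fixed-size buffer, interpreted afterwards" pattern might look like this in C. It is only a sketch: `struct rec` and `rec_name` are hypothetical, and the byte buffer (which may live on the stack) is the only object that is ever allocated. The header field is copied out with memcpy, so the buffer needs no particular alignment, and the tail is located with offsetof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flexible record: a fixed header and a trailing name. */
struct rec {
	uint32_t len;
	char name[];
};

/* Interpret the start of a byte buffer as a struct rec without ever
 * declaring a struct rec object. Returns a pointer to the name bytes
 * and stores their length in *len, or returns NULL if buflen is too
 * short to hold the header plus the name it announces. */
const char *rec_name(const unsigned char *buf, size_t buflen, uint32_t *len)
{
	assert(buf != NULL && len != NULL);
	size_t hdr = offsetof(struct rec, name);
	uint32_t n;

	if (buflen < hdr)
		return NULL;
	memcpy(&n, buf + offsetof(struct rec, len), sizeof n);
	if (buflen - hdr < n)
		return NULL;
	*len = n;
	return (const char *)(buf + hdr);
}
```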

TL;DR: I'm convinced at this point that the behavior we currently have for flexible structs is the correct behavior (apart from a few known bugs which need to be fixed). Changing the behavior to make some things slightly easier will always complicate the type system, introduce more weird, unintuitive special cases, and/or make the language less safe and more error-prone by getting rid of trivial and almost always correct compile-time checks. Our current behavior strikes a really good balance between safety and complexity, while also being (IMO) the most intuitive.

~sebsite 9 months ago

If the ability to stack-allocate a flexible struct is absolutely needed, then you can use subtyping:

```hare
type foo = struct {
    // ...
};

type bar = struct {
    foo,
    extra: [*]u8,
};
```
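The same header/tail split can be sketched in C as a rough analogue, not a translation of the Hare above: the definite-size header can be built on the stack, and the flexible tail only exists on the heap. Plain C has no equivalent of Hare's embedded-struct subtyping, so the header is a named member here. `struct foo`, `struct bar` and `bar_new` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-size header with a definite size: it can live on the stack. */
struct foo {
	uint32_t kind;
	uint32_t len; /* number of trailing bytes in struct bar */
};

/* Header followed by a variable-length tail, analogous to `bar`. */
struct bar {
	struct foo hdr;
	unsigned char extra[];
};

/* Build a heap-allocated bar from a header that was built on the stack. */
struct bar *bar_new(struct foo hdr, const unsigned char *tail)
{
	assert(tail != NULL || hdr.len == 0);
	struct bar *b = malloc(sizeof(struct bar) + hdr.len);
	if (b == NULL)
		return NULL;
	b->hdr = hdr;
	if (hdr.len > 0)
		memcpy(b->extra, tail, hdr.len);
	return b;
}
```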

~sebsite 9 months ago

Also, regarding the second bullet point in the original ticket: I think we should continue to allow having the unbounded array field be the only struct field. It's not compatible with C since C doesn't have zero-size types, but it's easier to allow it than to not, and there's not really any reason to disallow it.

~sircmpwn REPORTED BY_DESIGN 9 months ago

Unbounded structs are also useful in Helios for some EFI stuff. I make use of the subtyping thing there to put them on the stack, too.

~sircmpwn BY_DESIGN REPORTED 9 months ago

Err, sorry, didn't mean to close it.
