~sircmpwn/hare#960: issues with array literal type inference

According to the spec, in the absence of a type hint, an array literal's members are all supposed to be of the same type, which becomes the member type of the resulting array type. This is probably a little too restrictive, especially in the presence of flexible literals, null, nullable and non-nullable pointers, and so on.

Harec doesn't really comply with the spec here: it picks the type of the first element in the literal as the member type, and then verifies that all the other elements are assignable to it (with some oddities around promote_flexible which I don't fully understand). Type assignability isn't symmetric, so the first member has more weight in determining the type of the literal than the others, and the result depends on the order of the members, which isn't exactly optimal.
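
A sketch of the resulting order dependence, under the current first-element rule (the exact current diagnostics are not verified here):

let x = &3;
let y = x: nullable *int;
let a = [y, x]; // ok: *int is assignable to nullable *int
let b = [x, y]; // error: nullable *int is not assignable to *int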

So we should fix Harec to be order independent. There are a couple of options:

  • Keep the behavior mentioned in the spec (which, as mentioned, is quite restrictive)
  • Invent a more relaxed procedure specifically for this purpose. One way that comes to mind is using type assignability bidirectionally with each consecutive pair of elements, and picking the "less restrictive" type as the intermediate result when determining the member type. There are probably corner cases and I'm not even 100% sure this is symmetric, but it's an idea.
  • Reuse some procedure for "unifying" types in an order independent manner from somewhere else in the language: two options come to mind: type promotion and type reduction. Type promotion is undefined for most aggregate types, so we'd need to expand it significantly for this, which isn't exactly reuse and falls closer to the second point. Type reduction looks like a more promising candidate, but it's perhaps a bit too flexible. I imagine we want [1i, 1u: (uint | int)] and [1i: (uint | int), 1u] to yield [2](int | uint), but [1i, 1u] should probably error out? So perhaps a slightly more restricted version of type reduction would be just the right thing?
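
Roughly, the pairwise step of the second option might look like this (pseudocode with made-up names; assignable() stands for harec's existing assignability check):

// assignable(to, from): can a value of type "from" be assigned to "to"?
fn unify(a: type, b: type) (type | error) = {
	if (assignable(a, b)) return a; // a is the less restrictive type
	if (assignable(b, a)) return b; // b is the less restrictive type
	return error; // the members have no common type
};

The member type would then be the fold of unify over all members; in the mutually-assignable case it returns a, but a and b are then interchangeable, so the result should not meaningfully depend on order.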

The second and third options obviously require spec changes too.

Some cases that I'd like to make work (with any order of members):

let x = &3;
let a = [x, null]; // [2]nullable *int

let x = &3;
let y = x: nullable *int;
let a = [x, y]; // [2]nullable *int

let a = [[], [1]: []u8]; // [2][]u8

let a = [[1], [1u8]]; // [2][1]u8

let a = [1: (str | int), "asdf"]; // [2](str | int)

Some cases that should probably error out:

let a = [[1u8], [1i]]; // should not be [2]([1]u8 | [1]int)
let a = [[1,2,3], [1,2]]; // should not be [2]([2]int | [3]int)
let a = [1, "asdf"]; // should not be [2](int | str)

See #810 for a related issue with array literal type inference when there is a type hint provided.

Status
REPORTED
Submitter
~turminal
Assigned to
No-one
Submitted
2 months ago
Updated
2 months ago
Labels
design harec spec

~sebsite 2 months ago

We could afford to be a bit more permissive here (especially with null), but I don't think it's a big deal to be order dependent. It simplifies the compiler, and allows you to figure out the array's member type just by looking at the first member. We already have a bunch of type deduction systems; it would suck to add yet another one, outside of some simple stuff like with nulls and nullable pointers.

Type reduction has some weird semantics that would make it unsuitable for use here without modification (see #945, and my comment on it):

let x: (*int | nullable *int) = &0;
let a = [x, x];

a's type should be [2](*int | nullable *int), but with type reduction it would be [2]nullable *int.

Related to #952, we should also decide whether we want to allow never expressions: let a = [1, abort()]; (or with flipped order).

Some comments on some of the cases you gave:

> let a = [[], [1]: []u8]; // [2][]u8

With the slice assignment RFC, this would become let a = [null, &[1]: []u8];, which makes it easier to make that work.

> let a = [[1], [1u8]]; // [2][1]u8

I feel like I remember this intentionally not working (in that order)? Because it would complicate the compiler or something? ~ecs would know more about that, since it involves flexible types.

> let a = [[1u8], [1i]]; // should not be [2]([1]u8 | [1]int)

What about [[1u8], [1i], [1i: (u8 | int)]]?

> let a = [[1,2,3], [1,2]]; // should not be [2]([2]int | [3]int)

Speaking of the slice assignability RFC: should [&[1, 2, 3], &[1, 2]] be allowed (with type [2][]int)? How about [[1, 2, 3][..], &[1, 2]]? I don't think those should be allowed, but if the other cases you gave are allowed, then allowing these would make some sense.

~sebsite 2 months ago

Ooh, what about expandable arrays?

let a = [[0...], [1, 2, 3]]; // [2][3]int ?

~sebsite 2 months ago

Another issue I thought of:

let a = [[1], [1u8]]; // [2][1]u8

This implies that [1]'s type (before lowering) is [1]iconst. This would need to be handled in other cases. Like, if it's used in a tuple literal. Or when taking its address, would it become *[1]iconst, or would it be lowered like &0 is? What about slicing? Would []iconst be a thing? And if we allow the case you gave, then it would also make sense to allow [&0, &0u8], which would make *iconst a thing.

This would also cause issues if/when we add type(expr), (or typeof(expr), whatever). We want to disallow taking the type of a flexible literal, but type([1, 2, 3]) should be allowed IMO.

~ecs 2 months ago

fyi [1]iconst et al is exactly what i opted to avoid - sticking iconst inside aggregate types, in ways that affect their size, means that lowering iconsts requires updating the aggregates' sizes, which sucks

the order-dependence here has always smelled a bit to me, but tbh it's not the worst thing - i'd be ok with shipping hare 1.0 with those array semantics, and i'm not sure it's worth adding substantially more complexity in order to improve this

~turminal 2 months ago

> We could afford to be a bit more permissive here (especially with null), but I don't think it's a big deal to be order dependent. It simplifies the compiler, and allows you to figure out the array's member type just by looking at the first member. We already have a bunch of type deduction systems; it would suck to add yet another one, outside of some simple stuff like with nulls and nullable pointers.

> the order-dependence here has always smelled a bit to me, but tbh it's not the worst thing - i'd be ok with shipping hare 1.0 with those array semantics, and i'm not sure it's worth adding substantially more complexity in order to improve this

If it's not a big deal to be order dependent here, why do we bother with type reduction at all? Reordering switch/match or even if cases is actually much more feasible than reordering array elements.
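
For comparison, result type reduction already handles reordered branches uniformly; a sketch, assuming current match semantics:

let v: (int | str) = 0;
let r = match (v) {
case let i: int =>
	yield i;
case let s: str =>
	yield s;
};
// r: (int | str), regardless of the order of the cases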

I don't mean that entirely seriously, but I do think the reasoning for avoiding order dependence applies equally here and to branching expression result calculation. The only difference is that array literals are much less common than branching, but that should not be an excuse for intentionally giving them inferior semantics.

I totally agree this isn't worth much additional complexity, but at the same time, not being able to accommodate this use case indicates that perhaps our other deduction systems need adjustment, which by itself does not imply an increase in complexity.

> Type reduction has some weird semantics that would make it unsuitable for use here without modification (see #945, and my comment on it):
>
> let x: (*int | nullable *int) = &0;
> let a = [x, x];
>
> a's type should be [2](*int | nullable *int), but with type reduction it would be [2]nullable *int.

Yeah I was speaking of some abstract future version of type reduction that is more reasonable than what we currently have.

Honestly, #945 and some other problems with type reduction make me think it's way too elaborate and we should make it simpler and leave complex cases for the user to resolve manually. Perhaps that would make it directly usable here as well.

> Related to #952, we should also decide whether we want to allow never expressions: let a = [1, abort()]; (or with flipped order).
>
> Some comments on some of the cases you gave:
>
>> let a = [[], [1]: []u8]; // [2][]u8
>
> With the slice assignment RFC, this would become let a = [null, &[1]: []u8];, which makes it easier to make that work.
>
>> let a = [[1], [1u8]]; // [2][1]u8
>
> I feel like I remember this intentionally not working (in that order)? Because it would complicate the compiler or something? ~ecs would know more about that, since it involves flexible types.
>
> fyi [1]iconst et al is exactly what i opted to avoid - sticking iconst inside aggregate types, in ways that affect their size, means that lowering iconsts requires updating the aggregates' sizes, which sucks

Okay yeah this one was not very well thought out.

On a tangent, Harec's reliance on fixed sizes during check is quite unreasonable at times. I have a patchset prepared (and some more in preparation) that improves some of that. I'm not really advertising for flexible literals in aggregate types here, just pointing out this particular technical obstacle is (mostly) going away at some point.

>> let a = [[1u8], [1i]]; // should not be [2]([1]u8 | [1]int)
>
> What about [[1u8], [1i], [1i: (u8 | int)]]?

Probably not? It essentially requires the compiler to "see through" the array literal. And that very quickly leads to intractable stuff. We actually do already have something kind of similar in tuple literal type inference, and it feels very out of place.

[[1u8], [1i], [1i]: ([1]u8 | [1]int)] on the other hand should be fine.

>> let a = [[1,2,3], [1,2]]; // should not be [2]([2]int | [3]int)
>
> Speaking of the slice assignability RFC: should [&[1, 2, 3], &[1, 2]] be allowed (with type [2][]int)? How about [[1, 2, 3][..], &[1, 2]]? I don't think those should be allowed, but if the other cases you gave are allowed, then allowing these would make some sense.

I haven't thought about the slice assignability RFC hard enough yet to give an informed answer. My guess would be no for [&[1, 2, 3], &[1, 2]], since there is no indication from the user that they want a slice. [[1, 2, 3][..], &[1, 2]] does have such an indication, so maybe that should be fine?

> Ooh, what about expandable arrays?
>
> let a = [[0...], [1, 2, 3]]; // [2][3]int ?

That would be desirable, I think. And hopefully not too hard once we get expandable arrays in order.
