Thoughts on type conversions

Surprised we don't have a ticket for this already?

Inspired by https://gist.github.com/T-Dark0/b14c35f44d434bd382d850a4c57f410f , "Everything as does in Rust".

Reproduced here for posterity:

as does a huge number of things. Probably too many. Here's a comprehensive list of everything it can do. Note that as conversions are transitive: If a as B as C compiles, then so does a as C, though it might not take the same "path" through intermediate conversions.

  • int <-> int
    • zero_extend: unsigned int -> bigger unsigned int. Pads the number with leading zeroes.
    • sign_extend: signed int -> bigger signed int. Pads the number with leading zeroes if positive, and leading ones if negative.
    • truncate: bigger int -> smaller int (regardless of signedness). Throws away the high bits of the number.
    • reinterpret_sign: int -> int of the same size and opposite signedness. Does nothing to the bits.
  • int <-> float
    • cast_saturate_to_infinity: Yields the float closest to the specified integer. Yields infinity if the integer is out of bounds.
    • cast_nan_to_zero_saturating_when_out_of_bounds: Truncates the float and yields the corresponding integer. NaN yields 0, and the infinities yield the minimum and maximum integer value.
  • int <-> ptr
    • expose_addr: ptr -> int. Does nothing to the bits, and exposes the address (which is a provenance-related concept)
    • from_exposed_addr: int -> ptr. Does nothing to the bits, inheriting the provenance of a pointer with the same address that was previously exposed.
  • ptr <-> ptr
    • cast: ptr -> ptr. Does nothing to the bits, keeps the pointer's mutability while changing the type.
    • cast_const: *mut T -> *const T. Does nothing to the bits. Changes the pointer's mutability while keeping the type.
    • cast_mut: *const T -> *mut T. Does nothing to the bits. Changes the pointer's mutability while keeping the type.
  • ptr <-> function pointer
    • to_fn_ptr: ptr -> fnptr. Does nothing to the bits.
    • to_data_ptr: fnptr -> ptr. Does nothing to the bits.
  • char -> u32 (this conversion only goes one way)
    • from: char -> u32. Does nothing to the bits.
  • bool -> u8 (this conversion only goes one way)
    • from: bool -> u8. Does nothing to the bits, yields 1 for true and 0 for false.
  • ref -> ptr (these conversions only go one way)
    • as_mut_ptr: &mut T -> *mut T. Does nothing to the bits.
    • as_const_ptr: &T -> *const T. Does nothing to the bits.
  • ptr to sized -> ptr to unsized (that is, any pointer or reference, including Box, Arc, etc, from Ptr<T> to Ptr<U> where U does not implement Sized)
    • unsize: Makes the pointer fat. The data pointer is still stored therein, and pointee-specific metadata is added.
  • type ascription (That is, the fact that None as Option<u8> compiles)
    • ascribe: This... isn't really a function. It's more like a special inference thing.
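To make the integer and float entries above concrete, here is a small Rust sketch. The wrapper names just mirror the labels used in the list; they are illustrative and not real standard library functions. Each one is a single `as` cast.

```rust
// Illustrative wrappers: the names mirror the list above, they are not std APIs.

/// unsigned int -> bigger unsigned int: pads with leading zeroes.
fn zero_extend(x: u8) -> u32 {
    x as u32
}

/// signed int -> bigger signed int: copies the sign bit into the new high bits.
fn sign_extend(x: i8) -> i32 {
    x as i32
}

/// bigger int -> smaller int: throws away the high bits.
fn truncate(x: u32) -> u8 {
    x as u8
}

/// same size, opposite signedness: the bits are unchanged.
fn reinterpret_sign(x: u8) -> i8 {
    x as i8
}

/// float -> int: rounds towards zero, NaN becomes 0, out-of-range values saturate.
fn float_to_int(x: f64) -> i32 {
    x as i32
}
```

For example, `truncate(0x1234)` keeps only the low byte `0x34`, and `float_to_int(f64::INFINITY)` yields `i32::MAX`.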

Also, here's some special mentions:

  • char -> u8: Desugars to char -> u32 -> u8. That is, it truncates the character.
  • &mut T -> *const T. Desugars to &mut T -> &T -> *const T. Notably, this means that if the resulting pointer is cast back to *mut T, it cannot be used for writes anyway.
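A quick sketch of the char -> u8 case, since the silent truncation is easy to miss. The function names are made up for illustration; `u8::try_from(char)` is the checked alternative in the standard library.

```rust
// Needed explicitly on editions before 2021, where TryFrom is not in the prelude.
use std::convert::TryFrom;

/// `as` goes char -> u32 -> u8 and silently drops the high bits.
fn char_to_u8_truncating(c: char) -> u8 {
    c as u8
}

/// The checked version only succeeds for code points that fit in one byte.
fn char_to_u8_checked(c: char) -> Option<u8> {
    u8::try_from(c).ok()
}
```

The euro sign is U+20AC, so the truncating version returns `0xAC` while the checked version returns `None`.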

~icefox 7 months ago

There's basically a few fundamental cases. Whenever you do T1 -> T2 you could have:

  • Every bit pattern in T1 is a valid bit pattern in T2
  • or T1 has some bit patterns that aren't valid in T2 and so requires a runtime check or compile-time restriction.
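Rust's char and u32 are a handy example of both cases, sketched below. The wrapper names are hypothetical; `u32::from` and `char::from_u32` are the real standard library calls.

```rust
/// char -> u32: every char bit pattern is a valid u32, so this cannot fail.
fn char_to_u32(c: char) -> u32 {
    u32::from(c)
}

/// u32 -> char: surrogates and values above 0x10FFFF are not valid chars,
/// so the conversion needs a runtime check and returns an Option.
fn u32_to_char(n: u32) -> Option<char> {
    char::from_u32(n)
}
```

Both types are 32 bits wide, so the only difference between the two directions is which bit patterns are valid.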

On its own slightly-overlapping axis there are these possibilities:

  • T1 is smaller than T2, so you need to figure out how to widen it and set the new bits to something valid. For example zero-extension vs. sign-extension for integers.
  • T1 is larger than T2, so you need to reduce, truncate, or convert in some way. A trivial example could be truncating an unsigned integer, a less trivial one could be shortening an F64 to an F32.
  • T1 is the same size as T2. If this is the case then the first two bullets entirely encompass the problem.

So there's a few different operations we want to be able to express:

  • Bitwise cast that is done without a copy. May or may not require a validity check?
  • A conversion that requires changing the size of something, which may or may not require a copy in reality -- ie, extending a U32 to a U64 in a register is prolly a noop, while doing it to a member in a struct will require copying it out of that struct.
  • A conversion that may fail. For example turning a U32 into a U8 without truncating it.
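For comparison, here is roughly how Rust spells these three operations today. This is only a sketch: the function names are invented for illustration, and `to_bits` works on a value rather than in place, so it only approximates the "without a copy" bitwise cast.

```rust
// Needed explicitly on editions before 2021, where TryFrom is not in the prelude.
use std::convert::TryFrom;

/// Bitwise cast: same 32 bits, different type. No validity check is needed
/// because every f32 bit pattern is a valid u32.
fn bitcast_f32_to_u32(x: f32) -> u32 {
    x.to_bits()
}

/// Size-changing conversion that cannot fail: zero-extends U32 to U64.
fn widen(x: u32) -> u64 {
    u64::from(x)
}

/// Conversion that may fail: U32 to U8 without truncating.
fn narrow_checked(x: u32) -> Option<u8> {
    u8::try_from(x).ok()
}
```

`narrow_checked(255)` gives `Some(255)`, while `narrow_checked(256)` gives `None` instead of wrapping to 0.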

Ok. That's what we want to be able to express:

  • Fallible vs. infallible
  • In-place vs. copying
  • For copying conversions, multiple conversion policies (noop, sign-extend, round-towards-zero, etc)

This isn't even starting to think about pointer provenance! But it does express what we want to be able to say about raw pointer conversions. Any pointer *T1 can be turned into a pointer *T2 as an in-place cast. However, it is only infallible if converting T1 to T2 is infallible, and it's only Safe if T1 -> T2 can be done in-place. I think. If it can't be done in-place, for example *U32 -> *U64, then you have to know more than the compiler does about what memory is like around that U32.
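A minimal Rust sketch of the pointer case, assuming the target type is no larger than the source, has no stricter alignment, and accepts every bit pattern. `view_as_bytes` is a hypothetical helper, not an existing API.

```rust
/// Reinterprets a pointer to u32 as a pointer to [u8; 4] and reads through it.
fn view_as_bytes(x: &u32) -> [u8; 4] {
    let p: *const u32 = x;
    // The pointer cast itself is in-place: the address does not change.
    let q = p.cast::<[u8; 4]>();
    // SAFETY: [u8; 4] has the same size as u32, an alignment of 1, and every
    // bit pattern is valid for it, so reading through `q` stays inside `x`.
    unsafe { *q }
}

// The opposite direction, *const u32 -> *const u64, also compiles as a cast,
// but reading through the result would touch 4 bytes beyond the u32 (and may
// be misaligned), which is the "know more than the compiler" case above.
```

The result matches `x.to_ne_bytes()`, since both just expose the in-memory representation.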

~icefox 7 months ago

This also gives us a framework for thinking about the differences between a "pointer" and an "address".

  • *T -> Address is infallible and in-place, and thus safe
  • Address -> *T is also in-place, but is fallible. But the compiler probably won't know what to check for besides a valid bit-pattern, which might not be possible if you're conjuring a pointer to MMIO mem you don't want to read. But that's a separate pointer type anyway, so I guess that's ok.
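In current Rust the same split looks something like this, using plain `as` casts and usize as a stand-in for an Address type. The function names are illustrative only.

```rust
/// *T -> Address: always succeeds and is safe.
fn to_address(p: *const u32) -> usize {
    p as usize
}

/// Address -> *T: creating the pointer is safe, but nothing has checked that
/// the address is actually valid for a u32.
fn from_address(addr: usize) -> *const u32 {
    addr as *const u32
}

/// Round trip through an address and read the value back.
fn round_trip(x: &u32) -> u32 {
    let addr = to_address(x);
    let p = from_address(addr);
    // SAFETY: `addr` came from a live reference to a u32 a moment ago, so the
    // pointer is valid and aligned. The compiler cannot check this for us.
    unsafe { *p }
}
```

The "fallible" part shows up as the `unsafe` block: the check is pushed onto whoever dereferences the pointer.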

~icefox 6 months ago*

Oh, Zig is another good source: https://ziglang.org/documentation/master/#Casting

...also is a good example of what I don't want to do, since it is a million different cases for different types trying to do the Sensible Thing for each one. These are pretty interesting though:

    @bitCast - change type but maintain bit representation
    @alignCast - make a pointer have more alignment
    @enumFromInt - obtain an enum value based on its integer tag value
    @errorFromInt - obtain an error code based on its integer value
    @errorCast - convert to a smaller error set
    @floatCast - convert a larger float to a smaller float
    @floatFromInt - convert an integer to a float value
    @intCast - convert between integer types
    @intFromBool - convert true to 1 and false to 0
    @intFromEnum - obtain the integer tag value of an enum or tagged union
    @intFromError - obtain the integer value of an error code
    @intFromFloat - obtain the integer part of a float value
    @intFromPtr - obtain the address of a pointer
    @ptrFromInt - convert an address to a pointer
    @ptrCast - convert between pointer types
    @truncate - convert between integer types, chopping off bits

@bitCast and @alignCast in particular are difficult to express otherwise, for example.

~icefox 6 months ago

So what I really want is just a function cast(|From, To| From) To that would do the conversion. This conversion is infallible and copying, though I suppose it could be done in-place if the From type is not Copy. Note this signature is basically Rust's std::convert::From, though that also has a constraint of T: Sized. The fallible version would just be TryFrom. But you could think of a few other more general-purpose functions along those lines: widen(), truncate(), bitcast()...
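As a rough sketch of that shape in Rust terms: `cast` and `try_cast` below are hypothetical free functions layered on the real From and TryFrom traits, with the type parameters renamed to Src and Dst so they do not clash with the trait name.

```rust
// Needed explicitly on editions before 2021, where TryFrom is not in the prelude.
use std::convert::TryFrom;

/// Infallible conversion: only compiles when Dst can always be built from Src.
fn cast<Src, Dst>(x: Src) -> Dst
where
    Dst: From<Src>,
{
    Dst::from(x)
}

/// Fallible conversion: returns None when the value does not fit in Dst.
fn try_cast<Src, Dst>(x: Src) -> Option<Dst>
where
    Dst: TryFrom<Src>,
{
    Dst::try_from(x).ok()
}
```

With these, `cast::<u8, u32>(7)` compiles and `cast::<u32, u8>(7)` does not, while `try_cast::<u32, u8>(300)` compiles and returns `None`.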
