UTF8 and Strings


Strings in Retro currently have a few issues.

  • ASCII only (Retro currently allows strings to contain UTF8 data, but provides no words for manipulating UTF8 characters)
  • Null terminated
  • There is a lot of overlap with the functionality provided by arrays
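
To illustrate the first two points, here is a small Python model (not Retro code). It assumes a null-terminated string stored one byte per cell; the helper names `to_cells`, `byte_length`, and `char_count` are hypothetical and exist only for this sketch. It shows how a length that counts cells up to the terminator diverges from the number of characters once UTF8 data is present.

```python
# Hypothetical model: a null-terminated string, one byte per cell.
def to_cells(text):
    # Encode as UTF-8 and append the terminating zero.
    return list(text.encode("utf-8")) + [0]

def byte_length(cells):
    # Count cells up to (not including) the terminating zero.
    n = 0
    while cells[n] != 0:
        n += 1
    return n

def char_count(cells):
    # Count UTF-8 characters by skipping continuation bytes (10xxxxxx).
    body = cells[:byte_length(cells)]
    return sum(1 for b in body if (b & 0xC0) != 0x80)

cells = to_cells("na\u00efve")   # U+00EF takes two bytes in UTF-8
print(byte_length(cells), char_count(cells))   # prints: 6 5
```

Indexing such a string by cell can land in the middle of a multi-byte character, which is why dedicated words are needed.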

I will be making changes to resolve these, but it will not be a quick process. Changing the strings model will break backwards compatibility to various degrees, so this is not something that'll be rushed.

My current plan:

#Stage 1(a): New String Words

I will introduce s:fetch and s:store to read and update characters in a string. For the standard strings, these will be a thin layer over fetch and store. For the new strings, these will be a little more involved.
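
A minimal Python sketch of the "thin layer" idea for standard strings. It assumes a cell-addressed memory with one character per cell; the argument order, the `MEMORY` list, and the `place` and `read` helpers are assumptions made for illustration, not the actual stack effects of the planned words.

```python
MEMORY = [0] * 64   # hypothetical cell-addressed memory

def fetch(addr):
    return MEMORY[addr]

def store(value, addr):
    MEMORY[addr] = value

# The string words only add the index to the base address.
def s_fetch(s, index):
    return fetch(s + index)

def s_store(ch, s, index):
    store(ch, s + index)

def place(text, addr):
    # Helper for the sketch: write a null-terminated string at addr.
    for i, ch in enumerate(text):
        store(ord(ch), addr + i)
    store(0, addr + len(text))
    return addr

def read(addr):
    # Helper for the sketch: read back up to the terminating zero.
    out = []
    while fetch(addr) != 0:
        out.append(chr(fetch(addr)))
        addr += 1
    return "".join(out)

s = place("cat", 10)
s_store(ord("u"), s, 1)
print(read(s))   # prints: cut
```

For the new strings the same two words would have to account for the array layout, which is where the extra work comes in.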

#Stage 1(b): Introduce UTF8 Strings

  • UTF8 strings will be arrays of character data.
  • There will be a us: (utf8 string) namespace for words operating on them.
  • A sigil for creating them will be provided.
  • The us: words will match the functionality in the existing string vocabulary.
  • Reuse array words internally when possible.
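
One possible reading of "arrays of character data", sketched in Python. It assumes a counted array (length in the first cell) with one character's code point per element; the ticket does not specify the element layout, and packing the encoded bytes of each character into a cell would be another option. The `us_*` function names mirror the proposed namespace but their signatures are hypothetical.

```python
def us_from_bytes(raw):
    # Decode UTF-8 bytes into a counted array: [length, char, char, ...]
    chars = [ord(c) for c in raw.decode("utf-8")]
    return [len(chars)] + chars

def us_length(us):
    # The count lives in the first cell, so no scan for a terminator.
    return us[0]

def us_fetch(us, index):
    return us[1 + index]

def us_store(ch, us, index):
    us[1 + index] = ch

def us_to_text(us):
    return "".join(chr(c) for c in us[1:1 + us[0]])

us = us_from_bytes("h\u00e9llo".encode("utf-8"))
print(us_length(us))             # prints: 5
us_store(ord("e"), us, 1)
print(us_to_text(us))            # prints: hello
```

With this layout, indexing is by character rather than by byte, and the length words can be shared with ordinary arrays.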

#Stage 2: Consolidation

This will involve updating the array words.

  • Indirection in access words (fetch, store, etc)

I will need to insert some indirection so that words like us:fetch and us:store can be used when updating arrays that contain UTF8 data. This will also help in supporting arrays of byte or halfword data.
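
A rough Python sketch of what such indirection could look like. It assumes each array carries a tag naming its element kind and that the generic access words look up the matching accessor pair; the tag scheme, the `ACCESSORS` table, and the `a_fetch`/`a_store` names are all hypothetical. The byte accessors assume four bytes packed into each 32-bit cell.

```python
def cell_fetch(data, i):
    return data[i]

def cell_store(value, data, i):
    data[i] = value

def byte_fetch(data, i):
    # Four bytes per 32-bit cell, lowest byte first.
    cell, slot = divmod(i, 4)
    return (data[cell] >> (slot * 8)) & 0xFF

def byte_store(value, data, i):
    cell, slot = divmod(i, 4)
    shift = slot * 8
    cleared = data[cell] & ~(0xFF << shift) & 0xFFFFFFFF
    data[cell] = cleared | ((value & 0xFF) << shift)

# The indirection: one (fetch, store) pair per element kind.
ACCESSORS = {
    "cell": (cell_fetch, cell_store),
    "byte": (byte_fetch, byte_store),
}

def a_fetch(array, i):
    kind, data = array
    return ACCESSORS[kind][0](data, i)

def a_store(value, array, i):
    kind, data = array
    ACCESSORS[kind][1](value, data, i)

packed = ("byte", [0, 0])        # room for eight bytes in two cells
a_store(0x41, packed, 0)
a_store(0x42, packed, 5)
print(a_fetch(packed, 0), a_fetch(packed, 5))   # prints: 65 66
```

A UTF8 or halfword kind would be one more entry in the table, leaving the words built on top of the generic accessors unchanged.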

To be continued as feedback is gathered and work progresses.


~yjmbo 4 months ago

Why not just go directly to UTF-32 and not worry about the extra storage? You have a 32-bit word ...
