~crc_/retroforth#82: 
UTF8 and Strings

#Background

Strings in Retro currently have a few issues.

  • ASCII only (Retro currently allows strings to contain UTF8, but provides no words for manipulating UTF8 characters)
  • Null terminated
  • There is a lot of overlap with the functionality provided by arrays
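The first two points can be illustrated outside Retro. The following is a minimal Python sketch, not Retro code: it assumes a string is stored one byte per slot with a zero terminator, and `byte_length` is a hypothetical stand-in for a byte-oriented length word.

```python
# Illustration only: a null-terminated string holding raw UTF-8 bytes.
# Assumption: each slot holds one byte of the encoded text.
text = "na\u00efve"                          # "naive" with a diaeresis on the i
stored = list(text.encode("utf-8")) + [0]    # 0 marks the end of the string

def byte_length(cells):
    """Count slots up to the terminating zero (hypothetical length word)."""
    n = 0
    while cells[n] != 0:
        n += 1
    return n

print(byte_length(stored))   # 6, because U+00EF takes two bytes in UTF-8
print(len(text))             # 5 characters
print(stored[2])             # 195, the first byte of U+00EF, not a whole character
```

Indexing by slot lands in the middle of a multi-byte character, and a zero byte can never appear inside the text because it is reserved as the terminator.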

I will be making changes to resolve these, but it will not be a quick process. Changing the string model will break backwards compatibility to varying degrees, so this will not be rushed.

My current plan:

#Stage 1(a): New String Words

I will introduce s:fetch and s:store to read and update characters in a string. For the standard strings, these will be a thin layer over fetch and store. For the new strings, they will be a little more involved.
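As a rough model of the "thin layer" case, here is a Python sketch. Everything in it is an assumption made for illustration: memory is a flat list of cells, a standard string is an address into it, and the argument order of `s_fetch` and `s_store` is invented, since the ticket does not specify stack effects.

```python
# Hypothetical model of memory: a flat list of cells.
memory = [0] * 64

def fetch(addr):
    """Model of fetch: read one cell."""
    return memory[addr]

def store(value, addr):
    """Model of store: write one cell."""
    memory[addr] = value

def s_put(text, addr):
    """Sketch helper: copy ASCII text into memory, null terminated."""
    for i, ch in enumerate(text):
        store(ord(ch), addr + i)
    store(0, addr + len(text))

def s_get(addr):
    """Sketch helper: read a null-terminated string back out of memory."""
    out = []
    while fetch(addr) != 0:
        out.append(chr(fetch(addr)))
        addr += 1
    return "".join(out)

def s_fetch(s, index):
    """Stand-in for s:fetch on a standard string: add the offset, then fetch."""
    return fetch(s + index)

def s_store(value, s, index):
    """Stand-in for s:store on a standard string: add the offset, then store."""
    store(value, s + index)

s_put("hello", 10)
s_store(ord("j"), 10, 0)
print(s_get(10))   # jello
```

For standard strings the wrappers add nothing beyond the address arithmetic, which is why this case stays thin.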

#Stage 1(b): Introduce UTF8 Strings

  • UTF8 strings will be arrays of character data.
  • There will be a us: (utf8 string) namespace for words operating on them.
  • A sigil for creating them will be provided.
  • The us: words will match the functionality of the existing string vocabulary.
  • Array words will be reused internally when possible.
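One way to picture "arrays of character data" is a counted array with one code point per element. The Python sketch below assumes a layout of a length followed by the values; the function names mirror the proposed us: namespace but are hypothetical, as is their argument order.

```python
def us_from_bytes(raw):
    """Decode UTF-8 bytes into a counted array: [length, cp0, cp1, ...]."""
    points = [ord(ch) for ch in raw.decode("utf-8")]
    return [len(points)] + points

def us_length(us):
    """The length is stored up front, so no terminator scan is needed."""
    return us[0]

def us_fetch(us, index):
    """Stand-in for us:fetch: return the code point at a character index."""
    if not 0 <= index < us[0]:
        raise IndexError("character index out of range")
    return us[1 + index]

def us_store(value, us, index):
    """Stand-in for us:store: replace the code point at a character index."""
    if not 0 <= index < us[0]:
        raise IndexError("character index out of range")
    us[1 + index] = value

def us_to_bytes(us):
    """Re-encode the code points as UTF-8."""
    return "".join(chr(cp) for cp in us[1:1 + us[0]]).encode("utf-8")

us = us_from_bytes("na\u00efve".encode("utf-8"))
print(us_length(us))       # 5
print(us_fetch(us, 2))     # 239, the code point U+00EF as a single element
us_store(ord("i"), us, 2)
print(us_to_bytes(us))     # b'naive'
```

With this layout an index always refers to a whole character, and a zero value is ordinary data rather than a terminator.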

#Stage 2: Consolidation

This will involve updating the array words.

  • Indirection in access words (fetch, store, etc.)

I will need to insert some indirection so that words like us:fetch and us:store can be used when updating arrays that contain UTF8 data. This will also help in supporting arrays of byte or halfword data.
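The kind of indirection described here might look like the following Python sketch, in which a generic array routine takes its element accessors as a parameter instead of calling fetch and store directly. The accessor triples, `a_make`, and `a_reverse` are all hypothetical names, and the element widths (1, 2, and 4 bytes) are assumptions chosen for illustration.

```python
import struct

def make_accessor(fmt):
    """Build a (fetch, store, width) triple for one element format."""
    width = struct.calcsize(fmt)
    def fetch(buf, index):
        return struct.unpack_from(fmt, buf, index * width)[0]
    def store(value, buf, index):
        struct.pack_into(fmt, buf, index * width, value)
    return fetch, store, width

BYTE = make_accessor("<B")   # 1-byte elements
HALF = make_accessor("<H")   # 2-byte (halfword) elements
CELL = make_accessor("<i")   # 4-byte elements

def a_make(values, accessor):
    """Pack values into a buffer using the given accessor."""
    _, store, width = accessor
    buf = bytearray(width * len(values))
    for i, v in enumerate(values):
        store(v, buf, i)
    return buf

def a_reverse(buf, count, accessor):
    """Reverse in place; written once, works for any element width."""
    fetch, store, _ = accessor
    lo, hi = 0, count - 1
    while lo < hi:
        a, b = fetch(buf, lo), fetch(buf, hi)
        store(b, buf, lo)
        store(a, buf, hi)
        lo += 1
        hi -= 1

halves = a_make([1, 2, 300], HALF)
a_reverse(halves, 3, HALF)
print([HALF[0](halves, i) for i in range(3)])   # [300, 2, 1]
```

The array routine never needs to know the element size; swapping the accessor is enough to reuse it for bytes, halfwords, or character data.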

To be continued as feedback is gathered and work progresses.

Status: REPORTED
Submitter: ~crc_
Assigned to: No-one
Submitted: 28 days ago
Updated: 28 days ago
Labels: No labels applied.