Deduplication of d:source data

A fair amount of space is wasted due to duplicate strings in the d:source data. This will need to be addressed before the next release.

17 days ago
feature request retro/nga

~rickcarlino 17 days ago

Any idea on how you would do this? I've thought about how this might work also. Maybe a dictionary entry that contains the string and then use a content hash in the header that points to a string lookup table? I don't have enough knowledge of Retro internals to say for sure but am curious.
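
The content-hash idea above could look roughly like the following. This is only a sketch in Python, not Retro code: `SourceTable`, `intern`, `lookup`, and the `headers` mapping are hypothetical names, and the choice of SHA-1 is an assumption made for illustration.

```python
import hashlib


class SourceTable:
    """Hypothetical string lookup table keyed by a content hash."""

    def __init__(self):
        self.strings = {}  # hash -> the single stored copy of the string

    def intern(self, source):
        # Hash the string contents; identical strings yield identical keys.
        key = hashlib.sha1(source.encode("utf-8")).hexdigest()
        # Store the string only the first time this key is seen.
        self.strings.setdefault(key, source)
        return key

    def lookup(self, key):
        return self.strings[key]


table = SourceTable()
headers = {}  # word name -> hash kept in the (hypothetical) header field
for word, source in [
    ("foo", "example.retro"),
    ("bar", "example.retro"),
    ("baz", "other.retro"),
]:
    headers[word] = table.intern(source)
```

Here three words share two stored strings. A fixed-size hash in each header is only a win when the filenames are longer than the hash, so a real implementation might store a small index or pointer instead.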

~crc_ 17 days ago

For words in the base image, I'm manually filling in the source data, so those don't have duplications.

I have a couple of easy options for the others:

  • set up a table (or linked list?) of source filenames & hashes, and point the d:source field to the existing entry (or add a new entry if none is present)
  • use the dictionary. In this case I'd have a word class that identifies these strings, and point the d:source field for words to the d:name field of the dictionary entries whose name matches the source filename
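
As a rough illustration of the second option, here is a Python model of a dictionary kept as a linked list, newest entry first. It is a sketch under assumptions, not Retro's actual implementation: `Entry`, `Dictionary`, `define`, and the `class:source` marker are hypothetical names, and the `source` attribute stands in for a d:source field that points at another entry's name.

```python
from dataclasses import dataclass
from typing import Optional

SOURCE_CLASS = "class:source"  # hypothetical class marking filename entries


@dataclass
class Entry:
    name: str
    word_class: str
    source: Optional["Entry"] = None  # entry whose name is the source filename
    link: Optional["Entry"] = None    # previous (older) dictionary entry


class Dictionary:
    def __init__(self):
        self.latest = None

    def entries(self):
        # Walk the linked list from the newest entry to the oldest.
        e = self.latest
        while e is not None:
            yield e
            e = e.link

    def find(self, name, word_class):
        for e in self.entries():
            if e.name == name and e.word_class == word_class:
                return e
        return None

    def add(self, name, word_class, source=None):
        e = Entry(name, word_class, source, self.latest)
        self.latest = e
        return e

    def define(self, name, filename):
        # Reuse the filename entry if one exists, otherwise create it once.
        src = self.find(filename, SOURCE_CLASS) or self.add(filename, SOURCE_CLASS)
        return self.add(name, "class:word", src)

    def words(self):
        return [e.name for e in self.entries()]


d = Dictionary()
foo = d.define("foo", "example.retro")
bar = d.define("bar", "example.retro")
print(d.words())  # ['bar', 'foo', 'example.retro']
```

Both words share one filename entry, so the string is stored once. The filename also appears in the word listing, which is the kind of extra entry a d:words-style listing would pick up with this approach.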

I like the second approach in terms of not needing to add another data structure, but it would add some visual noise to the output of d:words. I'll probably do a prototype of each and see what feels better in practice.
