~crc_/retroforth#4: 
suggestion: add support for string hashing in retro/c/c++

For my day job I recently had to address performance issues and I implemented a "perfect hash table" in C++ https://en.wikipedia.org/wiki/Perfect_hash_function - The code I used was published on https://stackoverflow.com/questions/2826559/compile-time-preprocessor-hashing-of-string

The BIG advantage is in C++ code StringHash::StaticHash("interpret") is the hash computed by the compiler, zero runtime overhead. The second C function can then be used at runtime to create the hash.

I think this function deserves its own i/o type so hashing can be done by the vm at high speed with slower fallback support. This can then be used as a building block to seriously improve the performance of Dictionary operations :)

"C"

unsigned StringHash__Dynamic(const char* str) {
  unsigned R = (StringHash::OFFSET ^ (*str & 0xFF)) * StringHash::PRIME;
  while (*str++) {
    R = (R ^ (*str & 0xFF)) * StringHash::PRIME;
  }
  return R;
}

"C++"

class StringHash
{
public:
  template <unsigned N, unsigned I>
  struct HashHelper
  {
    constexpr static unsigned Calculate(const char(&str)[N])
    {
      return (HashHelper<N, I - 1>::Calculate(str) ^ (str[I - 1] & 0xFF)) * StringHash::PRIME;
    }
  };

  template <unsigned N>
  struct HashHelper<N, 1>
  {
    constexpr static unsigned Calculate(const char(&str)[N])
    {
      return (StringHash::OFFSET ^ (str[0] & 0xFF)) * StringHash::PRIME;
    }
  };

  template<unsigned N>
  constexpr static unsigned StaticHash(const char(&str)[N])
  {
    return HashHelper<N, N>::Calculate(str);
  }
  static const unsigned OFFSET = 0x01234567;
  static const unsigned PRIME = 0x89ABCDEF;
};

Status
REPORTED
Submitter
~scott91e1
Assigned to
No-one
Submitted
1 year, 5 days ago
Updated
2 months ago
Labels
suggestion

~crc_ 1 year, 4 days ago

One observation: a fallback to this would probably need to be running on a 64-bit implementation as cells are signed and the StringHash::PRIME is bigger than a signed integer can hold on 32-bit systems.

~scott91e1 11 months ago*

its my understanding that as long as its a prime there should be no issues with picking a smaller one.

i am assuming to make it compatible with 32bit signed the hashing should be performed using unsigned and only the lower 31 bits returned?

~crc_ 11 months ago

I think that would suffice.

~crc_ referenced this from #45 8 months ago

~crc_ referenced this from #71 2 months ago

Register here or Log in to comment, or comment via email.