For my day job I recently had to address performance issues and I implemented a "perfect hash table" in C++ https://en.wikipedia.org/wiki/Perfect_hash_function - The code I used was published on https://stackoverflow.com/questions/2826559/compile-time-preprocessor-hashing-of-string
The BIG advantage is in C++ code StringHash::StaticHash("interpret") is the hash computed by the compiler, zero runtime overhead. The second C function can then be used at runtime to create the hash.
I think this function deserves its own i/o type so hashing can be done by the vm at high speed with slower fallback support. This can then be used as a building block to seriously improve the performance of Dictionary operations :)
"C"
unsigned StringHash__Dynamic(const char* str) {
unsigned R = (StringHash::OFFSET ^ (*str & 0xFF)) * StringHash::PRIME;
while (*str++) {
R = (R ^ (*str & 0xFF)) * StringHash::PRIME;
}
return R;
}
"C++"
class StringHash
{
public:
template <unsigned N, unsigned I>
struct HashHelper
{
constexpr static unsigned Calculate(const char(&str)[N])
{
return (HashHelper<N, I - 1>::Calculate(str) ^ (str[I - 1] & 0xFF)) * StringHash::PRIME;
}
};
template <unsigned N>
struct HashHelper<N, 1>
{
constexpr static unsigned Calculate(const char(&str)[N])
{
return (StringHash::OFFSET ^ (str[0] & 0xFF)) * StringHash::PRIME;
}
};
template<unsigned N>
constexpr static unsigned StaticHash(const char(&str)[N])
{
return HashHelper<N, N>::Calculate(str);
}
static const unsigned OFFSET = 0x01234567;
static const unsigned PRIME = 0x89ABCDEF;
};
One observation: a fallback to this would probably need to be running on a 64-bit implementation as cells are signed and the
StringHash::PRIME
is bigger than a signed integer can hold on 32-bit systems.
its my understanding that as long as its a prime there should be no issues with picking a smaller one.
i am assuming to make it compatible with 32bit signed the hashing should be performed using unsigned and only the lower 31 bits returned?
I think that would suffice.