GetIdentity: must scan every node in the directory (it can't tell which ones are identities).
Children: the hierarchy doesn't currently communicate anything about the tree relationships between nodes, so every node on disk is conceivably a child of every other node, and we must check them all.
Recent: must scan every node even though it only cares about a single type of node.
The above could be addressed by storing nodes in a more structured hierarchy. Something like:
.
├── communities
│   └── SHA512_B32__QYRdxHQOnLTr_SD0u8nPIjAYi1YODWH05tlSDR9dAdQ
│       ├── self
│       ├── SHA512_B32__lVTlE-WJUJxtUtWPfafnudQk9oyHT2pgWEtBZXFjx4o
│       │   ├── self
│       │   ├── SHA512_B32__z89tiqtAmXWfN5l7-L18hFIQ4rXl_Rp4i98Cjyuj9xE
│       │   └── SHA512_B32__ZI9oIOdcUhV6uCF3wQFU4_XsJZmEaN_7TqWLn-Ob0TQ
│       └── SHA512_B32__z3n44lYtegNanse1iQTLbBG7r7cn9mjGjND17-6WdO0
│           ├── self
│           ├── SHA512_B32__zRHH06sN9QjEhAX4oZsp6jDubSKxXH-DXiqPoWHVRpk
│           └── SHA512_B32__zudIQfF_3pb3NyXJoOZjvTWP4TxAqa70OMkW3A_ErTE
├── .grove
│   └── version
└── identities
    ├── SHA512_B32__rs8L0sAjH2fE9anr5_Bu7Ijv6Cyig-H8N_WcoUcB3EQ
    └── SHA512_B32__uE9Nm7kZvJVJ1zdZYbsBsrn0BNX6rPhV_MQWape8Acw
All root nodes, like identities and communities, can be iterated quickly. Each community's data is stored in the self file in the directory named for the community ID. The directories inside a community are conversations (top-level replies), and their data is in their own self files. Within each conversation, all of the participating nodes are stored in a flat format.
I think a structure like this would address most of the performance issues that we face now. The .grove directory (like a .git folder) is for us to store information like version numbers. This will allow us to iterate on how nodes are stored in a backwards-compatible way. The example file .grove/version would tell the Grove instance that opened it which schema was in use.
I'm happy to discuss alternative filesystem layouts though, if anyone has ideas that they would like to explore.