~whereswaldon/arbor-dev#123: 
Let users create 'profiles' with metadata about themselves

General concept: It'd be useful to be able to attach information about yourself to your identity as Arbor grows, particularly personal information which others might find useful: rough geographic location/time zone, where you work, any interests, your pronouns, a real name, perhaps even known key aliases, etc. I'm going to refer to these snippets of user-supplied metadata as "tags".

Proposal: It should be possible to attach arbitrary metadata to your identity. Theoretically this information could just be posted and searched for, but we don't have searching in any client and that's a bad UX regardless. This tagging technique is commonly used in Discord servers in a limited form, even though Discord servers have universal search. In the Raleigh server, for example, people can be tagged with their location (Raleigh, Durham, or surrounding areas). In plenty of servers you're invited to tag yourself with any relevant pronouns. In most servers I've seen you're free to tag yourself with interests, and people can often mention that interest (think "Hey @tabletop_enthusiasts. Does anyone want to start a new campaign this weekend?"). The concept of @'ing certain tags is out of scope, but theoretically possible.

Changes required:

  1. Identity nodes must be leaves in the current scheme. I propose that identity nodes should allow children with a maximum depth of 1 consisting of nodes of 2 new types: Tag and DeleteTag.
  2. Currently the forest is immutable. In certain cases (Tag nodes), it will be possible to delete nodes with the addition of a DeleteTag node.

Rationale for 1: Adding the new node types and attaching them to the Identity is the only sensible way I can see to handle this. Consider the alternatives:

We could set this information in the normal message tree. However, in order to assemble a profile, the client would have to replay all of chat history to find the most recent profile info for a user, which is completely infeasible if communities can last for years.

We could add metadata to the Identity at creation time, but this would require a new Identity every time the metadata is updated (either a correction, more metadata being added, or metadata being deleted.)

The best solution I can think of is to attach nodes to the Identity of the form (in terms of relevant data) struct Tag {string key, string value} where both are freeform, user-defined text. These Tags are always children of an Identity node and they are always leaf nodes. In order to facilitate deletes and updates we'll also add a DeleteTag of the form struct DeleteTag {messageid id}. These are also hung directly off of an Identity, can have no children, and their messageid must refer at least temporarily to a Tag node on the same Identity. Temporarily because after the DeleteTag is stored on the Identity tree, the tag it refers to should be deleted. Updating a piece of metadata is just a delete and a create. Conceptually all of the nodes are immutable, they never change, but they may no longer exist.
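The two node shapes above, and how a client might assemble a profile from them, can be sketched roughly as follows. This is only an illustration and not forest-go code: `NodeID` stands in for the proposal's `messageid`, the `Identity` field stands in for the parent link, and the `Profile` helper is hypothetical. It also assumes that if two live Tags share a key, the later one in the slice wins, which the proposal does not specify.

```go
package main

import "fmt"

// NodeID stands in for the proposal's "messageid". A real forest node ID
// would be a content hash; a string is used here only for illustration.
type NodeID string

// Tag is a freeform key/value pair hung directly off an Identity node.
type Tag struct {
	ID       NodeID
	Identity NodeID // parent Identity (assumed field name)
	Key      string
	Value    string
}

// DeleteTag names a Tag on the same Identity that should be removed.
type DeleteTag struct {
	ID       NodeID
	Identity NodeID // parent Identity (assumed field name)
	Target   NodeID // the Tag being deleted
}

// Profile returns the live key/value pairs for one identity: every Tag
// attached to it that is not the target of a DeleteTag on that identity.
// Assumption: when two live Tags share a key, the later one in tags wins.
func Profile(identity NodeID, tags []Tag, deletes []DeleteTag) map[string]string {
	deleted := make(map[NodeID]bool, len(deletes))
	for _, d := range deletes {
		if d.Identity == identity {
			deleted[d.Target] = true
		}
	}
	profile := make(map[string]string)
	for _, t := range tags {
		if t.Identity != identity || deleted[t.ID] {
			continue
		}
		profile[t.Key] = t.Value
	}
	return profile
}

func main() {
	alice := NodeID("alice")
	tags := []Tag{
		{ID: "t1", Identity: alice, Key: "timezone", Value: "US/Eastern"},
		{ID: "t2", Identity: alice, Key: "pronouns", Value: "they/them"},
		// An update is a delete of t1 plus a new Tag with the same key.
		{ID: "t3", Identity: alice, Key: "timezone", Value: "US/Pacific"},
	}
	deletes := []DeleteTag{{ID: "d1", Identity: alice, Target: "t1"}}
	fmt.Println(Profile(alice, tags, deletes))
}
```

Because only the direct children of one Identity are scanned, assembling a profile never requires replaying the message tree.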

Rationale for 2: Deleting nodes, rather than creating a new Identity on every change or leaving old nodes around but somehow marked as obsolete, requires changes to the tree requirements. I think that this particular change is worthwhile. The main reason is that communities change; what is safe to reveal to the community right now may no longer be safe to reveal once the community has grown for another 5 years. Retracting that information wouldn't be possible without deleting the data (obviously) or, with the identity metadata idea, deleting Identities to create new ones, which breaks authorship.

The use of a DeleteTag node rather than putting node deletion into Sprout is partially due to a lack of authentication within the Sprout protocol. Even if that were implemented, it leaves the problem of deletion propagation: if my work laptop has been off for a month and someone deleted the metadata they set 2 months ago, how does my work laptop learn about the deletions? Handling this in Sprout introduces state into the protocol which could instead be kept in the forest. With DeleteTag nodes, everyone in the community keeps a permanent record of all of the deletes. If a client receives a delete for a node it doesn't have, then it stores the delete and does nothing else; no harm done.
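A minimal sketch of the client-side handling described above, assuming a hypothetical in-memory `IdentityStore` (none of these names come from forest-go). Deletes are recorded permanently whether or not their target is present. Refusing a Tag that arrives after its delete is my own assumption about how a late-arriving node should be treated; the proposal only covers the delete arriving for a node the client doesn't have.

```go
package main

import "fmt"

// NodeID stands in for the proposal's "messageid" (illustrative only).
type NodeID string

// Tag is a key/value child of an Identity.
type Tag struct {
	ID         NodeID
	Key, Value string
}

// DeleteTag refers to the Tag that should be removed.
type DeleteTag struct {
	ID     NodeID
	Target NodeID
}

// IdentityStore holds the children of a single Identity node.
type IdentityStore struct {
	tags    map[NodeID]Tag
	deletes map[NodeID]DeleteTag // keyed by target Tag ID, never removed
}

// NewIdentityStore returns an empty store.
func NewIdentityStore() *IdentityStore {
	return &IdentityStore{
		tags:    make(map[NodeID]Tag),
		deletes: make(map[NodeID]DeleteTag),
	}
}

// AddDelete records the delete permanently and drops the target Tag if
// the store has it. If the target is unknown, only the record is kept.
func (s *IdentityStore) AddDelete(d DeleteTag) {
	s.deletes[d.Target] = d
	delete(s.tags, d.Target)
}

// AddTag stores the Tag and reports whether it was kept. A Tag whose
// delete was already seen is discarded (an assumption, see lead-in).
func (s *IdentityStore) AddTag(t Tag) bool {
	if _, gone := s.deletes[t.ID]; gone {
		return false
	}
	s.tags[t.ID] = t
	return true
}

// Has reports whether a live Tag with the given ID is stored.
func (s *IdentityStore) Has(id NodeID) bool {
	_, ok := s.tags[id]
	return ok
}

func main() {
	s := NewIdentityStore()

	// The laptop was offline: the delete arrives before the Tag it targets.
	s.AddDelete(DeleteTag{ID: "d1", Target: "t1"})
	kept := s.AddTag(Tag{ID: "t1", Key: "employer", Value: "Example Corp"})
	fmt.Println("late tag kept:", kept)

	// The usual order: Tag first, then its delete.
	s.AddTag(Tag{ID: "t2", Key: "timezone", Value: "US/Eastern"})
	s.AddDelete(DeleteTag{ID: "d2", Target: "t2"})
	fmt.Println("t2 still present:", s.Has("t2"))
}
```

Since the delete records live in the forest alongside the Identity, a client that was offline learns about them through the same sync path as any other node, in whatever order they arrive.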

Status
REPORTED
Submitter
~tekk
Assigned to
No-one
Submitted
2 years ago
Updated
2 years ago
Labels
discussion feature forest-go specification

~tekk referenced this from #129 2 years ago

~whereswaldon 2 years ago

I think that the points you raise around keeping the metadata out of the identity are really good. There are legitimate reasons to want to remove information about yourself from your identity node, and we can't destroy identity nodes without actually breaking history.

As to the specific implementation, I like the overall framework. Child "attribute" nodes that can be attached and later destroyed seem like a neat solution to me. It makes the state live within the forest (nice and cross-protocol), and it is eventually-consistent assuming that users ever come online. We can't stop malicious users from modifying their code to ignore the "deleteAttribute" (or whatever we call them) nodes, but we couldn't stop them from screenshotting the data either, so that's no real loss.

The devil (as always) is in the details. Specifically:

  1. What kind of node do we use for this? Do we create a new node type, or reuse the reply node type?
  2. Using one node per key-value pair makes each datum easy to revoke individually, but is tremendously inefficient in terms of the number of nodes and quantity of data created. The signature at the end of each node will dramatically outweigh the actual data in size. However, coalescing attributes into larger nodes complicates the process of destroying those attributes individually.
  3. What does the bootstrapping process look like after this change? Currently we acquire all known communities, then find the leaves of those communities, then find the ancestry of leaves we didn't already know about. With this in place, we'd also need to scan for updates to all known identities. Should we do that before or after the other steps? Does it matter? Is there any way to narrow how many identities need to be checked?
  4. Can we just store the metadata in the TWIG metadata field that already exists? We then don't need to implement/use a new key-value format, and we already have code for manipulating that.

Whatever we do here, I'd like to build a mechanism that we could reuse for community attributes. Communities also change, so it would be useful to be able to set descriptions, grant "permissions", and other things using a similar mechanism.

~whereswaldon referenced this from #165 2 years ago
