~whereswaldon/arbor-dev#84: 
Figure out how to implement online/away/dnd status

It is often useful in chat systems to be able to tell which other users are currently online or have been online recently. It'd be awesome for arbor to be able to do that, but how should it be implemented? Arbor's conversation history is designed to be durable, but data like this is transient. Should it still be implemented as nodes in the tree? Should it be a new node type? Should it be in the protocol layer? If so, how do we handle the case where users are connected over different protocols?

Status
REPORTED
Submitter
~whereswaldon
Assigned to
Submitted
9 months ago
Updated
7 days ago
Labels
feature help-wanted specification

~athorp96 9 months ago

I imagine this would have to work as part of the protocol layer. What are your thoughts on adding a user ID to the subscribe and unsubscribe messages?

~whereswaldon 9 months ago

Hmm. I think that's an interesting approach. I like the concept, but we need to work out these details:

  • How would the "online-ness" propagate between relays? If I subscribe to my local relay as my identity, and it then subscribes as "me" to the one running on arbor.chat, does arbor.chat then subscribe to every single peer as "me"? Or what exactly does it mean to subscribe as a user?
  • How would we authenticate that it really was the user that it claims to be? How do we stop me from perpetually making you appear to be online? It seems like the only way right now would be to use the pubkey of an identity to sign something that proves online-ness. Thus far, sprout has managed to avoid actually processing PGP data, but perhaps it's inevitable that we need to do that eventually.

Perhaps this is simply blocked on the fact that the sprout protocol currently has no way to authenticate the user. Every relay is anonymous. This definitely needs to change long-term in order to facilitate exchanging non-public messages (like DMs or private groups).

Another approach would be to introduce a new node type (called Temp) for now. Temp nodes are similar to Reply nodes except that they:

  • Have an ExpiresAt field indicating how long they are relevant. After this time, all peers should delete them.
  • Maybe don't have a Content field?

To announce that you are online in a community, you create a Temp node that is a child of the community and is valid for the next 5 minutes. In the metadata, you indicate that it's an advertisement that you're "online". After 5 minutes you either make another one (to continue broadcasting your online status), or you start showing as offline to your peers. You can cancel your online status earlier than the ExpiresAt field by writing a new Temp node (perhaps as a child of the one that hasn't expired yet) whose metadata indicates that you're cancelling your online status.
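
As a rough illustration of the Temp node idea, here is a minimal Go sketch. The TempNode type, its fields, and the "online"/"cancel" kinds are assumptions made up for this example; they are not part of the forest today, and signing is left out entirely.

```go
package main

import (
	"fmt"
	"time"
)

// TempNode is a hypothetical, simplified stand-in for the proposed node type.
// Real forest nodes are signed and carry more fields; none of that is modeled.
type TempNode struct {
	Author    string    // identity that published the node
	Parent    string    // community ID, or the ID of a Temp node being cancelled
	Kind      string    // "online" or "cancel" (an assumed metadata convention)
	ExpiresAt time.Time // from this instant on, peers should delete the node
}

// Expired reports whether the node is no longer relevant at time now.
func (n TempNode) Expired(now time.Time) bool {
	return !now.Before(n.ExpiresAt)
}

// NewOnline builds an "online" advertisement that is valid for ttl after now.
func NewOnline(author, community string, now time.Time, ttl time.Duration) TempNode {
	return TempNode{Author: author, Parent: community, Kind: "online", ExpiresAt: now.Add(ttl)}
}

// Fixed demo values so the example is deterministic.
var (
	demoStart = time.Date(2020, 1, 1, 12, 0, 0, 0, time.UTC)
	demoTTL   = 5 * time.Minute
)

func main() {
	n := NewOnline("alice", "community-1", demoStart, demoTTL)
	fmt.Println(n.Expired(demoStart.Add(4 * time.Minute))) // false: still valid
	fmt.Println(n.Expired(demoStart.Add(5 * time.Minute))) // true: deadline reached
}
```

A client would publish a fresh node shortly before the previous one expires to keep showing as online.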

Because the nodes still have to be signed by their author, they'd be authenticated by default. This would require periodic usage of the signing key, which might keep the key's passphrase cached in GPG indefinitely. That could have security implications... Additionally, relays would have to keep a list of things that should be deleted in the future, so it complicates their implementation. Perhaps it's still worth exploring?

To sum up, the two plausible approaches seem to be:

  • extend sprout to carry status info (requires some kind of protocol authentication)
  • extend the forest to support nodes with a short-term lifespan

I'm happy to discuss either in more depth or to talk through any other approaches.

~athorp96 9 months ago

I see the forest nodes sending a heartbeat throughout the network. GPG keys being cached is definitely an issue though. Perhaps the caching time could be enforced in the application, or a certificate could be granted to a client that verifies their identity?

~athorp96 9 months ago

To clarify, the heartbeat makes the most sense to me. Getting around the GPG caching issue could be accomplished if we flushed the cache manually after 5 minutes without sending a message. Alternatively, maybe a certificate could be granted to the client, who uses the certificate to sign the heartbeat?

~whereswaldon 9 months ago

Just checking, when you say "heartbeat" are you referring to extending the Forest to support expiring nodes?

As for the GPG caching, it's probably going to be a thorny problem no matter what we do. Re-typing your (hopefully long) passphrase frequently is poor UX, and it technically gives an adversary more opportunities to observe you entering your passphrase (depending on your threat model, of course). The converse is that your key stays available in memory and can be used by any local process (or maybe only processes on the same TTY? Not sure).

Regardless, I feel like retyping my passphrase every 5 minutes would get old really fast, but I'm not especially comfortable with the idea of leaving my key unlocked for long periods of time. Other than a periodic cache flush, can you think of other options?

You've mentioned the idea of granting a certificate to "the client"? Could you elaborate on that? In this context, what is the "client"? What would the certificate mean, and how would it be verified? I'm not looking for an engineering spec or anything, but just a more detailed description of your thoughts.

~athorp96 9 months ago

When I say heartbeat, I mean a periodic ping from the client (the instance of the TUI) to the network. The ping serves the sole purpose of telling the network that the user (or client) is still active.

I think we can work around the cache refresh problem by keeping a timer that is reset every time a message is sent. If the timer expires, the GPG cache is dumped. That way we can continue to sign heartbeat messages without worrying about the cache persisting indefinitely.
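
A minimal sketch of that timer in Go, assuming the client can be told to drop the cached key through some callback. IdleFlusher and the flush hook are hypothetical names; how the cache is actually cleared is not modeled here. Heartbeats call CanSign but never reset the timer; only real messages do.

```go
package main

import (
	"fmt"
	"time"
)

// IdleFlusher tracks when the user last sent a message and decides when the
// cached signing key should be dropped.
type IdleFlusher struct {
	timeout  time.Duration
	lastSent time.Time
	flushed  bool
	flush    func() // hypothetical hook that clears the cached key
}

func NewIdleFlusher(timeout time.Duration, now time.Time, flush func()) *IdleFlusher {
	return &IdleFlusher{timeout: timeout, lastSent: now, flush: flush}
}

// MessageSent resets the idle timer. Call it for user messages only.
func (f *IdleFlusher) MessageSent(now time.Time) {
	f.lastSent = now
	f.flushed = false
}

// CanSign is checked before signing a heartbeat. Once the user has been idle
// for the full timeout it triggers a single flush and reports false.
func (f *IdleFlusher) CanSign(now time.Time) bool {
	if now.Sub(f.lastSent) < f.timeout {
		return true
	}
	if !f.flushed {
		f.flush()
		f.flushed = true
	}
	return false
}

func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }

var demoStart = time.Date(2020, 1, 1, 12, 0, 0, 0, time.UTC)

func main() {
	flushes := 0
	f := NewIdleFlusher(minutes(5), demoStart, func() { flushes++ })
	fmt.Println(f.CanSign(demoStart.Add(minutes(4)))) // true
	f.MessageSent(demoStart.Add(minutes(4)))
	fmt.Println(f.CanSign(demoStart.Add(minutes(8)))) // true: 4 minutes idle
	fmt.Println(f.CanSign(demoStart.Add(minutes(9)))) // false: 5 minutes idle
	fmt.Println(flushes)                              // 1
}
```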

A certificate isn't the best word. But if a relay or server asymmetrically encrypted a gpg key and sent it to a client, the client could use that key to sign the heartbeat without refreshing the cache-time of the user's key (so the user's key would not be indefinitely cached).

Now that I think about it, a secondary "heartbeat key" doesn't necessarily need to be issued by the relay or server. So long as it is signed by the user's key, the heartbeat key could be used to verify the heartbeat. This key could be created by the client in code. If it is a session key, it could be created and signed upon starting the TUI. If the key is to persist across sessions, it could be created once and the key's passphrase could be encrypted using the user's gpg key.

~whereswaldon 9 months ago

I really like the idea of creating a session subkey that is used to sign status messages. I think that is an elegant solution that will prevent keeping important keys unlocked indefinitely while also allowing the degree of flexibility that we need.

As for the heartbeat, I'm trying to understand whether you're talking about creating an actual node in the forest data structure (either a new kind of node or an existing node with metadata) or adding something to the protocol layer only.

~athorp96 9 months ago

I see a heartbeat as a protocol layer message sent from the client to the network that is signed by some session key.

~whereswaldon 9 months ago

Okay, cool. That sounds like a neat way to authenticate a session too. We could just derive a session key that is bound to some identity. That session key can only be used to authenticate at the protocol layer, not to sign any nodes. Messages signed by that session key could travel between relays and still be authenticated by virtue of the session key being signed by the identity. I like it.
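
To make the chain of trust concrete, here is a small Go sketch. It uses Ed25519 from the standard library as a stand-in for PGP keys, which is a deliberate simplification; SessionCert, Heartbeat, and the function names are all hypothetical and not part of sprout.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// SessionCert binds a session public key to an identity: the identity key
// signs the session public key once, when the client starts.
type SessionCert struct {
	SessionPub ed25519.PublicKey
	Sig        []byte // signature by the identity key over SessionPub
}

// Heartbeat is a hypothetical protocol-layer status message.
type Heartbeat struct {
	Payload []byte      // for example "online"
	Sig     []byte      // signature by the session key over Payload
	Cert    SessionCert // lets any relay check the chain back to the identity
}

// NewSession creates a fresh session key and certifies it with the identity key.
func NewSession(identityPriv ed25519.PrivateKey) (SessionCert, ed25519.PrivateKey, error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return SessionCert{}, nil, err
	}
	return SessionCert{SessionPub: pub, Sig: ed25519.Sign(identityPriv, pub)}, priv, nil
}

// SignHeartbeat signs payload with the session key only.
func SignHeartbeat(sessionPriv ed25519.PrivateKey, cert SessionCert, payload []byte) Heartbeat {
	return Heartbeat{Payload: payload, Sig: ed25519.Sign(sessionPriv, payload), Cert: cert}
}

// Verify checks both links: identity -> session key -> payload.
func Verify(identityPub ed25519.PublicKey, hb Heartbeat) bool {
	if len(identityPub) != ed25519.PublicKeySize || len(hb.Cert.SessionPub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(identityPub, hb.Cert.SessionPub, hb.Cert.Sig) &&
		ed25519.Verify(hb.Cert.SessionPub, hb.Payload, hb.Sig)
}

// newIdentity generates a throwaway identity key pair for the demo.
func newIdentity() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

func main() {
	idPub, idPriv := newIdentity()
	cert, sessionPriv, err := NewSession(idPriv)
	if err != nil {
		panic(err)
	}
	hb := SignHeartbeat(sessionPriv, cert, []byte("online"))
	fmt.Println(Verify(idPub, hb)) // true
	hb.Payload = []byte("away")
	fmt.Println(Verify(idPub, hb)) // false: payload no longer matches the signature
}
```

The identity key is only needed once per session. A real design would probably also want an expiry on the certificate and some way to mark it as valid for status messages only, neither of which is shown.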

I think there is a tradeoff here though. Right now, clients like wisteria know absolutely nothing about the protocol (well, technically wisteria has some super-buggy protocol support, but we don't use it). Implementing this would require either making every client implement the protocol directly (which complicates implementing new clients) or defining some new way for this information to be broadcast to a running client using the existing convention of watching the file system.

Perhaps we could broadcast this using a special directory in the grove (or somewhere) for session data. We could write the signed announcements there, and clients could pick up on that and choose how they handle those files on a client-by-client basis. We'd just need to establish some conventions about the semantics of different directories within a grove.
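
Roughly, the on-disk convention could look like the Go sketch below. The "sessions" directory name, the one-file-per-identity layout, and both function names are assumptions for illustration, and identity IDs are assumed to be safe to use as file names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

const sessionsDir = "sessions" // assumed directory name inside the grove

// WriteAnnouncement stores the latest signed announcement for one identity.
// It writes to a temporary file and renames it, so a watcher never sees a
// partially written announcement under its final name.
func WriteAnnouncement(grove, identityID string, signed []byte) error {
	dir := filepath.Join(grove, sessionsDir)
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return err
	}
	final := filepath.Join(dir, identityID)
	tmp := final + ".tmp"
	if err := os.WriteFile(tmp, signed, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, final)
}

// ReadAnnouncements returns the stored announcement for each identity ID.
func ReadAnnouncements(grove string) (map[string][]byte, error) {
	dir := filepath.Join(grove, sessionsDir)
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	out := make(map[string][]byte)
	for _, e := range entries {
		// Skip subdirectories and in-progress temporary files.
		if e.IsDir() || strings.HasSuffix(e.Name(), ".tmp") {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		out[e.Name()] = data
	}
	return out, nil
}

// tempGrove creates a throwaway directory standing in for a grove.
func tempGrove() string {
	dir, err := os.MkdirTemp("", "grove-example-")
	if err != nil {
		panic(err)
	}
	return dir
}

func main() {
	grove := tempGrove()
	defer os.RemoveAll(grove)
	if err := WriteAnnouncement(grove, "alice", []byte("online")); err != nil {
		panic(err)
	}
	found, err := ReadAnnouncements(grove)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", found["alice"]) // online
}
```

A client that already watches the file system for new nodes could watch this directory the same way and verify each file's signature before trusting it.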

Thoughts?

~athorp96 9 months ago

When you say a special directory in a node, do you mean a root node in the network?

~whereswaldon 9 months ago

I mean a special "sessions" folder (or something) in the Grove hierarchy where forest nodes are stored.

When you say "root node in the network", do you mean a root node in the forest (identity/community) or a peer in the network of relays?

~athorp96 9 months ago

Yeah, I think I've been using the wrong nomenclature. I believe I have been using "network" to describe the network of relays and clients, as well as the forest. In this context I was referring to a separate community in the forest, but perhaps it would be best to keep them in a physically different location on disk.

~whereswaldon 9 months ago

Yeah, I think maybe we've been talking past one another because of some nomenclature things. I'm working on some documentation that will hopefully help us all use the same words to refer to the same ideas in the future, but it's not quite done.

Earlier, you said:

"I see a heartbeat as a protocol layer message sent from the client to the network that is signed by some session key."

So I may have misinterpreted you here. I read "protocol layer message" as "a piece of data only present with the sprout protocol", i.e. something that does not exist within the Forest data structure. I can see an argument for doing it this way, as it prevents temporary data from existing within the forest (and the rest of the forest is permanent by design). Under this implementation, your local client or relay would derive a special session key signed by your identity key and use that to send sprout-layer announcements about your status that could be verified by peers. However, such information would only exist within sprout, so clients using other protocols or relying on changes in the on-disk forest data wouldn't be able to detect the updates to status. That's why I was proposing adding a directory in the standard on-disk format (the grove) that held this data instead of normal forest nodes.

As an alternative approach, we could create actual nodes in the forest that advertised this data. They would need to be temporary in nature (otherwise they would rapidly fill your hard disk), but they could be exchanged using the same mechanisms that we already have (and they could be validated by the same mechanisms as well). Doing it this way would have the advantage of not modifying the protocol to accommodate the new kind of Forest node. Since they would be proper nodes, clients that didn't speak sprout could still discover them on disk in the grove using the mechanisms that already exist.

I think the first option shows an interesting mechanism for generally adding authentication to the sprout protocol and using it to exchange realtime status info. The second option requires less systemic change and seems to have broader compatibility (doesn't require everyone to use sprout).

I hope that I described those well enough for them to make sense (writing is hard). Please let me know if I was too ambiguous. If the difference and tradeoffs are clear to you, what do you think about the two approaches?

~athorp96 9 months ago

That makes everything a lot clearer. Thanks for taking the time to go through it all. I don't believe I understood what you meant by node before; I thought when you said node you were referring to a peer, client, or relay (I am not a smart man). The docs and your reply cleared things up for me, I believe.

I think sending a new node to the forest would be ideal since, like you said, it would involve the least change. Perhaps these nodes could contain the status of an identity or the most recent time of being signed. One way to do this could be to have a mutable node in the grove that contains my identity hash and an active status. Upon starting my client, it changes the status to "active" and re-sends the node to the grove. Upon exiting, the client changes the status to "inactive" and re-sends the node to the grove. This, however, would require mutable nodes, which I am not sure are possible at the moment.

~whereswaldon 9 months ago

Mutable nodes are probably impossible (they can't have a stable hash as their ID), but temporary nodes are (I think) effectively the same thing. If we had nodes that were:

  1. Unable to be replied to, and
  2. Automatically invalidated (and destroyed) at some deadline

then we could use those nodes to broadcast a pseudo-live status. It's important that you can't reply to a node like this because otherwise the ancestry of some node would automatically break when its parent was automatically destroyed.

~whereswaldon 9 months ago*

I was a little rushed when I wrote that earlier. To elaborate, we can use a stream of short-lived status nodes instead of a single mutable one. We could also think about adding a construct analogous to a branch pointer in git or IPNS that we could use to add a movable human-readable name to a specific node. Then /status could be updated constantly to point to the latest version of a user's status. However, that adds a lot of complexity that I don't think we need right now.

For the short term, I propose:

  1. Either:

    • a. create a new kind of node in the forest that is temporary, or
    • b. establish a form of node metadata that indicates that the node can be discarded at some future date
  2. Use the nodes that 1 enables to release a constant stream of short-lived (perhaps 5 minutes as a first default) status nodes while a user is in wisteria. When a user wants to disconnect, publish a short-lived node indicating that the previous status message is obsolete ahead of its expiry.

  3. Consider creating some kind of key hierarchy to enable signing these status messages with a less-privileged key. Right now, identity nodes cannot have any child nodes, but perhaps there's room here for another kind of node (sub-identity?) that could be a child of an identity and be used to sign less-important automatic data like user status. Factors to consider:

    • a. Right now, the fact that we know identities don't exist in a hierarchy means that we know that it will only take 1 node to validate the author of any random node. We can estimate the worst-case number of nodes needed to validate any forest node by using its TreeDepth*2 (assuming every node in its ancestry was created by a unique author). If identities existed within their own hierarchy, we could need n nodes just to validate the author of a node. It becomes challenging to evaluate the worst-case number of nodes required to validate a new node. I think this property is important because it allows clients and relays to evaluate the amount of computational effort required to validate a new node. In the future, overtaxed relays may use this information to decline propagating a node that will require tremendous effort to validate, which can help close off a DoS avenue.
    • b. Also, creating an identity hierarchy would either require a new node type or we'd need to change the rules so that an identity can be a child of another identity.
    • c. Another way to do this would be to put a less-privileged key into the metadata of an identity node. This would allow the single node to carry all of the necessary data, though it does require a huge number of metadata bytes.
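
To illustrate item 2 above, here is a minimal Go sketch of how a client might decide whether someone is online from a stream of short-lived status nodes, including the early "obsolete" marker. StatusNode and its fields are hypothetical, and signature checks are assumed to have happened already.

```go
package main

import (
	"fmt"
	"time"
)

// StatusNode is a hypothetical short-lived node. A node with Cancels set
// marks an earlier status node by the same author as obsolete.
type StatusNode struct {
	ID        string
	Author    string
	Cancels   string // ID of the status node made obsolete; empty otherwise
	CreatedAt time.Time
	ExpiresAt time.Time
}

// Online reports whether author should be shown as online at time now.
func Online(nodes []StatusNode, author string, now time.Time) bool {
	// Collect cancellations by this author that already exist at time now.
	// Only the author may cancel their own status.
	cancelled := make(map[string]bool)
	for _, n := range nodes {
		if n.Author == author && n.Cancels != "" && !now.Before(n.CreatedAt) {
			cancelled[n.Cancels] = true
		}
	}
	// Any live, uncancelled status node means the user is online.
	for _, n := range nodes {
		if n.Author != author || n.Cancels != "" || cancelled[n.ID] {
			continue
		}
		if !now.Before(n.CreatedAt) && now.Before(n.ExpiresAt) {
			return true
		}
	}
	return false
}

var demoStart = time.Date(2020, 1, 1, 12, 0, 0, 0, time.UTC)

// at returns the demo start time plus the given number of minutes.
func at(min int) time.Time { return demoStart.Add(time.Duration(min) * time.Minute) }

func main() {
	nodes := []StatusNode{
		{ID: "s1", Author: "alice", CreatedAt: at(0), ExpiresAt: at(5)},
		{ID: "s2", Author: "alice", CreatedAt: at(5), ExpiresAt: at(10)},
		{ID: "c1", Author: "alice", Cancels: "s2", CreatedAt: at(7), ExpiresAt: at(10)},
	}
	fmt.Println(Online(nodes, "alice", at(6)))  // true: s2 is live, not yet cancelled
	fmt.Println(Online(nodes, "alice", at(8)))  // false: s2 was cancelled at minute 7
	fmt.Println(Online(nodes, "alice", at(11))) // false: everything has expired
}
```

Note that the cancelling node should live at least as long as the node it cancels, otherwise the status would reappear once the cancellation is deleted.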

Do any of the above approaches seem especially good or bad?

~athorp96 8 months ago

Addressing proposition 1, I think B would be powerful. The ability to mark a node as obsolete could have a lot of use cases, and the way a client may handle that metadata could be configurable. You could use it to change identities, update messages using diffs, as well as maintain communities of metadata like active status.

~whereswaldon 8 months ago

~athorp96 We talked about this outside of this ticket a little, and you elaborated more on the idea of maintaining communities purely for metadata. Essentially, my understanding of the proposal was to have either a community for all user metadata or a special metadata community corresponding to each existing community. These metadata communities would only contain temporary Reply-style nodes that expired rapidly to indicate user statuses.

I think that approach may be worth exploring, but I wonder whether adding the extra community is really strictly necessary. Would you be open to trying out adding status replies to the existing community? It seems like the approach that we can try the fastest. Implementing it requires:

  • specifying a node data extension in the metadata field that adds an expiry time
  • specifying a node data extension that forbids generating replies to a node
  • specifying a node data extension that advertises online status
  • teaching clients like wisteria how to interpret these status messages
  • teaching clients how to generate these status messages
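
As a strawman for the three extensions, here is a Go sketch of what the metadata could carry. The field names are invented, and JSON is used purely for illustration; the actual metadata encoding is not decided in this ticket.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// StatusMetadata is a hypothetical shape for the three extensions above.
type StatusMetadata struct {
	ExpiresAt time.Time `json:"expires_at"` // peers may discard the node from this time on
	NoReply   bool      `json:"no_reply"`   // replies to this node are invalid
	Status    string    `json:"status"`     // for example "online", "away", "dnd"
}

// Encode serializes the metadata for storage in a node's metadata field.
func Encode(m StatusMetadata) ([]byte, error) { return json.Marshal(m) }

// Decode parses metadata produced by Encode.
func Decode(raw []byte) (StatusMetadata, error) {
	var m StatusMetadata
	err := json.Unmarshal(raw, &m)
	return m, err
}

// Expired reports whether a node with this metadata can be discarded at now.
func (m StatusMetadata) Expired(now time.Time) bool { return !now.Before(m.ExpiresAt) }

var demoExpiry = time.Date(2020, 1, 1, 12, 5, 0, 0, time.UTC)

func main() {
	raw, err := Encode(StatusMetadata{ExpiresAt: demoExpiry, NoReply: true, Status: "online"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
	m, err := Decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Status, m.NoReply, m.Expired(demoExpiry)) // online true true
}
```

A client that doesn't understand these fields would simply render the node as an ordinary reply, which is worth keeping in mind when choosing what goes in the content.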

Such an approach leaves these problems unsolved:

  • your gpg keyring for your primary identity would still be used to sign those messages. Future work on less-privileged keys could address this.
  • the sprout protocol itself is still unauthenticated. Future work (perhaps involving session keys, tbd) could fix this.

Thoughts?

~athorp96 8 months ago

I can work on this some time. I just started a new job so time might be sparse but I can get started on it.

~athorp96 a month ago

For posterity, this was the result of a recent discussion on how this may be implemented using a hypothetical active-status API: https://pad.nixnet.services/HA2bSGjmSKyUaXu4T1jwuA?both

~athorp96 9 days ago

~whereswaldon, I've been thinking about the best way to remove the nodes from old status messages and it occurred to me: should we modify the relay to not transmit expired messages? Consider a user being online for an hour, accumulating a number of status messages in the forest. At any point in time, only one node in the set should have any meaning; it's of no use to transmit messages that are not relevant, right? This solves only part of the problem; when expired nodes are evicted client-side, they don't get infinitely re-sent to the client as they'll have already expired.

We will still need to extend some interface to allow for the deletion of nodes from an archive. I'm still not sure where the best place for that is yet.

~whereswaldon 7 days ago

~athorp96: Yes, we should modify the relay not to transmit expired messages. As you correctly note, this only solves half of the problem, but it's a start.
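
The relay-side filter could be as small as the Go sketch below. The Node type is a simplified placeholder, and treating a zero ExpiresAt as "never expires" is an assumption so that ordinary replies pass through untouched.

```go
package main

import (
	"fmt"
	"time"
)

// Node is a simplified placeholder for a forest node.
type Node struct {
	ID        string
	ExpiresAt time.Time // zero value means the node never expires (assumed convention)
}

// Unexpired returns the nodes a relay should still transmit at time now,
// preserving their original order.
func Unexpired(nodes []Node, now time.Time) []Node {
	kept := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if n.ExpiresAt.IsZero() || now.Before(n.ExpiresAt) {
			kept = append(kept, n)
		}
	}
	return kept
}

var demoNow = time.Date(2020, 1, 1, 13, 0, 0, 0, time.UTC)

var demoNodes = []Node{
	{ID: "reply-1"},
	{ID: "status-old", ExpiresAt: demoNow.Add(-10 * time.Minute)},
	{ID: "status-new", ExpiresAt: demoNow.Add(3 * time.Minute)},
}

func main() {
	for _, n := range Unexpired(demoNodes, demoNow) {
		fmt.Println(n.ID) // prints reply-1, then status-new
	}
}
```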

I think we need to extend the forest.Store interface with a Remove(id *fields.QualifiedHash) error method. We will then need to implement it on every implementation of the Store interface, though everything but the Grove type should be trivial.

~whereswaldon 7 days ago

I should mention that removing a node should also remove all of its descendants, since it becomes impossible to validate them without it.
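
Here is a toy Go sketch of that cascading behavior. It is not the real forest.Store: MemoryStore is hypothetical, IDs are plain strings instead of *fields.QualifiedHash, and treating removal of an unknown ID as a no-op is an assumption.

```go
package main

import "fmt"

// Node is a simplified placeholder for a forest node.
type Node struct {
	ID     string
	Parent string // empty for a root node
}

// MemoryStore is a toy in-memory store used only for this illustration.
type MemoryStore struct {
	nodes map[string]Node
}

func NewMemoryStore() *MemoryStore { return &MemoryStore{nodes: make(map[string]Node)} }

// Add inserts or overwrites a node.
func (s *MemoryStore) Add(n Node) { s.nodes[n.ID] = n }

// Has reports whether a node with the given ID is stored.
func (s *MemoryStore) Has(id string) bool {
	_, ok := s.nodes[id]
	return ok
}

// Remove deletes the node and, recursively, every descendant, since the
// descendants can no longer be validated without it.
func (s *MemoryStore) Remove(id string) error {
	if !s.Has(id) {
		return nil // assumed: removing a missing node is not an error
	}
	// Collect direct children before deleting anything.
	var children []string
	for childID, n := range s.nodes {
		if n.Parent == id {
			children = append(children, childID)
		}
	}
	delete(s.nodes, id)
	for _, childID := range children {
		if err := s.Remove(childID); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	s := NewMemoryStore()
	s.Add(Node{ID: "community"})
	s.Add(Node{ID: "status", Parent: "community"})
	s.Add(Node{ID: "stray-reply", Parent: "status"})
	s.Add(Node{ID: "chat", Parent: "community"})
	if err := s.Remove("status"); err != nil {
		panic(err)
	}
	fmt.Println(s.Has("status"), s.Has("stray-reply"), s.Has("chat")) // false false true
}
```

Scanning the whole map for children is fine for a toy; an on-disk implementation like the Grove would presumably want an index from parent to children.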
