It's still not clear whether we should go for sender-side, server-side or client-side generated previews. Each has its benefits and drawbacks in terms of privacy.
- Sender-side: needs standardization and client-tag support on the IRCd side. Sender can send fake previews.
- Server-side: all URLs get sent to the server (unless we do the same as Signal, but this has abuse potential). Can be used by the sender to track when a user sees a message.
- Client-side: leaks the client IP, can be used by the sender to track when a user sees a message.
In any case, I think it's fine to experiment as long as all of this functionality is opt-in, gated behind a user setting.
If we go for client-side, we'll need an HTML tokenizer. It doesn't seem like there's a good one we could use (the standard one isn't exposed, and I'm not sure it would suit our use-case). It shouldn't be too hard to port the Go tokenizer.
package:html has a lenient HTML parser, so we can use it to parse a truncated HTML document. It doesn't seem like there is a way to abort an HTTP request though.
Also ref OpenGraph.
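To make the "parse a truncated document" idea concrete, here's a rough sketch in Python using only the standard library's lenient html.parser (not package:html, so take it as an illustration of the approach rather than code we'd ship). The names OpenGraphParser, extract_opengraph and max_bytes are made up for this example, and the chunks iterable just stands in for an HTTP response body. It feeds the body piece by piece, collects og:* meta properties, and stops once it has seen </head> or hit the byte budget.

```python
import codecs
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags seen before </head>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.properties = {}
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value if a property is repeated.
            self.properties.setdefault(prop, content)

    def handle_endtag(self, tag):
        if tag == "head":
            # Preview metadata lives in <head>; ignore the rest of the page.
            self.done = True


def extract_opengraph(chunks, max_bytes=64 * 1024):
    """Feed byte chunks to the parser until </head> or max_bytes is reached.

    Assumption: the page is UTF-8 (invalid bytes are replaced, not fatal).
    """
    parser = OpenGraphParser()
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    seen = 0
    for chunk in chunks:
        remaining = max_bytes - seen
        if remaining <= 0 or parser.done:
            break
        chunk = chunk[:remaining]
        seen += len(chunk)
        parser.feed(decoder.decode(chunk))
    # close() is deliberately not called: a tag cut off by truncation is
    # simply left unparsed in the parser's buffer.
    return parser.properties


if __name__ == "__main__":
    body = [
        b'<html><head><meta property="og:title" content="Hel',
        b'lo"><meta property="og:image" content="https://example.org/a.png">',
        b"</head><body><p>truncated he",
    ]
    print(extract_opengraph(body))
```

Leaving the loop early is the point where a client would want to drop the connection. Whether the HTTP library actually lets us do that is a separate question from the parsing.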