We should support channel casemapping. That probably means canonicalizing channel names somehow.
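Canonicalizing a channel name means folding it with the casemapping the server advertises in the CASEMAPPING ISUPPORT token. Below is a minimal Go sketch covering the three values ascii, rfc1459 and strict-rfc1459. It folds byte by byte so that non-ASCII UTF-8 bytes pass through untouched. The function names are hypothetical, not an existing API. rfc1459 here folds '^' into '~', following the ASCII layout where '~' is 0x20 above '^'. Only equality between folded names matters, so picking the other canonical direction would not change which names collide.

```go
package main

import "fmt"

// foldRange lowercases every byte from 'A' up to and including upper by
// adding 0x20. Bytes >= 0x80 (non-ASCII UTF-8) are left untouched.
func foldRange(name string, upper byte) string {
	b := []byte(name)
	for i, c := range b {
		if c >= 'A' && c <= upper {
			b[i] = c + ('a' - 'A')
		}
	}
	return string(b)
}

// casemapASCII: only 'A'-'Z' fold to 'a'-'z' (CASEMAPPING=ascii).
func casemapASCII(name string) string { return foldRange(name, 'Z') }

// casemapRFC1459: ascii plus '[', '\', ']', '^' folding to '{', '|', '}', '~'
// (CASEMAPPING=rfc1459).
func casemapRFC1459(name string) string { return foldRange(name, '^') }

// casemapStrictRFC1459: rfc1459 without the '^'/'~' pair
// (CASEMAPPING=strict-rfc1459).
func casemapStrictRFC1459(name string) string { return foldRange(name, ']') }

func main() {
	fmt.Println(casemapASCII("#Foo[]"))         // #foo[]
	fmt.Println(casemapRFC1459(`#Foo[]\^`))     // #foo{}|~
	fmt.Println(casemapStrictRFC1459(`#Foo]^`)) // #foo}^
}
```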
Example where this causes issues:
Also causes issues with history, where echo-messages are stored in #channel and the rest are stored in
2020-08-24 14:17:23 emersion i'm writing a modern irc client and i don't want to go out of my way to support case-folding
2020-08-24 14:17:39 emersion would it be reasonable to support a single hard-coded casefolding algorithm?
2020-08-24 14:18:26 @jwheare case folding for what?
2020-08-24 14:18:34 emersion channel names and nicknames
2020-08-24 14:19:03 @jwheare you don't need to. casemapping rfc or ascii doesn't involve unicode case folding
2020-08-24 14:19:55 emersion well, it'd be simpler for me to implement just unicode case-folding, and nothing else
2020-08-24 14:20:11 @jwheare ircds don't
2020-08-24 14:20:46 @jwheare you would potentially be preventing your users from accessing certain channel
2020-08-24 14:22:10 emersion hm, how so?
2020-08-24 14:22:28 emersion i'm starting to wonder whether case-mapping is worthwhile to implement at all
2020-08-24 14:24:28 @jwheare casemapping is necessary
2020-08-24 14:24:38 emersion i'm not clear on what "implement case-mapping" means it seems. i was under the impression that clients need to not case-map channel names when sending them to the server, but need to perform case-mapping when receiving state from servers
2020-08-24 14:24:40 hhirtz how so? because if you consider "#Σ-sigma" and "#σ-sigma", the users won't be able to join one when they've joined the other.
2020-08-24 14:24:48 hhirtz to be the same*
2020-08-24 14:24:57 emersion ah, right
2020-08-24 14:25:13 emersion why is case-mapping necessary?
2020-08-24 14:25:29 <-- eta (~eta@trainsplorer/developer/eta) has quit (Quit: we're here, we're queer, connection reset by peer)
2020-08-24 14:25:31 jess because servers do it so clients also have to
2020-08-24 14:26:14 emersion without case-mapping, all a client needs is that servers only use one form of the case-mapped channel name
2020-08-24 14:26:42 emersion well, kind of. my client doesn't need to know when a JOIN command succeeds
2020-08-24 14:27:25 <-- clokep (~Thunderbi@unaffiliated/clokep) has quit (Read error: Connection reset by peer)
2020-08-24 14:27:39 --> clokep (~Thunderbi@unaffiliated/clokep) has joined #ircv3
2020-08-24 14:27:51 jess so if you join #asd} and im also in there, and i send a message to #asd], some ircds will propagate the message but not convert the casing to what you're expecting
2020-08-24 14:28:24 jess so you'll miss the message
2020-08-24 14:29:09 <-- clokep (~Thunderbi@unaffiliated/clokep) has quit (Read error: Connection reset by peer)
2020-08-24 14:29:40 --> clokep (~Thunderbi@unaffiliated/clokep) has joined #ircv3
2020-08-24 14:30:03 emersion right
2020-08-24 14:30:57 jess not knowing that two nicks are equal or two channel names are equal when they are as far as the ircd is concerned can lead to odd desynchronization
2020-08-24 14:32:12 jess or if you're doing a wider fold than the ircd is, you might think two nicks are the same when they are not
2020-08-24 14:32:47 jess which would only be an issue on an ircd that permits nicknames outside of ascii or rfc1459 folding
2020-08-24 14:33:17 * hhirtz thought rfc1459 mapping was not used anymore, then he looked at freenode's 005
2020-08-24 14:33:37 emersion all right, i guess we'll need to implement the full thing then
2020-08-24 14:33:51 @jwheare it's basic
2020-08-24 14:33:54 emersion well
2020-08-24 14:33:56 @jwheare far easier than unicode case folding
2020-08-24 14:33:57 xPaw https://github.com/matrix-org/matrix-appservice-irc/blob/dcf572772a91e9b2e6f09cf36dee56b03defd276/src/irc/formatting.ts#L332 is thsi enough?
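jess's #asd} / #asd] scenario fits in a few lines of Go. This minimal sketch assumes the network advertises CASEMAPPING=rfc1459. It also assumes the client keeps its open buffers in a map keyed by casemapped name. `channelBuffers` and `deliver` are hypothetical names. The relayed target `#asd]` folds to the same key as `#asd}`, whereas an exact-string lookup misses it.

```go
package main

import "fmt"

// casemapRFC1459 folds 'A'..'^' (0x41-0x5E) onto 'a'..'~' (0x61-0x7E),
// byte by byte; everything else passes through.
func casemapRFC1459(name string) string {
	b := []byte(name)
	for i, c := range b {
		if c >= 'A' && c <= '^' {
			b[i] = c + ('a' - 'A')
		}
	}
	return string(b)
}

// channelBuffers is a hypothetical per-network table of open buffers,
// keyed by casemapped channel name.
type channelBuffers map[string][]string

// deliver appends text to the buffer for target if the client joined it.
// It returns false when no buffer matches, i.e. the message would be lost.
func (bufs channelBuffers) deliver(target, text string) bool {
	key := casemapRFC1459(target)
	if _, ok := bufs[key]; !ok {
		return false
	}
	bufs[key] = append(bufs[key], text)
	return true
}

func main() {
	bufs := channelBuffers{casemapRFC1459("#asd}"): nil} // client joined "#asd}"

	// Another user sends to "#asd]" and the server relays that spelling.
	fmt.Println(bufs.deliver("#asd]", "hello")) // true
	fmt.Println(len(bufs["#asd}"]))            // 1

	// An exact-string lookup would have missed it.
	naive := map[string]bool{"#asd}": true}
	_, exact := naive["#asd]"]
	fmt.Println(exact) // false
}
```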
2020-08-24 14:34:00 emersion it's basic but is intrusive
2020-08-24 14:34:19 jess imagine you message jãke with super secret info because you thought jÃke and jãke are the same, but you meant to message jÃake
2020-08-24 14:34:20 jess uhh
2020-08-24 14:34:23 jess jÃke
2020-08-24 14:34:24 @jwheare yeah you need to litter some normalise_irc_case around the place
2020-08-24 14:34:27 emersion when you're using a modern lang, unicode case-folding is one strings.ToLower away
2020-08-24 14:34:53 @jwheare well just swap that strings.ToLower with your custom function
2020-08-24 14:35:15 emersion yeah, make it per-network, and store the case-mapped channel name in the DB too, etc
2020-08-24 14:35:20 jess I've gold server.casefold() where server is an instance of the server that knows what the 005 is
2020-08-24 14:35:28 jess uh
2020-08-24 14:35:32 jess s/gold/got/
2020-08-24 14:36:16 emersion which is why a "strict"/"no-op" case-mapping would make everything simpler
2020-08-24 14:36:25 hhirtz xPaw: .toLowerCase() performs UTF-8 casefolding
2020-08-24 14:36:31 @jwheare storing in the db normalised you have to do anyway. but yeah you need to know the casemapping per server and pass that info around
2020-08-24 14:36:46 jess you could mangle any line to preemptively fold params you think should be folded
2020-08-24 14:36:50 xPaw hhirtz, and thats a problem?
2020-08-24 14:37:05 jess and then by the time the line has left the context of a server instance, it's already folded
2020-08-24 14:37:14 emersion ah, and you don't want to case-fold *everything*
2020-08-24 14:37:26 hhirtz yep, as said previously: you'll consider different nicks to be the same
2020-08-24 14:37:41 emersion if a message comes from #MySuperChannel, you don't want to display #mysuperchannel
2020-08-24 14:37:50 jess xPaw: À isn't à in rfc1459/ascii
2020-08-24 14:38:21 xPaw what server allows these chars, and doesn't casefold them?
2020-08-24 14:38:25 emersion so having a big "filter" that case-maps every message received doesn't work
2020-08-24 14:38:32 jess emersion: sure, my session parsing creates a Channel object when i join somewhere, and it has a .name with the correct folding, but everything else deals with the folded version
2020-08-24 14:38:41 jess xPaw: unrealircd can do it
2020-08-24 14:38:47 emersion yup
2020-08-24 14:38:48 jess oragono too
2020-08-24 14:38:57 xPaw so it allows unicode, but still only lowercases ascii?
2020-08-24 14:39:00 jess uhh correct casing*
2020-08-24 14:39:03 jess xPaw: yes
2020-08-24 14:39:06 xPaw irc pls...
2020-08-24 14:39:23 jess clients only know about rfc1459 and ascii casemaps so anything else would case problems
2020-08-24 14:39:25 @jwheare there is no unicode casemapping. doing anything else would break clients
2020-08-24 14:39:38 jess would cause* fuck sake
2020-08-24 14:39:50 xPaw anyone have a JS implementation?
2020-08-24 14:40:25 jess I've got a clever solution in python that might give you ideas
2020-08-24 14:40:53 xPaw and what do you default it to, if no casemapping?
2020-08-24 14:41:09 jess default to rfc1459
2020-08-24 14:41:30 jess here's the fold algo https://github.com/jesopo/ircstates/blob/ca9abfc34b78b12e34bacd5515b5cae64d690dd5/ircstates/casemap.py#L9
2020-08-24 14:41:43 emersion interesting
2020-08-24 14:41:50 xPaw no strict-rfc?
2020-08-24 14:42:01 jess nah
2020-08-24 14:42:15 jess no one supports it anyway. someone tried to switch a server to it and caused problems
2020-08-24 14:42:26 jess think it was inspircd's Atilla
2020-08-24 14:42:58 jess bitbot supports it but I've never seen it
2020-08-24 14:43:51 --> thomasross (~email@example.com) has joined #ircv3
2020-08-24 14:46:20 @jwheare think i've seen it
2020-08-24 14:46:30 @jwheare probably on ngircd
2020-08-24 14:46:46 jess eww
2020-08-24 14:47:15 jess ngircd seems to use ascii these days
2020-08-24 14:47:27 jess source: irc.w3.org
2020-08-24 14:47:46 @jwheare yeah i can't find any now actually
2020-08-24 14:48:52 @jwheare https://stats.ircdocs.horse/isupport/#token-casemapping
2020-08-24 14:48:54 BitBot [Title] IRC RPL_ISUPPORT Statistics
2020-08-24 14:48:57 emersion do we need to handle the case where an irc server changes case-mapping?
2020-08-24 14:49:20 jess charybdis has been talking about this recently
2020-08-24 14:49:36 jess how would you switch freenode from rfc1459 to ascii without stopping the whole network
2020-08-24 14:50:35 hhirtz CASEMAPPING=none should be the only casemapping
2020-08-24 14:50:50 jess do you mean changing casemapping while still connected
2020-08-24 14:50:55 jess or changing between connections
2020-08-24 14:51:03 xPaw so realistically, which strings do you apply casemapping to? nicks and chan names?
2020-08-24 14:51:04 emersion changing between connections
2020-08-24 14:51:09 hhirtz oh yeah, it can change while still connected
2020-08-24 14:51:10 jess oh, yes
2020-08-24 14:51:15 emersion fun fun fun
2020-08-24 14:51:16 jess you need to handle that
2020-08-24 14:51:36 jess it shouldn't need a lot of handling i think?
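jess's approach keeps the server's casing for display and uses the folded name everywhere else. That avoids emersion's #MySuperChannel / #mysuperchannel problem without a global filter. Below is a minimal Go sketch of that split. The `Channel`, `ChannelStore`, `Join` and `Lookup` names are hypothetical. The fold function is passed in per network, since each server advertises its own casemapping.

```go
package main

import "fmt"

// Channel keeps the name exactly as the server first sent it, for display.
type Channel struct {
	Name string
}

// ChannelStore indexes channels by their casemapped name. The casemap
// function is per network, since each server advertises its own.
type ChannelStore struct {
	casemap  func(string) string
	channels map[string]*Channel
}

func NewChannelStore(casemap func(string) string) *ChannelStore {
	return &ChannelStore{casemap: casemap, channels: make(map[string]*Channel)}
}

// Join registers a channel, keeping the first-seen casing for display.
func (s *ChannelStore) Join(name string) *Channel {
	key := s.casemap(name)
	if ch, ok := s.channels[key]; ok {
		return ch
	}
	ch := &Channel{Name: name}
	s.channels[key] = ch
	return ch
}

// Lookup finds a channel however an incoming line happens to case it.
func (s *ChannelStore) Lookup(name string) (*Channel, bool) {
	ch, ok := s.channels[s.casemap(name)]
	return ch, ok
}

// casemapRFC1459 folds 'A'..'^' onto 'a'..'~' byte by byte.
func casemapRFC1459(name string) string {
	b := []byte(name)
	for i, c := range b {
		if c >= 'A' && c <= '^' {
			b[i] = c + ('a' - 'A')
		}
	}
	return string(b)
}

func main() {
	store := NewChannelStore(casemapRFC1459)
	store.Join("#MySuperChannel")
	if ch, ok := store.Lookup("#mysuperchannel"); ok {
		fmt.Println(ch.Name) // #MySuperChannel
	}
}
```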
2020-08-24 14:51:40 emersion well
2020-08-24 14:51:46 jess hhirtz: yes, nicks and channel names
2020-08-24 14:51:47 emersion when you have a DB with channel names in it
2020-08-24 14:52:16 emersion maybe i can get away with not storing the case-mapped channel name in the DB, i'll have to see
2020-08-24 14:52:56 jess god i hate irc
2020-08-24 14:53:36 --> eta (~eta@trainsplorer/developer/eta) has joined #ircv3
2020-08-24 14:56:29 jess servers changing casemapping is typically rare but inspircd will apparently soon be switching
2020-08-24 14:57:30 xPaw where did ^~ casefold come from? its not in rfc1459
2020-08-24 14:57:40 xPaw despite the casemapping suggesting it is...
2020-08-24 14:57:50 jess 2812
2020-08-24 14:58:03 jess which is why strict exists
2020-08-24 14:58:15 e it's always been there
2020-08-24 14:58:26 xPaw its not in 1459, e
2020-08-24 14:58:44 e 1459 is an inaccurate description of reality at the time
2020-08-24 14:59:37 jess rfc2812 is also inaccurate because it has ~ and ^ the wrong way around but that doesn't cause technical issues
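Stored channel names go stale when a network's casemapping changes between connections. One option is to pick the fold function from the new CASEMAPPING value and rebuild the stored keys. This Go sketch falls back to rfc1459 when the value is missing or unknown, as jess suggests. It assumes the client persists a per-network table from casemapped key to display name. That layout, and the `casemapFor` and `rekey` names, are hypothetical. Going from rfc1459 to ascii only changes keys. Going the other way can merge channels that used to be distinct, so the sketch reports those instead of silently overwriting them.

```go
package main

import "fmt"

// foldRange lowercases bytes from 'A' up to and including upper by adding 0x20.
func foldRange(name string, upper byte) string {
	b := []byte(name)
	for i, c := range b {
		if c >= 'A' && c <= upper {
			b[i] = c + ('a' - 'A')
		}
	}
	return string(b)
}

// casemapFor maps a CASEMAPPING ISUPPORT value to a fold function.
// Missing or unrecognized values fall back to rfc1459.
func casemapFor(token string) func(string) string {
	upper := byte('^') // rfc1459
	switch token {
	case "ascii":
		upper = 'Z'
	case "strict-rfc1459":
		upper = ']'
	}
	return func(name string) string { return foldRange(name, upper) }
}

// rekey rebuilds a key -> display-name table after the network's
// casemapping changes. Names that become equal under the new mapping
// are returned in merged instead of silently overwriting each other.
func rekey(old map[string]string, casemap func(string) string) (map[string]string, []string) {
	fresh := make(map[string]string, len(old))
	var merged []string
	for _, display := range old {
		key := casemap(display)
		if _, dup := fresh[key]; dup {
			merged = append(merged, display)
			continue
		}
		fresh[key] = display
	}
	return fresh, merged
}

func main() {
	// Stored while the network used rfc1459: "#A[" was keyed as "#a{".
	stored := map[string]string{"#a{": "#A["}

	// Reconnected and the server now says CASEMAPPING=ascii.
	fresh, merged := rekey(stored, casemapFor("ascii"))
	fmt.Println(fresh["#a["], len(merged)) // #A[ 0

	// The other direction can merge channels that used to be distinct.
	asciiEra := map[string]string{"#a[": "#a[", "#a{": "#a{"}
	_, merged = rekey(asciiEra, casemapFor("rfc1459"))
	fmt.Println(len(merged)) // 1
}
```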