URLs with encoded umlauts don't work

Timo Ollech
7 months ago
6 months ago

~edwardloveall 7 months ago

Thanks Timo, great find! I'll see what I can do to fix this. I'm doing this in my free time so I'm not sure when it will be fixed.

Timo Ollech 6 months ago

On 19.01.22 at 14:04, ~edwardloveall wrote:

Also possibly related

Yep, it's the same URL.



~edwardloveall 6 months ago

Whoops! That's what I get for trying to triage when I get up in the morning. Good eye, Timo.

~nikos 6 months ago*

It does not seem to be caused by umlauts in the URL. This one doesn't have any umlauts but produces the same error: https://lilithwittmann.medium.com/bundesservice-telekommunikation-enttarnt-dieser-geheimdienst-steckt-dahinter-cd2e2753d7ca

It is the same author, however.


~edwardloveall REPORTED FIXED 6 months ago

This is now fixed and deployed as of commit 7d0bc37, version 2022-01-30.

The bug actually had nothing to do with the URL but everything to do with how Medium calculates offsets for where to mark up text like bold or a link. Medium uses UTF-16 code unit offsets (likely to make it easier to parse in JavaScript), but Crystal strings are UTF-8. Converting strings to UTF-16 to do the offset calculation, then back to UTF-8, fixes this.
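
For readers who want to see the idea in code, here is a minimal sketch in Python rather than Crystal. It is not the code from the commit: the function name `slice_utf16` and the sample text are made up for illustration, and it assumes the offsets fall on character boundaries (not in the middle of an emoji).

```python
def slice_utf16(text: str, start: int, end: int) -> str:
    """Return the part of text between two UTF-16 code unit offsets.

    Hypothetical helper: encode to UTF-16, cut by code units, decode again.
    """
    # "utf-16-le" gives exactly 2 bytes per code unit and no byte order mark.
    units = text.encode("utf-16-le")
    return units[start * 2:end * 2].decode("utf-16-le")


sample = "Pay 💸 now"  # made-up text; the emoji takes two UTF-16 code units

# Offsets 7..10 are where "now" sits when counting UTF-16 code units.
print(slice_utf16(sample, 7, 10))  # now

# Using the same offsets as character indexes lands one position too late.
print(sample[7:10])  # ow
```

The round trip costs an extra encode and decode per slice, but it means the offsets can be used exactly as they arrive, without translating each one.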

The reason this author had the problem is that they put a block of text near the end of their post describing (I think) how to support them. It included two 💸 emoji, and the very end of their text had a different style than the rest. When Crystal went to grab that piece of the string, the emoji pushed the indexes out beyond the end of where it thought the text was, and then it crashed.
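
A small Python sketch of that mismatch, using a made-up sentence instead of the author's real text: each 💸 is one character but two UTF-16 code units, so a markup range that ends at the last UTF-16 offset points past the last character. The helper name `utf16_length` is an assumption for illustration.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units (hypothetical helper)."""
    return len(text.encode("utf-16-le")) // 2


sample = "Support me 💸💸 here"  # invented stand-in for the real paragraph

char_count = len(sample)           # Python counts characters (code points)
unit_count = utf16_length(sample)  # what a UTF-16 based offset counts

print(char_count, unit_count)  # 18 20

# A style applied to the final word would end at offset 20 in UTF-16 terms.
# Treated as a character index, that is past the end of the string.
try:
    sample[unit_count - 1]
except IndexError:
    print("offset is past the end of the text")
```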

More info in the commit log here.

Thanks for the report!
