~tsileo/microblog.pub#148: 
image gets broken when downloaded through media proxy

Try "Lookup" of this note using your instance of microblog.pub:

https://mstdn.social/@ClubTeleMatique/110594760748689278

This note links to a ploum.net article, and when showing the note, microblog.pub tries to render a "card" (sorry, I don't know the correct term) of the linked page with the page title, hostname, and an image.

On my instance, this image is broken. Is it broken on your instance, too?

Inspecting the details, I see that the URL of the original image is:

https://ploum.net/files/metadeath.jpg

Comparing that image to the one received via the microblog.pub media proxy, I see that microblog.pub added four bytes "2b c0 0e 0b" at the beginning and one extra "03" byte at the end. Not sure if this detail will tell you anything :)

Trying to curl -vvv this file from ploum.net, I see these lines in output, among others:

* ALPN: offers h2,http/1.1
* ALPN: server accepted http/1.1

So it seems that the server doesn't support http/2, so it's probably not an http/2 issue. Any thoughts on what it could be?

Status: REPORTED
Submitter: Alexey Shpakovsky
Assigned to: No-one
Submitted: 1 year, 5 months ago
Updated: 1 year, 5 months ago
Labels: No labels applied.

~aonrud 1 year, 5 months ago*

Just to confirm, I can replicate the issue by posting that link (https://ploum.net/2023-06-23-how-to-kill-decentralised-networks.html). I see this problem on my instance occasionally too. To confirm if the problem is consistent, does the preview work if you post this link (https://www.bluecollarwriter.com/home/news062323)?

I tried comparing the identify output of a few working images with broken ones, and the only obvious difference is Units: undefined and the absence of a resolution and print_size field as a result. Not sure it's relevant, since those are all for print values, but I tried 3 working and 3 broken images, and it is consistent.

Alexey Shpakovsky 1 year, 5 months ago

Thanks for confirmation, ~aonrud!

However, I can't confirm this issue with the link you've posted.

Posting your link shows a preview with a working image. You can see it at this URL: https://shpakovsky.ru/@alexey/proxy/media/19543/6Wtm1mBwqB-5NEbBhpxT0rek5gh1s4Ss5o9OA53K7i4=/aHR0cHM6Ly9saDQuZ29vZ2xldXNlcmNvbnRlbnQuY29tLzJyaUZIbi0tQmlJSEpYc2F6RmtaZThnalpCN1dxaGFTbjBLQVJFTVFkY2V1dm1ycTdTLTRoWnUtcUQ1NnM1aGVleWlYTGo1dkRiU0dEbElnMzBLYXBLdz13MTYzODM= (it's cached by nginx for about 1 year, so link should work). It seems to be coming from this URL: https://lh4.googleusercontent.com/2riFHn--BiIHJXsazFkZe8gjZB7WqhaSn0KAREMQdceuvmrq7S-4hZu-qD56s5heeyiXLj5vDbSGDlIg30KapKw=w16383

However, opening this URL shows an "Error 403 (Forbidden)!!1" page from Google, both with a browser and with curl, and both from my home machine and from the VPS where my microblog instance is running. Strange.

Regarding running identify on broken files - in my case, running identify broken.jpg prints this:

identify: Not a JPEG file: starts with 0x2b 0xc0 `broken.jpg' @ error/jpeg.c/JPEGErrorHandler/348.

~aonrud: can you compare the output of hexdump -C ... | head for a working image (downloaded by wget or a browser) and a broken one (downloaded through the microblog.pub proxy)? In my case it was pretty obvious that the microblog.pub proxy added a few extra bytes, and running this command:

tail -c +5 broken.jpg | head -c -1 >fixed.jpg

fixed it (verified by both opening it and comparing md5sums). Can you check if it fixes the image for you, too?
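
The same repair can be sketched in Python for anyone without tail and head at hand. This is only a minimal sketch, assuming the proxy merely wrapped the original bytes (four in front, one at the end, as observed above); the function names and the sample data are made up for illustration.

```python
import hashlib


def strip_wrapper(broken: bytes, lead: int = 4, trail: int = 1) -> bytes:
    """Drop `lead` bytes from the start and `trail` bytes from the end."""
    end = len(broken) - trail
    return broken[lead:end]


def same_md5(a: bytes, b: bytes) -> bool:
    """Compare two byte strings by MD5 digest, like comparing md5sums."""
    return hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest()


# Made-up stand-in for the original JPEG (real JPEG files start with ff d8).
original = bytes.fromhex("ffd8ffe0") + b"image data" + bytes.fromhex("ffd9")
# Wrap it the way the proxy did: 2b c0 0e 0b in front, 03 at the end.
broken = bytes.fromhex("2bc00e0b") + original + bytes.fromhex("03")

fixed = strip_wrapper(broken)
print(same_md5(fixed, original))
```

With real files, read both with open(path, "rb").read() instead of building the sample bytes by hand.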

Also, I forgot to mention that it happens only when "simply" downloading images. When microblog.pub tries to resize the image - I could see the image properly (resized). Try adding /50 to your URLs - will the image appear?

~aonrud 1 year, 5 months ago

Thanks Alexey. You're right - I've conflated two different problems. My example link works fine for me now as well, so with the 403 errors, it was probably caused by the target server.

Regarding your example, I can confirm the same result. Removing the extra bytes corrects the image. I tried the resized version and that also works. By default it converts to webp, but if I remove webp from the request header, it sends a working jpeg.

I'm not hugely familiar with the codebase, but it looks like the difference between the serve_proxy_media() and serve_proxy_media_resized() functions is the use of streaming, specifically the streamed response from httpx.AsyncClient(), so I wonder if that's where something is going wrong?

Alexey Shpakovsky 1 year, 5 months ago

Thanks for confirmation, ~aonrud!

BTW, I found the issue.

The issue is that when sending the request to the target server, microblog.pub strips some of the headers sent by the browser, like host, cookie, and user-agent, but NOT the "Accept-Encoding" header, which tells the remote server which compression methods the browser accepts. This happens in the function _proxy_get in the file app/main.py. And when microblog.pub receives the answer, it strips all headers except a few whitelisted ones, and the "Content-Encoding" header is NOT among them: see the end of the serve_proxy_media function in the same file.

Hence, the server receives the "Accept-Encoding" header (in my case, "gzip, deflate, br") and decides to compress the image using one of those methods (in my case, "br"). The browser receives compressed data but, lacking the "Content-Encoding" header, doesn't know it is compressed and doesn't try to decompress it, so what we get is a broken image.

The solution is to add the "content-encoding" header to the list of headers sent back to the browser at the end of the serve_proxy_media function.
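
The mechanism and the fix can be sketched with a toy model. This is not the code from app/main.py: the allow lists, function names, and sample bytes are hypothetical, and gzip stands in for "br" only because Brotli is not in the Python standard library.

```python
import gzip

# Hypothetical allow lists of response headers relayed to the browser.
ALLOWED_BEFORE_FIX = {"content-type", "cache-control"}
ALLOWED_AFTER_FIX = ALLOWED_BEFORE_FIX | {"content-encoding"}


def origin_respond(body, request_headers):
    """Toy origin server: compress when the client says it accepts gzip."""
    if "gzip" in request_headers.get("accept-encoding", ""):
        headers = {"content-type": "image/jpeg", "content-encoding": "gzip"}
        return headers, gzip.compress(body)
    return {"content-type": "image/jpeg"}, body


def proxy(body, request_headers, allowed):
    """Toy proxy: forward Accept-Encoding, relay raw bytes, filter headers."""
    headers, raw = origin_respond(body, request_headers)
    return {k: v for k, v in headers.items() if k in allowed}, raw


def browser_decode(headers, raw):
    """Toy browser: decompress only when Content-Encoding says so."""
    if headers.get("content-encoding") == "gzip":
        return gzip.decompress(raw)
    return raw


image = b"\xff\xd8\xff\xe0" + b"pixels" * 100 + b"\xff\xd9"
request = {"accept-encoding": "gzip, deflate, br"}

before = browser_decode(*proxy(image, request, ALLOWED_BEFORE_FIX))
after = browser_decode(*proxy(image, request, ALLOWED_AFTER_FIX))
print(before == image, after == image)
```

With the first allow list the toy browser is handed compressed bytes it never decodes, so the comparison fails; once "content-encoding" is relayed, it gets the original image back.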

It should also be possible to add b"accept-encoding" to the list of headers which are NOT sent to the remote server, but for some reason this didn't work for me.
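
A minimal sketch of that alternative, assuming the request headers arrive as (name, value) byte pairs; the helper name and the list of dropped headers are illustrative, not the actual code in app/main.py. One possible reason why simply dropping the header had no effect (unconfirmed in this thread) is that httpx adds its own default Accept-Encoding header when a request has none, so the sketch asks for "identity" explicitly instead.

```python
# Hypothetical list of request headers the proxy does not forward upstream.
DROPPED_REQUEST_HEADERS = [b"host", b"cookie", b"user-agent", b"accept-encoding"]


def filter_request_headers(raw_headers):
    """Return the headers to send upstream, asking for an uncompressed body."""
    kept = [
        (name, value)
        for (name, value) in raw_headers
        if name.lower() not in DROPPED_REQUEST_HEADERS
    ]
    # "identity" is the standard token for "no content coding".
    kept.append((b"accept-encoding", b"identity"))
    return kept


incoming = [
    (b"Host", b"example.invalid"),
    (b"Accept", b"image/webp,*/*"),
    (b"Accept-Encoding", b"gzip, deflate, br"),
]
print(filter_request_headers(incoming))
```

Relaying "content-encoding" back to the browser remains the simpler change, since it also keeps the transfer between the two servers compressed.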

~tsileo 1 year, 5 months ago

Hey there!

Good catch! I just pushed a fix to add content-encoding to the allow list: https://git.sr.ht/~tsileo/microblog.pub/commit/a5290af5c803cdd57ef30390c45e0a2e2eb27211.

Thanks!
