Try "Lookup" of this note using your instance of microblog.pub:
https://mstdn.social/@ClubTeleMatique/110594760748689278
This note links to a ploum.net article, and when showing the note, microblog.pub tries to render a "card" (sorry, I don't know the correct term) for the linked page with its title, hostname, and an image.
On my instance, this image is broken. Is it broken on your instance, too?
Inspecting the details, I see that the URL of the original image is:
https://ploum.net/files/metadeath.jpg
Comparing that image to the one received via the microblog.pub media proxy, I see that microblog.pub added four bytes "2b c0 0e 0b" at the beginning, and one extra "03" byte at the end. Not sure if this detail will tell you anything :)
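One way to pin down this kind of difference is to search for the original bytes inside the proxied copy. A minimal Python sketch follows; the function name find_padding and the sample payload are made up for illustration, and only the extra bytes "2b c0 0e 0b" and "03" come from the observation above.

```python
def find_padding(original: bytes, proxied: bytes):
    """Return (prefix, suffix): the extra bytes around `original` inside `proxied`.

    Returns None if `original` does not appear verbatim in `proxied`.
    """
    start = proxied.find(original)
    if start == -1:
        return None
    end = start + len(original)
    return proxied[:start], proxied[end:]


# Synthetic stand-ins for the two downloads (not the real image data).
payload = b"\xff\xd8\xff\xe0" + b"image data" + b"\xff\xd9"
wrapped = bytes.fromhex("2bc00e0b") + payload + b"\x03"

prefix, suffix = find_padding(payload, wrapped)
print(prefix.hex(" "), "|", suffix.hex(" "))  # 2b c0 0e 0b | 03
```

With real files, the two byte strings would be read with open(path, "rb").read() from the direct download and the proxied download.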
Trying to curl -vvv this file from ploum.net, I see these lines in output, among others:
So it seems that the server doesn't support HTTP/2, so this is probably not an HTTP/2 issue. Any thoughts on what it could be?
Just to confirm, I can replicate the issue by posting that link (https://ploum.net/2023-06-23-how-to-kill-decentralised-networks.html). I see this problem on my instance occasionally too. To confirm if the problem is consistent, does the preview work if you post this link (https://www.bluecollarwriter.com/home/news062323)?
I tried comparing the identify output of a few working images with broken ones, and the only obvious difference is Units: undefined and the absence of a resolution and print_size field as a result. Not sure it's relevant, since those are all for print values, but I tried 3 working and 3 broken images, and it is consistent.
Thanks for confirmation, ~aonrud!
However, I can't confirm this issue with the link you've posted.
Posting your link shows a preview with a working image. You can see it at this URL: https://shpakovsky.ru/@alexey/proxy/media/19543/6Wtm1mBwqB-5NEbBhpxT0rek5gh1s4Ss5o9OA53K7i4=/aHR0cHM6Ly9saDQuZ29vZ2xldXNlcmNvbnRlbnQuY29tLzJyaUZIbi0tQmlJSEpYc2F6RmtaZThnalpCN1dxaGFTbjBLQVJFTVFkY2V1dm1ycTdTLTRoWnUtcUQ1NnM1aGVleWlYTGo1dkRiU0dEbElnMzBLYXBLdz13MTYzODM= (it's cached by nginx for about 1 year, so link should work). It seems to be coming from this URL: https://lh4.googleusercontent.com/2riFHn--BiIHJXsazFkZe8gjZB7WqhaSn0KAREMQdceuvmrq7S-4hZu-qD56s5heeyiXLj5vDbSGDlIg30KapKw=w16383
- but it shows "Error 403 (Forbidden)!!1" page from Google when opening this URL with a browser and curl, both from my home machine and from VPS where my microblog instance is running. Strange.
Regarding running identify on broken files: in my case, running identify broken.jpg prints this:
identify: Not a JPEG file: starts with 0x2b 0xc0 `broken.jpg' @ error/jpeg.c/JPEGErrorHandler/348.
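The error is about the first bytes of the file: a JPEG file starts with the start-of-image marker ff d8, while the broken file starts with 2b c0. The same check can be sketched in Python; the function name looks_like_jpeg and the sample bytes are illustrative only.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker at the beginning of a JPEG file


def looks_like_jpeg(data: bytes) -> bool:
    """Check only the leading marker; this does not validate the rest of the file."""
    return data.startswith(JPEG_SOI)


# Synthetic samples: a JPEG-like header, and the same header behind four extra bytes.
good = b"\xff\xd8\xff\xe0" + b"rest of image"
bad = bytes.fromhex("2bc00e0b") + good

print(looks_like_jpeg(good), looks_like_jpeg(bad))  # True False
```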
~aonrud: can you compare the output of hexdump -C ... | head for a working image (downloaded by wget or a browser) and a broken one (downloaded through the microblog.pub proxy)? In my case it was pretty obvious that the microblog.pub proxy added a few extra bytes, and running this command:
tail -c +5 broken.jpg | head -c -1 >fixed.jpg
fixed it (verified by both opening it and comparing md5sums). Can you check if it fixes the image for you, too?
Also, I forgot to mention that it happens only when "simply" downloading images. When microblog.pub tries to resize the image - I could see the image properly (resized). Try adding /50 to your URLs - will the image appear?
Thanks Alexey. You're right - I've conflated two different problems. My example link works fine for me now as well, so with the 403 errors, it was probably caused by the target server.
Regarding your example, I can confirm the same result. Removing the extra bytes corrects the image. I tried the resized version and that also works. By default it converts to webp, but if I remove webp from the request header, it sends a working jpeg.
I'm not hugely familiar with the codebase, but it looks like the difference between the serve_proxy_media() function and serve_proxy_media_resized() is the use of streaming. The difference is using the streamed response from httpx.AsyncClient(), so I wonder if that's where something is going wrong?
Thanks for confirmation, ~aonrud!
BTW, I found the issue.
The issue is that when sending the request to the target server, microblog.pub strips some of the headers sent by the browser, like host, cookie, and user-agent, but NOT the "Accept-Encoding" header, which tells the remote server which compression methods the browser accepts. This happens in the function _proxy_get in file app/main.py. But when microblog.pub receives the response, it strips all headers except a few whitelisted ones, and the "Content-Encoding" header is NOT among them (see the end of the serve_proxy_media function in the same file).
Hence, the server receives the "Accept-Encoding" header (in my case, it's "gzip, deflate, br") and decides to compress the image using one of those methods (in my case, it's "br"). The browser receives data which is actually compressed, but it doesn't know that (due to the missing "Content-Encoding" header), so it doesn't try to decompress it, and what we get is a broken image.
The solution is to add the "content-encoding" header to the list of headers sent back to the browser at the end of the serve_proxy_media function.
It should also be possible to add b"accept-encoding" to the list of headers which are NOT sent to the remote server, but for some reason this didn't work for me.
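The mechanism can be reproduced in isolation. The sketch below is not microblog.pub's code: the header lists, function names, and the fake server are hypothetical, and gzip from the standard library stands in for the "br" encoding seen in the real case. It shows that the body survives only when content-encoding is allowed through to the client.

```python
import gzip

# Hypothetical header lists for illustration; the real ones live in app/main.py.
STRIPPED_REQUEST_HEADERS = {"host", "cookie", "user-agent"}
ALLOWED_RESPONSE_HEADERS = {"content-type", "content-length"}


def forward_request_headers(headers: dict) -> dict:
    """Drop the request headers the proxy does not pass on; accept-encoding stays."""
    return {k: v for k, v in headers.items() if k.lower() not in STRIPPED_REQUEST_HEADERS}


def filter_response_headers(headers: dict, allowed: set) -> dict:
    """Keep only allow-listed response headers."""
    return {k: v for k, v in headers.items() if k.lower() in allowed}


def fake_remote(headers: dict, body: bytes):
    """Stand-in for the remote server: compresses when the client accepts gzip."""
    accept = {k.lower(): v for k, v in headers.items()}.get("accept-encoding", "")
    if "gzip" in accept:
        return {"content-type": "image/jpeg", "content-encoding": "gzip"}, gzip.compress(body)
    return {"content-type": "image/jpeg"}, body


def client_decode(headers: dict, body: bytes) -> bytes:
    """Stand-in for the browser: decompresses only if told the body is compressed."""
    if headers.get("content-encoding") == "gzip":
        return gzip.decompress(body)
    return body


image = b"\xff\xd8\xff\xe0 pretend JPEG payload \xff\xd9"
browser_headers = {"Host": "example.invalid", "Accept-Encoding": "gzip, deflate, br"}

upstream_headers, upstream_body = fake_remote(forward_request_headers(browser_headers), image)

# Without content-encoding in the allow list, the client sees raw compressed bytes.
broken = client_decode(
    filter_response_headers(upstream_headers, ALLOWED_RESPONSE_HEADERS), upstream_body
)
# With content-encoding allowed through, the client decompresses correctly.
fixed = client_decode(
    filter_response_headers(upstream_headers, ALLOWED_RESPONSE_HEADERS | {"content-encoding"}),
    upstream_body,
)

print(broken == image)  # False
print(fixed == image)   # True
```

The other possible fix mentioned above, not forwarding accept-encoding at all, corresponds to adding "accept-encoding" to STRIPPED_REQUEST_HEADERS in this sketch, in which case fake_remote returns the body uncompressed.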
Hey there!
Good catch! I just pushed a fix to add content-encoding to the allow list: https://git.sr.ht/~tsileo/microblog.pub/commit/a5290af5c803cdd57ef30390c45e0a2e2eb27211
Thanks!