~tsileo/microblog.pub#104: 
Nearly all outgoing failing since Sunday evening

For the past couple days (since about 6pm Sunday America/Eastern time), nearly every outgoing activity is failing with this error:

File "/home/drj/.cache/pypoetry/virtualenvs/microblogpub-VpdT5UFA-py3.10/lib/python3.10/site-packages/pyld/jsonld.py", line 1130, in normalize
    raise JsonLdError(
pyld.jsonld.JsonLdError: ('Could not convert input to RDF dataset before normalization.',)
Type: jsonld.NormalizeError
Cause: ('Could not expand input before serialization to RDF.',)
Type: jsonld.RdfError
Cause: ('Dereferencing a URL did not result in a valid JSON-LD object. Possible causes are an inaccessible URL perhaps due to a same-origin policy (ensure the server uses CORS if you are using client-side JavaScript), too many redirects, a non-JSON response, or more than one HTTP Link Header was provided for a remote context.',)
Type: jsonld.InvalidUrl
Code: loading remote context failed
Details: {'url': 'https://w3id.org/identity/v1', 'cause': JsonLdError('Could not retrieve a JSON-LD document from the URL.')}

It would seem that signing the activity uses a context from https://w3id.org/identity/v1, which became unreachable around that time. Being unable to fetch the JSON-LD, resolving that context fails.

The error is coming from the old library, though. It seems the only solution might be to not use that URL for the context.

Status: RESOLVED FIXED
Submitter: Dan Jones
Assigned to: No-one
Submitted: 2 months ago
Updated: a month ago
Labels: No labels applied.

~aonrud 2 months ago

Same problem here. Does _options_hash need to depend on that context URI, or is there an obvious alternative? I might have misunderstood, but I thought the context URI functioned like a namespace ID rather than requiring an active page?

Anyway, sorry I can't add more, except to confirm the issue, though presumably it affects all installs.

Rodrigo Ghedin 2 months ago

What does it mean? I just installed a microblog instance earlier today and noticed massive delays in delivering posts to followers; one of them took four hours to reach followers’ timelines. Is this issue related?

~dzshy 2 months ago

I have the same problem. I changed the URL to point to a mirror of the git repo on GitHub, and it works for me.

https://lists.sr.ht/~tsileo/microblog.pub-devel/patches/38129

Rodrigo Ghedin 2 months ago

@~dzshy I changed the URL as suggested, but the delay in propagating posts to other instances is still present.

Have you noticed if this was fixed with your patch?

Dan Jones 2 months ago

Rodrigo,

After I applied the patch, most of my outgoing activities had already errored out. I had to tweak the database to get them to retry again.

I ran the following SQL against the database:

UPDATE outgoing_activity SET is_errored = 0, tries = 10, next_retry = '2023-01-01 01:01:01' WHERE is_errored = 1 AND tries > 16;

That should put the ones that failed and dropped out of the queue back into the queue.

If you get a "database locked" error, bring down microblog and then try again.

Rodrigo Ghedin 2 months ago

Thanks, Dan!

I tried to run this SQL command, but apparently it hasn’t fixed the delays :/

I don’t know if this is of any help, but this is my instance: https://social.manualdousuario.net/

Dan Jones 2 months ago

Sorry, that SQL should have been:

UPDATE outgoing_activity SET is_errored = 0, tries = 10, next_retry = '2023-01-01 01:01:01' WHERE is_errored = 1 AND tries >= 16;

I put >, when it should've been >=. You could probably leave off the AND tries >= 16 entirely.
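
For anyone who would rather script this than type it into a database shell, here is a minimal Python sketch of the same corrected UPDATE. Assumptions: the database is SQLite (the "database locked" error suggests so), the name requeue_errored is made up for illustration, and the demo table models only the three columns used in the statement, not the real schema.

```python
import sqlite3

# The corrected statement from above (note >= rather than >).
REQUEUE_SQL = """
UPDATE outgoing_activity
SET is_errored = 0, tries = 10, next_retry = '2023-01-01 01:01:01'
WHERE is_errored = 1 AND tries >= 16
"""


def requeue_errored(conn: sqlite3.Connection) -> int:
    """Run the UPDATE and return how many rows were put back in the queue."""
    with conn:  # commits on success, rolls back if the statement fails
        cur = conn.execute(REQUEUE_SQL)
    return cur.rowcount


def demo() -> int:
    """Exercise the statement against a throwaway in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Assumption: a reduced stand-in for the real outgoing_activity table.
    conn.execute(
        "CREATE TABLE outgoing_activity ("
        "id INTEGER PRIMARY KEY, is_errored INTEGER, tries INTEGER, next_retry TEXT)"
    )
    conn.executemany(
        "INSERT INTO outgoing_activity (is_errored, tries, next_retry) VALUES (?, ?, ?)",
        [
            (1, 16, None),  # errored out at exactly 16 tries: missed by '> 16'
            (1, 17, None),  # errored out above 16 tries
            (0, 3, None),   # still in the queue: left alone
        ],
    )
    return requeue_errored(conn)
```

Against a real install you would pass a connection to your own database file instead of the in-memory one, ideally with microblog stopped so the lock error does not come up.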

You might want to check the output from your outgoing service to see if there's anything in there that could tell you what's going on.

I should note that this could put a lot of stuff back in the outgoing queue, and depending on how much is in there, it might take a while for it all to go out. For me, I tried to avoid posting stuff after I noticed the bug, and it took about 15 minutes for it to catch up and send everything out.

If you have a lot of followers, and posted a lot during the time it wasn't working, it might take a while for it to catch up.

If your problem was only delays, it's probably unrelated to this bug, though.

Rodrigo Ghedin 2 months ago

Nice, I’ll try it.

My instance has 3.2k followers, and I found the same error you posted above in this issue.

I’ll leave it untouched for a while to see if these actions have any effect.

Thanks, Dan!

Rodrigo Ghedin 2 months ago

Unfortunately, the issue remains.

I left my instance for ~12h without any interactions. A few minutes ago, I replied to and liked a post from another instance, and these interactions haven’t appeared there; the error log has also ballooned.

Here’s the log, in case anyone can take a look to help me find out what’s wrong (warning: 20 MB file): https://arquivos.ghed.in/outgoing.log

Thanks in advance!

~aonrud 2 months ago

That log suggests you haven't changed the URL that's causing the original issue. Did you edit app/ldsig.py? If you're using Docker you may need to rebuild the image as well.

~dzshy 2 months ago

@Rodrigo Ghedin

I have noticed this in your log:

Details: {'url': 'https://w3id.org/identity/v1', 'cause': JsonLdError('Could not retrieve a JSON-LD document from the URL.')}
File "/opt/venv/.venv/lib/python3.11/site-packages/pyld/jsonld.py", line 1219, in to_rdf

The only place https://w3id.org/identity/v1 appears in microblog.pub's source code is app/ldsig.py, which I have already modified in my patch. For this reason, it seems that you didn't apply the patch correctly, or the new code is not deployed. According to the log, I guess you are using Docker, so you need to rebuild the image and make sure that the running container is using the newly built image. You can run:

sudo docker cp [CONTAINER-ID]:/app/app/ldsig.py ./

to get the source file from the currently running container and check if it's right.
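
Once the file is copied out, one quick way to check it is to look for the old URL. A minimal sketch, assuming the copied ldsig.py sits in the current directory; the function name references_old_context is made up for illustration:

```python
from pathlib import Path

# The context URL from the error message in this issue.
OLD_CONTEXT_URL = "https://w3id.org/identity/v1"


def references_old_context(path: str) -> bool:
    """Return True if the file at `path` still mentions the old context URL."""
    return OLD_CONTEXT_URL in Path(path).read_text(encoding="utf-8")
```

If references_old_context("ldsig.py") returns True, the container is most likely still running the unpatched code. This is only a plain substring check, so it cannot tell whether the URL appears in live code or in a comment.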

Rodrigo Ghedin 2 months ago

Oops, my bad!

(I know close to nothing about Docker, but I use managed hosting and they help me a lot with all the hosting things.)

I asked the support to rebuild the Docker image.

Thanks for your attention!

Rodrigo Ghedin 2 months ago

It worked like a charm :)

Thanks, guys!

~tsileo 2 months ago

Hey everyone!

Sorry about this incident, the timing was quite unfortunate as I was on a business trip.

I just pushed a fix that tweaks the document loading: https://git.sr.ht/~tsileo/microblog.pub/commit/3f129855d1b3d276ca9f9e036d4711e832466984

It seems to work on my instance.
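
As an illustration of what tweaking document loading can look like in general (a sketch under assumptions, not necessarily what the commit above does): pyld resolves remote contexts through a document loader, a callable that takes a URL and returns a dict describing the fetched document. Wrapping that loader so known context URLs are served from memory avoids the network fetch that fails in this issue. The name make_preloaded_loader is made up for illustration, and the context documents themselves are not included here; they would have to come from a trusted copy.

```python
from typing import Any, Callable, Dict, Optional

# A pyld-style loader takes (url, options) and returns a "remote document" dict.
Loader = Callable[[str, dict], dict]


def make_preloaded_loader(preloaded: Dict[str, Any], fallback: Loader) -> Loader:
    """Serve known context URLs from memory; defer everything else to `fallback`."""

    def loader(url: str, options: Optional[dict] = None) -> dict:
        if url in preloaded:
            # Same keys that pyld's built-in loaders return.
            return {
                "contentType": "application/ld+json",
                "contextUrl": None,
                "documentUrl": url,
                "document": preloaded[url],
            }
        # Unknown URL: fall through to the normal (network) loader.
        return fallback(url, options or {})

    return loader


# With pyld installed, the wrapper would be registered with something like:
#   jsonld.set_document_loader(
#       make_preloaded_loader(contexts, jsonld.requests_document_loader()))
```

The trade-off is that a preloaded context never picks up upstream changes, which is usually acceptable for a published, versioned context document.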

I can't seem to find anyone complaining about the document being down, which seems a bit weird. I will look deeper into this.

I will keep you updated if I find something.

Thanks!

~tsileo 2 months ago

@Rodrigo the delay for posts showing up comes from overloaded Mastodon instances (i.e. the receiving instances). Once a message is sent to an instance, that instance may have a backlog of messages to process. This is unrelated to this issue.

~tsileo 2 months ago

I just pushed a proper fix here: https://git.sr.ht/~tsileo/microblog.pub/commit/ce6f9238f3b8afe1ba1564a7dcbdf2d6f6445db4

It seems the "security" context is the newer context that should be used anyway!

Rodrigo Ghedin 2 months ago

@~tsileo Now it’s working fine, thanks! But I’m pretty sure the delays were related to this issue. Right after we applied the earlier fix, suggested by @Dan Jones, the problem was gone.

~tsileo REPORTED FIXED a month ago
