Hello. My website (https://drwho.virtadpt.net/) is part of the Fediring. Earlier this week I noticed that one particular IP address (5.161.53.68 - fediring.net) is the source for about 63.7% of all of the web traffic on my site in the last 31 days.
I'm flattered that somebody's indexing me. However, that seems a little excessive. Is Lieu supposed to be that aggressive when it spiders sites? Or is my site just that big? I don't have any experience with it so I don't know if that's normal or not. Can you please advise?
Hello o/
It's late here; please forgive my brief reply.
Lieu was set to crawl the entire fediring daily, and in hindsight, that's excessive. I've set it to monthly for now and will re-evaluate/reply tomorrow afternoon or evening.
Cheers
~jbauer please feel free to weigh in if you have any thoughts
Reducing crawling frequency would be good. I do think once per day is a bit excessive for the size and frequency of updates of most sites on the ring, so we should go with something along the lines of once every two weeks or once every month (depending on what Lieu allows)
> we should go with something along the lines of once every two weeks or once every month (depending on what Lieu allows)
Crawling and ingesting are just two commands that execute in sequence in a cronjob, so any cron expression is fine. It had just been @daily, but is now @monthly. There's a thread on fedi about it where the Lieu creator says they crawl the xxiivv ring manually every 3-4 months; based on that, I think leaving it at monthly sounds fine for now. It's not 3-4 months, but it's still a ~30x reduction in traffic ^^'
What do you think ~drwho?
I think monthly recrawls make sense, ~amolith. Most sites don't change all that frequently, and the ones that do tend to post their links before anyone needs to search for them.
Is this a monthly full re-crawl, or is there a way to optimize it? Say, by paying attention to If-Modified-Since headers or HTTP 304 status returns?
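To illustrate what I mean, here is a rough sketch of a conditional request in Python. This is not Lieu's code, and I don't know how its crawler is structured. The function name `conditional_fetch`, the `opener` parameter, and the idea of storing the previous Last-Modified value per URL are all assumptions made up for the example.

```python
import urllib.error
import urllib.request


def conditional_fetch(url, last_modified=None, opener=urllib.request.urlopen):
    """Fetch url, skipping the body if the server reports it as unchanged.

    last_modified is the Last-Modified value saved from the previous crawl
    (hypothetical: a real crawler would have to store this per URL).
    Returns (body, last_modified); body is None when the server answers 304.
    """
    request = urllib.request.Request(url)
    if last_modified:
        # Ask the server to send the page only if it changed since last time.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, last_modified
        raise
```

A crawler built this way would only download and re-index pages whose body is not None. It only helps for servers that actually send Last-Modified and honour If-Modified-Since.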
As far as I'm aware, it's a full re-crawl each time and there are no optimisations. Lieu's config docs don't really mention anything about traffic, mainly just general config and improving ingest heuristics for better results.
Sounds like this problem has been solved! I'll go ahead and close this issue now.