Some malformed links cause offpunk to crash with the following traceback:
Traceback (most recent call last):
  File "/usr/sbin/offpunk", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1888, in main
    gc.call_sync(refresh_time=refresh_time,depth=depth,lists=args.url)
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1739, in call_sync
    fetch_list(l,validity=refresh_time,depth=depth,tourchildren=True)
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1713, in fetch_list
    fetch_url(l,depth=depth,validity=validity,savetotour=tourchildren,count=[counter,end])
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1703, in fetch_url
    fetch_url(k,depth=d,validity=0,savetotour=savetotour,\
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1703, in fetch_url
    fetch_url(k,depth=d,validity=0,savetotour=savetotour,\
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1703, in fetch_url
    fetch_url(k,depth=d,validity=0,savetotour=savetotour,\
  [Previous line repeated 1 more time]
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1696, in fetch_url
    links = r.get_links(mode=mode)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/ansicat.py", line 528, in get_links
    self._build_body_and_links(mode)
  File "/usr/lib/python3.11/site-packages/ansicat.py", line 515, in _build_body_and_links
    abs_l = urllib.parse.urljoin(self.url,l.split()[0])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/parse.py", line 551, in urljoin
    urlparse(url, bscheme, allow_fragments)
  File "/usr/lib/python3.11/urllib/parse.py", line 395, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/parse.py", line 500, in urlsplit
    _check_bracketed_host(bracketed_host)
  File "/usr/lib/python3.11/urllib/parse.py", line 446, in _check_bracketed_host
    ip = ipaddress.ip_address(hostname) # Throws Value Error if not IPv6 or IPv4
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: '7' does not appear to be an IPv4 or IPv6 address
Could you test it with trunk? I think I’ve fixed a similar crash in the upcoming 2.1.
On Thu, 30 Nov 2023 22:24:20 +0000 "~lioploum" outgoing@sr.ht wrote:
> Could you test it with trunk? I think I’ve fixed a similar crash in the upcoming 2.1
No, my problem persists. Here is the output from testing with trunk (c3aff6755e256fd977bd5073c5c2880f03d9c177):
https://tracker.debian.org[7]
Traceback (most recent call last):
  File "/usr/sbin/offpunk", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1897, in main
    gc.call_sync(refresh_time=refresh_time,depth=depth,lists=args.url)
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1747, in call_sync
    fetch_list(l,validity=refresh_time,depth=depth,tourchildren=True)
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1721, in fetch_list
    fetch_url(l,depth=depth,validity=validity,savetotour=tourchildren,count=[counter,end])
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1711, in fetch_url
    fetch_url(k,depth=d,validity=0,savetotour=savetotour,\
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1711, in fetch_url
    fetch_url(k,depth=d,validity=0,savetotour=savetotour,\
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1711, in fetch_url
    fetch_url(k,depth=d,validity=0,savetotour=savetotour,\
  [Previous line repeated 2 more times]
  File "/usr/lib/python3.11/site-packages/offpunk.py", line 1668, in fetch_url
    if not netcache.is_cache_valid(url,validity=validity):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/netcache.py", line 106, in is_cache_valid
    cache = get_cache_path(url)
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/netcache.py", line 135, in get_cache_path
    parsed = urllib.parse.urlparse(url)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/parse.py", line 395, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/parse.py", line 500, in urlsplit
    _check_bracketed_host(bracketed_host)
  File "/usr/lib/python3.11/urllib/parse.py", line 446, in _check_bracketed_host
    ip = ipaddress.ip_address(hostname) # Throws Value Error if not IPv6 or IPv4
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: '7' does not appear to be an IPv4 or IPv6 address
(I added a debug print to fetch_url that prints every URL it receives, which is where the https://tracker.debian.org[7] line comes from.)
Link extraction fails on gemini://rkta.srht.site/debbug-subscribe.gmi, because netcache seems to misinterpret the link line
=> https://tracker.debian.org [7] https://tracker.debian.org
The error is raised by urllib rather than by offpunk itself; the failing call is effectively:
urllib.parse.urlparse("https://tracker.debian.org[7]")
fetch_url() receives the URL as 'https://tracker.debian.org[7]', which I think is inherently incorrect. Is the page wrong?
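The crash can be reproduced outside offpunk with a short sketch like this one. It assumes a Python version whose urlsplit() validates bracketed hosts, as the 3.11 build in the traceback does; on older versions the call succeeds and the brackets simply end up in netloc. The helper name try_urlparse is hypothetical.

```python
import urllib.parse


def try_urlparse(url):
    """Parse url with urllib, returning the ValueError instead of raising it."""
    try:
        return urllib.parse.urlparse(url)
    except ValueError as exc:
        return exc


# The string fetch_url() received: "[7]" is glued onto the host. On Python
# versions with the bracketed-host check, urlsplit() takes "7" as a bracketed
# host and ipaddress rejects it, raising ValueError.
broken = try_urlparse("https://tracker.debian.org[7]")

# The intended link parses normally.
good = try_urlparse("https://tracker.debian.org")

print(repr(broken))
print(good.netloc)  # tracker.debian.org
```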
Furthermore, when line 1710 of offpunk.py calls fetch_url, it runs with mode == "links_only", which means that when
AbstractRender().get_links(mode)
is called on the page that contains the offending link, the function should return only the link itself, which should be https://tracker.debian.org.
I guess fixing this requires editing something between lines 497 and 523, presumably in ansicat.py, where the failing urljoin call sits at line 515.
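In gemtext, a link line is "=>", optional whitespace, the URL, then optionally more whitespace and a label, so the URL is simply the first whitespace-separated token after "=>". A minimal sketch of that split; parse_link_line is a hypothetical helper for illustration, not ansicat's code:

```python
def parse_link_line(line):
    """Split a gemtext link line into (url, label); return None for other lines.

    Hypothetical helper for illustration; not taken from ansicat.
    """
    if not line.startswith("=>"):
        return None
    # Everything after "=>", with the optional surrounding whitespace removed.
    rest = line[2:].strip()
    if not rest:
        return None
    # The URL is the first whitespace-delimited token; the remainder is the label.
    parts = rest.split(maxsplit=1)
    url = parts[0]
    label = parts[1] if len(parts) > 1 else ""
    return url, label


url, label = parse_link_line("=> https://tracker.debian.org [7] https://tracker.debian.org")
print(url)    # https://tracker.debian.org
print(label)  # [7] https://tracker.debian.org
```

With this split, the "[7]" stays in the label and never reaches urllib as part of the URL.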
That’s a very good catch. This is indeed a problem with that particular page.
Fixed a crash when parsing hidden_urls bug #32
GemtextRenderer scans the text for URLs on lines not starting with "=>" and adds them to the link list afterwards, so that you don't have to copy/paste them with the mouse. This is a hidden feature.
In this case, the string was not meant to be a URL, and it contained [] characters, which urllib treats as a bracketed host and cannot handle.
The fix moves the looks_like_url function out of offpunk into offutils, so that ansicat can use it to check that a string looks like a URL before passing it to urllib.
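A minimal sketch of that kind of guard follows. The body of looks_like_url here is an assumption for illustration, not offutils' actual implementation, and the scheme set and the scanner collect_hidden_urls are hypothetical too. The sketch itself calls urllib inside a try/except as the check, and it also rejects any bracketed host that is not an IPv6 literal, so it behaves the same on Python versions without the bracketed-host check.

```python
import ipaddress
import urllib.parse

# Assumption: the schemes worth collecting; not offpunk's actual list.
KNOWN_SCHEMES = {"gemini", "gopher", "http", "https", "spartan"}


def looks_like_url(candidate):
    """Return True if candidate is an absolute URL that urllib can handle."""
    try:
        parsed = urllib.parse.urlparse(candidate)
        _ = parsed.port  # accessing .port raises ValueError for a malformed port
    except ValueError:
        return False
    if parsed.scheme not in KNOWN_SCHEMES or not parsed.hostname:
        return False
    if "[" in parsed.netloc or "]" in parsed.netloc:
        # Brackets are only legal around an IPv6 literal, e.g. gemini://[::1]/
        try:
            ipaddress.IPv6Address(parsed.hostname)
        except ValueError:
            return False
    return True


def collect_hidden_urls(text):
    """Hypothetical hidden-URL scan: keep tokens on non-"=>" lines that look like URLs."""
    found = []
    for line in text.splitlines():
        if line.startswith("=>"):
            continue  # regular link lines are handled elsewhere
        for token in line.split():
            if looks_like_url(token):
                found.append(token)
    return found


sample = (
    "See https://tracker.debian.org[7] and gemini://rkta.srht.site/ for details.\n"
    "=> gemini://example.org/ a normal link"
)
print(collect_hidden_urls(sample))  # ['gemini://rkta.srht.site/']
```

Filtering before urljoin means a bad token is silently skipped instead of aborting the whole sync.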
!resolve FIXED
Referencing the commit that fixes this, for posterity: https://git.sr.ht/~lioploum/offpunk/commit/316465835217744f560fe2cd68bc457c1fc998d6
Thanks, I have finally synced offpunk on my computer without it crashing. (Yes, I read your comments as soon as you posted them, but I held off replying until the sync finished, so I wouldn't have to send another comment if a related bug showed up.) I have not had any other issues.
(Also, I should have tried to fix the part that adds every fetched link to the tour, as offpunk has already put over 31,000 links in there; I accidentally almost made an archive of the World Wide Web. I should probably also request a feature to ignore certain hosts when fetching (like upload.wikimedia.org or en.wikipedia.org). And yes, I think I'm a bit insane for using
--depth 5
and pointing it at a list of capsules. I should probably email the user discussion list with these proposals.)
Any --depth over 1 is insane ;-)
There’s already an option to ignore HTTP links, which lets you cache the Gemini world:
offpunk --sync --disable-http
Or you can disable images:
offpunk --sync --images-mode None (that one is new and not really well tested)
But, indeed, it would be best discussed on the list.