The User-Agent header is currently blank. It should identify the project and include an abuse and/or contact email address. It should also be easy to configure, so that self-hosted instances can identify their own operator instead of referencing the parent project, e.g.
veri-[web|gemini|gopher]-scraper operated by <domain> (<email>)
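One possible way to make this configurable is sketched below. It is only an illustration: the `ScraperConfig` type, the `envOr` helper, and the `VERI_OPERATOR_DOMAIN` / `VERI_OPERATOR_EMAIL` environment variable names are all hypothetical and not part of the project. The only thing taken from the notes above is the format of the string itself.

```go
package main

import (
	"fmt"
	"os"
)

// ScraperConfig is a hypothetical holder for the operator details that
// each self-hosted instance would supply.
type ScraperConfig struct {
	Protocol string // "web", "gemini" or "gopher"
	Domain   string // domain of whoever operates this instance
	Email    string // abuse and/or contact address
}

// UserAgent renders the string in the format proposed above:
// veri-<protocol>-scraper operated by <domain> (<email>)
func (c ScraperConfig) UserAgent() string {
	return fmt.Sprintf("veri-%s-scraper operated by %s (%s)", c.Protocol, c.Domain, c.Email)
}

// envOr returns the value of the environment variable key, or fallback
// when the variable is unset or empty.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	// The variable names and fallback values here are assumptions made
	// for this sketch only.
	cfg := ScraperConfig{
		Protocol: "web",
		Domain:   envOr("VERI_OPERATOR_DOMAIN", "example.org"),
		Email:    envOr("VERI_OPERATOR_EMAIL", "abuse@example.org"),
	}
	fmt.Println(cfg.UserAgent())
}
```

With colly, the resulting string could then be handed to the collector through the `colly.UserAgent` option; any other HTTP client would set it as a request header.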
The colly library we use to grab URLs from pages respects robots.txt (provided its IgnoreRobotsTxt option is disabled), but the URLs it collects are then passed to go-readability, which has no concept of robots.txt. We need to either implement our own check against the site's robots.txt before running go-readability, or give the list of links another pass through colly to filter out the ones that the site's robots.txt disallows.
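As a rough sketch of the first option (our own check before go-readability), the code below filters a list of links against an already-fetched robots.txt body using only the standard library. It is deliberately simplified and not a complete robots.txt implementation: it merges the `*` group with any group naming our agent, ignores wildcards and `Crawl-delay`, and assumes every link belongs to the host the robots.txt came from. The names `parseRobots`, `allowed` and `filterLinks` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strings"
)

// rule is one Allow or Disallow line that applies to our agent.
type rule struct {
	allow bool
	path  string
}

// parseRobots collects the rules from every group whose User-agent value
// is "*" or appears (case-insensitively) in agent.
func parseRobots(body, agent string) []rule {
	var rules []rule
	applies := false  // does the current group apply to us?
	inAgents := false // consecutive User-agent lines share one group
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // strip comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if !inAgents {
				applies = false // a new group starts here
			}
			inAgents = true
			if val == "*" || (val != "" && strings.Contains(strings.ToLower(agent), strings.ToLower(val))) {
				applies = true
			}
		case "allow", "disallow":
			inAgents = false
			if applies && val != "" {
				rules = append(rules, rule{allow: key == "allow", path: val})
			}
		}
	}
	return rules
}

// allowed applies the longest matching prefix rule; on a tie Allow wins,
// and a path that matches no rule is allowed.
func allowed(rules []rule, path string) bool {
	best, ok := -1, true
	for _, r := range rules {
		if !strings.HasPrefix(path, r.path) {
			continue
		}
		if len(r.path) > best || (len(r.path) == best && r.allow) {
			best, ok = len(r.path), r.allow
		}
	}
	return ok
}

// filterLinks keeps only the links whose path the rules permit.
// Links that fail to parse are dropped.
func filterLinks(links []string, rules []rule) []string {
	var kept []string
	for _, l := range links {
		u, err := url.Parse(l)
		if err != nil {
			continue
		}
		p := u.EscapedPath()
		if p == "" {
			p = "/"
		}
		if allowed(rules, p) {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	// Example robots.txt body and links, made up for this sketch.
	robots := "User-agent: *\nDisallow: /private/\nAllow: /private/press/\n"
	rules := parseRobots(robots, "veri-web-scraper")
	links := []string{
		"https://example.org/articles/1",
		"https://example.org/private/notes",
		"https://example.org/private/press/release",
	}
	for _, l := range filterLinks(links, rules) {
		fmt.Println(l) // these would be handed on to go-readability
	}
}
```

A real implementation would probably reuse an existing robots.txt parser rather than this hand-rolled one, and would need to fetch and cache robots.txt per host. The second option, another pass through colly, avoids writing any of this at the cost of an extra request per link.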
The database technology is as yet undecided.