The colly library we use to grab URLs from pages respects robots.txt, but the links it collects are then passed to go-readability, which has no concept of robots.txt. We either need to check each site's robots.txt ourselves before running go-readability, or give the list of links another pass through colly to filter out any that the site's robots.txt disallows. A sketch of the first option follows.
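
Here is a minimal sketch of the per-site check, assuming the github.com/temoto/robotstxt package (we may prefer a different parser); the agent name "mybot" is a placeholder for whatever User-Agent our crawler actually sends:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"

	"github.com/temoto/robotstxt"
)

// fetchRobots downloads and parses robots.txt for the host of siteURL.
func fetchRobots(siteURL string) (*robotstxt.RobotsData, error) {
	u, err := url.Parse(siteURL)
	if err != nil {
		return nil, err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(u.Scheme + "://" + u.Host + "/robots.txt")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	// FromResponse applies the usual status-code conventions
	// (e.g. a 404 for robots.txt means everything is allowed).
	return robotstxt.FromResponse(resp)
}

// filterAllowed keeps only the links that robots.txt permits for agent.
func filterAllowed(links []string, robots *robotstxt.RobotsData, agent string) []string {
	var allowed []string
	for _, link := range links {
		u, err := url.Parse(link)
		if err != nil {
			continue // skip unparseable URLs
		}
		// RequestURI includes the query string, which robots rules can match.
		if robots.TestAgent(u.RequestURI(), agent) {
			allowed = append(allowed, link)
		}
	}
	return allowed
}

func main() {
	// Hypothetical link list; in practice this comes from the colly crawl.
	links := []string{
		"https://example.com/article",
		"https://example.com/admin/secret",
	}
	robots, err := fetchRobots(links[0])
	if err != nil {
		panic(err)
	}
	for _, link := range filterAllowed(links, robots, "mybot") {
		fmt.Println("ok to parse:", link) // safe to hand to go-readability
	}
}
```

If we go this route, we would want to cache the parsed RobotsData per host so robots.txt isn't re-fetched for every link on the same site; the second option (re-running the links through colly) reuses colly's existing robots.txt handling but costs an extra request per link.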