~ols/veri#4: 
Respect robots.txt

Status
REPORTED
Submitter
~ols
Assigned to
No-one
Submitted
1 year, 7 months ago
Updated
1 year, 7 months ago
Labels
No labels applied.

~ols 1 year, 7 months ago

The colly library we use to grab URLs from pages respects robots.txt, but the URLs it collects are then passed to go-readability, which has no concept of robots.txt. We either need to implement our own check of each site's robots.txt before running go-readability, or else give the list of links another pass through colly to filter out the ones that the site's robots.txt disallows.
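
A rough sketch of the first option (our own check before handing links to go-readability), using only the standard library. Everything here is an assumption for illustration: the names `parseRobots`, `allowed` and `filterLinks`, the user agent string "veri", and the example.com URLs are hypothetical, and none of this is colly's or go-readability's API. It only does plain prefix matching of Allow/Disallow rules (longest match wins), ignores `*` and `$` patterns, and assumes the robots.txt body has already been fetched and that all links are on the same host.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strings"
)

// rule is one Allow or Disallow line from robots.txt.
type rule struct {
	allow bool
	path  string
}

// parseRobots returns the rules that apply to userAgent. Groups naming the
// agent take precedence; otherwise the "*" group is used.
func parseRobots(body, userAgent string) []rule {
	var specific, wildcard []rule
	var agents []string // user agents of the group currently being read
	inRules := false    // true once the current group has had a rule line
	matched := false    // true if any group names our user agent
	ua := strings.ToLower(userAgent)

	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // strip comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if inRules { // a User-agent line after rules starts a new group
				agents, inRules = nil, false
			}
			a := strings.ToLower(val)
			agents = append(agents, a)
			if a != "" && a != "*" && strings.Contains(ua, a) {
				matched = true
			}
		case "allow", "disallow":
			inRules = true
			if val == "" {
				continue // an empty Disallow restricts nothing
			}
			r := rule{allow: key == "allow", path: val}
			for _, a := range agents {
				if a == "*" {
					wildcard = append(wildcard, r)
				} else if a != "" && strings.Contains(ua, a) {
					specific = append(specific, r)
				}
			}
		}
	}
	if matched {
		return specific
	}
	return wildcard
}

// allowed reports whether path may be fetched: the longest matching prefix
// wins, Allow wins a tie, and a path with no matching rule is allowed.
func allowed(rules []rule, path string) bool {
	best := rule{allow: true}
	for _, r := range rules {
		if !strings.HasPrefix(path, r.path) {
			continue
		}
		if len(r.path) > len(best.path) || (len(r.path) == len(best.path) && r.allow) {
			best = r
		}
	}
	return best.allow
}

// filterLinks keeps only the links whose path the rules allow.
func filterLinks(rules []rule, links []string) []string {
	var kept []string
	for _, link := range links {
		u, err := url.Parse(link)
		if err != nil {
			continue // skip anything that is not a parseable URL
		}
		p := u.EscapedPath()
		if p == "" {
			p = "/"
		}
		if allowed(rules, p) {
			kept = append(kept, link)
		}
	}
	return kept
}

func main() {
	robots := "User-agent: *\nDisallow: /private/\n"
	rules := parseRobots(robots, "veri") // hypothetical user agent
	links := []string{"https://example.com/post/1", "https://example.com/private/draft"}
	for _, l := range filterLinks(rules, links) {
		fmt.Println(l) // only these would be passed on to go-readability
	}
}
```

If we go this way, the parsed rules would presumably want caching per host so that each site's robots.txt is fetched once rather than once per link.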
