Implement in-house crawler



~danskeren 10 months ago

:) Big fan of your work.

Being able to search within a specific category, such as code, shop, food, music, etc., and making this data accessible to all, is the long-term goal. This would also allow us to provide category-specific filters to the search results, which would no doubt yield better results than a general-purpose search engine.

~sircmpwn 10 months ago

That will be quite nice - but isn't that secondary to having an in-house crawler?

~danskeren 10 months ago

I envision a public search index that only admits websites that don't engage in user tracking, SEO spam, and the like. It would be a breath of fresh air not to have to worry about cookie overlays, auto-playing videos, etc. whenever you open a website presented by the search engine. For this we would definitely need an in-house crawler.

As for the category-specific search index, I think it would be possible to obtain the data for many of the categories through APIs, data dumps, and by scraping specific sites. I intend to focus on these first since they will be a lot easier to implement :)
