Colly provides a clean interface for writing any kind of crawler, scraper, or spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications such as data mining, data processing, or archiving.
Features
- Clean API
- Fast (>1k requests/sec on a single core)
- Manages request delays and maximum concurrency per domain (see the configuration sketch after this list)
- Automatic cookie and session handling
- Sync/async/parallel scraping
- Distributed scraping
- Caching
- Automatic handling of non-Unicode response encodings
- Robots.txt support
- Google App Engine support
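
Several of the bullets above (per-domain limits, async scraping, caching) map to collector options and limit rules. The following is a minimal sketch of how they fit together; the cache directory, domain glob, and limit values are arbitrary choices for illustration, not library defaults:

```go
package main

import (
	"time"

	"github.com/gocolly/colly"
)

func main() {
	// Async collector with on-disk response caching.
	// The cache directory name is an arbitrary choice for this sketch.
	c := colly.NewCollector(
		colly.Async(true),
		colly.CacheDir("./colly_cache"),
	)

	// Limit concurrency and add a delay for domains matching the glob.
	c.Limit(&colly.LimitRule{
		DomainGlob:  "*",
		Parallelism: 2,
		Delay:       1 * time.Second,
	})

	c.Visit("http://go-colly.org/")

	// With Async(true), Visit only queues the request,
	// so Wait blocks until all pending requests finish.
	c.Wait()
}
```
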
The basic example below starts at go-colly.org, visits every link it finds, and prints each URL before requesting it:

```go
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Print the URL of every request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}
```
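
The same OnHTML callback style is how the structured-data extraction mentioned at the top works: register a CSS selector and build a record from each matched element. A hedged sketch follows; the `item` type and the `"article h2 a"` selector are hypothetical and would need to be adapted to the target page's markup:

```go
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

// item is a hypothetical record type for this sketch.
type item struct {
	Title string
	URL   string
}

func main() {
	var items []item

	c := colly.NewCollector()

	// Build one record per element matching the (assumed) selector.
	c.OnHTML("article h2 a", func(e *colly.HTMLElement) {
		items = append(items, item{
			Title: e.Text,
			URL:   e.Request.AbsoluteURL(e.Attr("href")),
		})
	})

	c.Visit("http://go-colly.org/") // placeholder target
	fmt.Printf("collected %d items\n", len(items))
}
```
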