Using multiple collectors

It is advised to use multiple collectors for one scraping job if the task is complex enough or consists of different kinds of subtasks. A good example is the Coursera course scraper, where two collectors are used - one parses the list views and handles paging and the other one collects course details.
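A minimal sketch of that pattern is shown below. The URL and CSS selectors (a.item-link, a.next-page, h1) are placeholders for illustration, not any real site's markup:

package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	// One collector walks the paginated list pages,
	// the other scrapes the individual detail pages.
	listCollector := colly.NewCollector()
	detailCollector := colly.NewCollector()

	// Hand every detail link over to the detail collector.
	listCollector.OnHTML("a.item-link", func(e *colly.HTMLElement) {
		detailCollector.Visit(e.Request.AbsoluteURL(e.Attr("href")))
	})

	// Follow pagination links with the list collector itself.
	listCollector.OnHTML("a.next-page", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Extract data from the detail pages.
	detailCollector.OnHTML("h1", func(e *colly.HTMLElement) {
		fmt.Println("Title:", e.Text)
	})

	listCollector.Visit("https://example.com/items")
}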

Colly has some built-in methods to support the usage of multiple collectors.

Tip

Use collector.ID when debugging to distinguish different collectors
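For example, the ID can be included in log output from callbacks. This sketch assumes two collectors c and c2 and the standard log package; the log format is just an illustration:

c.OnRequest(func(r *colly.Request) {
	log.Printf("[collector %d] visiting %s", c.ID, r.URL)
})
c2.OnRequest(func(r *colly.Request) {
	log.Printf("[collector %d] visiting %s", c2.ID, r.URL)
})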

Cloning collectors

You can use a collector's Clone() method if your collectors share a similar configuration. Clone() duplicates the collector with an identical configuration, but without the attached callbacks.

c := colly.NewCollector(
	colly.UserAgent("myUserAgent"),
	colly.AllowedDomains("foo.com", "bar.com"),
)
// Custom User-Agent and allowed domains are cloned to c2
c2 := c.Clone()
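Because callbacks are not cloned, they have to be registered on each collector separately. A short sketch continuing the example above:

// Callbacks are not copied by Clone(), so attach them to each collector.
c.OnResponse(func(r *colly.Response) {
	fmt.Println("c got a response from", r.Request.URL)
})
c2.OnResponse(func(r *colly.Response) {
	fmt.Println("c2 got a response from", r.Request.URL)
})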

Passing custom data between collectors

Use the collector's Request() function to share context with other collectors.

Example of sharing context:

c.OnResponse(func(r *colly.Response) {
	r.Ctx.Put("Custom-Header", r.Headers.Get("Custom-Header"))
	c2.Request("GET", "https://foo.com/", nil, r.Ctx, nil)
})
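
On the receiving side, the second collector can read the shared value back from the request context. A sketch building on the snippet above:

c2.OnResponse(func(r *colly.Response) {
	// r.Ctx is the context passed along by the first collector.
	fmt.Println("shared value:", r.Ctx.Get("Custom-Header"))
})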