Getting started

Before working with Colly, ensure that you have the latest version. See the installation guide for more details.

Let’s get started with some simple examples.

First, import Colly into your codebase:

import "github.com/gocolly/colly"

Collector

Colly’s main entity is a Collector object. A Collector manages the network communication and is responsible for executing the attached callbacks while a collector job is running. To work with Colly, you have to initialize a Collector:

c := colly.NewCollector()

Callbacks

You can attach different types of callback functions to a Collector to control a collecting job or retrieve information. Check out the related section in the package documentation.

Add callbacks to a Collector

c.OnRequest(func(r *colly.Request) {
    fmt.Println("Visiting", r.URL)
})

c.OnError(func(_ *colly.Response, err error) {
    log.Println("Something went wrong:", err)
})

c.OnResponse(func(r *colly.Response) {
    fmt.Println("Visited", r.Request.URL)
})

c.OnHTML("a[href]", func(e *colly.HTMLElement) {
    e.Request.Visit(e.Attr("href"))
})

c.OnHTML("tr td:nth-of-type(1)", func(e *colly.HTMLElement) {
    fmt.Println("First column of a table row:", e.Text)
})

c.OnXML("//h1", func(e *colly.XMLElement) {
    fmt.Println(e.Text)
})

c.OnScraped(func(r *colly.Response) {
    fmt.Println("Finished", r.Request.URL)
})

Call order of callbacks

1. OnRequest

Called before a request

2. OnError

Called if an error occurred during the request

3. OnResponse

Called after a response is received

4. OnHTML

Called right after OnResponse if the received content is HTML

5. OnXML

Called right after OnHTML if the received content is HTML or XML

6. OnScraped

Called after OnXML callbacks