mcmillhj/gocrawl

This is my first time using the Go programming language, so I thought I would write a simple web crawler.

The crawler accepts a single domain and gathers a map of the site, recording the assets found on each page.
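
As a rough illustration, here is what the Page type mentioned under Future Work might look like; the field names are assumptions for this sketch, not the repository's actual definition:

    package crawl

    // Page records what the crawler learned about a single URL.
    type Page struct {
        URL    string   // address of the page
        Links  []string // same-domain links found on the page
        Assets []string // images, scripts, and stylesheets referenced by the page
    }

    // SiteMap maps each crawled URL to its Page record.
    type SiteMap map[string]*Page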

Overview:

  1. Accepts a single domain
  2. Does not crawl subdomains
  3. Obeys robots.txt (if one can be found)
  4. Examines the Content-Type header of each http.Get response and discards anything whose Content-Type is not text/html (see the sketch after this list)
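
A minimal sketch of how the same-domain restriction and the Content-Type check could be written; shouldCrawl and fetchHTML are illustrative names, not functions this repository necessarily defines:

    package crawl

    import (
        "fmt"
        "net/http"
        "net/url"
        "strings"
    )

    // shouldCrawl reports whether link belongs to the target domain.
    // Requiring an exact host match also rejects subdomains.
    func shouldCrawl(link, domain string) bool {
        u, err := url.Parse(link)
        if err != nil {
            return false
        }
        return u.Hostname() == domain
    }

    // fetchHTML GETs a URL and returns the response only when the server
    // declares a text/html Content-Type; everything else is discarded.
    func fetchHTML(link string) (*http.Response, error) {
        resp, err := http.Get(link)
        if err != nil {
            return nil, err
        }
        ct := resp.Header.Get("Content-Type")
        if !strings.HasPrefix(ct, "text/html") {
            resp.Body.Close()
            return nil, fmt.Errorf("skipping %s: Content-Type %q is not text/html", link, ct)
        }
        return resp, nil
    }

The prefix check is used because servers commonly send "text/html; charset=utf-8" rather than a bare "text/html".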

Future Work:

  1. Refactor crawl into a goroutine so that more than one crawl can run at a time (see the sketch after this list)
  2. Obey robots.txt
  3. Research more idiomatic testing practices in Go
  4. Refactor Page into its own package
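
One possible shape for item 1, offered only as a sketch: each crawl runs in its own goroutine, a sync.WaitGroup tracks completion, and a mutex guards the shared visited set. All names here are hypothetical:

    package crawl

    import "sync"

    // Crawler holds the state shared between concurrent crawl goroutines.
    type Crawler struct {
        mu      sync.Mutex
        visited map[string]bool
        wg      sync.WaitGroup
    }

    // Crawl marks the link as visited, then fans out one goroutine per
    // newly discovered link. Already-visited links return immediately.
    func (c *Crawler) Crawl(link string) {
        defer c.wg.Done()

        c.mu.Lock()
        if c.visited[link] {
            c.mu.Unlock()
            return
        }
        c.visited[link] = true
        c.mu.Unlock()

        // extractLinks stands in for the existing fetch-and-parse logic.
        for _, next := range extractLinks(link) {
            c.wg.Add(1)
            go c.Crawl(next)
        }
    }

    func extractLinks(link string) []string { return nil } // placeholder

A caller would start it with c.wg.Add(1); go c.Crawl(start) and then block on c.wg.Wait() until every spawned goroutine has finished.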
