A Tour of Go: Web Crawler and Ways of Thinking About Concurrency

https://tour.go-zh.org/concurrency/10

The code comes from MIT 6.824. Working through these three implementations is a good way to understand how Go approaches concurrency.

Serial

func Serial(url string, fetcher Fetcher, fetched map[string]bool) {
	if fetched[url] {
		return
	}
	fetched[url] = true
	urls, err := fetcher.Fetch(url)
	if err != nil {
		return
	}
	for _, u := range urls {
		Serial(u, fetcher, fetched)
	}
}

A straightforward DFS implementation: the visited URLs are recorded in fetched, and every reachable URL is traversed once.
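To run the serial crawler on its own it needs a Fetcher. The sketch below assumes the two-result Fetch signature used by the snippets in this article (the Go Tour's own Fetcher also returns a page body) and uses a hypothetical in-memory fakeFetcher instead of real HTTP requests.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Fetcher is assumed to match the signature used in this article:
// Fetch returns the URLs found on a page, or an error.
type Fetcher interface {
	Fetch(url string) (urls []string, err error)
}

// fakeFetcher is a hypothetical in-memory web for illustration:
// each key is a page, each value is the list of links on that page.
type fakeFetcher map[string][]string

func (f fakeFetcher) Fetch(url string) ([]string, error) {
	if urls, ok := f[url]; ok {
		return urls, nil
	}
	return nil, errors.New("not found: " + url)
}

// Serial is the DFS crawler from the text.
func Serial(url string, fetcher Fetcher, fetched map[string]bool) {
	if fetched[url] {
		return
	}
	fetched[url] = true
	urls, err := fetcher.Fetch(url)
	if err != nil {
		return
	}
	for _, u := range urls {
		Serial(u, fetcher, fetched)
	}
}

// visited returns the keys of fetched in sorted order, for stable output.
func visited(fetched map[string]bool) []string {
	keys := make([]string, 0, len(fetched))
	for k := range fetched {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// web is a small graph with a cycle (a -> b -> a) and a dead link.
var web = fakeFetcher{
	"a": {"b", "c"},
	"b": {"a", "d"},
	"c": {"d", "missing"},
	"d": {"a"},
}

func main() {
	fetched := make(map[string]bool)
	Serial("a", web, fetched)
	fmt.Println(visited(fetched)) // [a b c d missing]
}
```

The cycle is harmless because fetched is checked first. "missing" is recorded even though its Fetch fails, because Serial marks a URL before fetching it.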

Concurrency with locking

type fetchState struct {
	mu      sync.Mutex
	fetched map[string]bool // must be initialized, e.g. make(map[string]bool)
}

func ConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
	// Check and mark the URL inside one critical section.
	f.mu.Lock()
	already := f.fetched[url]
	f.fetched[url] = true
	f.mu.Unlock()
	if already {
		return
	}
	urls, err := fetcher.Fetch(url)
	if err != nil {
		return
	}
	// Crawl each child URL in its own goroutine, then wait for all of them.
	var done sync.WaitGroup
	for _, u := range urls {
		done.Add(1)
		go func(u string) {
			defer done.Done()
			ConcurrentMutex(u, fetcher, f)
		}(u)
	}
	done.Wait()
}

Recording visited URLs is a point of contention between threads (goroutines). In the shared-memory model of concurrent programming, conflicts arise whenever several threads access the same memory, so a mutex is used here to make the check-and-mark on fetched safe.
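Why the read and the write of fetched sit under the same lock: if they were in separate critical sections, two goroutines could both see false and fetch the same page twice (and unsynchronized map writes are a data race that the Go runtime may abort on). The sketch below isolates that check-and-mark step in a hypothetical helper, testAndSet, and has many goroutines race to claim one URL; exactly one of them should win.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fetchState matches the struct in the text.
type fetchState struct {
	mu      sync.Mutex
	fetched map[string]bool
}

// testAndSet is a hypothetical helper wrapping the lines ConcurrentMutex
// runs under the lock: it reports whether url was already claimed and
// marks it as claimed, as one atomic step.
func (f *fetchState) testAndSet(url string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	already := f.fetched[url]
	f.fetched[url] = true
	return already
}

// claimRace starts n goroutines that all try to claim the same URL and
// returns how many of them found it not yet fetched.
func claimRace(n int) int64 {
	f := &fetchState{fetched: make(map[string]bool)}
	var winners int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !f.testAndSet("https://golang.org/") {
				atomic.AddInt64(&winners, 1)
			}
		}()
	}
	wg.Wait() // all increments happen before this returns
	return winners
}

func main() {
	fmt.Println(claimRace(100)) // 1
}
```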

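Two points here are worth noting. First, sync.WaitGroup is used to wait until all the spawned tasks have finished. Second, the closure func(u string) receives u as a parameter (a copy of the value) instead of referring to the loop variable u above, which keeps changing (a captured reference). Since Go 1.22 each loop iteration gets its own copy of the variable, but passing it explicitly is clear and correct on every version.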
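The same two techniques in isolation: a minimal sketch with a hypothetical lengths function that starts one goroutine per word, passes the loop values in as arguments, and uses a WaitGroup to wait for all of them. Each goroutine writes to its own slot of the result slice, so no lock is needed in this particular case.

```go
package main

import (
	"fmt"
	"sync"
)

// lengths computes len(w) for every word, one goroutine per word.
func lengths(words []string) []int {
	out := make([]int, len(words))
	var wg sync.WaitGroup
	for i, w := range words {
		wg.Add(1) // call Add before starting the goroutine, not inside it
		// i and w are passed by value, like func(u string) in ConcurrentMutex.
		go func(i int, w string) {
			defer wg.Done() // runs even if the body returns early
			out[i] = len(w) // distinct index per goroutine: no data race
		}(i, w)
	}
	wg.Wait() // blocks until every goroutine has called Done
	return out
}

func main() {
	fmt.Println(lengths([]string{"go", "tour", "crawler"})) // [2 4 7]
}
```

Calling Add inside the goroutine instead would let Wait return before some goroutines had even registered.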

Concurrency with communication (channels)

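func worker(url string, ch chan []string, fetcher Fetcher) {
	urls, err := fetcher.Fetch(url)
	if err != nil {
		ch <- []string{} // still reply, so master's count stays correct
	} else {
		ch <- urls
	}
}

func master(ch chan []string, fetcher Fetcher) {
	n := 1                           // messages still expected: the initial URL
	fetched := make(map[string]bool) // only master touches this map, so no lock
	for urls := range ch {
		for _, u := range urls {
			if !fetched[u] {
				fetched[u] = true
				n++ // each new worker will send exactly one message
				go worker(u, ch, fetcher)
			}
		}
		n-- // this message has been handled
		if n == 0 {
			break // no worker is still running, so nothing more will arrive
		}
	}
}

func ConcurrentChannel(url string, fetcher Fetcher) {
	ch := make(chan []string)
	// Send the seed from a goroutine: ch is unbuffered, and master is
	// not receiving yet, so a direct send here would block forever.
	go func() {
		ch <- []string{url}
	}()
	master(ch, fetcher)
}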

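The master and the workers are decoupled by the channel. The master takes URL lists from the channel and decides which URLs are new; each worker fetches one URL and writes the URLs it finds back to the channel. The only interaction between the master and the workers is reading from and writing to the channel. The counter n in master tracks how many messages are still expected (one for the initial URL plus one per worker started), so when it reaches 0 the crawl is finished.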
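A runnable version of the channel-based crawler, under the same assumptions as a plain test harness would need: a two-result Fetch signature and a hypothetical in-memory fakeFetcher. The only change to master is that it returns the sorted URLs it dispatched, so the result can be checked; the counting logic is unchanged.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Fetcher is assumed to match the signature used in this article.
type Fetcher interface {
	Fetch(url string) (urls []string, err error)
}

// fakeFetcher is a hypothetical in-memory web: page -> links on that page.
type fakeFetcher map[string][]string

func (f fakeFetcher) Fetch(url string) ([]string, error) {
	if urls, ok := f[url]; ok {
		return urls, nil
	}
	return nil, errors.New("not found: " + url)
}

func worker(url string, ch chan []string, fetcher Fetcher) {
	urls, err := fetcher.Fetch(url)
	if err != nil {
		ch <- []string{}
	} else {
		ch <- urls
	}
}

// master is the coordinator from the text, modified only to return the
// sorted list of URLs it handed to workers.
func master(ch chan []string, fetcher Fetcher) []string {
	n := 1
	fetched := make(map[string]bool)
	for urls := range ch {
		for _, u := range urls {
			if !fetched[u] {
				fetched[u] = true
				n++
				go worker(u, ch, fetcher)
			}
		}
		n--
		if n == 0 {
			break
		}
	}
	out := make([]string, 0, len(fetched))
	for u := range fetched {
		out = append(out, u)
	}
	sort.Strings(out)
	return out
}

func ConcurrentChannel(url string, fetcher Fetcher) []string {
	ch := make(chan []string)
	go func() { ch <- []string{url} }()
	return master(ch, fetcher)
}

var web = fakeFetcher{
	"a": {"b", "c"},
	"b": {"a", "d"},
	"c": {"d", "missing"},
	"d": {"a"},
}

func main() {
	fmt.Println(ConcurrentChannel("a", web)) // [a b c d missing]
}
```

Whatever order the workers finish in, five workers are started, so master receives six messages in total, and n goes from 1 up by five and down by six, reaching 0 exactly on the last message.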

Although channels are themselves implemented with locks internally, this is a different way of thinking about concurrent programming: since the language provides a thread-safe channel, you only need to think about how the parts communicate through it.

Note that sends and receives on a channel block by default, so you need to consider an exit or timeout mechanism.

The difference between the two concurrent implementations is mostly a difference in ways of thinking, which is best appreciated by working through them yourself.


This article is from the internet and does not represent the position of 1024programmer. Please indicate the source when reprinting: https://www.1024programmer.com/764993

author: admin
