Everything you need to know about concurrency and parallelism in Go

In programming, when we talk about doing several things at once, we often use the terms concurrency and parallelism interchangeably. However, the two concepts are not the same, even though they are often confused.

In Go there is a set of proverbs, known as The Proverbs of Go, written by Rob Pike, one of the creators of Go. These proverbs summarize good community practices and the philosophy you should follow when programming in Go. One of them states precisely this: Concurrency is not parallelism.

So first of all I want to make clear what's the difference between concurrency and parallelism.

What's the difference between concurrency and parallelism?

Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. This does not imply that the tasks are ever running at the same instant: on a single core, for example, the scheduler can simply interleave them.

Parallelism, then, arises when two or more of these concurrent tasks are literally executed at the same time. In computing terms, this is only possible when we have more than one processor (or a multi-core / multi-threaded processor).

As you do not always know how many processors the machine running your program will have (for example, imagine you share your Go code on GitHub and people download it to run it on their own machines), it is usually more accurate to talk about concurrency than parallelism.
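As a small illustration of that uncertainty, a program can ask the runtime at startup how much parallelism is actually available. This is only a sketch: cpuInfo is a hypothetical helper name, and the numbers it prints depend entirely on the machine it runs on.

```go
package main

import (
    "fmt"
    "runtime"
)

// cpuInfo (a hypothetical helper) reports the logical CPUs usable by this
// process and the maximum number of CPUs that can execute Go code at once.
func cpuInfo() (cpus, maxProcs int) {
    // Passing 0 to GOMAXPROCS queries the current value without changing it.
    return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
    cpus, maxProcs := cpuInfo()
    fmt.Printf("logical CPUs: %d, GOMAXPROCS: %d\n", cpus, maxProcs)
    if maxProcs == 1 {
        fmt.Println("goroutines are still concurrent here, but never parallel")
    }
}
```

The same concurrent program runs correctly in both situations; only the degree of parallelism changes.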

How does Go achieve concurrency?

Now that it is clear what the difference between concurrency and parallelism is, let's see how Go achieves it.

Go provides us with multiple tools to handle concurrency, but probably the best known are goroutines and channels.

Goroutines

Goroutines are ordinary functions that are executed concurrently (not sequentially) when they are called with the go keyword. Let's see an example:

package main

import (
    "fmt"
    "time"
)

func commonFunc() {
    fmt.Println("I'm a common func, executed concurrently")
}

func main() {
    go commonFunc()
    fmt.Println("I'm the main routine")
    time.Sleep(1 * time.Second)
}

As you can see, in the code above commonFunc runs concurrently with the main function and, depending on how the goroutines are scheduled, either of them could print first to our terminal.

All we have to do is put the go keyword in front of the function call, and Go will take care of the concurrent execution for us. Easy, right?

WaitGroups

Before talking about channels, I want to talk about wait groups. Did you notice that I added a time.Sleep at the end of the main function in the example above?

A Go program exits as soon as the main function returns, without waiting for any other goroutines. That is why we needed to add a time.Sleep to the main function: to give commonFunc time to run before the main function finishes and our program exits. This is not a good way to handle synchronization and waiting (we don't know how long our functions will take to run); the correct and efficient way to do it is with wait groups.

WaitGroup, provided by the standard package sync, is a struct that allows us to wait for a collection of goroutines to finish. By using it, we can:

Indicate how many goroutines we want to wait for by using the Add function (how many goroutines are in the WaitGroup).
Indicate that a goroutine has finished by using the Done function.
Wait for all goroutines in the WaitGroup to finish by using the Wait function.

Let's update our previous example:

package main

import (
    "fmt"
    "sync"
)

func commonFunc(wg *sync.WaitGroup) {
    fmt.Println("I'm a common func, executed concurrently")
    wg.Done()
}

func main() {
    wg := sync.WaitGroup{}
    wg.Add(1)
    go commonFunc(&wg)
    fmt.Println("I'm the main routine")
    wg.Wait()
}

That would be the correct way of writing our previous example.
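The same three calls scale to any number of goroutines. The following is a minimal sketch, assuming a made-up workload (each worker squares its own index); runWorkers is a hypothetical name. It also uses defer for Done, a common way to make sure the counter is decremented on every return path.

```go
package main

import (
    "fmt"
    "sync"
)

// runWorkers (hypothetical) starts n goroutines and returns only after all
// of them have finished. Each worker writes to its own slice element, so no
// additional locking is needed.
func runWorkers(n int) []int {
    results := make([]int, n)
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1) // register the goroutine before starting it
        go func(id int) {
            defer wg.Done() // runs when the worker returns
            results[id] = id * id
        }(i)
    }
    wg.Wait() // blocks until every worker has called Done
    return results
}

func main() {
    fmt.Println(runWorkers(5)) // prints [0 1 4 9 16]
}
```

Calling Add before the go statement, rather than inside the goroutine, avoids the case where Wait runs before the goroutine has had a chance to register itself.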

Channels

Ok, now we know how to make one routine wait for others. However, this will not always be enough. Sometimes we need to communicate data between two routines, and here is where channels come into play.

Channels can be of two types, buffered and unbuffered, and they work much like a queue between a producer and a consumer: each value sent is delivered to exactly one receiver. Let's see the differences between them.

Buffered channels

Buffered channels are channels that have an associated buffer. This means that they can store a certain number of values in them. When the buffer is full, the channel will block the sender until the receiver receives a value from the channel. When the buffer is empty, the channel will block the receiver until the sender sends a value to the channel.

For example:

package main

import (
    "fmt"
)

func genNumbers(from, to int, ch chan int) {
    for j := from; j <= to; j++ {
        ch <- j
        fmt.Printf("Published: %d\n", j)
    }
}

func main() {
    ch := make(chan int, 10) // buffered channel
    from, to := 1, 20
    go genNumbers(from, to, ch)
    for j := from; j <= to; j++ {
        fmt.Printf("Received: %d\n", <-ch)
    }
}

In the above example, we create a goroutine that generates the numbers from 1 to 20 and sends them to the channel. Then, in the main routine, we receive the numbers from the channel and print them to the terminal.

As the channel has a capacity of 10 elements, the genNumbers goroutine can send up to 10 numbers to the channel without blocking. Once the buffer is full, it blocks until the main routine receives one of the numbers from the channel. Then the goroutine can send another number to the channel, and so on.

The opposite is also true for the main routine. If the buffer in the channel is empty and the main routine tries to read a value from it, the main routine will block at the receive operation (<-ch) until genNumbers sends a number.
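To observe the buffer directly, the built-in len and cap functions can be applied to a channel. This is a minimal sketch in which fillBuffer is a hypothetical helper; it sends without any receiver running, which only works because the number of sends stays within the capacity.

```go
package main

import "fmt"

// fillBuffer (hypothetical) sends n values on a channel with the given
// capacity while no receiver is running, then reports the buffer usage.
// It assumes n <= capacity; with a larger n the send would block forever
// and the runtime would report a deadlock.
func fillBuffer(capacity, n int) (queued, size int) {
    ch := make(chan int, capacity)
    for i := 0; i < n; i++ {
        ch <- i // does not block while the buffer has room
    }
    return len(ch), cap(ch)
}

func main() {
    queued, size := fillBuffer(10, 3)
    fmt.Printf("%d of %d slots used\n", queued, size) // prints "3 of 10 slots used"
}
```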

Unbuffered channels

Unbuffered channels, on the other hand, have no associated buffer in which to store values, so every send and every receive on them blocks until the other side is ready.

This means that, to be able to write to a channel, we need someone to be reading from it. And vice versa: to be able to read from it, we need someone to be writing. Note that this is different from having a buffered channel of size 1, as in that case we can write to the channel the first time without blocking, because the buffer is empty and has room for our data.

Unbuffered channels allow us not only to communicate between two routines, but also to synchronize them, by taking advantage of the blocking operations.

Let's see the previous example, this time using an unbuffered channel:

package main

import (
    "fmt"
)

func genNumbers(from, to int, ch chan int) {
    for j := from; j <= to; j++ {
        ch <- j
        fmt.Printf("Published: %d\n", j)
    }
}

func main() {
    ch := make(chan int) // unbuffered channel
    from, to := 1, 10
    go genNumbers(from, to, ch)
    for j := from; j <= to; j++ {
        fmt.Printf("Received: %d\n", <-ch)
    }
}

As you can see, the only difference is that we don't specify the size of the channel when creating it. As you probably deduced from the previous explanation, this time our code sends values one by one: each send completes only when the main routine receives the value, so sender and receiver proceed in lockstep. Before, with the buffered channel, sending and receiving could happen asynchronously as long as the buffer was neither full nor empty.
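The synchronization use of unbuffered channels mentioned earlier can be sketched as a simple "done" signal. The names runAndWait and done are hypothetical, and the empty struct is just a conventional choice for a value that carries no data.

```go
package main

import "fmt"

// runAndWait (hypothetical) starts a worker and blocks until the worker
// signals completion on an unbuffered channel. Because the send and the
// receive meet, everything the worker wrote before sending is visible
// to the caller after the receive.
func runAndWait() string {
    var msg string
    done := make(chan struct{}) // unbuffered: used only as a signal
    go func() {
        msg = "work finished"
        done <- struct{}{} // blocks until runAndWait is receiving
    }()
    <-done // blocks until the worker has sent
    return msg
}

func main() {
    fmt.Println(runAndWait()) // prints "work finished"
}
```

No WaitGroup or time.Sleep is needed here: the blocking receive is what makes the main routine wait.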

Select

So channels are great for communicating between and/or synchronizing two routines, but what if more than two routines are involved? As we saw, channel operations block (on unbuffered channels, or on buffered channels that are empty or full), so we can't simply wait on two channels with two consecutive receives: while our routine is blocked on the first channel it cannot notice a value arriving on the second, and it may even end up in a deadlock.

To fix this problem, Go provides us with the select statement. The select statement lets a goroutine wait on multiple communication operations. A select blocks until one of its cases can run, then it executes that case. It chooses one at random if multiple are ready.

Let's see an example:

package main

import (
    "fmt"
    "time"
)

func genNumbers(from, to int, ch chan int) {
    for j := from; j <= to; j++ {
        ch <- j
        fmt.Printf("Published: %d\n", j)
    }
}

func timeout(n int, ch chan bool) {
    time.Sleep(time.Duration(n) * time.Second)
    ch <- true
}

func main() {
    from, to := 1, 1000
    genCh := make(chan int)
    timeoutCh := make(chan bool)

    go genNumbers(from, to, genCh)
    go timeout(1, timeoutCh)

    for {
        select {
        case x := <-genCh:
            fmt.Printf("Received: %d\n", x)
        case <-timeoutCh:
            fmt.Println("timeout")
            return
        }
    }
}

In the above example, we are listening to two different channels in our main routine at the same time: genCh and timeoutCh. The select executes the first case that is ready and, as it's inside a for loop, it keeps listening to both channels until a value arrives on timeoutCh; at that point the main function returns and the program exits.

A few extra notes about communicating routines

As we have seen, channels are a really good way to communicate between routines. However, Go also allows routines to communicate by sharing memory, i.e., by accessing the same variables.

This communication model is not recommended in most cases, as it's more error-prone and can easily lead to race conditions. In fact, do you remember The Proverbs of Go I mentioned at the beginning of this article? If you took a few moments to read them, you may have noticed another proverb that says: Don't communicate by sharing memory, share memory by communicating.

As you probably know, in concurrent programming we want to make sure that a variable is not accessed at the exact same time by two or more routines, to avoid data inconsistency and the aforementioned race conditions. To control that access, Go provides a well-known tool you may have used in other languages: the mutex (sync.Mutex).
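A minimal sketch of protecting a shared variable with sync.Mutex follows. The counter workload and the name countWithMutex are made up for illustration; without the Lock and Unlock calls, the increments would race and the final value could come out lower than expected.

```go
package main

import (
    "fmt"
    "sync"
)

// countWithMutex (hypothetical) increments a shared counter from n
// goroutines, using a mutex so that only one of them touches it at a time.
func countWithMutex(n int) int {
    var (
        mu      sync.Mutex
        wg      sync.WaitGroup
        counter int
    )
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock()
            counter++ // critical section: one goroutine at a time
            mu.Unlock()
        }()
    }
    wg.Wait()
    return counter
}

func main() {
    fmt.Println(countWithMutex(1000)) // prints 1000
}
```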

In the end, sharing memory is a valid and effective way of communicating between routines, but you should only use it when you really need it, for example when operating on a gigantic matrix, where the cost of copying the data to send it through a channel would be too high.

Also, when using channels, it's important to note that some data structures, like slices and maps, refer internally to shared underlying data. This means that, if you send one of them through a channel, the sender and the receiver end up referring to the same memory. To avoid unintentionally sharing memory between routines, make sure that neither side modifies the data after it has been sent, or send a copy instead (for a slice, one made with the built-in copy function).
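The effect is easy to reproduce. In this sketch, roundTrip is a hypothetical helper whose receiver always overwrites the first element of whatever slice it gets; whether the sender sees that write depends on whether it sent the slice itself or a copy.

```go
package main

import "fmt"

// roundTrip (hypothetical) sends data, or a copy of it, to a goroutine that
// modifies the first element, and returns once the goroutine is done.
func roundTrip(data []int, sendCopy bool) {
    ch := make(chan []int)
    done := make(chan struct{})
    go func() {
        s := <-ch
        s[0] = 99 // the receiver modifies what it was given
        done <- struct{}{}
    }()
    if sendCopy {
        dup := make([]int, len(data))
        copy(dup, data) // dup has its own backing array
        ch <- dup
    } else {
        ch <- data // same backing array as the caller's slice
    }
    <-done
}

func main() {
    shared := []int{1, 2, 3}
    roundTrip(shared, false)
    fmt.Println(shared) // prints [99 2 3]

    copied := []int{1, 2, 3}
    roundTrip(copied, true)
    fmt.Println(copied) // prints [1 2 3]
}
```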

Conclusion

That's all. If you have read the entire article and ended here, you already know everything you need about concurrency and parallelism in Go. Now, it's your time to practice and apply all this knowledge in your next project!

I hope my article has helped you, or at least, that you have enjoyed reading it.