Effective Concurrency in Go @ Go Israel Meetup in TLV

JFrog is a proud Community Sponsor for the Go Israel Meetup

March 30, 2023

< 1 min read

Go is built around the idea that less is more, and this philosophy is certainly true of its concurrency model. In this talk, we will take a deep dive into some elegant patterns and dangerous pitfalls when using concurrency in Go. We will rethink two structures in the context of Go’s model: the Condition-Variable and the LRU Cache. We will build both of them from the ground up, showcasing both the simplicity of Go as well as the complexity of any concurrent structure. In this talk, we will look at a live / semi-live demo where I will exhibit some concurrency ideas and “gotchas” while showcasing them using real data structures similar to the ones we actually use in our codebase.


Amitai Gottlieb

Software Engineer @ JFrog

Amitai has been developing Go at Jfrog for two years, working on a distributed cache solution know as Private Distribution Network. Other than work, Amitai is enthusiastic about programming languages, design patterns and has a deep love for mathematics. When not programming, you can usually find Amitai riding his bike around Israel.

Video Transcript

I’m a software engineer and
a jfrog
and I work on distributed systems and go
so that’s just about a little bit about
my background and I’m studying towards a
double major in maths and computer
science I’m almost done I’ve been done
for quite a while almost done for quite
a while and
in my spare time I’m a computer
languages Enthusiast so I’m really into
all types of computer languages and what
you can do with them and the ideas
and and I love my bike almost as much as
I love software and my girlfriend that
means I love my girlfriend more than my
bike I just want to be clear about that
but I really love my bike
yeah so let’s go back in time I want to
kind of tell you a story a little bit of
history and so the year is 1978 and the
programmer is Tony and Tony
is a touring Prize recipient during
award recipient and he invented the
quicksort algorithm at age 22.
and he also invented the null pointer
which he later called his billion dollar
mistake so he’s you know done some good
things done some bad things and pretty
respectable guy in general
and and the problem that he’s trying to
solve in 1978 is that it’s becoming
apparent that parallel programming is
going to be the future of software
development and
and parallel programming is hard
so he realizes that concurrent software
is the future of performance software
and that’s the way the world’s heading
even in 1978 process is running on many
processors they’re running in parallel
schedulers Etc
um and he wants to define a single way
to to write concurrent programs which
um which isn’t available to him at the
um what they had then is quite similar
to what we have now which is mutexes
semaphores condition variables all kinds
of elements to allow us to to program
concurrently but there was no like one
true way to program concurrently maybe
similar to how we would program this
sequentially where everyone understands
Loops if else statements and there
wasn’t these kind of uh Paradigm
constructs for concurrent programming
yeah I think it was Biggie Smalls I said
more mutex is more problems in his 1994
album they’re ready to die
so uh yeah
so who are we searching for a solution
which would provide him with three
attributes the first was Simplicity this
is pretty self-explanatory uh
he wanted any programmer to be able to
look at the concurrent software he was
writing and to reason about it easily to
talk about it with other programmers uh
as he would any software
and he wanted the solution to be
abstract and meaning that software
developers wouldn’t have to think about
concurrency while they were writing
concurrent programs so they could just
think about their ideas that they wanted
to convey in their software and the
implementation of the concurrency would
happen behind the scenes
and finally and this is probably the
most important he wanted his solution to
be composable meaning that he could
scale up his concurrent solution add
more concurrency and that it would still
be viable it would it would continue to
be simple and abstract
so what he came up with is something
called CSP has anyone here heard of CSP
by the way okay that’s cool uh three
and so this was huge in 1978 and or
published a paper where he describes a
new programming language
and even though it’s it’s technically a
programming language I want us to think
of it in this talk more of like a
philosophy as a way of thinking about
concurrent programming
um and technically actually it’s an
um but that’s not important uh if
anyone’s interested I’ll link the paper
at the end
and he calls this language algebra
philosophy thing he calls it
communicating sequential processes
so his language has Primitives of
process input and process output and it
has a structure of parallel composition
of sequential processes so this is
probably gibberish to most people so
let’s dive into this in short and his
idea is to have small independent and
sequential software units and all of the
software units run in parallel to each
and they communicate with each other
using message passing otherwise known as
channels so does this sound
familiar to anyone is anyone there sorry
sorry erlang erlang is actually really
interesting uh I was hoping someone
would say go routines
um but that’s actually it is the case
but what I was thinking of is a shell
so how is this similar to Shell commands
let’s take a look
so consider the following command can
anyone everyone see this by the way okay
so we’re doing a cut on a foo.txt file
and then grapping for the word bar and
then T into a a file called buzz.text
and so this basically means we’ll read
from Foo we’ll filter out all the lines
that don’t have the word bar and then
we’ll write the output into a file
called bats so what’s happening here
actually so
cut grip and tea are all small
sequential processes right they’re all
communicating with each other over
channels which in this case is the pipe
so cat is communicating to grip over a
pipe and they’re executing concurrently
that’s something that we don’t think
about a lot while we’re using the
terminal but when we compose commands in
this way
cut grip and T are all set off at the
same time and what synchronizes between
them is actually the pipe so they’re all
running concurrently so what a crazy
concurrency model I think this is
absolutely amazing like none of us ever
think of concurrency while we’re using
the terminal right we just think of
writing commands and the concurrency
happens behind the scenes
basically what we have here is a
concurrency model which is simple
because what’s written here is is really
understandable it’s abstract because as
I said no one ever thinks of parallelism
while they’re writing commands in the
terminal and it’s composable because if
we want to add more functionality or we
want to change the command all we need
to do is like we did here where we throw
in a set command we just need to change
our commands a little bit and they will
get completely different functionality
so this is kind of the essence of CSP
this is the idea that Tony ho was trying
to to come to give across and this is
the kind of ideas that we want to
program with when we’re programming in a
kind of CSP Style
so yeah we have here like writing
sequential code which is boring and then
writing parallel code which is okay and
then writing sequential code and running
in parallel in the Shell which is
all right yeah so back to CSP and I want
to compare CSP a little bit to go and
how the the syntaxes match just uh just
so we can see the similarities and so in
CSP and go we can read and write from
and go the way we would read from a
channel is using the arrow operator so
here we read card image from card reader
in CSP it’s very similar we use the
question mark operator so here we read
card image from card reader in CSP
um and we can write the channels of
course as well and go also Arrow
operator to write line image to the
channel line printer
in CSP and we use the exclamation mark
and so very similar in general and an
important point is that in CSP and then
go channels are blocking until there is
a reader or a writer on the other side
so if I write to a channel in CSP I have
to wait until someone reads from the
channel before I can continue operation
and this means that channels are more
than just a tool for passing information
they’re also a tool for synchronizing
between different threads or between
different processes and we’ll see how
this comes into play a little bit later
um so executing sequential code is
something which every programming
language can do you can do it in go and
here we read from a channel in increment
a variable I and then do some
conditional logic and you have exactly
the same kind of functionality in CSP
and we can also execute concurrent
processes of course no concurrency model
would be complete without this so in go
we just prefix the function with the go
command in order to execute things
concurrently so here disassemble squash
and assemble will all run concurrently
and in CSP we can do this using a double
pipe operator so to run disassemble
squash and assemble together we can just
put them together with this double pipe
and so another important point about CSP
is that in CSP processes never share
state any shared information is CSP in
CSP has to be passed over channels
and what this gives us absolutely for
free is an absence of data races so
because we have to copy any information
between two channels only one process
can ever see any copy of data and we can
never have data races in CSP
um cool so the final thing I want to
talk about is the select command I won’t
go into the what what select command can
do because it’s not so important if
you’re not really aware if you’re not a
go programmer but it’s a really powerful
tool for for selecting over events with
we can do it and go using this uh the
syntax so here if x is larger than y
will perform some logic and if we can
read a variable C from the channel in
we’ll do another logic and and in CSP
you actually have exactly the same
behavior you can achieve it with the
this pound sign operator
so here on the left we have the case of
X is larger than y and on the right we
have the case where we read C from the
channel in
and and this is a really interesting
because all of these features you know
blocking channels
simple parallel functions the select
case syntax that we have in go these are
like the joules on the crown of ghost
concurrency model these are things we
think of these as modern Sleek design
implementations of go when in fact they
were you know thought of by Tony Hall
more than 50 years ago
so or almost 50 years ago and so that’s
interesting that raises like that raises
eyebrows for me and
so CSP is also a very rich language
regardless I won’t go into all of the
syntax it’s not really important and but
if anyone’s interested
yeah I I recommend it and so in practice
um because CSP is so simple and the
model is so simple
and we can actually develop tools our
tools have been developed to prove the
correctness of CSP code this means that
we can prove absence of Deadlocks we can
prove or there are obviously absence of
data races and we can prove the
correctness of CSP code
and these techniques were used to write
safe concurrent code for the
International Space Station in the 80s
so really powerful language really
powerful ideas
and a number of language and languages
emerge that were directly inspired by
CSP hey one of them is Occam and very
Niche language and another one is new
squeak which is also incredibly Niche
but it’s really interesting because new
squeak was designed at Bell Labs by Luca
cardelli and a guy called Rob Pike
and if any of you are unaware uh Rob
Pike is one of the designers of go one
of the three designers ago
and probably the most influential person
in the design of the girl language and
so all this just goes to say that the
the similarities that we just saw
between CSP and between go are no
Rob Pike was fully aware of CSP and the
powers of the CSP thinking when he
implemented channels and when the go
concurrency model was created
so I want to see how we can apply like
CSP thinking into the way we write go
and so that we can really utilize the
full the full power of the go
concurrency model
and cool so the example I’m going to
give is something called the Civil
veritosis does anyone know what this is
by any chance uh you’ve got the the one
mathematician in the room
um so the Civil erectostinis is an
algorithm to finding prime numbers
and and the reason I chose it and don’t
read this this is just an example but
the reason I chose it is a because horde
gives this exact example in his original
paper in 1978 for how we can use a CSP
to write concurrent code and so I
thought it would be a nice a nice little
example kind of link us back to history
and for us to do it here together
so the similaratistan is the problem
statement is as follows given a large n
how can we find all the prime numbers up
to n
so in this case we have n equal 25 and
we want to find all the prime numbers up
to 25.
and so the first thing we do is we
remove one because one isn’t Prime and
that’s a fact and they don’t argue with
um and then the second thing we do is we
take two two is the first prime number
and we can mark it with an Asterix we
know it’s a prime number and more than
that we know that any multiple of two
isn’t Prime
so we can remove all of the multiples of
two so four six eight all the even
numbers we can throw them out they’re
definitely not Prime
next we’ll move on to the next number
three we mark it it’s Prime and now we
can remove all the multiples of three so
a 12 goes uh sorry nine will go with 15
will go any odd number which is a
multiple of three is now also removed
and we continue in this manner Now we
move to five we remove any multiples of
etc etc eventually we’re left with all
the prime numbers okay
so fairly simple
um but uh we’ll see how we can do this
concurrently using CSP which is a not so
simple Maybe
and so first of all how would we write
this sequentially just simply and in go
what I have here is just a very
straightforward sequential function
which does exactly what I just described
to you and
we first make an array of all the
numbers between 0 and a n or which will
represent all the numbers between 0 and
N we can set 0 and 1 to be not prime
because 1 is not prime I said that
and then we can just iterate over all
the numbers if the number is prime we
can set all the multiples to be not
and finally just print all the prime
numbers and it’s pretty straightforward
um and here we have a pretty decent
decent implementation of this algorithm
and it’s not optimal but we can optimize
it and we can get to a a complexity of n
Times log log n and not very important
but we can do that if we want
but you’re sitting at your day job in
ancient Greek algorithms Incorporated
and your boss comes along and he tells
you that’s great but can we make this
parallel please you know I have 10
processors I want to use them and so how
would we go about doing this
so we have I’ll just go back to the code
we have this shared data structure here
and they say this array
we might have to put a lock on this
array so that if a lot of the threads
wanted to access it concurrently we
would have safe concurrent access but
we’re going to take a different approach
and we’re going to think like a shell
exactly like I said before we’re going
to compose the ideas together like
exactly like a show command
so I want you to imagine the following
command just imagine this isn’t what we
have yet but um
like just a imagine that we’re going in
this direction so sec to 100 is going to
print all the numbers between two and a
hundred so two three four five six
Etc up until 100. and then the first
sieve command that we have here will
filter out all of the even numbers so
it’ll print two and filter out four six
eight Etc
and then the next sieve we’ll filter out
all the multiples of three just like we
did in step two and then the next step
will filter out all the multiples of
etc etc and then we’ll be left with the
prime numbers so I’ll just go through
that again so we’re all uh we’re all on
the same page
so SEC outputs a list of numbers from
two to one hundred
and save is a command which is going to
take as an input a list of numbers it’s
going to print the first number out and
then it will pass on all the numbers
that aren’t a multiple of the first
okay so as an example
the first sieve will take in the numbers
two three four five six up to 100. and
then output sorry it will first print to
the first number and then it will output
any number which isn’t a multiple of two
so it will output three five seven nine
the second sieve is going to print three
the first number and then it’s going to
Output all the numbers which it got
which aren’t a multiple of three so five
seven it skips nine and then we go on to
11 and we keep going in this way and
we’re going to chain a lot of these Civ
functions together and eventually we’re
going to get all the prime numbers in
this way
cool so there’s three things we have to
do here three fairly simple things and
we’re done the first we need to
implement the SEC the second we need to
implement Civ and then we need to tie
them all together and then we’re
um SEC is fairly simple
and it writes the numbers two three four
to a channel out until canceled so it
receives a channel done so that we can
cancel it at the end and it receives the
channel out to write to
uh we defer closing the the out Channel
and we’ll get to why that’s important at
the end and then all we do in a loop
forever is we just write numbers two
three four five six to the out Channel
until someone cancels us and when
someone cancels us we just return
so that’s sick
the save function receives an in-channel
a list of numbers coming in and has a
list of numbers going out we again defer
closing the out Channel we’ll get to
that at the end we print the first
and then we can just go over all the
numbers that we get in
if the number is a multiple of P we can
skip it and otherwise we can pass the
number on to the next step
um cool
so that’s the sieve function how do we
compose all of this together
and so the idea is as follows
and so we’re going to start off with our
SEC function it’s going to pass off the
number is 2 3 4 5 to the first save
the save function is going to write to
an out Channel which is going to be the
N channel to the next function which
will be the out channel to the final
sieve function so the negative function
and so on and so on until we until we
and then I just want you to notice that
each sieve this sieve will print the
first prime number the Civ will print
the second prime number the simple print
the third prime number
and then once we get here we know that
we’ve printed all the prime numbers we
wanted to print so once this save starts
passing on numbers we know we’re done
so we can just take that output put it
straight into the SEC tell it we’re done
the SEC will close this will close the
next sieve this will close this will
close and we’ll exit gracefully okay so
this is the last piece of code where
we’re completely done and so how does
this look when we chain it all together
the first thing we do is we start our
SEC function we give it a done channel
so that we’ll be able to tell it to
close and we give it an in-channel so
that it can write to the first sieve
and we set it off in a go routine
and next 100 times
um we we set off a save function and
every time we set off a function we tell
them the next in channel will be your
out channel so this the in function for
the next sieve is the out channel for
the previous one
and then finally we can just take the
last out Channel and pipe it directly
into the done function to tell the SEC
that we’re done and this will print all
the prime numbers for us
and so I just want to show you this
working because you know I want you to
believe me that this actually works
and then once we’ve done that and we’ll
be done we’ll have created our first
cspa style concurrency
and so let’s take a look
so this is the code
this is all of the code it may have
seemed a little long while I was talking
about it but this is actually the full
implementation of this prime number uh
and let’s run it so
go run save
Tata we get the first 100 prime numbers
and if anyone wants to check this the
code is on GitHub
yeah so that’s it
so what have we achieved here and we
ditched the shared data structure we
ditched the array that we had before
and instead we favored streams of data
which we could more easily reason about
and pass between our different functions
so this is the core of the CSP idea
and we also achieved all of this with
zero locks it was all done using
channels which in my opinion are much
easier to reason about as a human
and and we created composable
concurrency that scales naturally if we
want to add more or we want to generate
more prime numbers all we need to do is
generate more go routines and we’ll have
more prime numbers for free
and the final like bonus thing is that
actually this implementation without
even thinking about it gives us optimal
complexity for this problem so we did it
in O of n log log n and again exercise
for the reader why this is Olan logo
again but uh we believe me it is and
so that’s it thank you very much
um and does anyone have any questions
that’s true for every prime number you
need to go routine and so I just want to
say this this idea the idea is cool
never write this for a prime number
iterator it’s much slower than doing it
just with an array this was more to show
the idea but um in go go routines are
really really cheap so there are a few
kilobytes of memory and they don’t
actually open an operating system thread
so you can run thousands tens of
thousands I think up to Millions even
um and it will you you won’t see any
kind of performance issues with that so
you can get a million prime numbers
quite easily with this implementation
very much
um so my original talk I wanted to start
with this and then go on into like all
the ways that I use this in my code
but that would have been an hour and a
um but I think that the idea is first of
all you need to get used to writing code
that looks like this and not like
classical concurrency code and so it
takes a little bit of learning just like
in the same way you learn how to use a
mutex and at the start it’s really
unintuitive and then you learn how to
use it I think it’s the same with this
kind of code and and I think what I said
uh preferring streams over data
structures is the the heart of it so if
you can especially with things like
arrays and the arrays of arrays usually
you can move those things into streams
of data which move between functions
rather than a single instance which
everyone has to talk with and it makes
your code a lot easier to understand in
my opinion
question was like
or what would be a natural
stay eight everybody should you just
said you shouldn’t be using that
you’re doing this information
what is a good implementation okay
that’s a great question so
um I think you with IO it’s definitely
the big one so things like caches where
um you need to get something from a
cache but it takes a while because the
cash has to perform some kind of network
operation or disk operation
and you have to wait for this that’s a
really good place to to move your data
into maybe a stream rather than a cache
which holds it in a in a data structure
with a lock and another thing would be
handling requests for instance so
handling Network requests moving things
through streams of data rather than
having an object which holds I don’t
know your requests or uh or whatever
and those are the kind of things I would
anything else yeah
so so the way it works is that if you
we’ll just do two like like let’s say
cut and then grab
so what happens is the operating system
starts the two programs together
and it just tells the grip read from the
pipe and every time you can read
start working so
the the cat will will start up it will
read a line from the file and then it
will pass it to the pipe and then the
pipe will then output a line to the grip
and grep will be able to read one line
and output to the terminal and then the
the cut will read another line and then
grep can read another line and this way
they both actually work together rather
than having to read the whole file and
then to filter through the whole file
that’s the way it happens concurrently
um anything else if there’s any
technical questions about the code if
someone didn’t understand something I’d
really be happy to go over it so feel
by the way um all of this is on my
um I’ll make sure to put the GitHub up
at the end and the original paper that I
took this from is also on my GitHub for
anyone who wants to
before the better it is