Generating Generators – GopherCon Europe 2022

by Tamir Bahar

October 26, 2022

< 1 min read

About the talk: Everyone wants Generators in go. To be able to yield a value from a function and then keep it running. Some people turn to goroutines and channels, but that comes with significant overhead.

In this talk, we’ll use static analysis & code generation to roll our own generators.

You can access Tamir’s project repo here: https://github.com/tmr232/gengen

About the speaker: Tamir Bahar is an Advanced Technologies Team Leader at JFrog, and a member of the Israeli ISO C++ National Body Discussion Group. Tamir is a software engineer and reverse engineer. He likes playing around with the software and finding simple solutions for complex problems, and complex solutions for simple problems.

Speakers

Tamir Bahar

Advanced Technologies Team Lead at JFrog

Tamir is an Advanced Technologies Team Leader at JFrog, and a member of the Israeli ISO C++ National Body Discussion Group. Tamir is a software engineer and reverse engineer. He likes playing around with the software and finding simple solutions for complex problems, and complex solutions for simple problems.

Video Transcript

please put your hands together for tamir
[Applause]
[Music]
okay and you’re talking about generating
generators right yes i
am how do you make it present in case of
piracy go to the start slide as well
yeah it’s computers in it it’s hard
generating generators that sounds very
meta
doesn’t it think about code generation
already is quite meta
but generating things that also generate
i mean
this is quite exciting what do you think
of that v sounds interesting doesn’t it
the first thought that came to my mind
was machine tools because that’s how you
you know make generators
but i think this is a programming talk
okay yeah
[Laughter]
someone told me something tells me
you’re going to enjoy this please put
your hands together for tamir
can i get the clicker
thank you
so um hello i’m tamir and before we just
want to make sure i’m walking on the
stage during the talk
and i’ve been told that if i fall off
the stage the people on the stream would
not be able to see me so if you see me
getting up to here or something like
that please tell me just say stop so
that i do not fall let’s try it once
good
now now we can begin now i’m going to
talk about generating generators
which is a fun thing we can do in go
and let’s start first i want to talk
about myself for a bit which is a
natural thing to do
um i relatively new to go i’ve been
coding in go for about a year now and
before that most of my
experience has been python and c plus
plus so you can say that some of the
experience might have been challenging
switching from those languages to go and
on the one hand i had to let go of some
features like a lot of meta programming
in c plus plus memory management which
actually that’s nice to be ahead of
and some features of python that were
really nice in python but not relevant
in go
on the other hand go has so much good
going for it it’s relatively simple and
the tooling is amazing the ecosystem is
wonderful and when i want to deploy
something i just send one binary i don’t
need to force people to install python
they need to
spread the huge amount of libraries for
c plus plus it just works
and but even for that as i learned more
going that used to the idioms there’s
still one thing that i kept missing from
python and that thing is generators now
before i go forward i just want to say
simply generate as a way to create
iterators in a simple and
straightforward way
now
before we dig into that i want to talk
about my workflow when i’m dealing with
newer apis and after all everyone once
in a while we encounter a new api it can
be some remote rest api it could be
something local just a data structure
that we’ve not dealt with before
and when we do that we want to know what
the data is that we’re dealing with i
mean documentation is nice and reading
the structures is nice but at least for
me seeing the actual data that i’m going
to be operating on is a huge help so the
first thing i do is write some hacky
print function just to get the data in
and point it to the console
um
that’s about it so here we have a simple
library structure i’m guessing you’re
familiar with libraries some of them
have rooms in the room’s shelves and on
the shelves there are books it is a
relatively simplistic example obviously
but you see later it gets more
complicated
when i get the book i want to print it
out and then i can continue
and specifically when i’m doing that if
i have an l i just panic i know we
should not panic on errors it’s really
bad but
later when they keep factoring in the
code they can change they can look for
panics and replace all of them if i’m
silencing those it’s sometimes really
hard to find those places and then we’ll
get issues in production
now once i have that down and i know the
data i know what the books look like and
i want to start working with them the
next thing they need to do is start
creating the business logic but they
need to decide where to place the
business logic one simple option would
be to just place it instead of the print
function this works but as the business
logics
becomes more complicated and more people
use that code it’s going to get really
really ugly and really hard to maintain
i mean we don’t want to put everything
in one single big function the next
thing you can do is you can query all
the data in advance put in a slice and
then work with that slice
but if we have a lot of data or if we
need to make network calls to get that
data we usually don’t want to get it all
in advance we want to do it on demand
when we actually need that data
and for that we can use iterators
iterators provide a way for us to get
our data on demand but plop over it as
if we were using a slice
and we have a relatively simple simple
interface first we create an iterator in
this case with the etherbooks function
that returns an iterator over the box
then we call the next method of the
iterator next method returns a boolean
if it’s true it means there is a value
to get from the iterator if it’s false
the iteration is complete
then we call the value method to get
that value out of the iterator and use
it
and finally when we’re done with the
iteration we call the error method to
check whether we stop because we
exhausted all the values in the iterator
or because we just had an l and we need
to handle it again as i said i’m
panicking here but you shouldn’t do it
now
irradials are great but implementing
them
that can be quite a hassling goal eh
first they are usually comprised of
roughly five parts the first is a struct
we need some stack to maintain the state
of the iterator at all we’re changing
that state with each call to next and we
need to know what’s going on next
we have a constructor for the iterator
because sometimes the construction is
not trivial and we don’t want to have
the user with that
then we have three getter methods
and two already trivial getters value
and l we just return the value of the l
stored in the in iterative structure the
last one next is the method that’s doing
all the heavy lifting when you call next
we want to advance data from the current
state and the current value to the next
one so all the actual processing is
happening inside that one method
now that’s quite a lot of code to put on
a slide so we’re going to do something a
bit simpler here i’m going to use the
closure iterator
structure to simplify things with that
we can take all those five pieces and
compress them into a single function
with a single closure inside it
first we have our range iterator
function here we return the closure
iterator
that replaces the constructor
effectively
then you have a few variables in what
we’ll call a state block it saves the
state of the iterator this allows us to
just use variables it would be captured
by the closure instead of a struct
then instead of the next function and we
have an advanced one which is just a
different naming
and instead of returning true or false
and we also call the width value with
error or exhausted method functions if
we return exhausted it means that we
just exhausted the iterator if we return
with value it means that we have a value
to return and the iteration continues
and if we return with l it means that we
encounter the l and we need to stop
and now let’s go back to the prefix
example and a really simple nested loop
printing all the books
and let’s see how it’s implemented as an
iterator
this is the code we get which is quite a
bit longer and has quite a few issues in
my opinion
first we have some weird initialization
we’re initializing the book index to
minus one instead of zero just to make
the iteration simpler later
and we have some weird index things
going on
and the most critical part we’re
iterating in the wrong order instead of
starting with the homes and then moving
to the shelves and then the books we’re
starting with the books going to the
shelves and then the rooms
this does not look anything like the
code we just saw before and so coming
into that you won’t necessarily know
what it’s even doing
what is all that go simplicity
and i’m sure some of you may have better
solutions to that
but i did not this took me quite a bit
of time to implement and i had to write
tests to make sure that it actually
works while
here i just you can see that it works
simple loop
and yeah sure but maybe better ways to
do that in go right now and you can
train on that and be really good at
implementing iterators but the next
person looking at the code would not be
as good as you are and so this becomes a
real maintainability issue and often
makes you avoid iterators altogether i
mean why would i do it if it’s so much
work and so easy to make bugs though and
how to use later
so we’re gonna use generators
here’s the same code but implement it
with a generator you can see the iterbox
function and you can see the parental
books function we had before there are
three differences all together first
instead of having no return value we
return a generator and for our purposes
generator implements the same iterator
interface
then instead of printing we yield the
value we’ll get into exactly what that
means in a moment
and finally because it’s a function with
a return value we have to return
something we’re returning nil so what’s
actually going on here
well
all generators are created using a
generator function generator function is
any function that uses yield
and when we call a generator function
instead of executing the function we
just return a new generator will not
execute any code inside that function
as you can see here we call it box and
we return a generator then when we call
next we start executing the generator
function we execute it all the way until
we encounter the yield the yield call
when we do that we return to form next
and store the value and so it can be
fetched later
then when you call iterator.value we get
that value that you just yielded
then on the next call to next and we
keep going either we encounter next
yield again or we get to return a return
statement when you get a return
statement you just stop iteration
now if you had an error to report we
just return that error and we’ll get it
for the error method here we don’t have
an l so we just return nil
and
that’s with generator syntax now
sadly as nice as that looked that is
currently at least and as far as i know
it’s not planned anytime soon that is
not go it’s pretend though that i wrote
to demonstrate what we’re gonna do
i want that to be go it would be really
nice for me because i like that code but
let’s not go it will not run today
so
one option was okay sure i can change
the compiler or submit a proposal and
but i want you all to be able to
experiment with that as well so what i
did instead is use code generation it is
to take the code our pretend goal and
create from that actual go code that can
execute
and we’re going to use some go to
exclude because go really has amazing
tooling to solve that issue first we’re
going to use go generator when you have
a file with pretend go and it defines
generators we’re going to add a go
generate line to run our generation tool
and generate the file the new code then
we’re going to add a build tag
specifically enabled genjen for
generating generators it will allow us
to separate the fake go code the pretend
code with generator definitions from the
actual code with the generator
implementations so generated code would
not have the same tag and then we know
that they do not conflict and we can
just run the implementations
um so next thing you have to do is start
transforming the code from our generator
definitions that using a pretend go to
actual go code here we start with the
simplest generator possible which is an
empty generator that basically returns
now and yields no values
first thing we do is we copy that
generator code just the body of it the
return nil into a closure iterator into
the advanced function
then
we replace return nil with an exhausted
because that’s all we want to report
generator is exhausted
next the next simplest one is just
returning an arrow this works the same
but instead of returning
exhausted we return with l and return
the l that we had again we’re generating
that code and only the new code would be
compiled
and next we’ll yield values because
that’s the core of an iterator like
saying i have an error i have no values
to report that’s not interesting but i
have actual values so we have an hello
world a generator
and if you’re wondering the yield
function is just a placeholder it does
nothing
so the first thing you do again is copy
that code into the advanced method then
replace the yield with return with the
value as we
discussed previously
now we have an issue if we keep calling
that advanced function we will always
return the same value we will never stop
because we never reach return nil
and for that we’re going to use
quite a bit of messy code see
we’re going to use gotos and labels
to mark the parts of the code we want to
execute we’re going to add a label 1 at
the beginning of the original function
before the yield and one after the yield
then we’re going to add a switch
statement at the beginning of the
function to decide where we want to go
do we want to go to the first statement
the first label or the second one
lastly
in the state block that we mentioned
before with the next we have the
variable to tell us which label we need
to go to and before we turn from the
function we change the variable to the
right value
and then first time go into the function
reach return with value and the second
time we reach return nil which will be
replaced again with return exhaust
because we just terminated the iterator
now
we are using goto which is great
it might scare some of you but you need
to remember that ghost go-to is not c’s
go to seas go to is a mess ghost go too
which again you should not use unless
you have a really good prison is a lot
safer
there are two things two limitations
added in gold that make it really nice
to use well not nice use but less
dangerous one we cannot skip over
variable declarations so think about if
you have a variable declaration you use
a go to jump over it
what is the state of that variable is it
initialized is it not initialized what’s
going to happen i don’t want to think
about it so go just disallows it
entirely
and same for blocks
in c you can just jump wherever you can
jump into the middle of a block now
consider an if statement you have a
condition that should hold for all the
code inside that block if that does not
hold and you just jump into it
your code is going to act very odd and
might eventually have serious issues so
we just disallow that as well those two
things are impossible in go
but in our generators obviously we’re
going to define variables because we
need them and we’re going to have blocks
because we are doing control flow
so we need to circumvent those issues
and in the first issue variable
declarations instead of keeping them
inside the function we’re just going to
take all of the variables that we define
and move them all into the state block
and this does two things one it means
that all the gotos are going to be after
the variable declarations so we’re good
and the other thing is we need to
maintain the state of those variables
between different calls to advance and
that solves this automatically for us
because they’re captured by the closure
next blocks
blocks generally have two purposes one
is scoping of variables and two is
control flow since we took all the
variable definitions out of the blocks
they no longer do any scoping and they
only do control flow
and again using gotos we can achieve
similar results without blocks
let’s look at an if statement here we
have an if
and we have the control flow graph that
represents that if we start with a in
condition then if the condition is true
we go to the then block if it’s false we
go to the else block
and eventually we go to the after logo
whatever is after that if
now let’s see what we need to do to
transform that and get all the code out
of the blocks first we have a couple of
labels and they don’t do anything they
just mark the relevant parts of our code
then we had a couple of go-tos those
gotos are useless and they just
replicate the same behavior we already
had which is at the end of each block
jump
after the loop the statement
then
we can take the blocks move them out and
use a bit of redirection we’ve got us to
jump to them
now if we have the condition
and it’s true we go to the we have the
go to then label we jump to the code
that was originally inside
perform the three yields and then go to
the after label so the flow is
maintained even though we don’t need to
use the blocks this means that now we
can add a label after each shield
statement and just jump to though
that’s it for the if now we’re going to
go for a few other control structures
and see what we do with them
first a forever loop here we have a
simple loop that iterates
infinitely
and the first thing we do as we did
before we add labels they don’t change
anything but they give us the general
structure
then we’re the go to to jump from the
end of the loop block to the beginning
and now if you can see we have an
infinite loop already without using the
false statement so we can just remove it
and we have an infinite loop just as we
did before but no blocks so again we can
just label whatever we want
next we have a while loop
and while loop is nice because it
combines two things we already used
before forever loop it loops infinitely
and an if statement
so let’s model it as those
see now we have a loop an infinite loop
with a condition with a condition inside
it
and since we already transformed both of
those structures we just repeat the same
transformations and get this and you may
notice that i missed the jump to the
loophead here
but it should be right after the
increment
and this gives us a again a while loop
next we have c style loops which we will
again construct using all the things
we’ve seen before e we define a variable
declare a variable that would jump to
the state loop to the state block
and we transform the loop into a while
loop
and we have the increment inside the
loop from here we can do what we did
before
main difference here that we need to
notice is if we have a continuous
statement inside the loop instead of
going to the loop head it should go
right to the increment otherwise would
skip then in the increment
i mean the last structure we’re going to
cover today is a full range this is
similar to what we did before but
slightly different because here we don’t
have an explicit variable to hold the
state when we iterate over a slice we
just go over it if you just tell me to
stop at a certain point i can’t really
tell you where i am and continue from
that point so for that and we’re gonna
use adapters and adapters take a slice
or a map or a channel and convert it
into an iterator but we can use the
iterator interface with next and value
to iterate over
and then again an iterator has a
variable definition which we handled
before it has a while loop a 4 with a
condition and it has the body of the
loop
all of that has been modded before so we
could just use that and for map we have
an extra complexity because we need to
first convert it into a slice
because we need to guarantee that the
order is okay every time we access it
and of course the order of iteration of
the map is not guaranteed and we don’t
want to just duplicate the values and
so there are more control structures in
go and some of them can be modeled
most of them some of them don’t really
make sense think of default statements
if you defer functions execution where
is it gonna run when is it gonna run
when we finish execution of the entire
generator when we finish a single
iteration
it doesn’t really make sense to model
but everything else can be modeled even
if it’s a bit
a bit of walk
so with so we’ve seen what generators
are how they help us write iterators
we’ve gone through the basics of the
implementation and the last thing to do
now is a demo
here we can see the code for fibonacci
sequence generator and we can see the
build tag at the top and you can see us
importing the gen gen package which
we’ll use we have the go generate line
to actually generate that code and we
have the generator itself we have two
values we initialize two variables we
initialize them to one and then in an
infinite loop we just keep pumping up
the value out the values
yes and we’ll go and overflow fairly
quickly but that’s not the point here
then we write go generate to generate
our code and we get something that looks
like this more or less first you can see
the gen the not changing tag in the top
which means that those files would never
conflict and that this file would
actually get compiled and then you see
the actual the implementation of the
generator this time no yield and no
wheel returns just regular go code
and you can see the double underscore
before next just to make sure that it
doesn’t conflict with any variable names
defined in the original function
and all we have left to do is run it so
we create a fibonacci generator and we
loop over it of course we only want to
take 10 items out of it not let it run
infinitely because that won’t stop
and
we run
and we get the fibonacci sequence
and that’s it
[Applause]