Pay As You Go • Generate Data Using Generators (Data Structure Categories #7)
Generators • Part 7 of the Data Structure Categories Series
It's often useful to think of data structures as storage units with objects inside them. A list could be a shelf with numbered boxes in a row. And you can picture a dictionary as a locker system where each locker has a label.
In these examples, the items are stored in these storage units. However, you could also have a structure that creates the objects as and when they're needed. You can imagine a 3D printer that creates the item you need when you need it rather than storing it in a unit.
A generator doesn't store any of its data. Instead, it creates each item when it's needed. If this sounds familiar, it's because you read something similar in the previous article in this series about iterators. A generator is an iterator. We'll see what makes a generator a generator in the rest of this article.
The Data Structure Categories Series
We reached the final article in this series. You can read the previous ones by following the links in this overview:
Generators, Generators, and Generators
When we use the term "generators", we normally refer to generator iterators. However, we could also be referring to generator functions or generator expressions.
Confused? Let's start clearing some of that confusion. And the best way to do so is to look at some examples.
Generator expressions
Let's start with generator expressions.
In fact, let's not start with generator expressions. Let's start with list comprehensions instead:
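Here's a minimal sketch of such a list comprehension. The name `numbers` and the resulting list come from the text that follows; the loop variable name `number` is an assumption.

```python
# Double each integer from 0 to 9 and store every result in a list
numbers = [number * 2 for number in range(10)]

print(numbers)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```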
The expression within the square brackets creates a list containing all the numbers the expression represents. The result is the list [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]. The list numbers contains all the elements within it.
Let's replace the square brackets with parentheses—they're the round brackets. How much of a difference can this small change in the type of brackets make? [Spoiler alert: a lot]
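A sketch of the same expression with round brackets could look like this. The name `numbers_gen` is the one used in the text; the loop variable name and the second line, which checks the type of the object, are assumptions added for illustration.

```python
# Same expression as the list comprehension, but with parentheses
numbers_gen = (number * 2 for number in range(10))

# The object is a generator, not a tuple or a list
print(type(numbers_gen).__name__)  # generator
```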
The first line is similar to a list comprehension. But beware, it's not a tuple comprehension, even though round brackets replace the square brackets. After all, it's not the round brackets that make a tuple.
This is a generator expression. It creates a generator iterator, which is assigned to numbers_gen. The object created, the generator iterator, doesn't store any data. Nor does it reference data stored elsewhere. Instead, it will generate the data when it's needed.
Note that generator iterators are iterators. As you read in the previous article in this series, one way of getting the next item from an iterator is to use the built-in function next():
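As a small sketch, assuming the generator expression that doubles the numbers from 0 to 9, each call to `next()` generates one value:

```python
numbers_gen = (number * 2 for number in range(10))

# Each call generates the next value on demand
print(next(numbers_gen))  # 0
print(next(numbers_gen))  # 2
```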
And you can keep calling next() until you run out of values to generate:
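One way to see this, again assuming the doubling generator expression, is to collect all ten values and then ask for one more. The `try`/`except` block is an addition here so that the sketch runs to completion; calling `next()` directly on the exhausted generator would end with a `StopIteration` traceback instead.

```python
numbers_gen = (number * 2 for number in range(10))

# Ask for all ten values, one at a time
values = [next(numbers_gen) for _ in range(10)]
print(values)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# There's nothing left to generate, so one more call raises StopIteration
try:
    next(numbers_gen)
except StopIteration:
    print("StopIteration raised: no more values to generate")
```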
We'll talk about generator iterators in more detail later.
Note for the pedants: I'm using the term "storing data" for lists and other data structures that contain data. However, these structures don't store the data within them. Instead, they hold references to objects stored elsewhere. However, this point is not too relevant in the context of this article. Therefore, for the sake of this article, I'll continue to use the term "store" to differentiate between data structures like lists, which hold data, and generator iterators, which do not.
Generator functions
Let's look at another way of creating generator iterators. You can define the following function:
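A minimal sketch of such a function is shown here. The value 42 and the call to `print()` come from the description later in this section. The function name `get_some_numbers` is borrowed from a later example in this article, and the printed phrase is an assumption.

```python
def get_some_numbers():
    # Assumed phrase, used to show when the function body runs
    print("I was here!")
    # yield, not return, is what makes this a generator function
    yield 42
```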
This function definition is similar to a standard function definition but uses the yield keyword instead of return. A function definition that contains a yield statement is a generator function. Calling a generator function also creates a generator iterator. Let's try this out:
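As a sketch, assuming a generator function named `get_some_numbers` that prints a phrase and yields 42, calling the function gives you a generator object without running any of the code in the function body:

```python
def get_some_numbers():
    print("I was here!")  # Assumed phrase
    yield 42

# Calling the generator function doesn't print anything yet
numbers = get_some_numbers()

print(type(numbers).__name__)  # generator
print(numbers)  # <generator object get_some_numbers at 0x...> (address varies)
```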
The name numbers refers to a generator object. Incidentally, the generic terms "generator object" and "generator" normally refer to a generator iterator.
When you create a generator object using a generator function, the function is paused at the beginning. Each time you call next(), the function will execute all the lines up to and including the next yield statement. Then it will pause again:
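Here's a sketch of that first call, using the same assumed generator function that prints a phrase and yields 42:

```python
def get_some_numbers():
    print("I was here!")  # Assumed phrase
    yield 42

numbers = get_some_numbers()

# The function body runs up to and including the yield statement,
# so this call prints "I was here!" and hands back 42
value = next(numbers)
print(value)  # 42
```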
The print() function was executed, and the value 42 was returned by next(). The generator numbers is currently in a paused state, waiting for the following next() call. However, when this occurs, there's nothing left in the function:
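A sketch of this second call, with the same assumed generator function. The `try`/`except` block is added only so the example runs without stopping at the exception.

```python
def get_some_numbers():
    print("I was here!")  # Assumed phrase
    yield 42

numbers = get_some_numbers()
next(numbers)  # Prints the phrase and yields 42

# The function resumes after the yield, finds nothing left, and finishes
try:
    next(numbers)
except StopIteration:
    print("StopIteration raised: the generator function reached its end")
```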
This raises a StopIteration exception. You've seen StopIteration earlier in this article and when you read about iterators in the previous article in this series.
The following example will shed a bit more light on the process:
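Here's a sketch of a generator function with three yield statements. The first two phrases and the values 42 and 84 come from the description of the steps; the third phrase and the third value (126) aren't given in the text, so they're placeholders chosen for illustration.

```python
def get_some_numbers():
    print("I was here!")
    yield 42
    print("I already told you: I was here")
    yield 84
    print("This is the third phrase")  # Placeholder phrase (assumption)
    yield 126  # Placeholder value (assumption)

numbers = get_some_numbers()  # Nothing is printed yet

first = next(numbers)   # Prints the first phrase
second = next(numbers)  # Prints the second phrase
third = next(numbers)   # Prints the third phrase
print(first, second, third)  # 42 84 126

# The fourth call finds no more yield statements
try:
    next(numbers)
except StopIteration:
    print("StopIteration raised on the fourth call")
```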
When you create the generator object using the generator function, the function is paused at the start. Here are the steps that occur when you start calling next():
The first time you call next(), the function starts from the beginning and prints out "I was here!". It also gives back the value 42. The function pauses just after it yields this number and waits for the next time it's needed.
The second time you call next(), the function resumes from where it left off earlier. It prints the second phrase, "I already told you: I was here", and gives back the number in the second yield statement, 84. The function pauses at this point.
The third time you call next(), the function carries on and prints the third phrase and gives back the value in the third yield statement.
Finally, the fourth call to next() raises a StopIteration since the generator function has reached the end.
Earlier in this article, you used a generator expression to make a generator that creates the doubles of the numbers from 0 to 9. You can replicate this generator using a generator function:
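A sketch of such a generator function follows. The names `get_some_numbers` and `numbers` are the ones used in the text; the loop variable name is an assumption.

```python
def get_some_numbers():
    # The loop pauses at each yield and resumes on the next call to next()
    for number in range(10):
        yield number * 2

numbers = get_some_numbers()

print(next(numbers))  # 0
print(next(numbers))  # 2
print(next(numbers))  # 4
```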
You create the generator numbers from the generator function get_some_numbers(). The first time you call next(), the for loop in the generator function starts iterating. However, the function will pause each time there's a yield statement. The following calls to next() resume the for loop and yield the following number.
You can also create a new generator iterator using the same generator function. Note that in the code so far, you consumed the first three numbers of the first generator numbers before creating the second generator:
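As a sketch, assuming the doubling generator function and a first generator that has already yielded three values, a second generator starts from the beginning while the first carries on from where it paused:

```python
def get_some_numbers():
    for number in range(10):
        yield number * 2

numbers = get_some_numbers()
# Consume the first three values: 0, 2, and 4
first_three = [next(numbers) for _ in range(3)]

# A second generator from the same generator function
numbers_again = get_some_numbers()

print(next(numbers_again))  # 0
print(next(numbers))        # 6
```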
The two generators, numbers and numbers_again, are independent of each other even though they're created from the same generator function.
And you can also consume the generator using a for loop:
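Here's a sketch using the doubling generator function. It assumes three values were consumed with `next()` before the loop starts; the list `remaining` is an addition that records what the loop sees.

```python
def get_some_numbers():
    for number in range(10):
        yield number * 2

numbers = get_some_numbers()
for _ in range(3):
    next(numbers)  # Use up 0, 2, and 4

# The for loop carries on from the next available value
remaining = []
for number in numbers:
    remaining.append(number)

print(remaining)  # [6, 8, 10, 12, 14, 16, 18]
```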
Note that you had already used up the first few values of numbers. Therefore, the loop resumed from the next available value.
Generator iterators
We've seen three distinct uses of the term "generator". Generator expressions and generator functions create generator iterators. Often, you'll see the term "generator" or "generator object" used to refer to the generator iterator.
I left the section about generator iterators for last since I've already introduced these and discussed them in the previous sections.
A generator iterator is an iterator, as the name implies. It doesn't hold any of its data, but generates values when they're needed. It operates on a "pay as you go" basis. You don't need to invest in creating the items and storing them before you need them, as you do in a list or other structures that store data.
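One rough way to see the "pay as you go" idea is to compare the size of a list with the size of a generator that represents the same values. This is only a sketch: `sys.getsizeof()` reports the size of the list object itself (its references), not the integers it refers to, and the exact figures depend on your Python version, so none are shown here.

```python
import sys

# The list creates and holds a million items up front
numbers_list = [number * 2 for number in range(1_000_000)]

# The generator creates nothing until you ask for a value
numbers_gen = (number * 2 for number in range(1_000_000))

print(sys.getsizeof(numbers_list))  # Grows with the number of items
print(sys.getsizeof(numbers_gen))   # Small, regardless of the range
```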
Like all iterators, generator objects are iterable. You can use them in for loops and wherever you need iterables. However, generators don't have a size, and they're not containers.
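You can check these categories with the abstract base classes in `collections.abc`. This is a sketch using the doubling generator expression; the checks are additions for illustration.

```python
from collections.abc import Container, Iterable, Iterator, Sized

numbers_gen = (number * 2 for number in range(10))

print(isinstance(numbers_gen, Iterable))   # True
print(isinstance(numbers_gen, Iterator))   # True
print(isinstance(numbers_gen, Sized))      # False
print(isinstance(numbers_gen, Container))  # False

# A generator has no length, so len() raises a TypeError
try:
    len(numbers_gen)
except TypeError as error:
    print(f"TypeError: {error}")
```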
Etymology Corner
The term “generator” comes from the Latin generare, which means “to produce”. The root "gene-" has older origins and means "give birth". Therefore, a generator produces or gives birth to an item, one at a time!
This brings us to the end of this seven-part series on Data Structure Categories. Here's the diagram I presented in previous articles showing the hierarchy of the categories I covered.
In the diagram, you can observe the three categories that sit at the top of the hierarchy: iterable, container, and sized. Many data types you’re familiar with belong to all three of these.
You can also see that the branch containing iterators and generators is quite separate from the others since they don't contain the data they represent. This makes them more lightweight and memory-efficient.
When you code, you deal with data types all the time. However, what matters most often is not the data type itself, but the properties you want to use. By considering the categories of these data types—whether they're iterable or sized, say—rather than just the types themselves, you can focus on the properties that are crucial when choosing the right data type.
Code in this article uses Python 3.11