A One-Way Stream of Data • Iterators in Python (Data Structure Categories #6)
Iterators • Part 6 of the Data Structure Categories Series
It's easy to confuse the terms iterable and iterator. I started this series on data structure categories with iterables. In this article, I'll shift the focus to the iterator.
In the first article, you read about the origin of the prefix iter- that's in both words. It comes from the Latin word for "path" or "journey".
So, let's focus on the endings of both terms, -able and -ator. These are not uncommon endings in English words. Here are a few examples:
"Navigator" and "Navigable": A navigator is a person or object that navigates through something that's navigable. A waterway, say, is navigable since it can be navigated. If you'll excuse the turn of phrase, a waterway is "able to be navigated", which is why it has the ending -able.
"Calculator" and "Calculable": A calculator is a person or object that calculates something that can be calculated. And an arithmetic problem is calculable if it can be calculated.
There are other pairs of words that follow this pattern in English and many more that have one of the forms but not the other. Words ending in -ator are called agent nouns, and they're created from other words that usually denote an action. The -ator agent nouns refer to someone or something performing that action.
"Iterator" is to "Iterable" as "Calculator" is to "Calculable" or "Navigator" is to "Navigable".
In this article, you’ll read the terms iterable and iterator often, along with the occasional use of iteration. The similarity between these words doesn’t make it easy when switching repeatedly between them. Beware!
The Data Structure Categories Series
We're on the sixth of seven articles in this series. You can read the previous ones by following the links in this overview:
What's an Iterator? The Short Version
The foray into linguistics helps us understand the difference between these two similar but distinct terms.
An iterable is a structure that can be iterated. You're able to iterate through it.
An iterator is the entity that performs the iteration.
Let's look at an example to put this in context. I'm showing this in an interactive console/REPL session, and I recommend you also experiment with this in a REPL rather than a script:
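Here's a minimal sketch of that session, written as a script so you can run it in one go. I'm assuming the list holds the values 2, 5, 10, 3, 99, and 42, which match the printouts later in this article:

```python
# Assumed values: they match the printouts shown later in the article
numbers = [2, 5, 10, 3, 99, 42]

# iter() creates an iterator from the iterable
numbers_iterator = iter(numbers)

print(type(numbers))           # <class 'list'>
print(type(numbers_iterator))  # <class 'list_iterator'>
```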
The object numbers is a list. A list is iterable, as you've seen earlier in this series. You can create an iterator from an iterable using the built-in function iter(). In this example, you assign the object returned by iter() to numbers_iterator.
The variable numbers_iterator is of type list_iterator. You can guess from its name that it's an iterator associated with lists.
An iterator doesn't hold any data
An iterator doesn't have its own data. It relies on data that's available elsewhere. The iterator keeps track of which element in another data structure is next.
Let's get back to the code you have in your REPL session. You have a list called numbers. This is the iterable. You create an iterator from this list. The iterator marks the beginning of the list. You can imagine it "waiting" just before the first element of the list, which is the integer 2.
You can move the iterator using the built-in function next(). You can try this in the same REPL session. I'll show the whole REPL session for clarity, but only the last line is new:
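A sketch of this step as a script, again assuming the list values 2, 5, 10, 3, 99, and 42. The name first is only there for illustration:

```python
numbers = [2, 5, 10, 3, 99, 42]  # assumed values
numbers_iterator = iter(numbers)

# next() moves the iterator across one element and returns that element
first = next(numbers_iterator)
print(first)  # 2
```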
The iterator numbers_iterator moved across the first item and is now waiting between the first and second elements in the list numbers.
The built-in function next() returns the element the iterator has "moved across".
I'm using the terms "waiting" and "moved across" loosely. Their purpose is to give a visual representation of what's going on.
You can call next() a couple more times (I've removed the calls to type() in this example, for brevity):
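As a script, and with the same assumed list values, three calls in a row look something like this:

```python
numbers = [2, 5, 10, 3, 99, 42]  # assumed values
numbers_iterator = iter(numbers)

print(next(numbers_iterator))  # 2
print(next(numbers_iterator))  # 5
print(next(numbers_iterator))  # 10
```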
Each time you call next() with the iterator as its argument, the next element is returned, and the iterator moves to the next item, ready for when it's needed again. At the end of the REPL session shown here, the iterator numbers_iterator is "waiting" between elements 10 and 3 in the list numbers.
The iterator numbers_iterator doesn't have its own data. It uses the data in the list numbers. However, the list is not altered in any way:
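You can check this yourself with a short sketch along these lines, using the same assumed list values:

```python
numbers = [2, 5, 10, 3, 99, 42]  # assumed values
numbers_iterator = iter(numbers)

# Move the iterator across the first three elements
next(numbers_iterator)
next(numbers_iterator)
next(numbers_iterator)

# The list still holds all six elements
print(numbers)  # [2, 5, 10, 3, 99, 42]
```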
An iterator is a one-way stream
The iterator can only move forward. It's a one-way system. Each time you call next(), the iterator moves to the next item. But it can never go back.
Let's see what happens when you reach the end:
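Here's a sketch of the full journey as a script, with the same assumed list values. In a REPL, the last call would show a traceback. I'm catching the exception here so the script runs to completion:

```python
numbers = [2, 5, 10, 3, 99, 42]  # assumed values
numbers_iterator = iter(numbers)

# Six calls, one for each element in the list
print(next(numbers_iterator))  # 2
print(next(numbers_iterator))  # 5
print(next(numbers_iterator))  # 10
print(next(numbers_iterator))  # 3
print(next(numbers_iterator))  # 99
print(next(numbers_iterator))  # 42

# On the seventh call, the iterator has nowhere to go
try:
    next(numbers_iterator)
except StopIteration:
    print("StopIteration raised")
```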
Each call moves the iterator across the next element. However, after the sixth call to next(), when the value 42 is returned, the iterator has reached the end of the list. It's gone past the last element and sits just after the last item. There are no elements left. So, the next time you call next(), the iterator has nowhere to go. A StopIteration exception is raised.
You've Been Using Iterators Forever
You've been using iterators since you started coding in Python, even if you've never used the built-in functions iter() and next(), as you did in the previous section.
That's because whenever you use a for loop, an iterator is created from the iterable you loop through.
Let's assume you have the following for loop statement, using the same list as in the previous example:
for number in numbers:
The object you use at the end of a for loop statement must be an iterable. The loop creates an iterator from the iterable. This is equivalent to what you did earlier when you called iter(). But this step happens "behind the scenes" in a for loop.
The for loop then calls the next() built-in function with the iterator as an argument. The value returned is assigned to the variable name number.
Each time the block of code in the for loop is repeated, next() returns the next item, and the iterator moves on. And once the StopIteration exception is raised, the loop knows it has reached the end.
In the first part of this article, you called iter() once and next() repeatedly. These steps, which you performed manually, occur behind the scenes when you use a for loop. And this process is not unique to the for loop. It occurs whenever you're iterating through an iterable. You'll see examples of this later in this article.
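You can write out these behind-the-scenes steps by hand. This is only a sketch of the idea using a while loop, not what Python literally executes, and the name collected is there just to stand in for the body of the loop:

```python
numbers = [2, 5, 10, 3, 99, 42]  # assumed values

collected = []

# The for loop calls iter() once, before the first repetition
numbers_iterator = iter(numbers)

while True:
    try:
        # It calls next() at the start of every repetition
        number = next(numbers_iterator)
    except StopIteration:
        # StopIteration tells the loop there's nothing left
        break
    # The body of the loop goes here
    collected.append(number)

print(collected)  # [2, 5, 10, 3, 99, 42]
```

The result is the same as writing for number in numbers: followed by collected.append(number).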
One iterable. Many iterators
We've discussed how iterators don't have their own data. They "borrow" the iterable's elements. You can have several iterators created from the same iterable. Let's look at an example using the same list you've already used:
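Here's a minimal sketch of the nested loops. The variable names number and another_number and the format of the printout come from the discussion and output in this article. The list values are my assumption, chosen to match those printouts:

```python
numbers = [2, 5, 10, 3, 99, 42]  # assumed values

for number in numbers:
    for another_number in numbers:
        print(f"Outer: {number} | Inner: {another_number}")
```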
There's only one list in this program, the list called numbers. But you're using the list in two separate for loops, one nested inside the other. This means you're looping through the same list more than once simultaneously.
Each for loop creates its own iterator from this list. Let's refer to the iterator created by the first for loop as the outer iterator. The second for loop creates an inner iterator.
How many iterators are created when this code runs? Take a moment to think about the answer…
...we'll return to this question soon.
The first for loop statement runs first and creates the outer iterator, which is "waiting" just before the first element, 2. The outer loop calls the next() function on the outer iterator. This returns 2, which is assigned to number. The inner loop then calls next() on the inner iterator. This is the first time the inner iterator is used, so it also returns 2, which is assigned to another_number. You can see this in the first line in the printouts, which shows that both values are 2. This is marked as checkpoint ① in the diagram below.
The program is running the inner loop at this stage. After printing out the values, it calls next() on the inner iterator again. This returns 5, which is assigned to another_number. The inner iterator is now "waiting" between 5 and 10. However, the outer iterator is still located between 2 and 5. The second line of the printout shows this. This is marked as checkpoint ② in the diagram below.
Fast forward to the point when the inner iterator has just moved past the last element in the list, 42. The outer iterator is still "waiting" between 2 and 5. This is marked as checkpoint ③ in the diagram below.
In the next iteration of the inner loop, the inner iterator raises a StopIteration exception, which indicates that the inner loop is over. It's only at this stage that the outer iterator moves to its next location, returning the second element, 5, which is assigned to number.
But now, the inner for loop will start again. It creates a brand new iterator. The new inner iterator is back at the beginning of the list and is ready to move across the list's first element. The inner loop starts iterating again. On its first iteration, the new iterator returns 2. This point in the program is represented by the printout line that shows Outer: 5 | Inner: 2. This is marked as checkpoint ④ in the diagram below.
I won't go on! You can visualise this process using the diagram below, which I referred to a few times in the preceding paragraphs. This diagram only shows the first three iterations of the outer loop. You can extend it to cover the remaining three, if you wish.
A reminder that even though this diagram shows the elements of the list repeatedly, there is only one set of data: the original list numbers. All the iterators use the same data that's in the list.
What's an Iterator? The More Detailed Version
In the first article in this series, you saw that an iterable must have the __iter__() special method. This enables you to create an iterator from the iterable.
An iterator must have the __next__() special method, which determines how the iterator can fetch the next item and from where it can do so. The special method should also raise a StopIteration when the iterator reaches the end of the iterable.
A small note: The constant switching of terms between iterator and iterable makes these discussions trickier. There's little we can do to reduce this. Hopefully, the linguistic discussion at the start helps to ease the load. But the next two paragraphs won't be an easy read!
An iterator is also iterable. Therefore you can use an iterator directly where you would use an iterable. And some iterables act as their own iterators. But even when there are separate classes for the iterable and the iterator, as in the case of the list, the iterator is still iterable.
This means that an iterator also needs an __iter__() special method. In this case, the iterator's __iter__() method returns itself. As you read in the article about iterables, the __iter__() method must return an iterator. An iterator's __iter__() method satisfies this requirement by returning the iterator itself.
Are you glad those two paragraphs are over?
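If those two paragraphs felt abstract, a short check with a list and its iterator may help. This is a small sketch using the same assumed list values as earlier. The name remaining is only there for illustration:

```python
numbers = [2, 5, 10, 3, 99, 42]  # assumed values
numbers_iterator = iter(numbers)

# An iterator's __iter__() returns the iterator itself
print(iter(numbers_iterator) is numbers_iterator)  # True

# A list's __iter__() returns a new iterator each time
print(iter(numbers) is iter(numbers))  # False

# So, you can use an iterator wherever an iterable is expected
next(numbers_iterator)  # Move across the first element, 2
remaining = [number for number in numbers_iterator]
print(remaining)  # [5, 10, 3, 99, 42]
```

Note that the comprehension picks up from where the iterator was "waiting", not from the start of the list.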
Recreating lists and list iterators
Let's look at an example by creating two classes called MyList and MyListIterator. These mimic the list and list_iterator built-in classes. I'm only creating these classes to illustrate the point by using custom classes. The classes are not really that useful since they do what list and list_iterator already do! But they'll demonstrate the point. You can experiment with this in a script this time:
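Here's a sketch of what these two classes could look like, reconstructed from the description that follows. The attribute names self.items and self.index and the text of the printout come from this article. The rest is my assumption, including the requirement that the data supports len() and indexing:

```python
class MyList:
    def __init__(self, items):
        # Keep a reference to the data (assumed to support len() and indexing)
        self.items = items

    def __iter__(self):
        print("Calling __iter__ in MyList")
        # Each call hands out a brand new iterator over the same data
        return MyListIterator(self.items)


class MyListIterator:
    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        # An iterator is its own iterator
        return self

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1  # The index only ever increases
        return item


numbers = MyList([2, 5, 10, 3, 99, 42])
numbers_iterator = iter(numbers)  # Prints: Calling __iter__ in MyList
print(next(numbers_iterator))  # 2
print(next(numbers_iterator))  # 5
```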
Let's look at MyList first. It takes an iterable as an argument in __init__(). You could limit this to a list, but this implementation works for any iterable that supports indexing, such as a tuple or a string. This iterable is assigned to the data attribute self.items.
MyList also has an __iter__() special method, which makes instances of this class iterable. This special method returns a MyListIterator object. This object must be an iterator for MyList.__iter__() to be valid. It does have "iterator" in its name, but that doesn't guarantee it's an iterator!
So, let's look at the MyListIterator class. It has access to the iterable, which is assigned to self.items in MyListIterator. There's also an index that's set to 0 when the object is initialised.
Let's not spend too long on MyListIterator.__iter__(). This ensures that the iterator is also an iterable, but you won't use this method!
This class also has a __next__() special method, which makes the class an iterator class. The index is used to fetch an item from the iterable, and the index is incremented by 1 each time you call __next__(). If the index goes beyond the iterable's last index, this method raises a StopIteration. Note how the index increases, but it never decreases. This is the reason why I described an iterator as a one-way stream.
Let's see what happens when you replicate the nested for loop from the previous section, this time using MyList instead of a standard list:
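A sketch of this version follows. The two classes are my reconstruction of MyList and MyListIterator as described above, included so that the snippet runs on its own. The list values are assumed to match the printouts below:

```python
class MyList:
    def __init__(self, items):
        self.items = items

    def __iter__(self):
        print("Calling __iter__ in MyList")
        return MyListIterator(self.items)


class MyListIterator:
    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item


numbers = MyList([2, 5, 10, 3, 99, 42])

for number in numbers:
    for another_number in numbers:
        print(f"Outer: {number} | Inner: {another_number}")
```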
Note that this code is nearly identical to the one in the previous section, but it uses a MyList object instead of a list.
Let's run this code. Look out for the 'Calling __iter__ in MyList' printouts:
Calling __iter__ in MyList
Calling __iter__ in MyList
Outer: 2 | Inner: 2
Outer: 2 | Inner: 5
Outer: 2 | Inner: 10
Outer: 2 | Inner: 3
Outer: 2 | Inner: 99
Outer: 2 | Inner: 42
Calling __iter__ in MyList
Outer: 5 | Inner: 2
Outer: 5 | Inner: 5
Outer: 5 | Inner: 10
Outer: 5 | Inner: 3
Outer: 5 | Inner: 99
Outer: 5 | Inner: 42
Calling __iter__ in MyList
Outer: 10 | Inner: 2
Outer: 10 | Inner: 5
Outer: 10 | Inner: 10
Outer: 10 | Inner: 3
Outer: 10 | Inner: 99
Outer: 10 | Inner: 42
Calling __iter__ in MyList
Outer: 3 | Inner: 2
Outer: 3 | Inner: 5
Outer: 3 | Inner: 10
Outer: 3 | Inner: 3
Outer: 3 | Inner: 99
Outer: 3 | Inner: 42
Calling __iter__ in MyList
Outer: 99 | Inner: 2
Outer: 99 | Inner: 5
Outer: 99 | Inner: 10
Outer: 99 | Inner: 3
Outer: 99 | Inner: 99
Outer: 99 | Inner: 42
Calling __iter__ in MyList
Outer: 42 | Inner: 2
Outer: 42 | Inner: 5
Outer: 42 | Inner: 10
Outer: 42 | Inner: 3
Outer: 42 | Inner: 99
Outer: 42 | Inner: 42
The output is similar to the one you obtained in the previous section, but there are additional lines showing that the __iter__() special method in MyList is called.
The first line shows the case when the outer loop calls MyList.__iter__(). The outer iterator is created at this time. This is the only time the outer loop creates an iterator.
There are six further printouts showing that MyList.__iter__() was called. All of these calls come from the inner loop. Six different inner iterators are created. Each one is exhausted before the outer loop executes the inner loop again. The diagram I showed earlier shows that each inner iterator is different.
Therefore, this code creates seven iterators from a single iterable. Was this the conclusion you reached earlier?
But the elements in the original iterable are never duplicated. Each of the seven iterators uses the same data, the elements in the iterable.
Other types of iteration
The for loop is not the only place we see this iteration protocol. Let's use MyList again. This time, you'll use a list comprehension:
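Here's a sketch of this example. The classes are my reconstruction of MyList and MyListIterator as described above, and the expression inside the comprehension is an assumption since any expression would do:

```python
class MyList:
    def __init__(self, items):
        self.items = items

    def __iter__(self):
        print("Calling __iter__ in MyList")
        return MyListIterator(self.items)


class MyListIterator:
    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item


numbers = MyList([2, 5, 10, 3, 99, 42])

# The resulting list is discarded. Only the printout matters here
[number for number in numbers]
```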
In the last line, you create a list comprehension using numbers, which is an instance of MyList. Note that you're just creating the list comprehension without assigning it to a variable. In practice, you wouldn't want to do that. But this is fine for this demonstration. You can confirm that this process also creates an iterator from numbers by running the code:
Calling __iter__ in MyList
The printout shows that MyList.__iter__() was called.
You can try using numbers in the map() function:
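A sketch of this example follows, with the same reconstructed classes. The function passed to map() is my assumption. I'm using str here, but any function that accepts one argument would show the same thing:

```python
class MyList:
    def __init__(self, items):
        self.items = items

    def __iter__(self):
        print("Calling __iter__ in MyList")
        return MyListIterator(self.items)


class MyListIterator:
    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item


numbers = MyList([2, 5, 10, 3, 99, 42])

# str is an assumed choice of function for this demonstration
map(str, numbers)
```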
Same caveat as earlier: the returned value is not assigned to a variable in this case, but it doesn't matter in this example. You get the same printout when you run this code:
Calling __iter__ in MyList
The map() function uses the same iteration protocol as the for loop and list comprehensions.
And here's one final example:
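Here's a sketch of this last example, once more with my reconstruction of the two classes so that the snippet runs on its own:

```python
class MyList:
    def __init__(self, items):
        self.items = items

    def __iter__(self):
        print("Calling __iter__ in MyList")
        return MyListIterator(self.items)


class MyListIterator:
    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item


numbers = MyList([2, 5, 10, 3, 99, 42])

# Unpack the elements so each one is a separate argument in print()
print(*numbers)
```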
You use the * operator to unpack the iterable within print(). And this also uses the same protocol:
Calling __iter__ in MyList
2 5 10 3 99 42
The unpacking creates an iterator from the iterable.
Let's finish where we started this article. "Navigator" and "Navigable" are different but related. Something can be navigated. And something or someone else does the navigation.
The same applies to "Iterable" and "Iterator". An iterable can be iterated. But the iteration is done by the iterator.
The final article in this series will return to the topic of iterators as we discuss generators.
Unlike in other articles in this series, there's no Etymology Corner as the article's introduction deals with this already and because the etymology is shared with that for iterables.
Next in the series: generator
Code in this article uses Python 3.11