What's In A List—Yes, But What's *Really* In A List
My son's Python homework and what it tells us about Python's data structures
I noticed my son had a Python script on his computer screen. I couldn't help peeking. He was working on his computing homework for school.
And I spotted this line in his code:
I asked him what the purpose of this line was. He claimed he saw his teacher use it to create a list with six blank entries.
Hmmm…
So I asked him to check what films is in a REPL, and he realised things weren't as he expected:
What he probably wanted was a list containing six empty strings, or None repeated six times:
Yes, this is cool. But let's look at an alternative for this article:
But beware! This solution hides many mysteries behind its simple appearance, and it can lead to somewhat unexpected consequences in some situations.
Let's explore.
What's In A List?
Let's start with a plain old list:
What does this list contain?
Does your answer resemble this one?
The list contains the integers 5 and 10, the string "The Python Coding Stack", and another list containing two more strings, "another" and "list".
And this answer is perfectly acceptable in most situations. It reflects how we use lists. They contain other objects.
However, they don't.
The list contains references to other objects. It doesn't contain the objects themselves.
What's the difference? Does it really matter? Isn't this just an insignificant technicality?
You're likely to see examples using a copied list to demonstrate why this matters. Let's look at this briefly by making a copy of some_list:
This creates a shallow copy. The new list contains the same references as the original list. You can confirm this using Python's built-in id():
These lines show the identity of the last element in each list. The identity number is the same, so both slots refer to the same object: no two objects that exist at the same time can share an identity.
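A quick way to see the consequence of sharing references is to mutate the nested list through the copy. This is only a minimal sketch that rebuilds some_list and copied_list from above. The appended string "shared" is made up for the illustration:

```python
some_list = [5, 10, "The Python Coding Stack", ["another", "list"]]
copied_list = some_list.copy()

# The two outer lists are distinct objects...
print(copied_list is some_list)  # False
# ...but the last slot in each refers to the same inner list
print(copied_list[-1] is some_list[-1])  # True

# Mutating the inner list through the copy shows up in the original
copied_list[-1].append("shared")
print(some_list[-1])  # ['another', 'list', 'shared']
```

The is operator compares identities directly, so it gives the same answer as comparing the two id() values by eye.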
But let's explore this idea using the example of a list multiplied by an integer, which is the example from my son's homework that I started this article with. He was writing a short program to store the names of films (or movies, for those who use the English language's North American dialect).
But I'll use a slightly different example. Let's assume you want a list containing six empty lists. You may be tempted to try the following:
But don't sound your victory horn just yet. The output shows a list containing six empty lists. That's what you want, right?
But let's look at the identities of those six lists:
All six elements in the list have the same identity. This means they're the same object. There's only one list repeated six times.
Here's why this matters. In this example, each of the six lists may contain any number of film titles, for example to keep track of the favourite films of six people. Add a film to the list in the first slot in films:
But see what happens when you display the entire list:
The film name seems to be added to all six lists—and that's because there's only one list, but repeated six times.
Why does this happen? Let's go back to how you created the list films:
Consider this expression: [[]]. The outer list is not empty. It contains another list. However, the inner list is empty. But, as we discussed earlier, the outer list doesn't really contain the inner list. Instead, it contains a reference to it.
Therefore, when you multiply the outer list by 6, you copy the reference to the inner list six times. As it's the same reference, it refers to the same list. All six slots in films refer to the same inner list.
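The same point can be checked with the is operator rather than by reading identity numbers. Here's a small sketch, where first is a name introduced just for this illustration:

```python
films = [[]] * 6

first = films[0]
# Every slot refers to the one and only inner list
print(all(item is first for item in films))  # True

# Collecting the identities in a set leaves a single value
print(len({id(item) for item in films}))  # 1
```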
If you want six empty lists, you can use the following solution:
The output looks similar to the one you got earlier. But let's look at the identities of the inner lists:
The identities are all different. Therefore, films now has six different inner lists rather than the same one repeated six times.
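To check that the six lists really are independent, you can repeat the earlier experiment and add a film to the first slot only. This is a short sketch reusing the same film title as before:

```python
films = [[] for _ in range(6)]

# Only the list in the first slot changes this time
films[0].append("The Shawshank Redemption")
print(films)
# [['The Shawshank Redemption'], [], [], [], [], []]
```

The list comprehension evaluates [] afresh on each iteration, so each slot gets a reference to a new list.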
Moral of the story: Beware of populating a list by multiplying another list by an integer.
Mutable or Immutable Items
How about the examples I showed you at the beginning:
These scenarios are different. Well, they're actually the same since both of these lists contain the same item repeated six times. But the contents of films are immutable objects in both scenarios. Strings are immutable data types. And None is also immutable. It's also a singleton since there's only one None object in any program. In the previous section, you had lists as the elements of films, and lists are mutable data structures.
When the list's elements are immutable, it doesn't matter that they're the same object since you can't modify them anyway. That's what immutable means, right?! If you want to change what's in films, you need to replace one of the references with a reference to another object. There are no pitfalls in this case!
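Here's a minimal sketch of what replacing a reference looks like with the multiplied list of empty strings. The film title is the same one used earlier:

```python
films = [""] * 6

# Assigning to a slot replaces that one reference.
# The other five slots still refer to the original empty string.
films[0] = "The Shawshank Redemption"
print(films)
# ['The Shawshank Redemption', '', '', '', '', '']
```

Assignment to films[0] rebinds that slot rather than changing the string it used to refer to, which is why the other slots are unaffected.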
…And to Finish Where We Started
I started the article with this inaccurate code:
The list on the right-hand side of the equals sign is an empty list, []. It has no references within it. It's empty. Therefore, there's nothing to multiply by six. Zero times six is… you got this one!
So, films remains empty.
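Comparing lengths makes the arithmetic visible. In this sketch, the names empty, blanks, and pairs are only illustrative:

```python
empty = [] * 6
print(empty, len(empty))  # [] 0

blanks = [None] * 6
print(len(blanks))  # 6

# Repetition repeats every reference in the list, in order
pairs = [1, 2] * 3
print(pairs)  # [1, 2, 1, 2, 1, 2]
```

The length of the result is the length of the original list times the integer, so an empty list stays empty whatever you multiply it by.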
I used a list as an example in this article. Lists don't contain their objects but references to those objects. But this is also true for other data structures in Python, not just lists. All data structures contain references to the items they contain rather than the items themselves.
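As one illustration with a different data structure, a dictionary built with dict.fromkeys() and a mutable value runs into the same trap as the multiplied list. The names "Ann" and "Bob" and the variable names are made up for this sketch:

```python
# dict.fromkeys() uses the same value object for every key
shared = dict.fromkeys(["Ann", "Bob"], [])
shared["Ann"].append("The Shawshank Redemption")
print(shared)
# {'Ann': ['The Shawshank Redemption'], 'Bob': ['The Shawshank Redemption']}

# A dictionary comprehension creates a new list for each key
separate = {name: [] for name in ["Ann", "Bob"]}
separate["Ann"].append("The Shawshank Redemption")
print(separate)
# {'Ann': ['The Shawshank Redemption'], 'Bob': []}
```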
Unlike containers in the real world, such as the Tupperware box that does contain last night's dinner leftovers, Python containers only contain a note telling you where to find the object you're looking for!
Code in this article uses Python 3.12
For more Python resources, you can also visit Real Python—you may even stumble on one of my own articles or courses there!
And you can find out more about me at stephengruppetta.com
Further reading related to this article’s topic:
My Neighbours Are Moving House • Mutating The Immutable Tuple (Sort Of)
Python's Mutable vs Immutable Types: What's the Difference? – Real Python
Appendix: Code Blocks
Code Block #1
films = [] * 6
Code Block #2
films = [] * 6
films
# []
Code Block #3
films = ["" for _ in range(6)]
films
# ['', '', '', '', '', '']
films = [None for _ in range(6)]
films
# [None, None, None, None, None, None]
Code Block #4
films = [""] * 6
films
# ['', '', '', '', '', '']
films = [None] * 6
films
# [None, None, None, None, None, None]
Code Block #5
some_list = [5, 10, "The Python Coding Stack", ["another", "list"]]
Code Block #6
copied_list = some_list.copy()
Code Block #7
id(some_list[-1])
# 4508264064
id(copied_list[-1])
# 4508264064
Code Block #8
films = [[]] * 6
films
# [[], [], [], [], [], []]
Code Block #9
for film in films:
    print(id(film))
# 4508322432
# 4508322432
# 4508322432
# 4508322432
# 4508322432
# 4508322432
Code Block #10
films[0].append("The Shawshank Redemption")
films[0]
# ['The Shawshank Redemption']
Code Block #11
films
# [['The Shawshank Redemption'], ['The Shawshank Redemption'],
# ['The Shawshank Redemption'], ['The Shawshank Redemption'],
# ['The Shawshank Redemption'], ['The Shawshank Redemption']]
Code Block #12
films = [[]] * 6
Code Block #13
films = [[] for _ in range(6)]
films
# [[], [], [], [], [], []]
Code Block #14
for film in films:
    print(id(film))
# 4508368640
# 4508356480
# 4508145664
# 4508332416
# 4508322304
# 4508324288
Code Block #15
films = ["" for _ in range(6)]
films
# ['', '', '', '', '', '']
films = [None for _ in range(6)]
films
# [None, None, None, None, None, None]
Code Block #16
films = [] * 6
films
# []
Great post. Most Python devs don't realize that pretty much everything in CPython uses C pointers, and it's useful to understand how they work. I wrote this reply to a question on variable swapping the other day and think it is a useful complement to your post:
Variable swapping:
a,b = b,a
The commenter asked if temp variables are used in the bytecode/c code to make this bit of magic happen. Here was my response:
In the C language Python implementation (CPython), tuples (the right-hand b,a above) use pointers to PyObjects. Conceptually, the swap builds a new tuple that holds pointers to the same PyObjects (call them a' and b'), and these are used to assign b' to a and a' to b. So in effect two 'temp vars' are created, but they are lightweight pointers and thus very cheap. In practice, CPython's compiler optimizes a short swap like this into stack operations and doesn't build the tuple at all.
On a related note, it's important to remember that in Python, parameters are passed as "reference by value". This means that if a variable x is passed into a function and reassigned there, the reassignment won't be reflected in the variable outside that call stack.
x = 5
def modX(x: int):
    x = 10
    return x
y = modX(x)
assert x == 5
assert y == 10
Of course if there is no reassignment inside the method this doesn't happen. If instead you pass a class object and update some of its values, e.g., a dict key addition, then the reference outside the function will still point to that updated dict - since as Stephen points out a dict and most class instances are mutable.
Similar to the variable swapping example, the reference (a C pointer for our purposes) is passed "by value" which means it's a copy of a pointer (holding a memory address), not the address space that holds the pointer itself. Thus when a new address is assigned to the reference/pointer inside the function it is not reflected in the pointer variable outside the call stack.
In order to have that effect we would need to perform the call thus:
x = modX(x)
since then the name x outside the call stack is reassigned to the object that modX returns.
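To complement the integer example, here is a minimal sketch contrasting reassignment inside a function with mutation of the object passed in. The function names rebind and mutate are made up for this illustration:

```python
def rebind(items):
    # Assigning to the parameter only changes the local name
    items = ["new"]
    return items


def mutate(items):
    # Mutating the object is visible through every reference to it
    items.append("added")


films = ["original"]

rebind(films)
print(films)  # ['original']

mutate(films)
print(films)  # ['original', 'added']
```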