Python Backstage • Disassembling Python Code Using the `dis` Module
Let's look behind the scenes to see what happens when you run your Python (CPython) code
You're at the theatre. You're watching an impeccably produced play, and you're stunned at some of the visual effects. "How do they do that?", you're wondering.
But you're lucky to have a VIP pass, and you'll be touring backstage after the play to see what happens behind the scenes. The magic will be revealed.
What if you could go backstage at Python Theatre, too? What if you could see what happens behind the scenes when you run a Python program?
Well, you can.
Caveat: There are many levels of "behind the scenes" Python, ranging from understanding special methods and their connection to every Python operation to understanding the quirks of how functions, classes, data structures, and other operations work. But in this post, I'm going a bit further behind the scenes.
This post is relevant to CPython, the main Python implementation and almost certainly the one you're using, whether you know it or not.
I briefly pondered whether this article should be 3,000 or 30,000 words long. You'll be pleased to know that I opted for the shorter option!
Let's Go Backstage
Let's start with this short Python program:

I'm sure you don't need me to explain this code. By the way, I’m showing line numbers in the code snippets in today’s post as these will be relevant later on.
But what happens when you run this script? CPython first compiles this code into an intermediate form called bytecode: a sequence of lower-level instructions. The interpreter then executes these instructions to produce the desired output.
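Here's a minimal sketch that makes the compile step a little more concrete. The built-in compile() function turns source code into a code object without running it, and the raw bytecode is stored in the code object's co_code attribute as bytes. The names source and code are just for illustration:

```python
# Compile the two-line program into a code object without running it
source = 'publication = "The Python Coding Stack"\nprint(publication)\n'
code = compile(source, "<string>", "exec")

print(type(code).__name__)          # code
print(type(code.co_code).__name__)  # bytes
print(len(code.co_code) % 2)        # 0, since bytecode is made of two-byte code units

exec(code)  # running the code object prints: The Python Coding Stack
```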
And you can peek at what this bytecode looks like using the dis module. This module allows you to disassemble this intermediate bytecode stage:
You call the dis() function from the dis module and pass a string with the same code you had in the earlier program. Note that the multi-line argument to dis() isn't indented the way you'd normally format it. That's deliberate: whitespace inside a triple-quoted string is part of the string, and this string contains Python code, where indentation matters!
Here's the output from this code:
0           RESUME                   0

2           LOAD_CONST               0 ('The Python Coding Stack')
            STORE_NAME               0 (publication)

3           LOAD_NAME                1 (print)
            PUSH_NULL
            LOAD_NAME                0 (publication)
            CALL                     1
            POP_TOP
            RETURN_CONST             1 (None)
"Gobbledegook. This is not Python", I can hear you shout at your screen. Let's explore what's happening here, without going too much down the rabbit hole. By the way, I'm using Python 3.13 in this article. Some instructions have changed in recent Python versions, so your output may be different if you're using older Python versions.
The output from dis.dis() is grouped into three segments separated by blank lines. This means there are three distinct sets of instructions in your program.
The first is the one we'll ignore for now. RESUME initialises the execution context for the code object. It's used in all sorts of code, including simple scripts, but plays a more significant role in generators or async functions, where execution may pause and resume. In this case, "resume" means start from the beginning!
The second group contains two instructions:
- LOAD_CONST pushes a constant onto the top of the stack. The stack is the memory area the interpreter uses to store and manage data during code execution. The constant in this case is the string "The Python Coding Stack". This constant is the first one defined in this program, which is why there's a 0 after LOAD_CONST and before ('The Python Coding Stack') in the displayed output. Index 0 refers to the first item.
- STORE_NAME takes whatever is at the top of the stack, which is the string "The Python Coding Stack" that was placed there by the previous instruction, and stores it in the first name defined in the program. That's the identifier (name) publication. The 0 next to it shows it's the first name used in this program.
Python names are also called identifiers. I'll use these terms interchangeably in this post.
This first block, which consists of two instructions, refers to the first line in this program:
Python performs two steps when executing this line.
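The 0 and 1 indices in the dis output point into tuples stored on the code object. Constants live in co_consts and names live in co_names. The sketch below assumes you compile the same two-line program yourself instead of passing a string to dis.dis(). It prints the tuples and then checks each instruction's arg (the index) against its argval (the value the index resolves to):

```python
import dis

source = 'publication = "The Python Coding Stack"\nprint(publication)\n'
code = compile(source, "<string>", "exec")

print(code.co_consts)  # constants referenced by index, e.g. by LOAD_CONST
print(code.co_names)   # ('publication', 'print')

# arg is the index into co_consts or co_names; argval is what it resolves to
for instr in dis.get_instructions(code):
    if instr.opname in ("LOAD_CONST", "STORE_NAME", "LOAD_NAME"):
        print(instr.opname, instr.arg, repr(instr.argval))
```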
The third group of instructions contains more steps:
- LOAD_NAME is similar to the LOAD_CONST instruction you saw earlier, but instead of a constant, it loads a name (identifier). It's now the turn of the name print. This is placed at the top of the stack. Note that there's a 1 between LOAD_NAME and (print) since print is the second identifier used in the program (index 1). The first one is publication, used in the previous line of the program.
- PUSH_NULL is one you can safely ignore for now. It places a null value on the stack. It's a relatively new addition that provides consistency with other operations, such as instance method calls, which need self as the first argument. The null fills the slot where self would otherwise go, so CALL can treat both kinds of call the same way.
- LOAD_NAME again. This time, the interpreter loads the first identifier used in the code, publication, and places it on the stack. Recall that this refers to the string "The Python Coding Stack" following the first line of code.
- CALL, you guessed it, calls the function that's on the stack with one argument. That's the number 1 next to CALL. The argument is also on the stack following the previous instruction.
- POP_TOP is another instruction you can ignore. The previous step called print(), which returns None. This None value ends up on the top of the stack, so POP_TOP removes it because it's not needed.
- RETURN_CONST shows that this is the end of your script, which Python treats as a code object. It returns None to the Python interpreter running this script. This is the second constant used in this program, hence the 1 shown. The first constant is the string "The Python Coding Stack".
That's plenty of steps for a two-line program!
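If the stack still feels abstract, here's a toy model of these instructions. It is a hypothetical mini-interpreter written in Python, not how CPython actually implements them. The function name run() and the NULL sentinel are made up for illustration. It does, however, perform the same sequence of pushes and pops as the Python 3.13 dis output above:

```python
NULL = object()  # hypothetical stand-in for the null value PUSH_NULL places on the stack

def run(instructions, consts, names, builtins):
    """Toy evaluation loop for a handful of instructions (not CPython's)."""
    stack = []
    namespace = {}
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(consts[arg])
        elif opname == "STORE_NAME":
            namespace[names[arg]] = stack.pop()
        elif opname == "LOAD_NAME":
            name = names[arg]
            # Look in the script's own namespace first, then in the built-ins
            stack.append(namespace[name] if name in namespace else builtins[name])
        elif opname == "PUSH_NULL":
            stack.append(NULL)
        elif opname == "CALL":
            args = [stack.pop() for _ in range(arg)][::-1]
            self_or_null = stack.pop()
            func = stack.pop()
            if self_or_null is not NULL:
                args.insert(0, self_or_null)  # method call: self goes first
            stack.append(func(*args))
        elif opname == "POP_TOP":
            stack.pop()  # discard the unused return value
        elif opname == "RETURN_CONST":
            return consts[arg]
        else:
            raise ValueError(f"unsupported instruction: {opname}")

program = [
    ("LOAD_CONST", 0), ("STORE_NAME", 0),
    ("LOAD_NAME", 1), ("PUSH_NULL", None), ("LOAD_NAME", 0),
    ("CALL", 1), ("POP_TOP", None),
    ("RETURN_CONST", 1),
]
consts = ("The Python Coding Stack", None)
names = ("publication", "print")

result = run(program, consts, names, {"print": print})  # prints: The Python Coding Stack
print(result)  # None
```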
There's one bit of the output from dis.dis() I haven't mentioned yet. This is the number in the first column shown at the start of each block of instructions.
Let's ignore the 0 ahead of RESUME. The 2 displayed before the first block of two instructions refers to the line number within the code. The code in this case is whatever is included in the triple-quoted string you pass to dis.dis(). But why isn't it 1, since publication = "The Python Coding Stack" is the first line of your mini two-line program?
Because it isn't the first line. The triple-quoted string starts right after the opening """. But what's there right after the """? Here's your answer:
There's a newline character, \n, right after the opening triple quotes. So, what appears as the first line is actually the second. The first line is blank. There's also a blank fourth line!
In my code, I wrote the following:
I did this only to make the code more readable by placing the triple quotes on separate lines. You can remove the first and last blank lines in your mini-program if you prefer:
Rerun this script with this change and you'll see the line numbers listed as 1 and 2, since the code in the triple-quoted string now has only two lines and no blank ones.
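You can confirm where these line numbers come from with dis.findlinestarts(). It yields (offset, line number) pairs for a code object. This sketch compiles both versions of the string and collects the line numbers that have instructions. The helper name source_lines() is just for illustration, and it skips line 0 (used for RESUME) and any missing line numbers:

```python
import dis

padded = """
publication = "The Python Coding Stack"
print(publication)
"""

tight = """publication = "The Python Coding Stack"
print(publication)"""

def source_lines(source):
    """Return the sorted line numbers that have instructions, ignoring 0 and None."""
    code = compile(source, "<string>", "exec")
    return sorted({line for _, line in dis.findlinestarts(code) if line})

print(repr(padded[0]))       # '\n'
print(source_lines(padded))  # [2, 3]
print(source_lines(tight))   # [1, 2]
```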
Disassembling Functions
Let's place your two-line program in a function:
Note that this time, you pass the function name directly to dis.dis() and not a string with the code. This disassembles the function:
3           RESUME                   0

4           LOAD_CONST               1 ('The Python Coding Stack')
            STORE_FAST               0 (publication)

5           LOAD_GLOBAL              1 (print + NULL)
            LOAD_FAST                0 (publication)
            CALL                     1
            POP_TOP
            RETURN_CONST             0 (None)
Let's look at the differences from the example presented earlier in this post.
The line numbers now refer to the lines within the whole script. So, the first line shown is line 3, which is the line that includes def do_something(). And the two blocks of instructions now refer to lines 4 and 5 within the script.
The second block of instructions (if you count RESUME as the first block) is nearly identical to the earlier one. The interpreter loads the constant "The Python Coding Stack" and places it at the top of the stack, and then it stores it in publication. But there are two differences:
- The first difference is not really important, but here it is anyway. The two constants in the code are still the string "The Python Coding Stack" and None, but the compiler stores them in a different order, so the string is the second constant, the one with index 1.
- The instruction to store the value by linking it to an identifier is now STORE_FAST rather than STORE_NAME. The identifier publication is a local variable within the function, so the interpreter can use a more efficient process to store the data. Local variables are kept in a fixed-size array in the function's frame and accessed by index, rather than being looked up by name.
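You can see the split between local and non-local names on the function's code object, which is available as do_something.__code__. Local variable names go in co_varnames, and STORE_FAST and LOAD_FAST index into that tuple. Other names, such as print, go in co_names. A quick sketch:

```python
def do_something():
    publication = "The Python Coding Stack"
    print(publication)

code = do_something.__code__
print(code.co_varnames)     # ('publication',): local names used by STORE_FAST/LOAD_FAST
print(code.co_names)        # ('print',): non-local names used by LOAD_GLOBAL
print(code.co_firstlineno)  # the line number of the def statement in your script
```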
How about the final block, the one that refers to the final line of code in the function?
- In the previous example, when you ran the two-line program as a script, the first two instructions in the final block were LOAD_NAME 1 (print) and PUSH_NULL. Now, these are replaced by LOAD_GLOBAL 1 (print + NULL). The name print is not a local variable (it's a built-in name), so the LOAD_GLOBAL instruction deals with this case. It also pushes a null value on the stack so that there's no need for an explicit PUSH_NULL. But you can ignore this null value.
- The second instruction is LOAD_FAST rather than LOAD_NAME. Once again, LOAD_FAST deals with loading a local variable, which is more efficient than loading a global variable.
The rest is the same as in the previous example, except that None is now the first constant (index 0) rather than the second one.
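This sketch compares the two versions side by side. It compiles the two-line program as a module-level string and lists the instructions that load or store names for both versions. The helper name_ops() is hypothetical and simply filters dis.get_instructions(). It matches on prefixes because some Python versions use variants whose names start with LOAD_FAST:

```python
import dis

source = 'publication = "The Python Coding Stack"\nprint(publication)\n'

def do_something():
    publication = "The Python Coding Stack"
    print(publication)

def name_ops(code_or_func):
    """List (opname, argval) pairs for the instructions that load or store names."""
    prefixes = ("LOAD_NAME", "STORE_NAME", "LOAD_GLOBAL", "LOAD_FAST", "STORE_FAST")
    return [
        (instr.opname, instr.argval)
        for instr in dis.get_instructions(code_or_func)
        if instr.opname.startswith(prefixes)
    ]

print(name_ops(compile(source, "<string>", "exec")))  # only *_NAME instructions
print(name_ops(do_something))  # STORE_FAST, LOAD_GLOBAL and LOAD_FAST instead
```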
Some Optimisation You'll (Nearly) Never Need To Do
Let's look at this code now:
The function do_something() finds the maximum value within the list numbers 100,000 times. This is, of course, a waste of time since the answer will remain the same value, 8, but bear with me…
Here's the human-readable representation of the bytecode that's output by dis.dis():
5           RESUME                   0

6           LOAD_GLOBAL              1 (range + NULL)
            LOAD_CONST               1 (100000)
            CALL                     1
            GET_ITER
    L1:     FOR_ITER                18 (to L2)
            STORE_FAST               0 (_)

7           LOAD_GLOBAL              3 (max + NULL)
            LOAD_GLOBAL              4 (numbers)
            CALL                     1
            POP_TOP
            JUMP_BACKWARD           20 (to L1)

6   L2:     END_FOR
            POP_TOP
            RETURN_CONST             0 (None)
We'll proceed more quickly now that you're familiar with some of these terms. Ignore RESUME and let's focus on the first line of code within the function, which is on line 6. This is the for statement:
1. LOAD_GLOBAL loads the global name range (and also pushes a null value on the stack, which you don't need to worry about).
2. LOAD_CONST loads the constant integer 100000, which is the argument you pass to range() in the for loop statement.
3. CALL shows it's now time to call range() with one argument.
4. GET_ITER represents the start of the iteration process. Python creates an iterator from the iterable used in the for statement. In this case, it creates a range_iterator object from the range object. You can read more about iterables, iterators, the iterator protocol, and .__iter__() in the iteration posts listed under Further reading at the end of this article.
5. FOR_ITER tries to fetch the next item from the iterator that's just been placed on the stack in the previous step. The L1 shown next to FOR_ITER is a label used to mark different parts of the for loop. There are two options for what happens next:
   - If the iterator returns a value, then the interpreter moves on to the next instruction, STORE_FAST, which you'll get to soon in this list.
   - If the iterator is exhausted and raises a StopIteration exception (see the iteration posts mentioned above to find out more), then it jumps forward to the label L2. This refers to the instructions needed to end the for loop.
   The 18 next to FOR_ITER tells the interpreter how far to jump forward within the bytecode. It's measured in two-byte code units, not bytes. Each instruction takes one code unit, but some are followed by extra inline "cache" code units. So don't worry too much about this number. The human-readable output shows you the correct point using the label L2.
6. STORE_FAST stores the value yielded by the iterator to a local variable. Recall that the _FAST suffix refers to local variables within functions. In this case, this is stored in the local variable _.
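Here's a rough Python-level analogue of GET_ITER, FOR_ITER and STORE_FAST. It calls iter() and next() directly. It only approximates what the bytecode does, and the function name manual_loop() is made up for illustration:

```python
def manual_loop(iterable):
    """Collect the items a for loop over `iterable` would see."""
    items = []
    iterator = iter(iterable)  # roughly what GET_ITER does
    while True:
        try:
            item = next(iterator)  # roughly what FOR_ITER does
        except StopIteration:
            break  # iterator exhausted: leave the loop (the L2 label)
        items.append(item)  # loop body runs with `item` bound (cf. STORE_FAST)
    return items

print(type(iter(range(3))).__name__)  # range_iterator
print(manual_loop(range(3)))          # [0, 1, 2]
```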
The body of this for loop includes just one line, which is line 7 in the script. Let's speed through this by combining some instructions into the same bullet point:
- LOAD_GLOBAL (x2): First, there are two global variables to load, max and numbers. The identifier max is in the built-in scope, whereas numbers is in the global scope, but LOAD_GLOBAL is used for both. To read more about scopes, you can see Let's Eliminate General Bewilderment • Python's LEGB Rule, Scope, and Namespaces.
- CALL and POP_TOP: You've seen these already. CALL calls the function max() with one argument (note the 1 next to CALL), and then POP_TOP clears the value returned by max() since you're not doing anything with the return value in the code. If you assign the return value to a variable, you'll see that POP_TOP is replaced by STORE_FAST, which stores the value in a local variable. Try it out to see…
- JUMP_BACKWARD: It's the end of this iteration, so it's time to jump back to the instruction labelled L1, which is the start of the loop. That's 20 two-byte code units back, if you care to know! The instruction at the point labelled L1 is FOR_ITER, which attempts to get the next value from the iterator.
And when FOR_ITER cannot fetch any more values from the iterator, as you may recall from bullet point 5 above, it jumps to the instruction labelled L2. Note that this also refers to line 6 in the code:
- END_FOR: Do you need me to explain this? This is the instruction that actually gets rid of the exhausted iterator object from the stack.
- POP_TOP: This is used for consistency and to tidy up the stack. You can safely ignore it!
- RETURN_CONST: This is the end of the code object, which in this case is the function definition. So, it returns None to the Python interpreter.
Do you want to join a forum to discuss Python further with other Pythonistas? Upgrade to a paid subscription here on The Python Coding Stack to get exclusive access to The Python Coding Place's members' forum. More Python. More discussions. More fun.
And you'll also be supporting this publication. I put plenty of time and effort into crafting each article. Your support will help me keep this content coming regularly and, importantly, will help keep it free for everyone.
And now, for the (possibly useless) optimisation
There are three LOAD_GLOBAL instructions in the bytecode for this function. But you've come across the LOAD_FAST instruction earlier in this article, which, as the name implies, is faster. Can we replace some or all of the LOAD_GLOBAL instructions with LOAD_FAST?
Let's start by timing this code first. The changes are highlighted in green:
The timeit.repeat() call runs the function do_something() 1,000 times (number=1_000) and repeats this process five times, which is the default. Therefore, timings is a list of five values, each one showing how long it took to call do_something() 1,000 times. You can then take an average of these five readings:
[6.457241459000215,
6.45811324999886,
6.480874916000175,
6.472584875002212,
6.502984124999784]
Average time: 6.47435972500025 seconds
The average time to call do_something()
1,000 times is 6.47 seconds. Of course, your mileage may vary depending on your computer and what else is running in the background.
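As a variation on the timing code, you can pass the function object itself to timeit.repeat() rather than a string. With a callable, you don't need globals=globals(). This sketch uses much smaller numbers than the article so that it runs quickly, and it also prints the fastest run. The timeit documentation notes that the minimum is usually more informative than the mean, because slower readings tend to come from other processes interfering with the measurement:

```python
import timeit

numbers = [2, 4, 6, 8]

def do_something():
    # A much smaller loop than the article's 100_000, so the sketch runs quickly
    for _ in range(1_000):
        max(numbers)

# Pass the callable directly: 100 calls per run, 5 runs
timings = timeit.repeat(do_something, number=100, repeat=5)

print(len(timings))  # 5
print(f"Average time: {sum(timings) / len(timings):.5f} seconds")
print(f"Fastest run:  {min(timings):.5f} seconds")
```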
Now, you can replace the global list numbers with a local list. The easiest way to do this is to include a parameter in the function definition and then pass the list as an argument when you call the function:
First, let's see the output from dis.dis():
6           RESUME                   0

7           LOAD_GLOBAL              1 (range + NULL)
            LOAD_CONST               1 (100000)
            CALL                     1
            GET_ITER
    L1:     FOR_ITER                14 (to L2)
            STORE_FAST               1 (_)

8           LOAD_GLOBAL              3 (max + NULL)
            LOAD_FAST                0 (data)
            CALL                     1
            POP_TOP
            JUMP_BACKWARD           16 (to L1)

7   L2:     END_FOR
            POP_TOP
            RETURN_CONST             0 (None)
Ignoring RESUME, the first block of instructions is almost identical. The line number is different: it's now line 7, since an additional import in this script has pushed everything down by a line. The jump size shown next to FOR_ITER is also different, but let's ignore this.
In the second main block, linked to line 8 of the code, there's one important change. The second instruction is now LOAD_FAST instead of LOAD_GLOBAL. This loads data, which is a local variable: the parameter data refers to the same list object as the global numbers, which you pass as the argument when you call the function.
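This sketch checks which instruction loads the list in each version. The function names uses_global() and uses_local() are hypothetical stand-ins for the two versions of do_something(), and how_loaded() is a made-up helper. On Python 3.13 the second result is ['LOAD_FAST']; other versions may show a variant whose name starts with LOAD_FAST:

```python
import dis

numbers = [2, 4, 6, 8]

def uses_global():
    for _ in range(100_000):
        max(numbers)

def uses_local(data):
    for _ in range(100_000):
        max(data)

def how_loaded(func, name):
    """Return the opnames of the instructions that load `name` inside `func`."""
    return [
        instr.opname
        for instr in dis.get_instructions(func)
        if instr.opname.startswith("LOAD_") and instr.argval == name
    ]

print(how_loaded(uses_global, "numbers"))  # ['LOAD_GLOBAL']
print(how_loaded(uses_local, "data"))
```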
And here's the output from the call to timeit.repeat():
[6.2737918330021785,
6.286419416999706,
6.250972666999587,
6.234414000002289,
6.241118416997779]
Average time: 6.257343266800308 seconds
The average time is down to about 6.26 seconds from the previous 6.47 seconds. It's not a huge difference, but it shows how accessing local variables using LOAD_FAST is more efficient than accessing global variables using LOAD_GLOBAL.
However, there are two more LOAD_GLOBAL instructions in the bytecode. But, I hear you say, these are references to the built-in names range and max. How can you bypass this limitation?
Let's start with max. Have a look at this code:
You define a local variable max_ and make it equal to the built-in max. This way, each time you call max_() in the for loop, you use a local variable instead of a global one:
6           RESUME                   0

7           LOAD_GLOBAL              0 (max)
            STORE_FAST               1 (max_)

8           LOAD_GLOBAL              3 (range + NULL)
            LOAD_CONST               1 (100000)
            CALL                     1
            GET_ITER
    L1:     FOR_ITER                11 (to L2)
            STORE_FAST               2 (_)

9           LOAD_FAST                1 (max_)
            PUSH_NULL
            LOAD_FAST                0 (data)
            CALL                     1
            POP_TOP
            JUMP_BACKWARD           13 (to L1)

8   L2:     END_FOR
            POP_TOP
            RETURN_CONST             0 (None)
Note how in the block of instructions linked to line 9, you no longer have any LOAD_GLOBAL. Both instructions that need to load data are now LOAD_FAST, since both max_ and data are local variables.
However, you now have an additional block, the one linked to line 7 in your code (max_ = max), which is not included in the original code. Here, you still need to use LOAD_GLOBAL and also have an additional STORE_FAST to store the max function object in the local variable max_. But you only need to do this once!
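To compare all three versions at a glance, this sketch counts the LOAD_GLOBAL instructions that run on every iteration, meaning those between FOR_ITER and the end of the loop. The names version_1(), version_2(), version_3() and loop_global_loads() are hypothetical, and the counting helper assumes each function contains a single for loop:

```python
import dis

numbers = [2, 4, 6, 8]

def version_1():
    for _ in range(100_000):
        max(numbers)

def version_2(data):
    for _ in range(100_000):
        max(data)

def version_3(data):
    max_ = max
    for _ in range(100_000):
        max_(data)

def loop_global_loads(func):
    """Count LOAD_GLOBAL instructions inside the (single) for loop of func."""
    count = 0
    in_loop = False
    for instr in dis.get_instructions(func):
        if instr.opname == "FOR_ITER":
            in_loop = True
        elif instr.opname.startswith("JUMP_BACKWARD"):
            in_loop = False
        elif in_loop and instr.opname == "LOAD_GLOBAL":
            count += 1
    return count

for func in (version_1, version_2, version_3):
    print(func.__name__, loop_global_loads(func))  # 2, then 1, then 0
```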
What about the time it takes? Drum roll…
[6.103223165999225,
6.050973375000467,
6.056473458000255,
6.06607754200013,
6.083692415999394]
Average time: 6.072087991399894 seconds
The average time is now 6.07 seconds, down from about 6.26 seconds when you used the global max() within the for loop. Even though you still have LOAD_GLOBAL linked to line 7 and the additional overhead of storing this to max_, each iteration of the for loop is a bit quicker since you can now use LOAD_FAST each time you call max_().
The difference in performance is not huge, but it was even more noticeable in older versions of Python.
Can you use the same trick with range? Yes, you can, but you won't gain any speed advantage. Can you see why?
Whereas your code needs to use LOAD_FAST in each iteration of the for loop when referring to max_ and data, it only uses LOAD_GLOBAL once to refer to range. Therefore, it doesn't make sense to replace range with a local variable equivalent. You still need to use LOAD_GLOBAL once to load the built-in range, but then you waste time reassigning it to a local variable. So let's not do this! You can try it out if you wish.
[I've been writing this article for a while, on and off, and as I was writing, my friend and fellow author wrote a somewhat related article that dives even deeper into this topic. This last comparison was inspired by his article: Why This Python Performance Trick Doesn’t Matter Anymore]
Final Words
There's so much more we could explore by disassembling code. But I promised a short-ish article, so I'll stop here. Perhaps I'll post about this again in the future.
Disassembling code using dis.dis() is helpful for understanding Python better and writing more efficient code. In some cases, you can look for ways to optimise your code by looking at the bytecode. In other cases, you may have a stubborn bug you can't track down, and the bytecode may give you a different perspective.
If you want to explore a bit further on your own, you can start with the dis module's documentation at dis — Disassembler for Python bytecode, and specifically at the list of possible instructions further down the page: https://docs.python.org/3/library/dis.html#dis.Instruction.
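Two more tools from the dis module are handy when you explore on your own. dis.code_info() returns a text summary of a code object, including its constants, names and local variable names. dis.opmap maps every instruction name in your Python version to its numeric opcode. A short sketch, reusing the last version of do_something() from this article:

```python
import dis

def do_something(data):
    max_ = max
    for _ in range(100_000):
        max_(data)

# A text summary of the code object: argument count, constants, names, variable names...
print(dis.code_info(do_something))

# Instruction names known to this Python version, mapped to numeric opcodes
print(len(dis.opmap), "opcode names")
print(sorted(name for name in dis.opmap if name.startswith("LOAD_FAST")))
```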
Your VIP pass allowed you to take a brief tour of the Python Theatre's backstage area. But here's some advice. Don't spend too long there on any single visit. It's dark, and there are plenty of obstacles backstage. You may never find your way out again!
Photo by Dawn Lio: https://www.pexels.com/photo/stage-with-lightings-2177813/
Code in this article uses Python 3.13
The code images used in this article are created using Snappify. [Affiliate link]
You can also support this publication by making a one-off contribution of any amount you wish.
For more Python resources, you can also visit Real Python—you may even stumble on one of my own articles or courses there!
Also, are you interested in technical writing? You’d like to make your own writing more narrative, more engaging, more memorable? Have a look at Breaking the Rules.
And you can find out more about me at stephengruppetta.com
Further reading related to this article’s topic:
Iterable: Python's Stepping Stones (Data Structure Categories #1)
A One-Way Stream of Data • Iterators in Python (Data Structure Categories #6)
Let's Eliminate General Bewilderment • Python's LEGB Rule, Scope, and Namespaces
Appendix: Code Blocks
Code Block #1
publication = "The Python Coding Stack"
print(publication)
Code Block #2
import dis
dis.dis(
"""
publication = "The Python Coding Stack"
print(publication)
"""
)
Code Block #3
publication = "The Python Coding Stack"
Code Block #4
"""
"first line?"
"second line?"
"""
# '\n"first line?"\n"second line?"\n'
Code Block #5
import dis
dis.dis(
"""
publication = "The Python Coding Stack"
print(publication)
"""
)
Code Block #6
import dis
dis.dis(
"""publication = "The Python Coding Stack"
print(publication)"""
)
Code Block #7
import dis

def do_something():
    publication = "The Python Coding Stack"
    print(publication)

dis.dis(do_something)
Code Block #8
import dis

numbers = [2, 4, 6, 8]

def do_something():
    for _ in range(100_000):
        max(numbers)

dis.dis(do_something)
Code Block #9
import dis
import timeit

numbers = [2, 4, 6, 8]

def do_something():
    for _ in range(100_000):
        max(numbers)

dis.dis(do_something)

timings = timeit.repeat(
    "do_something()",
    number=1_000,
    globals=globals(),
)
print(timings)
print(f"Average time: {sum(timings) / len(timings)} seconds")
Code Block #10
import dis
import timeit

numbers = [2, 4, 6, 8]

def do_something(data):
    for _ in range(100_000):
        max(data)

dis.dis(do_something)

timings = timeit.repeat(
    "do_something(numbers)",
    number=1_000,
    globals=globals(),
)
print(timings)
print(f"Average time: {sum(timings) / len(timings)} seconds")
Code Block #11
# ...
def do_something(data):
    max_ = max
    for _ in range(100_000):
        max_(data)
# ...