A Little Disassembling
12/16/2020The other day at work someone asked me on Slack why their Python code was behaving in an odd way. I took a look and was completely wrong in my first response.
The person who asked me this question had boiled it down to two interesting functions. Here’s his first function, rendered into an iPython console:
In [2]: def regularor():
...: x = 100
...: i = 0
...: while i < 100 or x < 105:
...: print(i)
...: print(x)
...: x+=1
...: i +=1
...:
And here’s the second:
In [3]: def bitor():
...: x = 100
...: i = 0
...: while i < 100 | x < 105:
...: print(i)
...: print(x)
...: x+=1
...: i +=1
What do you imagine they’ll print out? The output of the first starts like this and goes on for awhile, as you might expect:
In [4]: regularor()
0
100
1
101
2
102
3
103
4
104
5
105
6
106
7
107
8
108
.
.
.
While the second looks completely different:
In [5]: bitor()
0
100
1
101
2
102
3
103
It stops right there! That sure looks odd!
I took an initial guess as to why the second ended so early and the best part of this experience for me was that I was wrong about what was happening. Fortunately, I had an idea of how we’d resolve this mystery.
Now, the most commonly run Python interpreter is CPython, the official Python interpreter. CPython, as you might expect, is written in C and one good place to start understanding how Python works is to look at the source code. When
run, Python is first compiled to bytecode which uses opcodes
and other terms to greatly simplify the job of evaluating a Python program. We can find the various keywords used in Python represented as opcodes
all
present in the file opcode.h
.
Thus, sometimes, to really understand how Python works, it helps to understand how particular opcodes are evaluated. For this, we look to the file ceval.c
, which implements the core evaluation logic for CPython. I remember being really surprised to discover that most of the guts of ceval.c
are expressed in a single, massive case statement that evaluates any particular chunk of bytecode.
Of course, we don’t have to memorize all the opcodes and read ceval.c
to understand why the above functions behave so differently, but knowing about the opcodes and Python’s evaluation function may help us to build a mental model of what’s happening. The tool we will use now to solve our puzzle from above is Python’s disassembler, conveniently available in the dis
module, because it will reveal the opcodes underlying the evaluation of these Python functions above.
In Python you can import the dis
module and load a function into it to be disassembled. For the regularor
function defined above, here’s the complete output from dis.dis
:
In [6]: import dis
In [7]: dis.dis(regularor)
2 0 LOAD_CONST 1 (100)
2 STORE_FAST 0 (x)
3 4 LOAD_CONST 2 (0)
6 STORE_FAST 1 (i)
4 >> 8 LOAD_FAST 1 (i)
10 LOAD_CONST 1 (100)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_TRUE 24
16 LOAD_FAST 0 (x)
18 LOAD_CONST 3 (105)
20 COMPARE_OP 0 (<)
22 POP_JUMP_IF_FALSE 58
5 >> 24 LOAD_GLOBAL 0 (print)
26 LOAD_FAST 1 (i)
28 CALL_FUNCTION 1
30 POP_TOP
6 32 LOAD_GLOBAL 0 (print)
34 LOAD_FAST 0 (x)
36 CALL_FUNCTION 1
38 POP_TOP
7 40 LOAD_FAST 0 (x)
42 LOAD_CONST 4 (1)
44 INPLACE_ADD
46 STORE_FAST 0 (x)
8 48 LOAD_FAST 1 (i)
50 LOAD_CONST 4 (1)
52 INPLACE_ADD
54 STORE_FAST 1 (i)
56 JUMP_ABSOLUTE 8
>> 58 LOAD_CONST 0 (None)
60 RETURN_VALUE
If you compare the above output to the function itself, even without any previous experience with Python’s opcodes, you maybe can imagine what’s happening:
2 0 LOAD_CONST 1 (100)
2 STORE_FAST 0 (x)
Here we are loading a constant and then storing that constant as the value of the variable x
. Later on, we’ll load this variable as well:
7 40 LOAD_FAST 0 (x)
Going back to the problem at hand, it’s probably obvious that there is something misaligned with our expectations around these two conditions present in our while
loops:
while i < 100 or x < 105:
...
while i < 100 | x < 105:
The second uses a binary or
, which may intuitively seem like it would do the same thing as the Python keyword or
in the first function. However, let’s look at just the bytecode produced by the while loop in the regularor
function:
4 >> 8 LOAD_FAST 1 (i)
10 LOAD_CONST 1 (100)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_TRUE 24
16 LOAD_FAST 0 (x)
18 LOAD_CONST 3 (105)
20 COMPARE_OP 0 (<)
22 POP_JUMP_IF_FALSE 58
In plain English, we’re loading a variable i
and a variable 100
and then using a binary comparison operator to compare these two values. Immediately after that comparison is evaluated we see POP_JUMP_IF_TRUE 24
, which means, “if the previous operation evalutes to True
then jump immediately to instruction 24.” Another way to state this is that when Python evaluates while i < 100 or x < 105:
, if the i
is less than 100
, then the x < 105
part is never evaluated because control jumps into the body of the while loop.
Now, for comparison, I’ll load disassembler output for the bitor
function:
In [8]: dis.dis(bitor)
2 0 LOAD_CONST 1 (100)
2 STORE_FAST 0 (x)
3 4 LOAD_CONST 2 (0)
6 STORE_FAST 1 (i)
4 >> 8 LOAD_FAST 1 (i)
10 LOAD_CONST 1 (100)
12 LOAD_FAST 0 (x)
14 BINARY_OR
16 DUP_TOP
18 ROT_THREE
20 COMPARE_OP 0 (<)
22 POP_JUMP_IF_FALSE 32
24 LOAD_CONST 3 (105)
26 COMPARE_OP 0 (<)
28 POP_JUMP_IF_FALSE 70
30 JUMP_FORWARD 4 (to 36)
>> 32 POP_TOP
34 JUMP_FORWARD 34 (to 70)
5 >> 36 LOAD_GLOBAL 0 (print)
38 LOAD_FAST 1 (i)
40 CALL_FUNCTION 1
42 POP_TOP
6 44 LOAD_GLOBAL 0 (print)
46 LOAD_FAST 0 (x)
48 CALL_FUNCTION 1
50 POP_TOP
7 52 LOAD_FAST 0 (x)
54 LOAD_CONST 4 (1)
56 INPLACE_ADD
58 STORE_FAST 0 (x)
8 60 LOAD_FAST 1 (i)
62 LOAD_CONST 4 (1)
64 INPLACE_ADD
66 STORE_FAST 1 (i)
68 JUMP_ABSOLUTE 8
>> 70 LOAD_CONST 0 (None)
72 RETURN_VALUE
We can see right away that it’s longer and that the while
loop, where our expectations are misaligned with the output we’re seeing, is indeed different from the first example:
4 >> 8 LOAD_FAST 1 (i)
10 LOAD_CONST 1 (100)
12 LOAD_FAST 0 (x)
14 BINARY_OR
16 DUP_TOP
18 ROT_THREE
20 COMPARE_OP 0 (<)
22 POP_JUMP_IF_FALSE 32
24 LOAD_CONST 3 (105)
26 COMPARE_OP 0 (<)
28 POP_JUMP_IF_FALSE 70
30 JUMP_FORWARD 4 (to 36)
>> 32 POP_TOP
34 JUMP_FORWARD 34 (to 70)
As a reminder, the condition inside the problematic while
loop looks like this: while i < 100 | x < 105:
. For the opcodes used in this while
loop, starting with the first two operations, we see exactly what we saw in the first function, but the third and fourth are different:
4 >> 8 LOAD_FAST 1 (i)
10 LOAD_CONST 1 (100)
12 LOAD_FAST 0 (x)
14 BINARY_OR
BINARY_OR
is the interesting difference here, and here’s what ceval.c
does when it sees the BINARY_OR
opcode:
case TARGET(BINARY_OR): {
PyObject *right = POP();
PyObject *left = TOP();
PyObject *res = PyNumber_Or(left, right);
Py_DECREF(left);
Py_DECREF(right);
SET_TOP(res);
if (res == NULL)
goto error;
DISPATCH();
}
In short, we pop the two last two loaded values off the variable stack and compare them. Thus, these opcodes boil down to the following instructions:
- Load the variable
i
. - Load the constant
100
. - Load the variable
x
. - Binary-or compare the last two.
This then is the problem: Python is not first comparing i < 100
as we might expect, but instead running BINARY_OR
on 100
and x
.
Knowing this, we can insert some parentheses to clarify how Python interprets this code:
while i < (100 | x) < 105:
Keeping in mind that x
starts at 100
and is then incremented in each pass off the loop, we end up with a series of checks like the following:
In [12]: 100 | 100
Out[12]: 100
In [13]: 100 | 101
Out[13]: 101
In [14]: 100 | 102
Out[14]: 102
In [15]: 100 | 103
Out[15]: 103
In [16]: 100 | 104
Out[16]: 108
That last value, however, is no longer less than 105
, which is why the while
loop ends with x
at 103
! We can see this in greater detail if we evaluate all of these values as binary, keeping in mind that binary OR
is comparing each bit from two values and if the corresponding bit for one is truthy, then the result will also have a truthy bit in that same location:
In [1]: bin(100)
Out[1]: '0b1100100'
In [2]: bin(104)
Out[2]: '0b1101000'
In [3]: bin(108)
Out[3]: '0b1101100'
In [4]: bin(100 | 104)
Out[4]: '0b1101100'
Here, we can see that a BINARY_OR
of 100
and 104
evaluates to 0b1101100
, which is the binary representation of the number 108
. Finally, the term for what we’re seeing here is operator precedence where different operators bind to their arguments and are evaluated earlier or later as a result. From the Python docs on operator precedence we can see that the bitwis operator BINARY_OR
binds with a higher precedence than the or
keyword.
Ultimately, it was thanks to the dis
module that we were able to puzzle out why these functions behaved so differently, but I appreciated getting this question this problem at work because it was an opportunity to dig into the implementation details of the language with my team and to demonstrate how to use Python’s dis
module to solve puzzles like this one.
I must admit that I also find it fun and refreshing when my initial expectations about how something is evaluated in a language like Python are revealed to be wrong.
One other outcome: I think the person who originally came to me with this problem finally swore off using binary operators as conditions as well.
Tags: python