A NumPy Banana Skin • `dtype` (A NumPy for Numpties article)
A peculiar article about a peculiar issue with NumPy and an array's `dtype`
Today’s “article” is a bit different from my usual ones. Quite a bit different, it’s fair to say. Let me know what you think. Normal service resumes with the next article (if there’s such a thing as normal with my articles).
Here’s the code in a native Substack code block:
>>> import numpy as np
# I played a game every weekday last week
>>> n_points_earned_per_day = np.array(
... [5, 4, 9, 2, 6]
... )
# Weird rule: points earned on first day are only worth half
>>> n_points_earned_per_day[0] = n_points_earned_per_day[0] / 2
>>> n_points_earned_per_day
array([2, 4, 9, 2, 6])
# Hmmm… Day 1 doesn't look right
# (Just checking my arithmetic…)
>>> 5 / 2
2.5
>>> n_points_earned_per_day
array([2, 4, 9, 2, 6])
>>> n_points_earned_per_day[0]
2
# ????
>>> n_points_earned_per_day.dtype
dtype('int64')
# Oh, I see…
>>> np.int64(5 / 2)
2
# OK
>>> n_points_earned_per_day = np.array(
... [5, 4, 9, 2, 6],
... dtype=np.float64,
... )
# Let's try again with the first day points adjustment
>>> n_points_earned_per_day[0] = n_points_earned_per_day[0] / 2
>>> n_points_earned_per_day
array([2.5, 4. , 9. , 2. , 6. ])
# That's better
# All good now
# Phew!
#
# (I'm glad I caught that before it caused any problems)
# (I'll be more careful next time)
#####
# I'm addicted to the game
# I started keeping scores in a NumPy array
>>> one_week_scores = np.array(
... ["Stephen", "2024-02-26", 5, 4, 9, 2, 6]
... )
# New rule: points earned on last day are doubled
>>> one_week_scores[-1] = one_week_scores[-1] * 2
>>> one_week_scores
array(['Stephen', '2024-02-26', '5', '4', '9', '2', '66'], dtype='<U21')
# Eh?!
>>> one_week_scores[-1]
'66'
>>> one_week_scores.dtype
dtype('<U21')
# (That's NumPy-speak for "little-endian Unicode string of up to 21 characters")
# Maybe this??
>>> one_week_scores = np.array(
... ["Stephen", "2024-02-26", 5, 4, 9, 2, 6],
... dtype=int,
... )
Traceback (most recent call last):
...
ValueError: invalid literal for int() with base 10: 'Stephen'
# Can only have one dtype in a NumPy array
# Back to the drawing board
>>> one_week_scores = np.array(
... ["Stephen", "2024-02-26", 5, 4, 9, 2, 6],
... )
>>> scores_only = one_week_scores[2:].astype(int)
>>> scores_only
array([5, 4, 9, 2, 6])
# Last day rule
>>> scores_only[-1] = scores_only[-1] * 2
>>> scores_only
array([ 5,  4,  9,  2, 12])
# Much better
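The first banana skin is that assigning a float into an integer array doesn't raise an error. NumPy converts the value to the array's existing `dtype`, dropping the fractional part (truncating toward zero, so -2.5 becomes -2). Here's a minimal sketch of that behaviour and of one way around it: converting to `float64` with `astype` before making the adjustment. The helper `halve_first_day` is a hypothetical name used only for illustration, and `np.can_cast` is included to show that NumPy itself considers a float64-to-int64 cast unsafe, even though item assignment goes ahead and does it anyway.

```python
import numpy as np


def halve_first_day(points):
    """Hypothetical helper: return a float64 copy with the first entry halved.

    astype() creates a new float64 array, so the original integer
    array is left unchanged.
    """
    adjusted = points.astype(np.float64)
    adjusted[0] = adjusted[0] / 2
    return adjusted


n_points_earned_per_day = np.array([5, 4, 9, 2, 6])

# Assigning 2.5 into an integer array silently drops the .5
truncated = n_points_earned_per_day.copy()
truncated[0] = truncated[0] / 2
print(truncated.tolist())  # [2, 4, 9, 2, 6]

# The conversion truncates toward zero, so negatives round up, not down
negatives = np.array([-5, 3])
negatives[0] = negatives[0] / 2
print(negatives.tolist())  # [-2, 3]

# NumPy's "safe" casting rules say float64 -> int64 isn't safe...
print(np.can_cast(np.float64, np.int64))  # False

# ...so convert to a float dtype first when fractions matter
print(halve_first_day(n_points_earned_per_day).tolist())  # [2.5, 4.0, 9.0, 2.0, 6.0]
```

Creating the array with `dtype=np.float64` from the start, as in the session above, works just as well. The `astype` version is handy when the integer array already exists.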
Code in this article uses Python 3.12
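The second banana skin comes from mixing text and numbers in one `np.array` call. NumPy picks a single `dtype` that can hold every element, here a Unicode string type, so the scores become strings and `* 2` repeats the text instead of doubling the number. The sketch below reproduces that and then shows one simple alternative, assuming it's acceptable to keep the name and date in ordinary Python variables and store only the numbers in the array. The variable names `player`, `week_start` and `scores` are hypothetical, and this is only one option among several.

```python
import numpy as np

one_week_scores = np.array(["Stephen", "2024-02-26", 5, 4, 9, 2, 6])
print(one_week_scores.dtype.kind)  # U (Unicode string dtype)

# Each element is a string, so "* 2" repeats the text
print(str(one_week_scores[-1] * 2))  # 66

# Hypothetical layout: text in plain Python variables,
# numbers in an array with an integer dtype
player, week_start = "Stephen", "2024-02-26"
scores = np.array([5, 4, 9, 2, 6])

# Last-day rule, now real arithmetic on integers
scores[-1] = scores[-1] * 2
print(player, week_start, scores.tolist())  # Stephen 2024-02-26 [5, 4, 9, 2, 12]
```

Keeping the numbers in their own array means every element shares a numeric `dtype`, so arithmetic behaves as expected without needing an `astype` conversion afterwards.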