The Trees in the Woods (Data Structures in the Real World #1)
When you're out in the real world and come across actual objects (not the Python ones), do you ever ask yourself what Python data structure is best suited to represent them? No? Maybe it’s just me…
I was out for my morning walk the other day—a non-negotiable part of my daily routine. Sometimes, my mind wanders back to Python and coding while I'm on my walk. I know, I know, I should be using this time to relax and enjoy nature and the fresh air…but that's sort of what I did that morning. Hear me out…
There's a small patch of woods I usually walk past. It was early in the morning, and the sun was still low in the sky. The trees looked calming, with their leaves shimmering in the cool sunlight. And a thought popped into my head:
What Python data structure would be best suited to represent this group of trees?
Perhaps this is not the same question the poets and novelists who wrote vivid descriptions of forests and trees asked themselves when walking in nature.
This is not a futile exercise. On the contrary, it's highly instructive. Bear with me as I take you through my step-by-step reasoning. We'll figure out where to (figuratively) store all the trees.
This is the first in a series of articles in the Data Structures in the Real World series. These won't be consecutive articles, and they'll always be brief, essay-like posts. I hope you enjoy them and, more importantly, find them useful when choosing data structures for your programs.
Which Properties Matter?
We're spoilt for choice when choosing data structures in Python. There are several built-in data structures and others in standard library modules. Then, there are all those defined in third-party modules. And there's rarely only one perfect choice.
Let's pick some of the properties that matter when choosing a data structure for our trees in the woods.
Do We Need a Sequence?
The order of items in a sequence matters, and you can access the items using an index that represents their position in the sequence. Lists and tuples are examples of Python sequences.
But the trees aren't in any particular order. If you're walking through the woods with me and I ask you to point to the first tree, you may look at me a bit puzzled. Is it the first tree you see? Or maybe the first tree you saw when you entered the woods? But what if you came in through a different route? There's no order.
Perhaps there is some order if you're looking at trees that line up an avenue. But in general, trees in the woods or forest don't really have an order.
I'd say we don't need a sequence to represent the trees. You may disagree, but this is my article, so I'll share my opinion!
Do We Need a Mutable or Immutable Type?
What timescale are we interested in? Do we care only about the hour or so when I'm walking through the woods? We can safely assume the trees don't change during this period.
Or do we consider a much longer timescale? New trees can be planted. Old ones may need to be felled.
An immutable data type could be suitable for the short timescale, but a mutable type is better for longer periods. So, let's pick a mutable data type.
What Do We Need to Know About Each Tree?
I don't know much about botany. But I can imagine there are some facts that may be relevant about each tree: its species and age, perhaps.
We also need a way to identify each tree. Since we already determined they're not ordered, we can't use a number to represent their position in a sequence.
Instead, we can use their geographic position. Let's say we'd like to use latitude and longitude coordinates to identify each tree's location.
There's one other point I haven't mentioned yet. Why do I need to store information about the trees in the woods? What we plan to do with the data is an important consideration. In this example, I'm acting out of curiosity, so I just want to store information about each tree to keep a record. However, in general, the choice of data structure also depends on what we plan to do with the data.
Let's Put All the Pieces Together
Let's summarise the decisions we made so far. We need a mutable collection that links each tree to its geographic location and stores relevant facts about the tree, such as its species and age. And we don't need an ordered sequence.
This sounds like a dictionary. It's mutable, so we can add and remove trees whenever needed. The dictionary keys can be tuples containing the latitude and longitude pairs. And each key's value can be another dictionary containing the tree's age, species, and any other details we need to store.
Here's what this dictionary may look like. This example only has one tree in it, but you can add more, of course:
We'd need to make sure the floats representing latitude and longitude always have the same precision, but let's not worry about code details in this essay. If we were to get into the details, we'd need to make sure a tuple of coordinates is going to work as a way of accessing the trees. They're not the most practical type of key for a dictionary! But let's gloss over the detail…
And dictionary keys must be unique, which is a good thing since we can't have two different trees in the same geographical location, right?
If you enjoy these posts and want to contribute towards the time and effort it takes to create them, you can do so through this link. You can contribute any amount you want.
Final Words
Don't take this exercise too seriously. Its purpose is to get you to think about what you want out of a data structure. What properties matter to you and your program? How do you want to access the data? Is order important? How about mutability? And so on…
On my next walk, I'll keep an eye out for other things that can get the Data Structures in the Real World treatment!
Code in this article uses Python 3.12
Stop Stack
#65
Thank you to all those who supported me with a one-off donation recently. This means a lot and helps me focus on writing more articles and keeping more of these articles free for everyone. Here's the link again for anyone who wants to make a one-off donation to support The Python Coding Stack
The Python Coding Book is available (Ebook and paperback). This is the First Edition, which follows from the "Zeroth" Edition that has been available online for a while—Just ask Google for "python book"!
And if you read the book already, I'd appreciate a review on Amazon. These things matter so much for individual authors!
I'm also releasing The NumPy Mindset at the moment. Currently, this is available as an Early Release—I'm publishing chapters as and when they're ready. Members of The Python Coding Place already have access to this Early Release. Everyone else can get it here—You'll get the final ebook version too once it's ready if you get the Early Release version, of course!
And for those who want to join The Python Coding Place to access all of my video courses—past and future—join regular live sessions, and interact with me and other learners on the members-only forum, here's the link:
Any questions? Just ask…
Appendix: Code Blocks
Code Block #1
trees = {
(51.80507, -0.34846): {
"Species": "Oak",
"Age": 70,
},
}
Ngl when I read the title I thought you were going to discuss the tree data structure 🤣
I liked the journey, it was fun to read. I was thinking though that if individual trees can’t be told apart from one another, chances are you aren’t interested to identify specific trees. In that case, grouping them by species for example would be more useful to use as a key. But as you said, it depends on what you want to do with the data.
I can't say I've ever done it in regard to actual trees, but I'm with you on constantly thinking about how I'd implement something. And data structures are a central part of that. I often end up banging out some Python. (Did it just last week on a Numberphile video I saw.)
Long ago I mentored a buddy of mine through an adult education CS degree. He mentioned that his teachers said the Data Structures class tended to filter out the wannabes. If you can wrap your head around the abstractions needed to design data structures, you have the necessary mindset to design code. I think that's about right.