The Memory Canvas: DSA from Scratch using Python ctypes
DSA SERIES
Day 1: The Memory Canvas — Arrays, HashMaps & Sets
To master the algorithm, you must first master the memory. In this guide, we strip away the comforting syntactic sugar of Python. We will journey into the abyss of raw contiguous memory to forge Dynamic Lists, Tuples, HashMaps, and Sets from absolute scratch using ctypes.
⏳ Context: As modern developers, we are spoiled. We type my_list.append(item) or my_dict[key] = value without a second thought. But behind this elegant abstraction lies a battlefield of pointer arithmetic, memory reallocation, and hash collisions. To architect high-performance systems, you cannot merely be a consumer of data structures; you must become their creator.
Today, we bypass Python's high-level objects. We will ask the operating system for raw blocks of memory (C-style arrays of PyObject*) and build the universe from the ground up.
1. The Dynamic Array (List): The Illusion of Infinity
A classic array is rigid—a contiguous, fixed block of memory. A dynamic array (what Python calls a list) creates the illusion of infinite space by managing capacity under the hood. When the vessel is full, it provisions a larger one and carefully migrates the contents.
import ctypes class DynamicArray: def __init__(self): self.n = 0 # Count of actual elements self.capacity = 1 # Total slots available self.A = self._make_array(self.capacity) def append(self, obj): # The threshold is breached. Time to expand. if self.n == self.capacity: self._resize(2 * self.capacity) self.A[self.n] = obj self.n += 1 def _resize(self, c): B = self._make_array(c) # Forging the new, larger array for k in range(self.n): B[k] = self.A[k] # The great migration of data self.A = B self.capacity = c def _make_array(self, c): # Asking the OS for a raw C-array of PyObject pointers return (c * ctypes.py_object)()
2. Tuples: The Immutable Promise
If a dynamic array is water shaping to its container, a Tuple is ice. Once crystallized, its shape cannot be altered. By deliberately omitting reallocation logic and mutator magic methods (like __setitem__), we forge a structure that is thread-safe, predictable, and structurally rigid.
class StaticTuple: def __init__(self, *args): self.n = len(args) self.A = self._make_array(self.n) # Populate the array once, and never again. for i in range(self.n): self.A[i] = args[i] def __getitem__(self, k): if not 0 <= k < self.n: raise IndexError('Seek not what is beyond the boundary') return self.A[k] def _make_array(self, c): return (c * ctypes.py_object)() # Notice the absence of __setitem__. The seal is absolute.
"The clay is molded to form a vessel, but it is the empty space within that gives it utility.
So too is the memory of the machine; we structure the void with arrays and hashes, bringing order to chaos, mapping the infinite variations of reality into the finite bounds of silicon."
— Meditations on the Architecture of Data
3. HashMaps: The Alchemy of Keys
A list requires you to know where your data is (the index). A HashMap allows you to know what your data is (the key). This is achieved through a hashing function—an algorithmic alchemist that transforms strings, tuples, and objects into predictable memory addresses.
But what happens when two distinct keys yield the same address? This is a Hash Collision. Here, we resolve it using Linear Probing: if a slot is occupied by a stranger, we simply step forward to the next available slot.
class HashMap: def __init__(self, capacity=8): self.capacity = capacity self.size = 0 # Parallel arrays holding the duality of Key and Value self.keys = self._make_array(self.capacity) self.values = self._make_array(self.capacity) def _hash(self, key): # Map the infinite universe of keys into finite capacity return hash(key) % self.capacity def put(self, key, value): # If the map is 70% full, the universe must expand (omitted for brevity) if self.size / self.capacity >= 0.7: self._resize(self.capacity * 2) idx = self._hash(key) # Linear Probing: Traverse until an empty void is found while self.keys[idx] is not None: if self.keys[idx] == key: # The entity exists; overwrite its truth self.values[idx] = value return # Collision! Seek the adjacent slot idx = (idx + 1) % self.capacity self.keys[idx] = key self.values[idx] = value self.size += 1 def _make_array(self, c): return (c * ctypes.py_object)()
4. Sets: The Essence of Uniqueness
In philosophy, a Set represents existence without attribute. Is the entity present? Yes or No. Structurally, we do not need to reinvent the wheel. A Set is merely a HashMap stripped of its values. We map the data as the key, and store a meaningless placeholder (like True) as the value.
class HashSet: def __init__(self): # We inherit the architecture of the HashMap self._map = HashMap() def add(self, key): # The key is the truth. The value is an arbitrary shadow. self._map.put(key, True) def contains(self, key): try: self._map.get(key) return True except KeyError: return False
🛠️ Day 1 Synthesis: The Master Builder
By relying on ctypes to manipulate memory, we have stepped outside Python's walled garden. You now understand that a list is just an array playing a shell game, a tuple is a frozen artifact, and dicts/sets are geometric projections governed by hash functions.
We have built the foundational blocks. But what happens when we need to define relationships between data? Tomorrow, we abandon contiguous memory. We explore pointers, references, and the elegant chains of Linked Lists and Trees. Welcome to Day 2: The Nodes of Destiny.
Comments
Post a Comment
?: "90px"' frameborder='0' id='comment-editor' name='comment-editor' src='' width='100%'/>