Backend Serialization — JSON, Pickle Opcodes & The Universal Type Fallacy (2026)
Day 16: Teleporting State — Serialization, Opcodes, and The Universal Type Fallacy
Context: In Day 15, our Router successfully matched an incoming HTTP request to a Python function in memory. But how did the data actually cross the internet? Python objects do not exist in the physical wires connecting two servers. Only raw bytes exist.
The Fundamental Problem of Networking
When you have a Python User object in your RAM, that object is a complex web of memory addresses and pointers. If you want to send that User to a microservice in London, you cannot just send the memory pointers. The server in London has a completely different physical RAM chip; your memory addresses mean nothing to it.
To cross a network, or to be saved to a hard drive, you must dismantle your complex, 3D memory structure, flatten it into a 1D stream of bytes, send it over the wire, and rebuild it on the other side. This is the process of Serialization (flattening) and Deserialization (rebuilding).
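The flatten-and-rebuild cycle can be sketched by hand with the standard library's struct module. The wire layout below (a 4-byte id, a 2-byte length, then UTF-8 bytes) and the function names are assumptions made up for this illustration, not a format used elsewhere in this series.

```python
import struct

# Hypothetical wire layout for this sketch, in network byte order:
#   4-byte unsigned user id | 2-byte name length | UTF-8 bytes of the name
HEADER = "!IH"

def serialize_user(user_id: int, name: str) -> bytes:
    """Flatten the two fields into one linear run of bytes."""
    encoded = name.encode("utf-8")
    return struct.pack(HEADER, user_id, len(encoded)) + encoded

def deserialize_user(payload: bytes) -> tuple[int, str]:
    """Rebuild the fields from the byte stream on the receiving side."""
    user_id, name_len = struct.unpack_from(HEADER, payload, 0)
    start = struct.calcsize(HEADER)
    name = payload[start:start + name_len].decode("utf-8")
    return user_id, name

wire_bytes = serialize_user(99, "Ada")
print(wire_bytes)                    # raw bytes, no pointers or addresses
print(deserialize_user(wire_bytes))  # (99, 'Ada')
```

Nothing in wire_bytes refers to a memory address, which is why the receiving machine can rebuild the values in its own RAM.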
1. The Universal Type Fallacy (Why int != int)
Junior engineers often ask: "Why do we need JSON? An integer is an integer. Why can't we just make universal types where a Python int is identical to a C++ int and a Java int?"
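Because hardware and language designs disagree on how to store data. A Python int is an arbitrary-precision object, a Java int is always a 32-bit signed value, and a C++ int has a platform-dependent width (commonly 32 bits). Even when the widths match, machines can disagree on byte order (big-endian versus little-endian).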
Serialization protocols like JSON act as the Universal Translator, bridging the gap between irreconcilable hardware differences.
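A small sketch of the mismatch, using only the standard library. The variable names are illustrative, and the 32-bit slot stands in for what C++ and Java typically call int.

```python
import json
import struct

big = 2**40  # perfectly fine for Python's arbitrary-precision int

# A fixed-width signed 32-bit slot cannot hold the same value.
try:
    struct.pack("<i", big)
    fits_in_32_bits = True
except (struct.error, OverflowError):
    fits_in_32_bits = False

# Even the value 1 has two byte layouts, depending on byte order.
little_endian = (1).to_bytes(4, "little")
big_endian = (1).to_bytes(4, "big")

# JSON sidesteps both problems by writing the number as decimal text.
as_text = json.dumps({"n": big})

print(fits_in_32_bits)
print(little_endian, big_endian)
print(as_text)
```

Each receiver then parses the decimal text into whatever numeric type its own language provides, which is the translation step the paragraph above describes.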
2. Homogeneous Acceleration (JS to JS)
Because JSON is text, it must be parsed. When a server receives 100MB of JSON, the CPU must scan every character (quotes, brackets, colons) to work out where strings end and numbers begin. For large payloads this parsing is CPU-intensive.
However, when a Node.js server talks to another Node.js server (or Python to Python), both ends run the same runtime and object model (such as the V8 engine for JS). Because they are Homogeneous, they can skip text parsing entirely.
Instead of converting an object into a JSON string, JavaScript can use V8's internal serialization (Structured Clone), and Python can use Pickle. These protocols write compact binary formats designed around the language's own object model, so types such as dates and class instances survive the trip without manual conversion. For large payloads, skipping the text-parsing phase can make JS-to-JS or Python-to-Python communication noticeably faster, although the gain depends on the shape of the data and should be measured rather than assumed.
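Whether the binary path actually wins depends on the payload, so it is worth measuring. The following is a rough timing sketch, assuming a made-up list of small records; the numbers it prints will differ between machines and Python versions, so no expected timings are shown.

```python
import json
import pickle
import timeit

# Hypothetical payload for this sketch: 20,000 small records.
records = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(20_000)]

json_text = json.dumps(records)
pickle_bytes = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)

# Time only the deserialization step, which is where text parsing happens.
json_seconds = timeit.timeit(lambda: json.loads(json_text), number=3)
pickle_seconds = timeit.timeit(lambda: pickle.loads(pickle_bytes), number=3)

print(f"json.loads:   {json_seconds:.3f}s for {len(json_text):,} characters")
print(f"pickle.loads: {pickle_seconds:.3f}s for {len(pickle_bytes):,} bytes")
```

Both paths rebuild an equal list of dictionaries; only the cost of getting there differs.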
3. In Practice: JSON vs Pickle
Let's look at how we deploy these two protocols in Python architecture.
import json
import pickle
import datetime
# ==========================================
# SCENARIO 1: JSON (Universal & Secure)
# ==========================================
data = {"user_id": 99, "role": "admin"}
# Dumps (Python -> String)
json_payload = json.dumps(data)
print(f"JSON String: {json_payload}")
# Loads (String -> Python)
parsed_data = json.loads(json_payload)
print(f"Restored: {parsed_data['role']}")
OUTPUT:
JSON String: {"user_id": 99, "role": "admin"}
Restored: admin
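# ==========================================
# SCENARIO 2: PICKLE (Python-Only & Unsafe on Untrusted Input)
# ==========================================
class UserSession:
    def __init__(self, user_id):
        self.user_id = user_id
        self.login_time = datetime.datetime.now()

session = UserSession(99)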
# Pickle flattens the custom Python object AND the datetime object perfectly.
binary_payload = pickle.dumps(session)
print(f"Pickle Bytes: {binary_payload[:20]}...")
# Restoring the object
restored_session = pickle.loads(binary_payload)
print(f"Restored Object Type: {type(restored_session)}")
OUTPUT (exact bytes vary with the Python version and pickle protocol):
Pickle Bytes: b'\x80\x04\x95A\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c'...
Restored Object Type: <class '__main__.UserSession'>
🔥 PRO UPGRADE: ORJSON
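If you are building high-performance APIs, consider replacing the built-in json module on hot paths. The standard library's json ships with a C accelerator, but it remains a general-purpose implementation. Instead, pip install orjson. It is a JSON library written in Rust that its own published benchmarks show to be several times faster, and it natively serializes Python datetimes instead of raising a TypeError. It is not a strict drop-in replacement: orjson.dumps() returns bytes rather than str, and formatting is controlled through an option argument instead of json's keyword arguments.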
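For comparison, here is how the standard library behaves with a datetime, and the usual workaround via the default hook of json.dumps(). The encode_extra helper is a hypothetical name for this sketch.

```python
import datetime
import json

event = {"user_id": 99, "login_time": datetime.datetime(2026, 1, 15, 9, 30, 0)}

# The standard library refuses types it does not know how to encode.
try:
    json.dumps(event)
    raised = False
except TypeError:
    raised = True

def encode_extra(value):
    """Fallback hook: json calls this only for types it cannot encode itself."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")

payload = json.dumps(event, default=encode_extra)
print(raised)   # True
print(payload)  # {"user_id": 99, "login_time": "2026-01-15T09:30:00"}
```

The receiver gets an ISO 8601 string and has to convert it back to a datetime itself, since JSON has no date type.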
4. Under the Hood: Why JSON?
If Pickle handles custom objects and binary speeds, why does the entire internet run on JSON?
1. Human Readability: A developer can intercept a network packet, read the JSON text, and instantly debug the payload. You cannot read binary Pickle data.
2. Language Agnosticism: JSON is universal. A frontend written in JavaScript can send JSON to a backend written in Golang, which sends JSON to a microservice written in Python. It is the Lingua Franca of the internet.
3. Security: JSON is just data. It cannot execute code. As we are about to see, Pickle is highly dangerous.
5. Under the Hood: How Pickle Generates Opcodes
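When you run pickle.dumps(), it does not just "save data". It generates a sequence of instructions called Opcodes.
When you run pickle.loads(), Python runs a Stack-Based Virtual Machine. It reads the opcodes one by one and executes them to rebuild the object in RAM. For example, in the original text-based opcode set (opcodes are case-sensitive, and newer protocols add compact binary variants such as BININT):
- I (INT): Tells the VM to push an integer onto the stack.
- S (STRING): Tells the VM to push a string onto the stack.
- l (LIST): Tells the VM to build a list from the stack items above the most recent mark; with nothing above the mark, the result is an empty list. (Uppercase L is a different opcode, LONG.)
- a (APPEND): Tells the VM to pop the top item and append it to the list beneath it.
- . (STOP): Ends the execution and returns the object on top of the stack.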
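The stack-machine idea can be sketched in a few lines. This is a toy, not the real pickle wire format: the (opcode, argument) tuple encoding and the run_toy_vm name are assumptions of this sketch, and LIST here simply pushes an empty list.

```python
def run_toy_vm(program):
    """Execute (opcode, argument) pairs on a stack and return the final object."""
    stack = []
    for opcode, argument in program:
        if opcode == "INT":
            stack.append(int(argument))
        elif opcode == "STRING":
            stack.append(str(argument))
        elif opcode == "LIST":
            stack.append([])          # push an empty list
        elif opcode == "APPEND":
            item = stack.pop()        # pop the top item...
            stack[-1].append(item)    # ...and append it to the list beneath it
        elif opcode == "STOP":
            return stack.pop()        # the rebuilt object is on top of the stack
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    raise ValueError("program ended without STOP")

program = [
    ("LIST", None),
    ("INT", "99"),
    ("APPEND", None),
    ("STRING", "admin"),
    ("APPEND", None),
    ("STOP", None),
]
print(run_toy_vm(program))  # [99, 'admin']
```

The "serialized" form is just the instruction list; the object only exists again once the VM has executed it.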
🛠️ Day 16 Project: Disassembling the Machine
I have built the Serialization Engine in the official GitHub repository. It includes a MiniPickle class that builds a Stack-Based Virtual Machine entirely from scratch so you can see exactly how bytes are converted into RAM states.
- Observe the loads() method. See how it iterates through the byte stream, matching opcodes (I, S, L) and pushing data onto a Python stack = [].
- Run the disassemble_real_pickle() block. We use Python's built-in pickletools.dis() to expose the actual C-level opcodes of a real Python object.
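If you do not have the repository at hand, a standalone sketch of the same idea looks like this. It is not the repository's disassemble_real_pickle() code, only an independent illustration using the standard pickletools module on a small list pickled with protocol 0.

```python
import io
import pickle
import pickletools

# Protocol 0 is the text-based protocol, so its opcodes are easy to read.
payload = pickle.dumps([99, "admin"], protocol=0)

# Human-readable disassembly, captured into a string buffer.
listing = io.StringIO()
pickletools.dis(payload, out=listing)
print(listing.getvalue())

# The same opcodes, collected programmatically.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
print(opcode_names)
```

Neither dis() nor genops() executes the pickle, which makes them a reasonable way to inspect a payload without loading it.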
6. FAQ: Serialization Security
Why is Pickle considered a critical security vulnerability?
Pickle is a Virtual Machine executing opcodes. The real Pickle protocol contains a REDUCE opcode, which pops a callable and its arguments off the stack and calls it to rebuild an object. That callable can be any importable Python function. If an attacker sends a maliciously crafted Pickle payload to your API, the REDUCE opcode can instruct your server to execute os.system('rm -rf /'). Never unpickle data from an untrusted source.
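A harmless sketch of the mechanism: the Payload class below uses __reduce__ to make the unpickler call os.getcwd() in place of rebuilding the object. The class and function names are hypothetical. The SafeUnpickler follows the "restricting globals" pattern from the Python documentation by overriding find_class(); it narrows the attack surface, and the advice not to unpickle untrusted data still stands.

```python
import io
import os
import pickle

class Payload:
    """Stand-in for an attacker-controlled object (harmless in this sketch)."""
    def __reduce__(self):
        # On load, the unpickler will call os.getcwd().
        # A real attack would return something like (os.system, ("...",)).
        return (os.getcwd, ())

malicious_bytes = pickle.dumps(Payload())
result = pickle.loads(malicious_bytes)  # runs os.getcwd(); no Payload comes back
print(result)

class SafeUnpickler(pickle.Unpickler):
    """Refuse every global lookup, so REDUCE has nothing to call."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

def safe_loads(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()

print(safe_loads(pickle.dumps({"a": [1, 2]})))  # plain containers still load
try:
    safe_loads(malicious_bytes)
except pickle.UnpicklingError as error:
    print(f"blocked: {error}")
```

Plain dicts, lists, strings, and numbers need no global lookups, which is why they still load under the restricted unpickler.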
What is MessagePack?
Should I cache HTML strings or serialized JSON objects in Redis?
It depends on the consumption pattern. If you are caching a fully rendered widget that just needs to be blasted to the frontend, cache the final HTML string to save the CPU from re-rendering. If the data needs to be mutated, aggregated, or filtered by another backend service before hitting the user, cache it as a MessagePack or JSON byte stream.
What is the difference between Serialization and Marshalling?
Should I use Protocol Buffers (gRPC) instead of JSON?
For internal microservice-to-microservice communication, usually yes. Protocol Buffers (Protobuf) by Google serializes into a compact binary format. Unlike JSON, it requires a strict schema (a .proto file) on both ends, which makes it faster to parse and type-checked, but less human-readable for debugging.
7. Resources & Citations
Verify the concepts discussed in this article using the official documentation:
- Python Official Docs: The pickle module and security warnings. Explicitly states "never unpickle data received from an untrusted or unauthenticated source."
- JSON Specification: JSON.org and RFC 8259, detailing the universal data interchange standard.
- High-Performance Serialization: The orjson GitHub repository detailing its Rust-based memory benchmarks vs the standard library json module.
- V8 Engine Internals: MDN: Structured Clone Algorithm explaining how JavaScript serializes objects natively in memory.