Postgres Database, Data Types in Postgres , and The Write Penalty (2026)
BACKEND ARCHITECTURE MASTERY
Day 8: Databases Part 1 - Postgres, orjson, and The Physics of Writes
TL;DR (Summary)
In this architecture deep dive, we explore why PostgreSQL dominates backend development. You will learn the fundamental differences between RAM and Disk storage, how database indices cause a "Write Penalty," and how to leverage advanced Postgres data types like JSONB and UUIDs for high-concurrency systems.
Early in my career, I built an analytics dashboard that tracked user clicks. I stored everything in a massive, flat JSON file on the server. It worked beautifully in local testing. On launch day, two users tried to click a button at the exact same millisecond. The Python script locked the file for User A, crashed for User B, and corrupted the entire JSON structure. We lost a week of data. That day, I learned that a Database is not just a place to store data—it is an operating system designed entirely to survive concurrency.
1. The Storage Divide: Disk vs. RAM
Before we write SQL, you must understand the hardware. Every database architecture decision you make is a brutal negotiation between speed and survival.
- RAM-Based (In-Memory): Systems like Redis or Memcached store data in the server's volatile memory. Retrieving data takes nanoseconds. It is violently fast. The catch? If the server loses power, or the process restarts, the data is instantly annihilated. We use RAM for caching, session states, and ephemeral queues.
- Disk-Based (Persistent): Systems like PostgreSQL or MySQL write data to physical SSDs or NVMe drives. This guarantees survival. If the server catches fire, you can pull the drive out, plug it into a new motherboard, and your users' bank balances are still there. The catch? Disk I/O is orders of magnitude slower (microseconds to milliseconds instead of nanoseconds).
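To make the gap concrete, here is a deliberately crude Python sketch that times repeated reads of the same record from an in-memory dictionary versus a file on disk. Absolute numbers vary wildly by hardware and OS caching, but the ordering holds:

```python
import json
import tempfile
import time


def time_ram_vs_disk(n: int = 1_000):
    """Return (ram_seconds, disk_seconds) for n reads of the same record."""
    record = {"user_id": 42, "balance": 100.0}

    # RAM: the record lives in a Python dict (volatile, gone on restart).
    cache = {"user:42": record}
    start = time.perf_counter()
    for _ in range(n):
        _ = cache["user:42"]
    ram_seconds = time.perf_counter() - start

    # Disk: the record lives in a file (survives restarts, but every
    # read pays the I/O and deserialization tax).
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(record, f)
        path = f.name
    start = time.perf_counter()
    for _ in range(n):
        with open(path) as f:
            _ = json.load(f)
    disk_seconds = time.perf_counter() - start

    return ram_seconds, disk_seconds


ram_s, disk_s = time_ram_vs_disk()
print(f"RAM: {ram_s:.6f}s | Disk: {disk_s:.6f}s")
```

Even with the OS page cache hiding most of the physical disk latency, the file path loses badly, because every read involves a syscall round-trip plus parsing.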
2. What Actually is a DBMS?
A Database Management System (DBMS) is not a hard drive. It is a highly complex software engine that sits between your application and the hard drive.
When 10,000 users try to buy the last Taylor Swift concert ticket at the exact same millisecond, the DBMS enforces Concurrency Control. It lines those requests up, locks the specific row in memory, processes the transaction, ensures it is mathematically valid, writes it safely to the physical disk (Durability), and unlocks the row. It prevents race conditions that would otherwise sell the same ticket 10,000 times. It is the ultimate referee of state.
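The referee's job can be sketched in plain Python. The `threading.Lock` below stands in for the DBMS's row-level lock; this is an analogy for illustration, not something you would implement yourself in production — the database does it for you:

```python
import threading

tickets_remaining = 1
row_lock = threading.Lock()  # stands in for the DBMS's row-level lock
successful_buyers = []


def buy_ticket(buyer_id: int) -> None:
    global tickets_remaining
    # Without this lock, two threads could both observe
    # tickets_remaining == 1 and both "sell" the same last ticket:
    # a classic race condition.
    with row_lock:
        if tickets_remaining > 0:
            tickets_remaining -= 1
            successful_buyers.append(buyer_id)


# 100 concurrent buyers, one ticket.
threads = [threading.Thread(target=buy_ticket, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(successful_buyers))  # 1 — the ticket is sold exactly once
```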
3. Why PostgreSQL Won the Internet
In the 2010s, there was a massive hype cycle around NoSQL databases (like MongoDB) because developers were tired of rigid SQL schemas. Then, PostgreSQL updated its engine to include the JSONB data type. It allowed developers to store unstructured JSON documents inside a relational, strictly-typed SQL database, and query inside that JSON instantly.
Overnight, Postgres cannibalized 80% of the NoSQL use cases. In 2026, Postgres is the undisputed king of backend architecture. It is open-source, massively extensible, and robust enough to run the financial systems of entire countries while maintaining the flexibility of a document store.
4. Postgres Data Types (Stop Using VARCHAR)
Junior developers create tables using auto-incrementing integers for IDs and VARCHAR for everything else. Senior architects reach for specific types: UUID primary keys (non-guessable, safe to generate across distributed services), TIMESTAMPTZ for timezone-aware timestamps, TEXT instead of VARCHAR (in Postgres, VARCHAR(n) has no performance advantage over TEXT), NUMERIC for money, and JSONB for semi-structured metadata.
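As one illustration — the table and column names here are hypothetical — a log table following that advice might look like this (`gen_random_uuid()` is built into Postgres 13+; older versions need the pgcrypto extension):

```python
# Hypothetical DDL illustrating the data-type advice above.
LOG_EVENTS_DDL = """
CREATE TABLE log_events (
    log_id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- non-guessable ID
    service_id UUID NOT NULL,
    level      TEXT NOT NULL,            -- TEXT is as fast as VARCHAR(n) in Postgres
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),  -- always timezone-aware
    metadata   JSONB                     -- flexible, indexable document payload
);
"""
print(LOG_EVENTS_DDL)
```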
5. The Physics of Writes and The Write Penalty
Here is the brutal truth of Database Optimization: Every time you write data to a disk, you pay a tax. We call this the Write Penalty.
If you don't have indexes on your tables, finding a specific user requires a Table Scan. The database must physically read every single row on the disk from top to bottom until it finds the match. A Table Scan on 10 million rows will destroy your SQL Performance and take your API offline.
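The difference is easy to feel even in pure Python: scanning a list of rows is O(n), while a hash-style lookup is O(1). A toy sketch of the access-pattern contrast (a dict standing in for an index — not a real B-Tree):

```python
# Toy model: a "table" is a list of rows; an "index" maps email -> row position.
rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(100_000)]


def table_scan(email: str):
    # Reads every row until a match: O(n), like a sequential scan on disk.
    for row in rows:
        if row["email"] == email:
            return row
    return None


# Building the index costs time and memory up front...
email_index = {row["email"]: i for i, row in enumerate(rows)}


def index_lookup(email: str):
    # ...but each lookup becomes O(1) instead of O(n).
    pos = email_index.get(email)
    return rows[pos] if pos is not None else None


assert table_scan("user99999@example.com") == index_lookup("user99999@example.com")
```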
To fix Table Scans, we create a Database Index (specifically a B-Tree Index). Think of it like the index at the back of a textbook. It tells the database exactly what page the data lives on. However, indexes are physically duplicated data structures. If a table has 5 indexes, every single INSERT now requires six separate disk writes (one for the row, five to update the B-Trees). This multiplies your Write Penalty.
"Architecture is the art of trade-offs. 95% of web applications operate on a Read-Heavy Architecture. Users view their feeds, scroll products, and read articles 100 times more often than they post, buy, or write. Therefore, we gladly accept the severe Write Penalty of maintaining indexes, because the alternative—a catastrophic Table Scan during a read operation—is unacceptable."
6. Code: High-Concurrency Relational Logging
To mitigate the Write Penalty at the application layer, we don't open a new database connection for every query. We use Connection Pooling and binary protocols. Python's asyncpg is one of the fastest drivers available because it speaks the Postgres binary protocol directly, bypassing text-parsing overhead.
Furthermore, we use orjson (a Rust-backed library) instead of Python's standard json. In high-throughput logging, JSON serialization is often the CPU bottleneck before the database even gets hit. orjson eliminates this.
async def run_complex_analytics(pool):
    # Demonstrates powerful SQL querying: JOINs and JSONB extraction.
    async with pool.acquire() as conn:
        # Querying INSIDE the metadata JSONB field and casting types.
        # We find requests where latency was > 2000ms by joining the services table.
        slow_query = """
            SELECT s.service_name, l.created_at,
                   l.metadata->>'latency_ms' AS latency,
                   l.metadata->>'endpoint' AS endpoint
            FROM log_events l
            JOIN services s ON l.service_id = s.service_id
            WHERE (l.metadata->>'latency_ms')::int > 2000
            ORDER BY (l.metadata->>'latency_ms')::int DESC
            LIMIT 5;
        """
        slow_records = await conn.fetch(slow_query)
        for r in slow_records:
            print(f"Service: {r['service_name']} | Latency: {r['latency']}ms")
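For completeness, creating the pool might look like the sketch below. The DSN is a placeholder, and registering orjson as the JSONB codec is one common pattern, not the only one; the third-party imports are deferred inside the function so the snippet stands alone even where asyncpg and orjson are not installed:

```python
import asyncio


async def init_pool(dsn: str = "postgresql://app:secret@localhost/logs"):
    """Create a shared connection pool; call once at application startup."""
    import asyncpg  # deferred: third-party driver, assumed installed in production
    import orjson   # deferred: Rust-backed JSON library

    async def setup(conn):
        # Encode/decode JSONB with orjson instead of the stdlib json module.
        await conn.set_type_codec(
            "jsonb",
            encoder=lambda v: orjson.dumps(v).decode(),
            decoder=orjson.loads,
            schema="pg_catalog",
        )

    # min/max sizes are illustrative; tune them to your workload.
    return await asyncpg.create_pool(dsn, min_size=5, max_size=20, init=setup)
```

With the pool in hand, `await run_complex_analytics(pool)` from the example above just borrows a warmed-up connection instead of paying the TCP and authentication handshake on every query.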
7. Alternatives: The NoSQL Ecosystem
While Postgres is the default, certain architectural problems require entirely different storage paradigms to escape relational limits.
- Document Stores (MongoDB): Optimized for rapidly changing schemas and massive horizontal scaling. Instead of tables and rows, you store loose JSON documents.
- Wide-Column Stores (Cassandra): Built for extreme write-heavy workloads. If you are logging millions of IoT sensor readings per second, Cassandra distributes the Write Penalty across a masterless cluster better than anything else.
- Graph Databases (Neo4j): Built to query relationships. If you need to find "Friends of Friends who also bought Product X", a relational database will choke on massive JOIN operations. A Graph database traverses those nodes instantly.
We established that Table Scans are fatal, and Indexes fix them. But what exactly is an index under the hood? Tomorrow, we rip open the database engine to explore Index Cardinality, Clustered vs Composite Indexes, Postgres EXPLAIN ANALYZE, and the legendary B+ Tree data structure that powers the internet's search capabilities.
📚 Deep Diver Resources
If you want to master relational limits and NoSQL architectures, these are mandatory reading:
- Logic & Legacy: Advanced SQLite, Indices & The N+1 Problem - Before tackling the massive Postgres engine, read our deep dive into serverless SQLite architecture, indexing fundamentals, and solving N+1 query leaks using aiosqlite.
- Apache Cassandra Architecture - How wide-column stores achieve masterless, distributed writes for IoT.
- Neo4j Graph Database Concepts - Understand when relationships are more important than the data itself.
- asyncpg Documentation - The official docs for the fastest Python Postgres driver in existence.
Frequently Asked Questions
Q: If memory is so fast, why don't we build entire databases in RAM?
A: We do! In-memory databases like Redis and Memcached are exactly this. However, RAM is incredibly expensive compared to SSD storage. Storing a 5TB relational dataset entirely in RAM would bankrupt a startup, and managing the persistence (saving snapshots to disk to prevent data loss on crash) becomes a massive engineering bottleneck.
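The trade-off shows in how little code an in-memory store needs once you give up durability. A minimal TTL cache in stdlib Python — a toy stand-in for what Redis provides, minus persistence, eviction policies, and networking:

```python
import time


class TTLCache:
    """In-memory key/value store with per-key expiry. Dies with the process."""

    def __init__(self) -> None:
        self._data = {}  # key -> (expiry_deadline, value)

    def set(self, key: str, value, ttl_seconds: float = 60.0) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value


cache = TTLCache()
cache.set("session:42", {"user_id": 42}, ttl_seconds=0.05)
print(cache.get("session:42"))  # {'user_id': 42}
time.sleep(0.06)
print(cache.get("session:42"))  # None — expired, exactly like volatile RAM data
```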
Q: Why did you use `orjson` instead of the built-in Python `json` library?
A: Python's built-in json library (even with its C accelerator) is comparatively slow. orjson is written in Rust. When you are processing millions of log entries or API payloads, CPU serialization becomes a massive bottleneck. orjson serializes dictionaries straight to byte-strings several times faster, freeing up the CPU to handle more concurrent requests.
Q: Should I use MongoDB for a new project because it's schema-less?
A: "Schema-less" is a dangerous myth. If your database doesn't enforce the schema, your application code must enforce it (checking if fields exist before accessing them, causing massive code bloat). For 90% of projects, Postgres with a well-designed relational schema—using JSONB columns for the truly dynamic parts like metadata—is much safer and more performant long-term.
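That "code bloat" is concrete: without the database checking types, every read path needs defensive validation like the hypothetical helper below — checks that Postgres's NOT NULL and column types would give you for free:

```python
def validate_user_doc(doc: dict) -> dict:
    """App-side schema check that a schema-less store forces you to hand-write."""
    required = {"email": str, "age": int}
    for field, expected_type in required.items():
        if field not in doc:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(doc[field], expected_type):
            raise TypeError(f"{field} must be {expected_type.__name__}")
    return doc


validate_user_doc({"email": "a@b.com", "age": 30})   # passes
# validate_user_doc({"email": "a@b.com"})            # would raise ValueError
```

Multiply this by every collection and every access path, and the "saved" schema work reappears scattered through your application code.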