
Python Exception Handling: Custom Exceptions, try/except/else & Chaining (2026)

Day 24: Fault Tolerance — Exception Hierarchies & Custom Domains

18 min read · Series: Logic & Legacy · Day 24 / 30 · Level: Senior Architecture

Context: We have connected to raw sockets and streamed our logs. But what happens when the network drops mid-connection? What happens when the JSON payload is missing a key? A crashed system is a tragedy; a silently failing system is a liability.

"Pokemon Exception Handling: Gotta Catch 'Em All"

The most dangerous code in a junior developer's repository looks like this:

try:
    do_something_complex()
except Exception as e:  # Catching literally everything
    print("Something went wrong.")

This is a cardinal architectural sin. It throws away the traceback, masks genuine bugs (a typo that raises NameError or AttributeError gets caught and "handled" like any other failure), and forces the application to limp forward in a corrupted state. Senior Architects do not suppress errors; they manage them with surgical precision.
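A minimal sketch of the surgical alternative, assuming a module-level logger and a hypothetical do_something_complex() that fails with ValueError on bad input. It catches only the failure it can handle and uses logger.exception() so the full traceback lands in the log. Anything else, such as a TypeError from a real bug or a Ctrl+C, keeps propagating.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def do_something_complex(raw: str) -> int:
    # Hypothetical stand-in for the real work: fails with ValueError on bad input
    return int(raw)


def run_job(raw: str) -> Optional[int]:
    try:
        return do_something_complex(raw)
    except ValueError:
        # Catch only the failure you know how to handle, and log it WITH its traceback
        logger.exception("Job input %r was malformed", raw)
        return None
```

Calling run_job("abc") logs the ValueError with its traceback and returns None. Calling run_job(None) raises a TypeError, which is deliberately not caught, so the bug stays loud.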

Table of Contents 🕉️
  1. The Hierarchy of Failure
  2. The Top 10 Core Exceptions
  3. The Anatomy of a Rescue (Try/Except/Else/Finally)
  4. Architecting Custom Domain Exceptions
  5. Exception Chaining (The "From" Keyword)
  6. FAQ: Exception Architecture

1. The Hierarchy of Failure (BaseException vs Exception)

In Python, exceptions are objects, and they follow a strict inheritance tree. At the very top of the universe sits BaseException.

Never, under any circumstances, use a bare except: or catch BaseException and swallow it. Why? A bare except: is shorthand for except BaseException:, and BaseException also covers SystemExit (raised by sys.exit()) and KeyboardInterrupt (raised by Ctrl+C). Swallow those, and you can no longer shut down your own server with Ctrl+C. Your script becomes an immortal zombie.

Every logical error you care about inherits from Exception (which inherits from BaseException). When you must catch a broad error, you catch Exception.
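You can inspect the tree yourself. This short sketch walks each class's __mro__ (its method resolution order, which for these built-ins is simply the inheritance chain) and checks whether an except Exception: clause would catch it.

```python
def lineage(exc_type: type) -> list[str]:
    """Class names from exc_type up to BaseException, skipping `object`."""
    return [cls.__name__ for cls in exc_type.__mro__ if cls is not object]


for exc_type in (ValueError, FileNotFoundError, KeyboardInterrupt, SystemExit):
    # issubclass(..., Exception) mirrors the check an `except Exception:` clause performs
    print(" -> ".join(lineage(exc_type)), "| caught by except Exception:", issubclass(exc_type, Exception))

# ValueError -> Exception -> BaseException | caught by except Exception: True
# FileNotFoundError -> OSError -> Exception -> BaseException | caught by except Exception: True
# KeyboardInterrupt -> BaseException | caught by except Exception: False
# SystemExit -> BaseException | caught by except Exception: False
```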

2. The Top 10 Core Exceptions

[Infographic: 10 common Python exceptions in four quadrants: Data Structure Faults (KeyError, IndexError), Data Integrity Faults (TypeError, ValueError), Environmental Faults (FileNotFoundError, ImportError), and Architectural Faults (AttributeError, NotImplementedError, ZeroDivisionError, RecursionError).]

An Architect knows instantly which part of the system failed from the exception name alone. Here are the 10 you must know, grouped by the layer of the system where they originate:

Data Structure Faults

  • KeyError: You asked a dictionary for a key that doesn't exist. Fix: when a missing key is expected, use dict.get('key', default) instead of `dict['key']`.
  • IndexError: You asked a list/tuple for a position outside its valid range (e.g., asking for the 10th item in a 5-item list).

Data Integrity Faults

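  • TypeError: An operation was applied to an object of the wrong type (e.g., "Hello" + 5). Static type checkers such as mypy flag many of these before the code ever runs, which is why we use Type Hints.
  • ValueError: The type is correct, but the value itself is invalid for the operation (e.g., int("Hello"): int() accepts strings, just not this one).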

Environmental Faults

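  • FileNotFoundError: The OS looked for a file or directory and could not find it (a subclass of OSError).
  • ImportError / ModuleNotFoundError: ModuleNotFoundError (a subclass of ImportError) means Python could not find the module anywhere on sys.path; plain ImportError also fires when from module import name asks for a name the module doesn't define. Usually a Virtual Environment failure, such as running code with the wrong interpreter active.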

Architectural Faults

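  • AttributeError: You tried to access a method or attribute that an object does not possess (e.g., [1, 2, 3].append_all()).
  • NotImplementedError: A base class method that subclasses are expected to override raises it as a placeholder, and the developer forgot to override it. (A true Abstract Base Class built with abc.ABC and @abstractmethod fails earlier: instantiating the incomplete subclass raises TypeError.)
  • ZeroDivisionError: You divided (or took a modulo) by zero, e.g., 1 / 0. The universe broke.
  • RecursionError: As discussed in Day 20, the call stack grew past Python's recursion limit (sys.getrecursionlimit(), 1000 by default), a safety limit the interpreter enforces so runaway recursion fails cleanly instead of crashing the process.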
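To see each fault in the wild, the sketch below triggers all ten and reports which exception actually surfaced. Exporter and CsvExporter are hypothetical classes invented for the NotImplementedError case, and the file path and module name are deliberately nonexistent.

```python
import importlib


def _recurse_forever(depth: int = 0) -> int:
    return _recurse_forever(depth + 1)  # no base case, on purpose


class Exporter:  # hypothetical base class
    def export(self) -> str:
        raise NotImplementedError("subclasses must override export()")


class CsvExporter(Exporter):  # hypothetical subclass that forgot to override export()
    pass


TRIGGERS = {
    "KeyError": lambda: {}["missing"],
    "IndexError": lambda: [1, 2, 3, 4, 5][9],
    "TypeError": lambda: "Hello" + 5,
    "ValueError": lambda: int("Hello"),
    "FileNotFoundError": lambda: open("/no/such/dir/day24.txt"),
    "ModuleNotFoundError": lambda: importlib.import_module("no_such_module_day24"),
    "AttributeError": lambda: [1, 2, 3].append_all(),
    "NotImplementedError": lambda: CsvExporter().export(),
    "ZeroDivisionError": lambda: 1 / 0,
    "RecursionError": _recurse_forever,
}


def raised_name(trigger) -> str:
    """Run trigger and report the name of the exception it raised."""
    try:
        trigger()
    except Exception as exc:  # broad on purpose: this is a diagnostic demo, not app code
        return type(exc).__name__
    return "no exception"


for expected, trigger in TRIGGERS.items():
    print(f"{expected:20} -> {raised_name(trigger)}")
```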

3. The Anatomy of a Rescue (try / except / else / finally)

[Diagram: execution paths of try/except/else/finally. The try block (Danger Zone) either succeeds and flows to else (Safe Zone), fails with a handled error and flows to except (Rescue Zone), or raises an uncaught error that propagates out. Every path passes through finally (Guarantee Zone).]

Most developers know try and except. But shoving 50 lines of code into a try block is a massive anti-pattern. If you put everything inside try, and a completely unrelated function raises a ValueError, your except ValueError: block will catch it by mistake, completely masking the true source of the bug.

To fix this, we use the else block. It strictly separates "the single line of code that might fail" from "the code that should run only if the dangerous code succeeded." Errors inside the else block are not caught by the preceding except block. This narrows your blast radius.
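The routing rules can be checked with a small tracer. This sketch uses hypothetical action and follow_up callables and records which clauses run for a success, a handled failure, and a failure raised inside else.

```python
def run_gate(action, follow_up, trace: list) -> None:
    """Record which clauses of try/except/else/finally run, in order."""
    try:
        trace.append("try")
        action()
    except ValueError:
        trace.append("except")
    else:
        trace.append("else")
        follow_up()  # a ValueError raised here is NOT handled by the except clause above
    finally:
        trace.append("finally")


def fail() -> None:
    int("not a number")  # raises ValueError


def succeed() -> None:
    return None


success, handled, escaped = [], [], []
run_gate(succeed, succeed, success)  # success == ["try", "else", "finally"]
run_gate(fail, succeed, handled)     # handled == ["try", "except", "finally"]
try:
    run_gate(succeed, fail, escaped)
except ValueError:
    # The ValueError from else escaped the gate, but finally still ran first
    print("escaped:", escaped)       # escaped == ["try", "else", "finally"]
```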

The Complete Execution Gate
import json
import logging

logger = logging.getLogger(__name__)


def process_json(raw: str) -> dict:
    # Placeholder for your own processing step; json.loads stands in here
    return json.loads(raw)


def read_and_process_data(filepath: str):
    file = None  # bound up front so the finally block can always check it
    try:
        # The DANGER ZONE: ONLY put the code here that actually interacts with the disk
        file = open(filepath, 'r', encoding='utf-8')
        data = file.read()

    except FileNotFoundError as e:
        # THE RESCUE: Handle specific known failures
        logger.error(f"Missing file: {e.filename}")
        return None

    else:
        # THE SAFE ZONE: Runs ONLY if the try block succeeded.
        # If process_json() throws an error, it will NOT be caught by the except block above!
        return process_json(data)

    finally:
        # THE GUARANTEE: Runs no matter what. Succeeded? Failed? Returned early?
        # Doesn't matter. This runs. Critical for releasing OS resources.
        if file is not None:
            file.close()

Architect's Note: While finally is crucial for cleanup, in modern Python this specific file-closing pattern is almost always written with a Context Manager (the `with` statement), which we covered in depth previously. The with block closes the file on every exit path, so the manual check in finally disappears.
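For comparison, here is a sketch of the same gate using a with block, assuming json.loads as a stand-in for the article's process_json(). The try/except/else structure stays; only the manual finally disappears.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def read_and_process_data(filepath: str):
    try:
        # `with` closes the file on every exit path, replacing the manual finally block
        with open(filepath, "r", encoding="utf-8") as file:
            data = file.read()
    except FileNotFoundError as e:
        logger.error("Missing file: %s", e.filename)
        return None
    else:
        # json.loads stands in for process_json(); its errors are NOT caught above
        return json.loads(data)


# Usage: write a throwaway JSON file, read it back, then read a path that no longer exists
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"day": 24}')
parsed = read_and_process_data(tmp.name)
os.remove(tmp.name)
missing = read_and_process_data(tmp.name)
```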

4. Architecting Custom Domain Exceptions

If a user tries to withdraw $500 from a bank account with a $100 balance, what exception should you raise?

A junior developer raises a ValueError("Insufficient funds"). This forces the higher-level API routing layer to parse strings to figure out what happened (e.g., if "Insufficient" in str(error): return 400). This is horrific design.

A Senior Architect creates Domain-Specific Exceptions. They inherit from Exception to form a custom hierarchy that callers (and your web framework's error handlers) can catch by class instead of by parsing message strings.

Domain Exception Architecture
from dataclasses import dataclass

# 1. Create a Base Exception for your entire module/domain
class BillingError(Exception):
    """Base class for all billing-related faults."""

# 2. Inherit for specific, granular faults
class InsufficientFundsError(BillingError):
    def __init__(self, user_id: int, deficit: float):
        self.user_id = user_id
        self.deficit = deficit
        # Call the parent __init__ to set the error message string
        super().__init__(f"User {user_id} is short by ${deficit:.2f}")

# A minimal stand-in for your real user/account model
@dataclass
class User:
    id: int
    balance: float

# Inside your Business Logic:
def process_withdrawal(user: User, amount: float) -> None:
    if user.balance < amount:
        # We raise a structurally identifiable object, not just a string
        raise InsufficientFundsError(user.id, amount - user.balance)
    user.balance -= amount

# Inside your API routing layer (a Flask-style view returning (body, status_code)):
def withdraw_endpoint(current_user: User, amount: float = 500):
    try:
        process_withdrawal(current_user, amount)
    except InsufficientFundsError as e:
        # We can catch the specific class and access its structured data directly!
        return {"error": "Funds too low", "shortfall_amount": e.deficit}, 400
    return {"balance": current_user.balance}, 200
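The payoff of the base class shows up in a central error handler. This sketch, in the spirit of the per-exception-class handlers many web frameworks offer, maps classes to HTTP statuses and resolves an exception through its MRO. A new subclass (the hypothetical AccountFrozenError) is then handled by the BillingError entry without any extra code.

```python
class BillingError(Exception):
    """Base class for all billing-related faults."""


class InsufficientFundsError(BillingError):
    """The account balance cannot cover the withdrawal."""


class AccountFrozenError(BillingError):
    """Hypothetical: the account is locked by compliance."""


# Register HTTP statuses per class; the base class acts as the catch-all for its domain
STATUS_BY_ERROR = {
    InsufficientFundsError: 400,
    BillingError: 422,
}


def status_for(exc: Exception) -> int:
    """Walk the exception's MRO and return the status of the most specific registered class."""
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    return 500  # anything outside the billing domain is an unexpected server fault
```

Ordering matters only in the sense that the MRO is walked from most to least specific, so InsufficientFundsError gets its own status even though it is also a BillingError.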

5. Exception Chaining (The "from" Keyword)

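Often, a low-level built-in exception (like a KeyError from a database row) needs to be translated into a high-level Domain Exception (like UserNotFoundError). If you simply raise the new error inside the except block, Python 3 still attaches the original as implicit context, but the console reports it as "During handling of the above exception, another exception occurred:", which reads as if your handler itself crashed by accident. Anyone debugging has to guess whether the translation was intentional.

You must use explicit exception chaining via the raise ... from ... syntax. It stores the original error as the new exception's __cause__ and tells the reader that one failure deliberately led to the other. (raise ... from None does the opposite: it hides the original when it adds nothing.)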

Traceback Preservation
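# Hypothetical domain exception and in-memory "database" so the example runs
class UserNotFoundError(Exception):
    def __init__(self, user_id: str):
        self.user_id = user_id
        super().__init__(f"No user with id {user_id!r}")

db = {"user_1": {"name": "Ada"}}

def get_user_profile(user_id: str):
    try:
        # db is a plain dict, so a missing user_id raises KeyError
        return db[user_id]
    except KeyError as original_error:
        # We translate the unhelpful KeyError into a Domain Error,
        # but we link them together using 'from original_error'
        raise UserNotFoundError(user_id) from original_error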

In the console, Python now prints the KeyError: 'user_99' traceback first, then the line "The above exception was the direct cause of the following exception:", and finally the UserNotFoundError traceback.
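The difference between implicit chaining, explicit chaining, and from None is visible on the exception object itself. This sketch uses a hypothetical UserNotFoundError and an empty dict as the database, and records __cause__, __context__, and __suppress_context__ for each style.

```python
import traceback


class UserNotFoundError(Exception):
    """Hypothetical domain error for this demo."""


db: dict = {}  # empty, so every lookup misses


def lookup(user_id: str, mode: str):
    try:
        return db[user_id]
    except KeyError as err:
        if mode == "explicit":
            raise UserNotFoundError(user_id) from err   # sets __cause__
        if mode == "suppressed":
            raise UserNotFoundError(user_id) from None  # hides the original in tracebacks
        raise UserNotFoundError(user_id)                # implicit: only __context__ is set


def capture(mode: str) -> UserNotFoundError:
    try:
        lookup("user_99", mode)
    except UserNotFoundError as exc:
        return exc
    raise AssertionError("lookup() was expected to raise")


reports = {}
for mode in ("explicit", "implicit", "suppressed"):
    exc = capture(mode)
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    reports[mode] = (exc, text)
    print(f"{mode:10} cause={exc.__cause__!r} context={exc.__context__!r} "
          f"suppress_context={exc.__suppress_context__}")
```

Note that from None still records __context__; it only sets __suppress_context__ so the traceback printer skips the original.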

🛠️ Day 24 Project: The Domain Fault Matrix

Build an unbreakable data ingestion pipeline.

  • Create a custom base exception DataIngestionError, and a subclass CorruptedPayloadError.
  • Write a try/except/else/finally block that attempts to parse a JSON string using json.loads().
  • If json.loads() throws a json.JSONDecodeError, catch it and chain it: raise CorruptedPayloadError from e. Use the else block to print the successfully parsed data.
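One possible sketch of the project (there are others). The error messages and the finally-block log line are illustrative choices, not requirements from the brief.

```python
import json


class DataIngestionError(Exception):
    """Base class for every fault in the ingestion pipeline."""


class CorruptedPayloadError(DataIngestionError):
    """The raw payload could not be decoded as JSON."""


def ingest(raw: str) -> object:
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        # Translate the parser's error into our domain, keeping it as __cause__
        raise CorruptedPayloadError(f"payload is not valid JSON: {e.msg}") from e
    else:
        print("Parsed payload:", parsed)
        return parsed
    finally:
        # Runs on success and failure alike; a real pipeline might record metrics here
        print("Ingestion attempt finished.")


good = ingest('{"sensor": "alpha", "reading": 42}')
try:
    ingest("{not json")
except DataIngestionError as err:
    bad = err  # keep a reference; `err` is unbound once the except block ends
    print("Rejected:", err, "| caused by:", type(err.__cause__).__name__)
```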

🔥 PRO UPGRADE (The Retry Decorator)

When network endpoints fail, they often succeed on the second try. Your challenge: Write a custom Python @retry decorator. It should accept a tuple of Exceptions to catch (e.g., @retry(exceptions=(ConnectionError, TimeoutError), tries=3)). If the decorated function throws one of those errors, the decorator should catch it, sleep for 1 second, and run the function again until it runs out of tries.
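One possible sketch of the decorator. The delay parameter is an addition beyond the brief (it defaults to the requested 1 second and lets tests pass 0), and fetch_status() is a hypothetical flaky endpoint. When the last attempt fails, a bare raise re-raises that error with its traceback intact.

```python
import functools
import time


def retry(exceptions=(Exception,), tries=3, delay=1.0):
    """Re-run the decorated function when it raises one of `exceptions`."""
    if tries < 1:
        raise ValueError("tries must be at least 1")

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of tries: re-raise the last error, traceback intact
                    time.sleep(delay)
        return wrapper
    return decorator


# Usage: a hypothetical endpoint that fails twice, then recovers
attempts = {"count": 0}


@retry(exceptions=(ConnectionError, TimeoutError), tries=3, delay=0)
def fetch_status() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient network failure")
    return "200 OK"


print(fetch_status(), "after", attempts["count"], "attempts")
```

Exceptions outside the tuple (say, a ValueError from a bug) are not retried at all, which keeps the decorator from hammering an endpoint over a permanent failure.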

6. FAQ: Exception Architecture

Is it bad practice to use exceptions for control flow?
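In C++ and Java, exceptions are conventionally reserved for genuinely exceptional situations, partly because throwing one is comparatively expensive. Python takes a different view: it embraces the EAFP principle, "Easier to Ask for Forgiveness than Permission", and even its iteration protocol signals the end of a sequence with an exception (StopIteration). When an operation usually succeeds, trying it and catching the rare failure (e.g., try: value = d['key']) is typically at least as fast as checking first (e.g., if 'key' in d:), because the happy path performs one lookup instead of two. When the failure is common, raising and catching every time costs more than the check, and the look-before-you-leap style wins.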
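A quick way to check the trade-off on your own machine. Absolute numbers depend on the interpreter version and hardware, so no figures are claimed here, and the two helper functions are hypothetical. For this exact case, dict.get() is the idiomatic one-liner anyway.

```python
import timeit


def get_eafp(d: dict, key, default=None):
    # EAFP: attempt the lookup, handle the miss
    try:
        return d[key]
    except KeyError:
        return default


def get_lbyl(d: dict, key, default=None):
    # LBYL: check first, then look up (two hash lookups on the happy path)
    if key in d:
        return d[key]
    return default


config = {"mode": "prod"}
for label, key in (("key present", "mode"), ("key missing", "debug")):
    eafp = timeit.timeit(lambda: get_eafp(config, key), number=100_000)
    lbyl = timeit.timeit(lambda: get_lbyl(config, key), number=100_000)
    print(f"{label}: EAFP {eafp:.4f}s, LBYL {lbyl:.4f}s")
```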
What does the `pass` keyword do in an except block?
It silently ignores the error. except ValueError: pass is dangerous unless you are absolutely certain the error is expected and requires no mitigation. If you must use it, it is mandatory to add a comment explaining why the error is being swallowed, otherwise you are building silent traps for future developers.
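When swallowing is genuinely correct, the standard library's contextlib.suppress states the intent in code. This sketch, using a hypothetical scratch file in the temp directory, shows it next to the equivalent commented except ...: pass.

```python
import contextlib
import os
import tempfile


def remove_if_present(path: str) -> None:
    # A missing file is the desired end state, so FileNotFoundError needs no mitigation
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


def remove_if_present_verbose(path: str) -> None:
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # Already gone, which is exactly what we wanted; nothing to do


scratch = os.path.join(tempfile.gettempdir(), "day24_scratch.txt")  # hypothetical file
with open(scratch, "w", encoding="utf-8") as fh:
    fh.write("temporary")
remove_if_present(scratch)
remove_if_present(scratch)  # second call: file already gone, error suppressed
```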


Failure: Managed

You now have the power to let your system fail gracefully and predictably. Hit Follow to catch Day 25, where we map out system flow using The State Machine.
