Configuring Pytest

As you start using Pytest extensively, typing -v or -m on the command line every time becomes tedious.
Centralize your defaults in pyproject.toml under the [tool.pytest.ini_options] table.
A single source of truth means every developer—and your CI system—runs tests with the same settings.
Keeping Pytest's settings alongside other tools that read pyproject.toml (Black, isort, MyPy) keeps your repo tidy and consistent.

Why a Configuration File?

Consistency: Run pytest without remembering flags; everyone gets the same behavior.
Simplicity: Remove boilerplate from docs and CI scripts.
Project-specific discovery: Set testpaths, python_files and markers in one place.
Cleaner output: Declare markers to silence PytestUnknownMarkWarning, enable color and rich tracebacks by default.

Configuration File Hierarchy

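Pytest looks for a configuration file in the current directory and then in each parent directory, using the first match in this order of precedence:

pytest.ini (takes precedence over the other files, even when empty)
pyproject.toml with a [tool.pytest.ini_options] table
tox.ini with a [pytest] section
setup.cfg with a [tool:pytest] section

Embrace pyproject.toml as the modern hub for all your tool configurations, and remove any leftover pytest.ini so that it does not shadow those settings.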

Creating `pyproject.toml`

Create or open pyproject.toml at your project root.
Add a [tool.pytest.ini_options] table.
Define your defaults using TOML syntax: a string for a single value such as addopts, and an array of strings for lists such as testpaths and markers.

Common Configuration Options

addopts
Defines default command-line flags that Pytest applies on every run (verbosity, reporting, color, etc.).
markers
Pre-registers custom markers with descriptions so that you can categorize tests and avoid unknown-marker warnings.
testpaths
Restricts test discovery to the listed directories, preventing Pytest from scanning other parts of the project.
python_files
Specifies filename patterns that Pytest treats as test files (e.g., test_*.py).
python_classes
Indicates class name patterns Pytest will consider when looking for test classes (e.g., classes starting with Test).
python_functions
Sets function name patterns Pytest uses to identify individual test functions (e.g., functions beginning with test_).
Other options
- norecursedirs: directories to skip during discovery
- minversion: enforce a minimum Pytest version
- filterwarnings: configure how warnings are handled
- and many more built-in settings for fine-tuning

Example of pyproject.toml


Automated Testing with Pytest

Assertions in Pytest

Pytest uses Python’s built-in assert statement to declare expected conditions in tests, making test code concise and readable.
When an assert expression evaluates to True, execution continues; if it evaluates to False, an AssertionError is raised and Pytest marks the test as failed.
Pytest intercepts assertion failures to provide detailed, introspective feedback on why an assertion failed.

The `assert` Statement

The assert keyword checks that an expression is truthy; if it’s falsy, Python raises AssertionError.
You can append an optional message: assert expression, "message", which will be shown if the assertion fails.
In plain Python, assert x == 5 does nothing when true, while assert x == 10, "x should be 10" raises an error with that message if the condition is false.

Pytest and `assert`

Pytest enhances the built-in assert by inspecting the expression’s values and rewriting the failure message to show variable states.
Common assertion patterns include:
- Equality and inequality checks to compare expected versus actual values.
- Truthiness or falsiness checks to verify that objects are non-empty or evaluate to False.
- Membership checks using in or not in to assert presence or absence in containers.
- Comparison operators (<, >, <=, >=) to verify ordering conditions.
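
The four patterns listed above can be sketched with plain assert statements. The pod names, region names, and numbers are made-up sample data, and the functions need nothing beyond the standard library.

```python
def test_equality_and_inequality() -> None:
    expected_replicas = 3
    actual_replicas = len(["pod-a", "pod-b", "pod-c"])

    assert actual_replicas == expected_replicas
    assert actual_replicas != 0


def test_truthiness_and_falsiness() -> None:
    running_pods = ["pod-a"]
    failed_pods: list[str] = []

    assert running_pods  # A non-empty list is truthy
    assert not failed_pods  # An empty list is falsy


def test_membership() -> None:
    allowed_regions = {"eu-west-1", "us-east-1"}

    assert "eu-west-1" in allowed_regions
    assert "ap-south-1" not in allowed_regions


def test_comparisons() -> None:
    response_time_ms = 120

    assert response_time_ms < 500
    assert response_time_ms >= 0
```

Saved in a file matching test_*.py, these functions are collected and run by the pytest command like any other test.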

Pytest’s Rich Failure Output

When an assertion fails, Pytest displays the values from the expression and highlights exactly where they differ.

Asserting Floating-Point Numbers (`pytest.approx`)

Floating-point arithmetic can yield tiny precision errors, so direct equality comparisons may fail unexpectedly.
Pytest provides pytest.approx to compare floats within a tolerance, supporting both relative and absolute tolerances.
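
A small sketch of the two tolerance styles, assuming pytest is installed. The numbers are arbitrary; rel and abs are the keyword arguments of pytest.approx for relative and absolute tolerance.

```python
import pytest


def test_default_tolerance() -> None:
    # Plain equality fails here because of binary floating-point rounding.
    assert 0.1 + 0.2 != 0.3
    assert 0.1 + 0.2 == pytest.approx(0.3)


def test_relative_tolerance() -> None:
    # rel=1e-2 accepts values within 1% of the expected value.
    assert 1002.0 == pytest.approx(1000.0, rel=1e-2)
    # The much tighter default tolerance rejects the same pair.
    assert 1002.0 != pytest.approx(1000.0)


def test_absolute_tolerance() -> None:
    # A relative tolerance is of little use around zero, so use abs there.
    assert 0.001 == pytest.approx(0.0, abs=1e-2)
```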

Asserting Exceptions (`pytest.raises`)

Use with pytest.raises(ExpectedException): as a context manager to assert that a block of code raises a specific exception.
You can include match="regex" to verify that the exception message matches a given pattern.
This allows testing both that the correct error type is raised and that its message contains expected details.
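
The match argument could be used as in the sketch below, assuming pytest is installed. parse_port is a hypothetical helper written only for this illustration; match is applied to the exception message as a regular-expression search.

```python
import pytest


def parse_port(raw: str) -> int:
    """Hypothetical helper: convert text to a TCP port number."""
    port = int(raw)

    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")

    return port


def test_parse_port_rejects_out_of_range() -> None:
    # The test passes only if a ValueError is raised AND its message matches.
    with pytest.raises(ValueError, match=r"out of range: 70000") as exc_info:
        parse_port("70000")

    # exc_info.value is the raised exception, available for further checks.
    assert "70000" in str(exc_info.value)
```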

Common Pitfalls & How to Avoid Them

Avoid overly complex expressions in a single assert; break them into multiple simpler assertions for clarity.
Use pytest.approx for floating-point comparisons so that tiny precision differences do not cause spurious test failures.

from text_analysis import calculate_text_attributes
import pytest

# Section: The `assert` Statement

# Uncomment to play around with Python assertions

# x: int = 5

# assert x == 5  # Nothing will happen, because this is True
# assert (
#     x == 10
# ), "x should be 10, but it's not!"  # Raise an AssertionError

# Section: Pytest and `assert`

def test_string_equality() -> None:
    expected_status = "SUCCESS"
    actual_status = "success".upper()

    assert actual_status == expected_status

def test_word_count() -> None:
    text = "Deploying microservice to Kubernetes cluster."
    text_empty = ""

    assert (calculate_text_attributes(text)["word_count"]) == 5
    assert (
        calculate_text_attributes(text_empty)["word_count"]
    ) == 0

def test_unique_words() -> None:
    text = "Deploying microservice to Kubernetes cluster."
    text_with_duplicates = "Deploying deploying."
    text_empty = ""

    text_results = calculate_text_attributes(text)
    text_with_duplicates_result = calculate_text_attributes(
        text_with_duplicates
    )
    text_empty_results = calculate_text_attributes(text_empty)

    assert (len(text_results["unique_words"])) == 5
    assert (
        len(text_with_duplicates_result["unique_words"])
    ) == 1
    assert (len(text_empty_results["unique_words"])) == 0

def test_average_word_length() -> None:
    text = "Deploying microservice to Kubernetes cluster."  # 40 / 5 = 8
    text_with_duplicates = "Deploying deploying."  # 18 / 2 = 9
    text_empty = ""  # 0

    text_results = calculate_text_attributes(text)
    text_with_duplicates_result = calculate_text_attributes(
        text_with_duplicates
    )
    text_empty_results = calculate_text_attributes(text_empty)

    assert text_results["average_word_length"] == pytest.approx(8.0)
    assert (
        text_with_duplicates_result["average_word_length"]
    ) == pytest.approx(9.0)
    assert text_empty_results["average_word_length"] == pytest.approx(0.0)

def test_longest_word() -> None:
    text = "Deploying microservice to Kubernetes cluster."  # microservice
    text_with_duplicates = "Deploying deploying."  # deploying
    text_empty = ""

    text_results = calculate_text_attributes(text)
    text_with_duplicates_result = calculate_text_attributes(
        text_with_duplicates
    )
    text_empty_results = calculate_text_attributes(text_empty)

    assert (
        text_results["longest_word"].lower()
    ) == "microservice"
    assert (
        text_with_duplicates_result["longest_word"].lower()
    ) == "deploying"
    assert (text_empty_results["longest_word"]) == ""

# Section: Pytest’s Rich Failure Output

@pytest.mark.xfail  # We're marking the test as an expected failure
def test_string_mismatch() -> None:
    expected = "HEllo WOrlD"
    actual = "hello world"

    assert expected == actual

# Section: Asserting Floating-Point Numbers (`pytest.approx`)

def test_float_with_approx() -> None:
    calculated_val = 0.1 + 0.2
    expected_val = 0.3

    assert calculated_val == pytest.approx(expected_val)  # type: ignore

# Section: Asserting Exceptions (`pytest.raises`)

def test_raises_exception() -> None:
    with pytest.raises(ZeroDivisionError):
        _division = 1 / 0

# text_analysis.py: the module imported by the tests above
from typing import TypedDict
import re

class TextAttributes(TypedDict):
    word_count: int
    unique_words: set[str]
    average_word_length: float
    longest_word: str

def calculate_text_attributes(input_text: str) -> TextAttributes:
    split_text = re.findall(r"\w+", input_text)
    word_length_sum = sum(len(word) for word in split_text)
    avg_word_length = (
        word_length_sum / len(split_text)
        if len(split_text)
        else 0
    )

    return {
        "word_count": len(split_text),
        "unique_words": set(text.lower() for text in split_text),
        "average_word_length": avg_word_length,
        "longest_word": (
            max(split_text, key=len) if split_text else ""
        ),
    }

Adding Type Hints to Decorators and Generators

Decorators and generators are advanced constructs that require specialized type hints to make their transformations and data flows explicit.
Properly typed decorators allow MyPy to understand how they preserve or change function signatures.
Typed generators clarify the types of values yielded, values accepted via .send(), and final return values.

Typing Decorators

Decorators take a function (Callable) and return a new function; typing both as Callable[..., Any] works but discards the decorated function's specific signature.
To preserve the signature, one option is a TypeVar bound to Callable[..., Any], used for both the decorator's input and output types; the wrapper keeps *args: Any, **kwargs: Any -> Any, and the decorator casts the wrapper back to the TypeVar.
On Python 3.10+, ParamSpec is a more precise alternative: annotate the decorator as Callable[P, R] -> Callable[P, R] and the wrapper as *args: P.args, **kwargs: P.kwargs -> R.
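
The TypeVar-bound approach described above might look like the sketch below. The names log_calls, call_log, and multiply are hypothetical; the cast is needed because a type checker cannot prove on its own that the wrapper has the same type as the decorated function.

```python
import functools
from typing import Any, Callable, TypeVar, cast

# F stands for "whatever callable type was passed in".
F = TypeVar("F", bound=Callable[..., Any])

call_log: list[str] = []  # Records the names of decorated functions as they run


def log_calls(func: F) -> F:
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        call_log.append(func.__name__)
        return func(*args, **kwargs)

    # Tell the type checker the wrapper keeps the original signature.
    return cast(F, wrapper)


@log_calls
def multiply(x: int, y: int) -> int:
    return x * y


product = multiply(3, 4)  # Still seen as (int, int) -> int by a type checker
print(product, call_log)
```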

Typing Generators

Use Generator[YieldType, SendType, ReturnType] to specify a generator’s yield type, the type accepted by .send(), and its return type on completion.
If a generator does not use send(), set SendType to None; if it has no explicit return, set ReturnType to None.
The count_up_to generator is typed as Generator[int, None, str], yielding integers and returning a string message.
The accumulate_and_send generator is typed as Generator[float, float | None, None]: it yields a running total, accepts floats via send() (or None when advanced with next()), and returns nothing.
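
A generator's ReturnType is easy to overlook because a for loop discards it. The sketch below shows one way to read it, from the value attribute of the StopIteration raised on completion. The drain helper is a hypothetical name used only for this illustration.

```python
from typing import Generator


def count_up_to(limit: int) -> Generator[int, None, str]:
    for i in range(limit):
        yield i

    return "Counting complete!"


def drain(gen: Generator[int, None, str]) -> tuple[list[int], str]:
    """Collect every yielded value plus the generator's return value."""
    values: list[int] = []

    while True:
        try:
            values.append(next(gen))
        except StopIteration as stop:
            # The value given to `return` travels on the exception.
            return values, stop.value


values, message = drain(count_up_to(3))
print(values, message)
```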

Iterable & Iterator

For functions that consume sequences of items, use Iterable[T] to accept any iterable of T (lists, tuples, generators).
Use Iterator[T] when a function specifically expects an iterator object supporting __next__().
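
The difference can be sketched as follows: a function that calls next() needs an Iterator, while a function that only loops can accept any Iterable. The function names and log lines are made up for the example.

```python
from typing import Iterable, Iterator


def take_header(lines: Iterator[str]) -> str:
    # next() requires an iterator; a plain list would fail here.
    return next(lines)


def count_items(items: Iterable[str]) -> int:
    # Looping works for lists, tuples, sets, generators, and iterators alike.
    return sum(1 for _ in items)


log_lines = [
    "timestamp,level,message",
    "12:00,INFO,started",
    "12:01,ERROR,failed",
]

line_iterator = iter(log_lines)  # Turn the list into an iterator
header = take_header(line_iterator)  # Consumes the first line
remaining = count_items(line_iterator)  # Counts what is left in the iterator
total = count_items(log_lines)  # The list itself is untouched

print(header, remaining, total)
```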

from typing import (
    Callable,
    Any,
    TypeVar,
    ParamSpec,
    Generator,
    Iterable,
)
import functools

# Section: Typing Decorators (simple_logging_decorator)

def simple_logging_decorator(
    func: Callable[..., Any],
) -> Callable[..., Any]:
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        print(f"LOG: Calling {func.__name__}")
        result = func(*args, **kwargs)
        print(f"LOG: {func.__name__} returned {result}")

        return result

    return wrapper

@simple_logging_decorator
def add(x: int, y: int) -> int:
    return x + y

result_add = add(3, 5)

# Section: Typing Decorators (better_logging_decorator with ParamSpec and TypeVar)

P = ParamSpec("P")
R = TypeVar("R")

def better_logging_decorator(
    func: Callable[P, R],
) -> Callable[P, R]:
    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        print(f"LOG: Calling {func.__name__}")
        result = func(*args, **kwargs)
        print(f"LOG: {func.__name__} returned {result}")

        return result

    return wrapper

@better_logging_decorator
def subtract(x: int, y: int) -> int:
    return x - y

result_subtract = subtract(3, 5)

# Section: Typing Generators

def count_up_to(limit: int) -> Generator[int, None, str]:
    for i in range(limit):
        yield i

    return "Counting complete!"

def accumulate_and_send() -> (
    Generator[float, float | None, None]
):
    total = 0.0

    try:
        while True:
            sent = yield total

            if sent is not None:
                total += sent
    except GeneratorExit:
        pass

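test_accumulate = accumulate_and_send()
next(test_accumulate)  # Prime the generator: run it up to the first yield
print(test_accumulate.send(1.0))  # 1.0
print(next(test_accumulate))  # 1.0 (next() sends None, so the total is unchanged)
print(test_accumulate.send(2.0))  # 3.0
print(test_accumulate.send(3.0))  # 6.0
print(next(test_accumulate))  # 6.0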

# Section: Iterable & Iterator

def process_items(items: Iterable[str]) -> list[str]:
    return [item.upper() for item in items]

print(process_items(["a", "b"]))
print(process_items(("a", "b")))
print(process_items({"a", "b"}))
print(process_items({"a": "b", "hello": "world"}))

Generics typing

Introduction to Generics

Generic types let you write reusable, type-safe functions and classes that work uniformly across different data types.
They preserve the relationship between input and output types, enabling MyPy to infer precise types instead of falling back to Any.
The typing module’s TypeVar and Generic primitives unlock this capability.

The Need for Generics

Annotating with Any sacrifices type information, so tools cannot guarantee correct usage of returned values.
A generic abstraction retains knowledge of the specific type in each context, improving IDE support and static checks.
For example, a "first-item" function should return str for a list[str] and int for a list[int], not just Any.

Defining Type Variables

T = TypeVar('T') declares a placeholder type variable T that can stand for any type.
A function annotated def get_first_item(input_list: list[T]) -> Optional[T]: returns an element of the same type as the list elements, or None for an empty list.
MyPy infers T from each call site, preserving specific return types like Optional[str] or Optional[int].

Constrained Type Variables

When a generic should only accept certain types, constrain it: NumberType = TypeVar('NumberType', int, float).
Functions like def add_generic_numbers(x: NumberType, y: NumberType) -> NumberType: then only accept int or float and return that same type.
Constrained type variables combine flexibility with necessary restrictions for safe operations.

Bounded Type Variables

When a generic should only accept a specific class or its subclasses, give the type variable an upper bound: ResourceType = TypeVar('ResourceType', bound=CloudResource).
A function like def filter_deployed(resources: list[ResourceType]) -> list[ResourceType]: then accepts a list of CloudResource or of any subclass, may use the attributes CloudResource defines, and returns a list with the same element type it was given.
Unlike a constrained type variable, which resolves to exactly one of the listed types, a bounded type variable keeps the precise subclass: a list[VirtualMachine] goes in and a list[VirtualMachine] comes out.
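
A minimal sketch of why the bound matters for a single value: because the function returns the same type it receives, subclass-only methods remain available on the result. deploy_resource is a hypothetical helper, and the classes are trimmed-down stand-ins for illustration.

```python
from typing import TypeVar


class CloudResource:
    def __init__(self, name: str) -> None:
        self.name = name
        self.deployed: bool = False


class VirtualMachine(CloudResource):
    def reboot(self) -> str:
        return f"Rebooting VM {self.name}"


ResourceType = TypeVar("ResourceType", bound=CloudResource)


def deploy_resource(resource: ResourceType) -> ResourceType:
    # The bound guarantees that `deployed` exists on whatever is passed in.
    resource.deployed = True
    return resource


# A type checker infers VirtualMachine here, not just CloudResource,
# so calling reboot() on the result is accepted.
vm = deploy_resource(VirtualMachine("vm-01"))
message = vm.reboot()
print(message)
```

Had the function been annotated as taking and returning CloudResource, the call to reboot() would be reported as an error by a static checker even though it works at runtime.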

Generic Classes

Inherit from Generic[T] to define a class parameterized by a type variable T.
A class like SimpleStack[T] can push, pop, and peek items of type T, and MyPy will enforce that only T instances are used.
This pattern creates custom container types that maintain strong type guarantees for their contents.

Common Pitfalls & How to Avoid Them

A class that uses T in its methods but does not inherit from Generic[T] is not recognized as generic by MyPy.
Unconstrained TypeVar('T') can degrade type safety when operations require certain capabilities—use bounds or explicit type lists when appropriate.

from typing import Optional, TypeVar, Generic

# Section: Defining a generic function to get the first item of a list

T = TypeVar("T")

def get_first_item(
    input_list: list[T],
) -> Optional[T]:
    if input_list:
        return input_list[0]

    return None

first_number = get_first_item([1, 2, 3])
first_str = get_first_item(["abc", "def"])
first_mixed_list = get_first_item(["abc", "def", 1, 2, 3])

# Section: Constrained TypeVar for numeric addition

NumberType = TypeVar("NumberType", int, float)

def add_generic_numbers(
    x: NumberType, y: NumberType
) -> NumberType:
    return x + y

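sum_int = add_generic_numbers(3, 5)  # Both arguments are int, so NumberType resolves to int
# Type checkers accept int where float is expected, so NumberType resolves to float here
sum_float = add_generic_numbers(3, 5.0)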

# Section: Bounded TypeVar with deployed filter for DevOps resources

class CloudResource:
    def __init__(self, name: str, cpu_usage: float) -> None:
        self.name = name
        self.cpu_usage = cpu_usage
        self.deployed: bool = False

    def deploy(self) -> None:
        print(f"Deploying {self.name}")
        self.deployed = True

class VirtualMachine(CloudResource):
    def reboot(self) -> None:
        print(f"Rebooting VM {self.name}")

class DockerContainer(CloudResource):
    def restart(self) -> None:
        print(f"Restarting container {self.name}")

ResourceType = TypeVar("ResourceType", bound=CloudResource)

def filter_deployed(
    resources: list[ResourceType],
) -> list[ResourceType]:
    return [
        resource for resource in resources if resource.deployed
    ]

vm1 = VirtualMachine("vm-01", cpu_usage=65.0)
vm2 = VirtualMachine("vm-02", cpu_usage=45.0)
container1 = DockerContainer("api-service", cpu_usage=85.0)
container2 = DockerContainer("worker", cpu_usage=55.0)

vm1.deploy()
container1.deploy()

all_resources = [vm1, vm2, container1, container2]
deployed_resources = filter_deployed(all_resources)

# Section: Generic class SimpleStack

G = TypeVar("G")

class SimpleStack(Generic[G]):
    def __init__(self) -> None:
        self._items: list[G] = []

    def push(self, item: G) -> None:
        self._items.append(item)

    def pop(self) -> G:
        if self.is_empty():
            raise IndexError("Stack is empty!")
        return self._items.pop()

    def peek(self) -> Optional[G]:
        if self.is_empty():
            return None

        return self._items[-1]

    def is_empty(self) -> bool:
        return not self._items

str_stack = SimpleStack[str]()
str_stack.push("str")

int_stack = SimpleStack[int]()
int_stack.push(12)

Typing classes

Introduction

As our Python automation projects grow, defining custom classes helps model complex objects and should be reflected in type hints for clearer code and stronger static checking.
Annotating functions and methods with user-defined classes lets MyPy verify correct usage of attributes and methods.

Classes as Type Hints

Any class you define becomes a valid type; you can annotate parameters and return values with it.
MyPy will ensure that calls to such functions pass instances of the expected class and that attribute access matches the class definition.

Hinting Methods Within a Class

Inside class methods, self is implicitly the class type; annotate other parameters and return types normally.
MyPy checks method bodies to ensure you only access attributes and call methods that exist on the class.
New in Python 3.11: You can use typing.Self for methods that return the instance, for example def clone(self) -> Self:.

Forward References (Strings)

Use string literals for type hints when referring to a class that is defined later in the file or in circular scenarios.
Enabling from __future__ import annotations defers evaluation of all annotations, simplifying forward references.
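
For comparison, here is a sketch of the string-literal style without the __future__ import. TreeNode is a hypothetical class; inside its own body the name TreeNode is not yet bound, so the annotations that mention it are quoted.

```python
from typing import Optional


class TreeNode:
    def __init__(
        self, value: int, parent: "Optional[TreeNode]" = None
    ) -> None:
        self.value = value
        self.parent = parent
        self.children: list["TreeNode"] = []

    def add_child(self, value: int) -> "TreeNode":
        # Create a child that points back to this node.
        child = TreeNode(value, parent=self)
        self.children.append(child)

        return child


root = TreeNode(1)
leaf = root.add_child(2)

print(leaf.parent is root)
# At runtime the quoted annotation is stored as a plain string.
print(TreeNode.add_child.__annotations__["return"])
```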

from __future__ import annotations
from typing import Self, Optional

# Section: Classes as Type Hints

class Server:
    def __init__(
        self,
        hostname: str,
        ip_address: str,
        os_type: str = "Linux",
    ):
        self.hostname: str = hostname
        self.ip_address: str = ip_address
        self.os_type: str = os_type
        self.is_online: bool = False

    def connect(self) -> None:
        print(
            f"Connecting to {self.hostname} (IP address: {self.ip_address})"
        )
        self.is_online = True
        print(f"{self.hostname} is online.")

    def get_status(self) -> str:
        return "online" if self.is_online else "offline"

def deploy_app_to_server(
    target_server: Server, app_name: str
) -> bool:
    print(
        f"Deploying {app_name} to server: {target_server.hostname}"
    )

    if not target_server.is_online:
        target_server.connect()

    print(
        f"Deployment of {app_name} to {target_server.hostname} successful."
    )
    return True

web_server = Server(
    hostname="web01.dev.local", ip_address="10.0.1.10"
)
db_server = Server(
    hostname="db01.dev.local", ip_address="10.0.2.20"
)

deploy_app_to_server(web_server, "FrontendApp")
deploy_app_to_server(db_server, "UserDBApi")

# Section: Hinting Methods Within a Class

class Calculator:
    def __init__(self, initial_value: int | float = 0):
        self.total: int | float = initial_value

    def add(self, value: int | float) -> Self:
        self.total += value

        return self

    def subtract(self, value: int | float) -> Self:
        self.total -= value

        return self

    def multiply_by(self, value: int | float) -> Self:
        self.total *= value

        return self

    def divide_by(self, value: int | float) -> Self:
        self.total /= value

        return self

    def get_total(self) -> int | float:
        return self.total

my_calc = Calculator(1)

print(my_calc.add(2).subtract(1).multiply_by(10).get_total())

# Section: Forward References (Strings)

class Employee:
    def __init__(
        self, name: str, manager: Optional[Employee] = None
    ) -> None:
        self.name: str = name
        self.manager: Optional[Employee] = manager
        self.reports: list[Employee] = []

    def add_report(self, report: Employee) -> None:
        self.reports.append(report)

ceo = Employee("ceo")
manager1 = Employee("Alice", ceo)
ceo.add_report(manager1)

Typing

Introduction

Python is a dynamically typed language, meaning you can assign values to variables without declaring their types, and type checking happens at runtime.
While this offers rapid development and flexibility, it can lead to ambiguity and late discovery of type-related bugs in larger or collaborative projects.
Type hints (PEP 484, introduced in Python 3.5) let you optionally annotate your code with expected types for variables, function parameters, and return values without changing Python’s runtime behavior.
These annotations are leveraged by static type checkers (e.g., MyPy), IDEs for better autocompletion and error highlighting, and by developers for clearer, more maintainable code.

Why Use Type Hints?

Type hints improve readability by making explicit what data types functions expect and return, which is invaluable when navigating unfamiliar or legacy code.
Static type checkers like MyPy can catch mismatches between hinted and actual types before the code runs, surfacing bugs early in the development cycle.
IDEs (e.g., VS Code, PyCharm) use hints to enhance autocompletion accuracy, provide inline type checking, and support safe refactoring.
Explicit annotations act as a contract in collaborative environments, helping team members understand and correctly use each other’s code.
For example, annotating a function as def process_user_data(user: dict) -> bool: makes it clear that the function expects a dict and returns a bool.

Basic Type Hint Syntax

To annotate a variable, use variable_name: type = value. This syntax (variable annotations) was introduced in Python 3.6 (PEP 526).
Example: config_path: str = "/etc/app.conf" indicates that config_path is intended to be a string.
Function parameters are annotated with param_name: param_type, and the return type is specified after -> before the colon.
Example: def get_server_status(hostname: str, port: int) -> str: declares that the function takes a str and an int, and returns a str.

Common Built-in Types for Hinting

Standard built-ins such as int, float, bool, str, and bytes are directly usable in annotations.
Collections can be hinted with list, tuple, set, and dict. For more precise element types:
- In Python 3.9 and later (PEP 585), you can use built-in generics: list[int], dict[str, int].
- In earlier versions, import from the typing module: from typing import List, Dict and use List[int], Dict[str, int].
The special type None is used for functions that do not return a meaningful value (e.g., -> None).
Advanced types like Optional, Union, and others will be covered when exploring the typing module in a later lecture.

Python Remains Dynamically Typed

Type hints do not alter Python’s runtime behavior; passing arguments of the wrong type won’t raise a hint-related error unless an operation in the code fails for the actual type.
For instance, calling process_id("user-123") on a function annotated as def process_id(user_id: int) -> None: runs without a hint-triggered error, though passing a string where an integer is expected may lead to a TypeError later if arithmetic is attempted.
Static analysis tools flag these mismatches before execution, but Python itself enforces types only when invalid operations occur at runtime.

Common Pitfalls & How to Avoid Them

Believing hints enforce types at runtime: Hints guide tools and developers, but Python ignores them unless you use a runtime checking library.
Over-hinting or incorrect hints: Overly complex or wrong annotations can confuse readers and static checkers; start simple and use Any for truly dynamic values.
Forgetting typing imports: When using List[int], Optional[str], etc., remember to import them from the typing module (unless you rely on built-in generics in Python 3.9+).
Relying on hints for untyped libraries: If a third-party library lacks type hints or has them in separate stub files, static analysis may be limited—consult documentation or stub packages.

# Section: Basic Type Hint Syntax - Variable Annotations
config_path: str = "/etc/app.conf"
retry_count: int = 3
is_enabled: bool = True
servers: list[str] = ["web01", "web02"]
settings: dict[str, int | str] = {"port": 8080, "user": "admin"}

# Section: Basic Type Hint Syntax - Function Argument and Return Type Annotations
def get_server_status(hostname: str, port: int) -> str:
    print(f"Checking {hostname}:{port}")
    if port == 80:
        return "Online"
    else:
        return "Unknown"

# Section: Python Remains Dynamically Typed
def process_id(user_id: int) -> None:
    print(
        f"Processing user ID: {user_id} (type: {type(user_id)})"
    )

# Demonstration of dynamic typing
process_id(1234)
# process_id("user-1234")  # Runs fine at runtime, but a static type checker such as MyPy reports an error.

Common Types in Python

Python’s built-in dynamic typing allows rapid development without declaring variable types, but it can lead to ambiguous code and late discovery of type errors in larger projects.
The typing module provides specialized type constructors to precisely describe the contents of collections (list, dict, tuple, set) and other complex scenarios.
By using these constructors, you gain clearer documentation, stronger static analysis with tools like MyPy, and richer IDE support without changing Python’s runtime behavior.

The `typing` Module

On Python 3.9+, built-in generics (list[int], dict[str, str], tuple[int, ...], set[str], frozenset[int]) are available via PEP 585, which deprecates typing.List, typing.Dict, and the other capitalized aliases.
For compatibility with older versions (Python 3.7/3.8), import those aliases from typing instead: List, Dict, Tuple, Set, FrozenSet.
The typing module remains the home of constructs such as Optional, Union, Any, Literal, and TypedDict.

Typing Lists

Use list[X] (or List[X] in Python < 3.9) to indicate a list whose elements are of type X.
This makes it explicit if a function expects a list of strings (list[str]) or integers (list[int]), enabling static checkers to catch mismatches.

Typing Dictionaries

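Use dict[K, V] (or Dict[K, V] in Python < 3.9) to specify a dictionary with keys of type K and values of type V.
You can nest generics, for example dict[str, list[str]], to model complex structures like mapping user IDs to role lists.
When a dictionary has a fixed set of string keys whose values differ in type, a TypedDict declares each key individually; NotRequired (Python 3.11+) marks a key that may be absent.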

from typing import TypedDict, NotRequired

class User(TypedDict):
    id: int
    name: str
    email: str
    phone: NotRequired[str]

user: User = {
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com",
    "phone": "+123456789",
}

print(f"User data: {user.get('email')}")

Typing Tuples

Fixed-length tuples with heterogeneous types use tuple[T1, T2, ...] (or Tuple[T1, T2, ...] in Python < 3.9).
Variable-length tuples of a uniform type use tuple[T, ...] (or Tuple[T, ...]), though lists are often more natural for that use case.

Typing Sets

Use set[X] (or Set[X] in Python < 3.9) to indicate a set containing elements of type X.
This clarifies that operations like membership checks (in) will compare values of the declared type.
Note: For immutable sets, use frozenset[X] (or FrozenSet[X] in Python < 3.9).

Union[X, Y, ...] for Multiple Possible Types

Use Union[...] when a value may be exactly one of several types (excluding None unless explicitly included).
As of Python 3.10 you can write int | str instead of Union[int, str].

Optional[X] for Values That Can Be None

Optional[X] is shorthand for Union[X, None], indicating a value may be of type X or None.
Static checkers will warn if you use an Optional value without first checking for None.

Any for Unrestricted Types

Any disables type checking for the annotated part, useful during gradual typing of legacy code or when truly dynamic types are needed.
Overuse negates the benefits of static analysis, so prefer specific types whenever possible.

Common Pitfalls & How to Avoid Them

Built-in Generics on Older Python: Syntax like list[int] only works on Python 3.9+; use typing.List[int] for Python 3.7/3.8 compatibility.
Subtle Optional Defaults: def func(arg: Optional[str] = None) states explicitly that None is allowed, whereas def func(arg: str = None) relies on an implicit Optional that recent MyPy versions reject by default.
Excessive Any: Reserving Any for truly dynamic cases preserves the value of static checking elsewhere in your code.

from typing import Optional, Any

# Section: Typing Lists

hostnames: list[str] = ["web01.example.com", "db01.example.com"]
open_ports: list[int] = [80, 443, 22]

def process_hostnames(hosts: list[str]) -> None:
    for host in hosts:
        print(f"Processing host: {host.upper()}")

process_hostnames(hostnames)
# process_hostnames(open_ports) # Uncommenting will lead to type error

# Section: Typing Dictionaries

server_config: dict[str, str] = {
    "hostname": "app01.prod",
    "ip_address": "10.0.5.20",
    "os_type": "Linux",
}

user_roles: dict[str, list[str]] = {
    "user-123": ["admin", "editor"],
    "user-456": ["dev", "viewer"],
}

# Section: Typing Tuples

server_status: tuple[str, int, bool] = (
    "api.example.com",
    443,
    True,
)

ip_parts: tuple[int, ...] = (192, 168, 1, 100)

# Section: Typing Sets

admin_users: set[str] = {"alice", "bob", "charlie"}

def is_admin(username: str, admins: set[str]) -> bool:
    return username in admins

# Section: Union[X, Y, ...] for Multiple Possible Types

identifier: str | int = "abcde-1234"
identifier = 1234

def process_mixed_data(data: list[int | str]) -> None:
    for item in data:
        if isinstance(item, str):
            print(f"Processing string: {item.upper()}")
        else:
            print(f"Processing int: {item * 2}")

# Section: Optional[X] for Values That Can Be None

def find_user(user_id: str) -> Optional[dict[str, str]]:
    if user_id == "123":
        return {
            "id": "123",
            "name": "Admin user",
            "email": "admin@example.com",
        }

    return None

found_user = find_user("123")

if found_user:
    print(f"Found user: {found_user['name']}")

# Section: Any for Unrestricted Types
def print_anything(item: Any) -> None:
    print(f"Item: {item}, type: {type(item)}")

print_anything(1)
print_anything("hello")

Implementing Retries and Timeouts

External services can be slow or unreliable, causing scripts to hang or fail unexpectedly.
Timeouts and retries help ensure your automation scripts remain responsive and resilient.

Timeouts

By default, requests may wait indefinitely for a response, which is risky in automation.
Use the timeout parameter with a single value for both connect and read, or a tuple (connect, read) for fine-grained control.
A ConnectTimeout is raised if the connection can’t be established in time; a ReadTimeout is raised if data stops arriving within the read timeout.

HTTPBIN_ENDPOINT = "https://httpbin.org"

import requests
import time

delay_url = f"{HTTPBIN_ENDPOINT}/delay/5" # Simulate a 5-second delay

start = time.perf_counter()

try:
    res = requests.get(delay_url, timeout=2)
    print(f"Completed in {time.perf_counter() - start:.2f}s, status {res.status_code}")
except (
    requests.exceptions.ConnectTimeout,
    requests.exceptions.ReadTimeout
) as timeout_err:
    print(f"Timeout after {time.perf_counter() - start:.2f}s: {timeout_err}")

Retries

Transient issues like network blips or server overloads may cause requests to fail temporarily.
Implement a simple retry loop that catches errors, retries on server-side (5xx) errors or network exceptions, and breaks on success or client errors.
Use a fixed delay between retries for simplicity, or an exponential backoff for a more robust approach.
Avoid retrying non-idempotent operations.

import requests
import time

flaky_url = f"{HTTPBIN_ENDPOINT}/status/200,500,503"

max_retries = 3
delay = 2

for attempt in range(1, max_retries + 1):
    print(f"Attempt {attempt}/{max_retries}...")

    try:
        res = requests.get(flaky_url, timeout=10)
        res.raise_for_status()
        print(f"Succeeded with status {res.status_code}")
        break
    except requests.exceptions.HTTPError as err:
        if err.response.status_code < 500:
            print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
            break
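        else:
            print(f"Failed with server error code {err.response.status_code}.")
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as err:
        # Network-level failures are usually transient, so retry them as well
        print(f"Failed with network error: {err}")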
    if attempt < max_retries:
        print(f"Waiting {delay}s before retry...")
        time.sleep(delay)
else:
    print(f"All {max_retries} attempts failed!")

Exponential Backoff with Jitter

Fixed delays can overwhelm a recovering server if many clients retry simultaneously.
Exponential backoff increases the wait time after each failure (e.g., 1s, 2s, 4s...).
Adding jitter (a small random offset) prevents synchronized retry spikes.

import requests
import time
import random

def get_with_backoff(url, max_retries=3):
    delay = 1

    for attempt in range(1, max_retries + 1):
        print(f"Attempt {attempt}/{max_retries}...")

        try:
            res = requests.get(url, timeout=10)
            res.raise_for_status()
            print(f"Succeeded with status {res.status_code}")
            return res
        except requests.exceptions.HTTPError as err:
            if err.response.status_code < 500:
                print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
                raise RuntimeError("Client error! Please review request.") from err

            print(f"  Failed with server error code {err.response.status_code}.")

        if attempt < max_retries:
            # Jitter is +/-10% of the current delay:
            # delay = 1 -> jitter [-0.1, 0.1] -> wait between 0.9 and 1.1s
            # delay = 2 -> jitter [-0.2, 0.2] -> wait between 1.8 and 2.2s
            # delay = 4 -> jitter [-0.4, 0.4] -> wait between 3.6 and 4.4s
            wait = delay + random.uniform(-0.1 * delay, 0.1 * delay)
            print(f"  Retrying in {wait:.2f}s")
            time.sleep(wait)
            delay = min(delay * 2, 30)  # Double the delay, capped at 30s

    raise RuntimeError(f"All retries to query {url} failed!")

try:
    res = get_with_backoff(
        f"{HTTPBIN_ENDPOINT}/status/503",
        max_retries=4
    )
except RuntimeError as e:
    print(e)
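
To inspect the waits that exponential backoff with jitter produces without making any network calls, the schedule can be computed on its own. This is a minimal sketch rather than part of the example above: backoff_schedule and its parameters (base, factor, cap, jitter_ratio, rng) are hypothetical names, and the +/-10% jitter and 30-second cap are assumed values.

```python
import random


def backoff_schedule(retries, base=1.0, factor=2.0, cap=30.0, jitter_ratio=0.1, rng=None):
    """Return the wait (in seconds) to apply before each retry.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    waits = []
    delay = base

    for _ in range(retries):
        # Jitter is a random offset within +/- jitter_ratio of the current delay
        jitter = rng.uniform(-jitter_ratio * delay, jitter_ratio * delay)
        waits.append(delay + jitter)
        # Grow the delay exponentially, but never beyond the cap
        delay = min(delay * factor, cap)

    return waits


# A seeded generator makes the schedule reproducible for a demo
schedule = backoff_schedule(retries=6, rng=random.Random(42))

for attempt, wait in enumerate(schedule, start=1):
    print(f"Retry {attempt}: wait {wait:.2f}s")
```

Keeping the schedule separate from the request logic makes it easy to check the cap and the jitter range before wiring it into a retry loop.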

Common Pitfalls & How to Avoid Them

Forgetting to set timeouts can cause scripts to hang indefinitely; always use timeout.
Retrying client errors (4xx) usually won’t help; only retry transient server errors (5xx) or network issues.
Retrying non-idempotent operations (e.g., POST) can cause duplicate actions; limit automatic retries to idempotent methods such as GET, PUT, and DELETE.
Fixed retry delays can lead to synchronized retry spikes; use exponential backoff with jitter for production scenarios.
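
The first three rules above can be folded into one small decision function. The sketch below is an assumption about how one might organize that logic; should_retry and IDEMPOTENT_METHODS are hypothetical names, not part of requests.

```python
# Common methods that HTTP defines as idempotent; repeating them is generally safe
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


def should_retry(method, status_code=None, network_error=False):
    """Decide whether a failed request is worth retrying.

    status_code is None when no response was received (for example, a timeout).
    """
    if method.upper() not in IDEMPOTENT_METHODS:
        return False  # e.g. POST: a retry could repeat the action
    if network_error:
        return True  # connection problems and timeouts are often transient
    return status_code is not None and 500 <= status_code < 600


print(should_retry("GET", 503))
print(should_retry("GET", 404))
print(should_retry("POST", 503))
print(should_retry("get", network_error=True))
```

A retry loop can call such a function in its except branches and break as soon as it returns False.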

Handling Authentication

APIs often require authentication to control access, rate limits, and auditing.
Without authentication, requests to protected endpoints will fail with codes like 401 (Unauthorized) or 403 (Forbidden).
This section demonstrates a simple GET to a protected endpoint, illustrating why auth is needed.

Why Authentication?

Authentication tells the API who you are, enabling personalized data and higher rate limits.
It prevents unauthorized access to private resources and supports auditing of actions.
Authenticated requests often succeed where anonymous requests would be blocked or limited.

GITHUB_ENDPOINT = "https://api.github.com"
HTTPBIN_ENDPOINT = "https://httpbin.org"

import requests

urls = {
    "public_endpoint": f"{GITHUB_ENDPOINT}/zen",
    "protected_endpoint": f"{GITHUB_ENDPOINT}/user",
}

for description, url in urls.items():
    res = requests.get(url, timeout=5)
    print(f"{description} ({url}) : {res.status_code}")
    print(res.text[:200])

Basic Authentication

Basic Auth sends a username and password with each request, encoded in the Authorization header.
requests accepts an auth=(username, password) tuple and handles encoding automatically.
Servers return 401 Unauthorized when credentials are missing or incorrect.

import requests
import json

# httpbin expects myuser/myotherpwd here; the mismatched password below yields a 401
url = f"{HTTPBIN_ENDPOINT}/basic-auth/myuser/myotherpwd"

try:
    res = requests.get(url, auth=("myuser", "mypasswd"), timeout=10)
    res.raise_for_status()
    print(f"Status code: {res.status_code}")
    print("Response JSON:")
    print(json.dumps(res.json()))
except requests.exceptions.HTTPError as err:
    print(err)
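
To make the encoding step concrete, the following sketch builds the same kind of Authorization header by hand with the standard library. It is for illustration only, since requests does this for you when you pass auth=, and basic_auth_header is a hypothetical helper name.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    # Credentials are joined with a colon, then Base64-encoded
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_auth_header("myuser", "mypasswd")
print(f"Authorization: {header}")

# Base64 is an encoding, not encryption: anyone can reverse it
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(f"Decoded credentials: {decoded}")
```

Because the credentials are trivially recoverable from the header, Basic Auth should only be sent over HTTPS.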

Token-Based Authentication

Modern APIs use API keys or bearer tokens passed via the Authorization header.
For GitHub PATs, use Authorization: token <PAT> or Authorization: Bearer <PAT>; for OAuth2, Authorization: Bearer <token>.
Always load tokens from environment variables to avoid hardcoding secrets.

import requests
import os
from dotenv import load_dotenv

load_dotenv(override=True)

token = os.getenv("GH_PAT", "")
# Never print the secret itself; report only whether it was loaded
print(f"Token loaded: {bool(token)}")

urls = {
    "public_endpoint": f"{GITHUB_ENDPOINT}/zen",
    "protected_endpoint": f"{GITHUB_ENDPOINT}/user",
}

for description, url in urls.items():
    try:
        headers = {
            "Authorization": f"Bearer {token}"
        }
        res = requests.get(url, headers=headers, timeout=10)
        res.raise_for_status()
        print(f"Status code: {res.status_code}")
        print(f"Authenticated user: {res.json().get('login')}")
    except requests.exceptions.JSONDecodeError:
        print("Invalid JSON in response body. Defaulting to text:")
        print(res.text[:200])
    except requests.exceptions.HTTPError as err:
        print(err)

Common Pitfalls & How to Avoid Them

Using the wrong header format (e.g., Bearer vs token) causes 401/403 errors. Follow API docs.
Hardcoding secrets risks accidental exposure; always use environment variables or secret managers.

Handling Errors and Status Codes

HTTP status codes communicate the outcome of an API request, and handling them correctly is key to robust automation.
A simple 200 OK means success, while codes like 404 Not Found or 500 Internal Server Error indicate different failure modes.
In this lecture, we’ll learn how to check status codes, use response.ok, raise errors automatically, and inspect error details for troubleshooting.

Understanding HTTP Status Codes

Status codes are grouped by their first digit: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), 5xx (server error).
Examples include 200 OK, 201 Created, 301 Moved Permanently, 404 Not Found, and 500 Internal Server Error.
Knowing these categories helps you decide how to handle each response in your scripts.
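
Since the category is determined by the first digit, it can be derived with integer division. This is a minimal sketch; classify_status is a hypothetical helper name.

```python
def classify_status(code):
    """Map an HTTP status code to its category by leading digit."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return categories.get(code // 100, "unknown")


for code in (200, 201, 301, 404, 500):
    print(f"{code}: {classify_status(code)}")
```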

Checking `response.status_code`

After a requests call, the integer response.status_code tells you the exact HTTP code returned.
You can compare it directly (e.g., if response.status_code == 404:) to implement custom logic based on the code.
This explicit check is useful when you need fine-grained control over specific status codes.

GITHUB_ENDPOINT = "https://api.github.com"
HTTPBIN_ENDPOINT = "https://httpbin.org"

import requests

urls = {
    "ok": f"{GITHUB_ENDPOINT}/zen",
    "not_found": f"{GITHUB_ENDPOINT}/nonexistentendpoint"
}

for description, url in urls.items():
    response = requests.get(url, timeout=5)
    print(f"{description}: status {response.status_code}")

Using `response.ok`

The boolean response.ok is True for any status code below 400 (1xx, 2xx, 3xx) and False for 4xx/5xx errors.
This provides a quick success/failure check without examining the numeric code directly.
It’s a handy shorthand when you only need to know if the request broadly succeeded.

import requests

urls = {
    "ok": f"{GITHUB_ENDPOINT}/zen",
    "not_found": f"{GITHUB_ENDPOINT}/nonexistentendpoint"
}

for description, url in urls.items():
    response = requests.get(url, timeout=5)
    outcome = "Yes" if response.ok else f"No. Failed with status {response.status_code}"
    print(f"{description}: ok? {outcome}")

Automatic Error Raising with `raise_for_status()`

Calling response.raise_for_status() will do nothing on 1xx, 2xx and 3xx codes but raise an HTTPError on 4xx/5xx.
This follows the EAFP (“Easier to Ask for Forgiveness than Permission”) style: try the request, and catch errors if they occur.
The caught exception carries the original response in its response attribute, letting you inspect headers and body.

import requests
import json

urls = {
    "ok": f"{GITHUB_ENDPOINT}/zen",
    "not_found": f"{GITHUB_ENDPOINT}/nonexistentendpoint"
}

for url in urls.values():
    print(f"Requesting: {url}")
    try:
        res = requests.get(url, timeout=5)
        res.raise_for_status()
        print("  Success!")
    except requests.exceptions.HTTPError as err:
        print(f"  HTTPError: {err} (status {err.response.status_code})")
        try:
            details = err.response.json()
            print("  Error details:")
            print(json.dumps(details, indent=2))
        except ValueError:
            print(f"  Non-JSON response body: {err.response.text[:100]}")

Common Pitfalls & How to Avoid Them

Not checking errors: Treating any response as success can mask failures. Always use ok or raise_for_status().
Catching too broadly: A generic except Exception: hides HTTP errors. Catch HTTPError specifically.
Ignoring error bodies: APIs often return JSON error messages; inspect response.text or response.json().

Making HTTP Requests

The requests library simplifies HTTP interactions by abstracting raw HTTP details, making it ideal for DevOps automation tasks.
It allows you to query web services, trigger CI/CD builds, manage cloud resources, and integrate with APIs like GitHub in a straightforward way.
In this notebook, we'll demonstrate installing requests, performing GET and POST requests, inspecting response data, and customizing requests with parameters and headers.
The requests library is a third-party package and must be installed in your active virtual environment.
Install it with pip install requests==2.32.2. The version is pinned here so that everyone works with the same release; in your own projects you can omit the version to get the latest, but consider pinning it in requirements.txt for reproducible installs.

GITHUB_ENDPOINT = "https://api.github.com"

Making GET Requests with `requests.get()`

The GET method retrieves data from a specified URL; it’s the most common HTTP request type.
Key parameters include url, optional params for query strings, headers for custom HTTP headers, and timeout to avoid hanging requests.
The returned Response object provides .status_code, .headers, .text, .content, and .json(), plus .raise_for_status() to handle HTTP errors.

import requests
import json

response = requests.get(GITHUB_ENDPOINT, timeout=10)

print(f"Status code: {response.status_code}")
print(f"Content-Type: {response.headers.get('Content-Type')}")

"""
# Commenting out for brevity, but leaving for documentation

print(".text attribute:")
print(response.text)
print("\n")
print(".content attribute:")
print(response.content)
print("\n")
print(".json() method:")
print(response.json())
"""

data = response.json()
print("Available endpoints:")
print(json.dumps(data, indent=2))

Passing URL Parameters with `params`

Query parameters are passed as a dictionary to the params argument, and requests handles URL-encoding automatically.
This makes it easy to filter, sort, or paginate API results without manually constructing the query string.
You can inspect the final URL via response.url to confirm your parameters were applied correctly.

import requests
import json

search_url = f"{GITHUB_ENDPOINT}/search/repositories"
query_params = {
    "q": "python devops",
    "sort": "stars",
    "order": "desc",
    "per_page": 5
}

response = requests.get(search_url, params=query_params, timeout=10)
response.raise_for_status()

print(f"Requested URL: {response.url}")
results = response.json()

print(f"Found {results.get('total_count')} repositories. Top 5:")
items = results.get("items", [])
for repo in items:
    print(f"- {repo['name']} (Stars: {repo['stargazers_count']})")

# Show the full payload of the first result, if the search returned any
if items:
    print(json.dumps(items[0], indent=2))
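
To see what URL-encoding does to a parameter dictionary without sending a request, the standard library's urllib.parse.urlencode can be used offline. This only illustrates the encoding step, under the assumption that it approximates what requests produces for simple values; response.url remains the authoritative check.

```python
from urllib.parse import urlencode

search_url = "https://api.github.com/search/repositories"
query_params = {
    "q": "python devops",
    "sort": "stars",
    "order": "desc",
    "per_page": 5,
}

# Spaces and other reserved characters are escaped; values are converted to strings
query_string = urlencode(query_params)
full_url = f"{search_url}?{query_string}"

print(full_url)
```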

Making POST Requests with `requests.post()`

Use requests.post() to send data to a server, choosing between data= for form-encoded bodies or json= for JSON payloads.
Providing a dictionary to json= automatically serializes it and sets Content-Type: application/json.
The response can be inspected similarly to GET responses, using .status_code, .json(), and error handling.

import requests
import json

post_echo_url = "https://httpbin.org/post"

payload = {
    "script_name": "devops_automation",
    "action": "trigger_deployment",
    "environment": "staging",
    "version": "v1.5.0"
}

response = requests.post(post_echo_url, json=payload, timeout=10)
response.raise_for_status()

print(json.dumps(response.json(), indent=2))

Common Pitfalls & How to Avoid Them

Not setting timeouts can cause scripts to hang indefinitely; always include a timeout value.
Ignoring HTTP errors means you might assume success when a request failed; use response.raise_for_status().
Using data= instead of json= sends form-encoded data, which may be rejected by modern APIs expecting JSON.
Hardcoding secrets in code is insecure; use environment variables (e.g., via python-dotenv) and pass them in headers.

Handling Subprocess Errors

External commands can fail in multiple ways: non-zero exit codes, missing executables, or hanging processes.
Using subprocess.run(..., check=True) shifts return-code checks into exceptions you can catch.
Specific exception types (CalledProcessError, FileNotFoundError, TimeoutExpired) let you distinguish failure modes and respond appropriately.

subprocess.CalledProcessError Attributes

e.returncode: the non-zero exit status of the command.
e.cmd: the exact command invoked (list or string form).
e.stdout / e.output: captured standard output, if capture_output=True.
e.stderr: captured standard error, if capture_output=True.
These attributes let you log or display detailed diagnostics when a command fails.

import subprocess

cmd = ["ls", "missing_dir"]

try:
    subprocess.run(cmd, check=True, capture_output=True, text=True)
except subprocess.CalledProcessError as err:
    print(f"Command executed: {err.cmd}")
    print(f"Return code {err.returncode}")
    print(f"STDOUT capture: {err.stdout}")
    print(f"STDERR capture: {err.stderr}")

Handling FileNotFoundError

If the executable itself isn’t in PATH, subprocess.run() raises FileNotFoundError before running.
Catching it separately lets you inform the user that a required tool isn’t installed, rather than treating it as a generic failure.

import subprocess

cmd = ["fakecmd", "--version"]

try:
    subprocess.run(cmd, check=True, capture_output=True, text=True)
except FileNotFoundError as err:
    print("FileNotFoundError caught!")
    print(f"  The command '{cmd[0]}' was not found on this system.")

Handling subprocess.TimeoutExpired

Adding timeout=<seconds> to subprocess.run() kills the process if it runs too long.
A TimeoutExpired exception is raised, containing cmd, timeout, and any partial stdout/stderr.
Use this to prevent hung scripts and to implement retry or fallback logic.

import subprocess

cmd = ["sleep", "5"]

try:
    subprocess.run(cmd, timeout=2, capture_output=True, text=True)
    print("Command completed within timeout.")
except subprocess.TimeoutExpired as err:
    print("TimeoutExpired caught!")
    print(f"  Command: {err.cmd}")
    print(f"  Timeout after {err.timeout} seconds")

Recommended Error Handling Strategy

Wrap subprocess.run() in a try block.
First catch FileNotFoundError to detect missing executables.
Next catch subprocess.TimeoutExpired if you use timeouts.
Then catch subprocess.CalledProcessError for non-zero exits.
Finally, if necessary, an except Exception block can log any other unexpected issues.
This layered approach keeps your script robust and your errors informative.
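
The layered strategy above can be sketched as a single helper. This is one possible arrangement, not a definitive implementation: run_command and its return labels are hypothetical, and sys.executable is used so the demo does not depend on any particular CLI tool being installed.

```python
import subprocess
import sys


def run_command(cmd, timeout=10):
    """Run cmd and return a short outcome label plus a detail string."""
    try:
        result = subprocess.run(
            cmd, check=True, capture_output=True, text=True, timeout=timeout
        )
        return "ok", result.stdout.strip()
    except FileNotFoundError:
        return "missing executable", cmd[0]
    except subprocess.TimeoutExpired as err:
        return "timeout", f"exceeded {err.timeout}s"
    except subprocess.CalledProcessError as err:
        return "non-zero exit", f"code {err.returncode}: {err.stderr.strip()}"
    except Exception as err:  # last resort for anything unexpected
        return "unexpected error", repr(err)


# sys.executable keeps the demo portable across operating systems
commands = [
    [sys.executable, "-c", "print('all good')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "import time; time.sleep(5)"],
    ["fakecmd-that-does-not-exist", "--version"],
]

for cmd in commands:
    # Use a short timeout only for the deliberately slow command
    timeout = 1 if "sleep" in cmd[-1] else 10
    outcome, detail = run_command(cmd, timeout=timeout)
    print(f"{outcome}: {detail}")
```

Returning a label instead of printing inside each except block keeps the failure modes distinguishable for callers that want to retry, fall back, or abort.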

Running External Commands with `subprocess.run`

DevOps automation often requires invoking existing CLI tools or scripts to leverage their functionality without re-implementing it in Python.
The subprocess module provides a secure and flexible interface to spawn child processes, control their input/output streams, and inspect their exit statuses.
The modern recommended method is subprocess.run(), which combines execution, output capture, and error handling in a single call.

import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('Hello from subprocess.')"],
    capture_output=True,
    text=True
)

print(f"Return code: {result.returncode}")
print(f"Stdout: {result.stdout.strip()}")

Why `subprocess`? The Old Ways

Older approaches like os.system() invoke a shell directly, making them vulnerable to injection and offering limited control over I/O streams.
The subprocess module was introduced to provide finer control, better security, and a consistent API across platforms.
Functions such as subprocess.call(), check_output(), and Popen exist, but subprocess.run() (Python 3.5+) simplifies most common use cases into one interface.

The subprocess.run() Function

args should be a list of strings where the first element is the command and the rest are its parameters.
capture_output=True captures both stdout and stderr into the returned CompletedProcess.
text=True decodes bytes into strings using the system’s default encoding.
check=True raises a CalledProcessError for non-zero exit codes, allowing you to handle failures via exceptions.
shell=False (the default) avoids invoking a shell, preventing injection vulnerabilities; use shell=True only if you fully control the command string.
The returned CompletedProcess has attributes args, returncode, stdout, and stderr for introspection.

import subprocess
import sys

cmd = [
    sys.executable,
    "-c",
    """print("Hello from subprocess")
invalid_function()"""
]

result = subprocess.run(cmd, capture_output=True, text=True)
print(f"Args: {result.args}")
print(f"Stdout: {result.stdout.strip()}")
print(f"Stderr: {result.stderr.strip()}")
print(f"Return code: {result.returncode}")

Basic Command Execution

Construct your command as a list, choosing the tool and its arguments explicitly.
Use capture_output=True and text=True to get human-readable strings.
Inspect result.returncode to determine if the command succeeded (zero) or failed (non-zero).

import subprocess
import platform

if platform.system() == "Windows":
    # "ver" is a cmd.exe built-in, not an executable, so run it through cmd
    cmd = ["cmd", "/c", "ver"]
else:
    cmd = ["uname", "-a"]

result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout.strip())

Common Pitfalls & How to Avoid Them

Forgetting capture_output=True means result.stdout and result.stderr will be None, so you cannot inspect them.
Omitting text=True leaves you with raw bytes that require manual decoding.
Using check=False without checking result.returncode can let failures go unnoticed.
Invoking a shell with shell=True and untrusted input enables injection attacks—always prefer shell=False.
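
To illustrate why the list form with shell=False is safer, the sketch below passes a hostile-looking value as a single argument. The value of untrusted is a made-up example, and the child process is just the Python interpreter echoing its first argument.

```python
import subprocess
import sys

# Hypothetical untrusted value, e.g. taken from user input
untrusted = "report.txt; echo INJECTED"

# The child simply echoes its first argument back
child_code = "import sys; print(sys.argv[1])"

# With a list and the default shell=False, the value is one literal argument;
# no shell ever parses the ';'
result = subprocess.run(
    [sys.executable, "-c", child_code, untrusted],
    capture_output=True,
    text=True,
    check=True,
)

print(f"Child received: {result.stdout.strip()!r}")
```

Had the same value been interpolated into a command string and run with shell=True, a POSIX shell would have treated the semicolon as a command separator.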

Temporary Files and Directories

Automation scripts often need scratch space for intermediate data without cluttering the filesystem or risking name collisions.
Hardcoding names like /tmp/my_file.txt can lead to security issues, collisions, and manual cleanup.
The tempfile module provides secure, unique temporary files and directories with optional automatic cleanup.

Why Use the tempfile Module?

It creates files with secure default permissions, preventing unauthorized access on multiuser systems.
It generates unique names automatically, avoiding collisions when multiple script instances run concurrently.
It integrates with context managers (with), enabling automatic cleanup of resources when they're no longer needed.
It works across Windows, macOS, and Linux, choosing an appropriate temp location on each platform.

import tempfile
import os

temp_dir = tempfile.gettempdir()
print(f"Default temporary directory: {temp_dir}")
print(f"Sample contents: {os.listdir(temp_dir)[:5]}")

tempfile.TemporaryFile()

Creates an unnamed temporary file opened in binary or text mode.
On UNIX-like systems it typically has no name in the filesystem; on Windows it may appear but remains temporary.
The file is deleted automatically when closed or when the context block exits.
Ideal for internal scratch space that doesn’t need to be passed to external processes.

import tempfile

with tempfile.TemporaryFile(mode="w+t", encoding="utf-8") as temp_file:
    temp_file.write("This is some temporary data.")
    temp_file.seek(0)
    print("Content from TemporaryFile:")
    print(temp_file.read())

tempfile.NamedTemporaryFile()

Creates a temporary file with a visible name in the filesystem.
Default delete=True removes the file when closed; delete=False leaves it for manual cleanup.
Use when you need to pass a filename to another process or library.
Supports custom suffix, prefix, and dir parameters for naming and placement.

import tempfile
from pathlib import Path

# Auto-delete on with exit
path = None

with tempfile.NamedTemporaryFile(mode="w+t", encoding="utf-8", suffix=".log") as temp_file:
    path = Path(temp_file.name)
    print(f"Created temp file at {path}. Exists: {path.exists()}")

print(f"After close. Exists? {path.exists()}")

# Persist after with exit
path_persistent = None

with tempfile.NamedTemporaryFile(
    mode="w+t",
    encoding="utf-8",
    suffix=".log",
    delete=False
) as temp_file:
    path_persistent = Path(temp_file.name)
    print(f"Created temp file at {path_persistent}. Exists: {path_persistent.exists()}")

print(f"After close. Exists? {path_persistent.exists()}")

if path_persistent.exists():
    path_persistent.unlink()

print(f"After unlink. Exists? {path_persistent.exists()}")

tempfile.TemporaryDirectory()

Creates a new temporary directory; in a with block, the as target is its path as a string.
When used in a with block, the directory and everything inside it are deleted on exit.
Ideal for workflows that produce multiple temporary files or subdirectories.

import tempfile
from pathlib import Path

temp_path = None

with tempfile.TemporaryDirectory(prefix="batch_job_") as temp_dir:
    print(f"{temp_dir} - type: {type(temp_dir)}")
    temp_path = Path(temp_dir)
    (temp_path / "file1.txt").write_text("data")
    subdir = temp_path / "subdir"
    subdir.mkdir(exist_ok=True)
    (subdir / "file2.txt").write_text("data2")
    print(f"Contents: {[p.name for p in temp_path.iterdir()]}")

print(f"After close. Exists? {temp_path.exists()}")

Common Pitfalls & How to Avoid Them

Calling os.rmdir() or Path.rmdir() on a non-empty directory raises an error; use shutil.rmtree() for recursive deletion.
Forgetting to delete files created with delete=False in NamedTemporaryFile can leave orphaned files.
On Windows, a NamedTemporaryFile that is still open generally can’t be opened again by name, for example by another process. Use delete=False and close the file before sharing its name.
A TemporaryFile cannot be located by name, since it may never have had one; use NamedTemporaryFile when a path is required.
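
The delete=False pattern for sharing a temporary file with another process can be sketched as follows. This is an illustrative example, with the Python interpreter standing in for the external tool that needs a filename.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Create the file, write to it, and close it by leaving the with block
with tempfile.NamedTemporaryFile(
    mode="w", encoding="utf-8", suffix=".txt", delete=False
) as temp_file:
    temp_file.write("shared via a temporary file")
    temp_path = Path(temp_file.name)

try:
    # The file is closed now, so another process can open it by name
    reader = "import sys; print(open(sys.argv[1], encoding='utf-8').read())"
    result = subprocess.run(
        [sys.executable, "-c", reader, str(temp_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    child_output = result.stdout.strip()
    print(f"Child read: {child_output}")
finally:
    # delete=False means cleanup is our responsibility
    temp_path.unlink()

print(f"After cleanup. Exists? {temp_path.exists()}")
```

The try/finally ensures the file is removed even if the child process fails.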

Filesystem Operations (os & shutil)

DevOps scripts often need to create, delete, copy, and move files and directories as part of automation workflows.
The os module provides low-level filesystem functions, while shutil offers higher-level operations like copying and recursive removal.
These tools work hand-in-hand with pathlib (for path manipulation) to build robust file management scripts.

Listing Directory Contents

Use os.listdir(path) to get a list of entry names (files and subdirectories) in a directory.
Use Path(path).iterdir() to iterate over Path objects, which you can query further with methods like .is_file() or .is_dir().
os.listdir returns a plain list of strings; iterdir() yields full Path objects, making downstream operations more convenient.

import os
from pathlib import Path
import shutil

"""
Directory structure:

temp_listing_dir/
├── file1.txt
├── file2.log
└── subdir/
    └── subfile.py
"""

tmp_path = Path("temp_listing_dir")
tmp_path.mkdir(exist_ok=True)
(tmp_path / "file1.txt").touch()
(tmp_path / "file2.log").touch()
(tmp_path / "subdir").mkdir(exist_ok=True)
(tmp_path / "subdir" / "subfile.py").touch()

print(f"--- os.listdir(\"{tmp_path}\") ---")
for name in os.listdir(tmp_path):
    print(name)

print(f"--- Path(\"{tmp_path}\").iterdir() ---")
for entry in tmp_path.iterdir():
    print(entry)

shutil.rmtree(tmp_path)
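
Because iterdir() yields Path objects, entries can be filtered by type directly. The sketch below separates files from directories inside a throwaway temporary directory; the file names are arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir:
    root = Path(temp_dir)
    (root / "file1.txt").touch()
    (root / "file2.log").touch()
    (root / "subdir").mkdir()

    # sorted() gives a stable order; iterdir() itself makes no ordering promise
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    dirs = sorted(p.name for p in root.iterdir() if p.is_dir())

print(f"Files: {files}")
print(f"Directories: {dirs}")
```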

Creating Directories

os.mkdir(path) creates a single directory and fails if parents don’t exist or if it already exists.
os.makedirs(path, exist_ok=False) creates all intermediate directories; set exist_ok=True to ignore an existing target directory.
Path(path).mkdir(parents=True, exist_ok=True) is the pathlib equivalent for recursive, idempotent creation.

from pathlib import Path
import shutil

single = Path("my_single_dir")

try:
    single.mkdir(exist_ok=True)
    print(f"Created {single}: {single.exists()}")
finally:
    if single.exists():
        single.rmdir()

nested = Path("parent/child/grandchild")
nested.mkdir(parents=True, exist_ok=True)
print(f"Created nested path {nested}: {nested.exists()}")

shutil.rmtree("parent")
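
For comparison, here is a small sketch of the os-level functions described above, run inside a temporary directory so nothing is left behind. The flag variables (mkdir_failed, repeat_failed, created) exist only to report what happened.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    nested = os.path.join(base, "parent", "child")

    try:
        os.mkdir(nested)  # fails: "parent" does not exist yet
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True

    os.makedirs(nested)                 # creates parent/ and parent/child/
    os.makedirs(nested, exist_ok=True)  # no error when it already exists

    try:
        os.makedirs(nested)  # exist_ok defaults to False
        repeat_failed = False
    except FileExistsError:
        repeat_failed = True

    created = os.path.isdir(nested)

print(f"os.mkdir failed without parents: {mkdir_failed}")
print(f"os.makedirs failed on existing dir: {repeat_failed}")
print(f"Nested directory was created: {created}")
```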

Removing Files and Directories

os.remove(path) or Path(path).unlink() deletes a single file and raises FileNotFoundError if it is missing; Path.unlink(missing_ok=True) (Python 3.8+) suppresses that error.
os.rmdir(path) or Path(path).rmdir() removes an empty directory only.
shutil.rmtree(path) recursively deletes a directory tree and all contents; use with extreme caution.

from pathlib import Path
import shutil

"""
Directory structure:

.
├── temp_file.txt
├── empty_dir/
└── tree_root/
    └── child/
        └── inner.txt
"""

temp_file = Path("temp_file.txt")
temp_file.touch()

empty_dir = Path("empty_dir")
empty_dir.mkdir(exist_ok=True)

tree = Path("tree_root/child")
tree.mkdir(parents=True, exist_ok=True)
(tree / "inner.txt").touch()

temp_file.unlink()
print(f"Removed file {temp_file}. Exists? {temp_file.exists()}")
empty_dir.rmdir()
print(f"Removed dir {empty_dir}. Exists? {empty_dir.exists()}")

shutil.rmtree("tree_root")
print(f"Removed \"tree_root\" recursively. Exists? {tree.exists()}")

Copying Files and Directories

shutil.copy(src, dst) copies a file’s contents and permission bits, but not other metadata such as timestamps.
shutil.copy2(src, dst) copies files and attempts to preserve metadata.
shutil.copytree(src_dir, dst_dir, dirs_exist_ok=False) recursively copies an entire directory tree.

import shutil
from pathlib import Path

"""
Directory structure:

src_copy/
├── a.txt
└── sub/
    └── b.txt
"""

src = Path("src_copy")
src.mkdir(exist_ok=True)
(src / "a.txt").write_text("A")
(src / "sub").mkdir(exist_ok=True)
(src / "sub" / "b.txt").write_text("B")

dest_file = Path("copied_a.txt")
dest_file_metadata = Path("copied_a_metadata.txt")
shutil.copy(src / "a.txt", dest_file)
shutil.copy2(src / "a.txt", dest_file_metadata)

dest_dir = Path("copied_src")

if dest_dir.exists():
    shutil.rmtree(dest_dir)

shutil.copytree(src, dest_dir)

shutil.rmtree("src_copy")
shutil.rmtree(dest_dir)
dest_file.unlink()
dest_file_metadata.unlink()
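
The difference between shutil.copy and shutil.copy2 is easiest to see on the modification time. In this sketch the source file is backdated to an arbitrary past moment so that a preserved timestamp stands out; the file names are made up for the demo.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir:
    root = Path(temp_dir)
    src = root / "a.txt"
    src.write_text("A")

    # Backdate the source so a preserved timestamp is easy to spot
    old_time = 1_000_000_000  # September 2001, an arbitrary past moment
    os.utime(src, (old_time, old_time))

    plain = Path(shutil.copy(src, root / "copy.txt"))
    with_meta = Path(shutil.copy2(src, root / "copy2.txt"))

    src_mtime = int(src.stat().st_mtime)
    plain_mtime = int(plain.stat().st_mtime)
    meta_mtime = int(with_meta.stat().st_mtime)

print(f"source mtime: {src_mtime}")
print(f"copy   mtime: {plain_mtime}")
print(f"copy2  mtime: {meta_mtime}")
```

The plain copy gets a fresh modification time, while copy2 carries over the source's timestamp.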

Moving Files and Directories

Use shutil.move(src, dst) to move or rename files and directories in one step.
If dst is an existing directory, src is moved into it; if dst names a file, src is renamed there.
Moving across filesystems may involve a copy-and-delete under the hood.

import shutil
from pathlib import Path

"""
Directory structure:

.
├── move_me.txt
├── move_dir/
│   └── inside.txt
└── dest_folder/
"""

file_src = Path("move_me.txt")
file_src.write_text("Moving file.")

dir_src = Path("move_dir")
dir_src.mkdir(exist_ok=True)
(dir_src / "inside.txt").write_text("Inside source dir.")

dest_dir = Path("dest_folder")
dest_dir.mkdir(exist_ok=True)

try:
    shutil.move(file_src, dest_dir)
except Exception as e:
    print(f"Error occurred: {e}")

file_src2 = dest_dir / file_src.name
new_name = Path("renamed.txt")
shutil.move(file_src2, new_name)

try:
    shutil.move(dir_src, dest_dir)
except Exception as e:
    print(f"Error occurred: {e}")

shutil.rmtree(dest_dir)
if new_name.exists():
    new_name.unlink()

Common Pitfalls & How to Avoid Them

PermissionError: Operations fail if the script lacks rights. Ensure correct ownership or run with appropriate privileges.
Non-empty Directories: os.rmdir() and Path.rmdir() only remove empty dirs. Use shutil.rmtree() for recursive deletion, but do so carefully.
Existing Destinations: shutil.copytree() errors if the target exists unless dirs_exist_ok=True. Consider pre-cleanup or that flag.
Irreversible Deletions: There is no undo for os.remove, os.rmdir, or shutil.rmtree(). Add confirmation or dry-run modes when deleting!
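
A dry-run mode, as suggested in the last point, could look like the sketch below. remove_tree and its dry_run flag are hypothetical names; the function defaults to reporting only, so a real deletion has to be requested explicitly.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tree(path, dry_run=True):
    """Delete a directory tree, or only report what would be deleted."""
    path = Path(path)

    if dry_run:
        for target in sorted(path.rglob("*")):
            print(f"[dry-run] would remove {target}")
        print(f"[dry-run] would remove {path}")
        return False

    shutil.rmtree(path)
    return True


# Build a throwaway tree inside a temporary directory for the demo
workspace = Path(tempfile.mkdtemp(prefix="dry_run_demo_"))
(workspace / "child").mkdir()
(workspace / "child" / "inner.txt").write_text("data")

remove_tree(workspace)                 # default: report only
still_there = workspace.exists()
remove_tree(workspace, dry_run=False)  # actually delete
gone = not workspace.exists()

print(f"Exists after dry run? {still_there}")
print(f"Removed after real run? {gone}")
```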

Working with Environment Variables

Environment variables are dynamic, named values provided by the operating system to running processes, enabling configuration of behavior without code modifications.
They allow applications to adapt across development, staging, and production environments by externalizing configuration data such as API keys, file paths, and feature flags.
Python’s os module offers simple interfaces to access and manage these variables, promoting separation of code and configuration.

import os

for key in ["HOME", "SHELL"]:
    value = os.getenv(key)
    print(f"{key} = {value if value else 'Not set'}")

env_keys = list(os.environ.keys())
print(f"We have {len(env_keys)} environment variables available!")

for key in env_keys[:5]:
    print(key)

Accessing Environment Variables with `os.getenv()`

The os.getenv function retrieves the value of an environment variable by key, returning None or a provided default if the key is not found.
It prevents KeyError exceptions by offering a safe access pattern for optional configuration settings.
Since environment variables are always strings, any expected non-string types require explicit conversion after retrieval.

import os 

os.environ["APP_API_KEY"] = "ab12cd34"

api_key = os.getenv("APP_API_KEY")
debug_mode = os.getenv("DEBUG_MODE", "false")  # keep the default a string, like real values

if api_key:
    print(f"API key found: {api_key[:4]}... (masked)")
else:
    print("APP_API_KEY not set.")

print(f"Debug mode: {debug_mode}")
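
Because every value arrives as a string, a small conversion layer is often useful. The following is a minimal sketch; the helper names env_bool and env_int, the variable names, and the set of accepted truthy spellings are assumptions made for illustration.

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name: str, default: int) -> int:
    """Convert to int; int() raises ValueError if the value is not numeric."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return int(raw)


# Hypothetical variables, set here only so the demo is self-contained.
os.environ["APP_DEBUG"] = "TRUE"
os.environ["APP_WORKERS"] = "4"
os.environ.pop("APP_TIMEOUT", None)

debug = env_bool("APP_DEBUG")
workers = env_int("APP_WORKERS", default=1)
timeout = env_int("APP_TIMEOUT", default=30)  # not set, so the default is used
print(debug, workers, timeout)
```

Note that bool("false") is True in Python, which is why the boolean helper compares against known spellings instead of calling bool() on the raw string.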

Accessing Environment Variables with `os.environ`

os.environ behaves like a dictionary mapping environment variable names to their string values.
Accessing a missing key via os.environ['KEY'] raises a KeyError, making it suitable for mandatory variables.
One should guard against missing keys by checking membership or catching KeyError to handle critical configuration errors.

import os

try:
    java_home = os.environ["JAVA_HOME"]
    print(java_home)
except KeyError:
    print("JAVA_HOME environment variable not set.")
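
When several variables are mandatory, a membership check can report all missing names at once instead of failing on the first KeyError. This is a sketch under assumptions: the variable names and the helper missing_variables are hypothetical.

```python
import os

REQUIRED_VARS = ["APP_DB_HOST", "APP_DB_USER"]  # hypothetical names


def missing_variables(names):
    """Return the names that are absent from the current environment."""
    return [name for name in names if name not in os.environ]


# Set one and remove the other so the demo has a predictable result.
os.environ["APP_DB_HOST"] = "localhost"
os.environ.pop("APP_DB_USER", None)

missing = missing_variables(REQUIRED_VARS)
if missing:
    print(f"Missing required environment variables: {', '.join(missing)}")
```

A real script would typically exit with a non-zero status at this point rather than continue with incomplete configuration.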

Setting Environment Variables Within Python

While environment variables are typically set externally, os.environ can be modified at runtime to affect the current process and its children.
Assigning to os.environ['KEY'] makes the variable available to any subprocesses spawned by the script.
Deleting an entry from os.environ removes it for subsequent operations within the process, but changes do not persist after the script exits.

import os
import sys
import subprocess

print(f"Initial MY_CUSTOM_VAR: {os.getenv("MY_CUSTOM_VAR")}")

os.environ["MY_CUSTOM_VAR"] = "SetByOurScript"
print(f"Updated MY_CUSTOM_VAR: {os.getenv("MY_CUSTOM_VAR")}")

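result = subprocess.run(
    [
        sys.executable,
        "-c",
        """import os
print(f"Child sees MY_CUSTOM_VAR: {os.getenv('MY_CUSTOM_VAR')}")""",
    ],
    capture_output=True,  # without this, result.stdout is None
    text=True,            # decode the captured bytes to str
)

print(result.stdout)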

del os.environ["MY_CUSTOM_VAR"]
print(f"After deletion, MY_CUSTOM_VAR: {os.getenv("MY_CUSTOM_VAR")}")
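
If only the child process should see a variable, one option is to pass a modified copy of the environment through the env parameter of subprocess.run instead of mutating os.environ. A minimal sketch, with MY_CHILD_ONLY_VAR as a made-up name:

```python
import os
import subprocess
import sys

# Make sure the parent does not already define the (hypothetical) variable.
os.environ.pop("MY_CHILD_ONLY_VAR", None)

# Copy the current environment and extend only the copy.
child_env = os.environ.copy()
child_env["MY_CHILD_ONLY_VAR"] = "OnlyForChild"

result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getenv('MY_CHILD_ONLY_VAR'))"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)

child_value = result.stdout.strip()
parent_value = os.getenv("MY_CHILD_ONLY_VAR")
print(f"Child sees: {child_value}")
print(f"Parent sees: {parent_value}")
```

Starting from os.environ.copy() keeps PATH and other inherited variables intact, which the child usually needs.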

Using dotenv to Manage Local Environment Files

The python-dotenv library lets you keep sensitive and environment-specific values in a .env file instead of the shell.
A .env file lives alongside your script and contains lines like KEY=value; it's loaded at runtime into os.environ.
Install with pip install python-dotenv==1.1.0 (the version is pinned here so that everyone following along gets the same behavior; the pin can be omitted elsewhere), then call load_dotenv() before any os.getenv calls.
This approach keeps your shell clean and makes it easy to commit example .env.example files without secrets.
Remember not to commit actual .env files with real secrets! Add them to .gitignore.

import os
from dotenv import load_dotenv

os.environ["MY_DOTENV_VAR"] = "setFromJupyter"

load_dotenv(override=True)

secret_dotenv_value = os.getenv("MY_DOTENV_VAR")

print(f"Retrieved MY_DOTENV_VAR with value {secret_dotenv_value}")

Common Pitfalls & How to Avoid Them

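Environment variable names are case-sensitive on Linux and macOS, but on Windows os.environ converts names to uppercase, so lookups there are case-insensitive; inconsistent casing can therefore work on one platform and produce missing values on another.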
Forgetting that all environment variable values are strings can cause type errors; always convert to the intended type like int or bool after retrieval.
Accessing a missing mandatory variable via os.environ raises KeyError; avoid unhandled errors by checking membership or catching exceptions.
Storing highly sensitive secrets in plain environment variables carries security risks; for production use, consider managed secrets solutions like Vault or AWS Secrets Manager.

import os
from dotenv import load_dotenv

load_dotenv(override=True)

number_dotenv_value = os.getenv("MY_NUMBER_VAR")

print(type(number_dotenv_value))
# print(number_dotenv_value + 45) # Uncommenting will raise TypeError because number_dotenv_value is a string!

Working with CSV files/shaare/lDIjmw

python

Working with CSV files

CSV (Comma Separated Values) is a plain-text tabular format where each line is a row and fields are delimited (commonly by commas).
Widely used for spreadsheets, database exports, DevOps reports or inventories.
Python’s built-in csv module handles reading, writing, quoting, delimiters, headers, and dialects.
Always open files with newline='' and encoding='utf-8' for cross-platform consistency.

CSV Format Basics

Each row represents a record; fields separated by a delimiter (comma by default).
Optional header row defines column names.
Fields containing delimiters, quotes, or newlines must be quoted (usually with double quotes).
Alternative delimiters (tabs, semicolons) and quoting conventions are supported via dialects and parameters.
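
The quoting and delimiter rules above can be observed with in-memory buffers. This is a small sketch; the sample rows are invented for illustration.

```python
import csv
import io

rows = [
    ["hostname", "note"],
    ["web01", "frontend, prod"],   # contains the default delimiter
    ["db01", 'says "hello"'],      # contains the quote character
]

# Default dialect: comma delimiter, fields quoted only when necessary.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
comma_text = buffer.getvalue()
print(comma_text)

# Reading the text back restores the original fields.
round_trip = list(csv.reader(io.StringIO(comma_text, newline="")))

# With a semicolon delimiter, the comma no longer forces quoting.
semi_buffer = io.StringIO(newline="")
csv.writer(semi_buffer, delimiter=";").writerows(rows)
semi_text = semi_buffer.getvalue()
print(semi_text)
```

Embedded double quotes are escaped by doubling them, which is why the second data row is written with two quote characters around hello.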

Reading CSV files with `csv.reader`

Iterates over rows, returning each as a list of strings.
Use next(reader) to skip or extract the header.
Accepts delimiter, quotechar, and other formatting parameters.

import csv
from pathlib import Path

csv_path = Path("servers.csv")

with csv_path.open("r", encoding="utf-8", newline="") as file:
    reader = csv.reader(file)
    header = next(reader)
    print(f"Header: {header}")

    for idx, row in enumerate(reader, start=1):
        print(f"Row {idx}: {row}")

Reading with `csv.DictReader`

Reads rows into dictionaries using the header row as keys.
Access fields by column name instead of index.
Optional fieldnames argument overrides header names.

import csv
from pathlib import Path

csv_path = Path("servers.csv")

with csv_path.open("r", encoding="utf-8", newline="") as file:
    dict_reader = csv.DictReader(file)
    print(f"Fieldnames: {dict_reader.fieldnames}")

    for idx, record in enumerate(dict_reader, start=1):
        print(f"Record {idx}: {record}")

Example of servers.csv

hostname,ip_address,role,status,tags
web01,10.0.1.5,webserver,running,"frontend,prod"
db01,10.0.2.10,database,maintenance,"backend,staging"
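
Using the sample rows above, a DictReader loop might filter by status and split the quoted tags column. This sketch reads from an in-memory copy of the data so that it runs without the file; the variable names are illustrative.

```python
import csv
import io

# In-memory copy of the servers.csv sample shown above.
sample = io.StringIO(
    "hostname,ip_address,role,status,tags\n"
    'web01,10.0.1.5,webserver,running,"frontend,prod"\n'
    'db01,10.0.2.10,database,maintenance,"backend,staging"\n',
    newline="",
)

running_hosts = []
tags_by_host = {}

for record in csv.DictReader(sample):
    # The quoted field arrives as one string; split it into individual tags.
    tags_by_host[record["hostname"]] = record["tags"].split(",")
    if record["status"] == "running":
        running_hosts.append(record["hostname"])

print(f"Running hosts: {running_hosts}")
print(f"Tags: {tags_by_host}")
```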

Writing with `csv.writer`

Write rows from lists using .writerow() or .writerows().
Open file with newline='' to avoid blank lines.
Control delimiter and quoting via parameters.

import csv
from pathlib import Path

data = [
    ["hostname", "ip_address", "role"],
    ["web02", "10.0.1.6", "webserver"],
    ["app01", "10.0.3.15", "application"],
]

out_path = Path("output_basic.csv")

with out_path.open("w", encoding="utf-8", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)

Writing with `csv.DictWriter`

Write dictionaries using fieldnames to define header and column order.
Call .writeheader() before .writerows().

import csv
from pathlib import Path

records = [
    {
        "host": "web01",
        "port": "80",
        "status": "running"
    },
    {
        "host": "db02",
        "status": "maintenance",
        "tags": "prod,finance"
    }
]

out_dict_path = Path("output_dict.csv")
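# Collect column names in first-seen order; a set would make the column order unpredictable.
fieldnames = []

for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)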

with out_dict_path.open("w", encoding="utf-8", newline="") as file:
    writer = csv.DictWriter(
        file,
        fieldnames=fieldnames,
        restval="undefined",
        extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(records)

Working with YAML files/shaare/d2D1ag

python

Working with YAML files

YAML (“YAML Ain’t Markup Language”) focuses on human readability. Indentation replaces braces and brackets, comments are allowed, and quoting is usually optional.
DevOps tooling (Kubernetes, Ansible, GitHub Actions, many app configs) standardizes on YAML for its clarity and brevity.
JSON is excellent for machine-to-machine communication, but its strict syntax (no comments, heavy quoting) can feel verbose to humans maintaining config files.
Python’s standard library lacks YAML support; PyYAML is the community-standard package to fill that gap.

YAML Syntax and Features

Structure comes from indentation with spaces; tab characters are not allowed for indentation.
Mappings use key: value; sequences use a leading hyphen (-) plus a space.
Scalars include strings, numbers, booleans (true / false; PyYAML follows YAML 1.1 and also treats yes / no and on / off as booleans), and null.
Comments begin with #.
Multi-line scalars can be literal (|) or folded (>).
Anchors (&) and aliases (*) avoid repetition by re-using defined blocks.
YAML is a superset of JSON: most valid JSON documents are also valid YAML.

import yaml

snippet = """
service: &svc
  name: user-api
  port: 8080
  enabled: true
  tags:
    - api
    - user
    - internal
staging:
  <<: *svc
  replicas: 2
production:
  <<: *svc
  replicas: 4
"""

parsed = yaml.safe_load(snippet)
print(parsed)

multiline_demo = """
literal: |
  line 1
  line 2
  line 3
folded: >
  This is a long string that
  could go out of screen, so
  we will break this up into
  multiple lines to improve
  readability.
"""
print("\n")
print(yaml.safe_load(multiline_demo))

Deserializing YAML with `yaml.safe_load`

Prefer yaml.safe_load (or passing Loader=yaml.SafeLoader) to prevent arbitrary-code execution; avoid yaml.load on untrusted data.
Accepts a string or an open text file handle and returns native Python structures.
Wrap calls in try / except yaml.YAMLError to catch malformed input.

import yaml
from pathlib import Path

compose = Path("compose.yaml")

try:
    with compose.open("r", encoding="utf-8") as file:
        config = yaml.safe_load(file)
        print(f"Compose version: {config["version"]}")

        for svc, options in config["services"].items():
            print(f"{svc.capitalize()} image\t: {options["image"]}")
except yaml.YAMLError as e:
    print("YAML error:")
    print(e)

Example of compose.yaml

version: '3.8'
services:
  web:
    image: myapp:latest
    ports:
      - "8000:80"
  redis:
    image: redis:alpine

Serializing Python Objects with `yaml.dump`

Use yaml.dump(obj, indent=2, default_flow_style=False, sort_keys=False) for readable block-style output.
Set stream to an open file handle to write directly; leave it None to return a string.

import yaml
from pathlib import Path

python_cfg = {
    "service": {"name": "listener-service", "port": 6789, "workers": 4, "enabled": False},
    "queues": ["high", "default", "low"],
    "retry_policy": None,
}

output_path = Path("listener_config.yaml")

with output_path.open("w", encoding="utf-8") as file:
    yaml.dump(python_cfg, file, sort_keys=False, default_flow_style=False)

Example of listener_config.yaml

service:
  name: listener-service
  port: 6789
  workers: 4
  enabled: false
queues:
- high
- default
- low
retry_policy: null

Working with JSON files/shaare/TU-K-w

python

Working with JSON files

JSON is the standard format for data exchange in web services and cloud APIs.
Python’s built-in json module provides functions to convert between JSON text and Python objects.
Key operations: parsing JSON from strings/files and serializing Python objects to JSON strings/files.

JSON Syntax and Python Mapping

JSON objects ({}) map to Python dict.
JSON arrays ([]) map to Python list.
JSON strings map to Python str, numbers to int or float.
true/false → True/False; null → None.
Keys in JSON objects must be double-quoted strings; no trailing commas.
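
The mapping above can be checked directly by parsing a small document and inspecting the resulting Python types. The sample document is made up for illustration.

```python
import json

document = (
    '{"name": "web01", "cores": 4, "load": 0.75, '
    '"tags": ["web", "prod"], "active": true, "owner": null}'
)

parsed = json.loads(document)
type_names = {key: type(value).__name__ for key, value in parsed.items()}
print(type_names)

# A trailing comma is valid in a Python dict literal but not in JSON.
try:
    json.loads('{"a": 1,}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False

print(f"Trailing comma accepted: {trailing_comma_ok}")
```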

Deserializing JSON

Use json.loads() to parse JSON strings into Python objects.
Raises json.JSONDecodeError on invalid JSON.
Common in DevOps for handling API response bodies.

import json

api_response_str = '{"status": "active", "instance_id": "i-12345", "cores": 4, "tags": ["web", "prod"]}'

try:
    data = json.loads(api_response_str)
    print(f"Parsed data type: {type(data)}")
    print(f"Instance ID: {data.get("instance_id", None)}")
    print(f"Tags: {data.get("tags", None)}")
except json.JSONDecodeError as e:
    print(f"Failed to parse JSON: {e}")
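
json.JSONDecodeError also carries the error location in its msg, lineno, colno, and pos attributes, which helps when reporting problems in larger payloads. A brief sketch with a deliberately broken document (the closing bracket of the array is missing):

```python
import json

broken = '{"status": "active",\n "cores": 4,\n "tags": ["web", "prod"\n}'

try:
    json.loads(broken)
    error_info = None
except json.JSONDecodeError as e:
    # Keep the location details so they can be logged or shown to the user.
    error_info = {"msg": e.msg, "lineno": e.lineno, "colno": e.colno, "pos": e.pos}
    print(f"Invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}")
```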

Parsing JSON Files

Use json.load() to read JSON from an open file object.
Always open files with encoding='utf-8' when dealing with JSON.
Wrap file operations in with to ensure proper closure.

import json
from pathlib import Path

config_path = Path("service_config.json")

with config_path.open("r", encoding="utf-8") as file:
    config_data = json.load(file)

for config in config_data:
    service_name = config.get("service", None)

    if service_name:
        print(f"Service: {service_name}")
        print(f"Enabled: {config.get("enabled", False)}")
        print('-' * 20)

Example of service_config.json

[
  {
    "service": "database",
    "port": 5432,
    "connection_pool": 10,
    "enabled": true
  },
  {
    "service": "cache",
    "port": 6379,
    "connection_pool": 5
  },
  {
    "service": "api",
    "port": 8080,
    "connection_pool": 3,
    "enabled": true
  },
  {
    "port": 5000,
    "connection_pool": 3,
    "enabled": true
  }
]

Serializing Python objects to JSON Strings

Use json.dumps() to convert Python objects to JSON strings.
indent makes output human-readable; sort_keys=True orders keys alphabetically.

import json

python_data = {
    "deployment": "frontend-v2",
    "replicas": 3,
    "ports": [80, 443],
    "health_check": True,
    "logs_enabled": None
}

print(f"Simple JSON:\n{json.dumps(python_data)}")
print("\n")
print(f"Pretty JSON:\n{json.dumps(python_data, indent=2, sort_keys=True)}")
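
Objects outside the basic mapping (for example datetime or path objects) make json.dumps raise TypeError. One common approach is the default parameter, which receives each unsupported object and returns a serializable substitute. The sketch below assumes ISO 8601 strings are an acceptable representation; to_jsonable is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone
from pathlib import PurePosixPath

report = {
    "generated_at": datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc),
    "log_path": PurePosixPath("/var/log/app.log"),
    "replicas": 3,
}


def to_jsonable(value):
    """Called by json.dumps only for objects it cannot serialize itself."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, PurePosixPath):
        return str(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


encoded = json.dumps(report, default=to_jsonable, indent=2)
print(encoded)

decoded = json.loads(encoded)
```

The conversion is one-way: after json.loads the timestamp is a plain string and has to be parsed again if a datetime is needed.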

Serializing Python objects to JSON Files

Use json.dump() to write Python objects directly to files.
Pass the file handle and optional indent for formatting.

import json
from pathlib import Path

output = {
    "status": "complete",
    "items_processed": 1492,
    "errors": []
}
output_path = Path("run_summary.json")

with output_path.open("w", encoding="utf-8") as file:
    json.dump(output, file, indent=2)

Regex/shaare/YC8opA

python

Regex Essentials: Overview

Regular expressions (regex) are a language for defining text search patterns.
Python’s re module provides functions like search (find anywhere) and match (anchored at start).
Patterns include literals, metacharacters (. ^ $ * + ? [] \), character classes (\d, \w, \s), and quantifiers (*, +, ?, {n,m}).
Greedy quantifiers (*, +) match as much as possible; non-greedy (*?, +?) as little as possible.

Introduction to `re.search()` vs `re.match()`

re.search(pattern, text) scans the entire string for the first occurrence.
re.match(pattern, text) checks only at the beginning of the string.
re.findall() and re.finditer() let you retrieve every occurrence of a pattern.
Always use raw strings (r"...") to define regex patterns, avoiding Python string escapes interfering with regex.

import re

line = "WARN: Disk usage at 91%"
pattern = r"WARN"

print(f"search '{pattern}':", bool(re.search(pattern, line)))
print(f"match '{pattern}':", bool(re.match(pattern, line)))
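
In the example above both calls succeed because WARN sits at the very start of the line. A variation with a pattern that occurs later in the string shows where the two functions differ:

```python
import re

line = "WARN: Disk usage at 91%"

search_hit = re.search(r"Disk", line)  # scans the whole string
match_hit = re.match(r"Disk", line)    # only tries position 0
start_hit = re.match(r"WARN", line)    # succeeds: the text starts with WARN

print(f"search 'Disk': {bool(search_hit)}")
print(f"match 'Disk': {bool(match_hit)}")
print(f"match 'WARN': {bool(start_hit)}")
```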

Common Metacharacters

. matches any character (except newline).
^ anchors at start of string.
$ anchors at end of string.
[] defines a set or range of characters, e.g. [A-Z].
\ escapes metacharacters or introduces special sequences.

import re

test = "Error code: E1234. cxge"

print(f"Dot matches any character: {re.findall(r"c..e", test)}")
print(f"Start anchor (finds): {re.findall(r"^Error", test)}")
print(f"Start anchor (does not find): {re.findall(r"^E1234", test)}")
print(f"End anchor: {re.findall(r"cxge$", test)}")
print(f"Character set: {re.findall(r"[E0-9]+", test)}")

Special Sequences

\d digit (0–9), \D non-digit.
\w word character (letters, digits, underscore), \W non-word.
\s whitespace, \S non-whitespace.
\b word boundary (zero-width match).

import re

text = "The cat scattered 1024 catalogues."

print(f"Digits: {re.findall(r"\d+", text)}")
print(f"Word characters: {re.findall(r"\w+", text)}")
print(f"Whitespace: {re.findall(r"\s+", text)}")
print(f"Word boundary: {re.findall(r"\bcat\b", text)}")

Quantifier Cheat-Sheet

Quantifier	Meaning	Greedy?	Non-greedy form	Non-greedy meaning
`?`	0 or 1 of the preceding token	Yes	`??`	as few as possible (0 or 1)
`*`	0 or more of the preceding token	Yes	`*?`	as few as possible (including zero)
`+`	1 or more of the preceding token	Yes	`+?`	as few as possible (at least one)
`{n}`	exactly n of the preceding token	-	-	-
`{n,}`	n or more of the preceding token	Yes	`{n,}?`	n or more, but as few as possible
`{n,m}`	between n and m of the preceding token	Yes	`{n,m}?`	between n and m, but as few as possible

import re

text = "aaaa"

print(re.findall(r"a?", text))
print(re.findall(r"a*", text))
print(re.findall(r"a+", text))
print(re.findall(r"a{2}", text))
print(re.findall(r"a{1,3}", text))

print(f"Non-greedy a*: {re.findall(r"a*?", text)}")
print(f"Non-greedy a+: {re.findall(r"a+?", text)}")
print(f"Non-greedy a{{1,3}}?: {re.findall(r"a{1,3}?", text)}")

Quantifiers & Greedy vs Non-Greedy

* / + / {n,} are greedy: they match the longest possible string that still lets the overall pattern succeed.
Append ? (*? / +? / {n,}?) to make them non-greedy (lazy): they match the shortest possible string instead.

import re

html = "<p>One</p><p>Two</p><></>"

print(f"Greedy: {re.findall(r"<.*>", html)}")
print(f"Non-greedy: {re.findall(r"<.*?>", html)}")

Capturing Groups and Back-References

Regex lets you check for patterns, but often you need to extract pieces of the match (e.g., IP vs port).
Capturing groups, defined with (), let you isolate and retrieve substrings from a match.
Named groups improve readability by giving meaningful labels instead of relying on group numbers.
Non-capturing groups (?:…) let you apply grouping logic without cluttering captures.
Back-references allow you to match the same text twice (or more) within one pattern.

Capturing Groups

Parentheses () both group and capture the matched text inside them.
Groups are numbered by their opening (, starting at 1; group 0 is the entire match.
Use match.group(n) for a single group or match.groups() to get all captures as a tuple.
Capturing is essential when you need to feed specific substrings into further processing.

import re

log_entry = "Ts=2023-10-27T12:00:00Z Level=ERROR User=admin Action=login_fail IP=10.0.0.5"

# Our goal:
# 1. Group 1: The log level
# 2. Group 2: The user name
# 3. Group 3: The IP address

pattern = r"Level=(\w+)\s+User=(\w+).*?\s+IP=([\d\.]+)"

match = re.search(pattern, log_entry)

if match:
    print(f"Full match: {match.group(0)}")
    print(f"Level: {match.group(1)}")
    print(f"User: {match.group(2)}")
    print(f"IP: {match.group(3)}")
    print(f"All groups: {match.groups()}")

Named Capturing Groups

Syntax: (?P<name>pattern) assigns a label to a capturing group.
Access by name: match.group('name') makes code self-documenting.
match.groupdict() returns a dict of all named captures.
You can still use numeric indices if needed, but names help avoid off-by-one errors.

import re

log_entry = "Ts=2023-10-27T12:00:00Z Level=ERROR User=admin Action=login_fail IP=10.0.0.5"

# Add labels to:
# 1. Group 1: The log level
# 2. Group 2: The user name
# 3. Group 3: The IP address

pattern = r"Level=(?P<level>\w+)\s+User=(?P<user>\w+).*?\s+IP=(?P<ip>[\d\.]+)"

match = re.search(pattern, log_entry)

if match:
    print(f"Full match: {match.group(0)}")
    print(f"Level: {match.group("level")}")
    print(f"User: {match.group("user")}")
    print(f"IP: {match.group("ip")}")
    print(f"All groups: {match.groups()}")
    print(f"Group dictionary: {match.groupdict()}")

Non-Capturing Groups

Use (?:pattern) when you need grouping for quantifiers or alternation without capturing.
Keeps your capture numbers focused on what you actually want.
Prevents unwanted None entries in match.groups() when using optional parts.

import re

log_line1 = "report.txt Status: OK"
log_line2 = "report Status: OK"

# Our goal:
# 1. Group 1: The stem of the filename, with .txt being an optional string
# 2. Group 2: The status code

pattern = r"^(.+?)(?:\.txt)?\s+Status:\s+(.+)$"

match_line1 = re.search(pattern, log_line1)
match_line2 = re.search(pattern, log_line2)

if match_line1: print(match_line1.groups())
if match_line2: print(match_line2.groups())
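
To see what the non-capturing group avoids, the same pattern can be compared with a variant in which the optional extension is captured. When the optional part is absent, the capturing variant leaves a None in match.groups():

```python
import re

line = "report Status: OK"

capturing = r"^(.+?)(\.txt)?\s+Status:\s+(.+)$"
non_capturing = r"^(.+?)(?:\.txt)?\s+Status:\s+(.+)$"

capturing_groups = re.search(capturing, line).groups()
non_capturing_groups = re.search(non_capturing, line).groups()

print(f"Capturing:     {capturing_groups}")
print(f"Non-capturing: {non_capturing_groups}")
```

With the capturing variant the status also moves from group 2 to group 3, so any code indexing groups by number has to change.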

Back-references

Refer back to a previous capture using \1, \2, … or (?P=name) for named groups.
Useful for matching repeated words or paired constructs (e.g., a closing tag that must repeat the name of the opening tag).
Can make patterns more complex but powerful for advanced text validation.

import re

text = "This this is a test test."
pattern_numbers = r"(?i)\b(\w+)\s+\1\b"
pattern_labels = r"(?i)\b(?P<word>\w+)\s+(?P=word)\b"

print(f"Doubled words: {re.findall(pattern_numbers, text)}")
print(f"Doubled words: {re.findall(pattern_labels, text)}")

html = "<p>Paragraph</p> <b>Bold</b>"
pattern_tags = r"<(\w+)>(.*?)</\1>"

print(f"Tags: {re.findall(pattern_tags, html)}")

Search, Split, and Substitute

re.findall() and re.finditer() let you retrieve every occurrence of a pattern.
re.split() handles complex delimiters beyond simple string splits.
re.sub() performs powerful search-and-replace operations, including reuse of captured groups.

Finding All Matches

re.findall(pattern, string) returns a list of all non-overlapping matches:
- No groups → list of matched substrings.
- With groups → list of tuples of captured substrings.
re.finditer(pattern, string) returns an iterator of match objects, giving access to .group(), positions, named groups, etc., and is more memory-efficient for large inputs.

import re

text = "Errors found: 404, 500, 403, 500. User IDs: user123, admin99."
config = "timeout=60 retries=3 workers=5"

# Find all numbers (the error codes, plus the digits inside the user IDs):
print(f"Numbers found: {re.findall(r"\d+", text)}")

# findall with groups:
print(f"Key-value pairs: {re.findall(r"(\w+)=(\w+)", config)}")

# finditer
for match in re.finditer(r"(\w+)=(\w+)", config):
    print(f"Whole match: {match.group(0)}; key: {match.group(1)}; value: {match.group(2)} - at {match.start()}-{match.end()}")
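
Because re.finditer yields match objects, it combines naturally with named groups. As a sketch, the key=value string from the example above can be turned into a dictionary with integer values (assuming every value is numeric, as it is here):

```python
import re

config = "timeout=60 retries=3 workers=5"

settings = {
    match.group("key"): int(match.group("value"))
    for match in re.finditer(r"(?P<key>\w+)=(?P<value>\d+)", config)
}

print(settings)
```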

Splitting Strings

Use re.split(pattern, string) to break a string on a regex pattern, not just a fixed substring.
Always use a raw string literal so backslashes reach the regex engine.
Simple single-character delimiters: use a character class (never captured), e.g. r"\s*[,;]\s*".
Complex delimiters (alternation or multi-character): group with non-capturing parentheses, e.g. r"\s*(?:foo|bar|baz)\s*", so they aren’t included in the result list.
Including delimiters: wrap your delimiter in a capturing group, e.g. r"\s*([,;])\s*", to have the separators appear in the split output.
Summary:
- No parentheses or a non-capturing group → delimiters are removed.
- Capturing group → delimiters appear in the split list.

import re

data = "item1 , item2; item3 ,item4 ;item5"

# 1. Split on comma and semi-colon
pattern1 = r"\s*[,;]\s*"
print(f"Character class split: {re.split(pattern1, data)}")

# 2. Capturing the delimiter
pattern2 = r"\s*([,;])\s*"
print(f"Capturing group split: {re.split(pattern2, data)}")

html = """
<p class='hello'>First paragraph.</p>
<b class='world'>Second paragraph.</b>
End.
"""

pattern3 = r"<.*?class='(?:hello|world)'.*?>|</[pb]>"
print(f"HTML non-capturing split: {re.split(pattern3, html)}")

Substituting Text

re.sub(pattern, replacement, string, count=0) replaces all (or a limited number) of matches.
count controls how many replacements to make (default 0 = all).
Back-references (\1, \g<name>) let you reorder or reuse captured text in the replacement.

import re

text = "User IDs: user123, user456, user123457689. Contact admin789 for help."

# Basic substitution
redacted = re.sub(r"user\d+", "[REDACTED_USER]", text)
print(f"Result of redacting: {redacted}")

# Back-reference for reusing information
redacted_partially = re.sub(r"(u)ser\d+(\d{2})", r"\1[REDACTED_USER]\2", text)
print(f"Result of redacting: {redacted_partially}")

# Limited count of substitutions
redacted_only_two = re.sub(r"(u)ser\d+(\d{2})", r"\1[REDACTED_USER]\2", text, count=2)
print(f"Result of redacting: {redacted_only_two}")

# Named groups for substitution
date_text = "Start: 2023-10-27, End: 2024-01-15"
# Current format YYYY-MM-DD
# Target format DD/MM/YYYY

date_pattern_named = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
replacement_format_named = r"\g<day>/\g<month>/\g<year>"
reformatted_date = re.sub(date_pattern_named, replacement_format_named, date_text)

print(f"Result of date transformation: {reformatted_date}")

Read/Write Text Files/shaare/VogSQA

python

Read/Write Text Files

Use open() to read/write text files with proper modes and encoding.
Specify encoding='utf-8' for portability.
Leverage with to ensure files close automatically.
Read via iteration, .read(), .readline(), .readlines().
Write via .write() or .writelines(), managing newlines manually.

try:
    with open("config.txt", "w", encoding="utf-8") as file:
        file.write("Setting=Value\n")
        file.write("Other=Another\n")

    with open("config.txt", "r", encoding="utf-8") as file:
        content = file.read()
        print("Contents of file:")
        print(content)
except OSError as e:
    print(f"File error: {e}")

File Modes

'r': read text (default), error if file missing.
'w': write text, create or truncate.
'a': append text, create if missing.
'x': exclusive create, error if exists (good to prevent overwrites).
'b': binary mode variant (e.g. 'rb', 'wb').
'+': update mode, allows read/write (e.g. 'r+', 'w+').

Understanding `+`

Mode	Reads?	Writes?	Creates if missing?	Truncates on open?
r	✅	❌	❌	❌
r+	✅	✅	❌	❌
w	❌	✅	✅	✅
w+	✅	✅	✅	✅
a	❌	✅	✅	❌
a+	✅	✅	✅	❌

from pathlib import Path

path = Path("mode_demo.txt")

with path.open(mode="w", encoding="utf-8") as file:
    file.write("Initial line\n")

with path.open(mode="a", encoding="utf-8") as file:
    file.write("Appended line\n")

try:
    with path.open(mode="x", encoding="utf-8") as file:
        file.write("This will fail if file exists\n")
except FileExistsError as e:
    print(e)
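
The update modes from the table can be tried out on a scratch file. This sketch uses a temporary directory and an invented file name; it shows that r+ keeps existing content, a+ appends and needs seek(0) before reading, and w+ truncates on open.

```python
import tempfile
from pathlib import Path

demo = Path(tempfile.mkdtemp()) / "plus_demo.txt"
demo.write_text("one\ntwo\n", encoding="utf-8")

# r+: existing content is kept; after read() the position is at the end.
with demo.open("r+", encoding="utf-8") as file:
    before = file.read()
    file.write("three\n")
after_r_plus = demo.read_text(encoding="utf-8")

# a+: writes go to the end; rewind with seek(0) to read from the start.
with demo.open("a+", encoding="utf-8") as file:
    file.write("four\n")
    file.seek(0)
    after_a_plus = file.read()

# w+: the file is emptied as soon as it is opened.
with demo.open("w+", encoding="utf-8") as file:
    empty_on_open = file.read()
    file.write("fresh\n")
    file.seek(0)
    after_w_plus = file.read()

print(repr(after_r_plus), repr(after_a_plus), repr(after_w_plus))
```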

Reading Text Files

Iteration: for line in f:
- When to use: Ideal for processing large files line by line without loading the entire file into memory; lazy and very memory-efficient.
f.read(size = -1)
- size can be used to specify the maximum number of characters to read; if negative or omitted, reads the entire file.
- When to use: When you need to grab a chunk of text (e.g. next 1024 chars). Good for bulk reads; but beware of high memory usage if you read the whole file at once.
f.readline(size = -1)
- size can be used to specify the maximum number of characters to read from the line; if negative or omitted, reads the full line up to and including the newline.
- When to use: When you want one line at a time but need to guard against overly long lines. Returns an empty string when you reach EOF.
f.readlines(hint = -1)
- hint can be used to define the approximate total number of bytes to read; if negative or omitted, reads all lines.
- When to use: When the file is small or moderate in size and you want a list of all lines for easy indexing or list comprehensions. Not recommended for very large files (may exhaust memory).

from pathlib import Path

sample = Path("read_demo.txt")
sample.write_text("First\nSecond\nThird\n", encoding="utf-8")

print("Iteration for reading:")
with sample.open(mode="r", encoding="utf-8") as file:
    for line in file:
        print(f" -> {line.strip()}")

print("read() for reading:")
with sample.open(mode="r", encoding="utf-8") as file:
    print(file.read())

print("readline() for reading:")
with sample.open(mode="r", encoding="utf-8") as file:
    print(file.readline())

print("readlines() for reading:")
with sample.open(mode="r", encoding="utf-8") as file:
    print(file.readlines())
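
The size argument is not used in the demo above. A short sketch of reading in fixed-size chunks with read(size) and capping a line with readline(size) could look like this; the chunk size of 8 is arbitrary and the file lives in a temporary directory.

```python
import tempfile
from pathlib import Path

content = "First\nSecond\nThird\n"
sample = Path(tempfile.mkdtemp()) / "chunk_demo.txt"
sample.write_text(content, encoding="utf-8")

# read(size): pull at most 8 characters per call until EOF returns "".
chunks = []
with sample.open(mode="r", encoding="utf-8") as file:
    while True:
        chunk = file.read(8)
        if not chunk:
            break
        chunks.append(chunk)

# readline(size): stop after 3 characters even though the line is longer.
with sample.open(mode="r", encoding="utf-8") as file:
    partial = file.readline(3)
    rest = file.readline()  # continues with the remainder of the same line

print(chunks)
print(repr(partial), repr(rest))
```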

Writing Text Files

f.write(s)
- s is the string to write; does not add a newline automatically.
- When to use: When writing single strings or building content piece by piece. Returns the number of characters written, so you can verify success.
f.writelines(lines: Iterable[str]) -> None
- lines can be any iterable of strings; does not add newlines for you.
- When to use: When you need to write a batch of strings at once (for example, a list of CSV rows). It is usually tidier than multiple calls to .write(), but you must include \n at the end of each string if you want line breaks.

from pathlib import Path

write_demo = Path("write_demo.txt")

with write_demo.open(mode="w", encoding="utf-8") as file:
    file.write("Line A\n")
    file.write("Line B\n")

lines_to_write = [
    "user,ip,role",
    "alice,10.0.0.0,admin",
    "bob,10.0.0.1,dev",
    "charlie,10.0.0.2,audit"
]
with write_demo.open(mode="w", encoding="utf-8") as file:
    file.writelines(f"{line}\n" for line in lines_to_write)

