Implementing Retries and Timeouts

External services can be slow or unreliable, causing scripts to hang or fail unexpectedly.
Timeouts and retries help ensure your automation scripts remain responsive and resilient.

Timeouts

By default, requests sets no timeout, so a call can wait indefinitely for a response, which is risky in automation.
Use the timeout parameter with a single value for both connect and read, or a tuple (connect, read) for fine-grained control.
A ConnectTimeout is raised if the connection can’t be established in time; a ReadTimeout is raised if the server sends no data for longer than the read timeout. The read timeout limits the wait between bytes, not the total download time.

import requests
import time

HTTPBIN_ENDPOINT = "https://httpbin.org"

delay_url = f"{HTTPBIN_ENDPOINT}/delay/5" # Simulate a 5-second delay

start = time.perf_counter()

try:
    # The 2-second timeout is shorter than the 5-second delay, so a ReadTimeout is expected
    res = requests.get(delay_url, timeout=2)
    print(f"Completed in {time.perf_counter() - start:.2f}s, status {res.status_code}")
except (
    requests.exceptions.ConnectTimeout,
    requests.exceptions.ReadTimeout
) as timeout_err:
    print(f"Timeout after {time.perf_counter() - start:.2f}s: {timeout_err}")
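
The tuple form of timeout can be sketched as a small helper. The function name fetch_with_timeouts and the default values of 3.05s and 10s are illustrative assumptions, not recommendations from this guide. The helper catches requests.exceptions.Timeout, which is the shared base class of ConnectTimeout and ReadTimeout.

```python
import requests


def fetch_with_timeouts(url, connect_timeout=3.05, read_timeout=10):
    """Return the response, or None if either timeout expires.

    Hypothetical helper: the name and default values are illustrative.
    """
    try:
        # (connect, read): wait up to connect_timeout to establish the
        # connection, then up to read_timeout between bytes of the response.
        return requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.Timeout as err:
        # Timeout is the base class of ConnectTimeout and ReadTimeout.
        print(f"Timed out ({type(err).__name__}): {err}")
        return None
```

Calling fetch_with_timeouts(f"{HTTPBIN_ENDPOINT}/delay/5", read_timeout=2) should report a ReadTimeout and return None, since the read timeout is shorter than the simulated delay.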

Retries

Transient issues like network blips or server overloads may cause requests to fail temporarily.
Implement a simple retry loop that catches errors, retries on server-side (5xx) errors or network exceptions, and breaks on success or client errors.
Use a fixed delay between retries for simplicity, or an exponential backoff for a more robust approach.
Avoid retrying non-idempotent operations.

import requests
import time

flaky_url = f"{HTTPBIN_ENDPOINT}/status/200,500,503"

max_retries = 3
delay = 2

for attempt in range(1, max_retries + 1):
    print(f"Attempt {attempt}/{max_retries}...")

    try:
        res = requests.get(flaky_url, timeout=10)
        res.raise_for_status()
        print(f"Succeeded with status {res.status_code}")
        break
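    except requests.exceptions.HTTPError as err:
        if err.response.status_code < 500:
            print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
            break
        else:
            print(f"Failed with server error code {err.response.status_code}.")
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as err:
        # Network-level failures are usually transient, so fall through and retry
        print(f"Failed with network error: {err}")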
    if attempt < max_retries:
        print(f"Waiting {delay}s before retry...")
        time.sleep(delay)
else:
    print(f"All {max_retries} attempts failed!")
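
The retry decision described in this section (retry on 5xx or network errors, stop on success or client errors, skip non-idempotent operations) can also be isolated in one function. This is a minimal sketch; the name should_retry and the IDEMPOTENT_METHODS constant are assumptions made for illustration.

```python
# Methods that HTTP defines as idempotent: repeating them should not
# change the outcome. POST and PATCH are deliberately excluded.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}


def should_retry(method, status_code=None):
    """Decide whether a failed request is worth retrying.

    status_code is None when the request failed before any response
    arrived (for example a connection error or a timeout).
    """
    if method.upper() not in IDEMPOTENT_METHODS:
        return False  # A retry could repeat a side effect
    if status_code is None:
        return True  # Network-level failure: likely transient
    return 500 <= status_code < 600  # Retry server errors only
```

With this helper, should_retry("GET", 503) returns True, while should_retry("POST", 503) and should_retry("GET", 404) both return False.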

Exponential Backoff with Jitter

Fixed delays can overwhelm a recovering server if many clients retry simultaneously.
Exponential backoff increases the wait time after each failure (e.g., 1s, 2s, 4s...).
Adding jitter (a small random offset) prevents synchronized retry spikes.

import requests
import time
import random

def get_with_backoff(url, max_retries=3):
    delay = 1  # Base delay in seconds; doubles after each failed attempt, capped at 30s

    for attempt in range(1, max_retries + 1):
        print(f"Attempt {attempt}/{max_retries}...")

        try:
            res = requests.get(url, timeout=10)
            res.raise_for_status()
            print(f"Succeeded with status {res.status_code}")
            return res
        except requests.exceptions.HTTPError as err:
            if err.response.status_code < 500:
                print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
                raise RuntimeError("Client error! Please review request.") from err
            print(f"  Failed with server error code {err.response.status_code}.")
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as err:
            print(f"  Failed with network error: {err}")

        # Sleep only if another attempt follows
        if attempt < max_retries:
            jitter = random.uniform(-0.1 * delay, 0.1 * delay)
            # delay = 1 -> jitter [-0.1, 0.1] -> wait between 0.9 and 1.1s
            # delay = 2 -> jitter [-0.2, 0.2] -> wait between 1.8 and 2.2s
            # delay = 4 -> jitter [-0.4, 0.4] -> wait between 3.6 and 4.4s
            wait = delay + jitter
            print(f"  Retrying in {wait:.2f}s")
            time.sleep(wait)
            delay = min(delay * 2, 30)
    raise RuntimeError(f"All retries to query {url} failed!")

try:
    res = get_with_backoff(
        f"{HTTPBIN_ENDPOINT}/status/503",
        max_retries=4
    )
except RuntimeError as e:
    print(e)
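
To see the wait schedule without making any requests, the delay calculation can be pulled out into a generator. This is a sketch under assumptions: the name backoff_delays and its parameters (base, factor, cap, jitter_ratio, rng) are hypothetical, chosen to mirror the doubling delay, the 30-second cap, and the 10% jitter used above.

```python
import random


def backoff_delays(retries, base=1.0, factor=2.0, cap=30.0, jitter_ratio=0.1, rng=random):
    """Yield one wait time per retry: a capped exponential delay plus jitter."""
    delay = base
    for _ in range(retries):
        # Jitter is a random offset of up to +/- jitter_ratio of the delay
        jitter = rng.uniform(-jitter_ratio * delay, jitter_ratio * delay)
        yield delay + jitter
        # Grow the delay for the next retry, but never beyond the cap
        delay = min(delay * factor, cap)


# Passing a seeded Random instance makes the schedule reproducible.
for wait in backoff_delays(5, rng=random.Random(42)):
    print(f"{wait:.2f}s")
```

Each printed value falls within 10% of the nominal schedule of 1s, 2s, 4s, 8s, and 16s. Setting jitter_ratio=0 yields the nominal delays exactly.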

Common Pitfalls & How to Avoid Them

Forgetting to set timeouts can cause scripts to hang indefinitely; always use timeout.
Retrying client errors (4xx) usually won’t help; only retry transient server errors (5xx) or network issues.
Retrying non-idempotent operations (e.g., POST) can cause duplicate actions; limit automatic retries to idempotent methods such as GET.
Fixed retry delays can lead to synchronized retry spikes; use exponential backoff with jitter for production scenarios.
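
Several of these pitfalls can also be handled by configuration instead of a hand-written loop. The sketch below swaps in a different technique from the loops above: urllib3's Retry class mounted on a requests Session through HTTPAdapter. The helper name make_retrying_session and the chosen values are assumptions for illustration, and the allowed_methods argument assumes urllib3 1.26 or later.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=1.0):
    """Build a Session whose adapter retries selected failures.

    Hypothetical helper; assumes urllib3 >= 1.26 for `allowed_methods`.
    """
    retry = Retry(
        total=total,                            # Maximum number of retries
        backoff_factor=backoff_factor,          # Scales the growing wait between tries
        status_forcelist=(500, 502, 503, 504),  # Server errors that trigger a retry
        allowed_methods=frozenset({"GET", "HEAD"}),  # Methods eligible for these retries
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to both URL schemes
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

The session does not set a timeout on its own, so each call still needs one, for example session.get(url, timeout=(3.05, 10)). This sketch also does not add jitter.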