Implementing Retries and Timeouts

  • External services can be slow or unreliable, causing scripts to hang or fail unexpectedly.
  • Timeouts and retries help ensure your automation scripts remain responsive and resilient.

Timeouts

  • By default, requests sets no timeout and can wait indefinitely for a response, which is risky in automation.
  • Use the timeout parameter (in seconds) with a single value for both connect and read, or a tuple (connect, read) for fine-grained control.
  • A ConnectTimeout is raised if the connection can’t be established in time; a ReadTimeout is raised if the server sends no data for longer than the read timeout. Both are subclasses of requests.exceptions.Timeout.
import requests
import time

HTTPBIN_ENDPOINT = "https://httpbin.org"

delay_url = f"{HTTPBIN_ENDPOINT}/delay/5" # Simulate a 5-second delay

start = time.perf_counter()

try:
    res = requests.get(delay_url, timeout=2)
    print(f"Completed in {time.perf_counter() - start:.2f}s, status {res.status_code}")
except (
    requests.exceptions.ConnectTimeout,
    requests.exceptions.ReadTimeout
) as timeout_err:
    print(f"Timeout after {time.perf_counter() - start:.2f}s: {timeout_err}")
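
The tuple form of timeout and the shared Timeout parent class can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name fetch_status and the default values 3.05 and 10.0 are assumptions chosen for the example.

```python
import requests

def fetch_status(url, connect_timeout=3.05, read_timeout=10.0):
    """Return the HTTP status code, or None if either timeout expires.

    fetch_status is a hypothetical helper; the default values are
    illustrative, not recommendations from the original note.
    """
    try:
        # (connect, read): seconds allowed to open the connection, then
        # seconds allowed to wait for data from the server.
        res = requests.get(url, timeout=(connect_timeout, read_timeout))
        return res.status_code
    except requests.exceptions.Timeout as err:
        # Timeout is the parent class of ConnectTimeout and ReadTimeout,
        # so one except clause covers both cases.
        print(f"Timed out ({type(err).__name__}): {err}")
        return None
```

Catching the parent class keeps the handler short; catch ConnectTimeout and ReadTimeout separately only when the script needs to react differently to each.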

Retries

  • Transient issues like network blips or server overloads may cause requests to fail temporarily.
  • Implement a simple retry loop that catches errors, retries on server-side (5xx) errors or network exceptions, and breaks on success or client errors.
  • Use a fixed delay between retries for simplicity, or an exponential backoff for a more robust approach.
  • Avoid retrying non-idempotent operations.
import requests
import time

flaky_url = f"{HTTPBIN_ENDPOINT}/status/200,500,503"

max_retries = 3
delay = 2

for attempt in range(1, max_retries + 1):
    print(f"Attempt {attempt}/{max_retries}...")

    try:
        res = requests.get(flaky_url, timeout=10)
        res.raise_for_status()
        print(f"Succeeded with status {res.status_code}")
        break
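    except requests.exceptions.HTTPError as err:
        if err.response.status_code < 500:
            print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
            break
        else:
            print(f"Failed with server error code {err.response.status_code}.")
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as err:
        # Network-level failures are usually transient, so fall through and retry.
        print(f"Failed with network error: {err}")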
    if attempt < max_retries:
        print(f"Waiting {delay}s before retry...")
        time.sleep(delay)
else:
    print(f"All {max_retries} attempts failed!")
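
The decision of whether a failure is worth retrying can be pulled into a small helper. The sketch below is one possible way to encode the rules from the bullets above; should_retry and IDEMPOTENT_METHODS are hypothetical names, and treating every network-level failure as transient is an assumption.

```python
# Methods defined as idempotent by the HTTP specification (RFC 9110).
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"})

def should_retry(method, status_code=None):
    """Decide whether a failed request is worth retrying.

    status_code is None when the request failed before any response
    arrived (for example, a connection error or a timeout).
    """
    if method.upper() not in IDEMPOTENT_METHODS:
        return False  # repeating e.g. a POST could duplicate the action
    if status_code is None:
        return True   # network-level failure: assumed to be transient
    return 500 <= status_code < 600  # retry 5xx, give up on 4xx

for method, status in [("GET", 503), ("GET", 404), ("POST", 503), ("GET", None)]:
    print(method, status, should_retry(method, status))
```

Keeping the policy in one function makes it easy to reuse the same rules in both the fixed-delay loop and a backoff loop.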

Exponential Backoff with Jitter

  • Fixed delays can overwhelm a recovering server if many clients retry simultaneously.
  • Exponential backoff increases the wait time after each failure (e.g., 1s, 2s, 4s...).
  • Adding jitter (a small random offset) prevents synchronized retry spikes.
import requests
import time
import random

def get_with_backoff(url, max_retries=3):
    delay = 1

    for attempt in range(1, max_retries + 1):
        print(f"Attempt {attempt}/{max_retries}...")

        try:
            res = requests.get(url, timeout=10)
            res.raise_for_status()
            print(f"Succeeded with status {res.status_code}")
            return res
        except requests.exceptions.HTTPError as err:
            if err.response.status_code < 500:
                print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
                raise RuntimeError("Client error! Please review request.") from err
            print(f"  Failed with server error code {err.response.status_code}.")
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as err:
            print(f"  Failed with network error: {err}")

        if attempt < max_retries:
            # Jitter of up to +/-10% keeps clients from retrying in lockstep.
            jitter = random.uniform(-0.1 * delay, 0.1 * delay)
            # delay = 1 -> jitter [-0.1, 0.1] -> wait between 0.9 and 1.1s
            # delay = 2 -> jitter [-0.2, 0.2] -> wait between 1.8 and 2.2s
            # delay = 4 -> jitter [-0.4, 0.4] -> wait between 3.6 and 4.4s
            wait = delay + jitter
            print(f"  Retrying in {wait:.2f}s")
            time.sleep(wait)
            delay = min(delay * 2, 30)  # double the delay, capped at 30s
    raise RuntimeError(f"All retries to query {url} failed!")

try:
    res = get_with_backoff(
        f"{HTTPBIN_ENDPOINT}/status/503",
        max_retries=4
    )
except RuntimeError as e:
    print(e)

Common Pitfalls & How to Avoid Them

  • Forgetting to set timeouts can cause scripts to hang indefinitely; always use timeout.
  • Retrying client errors (4xx) usually won’t help; only retry transient server errors (5xx) or network issues.
  • Retrying non-idempotent operations (e.g., POST) can cause duplicate actions; limit retries to idempotent methods such as GET, PUT, and DELETE.
  • Fixed retry delays can lead to synchronized retry spikes; use exponential backoff with jitter for production scenarios.
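
As an alternative to hand-written loops, several of these pitfalls can be addressed by mounting urllib3's Retry class on a requests Session. The sketch below is an assumption-laden illustration rather than the approach used in this note: build_retrying_session is a hypothetical helper, the status codes and method list are example choices, the allowed_methods argument requires urllib3 1.26 or newer, and the exact backoff timing depends on the installed urllib3 version.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_retrying_session(total=3, backoff_factor=1.0):
    """Return a Session that retries idempotent requests on selected 5xx codes."""
    retry = Retry(
        total=total,                           # maximum number of retries
        backoff_factor=backoff_factor,         # scales the growing delay between retries
        status_forcelist=[500, 502, 503, 504], # example set of retryable statuses
        allowed_methods=["GET", "HEAD"],       # requires urllib3 >= 1.26
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

session = build_retrying_session()
# Retry does not set a timeout, so still pass one on every call, e.g.:
# res = session.get("https://httpbin.org/status/503", timeout=(3.05, 10))
```

The request itself is left commented out so the snippet runs without network access; a timeout is still needed on each call because the retry policy only controls how often a request is repeated.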