Implementing Retries and Timeouts
- External services can be slow or unreliable, causing scripts to hang or fail unexpectedly.
- Timeouts and retries help ensure your automation scripts remain responsive and resilient.
Timeouts
- By default, requests may wait indefinitely for a response, which is risky in automation.
- Use the timeout parameter with a single value for both connect and read, or a tuple (connect, read) for fine-grained control.
- A ConnectTimeout is raised if the connection can’t be established in time; a ReadTimeout is raised if data stops arriving within the read timeout.
import requests
import time
HTTPBIN_ENDPOINT = "https://httpbin.org"
delay_url = f"{HTTPBIN_ENDPOINT}/delay/5"  # Simulate a 5-second delay
start = time.perf_counter()
try:
    # The server waits 5s before responding, so the 2s timeout fires first
    res = requests.get(delay_url, timeout=2)
    print(f"Completed in {time.perf_counter() - start:.2f}s, status {res.status_code}")
except (
    requests.exceptions.ConnectTimeout,
    requests.exceptions.ReadTimeout,
) as timeout_err:
    print(f"Timeout after {time.perf_counter() - start:.2f}s: {timeout_err}")
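The tuple form of timeout mentioned in the bullets can be sketched as follows. This is a minimal illustration rather than part of the original example: the helper name get_with_split_timeout and its http_get parameter are hypothetical, and the 3.05s and 10s defaults are arbitrary choices. The http_get parameter exists only so the helper can be tried with a stand-in function; when omitted, it falls back to requests.get.

```python
def get_with_split_timeout(url, connect_timeout=3.05, read_timeout=10, http_get=None):
    """Issue a GET with separate connect and read limits (hypothetical helper)."""
    if http_get is None:
        # Imported lazily so a stand-in callable can be used without requests
        import requests
        http_get = requests.get
    # requests accepts a (connect, read) tuple for the timeout argument:
    # the first value bounds connection setup, the second bounds each wait for data
    return http_get(url, timeout=(connect_timeout, read_timeout))


# Stand-in for requests.get that just reports what it was called with
def echo_get(url, timeout=None):
    return f"GET {url} with timeout={timeout}"


print(get_with_split_timeout("https://httpbin.org/delay/5", read_timeout=2, http_get=echo_get))
```

Called without http_get against the /delay/5 endpoint and with read_timeout=2, the request would be expected to raise ReadTimeout, because the connection succeeds quickly but the response body takes longer than the read limit.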
Retries
- Transient issues like network blips or server overloads may cause requests to fail temporarily.
- Implement a simple retry loop that catches errors, retries on server-side (5xx) errors or network exceptions, and breaks on success or client errors.
- Use a fixed delay between retries for simplicity, or an exponential backoff for a more robust approach.
- Avoid retrying non-idempotent operations.
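import requests
import time
flaky_url = f"{HTTPBIN_ENDPOINT}/status/200,500,503"  # Responds with one of these statuses
max_retries = 3
delay = 2  # Fixed wait between attempts, in seconds
for attempt in range(1, max_retries + 1):
    print(f"Attempt {attempt}/{max_retries}...")
    try:
        res = requests.get(flaky_url, timeout=10)
        res.raise_for_status()  # Raises HTTPError for 4xx and 5xx responses
        print(f"Succeeded with status {res.status_code}")
        break
    except requests.exceptions.HTTPError as err:
        if err.response.status_code < 500:
            print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
            break
        print(f"Failed with server error code {err.response.status_code}.")
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as err:
        # Network-level failures are treated as transient and retried
        print(f"Failed with network error: {err}")
    if attempt < max_retries:
        print(f"Waiting {delay}s before retry...")
        time.sleep(delay)
else:
    # for/else: runs only when the loop finishes without hitting a break
    print(f"All {max_retries} attempts failed!")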
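The retry rules listed above (retry 5xx responses and network errors, stop on success or 4xx, avoid non-idempotent operations) can be collected into a single decision function. The sketch below is illustrative only: should_retry and IDEMPOTENT_METHODS are hypothetical names, and treating every 5xx status as retryable is a simplifying assumption.

```python
# Methods that HTTP defines as idempotent; POST and PATCH are deliberately excluded
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE", "TRACE"})


def should_retry(method, status_code=None, network_error=False):
    """Return True if a failed request looks both safe and worthwhile to retry.

    status_code is None when no response arrived (for example a timeout or
    connection failure), in which case network_error should be True.
    """
    if method.upper() not in IDEMPOTENT_METHODS:
        return False  # repeating a POST could duplicate the action
    if network_error:
        return True  # no response arrived; the failure may be transient
    if status_code is None:
        return False  # nothing to base a decision on
    return 500 <= status_code < 600  # server errors only; 2xx-4xx are final


for method, status, net_err in [
    ("GET", 503, False),
    ("GET", 404, False),
    ("POST", 503, False),
    ("GET", None, True),
]:
    print(method, status, net_err, "->", should_retry(method, status, net_err))
```

Keeping the decision in one place means the retry loop itself only has to ask a yes/no question, and the policy can be tightened later (for instance, to a specific list of status codes) without touching the loop.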
Exponential Backoff with Jitter
- Fixed delays can overwhelm a recovering server if many clients retry simultaneously.
- Exponential backoff increases the wait time after each failure (e.g., 1s, 2s, 4s...).
- Adding jitter (a small random offset) prevents synchronized retry spikes.
import requests
import time
import random
def get_with_backoff(url, max_retries=3):
    delay = 1  # Base delay in seconds; doubles after each failed attempt
    for attempt in range(1, max_retries + 1):
        print(f"Attempt {attempt}/{max_retries}...")
        try:
            res = requests.get(url, timeout=10)
            res.raise_for_status()
            print(f"Succeeded with status {res.status_code}")
            return res
        except requests.exceptions.HTTPError as err:
            if err.response.status_code < 500:
                print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
                raise RuntimeError("Client error! Please review request.") from err
            print(f"Failed with server error code {err.response.status_code}.")
        if attempt < max_retries:  # No point in sleeping after the final attempt
            jitter = random.uniform(-0.1 * delay, 0.1 * delay)
            # delay = 1 -> jitter in [-0.1, 0.1] -> wait between 0.9 and 1.1s
            # delay = 2 -> jitter in [-0.2, 0.2] -> wait between 1.8 and 2.2s
            # delay = 4 -> jitter in [-0.4, 0.4] -> wait between 3.6 and 4.4s
            wait = delay + jitter
            print(f"Retrying in {wait:.2f}s")
            time.sleep(wait)
            delay = min(delay * 2, 30)  # Double the delay, capped at 30s
    raise RuntimeError(f"All retries to query {url} failed!")
try:
    res = get_with_backoff(
        f"{HTTPBIN_ENDPOINT}/status/503",
        max_retries=4
    )
except RuntimeError as e:
    print(e)
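To inspect a backoff schedule without making any requests, the delay calculation can be pulled out into its own function. This is a sketch under assumptions: backoff_delays and its parameters are hypothetical names, and the defaults (1s base, doubling, 30s cap, jitter within 10% of the delay) simply mirror the values used in this section.

```python
import random


def backoff_delays(retries, base=1.0, factor=2.0, cap=30.0, jitter_frac=0.1, rng=random):
    """Yield one wait time in seconds per retry.

    The un-jittered delay starts at base, is multiplied by factor after
    each retry, and never exceeds cap. Each wait is offset by a random
    amount within +/- jitter_frac of the current delay.
    """
    delay = base
    for _ in range(retries):
        jitter = rng.uniform(-jitter_frac * delay, jitter_frac * delay)
        yield delay + jitter
        delay = min(delay * factor, cap)


# A seeded generator makes the printed schedule repeatable between runs
for i, wait in enumerate(backoff_delays(6, rng=random.Random(42)), start=1):
    print(f"retry {i}: wait {wait:.2f}s")
```

Passing jitter_frac=0 gives the plain exponential sequence (1, 2, 4, ...), which makes it easy to compare the jittered and un-jittered schedules side by side.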
Common Pitfalls & How to Avoid Them
- Forgetting to set timeouts can cause scripts to hang indefinitely; always use timeout.
- Retrying client errors (4xx) usually won’t help; only retry transient server errors (5xx) or network issues.
- Retrying non-idempotent operations (e.g., POST) can cause duplicate actions; limit automatic retries to idempotent methods such as GET, PUT, and DELETE.
- Fixed retry delays can lead to synchronized retry spikes; use exponential backoff with jitter for production scenarios.