Fix Requests Max Retries Exceeded With Url in Python

The “Max retries exceeded with url” error, which the requests library in Python sometimes raises, falls under two classes of exceptions: requests.exceptions.ConnectionError (most common) and requests.exceptions.SSLError. In this article, we will discuss the causes of the error, how to reproduce it, and, most importantly, how to solve it.

Causes of requests Max Retries Exceeded With Url

The error occurs when the requests library cannot successfully send a request to the given site. This can happen for several reasons. Here are the common ones:

  1. Wrong URL – A typo maybe (go to Solution 1),
  2. Failure to verify SSL certificate (Solution 2),
  3. Using requests with no or unstable internet connection (Solution 3), and
  4. Sending too many requests or server too busy (Solution 4)
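Because requests.exceptions.SSLError is a subclass of requests.exceptions.ConnectionError, the two can only be told apart if SSLError is caught first. The sketch below shows one way to do that. The helper name diagnose() and the labels it returns are our own illustrative choices, not part of requests.

```python
import requests


def diagnose(url, timeout=5):
    """Return a short label describing how a GET request to `url` ended.

    `diagnose` is a hypothetical helper used for illustration only.
    """
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.SSLError:
        # Must come before ConnectionError: SSLError is a subclass of it.
        return "ssl-error"
    except requests.exceptions.ConnectionError:
        # Typos in the host name, no network, refused connections, etc.
        return "connection-error"
    except requests.exceptions.RequestException:
        # Any other requests failure (read timeouts, invalid URLs, ...).
        return "other-request-error"
    return f"http-{response.status_code}"
```

Knowing which branch was taken narrows down which of the solutions in this article is worth trying first.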

Wrong URL – A typo?

There is a chance that the URL you requested was incorrect, for instance because of a typo. For example, suppose we want to send a GET request to “https://www.example.com” (which is a valid URL), but instead we issued the URL “https://www.example.cojkm” (we used .cojkm as the domain extension instead of .com).

import requests
url = 'https://www.example.cojkm'
response = requests.get(url)
print(response)

Output:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.example.cojkm', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fd5b5d33100>: Failed to establish a new connection: [Errno 111] Connection refused'))

Failure to verify SSL certificate

The requests library, by default, implements SSL certificate verification to ensure you are making a secure connection. If the certificates can’t be verified, you end up with an error like this:

requests.exceptions.SSLError: [Errno 1] _ssl.c:503: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Using requests with no or unstable internet connection

The requests package sends and receives data via the web; therefore, the internet connection should be available and stable. If you have no or unstable internet, requests will throw an error like this:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.example.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7f5fadb100>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

Sending too many requests / server overload

Some websites block connections when too many requests are made in quick succession. A related problem occurs when the server is overloaded, that is, when it is handling a large number of connections at the same time. In this case, requests.get() throws an error like this:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.srgfesrsergserg.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000008EC69AAA90>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Solutions to requests Max Retries Exceeded With Url Error

In this section, we will cover solutions to the “Max Retries Exceeded With Url” error for each of the causes above.

Solution 1: Double check URL

Ensure that you have a correct and valid URL. Consider the valid URL mentioned earlier: “https://www.example.com”. The “Max retries exceeded with url” error mostly occurs when the URL is mistyped around the www prefix or the top-level domain (e.g., .com).

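A different error arises when the scheme/protocol (https) is mistyped: requests.exceptions.InvalidSchema. If the second-level domain (in our case, “example”) is mistyped, the request goes to a different website altogether if that domain happens to exist. If it does not exist, the host name cannot be resolved and the request fails with a connection error. A 404 response, by contrast, means the server was reached but the requested page was not found.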

Other wrong URLs that lead to “Max retries exceeded with url” are “wwt.example.com” and “https://www.example.com ” (with a white space after .com).
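Some of these mistakes can be caught before any request is sent. The following is a minimal sketch that uses only the standard library. The helper name check_url() and its messages are assumptions made for illustration. It only flags obvious problems and cannot tell whether a host actually exists.

```python
from urllib.parse import urlsplit


def check_url(url):
    """Return a list of problems found in `url` (empty if none were found).

    `check_url` is a hypothetical helper; it only catches obvious mistakes
    and cannot tell whether the host actually exists.
    """
    problems = []
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host name")
    return problems


print(check_url("https://www.example.com"))   # → []
print(check_url("https://www.example.com "))  # → ['leading or trailing whitespace']
```

Note that a URL such as “https://www.example.cojkm” passes this check, because it is syntactically valid. Only the request itself reveals that the host cannot be reached.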

Solution 2: Solving SSLError

As mentioned, the error is caused by an untrusted SSL certificate. The quickest fix is to pass the argument verify=False to requests.get(). This tells requests to send the request without verifying the SSL certificate.

requests.get('https://example.com', verify=False)

Please be aware that the certificate won’t be verified; therefore, your application will be exposed to security threats like man-in-the-middle attacks. It is best to avoid this method for scripts used at the production level.
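A safer alternative, when the server's certificate is signed by a private or corporate certificate authority, is to keep verification on and point requests at a CA bundle that contains that authority. The verify argument also accepts a path to such a file. In the sketch below, the helper name get_with_ca_bundle() and the bundle path are placeholders, not real values.

```python
import requests


def get_with_ca_bundle(url, ca_bundle_path, timeout=5):
    """Send a GET request, verifying the server against `ca_bundle_path`.

    `get_with_ca_bundle` is a hypothetical helper; `ca_bundle_path` should
    point to a PEM file containing the certificate authorities you trust.
    """
    # Verification stays on; only the set of trusted CAs changes.
    return requests.get(url, verify=ca_bundle_path, timeout=timeout)


# Example usage (placeholder path, adjust to your environment):
# response = get_with_ca_bundle("https://example.com", "/path/to/ca-bundle.pem")
```

This keeps the protection against man-in-the-middle attacks, which verify=False gives up.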

Solution 3: Solving the “Max retries exceeded with url” error related to unstable connection

This solution fits cases where the connection drops intermittently. In these cases, we want requests to make several attempts before throwing an error. We can use two approaches:

  • Pass a timeout argument to requests.get(), or
  • Retry connections on connection-related errors

Solution 3a: Pass a timeout argument to requests.get()

If the server is overloaded, we can use a timeout to wait longer for a response. This will increase the chance of a request finishing successfully.

import requests
url = 'https://www.example.com'
response = requests.get(url, timeout=7)
print(response)

The code above waits up to 7 seconds for requests to connect to the site, and up to 7 seconds for the server to send data. A single timeout value applies to both the connect and the read timeouts.

Alternatively, you can pass the timeout as a 2-element tuple, where the first element is the connect timeout (time to establish a connection to the server) and the second is the read timeout (time the client waits for the server to send data).

requests.get('https://api.github.com', timeout=(3, 7))

With the line above, a connection must be established within 3 seconds and the server must send data within 7 seconds; otherwise, requests raises a Timeout exception (requests.exceptions.ConnectTimeout or requests.exceptions.ReadTimeout).
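In a script, we usually want to catch these exceptions rather than let them end the program. The following is a minimal sketch of how that could look. The helper name fetch() and its printed messages are assumptions made for illustration. ConnectTimeout is listed before ConnectionError because it is a subclass of it.

```python
import requests


def fetch(url, connect_timeout=3, read_timeout=7):
    """Return the response for `url`, or None if the request timed out
    or the connection failed.

    `fetch` is a hypothetical helper written for illustration.
    """
    try:
        return requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        # No connection could be established within `connect_timeout` seconds.
        print(f"Connecting to {url} timed out")
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent no data for `read_timeout` seconds.
        print(f"Reading from {url} timed out")
    except requests.exceptions.ConnectionError as error:
        # Other connection problems, e.g. DNS failure or connection refused.
        print(f"Connection failed: {error}")
    return None
```

A call such as fetch('https://api.github.com') then returns either a response object or None, so the caller can decide whether to try again later.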

Solution 3b: Retry connections on connection-related errors

The requests library uses the Retry utility in urllib3 (urllib3.util.Retry) to retry connections. We will use the following code to send requests (explained after).

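import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry


def send_request(url,
                 n_retries=4,
                 backoff_factor=0.9,
                 status_codes=(504, 503, 502, 500, 429)):
    # Create a session so the retry policy applies to every request sent with it.
    sess = requests.Session()
    # Retry connection errors up to n_retries times, sleeping between attempts.
    retries = Retry(connect=n_retries, backoff_factor=backoff_factor,
                    status_forcelist=status_codes)
    # Use the retry policy for both HTTPS and HTTP URLs.
    sess.mount("https://", HTTPAdapter(max_retries=retries))
    sess.mount("http://", HTTPAdapter(max_retries=retries))
    response = sess.get(url)
    return response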

We have used the following parameters of the urllib3.util.Retry class:

  • connect – the number of connection-related retries. By default, send_request() makes 4 retries plus 1 original request, which is sent immediately.
  • backoff_factor – determines delays between retries. The sleeping time is computed with the formula {backoff_factor} * (2 ^ ({retry_number} - 1)). We will work on an example for this argument when calling the function.
  • status_forcelist – retry only for connections that resulted in 504, 503, 502, 500, and 429 status codes (read more about status codes at https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

Let’s now call our function and time the execution.

# start the timer
start_time = time.time()
# send a request to GitHub API
url = "https://api.github.com/users"
response = send_request(url)
# print the status code
print(response.status_code)
# end timer
end_time = time.time()
# compute the run time
print("Run time: ", end_time-start_time)

Output:

200
Run time:  0.8597214221954346

The connection was completed successfully (status 200), taking 0.86 seconds. To see the backoff in action, let’s send a request to a server that is not running, catch the exception when it occurs, and compute the execution time.

try:
    # start execution timer
    start_time = time.time()
    url = "http://localhost/6000"
    # call send_request() to send a request to the url
    # this will never be successful because no server is listening on
    # localhost (the URL targets the default port 80 with the path /6000).
    response = send_request(url)
    print(response.status_code)
except Exception as e:
    # Catch any exception - execution will end here because
    # requests can't connect to http://localhost/6000
    print("Error Name: ", e.__class__.__name__)
    print("Error Message: ", e)
finally:
    # Pick end time
    end_time = time.time()
    # Calculate the time taken to execute.
    print("Run time: ", end_time-start_time)

Output:

Error Name:  ConnectionError
Error Message:  HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /6000 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f06a5862a00>: Failed to establish a new connection: [Errno 111] Connection refused'))
Run time:  12.61784315109253

After 4 retries (plus 1 original request) with a backoff_factor=0.9, the execution time was 12.6 seconds. Let’s use the formula we saw earlier to compute sleeping time.

sleeping_time = {backoff_factor} * (2 ^ ({retry_number} - 1))

There are 5 requests in total:

  • First request (which is made immediately) – 0 seconds sleeping,
  • First retry (which is also sent immediately after the failure of the first request) – 0 seconds sleeping,
  • Second retry -> 0.9*(2^(2-1)) = 0.9*2 = 1.8 seconds of sleeping,
  • Third retry -> 0.9*(2^(3-1)) = 0.9*4 = 3.6 seconds of sleeping time, and,
  • Fourth retry -> 0.9*(2^(4-1)) = 0.9*8 = 7.2 seconds.

That is a total of 12.6 seconds of sleeping time implemented by urllib3.util.Retry. The actual execution time is 12.61784315109253 seconds. The remaining 0.01784315109253 seconds, which the sleeping time does not account for, are attributable to DNS lookup and general processing latency.
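The same arithmetic can be checked with a few lines of Python. This sketch simply applies the formula quoted above and assumes, as described in this article, that the first retry is sent without any delay. The function name backoff_sleep_times() is ours, not part of urllib3.

```python
def backoff_sleep_times(n_retries, backoff_factor):
    """Return the sleep before each retry, per the formula quoted above.

    Assumes, as described in this article, that the first retry is sent
    without any delay. This is an illustration, not urllib3's own code.
    """
    times = []
    for retry_number in range(1, n_retries + 1):
        if retry_number == 1:
            times.append(0.0)
        else:
            times.append(backoff_factor * 2 ** (retry_number - 1))
    return times


sleeps = backoff_sleep_times(n_retries=4, backoff_factor=0.9)
print(sleeps)                 # → [0.0, 1.8, 3.6, 7.2]
print(round(sum(sleeps), 1))  # → 12.6
```

Changing backoff_factor or n_retries in this sketch gives a quick estimate of how long send_request() may block before it gives up.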

Solution 4: Using headers when sending requests

Some websites block web crawlers. They detect that a bot is sending requests from the headers it passes. For example, let’s run this code with verbose logging turned on to see what happens under the hood.

import http.client
# turn verbose on
http.client.HTTPConnection.debuglevel = 1
import requests
url = 'https://www.example.com'
response = requests.get(url)

Output (truncated):

send: b'GET / HTTP/1.1\r\nHost: www.example.com\r\nUser-Agent: python-requests/2.28.1\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'

In that log, you can see that the User-Agent is python-requests/2.28.1 and not a real browser. With such identification, you might get blocked and get the “Max retries exceeded with url” error. To avoid this, we need to pass a real browser's user agent. You can visit http://myhttpheader.com/ to see the headers your browser sends. On that page, my user agent is “Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0”. Let’s now use that user agent instead.

import http.client
# Turn verbose on.
http.client.HTTPConnection.debuglevel = 1
import requests
headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0'}
url = 'https://www.example.com'
response = requests.get(url, timeout=5, headers=headers)

Output (truncated):

send: b'GET / HTTP/1.1\r\nHost: www.example.com\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
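When several requests go to the same site, it can be more convenient to set the header once on a requests.Session and to pause between requests, so that the server is not flooded. The following is a minimal sketch of that idea. The helper names make_session() and fetch_politely(), as well as the one-second delay, are arbitrary choices made for illustration.

```python
import time

import requests


def make_session(user_agent):
    """Return a Session that sends `user_agent` with every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_politely(session, urls, delay_seconds=1.0, timeout=5):
    """Request each URL in turn, pausing between requests.

    Both helpers are hypothetical; the delay is an arbitrary example value.
    """
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause so the server is not flooded with back-to-back requests.
            time.sleep(delay_seconds)
        responses.append(session.get(url, timeout=timeout))
    return responses
```

A suitable delay depends on the site, so treat the default here as a starting point rather than a recommendation.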

Conclusion

The “Max retries exceeded with url” error is caused by an invalid URL, failed SSL verification, an unstable internet connection, server overload, or an attempt to send too many requests to a server. In this article, we discussed solutions for all these problems using examples.

The key is always to understand the kind of error you have and then pick the appropriate solution.