Adding random delays between queries is a simple way to avoid getting blocked while running scripts in Python. Spacing requests out keeps the script from sending too many requests in a short period, which is a common reason servers block clients.
By randomizing the delays, the script spreads its requests over time in an irregular pattern rather than a fixed rhythm, which is harder for a server to flag as automated traffic.
Setting Up Random Delays in Python for Web Scraping
First, we have to import the sleep and randint functions. The sleep function pauses the program for the given number of seconds before the next line of code runs.
randint returns a pseudorandom integer between its first and second parameters, inclusive of both.
```python
from random import randint
from time import sleep

sleep(randint(4, 10))
```
This code waits a random whole number of seconds, anywhere from 4 to 10 inclusive.
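Because randint only produces whole numbers, every pause is an exact second count. If you want fractional delays such as 6.37 seconds, which look a little less mechanical, random.uniform is a drop-in alternative from the same standard library module — a small sketch:

```python
from random import uniform
from time import sleep

# uniform(4, 10) returns a float in the range [4, 10], so the pause can be
# any value like 6.37 seconds, not just a whole number as with randint.
delay = uniform(4, 10)
sleep(delay)
```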
Test Your Random Delay Settings and Make Sure You’re Not Being Detected
Let’s add the requests library to our code. This code will fetch 3 pages and print the status code returned for each one.
```python
from random import randint
from time import sleep

import requests

pages = ['https://docs.python.org/3/library/time.html',
         'https://docs.python.org/3/library/random.html',
         'https://requests.readthedocs.io/en/latest/']

for page in pages:
    r = requests.get(page)
    print(r.status_code)
    sleep(randint(4, 10))
```
If you run the code, you should see the following result:
200
200
200
Status code 200 is the HTTP status code indicating that the request was successful. If you see a different code, the request didn’t complete successfully — for example, 403 means the server refused the request, and 429 means you’re sending requests too quickly.
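One way to react to those codes is to wait longer whenever the server signals trouble. The helper below is a sketch — the function name and the exact delay ranges are made up for illustration, not a fixed rule:

```python
from random import randint

def delay_for_status(status_code):
    """Pick a delay in seconds based on the server's last response.

    Hypothetical policy: the ranges here are illustrative only.
    """
    if status_code == 429 or status_code >= 500:
        # 429 Too Many Requests or a server-side error: back off much longer.
        return randint(30, 60)
    # Anything else: keep the normal polite 4-10 second delay.
    return randint(4, 10)
```

Inside the loop from the previous example, you could then replace `sleep(randint(4, 10))` with `sleep(delay_for_status(r.status_code))`.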
Best Practices for Adding Random Delays
When you add random delays, aim for a balance: delays long enough that you don’t overload the server — especially if the site is small — but short enough that you don’t wait forever for your results.
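One way to express that balance in code (a sketch — the `polite_delay` helper and its default values are invented for illustration) is to choose a per-site base delay and add proportional random jitter around it:

```python
import random

def polite_delay(base=5.0, jitter=0.5):
    """Return a random delay centered on `base` seconds.

    `jitter` is the fraction of `base` to vary by: with base=5.0 and
    jitter=0.5 the delay lands anywhere in [2.5, 7.5] seconds.
    Hypothetical helper: tune `base` per site -- larger for small servers,
    smaller for sites that clearly handle heavy traffic.
    """
    low = base * (1 - jitter)
    high = base * (1 + jitter)
    return random.uniform(low, high)
```

This keeps the tuning in one place: adjusting a single `base` value per site changes the whole delay range without touching the scraping loop itself.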