Adding random delays between queries is a simple way to avoid getting blocked while running scripts in Python. Spacing requests out keeps the script from sending too many requests in a short period, which is a common reason servers block clients.
By randomizing the delays, the script spreads its requests over time in an irregular pattern rather than a fixed rhythm, which is harder for a server to flag as automated traffic.
Setting Up Random Delays in Python for Web Scraping
First, we have to import the sleep and randint functions. The sleep function pauses the program for the given number of seconds before the next line of code runs.
randint returns a pseudorandom integer between its first and second parameters, inclusive of both.
```python
from random import randint
from time import sleep

sleep(randint(4, 10))
```
This code waits a random whole number of seconds, anywhere from 4 to 10 inclusive.
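Because randint only produces whole numbers, every pause is an exact second count. If you want fractional delays such as 6.37 seconds, which look a little less mechanical, random.uniform is a drop-in alternative from the same standard library module — a small sketch:

```python
from random import uniform
from time import sleep

# uniform(4, 10) returns a float in the range [4, 10], so the pause can be
# any value like 6.37 seconds, not just a whole number as with randint.
delay = uniform(4, 10)
sleep(delay)
```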
Test Your Random Delay Settings and Make Sure You’re Not Being Detected
Let’s add the requests library to our code. This code will fetch 3 pages and print the status code returned for each one.
```python
from random import randint
from time import sleep

import requests

pages = ['https://docs.python.org/3/library/time.html',
         'https://docs.python.org/3/library/random.html',
         'https://requests.readthedocs.io/en/latest/']

for page in pages:
    r = requests.get(page)
    print(r.status_code)
    sleep(randint(4, 10))
```
If you run the code, you should see the following result:
200
200
200
Status code 200 is the HTTP status code indicating that the request was successful. If you see a different code, the request didn’t complete successfully — for example, 403 means the server refused the request, and 429 means you’re sending requests too quickly.
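One way to react to those codes is to wait longer whenever the server signals trouble. The helper below is a sketch — the function name and the exact delay ranges are made up for illustration, not a fixed rule:

```python
from random import randint

def delay_for_status(status_code):
    """Pick a delay in seconds based on the server's last response.

    Hypothetical policy: the ranges here are illustrative only.
    """
    if status_code == 429 or status_code >= 500:
        # 429 Too Many Requests or a server-side error: back off much longer.
        return randint(30, 60)
    # Anything else: keep the normal polite 4-10 second delay.
    return randint(4, 10)
```

Inside the loop from the previous example, you could then replace `sleep(randint(4, 10))` with `sleep(delay_for_status(r.status_code))`.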
Best Practices for Adding Random Delays
When you add random delays, aim for a balance: delays long enough that you don’t overload the server — especially if the site is small — but short enough that you don’t wait forever for your results.
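One way to express that balance in code (a sketch — the `polite_delay` helper and its default values are invented for illustration) is to choose a per-site base delay and add proportional random jitter around it:

```python
import random

def polite_delay(base=5.0, jitter=0.5):
    """Return a random delay centered on `base` seconds.

    `jitter` is the fraction of `base` to vary by: with base=5.0 and
    jitter=0.5 the delay lands anywhere in [2.5, 7.5] seconds.
    Hypothetical helper: tune `base` per site -- larger for small servers,
    smaller for sites that clearly handle heavy traffic.
    """
    low = base * (1 - jitter)
    high = base * (1 + jitter)
    return random.uniform(low, high)
```

This keeps the tuning in one place: adjusting a single `base` value per site changes the whole delay range without touching the scraping loop itself.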