How to Set Up Proxies in Selenium for Web Scraping

How to's, Python, Proxies, Nov. 27, 2024 · 5 min read

When you use Selenium for web scraping or automation, proxies are often essential. They help you avoid IP bans, rate limits, and geo-restrictions. Configuring them in Selenium can be awkward, though, especially when the proxy requires authentication or you need to monitor HTTP requests. That’s where Selenium Wire comes in.

Selenium-Wire

Selenium Wire extends Selenium with features for authenticating against proxies, intercepting HTTP requests and responses, and debugging network traffic.

In this guide, we’ll show you how to set up proxies in Selenium using selenium-wire and webdriver-manager. Normally, you have to download the WebDriver binary that matches your browser and keep it up to date yourself. webdriver-manager handles both tasks for you.

By the end of this blog, you’ll have a fully configured Selenium setup tailored to ProxyScrape proxies, ready to tackle any challenges that come your way. Let’s dive in!

TL;DR

To access the complete script without going through the entire tutorial, click this link to copy the full code.

Prerequisites 

Before we dive into setting up proxies in Selenium, make sure you have the following tools and libraries installed and ready:

  • Python Installed
    • Ensure you have Python 3.7 or higher installed on your system.
    • You can download the latest version from the official Python website.
  • Required Python Packages (Pip Install)
    • selenium-wire
    • webdriver-manager

Run the following command to install all dependencies:

pip install selenium-wire webdriver-manager
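To confirm the installation worked before running the main script, a quick check like the following may help. It is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the import names (seleniumwire, webdriver_manager) are the ones used later in this guide.

```python
import importlib.util

# Import names differ from the pip names:
# selenium-wire -> seleniumwire, webdriver-manager -> webdriver_manager.
REQUIRED_MODULES = ["seleniumwire", "webdriver_manager"]


def missing_modules(module_names):
    """Return the top-level modules from module_names that cannot be found."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```

If the list is not empty, re-run the pip command above in the same environment that runs your script.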

Note: You may encounter the error “ModuleNotFoundError: No module named 'blinker._saferef'”. You can resolve it by downgrading the blinker library to version 1.7.0.

  • Start by uninstalling the current version of blinker:
pip uninstall blinker
  • Then install the specific version mentioned above:
pip install blinker==1.7.0
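If you are unsure whether the downgrade is needed, you could check the installed blinker version first. The sketch below uses the standard library's importlib.metadata. It treats any version newer than 1.7.0 as a candidate for downgrading, which is an assumption drawn from the note above rather than a documented compatibility rule, and the helper names are hypothetical.

```python
import re
from importlib import metadata

PINNED = (1, 7, 0)  # the version recommended in the note above


def parse_version(version):
    """Turn '1.8.2' into (1, 8, 2); stops at the first non-numeric piece."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def newer_than_pinned(version):
    """True if version is newer than the pinned 1.7.0 release (assumed threshold)."""
    return parse_version(version) > PINNED


def installed_blinker_version():
    """Return the installed blinker version string, or None if it is absent."""
    try:
        return metadata.version("blinker")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_blinker_version()
    if version is None:
        print("blinker is not installed")
    elif newer_than_pinned(version):
        print(f"blinker {version} is newer than 1.7.0; consider downgrading")
    else:
        print(f"blinker {version} should be fine")
```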

With the prerequisites in place, let's break down the script configuration into seven simple steps:

Setting Up Proxies in Selenium: The Script

Now that we’ve covered the prerequisites, let’s move on to the actual script. This step-by-step guide will help you integrate ProxyScrape residential proxies with Selenium using selenium-wire and webdriver-manager.

1. Importing Required Libraries

We begin by importing the necessary libraries:

import re
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

2. Proxy Configuration

Define your ProxyScrape proxy details:

proxy_address = "rp.proxyscrape.com:6060"
proxy_username = "your_proxy_username"
proxy_password = "your_proxy_password"
  • Replace the placeholders (proxy_username, proxy_password) with your actual ProxyScrape credentials.
  • rp.proxyscrape.com:6060 is the ProxyScrape residential proxy endpoint.
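Hardcoding credentials is fine for a quick test, but you may prefer to keep them out of the script. One possible approach is to read them from environment variables. In this sketch the variable names PROXY_USERNAME, PROXY_PASSWORD, and PROXY_ADDRESS, as well as the helper load_proxy_settings, are hypothetical choices for illustration.

```python
import os

DEFAULT_ADDRESS = "rp.proxyscrape.com:6060"  # endpoint used in this guide


def load_proxy_settings(env=None):
    """Read proxy settings from a mapping (defaults to os.environ).

    Returns (address, username, password) and raises ValueError
    if either credential is missing.
    """
    env = os.environ if env is None else env
    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")
    if not username or not password:
        raise ValueError("Set PROXY_USERNAME and PROXY_PASSWORD first")
    address = env.get("PROXY_ADDRESS", DEFAULT_ADDRESS)
    return address, username, password
```

You could then unpack the result into the same three variables used above: proxy_address, proxy_username, proxy_password = load_proxy_settings().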

3. Selenium Wire Options

Set up the proxy in Selenium Wire:

sw_options = {
   'proxy': {
       'http': f'http://{proxy_username}:{proxy_password}@{proxy_address}',
       'https': f'https://{proxy_username}:{proxy_password}@{proxy_address}',
   }
}
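If you build these options in more than one place, a small helper can keep the format consistent. The sketch below produces the same dictionary shape as above and adds Selenium Wire's optional no_proxy key so that local addresses bypass the proxy. The helper name build_sw_options and the default bypass list are assumptions made for illustration.

```python
def build_sw_options(address, username, password, bypass=("localhost", "127.0.0.1")):
    """Build a Selenium Wire options dict for an authenticated proxy.

    bypass lists hosts that should not go through the proxy; it is
    joined into the comma-separated string that 'no_proxy' expects.
    """
    credentials = f"{username}:{password}@{address}"
    return {
        "proxy": {
            "http": f"http://{credentials}",
            "https": f"https://{credentials}",
            "no_proxy": ",".join(bypass),
        }
    }
```

Calling build_sw_options(proxy_address, proxy_username, proxy_password) returns a dictionary you can pass as seleniumwire_options.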

4. Configuring Chrome Options

Add a few Chrome flags: one starts the window maximized, and the other two improve stability in containers and other restricted environments:

chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
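On a server without a display you may also want to run Chrome headless. The following sketch collects the flags in a list and applies them to any object with an add_argument method, such as the Options instance above. The helper names are hypothetical, and the --headless=new flag assumes a reasonably recent Chrome version.

```python
# Flags used in this guide
BASE_ARGS = ["--start-maximized", "--no-sandbox", "--disable-dev-shm-usage"]


def chrome_arguments(headless=False):
    """Return the Chrome flags to use, optionally adding headless mode."""
    args = list(BASE_ARGS)
    if headless:
        # Assumption: the installed Chrome supports the newer headless mode
        args.append("--headless=new")
    return args


def apply_arguments(options, args):
    """Call options.add_argument for each flag and return options."""
    for arg in args:
        options.add_argument(arg)
    return options
```

For example, apply_arguments(Options(), chrome_arguments(headless=True)) would give you a configured options object in one call.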

5. Initialize WebDriver

Set up selenium-wire with webdriver-manager:

service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, seleniumwire_options=sw_options, options=chrome_options)
  • ChromeDriverManager: Automatically downloads and sets up the correct ChromeDriver binary for your browser version.
  • seleniumwire_options: Configures the proxy for Selenium Wire.
  • options: Applies Chrome-specific settings.
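If the script raises an exception before reaching driver.quit(), the browser process can be left running. One way to guard against that is a small context manager that always quits the driver. This is a sketch, not part of Selenium or Selenium Wire; managed_driver and its create_driver argument are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Create a driver via create_driver() and always quit it afterwards."""
    driver = create_driver()
    try:
        yield driver
    finally:
        # Runs even if the body of the with-block raises
        driver.quit()


# Possible usage with the objects defined in this guide:
#
# with managed_driver(lambda: webdriver.Chrome(
#         service=service,
#         seleniumwire_options=sw_options,
#         options=chrome_options)) as driver:
#     driver.get("https://ssl-judge2.api.proxyscrape.com/")
```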

6. Access the Target Website

Navigate to the ProxyScrape Judge endpoint to test your proxy:

driver.get('https://ssl-judge2.api.proxyscrape.com/')
  • ProxyScrape Judge: This endpoint returns information about the proxy being used, such as your IP address and headers.
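Because Selenium Wire captures the traffic the browser generates, you can also inspect it after the page loads through driver.requests. The sketch below summarizes such a list; the helper name summarize_requests is hypothetical, and it relies only on the method, url, and response.status_code attributes of the captured requests.

```python
def summarize_requests(requests):
    """Return (method, url, status) tuples for captured requests.

    status is None when a request has no response yet.
    """
    summary = []
    for request in requests:
        response = request.response
        status = response.status_code if response is not None else None
        summary.append((request.method, request.url, status))
    return summary


# Possible usage after driver.get(...):
#
# for method, url, status in summarize_requests(driver.requests):
#     print(method, status, url)
```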

7. Parse the Response

Extract and display your proxied IP address using regex:

# Grab the rendered page returned by the judge endpoint
response = driver.page_source
print("Response:", response)

# Use a simple regex (raw string) to find the origin IP
match = re.search(r"HTTP_X_FORWARDED_FOR = (\d+\.)+\d+", response)
if match:
    print("Your IP is:", match.group().split("=")[-1].strip())
else:
    print("HTTP_X_FORWARDED_FOR not found in the response")

# Quit the browser instance
driver.quit()
  • Regex Explanation:
    • Matches the header HTTP_X_FORWARDED_FOR and extracts the proxied IP
    • Splits the result to isolate the IP address
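The same parsing step can be wrapped in a function that is easy to test without a browser. This is a minimal sketch that assumes the judge page prints the header as "HTTP_X_FORWARDED_FOR = <IPv4 address>", as the regex above implies. The function name extract_forwarded_ip is hypothetical, and it uses a capture group instead of splitting the match.

```python
import re

# Captures a dotted IPv4 address following the header name
FORWARDED_FOR = re.compile(r"HTTP_X_FORWARDED_FOR = ((?:\d{1,3}\.){3}\d{1,3})")


def extract_forwarded_ip(page_source):
    """Return the proxied IP from the judge page, or None if it is absent."""
    match = FORWARDED_FOR.search(page_source)
    return match.group(1) if match else None
```

Returning None instead of raising lets the caller decide how to handle a page that does not contain the header, for example when the proxy request failed.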

Conclusion

In conclusion, using ProxyScrape residential proxies with Selenium Wire is a robust solution for anyone needing advanced web scraping and automation capabilities with enhanced privacy and security.

By following this guide, you can set up an environment that bypasses restrictions with minimal configuration effort. Selenium Wire handles proxy authentication and traffic routing, while WebDriver Manager keeps the ChromeDriver binary in sync with your browser, so your scraping tasks stay reliable.

If you need assistance with web scraping or have questions about our product, don't hesitate to contact us via live chat. You can also join our Discord community for support and updates.