Python Selenium – Scraping Google search result URLs

In this project we will use Selenium and a Chrome web driver to scrape the URLs of Google search results. The prerequisite for the project is that you have ChromeDriver stored on your machine. You can download it from the official ChromeDriver downloads page; note that its version needs to match your current version of Chrome.
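As an aside: if you are on Selenium 4.6 or newer, the bundled Selenium Manager can resolve a matching driver for you automatically, so the manual download may not even be necessary. A minimal sketch:

from selenium import webdriver

# Selenium 4.6+ only: Selenium Manager finds (or downloads) a matching
# ChromeDriver behind the scenes, so no driver path is needed.
driver = webdriver.Chrome()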

We begin by specifying the location of our web driver; the driver.get() method will then navigate to the given URL. In this example we are going to the Google home page.

# Point Selenium at the ChromeDriver binary (Selenium 3 style call;
# Selenium 4+ takes a Service object instead of executable_path).
driver = webdriver.Chrome(executable_path="C:/chromedriver.exe")
driver.get("https://www.google.com?hl=en")  # hl=en forces the English interface
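If you would rather not have a browser window pop up, Chrome can also be run headless. A minimal sketch, assuming the same driver path as above (bear in mind Google may serve different markup or consent pages to a headless browser):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")               # no visible browser window
options.add_argument("--window-size=1920,1080")  # a realistic viewport size
driver = webdriver.Chrome(executable_path="C:/chromedriver.exe", options=options)
driver.get("https://www.google.com?hl=en")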

The next snippet of code isn't always necessary, depending on your location. I tend to run similar code via a VPN, and depending on which country I connect from, Google sometimes presents a cookie consent dialogue. Long story short, it won't hurt to add this: it checks whether the dialogue appears and, if so, automatically clicks the 'I agree' button.

try:
    # Wait up to two seconds for Google's consent dialogue to show up.
    WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#L2AGLb > div")))
    driver.find_element(By.CSS_SELECTOR, "#L2AGLb > div").click()
except TimeoutException:
    # No dialogue appeared, nothing to dismiss.
    pass
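A slightly more defensive variant is to wait until the button is actually clickable rather than merely present, and to catch only the timeout instead of every exception. A sketch using the same #L2AGLb selector:

try:
    agree = WebDriverWait(driver, 2).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb"))
    )
    agree.click()
except TimeoutException:
    # No consent dialogue was shown for this location, carry on.
    pass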

We then move on to defining our search term, which in this case is 'Scriptopia'. We are also adding the parameter num=100, which raises the number of results per page to 100 – I believe this is the maximum Google allows per page. There are many other Google URL parameters which are extremely helpful, for example for moving to the next page of results or showing omitted results (sketched after the parsing step below). I may cover these in more detail at a later point.

driver.implicitly_wait(1)  # element lookups may block for up to one second
query = "Scriptopia"
# %22 are URL-encoded double quotes, wrapping the query for an exact-match search.
driver.get("https://www.google.com/search?q=%22" + query + "%22&num=100")
driver.implicitly_wait(1)
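Gluing the raw query straight into the URL only works while it contains no spaces or other special characters. For anything beyond a single word it is safer to URL-encode it first with the standard library; a sketch (the multi-word query is just an illustration):

from urllib.parse import quote_plus

query = "Scriptopia selenium tutorial"  # hypothetical multi-word query
# quote_plus() escapes spaces and special characters for use in a URL.
driver.get("https://www.google.com/search?q=%22" + quote_plus(query) + "%22&num=100")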

Using Beautiful Soup we then parse the page source and scrape the URLs of all the search results found. The yuRUbf class is simply what Google currently assigns to the div wrapping each result link; these generated class names change from time to time, so expect to update the selector occasionally.

soup = BeautifulSoup(driver.page_source, 'html.parser')
# Each organic result link is wrapped in a div with class "yuRUbf".
search = soup.find_all('div', class_="yuRUbf")
for link in search:
    print(link.a.get('href'))
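Building on the 'next page' parameter mentioned earlier: Google's start parameter offsets the result list (start=100 for the second page of 100, and so on). A sketch that collects the URLs from the first few pages into a list, assuming the same driver and query as above – bear in mind that rapid-fire paging like this is exactly the kind of traffic Google may answer with a CAPTCHA:

results = []
for start in (0, 100, 200):  # first three pages of up to 100 results each
    driver.get("https://www.google.com/search?q=%22" + query
               + "%22&num=100&start=" + str(start))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for result in soup.find_all('div', class_="yuRUbf"):
        if result.a is not None:  # skip result blocks without a link
            results.append(result.a.get('href'))
print(len(results), "URLs collected")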

The full source code for this project can be found below:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path="C:/chromedriver.exe")
driver.get("https://www.google.com?hl=en")

try:
    WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#L2AGLb > div")))
    driver.find_element(By.CSS_SELECTOR, "#L2AGLb > div").click()
except TimeoutException:
    pass

driver.implicitly_wait(1)
query = "Scriptopia"
driver.get("https://www.google.com/search?q=%22" + query + "%22&num=100")
driver.implicitly_wait(1)
soup = BeautifulSoup(driver.page_source, 'html.parser')
search = soup.find_all('div', class_="yuRUbf")
for link in search:
    print(link.a.get('href'))
driver.quit()  # ends the session and closes all browser windows
