Python Selenium – Scrape data from Yell directory

In this post we will create a Python script that scrapes data from the Yell directory. Yell is a digital marketing and online directory business in the United Kingdom. For this project we’ll use Selenium, which automates web browser interaction from Python.

To begin with, we need a search term and a location from the user. This is done in the snippet of code below.

#Get Input from User
Query = input("Enter Search: ")
Location = input("Enter Location: ")
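
A blank search term or location would submit an empty search, so it can help to validate the input before the browser starts. The following is a minimal sketch, assuming a hypothetical helper named `prompt_non_empty` (not part of the original script) that asks again until it receives non-blank text.

```python
def prompt_non_empty(prompt, read=input):
    """Ask until the user enters something other than whitespace.

    `read` defaults to the built-in input(); it is a parameter only so the
    helper can be exercised without a keyboard.
    """
    while True:
        value = read(prompt).strip()
        if value:
            return value
        print("Please enter a value.")
```

It could be used in place of the plain `input` calls, for example `Query = prompt_non_empty("Enter Search: ")`.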

We then start the web driver and open the Yell website. The Chromium web driver can be downloaded here; ensure that it matches your installed version of Chrome.

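#Start Chrome driver (Selenium 4 takes the driver path through a Service object)
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(executable_path=r'C:\Chromium\chromedriver.exe'))
driver.get("https://www.yell.com/")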
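
Because the driver launches a real browser process, it is worth making sure that process is shut down even if a later step raises an exception. One way to do this is a small context manager. This is a sketch only: `managed_driver` and `create_driver` are hypothetical names introduced for illustration, not part of Selenium or of the original script.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Yield a driver and always call quit() on it afterwards.

    `create_driver` is any zero-argument callable returning an object with a
    quit() method, e.g. a function that builds the Chrome driver.
    """
    driver = create_driver()
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions, so no browser is left open.
        driver.quit()
```

The rest of the script would then run inside a `with managed_driver(...) as driver:` block, where the argument is a function that builds the Chrome driver as in the snippet above.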

The Yell homepage consists of a search term input field, a location input field and a search button. We will use Selenium to find these elements on the webpage. Using the code below, the search term and location fields are populated and the search button is clicked. The script then sets an implicit wait, so that later element lookups keep retrying for up to 10 seconds while the page loads.

#Locators are passed using the By class
from selenium.webdriver.common.by import By

#Enter Query
SearchQuery = driver.find_element(By.CSS_SELECTOR, "input[type='text'][id='search_keyword']")
SearchQuery.send_keys(Query)

#Enter Location
SearchLocation = driver.find_element(By.CSS_SELECTOR, "input[type='text'][id='search_location']")
SearchLocation.send_keys(Location)

#Start Search
driver.find_element(By.XPATH, "//button[text()='Search']").click()

#Let element lookups retry for up to 10 seconds while the page loads
driver.implicitly_wait(10)

[Image: Yell homepage]
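
An implicit wait is a retry-until-timeout setting for element lookups rather than a fixed pause. The idea can be illustrated in plain Python. This is only a sketch of the concept, not Selenium's implementation, and `wait_until` is a hypothetical helper name.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

For example, `condition` could be a function that looks up the result headings and returns the list, which is empty (and therefore falsy) until the headings appear.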

The first page of results is then presented. Here we retrieve the current URL and introduce an integer named ‘pageNumber’ with a value of 2. These will be used later to help us navigate through the paged results.

url = driver.current_url
pageNumber = 2

We then introduce a while loop that retrieves and prints the search results. Taking the URL from the snippet above, we append the ‘pageNum’ parameter along with the page number to load the next page, then increment the page number by 1 and repeat.
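
Appending `&pageNum=` by string concatenation works as long as the results URL already has a query string and no existing `pageNum` parameter. A slightly more defensive sketch using only the standard library is shown here. `page_url` is a hypothetical helper, and the URL in the usage note is a placeholder rather than Yell's real URL format.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page_number):
    """Return base_url with its pageNum query parameter set to page_number."""
    parts = urlsplit(base_url)
    # Keep every existing parameter except a previous pageNum.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "pageNum"]
    params.append(("pageNum", str(page_number)))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Calling `page_url("https://example.com/search?keywords=plumbers", 2)` gives `https://example.com/search?keywords=plumbers&pageNum=2`, and calling it again on that result with 3 replaces the parameter instead of adding a second one.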

Once we reach a page number with no results, an error page is displayed. The script checks for the ‘Oops! Something went wrong.’ message and, if it is present, closes the browser and exits.

[Image: Yell search with no results remaining]

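while True:
    #Collect the headings on the current page (By is imported from selenium.webdriver.common.by)
    message = driver.find_elements(By.TAG_NAME, 'h2')
    for i in message:
        #Stop once Yell reports that there are no more results
        if "Oops! Something went wrong." in i.text:
            driver.quit()
            sys.exit(0)

        print(i.text)
    #Load the next page of results
    driver.get(url+"&pageNum="+str(pageNumber))
    pageNumber = pageNumber + 1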
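
The loop above combines page navigation, the stop condition and printing, and ends by exiting the interpreter. One possible restructuring is a generator that simply stops when the message appears. This is a sketch under assumptions: page loading is wrapped in a caller-supplied function (`fetch_headings`, a hypothetical name), and `max_pages` is an added safety cap that the original script does not have. Unlike the original loop, it checks a whole page for the message before yielding any of that page's headings.

```python
END_MESSAGE = "Oops! Something went wrong."


def iter_results(fetch_headings, first_page=1, max_pages=100):
    """Yield heading texts page by page until the end-of-results message.

    `fetch_headings(page_number)` must return the list of <h2> texts for
    that page. `max_pages` is a safety cap, an assumption of this sketch.
    """
    for page_number in range(first_page, first_page + max_pages):
        headings = fetch_headings(page_number)
        if any(END_MESSAGE in text for text in headings):
            return  # no more results
        yield from headings
```

The caller can then decide what to do with each result, for example printing it or writing it to a file, and can close the driver once the generator is exhausted.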

The full source code for this project can be found below.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import sys

#Get Input from User
Query = input("Enter Search: ")
Location = input("Enter Location: ")

#Start Chrome driver (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(executable_path=r'C:\Chromium\chromedriver.exe'))
driver.get("https://www.yell.com/")

#Enter Query
SearchQuery = driver.find_element(By.CSS_SELECTOR, "input[type='text'][id='search_keyword']")
SearchQuery.send_keys(Query)

#Enter Location
SearchLocation = driver.find_element(By.CSS_SELECTOR, "input[type='text'][id='search_location']")
SearchLocation.send_keys(Location)

#Start Search
driver.find_element(By.XPATH, "//button[text()='Search']").click()

#Let element lookups retry for up to 10 seconds while the page loads
driver.implicitly_wait(10)

url = driver.current_url
pageNumber = 2

while True:
    #Collect the headings on the current page
    message = driver.find_elements(By.TAG_NAME, 'h2')
    for i in message:
        #Stop once Yell reports that there are no more results
        if "Oops! Something went wrong." in i.text:
            driver.quit()
            sys.exit(0)

        print(i.text)
    #Load the next page of results
    driver.get(url+"&pageNum="+str(pageNumber))
    pageNumber = pageNumber + 1
