Getting Started with Web Scraping

Ankit Gupta
Dec 28, 2021


Working across different domains, we often encounter scenarios where we are asked to procure data from a particular web page or blog. This data could be as simple as author names, product reviews, sales numbers or images.

We’ll be covering a very simple process to extract product reviews from Amazon and save them in a text file and a CSV file. This process can be replicated to scrape reviews for products, movies and more from any web page.

Importing Libraries

First we’ll import the libraries necessary for this job. (If they are not already installed, the requests and beautifulsoup4 packages can be installed with pip.)

# Importing requests to extract content from a URL
# (https://realpython.com/python-requests/#the-get-request)
import requests
# Importing BeautifulSoup for web scraping
from bs4 import BeautifulSoup as bs

Scraping Product Reviews From the Website

Now that we have all the libraries, we will start scraping Amazon for product reviews.

# creating an empty list to store reviews
review_list = []

Next, we write a for loop that runs once for each review page we want to scrape.

for i in range(1, 5):
    temp_list = []
    url = "https://www.amazon.in/Alchemist-Paulo-Coelho/product-reviews/8172234988/ref=cm_cr_arp_d_paging_btm_next_2?ie=UTF8&reviewerType=all_reviews&pageNumber=" + str(i)
    response = requests.get(url)
    soup = bs(response.content, "html.parser")
    reviews = soup.find_all("span", attrs={"class": "a-size-base review-text review-text-content"})
    for review in reviews:
        temp_list.append(review.text)
    review_list = review_list + temp_list
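
As a quick sanity check, we can print how many reviews were collected; if the count is zero, Amazon may have blocked the request or the class names on the page may have changed.

# how many reviews did we gather across all pages?
print("Total reviews collected:", len(review_list))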

We’ll now see how the code works.

for i in range(1,5): # here the range function is used to provide the page numbers that we would like our code to scrape through.
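
Note that range(1, 5) stops before 5; a quick illustrative check shows exactly which page numbers the loop visits.

# range(1, 5) yields 1, 2, 3 and 4, so four review pages are scraped
print(list(range(1, 5)))   # [1, 2, 3, 4]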

response = requests.get(url) # the get method is used to retrieve data from the specified resource.
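
Before parsing, it is worth confirming that the request actually succeeded. A minimal sketch of such a check (not part of the original code) could look like this:

response = requests.get(url)
if response.status_code == 200:
    # 200 means the page was returned successfully, so it is safe to parse
    soup = bs(response.content, "html.parser")
else:
    print("Request failed with status code:", response.status_code)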

soup = bs(response.content, "html.parser") # creating a soup object so we can iterate over the extracted content.

reviews = soup.find_all("span", attrs={"class": "a-size-base review-text review-text-content"}) # extracting the content under specific tags. The find_all() method looks through a tag’s descendants and retrieves all descendants that match our filters.
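
To see what find_all() returns, a short illustrative snippet (assuming the soup object from above) prints the number of matches and the text of the first one:

reviews = soup.find_all("span", attrs={"class": "a-size-base review-text review-text-content"})
print("Reviews found on this page:", len(reviews))
if reviews:
    # .get_text(strip=True) returns the review text with surrounding whitespace removed
    print(reviews[0].get_text(strip=True))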

The class names passed to find_all() come from inspecting the review elements on the page with the browser’s developer tools and selecting the attributes to use for data extraction.

Writing scraped data into a text file

# writing reviews to a text file
with open("reviews.txt", "w", encoding="utf8") as output:
    output.write(str(review_list))
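
The introduction also mentioned saving the reviews to a CSV file. A minimal sketch using Python’s built-in csv module could look like this (the column name "review" is an assumed choice for illustration):

import csv

# writing reviews to a CSV file, one review per row
with open("reviews.csv", "w", newline="", encoding="utf8") as f:
    writer = csv.writer(f)
    writer.writerow(["review"])              # header row (assumed column name)
    for review in review_list:
        writer.writerow([review.strip()])    # strip surrounding whitespace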

And so we have our product reviews saved in a text file!
