How to write a web scraper for stocks using BeautifulSoup

Jouneid Raza
2 min read · Feb 8, 2020


What is web scraping?

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler.

We are going to write a web scraper in Python for a stock website. For this, we will use BeautifulSoup.

Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping. It is available for Python 2.7 and Python 3.
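As a quick illustration of that parse tree, here is a minimal, self-contained sketch. The HTML snippet in it is made up for the example; it only mimics the kind of table cells we will scrape below.

from bs4 import BeautifulSoup

# A tiny, hypothetical HTML table just to show how find_all() works
html = "<table><tr><td class='wsod_firstCol'>AAPL</td><td class='wsod_aRight'>320.03</td></tr></table>"
soup = BeautifulSoup(html, 'html.parser')

# find_all() returns every matching tag; get_text() flattens a tag to its text
for cell in soup.find_all('td', {'class': 'wsod_firstCol'}):
    print(cell.get_text())  # prints: AAPL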

Let’s start the code.

from bs4 import BeautifulSoup
import requests
import pandas as pd
import datetime as dt

page = requests.get("https://money.cnn.com/data/us_markets/")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)
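If the request fails (for example with a 4xx or 5xx status), the code above would quietly parse an error page. An optional guard, not part of the original script, is to check the status right after requests.get():

# Optional: stop early if the page did not load successfully.
# raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
page.raise_for_status()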

# Get all first-column cells, which hold the ticker symbols
tickers_tab = soup.find_all('td', {'class': 'wsod_firstCol'})
# Keep only the first 10 cells
tickers_tab = tickers_tab[0:10]

tick_list = []

# Collect all children of each ticker cell
for tick in tickers_tab:
    for tic in tick:
        tick_list.append(tic)

# The list alternates between the ticks and their tags; keep every second element
tick_list = tick_list[1::2]

tickers = []
for ss in tick_list:
    for s in ss:
        tickers.append(s)
print(tickers)
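If you would rather not rely on the nested loops, a list comprehension is a common alternative. This is only a sketch under the assumption that the ticker symbol sits in the first tag nested inside each cell; inspect the live page in your browser's developer tools to confirm the structure before using it.

# Alternative sketch: take the text of the first child tag in each cell.
# find(True) returns the first nested tag of any name (assumed to hold the symbol).
tickers = [td.find(True).get_text(strip=True) for td in tickers_tab]
print(tickers)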

tickers_price = soup.find_all('td', {'class': 'wsod_aRight'})

# Three columns (price, change, and percent change) for each of the 10 tickers = 30 cells
tickers_price = tickers_price[0:30]

# As we have three columns, we build three index lists to pick out the cells belonging to each column.

price_indx = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
change_indx = [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
per_change_indx = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
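Hard-coding thirty indices by hand is easy to get wrong. The same three lists can be generated with range(), which is a small optional alternative to the step above:

# Equivalent, generated form: every third cell, starting at offsets 0, 1 and 2
price_indx = list(range(0, 30, 3))
change_indx = list(range(1, 30, 3))
per_change_indx = list(range(2, 30, 3))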

# Data for price column

price_data = []

price = [tickers_price[i] for i in price_indx]
for pri in price:
    for p in pri:
        for s in p:
            price_data.append(s)

print(price_data)

# Data for change column

change_data = []

change = [tickers_price[i] for i in change_indx]
for ch_pri in change:
    for c_pri in ch_pri:
        for c_pr in c_pri:
            for c_p in c_pr:
                change_data.append(c_p)

print(change_data)

# Data for percentage change column

per_change_dat = []

per_change = [tickers_price[i] for i in per_change_indx]
for per_chng in per_change:
    for per_chn in per_chng:
        for pr_chn in per_chn:
            for pr_ch in pr_chn:
                per_change_dat.append(pr_ch)

print(per_change_dat)
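The three drill-down loops can also be collapsed into one pass per column using get_text(). This is a sketch of an alternative, assuming each cell's text is exactly the value you want; on the live page, extra whitespace or nested markup may require additional cleaning.

# Alternative sketch: slice every third cell per column and flatten each cell to text
price_data = [td.get_text(strip=True) for td in tickers_price[0::3]]
change_data = [td.get_text(strip=True) for td in tickers_price[1::3]]
per_change_dat = [td.get_text(strip=True) for td in tickers_price[2::3]]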

# We have the ticker list and the data for each column; now create a DataFrame.

df = pd.DataFrame(tickers, columns=['ticker'])
df['price'] = price_data
df['change'] = change_data
df['%change'] = per_change_dat

# check the dataframe

df.head()

# Prepare a name for the file to be saved. This is useful if you want to run the program on a schedule to keep collecting fresh data.

x = dt.datetime.today().strftime("%m-%d-%Y")
filename = 'us_stock_sectors-' + x + '.csv'

print(filename)

# Save the file as CSV

df.to_csv(filename)
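To verify the export, or to reuse the snapshot in a later analysis step, the file can simply be read back with pandas:

# Load the saved snapshot back into a DataFrame
df_saved = pd.read_csv(filename)
print(df_saved.head())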

You can use this data in further projects, such as stock prediction or analysis; the use cases can vary.

Cheers :)

Feel free to contact me at:
LinkedIn https://www.linkedin.com/in/junaidraza52/
Fiverr https://www.fiverr.com/jouneidraza
Whatsapp +92–3225847078
Instagram https://www.instagram.com/iamjunaidrana/

Happy Learning :)
