How to write a web scraper for stocks using Beautiful Soup
What is web scraping
Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler.
We are going to write a web scraper in Python for a stock market website, using Beautiful Soup.
Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping. Current releases of Beautiful Soup 4 run on Python 3; version 4.9.3 was the last release to support Python 2.7.
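To make the parse tree concrete, here is a minimal sketch that parses a small, made-up HTML fragment rather than a live page. The table contents and the names `html`, `first_cells`, and `names` are illustrative assumptions; only `BeautifulSoup`, `find_all`, and `get_text` come from the library.

```python
from bs4 import BeautifulSoup

# A made-up fragment that imitates one table on a stock page (assumption)
html = """
<table>
  <tr><td class="wsod_firstCol"><a href="#">Energy</a></td><td class="wsod_aRight">1.25</td></tr>
  <tr><td class="wsod_firstCol"><a href="#">Utilities</a></td><td class="wsod_aRight">0.40</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every <td> whose class attribute matches
first_cells = soup.find_all("td", {"class": "wsod_firstCol"})

# get_text collects the text inside each cell, including text in nested tags
names = [cell.get_text(strip=True) for cell in first_cells]
print(names)
```

The same two calls, `find_all` and `get_text`, are enough to pull a whole column out of a table once you know the class name of its cells.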
Let’s start the code.
from bs4 import BeautifulSoup
import requests
import pandas as pd
import datetime as dt

# Download the page and parse it
page = requests.get("https://money.cnn.com/data/us_markets/")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)
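The request above assumes the download succeeds. A slightly more defensive variant, sketched here under the assumption that a 10-second timeout is acceptable, checks the HTTP status before any parsing happens. The helper name `fetch_html` and its `get` parameter are hypothetical; the parameter exists only so the function can be exercised without network access.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Download a page and return its raw bytes (hypothetical helper)."""
    # A timeout stops the script from hanging if the server never answers
    response = get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.content
```

Calling `fetch_html("https://money.cnn.com/data/us_markets/")` would return bytes that can be passed to `BeautifulSoup` in the same way as `page.content`.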
tickers_tab = soup.find_all('td', {'class': 'wsod_firstCol'})
# Getting all the tickers
tickers_tab = tickers_tab[0:10]
tick_list = []
for tick in tickers_tab:
    for tic in tick:
        tick_list.append(tic)
# tick_list now holds the ticks and their tags; keep every second item
tick_list = tick_list[1::2]
tickers = []
for ss in tick_list:
    for s in ss:
        tickers.append(s)
print(tickers)
tickers_price = soup.find_all('td', {'class': 'wsod_aRight'})
# Three columns: price, change, and percent change
tickers_price = tickers_price[0:30]
# The cells of all three columns are in one list, so each column gets its own list of indices.
price_indx = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
change_indx = [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
per_change_indx = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
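Since the three columns are interleaved in one list, each column sits at a fixed stride of three. As a sketch, the index lists can be generated rather than typed by hand, or skipped entirely with slice notation. The `cells` list below is a stand-in for `tickers_price`, filled with placeholder strings.

```python
# Stand-in for the 30 scraped cells: 10 rows x 3 columns (placeholder values)
cells = [f"r{row}c{col}" for row in range(10) for col in range(3)]

n_cols = 3

# Generate the index lists instead of writing them out
price_indx = list(range(0, len(cells), n_cols))
change_indx = list(range(1, len(cells), n_cols))
per_change_indx = list(range(2, len(cells), n_cols))

# Equivalent selection with extended slicing: start at the offset, step by 3
price_cells = cells[0::n_cols]
change_cells = cells[1::n_cols]
per_change_cells = cells[2::n_cols]

print(change_indx)
print(change_cells[:2])
```

Generating the indices avoids the kind of slip that is easy to make when typing thirty numbers by hand.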
# Data for price column
price_data = []
price = [tickers_price[i] for i in price_indx]
for pri in price:
    for p in pri:
        for s in p:
            price_data.append(s)
print(price_data)
# Data for change column
change_data = []
change = [tickers_price[i] for i in change_indx]
for ch_pri in change:
    for c_pri in ch_pri:
        for c_pr in c_pri:
            for c_p in c_pr:
                change_data.append(c_p)
print(change_data)
# Data for percentage change column
per_change_dat = []
per_change = [tickers_price[i] for i in per_change_indx]
for per_chng in per_change:
    for per_chn in per_chng:
        for pr_chn in per_chn:
            for pr_ch in pr_chn:
                # print(pr_ch)
                per_change_dat.append(pr_ch)
print(per_change_dat)
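The nested loops above walk down through each cell's child tags to reach the text, and the number of loops has to match the nesting depth. As an alternative sketch, `get_text(strip=True)` returns the text of a cell regardless of how deeply it is nested. The HTML fragment, including the `posData` class, is invented for illustration and may not match the live page's markup.

```python
from bs4 import BeautifulSoup

# Invented fragment: one row whose cells are nested to different depths (assumption)
html = """
<table><tr>
  <td class="wsod_aRight"><span>3,120.50</span></td>
  <td class="wsod_aRight"><span><span class="posData">+12.30</span></span></td>
  <td class="wsod_aRight"><span><span class="posData">+0.40%</span></span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td", {"class": "wsod_aRight"})

# One call per cell, no matter how many tags wrap the text
values = [cell.get_text(strip=True) for cell in cells]
print(values)
```

This keeps working if the site adds or removes a wrapper tag, which would otherwise change how many loops are needed.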
# We now have the ticker list and its data in lists, so create a DataFrame
df = pd.DataFrame(tickers, columns=['ticker'])
df['price'] = price_data
df['change'] = change_data
df['%change'] = per_change_dat
# Check the DataFrame
print(df.head())
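The scraped values are strings, so arithmetic on the columns will not work until they are converted. The sketch below builds the frame from a dictionary, which names every column up front, and then converts the text to numbers. The sample rows are made up, and the cleaning steps assume values formatted like `3,120.50` and `+0.40%`.

```python
import pandas as pd

# Made-up sample values standing in for the scraped lists
tickers = ["Energy", "Utilities"]
price_data = ["3,120.50", "298.10"]
change_data = ["+12.30", "-1.05"]
per_change_dat = ["+0.40%", "-0.35%"]

df = pd.DataFrame({
    "ticker": tickers,
    "price": price_data,
    "change": change_data,
    "%change": per_change_dat,
})

# Strip thousands separators and percent signs, then convert to floats
df["price"] = pd.to_numeric(df["price"].str.replace(",", "", regex=False))
df["change"] = pd.to_numeric(df["change"])
df["%change"] = pd.to_numeric(df["%change"].str.rstrip("%"))

print(df.dtypes)
```

With numeric columns, operations such as `df["price"].mean()` or sorting by `%change` behave as expected.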
# Prepare a dated name for the output file. This helps if you run the program repeatedly to keep collecting live data.
x = dt.datetime.today().strftime('%m/%d/%Y')
x = x.replace('/', '-')
x = x + '.csv'
filename = 'us_stock_sectors-'
filename = filename + x
print(filename)
# Save the file as CSV
df.to_csv(filename)
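The file name can also be built in one step by putting the dashes directly in the `strftime` format. This is a sketch with a hypothetical helper, `build_filename`, that takes the date as an argument so the result is predictable.

```python
import datetime as dt


def build_filename(day, prefix="us_stock_sectors-"):
    """Return a dated CSV file name (hypothetical helper)."""
    # %m-%d-%Y yields month-day-year with dashes, so no replace() is needed
    return f"{prefix}{day.strftime('%m-%d-%Y')}.csv"


print(build_filename(dt.date.today()))
```

A year-first format such as `%Y-%m-%d` would make the saved files sort chronologically by name, if that matters for your project.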
You can use this data in your own projects, for example stock prediction or analysis. Your use case may be different.
Cheers :)
Feel free to contact me at:
LinkedIn https://www.linkedin.com/in/junaidraza52/
Fiverr https://www.fiverr.com/jouneidraza
Whatsapp +92–3225847078
Instagram https://www.instagram.com/iamjunaidrana/
Happy Learning :)