Stock data scraper
Stock data is useful for making predictions and for building other stock-related applications. Authentic, live data is a key ingredient of an intelligent and powerful model.
We will scrape the data from the CNN Business live markets page.
We will use BeautifulSoup in this tutorial. You can install it with the following command. The code also imports requests and pandas, so install those as well if you do not have them.
pip install beautifulsoup4
Let’s Start.
from bs4 import BeautifulSoup
import requests
import pandas as pd
import datetime as dt
With the required packages imported, we can load the web page and start processing it.
page = requests.get("https://money.cnn.com/data/us_markets/")
soup = BeautifulSoup(page.content, 'html.parser')
soup
This is the complete page. We will extract the tickers and their data step by step.
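Before going further, here is a minimal sketch of a slightly more defensive way to load the page. It is not part of the original code: the helper names `parse_page` and `load_page` and the 10-second timeout are assumptions made for illustration. The idea is to stop early when the request fails, instead of parsing an error page.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://money.cnn.com/data/us_markets/"


def parse_page(html):
    """Parse raw HTML (str or bytes) with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def load_page(url=URL, timeout=10):
    """Download the page and return it as a BeautifulSoup object.

    The timeout value is an arbitrary choice for this sketch.
    """
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing them.
    response.raise_for_status()
    return parse_page(response.content)
```

Calling `load_page()` would give an object equivalent to `soup`. The rest of the tutorial keeps using the `soup` created earlier.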
# Getting all the tickers
tickers_tab = soup.find_all('td', {'class': 'wsod_firstCol'})
tickers_tab = tickers_tab[0:10]
tick_list = []
for tick in tickers_tab:
    for tic in tick:
        tick_list.append(tic)
# Keep every second child (the elements at odd positions)
tick_list = tick_list[1::2]
tickers = []
for ss in tick_list:
    for s in ss:
        tickers.append(s)
print(tickers)
# Three columns: price, change, and percent change
tickers_price = soup.find_all('td', {'class': 'wsod_aRight'})
tickers_price = tickers_price[0:30]
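The nested loops above walk through the children of each cell to reach the text. As a rough alternative, BeautifulSoup's `get_text` can pull the text out of a cell in one call. The sketch below runs on a small, hypothetical HTML snippet that only imitates the two class names used in this tutorial; the real page may nest its tags differently, and the helper name `cell_texts` and all sample values are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it only imitates the class names used in this tutorial.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="wsod_firstCol"><a href="#">Energy</a></td>
    <td class="wsod_aRight">512.30</td>
    <td class="wsod_aRight">+4.10</td>
    <td class="wsod_aRight">+0.81%</td>
  </tr>
  <tr>
    <td class="wsod_firstCol"><a href="#">Financials</a></td>
    <td class="wsod_aRight">640.12</td>
    <td class="wsod_aRight">-2.05</td>
    <td class="wsod_aRight">-0.32%</td>
  </tr>
</table>
"""


def cell_texts(soup, css_class, limit=None):
    """Return the stripped text of every <td> with the given class."""
    cells = soup.find_all("td", {"class": css_class})
    if limit is not None:
        cells = cells[:limit]
    return [cell.get_text(strip=True) for cell in cells]


sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
sample_tickers = cell_texts(sample_soup, "wsod_firstCol", limit=10)
sample_cells = cell_texts(sample_soup, "wsod_aRight", limit=30)
print(sample_tickers)
print(sample_cells)
```

On the live page you would pass `soup` instead of `sample_soup`. It is worth checking the output against the page, since this sketch has only been exercised on the sample markup.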
Define the list indexes for the three columns we want to extract.
price_indx = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
change_indx = [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
per_change_indx = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
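Each list steps through the 30 cells three at a time, starting at offset 0, 1, or 2. Hand-typed lists like these are easy to get wrong, so here is a small sketch that generates them with `range` instead. The constant names `ROWS` and `COLUMNS` are assumptions introduced for this example.

```python
ROWS = 10      # number of tickers kept
COLUMNS = 3    # price, change, percent change

# Start at the column offset and jump one full row (3 cells) each step.
price_indx = list(range(0, ROWS * COLUMNS, COLUMNS))
change_indx = list(range(1, ROWS * COLUMNS, COLUMNS))
per_change_indx = list(range(2, ROWS * COLUMNS, COLUMNS))

print(price_indx)
print(change_indx)
print(per_change_indx)
```

If you later keep more or fewer tickers, only `ROWS` needs to change.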
Price Data
price_data = []
price = [tickers_price[i] for i in price_indx]
for pri in price:
    for p in pri:
        for s in p:
            price_data.append(s)
print(price_data)
Change Data
change_data = []
change = [tickers_price[i] for i in change_indx]
for ch_pri in change:
    for c_pri in ch_pri:
        for c_pr in c_pri:
            for c_p in c_pr:
                change_data.append(c_p)
print(change_data)
Percentage Change Data
per_change_dat = []
per_change = [tickers_price[i] for i in per_change_indx]
for per_chng in per_change:
    for per_chn in per_chng:
        for pr_chn in per_chn:
            for pr_ch in pr_chn:
                per_change_dat.append(pr_ch)
print(per_change_dat)
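The three sections above differ only in which column of the flat cell list they read. If the cell texts are already available as a flat, row-by-row list of strings, the same split can be sketched with list slicing. The helper names `split_columns` and `to_number` and the sample values are made up for illustration; `to_number` is an optional extra for turning the scraped strings into floats, assuming they look like the samples.

```python
def split_columns(cells, width=3):
    """Split a flat, row-by-row list into `width` column lists."""
    return [cells[offset::width] for offset in range(width)]


def to_number(text):
    """Convert strings such as '+0.81%' or '1,234.50' to a float."""
    return float(text.replace(",", "").replace("%", ""))


# Made-up sample: two rows of price, change, percent change.
cells = ["512.30", "+4.10", "+0.81%", "640.12", "-2.05", "-0.32%"]
prices, changes, pct_changes = split_columns(cells)

print(prices)
print([to_number(value) for value in changes])
```

Keeping the values as strings, as the tutorial does, is fine for a CSV dump. Converting them is only needed if you want to do arithmetic on them later.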
Combine these lists into a DataFrame.
df=pd.DataFrame(tickers)
df['price']= price_data
df['change']= change_data
df['%change']=per_change_dat
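Because the DataFrame above is created from a plain list, pandas gives the ticker column the default name 0. A possible variation is to build the table from a dictionary so that every column gets a readable name. The sketch below uses made-up sample values, and the column name `ticker` is an assumption.

```python
import pandas as pd

# Made-up sample values, only to show the shape of the table.
sample = {
    "ticker": ["Energy", "Financials"],
    "price": ["512.30", "640.12"],
    "change": ["+4.10", "-2.05"],
    "%change": ["+0.81%", "-0.32%"],
}
sample_df = pd.DataFrame(sample)
print(sample_df)
```

With the real data you would pass `tickers`, `price_data`, `change_data`, and `per_change_dat` as the dictionary values. All four lists need to have the same length.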
Create a unique file name. This helps if you want to run the code in a thread and keep getting live data dumps.
x = dt.datetime.today().strftime("%m/%d/%Y")
x = x.replace('/', '-')
x = x + '.csv'
filename = 'us_stock_sectors-'
filename = filename + x
filename
You can also add the time to the date string, so that the script can run every minute without overwriting earlier files.
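As a sketch of that idea, the helper below adds hours and minutes to the name. The function name `build_filename` and the exact format string are assumptions; any format works as long as it avoids characters that are not allowed in file names, such as `/` and `:`.

```python
import datetime as dt


def build_filename(now, prefix="us_stock_sectors-"):
    """Build a CSV name such as us_stock_sectors-01-05-2024_09-30.csv."""
    return prefix + now.strftime("%m-%d-%Y_%H-%M") + ".csv"


# Fixed timestamp so the result is predictable.
print(build_filename(dt.datetime(2024, 1, 5, 9, 30)))
# us_stock_sectors-01-05-2024_09-30.csv
```

In the live script you would call `build_filename(dt.datetime.today())` on each run.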
Let’s save our data as CSV.
df.to_csv(filename)
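To keep collecting dumps, the scraping and saving steps can be wrapped in a function and called repeatedly from a background thread. The following is only a rough sketch using the standard `threading` module. The names `run_periodically` and `start_background`, and the `job` argument that stands for your own scrape-and-save function, are assumptions and not part of the original code.

```python
import threading


def run_periodically(job, interval_seconds, stop_event, max_runs=None):
    """Call job() repeatedly until stop_event is set or max_runs is reached."""
    runs = 0
    while not stop_event.is_set():
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Sleep for the interval, but wake up early if a stop is requested.
        stop_event.wait(interval_seconds)
    return runs


def start_background(job, interval_seconds):
    """Run the job in a daemon thread; set the returned event to stop it."""
    stop_event = threading.Event()
    worker = threading.Thread(
        target=run_periodically,
        args=(job, interval_seconds, stop_event),
        daemon=True,
    )
    worker.start()
    return worker, stop_event
```

For example, `start_background(scrape_and_save, 60)` would run a hypothetical `scrape_and_save` function about once a minute. Please check the site's terms of use before polling it this often.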
Find the code using the below link.
Cheers. :)
Feel free to contact me at:
LinkedIn https://www.linkedin.com/in/junaidraza52/
Whatsapp +92–3225847078
Instagram https://www.instagram.com/iamjunaidrana/