Stock data scraper

Jouneid Raza
2 min read · Apr 30, 2020

Stock data is useful for making predictions and building other stock-related use cases. Reliable live data is a key ingredient of an intelligent, powerful model.
We will scrape the data from the CNN Business live markets page.

We will use BeautifulSoup for this tutorial. The code also relies on requests and pandas. You can install all three with the following command.

pip install beautifulsoup4 requests pandas

Let’s start.

from bs4 import BeautifulSoup
import requests
import pandas as pd
import datetime as dt

Once the required packages are imported, we load the web page so we can process it.

page = requests.get("https://money.cnn.com/data/us_markets/")
soup = BeautifulSoup(page.content, 'html.parser')
soup

This is the complete page. We will extract the tickers and their data step by step.

# Get all the ticker cells and keep the first ten
tickers_tab = soup.find_all('td', {'class': 'wsod_firstCol'})
tickers_tab = tickers_tab[0:10]

# Collect the child nodes of every ticker cell
tick_list = []
for tick in tickers_tab:
    for tic in tick:
        tick_list.append(tic)

# The list holds ticks and their tags; keep every second item
tick_list = tick_list[1::2]

# Pull the contents out of each remaining item
tickers = []
for ss in tick_list:
    for s in ss:
        tickers.append(s)
print(tickers)

# Three columns: price, change, and percent change
tickers_price = soup.find_all('td', {'class': 'wsod_aRight'})
tickers_price = tickers_price[0:30]
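
To see what the class-based lookup returns without hitting the live site, here is a minimal sketch run against a small hand-written HTML fragment. The fragment, the sector names, and the numbers in it are made up for illustration, and the real page markup may differ; only the class names `wsod_firstCol` and `wsod_aRight` come from the code above. The sketch uses `get_text()`, which is one alternative to looping over child nodes.

```python
from bs4 import BeautifulSoup

# Hand-written sample for illustration only; not the real CNN markup.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="wsod_firstCol"><a href="#">Sector A</a></td>
    <td class="wsod_aRight"><span>100.50</span></td>
    <td class="wsod_aRight"><span><span>+1.25</span></span></td>
    <td class="wsod_aRight"><span><span>+1.26%</span></span></td>
  </tr>
  <tr>
    <td class="wsod_firstCol"><a href="#">Sector B</a></td>
    <td class="wsod_aRight"><span>50.00</span></td>
    <td class="wsod_aRight"><span><span>-0.50</span></span></td>
    <td class="wsod_aRight"><span><span>-0.99%</span></span></td>
  </tr>
</table>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# get_text(strip=True) returns the text inside a tag, however deeply it is nested.
names = [td.get_text(strip=True)
         for td in sample_soup.find_all("td", {"class": "wsod_firstCol"})]
values = [td.get_text(strip=True)
          for td in sample_soup.find_all("td", {"class": "wsod_aRight"})]

print(names)
print(values)
```

Because `get_text()` does not depend on how many tags wrap the text, it may be less sensitive to small markup changes than a fixed number of nested loops.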

Define the list indexes for the three columns we want to extract.

price_indx = [0,3,6,9,12,15,18,21,24,27]
change_indx = [1,4,7,10,13,16,19,22,25,28]
per_change_indx = [2,5,8,11,14,17,20,23,26,29]
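
Since the thirty cells repeat in groups of three, the same split can also be written with slice steps instead of hand-typed index lists, which helps avoid typos in the indexes. A minimal sketch, using a stand-in list of integers in place of the scraped cells (the variable names here are illustrative):

```python
# Stand-in for tickers_price: 30 cells laid out as
# price, change, percent change, repeated for 10 rows.
cells = list(range(30))

price_cells = cells[0::3]       # positions 0, 3, 6, ...
change_cells = cells[1::3]      # positions 1, 4, 7, ...
per_change_cells = cells[2::3]  # positions 2, 5, 8, ...

print(price_cells)
print(change_cells)
print(per_change_cells)
```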

Price Data

price_data = []

# Walk down the nested tags of each price cell to reach the text
price = [tickers_price[i] for i in price_indx]
for pri in price:
    for p in pri:
        for s in p:
            price_data.append(s)

print(price_data)

Change Data

change_data = []

# The change cells are nested one level deeper than the price cells
change = [tickers_price[i] for i in change_indx]
for ch_pri in change:
    for c_pri in ch_pri:
        for c_pr in c_pri:
            for c_p in c_pr:
                change_data.append(c_p)

print(change_data)

Percentage Change Data

per_change_dat = []

# The percent change cells use the same nesting as the change cells
per_change = [tickers_price[i] for i in per_change_indx]
for per_chng in per_change:
    for per_chn in per_chng:
        for pr_chn in per_chn:
            for pr_ch in pr_chn:
                per_change_dat.append(pr_ch)

print(per_change_dat)

Combine these lists into a DataFrame.

df = pd.DataFrame(tickers, columns=['ticker'])
df['price'] = price_data
df['change'] = change_data
df['%change'] = per_change_dat
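
The scraped values are text, so numeric work such as sorting or averaging needs a conversion step first. The helper below is a hypothetical sketch: the name `to_number` and the sample strings are assumptions for illustration, not part of the original code. It strips thousands separators and a trailing percent sign before converting to `float`.

```python
def to_number(text):
    """Convert strings such as '1,234.50', '+1.25' or '-0.75%' to float."""
    cleaned = str(text).strip().replace(",", "").rstrip("%")
    return float(cleaned)

# Made-up sample strings in the shapes a markets table typically uses.
samples = ["1,234.50", "+1.25", "-0.75%"]
converted = [to_number(s) for s in samples]
print(converted)
```

It could be applied to each list before it goes into the DataFrame, for example `[to_number(p) for p in price_data]`.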

Create a unique file name, so that if you run this code in a thread to keep collecting live data dumps, each dump is written to its own file.

x = dt.datetime.today().strftime("%m-%d-%Y")
filename = 'us_stock_sectors-' + x + '.csv'
filename

You can also add the time to the date string, so that you can run this every minute without overwriting earlier files.
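
One way to do that is to include hours and minutes in the `strftime` pattern. A minimal sketch; the helper name `build_filename` and the exact pattern are assumptions chosen for illustration:

```python
import datetime as dt

def build_filename(now=None, prefix="us_stock_sectors-"):
    """Return a CSV file name stamped with the date and time, down to the minute."""
    if now is None:
        now = dt.datetime.today()
    # Only hyphens and underscores are used as separators, to keep the name file-system friendly.
    return prefix + now.strftime("%m-%d-%Y_%H-%M") + ".csv"

# A fixed datetime is passed in here so the result is reproducible.
example = build_filename(dt.datetime(2020, 4, 30, 9, 5))
print(example)  # us_stock_sectors-04-30-2020_09-05.csv
```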

Let’s save our data as CSV.

df.to_csv(filename)
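
To keep collecting dumps, the scrape-and-save steps could be wrapped in a function and called repeatedly from a background thread. The sketch below shows only the scheduling part, using the standard library; `run_periodically` and the placeholder `job` are hypothetical names, and a real job would contain the scraping code from this post.

```python
import threading

def run_periodically(job, interval_seconds, stop_event):
    """Call job() repeatedly, pausing interval_seconds between calls, until stop_event is set."""
    while not stop_event.is_set():
        job()
        # wait() returns early if stop_event is set during the pause.
        stop_event.wait(interval_seconds)

calls = []
stop = threading.Event()

def job():
    # Placeholder for "scrape the page and write a CSV".
    calls.append(len(calls) + 1)
    if len(calls) == 3:
        stop.set()  # stop after three runs so this example terminates

worker = threading.Thread(target=run_periodically, args=(job, 0.01, stop))
worker.start()
worker.join()
print(calls)  # [1, 2, 3]
```

For a one-minute cadence, `interval_seconds` would be 60, and the stop event would be set from the main program rather than from inside the job.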

You can find the code at the link below.

Cheers. :)

Feel free to contact me at:
LinkedIn https://www.linkedin.com/in/junaidraza52/
Whatsapp +92–3225847078
Instagram https://www.instagram.com/iamjunaidrana/

Written by Jouneid Raza

With 8 years of industry expertise, I am a seasoned data engineer specializing in data engineering with diverse domain experiences.