Scrape Yahoo Finance using Python

In this post, we will learn how to scrape Yahoo Finance using Python. We will use Pandas to extract key financials for any company available on Yahoo Finance.


In my latest posts, we have performed several financial analyses using Python through a great API, financialmodelingprep. In this article, we will follow a different approach: we will scrape Yahoo Finance using Python.

To scrape Yahoo Finance using Python, we only need a couple of lines of code, as we will see below, thanks to Pandas. This is what makes Pandas one of the best Python packages.

Extracting Financials from Yahoo Finance

Let’s imagine that we are interested in knowing how much key executives are paid at Apple and Microsoft. Instead of going to Yahoo Finance and checking that manually, we could run the Python code below and store the data in a Pandas DataFrame for further analysis.

#url https://finance.yahoo.com/quote/AAPL/profile?p=AAPL

import pandas as pd

#Executive_pay
profile = pd.read_html('https://finance.yahoo.com/quote/AAPL/profile?p=AAPL')
profile[0]

First, we import pandas and use pd.read_html to scrape the Executive Pay table included in Yahoo Finance. As an argument, we need to pass the URL of the Yahoo Finance page that contains the table. pd.read_html returns a Python list of DataFrames, one per table found on the page. Therefore, we index the first element of the list to get the DataFrame below:

Scrape Yahoo Finance using Python

As you can see in the DataFrame above, we get the names of Apple's key executives and their corresponding pay. And if we compare this to the information shown on Yahoo Finance, we can see that it matches:

Extracting Financials from Yahoo
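
One practical caveat about the pd.read_html calls in this post: when given a URL, Pandas downloads the page itself, and some sites answer such default requests with an error. A possible workaround, sketched here under the assumption that Yahoo Finance accepts a browser-like User-Agent header, is to download the HTML ourselves and hand it to pd.read_html. The helper names build_request and read_tables and the header value are made up for this illustration.

```python
from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd

# Assumption: a browser-like User-Agent; the exact string is illustrative.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url):
    """Create a request object that carries the custom headers."""
    return Request(url, headers=HEADERS)


def read_tables(url):
    """Download the page and return every HTML table as a DataFrame."""
    with urlopen(build_request(url)) as response:
        html = response.read().decode("utf-8")
    # pd.read_html returns a list with one DataFrame per table it finds.
    return pd.read_html(StringIO(html))
```

Used this way, read_tables('https://finance.yahoo.com/quote/AAPL/profile?p=AAPL')[0] would play the same role as profile[0] above. Note that pd.read_html raises a ValueError when the page contains no tables.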

Scraping Key Statistics from Yahoo Finance

Now, let's move on to scraping more interesting financials. We can find them in the Statistics section of the Yahoo Finance site.

Note that in this case, we use an f-string to insert the company variable into the URL.

company = 'AAPL'
statistics = pd.read_html(f'https://finance.yahoo.com/quote/{company}/key-statistics?p={company}')

print(statistics)

###outcome
[                    Unnamed: 0 As of Date: 9/12/2020Current  ... 9/30/2019 6/30/2019
 0      Market Cap (intraday) 5                        1.94T  ...   995.15B   910.64B
 1           Enterprise Value 3                        1.96T  ...     1.01T   943.18B
 2                 Trailing P/E                        34.42  ...     19.01     16.58
 3                Forward P/E 1                        28.41  ...     17.27     15.97
 4  PEG Ratio (5 yr expected) 1                         2.70  ...      2.04      1.45
 5            Price/Sales (ttm)                         7.35  ...      4.09      3.68
 6             Price/Book (mrq)                        26.85  ...     10.32      8.47
 7   Enterprise Value/Revenue 3                         7.16  ...     15.76     17.53
 8    Enterprise Value/EBITDA 6                        23.65  ...     50.16     60.04
 
 [9 rows x 7 columns],                           0        1
 0         Beta (5Y Monthly)     1.28
 1          52-Week Change 3  103.73%

As we see above, the returned Python list contains several tables. If we print the length of the list, we see that we have 10 different tables.

print(len(statistics))
###outcome
10
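
Since the list holds several tables, picking one by position only works as long as the page layout stays the same. As an alternative, here is a small sketch that looks a table up by a label in its first column. The function find_table is a hypothetical helper, and the two small DataFrames are stand-ins built from values printed above, not a live download.

```python
import pandas as pd


def find_table(tables, label):
    """Return the first DataFrame whose first column contains `label`."""
    for table in tables:
        first_column = table.iloc[:, 0].astype(str)
        # regex=False treats the label as plain text, so "(" and "/" are safe.
        if first_column.str.contains(label, regex=False).any():
            return table
    raise KeyError(f"No table contains {label!r}")


# Stand-in data shaped like the printed output; "Current" is a shortened header.
tables = [
    pd.DataFrame({"Unnamed: 0": ["Trailing P/E", "Forward P/E 1"],
                  "Current": [34.42, 28.41]}),
    pd.DataFrame({0: ["Beta (5Y Monthly)", "52-Week Change 3"],
                  1: ["1.28", "103.73%"]}),
]

beta_table = find_table(tables, "Beta (5Y Monthly)")
```

The same call on the real list, find_table(statistics, "Beta (5Y Monthly)"), should return the stock price history table, provided Yahoo Finance still uses that label.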

Finally, we can extract the tables by indexing the elements of the Python list and assigning each one to a Python variable:

company = 'AAPL'
statistics = pd.read_html(f'https://finance.yahoo.com/quote/{company}/key-statistics?p={company}')

# Each table is picked by its position in the list returned by pd.read_html
valuation_Measures = statistics[0]
stock_Price_History = statistics[1]
share_Statistics = statistics[2]
dividend_Info = statistics[3]
# statistics[4] is not used here
profitability_Info = statistics[5]
management_Effectiveness = statistics[6]
income_Statement = statistics[7]
balance_Sheet = statistics[8]
cash_Flow = statistics[9]

print(valuation_Measures)

As a sample, the code prints the valuation measures. This is what we get:

Yahoo Finance Key statistics Python

Great, we get some key financial ratios for Apple. You can check for yourself that our numbers match what is shown on the Yahoo Finance key statistics page for Apple.

Furthermore, by simply changing the value of the variable company, we can extract financials for any other company that we are interested in. For instance, by replacing AAPL with MSFT, we will get the same set of financial information for Microsoft.
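
To repeat the extraction for several companies, the f-string idea can be wrapped in a small loop. The sketch below uses hypothetical helper names (statistics_url, collect_statistics) and takes the reading function as a parameter, so that it can be swapped out, for example to try the logic without hitting the site.

```python
import pandas as pd

URL_TEMPLATE = "https://finance.yahoo.com/quote/{company}/key-statistics?p={company}"


def statistics_url(company):
    """Build the key statistics URL for one ticker symbol."""
    return URL_TEMPLATE.format(company=company)


def collect_statistics(companies, read_tables=pd.read_html):
    """Return a dict that maps each ticker to its list of scraped tables."""
    return {company: read_tables(statistics_url(company)) for company in companies}
```

collect_statistics(['AAPL', 'MSFT']) would then return a dictionary with one list of tables per ticker, assuming each request succeeds.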

From Yahoo Finance to Excel using Python

Now that we have some key financials stored in Pandas DataFrames, we can use the full potential of Pandas to analyse the data and support further equity analysis.
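
One thing to keep in mind before analysing the numbers: values such as 1.94T, 995.15B or 103.73% arrive as text. A minimal sketch of a converter follows. The name to_number is hypothetical, and it assumes that K, M, B and T stand for thousand, million, billion and trillion and that percentages should become fractions.

```python
# Assumed meaning of the suffixes used in the scraped values.
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def to_number(value):
    """Convert values such as '1.94T', '995.15B' or '103.73%' to floats."""
    text = str(value).strip().replace(",", "")
    if text.endswith("%"):
        # Percentages are returned as fractions, e.g. '50%' becomes 0.5.
        return float(text[:-1]) / 100
    if text and text[-1].upper() in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[text[-1].upper()]
    return float(text)
```

For example, valuation_Measures.iloc[:, 1].map(to_number) would convert the most recent column. Text that is not numeric, such as N/A, raises a ValueError in this sketch and would need its own handling.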

On top of that, we can also store the Pandas DataFrames in Excel files by using the DataFrame method to_excel(). It is as simple as shown below:

valuation_Measures.to_excel('nameofthefile.xlsx')

As an argument of the method, we need to pass the name of the Excel file. After running the code, a new Excel file is created in our working folder containing the selected DataFrame, in this case the valuation measures.
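
If we want all the tables in one workbook rather than one file each, pd.ExcelWriter can be combined with the sheet_name argument of to_excel. The sketch below is one way to do it. save_tables and safe_sheet_name are hypothetical helpers, and writing .xlsx files assumes that an Excel engine such as openpyxl is installed.

```python
import re

import pandas as pd


def safe_sheet_name(name):
    """Make a label usable as an Excel sheet name."""
    # Excel allows at most 31 characters and rejects [ ] : * ? / and backslash.
    return re.sub(r"[\[\]:*?/\\]", "_", name)[:31]


def save_tables(tables, path):
    """Write each DataFrame in `tables` (a dict of name -> DataFrame) to its own sheet."""
    with pd.ExcelWriter(path) as writer:
        for name, table in tables.items():
            table.to_excel(writer, sheet_name=safe_sheet_name(name), index=False)
```

save_tables({'Valuation Measures': valuation_Measures, 'Balance Sheet': balance_Sheet}, 'financials.xlsx') would create one file with two sheets. The file name here is only an example.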

If you have enjoyed this article, feel free to have a look at some of my other posts on Python for financial analysis.
