Scrape SEC Edgar Balance Sheet with Python

Lets continue our series of posts on how to scrape SEC Edgar filings using Python. In this post we will scrape SEC Edgar balance sheet with Python. More in particular, we will extract Apple balance sheet.

So far, we have learned how to retrieve from SEC the quarterly and annual financial reports for any company. We also learned how to parse SEC financial reports to extract risk and company competitors.

In this post, we go a step further and we will start retrieving company financials. We will extract the balance sheet from Apple and convert in a Panda DataFrame.

We are going to build the code in a standard way so that it can be reused for any company without having to rewrite it.

Scrape SEC Edgar Balance Sheet with Python
Photo by Aleksandar Pasaric on Pexels.com

Retrieving Apple Balance Sheet from SEC

If you are not familiar with the Balance sheet concept, I recommend you to have a look at my previous post on what is a balance sheet.

First thing is to run the code from my previous post in order to retrieve Apple latest quarterly report. Once it is run, we will have the Apple quarterly report in a Beautiful Soup object ready to be parsed. And more important, we are ready to continue with this post. The original SEC filing in html format can be found here.

I recommend you to have a look at the SEC Edgar term and conditions before parsing their website. That will ensure that you know their conditions. Specially, do not make more than 10 requests per second.

I will shortly explain in a few bullet points how the code works. After the explanation, I will provide the whole code.

The Python script to extract company’s balance sheet from SEC will work as follows:

  • First, we will find the table containing Apple balance sheet in the SEC Apple filing. (Remember to run the code on my previous post to have the company filling stored in a beautiful soup object named soup).
  • Second, we create an empty dictionary using the year and quarter as the key. Then, a nested dictionary that will contain the balance sheet items and values.
  • Third, loop through each of the rows in the balance sheet table. Then, get the name of the table row index (1) using row.find(“ix:nonfraction”). The html element Ix:nonfraction is the html identification that we use to extract the name of the balance sheet line item and the respective value (e.g. Cash and Cash Equivalents)
  • Next, we use try and except to get the name of the balance sheet item. It will extract the name including us-gaap. This naming convention is standardise and will be the same for all companies and years. That way, we can parse the quarterly reports from any company without having to rewrite the code.
  • To extract the value, we need to extract the text from the ix:nonfraction element (2). Note that there are a couple of try and except in order to format the number. For example, getting rid of the brackets in case of negative numbers and from thousand’s separator.
  • Next, we append the balance sheet line items and values into our dictionary (3). They key of the dictionary will be the quarter and the year of the filing. Then, we have a nested dictionary containing the balance sheet line item and the respective value. If we print it out, we should get something similar to below:
  • {‘2020QTR2’: {‘1 ASSETS:’: ”, ‘2 Current assets:’: ”, ‘3 Cash and cash equivalents’: 40174.0, ‘4 Marketable securities’: 53877.0….
  • Finally, we convert the dictionary into a Pandas DataFrame (4).

As a sidetone, if you are running a version of Python that is lower than 3.6, you will realise that the order of the dictionary keys is not preserved when covered into a Pandas DataFrame. This is a mini bug that was solved with Python 3.7. Therefore, you only need to add below 4 lines of codes if you are running version 3.6 or an earlier version. Otherwise, the order of the balance sheet items will look funny.

 column_order = list(dict_value[str(year) + quarter].keys())

  BS = BS.T
  BS = BS[column_order]
  BS = BS.T

Great, so we have covered the main points of the code. As promised, see below the whole script to Scrape SEC Edgar Balance Sheet with Python.

def balance_Sheet(soup,year,quarter):
  table = soup.find(text="Current assets:").find_parent("table")
  items = []
  values = []

  dict_value = {}
  dict_value[str(year) + quarter] = {}
  name_key = 0

  for row in table.findAll('tr')[3:]:
      #1 Get name of the Table Row Index
          try:
            item = row.find("ix:nonfraction").attrs['name']
            item = item.replace('us-gaap:','')
          except:
            continue
      #2 Get value and transform the number to an appropiate format
          try:
            value = row.find("ix:nonfraction").text
            
            try:
              value = value.replace(',','')
            except:
              
              value
            try:
              value = value.replace('(','-')
            except:

              value
            try:
              #if there is a sign, we need to add it
              sign = row.find("ix:nonfraction")["sign"]
              value = sign + value 
            except:
              value
            try:
              value = float(value)
            except:
              print(value + ' 5')
          except:
            value = ''
        #3 Add elements to dictionaries
          dict_value[str(year) + quarter][item] = value
          
 #4Convert to DataFrames
  BS = pd.DataFrame(dict_value)

  #To keep the column order. No needed if Python version is higher than 3.7
  #for dict value we need to have 2020QTR3
  column_order = list(dict_value[str(year) + quarter].keys())

  BS = BS.T
  BS = BS[column_order]
  BS = BS.T

  return BS

balance_Sheet(soup,year,quarter)

Wrapping Up

After executing the script, we get below pandas DataFrame. As you can observe, the balance sheet item names have a standarise format. This will allow us to extract balance sheets from any other companies in SEC without having to change the code.

In the next post, we will take this one step further to merge different quarters into a single pandas DataFrame. This will be super useful to enable comparison across periods.

Thanks for reading! If you like the content of my blog on Python for Finance, please support and share on your social media.

Scrape SEC Edgar Balance Sheet with Python
Python Scraping SEC Edgar Balance Sheet

3 thoughts on “Scrape SEC Edgar Balance Sheet with Python

  1. Thank you so much! I’m following the code trying to learn. Looking forward to your follow up on merging qtrs.

Comments are closed.