Skip to content

Replicating the income statement as it appears on the SEC form without extra revenue segments being pulled in  #41

@ghost

Description

I am trying to replicate the income statement as shown in the company's 10-K (or 10-Q) so I can perform segment analysis.
For example, for Tesla (found here) we want

tsla-income-statement

where we can see revenue is broken out by segment. However, it is difficult (if not impossible to) recreate this table without extra data being pull in. My code as follows will demonstrate this.

import pandas as pd

url_10k_tsla = 'https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm'
xbrl_tsla_json = xbrlApi.xbrl_to_json(htm_url=url_10k_tsla)


# Initialize an empty DataFrame
df = pd.DataFrame(columns=['period'])

# Iterate through each key in the JSON data
for key, values in xbrl_tsla_json['StatementsOfIncome'].items():

    # See if there are segments first
    has_segments = False
    for value in values:
        if 'segment' in value:
            has_segments = True

    
    for value in values:
        # Extract period and value
        period = value['period']['startDate'] + ' - ' + value['period']['endDate']

        # if there are segments 
        if has_segments:
            # Get the value out of the segment or call it total.
            segment_name = value.get('segment', {}).get('value', 'Total').split(':')[-1]
            value_key = key + ' - ' + segment_name
        # There are no segments
        else:
            segment_name = value.get('value')
            value_key = key
        
        # Check if period already exists in DataFrame
        if period in df['period'].values:
            # Update existing row
            df.loc[df['period'] == period, value_key] = value['value']
        else:
            # Add new row
            row = {'period': period}
            row[value_key] = value['value']
            df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

df.set_index('period', inplace=True)

# convert all DataFrame columns to the int64 dtype
df = df[df.columns[~df.columns.isin(['EarningsPerShareBasic', 'EarningsPerShareDiluted'])]].astype(np.float64)

# Round all columns to millions except the EarningsPerShareBasic and EarningsPerShareDiluted	
df = df[df.columns[~df.columns.isin(['EarningsPerShareBasic', 'EarningsPerShareDiluted'])]].divide(1_000_000)

df.T

will produce the following pandas table:

image

In particular, the table lists two extra revenue sources, RevenueFromContractWithCustomerExcludingAssessedTax - EnergyGenerationAndStorageSalesMember and RevenueFromContractWithCustomerExcludingAssessedTax - SalesAndServicesMember as well as two extra gross profits (although I am not as concerned with these). These two extra revenue sources come from the 'Revenue Recognition - Revenue by source' table which is separate from the income statement. Is there a way to keep these extra revenue sources from being pulled in (without hard coding their names since I want to apply this technique to many companies) so that my code replicates exactly the income statement as it appears in the filing?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions