Replicating the income statement as it appears on the SEC form without extra revenue segments being pulled in 

I am trying to replicate the income statement as shown in the company's 10-K (or 10-Q) so I can perform segment analysis.
For example, for Tesla (found [here](https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm)) we want 

![tsla-income-statement](https://github.com/janlukasschroeder/sec-api/assets/140670320/fd62a933-ba88-40e1-81dc-b53036650d3b)

where we can see revenue is broken out by segment. However, it is difficult (if not impossible to) recreate this table without extra data being pull in. My code as follows will demonstrate this.

```
import pandas as pd

url_10k_tsla = 'https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm'
xbrl_tsla_json = xbrlApi.xbrl_to_json(htm_url=url_10k_tsla)


# Initialize an empty DataFrame
df = pd.DataFrame(columns=['period'])

# Iterate through each key in the JSON data
for key, values in xbrl_tsla_json['StatementsOfIncome'].items():

    # See if there are segments first
    has_segments = False
    for value in values:
        if 'segment' in value:
            has_segments = True

    
    for value in values:
        # Extract period and value
        period = value['period']['startDate'] + ' - ' + value['period']['endDate']

        # if there are segments 
        if has_segments:
            # Get the value out of the segment or call it total.
            segment_name = value.get('segment', {}).get('value', 'Total').split(':')[-1]
            value_key = key + ' - ' + segment_name
        # There are no segments
        else:
            segment_name = value.get('value')
            value_key = key
        
        # Check if period already exists in DataFrame
        if period in df['period'].values:
            # Update existing row
            df.loc[df['period'] == period, value_key] = value['value']
        else:
            # Add new row
            row = {'period': period}
            row[value_key] = value['value']
            df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

df.set_index('period', inplace=True)

# convert all DataFrame columns to the int64 dtype
df = df[df.columns[~df.columns.isin(['EarningsPerShareBasic', 'EarningsPerShareDiluted'])]].astype(np.float64)

# Round all columns to millions except the EarningsPerShareBasic and EarningsPerShareDiluted	
df = df[df.columns[~df.columns.isin(['EarningsPerShareBasic', 'EarningsPerShareDiluted'])]].divide(1_000_000)

df.T
```

will produce the following pandas table:

![image](https://github.com/janlukasschroeder/sec-api/assets/140670320/35c66d1c-c5a2-425e-9d15-ee2626aff806)

In particular, the table lists two extra revenue sources, RevenueFromContractWithCustomerExcludingAssessedTax - EnergyGenerationAndStorageSalesMember and RevenueFromContractWithCustomerExcludingAssessedTax - SalesAndServicesMember as well as two extra gross profits (although I am not as concerned with these). These two extra revenue sources come from the 'Revenue Recognition - Revenue by source' table which is separate from the income statement. Is there a way to keep these extra revenue sources from being pulled in (without hard coding their names since I want to apply this technique to many companies) so that my code replicates exactly the income statement as it appears in the filing?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replicating the income statement as it appears on the SEC form without extra revenue segments being pulled in #41

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Replicating the income statement as it appears on the SEC form without extra revenue segments being pulled in #41

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions