Updates to KDE for income distribution #37

jdebacker · 2025-03-27T22:40:59Z

This PR updates how the KDE of the income distribution is handled. It does 3 main things:

Reinstates the bw_method argument.
Updates how f and f' are computed from the KDE.
Tries to draw the tail from a Pareto distribution before the KDE is estimated. This would be an alternative to stitching a Pareto tail onto the KDE. It's a work in progress.
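
For readers unfamiliar with the bandwidth argument, here is a minimal sketch of how `f` and `f'` can be read off a KDE. It assumes `bw_method` is passed through to `scipy.stats.gaussian_kde`, which this thread does not confirm. The synthetic data, the grid, and all variable names are illustrative and do not come from the PR's code.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic lognormal "incomes": a stand-in for the real data, not the PR's input.
rng = np.random.default_rng(0)
income = rng.lognormal(mean=10.5, sigma=0.8, size=5_000)

# A scalar bw_method is used directly as the bandwidth factor (kde.factor),
# which scales the sample standard deviation.
kde_narrow = gaussian_kde(income, bw_method=0.1)
kde_wide = gaussian_kde(income, bw_method=2.0)

# Evaluate f on a grid and approximate f' with finite differences.
z = np.linspace(income.min(), np.quantile(income, 0.99), 500)
f_narrow = kde_narrow(z)
f_wide = kde_wide(z)
f_prime_narrow = np.gradient(f_narrow, z)
f_prime_wide = np.gradient(f_wide, z)

# A wider bandwidth flattens the density, so its peak is lower.
print(f_narrow.max(), f_wide.max())
```

Finite differences are only one way to get `f'`. Differentiating the Gaussian kernels analytically is an alternative, and the thread does not say which one the PR uses.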

jdebacker · 2025-03-27T22:41:57Z

Here's an example of the g_z estimated with these changes and a bw_method=2:

codecov-commenter · 2025-03-27T22:43:23Z

Codecov Report

Attention: Patch coverage is 5.55556% with 34 lines in your changes missing coverage. Please review.

Project coverage is 41.02%. Comparing base (10985c1) to head (dd29dc4).
Report is 57 commits behind head on main.

Files with missing lines	Patch %	Lines
iot/inverse_optimal_tax.py	5.55%	34 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (10985c1) and HEAD (dd29dc4): HEAD has 1 upload less than BASE.

Flag	BASE (10985c1)	HEAD (dd29dc4)
unittests	3	2

Additional details and impacted files

@@             Coverage Diff             @@
##             main      #37       +/-   ##
===========================================
- Coverage   72.81%   41.02%   -31.79%     
===========================================
  Files           3        3               
  Lines         103      195       +92     
===========================================
+ Hits           75       80        +5     
- Misses         28      115       +87

Flag	Coverage Δ
unittests	`41.02% <5.55%> (-31.79%)`	⬇️


jdebacker · 2025-04-01T14:58:49Z

@john-p-ryan I've added some code to compute_income_dist to splice a Pareto tail onto the KDE. Right now, the numerical approach is commented out and the analytical approach is used.
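
As a rough illustration of an analytical splice, the sketch below replaces the density above a cutoff with a Pareto tail whose level and slope match the input density at the cutoff. The function `splice_pareto_tail` and its arguments are hypothetical names, not the code in `compute_income_dist`. The lognormal density used in the demo is a stand-in for the KDE.

```python
import numpy as np


def splice_pareto_tail(z, f, f_prime, cutoff):
    """Replace f and f' at and above `cutoff` with a matched Pareto tail.

    Hypothetical helper: the Pareto shape alpha is chosen so that the tail
    has the same value and first derivative as the input at the cutoff.
    """
    z = np.asarray(z, dtype=float)
    f = np.asarray(f, dtype=float).copy()
    f_prime = np.asarray(f_prime, dtype=float).copy()

    # First grid point at or above the cutoff.
    i = min(int(np.searchsorted(z, cutoff)), len(z) - 1)
    z_c, f_c, fp_c = z[i], f[i], f_prime[i]

    # For f(z) proportional to z**-(alpha + 1): f'(z) = -(alpha + 1) f(z) / z.
    alpha = -z_c * fp_c / f_c - 1.0

    tail = z >= z_c
    f[tail] = f_c * (z[tail] / z_c) ** (-(alpha + 1.0))
    f_prime[tail] = -(alpha + 1.0) * f[tail] / z[tail]
    return f, f_prime, alpha


# Demo on a lognormal density (illustrative parameters, not estimates).
mu, sigma = 10.5, 0.8
z = np.linspace(1_000.0, 2_000_000.0, 4_000)
f = np.exp(-((np.log(z) - mu) ** 2) / (2 * sigma**2)) / (z * sigma * np.sqrt(2 * np.pi))
f_prime = np.gradient(f, z)

f_spliced, f_prime_spliced, alpha = splice_pareto_tail(z, f, f_prime, cutoff=350_000.0)
print(alpha)
```

Matching the value and first derivative keeps `f` and `f'` continuous at the cutoff. The second derivative can still jump there, so some kink may remain in quantities derived from `f'`.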

Here's some code I was using to test the results:

# %%
import numpy as np
import pandas as pd
from iot import iot_user

# %%
biden2020_path = "https://raw.githubusercontent.com/PSLmodels/examples/main/psl_examples/taxcalc/Biden2020.json"

b = iot_user.iot_comparison(
    policies=[biden2020_path],
    labels=["Biden 2020"],
    years=[2021],
    dist_type="kde",  # "log_normal",
    kde_bw=1.5,
    data="CPS",
)

# %%
# Plot theta_z
theta_plot = b.plot(var="theta_z")
theta_plot.show()


# %%
# Plot g_z
g_z_plot = b.plot(var="g_z")
g_z_plot.show()

# %%
# Plot f
f_plot = b.plot(var="f")
f_plot.show()

# %%
# Plot f_prime
f_prime_plot = b.plot(var="f_prime")
f_prime_plot.show()

# %%
# plot income dist
# NOTE: this is not a function of the dist_type, just raw data
inc_plot = b.SaezFig2(upper_bound=2_000_000)
inc_plot.show()
# %%

A few notes:

Even when f() and f'() look smooth, I see a kink in theta_z and thus g_z. JJZ also have a bit of a kink in theta_z (see Fig 1), but it's not as large and it's not noticeable in the g_z. Given that f and f' are smooth, it's hard to know what more to do to make theta_z smoother.
I cannot get this to work if the kde_bw arg is >= 3. And the lower this value, the "smoother" things look.

jdebacker · 2025-05-13T16:50:30Z

Theta:

jdebacker · 2025-05-15T19:35:07Z

With the latest changes to this branch, here's what the results are looking like (with a kde_bw=1.5):

@john-p-ryan thoughts? In my view, $\theta_z$ and $f(z)$ look good. The patterns and scale of the $g_z$ aren't bad, although the kink from the stitched Pareto tail is more apparent there than I'd like.

john-p-ryan · 2025-05-16T13:24:26Z

@jdebacker I think this looks really good. Just so I'm understanding, what this does is it directly calculates the Pareto $\alpha$ using the value and derivative of the KDE at the cutoff point? I had considered doing something like this but my main concern was that the data wasn't being used to calculate the $\alpha$ parameter. However, if the $\alpha$ you get isn't too far off from the empirical value, then this is not necessarily an issue. Do you know what the $\alpha$ is in this version?
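
One way to get an empirical $\alpha$ for comparison is the Hill (maximum-likelihood) estimator on observations above the cutoff. The sketch below is unweighted and runs on synthetic data. The function name `hill_alpha` is hypothetical, and a version for survey data such as the CPS would presumably need to account for sampling weights.

```python
import numpy as np


def hill_alpha(income, cutoff):
    """Unweighted MLE of the Pareto shape for observations above `cutoff`."""
    income = np.asarray(income, dtype=float)
    tail = income[income > cutoff]
    if tail.size == 0:
        raise ValueError("no observations above the cutoff")
    return tail.size / np.log(tail / cutoff).sum()


rng = np.random.default_rng(0)
cutoff = 350_000.0
# numpy's pareto() draws Lomax variates; adding 1 and scaling gives a
# classical Pareto with minimum `cutoff` and shape 2.
sample = cutoff * (1.0 + rng.pareto(2.0, size=200_000))
alpha_hat = hill_alpha(sample, cutoff)
print(alpha_hat)
```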

jdebacker · 2025-05-16T13:32:22Z

@john-p-ryan asks:

I'm understanding, what this does is it directly calculates the Pareto
$\alpha$ using the value and derivative of the KDE at the cutoff point?

That's correct. The value it gives does vary with the KDE. But the KDE with a bandwidth of 1.5 (which is used for the plots above) gives an $\alpha=4.6$.

It did seem that a higher bandwidth would give less curvature on the KDE at the cutoff and the $\alpha$ was not as high.
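
For reference, the relationship between $\alpha$ and the KDE at the cutoff can be written out under the standard Pareto form. This is a sketch of the algebra, assuming the tail density is proportional to $z^{-(\alpha+1)}$ above a cutoff $z_c$. The PR's code may parameterize it differently.

$$
f(z) = f(z_c)\left(\frac{z}{z_c}\right)^{-(\alpha+1)}, \qquad
f'(z) = -(\alpha+1)\,\frac{f(z)}{z} \quad \text{for } z \ge z_c .
$$

Evaluating the second expression at $z = z_c$ and solving for $\alpha$ gives

$$
\alpha = -\frac{z_c\, f'(z_c)}{f(z_c)} - 1 .
$$

Under this form, a KDE that declines more slowly at the cutoff (a smaller $|f'(z_c)|/f(z_c)$) implies a smaller $\alpha$.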

jdebacker · 2025-05-24T20:44:16Z

@john-p-ryan Here's what the same plots look like with a cutoff of $200k (rather than $350k) to splice the Pareto tail.

The $\alpha$ in this case is about 1.5.

jdebacker added 2 commits March 27, 2025 18:36

edits to kde

0136489

Merge remote-tracking branch 'upstream/main' into kde

eb14d8e

jdebacker added 3 commits April 1, 2025 10:47

splice pareto

530fbe0

start g_z at ndex 0 regardless of method

107779c

format

0534bde

don't divide by integral

dd29dc4
