@jdebacker
Member

This PR updates how the KDE of the income distribution is handled. It does 3 main things:

  1. Reinstates the bw_method argument.
  2. Updates how f and f' are computed from the KDE.
  3. Tries to draw the tail from a Pareto distribution before the KDE is estimated. This would be an alternative to stitching a Pareto tail onto the KDE. It's a work in progress.
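A minimal sketch of items 1 and 2, assuming the KDE is built with `scipy.stats.gaussian_kde` and using synthetic lognormal data as a stand-in for the income data. The variable names and the finite-difference derivative are illustrative assumptions, not the code in this PR.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic stand-in for the income data (assumption: lognormal draws)
income = rng.lognormal(mean=11.0, sigma=0.8, size=5_000)

# A scalar bw_method is used directly as the KDE bandwidth factor
kde = gaussian_kde(income, bw_method=2)

# Evaluate the density on an income grid up to the 99th percentile
z = np.linspace(income.min(), np.percentile(income, 99), 1_000)
f = kde(z)                   # estimated density f(z)
f_prime = np.gradient(f, z)  # finite-difference estimate of f'(z)
```

`gaussian_kde` also accepts a `weights` argument, which may be relevant for survey data with sampling weights.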

@jdebacker
Member Author

Here's an example of the g_z estimated with these changes and bw_method=2:
[Image: gz_numerical_all]

@codecov-commenter

codecov-commenter commented Mar 27, 2025

Codecov Report

Attention: Patch coverage is 5.55556% with 34 lines in your changes missing coverage. Please review.

Project coverage is 41.02%. Comparing base (10985c1) to head (dd29dc4).
Report is 57 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| iot/inverse_optimal_tax.py | 5.55% | 34 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (10985c1) and HEAD (dd29dc4).

HEAD has 1 upload less than BASE

| Flag | BASE (10985c1) | HEAD (dd29dc4) |
| --- | --- | --- |
| unittests | 3 | 2 |
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #37       +/-   ##
===========================================
- Coverage   72.81%   41.02%   -31.79%     
===========================================
  Files           3        3               
  Lines         103      195       +92     
===========================================
+ Hits           75       80        +5     
- Misses         28      115       +87     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 41.02% <5.55%> (-31.79%) ⬇️ |


@jdebacker
Member Author

@john-p-ryan I've added some code to compute_income_dist to splice a Pareto tail onto the KDE. Right now, the numerical approach is commented out and the analytical approach is used.

Here's some code I was using to test the results:

# %%
import numpy as np
import pandas as pd
from iot import iot_user

# %%
biden2020_path = "https://raw.githubusercontent.com/PSLmodels/examples/main/psl_examples/taxcalc/Biden2020.json"

b = iot_user.iot_comparison(
    policies=[biden2020_path],
    labels=["Biden 2020"],
    years=[2021],
    dist_type="kde",  # "log_normal",
    kde_bw=1.5,
    data="CPS",
)

# %%
# Plot theta_z
theta_plot = b.plot(var="theta_z")
theta_plot.show()


# %%
# Plot g_z
g_z_plot = b.plot(var="g_z")
g_z_plot.show()

# %%
# Plot f
f_plot = b.plot(var="f")
f_plot.show()

# %%
# Plot f_prime
f_prime_plot = b.plot(var="f_prime")
f_prime_plot.show()

# %%
# plot income dist
# NOTE: this is not a function of the dist_type, just raw data
inc_plot = b.SaezFig2(upper_bound=2_000_000)
inc_plot.show()
# %%

A few notes:

  • Even when f() and f'() look smooth, I see a kink in theta_z and thus in g_z. JJZ also have a bit of a kink in theta_z (see Fig 1), but it's not as large and it's not noticeable in the g_z. Given that f and f' are smooth, it's hard to know what else to do to make theta_z smoother.
  • I cannot get this to work if the kde_bw arg is >=3. And the lower this value, the "smoother" things look.
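As a rough illustration of an analytical splice like the one mentioned above: the sketch below replaces f and f' above a cutoff with a Pareto tail whose level and slope match the KDE at that point. The function name `splice_pareto_tail` and its arguments are hypothetical, it assumes the cutoff lies inside the grid `z`, and it does not renormalize the density. It is not the implementation in compute_income_dist.

```python
import numpy as np


def splice_pareto_tail(z, f, f_prime, cutoff):
    """Replace f and f' above `cutoff` with a Pareto tail matched at the cutoff.

    Hypothetical helper: z is an increasing grid, f and f_prime are the
    density and its derivative evaluated on that grid.
    """
    i = np.searchsorted(z, cutoff)  # first grid point at or above the cutoff
    z0, f0, fp0 = z[i], f[i], f_prime[i]
    # A Pareto density f(z) = C * z**(-(alpha + 1)) satisfies
    # f'(z) = -(alpha + 1) * f(z) / z, which is solved here for alpha.
    alpha = -z0 * fp0 / f0 - 1.0
    f_out, fp_out = f.copy(), f_prime.copy()
    tail = z >= z0
    # Continuity at z0 fixes the scale of the tail
    f_out[tail] = f0 * (z[tail] / z0) ** (-(alpha + 1.0))
    fp_out[tail] = -(alpha + 1.0) * f_out[tail] / z[tail]
    return f_out, fp_out, alpha
```

Because both the level and the slope are matched at the cutoff, f and f' are continuous there, but f'' generally is not, which may be one source of a visible kink in quantities that depend on higher-order behavior.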

@jdebacker
Member Author

Theta:
[Image: theta]

@jdebacker
Member Author

With latest changes to this branch, here's what the results are looking like (with a kde_bw=1.5):

[Image: theta_z]
[Image: f_z]
[Image: g_z]

@john-p-ryan thoughts? In my view, $\theta_z$ and $f(z)$ look good. The patterns and scale of the $g_z$ aren't bad, although the kink from the stitched Pareto tail is more apparent there than I'd like.

@john-p-ryan
Contributor

@jdebacker I think this looks really good. Just so I'm understanding, what this does is it directly calculates the Pareto $\alpha$ using the value and derivative of the KDE at the cutoff point? I had considered doing something like this but my main concern was that the data wasn't being used to calculate the $\alpha$ parameter. However, if the $\alpha$ you get isn't too far off from the empirical value, then this is not necessarily an issue. Do you know what the $\alpha$ is in this version?
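For comparison with an empirical value, one common data-based estimate of the tail index is the Hill (maximum-likelihood) estimator. The sketch below is an illustration on synthetic data: the function name `hill_alpha` is hypothetical, and it ignores survey weights, which real tax data would likely need.

```python
import numpy as np


def hill_alpha(income, cutoff):
    """Hill (maximum-likelihood) estimate of the Pareto tail index above `cutoff`."""
    tail = income[income > cutoff]
    return tail.size / np.sum(np.log(tail / cutoff))


rng = np.random.default_rng(0)
# Synthetic Pareto sample with true alpha = 2 and minimum 100_000 (illustration only)
sample = 100_000 * (1.0 + rng.pareto(2.0, size=50_000))
print(hill_alpha(sample, cutoff=350_000))
```

Running the same function on the actual incomes above the splice cutoff would give a number to set against the KDE-implied $\alpha$.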

@jdebacker
Member Author

jdebacker commented May 16, 2025

@john-p-ryan asks:

I'm understanding, what this does is it directly calculates the Pareto $\alpha$ using the value and derivative of the KDE at the cutoff point?

That's correct. The value it gives does vary with the KDE. But the KDE with a bandwidth of 1.5 (which is used for the plots above) gives $\alpha=4.6$.

It did seem that a higher bandwidth gave less curvature in the KDE at the cutoff, and the resulting $\alpha$ was not as high.
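For reference, one way to read the analytical approach (an assumption about the intent, not taken from the PR's code): if the tail above the cutoff $z_0$ is Pareto with density $f(z) = C z^{-(\alpha+1)}$, then matching the KDE's value and slope at $z_0$ pins down $\alpha$:

$$
f'(z) = -(\alpha+1)\,C z^{-(\alpha+2)} = -(\alpha+1)\,\frac{f(z)}{z}
\quad\Longrightarrow\quad
\alpha = -\frac{z_0\, f'(z_0)}{f(z_0)} - 1, \qquad C = f(z_0)\, z_0^{\alpha+1}.
$$

Under this reading, a KDE that declines more slowly at the cutoff (a smaller $|f'(z_0)|/f(z_0)$) implies a lower $\alpha$.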

@jdebacker
Member Author

@john-p-ryan Here's what the same plots look like with a cutoff of $200k (rather than $350k) to splice the Pareto tail.

The $\alpha$ in this case is about 1.5.

[Image: f_200k]
[Image: theta_cut200k]
[Image: gz_200k]
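To see how sensitive the implied $\alpha$ is to the cutoff, a sketch like the following could be run on the real data. Here it uses synthetic lognormal incomes, so the printed values will not match the 4.6 and 1.5 reported in this thread. The helper name `implied_alpha` and the central-difference step `h` are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde


def implied_alpha(kde, cutoff, h=1_000.0):
    """Pareto index implied by the KDE's value and slope at `cutoff`."""
    f0 = kde(cutoff)[0]
    # Central-difference estimate of f'(cutoff)
    fp0 = (kde(cutoff + h)[0] - kde(cutoff - h)[0]) / (2.0 * h)
    return -cutoff * fp0 / f0 - 1.0


rng = np.random.default_rng(0)
income = rng.lognormal(mean=11.0, sigma=0.8, size=5_000)  # synthetic stand-in
kde = gaussian_kde(income, bw_method=1.5)

for cutoff in (200_000, 350_000):
    print(cutoff, implied_alpha(kde, cutoff))
```

Sweeping a finer grid of cutoffs and bandwidths in the same loop would show how stable the implied tail index is before committing to one splice point.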
