This repository demonstrates two methodologies for improving computational efficiency in FFRD stochastic storm transposition (SST) simulations:
- k-Nearest Neighbors (k-NN) Subsampling: Reduces computational burden by selecting representative storm events while preserving multivariate dependence. We stratify in transformed Z-space (Gaussian copula) and subsample within each stratum, then reconstruct frequency curves using the law of total probability.
- Importance Sampling (IS): Focuses computational effort on storm realizations that meaningfully intersect the watershed. Synthetic toy problems illustrate how adaptive and mixture-based IS can dramatically improve efficiency over uniform sampling by targeting regions of interest.
The k-NN subsampling approach transforms high-dimensional marginal flows into standardized normal variates (Z-space) via plotting positions. A Gaussian copula captures inter-site dependence in Z-space. We then stratify the joint Z-domain into probability bins, and for each bin, select the K nearest neighbors from a large Monte Carlo ensemble. Finally, we apply the law of total probability to reconstruct marginal and joint frequency curves.
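The plotting-position transform described above can be illustrated with a minimal Python sketch. It is not the repository's C# code and does not use the Numerics library; the function name `to_z_space`, the sample flows, and the tie-breaking rule are assumptions made for illustration.

```python
from statistics import NormalDist


def to_z_space(values):
    """Map one site's values to standard normal scores (Z-space).

    Uses Weibull plotting positions p = rank / (n + 1), where rank 1 is the
    smallest value. Ties are broken by original order (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p = rank / (n + 1)              # non-exceedance probability
        z[i] = std_normal.inv_cdf(p)    # standard normal quantile
    return z


# Hypothetical peak flows at a single site.
flows = [120.0, 85.0, 430.0, 60.0, 210.0]
z_scores = to_z_space(flows)
```

Applying the same transform to every site gives one Z-vector per event, which is the space in which the copula is fitted and the strata are defined.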
- Toy Example (Test_kNN_SubSampling_ToyProblem)
  - Generate multivariate Normal samples at 5 synthetic sites.
  - Convert each marginal to Weibull plotting positions and Z-space.
  - Stratify joint Z into 10 bins and subsample K=100 events per bin via k-NN.
  - Reconstruct AEP curves using weighted exceedance counts.
- Case Study (Test_kNN_SubSampling_Kanawha_14_sites)
  - Load 25,040 raw SST events across 14 sites.
  - Estimate marginal plotting positions and fit a D-dimensional Gaussian copula.
  - Stratify joint Z into 10 bins and subsample K=100 events per bin.
  - Build site-specific flow frequency curves at pre-defined AEP levels.
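The stratify, subsample, and reweight steps listed above can be sketched in Python as follows. This is a hedged illustration, not the repository's implementation: the stratification variable (mean Z across sites), the equal-count bins, the choice of the stratum centroid as the k-NN query point, and all function names are assumptions.

```python
import math
import random


def knn_stratified_subsample(z_rows, n_bins, k):
    """Return (event_index, weight) pairs for a stratified k-NN subsample."""
    n = len(z_rows)
    d = len(z_rows[0])
    # Assumption: stratify on the mean Z across sites, with equal-count bins.
    score = [sum(row) / d for row in z_rows]
    order = sorted(range(n), key=lambda i: score[i])
    selected = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue
        # Assumption: the k-NN query point is the stratum centroid in Z-space.
        centroid = [sum(z_rows[i][j] for i in members) / len(members)
                    for j in range(d)]
        nearest = sorted(members,
                         key=lambda i: math.dist(z_rows[i], centroid))[:k]
        # Law of total probability: the stratum probability is shared
        # equally among the events selected from that stratum.
        weight = (len(members) / n) / len(nearest)
        selected.extend((i, weight) for i in nearest)
    return selected


def weighted_exceedance(values, weights, threshold):
    """Weighted estimate of P(value > threshold)."""
    return sum(w for v, w in zip(values, weights) if v > threshold)


# Synthetic ensemble: 2,000 events at 3 correlated sites (standard normal margins).
rng = random.Random(1)
z_rows = []
for _ in range(2000):
    shared = rng.gauss(0.0, 1.0)
    z_rows.append([0.8 * shared + 0.6 * rng.gauss(0.0, 1.0) for _ in range(3)])

picks = knn_stratified_subsample(z_rows, n_bins=10, k=20)
weights = [w for _, w in picks]
site0 = [z_rows[i][0] for i, _ in picks]
aep_at_1 = weighted_exceedance(site0, weights, threshold=1.0)
```

Because the weights sum to one, evaluating `weighted_exceedance` over a range of thresholds traces out a frequency curve from 200 events instead of 2,000. Selecting only the events closest to each centroid smooths within-stratum variability, which is one source of the bias discussed in the bin and K selection trade-off.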
- Multivariate Preservation: Maintains dependence structure via Gaussian copula in Z-space.
- Variance Reduction: Stratified subsampling reduces estimator variance compared to crude MC at equal sample size.
- Computational Savings: Only K × (number of bins) model runs are needed, instead of N, to approximate frequency curves (e.g., 100 × 10 = 1,000 runs versus 25,040 events in the case study).
- Curse of Dimensionality: In high D, nearest-neighbor neighborhoods become less meaningful without very large N (e.g., ≥100,000).
- Bin and K Selection: Empirical choice of bin count and K trades off bias vs. variance: too few neighbors → high variance; too many → biased smoothing.
- Copula Assumption: Gaussian copula may under-represent tail dependence present in true hydrologic extremes.
- Tail Strata Coverage: Extreme bins may lack sufficient samples, leading to duplicated neighbors or poor representation.
- Dimensionality Reduction: Apply PCA or t-SNE to Z-space before k-NN to focus on dominant variance directions [1][2].
- Adaptive Binning: Define strata so each contains roughly equal Monte Carlo counts for balanced variance.
- Alternative Sampling: Use clustering (e.g., k-means) to select representative centroids, or direct multivariate importance sampling to draw from tail-focused copula distributions.
- Tail-Copulas: Replace Gaussian copula with t-copula or extreme-value copulas for better tail dependence modeling [3].
- Loftsgaarden, D. O., & Quesenberry, C. P. (1965). A nonparametric estimate of a multivariate density function. Annals of Mathematical Statistics.
- Devroye, L., & Wagner, T. J. (1977). The L₁ convergence of nearest neighbor density estimates. Annals of Statistics.
- Salvadori, G., De Michele, C., Kottegoda, N. T., & Rosso, R. (2007). Extremes in hydrology: a review. Water Resources Research.
Importance Sampling (IS) concentrates sampling effort on high-impact regions (e.g., storms overlapping a watershed) and corrects for the change of distribution by weighting each sample by the ratio of the target density to the proposal density. By choosing an appropriate proposal distribution (often a truncated normal or an adaptive mixture), IS can substantially reduce the number of simulations needed for reliable estimates.
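As a minimal illustration of the idea, the Python sketch below estimates the probability that a storm center drawn uniformly from a square transposition domain lands inside a square watershed, using independent truncated normal proposals centered on the watershed. The domain size, watershed bounds, proposal parameters, and the `TruncatedNormal` helper are all assumptions for this toy setup; it is not the repository's C# test code.

```python
import random
from statistics import NormalDist


class TruncatedNormal:
    """Normal(mu, sigma) restricted to [lo, hi]; hypothetical helper."""

    def __init__(self, mu, sigma, lo, hi):
        self.base = NormalDist(mu, sigma)
        self.lo, self.hi = lo, hi
        self.cdf_lo = self.base.cdf(lo)
        self.cdf_hi = self.base.cdf(hi)

    def sample(self, rng):
        # Inverse-CDF sampling restricted to the truncated probability range.
        u = rng.uniform(self.cdf_lo, self.cdf_hi)
        return min(max(self.base.inv_cdf(u), self.lo), self.hi)

    def pdf(self, x):
        if x < self.lo or x > self.hi:
            return 0.0
        return self.base.pdf(x) / (self.cdf_hi - self.cdf_lo)


def is_probability_in_watershed(n, rng):
    lo, hi = 0.0, 100.0            # assumed transposition domain [0, 100]^2
    ws_lo, ws_hi = 40.0, 60.0      # assumed watershed [40, 60]^2
    target_pdf = 1.0 / (hi - lo) ** 2   # uniform storm-center density
    qx = TruncatedNormal(50.0, 15.0, lo, hi)
    qy = TruncatedNormal(50.0, 15.0, lo, hi)
    total = 0.0
    for _ in range(n):
        x, y = qx.sample(rng), qy.sample(rng)
        if ws_lo <= x <= ws_hi and ws_lo <= y <= ws_hi:
            # Importance weight: target density over proposal density.
            total += target_pdf / (qx.pdf(x) * qy.pdf(y))
    return total / n


estimate = is_probability_in_watershed(5000, random.Random(42))
```

In this setup the exact answer is the area ratio 400 / 10,000 = 0.04. Roughly a quarter of the proposal draws land inside the watershed, compared with 4% under uniform sampling, which is where the variance reduction comes from.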
The test class Test_SST_ImportanceSampling includes the following regions:
Simple geometric integration of a rectangular watershed using:
- Test_BasicMonteCarlo_Integration: baseline MC estimate using uniform sampling.
- Test_ImportanceSampling_Integration: basic IS with truncated normal proposals.
- Test_ImportanceSampling_Integration_ReducedSamples: IS with fewer samples.
- Test_Adaptive_ImportanceSampling_Integration_ReducedSamples: adaptive proposal update via weighted sample moments.
Storms are sampled with varying footprints and rainfall depth:
- Test_BasicMonteCarlo_Parametric_Simulation: classic MC with variable storm sizes.
- Test_ImportanceSampling_Parametric_Simulation_Reduced_Samples: IS with truncated normal proposals.
- Test_Adaptive_ImportanceSampling_Parametric_Simulation_ReducedSamples: adaptive proposal using depth-weighted moment matching and EM-updated mixture weights.
Storm depth derived from overlap area and storm intensity:
- Test_BasicMonteCarlo_Nonparametric_Simulation: classic MC simulation.
- Test_ImportanceSampling_Nonparametric_Simulation_Reduced_Samples: IS using fixed truncated normal proposals.
- Test_Adaptive_ImportanceSampling_Nonparametric_Simulation_ReducedSamples: adaptive IS with depth-based weighting and mixture modeling.
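One plausible reading of "storm depth derived from overlap area and storm intensity" is a basin-average depth equal to the storm intensity scaled by the fraction of the watershed the storm covers. The Python sketch below illustrates that reading for axis-aligned square storms; the formula, function names, and geometry are assumptions and may differ from what the tests actually compute.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def basin_average_depth(storm_center, storm_size, intensity, watershed):
    """Assumed formula: intensity * (overlap area / watershed area)."""
    half = storm_size / 2.0
    cx, cy = storm_center
    storm = (cx - half, cy - half, cx + half, cy + half)
    ws_area = (watershed[2] - watershed[0]) * (watershed[3] - watershed[1])
    return intensity * overlap_area(storm, watershed) / ws_area


watershed = (40.0, 40.0, 60.0, 60.0)   # hypothetical 20 x 20 basin

full = basin_average_depth((50.0, 50.0), 20.0, 4.0, watershed)     # 4.0
partial = basin_average_depth((60.0, 60.0), 20.0, 4.0, watershed)  # 1.0
miss = basin_average_depth((0.0, 0.0), 20.0, 4.0, watershed)       # 0.0
```

Storms that miss the watershed contribute zero depth, which is why uniform sampling over a large transposition domain wastes most of its runs.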
- Simple Toy Problems: Small synthetic examples demonstrate each sampling technique.
- Adaptive Moment‐Matching: Proposal updates via weighted sample moments [4].
- Mixture Proposals: EM‐based mixture weights for flexible tail coverage [5].
- Truncated Normal Sampling: Efficiently targets the watershed‐intersection region.
- Frequency Curve Generation: Computes depth‐AEP curves via weighted exceedance.
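The adaptive moment-matching feature listed above can be sketched in one dimension: after each batch, the proposal mean and standard deviation are replaced by the importance-weighted sample moments of the draws that hit the watershed. This Python sketch is illustrative only; the domain, watershed interval, starting proposal, and the floor on sigma are assumptions, and it omits the mixture and EM steps.

```python
import math
import random
from statistics import NormalDist


def adaptive_is(n_iter, n_per_iter, rng):
    lo, hi = 0.0, 100.0           # assumed 1-D transposition domain
    ws_lo, ws_hi = 70.0, 80.0     # assumed watershed interval
    target_pdf = 1.0 / (hi - lo)  # uniform storm-center density
    mu, sigma = 50.0, 30.0        # deliberately poor starting proposal
    estimate = 0.0
    for _ in range(n_iter):
        base = NormalDist(mu, sigma)
        c_lo, c_hi = base.cdf(lo), base.cdf(hi)
        norm = c_hi - c_lo
        xs, weights = [], []
        for _ in range(n_per_iter):
            # Truncated normal draw via inverse CDF; clamp u away from 0 and 1.
            u = min(max(rng.uniform(c_lo, c_hi), 1e-12), 1.0 - 1e-12)
            x = min(max(base.inv_cdf(u), lo), hi)
            hit = ws_lo <= x <= ws_hi
            q = base.pdf(x) / norm
            xs.append(x)
            weights.append(target_pdf / q if hit else 0.0)
        total = sum(weights)
        estimate = total / n_per_iter   # estimate of P(center in watershed)
        if total > 0.0:
            # Moment matching: weighted mean and variance become the
            # parameters of the next proposal.
            mu = sum(w * x for w, x in zip(weights, xs)) / total
            var = sum(w * (x - mu) ** 2 for w, x in zip(weights, xs)) / total
            sigma = max(math.sqrt(var), 1.0)  # floor is an assumption
    return estimate, mu, sigma


estimate, mu, sigma = adaptive_is(5, 2000, random.Random(7))
```

The exact probability here is 10 / 100 = 0.1. After the first update the proposal is centered near the middle of the watershed interval, so later batches spend most of their draws on storms that actually intersect it.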
- All spatial domains are idealized (square basin and storms), as shown in the image above.
- These sampling approaches need to be extended to support real-world watershed geometry and rainfall data.
- Explore options for irregularly shaped watersheds and transposition domains.
- Cappé, O., Douc, R., Guillin, A., Marin, J.-M., & Robert, C. P. (2008). Adaptive importance sampling in general mixture classes. The Annals of Statistics, 36(4), 1947–1976.
- Bugallo, M. F., Elvira, V., Martino, L., Luengo, D., Míguez, J., & Djuric, P. M. (2017). Adaptive Importance Sampling: The Past, the Present, and the Future. IEEE Signal Processing Magazine, 34(4), 60–79.
- Numerics library suite:
  - Numerics.Data.Statistics
  - Numerics.Distributions
  - Numerics.MachineLearning
  - Numerics.Sampling
  - Numerics.Data

  Provides utilities for multivariate distributions, stratified sampling, k-NN algorithms, and probability transforms.
  GitHub: USACE-RMC/Numerics
Use MSTest (or compatible framework) to execute all [TestMethod] routines. Results are printed via Debug.WriteLine in CSV format (AEP,Value) for easy plotting.
- Metrics Reported: Estimated mean, standard error, and depth-frequency (or flow-frequency) curves.
- Visualization: Plot results on log-probability axes to assess tail performance.
Haden Smith
USACE Risk Management Center
[email protected]


