In risk analysis, researchers often fit many candidate distributions to frequency and severity data and select among them based on goodness of fit. In a new study published in Risk Sciences, the authors cautioned that this practice can foster overfitting not only through too many parameters but also through idiosyncratic functional forms, that is, forms that are excessively complex or contrived.
“To that end, we proposed a formal mathematical measure for assessing the versatility of frequency and severity distributions prior to their application,” says co-author Michael Powers. “We then illustrated this approach by computing and comparing values of the versatility measure for a variety of probability distributions commonly used in risk analysis.”
The measure frames versatility as the opposite of idiosyncrasy, combining functional simplicity (linked to Shannon entropy) and adaptability (captured through Fisher information). Formally, the authors derived a normalized, Bayesian Fisher-information-based expression and refined it by requiring the least-compressible parameterization, which prevents the measure from being artificially inflated by reparameterization.
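For readers unfamiliar with the two ingredients named above, the following minimal Python sketch computes them for the simplest severity model, the Exponential distribution. It does not implement the authors' versatility measure, whose exact form is not given in this summary. The function names, the rate value, and the Monte Carlo settings are illustrative assumptions.

```python
import math
import random


def exponential_score(x, rate):
    """Derivative of log f(x; rate) with respect to rate,
    where f(x; rate) = rate * exp(-rate * x)."""
    return 1.0 / rate - x


def fisher_information_mc(rate, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the Fisher information E[score^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = rng.expovariate(rate)
        total += exponential_score(x, rate) ** 2
    return total / n_draws


def exponential_entropy(rate):
    """Closed-form differential (Shannon) entropy of Exponential(rate), in nats."""
    return 1.0 - math.log(rate)


rate = 2.0  # hypothetical rate parameter, chosen only for illustration
fisher_estimate = fisher_information_mc(rate)
fisher_exact = 1.0 / rate ** 2  # known closed form for the Exponential
entropy = exponential_entropy(rate)

print(f"Fisher information (Monte Carlo): {fisher_estimate:.4f}")
print(f"Fisher information (exact):       {fisher_exact:.4f}")
print(f"Differential entropy (nats):      {entropy:.4f}")
```

Fisher information describes how sharply the likelihood responds to a change in the parameter, while entropy describes how spread out the distribution is. How the study normalizes and combines these quantities is left to the paper itself.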
“Applied to common severity distributions (Exponential, Gamma, Weibull, Pareto Type II, Lognormal) and discrete frequency models (Negative Binomial, Discrete Weibull, Waring, Generalized Poisson), the results show parameters controlling tail behavior (e.g., Weibull τ, Pareto-II α) have larger effects on versatility,” shares co-author Jiaxin Xu. “Several models exhibit noticeably lower versatility (e.g., Pareto Type II; Lognormal with σ = 1; certain Waring and Negative Binomial cases), indicating potential idiosyncrasy.”
In simulations comparing Pareto Type II and Inverse Gamma, both models maintained low Type I error rates, but Pareto Type II showed persistent difficulty rejecting Inverse Gamma data, with Type II error rates above 0.80 even at large sample sizes, reflecting limited adaptability near the lower sample-space boundary.
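One plausible way to see why the lower boundary matters is to compare the two densities near zero. The sketch below is not the study's simulation, whose design and test statistic are not described in this summary. It only evaluates the standard density formulas, and the parameter values are hypothetical.

```python
import math


def pareto2_pdf(x, alpha, theta):
    """Pareto Type II (Lomax) density on x >= 0 with shape alpha and scale theta."""
    return (alpha / theta) * (1.0 + x / theta) ** (-(alpha + 1.0))


def inverse_gamma_pdf(x, alpha, beta):
    """Inverse Gamma density on x > 0 with shape alpha and scale beta."""
    log_pdf = (
        alpha * math.log(beta)
        - math.lgamma(alpha)
        - (alpha + 1.0) * math.log(x)
        - beta / x
    )
    return math.exp(log_pdf)


# Hypothetical parameter values, chosen only for illustration.
SHAPE, SCALE = 3.0, 2.0

for x in (0.01, 0.1, 0.5, 1.0, 2.0):
    p2 = pareto2_pdf(x, SHAPE, SCALE)
    ig = inverse_gamma_pdf(x, SHAPE, SCALE)
    print(f"x = {x:4.2f}   Pareto II = {p2:.4f}   Inverse Gamma = {ig:.4g}")
```

For any parameter values, the Pareto Type II density starts at the positive value alpha / theta at zero and decreases from there, whereas the Inverse Gamma density vanishes at zero and rises to an interior mode. Whether this difference is what drives the reported error rates is an assumption of this sketch, not a claim made in the summary.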
“Based on our findings, a new approach was proposed to reduce the idiosyncratic overfitting of risk-analytic models by ensuring that, for a fixed number of parameters, the probability distributions used to model frequencies and severities are reasonably versatile,” adds Powers.
###
References
DOI
10.1016/j.risk.2025.100017
Original Source URL
https://doi.org/10.1016/j.risk.2025.100017
Journal
Risk Sciences