Extreme Probability Modeling
The frequentist_extreme module (imported with from probextreme import frequentist_extreme as fe) analyses a dataset within the framework of Extreme Value Theory using a frequentist approach.
Basic Usage Example (frequentist approach)
import pandas as pd
from probextreme import frequentist_extreme as fe
import numpy as np
import matplotlib.pyplot as plt
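# Import your data. Here we assume the data contain a column called 'rain'. The index must be a datetime index.
# Assumption: the first column of the CSV holds the timestamps; adjust index_col to match your file.
df = pd.read_csv('mydata.csv', index_col=0, parse_dates=True)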
# Initialize with the time series df.rain
ev_rain = fe.extreme_values(df.rain, BM_window='365.25D', POT_threshold=45, mtd='48H')
ev_rain.find_POT_threshold()
ev_rain.plot_extremes()
ev_rain.get_return_levels(return_periods=np.arange(2, 101, 2))
ev_rain.return_periods_CI = np.arange(2, 101, 2)
# Bootstrapping to evaluate uncertainty of fit
ev_rain.execute_bootstrapping(n_iterations=1000)
ev_rain.print_summary()
fig, ax = plt.subplots(2, 1, figsize=(12, 8))
ev_rain.plot_return_levels(ax[0])
ev_rain.plot_distribution(ax[1])
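To make the effect of the block definition concrete, here is a minimal sketch in plain pandas that is independent of probextreme and makes no claim about how the package computes its blocks internally. It builds a synthetic daily series (illustrative data only) and compares block maxima taken over fixed 365.25-day blocks counted from the first timestamp with block maxima taken per calendar year. The names rain, bm_fixed and bm_calendar are hypothetical and used only for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series starting mid-year (illustrative data, not real rainfall)
idx = pd.date_range("2000-07-01", periods=3 * 365, freq="D")
rng = np.random.default_rng(0)
rain = pd.Series(rng.gamma(shape=0.5, scale=10.0, size=len(idx)), index=idx)

# Fixed-length blocks of 365.25 days, counted from the first timestamp
elapsed = (rain.index - rain.index[0]) / pd.Timedelta(days=365.25)
block_id = np.floor(np.asarray(elapsed, dtype=float)).astype(int)
bm_fixed = rain.groupby(block_id).max()

# Calendar-year blocks: one maximum per calendar year present in the index
bm_calendar = rain.groupby(rain.index.year).max()

print("blocks from first timestamp:", len(bm_fixed))
print("calendar-year blocks:", len(bm_calendar))
```

Because this series starts on 1 July, the calendar-year grouping produces partial first and last blocks, while the fixed-length grouping does not. Which behaviour is preferable depends on the seasonality of the data.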
Things to be aware of
- The argument BM_window is a Pandas frequency string. If using 365.25D, the first timestamp of the time series is used as the origin for the Block Maxima computation, whereas if using 1Y, the calendar year is used as reference.
- If the 'GEV' distribution does not converge during the fit, try setting the argument loc equal to the loc obtained from the Gumbel distribution fit. loc is then used as the initial value for the fit, which makes it more likely to converge on a credible fit.
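The idea of seeding the GEV fit with the Gumbel location can be sketched directly with scipy.stats. This is an illustration under assumptions, not the package's own code: it is not confirmed here that probextreme relies on scipy internally, the sample of maxima is synthetic, and the variable names are hypothetical. In scipy, the loc and scale keyword arguments of fit are starting guesses for the optimiser, not fixed values.

```python
import numpy as np
from scipy import stats

# Synthetic sample of block maxima (illustrative only)
maxima = stats.gumbel_r.rvs(loc=40.0, scale=8.0, size=60, random_state=0)

# Step 1: fit the two-parameter Gumbel distribution, which is usually well behaved
loc_gum, scale_gum = stats.gumbel_r.fit(maxima)

# Step 2: fit the three-parameter GEV, passing the Gumbel estimates
# as starting values for the optimiser (they are not held fixed)
shape, loc_gev, scale_gev = stats.genextreme.fit(maxima, loc=loc_gum, scale=scale_gum)

print("Gumbel loc, scale:", loc_gum, scale_gum)
print("GEV shape, loc, scale:", shape, loc_gev, scale_gev)
```

Note that scipy parameterises the GEV shape with the opposite sign to the convention used in Coles (2001), so a positive shape here corresponds to a bounded upper tail.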
Resources
Coles, S., Bawa, J., Trenner, L., & Dorazio, P. (2001). An introduction to statistical modeling of extreme values (Vol. 208, p. 208). London: Springer. doi: 10.1007/978-1-4471-3675-0