API References
Find the best-fitting theoretical distribution for your empirical data by comparing 89 candidate distributions using the Residual Sum of Squares (RSS).
-
class distfit.distfit.distfit(method='parametric', alpha=0.05, multtest='fdr_bh', bins=50, bound='both', distr='popular', smooth=None, n_perm=10000)
Probability density function fitting across 89 univariate distributions to non-censored data by Residual Sum of Squares (RSS), with plotting and hypothesis testing.
Example
>>> import numpy as np
>>> from distfit import distfit
>>>
>>> # Create dataset
>>> X = np.random.normal(0, 2, 1000)
>>> y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]
>>>
>>> # Set parameters
>>> # The default method fits parametric models
>>> dist = distfit()
>>> # In case of quantile
>>> dist = distfit(method='quantile')
>>> # In case of percentile
>>> dist = distfit(method='percentile')
>>>
>>> # Fit using method
>>> model_results = dist.fit_transform(X)
>>> dist.plot()
>>>
>>> # Make prediction
>>> results = dist.predict(y)
>>> dist.plot()
- Parameters
method (str, default: 'parametric') – Specify the method type: ‘parametric’, ‘quantile’, or ‘percentile’.
alpha (float, default: 0.05) – Significance level (alpha) used as the cut-off for P-values.
multtest (str, default: 'fdr_bh') – Multiple test correction method: None, ‘bonferroni’, ‘sidak’, ‘holm-sidak’, ‘holm’, ‘simes-hochberg’, ‘hommel’, ‘fdr_bh’, ‘fdr_by’, ‘fdr_tsbh’, ‘fdr_tsbky’.
bins (int, default: 50) – Number of bins used to build the empirical histogram.
bound (str, default: 'both') – Direction in which to test for significance. Upper bound: ‘up’, ‘high’, or ‘right’. Lower bound: ‘down’, ‘low’, or ‘left’. Use ‘both’ to test in both directions.
distr (str, default: 'popular') – The distribution, or set of distributions, to test. Use ‘popular’ or ‘full’ to test a predefined set, or name a specific theoretical distribution such as ‘norm’ or ‘t’. See the docs for more information about ‘popular’ and ‘full’.
smooth (int, default: None) – Smoothing the histogram can help to get a better fit when only a few samples are available.
n_perm (int, default: 10000) – Number of permutations used to model the null distribution when method is “quantile”.
- Returns
object.
method (str) – Specified method for fitting and predicting.
alpha (float) – Specified cut-off for P-value significance.
bins (int) – Number of bins specified to create histogram.
bound (str) – Specified testing directionality of the distribution.
distr (str) – Specified distribution or a set of distributions.
multtest (str) – Specified multiple test correction method.
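To make the alpha and multtest parameters more concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment that the name ‘fdr_bh’ conventionally refers to. It is written in plain Python for illustration only and is not distfit's own implementation; the function name `fdr_bh` and the example P-values are hypothetical.

```python
def fdr_bh(pvalues):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    n = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


alpha = 0.05
pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]  # hypothetical raw P-values
adjusted = fdr_bh(pvalues)
significant = [p <= alpha for p in adjusted]
```

In this example the raw values 0.039 and 0.041 fall below alpha on their own but are no longer significant after correction, which is the effect a multiple test correction is meant to have.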
-
fit(verbose=3)
Collect the required distribution functions.
- Parameters
verbose (int [1-5], default: 3) – Print information to screen. A higher number will print more.
- Returns
Object.
self.distributions (functions) – list of functions containing distributions.
-
fit_transform(X, verbose=3)
Fit best scoring theoretical distribution to the empirical data (X).
- Parameters
X (array-like) – Set of values belonging to the data
verbose (int [1-5], default: 3) – Print information to screen. A higher number will print more.
- Returns
dict.
model (dict) – dict containing keys with distribution parameters:
- RSS : Residual Sum of Squares
- name : distribution name
- distr : distribution function
- params : all parameters
- loc : loc function parameter
- scale : scale function parameter
- arg : arg function parameter
summary (list) – Residual Sum of Squares for the tested distributions.
histdata (tuple (observed, bins)) – tuple containing observed and bins for data X in the histogram.
size (int) – total number of elements in data X
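The idea of scoring candidate distributions by RSS can be sketched in plain Python as follows. This is an illustration under stated assumptions, not distfit's implementation: the sample, the two candidates, and all variable names are hypothetical, and parameters are estimated with simple moment estimates instead of a full fitting routine.

```python
import math
import random
import statistics

random.seed(0)
X = [random.gauss(0, 2) for _ in range(1000)]  # hypothetical sample

# Empirical density histogram: counts normalized so the total area is 1.
bins = 50
lo, hi = min(X), max(X)
width = (hi - lo) / bins
counts = [0] * bins
for x in X:
    k = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
    counts[k] += 1
density = [c / (len(X) * width) for c in counts]
centers = [lo + (k + 0.5) * width for k in range(bins)]

# Two candidate distributions with parameters estimated from X.
mu, sigma = statistics.fmean(X), statistics.stdev(X)
candidates = {
    "norm": lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    / (sigma * math.sqrt(2 * math.pi)),
    "uniform": lambda x: 1.0 / (hi - lo),
}

# RSS between the empirical density and each candidate PDF at the bin centers.
rss = {
    name: sum((d - pdf(c)) ** 2 for d, c in zip(density, centers))
    for name, pdf in candidates.items()
}
best = min(rss, key=rss.get)  # the lowest RSS is the best fit
```

The candidate with the lowest RSS is taken as the best fit, which mirrors the role of the RSS key in the model dict described above.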
-
load(filepath, verbose=3)
Load learned model.
- Parameters
filepath (str) – Pathname to stored pickle files.
verbose (int, optional) – Show message. A higher number gives more information. The default is 3.
- Returns
- Return type
Object.
-
plot(title='', figsize=(10, 8), xlim=None, ylim=None, verbose=3)
Make plot.
- Parameters
title (String, optional (default: '')) – Title of the plot.
figsize (tuple, optional (default: (10,8))) – The figure size.
xlim (Float, optional (default: None)) – Limit figure in x-axis.
ylim (Float, optional (default: None)) – Limit figure in y-axis.
verbose (Int [1-5], optional (default: 3)) – Print information to screen.
- Returns
- Return type
tuple (fig, ax)
-
plot_summary(n_top=None, figsize=(15, 8), ylim=None, verbose=3)
Plot summary results.
- Parameters
n_top (int, optional) – Number of top-ranked results to show. The default is None.
figsize (tuple, optional (default: (15,8))) – The figure size.
ylim (Float, optional (default: None)) – Limit figure in y-axis.
verbose (Int [1-5], optional (default: 3)) – Print information to screen.
- Returns
- Return type
tuple (fig, ax)
-
predict(y, verbose=3)
Compute probability for response variables y, using the specified method.
Computes P-values for [y] based on the distribution fitted to X. When the method type is parametric, the empirical distribution of X is used to estimate the loc/scale/arg parameters of a theoretical distribution.
- Parameters
y (array-like) – Values to be predicted.
model (dict, default : None) – The model created by the .fit() function.
verbose (int [1-5], default: 3) – Print information to screen. A higher number will print more.
- Returns
Object.
y_pred (list of str) – prediction of bounds [upper, lower] for input y, using the distribution fitted to X.
y_proba (list of float) – probability for response variable y.
df (pd.DataFrame) – Dataframe containing the predictions in a structured manner.
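How P-values and bound labels could be derived from a fitted distribution is sketched below. The sketch assumes the parametric fit selected a normal distribution with loc=0 and scale=2, tests in both directions, and skips multiple test correction. The labels ‘up’, ‘down’, and ‘none’ and all variable names are illustrative, and this is not distfit's implementation.

```python
from statistics import NormalDist

# Assumption: the parametric fit selected a normal distribution
# with loc=0 and scale=2.
fitted = NormalDist(mu=0, sigma=2)
alpha = 0.05
y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]

y_proba = []
y_pred = []
for value in y:
    lower_tail = fitted.cdf(value)  # P(X <= value)
    upper_tail = 1.0 - lower_tail   # P(X > value)
    # Testing in both directions: use the tail the value falls into.
    if upper_tail < lower_tail:
        p, label = upper_tail, "up"
    else:
        p, label = lower_tail, "down"
    y_proba.append(p)
    # Only values with P <= alpha are labeled as significant.
    y_pred.append(label if p <= alpha else "none")
```

Values far in either tail receive a small P-value and a direction label, while values near the center of the fitted distribution are labeled as not significant.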
-
save(filepath, verbose=3)
Save learned model in pickle file.
- Parameters
filepath (str) – Pathname to store pickle files.
verbose (int, optional) – Show message. A higher number gives more information. The default is 3.
- Returns
- Return type
object
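Since save and load are documented as working with pickle files, the round trip can be sketched with Python's standard pickle module. The dict below is a hypothetical stand-in for a fitted model and the file name is made up; what distfit actually stores in the file is not specified here.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model; the keys mirror the
# documented model dict.
model = {"name": "norm", "RSS": 0.015, "loc": 0.0, "scale": 2.0, "arg": ()}

with tempfile.TemporaryDirectory() as tmpdir:
    filepath = os.path.join(tmpdir, "model.pkl")  # hypothetical file name
    # Save: serialize the object to disk.
    with open(filepath, "wb") as f:
        pickle.dump(model, f)
    # Load: read it back as a new, equal object.
    with open(filepath, "rb") as f:
        restored = pickle.load(f)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a trusted source.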
-
transform(X, verbose=3)
Determine best model for input data X.
The input data X can be modeled in two ways:
- parametric
In the parametric case, the best fit to the data is determined using the Residual Sum of Squares (RSS) approach for the specified distributions. Based on the best-fitting distribution, the confidence intervals (CI) can be determined for later use in the predict() function.
- quantile
In the quantile case, the data is ranked and the upper/lower quantiles are determined.
- Parameters
X (array-like) – The null distribution (background data) is built from X.
verbose (int [1-5], default: 3) – Print information to screen. A higher number will print more.
- Returns
Object.
model (dict) – dict containing keys with distribution parameters:
- RSS : Residual Sum of Squares
- name : distribution name
- distr : distribution function
- params : all parameters
- loc : loc function parameter
- scale : scale function parameter
- arg : arg function parameter
summary (list) – Residual Sum of Squares for the tested distributions.
histdata (tuple (observed, bins)) – tuple containing observed and bins for data X in the histogram.
size (int) – total number of elements in data X
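The quantile case described for transform, where the data is ranked and the upper and lower quantiles are determined, can be sketched without any distribution fitting. This is an illustration under stated assumptions, not distfit's implementation: the background data, the nearest-rank cut-offs, and the helper `classify` are hypothetical, and the permutation step controlled by n_perm is not modeled.

```python
import random

random.seed(1)
X = [random.gauss(0, 2) for _ in range(1000)]  # hypothetical background data
alpha = 0.05

# Rank the data and take nearest-rank cut-offs in both tails.
ranked = sorted(X)
n = len(ranked)
k = round(alpha * n)        # number of observations in each tail
lower = ranked[k]           # k observations lie below this value
upper = ranked[n - k - 1]   # k observations lie above this value


def classify(value):
    """Label a new value relative to the empirical quantile bounds."""
    if value < lower:
        return "down"
    if value > upper:
        return "up"
    return "none"


labels = [classify(v) for v in [-8, 0, 8]]
```

Because the bounds come directly from the ranked data, this approach makes no assumption about the shape of the underlying distribution.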