API References

Compute the best fit to your empirical distribution across 89 theoretical distributions using Residual Sum of Squares (RSS) estimates.

class distfit.distfit.distfit(method='parametric', alpha=0.05, multtest='fdr_bh', bins=50, bound='both', distr='popular', smooth=None, n_perm=10000)

Probability density function fitting across 89 univariate distributions to non-censored data by residual sum of squares (RSS), making plots, and hypothesis testing.


Example

>>> import numpy as np
>>> from distfit import distfit
>>>
>>> # Create dataset
>>> X = np.random.normal(0, 2, 1000)
>>> y = [-8,-6,0,1,2,3,4,5,6]
>>>
>>> # Set parameters
>>> # Default method is set to parametric models
>>> dist = distfit()
>>> # In case of quantile
>>> dist = distfit(method='quantile')
>>> # In case of percentile
>>> dist = distfit(method='percentile')
>>> # Fit using method
>>> model_results = dist.fit_transform(X)
>>> dist.plot()
>>>
>>> # Make prediction
>>> results = dist.predict(y)
>>> dist.plot()
Parameters
  • method (str, default: 'parametric') – Method type: ‘parametric’, ‘quantile’ or ‘percentile’.

  • alpha (float, default: 0.05) – Significance alpha.

  • multtest (str, default: 'fdr_bh') – Multiple test correction method: None, ‘bonferroni’, ‘sidak’, ‘holm-sidak’, ‘holm’, ‘simes-hochberg’, ‘hommel’, ‘fdr_bh’, ‘fdr_by’, ‘fdr_tsbh’, ‘fdr_tsbky’

  • bins (int, default: 50) – Number of bins used to build the empirical histogram.

  • bound (str, default: 'both') – Direction in which to test for significance. Upper bound: ‘up’, ‘high’ or ‘right’; lower bound: ‘down’, ‘low’ or ‘left’; ‘both’ tests both directions.

  • distr (str, default: 'popular') – The distribution or set of distributions to test. Use ‘popular’ or ‘full’ to test a predefined set, or specify a single theoretical distribution such as ‘norm’ or ‘t’. See docs for more information about ‘popular’ and ‘full’.

  • smooth (int, default: None) – Smoothing the histogram can help to get a better fit when only a few samples are available.

  • n_perm (int, default: 10000) – Number of permutations used to model the null distribution when method is “quantile”.

Returns

  • object.

  • method (str) – Specified method for fitting and predicting.

  • alpha (float) – Specified cut-off for P-value significance.

  • bins (int) – Number of bins specified to create histogram.

  • bound (str) – Specified testing directionality of the distribution.

  • distr (str) – Specified distribution or a set of distributions.

  • multtest (str) – Specified multiple test correction method.
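
The default multtest value, 'fdr_bh', conventionally denotes the Benjamini-Hochberg false discovery rate correction. The sketch below illustrates that procedure in plain Python. It is not distfit's own code; the function name benjamini_hochberg and the example P-values are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative helper, not part of distfit).

    Returns (rejected, adjusted) in the same order as the input P-values.
    """
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    rejected = [p <= alpha for p in adjusted]
    return rejected, adjusted


# Hypothetical raw P-values, one per tested value.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
rejected, adjusted = benjamini_hochberg(pvals, alpha=0.05)
print(rejected)
print(adjusted)
```

Only the first two values remain significant after correction here, although four of the raw P-values are below alpha.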

fit(verbose=3)

Collect the required distribution functions.

Parameters

verbose (int [1-5], default: 3) – Print information to screen. A higher number will print more.

Returns

  • Object.

  • self.distributions (functions) – list of functions containing distributions.

fit_transform(X, verbose=3)

Fit best scoring theoretical distribution to the empirical data (X).

Parameters
  • X (array-like) – Set of values belonging to the data

  • verbose (int [1-5], default: 3) – Print information to screen. A higher number will print more.

Returns

  • dict.

  • model (dict) – dict containing keys with distribution parameters:
      RSS : Residual Sum of Squares
      name : distribution name
      distr : distribution function
      params : all kinds of parameters
      loc : loc function parameter
      scale : scale function parameter
      arg : arg function parameter

  • summary (list) – Residual Sum of Squares

  • histdata (tuple (observed, bins)) – tuple containing observed and bins for data X in the histogram.

  • size (int) – total number of elements in data X
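
To make the RSS criterion concrete, the following sketch scores two candidate densities against the empirical histogram and keeps the one with the lowest RSS. It uses only the Python standard library and is a simplified illustration, not distfit's implementation; the helper names histogram_density and rss, and the choice of a normal and a uniform candidate, are assumptions.

```python
import random
from statistics import NormalDist, fmean, pstdev


def histogram_density(data, bins=50):
    """Equal-width histogram normalised to a density; returns (density, centers)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(data)
    density = [c / (n * width) for c in counts]
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return density, centers


def rss(density, centers, pdf):
    """Residual sum of squares between the histogram and a candidate pdf."""
    return sum((d - pdf(c)) ** 2 for d, c in zip(density, centers))


random.seed(0)
X = [random.gauss(0, 2) for _ in range(1000)]
density, centers = histogram_density(X, bins=50)

# Candidate 1: normal distribution with loc/scale estimated from X.
normal = NormalDist(fmean(X), pstdev(X))
# Candidate 2: uniform distribution over the observed range.
lo, hi = min(X), max(X)


def uniform_pdf(x):
    return 1.0 / (hi - lo)


scores = {
    "norm": rss(density, centers, normal.pdf),
    "uniform": rss(density, centers, uniform_pdf),
}
best = min(scores, key=scores.get)
print(best, scores)
```

The candidate with the smallest RSS plays the role of the best-scoring distribution described above.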

load(filepath, verbose=3)

Load learned model.

Parameters
  • filepath (str) – Pathname to stored pickle files.

  • verbose (int, optional) – Show message. A higher number gives more information. The default is 3.

Returns

Return type

Object.

plot(title='', figsize=(10, 8), xlim=None, ylim=None, verbose=3)

Make plot.

Parameters
  • title (String, optional (default: '')) – Title of the plot.

  • figsize (tuple, optional (default: (10,8))) – The figure size.

  • xlim (Float, optional (default: None)) – Limit figure in x-axis.

  • ylim (Float, optional (default: None)) – Limit figure in y-axis.

  • verbose (Int [1-5], optional (default: 3)) – Print information to screen.

Returns

Return type

tuple (fig, ax)

plot_summary(n_top=None, figsize=(15, 8), ylim=None, verbose=3)

Plot summary results.

Parameters
  • n_top (int, optional) – Show the top number of results. The default is None.

  • figsize (tuple, optional (default: (15,8))) – The figure size.

  • ylim (Float, optional (default: None)) – Limit figure in y-axis.

  • verbose (Int [1-5], optional (default: 3)) – Print information to screen.

Returns

Return type

tuple (fig, ax)

predict(y, verbose=3)

Compute probability for response variables y, using the specified method.

Computes P-values for [y] based on the fitted distribution from X. The empirical distribution of X is used to estimate the loc/scale/arg parameters for a theoretical distribution in case method type is parametric.

Parameters
  • y (array-like) – Values to be predicted.

  • model (dict, default : None) – The model created by the .fit() function.

  • verbose (int [1-5], default: 3) – Print information to screen. A higher number will print more.

Returns

  • Object.

  • y_pred (list of str) – predicted bound [upper, lower] for each input y, based on the distribution fitted on X.

  • y_proba (list of float) – probability for response variable y.

  • df (pd.DataFrame) – Dataframe containing the predictions in a structured manner.
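
As a rough illustration of what a parametric prediction involves, the sketch below computes tail probabilities for new values under an already fitted normal distribution and labels each value by bound. It is an assumption-laden simplification, not distfit's implementation: the function name predict_bounds, the labels 'up', 'down' and 'none', and the choice to test each tail at alpha without multiple test correction are all illustrative.

```python
from statistics import NormalDist


def predict_bounds(y, dist, alpha=0.05, bound="both"):
    """Label each value by the tail in which it is significant (illustrative helper)."""
    results = []
    for value in y:
        p_low = dist.cdf(value)   # P(X <= value)
        p_high = 1.0 - p_low      # P(X > value)
        label, proba = "none", min(p_low, p_high)
        if bound in ("both", "up", "high", "right") and p_high <= alpha:
            label, proba = "up", p_high
        elif bound in ("both", "down", "low", "left") and p_low <= alpha:
            label, proba = "down", p_low
        results.append((value, label, proba))
    return results


# Assume the fit on X returned a normal distribution with loc=0 and scale=2.
fitted = NormalDist(mu=0, sigma=2)
y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]
predictions = predict_bounds(y, fitted, alpha=0.05, bound="both")
labels = [label for _, label, _ in predictions]
for value, label, proba in predictions:
    print(value, label, round(proba, 4))
```

Restricting bound to one side (for example bound="up") leaves values in the opposite tail labelled 'none' in this sketch.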

save(filepath, verbose=3)

Save learned model in pickle file.

Parameters
  • filepath (str) – Pathname to store pickle files.

  • verbose (int, optional) – Show message. A higher number gives more information. The default is 3.

Returns

Return type

object
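
Since save and load work with pickle files, the round trip can be pictured with the standard pickle module. This is only a sketch of the general mechanism and makes no claim about what distfit actually writes to disk; the dictionary values and the file name model.pkl are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical model dictionary using keys named in this reference.
model = {"name": "norm", "params": (0.0, 2.0), "RSS": 0.0123}

# Write the model to a pickle file in a temporary directory.
filepath = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(filepath, "wb") as f:
    pickle.dump(model, f)

# Read it back; the restored object equals the original.
with open(filepath, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

As with any pickle file, only load files from a source you trust.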

transform(X, verbose=3)

Determine best model for input data X.

The input data X can be modeled in two ways:

parametric

In the parametric case, the best fit on the data is determined using the Residual Sum of Squares (RSS) approach for the specified distributions. Based on the best distribution fit, the confidence intervals (CII) can be determined for later use in the predict() function.

quantile

In the quantile case, the data is ranked and the top/lower quantiles are determined.

Parameters
  • X (array-like) – The null distribution or background data is built from X.

  • verbose (int [1-5], default: 3) – Print information to screen. A higher number will print more.

Returns

  • Object.

  • model (dict) – dict containing keys with distribution parameters:
      RSS : Residual Sum of Squares
      name : distribution name
      distr : distribution function
      params : all kinds of parameters
      loc : loc function parameter
      scale : scale function parameter
      arg : arg function parameter

  • summary (list) – Residual Sum of Squares

  • histdata (tuple (observed, bins)) – tuple containing observed and bins for data X in the histogram.

  • size (int) – total number of elements in data X
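
The quantile case described for transform can be sketched without any distribution fitting: rank the data, read off the lower and upper quantiles, and compare new values against those cut-offs. The code below is a minimal illustration under assumptions, not distfit's implementation; the helper names empirical_quantile and label, the linear interpolation rule, and the use of alpha and 1 - alpha as cut-offs are choices made for this example.

```python
import random


def empirical_quantile(ranked, q):
    """Linearly interpolated quantile of an already sorted list, with 0 <= q <= 1."""
    pos = q * (len(ranked) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ranked) - 1)
    frac = pos - lower
    return ranked[lower] * (1 - frac) + ranked[upper] * frac


random.seed(1)
X = [random.gauss(0, 2) for _ in range(1000)]

# Rank the background data and determine the lower/upper cut-offs.
ranked = sorted(X)
alpha = 0.05
low_cut = empirical_quantile(ranked, alpha)
high_cut = empirical_quantile(ranked, 1 - alpha)


def label(value):
    """Label a new value relative to the empirical cut-offs."""
    if value > high_cut:
        return "up"
    if value < low_cut:
        return "down"
    return "none"


labels = [label(v) for v in [-8, 0, 6]]
print(low_cut, high_cut, labels)
```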