Release Notes
The CHANGELOG for the current development version is available at
https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md.
Version 0.9.1 (2017-11-19)
Downloads
New Features
- Added `mlxtend.evaluate.bootstrap_point632_score` to evaluate the performance of estimators using the .632 bootstrap (see the sketch after this list). (#283)
- New `max_len` parameter for the frequent itemset generation via the `apriori` function to allow for early stopping. ([#270](https://github.com/rasbt/mlxtend/pull/270))
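A minimal sketch of the new .632 bootstrap scorer; the classifier, dataset, and the `n_splits`/`random_seed` values below are illustrative choices, not something prescribed by this release:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bootstrap_point632_score

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=123)

# One accuracy estimate per bootstrap round, combined via the .632 rule
scores = bootstrap_point632_score(clf, X, y, n_splits=200, random_seed=123)
print('Mean .632 accuracy: %.2f' % np.mean(scores))
```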
Changes
- All feature index tuples in `SequentialFeatureSelector` are now in sorted order. (#262)
- The `SequentialFeatureSelector` now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance in any of the test scenarios but could lead to better performance in certain edge cases. (#262)
- `utils.Counter` now accepts a `name` variable to help distinguish between multiple counters, time precision can be set with the `precision` kwarg, and the new attribute `end_time` holds the time the last iteration completed. (#278)
Bug Fixes
- Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. (#283)
Version 0.9.0 (2017-10-21)
Downloads
New Features
- Added `evaluate.permutation_test`, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution; in other words, a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). See the sketch after this list. (#250)
- Added `'leverage'` and `'conviction'` as evaluation metrics to the `frequent_patterns.association_rules` function. (#246 & #247)
- Added a `loadings_` attribute to `PrincipalComponentAnalysis` to compute the factor loadings of the features on the principal components. (#251)
- Allow grid search over classifiers/regressors in ensemble and stacking estimators. (#259)
- New `make_multiplexer_dataset` function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
- Added a new `BootstrapOutOfBag` class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
- The parameters for `StackingClassifier`, `StackingCVClassifier`, `StackingRegressor`, `StackingCVRegressor`, and `EnsembleVoteClassifier` can now be tuned using scikit-learn's `GridSearchCV`. (#254 via James Bourbeau)
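A minimal sketch of `evaluate.permutation_test`; the two samples and the `num_rounds`/`seed` settings are made up for illustration:

```python
from mlxtend.evaluate import permutation_test

treatment = [28.4, 34.2, 29.8, 31.0, 33.3]
control = [27.1, 28.9, 26.7, 30.2, 27.5]

# Approximate (Monte Carlo) permutation test of the null hypothesis
# that the two samples come from the same distribution
p_value = permutation_test(treatment, control,
                           method='approximate',
                           num_rounds=10000,
                           seed=0)
print(p_value)
```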
Changes
- The `'support'` column returned by `frequent_patterns.association_rules` was changed to compute the support of "antecedant union consequent", and new `'antecedant support'` and `'consequent support'` columns were added to avoid ambiguity. (#245)
- Allow the `OnehotTransactions` to be cloned via scikit-learn's `clone` function, which is required by, e.g., scikit-learn's `FeatureUnion` or `GridSearchCV` (via Iaroslav Shcherbatyi; see the sketch after this list). (#249)
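A minimal sketch of the new cloning behavior, assuming `OnehotTransactions` is imported from `mlxtend.preprocessing` as in the documentation:

```python
from sklearn.base import clone
from mlxtend.preprocessing import OnehotTransactions

oht = OnehotTransactions()

# Returns a fresh, unfitted copy with identical parameters, as required
# by scikit-learn utilities such as FeatureUnion and GridSearchCV
oht_clone = clone(oht)
print(oht_clone)
```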
Bug Fixes
- Fix issues with the `self._init_time` parameter in `_IterativeModel` subclasses. (#256)
- Fix an imprecision bug that occurred in `plot_ecdf` when run on Python 2.7. (#264)
- The vectors from the SVD in `PrincipalComponentAnalysis` are now scaled so that the eigenvalues via `solver='eigen'` and `solver='svd'` have the same magnitudes. (#251)
Version 0.8.0 (2017-09-09)
Downloads
New Features
- Added `mlxtend.evaluate.bootstrap`, which implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth; see the sketch after this list). #232
- `SequentialFeatureSelector`'s `k_features` parameter now accepts the string arguments "best" or "parsimonious" for more "automated" feature selection. For instance, if "best" is provided, the feature selector will return the feature subset with the best cross-validation performance. If "parsimonious" is provided as an argument, the smallest feature subset that is within one standard error of the cross-validation performance will be selected. #238
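A minimal sketch of `mlxtend.evaluate.bootstrap`; the synthetic data and the number of rounds are arbitrary illustration values:

```python
import numpy as np
from mlxtend.evaluate import bootstrap

rng = np.random.RandomState(123)
x = rng.normal(loc=5., size=100)

# Bootstrap the sample mean and compute a 95% confidence interval
original, std_err, ci_bounds = bootstrap(x,
                                         num_rounds=1000,
                                         func=np.mean,
                                         ci=0.95,
                                         seed=123)
print(original, std_err, ci_bounds)
```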
Changes
- `SequentialFeatureSelector` now uses `np.nanmean` over the normal mean to support scorers that may return `np.nan`. #211 (via mrkaiser)
- The `skip_if_stuck` parameter was removed from `SequentialFeatureSelector` in favor of a more efficient implementation that compares the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached. #237
- `ExhaustiveFeatureSelector` was modified to consume substantially less memory. #195 (via Adam Erickson)
Bug Fixes
- Fixed a bug where the `SequentialFeatureSelector` selected a feature subset larger than specified via the `k_features` tuple max-value. #213
Version 0.7.0 (2017-06-22)
Downloads
New Features
- New `mlxtend.plotting.ecdf` function for plotting empirical cumulative distribution functions (#196).
- New `StackingCVRegressor` for stacking regressors with out-of-fold predictions to prevent overfitting (#201, via Eike Dehling; see the sketch after this list).
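A minimal sketch of the new `StackingCVRegressor`; the base regressors, meta-regressor, and synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.ensemble import RandomForestRegressor
from mlxtend.regressor import StackingCVRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 5)
y = 2.0 * X[:, 0] + rng.rand(100)

# Level-1 predictions are produced out-of-fold (cv=5) before the
# meta-regressor is fit, which reduces overfitting
stack = StackingCVRegressor(regressors=[Ridge(), RandomForestRegressor(random_state=42)],
                            meta_regressor=Lasso(),
                            cv=5)
stack.fit(X, y)
print(stack.predict(X[:3]))
```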
Changes
- The TensorFlow estimators have been removed from mlxtend, since TensorFlow now has very convenient ways to build estimators, which render those implementations obsolete.
- `plot_decision_regions` now supports plotting decision regions for more than 2 training features (#189, via James Bourbeau).
- Parallel execution in `mlxtend.feature_selection.SequentialFeatureSelector` and `mlxtend.feature_selection.ExhaustiveFeatureSelector` is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large (#193, via @whalebot-helmsman).
- Raise meaningful error messages if pandas `DataFrame`s or Python lists of lists are fed into the `StackingCVClassifier` as `fit` arguments (#198).
- The `n_folds` parameter of the `StackingCVClassifier` was changed to `cv` and can now accept any kind of cross-validation technique that is available from scikit-learn. For example, `StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3))` or `StackingCVClassifier(..., cv=GroupKFold(n_splits=3))` (#203, via Konstantinos Paliouras).
Bug Fixes
- `SequentialFeatureSelector` now correctly accepts a `None` argument for the `scoring` parameter to infer the default scoring metric from scikit-learn classifiers and regressors (#171).
- The `plot_decision_regions` function now supports pre-existing axes objects generated via matplotlib's `plt.subplots`. (#184, see example)
- Made `math.num_combinations` and `math.num_permutations` numerically stable for large numbers of combinations and permutations (#200).
Version 0.6.0 (2017-03-18)
Downloads
New Features
- An `association_rules` function is implemented that allows generating rules based on a list of frequent itemsets (via Joshua Goerner; see the sketch after this list).
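A minimal sketch of generating rules with the new `association_rules` function; the toy one-hot transaction DataFrame and the thresholds are made up for illustration:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: each row is a transaction, each column an item
df = pd.DataFrame({'milk':   [1, 0, 1, 1, 0],
                   'bread':  [1, 1, 1, 0, 1],
                   'butter': [0, 1, 1, 1, 1]})

frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)
print(rules)
```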
Changes
- Adds a black `edgecolor` to plots via `plotting.plot_decision_regions` to make markers more distinguishable from the background in `matplotlib>=2.0`.
- The `association` submodule was renamed to `frequent_patterns`.
Bug Fixes
- The `DataFrame` index of `apriori` results is now unique and ordered.
- Fixed typos in the autompg and wine datasets (via James Bourbeau).
Version 0.5.1 (2017-02-14)
Downloads
New Features
- The `EnsembleVoteClassifier` has a new `refit` attribute that prevents refitting classifiers if `refit=False` to save computational time.
- Added a new `lift_score` function in `evaluate` to compute lift score (via Batuhan Bardak; see the sketch after this list).
- `StackingClassifier` and `StackingRegressor` support multivariate targets if the underlying models do (via kernc).
- `StackingClassifier` has a new `use_features_in_secondary` attribute like `StackingCVClassifier`.
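A minimal sketch of the new `lift_score` function; the target and prediction arrays are made up for illustration:

```python
import numpy as np
from mlxtend.evaluate import lift_score

y_true = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 1])

# Lift compares the classifier's precision to the baseline rate of the
# positive class: (TP / (TP + FP)) / ((TP + FN) / n)
print(lift_score(y_true, y_pred))
```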
Changes
- Changed the default verbosity level in `SequentialFeatureSelector` to 0.
- The `EnsembleVoteClassifier` now raises a `NotFittedError` if the estimator wasn't `fit` before calling `predict`. (via Anton Loss)
- Added the new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0.
Bug Fixes
- Fixed the wrong default value for `k_features` in `SequentialFeatureSelector`.
- Cast selected feature subsets in the `SequentialFeatureSelector` as sets to prevent the iterator from getting stuck if the `k_idx` are different permutations of the same combination (via Zac Wellmer).
- Fixed an issue with learning curves that caused the performance metrics to be reversed (via ipashchenko).
- Fixed a bug that could occur in the `SequentialFeatureSelector` if there are similarly-well performing subsets in the floating variants (via Zac Wellmer).
Version 0.5.0 (2016-11-09)
Downloads
New Features
- New `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` for evaluating all feature combinations in a specified range.
- The `StackingClassifier` has a new parameter `average_probas` that is set to `True` by default to maintain the current behavior. A deprecation warning was added, though, and it will default to `False` in future releases (0.6.0); `average_probas=False` will result in stacking of the level-1 predicted probabilities rather than averaging these.
- New `StackingCVClassifier` estimator in `mlxtend.classifier` for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting (Reiichiro Nakano).
- New `OnehotTransactions` encoder class added to the `preprocessing` submodule for transforming transaction data into a one-hot encoded array.
- The `SequentialFeatureSelector` estimator in `mlxtend.feature_selection` is now safely stoppable mid-process via Ctrl+C, and `print_progress` was deprecated in favor of a more tunable `verbose` parameter (Will McGinnis).
- New `apriori` function in `association` to extract frequent itemsets from transaction data for association rule mining.
- New `checkerboard_plot` function in `plotting` to plot checkerboard tables / heat maps.
- New `mcnemar_table` and `mcnemar` functions in `evaluate` to compute 2x2 contingency tables and McNemar's test (see the sketch after this list).
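A minimal sketch of the new McNemar utilities; the made-up arrays below stand in for the true labels and two models' predictions:

```python
import numpy as np
from mlxtend.evaluate import mcnemar_table, mcnemar

y_true   = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_model1 = np.array([0, 1, 0, 0, 0, 1, 1, 1, 1, 1])
y_model2 = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# 2x2 contingency table of the two models' correct/wrong predictions
tb = mcnemar_table(y_target=y_true, y_model1=y_model1, y_model2=y_model2)

# McNemar's test with continuity correction
chi2, p = mcnemar(ary=tb, corrected=True)
print(tb)
print(chi2, p)
```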
Changes
- All plotting functions have been moved to `mlxtend.plotting` for compatibility reasons with continuous integration services and to make the installation of `matplotlib` optional for users of `mlxtend`'s core functionality.
- Added a compatibility layer for `scikit-learn 0.18` using the new `model_selection` module while maintaining backwards compatibility with scikit-learn 0.17.
Bug Fixes
- `mlxtend.plotting.plot_decision_regions` now draws decision regions correctly if more than 4 class labels are present.
- Raise an `AttributeError` in `plot_decision_regions` when the `X_highlight` argument is a 1D array (chkoar).
Version 0.4.2 (2016-08-24)
Downloads
New Features
- Added `preprocessing.CopyTransformer`, a mock class that returns copies of input arrays via `transform` and `fit_transform`.
Changes
- Added AppVeyor to CI to ensure MS Windows compatibility.
- Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects.
- `feature_selection.SequentialFeatureSelector` now supports the selection of `k_features` using a tuple to specify a "min-max" `k_features` range (see the sketch after this list).
- Added an "SVD solver" option to the `PrincipalComponentAnalysis`.
- Raise an `AttributeError` with a "not fitted" message in `SequentialFeatureSelector` if `transform` or `get_metric_dict` are called prior to `fit`.
- Use small, positive bias units in `TfMultiLayerPerceptron`'s hidden layer(s) if the activations are ReLUs in order to avoid dead neurons.
- Added an optional `clone_estimator` parameter to the `SequentialFeatureSelector` that defaults to `True`, avoiding the modification of the original estimator objects.
- More rigorous type and shape checks in the `evaluate.plot_decision_regions` function.
- `DenseTransformer` now doesn't raise an error if the input array is not sparse.
- API clean-up, using scikit-learn's `BaseEstimator` as the parent class for `feature_selection.ColumnSelector`.
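A minimal sketch of the new "min-max" `k_features` range; the estimator, dataset, and range below are illustrative choices based on the current `SequentialFeatureSelector` API:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

iris = load_iris()
X, y = iris.data, iris.target
knn = KNeighborsClassifier(n_neighbors=3)

# Evaluate subsets with 1 to 3 features and keep the best-scoring one
sfs = SFS(knn, k_features=(1, 3), forward=True, floating=False,
          scoring='accuracy', cv=5)
sfs = sfs.fit(X, y)
print(sfs.k_feature_idx_, sfs.k_score_)
```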
Bug Fixes
- Fixed a problem when a tuple-range was provided as an argument to the `SequentialFeatureSelector`'s `k_features` parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function) ([wahutch](https://github.com/wahutch)).
- Fixed an `AttributeError` issue when `verbose` > 1 in `StackingClassifier`.
- Fixed a bug in `classifier.SoftmaxRegression` where the mean values of the offsets were used to update the bias units rather than their sum.
- Fixed a rare bug in the MLP `_layer_mapping` functions that caused a swap between the random number generation seed when initializing weights and biases.
Version 0.4.1 (2016-05-01)
Downloads
New Features
- New TensorFlow estimator for Linear Regression (`tf_regressor.TfLinearRegression`).
- New k-means clustering estimator (`cluster.Kmeans`).
- New TensorFlow k-means clustering estimator (`tf_cluster.Kmeans`).
Changes
- Due to refactoring of the estimator classes, the `init_weights` parameter of the `fit` methods was globally renamed to `init_params`.
- Overall performance improvements of estimators due to code clean-up and refactoring.
- Added several additional checks for correct array types and more meaningful exception messages.
- Added an optional `dropout` parameter to the `tf_classifier.TfMultiLayerPerceptron` classifier for regularization.
- Added an optional `decay` parameter to the `tf_classifier.TfMultiLayerPerceptron` classifier for adaptive learning via an exponential decay of the learning rate eta.
- Replaced the old `NeuralNetMLP` by the more streamlined `MultiLayerPerceptron` (`classifier.MultiLayerPerceptron`); now also with softmax in the output layer and categorical cross-entropy loss.
- Unified the `init_params` parameter for the fit functions to continue training where the algorithm left off (if supported).
Version 0.4.0 (2016-04-09)
New Features
- New `TfSoftmaxRegression` classifier using TensorFlow (`tf_classifier.TfSoftmaxRegression`).
- New `SoftmaxRegression` classifier (`classifier.SoftmaxRegression`).
- New `TfMultiLayerPerceptron` classifier using TensorFlow (`tf_classifier.TfMultiLayerPerceptron`).
- New `StackingRegressor` (`regressor.StackingRegressor`).
- New `StackingClassifier` (`classifier.StackingClassifier`).
- New function for one-hot encoding of class labels (`preprocessing.one_hot`; see the sketch after this list).
- Added `GridSearch` support to the `SequentialFeatureSelector` (`feature_selection.SequentialFeatureSelector`).
- `evaluate.plot_decision_regions` improvements:
  - Function now handles y-class labels correctly if the array is of type `float`
  - Correct handling of the input arguments `markers` and `colors`
  - Accept an existing `Axes` via the `ax` argument
- New `print_progress` parameter for all generalized models and multi-layer neural networks for printing the time elapsed, ETA, and the current cost of the current epoch.
- Minibatch learning for `classifier.LogisticRegression`, `classifier.Adaline`, and `regressor.LinearRegression` plus a streamlined API.
- New Principal Component Analysis class via `mlxtend.feature_extraction.PrincipalComponentAnalysis`.
- New RBF Kernel Principal Component Analysis class via `mlxtend.feature_extraction.RBFKernelPCA`.
- New Linear Discriminant Analysis class via `mlxtend.feature_extraction.LinearDiscriminantAnalysis`.
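A minimal sketch of the new `preprocessing.one_hot` function; the class-label array is a made-up example and the default settings are assumed:

```python
import numpy as np
from mlxtend.preprocessing import one_hot

y = np.array([0, 1, 2, 1, 0])

# Each integer class label becomes a one-hot row vector
print(one_hot(y))
```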
Changes
- The `column` parameter in `mlxtend.preprocessing.standardize` now defaults to `None` to standardize all columns more conveniently.
Version 0.3.0 (2016-01-31)
Downloads
New Features
- Added a progress bar tracker to `classifier.NeuralNetMLP`.
- Added a function to score predicted vs. target class labels: `evaluate.scoring`.
- Added confusion matrix functions to create (`evaluate.confusion_matrix`) and plot (`evaluate.plot_confusion_matrix`) confusion matrices.
- New style parameter and improved axis scaling in `mlxtend.evaluate.plot_learning_curves`.
- Added `loadlocal_mnist` to `mlxtend.data` for streaming MNIST from local byte files into NumPy arrays (see the sketch after this list).
- New `NeuralNetMLP` parameters: `random_weights`, `shuffle_init`, `shuffle_epoch`.
- New `SFS` features such as the generation of pandas `DataFrame` results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars).
- Added support for regression estimators in `SFS`.
- Added the Boston housing dataset.
- New `shuffle` parameter for `classifier.NeuralNetMLP`.
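A minimal sketch of `loadlocal_mnist`; the file paths are placeholders for locally downloaded and unzipped MNIST byte files:

```python
from mlxtend.data import loadlocal_mnist

# The paths below are placeholders; point them at the unzipped MNIST files
X, y = loadlocal_mnist(images_path='train-images-idx3-ubyte',
                       labels_path='train-labels-idx1-ubyte')
print(X.shape)  # e.g., (60000, 784) for the full training set
print(y.shape)
```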
Changes
- The `mlxtend.preprocessing.standardize` function now optionally returns the parameters, which are estimated from the array, for re-use. A further improvement makes the `standardize` function smarter in order to avoid zero-division errors.
- Cosmetic improvements to the `evaluate.plot_decision_regions` function such as hiding plot axes.
- Renamed `classifier.EnsembleClassifier` to `classifier.EnsembleVoteClassifier`.
- Improved random weight initialization in `Perceptron`, `Adaline`, `LinearRegression`, and `LogisticRegression`.
- Changed the `learning` parameter of `mlxtend.classifier.Adaline` to `solver` and added "normal equation" as a closed-form solution solver.
- Hide y-axis labels in `mlxtend.evaluate.plot_decision_regions` in 1-dimensional evaluations.
- Sequential Feature Selection algorithms were unified into a single `SequentialFeatureSelector` class with parameters to enable floating selection and toggle between forward and backward selection.
- Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories).
- Renamed `mlxtend.plotting` to `mlxtend.general_plotting` in order to distinguish general plotting functions from specialized utility functions such as `evaluate.plot_decision_regions`.
Version 0.2.9 (2015-07-14)
Downloads
New Features
- Sequential Feature Selection algorithms: SFS, SFFS, SBS, and SFBS
Changes
- Changed the `regularization` & `lambda` parameters in `LogisticRegression` to a single parameter `l2_lambda`.
Version 0.2.8 (2015-06-27)
- API changes:
  - `mlxtend.sklearn.EnsembleClassifier` -> `mlxtend.classifier.EnsembleClassifier`
  - `mlxtend.sklearn.ColumnSelector` -> `mlxtend.feature_selection.ColumnSelector`
  - `mlxtend.sklearn.DenseTransformer` -> `mlxtend.preprocessing.DenseTransformer`
  - `mlxtend.pandas.standardizing` -> `mlxtend.preprocessing.standardizing`
  - `mlxtend.pandas.minmax_scaling` -> `mlxtend.preprocessing.minmax_scaling`
  - `mlxtend.matplotlib` -> `mlxtend.plotting`
- Added a momentum learning parameter (alpha coefficient) to `mlxtend.classifier.NeuralNetMLP`.
- Added an adaptive learning rate (decrease constant) to `mlxtend.classifier.NeuralNetMLP`.
- `mlxtend.pandas.minmax_scaling` became `mlxtend.preprocessing.minmax_scaling` and now also supports NumPy arrays.
- `mlxtend.pandas.standardizing` became `mlxtend.preprocessing.standardizing` and now supports both NumPy arrays and pandas DataFrames; also, a new `ddof` parameter sets the degrees of freedom when calculating the standard deviation.
Version 0.2.7 (2015-06-20)
- Added a multilayer perceptron (feedforward artificial neural network) classifier as `mlxtend.classifier.NeuralNetMLP`.
- Added 5000 labeled training samples from the MNIST handwritten digits dataset to `mlxtend.data`.
Version 0.2.6 (2015-05-08)
- Added ordinary least squares regression using different solvers (gradient descent, stochastic gradient descent, and the closed-form solution (normal equation)).
- Added an option for random weight initialization to the logistic regression classifier and updated the l2 regularization.
- Added the `wine` dataset to `mlxtend.data`.
- Added an `invert_axes` parameter to `mlxtend.matplotlib.enrichment_plot` to optionally plot the "Count" on the x-axis.
- New `verbose` parameter for `mlxtend.sklearn.EnsembleClassifier` by Alejandro C. Bahnsen.
- Added `mlxtend.pandas.standardizing` to standardize columns in a pandas DataFrame.
- Added the parameters `linestyles` and `markers` to `mlxtend.matplotlib.enrichment_plot`.
- `mlxtend.regression.lin_regplot` automatically adds `np.newaxis` and works with Python lists.
- Added tokenizers: `mlxtend.text.extract_emoticons` and `mlxtend.text.extract_words_and_emoticons`.
Version 0.2.5 (2015-04-17)
- Added Sequential Backward Selection (`mlxtend.sklearn.SBS`).
- Added an `X_highlight` parameter to `mlxtend.evaluate.plot_decision_regions` for highlighting test data points.
- Added `mlxtend.regression.lin_regplot` to plot the fitted line from linear regression.
- Added `mlxtend.matplotlib.stacked_barplot` to conveniently produce stacked barplots using pandas `DataFrame`s.
- Added `mlxtend.matplotlib.enrichment_plot`.
Version 0.2.4 (2015-03-15)
- Added `scoring` to `mlxtend.evaluate.learning_curves` (by user pfsq).
- Fixed a setup.py bug caused by the missing README.html file.
- `matplotlib.category_scatter` for pandas DataFrames and NumPy arrays.
Version 0.2.3 (2015-03-11)
- Added logistic regression.
- The gradient descent and stochastic gradient descent perceptron was changed to Adaline (Adaptive Linear Neuron).
- Perceptron and Adaline for {0, 1} classes.
- Added the `mlxtend.preprocessing.shuffle_arrays_unison` function to shuffle one or more NumPy arrays.
- Added shuffle and random seed parameters to the stochastic gradient descent classifier.
- Added an `rstrip` parameter to `mlxtend.file_io.find_filegroups` to allow trimming of base names.
- Added an `ignore_substring` parameter to `mlxtend.file_io.find_filegroups` and `find_files`.
- Replaced `.rstrip` in `mlxtend.file_io.find_filegroups` with a more robust regex.
- Gridsearch support for `mlxtend.sklearn.EnsembleClassifier`.
Version 0.2.2 (2015-03-01)
- Improved robustness of `EnsembleClassifier`.
- Extended `plot_decision_regions()` functionality for plotting 1D decision boundaries.
- The function `matplotlib.plot_decision_regions` was reorganized to `evaluate.plot_decision_regions`.
- `evaluate.plot_learning_curves()` function added.
- Added Rosenblatt, gradient descent, and stochastic gradient descent perceptrons.
Version 0.2.1 (2015-01-20)
- Added `mlxtend.pandas.minmax_scaling` - a function to rescale pandas DataFrame columns.
- Slight update to the `EnsembleClassifier` interface (additional `voting` parameter).
- Fixed `EnsembleClassifier` to return the correct class labels if the class labels are not integers from 0 to n.
- Added a new matplotlib function to plot decision regions of classifiers.
Version 0.2.0 (2015-01-13)
- Improved `mlxtend.text.generalize_duplcheck` to remove duplicates and prevent an endless looping issue.
- Added a `recursive` search parameter to `mlxtend.file_io.find_files`.
- Added a `check_ext` parameter to `mlxtend.file_io.find_files` to search based on file extensions.
- Default parameter to ignore invisible files for `mlxtend.file_io.find`.
- Added `transform` and `fit_transform` to the `EnsembleClassifier`.
- Added the `mlxtend.file_io.find_filegroups` function.
Version 0.1.9 (2015-01-10)
- Implemented scikit-learn EnsembleClassifier (majority voting rule) class.
Version 0.1.8 (2015-01-07)
- Improvements to `mlxtend.text.generalize_names` to handle certain Dutch last name prefixes (van, van der, de, etc.).
- Added the `mlxtend.text.generalize_name_duplcheck` function to apply `mlxtend.text.generalize_names` to a pandas DataFrame without creating duplicates.
Version 0.1.7 (2015-01-07)
- Added text utilities with a name generalization function.
- Added file_io utilities.
Version 0.1.6 (2015-01-04)
- Added combinations and permutations estimators.
Version 0.1.5 (2014-12-11)
- Added `DenseTransformer` for pipelines and grid search.
Version 0.1.4 (2014-08-20)
- The `mean_centering` function is now a class that creates `MeanCenterer` objects, which can be used to fit data via the `fit` method and to center data at the column means via the `transform` and `fit_transform` methods.
Version 0.1.3 (2014-08-19)
- Added the `preprocessing` module and the `mean_centering` function.
Version 0.1.2 (2014-08-19)
- Added `matplotlib` utilities and the `remove_borders` function.
Version 0.1.1 (2014-08-13)
- Simplified code for ColumnSelector.