---
title: Analysis
keywords: fastai
sidebar: home_sidebar
summary: "This contains fastai Learner extensions useful to perform prediction analysis."
description: "This contains fastai Learner extensions useful to perform prediction analysis."
nb_path: "nbs/052b_analysis.ipynb"
---
We've also introduced 2 methods to help you better understand how important certain features or certain steps are for your model. Both methods use permutation importance.
⚠️ Permutation feature or step importance is defined as the decrease in a model's score when a single feature's or step's values are randomly shuffled.
So if you're using accuracy (higher is better), the most important features or steps will be those with the lowest values on the chart, since randomly shuffling them reduces performance the most.
The opposite holds for metrics like mean squared error (lower is better): there, the most important features or steps will be those with the highest values on the chart.
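To make the mechanism concrete, here is a minimal sketch of permutation importance, independent of tsai. `model_score` is a hypothetical callable returning a metric value; the helper and its signature are assumptions for illustration, not the library's implementation:

```python
import numpy as np

def permutation_importance(model_score, X, y, feature, n_shuffles=5, seed=0):
    """Estimate one feature's importance by shuffling its values across
    samples and measuring the drop in a score (higher score = better).
    `model_score(X, y)` is a hypothetical scoring callable."""
    rng = np.random.default_rng(seed)
    baseline = model_score(X, y)
    drops = []
    for _ in range(n_shuffles):
        X_shuffled = X.copy()
        # permute the selected feature across samples,
        # breaking its relationship with the target
        perm = rng.permutation(len(X_shuffled))
        X_shuffled[:, feature] = X_shuffled[perm, feature]
        drops.append(baseline - model_score(X_shuffled, y))
    return float(np.mean(drops))
```

With a "higher is better" score, important features yield a large positive drop; irrelevant features yield a drop near zero, which is exactly why the chart's direction flips for "lower is better" metrics.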
There are a couple of issues with step importance, so we've introduced an argument (`n_steps`) to group steps. In this way you'll be able to identify which part of the time series is most important.
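One way to group steps is to split the sequence length into `n_steps` contiguous windows and shuffle each window jointly across samples. The helpers below are a hedged sketch of that idea (assuming tsai's `(samples, variables, steps)` array layout), not the library's actual code:

```python
import numpy as np

def step_windows(seq_len, n_steps):
    """Split a sequence of length `seq_len` into `n_steps` contiguous,
    roughly equal windows, returned as (start, end) bounds."""
    bounds = np.linspace(0, seq_len, n_steps + 1).astype(int)
    return list(zip(bounds[:-1], bounds[1:]))

def shuffle_step_window(X, start, end, seed=0):
    """Shuffle the window of time steps [start, end) across samples.
    X has shape (samples, variables, steps); steps outside the window
    are left untouched, so the score drop isolates that window."""
    rng = np.random.default_rng(seed)
    X_shuffled = X.copy()
    perm = rng.permutation(len(X))
    X_shuffled[:, :, start:end] = X_shuffled[perm][:, :, start:end]
    return X_shuffled
```

Scoring the model once per window instead of once per step keeps the procedure fast and gives a coarser but more readable picture of which part of the series matters.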
Feature importance has been adapted from https://www.kaggle.com/cdeotte/lstm-feature-importance by Chris Deotte (Kaggle Grandmaster).
from tsai.data.core import TSClassification, get_ts_dls
from tsai.data.external import get_UCR_data
from tsai.data.preprocessing import TSStandardize
from tsai.learner import ts_learner
from tsai.models.FCNPlus import FCNPlus
from tsai.metrics import accuracy
dsid = 'NATOPS'
X, y, splits = get_UCR_data(dsid, split_data=False)
tfms = [None, [TSClassification()]]
batch_tfms = TSStandardize()
dls = get_ts_dls(X, y, splits=splits, sel_vars=[0, 3, 5, 8, 10], sel_steps=slice(-30, None), tfms=tfms, batch_tfms=batch_tfms)
learn = ts_learner(dls, FCNPlus, metrics=accuracy, train_metrics=True)
learn.fit_one_cycle(2)
learn.plot_metrics()
learn.show_probas()
learn.plot_confusion_matrix()
learn.plot_top_losses(X[splits[1]], y[splits[1]], largest=True)
learn.top_losses(X[splits[1]], y[splits[1]], largest=True)
learn.feature_importance()
learn.step_importance(n_steps=5);
You may pass an X and y if you want to analyze a particular group of samples:
learn.feature_importance(X=X[splits[1]], y=y[splits[1]])
If you have a large validation dataset, you may also use the partial_n argument to select either a fixed number of samples (integer) or a percentage of the validation dataset (float):
learn.feature_importance(partial_n=.1)
learn.feature_importance(partial_n=100)
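The int-vs-float distinction can be sketched as a small helper. This is a hypothetical illustration of how such a `partial_n` argument could be interpreted, not tsai's actual implementation:

```python
import numpy as np

def partial_sample(n_total, partial_n, seed=0):
    """Pick a subset of validation indices: an int is read as a fixed
    sample count, a float as a fraction of the dataset (hypothetical
    helper mirroring the `partial_n` semantics described above)."""
    rng = np.random.default_rng(seed)
    if isinstance(partial_n, float):
        n = int(n_total * partial_n)   # e.g. .1 -> 10% of the samples
    else:
        n = partial_n                  # e.g. 100 -> exactly 100 samples
    n = min(n, n_total)                # never ask for more than exists
    return rng.choice(n_total, size=n, replace=False)
```

Sampling without replacement keeps the subset representative of the validation set while making each importance pass proportionally cheaper.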