autots package

Module contents

Automated Time Series Model Selection for Python

https://github.com/winedarksea/AutoTS

autots.load_daily(long: bool = True)

2020 Covid, Air Pollution, and Economic Data.

Sources: Covid Tracking Project, EPA, and FRED

Parameters

long (bool) – if True, return data in long format; otherwise return wide format
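
A minimal usage sketch of both output shapes (the exact series returned may vary by package version):

    from autots import load_daily

    # long format: one row per (series, timestamp) observation
    df_long = load_daily(long=True)

    # wide format: a DatetimeIndex with one column per series
    df_wide = load_daily(long=False)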

autots.load_monthly(long: bool = True)

Federal Reserve of St. Louis monthly economic indicators.

autots.load_yearly(long: bool = True)

Federal Reserve of St. Louis annual economic indicators.

autots.load_hourly(long: bool = True)

Traffic data from the MN DOT via the UCI data repository.

autots.load_weekly(long: bool = True)

Weekly petroleum industry data from the EIA.

autots.load_weekdays(long: bool = False, categorical: bool = True, periods: int = 180)

Test edge cases by creating a Series with values as day of week.

Parameters
  • long (bool) – if True, return a df with columns “value” and “datetime”; if False, return a Series with a datetime index

  • categorical (bool) – if True, return str/object, else return int

  • periods (int) – number of periods, ie the length of data to generate
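
For illustration, a quick sketch of both output shapes:

    from autots import load_weekdays

    # default: a pandas Series of day-of-week strings with a datetime index
    weekday_series = load_weekdays(long=False, categorical=True, periods=180)

    # integer day-of-week values in a df with "value" and "datetime" columns
    weekday_df = load_weekdays(long=True, categorical=False)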

autots.load_live_daily(long: bool = False, fred_key: str = None, fred_series: list = ['DGS10', 'T5YIE', 'SP500', 'DCOILWTICO', 'DEXUSEU'], tickers: list = ['MSFT'], trends_list: list = ['forecasting', 'cycling', 'cpu', 'microsoft'], weather_data_types: list = ['AWND', 'WSF2', 'TAVG'], weather_stations: list = ['USW00094846', 'USW00014925'], weather_years: int = 10, london_air_stations: list = ['CT3', 'SK8'], london_air_species: str = 'PM25', london_air_days: int = 180, earthquake_days: int = 180, earthquake_min_magnitude: int = 5)

Generates a dataframe of data up to the present day.

Parameters
  • long (bool) – whether to return in long format or wide

  • fred_key (str) – https://fred.stlouisfed.org/docs/api/api_key.html

  • fred_series (list) – list of FRED series IDs. This requires the fredapi package

  • tickers (list) – list of stock tickers, requires yfinance

  • trends_list (list) – list of search keywords, requires pytrends. None to skip.

  • weather_data_types (list) – from NCEI NOAA api data types, GHCN Daily Weather Elements PRCP, SNOW, TMAX, TMIN, TAVG, AWND, WSF1, WSF2, WSF5, WSFG

  • weather_stations (list) – from NCEI NOAA api station ids. Pass empty list to skip.

  • london_air_stations (list) – londonair.org.uk source station IDs. Pass empty list to skip.

  • london_air_species (str) – which measurement to pull from London Air. Not all stations have all metrics.

  • earthquake_min_magnitude (int) – smallest earthquake magnitude to pull from earthquake.usgs.gov. Set None to skip this.
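
A sketch of a reduced pull that skips sources requiring API keys or optional packages, per the skip semantics above (it assumes a missing fred_key causes FRED series to be skipped; the default tickers still require yfinance to be installed):

    from autots import load_live_daily

    df = load_live_daily(
        long=False,
        fred_key=None,              # no FRED key available
        trends_list=None,           # skip Google Trends (pytrends)
        weather_stations=[],        # empty list skips NOAA weather
        london_air_stations=[],     # empty list skips London Air
        earthquake_min_magnitude=5,
    )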

autots.load_linear(long=False, shape=None, start_date: str = '2021-01-01', introduce_nan: float = None, introduce_random: float = None, random_seed: int = 123)

Create a dataset of simple linear values for testing edge cases.

Parameters
  • long (bool) – whether to make long or wide

  • shape (tuple) – shape of output dataframe

  • start_date (str) – first date of index

  • introduce_nan (float) – percent of rows to make null. 0.2 = 20%

  • introduce_random (float) – shape parameter of the gamma distribution used to introduce randomness

  • random_seed (int) – seed for random

class autots.AutoTS(forecast_length: int = 14, frequency: str = 'infer', prediction_interval: float = 0.9, max_generations: int = 10, no_negatives: bool = False, constraint: float = None, ensemble: str = 'auto', initial_template: str = 'General+Random', random_seed: int = 2020, holiday_country: str = 'US', subset: int = None, aggfunc: str = 'first', na_tolerance: float = 1, metric_weighting: dict = {'containment_weighting': 0, 'contour_weighting': 1, 'mae_weighting': 2, 'rmse_weighting': 2, 'runtime_weighting': 0.05, 'smape_weighting': 10, 'spl_weighting': 2}, drop_most_recent: int = 0, drop_data_older_than_periods: int = 100000, model_list: str = 'default', transformer_list: dict = 'fast', transformer_max_depth: int = 6, models_mode: str = 'random', num_validations: int = 2, models_to_validate: float = 0.15, max_per_model_class: int = None, validation_method: str = 'backwards', min_allowed_train_percent: float = 0.5, remove_leading_zeroes: bool = False, prefill_na: str = None, introduce_na: bool = None, model_interrupt: bool = False, verbose: int = 1, n_jobs: int = None)

Bases: object

Automate time series modeling using a genetic algorithm.

Parameters
  • forecast_length (int) – number of periods over which to evaluate forecast. Can be overridden later in .predict().

  • frequency (str) – ‘infer’ or a specific pandas datetime offset. Can be used to force rollup of data (ie daily input, but frequency ‘M’ will rollup to monthly).

  • prediction_interval (float) – 0-1, uncertainty range for upper and lower forecasts. Adjust range, but rarely matches actual containment.

  • max_generations (int) – number of genetic algorithms generations to run. More runs = longer runtime, generally better accuracy. It’s called max because someday there will be an auto early stopping option, but for now this is just the exact number of generations to run.

  • no_negatives (bool) – if True, all negative predictions are rounded up to 0.

  • constraint (float) – when not None, use this value * data st dev above max or below min for constraining forecast values. Applied to point forecast only, not upper/lower forecasts.

  • ensemble (str) – None or list or comma-separated string containing: ‘auto’, ‘simple’, ‘distance’, ‘horizontal’, ‘horizontal-min’, ‘horizontal-max’, “mosaic”, “subsample”

  • initial_template (str) – ‘Random’ - randomly generates starting template, ‘General’ uses template included in package, ‘General+Random’ - both of the previous. Can also be overridden with self.import_template()

  • random_seed (int) – random seed allows (slightly) more consistent results.

  • holiday_country (str) – passed through to Holidays package for some models.

  • subset (int) – maximum number of series to evaluate at once. Useful to speed evaluation when many series are input. Takes a new subset of columns on each validation, unless mosaic ensembling is used, in which case columns are the same in each validation

  • aggfunc (str) – if data is to be rolled up to a higher frequency (daily -> monthly) or duplicate timestamps are included. Default ‘first’ removes duplicates, for rollup try ‘mean’ or np.sum. Beware numeric aggregations like ‘mean’ will not work with non-numeric inputs.

  • na_tolerance (float) – 0 to 1. Series are dropped if they have more than this percent NaN. 0.95 here would allow series containing up to 95% NaN values.

  • metric_weighting (dict) – weights to assign to metrics, affecting how the ranking score is generated.

  • drop_most_recent (int) – option to drop n most recent data points. Useful, say, for monthly sales data where the current (unfinished) month is included. Occurs after any aggregation is applied, so the n periods dropped will be at whatever frequency is specified.

  • drop_data_older_than_periods (int) – take only the n most recent timestamps

  • model_list (list) – str alias or list of names of model objects to use

  • transformer_list (list) – list of transformers to use, or dict of transformer:probability. Note this does not apply to initial templates. Can accept string aliases: “all”, “fast”, “superfast”

  • transformer_max_depth (int) – maximum number of sequential transformers to generate for new Random Transformers. Fewer will be faster.

  • models_mode (str) – option to adjust parameter options for newly generated models. Currently includes: ‘default’, ‘deep’ (searches more params, likely slower), and ‘regressor’ (forces ‘User’ regressor mode in regressor capable models)

  • num_validations (int) – number of cross validations to perform. 0 for just train/test on the best split. Possible confusion: num_validations is the number of validations to perform after the first eval segment, so the total number of eval/validation segments will be this + 1.

  • models_to_validate (int or float) – top n models to pass through to cross validation. Or a float in 0 to 1 as a % of those tried. 0.99 is forced to 100% validation. 1 evaluates just 1 model. If a horizontal or mosaic ensemble is used, additional best-per-series models above the number here are added to validation.

  • max_per_model_class (int) – of the models_to_validate what is the maximum to pass from any one model class/family.

  • validation_method (str) – ‘even’, ‘backwards’, ‘seasonal n’, ‘similarity’, or ‘custom’, where n is an integer seasonal lag.
    ‘backwards’ is better for recency and for shorter training sets.
    ‘even’ splits the data into equally-sized slices, best for more consistent data; a poetic but less effective strategy than the others here.
    ‘seasonal n’, for example ‘seasonal 364’, would test all data on each previous year of the forecast_length that would immediately follow the training data.
    ‘similarity’ automatically finds the data sections most similar to the most recent data that will be used for prediction.
    ‘custom’ - if used, .fit() needs validation_indexes passed: a list of pd.DatetimeIndex’s, the tail of each of which is used as a test set.

  • min_allowed_train_percent (float) – percent of forecast length to allow as min training, else raises error. 0.5 with a forecast length of 10 would mean 5 training points are mandated, for a total of 15 points. Useful in (unrecommended) cases where forecast_length > training length.

  • remove_leading_zeroes (bool) – replace leading zeroes with NaN. Useful in data where initial zeroes mean data collection hasn’t started yet.

  • prefill_na (str) – value to input to fill all NaNs with. Leaving as None and allowing model interpolation is recommended. None, 0, ‘mean’, or ‘median’. 0 may be useful, for example, in sales cases where all NaN can be assumed to be zero.

  • introduce_na (bool) – whether to force the last values in one training validation to be NaN. Helps make more robust models. Defaults to None, which introduces NaN in the last rows of validations if there is any NaN in the tail of the training data. Will not introduce NaN to all series if subset is used. If True, will also randomly change 20% of all rows to NaN in the validations.

  • model_interrupt (bool) – if False, KeyboardInterrupts quit the entire program. If True, KeyboardInterrupts attempt to only quit the current model. If True, recommended for use in conjunction with verbose > 0 and result_file, in the event of accidental complete termination.

  • verbose (int) – setting to 0 or lower should reduce most output. Higher numbers give more output.

  • n_jobs (int) – number of cores available to pass to parallel processing. A joblib context manager can be used instead (pass None in this case). Also accepts ‘auto’.

best_model

DataFrame containing template for the best ranked model

Type

pd.DataFrame

best_model_name

model name

Type

str

best_model_params

model params

Type

dict

best_model_transformation_params

transformation parameters

Type

dict

best_model_ensemble

Ensemble type int id

Type

int

regression_check

If True, the best_model uses an input ‘User’ future_regressor

Type

bool

df_wide_numeric

dataframe containing shaped final data

Type

pd.DataFrame

initial_results.model_results

contains a collection of result metrics

Type

object

score_per_series

generated score of metrics given per input series, if horizontal ensembles

Type

pd.DataFrame

Methods

fit, predict
export_template, import_template, import_results
results, failure_rate
horizontal_to_df, mosaic_to_df
plot_horizontal, plot_horizontal_transformers, plot_generation_loss, plot_backforecast
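
A minimal end-to-end sketch using the bundled sample data (settings here are illustrative for a quick run, not tuned):

    from autots import AutoTS, load_daily

    # wide format: DatetimeIndex, one column per series
    df = load_daily(long=False)

    model = AutoTS(
        forecast_length=14,
        frequency='infer',
        max_generations=4,   # small, for a short demonstration run
        num_validations=2,
    )
    model = model.fit(df)

    prediction = model.predict()
    point_forecasts = prediction.forecast  # also .upper_forecast / .lower_forecast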
back_forecast(column=None, n_splits: int = 3, tail: int = None, verbose: int = 0)

Create forecasts for the historical training data, ie. backcast or back forecast.

This actually forecasts on historical data; these are not fit model values as are often returned by other packages. As such, this will be slower, but more representative of real-world model performance. There may be jumps in data between chunks.

Args are the same as for model_forecast except:
  • n_splits (int) – how many pieces to split the data into. Pass 2 for fastest, or “auto” for best accuracy.
  • column (str) – if to run on only one column, pass the column name. Faster than the full set.
  • tail (int) – df.tail() of the dataset; back_forecast is only run on the n most recent observations.

Returns a standard prediction object (access .forecast, .lower_forecast, .upper_forecast)

export_template(filename=None, models: str = 'best', n: int = 5, max_per_model_class: int = None, include_results: bool = False)

Export top results as a reusable template.

Parameters
  • filename (str) – output filename; the extension (‘csv’ or ‘json’ in the filename) determines the format. None to return a DataFrame and not write a file.

  • models (str) – ‘best’ or ‘all’

  • n (int) – if models = ‘best’, how many n-best to export

  • max_per_model_class (int) – if models = ‘best’, the max number of each model class to include in template

  • include_results (bool) – whether to include performance metrics
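
For example, a sketch of exporting and later reusing a template (this continues the fitted model from the sketch above; ‘my_template.csv’ is a hypothetical filename):

    from autots import AutoTS

    # write the 5 best models (with metrics) to a reusable CSV template
    model.export_template(
        'my_template.csv', models='best', n=5,
        max_per_model_class=3, include_results=True,
    )

    # in a later session, import before calling .fit()
    new_model = AutoTS(forecast_length=14)
    new_model.import_template(
        'my_template.csv', method='only', enforce_model_list=True,
    )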

failure_rate(result_set: str = 'initial')

Return the fraction of models that failed with exceptions.

Parameters

result_set (str, optional) – ‘validation’ or ‘initial’. Defaults to ‘initial’.

Returns

float.

fit(df, date_col: str = None, value_col: str = None, id_col: str = None, future_regressor=None, weights: dict = {}, result_file: str = None, grouping_ids=None, validation_indexes: list = None)

Train algorithm given data supplied.

Parameters
  • df (pandas.DataFrame) – Datetime Indexed dataframe of series, or dataframe of three columns as below.

  • date_col (str) – name of datetime column

  • value_col (str) – name of column containing the data of series.

  • id_col (str) – name of column identifying different series.

  • future_regressor (numpy.ndarray) – single external regressor matching train.index

  • weights (dict) – {‘colname1’: 2, ‘colname2’: 5} - increase the importance of a series in metric evaluation. Any left blank are assumed to have a weight of 1. Pass the alias ‘mean’ as a str (ie weights=’mean’) to automatically use the mean value of a series as its weight. Available aliases: mean, median, min, max.

  • result_file (str) – results saved on each new generation. Does not include validation rounds. “.csv” saves the model results table; “.pickle” saves the full object, including ensemble information.

  • grouping_ids (dict) – currently a one-level dict containing a series_id:group_id mapping. Used in 0.2.x but not 0.3.x+ versions. Retained for potential future use.

  • validation_indexes (list) – list of pd.DatetimeIndex, required if validation_method=’custom’; the tail of each is used as a test set.
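
A sketch of fitting long-format data (column names are assumed to match those produced by load_daily(long=True); ‘mean’ weighting is the documented alias):

    from autots import AutoTS, load_daily

    df_long = load_daily(long=True)

    model = AutoTS(forecast_length=14, max_generations=4)
    model = model.fit(
        df_long,
        date_col='datetime',
        value_col='value',
        id_col='series_id',
        weights='mean',   # alias: weight each series by its mean value
    )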

horizontal_to_df()

Helper function for plotting.

import_results(filename)

Add results from another run on the same data.

Input can be a filename ending in .csv or .pickle, or a DataFrame of model results, or a full TemplateEvalObject.

import_template(filename: str, method: str = 'add_on', enforce_model_list: bool = True)

Import a previously exported template of model parameters. Must be done before the AutoTS object is .fit().

Parameters
  • filename (str) – file location (or a pd.DataFrame already loaded)

  • method (str) – ‘add_on’ or ‘only’ - “add_on” keeps initial_template generated in init. “only” uses only this template.

  • enforce_model_list (bool) – if True, remove model types not in model_list

mosaic_to_df()

Helper function to create a readable df of models in mosaic.

plot_backforecast(series=None, n_splits: int = 3, start_date=None, **kwargs)

Plot the historical data and fit forecast on historic.

Parameters
  • series (str or list) – column names of time series

  • n_splits (int or str) – “auto”, number > 2, higher more accurate but slower

  • **kwargs – passed to pd.DataFrame.plot()

plot_generation_loss(**kwargs)

Plot improvement in accuracy over generations. Note: this is only “one size fits all” accuracy and doesn’t account for the benefits seen for ensembling.

Parameters

**kwargs – passed to pd.DataFrame.plot()

plot_horizontal(max_series: int = 20, **kwargs)

Simple plot to visualize which model is assigned to each series.

Note that for ‘mosaic’ ensembles, it only plots the type of the most common model_id for that series, or the first if all are equally common.

Parameters
  • max_series (int) – max number of points to plot

  • **kwargs – passed to pandas.plot()

plot_horizontal_transformers(method='transformers', color_list=None, **kwargs)

Simple plot to visualize transformers used. Note this doesn’t capture transformers nested in simple ensembles.

Parameters
  • method (str) – ‘fillna’ or ‘transformers’ - which to plot

  • color_list (list) – list of colors to sample for bar colors. Can be names or hex.

  • **kwargs – passed to pandas.plot()

predict(forecast_length: int = 'self', prediction_interval: float = 'self', future_regressor=None, hierarchy=None, just_point_forecast: bool = False, verbose: int = 'self')

Generate forecast data immediately following dates of index supplied to .fit().

Parameters
  • forecast_length (int) – Number of periods of data to forecast ahead

  • prediction_interval (float) – interval of upper/lower forecasts. Defaults to ‘self’, ie the interval specified in __init__(). If prediction_interval is a list, then a dict of forecast objects is returned.

  • future_regressor (numpy.ndarray) – additional regressor

  • hierarchy – Not yet implemented

  • just_point_forecast (bool) – If True, return a pandas.DataFrame of just point forecasts

Returns

Either a PredictionObject of forecasts and metadata, or if just_point_forecast == True, a dataframe of point forecasts
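
For illustration, a sketch of working with the returned PredictionObject (this assumes a fitted model from the sketches above):

    # override the fit-time forecast_length for this prediction only
    prediction = model.predict(forecast_length=30)
    point = prediction.forecast
    lower = prediction.lower_forecast
    upper = prediction.upper_forecast

    # or request only a DataFrame of point forecasts
    point_df = model.predict(forecast_length=30, just_point_forecast=True)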

results(result_set: str = 'initial')

Convenience function to return tested models table.

Parameters

result_set (str) – ‘validation’ or ‘initial’

autots.TransformTS

alias of autots.tools.transform.GeneralTransformer

class autots.GeneralTransformer(fillna: str = 'ffill', transformations: dict = {}, transformation_params: dict = {}, grouping: str = None, reconciliation: str = None, grouping_ids=None, random_seed: int = 2020)

Bases: object

Fill NA values, then apply mathematical transformations.

Expects a chronologically sorted pandas.DataFrame with a DatetimeIndex, only numeric data, and a ‘wide’ (one column per series) shape.

Warning

  • inverse_transform will not fully return the original data under many conditions
    • the primary intention of inverse_transform is to invert forecast data from models (immediately following the historical time period), not to return original data

    • NAs filled will be returned with the filled value

    • Discretization, statsmodels filters, Round, Slice, ClipOutliers cannot be inverted

    • RollingMean, PctChange, CumSum, Seasonal Difference, and DifferencedTransformer will only return original or an immediately following forecast
      • by default ‘forecast’ is expected; ‘original’ can be set in trans_method

Parameters
  • fillna (str) –

    • method to fill NA, passed through to FillNA()

    ‘ffill’ - fill the most recent non-NA value forward until another non-NA value is reached
    ‘zero’ - fill with zero. Useful for sales and other data where NA does usually mean $0.
    ‘mean’ - fill all missing values with the series’ overall average value
    ‘median’ - fill all missing values with the series’ overall median value
    ‘rolling_mean’ - fill with the last n (window = 10) values
    ‘rolling_mean_24’ - fill with the avg of the last 24
    ‘ffill_mean_biased’ - simple avg of ffill and mean
    ‘fake_date’ - shifts data forward over NaN, so values will have incorrect timestamps
    ‘IterativeImputer’ - sklearn iterative imputer
    plus most of the interpolate methods from pandas.interpolate

  • transformations (dict) –

    • transformations to apply {0: “MinMaxScaler”, 1: “Detrend”, …}

    ‘None’
    ‘MinMaxScaler’ - Sklearn MinMaxScaler
    ‘PowerTransformer’ - Sklearn PowerTransformer
    ‘QuantileTransformer’ - Sklearn
    ‘MaxAbsScaler’ - Sklearn
    ‘StandardScaler’ - Sklearn
    ‘RobustScaler’ - Sklearn
    ‘PCA’, ‘FastICA’ - performs sklearn decomposition and returns n-cols worth of n_components
    ‘Detrend’ - fit then remove a linear regression from the data
    ‘RollingMeanTransformer’ - 10 period rolling average, can receive a custom window by transformation_param if used as second_transformation
    ‘FixedRollingMean’ - same as RollingMean, but with inverse_transform disabled, so smoothed forecasts are maintained
    ‘RollingMean10’ - 10 period rolling average (smoothing)
    ‘RollingMean100thN’ - rolling mean of periods of len(train)/100 (minimum 2)
    ‘DifferencedTransformer’ - makes each value the difference of that value and the previous value
    ‘PctChangeTransformer’ - converts to pct_change, not recommended if lots of zeroes in data
    ‘SinTrend’ - removes a sin trend (fitted to each column) from the data
    ‘CumSumTransformer’ - makes each value the sum of all previous values
    ‘PositiveShift’ - makes all values >= 1
    ‘Log’ - log transform (uses PositiveShift first as necessary)
    ‘IntermittentOccurrence’ - -1, 1 for non-median values
    ‘SeasonalDifference’ - remove the last lag values from all values
    ‘SeasonalDifferenceMean’ - remove the average lag values from all
    ‘SeasonalDifference7’, ‘12’, ‘28’ - non-parameterized versions of SeasonalDifference
    ‘CenterLastValue’ - center data around the tail of the dataset
    ‘Round’ - round values on inverse or transform
    ‘Slice’ - use only recent records
    ‘ClipOutliers’ - remove outliers
    ‘Discretize’ - bin or round data into groups
    ‘DatepartRegression’ - remove a trend trained on the datetime index
    “ScipyFilter” - filter data (lose information but smoother!) from scipy
    “HPFilter” - statsmodels hp_filter
    “STLFilter” - seasonal decompose and keep just one part of the decomposition

  • transformation_params (dict) – params of transformers {0: {}, 1: {‘model’: ‘Poisson’}, …}. Pass a dictionary of empty dictionaries to utilize defaults.

  • random_seed (int) – random state passed through where applicable
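
A small sketch of the transform/inverse round trip on synthetic data (the transformer choices here are illustrative):

    import numpy as np
    import pandas as pd
    from autots import GeneralTransformer

    # synthetic wide data: DatetimeIndex, one column per series
    df = pd.DataFrame(
        np.random.default_rng(0).normal(10, 2, size=(60, 2)),
        index=pd.date_range('2021-01-01', periods=60, freq='D'),
        columns=['series_a', 'series_b'],
    )

    transformer = GeneralTransformer(
        fillna='ffill',
        transformations={0: 'MinMaxScaler', 1: 'DifferencedTransformer'},
        transformation_params={0: {}, 1: {}},  # empty dicts use defaults
    )
    transformed = transformer.fit_transform(df)

    # 'original' attempts to recover the training data; 'forecast' (default)
    # is intended for model output immediately following the history
    restored = transformer.inverse_transform(transformed, trans_method='original')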

fill_na(df, window: int = 10)
Parameters
  • df (pandas.DataFrame) – Datetime Indexed

  • window (int) – passed through to rolling mean fill technique

Returns

pandas.DataFrame

fit(df)

Apply transformations and return transformer object.

Parameters

df (pandas.DataFrame) – Datetime Indexed

fit_transform(df)

Directly fit and apply transformations to convert df.

inverse_transform(df, trans_method: str = 'forecast', fillzero: bool = False)

Undo the madness.

Parameters
  • df (pandas.DataFrame) – Datetime Indexed

  • trans_method (str) – ‘forecast’ or ‘original’ passed through

  • fillzero (bool) – if inverse returns NaN, fill with zero

classmethod retrieve_transformer(transformation: str = None, param: dict = {}, df=None, random_seed: int = 2020)

Retrieves a specific transformer object from a string.

Parameters
  • df (pandas.DataFrame) – Datetime Indexed - required to set params for some transformers

  • transformation (str) – name of desired method

  • param (dict) – dict of kwargs to pass (legacy: an actual param)

Returns

transformer object

transform(df)

Apply transformations to convert df.

autots.RandomTransform(transformer_list: dict = {None: 0.0, 'MinMaxScaler': 0.05, 'PowerTransformer': 0.02, 'QuantileTransformer': 0.05, 'MaxAbsScaler': 0.05, 'StandardScaler': 0.04, 'RobustScaler': 0.05, 'PCA': 0.01, 'FastICA': 0.01, 'Detrend': 0.1, 'RollingMeanTransformer': 0.02, 'RollingMean100thN': 0.01, 'DifferencedTransformer': 0.1, 'SinTrend': 0.01, 'PctChangeTransformer': 0.01, 'CumSumTransformer': 0.02, 'PositiveShift': 0.02, 'Log': 0.01, 'IntermittentOccurrence': 0.01, 'SeasonalDifference': 0.1, 'cffilter': 0.01, 'bkfilter': 0.05, 'convolution_filter': 0.001, 'HPFilter': 0.02, 'DatepartRegression': 0.01, 'ClipOutliers': 0.05, 'Discretize': 0.03, 'CenterLastValue': 0.01, 'Round': 0.02, 'Slice': 0.02, 'ScipyFilter': 0.02, 'STLFilter': 0.01}, transformer_max_depth: int = 4, na_prob_dict: dict = {'ffill': 0.3, 'fake_date': 0.1, 'rolling_mean': 0.2, 'rolling_mean_24': 0.1, 'IterativeImputer': 0.1, 'mean': 0.05, 'zero': 0.05, 'ffill_mean_biased': 0.1, 'median': 0.05, None: 0.001, 'interpolate': 0.5, 'KNNImputer': 0.05, 'IterativeImputerExtraTrees': 0.0001}, fast_params: bool = None, superfast_params: bool = None, traditional_order: bool = False)

Return a dict of randomly chosen transformation selections.

SinTrend is used as a signal that slow parameters are allowed.
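
For example, a sketch of generating one random selection (per the GeneralTransformer parameters above, the result is assumed to carry fillna, transformations, and transformation_params entries):

    from autots import RandomTransform

    params = RandomTransform(transformer_max_depth=4)
    print(params)  # a dict of randomly chosen transformation selections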

autots.long_to_wide(df, date_col: str = 'datetime', value_col: str = 'value', id_col: str = 'series_id', aggfunc: str = 'first')

Take long data and convert into wide, cleaner data.

Parameters
  • df (pd.DataFrame) –

  • date_col (str) –

  • value_col (str) –

    • the name of the column with the values of the time series (ie sales $)

  • id_col (str) –

    • name of the id column, unique for each time series

  • aggfunc (str) –

    • passed to pd.pivot_table, determines how to aggregate duplicates for series_id and datetime

    other options include “mean” and other numpy functions; beware the data must already be input as a numeric type for these to work. If categorical data is provided, aggfunc=’first’ is recommended
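
A small sketch of the conversion using the default column names:

    import pandas as pd
    from autots import long_to_wide

    long_df = pd.DataFrame({
        'datetime': pd.to_datetime(
            ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02']),
        'series_id': ['a', 'b', 'a', 'b'],
        'value': [1.0, 10.0, 2.0, 11.0],
    })

    wide_df = long_to_wide(long_df, date_col='datetime', value_col='value',
                           id_col='series_id', aggfunc='first')
    # wide_df: DatetimeIndex rows, one column per series_id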

autots.model_forecast(model_name, model_param_dict, model_transform_dict, df_train, forecast_length: int, frequency: str = 'infer', prediction_interval: float = 0.9, no_negatives: bool = False, constraint: float = None, future_regressor_train=None, future_regressor_forecast=None, holiday_country: str = 'US', startTimeStamps=None, grouping_ids=None, random_seed: int = 2020, verbose: int = 0, n_jobs: int = 'auto', template_cols: list = ['Model', 'ModelParameters', 'TransformationParameters', 'Ensemble'], horizontal_subset: list = None)

Takes numeric data, returns numeric forecasts.

Only one model (albeit potentially an ensemble)! Horizontal ensembles cannot be nested; other ensemble types can be.

Well, she turned me into a newt. A newt? I got better. -Python

Parameters
  • model_name (str) – a string directing to the appropriate model, used in ModelMonster

  • model_param_dict (dict) – dictionary of parameters to be passed into the model.

  • model_transform_dict (dict) – a dictionary of fillna and transformation methods to be used. Pass an empty dictionary if no transformations are desired.

  • df_train (pandas.DataFrame) – numeric training dataset of DatetimeIndex and series as cols

  • forecast_length (int) – number of periods to forecast

  • frequency (str) – str representing frequency alias of time series

  • prediction_interval (float) – width of errors (note: rarely do the intervals accurately match the % asked for…)

  • no_negatives (bool) – whether to force all forecasts to be > 0

  • constraint (float) – when not None, use this value * data st dev above max or below min for constraining forecast values.

  • future_regressor_train (pd.Series) – with datetime index, of known in advance data, section matching train data

  • future_regressor_forecast (pd.Series) – with datetime index, of known in advance data, section matching test data

  • holiday_country (str) – passed through to holiday package, used by a few models as 0/1 regressor.

  • n_jobs (int) – number of CPUs to use when available.

  • template_cols (list) – column names of columns used as model template

  • horizontal_subset (list) – columns of df_train to use for forecast, meant for internal use for horizontal ensembling

Returns

Prediction from AutoTS model object

Return type

PredictionObject (autots.PredictionObject)
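
A sketch running a single named model directly (model and parameter choices here are illustrative):

    from autots import load_daily, model_forecast

    df_train = load_daily(long=False)

    prediction = model_forecast(
        model_name='AverageValueNaive',
        model_param_dict={'method': 'Mean'},
        model_transform_dict={
            'fillna': 'mean',
            'transformations': {'0': 'DifferencedTransformer'},
            'transformation_params': {'0': {}},
        },
        df_train=df_train,
        forecast_length=14,
    )
    point_forecasts = prediction.forecast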

autots.create_lagged_regressor(df, forecast_length: int, frequency: str = 'infer', scale: bool = True, summarize: str = None, backfill: str = 'bfill', n_jobs: str = 'auto', fill_na: str = 'ffill')

Create a regressor of features lagged by forecast length. Useful to some models that don’t otherwise use such information.

It is recommended that the .head(forecast_length) of both regressor_train and the df for training are dropped: df = df.iloc[forecast_length:]

Parameters
  • df (pd.DataFrame) – training data

  • forecast_length (int) – length of forecasts, to shift data by

  • frequency (str) – the ever necessary frequency for datetime things. Default ‘infer’

  • scale (bool) – if True, use the StandardScaler to standardize the features

  • summarize (str) – options to summarize the features, if large: ‘pca’, ‘median’, ‘mean’, ‘mean+std’, ‘feature_agglomeration’, ‘gaussian_random_projection’, “auto”

  • backfill (str) – method to deal with the NaNs created by shifting: “bfill” - backfill with last values; “ETS” - backfill with an ETS backwards forecast; “DatepartRegression” - backfill with DatepartRegression

  • fill_na (str) – method to prefill NAs in data, same methods as available elsewhere

Returns

regressor_train, regressor_forecast
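
A sketch following the recommendation above to drop the first forecast_length rows from both the training data and the lagged regressor:

    from autots import create_lagged_regressor, load_daily

    df = load_daily(long=False)
    forecast_length = 14

    regressor_train, regressor_forecast = create_lagged_regressor(
        df,
        forecast_length=forecast_length,
        scale=True,
        summarize='auto',
        backfill='bfill',
        fill_na='ffill',
    )

    # drop the backfilled head as recommended
    df = df.iloc[forecast_length:]
    regressor_train = regressor_train.iloc[forecast_length:]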