autots.datasets package

Submodules

autots.datasets.fred module

FRED (Federal Reserve Economic Data) Data Import

Requires an API key from FRED and the fredapi package (pip install fredapi).

autots.datasets.fred.get_fred_data(fredkey: str, SeriesNameDict: dict = None, long=True, observation_start=None, sleep_seconds: int = 1, **kwargs)

Imports data from the Federal Reserve (FRED). For simplest results, make sure all requested series share the same frequency.

Parameters
  • fredkey (str) – an API key from FRED

  • SeriesNameDict (dict) – pairs of FRED Series IDs and Series Names like: {‘SeriesID’: ‘SeriesName’}, or a list of FRED IDs. Series IDs must match FRED IDs, but names can be anything. If None, several default series are returned.

  • long (bool) – if True, return long style data, else return wide style data with dt index

  • observation_start (datetime) – passed to Fred get_series

  • sleep_seconds (int) – seconds to sleep between each series call; this usually reduces the chance of failure
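A minimal usage sketch for the parameters above. The wrapper name fetch_fred and the display names on the right-hand side of the dictionary are made up for illustration; GS10 and UNRATE are FRED series IDs that are both published monthly, which follows the same-frequency advice. The import sits inside the function so that the snippet can be loaded without autots or fredapi installed, and nothing is fetched until the function is called with a valid key.

```python
from datetime import datetime

# Keys must be valid FRED series IDs; values are free-form labels
# (the labels here are arbitrary choices for this example).
series_names = {
    "GS10": "10_year_treasury",
    "UNRATE": "unemployment_rate",
}


def fetch_fred(fredkey: str):
    """Hypothetical wrapper around get_fred_data; needs autots, fredapi, and network access."""
    from autots.datasets.fred import get_fred_data

    return get_fred_data(
        fredkey,
        SeriesNameDict=series_names,
        long=False,  # wide style data with a datetime index
        observation_start=datetime(2015, 1, 1),
        sleep_seconds=2,  # a slightly longer pause between series calls
    )
```

Calling fetch_fred("your-api-key") should return one column per requested series when long=False.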

Module contents

Tools for Importing Sample Data

autots.datasets.load_daily(long: bool = True)

2020 Covid, Air Pollution, and Economic Data.

Sources: Covid Tracking Project, EPA, and FRED

Parameters

long (bool) – if True, return data in long format. Otherwise return wide
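The difference between long and wide data can be illustrated without any dependencies. The sketch below is plain Python, not the AutoTS implementation, and the column names datetime, series_id, and value are assumptions made for this example: long data holds one row per (date, series) observation, while wide data holds one row per date with one column per series.

```python
from collections import defaultdict

# Long format: one record per observation.
long_rows = [
    {"datetime": "2020-01-01", "series_id": "a", "value": 1.0},
    {"datetime": "2020-01-01", "series_id": "b", "value": 10.0},
    {"datetime": "2020-01-02", "series_id": "a", "value": 2.0},
    {"datetime": "2020-01-02", "series_id": "b", "value": 20.0},
]


def long_to_wide(rows):
    """Pivot long records into {date: {series_id: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["datetime"]][row["series_id"]] = row["value"]
    return dict(wide)


wide = long_to_wide(long_rows)
```

Here wide has two date keys, each mapping to the values of series a and b on that date.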

autots.datasets.load_monthly(long: bool = True)

Federal Reserve of St. Louis monthly economic indicators.

autots.datasets.load_yearly(long: bool = True)

Federal Reserve of St. Louis annual economic indicators.

autots.datasets.load_hourly(long: bool = True)

Traffic data from the MN DOT via the UCI data repository.

autots.datasets.load_weekly(long: bool = True)

Weekly petroleum industry data from the EIA.

autots.datasets.load_weekdays(long: bool = False, categorical: bool = True, periods: int = 180)

Test edge cases by creating a Series with values as day of week.

Parameters
  • long (bool) – if True, return a df with columns “value” and “datetime”; if False, return a Series with dt index

  • categorical (bool) – if True, return str/object, else return int

  • periods (int) – number of periods, i.e. the length of the data to generate
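To make the parameters concrete, here is a standard-library sketch of the kind of data described: a daily sequence whose value is the day of the week. It is not the AutoTS code. The function name weekday_values, the three-letter labels, and the choice to end the range on a given date are all assumptions for illustration.

```python
from datetime import date, timedelta

# Labels are an assumption; Monday is 0 in Python's date.weekday().
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_values(periods=180, categorical=True, end=None):
    """Return (date, day-of-week) pairs for `periods` consecutive days ending at `end`."""
    end = end or date.today()
    days = [end - timedelta(days=offset) for offset in range(periods - 1, -1, -1)]
    if categorical:
        return [(d, DAY_NAMES[d.weekday()]) for d in days]
    return [(d, d.weekday()) for d in days]


# Seven days ending on Sunday 2021-01-10, as integers.
sample = weekday_values(periods=7, categorical=False, end=date(2021, 1, 10))
```

With categorical=True the same dates carry string labels instead of integers, which is useful for checking how a model handles non-numeric input.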

autots.datasets.load_live_daily(long: bool = False, fred_key: str = None, fred_series=['DGS10', 'T5YIE', 'SP500', 'DCOILWTICO', 'DEXUSEU', 'WPU0911'], observation_start: str = '2000-01-01', tickers: list = ['MSFT'], trends_list: list = ['forecasting', 'cycling', 'microsoft'], trends_geo: str = 'US', weather_data_types: list = ['AWND', 'WSF2', 'TAVG'], weather_stations: list = ['USW00094846', 'USW00014925'], weather_years: int = 10, london_air_stations: list = ['CT3', 'SK8'], london_air_species: str = 'PM25', london_air_days: int = 180, earthquake_days: int = 180, earthquake_min_magnitude: int = 5, gsa_key: str = None, gov_domain_list=['nasa.gov'], gov_domain_limit: int = 600, weather_event_types=['%28Z%29+Winter+Weather', '%28Z%29+Winter+Storm'], timeout: float = 300.05, sleep_seconds: int = 1)

Generates a dataframe of data up to the present day. Requires active internet connection. Pass None instead of specification lists to exclude a data source.

Parameters
  • long (bool) – whether to return in long format or wide

  • fred_key (str) – https://fred.stlouisfed.org/docs/api/api_key.html

  • fred_series (list) – list of FRED series IDs. This requires fredapi package

  • observation_start (datetime) – earliest day to retrieve, passed to Fred.get_series and yfinance.history

  • tickers (list) – list of stock tickers, requires yfinance

  • trends_list (list) – list of search keywords, requires pytrends. None to skip.

  • weather_data_types (list) – from NCEI NOAA API data types, GHCN Daily Weather Elements: PRCP, SNOW, TMAX, TMIN, TAVG, AWND, WSF1, WSF2, WSF5, WSFG

  • weather_stations (list) – from NCEI NOAA api station ids. Pass empty list to skip.

  • london_air_stations (list) – londonair.org.uk source station IDs. Pass empty list to skip.

  • london_air_species (str) – what measurement to pull from London Air. Not all stations have all metrics.

  • earthquake_min_magnitude (int) – smallest earthquake magnitude to pull from earthquake.usgs.gov. Set None to skip this.

  • gsa_key (str) – api key from https://open.gsa.gov/api/dap/

  • gov_domain_list (list) – list of government-run domains to get traffic data for. Can be very slow, so fewer is better. Some examples: [‘usps.com’, ‘ncbi.nlm.nih.gov’, ‘cdc.gov’, ‘weather.gov’, ‘irs.gov’, “usajobs.gov”, “studentaid.gov”, ‘nasa.gov’, “uk.usembassy.gov”, “tsunami.gov”]

  • gov_domain_limit (int) – max number of records. Smaller will be faster. Max is currently 10000.

  • weather_event_types (list) – list of URL-encoded severe weather event types; see https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Export-Format.pdf

  • timeout (float) – used by some queries

  • sleep_seconds (int) – increasing this may reduce probability of server download failures
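The sketch below shows one way to assemble arguments for load_live_daily. It builds the weather_event_types strings by encoding readable event names with the standard library, and it switches off most sources using None or an empty list as described in the parameter notes. The wrapper name fetch_live and the particular selection of sources are assumptions for illustration; the import is kept inside the function so that nothing is downloaded until it is called.

```python
from urllib.parse import quote_plus

# Readable event names; quote_plus turns "(" and ")" into %28 and %29
# and spaces into "+", matching the format of the default values.
event_names = ["(Z) Winter Weather", "(Z) Winter Storm"]
weather_event_types = [quote_plus(name) for name in event_names]

live_kwargs = {
    "long": False,
    "fred_series": None,          # None excludes FRED
    "tickers": None,              # None excludes stock data
    "trends_list": None,          # None skips search trends
    "weather_stations": [],       # empty list skips NOAA weather
    "london_air_stations": [],    # empty list skips London Air
    "gov_domain_list": None,      # None excludes government site traffic
    "earthquake_min_magnitude": 5,
    "weather_event_types": weather_event_types,
    "sleep_seconds": 2,
}


def fetch_live(kwargs):
    """Hypothetical wrapper; needs autots and an active internet connection."""
    from autots.datasets import load_live_daily

    return load_live_daily(**kwargs)
```

Keeping only a few sources enabled shortens the download and reduces the number of optional packages that need to be installed.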

autots.datasets.load_zeroes(long=False, shape=None, start_date: str = '2021-01-01')

Create a dataset of just zeroes for testing edge cases.

autots.datasets.load_linear(long=False, shape=None, start_date: str = '2021-01-01', introduce_nan: float = None, introduce_random: float = None, random_seed: int = 123)

Create a dataset of linear trends for testing edge cases.

Parameters
  • long (bool) – whether to make long or wide

  • shape (tuple) – shape of output dataframe

  • start_date (str) – first date of index

  • introduce_nan (float) – percent of rows to make null. 0.2 = 20%

  • introduce_random (float) – shape of gamma distribution

  • random_seed (int) – seed for random
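As a rough illustration of how shape, introduce_nan, and random_seed interact, the following standard-library sketch builds linear columns and blanks out a fraction of the rows. It is not the AutoTS implementation: the function name linear_with_gaps, the slopes, and the use of None for missing rows are assumptions for this example.

```python
import random


def linear_with_gaps(shape=(10, 2), introduce_nan=None, random_seed=123):
    """Build shape[0] rows of shape[1] linear columns, optionally nulling a fraction of rows."""
    n_rows, n_cols = shape
    # Column j rises by (j + 1) per row; the slopes are arbitrary for this sketch.
    rows = [[float((j + 1) * i) for j in range(n_cols)] for i in range(n_rows)]
    if introduce_nan is not None:
        rng = random.Random(random_seed)  # seeded so the gaps are reproducible
        n_null = round(n_rows * introduce_nan)
        for i in rng.sample(range(n_rows), n_null):
            rows[i] = [None] * n_cols
    return rows


# 0.2 means 20% of the 10 rows, so 2 rows become missing.
data = linear_with_gaps(shape=(10, 2), introduce_nan=0.2)
```

Because the generator is seeded, repeated calls with the same arguments blank out the same rows, which keeps edge-case tests repeatable.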

autots.datasets.load_sine(long=False, shape=None, start_date: str = '2021-01-01')

Create a dataset of sine waves for testing edge cases.