autots.datasets package
Submodules
autots.datasets.fred module
FRED (Federal Reserve Economic Data) Data Import
Requires an API key from FRED and the fredapi package (pip install fredapi).
-
autots.datasets.fred.get_fred_data(fredkey: str, SeriesNameDict: dict = None, long=True, observation_start=None, sleep_seconds: int = 1, **kwargs)
Import data from the Federal Reserve (FRED). For simplest results, make sure all requested series share the same frequency.
- Parameters:
fredkey (str) – an API key from FRED
SeriesNameDict (dict) – pairs of FRED Series IDs and Series Names like: {‘SeriesID’: ‘SeriesName’}, or a list of FRED IDs. Series IDs must match FRED IDs, but names can be anything. If None, several default series are returned.
long (bool) – if True, return long style data, else return wide style data with dt index
observation_start (datetime) – passed to Fred get_series
sleep_seconds (int) – seconds to sleep between each series call; this usually reduces the chance of failed requests
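A minimal usage sketch of the parameters above, assuming autots and fredapi are installed and a valid key is available. The environment variable name FRED_API_KEY, the helper fetch_fred_wide, and the display names in SERIES are illustrative choices, not part of the library. The import is deferred into the function so the snippet loads even when the optional packages are missing.

```python
import os

# FRED series IDs mapped to display names. The IDs must exist on FRED;
# the names on the right are arbitrary labels chosen for this example.
SERIES = {"DGS10": "treasury_10yr", "SP500": "sp500"}


def fetch_fred_wide(fredkey, series, start="2015-01-01"):
    """Return wide-style data (datetime index, one column per series)."""
    # Deferred import: autots and fredapi are only needed when this runs.
    from autots.datasets.fred import get_fred_data

    return get_fred_data(
        fredkey,
        SeriesNameDict=series,
        long=False,                # wide format with a datetime index
        observation_start=start,   # forwarded to the FRED series request
        sleep_seconds=2,           # pause between series requests
    )


if __name__ == "__main__":
    key = os.environ.get("FRED_API_KEY")  # hypothetical variable name
    if key:
        print(fetch_fred_wide(key, SERIES).tail())
    else:
        print("Set FRED_API_KEY to run this example.")
```

Both series in this example are published daily, which follows the advice to request series of the same frequency.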
Module contents
Tools for Importing Sample Data
-
autots.datasets.load_daily(long: bool = True)
2020 Covid, Air Pollution, and Economic Data.
Sources: Covid Tracking Project, EPA, and FRED
- Parameters:
long (bool) – if True, return data in long format. Otherwise return wide
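The difference between the long and wide formats can be sketched with a small hand-made frame. The column names datetime, series_id, and value are assumptions about the long layout, and the series names and numbers are invented for illustration; this is not output from load_daily.

```python
import pandas as pd

# A tiny hand-made frame in long format: one row per (date, series) pair.
long_df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"]
        ),
        "series_id": ["cases", "pm25", "cases", "pm25"],
        "value": [10.0, 7.5, 14.0, 8.1],
    }
)

# Long -> wide: dates become the index, each series gets its own column.
wide_df = long_df.pivot(index="datetime", columns="series_id", values="value")

# Wide -> long: melt the columns back into (datetime, series_id, value) rows.
back_to_long = wide_df.reset_index().melt(
    id_vars="datetime", var_name="series_id", value_name="value"
)
```

Long format handles series of differing lengths naturally, while wide format is usually more convenient for plotting and column-wise arithmetic.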
-
autots.datasets.load_monthly(long: bool = True)
Federal Reserve Bank of St. Louis monthly economic indicators.
-
autots.datasets.load_yearly(long: bool = True)
Federal Reserve Bank of St. Louis annual economic indicators.
-
autots.datasets.load_hourly(long: bool = True)
Traffic data from the MN DOT via the UCI data repository.
-
autots.datasets.load_weekly(long: bool = True)
Weekly petroleum industry data from the EIA.
-
autots.datasets.load_weekdays(long: bool = False, categorical: bool = True, periods: int = 180)
Test edge cases by creating a Series with values as day of week.
- Parameters:
long (bool) – if True, return a df with columns “value” and “datetime”; if False, return a Series with dt index
categorical (bool) – if True, return str/object, else return int
periods (int) – number of periods, ie length of data to generate
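To picture the kind of data these parameters describe, here is a rough pandas sketch of a day-of-week series. It is an illustration only, not the library's implementation; the helper name weekday_series and its end argument are hypothetical.

```python
import pandas as pd


def weekday_series(periods=180, categorical=True, end="2021-06-30"):
    """Daily Series whose values are the day of week of each index date."""
    idx = pd.date_range(end=end, periods=periods, freq="D")
    if categorical:
        values = idx.day_name().to_numpy()   # "Monday", "Tuesday", ...
    else:
        values = idx.dayofweek.to_numpy()    # Monday=0 ... Sunday=6
    return pd.Series(values, index=idx, name="value")


s = weekday_series(periods=14)
```

A series like this has a perfectly regular weekly pattern, so it is a useful check that a model handles string-valued or strongly seasonal inputs.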
-
autots.datasets.load_live_daily(long: bool = False, fred_key: str = None, fred_series=['DGS10', 'T5YIE', 'SP500', 'DCOILWTICO', 'DEXUSEU', 'WPU0911'], observation_start: str = '2000-01-01', tickers: list = ['MSFT'], trends_list: list = ['forecasting', 'cycling', 'microsoft'], trends_geo: str = 'US', weather_data_types: list = ['AWND', 'WSF2', 'TAVG'], weather_stations: list = ['USW00094846', 'USW00014925'], weather_years: int = 10, london_air_stations: list = ['CT3', 'SK8'], london_air_species: str = 'PM25', london_air_days: int = 180, earthquake_days: int = 180, earthquake_min_magnitude: int = 5, gsa_key: str = None, gov_domain_list=['nasa.gov'], gov_domain_limit: int = 600, weather_event_types=['%28Z%29+Winter+Weather', '%28Z%29+Winter+Storm'], timeout: float = 300.05, sleep_seconds: int = 1)
Generates a dataframe of data up to the present day. Requires an active internet connection. Pass None instead of specification lists to exclude a data source.
- Parameters:
long (bool) – whether to return in long format or wide
fred_key (str) – https://fred.stlouisfed.org/docs/api/api_key.html
fred_series (list) – list of FRED series IDs. This requires fredapi package
observation_start (datetime) – earliest day to retrieve, passed to Fred.get_series and yfinance.history
tickers (list) – list of stock tickers, requires yfinance
trends_list (list) – list of search keywords, requires pytrends. None to skip.
weather_data_types (list) – data types from the NCEI NOAA API (GHCN Daily Weather Elements): PRCP, SNOW, TMAX, TMIN, TAVG, AWND, WSF1, WSF2, WSF5, WSFG
weather_stations (list) – from NCEI NOAA api station ids. Pass empty list to skip.
london_air_stations (list) – londonair.org.uk source station IDs. Pass empty list to skip.
london_air_species (str) – what measurement to pull from London Air. Not all stations have all metrics.
earthquake_min_magnitude (int) – smallest earthquake magnitude to pull from earthquake.usgs.gov. Set None to skip this.
gsa_key (str) – api key from https://open.gsa.gov/api/dap/
gov_domain_list (list) – list of government-run domains to get traffic data for. Can be very slow, so fewer is better. Some examples: [‘usps.com’, ‘ncbi.nlm.nih.gov’, ‘cdc.gov’, ‘weather.gov’, ‘irs.gov’, “usajobs.gov”, “studentaid.gov”, ‘nasa.gov’, “uk.usembassy.gov”, “tsunami.gov”]
gov_domain_limit (int) – max number of records. Smaller will be faster. Max is currently 10000.
weather_event_types (list) – list of URL-encoded severe weather event types; see https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Export-Format.pdf
timeout (float) – used by some queries
sleep_seconds (int) – increasing this may reduce probability of server download failures
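As a sketch of how sources might be switched off following the notes above, the helper below builds keyword arguments that leave only the stock-ticker source enabled. The helper names are hypothetical, and it is an assumption that leaving fred_key as None skips FRED. The import is deferred so the snippet loads without autots installed; actually calling fetch_tickers needs autots, yfinance, and an internet connection.

```python
def single_source_kwargs(tickers):
    """Keyword arguments that keep only the stock-ticker source enabled."""
    return {
        "long": False,
        "fred_key": None,                  # no FRED key supplied (assumed to skip FRED)
        "tickers": list(tickers),          # the one source left on (requires yfinance)
        "trends_list": None,               # None skips search trends
        "weather_stations": [],            # empty list skips NOAA weather
        "london_air_stations": [],         # empty list skips London Air
        "earthquake_min_magnitude": None,  # None skips USGS earthquakes
        "gov_domain_list": None,           # None excludes government site traffic
        "weather_event_types": None,       # None excludes severe weather events
    }


def fetch_tickers(tickers=("MSFT",)):
    # Deferred import so this snippet loads without autots installed.
    from autots.datasets import load_live_daily

    return load_live_daily(**single_source_kwargs(tickers))
```

Starting from a single source and adding others one at a time makes it easier to see which remote service is responsible when a download fails or runs slowly.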
-
autots.datasets.load_zeroes(long=False, shape=None, start_date: str = '2021-01-01')
Create a dataset of just zeroes for testing edge cases.
-
autots.datasets.load_linear(long=False, shape=None, start_date: str = '2021-01-01', introduce_nan: float = None, introduce_random: float = None, random_seed: int = 123)
Create a dataset of linear series for testing edge cases.
- Parameters:
long (bool) – whether to make long or wide
shape (tuple) – shape of output dataframe
start_date (str) – first date of index
introduce_nan (float) – fraction of rows to make null, e.g. 0.2 = 20%
introduce_random (float) – shape of gamma distribution
random_seed (int) – seed for random
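The effect of nulling a fraction of rows can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the helper linear_with_gaps and its nan_fraction argument are hypothetical, and the straight-line values are only one plausible reading of "linear".

```python
import numpy as np
import pandas as pd


def linear_with_gaps(shape=(10, 3), start_date="2021-01-01",
                     nan_fraction=0.2, random_seed=123):
    """Wide frame of straight-line series with a fraction of rows set to NaN."""
    rows, cols = shape
    idx = pd.date_range(start_date, periods=rows, freq="D")
    # Every column is the same straight line 0, 1, 2, ...
    data = np.tile(np.arange(rows, dtype=float).reshape(-1, 1), (1, cols))
    df = pd.DataFrame(data, index=idx)
    # Blank out a fixed fraction of rows, chosen reproducibly from the seed.
    rng = np.random.default_rng(random_seed)
    drop = rng.choice(rows, size=int(rows * nan_fraction), replace=False)
    df.iloc[drop, :] = np.nan
    return df


df = linear_with_gaps()
```

With the default shape of 10 rows and a fraction of 0.2, exactly two rows end up entirely null.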
-
autots.datasets.load_sine(long=False, shape=None, start_date: str = '2021-01-01')
Create a dataset of sine wave series for testing edge cases.
-
autots.datasets.load_artificial(long=False, date_start=None, date_end=None)
Load artificially generated series from random distributions.
- Parameters:
long (bool) – if True long style data, if False, wide style data
date_start – str or datetime.datetime of start date
date_end – str or datetime.datetime of end date