Tutorial - Time series forecasting¶
Introduction¶
Time series are a ubiquitous type of data, arising in all kinds of processes. Producing forecasts for them can be highly valuable in domains like retail or industrial manufacturing, among many others.
Lightwood supports time series forecasting (with both univariate and multivariate inputs), handling many of the pain points commonly associated with setting up a manual time series predictive pipeline.
In this tutorial, we will train a Lightwood predictor and analyze its forecasts for the task of forecasting monthly sunspot counts.
Load data¶
Let’s begin by loading the dataset and looking at it:
[5]:
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/mindsdb/benchmarks/main/benchmarks/datasets/monthly_sunspots/data.csv")
df
[5]:
|      | Month   | Sunspots |
| ---- | ------- | -------- |
| 0    | 1749-01 | 58.0     |
| 1    | 1749-02 | 62.6     |
| 2    | 1749-03 | 70.0     |
| 3    | 1749-04 | 55.7     |
| 4    | 1749-05 | 85.0     |
| ...  | ...     | ...      |
| 2815 | 1983-08 | 71.8     |
| 2816 | 1983-09 | 50.3     |
| 2817 | 1983-10 | 55.8     |
| 2818 | 1983-11 | 33.3     |
| 2819 | 1983-12 | 33.4     |

2820 rows × 2 columns
This is a very simple dataset. It has two columns: ‘Month’ specifies when each measurement was taken, and ‘Sunspots’ holds the actual quantity we are interested in forecasting. Since there is a single series and no additional input features, we can characterize this as a univariate time series problem.
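If you want a quick feel for the series before modeling, a couple of standard pandas calls are enough (this is an optional sanity check, not part of the Lightwood pipeline):

print(df.dtypes)                  # 'Month' is stored as a string in YYYY-MM format
print(df['Sunspots'].describe())  # scale and spread of the target series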
Define the predictive task¶
We will use Lightwood’s high-level methods to state what we want to predict. As this is a time series task (we want to leverage the notion of time when predicting), we need to specify a set of arguments that will activate Lightwood’s time series pipeline:
[8]:
from lightwood.api.high_level import ProblemDefinition
[21]:
tss = {'nr_predictions': 6,   # the predictor will learn to forecast the next semester (6 data points at monthly intervals -> 6 months)
       'order_by': ['Month'], # the column used to order the entire dataset
       'window': 12           # how many past values to consider when emitting predictions
       }
pdef = ProblemDefinition.from_dict({'target': 'Sunspots', # specify the column to forecast
'timeseries_settings': tss # pass along all time series specific parameters
})
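To make the `window` and `nr_predictions` arguments concrete, here is a minimal sketch of the sliding-window framing they imply (an illustration of the idea only, not Lightwood’s internal code): each training example pairs the last 12 observed values with the 6 values that follow.

import numpy as np

def sliding_windows(series, window=12, horizon=6):
    # pair each block of `window` past values with the `horizon` values that follow
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i:i + window])
        y.append(series[i + window:i + window + horizon])
    return np.array(X), np.array(y)

X, y = sliding_windows(df['Sunspots'].values)
print(X.shape, y.shape)  # (2803, 12) (2803, 6)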
Now, let’s do a very simple train-test split, holding out the last 10% of the data to check the forecasts our predictor will produce. Note that the split is chronological, as shuffling would leak future information into training:
[22]:
cutoff = int(len(df)*0.9)
train = df[:cutoff]
test = df[cutoff:]
print(train.shape, test.shape)
(2538, 2) (282, 2)
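As an optional check, we can assert the split really is chronological; because ‘Month’ uses the YYYY-MM format, plain string comparison respects time order:

assert train['Month'].max() < test['Month'].min()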
Generate the predictor object¶
Now, we can generate code for a machine learning model by using our problem definition and the data:
[23]:
from lightwood.api.high_level import (
json_ai_from_problem,
code_from_json_ai,
predictor_from_code
)
json_ai = json_ai_from_problem(df, problem_definition=pdef)
code = code_from_json_ai(json_ai)
predictor = predictor_from_code(code)
# uncomment this to see the generated code:
# print(code)
INFO:lightwood-46866:Dropping features: []
INFO:lightwood-46866:Analyzing a sample of 2467
INFO:lightwood-46866:from a total population of 2820, this is equivalent to 87.5% of your data.
INFO:lightwood-46866:Using 15 processes to deduct types.
INFO:lightwood-46866:Starting statistical analysis
INFO:lightwood-46866:Finished statistical analysis
[23]:
<A2JDXXBL9A1E16341560437535849.Predictor at 0x15685d970>
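Because the predictor is compiled from a plain Python string, you can also persist that string and rebuild the exact same predictor later. A minimal sketch (the filename is just an example):

with open('predictor_code.py', 'w') as f:
    f.write(code)

# later, or in another process:
# from lightwood.api.high_level import predictor_from_code
# predictor = predictor_from_code(open('predictor_code.py').read())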
Train¶
Okay, everything is now ready for our predictor to learn from the training data we provide.
Internally, lightwood cleans and reshapes the data, featurizes measurements and timestamps, and trains a handful of different models, keeping the one that produces the best forecasts.
Let’s train the predictor. This should take a couple of minutes, at most:
[27]:
predictor.learn(train)
INFO:lightwood-46866:Dropping features: []
INFO:lightwood-46866:Performing statistical analysis on data
INFO:lightwood-46866:Starting statistical analysis
INFO:lightwood-46866:Finished statistical analysis
INFO:lightwood-46866:Cleaning the data
INFO:lightwood-46866:Transforming timeseries data
INFO:lightwood-46866:Using 15 processes to reshape.
INFO:lightwood-46866:Splitting the data into train/test
INFO:lightwood-46866:Preparing the encoders
INFO:lightwood-46866:Encoder prepping dict length of: 1
INFO:lightwood-46866:Done running for: Sunspots
INFO:lightwood-46866:time series encoder epoch [1/100000] average_loss = 0.020042672178201507
INFO:lightwood-46866:time series encoder epoch [2/100000] average_loss = 0.0077215013273975305
INFO:lightwood-46866:time series encoder epoch [3/100000] average_loss = 0.0064399814919421546
INFO:lightwood-46866:time series encoder epoch [4/100000] average_loss = 0.005441865690967493
INFO:lightwood-46866:time series encoder epoch [5/100000] average_loss = 0.005300704742732801
INFO:lightwood-46866:time series encoder epoch [6/100000] average_loss = 0.004992981385766414
INFO:lightwood-46866:time series encoder epoch [7/100000] average_loss = 0.00491229374157755
INFO:lightwood-46866:time series encoder epoch [8/100000] average_loss = 0.004856080601089879
INFO:lightwood-46866:time series encoder epoch [9/100000] average_loss = 0.004799575188703704
INFO:lightwood-46866:time series encoder epoch [10/100000] average_loss = 0.0047617426566910325
INFO:lightwood-46866:time series encoder epoch [11/100000] average_loss = 0.004732183615366618
INFO:lightwood-46866:time series encoder epoch [12/100000] average_loss = 0.004704843226232026
INFO:lightwood-46866:time series encoder epoch [13/100000] average_loss = 0.004697896095744351
INFO:lightwood-46866:time series encoder epoch [14/100000] average_loss = 0.004687661141679998
INFO:lightwood-46866:time series encoder epoch [15/100000] average_loss = 0.004655592012823674
INFO:lightwood-46866:time series encoder epoch [16/100000] average_loss = 0.004595928704529478
INFO:lightwood-46866:time series encoder epoch [17/100000] average_loss = 0.004568418233018173
INFO:lightwood-46866:time series encoder epoch [18/100000] average_loss = 0.004558674494425456
INFO:lightwood-46866:time series encoder epoch [19/100000] average_loss = 0.004570525518634863
INFO:lightwood-46866:time series encoder epoch [20/100000] average_loss = 0.004572713087525284
INFO:lightwood-46866:time series encoder epoch [21/100000] average_loss = 0.004563712864591364
INFO:lightwood-46866:time series encoder epoch [22/100000] average_loss = 0.004498099365778136
INFO:lightwood-46866:time series encoder epoch [23/100000] average_loss = 0.004449873953534846
INFO:lightwood-46866:time series encoder epoch [24/100000] average_loss = 0.004484773205037703
INFO:lightwood-46866:time series encoder epoch [25/100000] average_loss = 0.004398583738427413
INFO:lightwood-46866:time series encoder epoch [26/100000] average_loss = 0.004340721536100957
INFO:lightwood-46866:time series encoder epoch [27/100000] average_loss = 0.004394709227377908
INFO:lightwood-46866:time series encoder epoch [28/100000] average_loss = 0.004414253694969311
INFO:lightwood-46866:time series encoder epoch [29/100000] average_loss = 0.0043628366892797905
INFO:lightwood-46866:time series encoder epoch [30/100000] average_loss = 0.0042474141246394105
INFO:lightwood-46866:time series encoder epoch [31/100000] average_loss = 0.004357850760744329
INFO:lightwood-46866:time series encoder epoch [32/100000] average_loss = 0.004315985190240961
INFO:lightwood-46866:time series encoder epoch [33/100000] average_loss = 0.00410254764975163
INFO:lightwood-46866:time series encoder epoch [34/100000] average_loss = 0.004112129096399274
INFO:lightwood-46866:time series encoder epoch [35/100000] average_loss = 0.004205447932084401
INFO:lightwood-46866:time series encoder epoch [36/100000] average_loss = 0.004242659451668723
INFO:lightwood-46866:time series encoder epoch [37/100000] average_loss = 0.0042895584252842685
INFO:lightwood-46866:time series encoder epoch [38/100000] average_loss = 0.00440603481572971
INFO:lightwood-46866:time series encoder epoch [39/100000] average_loss = 0.004132882597153647
INFO:lightwood-46866:time series encoder epoch [40/100000] average_loss = 0.0040611259769975094
INFO:lightwood-46866:time series encoder epoch [41/100000] average_loss = 0.00396897013772998
INFO:lightwood-46866:time series encoder epoch [42/100000] average_loss = 0.003915625183205856
INFO:lightwood-46866:time series encoder epoch [43/100000] average_loss = 0.003940282500626748
INFO:lightwood-46866:time series encoder epoch [44/100000] average_loss = 0.004178977953760247
INFO:lightwood-46866:Featurizing the data
INFO:lightwood-46866:Training the mixers
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/engine.py:151: UserWarning: Found `num_iterations` in params. Will use it instead of argument
warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))
WARNING:lightwood-46866:LightGBM running on CPU, this somewhat slower than the GPU version, consider using a GPU instead
WARNING:lightwood-46866:LightGBM running on CPU, this somewhat slower than the GPU version, consider using a GPU instead
WARNING:lightwood-46866:LightGBM running on CPU, this somewhat slower than the GPU version, consider using a GPU instead
WARNING:lightwood-46866:LightGBM running on CPU, this somewhat slower than the GPU version, consider using a GPU instead
WARNING:lightwood-46866:LightGBM running on CPU, this somewhat slower than the GPU version, consider using a GPU instead
WARNING:lightwood-46866:LightGBM running on CPU, this somewhat slower than the GPU version, consider using a GPU instead
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:116: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn("torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.")
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/pytorch_ranger/ranger.py:172: UserWarning: This overload of addcmul_ is deprecated:
addcmul_(Number value, Tensor tensor1, Tensor tensor2)
Consider using one of the following signatures instead:
addcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:1005.)
exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
INFO:lightwood-46866:Loss of 0.539688229560852 with learning rate 0.0001
INFO:lightwood-46866:Loss of 0.7796856760978699 with learning rate 0.00014
INFO:lightwood-46866:Found learning rate of: 0.0001
DEBUG:lightwood-46866:Loss @ epoch 1: 0.6908893585205078
DEBUG:lightwood-46866:Loss @ epoch 2: 0.6882499903440475
DEBUG:lightwood-46866:Loss @ epoch 3: 0.6850549429655075
DEBUG:lightwood-46866:Loss @ epoch 4: 0.6813623607158661
DEBUG:lightwood-46866:Loss @ epoch 5: 0.6772531270980835
DEBUG:lightwood-46866:Loss @ epoch 6: 0.6728083938360214
DEBUG:lightwood-46866:Loss @ epoch 7: 0.6652606427669525
DEBUG:lightwood-46866:Loss @ epoch 8: 0.6601350754499435
DEBUG:lightwood-46866:Loss @ epoch 9: 0.6548376232385635
DEBUG:lightwood-46866:Loss @ epoch 10: 0.6494599282741547
DEBUG:lightwood-46866:Loss @ epoch 11: 0.6441417187452316
DEBUG:lightwood-46866:Loss @ epoch 12: 0.6389893442392349
DEBUG:lightwood-46866:Loss @ epoch 13: 0.6309126764535904
DEBUG:lightwood-46866:Loss @ epoch 14: 0.6257634907960892
DEBUG:lightwood-46866:Loss @ epoch 15: 0.6205589026212692
DEBUG:lightwood-46866:Loss @ epoch 16: 0.6152833849191666
DEBUG:lightwood-46866:Loss @ epoch 17: 0.6099573820829391
DEBUG:lightwood-46866:Loss @ epoch 18: 0.6046575754880905
DEBUG:lightwood-46866:Loss @ epoch 19: 0.5962131917476654
DEBUG:lightwood-46866:Loss @ epoch 20: 0.5909084677696228
DEBUG:lightwood-46866:Loss @ epoch 21: 0.5856661349534988
DEBUG:lightwood-46866:Loss @ epoch 22: 0.5805662572383881
DEBUG:lightwood-46866:Loss @ epoch 23: 0.575617328286171
DEBUG:lightwood-46866:Loss @ epoch 24: 0.5707968175411224
DEBUG:lightwood-46866:Loss @ epoch 25: 0.5632813721895218
DEBUG:lightwood-46866:Loss @ epoch 26: 0.5587586611509323
DEBUG:lightwood-46866:Loss @ epoch 27: 0.554344117641449
DEBUG:lightwood-46866:Loss @ epoch 28: 0.5499386340379715
DEBUG:lightwood-46866:Loss @ epoch 29: 0.5455891937017441
DEBUG:lightwood-46866:Loss @ epoch 30: 0.5413248538970947
DEBUG:lightwood-46866:Loss @ epoch 31: 0.5345934927463531
DEBUG:lightwood-46866:Loss @ epoch 32: 0.5304456949234009
DEBUG:lightwood-46866:Loss @ epoch 33: 0.526373103260994
DEBUG:lightwood-46866:Loss @ epoch 34: 0.5223924517631531
DEBUG:lightwood-46866:Loss @ epoch 35: 0.5184392035007477
DEBUG:lightwood-46866:Loss @ epoch 36: 0.5145991444587708
DEBUG:lightwood-46866:Loss @ epoch 37: 0.5086493343114853
DEBUG:lightwood-46866:Loss @ epoch 38: 0.5050476491451263
DEBUG:lightwood-46866:Loss @ epoch 39: 0.5015637576580048
DEBUG:lightwood-46866:Loss @ epoch 40: 0.49815742671489716
DEBUG:lightwood-46866:Loss @ epoch 41: 0.4948585033416748
DEBUG:lightwood-46866:Loss @ epoch 42: 0.49173182249069214
DEBUG:lightwood-46866:Loss @ epoch 43: 0.48690974712371826
DEBUG:lightwood-46866:Loss @ epoch 44: 0.4839773178100586
DEBUG:lightwood-46866:Loss @ epoch 45: 0.4811210632324219
DEBUG:lightwood-46866:Loss @ epoch 46: 0.4783552885055542
DEBUG:lightwood-46866:Loss @ epoch 47: 0.4757150560617447
DEBUG:lightwood-46866:Loss @ epoch 48: 0.47318898141384125
DEBUG:lightwood-46866:Loss @ epoch 49: 0.46942955255508423
DEBUG:lightwood-46866:Loss @ epoch 50: 0.4671967923641205
DEBUG:lightwood-46866:Loss @ epoch 51: 0.4650762975215912
DEBUG:lightwood-46866:Loss @ epoch 52: 0.4630257934331894
DEBUG:lightwood-46866:Loss @ epoch 53: 0.46110378205776215
DEBUG:lightwood-46866:Loss @ epoch 54: 0.45930930972099304
DEBUG:lightwood-46866:Loss @ epoch 55: 0.45666399598121643
DEBUG:lightwood-46866:Loss @ epoch 56: 0.4550795406103134
DEBUG:lightwood-46866:Loss @ epoch 57: 0.4535674601793289
DEBUG:lightwood-46866:Loss @ epoch 58: 0.45216208696365356
DEBUG:lightwood-46866:Loss @ epoch 59: 0.45088090002536774
DEBUG:lightwood-46866:Loss @ epoch 60: 0.4496418982744217
DEBUG:lightwood-46866:Loss @ epoch 61: 0.4477883279323578
DEBUG:lightwood-46866:Loss @ epoch 62: 0.4467353969812393
DEBUG:lightwood-46866:Loss @ epoch 63: 0.4457828402519226
DEBUG:lightwood-46866:Loss @ epoch 64: 0.4448719322681427
DEBUG:lightwood-46866:Loss @ epoch 65: 0.44403648376464844
DEBUG:lightwood-46866:Loss @ epoch 66: 0.44328153133392334
DEBUG:lightwood-46866:Loss @ epoch 67: 0.44207488000392914
DEBUG:lightwood-46866:Loss @ epoch 68: 0.4413738548755646
DEBUG:lightwood-46866:Loss @ epoch 69: 0.44084450602531433
DEBUG:lightwood-46866:Loss @ epoch 70: 0.4403578191995621
DEBUG:lightwood-46866:Loss @ epoch 71: 0.4398685395717621
DEBUG:lightwood-46866:Loss @ epoch 72: 0.43935835361480713
DEBUG:lightwood-46866:Loss @ epoch 73: 0.43840254843235016
DEBUG:lightwood-46866:Loss @ epoch 74: 0.4378361850976944
DEBUG:lightwood-46866:Loss @ epoch 75: 0.4375789165496826
DEBUG:lightwood-46866:Loss @ epoch 76: 0.43739429116249084
DEBUG:lightwood-46866:Loss @ epoch 77: 0.4372607320547104
DEBUG:lightwood-46866:Loss @ epoch 78: 0.43708017468452454
DEBUG:lightwood-46866:Loss @ epoch 79: 0.4364318400621414
DEBUG:lightwood-46866:Loss @ epoch 80: 0.43584632873535156
DEBUG:lightwood-46866:Loss @ epoch 81: 0.4356466382741928
DEBUG:lightwood-46866:Loss @ epoch 82: 0.4355204701423645
DEBUG:lightwood-46866:Loss @ epoch 83: 0.43557313084602356
DEBUG:lightwood-46866:Loss @ epoch 84: 0.43554021418094635
DEBUG:lightwood-46866:Loss @ epoch 85: 0.43514105677604675
DEBUG:lightwood-46866:Loss @ epoch 86: 0.43462760746479034
DEBUG:lightwood-46866:Loss @ epoch 87: 0.43442972004413605
DEBUG:lightwood-46866:Loss @ epoch 88: 0.43443459272384644
DEBUG:lightwood-46866:Loss @ epoch 89: 0.4344787895679474
DEBUG:lightwood-46866:Loss @ epoch 90: 0.4345344454050064
DEBUG:lightwood-46866:Loss @ epoch 1: 0.329136921600862
DEBUG:lightwood-46866:Loss @ epoch 2: 0.3284675722772425
DEBUG:lightwood-46866:Loss @ epoch 3: 0.33007449995387683
DEBUG:lightwood-46866:Loss @ epoch 4: 0.32765168764374475
DEBUG:lightwood-46866:Loss @ epoch 5: 0.3260806582190774
DEBUG:lightwood-46866:Loss @ epoch 6: 0.3272357068278573
DEBUG:lightwood-46866:Loss @ epoch 7: 0.3281749730760401
INFO:lightwood-46866:Started fitting LGBM models for array prediction
INFO:lightwood-46866:Started fitting LGBM model
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/engine.py:151: UserWarning: Found `num_iterations` in params. Will use it instead of argument
warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))
INFO:lightwood-46866:A single GBM iteration takes 0.1 seconds
INFO:lightwood-46866:Training GBM (<module 'lightgbm' from '/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/__init__.py'>) with 1325 iterations given 165.66666666666666 seconds constraint
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/engine.py:156: UserWarning: Found `early_stopping_rounds` in params. Will use it instead of argument
warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))
INFO:lightwood-46866:Lightgbm model contains 1 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 2 weak estimators
INFO:lightwood-46866:Started fitting LGBM model
INFO:lightwood-46866:A single GBM iteration takes 0.1 seconds
INFO:lightwood-46866:Training GBM (<module 'lightgbm' from '/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/__init__.py'>) with 1325 iterations given 165.66666666666666 seconds constraint
INFO:lightwood-46866:Lightgbm model contains 1 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 2 weak estimators
INFO:lightwood-46866:Started fitting LGBM model
INFO:lightwood-46866:A single GBM iteration takes 0.1 seconds
INFO:lightwood-46866:Training GBM (<module 'lightgbm' from '/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/__init__.py'>) with 1325 iterations given 165.66666666666666 seconds constraint
INFO:lightwood-46866:Lightgbm model contains 1 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 2 weak estimators
INFO:lightwood-46866:Started fitting LGBM model
INFO:lightwood-46866:A single GBM iteration takes 0.1 seconds
INFO:lightwood-46866:Training GBM (<module 'lightgbm' from '/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/__init__.py'>) with 1325 iterations given 165.66666666666666 seconds constraint
INFO:lightwood-46866:Lightgbm model contains 1 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 2 weak estimators
INFO:lightwood-46866:Started fitting LGBM model
INFO:lightwood-46866:A single GBM iteration takes 0.1 seconds
INFO:lightwood-46866:Training GBM (<module 'lightgbm' from '/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/__init__.py'>) with 1325 iterations given 165.66666666666666 seconds constraint
INFO:lightwood-46866:Lightgbm model contains 1 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 2 weak estimators
INFO:lightwood-46866:Started fitting LGBM model
INFO:lightwood-46866:A single GBM iteration takes 0.1 seconds
INFO:lightwood-46866:Training GBM (<module 'lightgbm' from '/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/__init__.py'>) with 1325 iterations given 165.66666666666666 seconds constraint
INFO:lightwood-46866:Lightgbm model contains 1 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 2 weak estimators
INFO:lightwood-46866:Ensembling the mixer
INFO:lightwood-46866:Mixer: Neural got accuracy: 0.19612012470445245
INFO:lightwood-46866:Mixer: LightGBMArray got accuracy: 0.21013741093675975
INFO:lightwood-46866:Picked best mixer: LightGBMArray
INFO:lightwood-46866:Analyzing the ensemble of mixers
INFO:lightwood-46866:Adjustment on validation requested.
INFO:lightwood-46866:Updating the mixers
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:116: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn("torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.")
DEBUG:lightwood-46866:Loss @ epoch 1: 0.33339183280865353
DEBUG:lightwood-46866:Loss @ epoch 2: 0.3303144524494807
DEBUG:lightwood-46866:Loss @ epoch 3: 0.330986554423968
DEBUG:lightwood-46866:Loss @ epoch 4: 0.3315189927816391
DEBUG:lightwood-46866:Loss @ epoch 5: 0.33072087665398914
DEBUG:lightwood-46866:Loss @ epoch 6: 0.33309372514486313
INFO:lightwood-46866:Updating array of LGBM models...
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/engine.py:151: UserWarning: Found `num_iterations` in params. Will use it instead of argument
warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/lightgbm/engine.py:156: UserWarning: Found `early_stopping_rounds` in params. Will use it instead of argument
warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))
INFO:lightwood-46866:Model now has a total of 3 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 3 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 3 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 3 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 3 weak estimators
INFO:lightwood-46866:Updating lightgbm model with 1 iterations
INFO:lightwood-46866:Model now has a total of 3 weak estimators
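Before moving on, it is usually worth persisting the trained predictor so you don’t have to retrain it every session. A minimal sketch, assuming your Lightwood version exposes the `save` method and the `predictor_from_state` helper (check your version’s API reference):

predictor.save('./sunspots_predictor.pickle')  # example file name

# later, rebuild it without retraining (requires the generated `code` string):
# from lightwood.api.high_level import predictor_from_state
# predictor = predictor_from_state('./sunspots_predictor.pickle', code)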
Predict¶
Once the predictor has finished training, we can use it to generate 6-month forecasts for each row of the test set:
[28]:
forecasts = predictor.predict(test)
INFO:lightwood-46866:Dropping features: []
INFO:lightwood-46866:Cleaning the data
INFO:lightwood-46866:Transforming timeseries data
INFO:lightwood-46866:Featurizing the data
/Users/Pato/Work/MindsDB/env/lib/python3.8/site-packages/pandas/core/indexing.py:1637: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)
INFO:lightwood-46866:AccStats.explain() has not been implemented, no modifications will be done to the data insights.
INFO:lightwood-46866:GlobalFeatureImportance.explain() has not been implemented, no modifications will be done to the data insights.
Let’s check how a single row might look:
[37]:
forecasts.iloc[[10]]
[37]:
|    | prediction | truth | order_Month | confidence | lower | upper | anomaly |
| -- | ---------- | ----- | ----------- | ---------- | ----- | ----- | ------- |
| 10 | [51.28799878891615, 46.76867159945164, 52.0899... | 51.0 | [-272577600.0, -269899200.0, -267220800.0, -26... | [0.24, 0.24, 0.24, 0.24, 0.24, 0.24] | [30.80746268275371, 26.288135493289204, 31.609... | [71.76853489507859, 67.24920770561408, 72.5704... | False |
You’ll note that the point `prediction` has associated `lower` and `upper` bounds, which are a function of the estimated `confidence` the model has in its own output. Apart from this, `order_Month` yields the timestamps of each prediction, and `truth` shows the observed value one step ahead (if it exists at all). Finally, the `anomaly` tag lets you know whether the observed value falls outside of the predicted region.
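To get a rough feel for accuracy, you could compare the first element of each forecast against the observed one-step-ahead value, for example with a mean absolute error (a minimal sketch; Lightwood already computes its own accuracy metric internally):

import numpy as np
import pandas as pd

one_step = forecasts['prediction'].apply(lambda p: p[0])    # first element of each 6-step forecast
truth = pd.to_numeric(forecasts['truth'], errors='coerce')  # observed one-step-ahead values
mask = truth.notna()
mae = np.mean(np.abs(one_step[mask] - truth[mask]))
print(f'one-step-ahead MAE: {mae:.2f} sunspots')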
Visualizing a forecast¶
Okay, time series are much easier to appreciate through plots. Let’s make one:
NOTE: We will use `matplotlib` to generate a simple plot of these forecasts. If you want to run this notebook locally, you will need to `pip install matplotlib` for the following code to work.
[38]:
import matplotlib.pyplot as plt
[69]:
plt.figure(figsize=(12, 8))

# last two years of observed values
plt.plot(forecasts['truth'].iloc[-24:], color='green', label='observed series')

# pad with Nones so the 6-month forecast emitted at the final test row
# starts right where the observed series ends
offset = [None for _ in range(forecasts.shape[0])]
plt.plot(offset + forecasts.iloc[-1]['prediction'], color='purple', label='point prediction')
plt.plot(offset + forecasts.iloc[-1]['lower'], color='grey')
plt.plot(offset + forecasts.iloc[-1]['upper'], color='grey')

plt.xlabel('timestep')
plt.ylabel('# sunspots')
plt.title("Forecasted amount of sunspots for the next semester")
plt.legend()
plt.show()

Conclusion¶
In this tutorial, we have walked through how to train a machine learning model with Lightwood to produce forecasts for a univariate time series task.
There are additional parameters to further customize your time series settings and/or prediction insights, so be sure to check the rest of the documentation.