Experiment of CO2 measurements in Mauna Loa, Hawaii, using a single output Gaussian process with a spectral mixture kernel. The data set contains daily measurements of CO2 in the air from 1958 to 2001. We will resample the data to obtain 10 averaged samples per year. That is, any yearly pattern will be at $\frac{1}{10} = 0.1$.
import mogptk
import numpy as np
from sklearn.datasets import fetch_openml
def load_mauna_loa_atmospheric_co2():
ml_data = fetch_openml(data_id=41187)
months = []
ppmv_sums = []
counts = []
y = ml_data.data['year']
m = ml_data.data['month']
month_float = y + (m - 1) / 12
ppmvs = ml_data.target
for month, ppmv in zip(month_float, ppmvs):
if not months or month != months[-1]:
months.append(month)
ppmv_sums.append(ppmv)
counts.append(1)
else:
# aggregate monthly sum to produce average
ppmv_sums[-1] += ppmv
counts[-1] += 1
months = np.asarray(months).reshape(-1)
avg_ppmvs = np.asarray(ppmv_sums) / counts
return months, avg_ppmvs
First we load the dataset, and define a variable stop
with the index that separates between train and test
# load dataset
x, y = load_mauna_loa_atmospheric_co2()
# stop omde to separate train from test
stop = 200
data = mogptk.Data(x, y, name='Mauna Loa')
data.remove_range(start=x[stop])
data.transform(mogptk.TransformDetrend(3))
data.plot();
We initialize the model with random parameters and show the spectral density of the kernel. As we are only taking random values, there is no relation with the data.
# create model
model = mogptk.SM(data, Q=3)
model.plot_spectrum(title='SD with random parameters');
Then we initialize the parameters before training, using Bayesian Nonparametric spectral estimation (BNSE) (Tobar 2017), and use the estimated Power spectral density (PSD) to define initial spectral mean and magnitudes.
method = 'BNSE'
model.init_parameters(method)
model.plot_spectrum(title='PSD with {} initialization'.format(method));
Then we train the model and show the power spectral density of the trained model.
model.log_marginal_likelihood()
-244.87747801589666
model.train(method='Adam', iters=500, lr=0.05, plot=True, error='MAE', verbose=True)
model.plot_spectrum(title='PSD with model trained');
Starting optimization using Adam ‣ Model: SM ‣ Channels: 1 ‣ Parameters: 10 ‣ Training points: 201 ‣ Initial loss: 244.877 ‣ Initial error: 1.4977 Start Adam: 0/500 0:00:00 loss= 244.877 error= 1.4977 5/500 0:00:00 loss= 248.806 error= 1.63836 10/500 0:00:00 loss= 241.626 error= 1.6542 15/500 0:00:00 loss= 234.74 error= 1.49014 20/500 0:00:00 loss= 233.613 error= 1.50605 25/500 0:00:01 loss= 230.106 error= 1.54659 30/500 0:00:01 loss= 227.66 error= 1.52707 35/500 0:00:02 loss= 225.626 error= 1.45765 40/500 0:00:02 loss= 223.288 error= 1.47511 45/500 0:00:02 loss= 221.532 error= 1.50617 50/500 0:00:03 loss= 219.716 error= 1.45195 55/500 0:00:03 loss= 218.113 error= 1.45226 60/500 0:00:03 loss= 216.701 error= 1.47307 65/500 0:00:03 loss= 215.391 error= 1.43962 70/500 0:00:03 loss= 214.239 error= 1.43845 75/500 0:00:03 loss= 213.229 error= 1.44502 80/500 0:00:04 loss= 212.339 error= 1.42374 85/500 0:00:04 loss= 211.565 error= 1.42493 90/500 0:00:04 loss= 210.903 error= 1.42007 95/500 0:00:04 loss= 210.339 error= 1.40785 100/500 0:00:04 loss= 209.862 error= 1.40813 105/500 0:00:04 loss= 209.461 error= 1.39803 110/500 0:00:04 loss= 209.126 error= 1.39287 115/500 0:00:05 loss= 208.847 error= 1.38807 120/500 0:00:05 loss= 208.615 error= 1.37975 125/500 0:00:05 loss= 208.421 error= 1.37582 130/500 0:00:05 loss= 208.258 error= 1.36827 135/500 0:00:05 loss= 208.12 error= 1.36311 140/500 0:00:05 loss= 207.999 error= 1.35681 145/500 0:00:05 loss= 207.893 error= 1.35064 150/500 0:00:06 loss= 207.798 error= 1.34485 155/500 0:00:06 loss= 207.71 error= 1.33825 160/500 0:00:06 loss= 207.627 error= 1.33235 165/500 0:00:06 loss= 207.548 error= 1.32587 170/500 0:00:06 loss= 207.471 error= 1.3202 175/500 0:00:06 loss= 207.396 error= 1.31407 180/500 0:00:06 loss= 207.322 error= 1.30824 185/500 0:00:06 loss= 207.249 error= 1.30231 190/500 0:00:07 loss= 207.176 error= 1.29668 195/500 0:00:07 loss= 207.103 error= 1.29093 200/500 0:00:07 loss= 207.03 error= 1.28543 205/500 0:00:07 loss= 206.958 error= 1.27999 210/500 0:00:07 loss= 206.885 error= 1.27471 215/500 0:00:07 loss= 206.812 error= 1.26961 220/500 0:00:07 loss= 206.74 error= 1.26466 225/500 0:00:08 loss= 206.667 error= 1.25979 230/500 0:00:08 loss= 206.594 error= 1.25511 235/500 0:00:08 loss= 206.52 error= 1.25053 240/500 0:00:08 loss= 206.447 error= 1.24612 245/500 0:00:08 loss= 206.373 error= 1.24184 250/500 0:00:09 loss= 206.3 error= 1.23772 255/500 0:00:09 loss= 206.226 error= 1.23378 260/500 0:00:09 loss= 206.152 error= 1.23 265/500 0:00:09 loss= 206.077 error= 1.22637 270/500 0:00:09 loss= 206.003 error= 1.22289 275/500 0:00:09 loss= 205.929 error= 1.21957 280/500 0:00:09 loss= 205.854 error= 1.2164 285/500 0:00:09 loss= 205.78 error= 1.21337 290/500 0:00:10 loss= 205.706 error= 1.2105 295/500 0:00:10 loss= 205.631 error= 1.20777 300/500 0:00:10 loss= 205.557 error= 1.20518 305/500 0:00:10 loss= 205.483 error= 1.20272 310/500 0:00:10 loss= 205.409 error= 1.20044 315/500 0:00:10 loss= 205.335 error= 1.19832 320/500 0:00:10 loss= 205.262 error= 1.19636 325/500 0:00:10 loss= 205.189 error= 1.19453 330/500 0:00:10 loss= 205.117 error= 1.19284 335/500 0:00:11 loss= 205.045 error= 1.19125 340/500 0:00:11 loss= 204.973 error= 1.18975 345/500 0:00:11 loss= 204.903 error= 1.18832 350/500 0:00:11 loss= 204.832 error= 1.18696 355/500 0:00:11 loss= 204.763 error= 1.18565 360/500 0:00:11 loss= 204.694 error= 1.1844 365/500 0:00:11 loss= 204.626 error= 1.18319 370/500 0:00:11 loss= 204.56 error= 1.182 375/500 0:00:12 loss= 204.494 error= 1.18085 380/500 0:00:12 loss= 204.429 error= 1.17973 385/500 0:00:12 loss= 204.365 error= 1.17862 390/500 0:00:12 loss= 204.302 error= 1.17751 395/500 0:00:12 loss= 204.24 error= 1.17642 400/500 0:00:12 loss= 204.18 error= 1.17536 405/500 0:00:12 loss= 204.12 error= 1.17429 410/500 0:00:13 loss= 204.062 error= 1.17321 415/500 0:00:13 loss= 204.005 error= 1.1721 420/500 0:00:13 loss= 203.95 error= 1.17097 425/500 0:00:13 loss= 203.895 error= 1.16984 430/500 0:00:13 loss= 203.842 error= 1.16867 435/500 0:00:14 loss= 203.791 error= 1.16746 440/500 0:00:14 loss= 203.741 error= 1.16621 445/500 0:00:14 loss= 203.692 error= 1.16491 450/500 0:00:14 loss= 203.644 error= 1.16356 455/500 0:00:14 loss= 203.598 error= 1.16216 460/500 0:00:15 loss= 203.553 error= 1.16073 465/500 0:00:15 loss= 203.509 error= 1.15926 470/500 0:00:15 loss= 203.467 error= 1.15777 475/500 0:00:15 loss= 203.426 error= 1.15625 480/500 0:00:15 loss= 203.386 error= 1.15471 485/500 0:00:15 loss= 203.348 error= 1.15312 490/500 0:00:16 loss= 203.311 error= 1.15148 495/500 0:00:16 loss= 203.275 error= 1.14979 500/500 0:00:16 loss= 203.24 error= 1.14805 Finished Optimization finished in 16.388 seconds ‣ Iterations: 500 ‣ Final loss: 203.24 ‣ Final error: 1.14805
Lastly we predict in the test set.
model.plot_prediction();