Mauna-Loa CO2 concentration example¶

Experiment of CO2 measurements in Mauna Loa, Hawaii, using a single output Gaussian process with a spectral mixture kernel. The data set contains daily measurements of CO2 in the air from 1958 to 2001. We will resample the data to obtain 10 averaged samples per year. That is, any yearly pattern will be at $\frac{1}{10} = 0.1$.

In [1]:
import mogptk
import numpy as np
In [2]:
from sklearn.datasets import fetch_openml

def load_mauna_loa_atmospheric_co2():
    ml_data = fetch_openml(data_id=41187)
    months = []
    ppmv_sums = []
    counts = []

    y = ml_data.data['year']
    m = ml_data.data['month']
    month_float = y + (m - 1) / 12
    ppmvs = ml_data.target

    for month, ppmv in zip(month_float, ppmvs):
        if not months or month != months[-1]:
            months.append(month)
            ppmv_sums.append(ppmv)
            counts.append(1)
        else:
            # aggregate monthly sum to produce average
            ppmv_sums[-1] += ppmv
            counts[-1] += 1

    months = np.asarray(months).reshape(-1)
    avg_ppmvs = np.asarray(ppmv_sums) / counts
    return months, avg_ppmvs

First we load the dataset, and define a variable stop with the index that separates between train and test

In [3]:
# load dataset
x, y = load_mauna_loa_atmospheric_co2()

# stop omde to separate train from test
stop = 200

data = mogptk.Data(x, y, name='Mauna Loa')
data.remove_range(start=x[stop])
data.transform(mogptk.TransformDetrend(3))
data.plot();

We initialize the model with random parameters and show the spectral density of the kernel. As we are only taking random values, there is no relation with the data.

In [4]:
# create model
model = mogptk.SM(data, Q=3)
model.plot_spectrum(title='SD with random parameters');

Then we initialize the parameters before training, using Bayesian Nonparametric spectral estimation (BNSE) (Tobar 2017), and use the estimated Power spectral density (PSD) to define initial spectral mean and magnitudes.

In [5]:
method = 'BNSE'
model.init_parameters(method)
model.plot_spectrum(title='PSD with {} initialization'.format(method));

Then we train the model and show the power spectral density of the trained model.

In [6]:
model.log_marginal_likelihood()
Out[6]:
-244.87747801589666
In [7]:
model.train(method='Adam', iters=500, lr=0.05, plot=True, error='MAE', verbose=True)
model.plot_spectrum(title='PSD with model trained');
Starting optimization using Adam
‣ Model: SM
‣ Channels: 1
‣ Parameters: 10
‣ Training points: 201
‣ Initial loss: 244.877
‣ Initial error: 1.4977

Start Adam:
    0/500   0:00:00  loss=     244.877  error=      1.4977
    5/500   0:00:00  loss=     248.806  error=     1.63836
   10/500   0:00:00  loss=     241.626  error=      1.6542
   15/500   0:00:00  loss=      234.74  error=     1.49014
   20/500   0:00:00  loss=     233.613  error=     1.50605
   25/500   0:00:01  loss=     230.106  error=     1.54659
   30/500   0:00:01  loss=      227.66  error=     1.52707
   35/500   0:00:02  loss=     225.626  error=     1.45765
   40/500   0:00:02  loss=     223.288  error=     1.47511
   45/500   0:00:02  loss=     221.532  error=     1.50617
   50/500   0:00:03  loss=     219.716  error=     1.45195
   55/500   0:00:03  loss=     218.113  error=     1.45226
   60/500   0:00:03  loss=     216.701  error=     1.47307
   65/500   0:00:03  loss=     215.391  error=     1.43962
   70/500   0:00:03  loss=     214.239  error=     1.43845
   75/500   0:00:03  loss=     213.229  error=     1.44502
   80/500   0:00:04  loss=     212.339  error=     1.42374
   85/500   0:00:04  loss=     211.565  error=     1.42493
   90/500   0:00:04  loss=     210.903  error=     1.42007
   95/500   0:00:04  loss=     210.339  error=     1.40785
  100/500   0:00:04  loss=     209.862  error=     1.40813
  105/500   0:00:04  loss=     209.461  error=     1.39803
  110/500   0:00:04  loss=     209.126  error=     1.39287
  115/500   0:00:05  loss=     208.847  error=     1.38807
  120/500   0:00:05  loss=     208.615  error=     1.37975
  125/500   0:00:05  loss=     208.421  error=     1.37582
  130/500   0:00:05  loss=     208.258  error=     1.36827
  135/500   0:00:05  loss=      208.12  error=     1.36311
  140/500   0:00:05  loss=     207.999  error=     1.35681
  145/500   0:00:05  loss=     207.893  error=     1.35064
  150/500   0:00:06  loss=     207.798  error=     1.34485
  155/500   0:00:06  loss=      207.71  error=     1.33825
  160/500   0:00:06  loss=     207.627  error=     1.33235
  165/500   0:00:06  loss=     207.548  error=     1.32587
  170/500   0:00:06  loss=     207.471  error=      1.3202
  175/500   0:00:06  loss=     207.396  error=     1.31407
  180/500   0:00:06  loss=     207.322  error=     1.30824
  185/500   0:00:06  loss=     207.249  error=     1.30231
  190/500   0:00:07  loss=     207.176  error=     1.29668
  195/500   0:00:07  loss=     207.103  error=     1.29093
  200/500   0:00:07  loss=      207.03  error=     1.28543
  205/500   0:00:07  loss=     206.958  error=     1.27999
  210/500   0:00:07  loss=     206.885  error=     1.27471
  215/500   0:00:07  loss=     206.812  error=     1.26961
  220/500   0:00:07  loss=      206.74  error=     1.26466
  225/500   0:00:08  loss=     206.667  error=     1.25979
  230/500   0:00:08  loss=     206.594  error=     1.25511
  235/500   0:00:08  loss=      206.52  error=     1.25053
  240/500   0:00:08  loss=     206.447  error=     1.24612
  245/500   0:00:08  loss=     206.373  error=     1.24184
  250/500   0:00:09  loss=       206.3  error=     1.23772
  255/500   0:00:09  loss=     206.226  error=     1.23378
  260/500   0:00:09  loss=     206.152  error=        1.23
  265/500   0:00:09  loss=     206.077  error=     1.22637
  270/500   0:00:09  loss=     206.003  error=     1.22289
  275/500   0:00:09  loss=     205.929  error=     1.21957
  280/500   0:00:09  loss=     205.854  error=      1.2164
  285/500   0:00:09  loss=      205.78  error=     1.21337
  290/500   0:00:10  loss=     205.706  error=      1.2105
  295/500   0:00:10  loss=     205.631  error=     1.20777
  300/500   0:00:10  loss=     205.557  error=     1.20518
  305/500   0:00:10  loss=     205.483  error=     1.20272
  310/500   0:00:10  loss=     205.409  error=     1.20044
  315/500   0:00:10  loss=     205.335  error=     1.19832
  320/500   0:00:10  loss=     205.262  error=     1.19636
  325/500   0:00:10  loss=     205.189  error=     1.19453
  330/500   0:00:10  loss=     205.117  error=     1.19284
  335/500   0:00:11  loss=     205.045  error=     1.19125
  340/500   0:00:11  loss=     204.973  error=     1.18975
  345/500   0:00:11  loss=     204.903  error=     1.18832
  350/500   0:00:11  loss=     204.832  error=     1.18696
  355/500   0:00:11  loss=     204.763  error=     1.18565
  360/500   0:00:11  loss=     204.694  error=      1.1844
  365/500   0:00:11  loss=     204.626  error=     1.18319
  370/500   0:00:11  loss=      204.56  error=       1.182
  375/500   0:00:12  loss=     204.494  error=     1.18085
  380/500   0:00:12  loss=     204.429  error=     1.17973
  385/500   0:00:12  loss=     204.365  error=     1.17862
  390/500   0:00:12  loss=     204.302  error=     1.17751
  395/500   0:00:12  loss=      204.24  error=     1.17642
  400/500   0:00:12  loss=      204.18  error=     1.17536
  405/500   0:00:12  loss=      204.12  error=     1.17429
  410/500   0:00:13  loss=     204.062  error=     1.17321
  415/500   0:00:13  loss=     204.005  error=      1.1721
  420/500   0:00:13  loss=      203.95  error=     1.17097
  425/500   0:00:13  loss=     203.895  error=     1.16984
  430/500   0:00:13  loss=     203.842  error=     1.16867
  435/500   0:00:14  loss=     203.791  error=     1.16746
  440/500   0:00:14  loss=     203.741  error=     1.16621
  445/500   0:00:14  loss=     203.692  error=     1.16491
  450/500   0:00:14  loss=     203.644  error=     1.16356
  455/500   0:00:14  loss=     203.598  error=     1.16216
  460/500   0:00:15  loss=     203.553  error=     1.16073
  465/500   0:00:15  loss=     203.509  error=     1.15926
  470/500   0:00:15  loss=     203.467  error=     1.15777
  475/500   0:00:15  loss=     203.426  error=     1.15625
  480/500   0:00:15  loss=     203.386  error=     1.15471
  485/500   0:00:15  loss=     203.348  error=     1.15312
  490/500   0:00:16  loss=     203.311  error=     1.15148
  495/500   0:00:16  loss=     203.275  error=     1.14979
  500/500   0:00:16  loss=      203.24  error=     1.14805
Finished

Optimization finished in 16.388 seconds
‣ Iterations: 500
‣ Final loss: 203.24
‣ Final error: 1.14805

Lastly we predict in the test set.

In [8]:
model.plot_prediction();
In [ ]: