Bramblemet water level example¶

The goal of this notebook is to compare different multi-output Gaussian process models, such as the MOSM, CSM, SM-LMC, and the CONV. The dataset consists of four weather stations in south England, each registering since April 2012 for every 5 minutes the following statistics:

  • WSPD: wind speed in knots
  • WD: wind direction in degrees
  • GST: maximum gust in knots
  • ATMP: air temperature in degrees Celsius
  • WTMP: water temperature in degrees Celsius
  • BARO: barometric pressure in millibars
  • DEPTH: water depth in metres
  • VIS: visibility

The four stations have the following locations: Bramblemet

We replicate a similar experiment by Álvarez and Lawrence (https://papers.nips.cc/paper/3553-sparse-convolved-gaussian-processes-for-multi-output-regression), in which we train a multi-output Gaussian process on the Tidal Height data of the four sensors.

In [1]:
import numpy as np
import pandas as pd
import mogptk

We will load the four data series and combine their date and time fields. Additionally, we select a single week in June 2020.

In [2]:
start_date = '2020-06-01'
end_date = '2020-06-08'

print('Loading Bramble')
bramblemet = pd.read_csv('data/bramblemet/bramblemet.csv.gz', compression='gzip', low_memory=False)
bramblemet['Date'] = pd.to_datetime(bramblemet['Date'] + ' ' + bramblemet['Time'], format='%d/%m/%Y %H:%M')
bramblemet = bramblemet.drop(columns=['Time'])
bramblemet = bramblemet.loc[(bramblemet['Date'] >= start_date) & (bramblemet['Date'] < end_date)]

print('Loading Camber')
cambermet = pd.read_csv('data/bramblemet/cambermet.csv.gz', compression='gzip', low_memory=False)
cambermet['Date'] = pd.to_datetime(cambermet['Date'] + ' ' + cambermet['Time'], format='%d/%m/%Y %H:%M')
cambermet = cambermet.drop(columns=['Time'])
cambermet = cambermet.loc[(cambermet['Date'] >= start_date) & (cambermet['Date'] < end_date)]

print('Loading Chi')
chimet = pd.read_csv('data/bramblemet/chimet.csv.gz', compression='gzip', low_memory=False)
chimet['Date'] = pd.to_datetime(chimet['Date'] + ' ' + chimet['Time'], format='%d/%m/%Y %H:%M')
chimet = chimet.drop(columns=['Time'])
chimet = chimet.loc[(chimet['Date'] >= start_date) & (chimet['Date'] < end_date)]

print('Loading Soton')
sotonmet = pd.read_csv('data/bramblemet/sotonmet.csv.gz', compression='gzip', low_memory=False)
sotonmet['Date'] = pd.to_datetime(sotonmet['Date'] + ' ' + sotonmet['Time'], format='%d/%m/%Y %H:%M')
sotonmet = sotonmet.drop(columns=['Time'])
sotonmet = sotonmet.loc[(sotonmet['Date'] >= start_date) & (sotonmet['Date'] < end_date)]

bramblemet.head(5)
Loading Bramble
Loading Camber
Loading Chi
Loading Soton
Out[2]:
Date WSPD WD GST ATMP WTMP BARO DEPTH VIS
37972 2020-06-01 00:00:00 12.5 56 13.9 15.8 16.1 1021.9 1.91 27
37973 2020-06-01 00:05:00 11.5 56 13.0 15.7 16.3 1021.9 1.86 27
37974 2020-06-01 00:15:00 10.6 55 11.0 15.8 16.3 1021.8 1.80 27
37975 2020-06-01 00:20:00 9.5 47 10.3 15.7 16.4 1021.8 1.76 27
37976 2020-06-01 00:25:00 10.0 48 11.6 15.7 16.4 1021.7 1.75 27

We load the data series into MOGPTK and we detrend the data to improve trainability. We remove 90% of the data points at random so that training is quicker and required less memory. This has the added benefit that we can calculate the prediction error effectively for all the removed data points. In addition to removing data points at random, we also remove a few ranges in order to see how well data imputation for larger ranges works.

In [3]:
dataset = mogptk.DataSet(
    mogptk.LoadDataFrame(bramblemet, x_col='Date', y_col=['DEPTH'], name='Bramble'),
    mogptk.LoadDataFrame(cambermet, x_col='Date', y_col=['DEPTH'], name='Camber'),
    mogptk.LoadDataFrame(chimet, x_col='Date', y_col=['DEPTH'], name='Chi'),
    mogptk.LoadDataFrame(sotonmet, x_col='Date', y_col=['DEPTH'], name='Soton'),
)

for i, data in enumerate(dataset):
    data.transform(mogptk.TransformDetrend)
    data.remove_randomly(pct=0.9)

dataset['Bramble'].remove_range(start='2020-06-02', end='2020-06-03')
dataset['Camber'].remove_range(start='2020-06-06', end='2020-06-07')
dataset['Chi'].remove_range(start='2020-06-07', end='2020-06-08')
dataset['Soton'].remove_range(start='2020-06-04', end='2020-06-05')
In [4]:
dataset.plot(transformed=True);
In [5]:
dataset.plot_spectrum(per='day', maxfreq=5, transformed=True);

In the spectral information of our dataset we can observe that all channels have a periodicity of (almost) twice a day. This is true as we know that the ebb and flow tides have a period of about 12 hours and 25 minutes, so that we have slightly less than two full periods a day. $1.93$ times to be exact.

We will now train the MOSM, CSM, SM-LMC, and CONV kernels on this data set and compare their prediction errors. Training using the Adam optimizer with a relatively high learning rate has proven to give good results.

In [6]:
method = 'Adam'
lr = 0.02
iters = 500
In [7]:
mosm = mogptk.MOSM(dataset, Q=1)
mosm.init_parameters(method='LS')
mosm.train(method=method, lr=lr, iters=iters, verbose=True)
mosm.print_parameters()
mosm.plot_prediction(transformed=True);
Starting optimization using Adam
‣ Model: MOSM
‣ Channels: 4
‣ Parameters: 24
‣ Training points: 656
‣ Initial loss: 786.062

Start Adam:
    0/500   0:00:00  loss=     786.062
    5/500   0:00:00  loss=     795.188
   10/500   0:00:01  loss=         782
   15/500   0:00:01  loss=     771.757
   20/500   0:00:02  loss=     764.159
   25/500   0:00:02  loss=     756.437
   30/500   0:00:03  loss=     750.094
   35/500   0:00:03  loss=     743.911
   40/500   0:00:03  loss=      738.08
   45/500   0:00:04  loss=     732.187
   50/500   0:00:05  loss=     726.097
   55/500   0:00:05  loss=     720.219
   60/500   0:00:06  loss=     714.316
   65/500   0:00:06  loss=     708.415
   70/500   0:00:07  loss=     702.543
   75/500   0:00:07  loss=     696.688
   80/500   0:00:08  loss=     690.835
   85/500   0:00:08  loss=         685
   90/500   0:00:09  loss=     679.175
   95/500   0:00:09  loss=     673.362
  100/500   0:00:10  loss=     667.561
  105/500   0:00:10  loss=     661.773
  110/500   0:00:11  loss=     655.997
  115/500   0:00:11  loss=     650.234
  120/500   0:00:12  loss=     644.484
  125/500   0:00:12  loss=     638.747
  130/500   0:00:13  loss=     633.023
  135/500   0:00:13  loss=     627.313
  140/500   0:00:14  loss=     621.618
  145/500   0:00:14  loss=     615.937
  150/500   0:00:15  loss=      610.27
  155/500   0:00:15  loss=     604.619
  160/500   0:00:15  loss=     598.983
  165/500   0:00:16  loss=     593.362
  170/500   0:00:16  loss=     587.758
  175/500   0:00:17  loss=      582.17
  180/500   0:00:17  loss=     576.599
  185/500   0:00:18  loss=     571.045
  190/500   0:00:18  loss=     565.509
  195/500   0:00:19  loss=      559.99
  200/500   0:00:19  loss=      554.49
  205/500   0:00:20  loss=     549.008
  210/500   0:00:20  loss=     543.545
  215/500   0:00:21  loss=     538.102
  220/500   0:00:22  loss=     532.681
  225/500   0:00:22  loss=     528.741
  230/500   0:00:23  loss=     522.976
  235/500   0:00:23  loss=     516.926
  240/500   0:00:23  loss=     511.387
  245/500   0:00:24  loss=     506.179
  250/500   0:00:25  loss=     500.667
  255/500   0:00:25  loss=     496.508
  260/500   0:00:26  loss=      490.46
  265/500   0:00:26  loss=     485.053
  270/500   0:00:27  loss=     479.739
  275/500   0:00:28  loss=     474.699
  280/500   0:00:28  loss=     470.087
  285/500   0:00:29  loss=     465.042
  290/500   0:00:29  loss=     459.954
  295/500   0:00:30  loss=     454.478
  300/500   0:00:30  loss=     449.243
  305/500   0:00:31  loss=     444.437
  310/500   0:00:31  loss=     439.658
  315/500   0:00:32  loss=     435.318
  320/500   0:00:32  loss=     429.746
  325/500   0:00:33  loss=     425.007
  330/500   0:00:33  loss=     420.098
  335/500   0:00:34  loss=     415.259
  340/500   0:00:34  loss=     410.412
  345/500   0:00:35  loss=      405.69
  350/500   0:00:35  loss=     400.511
  355/500   0:00:36  loss=     396.208
  360/500   0:00:36  loss=     392.667
  365/500   0:00:37  loss=     387.523
  370/500   0:00:37  loss=     382.441
  375/500   0:00:38  loss=      377.95
  380/500   0:00:38  loss=     372.572
  385/500   0:00:38  loss=     368.444
  390/500   0:00:39  loss=     364.263
  395/500   0:00:39  loss=     359.892
  400/500   0:00:40  loss=      355.42
  405/500   0:00:41  loss=     351.035
  410/500   0:00:41  loss=     345.964
  415/500   0:00:42  loss=     342.307
  420/500   0:00:42  loss=     338.099
  425/500   0:00:43  loss=     333.791
  430/500   0:00:43  loss=     329.921
  435/500   0:00:44  loss=      325.76
  440/500   0:00:44  loss=     320.972
  445/500   0:00:45  loss=     317.583
  450/500   0:00:45  loss=     313.309
  455/500   0:00:46  loss=     310.912
  460/500   0:00:46  loss=     306.698
  465/500   0:00:47  loss=     301.783
  470/500   0:00:47  loss=      297.81
  475/500   0:00:48  loss=     294.537
  480/500   0:00:48  loss=     290.138
  485/500   0:00:49  loss=     287.786
  490/500   0:00:50  loss=     282.784
  495/500   0:00:50  loss=     278.182
  500/500   0:00:51  loss=     276.189
Finished

Optimization finished in 51.212 seconds
‣ Iterations: 500
‣ Final loss: 276.189
NameRangeValue
MOSM.weight[1e-08, ∞)[[21.98881847] [24.92294238] [24.905429 ] [23.23260914]]
MOSM.mean[1e-08, 0.1][[[0.00134441]] [[0.00134449]] [[0.00133982]] [[0.00134212]]]
MOSM.variance[1e-08, ∞)[[[1.e-08]] [[1.e-08]] [[1.e-08]] [[1.e-08]]]
MOSM.delay(-∞, ∞)[[[-0.00430478]] [[-0.03177763]] [[ 0.06000117]] [[-0.0458639 ]]]
MOSM.phase(-∞, ∞)[[ 0.00302489] [-0.01283903] [ 0.00893065] [-0.00023575]]
Gaussian.scale[1e-08, ∞)[0.46666492 0.5027038 0.49471458 0.51616945]
In [8]:
csm = mogptk.CSM(dataset, Q=2)
csm.init_parameters(method='LS')
csm.train(method=method, lr=lr, iters=iters, verbose=True)
csm.print_parameters()
csm.plot_prediction();
Starting optimization using Adam
‣ Model: CSM
‣ Channels: 4
‣ Parameters: 24
‣ Training points: 656
‣ Initial loss: 776.742

Start Adam:
    0/500   0:00:00  loss=     776.742
    5/500   0:00:00  loss=     772.852
   10/500   0:00:01  loss=      765.38
   15/500   0:00:02  loss=     758.795
   20/500   0:00:03  loss=     752.497
   25/500   0:00:03  loss=     746.605
   30/500   0:00:04  loss=     740.829
   35/500   0:00:05  loss=     734.988
   40/500   0:00:06  loss=     729.157
   45/500   0:00:07  loss=     723.358
   50/500   0:00:08  loss=     717.581
   55/500   0:00:09  loss=      711.82
   60/500   0:00:09  loss=     706.064
   65/500   0:00:10  loss=     700.321
   70/500   0:00:11  loss=     694.583
   75/500   0:00:12  loss=     688.855
   80/500   0:00:13  loss=     683.133
   85/500   0:00:14  loss=      677.42
   90/500   0:00:15  loss=     671.715
   95/500   0:00:16  loss=     666.019
  100/500   0:00:16  loss=     660.343
  105/500   0:00:17  loss=     654.681
  110/500   0:00:18  loss=     649.025
  115/500   0:00:19  loss=     643.336
  120/500   0:00:19  loss=     637.711
  125/500   0:00:20  loss=     632.071
  130/500   0:00:21  loss=     626.419
  135/500   0:00:22  loss=     620.829
  140/500   0:00:22  loss=     615.204
  145/500   0:00:23  loss=     609.636
  150/500   0:00:24  loss=     604.036
  155/500   0:00:25  loss=     598.504
  160/500   0:00:25  loss=     592.942
  165/500   0:00:26  loss=     587.368
  170/500   0:00:27  loss=     581.915
  175/500   0:00:27  loss=     576.365
  180/500   0:00:28  loss=     570.825
  185/500   0:00:29  loss=     565.415
  190/500   0:00:30  loss=     559.899
  195/500   0:00:30  loss=     554.416
  200/500   0:00:31  loss=     549.071
  205/500   0:00:32  loss=     543.581
  210/500   0:00:33  loss=     538.149
  215/500   0:00:34  loss=     532.843
  220/500   0:00:35  loss=     527.408
  225/500   0:00:35  loss=     522.036
  230/500   0:00:36  loss=     516.799
  235/500   0:00:37  loss=     511.401
  240/500   0:00:37  loss=     506.089
  245/500   0:00:38  loss=     500.865
  250/500   0:00:39  loss=     495.553
  255/500   0:00:40  loss=     490.333
  260/500   0:00:40  loss=     485.119
  265/500   0:00:41  loss=      479.94
  270/500   0:00:42  loss=     474.783
  275/500   0:00:43  loss=     469.586
  280/500   0:00:43  loss=      464.48
  285/500   0:00:44  loss=     459.373
  290/500   0:00:45  loss=     454.244
  295/500   0:00:45  loss=     449.182
  300/500   0:00:46  loss=     444.266
  305/500   0:00:47  loss=     439.314
  310/500   0:00:47  loss=     434.297
  315/500   0:00:48  loss=     429.283
  320/500   0:00:49  loss=       424.3
  325/500   0:00:49  loss=     419.459
  330/500   0:00:50  loss=     414.495
  335/500   0:00:51  loss=      409.68
  340/500   0:00:51  loss=     404.849
  345/500   0:00:52  loss=     400.075
  350/500   0:00:53  loss=     395.292
  355/500   0:00:53  loss=     390.542
  360/500   0:00:54  loss=     385.809
  365/500   0:00:55  loss=     381.295
  370/500   0:00:55  loss=     376.493
  375/500   0:00:56  loss=     372.004
  380/500   0:00:57  loss=     367.282
  385/500   0:00:57  loss=     362.783
  390/500   0:00:58  loss=     358.283
  395/500   0:00:59  loss=      353.66
  400/500   0:00:59  loss=     349.509
  405/500   0:01:00  loss=     344.835
  410/500   0:01:01  loss=     340.467
  415/500   0:01:02  loss=     336.031
  420/500   0:01:02  loss=     331.746
  425/500   0:01:03  loss=      327.46
  430/500   0:01:04  loss=     323.614
  435/500   0:01:04  loss=     319.019
  440/500   0:01:05  loss=     314.922
  445/500   0:01:06  loss=     310.686
  450/500   0:01:06  loss=     306.558
  455/500   0:01:07  loss=     302.509
  460/500   0:01:08  loss=     298.513
  465/500   0:01:08  loss=      294.41
  470/500   0:01:09  loss=      290.55
  475/500   0:01:10  loss=     286.874
  480/500   0:01:10  loss=     282.821
  485/500   0:01:11  loss=     279.041
  490/500   0:01:12  loss=      275.31
  495/500   0:01:13  loss=     271.425
  500/500   0:01:13  loss=     267.842
Finished

Optimization finished in 1 minute 13 seconds
‣ Iterations: 500
‣ Final loss: 267.842
NameRangeValue
Mixture[0].CSM.amplitude[1e-08, ∞)[[1.33006447] [1.70562927] [1.73567783] [1.49981674]]
Mixture[0].CSM.mean[1e-08, 0.1][0.00134278]
Mixture[0].CSM.variance[1e-08, ∞)[1.e-08]
Mixture[0].CSM.shift(-∞, ∞)[[ 0.00027924] [-0.01055069] [ 0.01106413] [-0.00224129]]
Mixture[1].CSM.amplitude[1e-08, ∞)[[0.12330041] [0.11775069] [0.09704823] [0.17043213]]
Mixture[1].CSM.mean[1e-08, 0.1][0.0474827]
Mixture[1].CSM.variance[1e-08, ∞)[1.e-08]
Mixture[1].CSM.shift(-∞, ∞)[[ 0.07079462] [-0.16967516] [-0.07995379] [ 0.06302846]]
Gaussian.scale[1e-08, ∞)[0.46427281 0.50143449 0.49446221 0.5148525 ]
In [9]:
sm_lmc = mogptk.SM_LMC(dataset, Q=2)
sm_lmc.init_parameters(method='LS')
sm_lmc.train(method=method, lr=lr, iters=iters, verbose=True)
sm_lmc.print_parameters()
sm_lmc.plot_prediction();
Starting optimization using Adam
‣ Model: SM-LMC
‣ Channels: 4
‣ Parameters: 16
‣ Training points: 656
‣ Initial loss: 801.481

Start Adam:
    0/500   0:00:00  loss=     801.481
    5/500   0:00:00  loss=     795.616
   10/500   0:00:01  loss=     789.659
   15/500   0:00:01  loss=     783.863
   20/500   0:00:02  loss=     778.003
   25/500   0:00:02  loss=     772.165
   30/500   0:00:03  loss=     766.387
   35/500   0:00:04  loss=      760.58
   40/500   0:00:04  loss=     754.804
   45/500   0:00:05  loss=     749.023
   50/500   0:00:05  loss=     743.249
   55/500   0:00:06  loss=     737.482
   60/500   0:00:06  loss=     731.724
   65/500   0:00:07  loss=     725.977
   70/500   0:00:07  loss=     720.236
   75/500   0:00:08  loss=     714.504
   80/500   0:00:08  loss=     708.781
   85/500   0:00:09  loss=     703.068
   90/500   0:00:09  loss=     697.364
   95/500   0:00:10  loss=      691.67
  100/500   0:00:10  loss=     685.986
  105/500   0:00:11  loss=     680.312
  110/500   0:00:11  loss=     674.649
  115/500   0:00:11  loss=     668.998
  120/500   0:00:12  loss=     663.358
  125/500   0:00:12  loss=     657.729
  130/500   0:00:13  loss=     652.113
  135/500   0:00:13  loss=      646.51
  140/500   0:00:14  loss=     640.919
  145/500   0:00:14  loss=     635.341
  150/500   0:00:15  loss=     629.778
  155/500   0:00:15  loss=     624.228
  160/500   0:00:16  loss=     618.692
  165/500   0:00:16  loss=     613.174
  170/500   0:00:17  loss=     607.667
  175/500   0:00:17  loss=     602.177
  180/500   0:00:18  loss=     596.704
  185/500   0:00:19  loss=     591.244
  190/500   0:00:19  loss=     585.809
  195/500   0:00:20  loss=     580.378
  200/500   0:00:20  loss=     574.975
  205/500   0:00:21  loss=     569.582
  210/500   0:00:21  loss=     564.214
  215/500   0:00:22  loss=     558.863
  220/500   0:00:22  loss=     553.528
  225/500   0:00:23  loss=     548.213
  230/500   0:00:23  loss=     542.922
  235/500   0:00:24  loss=     537.654
  240/500   0:00:24  loss=     532.395
  245/500   0:00:25  loss=     527.174
  250/500   0:00:25  loss=     521.954
  255/500   0:00:26  loss=     516.769
  260/500   0:00:26  loss=     511.606
  265/500   0:00:27  loss=     506.467
  270/500   0:00:27  loss=     501.344
  275/500   0:00:28  loss=     496.258
  280/500   0:00:28  loss=     491.179
  285/500   0:00:29  loss=     486.146
  290/500   0:00:29  loss=     481.114
  295/500   0:00:30  loss=     476.125
  300/500   0:00:30  loss=     471.153
  305/500   0:00:31  loss=     466.211
  310/500   0:00:31  loss=     461.299
  315/500   0:00:32  loss=     456.407
  320/500   0:00:32  loss=     451.551
  325/500   0:00:33  loss=     446.718
  330/500   0:00:33  loss=     441.915
  335/500   0:00:34  loss=     437.145
  340/500   0:00:34  loss=     432.397
  345/500   0:00:35  loss=     427.689
  350/500   0:00:35  loss=     423.001
  355/500   0:00:36  loss=     418.354
  360/500   0:00:36  loss=      413.73
  365/500   0:00:37  loss=     409.144
  370/500   0:00:37  loss=     404.586
  375/500   0:00:38  loss=     400.064
  380/500   0:00:38  loss=     395.574
  385/500   0:00:39  loss=     391.118
  390/500   0:00:40  loss=     386.696
  395/500   0:00:40  loss=      382.31
  400/500   0:00:41  loss=     377.956
  405/500   0:00:41  loss=     373.641
  410/500   0:00:42  loss=     369.357
  415/500   0:00:42  loss=     365.116
  420/500   0:00:43  loss=     360.905
  425/500   0:00:43  loss=     356.737
  430/500   0:00:44  loss=       352.6
  435/500   0:00:44  loss=     348.508
  440/500   0:00:45  loss=     344.448
  445/500   0:00:45  loss=     340.432
  450/500   0:00:46  loss=      336.45
  455/500   0:00:46  loss=     332.512
  460/500   0:00:47  loss=     328.609
  465/500   0:00:47  loss=     324.752
  470/500   0:00:48  loss=     320.929
  475/500   0:00:48  loss=     317.153
  480/500   0:00:49  loss=     313.412
  485/500   0:00:49  loss=     309.719
  490/500   0:00:50  loss=     306.061
  495/500   0:00:50  loss=     302.452
  500/500   0:00:51  loss=     298.878
Finished

Optimization finished in 51.066 seconds
‣ Iterations: 500
‣ Final loss: 298.878
NameRangeValue
LMC[0].Spectral.magnitudefixed1.0000000000000002
LMC[0].Spectral.mean[1e-08, 0.1][0.00134533]
LMC[0].Spectral.variance[1e-08, ∞)[1.e-08]
LMC[1].Spectral.magnitudefixed1.0000000000000002
LMC[1].Spectral.mean[1e-08, 0.1][0.04775743]
LMC[1].Spectral.variance[1e-08, ∞)[1.e-08]
LMC.weight[1e-08, ∞)[[[0.77631484] [0.22574124]] [[0.88511433] [0.21209765]] [[0.87966789] [0.18451463]] [[0.81840271] [0.26983282]]]
Gaussian.scale[1e-08, ∞)[0.46282907 0.50399841 0.49855575 0.5143879 ]

Above we can see that the models fit the data fairly well. For a quantitative analysis, below we calculate the prediction error using the MAE, RMSE, and MAPE metrics. It can be observed that in this case the MOSM model fits well to the data set.

In [11]:
mogptk.error(mosm, csm, sm_lmc, disp=True)
MAE MAPE RMSE
Name
MOSM 0.242323 11.379547 0.299774
CSM 0.238431 11.189683 0.298232
SM-LMC 0.245076 11.510077 0.304930