Three year summary Lake Geneva

Notebook 2

Purpose: Present an analysis method for survey results from beach litter inventories on Lake Geneva.

Background: This is in the context of the global movement to reduce plastic debris in the maritime environment. Riverine inputs are major contributors of plastic debris (and all types of refuse) to the oceans. This is an analysis of the data collected on the shores of Lake Geneva over a three year period. The maritime protocol was modified in very specific ways to adjust for the local geography and population density.

Research question: Is this a representative sample?

From notebook one we can assume the following:
  1. The data was collected at different locations
  2. The data was collected by different groups of people
  3. For each year there is one group that collected 50% or more of the samples

If sampling all the trash is not possible, what if we sample as much as possible and see what that looks like? The following questions could then be answered:

  1. What does the distribution of survey results look like?
  2. Do different groups of people produce different survey results?
  3. How different are the survey results from one location to another?
  4. What are the most abundant objects?
  5. How different are the survey results year over year?

From Notebook 1 IMPORTANT!

In notebook one the conclusion was that the data has different geographic centers and those centers reflect different land use patterns.

The data is now stored locally, so it is always best to run notebook one first.

This notebook and all subsequent notebooks use the directory established in notebook one (see above).

Directory already in place

Read in the json data

Make pcs/m, make grouping levels
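A minimal sketch of this step, assuming each record carries a location, a date, an object count, and the length of shoreline surveyed. The column names (`location`, `date`, `quantity`, `length`), the inline records, and the first-of-November start of each survey year are assumptions made for illustration; they are not taken from the project's data.

```python
import pandas as pd

# Hypothetical records standing in for the JSON data; field names are assumptions.
records = [
    {"location": "site-a", "date": "2015-12-01", "quantity": 120, "length": 30},
    {"location": "site-a", "date": "2015-12-01", "quantity": 30, "length": 30},
    {"location": "site-b", "date": "2017-03-15", "quantity": 90, "length": 45},
    {"location": "site-a", "date": "2018-06-20", "quantity": 60, "length": 20},
]
df = pd.DataFrame(records)
df["date"] = pd.to_datetime(df["date"])

# One survey = one location on one date: sum the object counts, keep the length.
surveys = (
    df.groupby(["location", "date"], as_index=False)
      .agg(quantity=("quantity", "sum"), length=("length", "first"))
)

# Pieces of trash per meter of shoreline surveyed.
surveys["pcs_m"] = surveys["quantity"] / surveys["length"]


def survey_year(date):
    """Return 1, 2, 3... for survey years running November to November.

    Year one is assumed to start on 1 November 2015.
    """
    start_year = date.year if date.month >= 11 else date.year - 1
    return start_year - 2015 + 1


# Grouping level: the survey year each sample belongs to.
surveys["year"] = surveys["date"].apply(survey_year)
print(surveys[["location", "date", "pcs_m", "year"]])
```

Summing the counts per location and date before dividing keeps `pcs_m` a per-survey value rather than a per-object value, which is what the summaries in the results section describe.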

Results: Total pieces of trash per meter

Cumulative results

Reported as pieces of trash per meter of shoreline (pcs/m), the cumulative and year-over-year results are given in the table below:

Statistic          Year one   Year two   Year three   All
Samples            83         41         24           148
Mean               8.77       9.99       9.01         9.16
Median             4.83       6.84       4.24         5.52
Std dev            9.92       8.52       16.14        10.75
25th percentile    3.113      4.41       2.36         3.18
75th percentile    10.405     12.35      7.79         11.24
Minimum            0.68       0.57       0.11         0.11
Maximum            50.075     39.54      77.05        77.05
MCBP samples       80         22         5            107
SLR samples        0          18         15           33
EPFL samples       2          2          2            6
PC samples         0          0          2            2
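A table like the one above could be assembled from per-survey values with pandas `describe()`. The sketch below is illustrative only: the `year` and `pcs_m` column names are assumptions and the six values are made up, so the output does not reproduce the reported figures.

```python
import pandas as pd

# Hypothetical survey results; `year` and `pcs_m` are assumed column names.
surveys = pd.DataFrame({
    "year": [1, 1, 1, 1, 2, 2],
    "pcs_m": [1.0, 2.0, 3.0, 4.0, 2.0, 4.0],
})

# One row of statistics per survey year.
per_year = surveys.groupby("year")["pcs_m"].describe()

# The same statistics for all surveys combined.
overall = surveys["pcs_m"].describe().rename("All")

# Statistics as rows, years as columns, with the combined column last.
table = pd.concat([per_year.T, overall], axis=1).round(2)
print(table)
```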

Year two has the highest mean, the highest median, and the greatest interquartile range of the three years. The lowest and the highest daily values were both reported in year three. In each year there is one group that collected at least 50% of all the samples.
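Both statements can be checked by hand from the table. The sketch below copies the quartiles and sample counts as printed and uses the Samples row as the denominator for the group shares; the variable names are only for illustration.

```python
# 25th and 75th percentiles (pcs/m) copied from the summary table.
quartiles = {
    "Year one": (3.113, 10.405),
    "Year two": (4.41, 12.35),
    "Year three": (2.36, 7.79),
}

# Interquartile range = 75th percentile minus 25th percentile.
iqr = {year: round(q3 - q1, 3) for year, (q1, q3) in quartiles.items()}
widest = max(iqr, key=iqr.get)

# Sample counts copied from the table; group rows are used as printed.
samples = {"Year one": 83, "Year two": 41, "Year three": 24}
by_group = {
    "Year one": {"MCBP": 80, "SLR": 0, "EPFL": 2, "PC": 0},
    "Year two": {"MCBP": 22, "SLR": 18, "EPFL": 2, "PC": 0},
    "Year three": {"MCBP": 5, "SLR": 15, "EPFL": 2, "PC": 2},
}

# For each year, find the group with the most samples and its share of the total.
top_share = {}
for year, counts in by_group.items():
    group = max(counts, key=counts.get)
    top_share[year] = (group, counts[group] / samples[year])

print("Widest interquartile range:", widest, iqr[widest])
for year, (group, share) in top_share.items():
    print(f"{year}: {group} collected {share:.0%} of samples")
```

With the table values, the leading group is MCBP in years one and two and SLR in year three.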

For each year the mean is greater than the median, suggesting a right-skewed distribution. The mean and median are closest in year two.
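The gap between mean and median can be computed directly from the table values, as in the sketch below. A positive gap is only an informal indicator of right skew, not a formal test.

```python
# Mean and median (pcs/m) copied from the summary table.
summary = {
    "Year one": {"mean": 8.77, "median": 4.83},
    "Year two": {"mean": 9.99, "median": 6.84},
    "Year three": {"mean": 9.01, "median": 4.24},
    "All": {"mean": 9.16, "median": 5.52},
}

# Mean minus median for each period; positive values point to a long right tail.
gap = {period: round(s["mean"] - s["median"], 2) for period, s in summary.items()}

# The single year in which mean and median are closest.
years_only = {period: g for period, g in gap.items() if period != "All"}
closest = min(years_only, key=years_only.get)

for period, g in gap.items():
    print(f"{period}: mean - median = {g}")
print("Closest:", closest)
```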

All data Nov 2015 - Nov 2018: mean, median, count...

pcs_m
count 148.000000
mean 9.160844
std 10.752655
min 0.117647
25% 3.183535
50% 5.526347
75% 11.240594
max 77.055556

Year one: Nov 2015 - Nov 2016

pcs_m
count 83.000000
mean 8.708838
std 9.882661
min 0.681159
25% 3.149679
50% 4.796875
75% 10.155738
max 50.075000

Year two: Nov 2016 - Nov 2017

pcs_m
count 41.000000
mean 10.158769
std 8.569256
min 0.576471
25% 5.666667
50% 6.857143
75% 12.387097
max 39.540541

Year three: Nov 2017 - Nov 2018

pcs_m
count 24.000000
mean 9.019244
std 16.140460
min 0.117647
25% 2.366757
50% 4.244992
75% 7.791667
max 77.055556