Get_Started.Rmd
In this short tutorial, we will show how to fit a flight curve function on butterfly count recorded on a weekly base. We will use R functions implemented in the rbms
package and data bundled within the same package. The data we use are real Butterfly Monitoring Scheme counts and transect visit dates. The flight curve are based on spline fitted on count collected over multiple sites and standardized to sum to 1 (area under the curve is one).
## Welcome to rbms, version 1.0.0
## While this package has been tested by several users,
## it is still in active development and feedbacks are welcome
## https://github.com/RetoSchmucki/rbms/issues
The visit and count data are packaged in data.table
format, but can also be provided as data.frame
. The function will convert them into data.table
as this format allow us to deal with large data sets in a more efficient way. On the other hand, header names need to be consistent and some columns are essential for the functions to work.
Visit data are the visit date at which each site has been visited and thereby monitored for butterfly count. If no butterfly was observed during the visit, the abundance for the that day will be set to zero [0]. This allow you to only document the positive non-zero count in the count data set and thereby handle a smaller object. The visit data can contain a many column as you want, but two are essential
DateFormat
)## SITE_ID DATE
## 1: 1 2000-04-07
## 2: 1 2000-04-19
## 3: 1 2000-05-12
## 4: 1 2000-05-15
## 5: 1 2000-05-24
## ---
## 12136: 193 2004-08-19
## 12137: 193 2004-08-31
## 12138: 193 2004-09-04
## 12139: 193 2004-09-15
## 12140: 193 2004-09-28
Count data must also be provided with specific columns with the following heading, more can be provided but will not be used in the function.
## SITE_ID DATE SPECIES DAY MONTH YEAR COUNT
## 1: 1 2000-04-07 2 7 4 2000 5
## 2: 1 2000-04-19 2 19 4 2000 3
## 3: 1 2000-05-12 2 12 5 2000 1
## 4: 1 2000-05-12 4 12 5 2000 2
## 5: 1 2000-05-15 4 15 5 2000 4
## ---
## 5303: 193 2003-09-06 2 6 9 2003 1
## 5304: 193 2004-04-23 2 23 4 2004 2
## 5305: 193 2004-05-02 2 2 5 2004 1
## 5306: 193 2004-06-19 2 19 6 2004 2
## 5307: 193 2004-07-10 2 10 7 2004 1
One you have your visit and count data, we need to merge them together into a data.table
object the covers that complete time-series of interest and contains information about the start and end of the monitoring season and if the flight curve should be computed weekly or daily.
2.1. In a first step, we will initialize a time-series with day-week-month-year information.
2.2. You can then add the monitoring season to the time-series, providing the StartMonth and EndMonth arguments. You can refine this with more arguments (StartDay, EndDay). Here you will also define if the series is on a weekly or daily basis, where TimeUnit = 'w'
, means that the flight curve will defined on a week basis. The alternative is 'd'
for daily basis. The ANCHOR
argument will add zeros (0) before and after the monitoring season to ensure that the flight curve starts at zero.
ts_season <- rbms::ts_monit_season(ts_date, StartMonth = 4, EndMonth = 9, StartDay = 1, EndDay = NULL, CompltSeason = TRUE, Anchor = TRUE, AnchorLength = 2, AnchorLag = 2, TimeUnit = 'w')
NOTE: for species having overwintering adult, with early counts, having an Anchor set to zero might sound wrong and we are currently working on finding an alternative for these case to better represent the flight curve of those species.
2.3. After having defined the monitoring season for a specific the time period and monitoring scheme, we will use the ts_monit_site()
function to expend inform where each site was visited along the time-series. you can add your data, starting with the visit date to inform about the sites and the visits
2.4. We can then add the observed count for the species of interest to the data.table
object. Here we use the count recorded for species “2” (species names can also be character).
This result in a data.table
object that contains the zeros and count recorded along the BMS transect for species “2” over the time-series of interest and within the monitoring season of the schemes. This time-series also contains the week, or days, where count are missing because the site has not been visited. These missing week are marked as NA
in the count.
With the object constructed above, you can now compute the flight curve for each year where sufficient data are available. The flight_curve
function assumes that the phenology follow the same shape across the sites.
The objective of this function is to extract the shape of your flight curve, so you might want to impose some minimal quality thresholds to the data used to inform this model. This can be done by setting the Minimum number of visit MinNbrVisit
, the minimum number of occurrence MinOccur
and number of site MinNbrSite
.
These values are likely to influence your model and will differ across species and data set, but MinOccur
should be set >= 2, the MinNbrVisit
> MinOccur
and MinNbrSite
>= 5. The chose for these threshold will affect how well your model is informed and the with higher value you might have less site and therefore will have to revise the chose.
ts_flight_curve <- rbms::flight_curve(ts_season_count, NbrSample = 300, MinVisit = 5, MinOccur = 3, MinNbrSite = 5, MaxTrial = 4, GamFamily = 'nb', SpeedGam = FALSE, CompltSeason = TRUE, SelectYear = NULL, TimeUnit = 'w')
## [1] "Fitting the flight curve spline for species 2 and year 2000 with 76 sites, using gam() : 2019-12-01 21:04:51 -> trial 1"
## [1] "Fitting the flight curve spline for species 2 and year 2001 with 76 sites, using gam() : 2019-12-01 21:04:52 -> trial 1"
## [1] "Fitting the flight curve spline for species 2 and year 2002 with 88 sites, using gam() : 2019-12-01 21:04:53 -> trial 1"
## [1] "Fitting the flight curve spline for species 2 and year 2003 with 103 sites, using gam() : 2019-12-01 21:04:55 -> trial 1"
NOTE: for the
flight_curve
furnction, you will also have to define some parameters for the distribution for the GAM model as well as the maximum number of time to try to fit the model and the number of sample to use. The later will take a random sample from the data set if it contain more site than the number specified.
From the flight_curve()
function, you will retrieve a list of 3 objects: - pheno
that contain the standardized phenology curve derived by fitting a GAM model, with a cubic spline to the count data. - model
that contain the result of the fitted GAM model. - data
the data that where used to fit the GAM model.
You can extract the pheno
object which is a data.frame
that contains the shape of the annual flight curves, standardized to sum to 1. These can be illustrated in a figure with the following line of codes.
## Extract phenology part
pheno <- ts_flight_curve$pheno
## add the line of the first year
yr <- unique(pheno[order(M_YEAR), as.numeric(as.character(M_YEAR))])
if("trimWEEKNO" %in% names(pheno)){
plot(pheno[M_YEAR == yr[1], trimWEEKNO], pheno[M_YEAR == yr[1], NM], type = 'l', ylim = c(0, max(pheno[, NM])), xlab = 'Monitoring Week', ylab = 'Relative Abundance')
} else {
plot(pheno[M_YEAR == yr[1], trimDAYNO], pheno[M_YEAR == yr[1], NM], type = 'l', ylim = c(0, max(pheno[, NM])), xlab = 'Monitoring Day', ylab = 'Relative Abundance')
}
## add individual curves for additional years
if(length(yr) > 1) {
i <- 2
for(y in yr[-1]){
if("trimWEEKNO" %in% names(pheno)){
points(pheno[M_YEAR == y , trimWEEKNO], pheno[M_YEAR == y, NM], type = 'l', col = i)
} else {
points(pheno[M_YEAR == y, trimDAYNO], pheno[M_YEAR == y, NM], type = 'l', col = i)
}
i <- i + 1
}
}
## add legend
legend('topright', legend = c(yr), col = c(seq_along(c(yr))), lty = 1, bty = 'n')
Flight curve.