Thin tracking data by resampling or aggregation — atl_thin

Uniformly reduce data volumes by either aggregation or subsampling (chosen with the method argument) over an interval given in seconds by the interval argument. Both options make two important assumptions: (1) that timestamps are stored in columns named 'time' and 'datetime', and (2) that all columns except the identity columns can be averaged in R.

The 'subsample' option returns a thinned dataset with all columns from the input data, whereas the 'aggregate' option drops the column covxy, since this cannot be propagated to the averaged position. The two options also handle the column 'time' differently: 'subsample' returns the actual timestamp (in UNIX time) of each sample, while 'aggregate' returns the mean timestamp (also in UNIX time).

The 'aggregate' option only recognises error columns named varx and vary. If these columns are not both present, the function assumes there is no measure of error and drops any of them that are present. If there is no measure of error, the function simply returns the averaged position and covariates in each time interval.

Names of grouping variables (such as animal identity) may be passed as a character vector to the id_columns argument.
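The difference between the two methods can be illustrated with a minimal base R sketch. This is not the package's implementation: the variable names are hypothetical, and labelling each bin by the start of its interval (floor(time / interval) * interval) is an assumption made for illustration.

```r
# Hypothetical illustration only; this is NOT the code inside atl_thin_data().
time <- seq(1696218721, by = 10, length.out = 10)  # one fix every 10 s
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)              # made-up coordinates
interval <- 60                                     # thinning interval (s)

# Assign each fix to a bin labelled by the start of its interval
# (an assumption of this sketch).
bin <- floor(time / interval) * interval

# Subsample-style thinning: keep the first fix in each bin,
# with its actual timestamp.
first_in_bin <- !duplicated(bin)
subsampled <- data.frame(time = time[first_in_bin], x = x[first_in_bin])

# Aggregate-style thinning: average the fixes in each bin and
# count how many fixes contributed.
aggregated <- data.frame(
  bin = sort(unique(bin)),
  x = as.numeric(tapply(x, bin, mean)),
  n = as.integer(tapply(x, bin, length))
)

print(subsampled)
print(aggregated)
```

With a 10-second fix rate and a 60-second interval, the first bin holds six fixes and the second holds the remaining four. In real data the same logic would be applied separately within each group defined by the identity columns.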

Usage

atl_thin_data(
  data,
  interval = 60,
  id_columns = NULL,
  method = c("subsample", "aggregate")
)

Arguments

data: Tracking data to thin. Must have columns x and y, a numeric column named time, and a column named datetime.
interval: The interval, in seconds, over which to aggregate or subsample.
id_columns: Character vector of column names to group by (such as animal identity).
method: Whether to thin the data by subsampling or by aggregation. With method = "subsample", the first position in each interval is taken. With method = "aggregate", the mean of the positions in each interval is taken.
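The Usage block gives method a two-element default, c("subsample", "aggregate"). In R this pattern is usually resolved with match.arg(), which picks the first element when the argument is not supplied. Whether atl_thin_data does exactly this is an assumption; the stand-in function below is hypothetical and only demonstrates the idiom.

```r
# Hypothetical stand-in for illustration; NOT atl_thin_data() itself.
pick_method <- function(method = c("subsample", "aggregate")) {
  # match.arg() returns the first choice when `method` is missing,
  # and rejects values that are not among the choices.
  match.arg(method)
}

default_method <- pick_method()
explicit_method <- pick_method("aggregate")

print(default_method)
print(explicit_method)
```

Passing the method explicitly, as the examples on this page do, avoids relying on the default.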

Value

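A data.table with aggregated or subsampled data. As the example output shows, the result also includes a count of the positions in each interval (n_aggregated or n_subsampled, depending on the method).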

Author

Pratik Gupte & Allert Bijleveld & Johannes Krietsch

Examples

library(data.table)

data <- data.table(
  tag = as.character(rep(1:2, each = 10)),
  time = rep(seq(1696218721, 1696218721 + 92, by = 10), 2),
  x = rnorm(20, 10, 1),
  y = rnorm(20, 15, 1)
)

data[, datetime := as.POSIXct(time, origin = "1970-01-01", tz = "UTC")]
#>        tag       time         x        y            datetime
#>     <char>      <num>     <num>    <num>              <POSc>
#>  1:      1 1696218721  9.313147 13.86186 2023-10-02 03:52:01
#>  2:      1 1696218731  9.554338 16.25381 2023-10-02 03:52:11
#>  3:      1 1696218741 11.224082 15.42646 2023-10-02 03:52:21
#>  4:      1 1696218751 10.359814 14.70493 2023-10-02 03:52:31
#>  5:      1 1696218761 10.400771 15.89513 2023-10-02 03:52:41
#>  6:      1 1696218771 10.110683 15.87813 2023-10-02 03:52:51
#>  7:      1 1696218781  9.444159 15.82158 2023-10-02 03:53:01
#>  8:      1 1696218791 11.786913 15.68864 2023-10-02 03:53:11
#>  9:      1 1696218801 10.497850 15.55392 2023-10-02 03:53:21
#> 10:      1 1696218811  8.033383 14.93809 2023-10-02 03:53:31
#> 11:      2 1696218721 10.701356 14.69404 2023-10-02 03:52:01
#> 12:      2 1696218731  9.527209 14.61953 2023-10-02 03:52:11
#> 13:      2 1696218741  8.932176 14.30529 2023-10-02 03:52:21
#> 14:      2 1696218751  9.782025 14.79208 2023-10-02 03:52:31
#> 15:      2 1696218761  8.973996 13.73460 2023-10-02 03:52:41
#> 16:      2 1696218771  9.271109 17.16896 2023-10-02 03:52:51
#> 17:      2 1696218781  9.374961 16.20796 2023-10-02 03:53:01
#> 18:      2 1696218791  8.313307 13.87689 2023-10-02 03:53:11
#> 19:      2 1696218801 10.837787 14.59712 2023-10-02 03:53:21
#> 20:      2 1696218811 10.153373 14.53334 2023-10-02 03:53:31
#>        tag       time         x        y            datetime

# Thin the data by aggregation with a 60-second interval
thinned_aggregated <- atl_thin_data(
  data = data,
  interval = 60,
  id_columns = "tag",
  method = "aggregate"
)

# Thin the data by subsampling with a 60-second interval
thinned_subsampled <- atl_thin_data(
  data = data,
  interval = 60,
  id_columns = "tag",
  method = "subsample"
)

# View results
print(thinned_aggregated)
#>       tag       time         x        y            datetime n_aggregated
#>    <char>      <num>     <num>    <num>              <POSc>        <int>
#> 1:      1 1696218720 10.160472 15.33672 2023-10-02 03:52:00            6
#> 2:      1 1696218780  9.940576 15.50056 2023-10-02 03:53:00            4
#> 3:      2 1696218720  9.531312 14.88575 2023-10-02 03:52:00            6
#> 4:      2 1696218780  9.669857 14.80383 2023-10-02 03:53:00            4
print(thinned_subsampled)
#>       tag       time         x        y            datetime n_subsampled
#>    <char>      <num>     <num>    <num>              <POSc>        <int>
#> 1:      1 1696218721  9.313147 13.86186 2023-10-02 03:52:01            6
#> 2:      1 1696218781  9.444159 15.82158 2023-10-02 03:53:01            4
#> 3:      2 1696218721 10.701356 14.69404 2023-10-02 03:52:01            6
#> 4:      2 1696218781  9.374961 16.20796 2023-10-02 03:53:01            4
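
The example data above carry no error columns, so the handling of varx and vary by the 'aggregate' option is not shown. As a rough sketch, and not necessarily what the package does, the variance of a mean of n independent position estimates is the sum of their variances divided by n squared. The values and variable names below are illustrative, and independence between fixes is an assumption.

```r
# Hypothetical illustration only; values are made up and this is
# not taken from the atl_thin_data() source.
time <- seq(1696218721, by = 10, length.out = 10)
varx <- c(4, 4, 4, 4, 4, 4, 9, 9, 9, 9)  # per-fix variance in x
interval <- 60

# Same binning as a 60-second thinning interval.
bin <- floor(time / interval) * interval

# Variance of the averaged position in each bin, assuming
# independent errors: sum(var) / n^2.
varx_mean <- tapply(varx, bin, function(v) sum(v) / length(v)^2)

print(varx_mean)
```

Averaging more fixes shrinks the propagated variance, which is why the six-fix bin ends up with a smaller value than its per-fix variance of 4.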