Fast plotting

Plotting in R can become very slow with big data such as high-throughput WATLAS tracking data, which can consist of more than 50 million localizations in one season. There are two things to consider: the necessary content of the plot (practical side) and plotting performance (technical side).

The necessary content of the plot

Why do you want to plot the data? For a quick look or for a publication? What do you want to see in the plot? An overview of all positions or a specific part of the track? Depending on your answers, choose the smallest suitable dataset. For example, thinning the data (e.g. to 1-min or 10-min intervals) can greatly reduce the number of localizations to be plotted and therefore speed up plotting. When plotting a lot of data, a heat map (the fastest way of plotting many localizations) can be a better choice than many single points that overlap and cannot be distinguished anyway.
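As a rough illustration of thinning, the sketch below keeps the first localization per tag in each 10-min bin using plain data.table. The toy `track` table, the `thin_sec` variable and the `bin` column are assumptions made up for this example, not part of tools4watlas.

```r
library(data.table)

set.seed(1)

# toy track (hypothetical data): one tag, one fix every 6 s for one hour
track <- data.table(
  tag = "1",
  datetime = seq(
    as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
    by = 6, length.out = 600
  )
)
track[, x := cumsum(rnorm(.N))]
track[, y := cumsum(rnorm(.N))]

# assign each fix to a 10-min bin and keep the first fix per tag and bin
thin_sec <- 600
track[, bin := as.numeric(datetime) %/% thin_sec]
track_thin <- track[, .SD[1], by = .(tag, bin)]

nrow(track)      # 600 fixes before thinning
nrow(track_thin) # 6 fixes after thinning (one per 10-min bin)
```

Keeping the first fix per bin is only one possible choice; averaging positions within each bin is an alternative when smoother tracks are preferred.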

Plotting performance

One way to speed up plotting is to switch from R's standard grDevices to ragg, which provides graphics devices for R based on the AGG library and offers both higher performance (up to 40% faster) and higher quality than the standard raster devices. In RStudio, ragg can be used as the graphics back-end by choosing AGG as the backend under Tools > Global Options > General > Graphics.
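Outside the RStudio plot pane, ragg can also be used when writing plots to file. The following is a minimal sketch, assuming the ragg and ggplot2 packages are installed; the example plot and the file names are placeholders.

```r
library(ggplot2)

# small example plot (placeholder data)
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

# option 1: let ggsave() write through the AGG PNG device
out_ggsave <- file.path(tempdir(), "plot_ggsave_agg.png")
ggsave(
  out_ggsave, plot = p, device = ragg::agg_png,
  width = 6, height = 4, units = "in", dpi = 300
)

# option 2: open the AGG device directly, print the plot, close the device
out_direct <- file.path(tempdir(), "plot_direct_agg.png")
ragg::agg_png(out_direct, width = 6, height = 4, units = "in", res = 300)
print(p)
invisible(dev.off())
```

In R Markdown documents, knitr can be pointed at the same device with the chunk option `dev = "ragg_png"`.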

To further speed up plotting with ggplot2, it helps to plot points simply as pch = ".", to use geom_scattermore(), or to summarize locations and plot them as a heat map. Examples of each option follow. Note that the run time at the end of each chunk should be read as relative; it should be shorter when the code is simply run rather than compiled as part of this vignette.
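When comparing options in your own scripts, keep in mind that a ggplot object is only rendered when it is printed. The sketch below times the rendering step explicitly with system.time(); the helper name `time_plot()` and the random test data are assumptions for illustration, and the absolute timings will differ between machines and devices.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rnorm(2e4), y = rnorm(2e4))

# hypothetical helper: render a plot to a temporary PNG file and
# return the elapsed rendering time in seconds
time_plot <- function(p) {
  f <- tempfile(fileext = ".png")
  png(f, width = 800, height = 600)
  on.exit({
    dev.off()
    unlink(f)
  })
  unname(system.time(print(p))["elapsed"])
}

# compare default point symbols with pch = "."
t_default <- time_plot(ggplot(d, aes(x, y)) + geom_point())
t_dot <- time_plot(ggplot(d, aes(x, y)) + geom_point(pch = "."))

c(default = t_default, dot = t_dot)
```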

Load packages and data

This vignette shows different ways to plot WATLAS data. Each code chunk only requires the packages and the dummy data below to be loaded first and is otherwise independent.

# Packages
library(tools4watlas)
library(data.table)
library(sf)
library(ggplot2)
library(scattermore)
library(scales)
library(viridis)

Create dummy tracking data and base map

Create dummy tracks of 300 individuals with 1,000 steps each (300,000 points).

# set seed for reproducibility
set.seed(123)

# define parameters
n_individuals <- 300 # number of individuals
interval <- 6 # time interval in seconds
n_steps <- 1000 # number of time steps per individual

# reference location (Griend) in UTM Zone 31N
griend <- st_sfc(st_point(c(5.2525, 53.2523)), crs = st_crs(4326)) |>
  st_transform(crs = st_crs(32631)) |>
  st_coordinates()

# generate initial positions
initial_positions <- data.table(
  tag = 1:n_individuals,
  x = rnorm(n_individuals, mean = griend[1], sd = 50),
  y = rnorm(n_individuals, mean = griend[2], sd = 50)
)

# create tracking data
data <- rbindlist(lapply(1:n_individuals, function(id) {
  # generate timestamps
  timestamps <- seq.POSIXt(
    from = Sys.time(), by = interval, length.out = n_steps
  )

  # simulate movement with small random steps
  x_mov <- cumsum(runif(n_steps, -100, 100))
  y_mov <- cumsum(runif(n_steps, -100, 100))

  # compute positions
  x <- initial_positions[tag == id, x] + x_mov
  y <- initial_positions[tag == id, y] + y_mov

  data.table(tag = as.character(id), x = x, y = y, datetime = timestamps)
}))

# Create base map
bm <- atl_create_bm(data, buffer = 3000)

Standard `ggplot2` with points and tracks

# start time
st <- Sys.time()

# plot
bm +
  geom_path(
    data = data, aes(x, y, colour = tag), alpha = 0.1,
    show.legend = FALSE
  ) +
  geom_point(
    data = data, aes(x, y, colour = tag), size = 0.5,
    show.legend = FALSE
  ) +
  scale_colour_viridis(discrete = TRUE)

[Figure: Standard ggplot2 with points and tracks]

# run time
round(Sys.time() - st, 2)

## Time difference of 6.04 secs

`ggplot2` with points as pch = "." and tracks

# start time
st <- Sys.time()

# plot
bm +
  geom_path(
    data = data, aes(x, y, colour = tag), alpha = 0.1,
    show.legend = FALSE
  ) +
  geom_point(
    data = data, aes(x, y, colour = tag), pch = ".", size = 0.5,
    show.legend = FALSE
  ) +
  scale_colour_viridis(discrete = TRUE)

[Figure: ggplot2 with points as dots and tracks]

# run time
round(Sys.time() - st, 2)

## Time difference of 2.12 secs

`ggplot2` with points as geom_scattermore() and tracks

# start time
st <- Sys.time()

# plot
bm +
  geom_path(
    data = data, aes(x, y, colour = tag), alpha = 0.1,
    show.legend = FALSE
  ) +
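  # geom_scattermore() rasterizes the points; their size is set with
  # `pointsize` (0 = single pixels, the fastest), not with `pch` or `size`
  geom_scattermore(
    data = data, aes(x, y, colour = tag), pointsize = 0,
    show.legend = FALSE
  ) +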
  scale_colour_viridis(discrete = TRUE)

[Figure: ggplot2 with points as geom_scattermore() and tracks]

# run time
round(Sys.time() - st, 2)

## Time difference of 1.7 secs

`ggplot2` summarized points in heat map

The larger the grid cells, the fewer tiles have to be drawn and the faster the plot.
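To get a feeling for how strongly the cell size affects the number of tiles, the following sketch counts the occupied grid cells for a few cell sizes. The random points and the helper `count_cells()` are assumptions made up for this example; the rounding `round(x / cell) * cell` gives the same result as plyr::round_any(x, cell) with its default rounding function.

```r
library(data.table)

set.seed(42)

# hypothetical positions spread over a 10 x 10 km area
pts <- data.table(x = runif(1e5, 0, 10000), y = runif(1e5, 0, 10000))

# hypothetical helper: count positions per grid cell for a cell size in metres
count_cells <- function(dt, cell) {
  dt[, .N, by = .(
    x_round = round(x / cell) * cell,
    y_round = round(y / cell) * cell
  )]
}

# number of tiles that would be drawn for 50 m, 200 m and 1000 m cells
cell_sizes <- c(50, 200, 1000)
n_tiles <- sapply(cell_sizes, function(cell) nrow(count_cells(pts, cell)))
names(n_tiles) <- paste0(cell_sizes, "m")
n_tiles
```

Each row of the summarized table becomes one tile in geom_tile(), so the row count is a reasonable proxy for the drawing effort.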

# Round data to 200 m grid cells
data_heatmap <- copy(data)
data_heatmap[, c("x_round", "y_round") := list(
  plyr::round_any(x, 200),
  plyr::round_any(y, 200)
)]
data_heatmap <- data_heatmap[, .N, by = c("x_round", "y_round")]

# start time
st <- Sys.time()

# Plot heat map
bm +
  geom_tile(
    data = data_heatmap, aes(x_round, y_round, fill = N),
    linewidth = 0.1, show.legend = TRUE
  ) +
  scale_fill_viridis(
    option = "A", discrete = FALSE, trans = "log10", name = "N positions",
    breaks = trans_breaks("log10", function(x) 10^x),
    labels = trans_format("log10", math_format(10^.x)),
    direction = -1
  )

[Figure: ggplot2 summarized points in heat map]

# run time
round(Sys.time() - st, 2)

## Time difference of 0.24 secs

Johannes Krietsch