Purrring progress bars (adding a progress bar to `purrr::map`)

Important note (2023-04-25): purrr version 1.0.0 introduced built-in progress bars. Just pass the argument .progress = TRUE to the map function. See the documentation here. The rest of this post remains unchanged, even though it is no longer the best way to solve this problem.
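For reference, here is a minimal sketch of the built-in approach, assuming purrr 1.0.0 or later is installed. The function slow_square and the input 1:50 are hypothetical stand-ins for real work such as reading files. purrr decides when the bar is actually drawn, so very short runs may show nothing.

```r
library(purrr)

# Hypothetical slow function standing in for real work (e.g., reading a file)
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# .progress = TRUE asks purrr (>= 1.0.0) to report progress while mapping
res <- map_dbl(1:50, slow_square, .progress = TRUE)
```

The same argument is accepted by the other map variants, so an existing call usually needs only this one extra argument.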

With all the functional programming going on (e.g., purrr::map and the like), there is at least one thing that I found missing: progress bars. The dplyr::do function had a nice-looking progress bar that opened by default if the operation took more than 2 seconds and had at least two more to go (as per Hadley’s description in Issue #149 in tidyverse/purrr).

The issue is still open at the time of writing, and will probably be resolved in the near future as a feature of purrr::map.

Personally, I like @cderv’s elegant solution suggested in that same GitHub issue.

Here is an example implementation for reading multiple files within a directory and combining them into a single tibble while showing a progress bar when reading the files. The file reading is very similar to what was suggested in this post.

library(purrr)
library(readr)
library(dplyr)

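# directory from which to read a bunch of files (the example here uses csv)
# pattern is a regular expression, so escape the dot and anchor the extension;
# full.names = TRUE returns paths that read_csv can open from any working directory
file_list <- dir(path = "PATH_TO_DIRECTORY", pattern = "\\.csv$", full.names = TRUE)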

# define a reading function that updates and prints the progress bar
# (it relies on a progress bar object named pb, created before the map call)
read_with_progress <- function(filename){
  pb$tick()$print()
  data_read <- read_csv(filename)
  # you can add additional operations on data_read, or
  # decide on an entirely different task for this function.
  data_read  # return the result explicitly so map_df can combine it
}

# create the progress bar with a dplyr function
# (note: progress_estimated() is deprecated as of dplyr 1.0.0)
pb <- progress_estimated(length(file_list))
res <- file_list %>%
  map_df(~read_with_progress(.))

That’s it. You’re set to go with a cool progress bar which will print out something like this while the operation is carried out:

|=====================================           |80% ~23 s remaining
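Since dplyr has deprecated progress_estimated(), the same tick-inside-the-function pattern can be sketched with the progress package instead. This is an alternative I am swapping in for illustration, not the solution from the GitHub issue. It assumes the progress package is installed, and slow_square and inputs are hypothetical placeholders for your own function and data.

```r
library(purrr)
library(progress)

inputs <- 1:20  # hypothetical inputs standing in for a list of files

# create the bar up front; total is the number of expected ticks
pb <- progress_bar$new(
  format = "[:bar] :percent eta: :eta",
  total = length(inputs)
)

# hypothetical worker: advance the bar by one step, then do the actual work
slow_square <- function(x) {
  pb$tick()
  Sys.sleep(0.01)
  x^2
}

res <- map_dbl(inputs, slow_square)
```

As in the dplyr version, the bar lives outside the function, so it has to be recreated before each new map call.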

