Simon Barthelmé (GIPSA-lab, CNRS)

There are two main ways of doing things in parallel with imager. One is to use one of R’s many packages for parallel computing (parallel, future, etc.). The other is to take advantage of CImg’s native use of OpenMP.

1 Parallelising from R

Parallelising from R is very easy (provided that what you want to do actually parallelises). One task that does parallelise easily is running the same operation on different images, or on different image channels.

library(imager)
## Loading required package: magrittr
## 
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
## 
##     add
## The following objects are masked from 'package:stats':
## 
##     convolve, spectrum
## The following object is masked from 'package:graphics':
## 
##     frame
## The following object is masked from 'package:base':
## 
##     save.image
library(parallel)

#A really big image
im <- boats %>% imresize(8)
#Rank pixels in each image channel 
#Serial version 
fun <- function() imsplit(im,"c") %>% lapply(rank)
    
system.time(fun())
##    user  system elapsed 
##   8.004   0.219   8.230
#Parallel version: use mclapply
fun.par <- function() imsplit(im,"c") %>% mclapply(rank,mc.cores=2)
system.time(fun.par())
##    user  system elapsed 
##   2.800   0.369   5.955
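
mclapply works by forking the R process, which is not available on Windows (there, mc.cores must be 1). The same pattern can be written for a socket cluster with parLapply from the parallel package. What follows is a minimal sketch: the helper name par_apply_psock is hypothetical, and plain matrices stand in for image channels so that the block runs without imager.

```r
library(parallel)

# Hypothetical helper: apply `f` to each element of `chans` on a socket
# (PSOCK) cluster, which starts fresh R processes instead of forking.
par_apply_psock <- function(chans, f, cores = 2L) {
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl), add = TRUE)  # always release the workers
  parLapply(cl, chans, f)
}

# Stand-in data: three random "channels" stored as plain matrices
chans <- replicate(3, matrix(runif(1e4), 100, 100), simplify = FALSE)

# Rank the values in each channel, one channel per task
res <- par_apply_psock(chans, rank)
```

With imager loaded, passing imsplit(im,"c") instead of chans should give the same kind of result as the mclapply version. If the worker function calls imager functions, each worker also needs the package loaded, for example by calling clusterEvalQ(cl, library(imager)) inside the helper.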

2 Native parallelisation: CImg and OpenMP

Many CImg operations are parallelised natively. The parallelisation is optional and is only activated above a certain image size. The speed-ups are sublinear, meaning that unless your image is gigantic you won’t gain much from throwing 200 cores at a problem.

By default OpenMP will grab all the CPU cores it can. To use multiple threads correctly, set nthreads in cimg.use.openmp. Make sure this value is not higher than the system environment variable OMP_THREAD_LIMIT (check it with Sys.getenv("OMP_THREAD_LIMIT")). OMP_THREAD_LIMIT usually needs to be set before launching R, so calling Sys.setenv once a session has started is not guaranteed to work.
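
The check on the thread limit can be scripted. The following is a minimal sketch and not part of imager: safe_nthreads is a hypothetical helper that caps a requested thread count by OMP_THREAD_LIMIT (when it is set) and by the number of cores reported by parallel::detectCores().

```r
library(parallel)

# Hypothetical helper: cap the requested thread count by the detected
# core count and by OMP_THREAD_LIMIT when that variable is set.
safe_nthreads <- function(wanted,
                          limit = Sys.getenv("OMP_THREAD_LIMIT"),
                          cores = detectCores()) {
  n <- as.integer(wanted)
  # detectCores() may return NA on some platforms
  if (!is.na(cores)) n <- min(n, as.integer(cores))
  # Sys.getenv() returns "" for an unset variable, which converts to NA
  lim <- suppressWarnings(as.integer(limit))
  if (!is.na(lim)) n <- min(n, lim)
  max(n, 1L)
}

n <- safe_nthreads(4)

# Apply the setting only if imager is installed
if (requireNamespace("imager", quietly = TRUE)) {
  imager::cimg.use.openmp(nthreads = n)
}
```

Because the limit is only read here and never modified, the sketch does not get around the need to set OMP_THREAD_LIMIT before R starts.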

library(imager)
library(microbenchmark)
#Let's do a big convolution
a <- boats
b <- imnoise(30,30) 
fun <- function() convolve(a,b)
#No parallelisation
cimg.use.openmp(nthreads = 1)
## NULL
microbenchmark(fun(),times=15)
## Unit: milliseconds
##   expr      min       lq     mean   median       uq      max neval
##  fun() 758.5587 770.5574 793.5305 783.0165 816.8524 843.8794    15
#2 cores
cimg.use.openmp(nthreads = 2)
## NULL
microbenchmark(fun(),times=15)
## Unit: milliseconds
##   expr      min      lq     mean   median       uq      max neval
##  fun() 388.7225 395.259 403.4662 400.2312 405.8684 444.1365    15
#4 cores, etc.
cimg.use.openmp(nthreads = 4)
## NULL
microbenchmark(fun(),times=15)
## Unit: milliseconds
##   expr      min       lq     mean   median      uq      max neval
##  fun() 199.6942 202.2179 205.0651 204.1085 206.369 217.8173    15
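
To put numbers on the scaling, the speed-up and parallel efficiency can be computed from the median timings reported above. This is plain arithmetic in base R; the values are copied from the benchmark output and the variable names are illustrative.

```r
# Median timings (ms) from the benchmarks above, one per thread count
threads <- c(1, 2, 4)
median_ms <- c(783.0165, 400.2312, 204.1085)

speedup <- median_ms[1] / median_ms  # relative to the single-thread run
efficiency <- speedup / threads      # 1 would be perfectly linear scaling

print(round(data.frame(threads, speedup, efficiency), 2))
```

For this convolution the speed-up is about 1.96 with two threads and 3.84 with four, so efficiency slips from roughly 0.98 to 0.96: close to linear, but already slightly below it.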

If CImg’s parallelisation doesn’t seem to work on your machine, it’s probably because you compiled the package with Clang, which has patchy support for OpenMP. Recompile using gcc if possible. On macOS, see https://mac.r-project.org/openmp/

3 Parallelisation in R vs. native parallelisation

Here’s a simple benchmark: medianblur can be parallelised across image channels. First, the R version using mclapply:

cimg.use.openmp(nthreads = 1)
## NULL
fun.R <- function() imsplit(boats,"c") %>% mclapply(function(v) medianblur(v,50),mc.cores=3)
microbenchmark(fun.R(),times=20)
## Unit: seconds
##     expr      min       lq     mean  median       uq      max neval
##  fun.R() 2.522276 2.573416 2.676955 2.65224 2.783309 2.861924    20

Second, CImg’s native version:

cimg.use.openmp(nthreads = 4)
## NULL
fun.nat <- function() medianblur(boats,50)
microbenchmark(fun.nat(),times=20)
## Unit: seconds
##       expr      min       lq     mean  median       uq      max neval
##  fun.nat() 1.975586 1.980098 2.019698 2.00531 2.032463 2.142027    20

Pros and cons of using native parallelisation:

Pros and cons of parallelisation from R:

Note that both types of parallelisation can be combined if you can spread the load over several machines.
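
A minimal sketch of such a combination, under stated assumptions: a socket cluster from the parallel package distributes the channels, and each worker sets its own OpenMP thread count. The two workers here are placed on localhost so that the block runs on a single machine. On a real cluster the host names would be remote machines reachable over SSH with imager installed, which is an assumption and not something this example sets up.

```r
library(parallel)
library(imager)

# Placeholder hosts: two workers on the local machine. Replace with the
# names of remote machines to spread the load (assumed to be reachable).
hosts <- rep("localhost", 2)
cl <- makePSOCKcluster(hosts)

# On every worker: load imager and let CImg use 2 OpenMP threads
invisible(clusterEvalQ(cl, {
  library(imager)
  cimg.use.openmp(nthreads = 2)
}))

# R-level parallelism across channels, native parallelism within each call
res <- parLapply(cl, imsplit(boats, "c"), function(v) medianblur(v, 5))
stopCluster(cl)
```

On a single machine this mostly oversubscribes the cores. The arrangement only pays off when the workers sit on different machines, each with cores to spare for OpenMP.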