Simon Barthelmé (GIPSA-lab, CNRS)
There are two main ways of doing things in parallel with imager. One is to use one of R’s many packages for parallel computing (parallel, future, etc.). The other is to take advantage of CImg’s native use of OpenMP.
Parallelising from R is very easy, provided that what you want to do actually parallelises. One task that parallelises easily is running the same operation on different images, or on different image channels.
library(imager)
## Loading required package: magrittr
##
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
##
## add
## The following objects are masked from 'package:stats':
##
## convolve, spectrum
## The following object is masked from 'package:graphics':
##
## frame
## The following object is masked from 'package:base':
##
## save.image
library(parallel)
#A really big image
im <- boats %>% imresize(8)
#Rank pixels in each image channel
#Serial version
fun <- function() imsplit(im,"c") %>% lapply(rank)
system.time(fun())
## user system elapsed
## 8.004 0.219 8.230
#Parallel version: use mclapply
fun.par <- function() imsplit(im,"c") %>% mclapply(rank,mc.cores=2)
system.time(fun.par())
## user system elapsed
## 2.800 0.369 5.955
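mclapply relies on forking, which is not available on Windows (there mc.cores must be 1). A minimal sketch of the same per-channel idea using a socket cluster from the parallel package instead; the smaller boats image and the choice of two workers are assumptions made to keep the example quick, not part of the benchmark above.

```r
library(imager)
library(parallel)

# Split the image into its colour channels (a list of single-channel images)
channels <- imsplit(boats, "c")

# Start two local worker processes; socket clusters do not need forking
cl <- makeCluster(2)

# rank() lives in base R, so the workers need no extra packages here
ranked <- parLapply(cl, channels, rank)

# Release the workers when done
stopCluster(cl)

length(ranked)  # one result per channel
```

Socket workers start as fresh R sessions, so any function from imager used inside the worker function would first have to be loaded on the workers (for instance with clusterEvalQ).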
Many CImg operations are parallelised natively. The parallelisation is optional and is only activated starting from a certain image size. The speed-ups are sublinear, meaning that unless your image is gigantic you won’t gain much from throwing 200 cores at a problem.
By default OpenMP will grab all the CPU cores it can. To use multiple threads correctly, set nthreads in cimg.use.openmp. Make sure this value is not higher than the system environment variable OMP_THREAD_LIMIT, which can be checked with Sys.getenv("OMP_THREAD_LIMIT"). The thread limit usually needs to be set before launching R, so calling Sys.setenv once a session has started is not certain to work.
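A short sketch of how the thread count might be capped before it is handed to cimg.use.openmp. The variable wanted is a hypothetical target, and capping by detectCores() is an extra precaution assumed here rather than something imager requires.

```r
library(imager)
library(parallel)

wanted <- 4  # hypothetical number of threads we would like to use

# Number of cores R can see; detectCores() may return NA on some systems
cores <- detectCores()
if (is.na(cores)) cores <- 1
n <- min(wanted, cores)

# OMP_THREAD_LIMIT comes back as "" when it is not set
limit <- suppressWarnings(as.integer(Sys.getenv("OMP_THREAD_LIMIT")))
if (!is.na(limit)) n <- min(n, limit)

# Hand the capped value to CImg
cimg.use.openmp(nthreads = n)
n
```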
library(imager)
library(microbenchmark)
#Let's do a big convolution
a <- boats
b <- imnoise(30,30)
fun <- function() convolve(a,b)
#No parallelisation
cimg.use.openmp(nthreads = 1)
## NULL
microbenchmark(fun(),times=15)
## Unit: milliseconds
## expr min lq mean median uq max neval
## fun() 758.5587 770.5574 793.5305 783.0165 816.8524 843.8794 15
#2 cores
cimg.use.openmp(nthreads = 2)
## NULL
microbenchmark(fun(),times=15)
## Unit: milliseconds
## expr min lq mean median uq max neval
## fun() 388.7225 395.259 403.4662 400.2312 405.8684 444.1365 15
#4 cores, etc.
cimg.use.openmp(nthreads = 4)
## NULL
microbenchmark(fun(),times=15)
## Unit: milliseconds
## expr min lq mean median uq max neval
## fun() 199.6942 202.2179 205.0651 204.1085 206.369 217.8173 15
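To put the three timings on one scale, the speed-up and parallel efficiency can be worked out from the reported medians. This is only arithmetic on the numbers printed above: it gives speed-ups of about 1.96 and 3.84 for 2 and 4 threads, slightly below linear.

```r
# Median run times in milliseconds, copied from the benchmarks above
median_ms <- c("1" = 783.0165, "2" = 400.2312, "4" = 204.1085)
threads <- as.integer(names(median_ms))

# Speed-up relative to the single-thread run
speedup <- median_ms[["1"]] / median_ms

# Efficiency: speed-up divided by the number of threads (1 would be linear)
efficiency <- speedup / threads

round(rbind(speedup, efficiency), 2)
```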
If CImg’s parallelisation doesn’t seem to work on your machine, it’s probably because you compiled the package with Clang, which has patchy support for OpenMP. Recompile using gcc if possible. For macOS check here: https://mac.r-project.org/openmp/
Here’s a simple benchmark: medianblur can be parallelised across image channels. First, the R version using mclapply:
cimg.use.openmp(nthreads = 1)
## NULL
fun.R <- function() imsplit(boats,"c") %>% mclapply(function(v) medianblur(v,50),mc.cores=3)
microbenchmark(fun.R(),times=20)
## Unit: seconds
## expr min lq mean median uq max neval
## fun.R() 2.522276 2.573416 2.676955 2.65224 2.783309 2.861924 20
Second, CImg’s native version:
cimg.use.openmp(nthreads = 4)
## NULL
fun.nat <- function() medianblur(boats,50)
microbenchmark(fun.nat(),times=20)
## Unit: seconds
## expr min lq mean median uq max neval
## fun.nat() 1.975586 1.980098 2.019698 2.00531 2.032463 2.142027 20
Pros and cons of using native parallelisation:
Pros and cons of parallelisation from R:
Note that both types of parallelisation can be combined if you can spread the load over several machines.
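A rough sketch of what such a combination could look like with a socket cluster from the parallel package. Two local workers stand in for separate machines, and the blur radius of 5 and the two threads per worker are arbitrary choices for illustration; this is one possible arrangement under those assumptions, not a recommended setup.

```r
library(imager)
library(parallel)

# Two local workers stand in for separate machines in this sketch;
# makeCluster() also accepts a character vector of host names.
cl <- makeCluster(2)

# On every worker: load imager and allow CImg two OpenMP threads
invisible(clusterEvalQ(cl, {
  library(imager)
  cimg.use.openmp(nthreads = 2)
}))

# Each worker blurs whole channels; CImg may use its threads inside the call
blurred <- parLapply(cl, imsplit(boats, "c"), function(v) medianblur(v, 5))
stopCluster(cl)

# Reassemble the channels into a single colour image
out <- imappend(as.imlist(blurred), "c")
dim(out)
```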