This package requires an MPI library (OpenMPI, MPICH2, or LAM/MPI). Standard
installation in an R session with
> install.packages("pbdMPI")
should work in most cases.
On HPC clusters, it is strongly recommended that you check your cluster's
documentation for specific requirements, such as module software environments.
Some module examples relevant to R and MPI are
$ module load openmpi
$ module load openblas
$ module load flexiblas
$ module load r
where the module names may include specific version numbers and may differ in
capitalization on your system.
Although module software environments are widely used, the specific module
names and their dependence structure are not standard across cluster
installations. The command
$ module avail
usually lists the available software modules on your cluster.
To install from the Unix command line after downloading the package source
tarball, run R CMD INSTALL on the downloaded file.
If the MPI library is not found, first check that you are loading the correct
module environments. If the problem persists, the following configure
arguments can be used to specify its non-standard location on your system:
Argument            | Default
--------------------|---------------------
--with-mpi-type     | OPENMPI
--with-mpi-include  | ${MPI_ROOT}/include
--with-mpi-libpath  | ${MPI_ROOT}/lib
--with-mpi          | ${MPI_ROOT}
where ${MPI_ROOT} is the path to the root of the MPI installation. See the
package source file pbdMPI/configure for details.
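For example, if OpenMPI lives in a non-standard location, its root can be
passed through the --configure-args option of R CMD INSTALL. The path below is
hypothetical and the wildcard stands for the downloaded tarball name:
$ R CMD INSTALL pbdMPI_*.tar.gz --configure-args="--with-mpi=/opt/openmpi"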
Loading the package with library(pbdMPI) sets a few global variables,
including the environment .pbd_env, where many defaults are set, and
initializes MPI. In most cases, the defaults should not be modified. Rather,
the parameters of the functions that use them should be changed. All code must
end with finalize() to cleanly exit MPI.
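For example, a minimal SPMD script might contain the following sketch (the
printed message is illustrative):
library(pbdMPI)                     # loads pbdMPI and initializes MPI
my.rank <- comm.rank()              # rank (id) of this copy of the program
comm.cat("Hello from rank", my.rank, "of", comm.size(), "\n", all.rank = TRUE)
finalize()                          # cleanly exits MPI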
Most functions are assumed to run as Single Program, Multiple Data (SPMD),
i.e. in batch mode. SPMD is based on cooperation between parallel copies of a
single program, which is more scalable than a manager-workers approach that is
natural in interactive programming. Interactivity with an HPC cluster is more
efficiently handled by a client-server approach, such as that enabled by the
remoter package.
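As a sketch of this cooperation, each parallel copy of the hypothetical script
below works on its own rank-specific chunk, and all copies then combine their
partial sums with allreduce(), so every rank ends up holding the same total:
library(pbdMPI)
my.rank <- comm.rank()                              # 0, 1, ..., comm.size() - 1
x.local <- (my.rank * 10 + 1):(my.rank * 10 + 10)   # deterministic, rank-specific chunk
total <- allreduce(sum(x.local), op = "sum")        # every rank receives the global sum
comm.print(total)                                   # printed once, by rank 0
finalize()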
On most clusters, SPMD code is run with mpirun or mpiexec together with
Rscript, such as
$ mpiexec -np 2 Rscript some_code.r
where some_code.r contains the entire SPMD program. The MPI Standard 4.0
recommends mpiexec over mpirun. Some MPI implementations may have minor
differences between the two, but under OpenMPI 5.0 they are synonyms that
produce the same behavior.
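A quick way to confirm that MPI and pbdMPI work together is a one-line program
run directly from the command line (two ranks is an arbitrary choice):
$ mpiexec -np 2 Rscript -e 'library(pbdMPI); comm.print(comm.rank(), all.rank = TRUE); finalize()'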
The package source files provide several examples based on pbdMPI,
such as
Directory                               | Examples
----------------------------------------|------------------------
pbdMPI/inst/examples/test_spmd/         | main SPMD functions
pbdMPI/inst/examples/test_rmpi/         | analogues to Rmpi
pbdMPI/inst/examples/test_parallel/     | analogues to parallel
pbdMPI/inst/examples/test_performance/  | performance tests
pbdMPI/inst/examples/test_s4/           | S4 extension
pbdMPI/inst/examples/test_cs/           | client/server examples
pbdMPI/inst/examples/test_long_vector/  | long vector examples
where running the test_long_vector examples requires recompiling the package
with
#define MPI_LONG_DEBUG 1
set in pbdMPI/src/pkg_constant.h.
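After installation, the inst/examples directories are copied into the
installed package, so their location can be found from R and any of the
scripts can be launched with mpiexec as above. In the commands below,
some_example.r is a placeholder for a script in that directory:
> system.file("examples", "test_spmd", package = "pbdMPI")
$ mpiexec -np 2 Rscript <path printed above>/some_example.r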
The current version is mainly written and tested under OpenMPI environments on
Linux systems (CentOS 7, RHEL 8, Xubuntu). It is also tested on macOS with
Homebrew-installed OpenMPI and under MPICH2 environments on Windows, although
the primary target systems are HPC clusters running Linux.