The "dsm" in "Rdsm" stands for distributed shared memory, a term from the parallel processing community for systems in which the nodes of a cluster share (real or conceptual) memory. Rdsm is based on a similar package the author wrote for Perl some years ago (Matloff (2002)).
newbm() for creating
The code finds the product of matrices m1 and m2, placing the product in prd. The core lines of the code are
    myid <- myinfo$myid  # this thread's ID
    # determine number of columns of m1
    k <- if(class(m1) == "big.matrix") dim(m1)[2] else m1$size[2]
    nth <- myinfo$nclnt  # number of threads
    chunksize <- k/nth
    # determine which columns of m1 this thread will process
    firstcol <- 1 + (myid-1) * chunksize
    lastcol <- firstcol + chunksize - 1
    # process this thread's share of the columns
    prd[,firstcol:lastcol] <- m1[,] %*% m2[,firstcol:lastcol]
The work is parallelized by assigning each thread a certain set of columns of prd. Each thread computes its columns and places them in the proper section of prd. This is a classical shared-memory pattern: prd here is a shared variable, created beforehand via a call to cnewdsm() or newbm(), depending on the kind of shared variable used.
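The column partitioning described above can be checked without Rdsm at all. The following is only a sequential sketch in base R: the loop over myid stands in for the threads, prd is an ordinary matrix rather than a shared variable, and the sizes and the value of nth are made up for illustration. It assumes square matrices and that nth divides the number of columns evenly.

```r
# Sequential simulation of the column-partitioned product (no Rdsm needed).
nth <- 2                       # hypothetical number of threads
m1 <- matrix(1:16, nrow = 4)   # 4 x 4
m2 <- matrix(16:1, nrow = 4)   # 4 x 4
prd <- matrix(0, nrow = nrow(m1), ncol = ncol(m2))

k <- ncol(m1)
chunksize <- k / nth           # assumes nth divides k evenly

# Each pass of the loop plays the role of one thread.
for (myid in seq_len(nth)) {
  firstcol <- 1 + (myid - 1) * chunksize
  lastcol <- firstcol + chunksize - 1
  # this "thread" fills only its own block of columns of prd
  prd[, firstcol:lastcol] <- m1 %*% m2[, firstcol:lastcol, drop = FALSE]
}

# The assembled blocks should agree with the ordinary product.
ok <- isTRUE(all.equal(prd, m1 %*% m2))
```

Because no two threads write to the same columns, no locking is needed for this particular pattern.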
Other examples, including directions for running them, are given in
the
Suppose for instance we wish to copy x to y. In a message-passing setting, x and y may reside in processes 2 and 5, say. The programmer would write code (described here in pseudocode)
send x to process 5
to run on process 2, and write code
receive data item from process 2
set y to received item
to run on process 5. By contrast, in a shared-memory environment,
the programmer would merely write
y <- x
which is vastly simpler. (Brackets would be required, as
explained below.)
This also means that it is easy to convert sequential R code to parallel code.
Packages such as
serialize() and unserialize() are used for lists.
Manual operation:
To run
Then:
srvr() in your server window, with argument n, which is 2 by default.
init()
srvr() is still running; you do not need to rerun init() at the clients. Application-program
Automatic launching:
If you are running on a Unix-family system (Linux, Mac OS, or Cygwin
on Windows),
Then each time the user wishes to issue a command to all the clients,
say a command to run an
Here's a quick summary example of autolaunch. Say we wish to run two threads, with our application consisting of a function x() contained in the source code file y.R:
    alinit(2)  # create clients
    cmdtoclnts('source("y.R")')  # have clients source the app code
    go()  # set up server/client connections
    cmdtoclnts('x(3,100)')  # first run of app
    cmdtoclnts('x(12,5000)')  # second run of app
    ...
Here's what it does:
alinit() opens two other terminal windows, starts R in them, and loads the Rdsm package.
cmdtoclnts() then has the instances of R at the client windows load our application source file.
go() then starts srvr() in the server window and init() in each client window.
For example, suppose your program includes m, a 4x5 shared matrix variable. If you wished to fill the second column with 1, 2, 3 and 4, you would write
m[,2] <- 1:4
just as you would in ordinary R.
Note carefully that you must always use brackets with shared variables. For instance, to copy the shared vector x to an ordinary R variable y, write
y <- x[]
not
y <- x
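One way to see why the brackets matter is a toy model in which the R object is only a handle and the data live elsewhere. The sketch below is not Rdsm's implementation: the class toyshared, the constructor newtoy() and the environment store are all hypothetical names, and the environment merely plays the role that the server plays in Rdsm.

```r
# Toy stand-in for a shared variable: the data live in an environment
# (playing the role of the server); the R object is only a handle.
store <- new.env()

newtoy <- function(name, value) {
  assign(name, value, envir = store)
  structure(list(name = name), class = "toyshared")
}

# Reading with brackets fetches the data from the store.
"[.toyshared" <- function(x, i) {
  v <- get(unclass(x)$name, envir = store)
  if (missing(i)) v else v[i]
}

# Writing with brackets updates the store, not the handle.
"[<-.toyshared" <- function(x, i, value) {
  nm <- unclass(x)$name
  v <- get(nm, envir = store)
  if (missing(i)) v[] <- value else v[i] <- value
  assign(nm, v, envir = store)
  x
}

x <- newtoy("x", c(10, 20, 30))
y1 <- x[]    # an ordinary numeric vector holding the data
y2 <- x      # still just a handle of class "toyshared"
x[2] <- 99   # goes through the bracket method and changes the store
```

In this toy, y1 is a snapshot that later writes do not affect, while y2 continues to refer to the stored data, which is the distinction the bracket rule is guarding.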
myinfo, a list consisting of these components:
myid: the ID number of this client, starting with 1
nclnt: the total number of clients
barr(): barrier operation, synchs all threads to the same code line
lock(): lock operation, gives thread exclusive access to shared variables
unlock(): unlock operation, relinquishes exclusive access
wait(): wait operation
signal(): signal operation; releases all waiting clients
signal1(): same as signal(), but releases only the first waiting client
fa(): fetch-and-add operation
init(): initializes a client's connection to the server
srvr(): initializes the server
dsmexit(): can be called when a client has finished its work (note: this will stop the server when all clients make this call, and thus this function should not be used in most applications)
cnewdsm(): creates an
newbm(): creates a
The dsmv, dsmm and dsml, respectively. Indexing operations for these classes communicate with the server to read or write the desired objects.
See the big.matrix. Of course, a vector can be represented as a one-row matrix.
Again, all this is transparent to the programmer. However, as with
any system, a good understanding of the internals can result in your
writing much better code.
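As one illustration of the internals, consider the fetch-and-add entry in the function list above. In general, a fetch-and-add returns the current value of a shared counter and increments it in a single atomic step, which makes it a natural way to hand out work dynamically. The sketch below is sequential base R only: toyfa() and counter are hypothetical names rather than Rdsm's fa() or its arguments, and atomicity is trivial here because nothing runs concurrently.

```r
# Toy fetch-and-add on a counter held in an environment.
# toyfa() is a hypothetical helper, not Rdsm's fa().
counter <- new.env()
counter$val <- 1

toyfa <- function(inc) {
  old <- counter$val          # fetch the current value
  counter$val <- old + inc    # add the increment
  old                         # return the value seen before the add
}

# Two simulated workers take turns claiming task indices 1..5.
ntasks <- 5
claimed <- list(numeric(0), numeric(0))
worker <- 1
repeat {
  task <- toyfa(1)
  if (task > ntasks) break    # no tasks left
  claimed[[worker]] <- c(claimed[[worker]], task)
  worker <- 3 - worker        # alternate between worker 1 and 2
}
```

Each task index is claimed exactly once, whichever worker asks for it, which is the property a real fetch-and-add provides under concurrency.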
Hess, Matthias et al. (2003), Experiences Using OpenMP Based on Compiler Directive Software DSM on a PC Cluster, in OpenMP Shared Memory Parallel Programming: International Workshop on OpenMP Applications and Tools, Michael Voss (ed.), Springer, p. 216.
Matloff, Norman (2002), PerlDSM: A Distributed Shared Memory System for Perl, Proceedings of PDPTA 2002, 63-68.