scala: Create an Instance of an Embedded Scala Interpreter

Description

The function scala creates an instance of an embedded Scala interpreter/compiler and binds a Scala object named R to permit callbacks to R. Options are available to customize where Scala is found and how it is invoked (e.g., setting the classpath and maximum heap size). Multiple interpreters can be created and each runs independently with its own memory. Each interpreter can use multiple threads/cores, but the bridge between R and Scala is not thread-safe. As such, multiple R threads/cores should not simultaneously access the same interpreter.

The functions scalaInfo and .rscalaJar provide file paths to JAR files, installation directories, the Scala executable, and this package. Note that if you only want to embed R in a Scala application, you do not need to install the package. Simply add the following line to your SBT build.sbt file: ‘libraryDependencies += "org.ddahl" %% "rscala" % "_VERSION_"’, where _VERSION_ is the rscala version number (e.g., 2.3.5).

The function scalaInstall downloads and installs Scala in the “~/.rscala” directory under the user's home directory. System administrators can install Scala globally as described here: http://www.scala-lang.org/download/install.html. In short, download the archive, unpack it, and add the “scala” script to the path.

The function .rscalaPackage should be called in the .onLoad function of a package that wishes to depend on this package. The function should not be called elsewhere. It sets the classpath to the JAR files in the ‘java’ directory of the package and passes the ... arguments to the scala function. This instance of Scala is available as the object s in the namespace of the package (thereby making it available to the package's functions), but it is not exported from the namespace. The object s is only initialized on its first use. The function .rscalaPackageUnload should be called in the .onUnload function of a package that wishes to depend on this package so that close(s) is called (if needed).
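As a minimal sketch of the hooks described above, a dependent package might place something like the following in one of its R source files. The file name zzz.R is only a common convention, and the sketch assumes the package lists rscala among its imports; only .rscalaPackage and .rscalaPackageUnload come from this page.

```r
# zzz.R of a hypothetical package that depends on rscala.

.onLoad <- function(libname, pkgname) {
  # Puts the JAR files from this package's 'java' directory on the classpath
  # and makes the (lazily initialized) interpreter available as 's' in the
  # package namespace. Extra arguments would be passed on to scala().
  rscala::.rscalaPackage(pkgname)
}

.onUnload <- function(libpath) {
  # Ensures close(s) is called if the interpreter was ever started.
  rscala::.rscalaPackageUnload()
}
```

Functions inside the package can then refer to s directly, for example with s %~% '...', without creating their own interpreter.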

Usage

scala(classpath=character(), classpath.packages=character(),
      serialize.output=.Platform$OS.type=="windows", scala.home=NULL,
      heap.maximum=NULL, command.line.options=NULL, row.major=TRUE,
      timeout=60, debug=FALSE, stdout=TRUE, stderr=TRUE, port=0,
      scalaInfo=NULL, major.release=c("2.10","2.11","2.12"))
scalaInfo(scala.home=NULL, major.release=c("2.10","2.11","2.12"),
          verbose=FALSE)
scalaInstall(major.release=c("2.10","2.11","2.12"))
.rscalaJar(major.release=c("2.10","2.11","2.12"))
.rscalaPackage(pkgname, snippet=character(), classpath.packages=character(),
               classpath.prepend=character(), classpath.append=character(),
               major.release=c("2.10","2.11","2.12"), ...)
.rscalaPackageUnload()
.rscalaDelay(expression)

Arguments

classpath

A character vector whose elements are paths to JAR files or directories which specify the classpath for the Scala compiler/interpreter.

classpath.packages

A character vector giving names of other installed packages whose JAR files should be appended to the classpath.

serialize.output

Should standard output (stdout) and standard error (stderr) be captured and serialized back to R? The default is TRUE on Windows and FALSE on other operating systems. FALSE requires less computation, and TRUE is usually not necessary on Linux and Mac OS X. Depending on the environment and operating system in which R is run, however, TRUE may be needed to see output and error messages.

scala.home

A character vector of length one giving the path where Scala is installed. When set to NULL (the default), the function sequentially tries to find the Scala home by: i. querying the global option rscala.scala.home, ii. using the environment variable SCALA_HOME, iii. querying the operating system search path, and iv. looking in subdirectories of ~/.rscala. If all of these fail, the function displays a message to help the user install Scala. Alternatively, in the case of scalaInfo, scala.home may also be the result of the scala function.
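The search order above can be illustrated with a small base-R sketch. The helper find_scala_home is hypothetical and is not the package's implementation; in particular, it assumes the Scala home is the directory two levels above the scala executable found on the search path, and it simply takes the first subdirectory of ~/.rscala.

```r
# Hypothetical helper mirroring the documented search order (not rscala code).
find_scala_home <- function() {
  # i. Global option
  home <- getOption("rscala.scala.home")
  if (!is.null(home)) return(home)

  # ii. Environment variable
  home <- Sys.getenv("SCALA_HOME", unset = NA)
  if (!is.na(home) && nzchar(home)) return(home)

  # iii. Operating system search path (assumes <home>/bin/scala layout)
  exe <- unname(Sys.which("scala"))
  if (nzchar(exe)) return(dirname(dirname(normalizePath(exe))))

  # iv. Subdirectories of ~/.rscala (assumption: take the first one found)
  candidates <- list.dirs(path.expand("~/.rscala"), recursive = FALSE)
  if (length(candidates) > 0) return(candidates[1])

  NULL  # Nothing found; the real function would display installation help.
}
```

Setting the option, for example with options(rscala.scala.home="/path/to/scala"), therefore takes precedence over every other location.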

heap.maximum

A character vector of length one used to specify the maximum heap size in the JVM. If NULL, the global option rscala.heap.maximum is queried and, if that is also NULL, Scala's default value is used. This option is ignored if command.line.options is not NULL.

command.line.options

A character vector whose elements are passed as command line arguments when invoking Scala. If NULL, the global option rscala.command.line.options is queried and, if that is also NULL, the value is set to NULL. A value of NULL means no extra arguments are provided. If you simply want to add to the classpath and/or set the maximum heap size, use the classpath and heap.maximum arguments.
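The interplay between the heap.maximum argument and the rscala.heap.maximum global option might look like the sketch below. The size strings "2G" and "4G" are assumptions (JVM-style sizes), and the call to scala is guarded so that the snippet does nothing beyond setting the option when rscala is not installed.

```r
# Session-wide default, consulted only when heap.maximum is NULL.
options(rscala.heap.maximum = "2G")

if (requireNamespace("rscala", quietly = TRUE)) {
  # An explicit argument is used instead of the global option.
  s <- rscala::scala(heap.maximum = "4G")
  close(s)
}
```

Because heap.maximum is ignored whenever command.line.options is not NULL, the two should not be combined in the same call.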

row.major

Should matrices in Scala be row major?

timeout

A numeric vector of length one giving the number of seconds to wait for Scala to start before aborting. The default value is 60 seconds.

debug

An option meant only for developers of the package itself and not intended for users of the package.

stdout, stderr

Where standard output and standard error results that are not serialized should be sent. TRUE (the default) or "" sends output to the R console (although that may not work on Windows). FALSE or NULL discards the output. Otherwise, this is the name of the file that receives the output.

port

If 0, two random ports are selected. Otherwise, port and port+1 are used for the TCP/IP connections.

scalaInfo

The result of a previous call to scalaInfo.

verbose

A logical vector of length one indicating whether information regarding the search for the Scala installation should be displayed.

major.release

A character vector giving acceptable major release numbers (e.g., c("2.10","2.11","2.12")), or NA, in which case the system picks the appropriate version.

pkgname

A character string giving the name of the package (as provided by the second argument of the .onLoad function) that wishes to depend on this package.

snippet

A character vector providing Scala code that will be evaluated when the interpreter in the package namespace is first used.

classpath.prepend

A character vector giving the full path to JAR files to add to the beginning of the classpath for the Scala compiler/interpreter embedded within a package via the .rscalaPackage function. The JAR files in the package's ‘java’ directory are already included and do not need to be added here.

classpath.append

A character vector giving the full path to JAR files to add to the end of the classpath for the Scala compiler/interpreter embedded within a package via the .rscalaPackage function. The JAR files in the package's ‘java’ directory are already included and do not need to be added here.

...

These arguments are passed by the .rscalaPackage function to the scala function.

expression

(.rscalaDelay is deprecated.) An expression that will be evaluated when the .rscalaPackage function runs.

Value

scala returns an R object representing an embedded Scala interpreter.

scalaInfo returns a list detailing the Scala executable, version, jars, etc.

Examples

# NOT RUN {
# Uncomment the next line to download and install Scala
# scalaInstall()

.rscalaJar()
scalaInfo(verbose=TRUE)

# }
# NOT RUN {
# Make an instance of the Scala interpreter and see how its output is captured.
s <- scala(serialize.output=TRUE)
capture.output(s %~% 'println("This is Scala "+scala.util.Properties.versionString)')
scalaSettings(s)

# Demonstrate convenient notation and string interpolation
stringFromScala <- s %~% '"Hello @{Sys.getenv("USER")} from @{R.Version()$nickname}" + "!"*10'
stringFromScala

# Set and get variables
s$rPi <- pi
s$rPi
s$val("rPi")
s$.val("rPi")

s$rPi <- I(pi)     # Now rPi is an array of length one.
s$rPi              # It doesn't matter to R...
s$.val("rPi")      # ... but it does to Scala.

# Convenient notation
a1 <- s %~%  "rPi(0)/2"   # As an R value
a2 <- s %.~% "rPi(0)/2"   # As a Scala reference

# References can be set
s$foo <- a2
s$foo

# Instantiate an object
seed <- 2349234L
rng <- s$.scala.util.Random$new(seed)  # Scala equivalent: new scala.util.Random(seed)

# Call method of a reference
system.time(rng$nextInt(100L))   # Scala equivalent: rng.nextInt(100)
system.time(rng$nextInt(100L))   # Notice it runs much faster the second time due to caching

rInt <- rng$nextInt(100L,.EVALUATE=FALSE)  # Define function to call quickly later without ...
rInt(100)                                     # ... needing to protect scalars and ensure type.

# Call method of companion object and call methods of a reference
# Scala equivalent: (scala.math.BigInt("777",8) - 500).intValue
s$.scala.math.BigInt$apply("777",8L)$'-'(500L)$intValue()

# Example showing callback functionality
f <- function(func=NULL, data=numeric(), quiet=TRUE) s %!% '
  if ( ! quiet ) println("Here I am in Scala.")
  R.invokeD1(func, data.map(2*_), "verbose" -> !quiet ).sum
'

cube <- function(x, ignored.argument, verbose=TRUE) {
  if ( verbose ) cat("Here I am in R.\n")
  x^3
}

identical( f(cube,1:4,FALSE), sum((2*(1:4))^3) )
identical( f(cube,1:4,TRUE),  sum((2*(1:4))^3) )

# Longer example showing that '%!%' is more flexible than '%~%'
drawGaussian <- function(mean=0.0, sd=1.0, rng=scalaNull("scala.util.Random")) s %!% '
  mean+sd*rng.nextGaussian
'
drawGaussian(3,0.1,rng)  # No scalar protection or casting is needed.
n.draws <- 100
s$random <- rng
system.time({
  draws <- s %~% '
    val result = new Array[Double](@{n.draws})
    result(0) = random.nextGaussian
    for ( i <- 1 until @{n.draws} ) {
      result(i) = 0.5*result(i-1) + random.nextGaussian
    }
    result
  '
  acf(draws,plot=FALSE)
})
sampler <- function(nDraws=1L, rho=0.0, rng=scalaNull("scala.util.Random")) s %!% '
  val result = new Array[Double](nDraws)
  result(0) = rng.nextGaussian
  for ( i <- 1 until nDraws ) {
    result(i) = rho*result(i-1) + rng.nextGaussian
  }
  result
'
system.time(acf(sampler(n.draws,0.5,rng),plot=FALSE))
system.time(acf(sampler(n.draws,0.5,rng),plot=FALSE))
close(s)
# }
