install.spark downloads and installs Spark to a local directory if it is not found. If SPARK_HOME is set in the environment and that directory exists, it is returned instead. The Spark version installed is the same as the SparkR version. Users can specify the desired Hadoop version, the remote mirror site, and the local directory where the package is installed.
install.spark(
  hadoopVersion = "2.7",
  mirrorUrl = NULL,
  localDir = NULL,
  overwrite = FALSE
)
hadoopVersion: Version of Hadoop to install. Default is "2.7". It can take other version numbers in the format "x.y", where x and y are integers. If hadoopVersion = "without", the "Hadoop free" build is installed. See "Hadoop Free" Build for more information. Other patched version names can also be used, e.g. "cdh4".
mirrorUrl: Base URL of the repositories to use. The directory layout should follow that of the Apache mirrors.
localDir: A local directory where Spark is installed. The directory contains version-specific folders of Spark packages. Default is the path to the cache directory:
Mac OS X: ~/Library/Caches/spark
Unix: $XDG_CACHE_HOME if defined, otherwise ~/.cache/spark
Windows: %LOCALAPPDATA%\Apache\Spark\Cache
overwrite: If TRUE, download and overwrite the existing tar file in localDir and force a re-install of Spark (in case the local directory or file is corrupted).
Returns, invisibly, the local directory where Spark is found or installed.
The full URL of the remote file is inferred from mirrorUrl and hadoopVersion. mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder named after the Spark version (which corresponds to the SparkR version), and then the tar filename. The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from http://apache.osuosl.org is:
http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
For hadoopVersion = "without", [Hadoop version] in the filename is then without-hadoop.
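The URL inference described above can be sketched in a few lines of R. This is only an illustration, not SparkR's exported API: the helper names hadoop_version_name and spark_tarball_url are hypothetical, the sketch assumes mirrorUrl already points at the Spark folder (here http://apache.osuosl.org/spark), and it assumes that patched version names such as "cdh4" are used in the filename unchanged.

```r
# Hypothetical helpers for illustration only; they are not part of SparkR.

# Map a hadoopVersion argument to the [Hadoop version] part of the filename.
hadoop_version_name <- function(hadoopVersion) {
  if (hadoopVersion == "without") {
    "without-hadoop"
  } else if (grepl("^[0-9]+\\.[0-9]+$", hadoopVersion)) {
    paste0("hadoop", hadoopVersion)
  } else {
    # Assumption: patched names such as "cdh4" pass through unchanged.
    hadoopVersion
  }
}

# Build <mirrorUrl>/<spark version folder>/<tar filename>.
spark_tarball_url <- function(mirrorUrl, sparkVersion, hadoopVersion = "2.7") {
  version_dir <- paste0("spark-", sparkVersion)
  file_name <- paste0(version_dir, "-bin-",
                      hadoop_version_name(hadoopVersion), ".tgz")
  # Drop any trailing slash so the parts join with exactly one "/".
  paste(sub("/+$", "", mirrorUrl), version_dir, file_name, sep = "/")
}

url_27 <- spark_tarball_url("http://apache.osuosl.org/spark", "2.0.0", "2.7")
url_free <- spark_tarball_url("http://apache.osuosl.org/spark", "2.0.0", "without")
print(url_27)
print(url_free)
```

With the inputs shown, the first call reproduces the example path given above, and the second ends in spark-2.0.0-bin-without-hadoop.tgz.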
See available Hadoop versions: Apache Spark
# NOT RUN {
install.spark()
# }
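The SPARK_HOME check mentioned in the description can be approximated as follows. This is a minimal sketch under stated assumptions: existing_spark_home is a hypothetical helper, not a SparkR function, and it only covers the documented rule that a set SPARK_HOME pointing at an existing directory is returned without installing anything.

```r
# Hypothetical helper for illustration only; it is not part of SparkR.
# Returns the SPARK_HOME directory if the variable is set and the
# directory exists, otherwise NULL (meaning an install would be needed).
existing_spark_home <- function() {
  home <- Sys.getenv("SPARK_HOME", unset = NA)
  if (!is.na(home) && nzchar(home) && dir.exists(home)) home else NULL
}

# Demonstrate both outcomes, then restore the caller's environment.
old_home <- Sys.getenv("SPARK_HOME", unset = NA)

Sys.setenv(SPARK_HOME = tempdir())
found <- existing_spark_home()      # tempdir() exists, so it is returned

Sys.setenv(SPARK_HOME = file.path(tempdir(), "no-such-spark-dir"))
missing <- existing_spark_home()    # directory does not exist, so NULL

if (is.na(old_home)) Sys.unsetenv("SPARK_HOME") else Sys.setenv(SPARK_HOME = old_home)
```

When the helper returns NULL, a call such as install.spark(hadoopVersion = "without", overwrite = TRUE) would be the next step; that call is left out of the block because it requires SparkR and network access.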