
Note: a newer version (3.1.2) of this package is available.

SparkR (version 2.3.0)

R Frontend for Apache Spark

Description

Provides an R Frontend for Apache Spark.

Install: install.packages('SparkR')

Monthly Downloads: 119

Version: 2.3.0

License: Apache License (== 2.0)

Maintainer: Shivaram Venkataraman

Last Published: March 3rd, 2018
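Before the function index, a minimal quick-start sketch showing a typical SparkR workflow: start a session, push an R data.frame to Spark, and collect results back. This assumes a local Spark installation (see install.spark in the index below); the `master` and `appName` values are illustrative.

```r
library(SparkR)

# Start (or reuse) a SparkSession on the local machine; "local[*]" uses all cores.
sparkR.session(master = "local[*]", appName = "quickstart")

# Convert an R data.frame into a distributed SparkDataFrame.
df <- createDataFrame(mtcars)
printSchema(df)

# collect() brings all rows back to the driver as an R data.frame.
head(collect(df))

# Stop the session when done.
sparkR.session.stop()
```

Most functions in the index below operate on the SparkDataFrame created this way rather than on plain R data.frames.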

Functions in SparkR (2.3.0)

BisectingKMeansModel-class

S4 class that represents a BisectingKMeansModel
arrange

Arrange Rows by Variables
as.data.frame

Download data from a SparkDataFrame into an R data.frame
cast

Casts the column to a different data type.
checkpoint

checkpoint
coalesce

Coalesce
collect

Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
column_window_functions

Window functions for Column operations
asc

A set of operations working with SparkDataFrame columns
describe

describe
dim

Returns the dimensions of SparkDataFrame
except

except
explain

Explain
filter

Filter
isLocal

isLocal
first

Return the first row of a SparkDataFrame
localCheckpoint

localCheckpoint
isStreaming

isStreaming
%in%

Match a column with given values.
otherwise

otherwise
orderBy

Ordering Columns in a WindowSpec
print.jobj

Print a JVM object reference.
predict

Makes predictions from a MLlib model
read.jdbc

Create a SparkDataFrame representing the database table accessible via JDBC URL
read.json

Create a SparkDataFrame from a JSON file.
sampleBy

Returns a stratified sample without replacement
saveAsTable

Save the contents of the SparkDataFrame to a data source as a table
setJobGroup

Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
setLocalProperty

Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
spark.getSparkFiles

Get the absolute path of a file added through spark.addFile.
spark.getSparkFilesRootDirectory

Get the root directory that contains files added through spark.addFile.
spark.kmeans

K-Means Clustering Model
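As a hedged sketch of the MLlib modeling entries such as spark.kmeans: fit on a SparkDataFrame with an R formula, then use summary and predict. This assumes a running SparkSession; note that SparkR replaces '.' in column names with '_' (so iris's Sepal.Length becomes Sepal_Length).

```r
library(SparkR)
sparkR.session(master = "local[*]")

# iris's dotted column names are converted to underscores on import.
df <- createDataFrame(iris)

# Fit k-means with 3 clusters using a formula interface.
model <- spark.kmeans(df, Sepal_Length ~ Sepal_Width, k = 3)
summary(model)

# predict() appends a prediction column with the cluster assignment.
head(predict(model, df))

sparkR.session.stop()
```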
spark.kstest

(One-Sample) Kolmogorov-Smirnov Test
spark.survreg

Accelerated Failure Time (AFT) Survival Regression Model
spark.svmLinear

Linear SVM Model
sparkRSQL.init

(Deprecated) Initialize a new SQLContext
sql

SQL Query
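The sql entry pairs with createOrReplaceTempView: register a SparkDataFrame under a view name, then query it with SQL. A minimal sketch, assuming a running SparkSession:

```r
library(SparkR)
sparkR.session(master = "local[*]")

df <- createDataFrame(faithful)

# Register the SparkDataFrame as a temporary view for SQL access.
createOrReplaceTempView(df, "faithful")

# sql() returns a new SparkDataFrame; computation is lazy until collected.
long_waits <- sql("SELECT * FROM faithful WHERE waiting > 50")
head(long_waits)

sparkR.session.stop()
```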
structType

structType
LDAModel-class

S4 class that represents an LDAModel
RandomForestRegressionModel-class

S4 class that represents a RandomForestRegressionModel
GBTClassificationModel-class

S4 class that represents a GBTClassificationModel
GBTRegressionModel-class

S4 class that represents a GBTRegressionModel
LinearSVCModel-class

S4 class that represents a LinearSVCModel
SparkDataFrame-class

S4 class that represents a SparkDataFrame
subset

Subset
unionByName

Return a new SparkDataFrame containing the union of rows, matched by column names
clearCache

Clear Cache
write.jdbc

Save the content of SparkDataFrame to an external database table via JDBC.
write.json

Save the contents of SparkDataFrame as a JSON file
unpersist

Unpersist
clearJobGroup

Clear current job group ID and its description
column_misc_functions

Miscellaneous functions for Column operations
column_math_functions

Math functions for Column operations
createDataFrame

Create a SparkDataFrame
createExternalTable

(Deprecated) Create an external table
crosstab

Computes a pairwise frequency table of the given columns
crossJoin

CrossJoin
endsWith

endsWith
%<=>%

%<=>%
gapply

gapply
hashCode

Compute the hashCode of an object
gapplyCollect

gapplyCollect
head

Head
lastProgress

lastProgress
limit

Limit
merge

Merges two data frames
mutate

Mutate
rangeBetween

rangeBetween
randomSplit

randomSplit
read.text

Create a SparkDataFrame from a text file.
recoverPartitions

Recovers all the partitions in the directory of a table and updates the catalog
rowsBetween

rowsBetween
sample

Sample
schema

Get schema object
select

Select
spark.gaussianMixture

Multivariate Gaussian Mixture Model (GMM)
spark.gbt

Gradient Boosted Tree Model for Regression and Classification
spark.logit

Logistic Regression Model
DecisionTreeRegressionModel-class

S4 class that represents a DecisionTreeRegressionModel
KMeansModel-class

S4 class that represents a KMeansModel
FPGrowthModel-class

S4 class that represents an FPGrowthModel
spark.mlp

Multilayer Perceptron Classification Model
KSTest-class

S4 class that represents a KSTest
AFTSurvivalRegressionModel-class

S4 class that represents an AFTSurvivalRegressionModel
GaussianMixtureModel-class

S4 class that represents a GaussianMixtureModel
attach,SparkDataFrame-method

Attach SparkDataFrame to R search path
ALSModel-class

S4 class that represents an ALSModel
avg

avg
GroupedData-class

S4 class that represents a GroupedData
IsotonicRegressionModel-class

S4 class that represents an IsotonicRegressionModel
awaitTermination

awaitTermination
sparkR.newJObject

Create Java Objects
StreamingQuery-class

S4 class that represents a StreamingQuery
GeneralizedLinearRegressionModel-class

S4 class that represents a generalized linear model
WindowSpec-class

S4 class that represents a WindowSpec
column_aggregate_functions

Aggregate functions for Column operations
between

between
sparkR.session

Get the existing SparkSession or initialize a new SparkSession.
startsWith

startsWith
cacheTable

Cache Table
take

Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
status

status
column_collection_functions

Collection functions for Column operations
cancelJobGroup

Cancel active jobs for the specified group
LogisticRegressionModel-class

S4 class that represents a LogisticRegressionModel
toJSON

toJSON
MultilayerPerceptronClassificationModel-class

S4 class that represents a MultilayerPerceptronClassificationModel
createTable

Creates a table based on the dataset in a data source
createOrReplaceTempView

Creates a temporary view using the given name.
corr

corr
dropTempView

Drops the temporary view with the given view name in the catalog.
colnames

Column Names of SparkDataFrame
approxQuantile

Calculates the approximate quantiles of numerical columns of a SparkDataFrame
alias

alias
coltypes

coltypes
broadcast

broadcast
column

S4 class that represents a SparkDataFrame column
dtypes

DataTypes
cache

Cache
column_nonaggregate_functions

Non-aggregate functions for Column operations
fitted

Get fitted result from a k-means model
write.df

Save the contents of SparkDataFrame to a data source.
withWatermark

withWatermark
write.ml

Saves the MLlib model to the input path
hint

hint
histogram

Compute histogram statistics for given column
freqItems

Finding frequent items for columns, possibly with false positives
write.orc

Save the contents of SparkDataFrame as an ORC file, preserving the schema.
column_datetime_diff_functions

Date time arithmetic functions for Column operations
join

Join
column_datetime_functions

Date time functions for Column operations
last

last
count

Count
column_string_functions

String functions for Column operations
cov

cov
not

!
cube

cube
dapply

dapply
currentDatabase

Returns the current default database
nrow

Returns the number of rows in a SparkDataFrame
dropTempTable

(Deprecated) Drop Temporary Table
dropDuplicates

dropDuplicates
printSchema

Print Schema of a SparkDataFrame
dapplyCollect

dapplyCollect
glm,formula,ANY,SparkDataFrame-method

Generalized Linear Models (R-compliant)
queryName

queryName
insertInto

insertInto
group_by

GroupBy
distinct

Distinct
read.ml

Load a fitted MLlib model from the input path.
read.orc

Create a SparkDataFrame from an ORC file.
listColumns

Returns a list of columns for the given table/view in the specified database
install.spark

Download and Install Apache Spark to a Local Directory
listDatabases

Returns a list of databases available
persist

Persist
ncol

Returns the number of columns in a SparkDataFrame
pivot

Pivot a column of the GroupedData and perform the specified aggregation.
dropna

A set of SparkDataFrame functions working with NA values
getLocalProperty

Get a local property set in this thread, or NULL if it is missing. See setLocalProperty.
listFunctions

Returns a list of functions registered in the specified database
isActive

isActive
drop

drop
read.parquet

Create a SparkDataFrame from a Parquet file.
getNumPartitions

getNumPartitions
intersect

Intersect
repartition

Repartition
listTables

Returns a list of tables or views in the specified database
read.stream

Load a streaming SparkDataFrame
selectExpr

SelectExpr
rollup

rollup
over

over
partitionBy

partitionBy
setCheckpointDir

Set checkpoint directory
print.structType

Print a Spark StructType.
print.structField

Print a Spark StructField.
rbind

Union two or more SparkDataFrames
read.df

Load a SparkDataFrame
spark.als

Alternating Least Squares (ALS) for Collaborative Filtering
spark.bisectingKmeans

Bisecting K-Means Clustering Model
spark.glm

Generalized Linear Models
refreshByPath

Invalidates and refreshes all the cached data and metadata for a SparkDataFrame containing the given path
refreshTable

Invalidates and refreshes all the cached data and metadata of the given table
setLogLevel

Set new log level
show

show
spark.decisionTree

Decision Tree Model for Regression and Classification
spark.fpGrowth

FP-growth
spark.lapply

Run a function over a list of elements, distributing the computations with Spark
spark.isoreg

Isotonic Regression Model
sparkR.conf

Get Runtime Config from the current active SparkSession
spark.lda

Latent Dirichlet Allocation
sparkR.callJMethod

Call Java Methods
sparkR.callJStatic

Call Static Java Methods
registerTempTable

(Deprecated) Register Temporary Table
sparkR.init

(Deprecated) Initialize a new Spark Context
stopQuery

stopQuery
storageLevel

StorageLevel
rename

rename
substr

substr
str

Compactly display the structure of a dataset
setCurrentDatabase

Sets the current default database
agg

summarize
setJobDescription

Set a human-readable description of the current job.
uncacheTable

Uncache Table
structField

structField
summary

summary
showDF

showDF
tableNames

Table Names
windowOrderBy

windowOrderBy
write.text

Save the content of SparkDataFrame in a text file at the specified path.
union

Return a new SparkDataFrame containing the union of rows
windowPartitionBy

windowPartitionBy
spark.addFile

Add a file or directory to be downloaded with this Spark job on every node.
spark.naiveBayes

Naive Bayes Models
spark.randomForest

Random Forest Model for Regression and Classification
sparkR.session.stop

Stop the Spark Session and Spark Context
sparkR.uiWebUrl

Get the URL of the SparkUI instance for the current active SparkSession
sparkR.version

Get version of Spark on which this application is running
sparkRHive.init

(Deprecated) Initialize a new HiveContext
tableToDF

Create a SparkDataFrame from a SparkSQL table or view
tables

Tables
with

Evaluate an R expression in an environment constructed from a SparkDataFrame
withColumn

WithColumn
write.parquet

Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
write.stream

Write the streaming SparkDataFrame to a data source.
RandomForestClassificationModel-class

S4 class that represents a RandomForestClassificationModel
DecisionTreeClassificationModel-class

S4 class that represents a DecisionTreeClassificationModel
NaiveBayesModel-class

S4 class that represents a NaiveBayesModel