
Note: a newer version (3.1.2) of this package is available.

SparkR (version 2.3.0)

R Frontend for Apache Spark

Description

Provides an R Frontend for Apache Spark.

Install: install.packages('SparkR')

Monthly Downloads: 119

Version: 2.3.0

License: Apache License (== 2.0)

Maintainer: Shivaram Venkataraman

Last Published: March 3rd, 2018
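Before the function index, a minimal quick-start sketch showing a typical SparkR workflow: start a session, push an R data.frame to Spark, and collect results back. This assumes a local Spark installation (see install.spark in the index below); the `master` and `appName` values are illustrative.

```r
library(SparkR)

# Start (or reuse) a SparkSession on the local machine; "local[*]" uses all cores.
sparkR.session(master = "local[*]", appName = "quickstart")

# Convert an R data.frame into a distributed SparkDataFrame.
df <- createDataFrame(mtcars)
printSchema(df)

# collect() brings all rows back to the driver as an R data.frame.
head(collect(df))

# Stop the session when done.
sparkR.session.stop()
```

Most functions in the index below operate on the SparkDataFrame created this way rather than on plain R data.frames.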

Functions in SparkR (2.3.0)

BisectingKMeansModel-class

S4 class that represents a BisectingKMeansModel
arrange

Arrange Rows by Variables
as.data.frame

Download data from a SparkDataFrame into an R data.frame
cast

Casts the column to a different data type.
checkpoint

checkpoint
coalesce

Coalesce
collect

Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
column_window_functions

Window functions for Column operations
asc

A set of operations working with SparkDataFrame columns
describe

describe
dim

Returns the dimensions of SparkDataFrame
except

except
explain

Explain
filter

Filter
isLocal

isLocal
first

Return the first row of a SparkDataFrame
localCheckpoint

localCheckpoint
isStreaming

isStreaming
%in%

Match a column with given values.
otherwise

otherwise
orderBy

Ordering Columns in a WindowSpec
print.jobj

Print a JVM object reference.
predict

Makes predictions from a MLlib model
read.jdbc

Create a SparkDataFrame representing the database table accessible via JDBC URL
read.json

Create a SparkDataFrame from a JSON file.
sampleBy

Returns a stratified sample without replacement
saveAsTable

Save the contents of the SparkDataFrame to a data source as a table
setJobGroup

Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
setLocalProperty

Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
spark.getSparkFiles

Get the absolute path of a file added through spark.addFile.
spark.getSparkFilesRootDirectory

Get the root directory that contains files added through spark.addFile.
spark.kmeans

K-Means Clustering Model
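As a hedged sketch of the MLlib modeling entries such as spark.kmeans: fit on a SparkDataFrame with an R formula, then use summary and predict. This assumes a running SparkSession; note that SparkR replaces '.' in column names with '_' (so iris's Sepal.Length becomes Sepal_Length).

```r
library(SparkR)
sparkR.session(master = "local[*]")

# iris's dotted column names are converted to underscores on import.
df <- createDataFrame(iris)

# Fit k-means with 3 clusters using a formula interface.
model <- spark.kmeans(df, Sepal_Length ~ Sepal_Width, k = 3)
summary(model)

# predict() appends a prediction column with the cluster assignment.
head(predict(model, df))

sparkR.session.stop()
```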
spark.kstest

(One-Sample) Kolmogorov-Smirnov Test
spark.survreg

Accelerated Failure Time (AFT) Survival Regression Model
spark.svmLinear

Linear SVM Model
sparkRSQL.init

(Deprecated) Initialize a new SQLContext
sql

SQL Query
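The sql entry pairs with createOrReplaceTempView: register a SparkDataFrame under a view name, then query it with SQL. A minimal sketch, assuming a running SparkSession:

```r
library(SparkR)
sparkR.session(master = "local[*]")

df <- createDataFrame(faithful)

# Register the SparkDataFrame as a temporary view for SQL access.
createOrReplaceTempView(df, "faithful")

# sql() returns a new SparkDataFrame; computation is lazy until collected.
long_waits <- sql("SELECT * FROM faithful WHERE waiting > 50")
head(long_waits)

sparkR.session.stop()
```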
structType

structType
LDAModel-class

S4 class that represents an LDAModel
RandomForestRegressionModel-class

S4 class that represents a RandomForestRegressionModel
GBTClassificationModel-class

S4 class that represents a GBTClassificationModel
GBTRegressionModel-class

S4 class that represents a GBTRegressionModel
LinearSVCModel-class

S4 class that represents a LinearSVCModel
SparkDataFrame-class

S4 class that represents a SparkDataFrame
subset

Subset
unionByName

Return a new SparkDataFrame containing the union of rows, matched by column names
clearCache

Clear Cache
write.jdbc

Save the content of SparkDataFrame to an external database table via JDBC.
write.json

Save the contents of SparkDataFrame as a JSON file
unpersist

Unpersist
clearJobGroup

Clear current job group ID and its description
column_misc_functions

Miscellaneous functions for Column operations
column_math_functions

Math functions for Column operations
createDataFrame

Create a SparkDataFrame
createExternalTable

(Deprecated) Create an external table
crosstab

Computes a pairwise frequency table of the given columns
crossJoin

CrossJoin
endsWith

endsWith
%<=>%

%<=>%
gapply

gapply
hashCode

Compute the hashCode of an object
gapplyCollect

gapplyCollect
head

Head
lastProgress

lastProgress
limit

Limit
merge

Merges two data frames
mutate

Mutate
rangeBetween

rangeBetween
randomSplit

randomSplit
read.text

Create a SparkDataFrame from a text file.
recoverPartitions

Recovers all the partitions in the directory of a table and updates the catalog
rowsBetween

rowsBetween
sample

Sample
schema

Get schema object
select

Select
spark.gaussianMixture

Multivariate Gaussian Mixture Model (GMM)
spark.gbt

Gradient Boosted Tree Model for Regression and Classification
spark.logit

Logistic Regression Model
DecisionTreeRegressionModel-class

S4 class that represents a DecisionTreeRegressionModel
KMeansModel-class

S4 class that represents a KMeansModel
FPGrowthModel-class

S4 class that represents an FPGrowthModel
spark.mlp

Multilayer Perceptron Classification Model
KSTest-class

S4 class that represents a KSTest
AFTSurvivalRegressionModel-class

S4 class that represents an AFTSurvivalRegressionModel
GaussianMixtureModel-class

S4 class that represents a GaussianMixtureModel
attach,SparkDataFrame-method

Attach SparkDataFrame to R search path
ALSModel-class

S4 class that represents an ALSModel
avg

avg
GroupedData-class

S4 class that represents a GroupedData
IsotonicRegressionModel-class

S4 class that represents an IsotonicRegressionModel
awaitTermination

awaitTermination
sparkR.newJObject

Create Java Objects
StreamingQuery-class

S4 class that represents a StreamingQuery
GeneralizedLinearRegressionModel-class

S4 class that represents a generalized linear model
WindowSpec-class

S4 class that represents a WindowSpec
column_aggregate_functions

Aggregate functions for Column operations
between

between
sparkR.session

Get the existing SparkSession or initialize a new SparkSession.
startsWith

startsWith
cacheTable

Cache Table
take

Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
status

status
column_collection_functions

Collection functions for Column operations
cancelJobGroup

Cancel active jobs for the specified group
LogisticRegressionModel-class

S4 class that represents a LogisticRegressionModel
toJSON

toJSON
MultilayerPerceptronClassificationModel-class

S4 class that represents a MultilayerPerceptronClassificationModel
createTable

Creates a table based on the dataset in a data source
createOrReplaceTempView

Creates a temporary view using the given name.
corr

corr
dropTempView

Drops the temporary view with the given view name in the catalog.
colnames

Column Names of SparkDataFrame
approxQuantile

Calculates the approximate quantiles of numerical columns of a SparkDataFrame
alias

alias
coltypes

coltypes
broadcast

broadcast
column

S4 class that represents a SparkDataFrame column
dtypes

DataTypes
cache

Cache
column_nonaggregate_functions

Non-aggregate functions for Column operations
fitted

Get fitted result from a k-means model
write.df

Save the contents of SparkDataFrame to a data source.
withWatermark

withWatermark
write.ml

Saves the MLlib model to the input path
hint

hint
histogram

Compute histogram statistics for given column
freqItems

Finding frequent items for columns, possibly with false positives
write.orc

Save the contents of SparkDataFrame as an ORC file, preserving the schema.
column_datetime_diff_functions

Date time arithmetic functions for Column operations
join

Join
column_datetime_functions

Date time functions for Column operations
last

last
count

Count
column_string_functions

String functions for Column operations
cov

cov
not

!
cube

cube
dapply

dapply
currentDatabase

Returns the current default database
nrow

Returns the number of rows in a SparkDataFrame
dropTempTable

(Deprecated) Drop Temporary Table
dropDuplicates

dropDuplicates
printSchema

Print Schema of a SparkDataFrame
dapplyCollect

dapplyCollect
glm,formula,ANY,SparkDataFrame-method

Generalized Linear Models (R-compliant)
queryName

queryName
insertInto

insertInto
group_by

GroupBy
distinct

Distinct
read.ml

Load a fitted MLlib model from the input path.
read.orc

Create a SparkDataFrame from an ORC file.
listColumns

Returns a list of columns for the given table/view in the specified database
install.spark

Download and Install Apache Spark to a Local Directory
listDatabases

Returns a list of databases available
persist

Persist
ncol

Returns the number of columns in a SparkDataFrame
pivot

Pivot a column of the GroupedData and perform the specified aggregation.
dropna

A set of SparkDataFrame functions working with NA values
getLocalProperty

Get a local property set in this thread, or NULL if it is missing. See setLocalProperty.
listFunctions

Returns a list of functions registered in the specified database
isActive

isActive
drop

drop
read.parquet

Create a SparkDataFrame from a Parquet file.
getNumPartitions

getNumPartitions
intersect

Intersect
repartition

Repartition
listTables

Returns a list of tables or views in the specified database
read.stream

Load a streaming SparkDataFrame
selectExpr

SelectExpr
rollup

rollup
over

over
partitionBy

partitionBy
setCheckpointDir

Set checkpoint directory
print.structType

Print a Spark StructType.
print.structField

Print a Spark StructField.
rbind

Union two or more SparkDataFrames
read.df

Load a SparkDataFrame
spark.als

Alternating Least Squares (ALS) for Collaborative Filtering
spark.bisectingKmeans

Bisecting K-Means Clustering Model
spark.glm

Generalized Linear Models
refreshByPath

Invalidates and refreshes all the cached data and metadata for a SparkDataFrame containing the given path
refreshTable

Invalidates and refreshes all the cached data and metadata of the given table
setLogLevel

Set new log level
show

show
spark.decisionTree

Decision Tree Model for Regression and Classification
spark.fpGrowth

FP-growth
spark.lapply

Run a function over a list of elements, distributing the computations with Spark
spark.isoreg

Isotonic Regression Model
sparkR.conf

Get Runtime Config from the current active SparkSession
spark.lda

Latent Dirichlet Allocation
sparkR.callJMethod

Call Java Methods
sparkR.callJStatic

Call Static Java Methods
registerTempTable

(Deprecated) Register Temporary Table
sparkR.init

(Deprecated) Initialize a new Spark Context
stopQuery

stopQuery
storageLevel

StorageLevel
rename

rename
substr

substr
str

Compactly display the structure of a dataset
setCurrentDatabase

Sets the current default database
agg

summarize
setJobDescription

Set a human-readable description of the current job.
uncacheTable

Uncache Table
structField

structField
summary

summary
showDF

showDF
tableNames

Table Names
windowOrderBy

windowOrderBy
write.text

Save the content of SparkDataFrame in a text file at the specified path.
union

Return a new SparkDataFrame containing the union of rows
windowPartitionBy

windowPartitionBy
spark.addFile

Add a file or directory to be downloaded with this Spark job on every node.
spark.naiveBayes

Naive Bayes Models
spark.randomForest

Random Forest Model for Regression and Classification
sparkR.session.stop

Stop the Spark Session and Spark Context
sparkR.uiWebUrl

Get the URL of the SparkUI instance for the current active SparkSession
sparkR.version

Get version of Spark on which this application is running
sparkRHive.init

(Deprecated) Initialize a new HiveContext
tableToDF

Create a SparkDataFrame from a SparkSQL table or view
tables

Tables
with

Evaluate an R expression in an environment constructed from a SparkDataFrame
withColumn

WithColumn
write.parquet

Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
write.stream

Write the streaming SparkDataFrame to a data source.
RandomForestClassificationModel-class

S4 class that represents a RandomForestClassificationModel
DecisionTreeClassificationModel-class

S4 class that represents a DecisionTreeClassificationModel
NaiveBayesModel-class

S4 class that represents a NaiveBayesModel