SparkR (version 2.3.0)
R Frontend for Apache Spark
Description
Provides an R Frontend for Apache Spark.
Install
install.packages('SparkR')
Monthly Downloads
119
Version
2.3.0
License
Apache License (== 2.0)
Maintainer
Shivaram Venkataraman
Last Published
March 3rd, 2018
Functions in SparkR (2.3.0)
BisectingKMeansModel-class
S4 class that represents a BisectingKMeansModel
arrange
Arrange Rows by Variables
as.data.frame
Download data from a SparkDataFrame into an R data.frame
cast
Casts the column to a different data type.
checkpoint
checkpoint
coalesce
Coalesce
collect
Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
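A minimal sketch, assuming df is an existing SparkDataFrame:
# Pull the distributed rows back to the driver as a local R data.frame
localDf <- collect(df)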
column_window_functions
Window functions for Column operations
asc
A set of operations working with SparkDataFrame columns
describe
describe
dim
Returns the dimensions of SparkDataFrame
except
except
explain
Explain
filter
Filter
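A minimal sketch, assuming df was built from the built-in faithful dataset:
# Keep only rows with a waiting time below 50 minutes
head(filter(df, df$waiting < 50))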
isLocal
isLocal
first
Return the first row of a SparkDataFrame
localCheckpoint
localCheckpoint
isStreaming
isStreaming
%in%
Match a column with given values.
otherwise
otherwise
orderBy
Ordering Columns in a WindowSpec
print.jobj
Print a JVM object reference.
predict
Makes predictions from a MLlib model
read.jdbc
Create a SparkDataFrame representing the database table accessible via JDBC URL
read.json
Create a SparkDataFrame from a JSON file.
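A minimal sketch; the path is a placeholder for your own JSON file (one JSON object per line):
people <- read.json("path/to/people.json")
printSchema(people)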
sampleBy
Returns a stratified sample without replacement
saveAsTable
Save the contents of the SparkDataFrame to a data source as a table
setJobGroup
Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
setLocalProperty
Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
spark.getSparkFiles
Get the absolute path of a file added through spark.addFile.
spark.getSparkFilesRootDirectory
Get the root directory that contains files added through spark.addFile.
spark.kmeans
K-Means Clustering Model
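A minimal sketch: fit k-means on the built-in faithful dataset, where k = 2 is an illustrative choice:
df <- createDataFrame(faithful)
model <- spark.kmeans(df, ~ eruptions + waiting, k = 2)
summary(model)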
spark.kstest
(One-Sample) Kolmogorov-Smirnov Test
spark.survreg
Accelerated Failure Time (AFT) Survival Regression Model
spark.svmLinear
Linear SVM Model
sparkRSQL.init
(Deprecated) Initialize a new SQLContext
sql
SQL Query
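A minimal sketch, assuming df is a SparkDataFrame with name and age columns:
createOrReplaceTempView(df, "people")
teenagers <- sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")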
structType
structType
LDAModel-class
S4 class that represents an LDAModel
RandomForestRegressionModel-class
S4 class that represents a RandomForestRegressionModel
GBTClassificationModel-class
S4 class that represents a GBTClassificationModel
GBTRegressionModel-class
S4 class that represents a GBTRegressionModel
LinearSVCModel-class
S4 class that represents a LinearSVCModel
SparkDataFrame-class
S4 class that represents a SparkDataFrame
subset
Subset
unionByName
Return a new SparkDataFrame containing the union of rows, matched by column names
clearCache
Clear Cache
write.jdbc
Save the content of SparkDataFrame to an external database table via JDBC.
write.json
Save the contents of SparkDataFrame as a JSON file
unpersist
Unpersist
clearJobGroup
Clear current job group ID and its description
column_misc_functions
Miscellaneous functions for Column operations
column_math_functions
Math functions for Column operations
createDataFrame
Create a SparkDataFrame
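A minimal sketch, assuming an active SparkSession:
# Convert a local R data.frame (the built-in faithful dataset) into a SparkDataFrame
df <- createDataFrame(faithful)
head(df)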
createExternalTable
(Deprecated) Create an external table
crosstab
Computes a pair-wise frequency table of the given columns
crossJoin
CrossJoin
endsWith
endsWith
%<=>%
%<=>%
gapply
gapply
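A minimal sketch: group the faithful data by waiting time and compute the largest eruption per group; the schema describes the output of the applied function:
df <- createDataFrame(faithful)
schema <- structType(structField("waiting", "double"),
                     structField("max_eruption", "double"))
result <- gapply(df, "waiting",
                 function(key, x) { data.frame(key, max(x$eruptions)) },
                 schema)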
hashCode
Compute the hashCode of an object
gapplyCollect
gapplyCollect
head
Head
lastProgress
lastProgress
limit
Limit
merge
Merges two data frames
mutate
Mutate
rangeBetween
rangeBetween
randomSplit
randomSplit
read.text
Create a SparkDataFrame from a text file.
recoverPartitions
Recovers all the partitions in the directory of a table and updates the catalog
rowsBetween
rowsBetween
sample
Sample
schema
Get schema object
select
Select
spark.gaussianMixture
Multivariate Gaussian Mixture Model (GMM)
spark.gbt
Gradient Boosted Tree Model for Regression and Classification
spark.logit
Logistic Regression Model
DecisionTreeRegressionModel-class
S4 class that represents a DecisionTreeRegressionModel
KMeansModel-class
S4 class that represents a KMeansModel
FPGrowthModel-class
S4 class that represents an FPGrowthModel
spark.mlp
Multilayer Perceptron Classification Model
KSTest-class
S4 class that represents a KSTest
AFTSurvivalRegressionModel-class
S4 class that represents an AFTSurvivalRegressionModel
GaussianMixtureModel-class
S4 class that represents a GaussianMixtureModel
attach,SparkDataFrame-method
Attach SparkDataFrame to R search path
ALSModel-class
S4 class that represents an ALSModel
avg
avg
GroupedData-class
S4 class that represents a GroupedData
IsotonicRegressionModel-class
S4 class that represents an IsotonicRegressionModel
awaitTermination
awaitTermination
sparkR.newJObject
Create Java Objects
StreamingQuery-class
S4 class that represents a StreamingQuery
GeneralizedLinearRegressionModel-class
S4 class that represents a generalized linear model
WindowSpec-class
S4 class that represents a WindowSpec
column_aggregate_functions
Aggregate functions for Column operations
between
between
sparkR.session
Get the existing SparkSession or initialize a new SparkSession.
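A minimal sketch; master and appName are illustrative values:
library(SparkR)
sparkR.session(master = "local[*]", appName = "SparkR-example")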
startsWith
startsWith
cacheTable
Cache Table
take
Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
status
status
column_collection_functions
Collection functions for Column operations
cancelJobGroup
Cancel active jobs for the specified group
LogisticRegressionModel-class
S4 class that represents a LogisticRegressionModel
toJSON
toJSON
MultilayerPerceptronClassificationModel-class
S4 class that represents a MultilayerPerceptronClassificationModel
createTable
Creates a table based on the dataset in a data source
createOrReplaceTempView
Creates a temporary view using the given name.
corr
corr
dropTempView
Drops the temporary view with the given view name in the catalog.
colnames
Column Names of SparkDataFrame
approxQuantile
Calculates the approximate quantiles of numerical columns of a SparkDataFrame
alias
alias
coltypes
coltypes
broadcast
broadcast
column
S4 class that represents a SparkDataFrame column
dtypes
DataTypes
cache
Cache
column_nonaggregate_functions
Non-aggregate functions for Column operations
fitted
Get fitted result from a k-means model
write.df
Save the contents of SparkDataFrame to a data source.
withWatermark
withWatermark
write.ml
Saves the MLlib model to the input path
hint
hint
histogram
Compute histogram statistics for given column
freqItems
Finding frequent items for columns, possibly with false positives
write.orc
Save the contents of SparkDataFrame as an ORC file, preserving the schema.
column_datetime_diff_functions
Date time arithmetic functions for Column operations
join
Join
column_datetime_functions
Date time functions for Column operations
last
last
count
Count
column_string_functions
String functions for Column operations
cov
cov
not
!
cube
cube
dapply
dapply
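A minimal sketch: apply an R function to each partition; the returned data must match the supplied schema:
df <- createDataFrame(faithful)
schema <- structType(structField("eruptions", "double"),
                     structField("waiting", "double"))
# Add one minute to every waiting time, partition by partition
df1 <- dapply(df, function(x) { x$waiting <- x$waiting + 1; x }, schema)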
currentDatabase
Returns the current default database
nrow
Returns the number of rows in a SparkDataFrame
dropTempTable
(Deprecated) Drop Temporary Table
dropDuplicates
dropDuplicates
printSchema
Print Schema of a SparkDataFrame
dapplyCollect
dapplyCollect
glm,formula,ANY,SparkDataFrame-method
Generalized Linear Models (R-compliant)
queryName
queryName
insertInto
insertInto
group_by
GroupBy
distinct
Distinct
read.ml
Load a fitted MLlib model from the input path.
read.orc
Create a SparkDataFrame from an ORC file.
listColumns
Returns a list of columns for the given table/view in the specified database
install.spark
Download and Install Apache Spark to a Local Directory
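A minimal sketch; with no arguments, a matching Spark release is downloaded to a default local directory:
install.spark()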
listDatabases
Returns a list of databases available
persist
Persist
ncol
Returns the number of columns in a SparkDataFrame
pivot
Pivot a column of the GroupedData and perform the specified aggregation.
dropna
A set of SparkDataFrame functions working with NA values
getLocalProperty
Get a local property set in this thread, or NULL if it is missing. See setLocalProperty.
listFunctions
Returns a list of functions registered in the specified database
isActive
isActive
drop
drop
read.parquet
Create a SparkDataFrame from a Parquet file.
getNumPartitions
getNumPartitions
intersect
Intersect
repartition
Repartition
listTables
Returns a list of tables or views in the specified database
read.stream
Load a streaming SparkDataFrame
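A minimal sketch of one common source; the host and port are illustrative:
# Read a stream of text lines from a socket source
lines <- read.stream("socket", host = "localhost", port = 9999)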
selectExpr
SelectExpr
rollup
rollup
over
over
partitionBy
partitionBy
setCheckpointDir
Set checkpoint directory
print.structType
Print a Spark StructType.
print.structField
Print a Spark StructField.
rbind
Union two or more SparkDataFrames
read.df
Load a SparkDataFrame
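A minimal sketch; the path and CSV options are placeholders for your own data source:
df <- read.df("path/to/data.csv", source = "csv",
              header = "true", inferSchema = "true")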
spark.als
Alternating Least Squares (ALS) for Collaborative Filtering
spark.bisectingKmeans
Bisecting K-Means Clustering Model
spark.glm
Generalized Linear Models
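A minimal sketch: fit a Gaussian GLM on a SparkDataFrame built from the built-in mtcars dataset:
carsDF <- createDataFrame(mtcars)
model <- spark.glm(carsDF, mpg ~ wt + hp, family = "gaussian")
summary(model)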
refreshByPath
Invalidates and refreshes all the cached data and metadata for any SparkDataFrame that contains the given path
refreshTable
Invalidates and refreshes all the cached data and metadata of the given table
setLogLevel
Set new log level
show
show
spark.decisionTree
Decision Tree Model for Regression and Classification
spark.fpGrowth
FP-growth
spark.lapply
Run a function over a list of elements, distributing the computations with Spark
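A minimal sketch: distribute a simple computation over a local list; results come back as a local list:
squares <- spark.lapply(1:10, function(x) x^2)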
spark.isoreg
Isotonic Regression Model
sparkR.conf
Get Runtime Config from the current active SparkSession
spark.lda
Latent Dirichlet Allocation
sparkR.callJMethod
Call Java Methods
sparkR.callJStatic
Call Static Java Methods
registerTempTable
(Deprecated) Register Temporary Table
sparkR.init
(Deprecated) Initialize a new Spark Context
stopQuery
stopQuery
storageLevel
StorageLevel
rename
rename
substr
substr
str
Compactly display the structure of a dataset
setCurrentDatabase
Sets the current default database
agg
summarize
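A minimal sketch, assuming df was built from the faithful dataset:
# Count observations per distinct waiting time
head(summarize(group_by(df, df$waiting), count = n(df$waiting)))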
setJobDescription
Set a human readable description of the current job.
uncacheTable
Uncache Table
structField
structField
summary
summary
showDF
showDF
tableNames
Table Names
windowOrderBy
windowOrderBy
write.text
Save the content of SparkDataFrame in a text file at the specified path.
union
Return a new SparkDataFrame containing the union of rows
windowPartitionBy
windowPartitionBy
spark.addFile
Add a file or directory to be downloaded with this Spark job on every node.
spark.naiveBayes
Naive Bayes Models
spark.randomForest
Random Forest Model for Regression and Classification
sparkR.session.stop
Stop the Spark Session and Spark Context
sparkR.uiWebUrl
Get the URL of the SparkUI instance for the current active SparkSession
sparkR.version
Get version of Spark on which this application is running
sparkRHive.init
(Deprecated) Initialize a new HiveContext
tableToDF
Create a SparkDataFrame from a SparkSQL table or view
tables
Tables
with
Evaluate an R expression in an environment constructed from a SparkDataFrame
withColumn
WithColumn
write.parquet
Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
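A minimal sketch; the output path is a placeholder:
write.parquet(df, "path/to/people.parquet")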
write.stream
Write the streaming SparkDataFrame to a data source.
RandomForestClassificationModel-class
S4 class that represents a RandomForestClassificationModel
DecisionTreeClassificationModel-class
S4 class that represents a DecisionTreeClassificationModel
NaiveBayesModel-class
S4 class that represents a NaiveBayesModel