SparkR (version 3.1.2)
R Front End for 'Apache Spark'
Description
Provides an R front end for 'Apache Spark'.
Versions: 3.1.2, 2.4.6, 2.4.5, 2.4.4, 2.4.3, 2.4.2, 2.4.1, 2.3.0, 2.1.2
Install: install.packages('SparkR')
Monthly Downloads: 155
Version: 3.1.2
License: Apache License (== 2.0)
Maintainer: Felix Cheung
Last Published: June 3rd, 2021
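A minimal quick-start sketch (the master URL and app name are illustrative); the example snippets further down assume a session started this way:

library(SparkR)
# Start (or reuse) a SparkSession backed by a local Spark master
sparkR.session(master = "local[*]", appName = "SparkRExample")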
Functions in SparkR (3.1.2)
ALSModel-class
S4 class that represents an ALSModel
FMClassificationModel-class
S4 class that represents an FMClassificationModel
DecisionTreeClassificationModel-class
S4 class that represents a DecisionTreeClassificationModel
GBTRegressionModel-class
S4 class that represents a GBTRegressionModel
BisectingKMeansModel-class
S4 class that represents a BisectingKMeansModel
DecisionTreeRegressionModel-class
S4 class that represents a DecisionTreeRegressionModel
AFTSurvivalRegressionModel-class
S4 class that represents an AFTSurvivalRegressionModel
LinearSVCModel-class
S4 class that represents a LinearSVCModel
FMRegressionModel-class
S4 class that represents an FMRegressionModel
GBTClassificationModel-class
S4 class that represents a GBTClassificationModel
FPGrowthModel-class
S4 class that represents an FPGrowthModel
PrefixSpan-class
S4 class that represents a PrefixSpan
PowerIterationClustering-class
S4 class that represents a PowerIterationClustering
between
between
GeneralizedLinearRegressionModel-class
S4 class that represents a generalized linear model
NaiveBayesModel-class
S4 class that represents a NaiveBayesModel
GaussianMixtureModel-class
S4 class that represents a GaussianMixtureModel
MultilayerPerceptronClassificationModel-class
S4 class that represents a MultilayerPerceptronClassificationModel
as.data.frame
Download data from a SparkDataFrame into an R data.frame
broadcast
broadcast
GroupedData-class
S4 class that represents a GroupedData
checkpoint
checkpoint
KMeansModel-class
S4 class that represents a KMeansModel
clearCache
Clear Cache
corr
corr
column_collection_functions
Collection functions for Column operations
column_avro_functions
Avro processing functions for Column operations
count
Count
cube
cube
LDAModel-class
S4 class that represents an LDAModel
currentDatabase
Returns the current default database
LinearRegressionModel-class
S4 class that represents a LinearRegressionModel
dropFields
dropFields
dropDuplicates
dropDuplicates
LogisticRegressionModel-class
S4 class that represents a LogisticRegressionModel
RandomForestClassificationModel-class
S4 class that represents a RandomForestClassificationModel
RandomForestRegressionModel-class
S4 class that represents a RandomForestRegressionModel
attach,SparkDataFrame-method
Attach SparkDataFrame to R search path
column
S4 class that represents a SparkDataFrame column
approxQuantile
Calculates the approximate quantiles of numerical columns of a SparkDataFrame
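For example, a small sketch using R's built-in faithful data (the quantile probabilities and error tolerance are illustrative):

df <- createDataFrame(faithful)
# Approximate quartiles of 'waiting'; relativeError = 0 would compute exact quantiles
approxQuantile(df, "waiting", probabilities = c(0.25, 0.5, 0.75), relativeError = 0.01)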
glm,formula,ANY,SparkDataFrame-method
Generalized Linear Models (R-compliant)
KSTest-class
S4 class that represents a KSTest
cache
Cache
WindowSpec-class
S4 class that represents a WindowSpec
alias
alias
group_by
GroupBy
IsotonicRegressionModel-class
S4 class that represents an IsotonicRegressionModel
column_aggregate_functions
Aggregate functions for Column operations
SparkDataFrame-class
S4 class that represents a SparkDataFrame
avg
avg
awaitTermination
awaitTermination
StreamingQuery-class
S4 class that represents a StreamingQuery
cancelJobGroup
Cancel active jobs for the specified group
arrange
Arrange Rows by Variables
hashCode
Compute the hashCode of an object
column_ml_functions
ML functions for Column operations
head
Head
column_datetime_diff_functions
Date time arithmetic functions for Column operations
cast
Casts the column to a different data type.
join
Join
clearJobGroup
Clear current job group ID and its description
timestamp_seconds
Date time functions for Column operations
last
last
not
!
cov
cov
createTable
Creates a table based on the dataset in a data source
nrow
Returns the number of rows in a SparkDataFrame
printSchema
Print Schema of a SparkDataFrame
cacheTable
Cache Table
column_nonaggregate_functions
Non-aggregate functions for Column operations
collect
Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
coltypes
coltypes
distinct
Distinct
drop
drop
create_lambda
Create an o.a.s.sql.expressions.LambdaFunction corresponding to the transformation described by func. Used by higher-order functions.
dropTempTable
(Deprecated) Drop Temporary Table
createDataFrame
Create a SparkDataFrame
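A minimal sketch converting a local R data.frame into a distributed SparkDataFrame (assumes an active session as in the quick start above):

df <- createDataFrame(faithful)
# The schema is inferred from the R column types
printSchema(df)
head(df)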
dropTempView
Drops the temporary view with the given view name in the catalog.
coalesce
Coalesce
describe
describe
crosstab
Computes a pair-wise frequency table of the given columns
first
Return the first row of a SparkDataFrame
crossJoin
CrossJoin
filter
Filter
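A small sketch combining filter and select (the threshold and column names come from the faithful data used above):

df <- createDataFrame(faithful)
# Keep rows with waiting > 70, then project a single column
long_waits <- filter(df, df$waiting > 70)
head(select(long_waits, "eruptions"))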
install.spark
Download and Install Apache Spark to a Local Directory
insertInto
insertInto
dropna
A set of SparkDataFrame functions working with NA values
dim
Returns the dimensions of SparkDataFrame
exceptAll
exceptAll
read.jdbc
Create a SparkDataFrame representing the database table accessible via JDBC URL
queryName
queryName
read.text
Create a SparkDataFrame from a text file.
read.json
Create a SparkDataFrame from a JSON file.
select
Select
selectExpr
SelectExpr
recoverPartitions
Recovers all the partitions in the directory of a table and updates the catalog
explain
Explain
spark.fmClassifier
Factorization Machines Classification Model
spark.getSparkFilesRootDirectory
Get the root directory that contains files added through spark.addFile.
spark.logit
Logistic Regression Model
sparkR.version
Get version of Spark on which this application is running
column_math_functions
Math functions for Column operations
column_string_functions
String functions for Column operations
spark.fmRegressor
Factorization Machines Regression Model
spark.glm
Generalized Linear Models
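A sketch of fitting and applying a Gaussian GLM (the formula is illustrative):

df <- createDataFrame(faithful)
# Model eruption duration as a linear function of waiting time
model <- spark.glm(df, eruptions ~ waiting, family = "gaussian")
summary(model)
# predict() appends a 'prediction' column to the input SparkDataFrame
head(predict(model, df))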
spark.mlp
Multilayer Perceptron Classification Model
column_misc_functions
Miscellaneous functions for Column operations
ncol
Returns the number of columns in a SparkDataFrame
persist
Persist
column_window_functions
Window functions for Column operations
pivot
Pivot a column of the GroupedData and perform the specified aggregation.
createOrReplaceTempView
Creates a temporary view using the given name.
createExternalTable
(Deprecated) Create an external table
dtypes
DataTypes
getLocalProperty
Get a local property set in this thread, or
NULL
if it is missing. See
setLocalProperty
.
asc
A set of operations working with SparkDataFrame columns
colnames
Column Names of SparkDataFrame
setJobDescription
Set a human readable description of the current job.
repartitionByRange
Repartition by range
repartition
Repartition
rbind
Union two or more SparkDataFrames
read.df
Load a SparkDataFrame
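A sketch of loading a CSV file (the path is a placeholder; source-specific options such as header are passed as named arguments):

df <- read.df("path/to/data.csv", source = "csv", header = "true", inferSchema = "true")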
setJobGroup
Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.
sparkRHive.init
(Deprecated) Initialize a new HiveContext
%<=>%
%<=>%
dapplyCollect
dapplyCollect
invoke_higher_order_function
Invokes a higher-order function expression identified by name (relative to o.a.s.sql.catalyst.expressions)
isActive
isActive
getNumPartitions
getNumPartitions
startsWith
startsWith
take
Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
gapply
gapply
status
status
listColumns
Returns a list of columns for the given table/view in the specified database
spark.isoreg
Isotonic Regression Model
listDatabases
Returns a list of databases available
except
except
spark.survreg
Accelerated Failure Time (AFT) Survival Regression Model
sparkR.callJMethod
Call Java Methods
spark.kmeans
K-Means Clustering Model
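A sketch of fitting k-means on two numeric columns (k = 2 is illustrative):

df <- createDataFrame(faithful)
model <- spark.kmeans(df, ~ waiting + eruptions, k = 2)
summary(model)
# fitted() returns the cluster assignment for each training row
head(fitted(model))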
spark.svmLinear
Linear SVM Model
sparkR.callJStatic
Call Static Java Methods
endsWith
endsWith
read.ml
Load a fitted MLlib model from the input path.
gapplyCollect
gapplyCollect
hint
hint
histogram
Compute histogram statistics for given column
read.orc
Create a SparkDataFrame from an ORC file.
intersect
Intersect
intersectAll
intersectAll
toJSON
toJSON
with
Evaluate a R expression in an environment constructed from a SparkDataFrame
isLocal
isLocal
setCheckpointDir
Set checkpoint directory
setCurrentDatabase
Sets the current default database
fitted
Get fitted result from a k-means model
limit
Limit
lastProgress
lastProgress
localCheckpoint
localCheckpoint
freqItems
Finding frequent items for columns, possibly with false positives
%in%
Match a column with given values.
structField
structField
str
Compactly display the structure of a dataset
spark.fpGrowth
FP-growth
setLocalProperty
Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
spark.gaussianMixture
Multivariate Gaussian Mixture Model (GMM)
setLogLevel
Set new log level
write.stream
Write the streaming SparkDataFrame to a data source.
withColumn
WithColumn
write.text
Save the content of SparkDataFrame in a text file at the specified path.
substr
substr
listFunctions
Returns a list of functions registered in the specified database
unionByName
Return a new SparkDataFrame containing the union of rows, matched by column names
unionAll
Return a new SparkDataFrame containing the union of rows.
agg
summarize
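A sketch of grouped aggregation (the grouping expression and aggregate names are illustrative):

df <- createDataFrame(faithful)
grouped <- group_by(df, df$waiting > 70)
# avg() and n() are aggregate Column functions from this package
head(agg(grouped, mean_eruptions = avg(df$eruptions), rows = n(df$eruptions)))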
print.structField
Print a Spark StructField.
listTables
Returns a list of tables or views in the specified database
orderBy
Ordering Columns in a WindowSpec
isStreaming
isStreaming
spark.naiveBayes
Naive Bayes Models
partitionBy
partitionBy
mutate
Mutate
over
over
merge
Merges two data frames
spark.assignClusters
PowerIterationClustering
otherwise
otherwise
print.structType
Print a Spark StructType.
sparkR.conf
Get Runtime Config from the current active SparkSession
sparkR.init
(Deprecated) Initialize a new Spark Context
subset
Subset
structType
structType
read.parquet
Create a SparkDataFrame from a Parquet file.
rowsBetween
rowsBetween
rollup
rollup
read.stream
Load a streaming SparkDataFrame
tableToDF
Create a SparkDataFrame from a SparkSQL table or view
tables
Tables
predict
Makes predictions from an MLlib model
schema
Get schema object
saveAsTable
Save the contents of the SparkDataFrame to a data source as a table
show
show
randomSplit
randomSplit
rangeBetween
rangeBetween
showDF
showDF
registerTempTable
(Deprecated) Register Temporary Table
sample
Sample
rename
rename
spark.bisectingKmeans
Bisecting K-Means Clustering Model
sampleBy
Returns a stratified sample without replacement
sparkR.uiWebUrl
Get the URL of the SparkUI instance for the current active SparkSession
sparkR.session.stop
Stop the Spark Session and Spark Context
sparkRSQL.init
(Deprecated) Initialize a new SQLContext
spark.lm
Linear Regression Model
spark.lda
Latent Dirichlet Allocation
sql
SQL Query
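A sketch of querying a SparkDataFrame with SQL via a temporary view (the view name is illustrative):

df <- createDataFrame(faithful)
createOrReplaceTempView(df, "faithful_tbl")
head(sql("SELECT waiting, eruptions FROM faithful_tbl WHERE waiting > 70"))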
print.jobj
Print a JVM object reference.
spark.als
Alternating Least Squares (ALS) for Collaborative Filtering
spark.addFile
Add a file or directory to be downloaded with this Spark job on every node.
refreshByPath
Invalidates and refreshes all the cached data and metadata for any SparkDataFrame that contains the given path
refreshTable
Invalidates and refreshes all the cached data and metadata of the given table
spark.decisionTree
Decision Tree Model for Regression and Classification
spark.kstest
(One-Sample) Kolmogorov-Smirnov Test
spark.lapply
Run a function over a list of elements, distributing the computations with Spark
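A sketch of distributing a simple computation (the function must be self-contained, since it is serialized and executed on Spark workers):

# Returns a local R list: list(1, 4, 9, ..., 100)
results <- spark.lapply(1:10, function(x) x^2)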
sparkR.newJObject
Create Java Objects
spark.randomForest
Random Forest Model for Regression and Classification
storageLevel
StorageLevel
spark.getSparkFiles
Get the absolute path of a file added through spark.addFile.
spark.gbt
Gradient Boosted Tree Model for Regression and Classification
spark.findFrequentSequentialPatterns
PrefixSpan
stopQuery
stopQuery
unpersist
Unpersist
uncacheTable
Uncache Table
sparkR.session
Get the existing SparkSession or initialize a new SparkSession.
summary
summary
write.orc
Save the contents of SparkDataFrame as an ORC file, preserving the schema.
unresolved_named_lambda_var
Create an o.a.s.sql.expressions.UnresolvedNamedLambdaVariable, convert it to an o.a.s.sql.Column, and wrap it with an R Column. Used by higher-order functions.
write.parquet
Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
windowPartitionBy
windowPartitionBy
write.json
Save the contents of SparkDataFrame as a JSON file
tableNames
Table Names
windowOrderBy
windowOrderBy
write.ml
Saves the MLlib model to the input path
withField
withField
write.df
Save the contents of SparkDataFrame to a data source.
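A sketch of writing to Parquet, with df as created in the earlier examples (the output path is a placeholder):

write.df(df, path = "path/to/output", source = "parquet", mode = "overwrite")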
write.jdbc
Save the content of SparkDataFrame to an external database table via JDBC.
withWatermark
withWatermark
union
Return a new SparkDataFrame containing the union of rows
dapply
dapply
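A sketch of applying an R function to each partition of a SparkDataFrame (the output schema must be declared up front; the derived column is illustrative):

df <- createDataFrame(faithful)
schema <- structType(structField("eruptions", "double"),
                     structField("waiting_hours", "double"))
# func receives each partition as an R data.frame and must return one matching the schema
out <- dapply(df, function(part) {
  data.frame(eruptions = part$eruptions, waiting_hours = part$waiting / 60)
}, schema)
head(out)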