SparkR (version 2.1.2)

group_by: GroupBy

Description

Groups the SparkDataFrame by the specified columns so that aggregations can be run on each group.

Usage

group_by(x, ...)

groupBy(x, ...)

# S4 method for SparkDataFrame
groupBy(x, ...)

# S4 method for SparkDataFrame
group_by(x, ...)

Arguments

x

a SparkDataFrame.

...

variable(s) (character name(s) or Column(s)) to group on.
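
As a minimal sketch of the two ways to name grouping variables, the snippet below groups the same SparkDataFrame once by a character name and once by a Column. The data frame `staff` and its contents are hypothetical, and the call to `sparkR.session()` assumes a local Spark installation that SparkR can find.

```r
library(SparkR)

# Assumption: Spark is installed locally and discoverable (e.g. via SPARK_HOME).
sparkR.session()

# Hypothetical sample data, used only for illustration.
staff <- data.frame(
  department = c("eng", "eng", "ops"),
  salary = c(100, 120, 80),
  stringsAsFactors = FALSE
)
df <- createDataFrame(staff)

# Group by a character column name.
by_name <- count(groupBy(df, "department"))

# Group by a Column object taken from the same SparkDataFrame.
by_col <- count(groupBy(df, df$department))

# Bring both results back to local R data.frames for comparison.
res_name <- collect(by_name)
res_col <- collect(by_col)
```

Both forms should yield the same groups; the Column form is mainly useful when the grouping key is built from an expression rather than a plain column name.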

Value

A GroupedData.

See Also

Other SparkDataFrame functions: SparkDataFrame-class, agg, arrange, as.data.frame, attach, cache, coalesce, collect, colnames, coltypes, createOrReplaceTempView, crossJoin, dapplyCollect, dapply, describe, dim, distinct, dropDuplicates, dropna, drop, dtypes, except, explain, filter, first, gapplyCollect, gapply, getNumPartitions, head, histogram, insertInto, intersect, isLocal, join, limit, merge, mutate, ncol, nrow, persist, printSchema, randomSplit, rbind, registerTempTable, rename, repartition, sample, saveAsTable, schema, selectExpr, select, showDF, show, storageLevel, str, subset, take, union, unpersist, withColumn, with, write.df, write.jdbc, write.json, write.orc, write.parquet, write.text

Examples

# NOT RUN {
  # Compute the average for all numeric columns grouped by department.
  avg(groupBy(df, "department"))

  # Compute the max age and average salary, grouped by department and gender.
  agg(groupBy(df, "department", "gender"), salary = "avg", age = "max")
# }
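
The examples above refer to a `df` that is not defined on this page. The following is a self-contained sketch of the same pattern, assuming a local Spark installation; the `staff` data and the variable names `grouped`, `summary_df`, and `result` are hypothetical and chosen only for illustration.

```r
library(SparkR)

# Assumption: Spark is installed locally and discoverable (e.g. via SPARK_HOME).
sparkR.session()

# Hypothetical sample data with the columns used in the examples above.
staff <- data.frame(
  department = c("eng", "eng", "ops", "ops"),
  gender = c("F", "M", "F", "F"),
  age = c(30, 40, 25, 35),
  salary = c(100, 120, 80, 90),
  stringsAsFactors = FALSE
)
df <- createDataFrame(staff)

# groupBy returns a GroupedData (see Value), not a SparkDataFrame,
# so an aggregation has to be applied before rows can be inspected.
grouped <- groupBy(df, "department", "gender")

# Named arguments map a column name to the aggregate function to apply.
summary_df <- agg(grouped, salary = "avg", age = "max")

# Sort for a stable row order, then collect into a local data.frame.
result <- collect(arrange(summary_df, "department", "gender"))
print(result)

sparkR.session.stop()
```

With string-based aggregates like these, the result columns are named after the aggregate expression (for example `avg(salary)`), so they are easiest to access with `result[["avg(salary)"]]`.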