Learn R Programming

relations (version 0.6-13)

algebra: Relational Algebra

Description

Various “relational algebra”-like operations.

Usage

relation_projection(x, margin = NULL)
relation_selection(x, subset)
relation_cartesian(x, y, ...)
relation_complement(x, y)
relation_intersection(x, y, ...)
relation_union(x, y, ...)
relation_symdiff(x, y)
relation_division(x, y)
relation_remainder(x, y)
relation_join(x, y, ...)
relation_semijoin(x, y, ...)
relation_antijoin(x, y, ...)

Arguments

x, y

Relation objects.

margin

Either a character vector of domain names, or an integer vector of domain indices.

subset

Expression resulting in a logical vector of length equal to the number of tuples in the graph.

...

Relation objects for relation_cartesian(), relation_intersection(), and relation_union(). Otherwise, passed to merge().

Details

These functions provide functionality similar to the corresponding operations defined in relational algebra theory as introduced by Codd (1970). Note, however, that domains in database relations, unlike the concept of relations we use here, are unordered. In fact, a database relation (“table”) is defined as a set of elements called “tuples”, where the “tuple” components are named, but unordered. So in fact, a “tuple” in this sense is a set of mappings from the attribute names into the union of the attribute domains.

The projection of a relation on a specified margin (i.e., a vector of domain names or indices) is the relation obtained when all tuples are restricted to this margin. As a consequence, duplicate tuples are removed.

The selection of a relation is the relation obtained by taking a subset of the relation graph, defined by some logical expression.

The Cartesian product of two relations is obtained by basically building the Cartesian product of all graph elements, but combining the resulting pairs into single tuples.

The union of two relations simply combines the graph elements of both relations; the complement of two relations \(R\) and \(S\) removes the tuples of \(S\) from \(R\).

The intersection (symmetric difference) of two relations is the relation with all tuples they have (do not have) in common.

The division of relation \(R\) by relation \(S\) is the reversed Cartesian product. The result is a relation with the domain unique to \(R\) and containing the maximum number of tuples which, multiplied by \(S\), are contained in \(R\). The remainder of this operation is the complement of \(R\) and the division of \(R\) by \(S\). Note that for both operations, the domain of \(S\) must be contained in the domain of \(R\).

The (natural) join of two relations is their Cartesian product, restricted to the subset where the elements of the common attributes do match. The left/right/full outer join of two relations \(R\) and \(S\) is the union of \(R\)/\(S\)/\(R\) and \(S\), and the inner join of \(R\) and \(S\). The implementation uses merge(), and so the left/right/full outer joins are obtained by setting all.x/all.y/all to TRUE in relation_join(). The domains to be matched are specified using by.

The left (right) semijoin of two relations \(R\) and \(S\) is the join of these, projected to the attributes of \(R\) (\(S\)). Thus, it yields all tuples of \(R\) (\(S\)) participating in the join of \(R\) and \(S\).

The left (right) antijoin of two relations \(R\) and \(S\) is the complement of \(R\) (\(S\)) and the join of both, projected to the attributes of \(R\) (\(S\)). Thus, it yields all tuples of \(R\) (\(S\)) not participating in the join of \(R\) and \(S\).

The operators %><%, %=><%, %><=%, %=><=%, %|><%, %><|%, %|><|%, %|>%, %<|%, and %U% can be used for the Cartesian product, left outer join, right outer join, full outer join, left semi-join, right semi-join, join, left antijoin, right antijoin, and union, respectively.

References

E. F. Codd (1970), A relational model of data for large shared data banks. Communications of the ACM, 13/6, 377--387. tools:::Rd_expr_doi("10.1145/362384.362685").

See Also

relation()

Examples

Run this code
## projection
Person <-
    data.frame(Name = c("Harry", "Sally", "George", "Helena", "Peter"),
               Age = c(34, 28, 29, 54, 34),
               Weight = c(80, 64, 70, 54, 80),
               stringsAsFactors = FALSE)
Person <- as.relation(Person)
relation_table(Person)
relation_table(relation_projection(Person, c("Age", "Weight")))

## selection
relation_table(R1 <- relation_selection(Person, Age < 29))
relation_table(R2 <- relation_selection(Person, Age >= 34))
relation_table(R3 <- relation_selection(Person, Age == Weight))

## union
relation_table(R1 %U% R2)

## works only for the same domains:
relation_table(R2 | R3)

## complement
relation_table(Person - R2)

## intersection
relation_table(relation_intersection(R2, R3))

## works only for the same domains:
relation_table(R2 & R3)

## symmetric difference
relation_table(relation_symdiff(R2, R3))

## Cartesian product
Employee <-
    data.frame(Name =
               c("Harry", "Sally", "George", "Harriet", "John"),
               EmpId = c(3415, 2241, 3401, 2202, 3999),
               DeptName =
               c("Finance", "Sales", "Finance", "Sales", "N.N."),
	       stringsAsFactors = FALSE)
Employee <- as.relation(Employee)
relation_table(Employee)
Dept <- data.frame(DeptName = c("Finance", "Sales", "Production"),
                   Manager = c("George", "Harriet", "Charles"),
                   stringsAsFactors = FALSE)
Dept <- as.relation(Dept)
relation_table(Dept)

relation_table(Employee %><% Dept)

## Natural join
relation_table(Employee %|><|% Dept)

## left (outer) join
relation_table(Employee %=><% Dept)

## right (outer) join
relation_table(Employee %><=% Dept)

## full outer join
relation_table(Employee %=><=% Dept)

## antijoin
relation_table(Employee %|>% Dept)
relation_table(Employee %<|% Dept)

## semijoin
relation_table(Employee %|><% Dept)
relation_table(Employee %><|% Dept)

## division
Completed <-
    data.frame(Student = c("Fred", "Fred", "Fred", "Eugene",
                           "Eugene", "Sara", "Sara"),
               Task = c("Database1", "Database2", "Compiler1",
                        "Database1", "Compiler1", "Database1",
                        "Database2"),
               stringsAsFactors = FALSE)
Completed <- as.relation(Completed)
relation_table(Completed)
DBProject <- data.frame(Task = c("Database1", "Database2"),
                        stringsAsFactors = FALSE)
DBProject <- as.relation(DBProject)
relation_table(DBProject)

relation_table(Completed %/% DBProject)

## division remainder
relation_table(Completed %% DBProject)

Run the code above in your browser using DataLab