autodb

autodb is an R package for automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Using it to design your actual database for you is not advised.)

Installation

You can install the development version of autodb from GitHub with:

# install.packages("devtools")
devtools::install_github("CharnelMouse/autodb")

Example

Turning a simple data frame into a database:

library(autodb)
#> 
#> Attaching package: 'autodb'
#> The following object is masked from 'package:stats':
#> 
#>     decompose
summary(ChickWeight)
#>      weight           Time           Chick     Diet   
#>  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
#>  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
#>  Median :103.0   Median :10.00   20     : 12   3:120  
#>  Mean   :121.8   Mean   :10.72   10     : 12   4:118  
#>  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
#>  Max.   :373.0   Max.   :21.00   19     : 12          
#>                                  (Other):506
db <- autodb(ChickWeight, name = "ChickWeight")
db
#> database ChickWeight with 2 relations
#> 4 attributes: weight, Time, Chick, Diet
#> relation Chick: Chick, Diet; 50 records
#>   key 1: Chick
#> relation Time_Chick: Time, Chick, weight; 578 records
#>   key 1: Time, Chick
#> references:
#> Time_Chick.{Chick} -> Chick.{Chick}
graphviz_text <- gv(db)
DiagrammeR::grViz(graphviz_text)
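
The two relations above reflect two functional dependencies in the data: Chick determines Diet, and the pair (Time, Chick) determines weight. As a rough sanity check, these can be verified with base R alone, without calling autodb. The helper name `determines` is hypothetical and is not part of the package.

```r
# Base-R sanity check of the dependencies behind the ChickWeight schema.
# `determines` is a hypothetical helper, not an autodb function.
cw <- as.data.frame(ChickWeight)  # drop the groupedData classes

# TRUE if each distinct combination of `lhs` values maps to one `rhs` value
determines <- function(df, lhs, rhs) {
  nrow(unique(df[, lhs, drop = FALSE])) ==
    nrow(unique(df[, c(lhs, rhs), drop = FALSE]))
}

chick_to_diet <- determines(cw, "Chick", "Diet")
key_to_weight <- determines(cw, c("Time", "Chick"), "weight")
n_chicks <- nrow(unique(cw[, "Chick", drop = FALSE]))

print(c(chick_to_diet = chick_to_diet, key_to_weight = key_to_weight))
print(n_chicks)
```

If both checks hold, splitting Diet into its own relation keyed by Chick loses no information, which matches the 50-record Chick relation reported above.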

Using the exclude argument to forbid certain variables from appearing in keys:

summary(CO2)
#>      Plant             Type         Treatment       conc          uptake     
#>  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95   Min.   : 7.70  
#>  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175   1st Qu.:17.90  
#>  Qn3    : 7                                    Median : 350   Median :28.30  
#>  Qc1    : 7                                    Mean   : 435   Mean   :27.21  
#>  Qc3    : 7                                    3rd Qu.: 675   3rd Qu.:37.12  
#>  Qc2    : 7                                    Max.   :1000   Max.   :45.50  
#>  (Other):42
db2_noexclude <- autodb(CO2, name = "CO2")
db2_noexclude
#> database CO2 with 3 relations
#> 5 attributes: Plant, Type, Treatment, conc, uptake
#> relation Plant: Plant, Type, Treatment; 12 records
#>   key 1: Plant
#> relation Plant_conc: Plant, conc, Treatment, uptake; 84 records
#>   key 1: Plant, conc
#>   key 2: Treatment, conc, uptake
#> relation conc_uptake: conc, uptake, Type; 82 records
#>   key 1: conc, uptake
#> references:
#> Plant_conc.{Plant} -> Plant.{Plant}
#> Plant_conc.{conc, uptake} -> conc_uptake.{conc, uptake}
graphviz_text2_noexclude <- gv(db2_noexclude)
DiagrammeR::grViz(graphviz_text2_noexclude)
db2 <- autodb(CO2, name = "CO2", exclude = "uptake")
db2
#> database CO2 with 2 relations
#> 5 attributes: Plant, Type, Treatment, conc, uptake
#> relation Plant: Plant, Type, Treatment; 12 records
#>   key 1: Plant
#> relation Plant_conc: Plant, conc, uptake; 84 records
#>   key 1: Plant, conc
#> references:
#> Plant_conc.{Plant} -> Plant.{Plant}
graphviz_text2 <- gv(db2)
DiagrammeR::grViz(graphviz_text2)
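
The extra key and the extra relation in the first CO2 schema come from dependencies that appear to hold only because uptake is a continuous measurement whose values rarely repeat. The base-R sketch below makes this visible by counting distinct value combinations. It does not use autodb, and it assumes that base R's row comparison treats the values the same way autodb does.

```r
# Why excluding `uptake` from keys is reasonable: near-unique measurements
# can act as accidental identifiers. Uses base R only.
co2 <- as.data.frame(CO2)  # drop the groupedData classes

n_rows <- nrow(co2)

# Number of distinct (conc, uptake) pairs, to compare against the row count
n_conc_uptake <- nrow(unique(co2[, c("conc", "uptake")]))

# Index of the first repeated (Treatment, conc, uptake) combination;
# 0 means there is none, so the combination identifies every row.
dup_index <- anyDuplicated(co2[, c("Treatment", "conc", "uptake")])

print(c(rows = n_rows, conc_uptake_pairs = n_conc_uptake))
print(dup_index)
```

When the pair count is close to the row count, a dependency such as {conc, uptake} -> Type says little about the data's structure, which is the motivation for passing exclude = "uptake".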

There are also functions for doing each step of the database creation separately, including functional dependency detection and normalisation. See the vignette for more details.
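
A minimal sketch of the separate steps, using functions named in the index on this page (discover, normalise, decompose, rejoin, df_equiv). It assumes autodb is installed. The accuracy argument is passed explicitly because its default may differ between versions, and the variable names are illustrative only.

```r
# Running the database-creation steps one at a time.
# Assumes autodb is installed; argument defaults may differ between versions.
library(autodb)

cw <- as.data.frame(ChickWeight)  # drop the groupedData classes

# 1. Functional dependency discovery
#    (accuracy = 1 asks for dependencies that hold exactly)
deps <- discover(cw, accuracy = 1)

# 2. Normalisation of the dependencies into a database schema
schema <- normalise(deps)

# 3. Decomposition of the data frame according to that schema
db_steps <- decompose(cw, schema)

# Rejoining the relations should recover the original rows,
# possibly in a different order, so compare with df_equiv()
rejoined <- rejoin(db_steps)
print(df_equiv(rejoined, cw))
```

Keeping the intermediate objects makes it possible to inspect or adjust the discovered dependencies before a schema is built from them.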

Install

install.packages('autodb')

Version

2.3.1

License

BSD_3_clause + file LICENSE

Maintainer

Mark Webster

Last Published

March 19th, 2025

Functions in autodb (2.3.1)

gv.data.frame: Generate Graphviz input text to plot a data frame
detset: Determinant sets
df_equiv: Test data frames for equivalence under row reordering
gv: Generate Graphviz input text to plot objects
df_rbind: Combine R Objects by Rows or Columns
df_duplicated: Determine Duplicate Elements
gv.database_schema: Generate Graphviz input text to plot database schemas
gv.database: Generate Graphviz input text to plot databases
discover: Dependency discovery with DFD
functional_dependency: Functional dependency vectors
nudge: Nudge meta-analysis data
merge_empty_keys: Merge relation schemas with empty keys
gv.relation: Generate Graphviz input text to plot relations
records: Relational data records
insert: Insert data
normalise: Create normalised database schemas from functional dependencies
keys: Relational data keys
gv.relation_schema: Generate Graphviz input text to plot relation schemas
reduce: Remove relations not linked to the main relations
reduce.database: Remove database relations not linked to the main relations
reduce.database_schema: Remove database schema relations not linked to the given relations
subschemas: Schema subschemas
synthesise: Synthesise relation schemas from functional dependencies
rename_attrs: Rename relational data attributes
merge_schemas: Merge relation schemas in given pairs
subrelations: Database subrelations
references: Schema references
rejoin: Join a database into a data frame
relation: Relation vectors
relation_schema: Relation schema vectors
autodb-package: Database-style normalisation for data.frames
autoref: Add foreign key references to a normalised database
attrs_order: Relational data attribute order
database_schema: Database schemas
dependant: Dependants
autodb: Create a normalised database from a data frame
create: Create instance of a schema
database: Databases
decompose: Decompose a data frame based on given normalised dependencies
attrs: Relational data attributes