arkhe

Overview

A dependency-free collection of simple functions for cleaning rectangular data. This package lets you detect, count, and replace values, or discard rows/columns, using a predicate function. It also provides tools to check conditions and return informative error messages.


To cite arkhe in publications use:

Frerebeau N (2025). arkhe: Tools for Cleaning Rectangular Data. Université Bordeaux Montaigne, Pessac, France. R package version 1.10.0. doi:10.5281/zenodo.3526659, https://packages.tesselle.org/arkhe/.

This package is part of the tesselle project (https://www.tesselle.org).

Installation

You can install the released version of arkhe from CRAN with:

install.packages("arkhe")

And the development version from Codeberg with:

# install.packages("remotes")
remotes::install_git("https://codeberg.org/tesselle/arkhe")

Usage

## Load the package
library(arkhe)

## Set seed for reproducibility
set.seed(12345)

## Create a matrix
X <- matrix(sample(1:10, 25, TRUE), nrow = 5, ncol = 5)

## Add NA
k <- sample(1:25, 3, FALSE)
X[k] <- NA
X
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8   NA    7   10    7
#> [4,]   NA   NA    6    3    2
#> [5,]    8   10    1    9    4

## Count missing values in rows
count(X, f = is.na, margin = 1)
#> [1] 0 0 1 2 0

## Count non-missing values in columns
count(X, f = is.na, margin = 2, negate = TRUE)
#> [1] 4 3 5 5 5

## Find rows with at least one NA
detect(X, f = is.na, margin = 1)
#> [1] FALSE FALSE  TRUE  TRUE FALSE

## Find columns without any NA
detect(X, f = is.na, margin = 2, negate = TRUE, all = TRUE)
#> [1] FALSE FALSE  TRUE  TRUE  TRUE

## Remove rows with any NA
discard(X, f = is.na, margin = 1, all = FALSE)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8   10    1    9    4

## Remove columns with any NA
discard(X, f = is.na, margin = 2, all = FALSE)
#>      [,1] [,2] [,3]
#> [1,]    1    4    4
#> [2,]    8    8   10
#> [3,]    7   10    7
#> [4,]    6    3    2
#> [5,]    1    9    4

## Replace NA with zeros
replace_NA(X, value = 0)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    2    1    4    4
#> [2,]   10    6    8    8   10
#> [3,]    8    0    7   10    7
#> [4,]    0    0    6    3    2
#> [5,]    8   10    1    9    4
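
To see what the predicate-based helpers compute, the same results can be approximated in base R. This is only a minimal sketch and not how arkhe implements these functions. The matrix is typed in by hand from the printed output above, so the example does not depend on the random number generator, and the object names (na_by_row, X_rows, and so on) are illustrative.

```r
## Rebuild the example matrix by hand (values copied from the printed output)
X <- matrix(
  c( 3,  2, 1,  4,  4,
    10,  6, 8,  8, 10,
     8, NA, 7, 10,  7,
    NA, NA, 6,  3,  2,
     8, 10, 1,  9,  4),
  nrow = 5, ncol = 5, byrow = TRUE
)

## Count missing values in rows (compare with count() and margin = 1)
na_by_row <- rowSums(is.na(X))

## Count non-missing values in columns (compare with negate = TRUE)
ok_by_col <- colSums(!is.na(X))

## Flag rows with at least one NA (compare with detect() and margin = 1)
row_has_na <- apply(is.na(X), 1, any)

## Drop rows, then columns, that contain any NA (compare with discard())
X_rows <- X[!row_has_na, , drop = FALSE]
X_cols <- X[, colSums(is.na(X)) == 0, drop = FALSE]

## Replace NA with zeros (compare with replace_NA())
X_zero <- X
X_zero[is.na(X_zero)] <- 0
```

Each base R version hard-codes is.na, whereas the arkhe functions take the predicate as the argument f, so other tests can be swapped in without rewriting the row/column logic.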

Translation

This package provides translations of user-facing communications, such as messages, warnings and errors. By default, the preferred language is taken from the locale. This can be overridden by setting the environment variable LANGUAGE (you only need to do this once per session):

Sys.setenv(LANGUAGE = "<language code>")

Languages currently available are English (en) and French (fr).
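
As an illustration, the following sketch switches the session to French and then restores the previous setting. It uses only base R. The variable name old_language is illustrative, and whether translated messages actually appear may also depend on how R was built and on the session locale.

```r
## Remember the current setting so it can be restored afterwards
old_language <- Sys.getenv("LANGUAGE", unset = NA)

## Request French messages for the rest of the session
Sys.setenv(LANGUAGE = "fr")
Sys.getenv("LANGUAGE")

## Restore the previous setting (or unset the variable if it was not set)
if (is.na(old_language)) {
  Sys.unsetenv("LANGUAGE")
} else {
  Sys.setenv(LANGUAGE = old_language)
}
```

Restoring the variable is optional. It is shown here so that the example leaves the session as it found it.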

Contributing

Please note that the arkhe project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 1,015
Version: 1.10.0
License: GPL (>= 3)
Maintainer: Nicolas Frerebeau
Last Published: February 25th, 2025

Functions in arkhe (1.10.0)

check_class: Class Diagnostic
describe: Data Description
assert_missing: Check Missing Values
compact: Remove Empty Rows/Columns
assert_square: Check Matrix
assert_type: Check Data Types
assert_names: Check Object Names
math_lcm: Least Common Multiple
confidence_binomial: Confidence Interval for Binomial Proportions
confidence_mean: Confidence Interval for a Mean
clean_whitespace: Remove Leading/Trailing Whitespace
assert_numeric: Check Numeric Values
concat: Concatenate
assert_package: Check the Availability of a Package
math_gcd: Greatest Common Divisor
predicate-attributes: Attributes Predicates
conditions: Conditions
detect: Find Rows/Columns Using a Predicate
keep: Keep Rows/Columns Using a Predicate
interval_credible: Bayesian Credible Interval
label_percent: Label Percentages
null: Default value for NULL
confidence_multinomial: Confidence Interval for Multinomial Proportions
count: Count Values Using a Predicate
get: Get Rows/Columns by Name
discard: Remove Rows/Columns Using a Predicate
is_scalar: Scalar Type Predicates
replace_NA: Replace Missing Values
jackknife: Jackknife Estimation
replace_empty: Replace Empty String
predicate-matrix: Matrix Predicates
sparsity: Sparsity
predicate-data: Utility Predicates
predicate-trend: Numeric Trend Predicates
predicate-type: Type Predicates
validate: Validate a Condition
remove_Inf: Remove Rows/Columns with Infinite Values
remove_NA: Remove Rows/Columns with Missing Values
with_seed: Evaluate an Expression with a Temporarily Seed
interval_hdr: Highest Density Regions
remove_empty: Remove Rows/Columns with Empty String
remove_constant: Remove Constant Columns
remove_zero: Remove Rows/Columns with Zeros
replace_Inf: Replace Infinite Values
scale_range: Rescale Continuous Vector (minimum, maximum)
scale_midpoint: Rescale Continuous Vector (minimum, midpoint, maximum)
predicate-numeric: Numeric Predicates
replace_zero: Replace Zeros
predicate-names: Names Predicates
seek: Search Rows/Columns by Name
arkhe-package: arkhe: Tools for Cleaning Rectangular Data
assert_infinite: Check Infinite Values
assert_dim: Check Object Dimensions
assert_length: Check Object Length(s)
assert_empty: Check Object Filling
assert_lower: Check Numeric Relations
assert_constant: Check Numeric Trend
append_column: Add a (Named) Vector as a Column
append_rownames: Convert Row Names to an Explicit Column
assign: Assign a Specific Row/Column to the Column/Row Names
arkhe-deprecated: Deprecated Functions in arkhe
bootstrap: Bootstrap Estimation
assert_unique: Check Duplicates