
PGRdup (version 0.2.3.9)

read.genesys: Convert 'Darwin Core - Germplasm' zip archive to a flat file

Description

read.genesys reads plant genetic resources (PGR) data from a Darwin Core - germplasm zip archive downloaded from the Genesys database and converts it into a flat-file data.frame.

Usage

read.genesys(zip.genesys, scrub.names.space = TRUE, readme = TRUE)

Value

A data.frame containing the Genesys data in flat-file form.

Arguments

zip.genesys

A character string giving the path to the zip file downloaded from Genesys.

scrub.names.space

logical. If TRUE, all space characters are removed from the name field in the names extension (see Details).

readme

logical. If TRUE, the readme of the Genesys zip file is printed to the console.

Details

This function imports into the R environment PGR data downloaded from the Genesys database (https://www.genesys-pgr.org/) as a Darwin Core - germplasm (DwC-germplasm) zip archive. The csv files in the archive are merged into a single flat-file data.frame.

The argument scrub.names.space removes all space characters from the fields holding accession names, such as acceNumb, collNumb, ACCENAME, COLLNUMB, DONORNUMB and OTHERNUMB. This facilitates the creation of a KWIC index with the KWIC function and the subsequent matching operations that identify probable duplicates with the ProbDup function.
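To illustrate what scrubbing the name fields means, here is a minimal base-R sketch. It assumes a small toy data.frame and a hypothetical helper scrub_spaces(); it is not the package's internal code, and it only removes the literal space character from whichever of the listed name fields are present.

```r
# Toy stand-in for a flat-file PGR data.frame (values are made up)
pgr <- data.frame(
  acceNumb = c("IC 12 345", "EC 1"),
  ACCENAME = c("Local Red", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip spaces from the name fields that exist in df
scrub_spaces <- function(df, fields) {
  fields <- intersect(fields, names(df))  # skip fields absent from this data
  for (f in fields) {
    df[[f]] <- gsub(" ", "", df[[f]], fixed = TRUE)  # NA values stay NA
  }
  df
}

name_fields <- c("acceNumb", "collNumb", "ACCENAME",
                 "COLLNUMB", "DONORNUMB", "OTHERNUMB")
pgr_scrubbed <- scrub_spaces(pgr, name_fields)
pgr_scrubbed$acceNumb  # "IC12345" "EC1"
```

Removing spaces means that, for example, "IC 12345" and "IC12345" become the same string before keywords are extracted and matched.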

Set readme = TRUE to print the readme file in the archive to the console.


Examples


# \dontshow{
# Save the current thread settings so they can be restored afterwards
threads_dt <- data.table::getDTthreads()
threads_OMP <- Sys.getenv("OMP_THREAD_LIMIT")
data.table::setDTthreads(2)
Sys.setenv(`OMP_THREAD_LIMIT` = 2)
# }

if (FALSE) {
# Import the DwC-Germplasm zip archive "genesys-accessions-filtered.zip"
PGRgenesys <- read.genesys("genesys-accessions-filtered.zip",
                           scrub.names.space = TRUE, readme = TRUE)
}

# \dontshow{
# Restore the original thread settings
data.table::setDTthreads(threads_dt)
Sys.setenv(`OMP_THREAD_LIMIT` = threads_OMP)
# }
