sample_sets-class: An S4 class to represent a set of PGS Catalog Sample Sets

Description

The sample_sets object consists of four tables (slots) that together form a relational database of a subset of PGS Catalog sample sets. Each sample set is an observation (row) in the sample_sets table (the first table).
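To make the four-slot layout concrete, here is a minimal sketch of an S4 object shaped like the one described above. The class name toy_sample_sets is hypothetical, the slots are assumed to be plain data frames, and all values other than "PSS000042" are invented for illustration; this is not the package's own class definition.

```r
library(methods)

# Hypothetical stand-in class: one data frame per slot (an assumption;
# the real class may use a different table type).
setClass(
  "toy_sample_sets",
  slots = c(
    sample_sets  = "data.frame",
    samples      = "data.frame",
    demographics = "data.frame",
    cohorts      = "data.frame"
  )
)

# Invented example data, for illustration only.
toy <- new(
  "toy_sample_sets",
  sample_sets = data.frame(
    pss_id = c("PSS000042", "PSS000043"),
    stringsAsFactors = FALSE
  ),
  samples = data.frame(
    pss_id = c("PSS000042", "PSS000042", "PSS000043"),
    sample_id = c(1L, 2L, 1L),
    stringsAsFactors = FALSE
  ),
  demographics = data.frame(
    pss_id = "PSS000042", sample_id = 1L, variable = "age",
    stringsAsFactors = FALSE
  ),
  cohorts = data.frame(
    pss_id = "PSS000042", sample_id = 1L, cohort_symbol = "COHORT_A",
    stringsAsFactors = FALSE
  )
)

# Slots are accessed with the @ operator.
slot_names <- slotNames(toy)
n_sample_sets <- nrow(toy@sample_sets)
```

Each slot can then be inspected on its own, for example toy@samples, while the shared pss_id column links rows across tables.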

Slots

sample_sets

A table of sample sets. Each sample set (row) is uniquely identified by the column pss_id. Columns:

pss_id

A PGS Sample Set identifier. Example: "PSS000042".

samples

A table of samples. Each sample (row) is uniquely identified by the combination of values from the columns pss_id and sample_id. Columns:

pss_id

A PGS Sample Set identifier. Example: "PSS000042".

sample_id

Sample identifier. This is a surrogate key to identify each sample.

stage

Sample stage: should always be evaluation ("eval").

sample_size

Number of individuals included in the sample.

sample_cases

Number of cases.

sample_controls

Number of controls.

sample_percent_male

Percentage of male participants.

phenotype_description

Detailed phenotype description.

ancestry_category

Author-reported ancestry, mapped to the best matching ancestry category from the NHGRI-EBI GWAS Catalog framework (see ancestry_categories for possible values).

ancestry

A more detailed description of sample ancestry that usually matches the most specific description given by the authors (e.g. French, Chinese).

country

Author-reported countries of recruitment (if available).

ancestry_additional_description

Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).

study_id

Associated GWAS Catalog study accession identifier, e.g., "GCST002735".

pubmed_id

PubMed identifier.

cohorts_additional_description

Any additional description about the samples (e.g. sub-cohort information).
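The composite key of the samples table can be checked, and the table joined to sample_sets, with base R alone. The following is a sketch on invented data (only "PSS000042" comes from the text above), assuming both tables behave like ordinary data frames.

```r
# Invented example tables, for illustration only.
sample_sets <- data.frame(
  pss_id = c("PSS000042", "PSS000043"),
  stringsAsFactors = FALSE
)
samples <- data.frame(
  pss_id = c("PSS000042", "PSS000042", "PSS000043"),
  sample_id = c(1L, 2L, 1L),
  stage = "eval",
  sample_size = c(1000L, 250L, 500L),
  stringsAsFactors = FALSE
)

# The pair (pss_id, sample_id) should identify each row uniquely;
# anyDuplicated() returns 0 when no row of the key columns repeats.
key_is_unique <- anyDuplicated(samples[c("pss_id", "sample_id")]) == 0L

# Join on pss_id: one output row per sample, carrying its sample set.
merged <- merge(sample_sets, samples, by = "pss_id")

# Number of samples in each sample set.
samples_per_set <- table(merged$pss_id)
```

Because sample_id is only a surrogate key, it is meaningful solely in combination with pss_id; the same sample_id value can appear under different sample sets.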

demographics

A table of sample demographic variables. Each demographic variable (row) is uniquely identified by the combination of values from the columns pss_id, sample_id and variable. Columns:

pss_id

A PGS Sample Set identifier. Example: "PSS000042".

sample_id

Sample identifier. This is a surrogate key to identify each sample.

variable

Demographic variable. The following columns describe the indicated variable.

estimate_type

Type of statistical estimate reported for the variable.

estimate

The variable's statistical value.

unit

Unit of the variable.

variability_type

Measure of statistical dispersion for the variable, e.g. standard error (se) or standard deviation (sd).

variability

The value of the measure of dispersion.

interval_type

Type of statistical interval for the variable: range, iqr (interquartile range) or ci (confidence interval).

interval_lower

Interval lower bound.

interval_upper

Interval upper bound.
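Since each row of the demographics table describes one variable in long format, the estimate, dispersion and interval columns are read together. The sketch below uses invented values, and the variable names "age" and "follow_up_time" are assumptions made for illustration; it shows how one row might be turned into a readable summary.

```r
# Invented example rows, for illustration only.
demographics <- data.frame(
  pss_id = c("PSS000042", "PSS000042"),
  sample_id = c(1L, 1L),
  variable = c("age", "follow_up_time"),
  estimate_type = c("mean", "median"),
  estimate = c(54.2, 8.1),
  unit = c("years", "years"),
  variability_type = c("sd", NA),
  variability = c(7.9, NA),
  interval_type = c(NA, "iqr"),
  interval_lower = c(NA, 6.0),
  interval_upper = c(NA, 10.5),
  stringsAsFactors = FALSE
)

# Pick the row for one variable and combine its columns into a sentence.
age <- demographics[demographics$variable == "age", ]
summary_text <- sprintf(
  "%s %s = %.1f %s (%s = %.1f)",
  age$estimate_type, age$variable, age$estimate, age$unit,
  age$variability_type, age$variability
)
```

Columns that do not apply to a given row, such as the interval columns for a variable reported with a standard deviation, are represented here as NA; that is a choice made for this sketch.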

cohorts

A table of cohorts. Each cohort (row) is uniquely identified by the combination of values from the columns pss_id, sample_id and cohort_symbol. Columns:

pss_id

A PGS Sample Set identifier. Example: "PSS000042".

sample_id

Sample identifier. This is a surrogate key to identify each sample.

cohort_symbol

Cohort symbol.

cohort_name

Cohort full name.
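Because a sample can draw on several cohorts, joining cohorts to samples on the pair (pss_id, sample_id) yields one row per sample-cohort combination. The following base R sketch uses invented cohort symbols and names and assumes ordinary data frames.

```r
# Invented example tables, for illustration only.
samples <- data.frame(
  pss_id = c("PSS000042", "PSS000042"),
  sample_id = c(1L, 2L),
  sample_size = c(1000L, 250L),
  stringsAsFactors = FALSE
)
cohorts <- data.frame(
  pss_id = c("PSS000042", "PSS000042", "PSS000042"),
  sample_id = c(1L, 1L, 2L),
  cohort_symbol = c("COHORT_A", "COHORT_B", "COHORT_A"),
  cohort_name = c("Hypothetical cohort A", "Hypothetical cohort B",
                  "Hypothetical cohort A"),
  stringsAsFactors = FALSE
)

# Join on the composite key shared by both tables.
samples_cohorts <- merge(samples, cohorts, by = c("pss_id", "sample_id"))

# Collapse the cohort symbols of each sample into a single string.
symbols_per_sample <- aggregate(
  cohort_symbol ~ pss_id + sample_id,
  data = cohorts,
  FUN = paste, collapse = ", "
)
```

Joining on pss_id alone would pair every cohort with every sample of the same sample set, so both key columns are passed to merge().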