The sample_sets object consists of four tables (slots) that together form a
relational database of a subset of PGS Catalog sample sets. Each sample set
is an observation (row) in the sample_sets table (the first table).
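As a rough illustration of the four-slot layout, the sketch below builds a toy S4 object in base R. The class name toy_sample_sets and all values are made up for the example, and only the key columns named in this page are included; this is not the package's own class definition.

```r
library(methods)

# Hypothetical stand-in class with the four slots described on this page.
setClass(
  "toy_sample_sets",
  representation(
    sample_sets = "data.frame",
    samples = "data.frame",
    demographics = "data.frame",
    cohorts = "data.frame"
  )
)

# Toy values; only the key columns are shown.
x <- new(
  "toy_sample_sets",
  sample_sets = data.frame(pss_id = c("PSS000042", "PSS000043")),
  samples = data.frame(
    pss_id = c("PSS000042", "PSS000042", "PSS000043"),
    sample_id = c(1L, 2L, 1L)
  ),
  demographics = data.frame(
    pss_id = "PSS000042", sample_id = 1L, variable = "age"
  ),
  cohorts = data.frame(
    pss_id = "PSS000042", sample_id = 1L, cohort_symbol = "COHORT_A"
  )
)

slot_names <- slotNames(x)  # names of the four tables
samples_tbl <- x@samples    # a slot is accessed with `@`
```

Each slot is an ordinary data frame, so the usual subsetting and merging tools apply to it.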
sample_sets
A table of sample sets. Each sample set (row) is uniquely
identified by the column pss_id. Columns:
A PGS Sample Set identifier. Example: "PSS000042".
samples
A table of samples. Each sample (row) is uniquely identified by
the combination of values from the columns: pss_id and sample_id. Columns:
A PGS Sample Set identifier. Example: "PSS000042".
Sample identifier. This is a surrogate key to identify each sample.
Sample stage: should always be Evaluation ("eval").
Number of individuals included in the sample.
Number of cases.
Number of controls.
Percentage of male participants.
Detailed phenotype description.
Author-reported ancestry, mapped to the best matching
ancestry category from the NHGRI-EBI GWAS Catalog framework (see
ancestry_categories for possible values).
A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).
Author-reported countries of recruitment (if available).
Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
Associated GWAS Catalog study accession identifier, e.g., "GCST002735".
PubMed identifier.
Any additional description about the samples (e.g. sub-cohort information).
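The composite key of the samples table can be checked with base R. The sketch below uses a toy data frame with made-up values and only the two key columns, pss_id and sample_id.

```r
# Toy samples table; values are illustrative only.
samples <- data.frame(
  pss_id = c("PSS000042", "PSS000042", "PSS000043"),
  sample_id = c(1L, 2L, 1L)
)

# pss_id alone repeats, so it cannot identify a row of `samples`.
pss_id_repeats <- anyDuplicated(samples$pss_id) > 0

# The pair (pss_id, sample_id) does not repeat, so it works as a key.
pair_is_unique <- anyDuplicated(samples[c("pss_id", "sample_id")]) == 0
```

Since sample_id is a surrogate key, it is most safely used together with pss_id when matching rows across tables.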
demographics
A table of sample demographic variables. Each
demographic variable (row) is uniquely identified by the combination of
values from the columns: pss_id, sample_id, and variable. Columns:
A PGS Sample Set identifier. Example: "PSS000042".
Sample identifier. This is a surrogate key to identify each sample.
Demographic variable. The following columns describe the indicated variable.
Type of statistical estimate for variable.
The variable's statistical value.
Unit of the variable.
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
The value of the measure of dispersion.
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
Interval lower bound.
Interval upper bound.
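Because demographics holds one row per sample and variable, attaching a single variable to the samples table takes a filter followed by a join on pss_id and sample_id. The sketch below uses toy data frames; the column names n and value, and the variable labels, are hypothetical placeholders rather than the package's actual names.

```r
# Toy tables; `n` and `value` are hypothetical column names.
samples <- data.frame(
  pss_id = c("PSS000042", "PSS000042"),
  sample_id = c(1L, 2L),
  n = c(1000L, 500L)
)
demographics <- data.frame(
  pss_id = c("PSS000042", "PSS000042"),
  sample_id = c(1L, 1L),
  variable = c("age", "followup_time"),  # made-up variable labels
  value = c(54.2, 10)
)

# Keep one variable, then left-join it onto samples by the shared key.
age <- demographics[demographics$variable == "age", ]
merged <- merge(samples, age, by = c("pss_id", "sample_id"), all.x = TRUE)
```

With all.x = TRUE, samples that have no row for the chosen variable are kept and receive NA in the joined columns.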
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by
the combination of values from the columns: pss_id, sample_id, and
cohort_symbol. Columns:
A PGS Sample Set identifier. Example: "PSS000042".
Sample identifier. This is a surrogate key to identify each sample.
Cohort symbol.
Cohort full name.
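Since cohort_symbol is part of the key, a sample may appear on more than one row of the cohorts table. One possible way to collapse those rows to a single row per sample, shown here on a toy data frame with made-up cohort symbols:

```r
# Toy cohorts table; the cohort symbols are illustrative only.
cohorts <- data.frame(
  pss_id = c("PSS000042", "PSS000042", "PSS000043"),
  sample_id = c(1L, 1L, 1L),
  cohort_symbol = c("COHORT_B", "COHORT_A", "COHORT_A")
)

# One row per (pss_id, sample_id), with cohort symbols sorted and pasted together.
per_sample <- aggregate(
  cohort_symbol ~ pss_id + sample_id,
  data = cohorts,
  FUN = function(s) paste(sort(s), collapse = ", ")
)
```

The result can then be merged with the samples table on pss_id and sample_id.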