performance_metrics-class: An S4 class to represent a set of PGS Catalog Performance Metrics

Description

The performance_metrics object consists of nine tables (slots) that together form a relational database of a subset of PGS Catalog performance metrics. Each performance metric is an observation (row) in the performance_metrics table (first table).
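
As a rough illustration of the relational layout, the sketch below builds a hypothetical stand-in class with only two of the nine slots and joins them on ppm_id. The class name toy_performance_metrics, the use of plain data frames, and all row values are assumptions made for the example; they are not taken from the package.

```r
library(methods)

# Hypothetical stand-in class (not the package's class): two of the nine
# slots, stored here as plain data frames for simplicity.
setClass(
  "toy_performance_metrics",
  slots = c(performance_metrics = "data.frame", publications = "data.frame")
)

# Toy rows, invented for illustration only.
toy <- new(
  "toy_performance_metrics",
  performance_metrics = data.frame(
    ppm_id = c("PPM000001", "PPM000002"),
    pgs_id = c("PGS000001", "PGS000002"),
    reported_trait = c("Breast Cancer", "Breast Cancer"),
    stringsAsFactors = FALSE
  ),
  publications = data.frame(
    ppm_id = c("PPM000001", "PPM000002"),
    pgp_id = c("PGP000001", "PGP000001"),
    pubmed_id = c("25855707", "25855707"),
    stringsAsFactors = FALSE
  )
)

# Slots are read with `@`; ppm_id is the column that links rows across tables.
joined <- merge(toy@performance_metrics, toy@publications, by = "ppm_id")
print(joined)
```

Every table described below carries a ppm_id column, so the same merge() pattern should apply to any pair of slots.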

Slots

performance_metrics

A table of PGS Performance Metrics (PPM). Each PPM (row) is uniquely identified by the ppm_id column. Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

pgs_id

Polygenic Score (PGS) identifier.

reported_trait

The author-reported trait that the PGS has been developed to predict. Example: "Breast Cancer".

covariates

Comma-separated list of covariates used in the prediction model to evaluate the PGS.

comments

Any other information relevant to the understanding of the performance metrics.

publications

A table of publications. Each publication (row) is uniquely identified by the column pgp_id. Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

pgp_id

PGS Publication identifier. Example: "PGP000001".

pubmed_id

PubMed identifier. Example: "25855707".

publication_date

Publication date. Example: "2020-09-28". Note that the class of publication_date is Date.

publication

Abbreviated name of the journal. Example: "Am J Hum Genet".

title

Publication title.

author_fullname

First author of the publication. Example: "Mavaddat N".

doi

Digital Object Identifier (DOI). This variable is also curated to allow unpublished work (e.g. preprints) to be added to the catalog. Example: "10.1093/jnci/djv036".
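
Because publication_date is of class Date, it can be compared directly against other dates. A minimal sketch, assuming a toy publications table whose rows are invented for illustration:

```r
# Toy publications table; identifiers and dates are made up for this example.
publications <- data.frame(
  ppm_id = c("PPM000001", "PPM000002"),
  pgp_id = c("PGP000001", "PGP000002"),
  publication_date = as.Date(c("2015-04-08", "2020-09-28")),
  stringsAsFactors = FALSE
)

# Date columns support comparison operators, so no string parsing is needed.
cutoff <- as.Date("2020-01-01")
recent <- publications[publications$publication_date >= cutoff, ]
print(recent)
```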

sample_sets

A table of sample sets. Each sample set (row) is uniquely identified by the column pss_id. Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

pss_id

A PGS Sample Set identifier. Example: "PSS000042".

samples

A table of samples. Each sample (row) is uniquely identified by the combination of values from the columns: ppm_id, pss_id, and sample_id. Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

pss_id

A PGS Sample Set identifier. Example: "PSS000042".

sample_id

Sample identifier. This is a surrogate key to identify each sample.

stage

Sample stage: should always be evaluation ("eval").

sample_size

Number of individuals included in the sample.

sample_cases

Number of cases.

sample_controls

Number of controls.

sample_percent_male

Percentage of male participants.

phenotype_description

Detailed phenotype description.

ancestry_category

Author-reported ancestry is mapped to the best matching ancestry category from the NHGRI-EBI GWAS Catalog framework (see ancestry_categories for possible values).

ancestry

A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).

country

Author-reported countries of recruitment (if available).

ancestry_additional_description

Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).

study_id

Associated GWAS Catalog study accession identifier, e.g., "GCST002735".

pubmed_id

PubMed identifier.

cohorts_additional_description

Any additional description about the samples (e.g. sub-cohort information).
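
The composite key of the samples table (ppm_id, pss_id, sample_id) can be checked and used for grouping as sketched below. The rows are invented, only a few columns are shown, and sample_id is assumed to be an integer counter.

```r
# Toy samples table; values are illustrative, not PGS Catalog data.
samples <- data.frame(
  ppm_id = c("PPM000001", "PPM000001", "PPM000002"),
  pss_id = c("PSS000001", "PSS000001", "PSS000002"),
  sample_id = c(1L, 2L, 1L),  # assumed integer surrogate key
  stage = "eval",
  sample_size = c(1000L, 500L, 2000L),
  stringsAsFactors = FALSE
)

# A row is identified by the combination of the three key columns.
key <- samples[, c("ppm_id", "pss_id", "sample_id")]
key_is_unique <- !any(duplicated(key))

# Total number of individuals evaluated per performance metric.
totals <- aggregate(sample_size ~ ppm_id, data = samples, FUN = sum)
print(totals)
```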

demographics

A table of sample demographic variables. Each demographic variable (row) is uniquely identified by the combination of values from the columns: ppm_id, pss_id, sample_id, and variable. Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

pss_id

A PGS Sample Set identifier. Example: "PSS000042".

sample_id

Sample identifier. This is a surrogate key to identify each sample.

variable

Demographic variable. The following columns describe the indicated variable.

estimate_type

Type of statistical estimate for variable.

estimate

The variable's statistical value.

unit

Unit of the variable.

variability_type

Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).

variability

The value of the measure of dispersion.

interval_type

Type of statistical interval for variable: range, iqr (interquartile range), ci (confidence interval).

interval_lower

Interval lower bound.

interval_upper

Interval upper bound.
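
The demographics table is in long format: one row per variable per sample. A minimal sketch of selecting one variable, assuming invented rows, the variable names "age" and "follow_up_time", and NA for measures that were not reported:

```r
# Toy demographics table; all values and variable names are assumptions.
demographics <- data.frame(
  ppm_id = c("PPM000001", "PPM000001"),
  pss_id = c("PSS000001", "PSS000001"),
  sample_id = c(1L, 1L),
  variable = c("age", "follow_up_time"),
  estimate_type = c("mean", "median"),
  estimate = c(55.2, 10),
  unit = c("years", "years"),
  variability_type = c("sd", NA),
  variability = c(8.1, NA),
  interval_type = c(NA, "iqr"),
  interval_lower = c(NA, 7),
  interval_upper = c(NA, 12),
  stringsAsFactors = FALSE
)

# Keep only the rows describing age; the remaining columns qualify that variable.
age <- demographics[demographics$variable == "age", ]
print(age[, c("sample_id", "estimate_type", "estimate", "unit", "variability_type", "variability")])
```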

cohorts

A table of cohorts. Each cohort (row) is uniquely identified by the combination of values from the columns: ppm_id, sample_id and cohort_symbol. Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

sample_id

Sample identifier. This is a surrogate key to identify each sample.

cohort_symbol

Cohort symbol.

cohort_name

Cohort full name.

pgs_effect_sizes

A table of effect sizes per standard deviation change in PGS. Examples include regression coefficients (betas) for continuous traits, and odds ratios (OR) and/or hazard ratios (HR) for dichotomous traits, depending on the availability of time-to-event data. Each effect size is uniquely identified by the combination of values from the columns: ppm_id and effect_size_id. Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

effect_size_id

Effect size identifier. This is a surrogate identifier to identify each effect size.

estimate_type_long

Long notation of the effect size (e.g. Odds Ratio).

estimate_type

Short notation of the effect size (e.g. OR).

estimate

The estimate's value.

unit

Unit of the estimate.

variability_type

Measure of statistical dispersion for the estimate, e.g. standard error (se) or standard deviation (sd).

variability

The value of the measure of dispersion.

interval_type

Type of statistical interval for the estimate: range, iqr (interquartile range), ci (confidence interval).

interval_lower

Interval lower bound.

interval_upper

Interval upper bound.
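
The estimate and interval columns can be combined into a readable label. The sketch below assumes a single invented odds-ratio row and NA for unreported fields; the label format is only one possible choice.

```r
# Toy effect-size row; the numbers are illustrative only.
pgs_effect_sizes <- data.frame(
  ppm_id = "PPM000001",
  effect_size_id = 1L,
  estimate_type_long = "Odds Ratio",
  estimate_type = "OR",
  estimate = 1.55,
  unit = NA_character_,
  variability_type = NA_character_,
  variability = NA_real_,
  interval_type = "ci",
  interval_lower = 1.52,
  interval_upper = 1.58,
  stringsAsFactors = FALSE
)

# Build a compact label such as "OR = 1.55 (ci: 1.52, 1.58)".
label <- with(
  pgs_effect_sizes,
  sprintf("%s = %.2f (%s: %.2f, %.2f)",
          estimate_type, estimate, interval_type, interval_lower, interval_upper)
)
print(label)
```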

pgs_classification_metrics

A table of classification metrics. Examples include the Area Under the Receiver Operating Characteristic curve (AUROC) or Harrell's C-index (Concordance statistic). Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

classification_metrics_id

Classification metric identifier. This is a surrogate identifier to identify each classification metric.

estimate_type_long

Long notation of the classification metric (e.g. Concordance Statistic).

estimate_type

Short notation of the classification metric (e.g. C-index).

estimate

The estimate's value.

unit

Unit of the estimate.

variability_type

Measure of statistical dispersion for the estimate, e.g. standard error (se) or standard deviation (sd).

variability

The value of the measure of dispersion.

interval_type

Type of statistical interval for the estimate: range, iqr (interquartile range), ci (confidence interval).

interval_lower

Interval lower bound.

interval_upper

Interval upper bound.

pgs_other_metrics

A table of other metrics that are neither effect sizes nor classification metrics. Examples include R² (proportion of the variance explained) and reclassification metrics. Columns:

ppm_id

A PGS Performance Metrics identifier. Example: "PPM000001".

other_metrics_id

Other metric identifier. This is a surrogate identifier to identify each metric.

estimate_type_long

Long notation of the metric. Example: "Proportion of the variance explained".

estimate_type

Short notation of the metric. Example: "R²".

estimate

The estimate's value.

unit

Unit of the estimate.

variability_type

Measure of statistical dispersion for the estimate, e.g. standard error (se) or standard deviation (sd).

variability

The value of the measure of dispersion.

interval_type

Type of statistical interval for the estimate: range, iqr (interquartile range), ci (confidence interval).

interval_lower

Interval lower bound.

interval_upper

Interval upper bound.
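
The three metrics tables share the same columns apart from their identifier column, so they can be stacked into one table after renaming it. This is a sketch under assumptions: the helper stack_metrics(), the new columns metric_id and metric_class, and all row values are hypothetical and not part of the package.

```r
# Toy tables with a reduced set of columns; values are invented.
effect_sizes <- data.frame(
  ppm_id = "PPM000001", effect_size_id = 1L,
  estimate_type = "OR", estimate = 1.55, stringsAsFactors = FALSE
)
classification <- data.frame(
  ppm_id = "PPM000001", classification_metrics_id = 1L,
  estimate_type = "C-index", estimate = 0.62, stringsAsFactors = FALSE
)
other <- data.frame(
  ppm_id = "PPM000002", other_metrics_id = 1L,
  estimate_type = "R\u00b2",  # "R²", written as a Unicode escape for portability
  estimate = 0.05, stringsAsFactors = FALSE
)

# Hypothetical helper: rename the table-specific id column to a common name
# and record which table each row came from.
stack_metrics <- function(tbl, id_col, metric_class) {
  names(tbl)[names(tbl) == id_col] <- "metric_id"
  tbl$metric_class <- metric_class
  tbl
}

all_metrics <- rbind(
  stack_metrics(effect_sizes, "effect_size_id", "effect_size"),
  stack_metrics(classification, "classification_metrics_id", "classification"),
  stack_metrics(other, "other_metrics_id", "other")
)
print(all_metrics)
```

After stacking, a row is identified by ppm_id, metric_class and metric_id together, since the surrogate identifiers are only meaningful within their original table.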