The scores object consists of nine tables (slots) that together form a
relational database of a subset of PGS Catalog polygenic scores. Each score
is an observation (row) in the scores table (the first table).
scores
A table of polygenic scores. Each polygenic score (row) is uniquely identified by the pgs_id column. Columns:
Polygenic Score (PGS) identifier. Example: "PGS000001".
This may be the name that the authors describe the PGS with in the source publication, or a name that a curator of the PGS Catalog has assigned to identify the score during the curation process (before a PGS identifier has been given). Example: PRS77_BC.
URL to the scoring file on the PGS FTP server. Example: "http://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz".
Indicates whether the PGS data matches the published polygenic score (TRUE). If not (FALSE), the authors have provided an alternative polygenic score for the Catalog, and some other data, such as performance metrics, may differ from the publication.
The author-reported trait that the PGS has been developed to predict. Example: "Breast Cancer".
Any additional description not captured in the other columns. Example: "Femoral neck BMD (g/cm2)".
The name or description of the method or computational algorithm used to develop the PGS.
A description of the relevant inputs and parameters relevant to the PGS development method/process.
Number of variants used to calculate the PGS.
Number of higher-order variant interactions included in the PGS.
The version of the genome assembly that the variants present in the PGS are associated with. Example: GRCh37.
The PGS Catalog distributes its data according to EBI's standard Terms of Use. Some PGS have specific terms, licenses, or restrictions (e.g. non-commercial use) that we highlight in this field, if known.
publications
A table of publications. Each publication (row) is uniquely identified by the pgp_id column. Columns:
Polygenic Score (PGS) identifier.
PGS Publication identifier. Example: "PGP000001".
PubMed identifier. Example: "25855707".
Publication date. Example: "2020-09-28". Note that the class of publication_date is Date.
Abbreviated name of the journal. Example: "Am J Hum Genet".
Publication title.
First author of the publication. Example: 'Mavaddat N'.
Digital Object Identifier (DOI). This variable is also curated to allow unpublished work (e.g. preprints) to be added to the catalog. Example: "10.1093/jnci/djv036".
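Because the tables share the pgs_id column, they can be combined with an ordinary relational join. The following is a minimal sketch in base R, not the package's own API: the two data frames are toy stand-ins built by hand, they use only column names mentioned in the text (pgs_id, pgp_id, publication_date), and all values are illustrative.

```r
# Toy stand-ins for the `scores` and `publications` tables (values made up).
scores <- data.frame(
  pgs_id = c("PGS000001", "PGS000002"),
  stringsAsFactors = FALSE
)
publications <- data.frame(
  pgs_id = c("PGS000001", "PGS000002"),
  pgp_id = c("PGP000001", "PGP000002"),
  publication_date = as.Date(c("2020-09-28", "2020-10-01")),
  stringsAsFactors = FALSE
)

# Relational join on the shared key column pgs_id.
scores_with_pubs <- merge(scores, publications, by = "pgs_id")
print(scores_with_pubs)
```

The join keeps publication_date as a Date, so date arithmetic and comparisons work on the merged table without conversion.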
samples
A table of samples. Each sample (row) is uniquely identified by the combination of values from the columns pgs_id and sample_id. Columns:
Polygenic score identifier. An identifier that starts with 'PGS' and is followed by six digits, e.g. 'PGS000001'.
Sample identifier. This is a surrogate key to identify each sample.
Sample stage: either "discovery" or "training".
Number of individuals included in the sample.
Number of cases.
Number of controls.
Percentage of male participants.
Detailed phenotype description.
Author-reported ancestry, mapped to the best matching ancestry category from the NHGRI-EBI GWAS Catalog framework (see ancestry_categories for possible values).
A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).
Author reported countries of recruitment (if available).
Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
Associated GWAS Catalog study accession identifier, e.g., "GCST002735".
PubMed identifier.
Any additional description about the samples (e.g. sub-cohort information).
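The identifier format and the composite key described for this table can be checked mechanically. The sketch below uses a hand-built toy data frame rather than real catalog data; storing sample_id as an integer is an assumption made for illustration.

```r
# Toy stand-in for the `samples` table (values made up; integer sample_id
# is an assumption).
samples <- data.frame(
  pgs_id = c("PGS000001", "PGS000001", "PGS000002"),
  sample_id = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# 'PGS' followed by exactly six digits.
valid_id <- grepl("^PGS[0-9]{6}$", samples$pgs_id)

# The pair (pgs_id, sample_id) should identify each row uniquely.
key_is_unique <- !any(duplicated(samples[, c("pgs_id", "sample_id")]))

print(all(valid_id))
print(key_is_unique)
```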
demographics
A table of sample demographic variables. Each demographic variable (row) is uniquely identified by the combination of values from the columns pgs_id, sample_id and variable. Columns:
Polygenic Score (PGS) identifier.
Sample identifier. This is a surrogate key to identify each sample.
Demographics variable. The following columns describe the indicated variable.
Type of statistical estimate for variable.
The variable's statistical value.
Unit of the variable.
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
The value of the measure of dispersion.
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
Interval lower bound.
Interval upper bound.
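Each demographics row pairs a point estimate with an optional interval, so one row can be rendered as a single readable summary. This is a minimal sketch under stated assumptions: apart from pgs_id, sample_id and variable, the column names (estimate_type, estimate, unit, interval_type, interval_lower, interval_upper) are hypothetical labels for the columns described above, and the values, including "age" as a variable, are made up.

```r
# Toy single-row stand-in for the `demographics` table. Column names other
# than pgs_id, sample_id and variable are hypothetical.
demographics <- data.frame(
  pgs_id = "PGS000001",
  sample_id = 1L,
  variable = "age",
  estimate_type = "mean",
  estimate = 55,
  unit = "years",
  interval_type = "range",
  interval_lower = 40,
  interval_upper = 70,
  stringsAsFactors = FALSE
)

# Combine the estimate and its interval into one human-readable string.
summary_text <- with(
  demographics,
  sprintf(
    "%s: %s = %g %s (%s: %g to %g)",
    variable, estimate_type, estimate, unit,
    interval_type, interval_lower, interval_upper
  )
)
print(summary_text)
```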
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by the combination of values from the columns pgs_id, sample_id and cohort_symbol. Columns:
Polygenic Score (PGS) identifier.
Sample identifier. This is a surrogate key to identify each sample.
Cohort symbol.
Cohort full name.
traits
A table of EFO traits. Each trait (row) is uniquely identified by the combination of the columns pgs_id and efo_id. Columns:
Polygenic Score (PGS) identifier.
An EFO identifier.
Trait name.
Detailed description of the trait from EFO.
External link to the EFO entry.
stages_tally
A table of sample sizes and number of sample sets at each stage.
Polygenic Score (PGS) identifier.
Sample stage: either "gwas", "dev" or "eval".
Sample size.
Number of sample sets (only meaningful for the evaluation stage "eval").
ancestry_frequencies
This table describes the ancestry composition at each stage.
Polygenic Score (PGS) identifier.
Sample stage: either "gwas", "dev" or "eval".
Ancestry class symbol.
Ancestry fraction (percentage).
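If the ancestry fractions are percentages of a stage's total, they would be expected to add up to roughly 100 within each score and stage; that expectation is an assumption here, not something the text states. The sketch below checks it on a toy data frame. The column names stage, ancestry_class_symbol and frequency are hypothetical, and the ancestry symbols and values are made up.

```r
# Toy stand-in for the `ancestry_frequencies` table. Column names other than
# pgs_id are hypothetical; values are illustrative.
ancestry_frequencies <- data.frame(
  pgs_id = rep("PGS000001", 3),
  stage = c("gwas", "eval", "eval"),
  ancestry_class_symbol = c("EUR", "EUR", "EAS"),
  frequency = c(100, 75, 25),
  stringsAsFactors = FALSE
)

# Sum the percentages within each (pgs_id, stage) group.
totals <- aggregate(
  frequency ~ pgs_id + stage,
  data = ancestry_frequencies,
  FUN = sum
)

# Flag groups whose percentages add up to 100 (within a small tolerance).
sums_to_100 <- abs(totals$frequency - 100) < 1e-6
print(cbind(totals, sums_to_100))
```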
multi_ancestry_composition
A table breaking down the ancestries included in each multi-ancestry class.
Polygenic Score (PGS) identifier.
Sample stage: either "gwas", "dev" or "eval".
Multi-ancestry class symbol.
Ancestry class symbol.