Frequently Asked Questions

General

How to access PGS Catalog data?

PGS Catalog metadata (e.g. traits, authors, methods, performance metrics, cohorts) is available through:

On the web interface, separate pages are available for each Score, Trait and Publication so that each can be explored individually. The REST API works the same way, with separate endpoints for score, trait and publication.

PGS Catalog scoring files (e.g. variants, weights) are available on the PGS Catalog FTP. Links to these files are also provided in the web interface and the REST API.
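As a minimal sketch of the REST access described above, the following Python builds an endpoint URL and fetches one record. The base URL is taken from the score example elsewhere in this FAQ (https://www.pgscatalog.org/rest/score/PGS000001). The `trait` and `publication` paths are assumed to follow the same pattern, and the helper names `endpoint_url` and `fetch_metadata` are hypothetical.

```python
import json
from urllib.request import urlopen

# Base URL inferred from the score example in this FAQ.
REST_BASE = "https://www.pgscatalog.org/rest"

# Assumption: trait and publication endpoints mirror the score endpoint.
KNOWN_KINDS = {"score", "trait", "publication"}


def endpoint_url(kind: str, identifier: str) -> str:
    """Build a REST URL such as .../rest/score/PGS000001."""
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    return f"{REST_BASE}/{kind}/{identifier}"


def fetch_metadata(kind: str, identifier: str, timeout: float = 30.0) -> dict:
    """Fetch one record and decode its JSON body (needs network access)."""
    with urlopen(endpoint_url(kind, identifier), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_metadata("score", "PGS000001")` would return the score record as a dictionary, provided the server is reachable.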

How to cite the PGS Catalog?

The PGS Catalog development is led by Samuel Lambert under the supervision of Michael Inouye (University of Cambridge & Baker Institute) in collaboration with Health Data Research UK (Laurent Gil) and the EBI Samples, Phenotypes and Ontologies team / NHGRI-EBI GWAS Catalog (Helen Parkinson, Aoife McMahon, Laura Harris).

The Catalog is under active development, and we continue to add new features and curate new data. If you use the Catalog or Calculator in your research, we ask that you cite our flagship publications below:

Samuel A. Lambert, Benjamin Wingfield, Joel T. Gibson, Laurent Gil, Santhi Ramachandran, Florent Yvon, Shirin Saverimuttu, Emily Tinsley, Elizabeth Lewis, Scott C. Ritchie, Jingqin Wu, Rodrigo Canovas, Aoife McMahon, Laura W. Harris, Helen Parkinson, Michael Inouye

Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization

Nature Genetics, doi: 10.1038/s41588-024-01937-x (2024).


Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur, Michael Inouye

The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation

Nature Genetics, volume 53, pages 420–425, doi: 10.1038/s41588-021-00783-5 (2021).

Individual PGS obtained from the database should also be cited appropriately, and used in accordance with any licensing restrictions set by the authors (see our Terms of Use for more information).

How to submit data to the PGS Catalog?

Published or preprinted polygenic score data meeting our inclusion criteria can be indexed in the Catalog. Please see the About page for a guide for authors looking to submit their data to the Catalog.
Pre-publication data can also be submitted and embargoed until publication to meet journal requirements.

What are the terms and conditions to use the PGS Catalog data?

The PGS Catalog and all its contents are made available through the standard EMBL-EBI terms of use.
Some scores have a known specific license (e.g. Creative Commons or specific non-commercial terms). These licenses are listed in the scoring file header, API/metadata and web display.
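To illustrate reading a license from a scoring file header, here is a small sketch. It assumes that header lines start with `#` and carry `key=value` pairs, and that the license is stored under a key named `license`. Both are assumptions to check against the scoring file format documentation. The sample lines are invented for the example.

```python
def parse_header(lines):
    """Collect '#key=value' header lines into a dict.

    Parsing stops at the first line that does not start with '#',
    which is assumed to be the start of the variant table.
    """
    meta = {}
    for line in lines:
        if not line.startswith("#"):
            break
        body = line.lstrip("#").strip()
        if "=" in body:
            key, value = body.split("=", 1)
            meta[key.strip()] = value.strip()
    return meta


# Hypothetical header, for illustration only.
sample = [
    "##POLYGENIC SCORE (PGS) INFORMATION",
    "#pgs_id=PGS000001",
    "#license=Example license text",
    "rsID\teffect_allele\teffect_weight",
]
header = parse_header(sample)
print(header.get("license", "no license field found"))
```

Header lines without an `=` sign (such as section titles) are skipped rather than treated as errors.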

Searching the PGS Catalog

How to search the PGS Catalog?

Type your query, e.g. “breast carcinoma”, into the search box and press Return or click the search icon. You can type any text you wish into the search bar.

Note 1: The search is case-insensitive, e.g. typing breast cancer or Breast Cancer will return the same results.
Note 2: Autocomplete is available for the trait terms and trait synonyms.

Searching by trait

The following trait information is searchable:
Data type | Example
Ontology identifier | EFO_0001645, EFO:0001645
Ontology term | coronary artery disease
Ontology synonym | CAD
Ontology mapped terms | ICD10:I25, OMIM:608901
Trait category | Cardiovascular disease

More information about the Trait data in the PGS Catalog can be found here.

Searching by publication

The following publication information is searchable:
Data type | Example
PGS Catalog Publication identifier (PGP ID) | PGP000007
Publication title | Genomic Risk Prediction of Coronary Artery Disease in 480,000 Adults: Implications for Primary Prevention.
Publication authors | Inouye
PubMed identifier | 30309464
DOI | 10.1016/j.jacc.2018.07.079

More information about the Publication data in the PGS Catalog can be found here.

Searching by score

The following score information is searchable:
Data type | Example
PGS Catalog Score identifier (PGS ID) | PGS001530
Score name | GBE_INI4103
Score reported trait | Speed of sound through heel (L)

More information about the Score data in the PGS Catalog can be found here.
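The identifier types listed in the tables above have recognisable shapes, so a client script can guess what kind of query it is about to send. The sketch below is only an illustration and is not how the Catalog's own search works. The digit counts (six for PGS/PGP IDs, seven for EFO IDs) are assumptions drawn from the examples shown, and `classify_query` is a hypothetical helper.

```python
import re

# Patterns inferred from the example identifiers in this FAQ (assumptions).
PATTERNS = [
    ("score", re.compile(r"PGS\d{6}", re.IGNORECASE)),
    ("publication", re.compile(r"PGP\d{6}", re.IGNORECASE)),
    ("ontology", re.compile(r"EFO[_:]\d{7}", re.IGNORECASE)),
    ("doi", re.compile(r"10\.\d{4,9}/\S+")),
    ("pubmed", re.compile(r"\d+")),
]


def classify_query(query: str) -> str:
    """Return a label for the query, or 'free text' if no pattern matches."""
    text = query.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return label
    return "free text"


for example in ["PGS001530", "EFO:0001645", "30309464", "breast carcinoma"]:
    print(example, "->", classify_query(example))
```

Anything that does not match a known identifier shape, such as a trait name or author surname, falls through to free text.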

How to look at the search results?

The search then returns any Traits, Publications or Scores in the Catalog that contain a string match in one of several data fields. Each result type is marked with its own icon.

By default, all results are shown. However, you can use the buttons at the top of the results page to show only Traits, Publications or Scores, e.g.:

All results (37) | Scores (15) | Traits (11) | Publications (11)

Each result is displayed as a card, and the information shown depends on whether it is a Score, a Trait or a Publication, e.g.:

Score card:
Name: BMI
Number of Variants: 122
Publication ID: PGP000211
Reported trait: Body mass index
Mapped trait(s): body mass index (EFO_0004340)

Trait card:
body mass index
EFO_0004340
Body measurement
An indicator of body density as determined by the relationship of BODY WEIGHT to BODY HEIGHT. BMI=weight (kg)/height squared (m2). BMI correlates with body fat (ADIPOSE TISSUE). Their relationship varies with age and gender. For adults, BMI falls into these categories: below 18.5 (underweight); 18.5-24.9 (normal); 25.0-29.9 (overweight); 30.0 and above (obese). (National Center for Health Statistics, Centers for Disease Control and Prevention)
Associated PGS: 17 [Show PGS]

Publication card:
Song M et al. (2017) - Diabetes
PMID: 29212779
doi: 10.2337/db17-1156
PGP000021
PGS developed: 1 - PGS evaluated: 1 [Show PGS]

The "Show PGS" buttons display the list of Polygenic Score(s) associated with the Trait or the Publication.

Trait example:
PGS ID | PGS Name | Reported Trait
PGS000002 | PRS77_ERpos | ER-positive Breast Cancer
PGS000005 | PRS313_ERpos | ER-positive Breast Cancer
PGS000008 | PRS3820_ERpos | ER-positive Breast Cancer
PGS000046 | BCPRS_ER+ | Estrogen receptor [ER]-positive breast cancer
PGS000347 | PRS287_ERpos | Estrogen receptor positive breast cancer
PGS000774 | PRS179_ERpos | Estrogen receptor positive breast cancer

Publication example:
PGS ID | PGS Name | Reported Trait | Developed | Evaluated
PGS000004 | PRS313_BC | Breast Cancer | ✓ | ✓
PGS000007 | PRS3820_BC | Breast Cancer | ✓ | ✓
PGS000001 | PRS77_BC | Breast Cancer | - | ✓
PGS000006 | PRS313_ERneg | ER-negative Breast Cancer | ✓ | ✓
PGS000009 | PRS3820_ERneg | ER-negative Breast Cancer | ✓ | ✓
PGS000003 | PRS77_ERneg | ER-negative Breast Cancer | - | ✓
PGS000005 | PRS313_ERpos | ER-positive Breast Cancer | ✓ | ✓
PGS000008 | PRS3820_ERpos | ER-positive Breast Cancer | ✓ | ✓

PGS scoring files

For further information about the PGS Catalog scoring files (e.g. file format), please see the documentation on the Download page.

How to download the PGS scoring files?

There are different ways to download the PGS scoring files:
Note: The web interface and the Score endpoints in the PGS Catalog REST API provide the full URLs of the scoring files.
e.g. with the REST API: https://www.pgscatalog.org/rest/score/PGS000001
{...
  "ftp_scoring_file": "https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz",
...}
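As a sketch of using that field from a script, the Python below reads `ftp_scoring_file` from a score record and streams the file to disk. The field name and URL come from the example above. The helper names (`scoring_file_url`, `local_filename`, `download`) are hypothetical, and the record shown contains only the one field needed here.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def scoring_file_url(score_record: dict) -> str:
    """Return the scoring file URL from a REST score record."""
    url = score_record.get("ftp_scoring_file")
    if not url:
        raise KeyError("record has no 'ftp_scoring_file' field")
    return url


def local_filename(url: str) -> str:
    """Use the last path component of the URL as the local file name."""
    return Path(urlparse(url).path).name


def download(url: str, dest_dir: str = ".", timeout: float = 60.0) -> Path:
    """Stream the remote file to dest_dir (needs network access)."""
    target = Path(dest_dir) / local_filename(url)
    with urlopen(url, timeout=timeout) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Record trimmed to the single field shown in the example above.
record = {
    "ftp_scoring_file": "https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz"
}
print(local_filename(scoring_file_url(record)))
```

Calling `download(scoring_file_url(record))` would save the compressed scoring file in the current directory.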

How to calculate polygenic scores using PGS scoring files?

We wrote pgsc_calc: a reproducible workflow to calculate both PGS Catalog and custom polygenic scores. The workflow automates PGS downloads from the Catalog, reading custom scoring files, variant matching between scoring files and target genotyping samplesets, and the parallel calculation of multiple PGS. See the full documentation here.
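To make the calculation concrete, a polygenic score is commonly computed as the sum, over matched variants, of the effect weight multiplied by the number of effect alleles a sample carries. The toy Python below sketches that arithmetic for one sample. It is not the pgsc_calc implementation, which also handles allele and strand matching, multiple samples and much larger inputs. The variant IDs, weights and dosages are invented for illustration.

```python
def polygenic_score(weights: dict, dosages: dict):
    """Return (score, n_matched) for one sample.

    weights: variant ID -> effect weight from a scoring file
    dosages: variant ID -> effect-allele count (0, 1 or 2) in the sample
    Variants missing from the sample's genotypes are skipped.
    """
    matched = [variant for variant in weights if variant in dosages]
    score = sum(weights[variant] * dosages[variant] for variant in matched)
    return score, len(matched)


# Hypothetical inputs: three scored variants, two present in the sample.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0}
dosages = {"rs1": 2, "rs2": 1}

score, n_matched = polygenic_score(weights, dosages)
print(f"score={score}, matched {n_matched} of {len(weights)} variants")
```

Reporting the number of matched variants alongside the score matters because a score computed from only a fraction of its variants may not be comparable with the published one.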

Troubleshooting

Why do the FTP downloads (scoring files, metadata) stall or fail?

By default, metadata and scoring files are downloaded over the HTTPS protocol, e.g.:

https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz

However, our FTP server is experiencing intermittent issues with the HTTPS protocol.

If downloads do not work over HTTPS, we recommend replacing it with one of the following protocols: