PGS Catalog Downloads

This page describes the PGS Catalog downloads and the FTP site that hosts them.

PGS Scoring File Downloads

The PGS scoring files are held on an FTP server (ftp://ftp.ebi.ac.uk/pub/databases/spot/pgs/) for download. Each scoring file is a gzipped, tab-delimited text file named by its PGS Catalog Score ID (e.g. PGS000001.txt.gz). In the current catalog, the scores have been extracted from the relevant publication and reformatted in only two ways. First, a consistent header (lines starting with #) lists the relevant PGS Catalog information (e.g. PGS ID, genome build (if known), number of variants present, reported trait, the original publication's DOI). Second, the column headings have been edited to follow this schema:

| Column Header | Field Name | Field Description |
|---|---|---|
| rsID | dbSNP Accession ID (rsID) | The SNP’s rs ID. |
| chr_name | Location - Chromosome | Chromosome name/number associated with the variant. |
| chr_position | Location - Position within the Chromosome | Chromosomal position associated with the variant. |
| effect_allele | Effect Allele | The allele that is counted (e.g. dosage {0, 1, 2}) and multiplied by the weight. |
| reference_allele | Reference Allele | The other allele(s) at the locus. |
| effect_weight | Variant Weight | Value of the effect that is multiplied by the dosage. |
| weight_type | Type of Weight | Whether the author-supplied Variant Weight is a beta (effect size) or a ratio such as an OR/HR (odds/hazard ratio). |
| allelefrequency_effect | Effect Allele Frequency | Reported effect allele frequency. If the associated locus is a haplotype, the haplotype frequency is extracted instead. |
| is_interaction | FLAG: Interaction | TRUE/FALSE variable that flags whether the weight should be multiplied by the dosage of more than one variant. Interactions are demarcated with a _x_ between the entries for each of the variants in the interaction. |
| is_recessive | FLAG: Recessive Inheritance Model | TRUE/FALSE variable that flags whether the weight should be added to the PGS sum only if there are 2 copies of the effect allele (i.e. it is a recessive allele). |
| is_haplotype, is_diplotype | FLAG: Haplotype or Diplotype | TRUE/FALSE variable that flags whether the effect allele is a haplotype/diplotype rather than a single SNP. Constituent SNPs in the haplotype are semicolon-separated. |
| imputation_method | Imputation Method | Describes whether the variant was called with a specific imputation or variant-calling method. This is mostly kept to describe HLA-genotyping methods (e.g. SNP2HLA, HLA*IMP) that give alleles that are not referenced by genomic position. |
| locus_name | Locus Name | Kept for loci where the variant may be referenced by the gene (e.g. APOE e4). It is also common (usually in smaller PGS) to see variants named according to the genes they impact. |
| variant_description | Variant Description | Any extra information about the variant (e.g. how it is genotyped or scored) that cannot be captured by the other fields. |
| inclusion_criteria | Score Inclusion Criteria | Explanation of when this variant is included in the PGS (e.g. if it depends on the results from other variants). |
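
As a rough illustration of how the schema above might be consumed, the following Python sketch separates the # header lines from the tab-delimited variant rows and then sums weight times dosage for one individual. It is not an official PGS Catalog tool. The function names (read_scoring_file, compute_score), the dosages mapping keyed by rsID, and the decision to add a recessive weight once (rather than twice) when two copies are present are all assumptions made for this example. Interaction, haplotype and diplotype rows are skipped rather than scored, and weights are used as given regardless of weight_type.

```python
import csv
import gzip

# Flags whose rows this sketch does not attempt to score (assumption).
UNSUPPORTED_FLAGS = ("is_interaction", "is_haplotype", "is_diplotype")


def read_scoring_file(fileobj):
    """Split an open text-mode scoring file into header lines and variant rows.

    Header lines are those starting with '#'; they are returned as raw
    strings because their exact layout is not assumed here.
    """
    header_lines, data_lines = [], []
    for line in fileobj:
        if line.startswith("#"):
            header_lines.append(line.lstrip("#").strip())
        elif line.strip():
            data_lines.append(line)
    # The first non-header line holds the column headings from the schema.
    rows = list(csv.DictReader(data_lines, delimiter="\t"))
    return header_lines, rows


def open_scoring_file(path):
    """Open a gzipped scoring file (e.g. PGS000001.txt.gz) as text."""
    return gzip.open(path, "rt", encoding="utf-8")


def is_true(value):
    """Interpret a TRUE/FALSE flag column; missing columns count as FALSE."""
    return str(value).strip().upper() == "TRUE"


def compute_score(rows, dosages):
    """Sum effect_weight * dosage over the variants present in `dosages`.

    `dosages` is a hypothetical mapping of rsID -> effect allele count
    (0, 1 or 2). Returns the score and the rsIDs that were not scored.
    """
    total = 0.0
    skipped = []
    for row in rows:
        rsid = row["rsID"]
        unsupported = any(is_true(row.get(flag)) for flag in UNSUPPORTED_FLAGS)
        if unsupported or rsid not in dosages:
            skipped.append(rsid)
            continue
        weight = float(row["effect_weight"])
        dosage = dosages[rsid]
        if is_true(row.get("is_recessive")):
            # Recessive model: count the weight only with 2 effect alleles.
            if dosage == 2:
                total += weight
        else:
            total += weight * dosage
    return total, skipped
```

A downloaded file could then be read with `with open_scoring_file("PGS000001.txt.gz") as handle: header, rows = read_scoring_file(handle)`. A real pipeline would also need to check that genotypes are oriented to the effect_allele and matched to the stated genome build, which this sketch does not do.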

Bulk Catalog Downloads

Downloads of the full PGS Catalog and annotations are currently under development.