Genome Toolbox: Estimate Combined Percent Familial Risk Explained by GWAS Loci

Wednesday, March 12, 2014

Estimate Combined Percent Familial Risk Explained by GWAS Loci

Most current genome-wide association studies (GWAS) report the percentage of familial risk explained by the discovered loci. This figure indicates how much of the familial risk the known loci account for, and it is usually cited as evidence that additional, as yet undetected loci remain to be discovered. Looking through the literature, it can be difficult to find exactly how the calculation is performed. One reference that includes a formula is Cox et al. 2007 (PMID:17293864), though I am sure plenty of others do as well. The Cox et al. formula for the familial relative risk due to each locus is arranged differently from the one used here, but the two are equivalent; I just find this arrangement less cumbersome to use. The overall calculation compares the cumulative risk from the known loci (the sum of log lambda k over all loci k) with the estimated familial relative risk to a first-degree relative (log lambda 0).
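Under a multiplicative (log-additive) per-allele model, the Cox et al. expression simplifies to the standard form below. This is assumed to be the arrangement intended here rather than a verbatim copy of the original equations:

$$
\text{Proportion of familial risk explained} = \frac{\sum_k \log \lambda_k}{\log \lambda_0},
\qquad
\lambda_k = \frac{p r^2 + (1 - p)}{\left(p r + (1 - p)\right)^2}
$$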

where:

p is the risk allele frequency for locus k

r is the per allele odds ratio for locus k.

To make calculations easy, I made a simple R script that does all the calculations automatically. The input for the script is a file with 3 columns:

(1) Annotation for the SNP - this can be anything, for example: RS number, chromosomal coordinates, etc.
(2) Risk allele frequency - this is the frequency of the risk allele (range: 0-1) equal to p in the above equation.
(3) Per allele odds ratio - odds ratio for every one unit increase in the number of risk alleles.

Note that the frequency must be for the same allele the odds ratio refers to (the risk allele), which is not necessarily the minor allele, so do not substitute the minor allele frequency without checking. The program also needs an estimate of the familial relative risk (lambda 0). An estimate can usually be found in previous family studies of the disease.
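If a study reports a protective effect (odds ratio below 1) for some allele, the values can be re-expressed for the risk-increasing allele before they go into the input file. The helper below is a hypothetical illustration (the name to_risk_allele and its column names are not from the original script) and assumes biallelic SNPs, so that the other allele has frequency 1 - freq and odds ratio 1 / or:

```r
# Hypothetical helper: re-express an effect so it refers to the risk-increasing allele.
# freq: frequency of the allele the odds ratio was reported for (0-1)
# or:   per-allele odds ratio for that same allele
# Assumes biallelic SNPs.
to_risk_allele <- function(freq, or) {
  flip <- or < 1  # protective allele reported, so switch to the other allele
  data.frame(
    raf = ifelse(flip, 1 - freq, freq),
    or  = ifelse(flip, 1 / or, or)
  )
}

# Example with made-up values: the first SNP is flipped, the second is left as is.
to_risk_allele(freq = c(0.20, 0.35), or = c(0.80, 1.15))
```

The important point is consistency: the frequency and the odds ratio in each row must describe the same allele.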

Here is the R script:
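A minimal sketch of such a script follows. It is not necessarily identical to the original script: it assumes the multiplicative-model form lambda k = (p r^2 + (1 - p)) / (p r + (1 - p))^2, independent loci, and a whitespace-delimited input file with no header row. The function and column names (locus_lambda, fraction_explained, snp, raf, or) are illustrative.

```r
#!/usr/bin/env Rscript
# Sketch: percent of familial relative risk explained by a set of GWAS loci.
# Assumes a multiplicative (log-additive) per-allele model and independent loci.

# Familial relative risk attributable to a single locus.
# p: risk allele frequency (0-1); r: per-allele odds ratio.
locus_lambda <- function(p, r) {
  q <- 1 - p
  (p * r^2 + q) / (p * r + q)^2
}

# Fraction of the overall familial relative risk (lambda0) explained by all loci.
fraction_explained <- function(p, r, lambda0) {
  sum(log(locus_lambda(p, r))) / log(lambda0)
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 2 && file.exists(args[1])) {
  # Assumption: three whitespace-delimited columns, no header row.
  snps <- read.table(args[1], header = FALSE,
                     col.names = c("snp", "raf", "or"),
                     stringsAsFactors = FALSE)
  lambda0 <- as.numeric(args[2])
  stopifnot(all(snps$raf > 0 & snps$raf < 1), all(snps$or > 0), lambda0 > 1)

  # Per-locus lambda and per-locus percent of familial risk explained.
  snps$lambda <- locus_lambda(snps$raf, snps$or)
  snps$pct <- 100 * log(snps$lambda) / log(lambda0)
  print(snps)

  cat(sprintf("Combined percent familial risk explained: %.2f%%\n",
              100 * fraction_explained(snps$raf, snps$or, lambda0)))
}
```

As a quick check with hypothetical values, a single locus with p = 0.3 and r = 1.2, together with lambda 0 = 4, gives lambda k of about 1.0075 and explains roughly 0.5% of the familial risk.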

It can be run from the command line, for example:

Rscript familial_risk_snps.R snp_lst.txt 4

where:
familial_risk_snps.R is the name of the script.
snp_lst.txt is the input file with three columns described above.
4 is the estimate of the familial relative risk of the disease.

1 comment:

Nussbaum, October 29, 2017 at 11:50 AM
I am having difficulty deriving the original formula used by Cox et al., or your arithmetically equivalent version given here, relating lambda due to allele k to overall familial risk. I am wondering how one might derive this formula from a simple pedigree, assuming one affected individual and a first-degree relative.