
Wednesday, March 12, 2014

Estimate Combined Percent Familial Risk Explained by GWAS Loci

Most current genome-wide association studies (GWAS) report the percentage of familial risk explained by the discovered loci. This calculation indicates how much of the familial risk can be accounted for by the known loci and is usually cited as evidence that additional loci remain to be discovered. Looking through references, it can be a bit difficult to find exactly how this calculation is performed. One reference I found that includes a formula is Cox et al. 2007 (PMID:17293864), but I am sure there are plenty of others. While the Cox et al. formula for the familial relative risk due to each locus is arranged differently from the ones below, the two are equivalent; I just find this arrangement less cumbersome to use. The overall calculation compares the cumulative contribution of the known loci (the sum of log lambda k over all loci) to the familial relative risk for a first-degree relative of an affected individual (log lambda 0).
where:
      p is the risk allele frequency for locus k
      r is the per allele odds ratio for locus k.
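
For reference, the calculation described above can be written out as follows. This is a sketch that assumes a multiplicative (log-additive) per-allele model, under which, to my understanding, the Cox et al. expression reduces to this form; the subscripts are added here for clarity.

```latex
\lambda_k = \frac{p_k r_k^2 + (1 - p_k)}{\bigl(p_k r_k + (1 - p_k)\bigr)^2},
\qquad
\text{percent familial risk explained}
  = 100 \times \frac{\sum_k \log \lambda_k}{\log \lambda_0}
```

For example, a single locus with p = 0.3 and r = 1.2 gives lambda k = 1.132 / 1.1236, or about 1.0075, which with lambda 0 = 4 accounts for roughly 0.5% of the familial risk.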

To make calculations easy, I made a simple R script that does all the calculations automatically.  The input for the script is a file with 3 columns:

(1) Annotation for the SNP - this can be anything, for example: RS number, chromosomal coordinates, etc.
(2) Risk allele frequency - this is the frequency of the risk allele (range: 0-1) equal to p in the above equation.
(3) Per allele odds ratio - odds ratio for every one unit increase in the number of risk alleles.

Note that the risk allele frequency is the frequency of the risk-increasing allele, which is not necessarily the minor allele frequency. The program also needs an estimate of the familial relative risk (lambda 0). This can usually be found in previous family studies of the disease.

Here is the R script:
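
A minimal sketch of such a script is given below. It is an illustration rather than the original familial_risk_snps.R, and it assumes a whitespace-delimited input file with no header row and a multiplicative per-allele model; the function names and column labels are hypothetical.

```r
# Sketch only: not the original familial_risk_snps.R.
# Assumes a whitespace-delimited input file with no header row and three
# columns: SNP annotation, risk allele frequency, per-allele odds ratio.

# Familial relative risk attributable to one locus under a multiplicative
# per-allele model: lambda_k = (p*r^2 + q) / (p*r + q)^2, with q = 1 - p.
locus_lambda <- function(p, r) {
  q <- 1 - p
  (p * r^2 + q) / (p * r + q)^2
}

# Percent of the familial relative risk (lambda0) explained by all loci.
percent_explained <- function(p, r, lambda0) {
  stopifnot(all(p >= 0 & p <= 1), all(r > 0), lambda0 > 1)
  100 * sum(log(locus_lambda(p, r))) / log(lambda0)
}

# Only run the command-line part when exactly two arguments are supplied:
# the input file and the familial relative risk estimate.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 2) {
  snps <- read.table(args[1], header = FALSE, stringsAsFactors = FALSE,
                     col.names = c("snp", "raf", "or"))
  lambda0 <- as.numeric(args[2])
  snps$lambda <- locus_lambda(snps$raf, snps$or)
  print(snps)
  cat(sprintf("Percent familial risk explained: %.2f%%\n",
              percent_explained(snps$raf, snps$or, lambda0)))
}
```

The sketch uses the per-allele odds ratio directly in place of the relative risk, which is a reasonable approximation mainly when the disease is uncommon.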

The script can be run from the command line with a command such as:

Rscript familial_risk_snps.R snp_lst.txt 4

where:
      familial_risk_snps.R is the name of the script.
      snp_lst.txt is the input file with three columns described above.
      4 is the estimate of the familial relative risk of the disease.

1 comment:

  1. I am having difficulty deriving the original formula used by Cox et al., or your arithmetically equivalent version given here, relating the lambda due to allele k to the overall familial risk. I am wondering how one might derive this formula from a simple pedigree, assuming one affected individual and a first-degree relative.
