Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Showing posts with label study. Show all posts

Friday, May 16, 2014

Is there a GWAS on that?


A great online resource for finding out whether a genome-wide association study (GWAS) has been published on a certain trait or disease is the National Human Genome Research Institute (NHGRI) webpage, which maintains a curated catalog of published GWAS. Trained curators regularly scan PubMed publications and other genomic resources for association studies linking a genomic position (usually a tagging SNP) to a disease or trait of interest. Details listed include study size, population, locus, risk allele, odds ratio, p-value, and other pertinent statistics. Recently, the NHGRI, in collaboration with the European Bioinformatics Institute (EBI), released an interactive version of the GWAS catalog called the GWAS Diagram Browser. This provides a great way to visualize and filter many of the genome-wide significant findings from genome-wide association studies. Highlights include filtering by disease, time series views, and some useful downloads.

Another noteworthy resource is the HuGE Navigator GWAS Integrator. This is a search tool similar to the NHGRI GWAS catalog, but more focused on searching by term. Handy links are provided to other resources. Of particular interest are the links for visualizing the variants in the UCSC Browser.

I am sure other GWAS resources exist as well, but these were the two that first came to mind. If you know of other great GWAS resources capable of linking a genomic marker with a disease, please share in the comments below.

Wednesday, March 12, 2014

Estimate Combined Percent Familial Risk Explained by GWAS Loci

Most current genome-wide association studies (GWAS) include a calculation of the percent familial risk explained by the discovered loci. This calculation indicates how much of the familial risk can be accounted for by the known loci and is usually used as evidence that additional, as yet undetected loci remain to be discovered. Looking through references, it can be a bit difficult to find exactly how this calculation is performed. One reference I found that includes a formula for the calculation is Cox et al. 2007 (PMID:17293864), but I am sure there are plenty of others that also include a formula. While the Cox et al. formula for calculating familial relative risk due to each locus is arranged differently than the ones below, the two are equivalent. I just find this arrangement less cumbersome to use. The overall calculation divides the cumulative contribution of the known loci (the sum of log lambda k over all loci k) by the estimated familial relative risk to a first degree relative on the same scale (log lambda 0).
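Written out under a multiplicative (log-additive) per-allele model, which is an assumption on my part about the arrangement intended here, the per-locus term and the overall percentage are commonly expressed as follows. The subscript k on p and r is added only to make the sum over loci explicit.

```latex
\[
\lambda_k = \frac{p_k r_k^2 + (1 - p_k)}{\left(p_k r_k + (1 - p_k)\right)^2},
\qquad
\text{percent familial risk explained}
  = 100 \times \frac{\sum_k \log \lambda_k}{\log \lambda_0}
\]
```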
where:
      p is the risk allele frequency for locus k
      r is the per allele odds ratio for locus k.

To make calculations easy, I made a simple R script that does all the calculations automatically.  The input for the script is a file with 3 columns:

(1) Annotation for the SNP - this can be anything, for example: RS number, chromosomal coordinates, etc.
(2) Risk allele frequency - this is the frequency of the risk allele (range: 0-1) equal to p in the above equation.
(3) Per allele odds ratio - odds ratio for every one unit increase in the number of risk alleles.

Note that column 2 is the frequency of the risk allele, which is not necessarily the minor allele frequency. The program also needs an estimate of the familial relative risk (lambda 0). This can usually be obtained from previous familial studies of the disease.
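To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above. It is not the R script from this post. It assumes a multiplicative per-allele model for the per-locus familial relative risk, a whitespace-delimited input file with exactly three columns and no header row, and the function names (`locus_lambda`, `percent_familial_risk`, `read_loci`) are made up for this illustration.

```python
import math
import sys


def locus_lambda(p, r):
    """Familial relative risk attributable to one locus.

    Assumption: multiplicative per-allele model, where p is the risk
    allele frequency and r is the per-allele odds ratio.
    """
    q = 1.0 - p
    return (p * r ** 2 + q) / (p * r + q) ** 2


def percent_familial_risk(loci, lambda_0):
    """Percent of the familial relative risk lambda_0 explained by the loci.

    loci is an iterable of (p, r) pairs.
    """
    total = sum(math.log(locus_lambda(p, r)) for p, r in loci)
    return 100.0 * total / math.log(lambda_0)


def read_loci(path):
    """Read (p, r) pairs from a whitespace-delimited, three-column file."""
    loci = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            # Column 1 is the SNP annotation and is not used in the math.
            loci.append((float(fields[1]), float(fields[2])))
    return loci


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python familial_risk_sketch.py snp_lst.txt 4
    loci = read_loci(sys.argv[1])
    pct = percent_familial_risk(loci, float(sys.argv[2]))
    print(f"{len(loci)} loci explain {pct:.2f}% of the familial risk")
```

As a sanity check, a single locus with a risk allele frequency of 0.3 and a per-allele odds ratio of 1.2, for a disease with a familial relative risk of 4, accounts for roughly 0.5% of the familial risk under these assumptions.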

Here is the R script:

It can be run from the command line by the example command:

Rscript familial_risk_snps.R snp_lst.txt 4

where:
      familial_risk_snps.R is the name of the script.
      snp_lst.txt is the input file with three columns described above.
      4 is the estimate of the familial relative risk of the disease.