Tuesday, January 28, 2014

Fisher Exact Test on 2 by N Table in R

Fisher exact tests are non-parametric tests of association that are usually recommended when expected cell counts fall below 5, since the large-sample approximations behind tests such as the chi-square test typically do not hold in these scenarios. Here are quick code snippets showing how to set up a 2 by N table in R and conduct a Fisher exact test (fisher.test) on the table.

First off, the simplest scenario is a 2 by 2 table. The first example shows how to manually set up a 2x2 table and then run the Fisher exact test. The code generates a p-value and a 95% confidence interval for the odds ratio, testing whether the observed odds ratio differs from the null odds ratio of 1.
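A minimal sketch of the manual 2x2 setup. The counts and the dimension names (exposure, outcome) are made up purely for illustration:

```r
# Hypothetical counts, chosen only for illustration
tab <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("exposed", "unexposed"),
                              outcome  = c("case", "control")))

# Two-sided Fisher exact test of the null hypothesis that the odds ratio is 1
res <- fisher.test(tab)

res$p.value    # two-sided p-value
res$estimate   # conditional maximum likelihood estimate of the odds ratio
res$conf.int   # 95% confidence interval for the odds ratio
```

Note that the odds ratio reported by fisher.test is the conditional maximum likelihood estimate, so it will generally differ slightly from the simple cross-product ratio of the table.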

Alternatively, you can use R's table command to build the contingency table directly from raw data, skipping the manual step above, and then pass its output to fisher.test. Below is an example.
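A short sketch of the table approach, assuming one row of data per subject. The vectors genotype and status and their values are hypothetical:

```r
# Hypothetical subject-level data: one entry per subject
genotype <- c("carrier", "carrier", "carrier", "noncarrier",
              "carrier", "noncarrier", "noncarrier", "noncarrier")
status   <- c("case", "case", "case", "case",
              "control", "control", "control", "control")

# table() cross-tabulates the two vectors into a 2x2 contingency table
tab <- table(genotype, status)
tab

# The table object can be passed straight to fisher.test
res <- fisher.test(tab)
res$p.value
```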


In addition, the fisher.test procedure can be extended to 2xN tables by using a hybrid approximation of the exact distribution. The command follows that used above and adds the option hybrid=TRUE to tell R to use the hybrid approximation instead of the full exact computation. Without this option, tables with large counts can exhaust the default workspace and produce the error:

Error in fisher.test(a) : FEXACT error 7.
LDSTP is too small for this problem.
Try increasing the size of the workspace.

 See the example below.
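A minimal sketch on a made-up 2x5 table (the counts and group labels are illustrative only). Since the error message itself suggests a larger workspace, the sketch also shows the workspace argument of fisher.test as an alternative to try; the value 2e6 is an arbitrary choice, not a recommendation:

```r
# Hypothetical 2x5 table of counts
tab <- matrix(c(12, 7, 5, 9,  3,
                 4, 8, 6, 2, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(status = c("case", "control"),
                              group  = paste0("g", 1:5)))

# Hybrid approximation of the exact distribution
res_hybrid <- fisher.test(tab, hybrid = TRUE)
res_hybrid$p.value

# Alternative: keep the exact computation but give it a larger workspace
res_exact <- fisher.test(tab, workspace = 2e6)
res_exact$p.value
```

For tables larger than 2x2, fisher.test returns only a p-value; no odds ratio or confidence interval is reported.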


**Note: This hybrid option does not seem to work on all versions of R. If you still get the above error after using the hybrid option on a 2 x N contingency table, try another version. For example, on my PC R 3.0.1 (x64) does not work, but R 2.15.1 (i386) does. This may have something to do with the default memory settings for each version. If you know of other versions that do or do not work, please comment below.

A workaround when the hybrid method fails on a 2xN contingency table is to use the built-in simulation abilities of the fisher.test function to get an estimated p-value. This uses Monte Carlo simulation and, as with all simulations, the estimate gets closer to the exact p-value as the number of iterations increases. To use the simulated p-value approach, set the option simulate.p.value=TRUE and set B to the number of simulations you wish to conduct. I recommend 1e7 as a good starting point, which should take about a minute or so depending on the speed of your machine. Below is example code to follow.
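A minimal sketch of the simulated p-value approach, again on a made-up 2x5 table. B is set to 1e5 here only so the example runs quickly; raise it (for instance to 1e7) for a more precise estimate. The seed value is arbitrary and only makes the result reproducible:

```r
# Hypothetical 2x5 table of counts
tab <- matrix(c(12, 7, 5, 9,  3,
                 4, 8, 6, 2, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(status = c("case", "control"),
                              group  = paste0("g", 1:5)))

set.seed(1)  # arbitrary seed, for reproducibility only

# Monte Carlo estimate of the p-value based on B simulated tables
res_sim <- fisher.test(tab, simulate.p.value = TRUE, B = 1e5)
res_sim$p.value
```

Because the p-value is estimated from random tables, repeated runs with different seeds will give slightly different values; the spread shrinks as B grows.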
