Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Tuesday, April 1, 2014

Create Empty Data Frame in R with Specified Dimensions

Sometimes it is necessary to create an empty data frame in R to fill with output. An example would be the output of a for loop that loops over SNPs, calculates an association p-value for each one, and stores it in the empty data frame. The matrix function in R makes this easy: matrix(NA, nrow = 100, ncol = 2) creates a 100 by 2 matrix, and wrapping it in data.frame() converts it into a data frame with 100 rows and 2 column variables. The nrow and ncol arguments set the size of the resulting data frame. Every cell starts as NA, so any row the loop never fills stays explicitly marked as missing rather than holding a misleading placeholder value such as 0.
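A minimal sketch of the approach described above, assuming the empty data frame is built with data.frame(matrix(NA, ...)). The column names snp and p_value, the simulated genotypes and phenotype, and the use of lm() as the association test are illustrative assumptions, not the original analysis.

```r
# Hypothetical number of SNPs (matches the 100-row example in the text)
n_snps <- 100

# Empty data frame: a 100 x 2 matrix of NA converted to a data frame
results <- data.frame(matrix(NA, nrow = n_snps, ncol = 2))
names(results) <- c("snp", "p_value")

# Simulated stand-ins for real data (illustrative assumption):
# 0/1/2 allele counts for each SNP and a continuous phenotype
set.seed(1)
n_people <- 200
phenotype <- rnorm(n_people)
genotypes <- matrix(rbinom(n_people * n_snps, size = 2, prob = 0.3),
                    nrow = n_people, ncol = n_snps)

# Fill one row per SNP with its name and the p-value of the genotype
# term from a simple linear regression
for (i in seq_len(n_snps)) {
  fit <- lm(phenotype ~ genotypes[, i])
  results$snp[i] <- paste0("snp", i)
  results$p_value[i] <- summary(fit)$coefficients[2, 4]
}

head(results)
```

Any SNP the loop skips would simply keep NA in both columns, which is exactly why starting from NA is useful.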

Tuesday, January 14, 2014

Remove Rows with NA Values From R Data Frame

Rows with NA values can be a pesky nuisance when trying to analyze data in R. Here is a short primer on how to remove them.

There are two primary options for getting rid of NA values in R: the na.omit/is.na commands and the complete.cases command. None of them requires loading an additional library or package: na.omit and complete.cases belong to the stats package and is.na to base, and both packages are attached by default. Below are examples of how the two work with a data frame called data and a vector called var.


The na.omit/is.na commands work as follows:
na.omit(data) - will only select rows with complete data in all columns
data[rowSums(is.na(data[,c(2,3,5)]))==0,] - will only select rows with complete data in columns 2, 3, and 5
var[!is.na(var)] - will only select the values of a variable that are not NA
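To make the difference between the "all columns" and "selected columns" rules concrete, here is a small sketch on a made-up data frame. The column names id, w, x, y, z and all values are hypothetical; columns 2, 3 and 5 correspond to w, x and z.

```r
# Toy data frame (made-up values): column 1 is an ID, columns 2-5 hold NAs
data <- data.frame(
  id = 1:5,
  w  = c(1, NA, 3, 4, 5),
  x  = c(NA, 2, 3, 4, 5),
  y  = c(1, 2, 3, NA, 5),
  z  = c(1, 2, NA, 4, 5)
)
var <- data$w

# Rows complete in every column: only row 5 survives
all_complete <- na.omit(data)

# Rows complete in columns 2, 3 and 5 (w, x, z): rows 4 and 5 survive
cols_complete <- data[rowSums(is.na(data[, c(2, 3, 5)])) == 0, ]

# Non-missing values of a single vector: 1, 3, 4, 5
var_complete <- var[!is.na(var)]

print(all_complete)
print(cols_complete)
print(var_complete)
```

Row 4 is dropped by na.omit(data) but kept by the column-restricted version, because its only NA sits in column 4, which is outside columns 2, 3 and 5.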


The complete.cases command works as follows:
data[complete.cases(data),] - will only select rows with complete data in all columns
data[complete.cases(data[,c(2,3,5)]),] - will only select rows with complete data in columns 2, 3, and 5
var[complete.cases(var)] - will only select the values of a variable that are not NA
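Here is a sketch running the same made-up data frame (hypothetical column names and values) through complete.cases, checking that it keeps the same rows as na.omit. The check compares row names rather than whole objects because na.omit() also attaches an na.action attribute to its result, so the two data frames are not identical() as objects even when they contain the same rows.

```r
# Same toy data frame as before (made-up values)
data <- data.frame(
  id = 1:5,
  w  = c(1, NA, 3, 4, 5),
  x  = c(NA, 2, 3, 4, 5),
  y  = c(1, 2, 3, NA, 5),
  z  = c(1, 2, NA, 4, 5)
)
var <- data$w

# complete.cases() returns one TRUE/FALSE per row (or per vector element)
all_complete  <- data[complete.cases(data), ]
cols_complete <- data[complete.cases(data[, c(2, 3, 5)]), ]
var_complete  <- var[complete.cases(var)]

# Same rows as na.omit(); compare row names, since na.omit() adds an
# "na.action" attribute that the indexed result does not carry
same_rows <- identical(rownames(all_complete), rownames(na.omit(data)))
print(same_rows)
```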


I use both commands at times, but ultimately prefer the complete.cases command for its cleaner syntax and because the same call works on a whole data frame, a subset of columns, or a single vector. Hope this helps you remove those NAs from your data. If you have additional tips or questions please leave a comment below.