Sometimes it is necessary to create an empty data frame in R to fill with output. An example would be the output of a for loop that iterates over SNPs and calculates an association p-value for each one. R's matrix functionality makes this easy: wrapping matrix(NA, nrow = 100, ncol = 2) in as.data.frame() creates a 100 by 2 matrix and converts it into a data frame with 100 rows and 2 column variables. The nrow and ncol arguments set the size of the resulting data frame. Because the data frame starts out filled with NA values, any row the loop never fills in stays clearly marked as missing instead of holding a misleading placeholder such as 0.
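A minimal, self-contained sketch of the whole workflow follows. It assumes a quantitative phenotype tested against each SNP with a simple linear regression. The column names snp and p_value, the simulated genotypes, and the use of lm() are illustrative assumptions, not part of the original post:

```r
# Create a 100 x 2 matrix filled with NA and convert it to a data frame
results <- as.data.frame(matrix(NA, nrow = 100, ncol = 2))
# Hypothetical column names, chosen for illustration
colnames(results) <- c("snp", "p_value")

# Simulated toy data: 200 individuals, one quantitative phenotype,
# and 100 SNPs coded as 0/1/2 allele counts
set.seed(1)
pheno <- rnorm(200)
geno <- matrix(rbinom(200 * 100, size = 2, prob = 0.3), nrow = 200)

# Loop over SNPs and fill one pre-allocated row per SNP
for (i in 1:100) {
  fit <- lm(pheno ~ geno[, i])
  results[i, "snp"] <- paste0("snp", i)
  # Row 2, column 4 of the coefficient table is the SNP's p-value
  results[i, "p_value"] <- summary(fit)$coefficients[2, 4]
}

head(results)
```

Filling a pre-allocated data frame by index avoids growing it with rbind() inside the loop, which copies the whole object on every iteration.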
A repository of programs, scripts, and tips essential to genetic epidemiology, statistical genetics, and bioinformatics
Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Tuesday, April 1, 2014
Tuesday, January 14, 2014
Remove Rows with NA Values From R Data Frame
Rows with NA values can be a pesky nuisance when trying to analyze data in R. Here is a short primer on how to remove them.
There are two primary options for getting rid of NA values in R: the na.omit/is.na commands and the complete.cases command. All of them ship with base R (na.omit and complete.cases are in the stats package, is.na is in base), so no additional library or package needs to be loaded. Below are examples of how the two approaches work with a data frame called data and a variable called var.
The na.omit/is.na commands work as follows:
na.omit(data) - will only select rows with complete data in all columns
data[rowSums(is.na(data[,c(2,3,5)]))==0,] - will only select rows with complete data in columns 2, 3, and 5
var[!is.na(var)] - will only select the non-missing values of a variable
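To see the difference between the first two commands, here is a sketch on a small made-up data frame (the values and the column names id, a, b, c, d are illustrative assumptions). Row 4 is missing a value only in column 4, so it is dropped by na.omit(data) but kept when only columns 2, 3, and 5 are checked:

```r
# Toy data frame with NAs scattered across five columns (illustrative values only)
data <- data.frame(
  id = 1:5,
  a  = c(1, NA, 3, 4, 5),
  b  = c(NA, 2, 3, 4, 5),
  c  = c(1, 2, 3, NA, 5),
  d  = c(1, 2, NA, 4, 5)
)
var <- c(2.5, NA, 7.1, NA, 4.0)

# Complete in every column: keeps row 5 only
all_complete <- na.omit(data)

# Complete in columns 2, 3 and 5 (a, b, d): keeps rows 4 and 5
cols_complete <- data[rowSums(is.na(data[, c(2, 3, 5)])) == 0, ]

# Non-missing values of a vector: 2.5 7.1 4.0
var_clean <- var[!is.na(var)]
```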
The complete.cases command works as follows:
data[complete.cases(data),] - will only select rows with complete data in all columns
data[complete.cases(data[,c(2,3,5)]),] - will only select rows with complete data in columns 2, 3, and 5
var[complete.cases(var)] - will only select the non-missing values of a variable
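The complete.cases versions select the same rows as their na.omit/is.na counterparts. A sketch on a small made-up data frame (values and column names are illustrative assumptions) confirms this:

```r
# Toy data frame with NAs scattered across five columns (illustrative values only)
data <- data.frame(
  id = 1:5,
  a  = c(1, NA, 3, 4, 5),
  b  = c(NA, 2, 3, 4, 5),
  c  = c(1, 2, 3, NA, 5),
  d  = c(1, 2, NA, 4, 5)
)
var <- c(2.5, NA, 7.1, NA, 4.0)

keep_all  <- data[complete.cases(data), ]                # row 5 only
keep_cols <- data[complete.cases(data[, c(2, 3, 5)]), ]  # rows 4 and 5
var_clean <- var[complete.cases(var)]                    # 2.5 7.1 4.0

# Same rows as na.omit(data) and the rowSums(is.na(...)) version
identical(rownames(keep_all), rownames(na.omit(data)))
identical(rownames(keep_cols),
          rownames(data[rowSums(is.na(data[, c(2, 3, 5)])) == 0, ]))
```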
I use both commands at times, but ultimately prefer complete.cases for its cleaner syntax and generalizability. Hope this helps you remove those NAs from your data. If you have additional tips or questions, please leave a comment below.
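One way to read the generalizability point (an interpretation, not something the post spells out): complete.cases() returns a plain logical vector, so the same index can be counted, reused, or computed across several aligned objects at once. A sketch on a small made-up data frame, with illustrative values and column names:

```r
# Toy data frame with NAs scattered across five columns (illustrative values only)
data <- data.frame(
  id = 1:5,
  a  = c(1, NA, 3, 4, 5),
  b  = c(NA, 2, 3, 4, 5),
  c  = c(1, 2, 3, NA, 5),
  d  = c(1, 2, NA, 4, 5)
)
var <- c(2.5, NA, 7.1, NA, 4.0)

# Logical index: TRUE where columns 2, 3 and 5 are all non-missing
keep <- complete.cases(data[, c(2, 3, 5)])
n_dropped <- sum(!keep)   # number of rows that will be removed (3 here)
data_kept <- data[keep, ]

# complete.cases() also accepts several vectors/data frames of equal length;
# a row counts as complete only if it is non-missing in all of them
keep_joint <- complete.cases(data$a, var)  # TRUE FALSE TRUE FALSE TRUE
```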