Rows with NA values can be a pesky nuisance when trying to analyze data in R. Here is a short primer on how to remove them.
There are two primary options for getting rid of NA values in R: the na.omit/is.na commands and the complete.cases command. All three functions ship with a standard R installation (is.na is in the base package; na.omit and complete.cases are in the stats package, which is attached by default), so no additional library or package needs to be loaded. Below are examples of how the two approaches work with a data frame called data and a variable called var.
The na.omit/is.na commands work as follows:
na.omit(data) - will only select rows with complete data in all columns
data[rowSums(is.na(data[,c(2,3,5)]))==0,] - will only select rows with complete data in columns 2, 3, and 5
var[!is.na(var)] - will only select values of a variable not equal to NA
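To make the three na.omit/is.na forms above concrete, here is a minimal sketch on a toy data frame. The column names and values are made up for illustration only; any data frame with at least five columns would work the same way.

```r
# Hypothetical toy data frame; names and values are illustrative only.
data <- data.frame(
  id     = 1:5,
  height = c(1.2, NA, 3.4, 4.5, 5.6),
  group  = c("x", "y", NA, "z", "w"),
  weight = c(NA, 10, 20, 30, 40),
  flag   = c(TRUE, FALSE, TRUE, NA, TRUE)
)
var <- data$height

# Keep rows that are complete in every column
all_complete <- na.omit(data)

# Keep rows that are complete in columns 2, 3, and 5
# (column 4 is ignored, so it may still contain NA)
some_complete <- data[rowSums(is.na(data[, c(2, 3, 5)])) == 0, ]

# Keep the non-missing values of a single vector
var_clean <- var[!is.na(var)]

print(all_complete)
print(some_complete)
print(var_clean)
```

One detail worth knowing: na.omit attaches an "na.action" attribute to its result that records which rows were dropped, whereas the is.na subsetting forms return a plain subset.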
The complete.cases command works as follows:
data[complete.cases(data),] - will only select rows with complete data in all columns
data[complete.cases(data[,c(2,3,5)]),] - will only select rows with complete data in columns 2, 3, and 5
var[complete.cases(var)] - will only select values of a variable not equal to NA
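The complete.cases forms above can be sketched the same way. The toy data frame below is hypothetical (made-up column names and values); the point is that complete.cases returns a logical vector with one entry per row, which is then used as a row index.

```r
# Hypothetical toy data frame; names and values are illustrative only.
data <- data.frame(
  id     = 1:5,
  height = c(1.2, NA, 3.4, 4.5, 5.6),
  group  = c("x", "y", NA, "z", "w"),
  weight = c(NA, 10, 20, 30, 40),
  flag   = c(TRUE, FALSE, TRUE, NA, TRUE)
)
var <- data$height

# Logical vector: TRUE where a row has no NA in any column
keep_all <- complete.cases(data)
all_complete <- data[keep_all, ]

# Same idea, but only columns 2, 3, and 5 are checked
keep_some <- complete.cases(data[, c(2, 3, 5)])
some_complete <- data[keep_some, ]

# complete.cases also accepts a plain vector
var_clean <- var[complete.cases(var)]

print(keep_all)
print(all_complete)
print(some_complete)
print(var_clean)
```

Because the logical vector is computed separately from the subsetting step, it can be stored and reused, for example to count how many rows would be dropped with sum(!keep_all) before actually removing them.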
I use both commands at times, but ultimately prefer the complete.cases command for its cleaner syntax and generalizability: the same call works on a whole data frame, a subset of columns, or a single vector. Hope this helps you remove those NAs from your data. If you have additional tips or questions, please leave a comment below.