Wednesday, January 15, 2014

Split a Variable in R into Components

I had a TCGA barcode and wanted to extract the sample type from it (i.e., cut out characters 14-15).  This would have been easy to do in Python (e.g., id[13:15] or id.split("-")[3][0:2]), but I wanted to be able to do this inside R.  To do this I found a handy little base function called substr, which returns the part of a string between a start and a stop position.  Unlike Python slices, the positions are 1-based and the stop position is inclusive, so Python's id[13:15] corresponds to start=14, stop=15 in R.  Here is the code to extract characters 14-15 from a string:

substr(x=id, start=14, stop=15)

or from a column called ids in a data frame called data (substr is vectorized, so this returns one result per row):

substr(x=data$ids, start=14, stop=15)
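
For a quick check, here is a minimal sketch that runs both approaches side by side. The two barcodes are made-up examples that only follow the TCGA layout, and the variable names (ids, sample_type, fields, sample_type_split) are mine. The second approach mirrors the Python split("-") version using base R's strsplit.

```r
# Illustrative barcodes (made up for this example, following the TCGA layout)
ids <- c("TCGA-02-0001-01C-01D-0182-01", "TCGA-A1-A0SB-11A-01R-A144-07")

# Fixed-position approach: characters 14-15 hold the sample type code
sample_type <- substr(ids, start = 14, stop = 15)

# Delimiter-based approach: take the 4th dash-separated field,
# then keep its first two characters
fields <- strsplit(ids, "-", fixed = TRUE)
sample_type_split <- substr(sapply(fields, `[`, 4), 1, 2)

print(sample_type)        # "01" "11"
print(sample_type_split)  # "01" "11"
```

The fixed-position version is shorter, but it relies on every barcode having the same field widths. The strsplit version should be more forgiving if a field length ever differs.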

