I had a TCGA barcode and wanted to extract its sample type code, which is stored in characters 14-15. This would have been easy in Python (e.g., id[13:15] or id.split("-")[3][0:2]), but I wanted to do it in R. For this I found a handy base R function called substr, which returns the part of a string between a start and a stop position. Unlike Python slices, the positions are 1-based and the stop position is included. Here is the code to extract characters 14-15 from a string:
substr(x=id, start=14, stop=15)
or, since substr is vectorized, from a whole column called ids in a data frame called data:
substr(x=data$ids, start=14, stop=15)
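To see both approaches side by side, here is a small sketch. The two barcodes are illustrative values I typed in following the usual TCGA barcode layout, not real data from a project, and the variable names by_position and by_field are just ones picked for this example. The second approach is my R rendering of the Python split version, using base R strsplit and vapply.

```r
# Illustrative barcodes (assumed values that follow the TCGA barcode layout)
ids <- c("TCGA-02-0001-01C-01D-0182-01",
         "TCGA-02-0001-11A-01D-0182-01")

# Position-based: take characters 14-15 of every barcode (1-based, stop included)
by_position <- substr(x = ids, start = 14, stop = 15)

# Field-based: split on "-" and keep the first two characters of the fourth field,
# mirroring id.split("-")[3][0:2] in Python
fields <- strsplit(ids, split = "-", fixed = TRUE)
by_field <- vapply(fields, function(f) substr(f[4], 1, 2), character(1))

print(by_position)  # "01" "11"
print(by_field)     # "01" "11"
```

The position-based version is shorter, but it relies on every barcode having the same fixed-width layout. The field-based version should be a bit more forgiving if the leading fields ever vary in length.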