For loops in R are handy bits of code for keeping track of a counter, iterating through a variable, or performing a complex operation on a subset of variables. Below is an example R script showing the code and syntax needed to do some simple tasks with R for loops.
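The R script for this post isn't reproduced here. As a rough illustration of the same three tasks, here is a short Python sketch. The function name `loop_examples` and the toy inputs are hypothetical, and in R each loop would be written as `for (i in seq_along(x)) { ... }`.

```python
def loop_examples(values, groups):
    """Three common for-loop tasks: a counter, a running total, and per-subset means."""
    # 1. Keep track of a counter while looping
    count = 0
    for _ in values:
        count += 1

    # 2. Iterate through a variable, building a running total as we go
    running = []
    total = 0
    for v in values:
        total += v
        running.append(total)

    # 3. Do an operation on each subset of values defined by a grouping variable
    group_means = {}
    for g in sorted(set(groups)):
        subset = [v for v, grp in zip(values, groups) if grp == g]
        group_means[g] = sum(subset) / len(subset)

    return count, running, group_means


print(loop_examples([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

For the toy input above, the call returns a count of 4, the running totals [1, 3, 6, 10], and the group means {"a": 1.5, "b": 3.5}.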
A repository of programs, scripts, and tips essential to genetic epidemiology, statistical genetics, and bioinformatics
Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Showing posts with label code.
Tuesday, November 4, 2014
Friday, October 31, 2014
Import Variable into R Script
I always forget the syntax for passing a variable into an R script. It is relatively easy to import an external variable into R (such as a Unix shell variable or a user-defined value). Being able to feed R a variable is incredibly useful on a cluster system, where you want to split a large job into smaller pieces. Here is example code for doing so:
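The R code isn't shown here. In R, the usual pattern is to run `Rscript my_script.R 3` and read the value with `args <- commandArgs(trailingOnly = TRUE)`, then `as.numeric(args[1])`. The sketch below shows the same idea in Python for splitting a job into pieces. The script name `run_chunk.py`, the function `chunk_bounds`, and the 1,000 items split into 10 chunks are all hypothetical.

```python
import sys


def chunk_bounds(n_items, n_chunks, chunk_index):
    """Return the [start, end) range of items handled by one 1-based chunk."""
    if not 1 <= chunk_index <= n_chunks:
        raise ValueError("chunk_index must be between 1 and n_chunks")
    size, extra = divmod(n_items, n_chunks)
    # The first `extra` chunks each take one additional item
    start = (chunk_index - 1) * size + min(chunk_index - 1, extra)
    end = start + size + (1 if chunk_index <= extra else 0)
    return start, end


def main(argv):
    # Run as, e.g.:  python run_chunk.py $CHUNK   (CHUNK is a shell variable)
    if len(argv) != 2:
        print("usage: python run_chunk.py CHUNK_NUMBER")
        return
    chunk_index = int(argv[1])
    start, end = chunk_bounds(n_items=1000, n_chunks=10, chunk_index=chunk_index)
    print(f"chunk {chunk_index}: items {start} to {end - 1}")


if __name__ == "__main__":
    main(sys.argv)
```

Each cluster job receives a different chunk number on the command line, so the same script processes a different slice of the work in each job.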
Monday, July 7, 2014
Creating and Accessing SQL Databases with Python
Python has numerous ways of outputting and storing data. Recently, I investigated using shelves (the shelve module) in Python. Shelves store indexed data using command syntax similar to that of Python dictionaries, but I found them too time consuming to create for large datasets. Searching for a way to efficiently build databases in Python, I came across SQLite. The sqlite3 module is an indispensable way to create databases in Python: it lets Python users create and query large databases using SQL syntax. SQL stands for Structured Query Language and is used for managing data held in a relational database management system. SQLite uses a nonstandard variant of the SQL query language, and the sqlite3 module provides an interface compliant with the DB-API 2.0 specification (PEP 249). As a quick reference, I thought I would create an example script that could be used to build a SQL database using the Python programming language. Below is a simple tutorial to follow that hopefully is useful for learning how to use the sqlite3 package.
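The tutorial script itself isn't reproduced here. As a hedged sketch of the basic sqlite3 workflow (connect, create a table, bulk-insert with `executemany`, add an index, and query with `?` placeholders), the following uses an in-memory database. The table layout, the function name `build_and_query`, and the toy rows are made up for illustration.

```python
import sqlite3


def build_and_query(rows):
    """Load (rsid, chrom, pos) rows into SQLite and return SNPs on chr1 between 1,000 and 5,000."""
    # ":memory:" keeps the database in RAM; pass a filename such as "snps.db" to save it to disk
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE snps (rsid TEXT PRIMARY KEY, chrom TEXT, pos INTEGER)")
    # executemany with ? placeholders inserts many rows efficiently and avoids string formatting
    cur.executemany("INSERT INTO snps VALUES (?, ?, ?)", rows)
    # An index on the columns used for lookups speeds up queries on large tables
    cur.execute("CREATE INDEX idx_chrom_pos ON snps (chrom, pos)")
    conn.commit()
    cur.execute(
        "SELECT rsid, pos FROM snps WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        ("1", 1000, 5000),
    )
    result = cur.fetchall()
    conn.close()
    return result


toy_rows = [("rs1", "1", 1500), ("rs2", "1", 4200), ("rs3", "1", 9000), ("rs4", "2", 2000)]
print(build_and_query(toy_rows))
```

With the toy rows, the query returns `[('rs1', 1500), ('rs2', 4200)]`. Building the index after the bulk insert, rather than before, is a common choice when loading large tables.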
Wednesday, March 26, 2014
Convert Word Document Field Codes into Formatted Text
Reference management software such as EndNote and Mendeley is a great time saver when inserting citations into a manuscript typed in Microsoft Word. Sometimes it is necessary to modify or remove the field codes these programs place in a document, for example when you need to edit some of the fields or submit a text-only article to a journal. In these instances, the fields need to be removed and replaced with the appropriately formatted text. How is this done? It's incredibly easy...as long as you know the keyboard shortcut. Here are the two simple steps:
(1) Select the text you want to remove the field codes from. This can be done by highlighting a section of interest or pressing Ctrl + A if you want to replace the field codes in the entire document.
(2) Press Ctrl + Shift + F9. This is the actual step that converts field codes into formatted text.
That's it. You're done! All your MS Word field codes in your .doc file should now be removed and the appropriate formatted text should be inserted in their place. Hope this works for you as easily as it did for me. If you find this post particularly helpful, please help me out by clicking the +1 link on the bottom of the post.
Tuesday, March 25, 2014
Calculate P-value for Linear Mixed Model in R
The lme4 R package is a powerful tool for fitting mixed models in R, where you can specify both fixed and random effects. One oddity of the package is that lmer reports t statistics but no p-values, so getting a p-value takes a little extra coding. Here is a quick example of fitting a linear mixed model in R (using lmer), followed by the added code to calculate p-values from the t statistic. Either p1 or p2 is an acceptable p-value to use.
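The p1/p2 code from this post isn't shown here, so it's not known which formulas it uses. One common approach, sketched below in Python, treats the t statistic as approximately standard normal and computes a two-sided p-value. The function name `p_from_t_normal` is hypothetical. A t-distribution alternative in R is `2 * pt(abs(t), df, lower.tail = FALSE)`, but lmer does not report degrees of freedom, so `df` would have to be chosen separately. The normal approximation is reasonable for large samples and slightly anti-conservative for small ones.

```python
import math


def p_from_t_normal(t_value):
    """Two-sided p-value, treating the t statistic as a standard normal Z."""
    # P(|Z| > |t|) = erfc(|t| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(t_value) / math.sqrt(2))


print(p_from_t_normal(1.96))
```

For a t statistic of 1.96, this returns roughly 0.05, as expected for the familiar 95% cutoff.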
Wednesday, January 29, 2014
Bar Plot with 95% Confidence Interval Error Bars in R
R is a great plotting program for making publication-quality bar plots of your data. I often need to make quick bar plots to help collaborators visualize data, so I put together a very generalized script that can be modified and built on as a template for future bar plots in R.
In this script you manually enter your data (as you would see it in a 2x2 table) and then calculate the estimated frequency and the 95% CI around that frequency using the binom.confint function in the binom package. Next, a parametric and a non-parametric p-value are calculated with the binom.test and fisher.test commands, respectively. These statistics are then plotted using R's barplot function. The example code is below along with what the plot will look like. P-values and N's are filled in automatically, and the y limits are calculated to ensure the graph includes all the plotted details.
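The R script isn't reproduced here. To show the statistics behind the bars, here is a Python sketch under a couple of assumptions. binom.confint offers several interval methods (among them "exact" and "wilson"), and the post doesn't say which one it uses, so this sketch uses the Wilson score interval. The two-sided Fisher's exact test sums the hypergeometric probabilities of all tables no more likely than the observed one, with a small relative tolerance like the one R's fisher.test uses. The function names are hypothetical, and `math.comb` needs Python 3.8 or later.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a 95% interval


def wilson_ci(successes, n, z=Z95):
    """Estimated proportion and Wilson score 95% interval for a binomial count."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return p_hat, center - half, center + half


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, given fixed margins
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed table
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, p)


print(wilson_ci(5, 10))
print(fisher_exact_two_sided(3, 1, 1, 3))
```

For 5 successes out of 10, the Wilson interval is roughly 0.237 to 0.763. For the table [[3, 1], [1, 3]], the two-sided p-value is 34/70, about 0.486. The bar heights would be the `p_hat` values, and the error bars would run from the lower to the upper bound.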
Friday, January 24, 2014
15K and Growing
Today marks the 15,000 page view milestone for Genome Toolbox since its beginnings on May 1, 2013. It's great to see so much interest, and it's an encouragement to keep posting new tips I find useful. Keep coming back for more!
Thursday, January 16, 2014
Remove Outliers from R Variable
R's boxplot function is an easy way to visualize a variable and get a sense of the distribution of its values, as well as any potential outlier data points. By saving the output to a variable (ex: bxplt <- boxplot(data$expression)), you can see a list of outlier points (ex: bxplt$out) that you may wish to exclude from an analysis. To identify extreme values, R calculates 1.5 times the interquartile range (i.e., the third quartile minus the first quartile) and creates limits by subtracting this value from the first quartile and adding it to the third quartile. (Strictly, boxplot uses Tukey's hinges from fivenum(), which can differ slightly from the quartiles returned by quantile().) Any point that is less than the lower limit or greater than the upper limit is considered an outlier by this method. Here is a simple function I created to remove outliers from an R variable. It replaces the outliers identified by the boxplot function with NA and returns the modified variable for analysis. So, for example, if you wanted to find the mean of data$expression with outliers removed, you would first run the function below and then use the command mean(ro(data$expression), na.rm = TRUE); without na.rm = TRUE, mean returns NA. Overall, it's a pretty simple way to remove outliers if you do choose to do so.
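The R function isn't shown here. Below is a Python sketch of the same rule, assuming the goal is to mirror R's boxplot.stats() with coef = 1.5. The `fivenum` helper follows the algorithm of R's fivenum(), values outside the fences become NaN (Python's stand-in for NA), and `nanmean` plays the role of mean(..., na.rm = TRUE). The Python names are this sketch's own, apart from `ro`, which is borrowed from the post.

```python
import math


def fivenum(x):
    """Tukey's five-number summary, following the algorithm of R's fivenum()."""
    xs = sorted(v for v in x if not math.isnan(v))
    n = len(xs)
    n4 = math.floor((n + 3) / 2) / 2
    d = [1, n4, (n + 1) / 2, n + 1 - n4, n]  # 1-based positions, as in R
    return [0.5 * (xs[math.floor(i) - 1] + xs[math.ceil(i) - 1]) for i in d]


def ro(x, coef=1.5):
    """Replace values more than coef * (upper hinge - lower hinge) beyond the hinges with NaN."""
    _, lower_hinge, _, upper_hinge, _ = fivenum(x)
    spread = coef * (upper_hinge - lower_hinge)
    low, high = lower_hinge - spread, upper_hinge + spread
    return [v if low <= v <= high else float("nan") for v in x]


def nanmean(x):
    """Mean of the non-NaN values, like mean(x, na.rm = TRUE) in R."""
    kept = [v for v in x if not math.isnan(v)]
    return sum(kept) / len(kept)


cleaned = ro([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(cleaned, nanmean(cleaned))
```

Here the hinges are 3 and 8, so the fences are -4.5 and 15.5. The value 100 is replaced with NaN, and the mean of the remaining values is 5.0.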
Wednesday, January 15, 2014
Syntax for a User-Defined R Function
R functions are incredibly handy for having R carry out repetitive tasks without copying and pasting lines of code over and over again. Since I don't write new R functions every day, I sometimes forget the syntax for creating these custom functions. Here's an example function I made to tell whether a number is even or odd:
To run a custom R function, simply use the function name with all the needed input variables in parentheses (ex: even_num(413)).
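The R function isn't shown here. For comparison, this is a minimal Python version of an even/odd test with the same name. Because the original's return value isn't stated, this sketch assumes the function returns True for even numbers and False for odd ones. The same test in R is `x %% 2 == 0`.

```python
def even_num(x):
    """Return True if the integer x is even and False if it is odd."""
    # The remainder after dividing by 2 is 0 only for even numbers
    return x % 2 == 0


print(even_num(413))
```

Calling `even_num(413)` returns False, since 413 is odd.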
Add an Overall Title to an Array of R Plots
Adding a top-level title to a group of R plots is relatively easy...as long as you know the correct commands and sequence to put them in. Here is some quick code to place a title on the top of multiple plots in R:
Note: The command to include an overall array title needs to be at the end of the code after the other plots have been generated.
Thursday, December 12, 2013
Sum Overlapping Base Pairs of Features from Chromosomal BED File
I had a .bed file of genomic features on a chromosome and wanted to measure how much the features overlapped, both to investigate commonly covered genes and to find positions where features were likely to form. I wanted to generate a plot similar to a coverage depth plot from next-generation sequencing reads. I am sure more efficient methods exist, but here is some Python code that takes in a .bed file of features (features.bed) and creates an output file (features.depth) with the feature overlap "depth" every 5,000 base pairs across the regions of the chromosome that contain features.
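The Python code from this post isn't shown here. Below is a minimal sketch of one way to compute the same kind of depth track. It assumes a standard BED file for a single chromosome whose 2nd and 3rd columns are 0-based, half-open start and end coordinates. The function names `read_bed`, `overlap_depth`, and `write_depth` are hypothetical. Instead of incrementing a per-base array, the sketch sorts the start and end coordinates and uses binary search (`bisect`), so each sampled position costs O(log n).

```python
import bisect


def read_bed(handle):
    """Read (start, end) pairs from BED lines, skipping blank, comment, and header lines."""
    intervals = []
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append((int(fields[1]), int(fields[2])))
    return intervals


def overlap_depth(intervals, step=5000):
    """Number of features covering every `step`-th position from the first start to the last end."""
    if not intervals:
        return []
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    depths = []
    for pos in range(starts[0], ends[-1], step):
        # A half-open feature [start, end) covers pos when start <= pos < end:
        # features started at or before pos, minus features already ended at or before pos
        depth = bisect.bisect_right(starts, pos) - bisect.bisect_right(ends, pos)
        depths.append((pos, depth))
    return depths


def write_depth(depths, handle):
    """Write tab-separated position and depth lines."""
    for pos, depth in depths:
        handle.write(f"{pos}\t{depth}\n")
```

To match the file names in the post, run `with open("features.bed") as bed: depths = overlap_depth(read_bed(bed))` and then `with open("features.depth", "w") as out: write_depth(depths, out)`. Positions that fall in gaps between features get a depth of 0. Filter those out if you only want positions inside features.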
Thursday, July 18, 2013
What Are SNP Ambiguity Codes and What Do They Mean?
That's a good question, and in fact one that I had myself. Here's what I found:
Apparently, single nucleotide polymorphism (SNP) ambiguity codes are the nucleotide ambiguity codes defined by the International Union of Pure and Applied Chemistry (IUPAC); in SNP data they are used to denote nucleotide changes. Here is a table of the meaning of each code, taken from the Ensembl SNPView website.
IUPAC Code | Mnemonic | Meaning | Complement |
---|---|---|---|
A | Adenine | A | T |
C | Cytosine | C | G |
G | Guanine | G | C |
T/U | Thymine/Uracil | T | A |
K | Keto | G or T | M |
M | Amino | A or C | K |
S | Strong | C or G | S |
W | Weak | A or T | W |
R | Purine | A or G | Y |
Y | Pyrimidine | C or T | R |
B | not A | C, G, or T | V |
D | not C | A, G, or T | H |
H | not G | A, C, or T | D |
V | not T or U | A, C, or G | B |
N | any | G, A, T, or C | N |
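To look up these codes programmatically, the table can be encoded as a dictionary. The Python sketch below uses hypothetical names (`IUPAC`, `ambiguity_code`, `complement_code`). It derives each complement by complementing the individual bases a code stands for, which reproduces the Complement column above, and it treats U as T.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC code
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}
BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ambiguity_code(*alleles):
    """IUPAC code covering the given alleles, e.g. ('A', 'G') -> 'R'."""
    return CODE_FOR[frozenset(a.upper().replace("U", "T") for a in alleles)]


def complement_code(code):
    """Complement of an IUPAC code, found by complementing each base it stands for."""
    bases = IUPAC[code.upper().replace("U", "T")]
    return CODE_FOR[frozenset(BASE_COMPLEMENT[b] for b in bases)]


print(ambiguity_code("A", "G"), complement_code("R"))
```

For example, an A/G SNP is coded R, and its complement is Y, matching the table.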
Friday, July 5, 2013
Best Notepad++ Alternatives for Mac OS
I ❤ Notepad++. It's a powerful, fully loaded, and free text editor that has been an invaluable tool for writing code in a variety of programming languages. The only caveat: it's available only for Windows. When I got my shiny new MacBook Pro, I was so disappointed to find out that Notepad++ could not be installed on Macs that I almost returned the MacBook. Since I couldn't find a better laptop to meet my needs (and aesthetic desires), the quest has begun to find a comparable, preferably free, text editor that runs on Mac OS. I was surprised to find the list of candidates quite long. Here are the options I found; unfortunately, not all of them are free:
BBEdit ($50)
Coda ($75)
CrossOver (Windows emulator, $60) + Notepad++
Espresso ($75)
jEdit
Komodo Edit
TextEdit (the basic text editor pre-loaded on your Mac)
TextMate (€39 or about $53)
TextWrangler (free lite version of BBEdit)
Smultron ($5)
SubEthaEdit (€29, or about $43)
Sublime Text ($70)
Tincta (free, Pro version for $16)
WINE (Windows emulator) + Notepad++
Apparently the market is saturated with Notepad++ "replacement" text editors for Macs. The text editors most often recommended online are highlighted in bold. While looking into the options, it became apparent that there is no single best Notepad++ replacement. It depends on what the user relies on Notepad++ for and which features they need (plus a bit of personal preference in user interface). I have tried a few of the above options and am still not completely satisfied. I am secretly hoping the folks responsible for Notepad++ are cooking up a way to install it on Macs. The emulator approach to running Notepad++ on a Mac also seems interesting; I will have to try it when I have some free time. In the meantime, I am curious what has been working best for you other Notepad++ lovers who have made the switch to a Mac. Also, if you are aware of other text editors not mentioned here, please share!
Thursday, May 2, 2013
GitHub
When I started this blog, I found it difficult to share snippets of code through the Blogger interface. I stumbled across an easy tool to help: GitHub, specifically its Gist service. It's an easy-to-use interface that lets you paste your code into a text box, select the programming language, and then creates a one-line embed code you can put in your Blogger blog. It even highlights the commands and syntax relevant to that language. Pretty slick.