Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Tuesday, March 15, 2016

R Function to Calculate Confidence Interval from Odds Ratio and P-value

Every once in a while I need a confidence interval from a published estimate that only reports the estimated odds ratio and respective p-value. With a little bit of algebra it is simple to build a function in R to calculate a confidence interval. Below is a short, easy example script.
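
The algebra can be sketched as follows. This is a minimal illustration, not necessarily the original script: it assumes the p-value is two-sided and comes from a Wald-type test on the log odds ratio, and the function name or_ci_from_p is a hypothetical choice. The p-value gives the absolute z statistic, the z statistic gives the standard error of log(OR), and the standard error gives the interval.

```r
# Hypothetical helper: recover a confidence interval for an odds ratio
# from the point estimate and its two-sided p-value.
# Assumes the p-value came from a Wald test on the log odds ratio.
or_ci_from_p <- function(or, p, conf_level = 0.95) {
  stopifnot(or > 0, or != 1, p > 0, p < 1)
  log_or <- log(or)
  z_obs  <- qnorm(1 - p / 2)                  # |z| implied by the p-value
  se     <- abs(log_or) / z_obs               # standard error of log(OR)
  z_crit <- qnorm(1 - (1 - conf_level) / 2)   # 1.96 for a 95% interval
  c(lower = exp(log_or - z_crit * se),
    upper = exp(log_or + z_crit * se))
}

# Example: OR = 2 with p = 0.05; the lower 95% limit sits exactly at 1
ci <- or_ci_from_p(2, 0.05)
print(ci)
```

Because published p-values are usually rounded, the recovered interval is only as precise as the reported p-value.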

Friday, January 29, 2016

Normal Distribution Functions in R

I always need to look up how to use the distribution functions in R. Rather than guessing each time, I made a quick primer with example plots showing what each command means and what it actually returns.

The examples below assume a standard normal distribution, N(0,1). The R manual page for these functions can be opened with ?Normal.

dnorm: returns the height of the normal curve (the density) at a specified value along the x-axis
pnorm: the cumulative distribution function (CDF), which returns the area under the curve to the left of a specified value
qnorm: returns quantiles or "critical values", i.e. the inverse of pnorm
rnorm: generates random numbers from the normal distribution

These are examples for the normal distribution, but as you can imagine, R has the same set of commands for numerous other distributions such as the chi-square, beta, uniform, and Poisson.
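
As a quick check of what each of the four commands returns for N(0,1), here is a small sketch. The variable names are arbitrary, and the seed is set only to make the random draws reproducible.

```r
# Each call uses the defaults mean = 0 and sd = 1, i.e. N(0,1)
height   <- dnorm(0)       # density at x = 0, about 0.3989
area     <- pnorm(1.96)    # area to the left of 1.96, about 0.975
critical <- qnorm(0.975)   # value with 97.5% of the area to its left, about 1.96

set.seed(123)              # reproducible random draws
draws <- rnorm(5)          # five random values from N(0,1)

print(c(height = height, area = area, critical = critical))
print(draws)
```

Note that pnorm and qnorm undo each other: qnorm(pnorm(1.96)) returns 1.96.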

If you are curious, here are the commands I used to plot these figures:
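
A rough sketch that draws comparable figures with base R graphics is shown here. It is an illustration under my own assumptions (the cut-off value of 1, the 0.975 quantile, the panel layout, and the colors are all arbitrary choices), not a reproduction of the original plotting commands.

```r
x    <- seq(-4, 4, length.out = 401)
dens <- dnorm(x)
op   <- par(mfrow = c(2, 2))     # four panels, one per function

# dnorm: height of the curve at x = 1
plot(x, dens, type = "l", main = "dnorm(1)", ylab = "density")
segments(1, 0, 1, dnorm(1), col = "red")

# pnorm: shaded area to the left of x = 1
plot(x, dens, type = "l", main = "pnorm(1)", ylab = "density")
left <- x <= 1
polygon(c(x[left], 1, -4), c(dens[left], 0, 0), col = "grey")

# qnorm: x value that has 97.5% of the area to its left
plot(x, dens, type = "l", main = "qnorm(0.975)", ylab = "density")
abline(v = qnorm(0.975), col = "red")

# rnorm: histogram of random draws with the true density overlaid
set.seed(42)
draws <- rnorm(1000)
hist(draws, freq = FALSE, main = "rnorm(1000)", xlab = "x")
lines(x, dens)

par(op)                          # restore the previous layout
```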

Tuesday, May 12, 2015

Convert Protein Codon to Genome Coordinates

I needed a way to check if a few codons from different proteins were covered in a next-generation sequencing panel. This sounds relatively easy to do, but proves to be a bit difficult. Here are steps to do this.

(1) Use the BioMart ID converter to find the Ensembl protein ID for your protein of interest.
(2) Use Ensembl GET map/translation/:id/:region to find the genomic coordinates of the codon of interest using the following script:

ENSP00000288602 is the Ensembl protein ID for your protein of interest (example: BRAF gene)
100..100 is the range of codon (amino acid) positions to map, written as start..end (example: just codon 100)
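
A request of this kind can be sketched in R as follows. The helper names are hypothetical, the server address https://rest.ensembl.org and the content-type query parameter follow the public Ensembl REST documentation, and the fetch step assumes the jsonlite package is installed and that you have network access.

```r
# Hypothetical helper: build the URL for GET map/translation/:id/:region
ensembl_translation_url <- function(protein_id, start, end = start,
                                    server = "https://rest.ensembl.org") {
  sprintf("%s/map/translation/%s/%d..%d?content-type=application/json",
          server, protein_id, as.integer(start), as.integer(end))
}

# Hypothetical helper: fetch and parse the mappings.
# Assumes jsonlite is installed; requires network access when called.
fetch_codon_mapping <- function(protein_id, start, end = start) {
  jsonlite::fromJSON(ensembl_translation_url(protein_id, start, end))$mappings
}

# URL for codon 100 of ENSP00000288602 (BRAF)
url <- ensembl_translation_url("ENSP00000288602", 100)
print(url)

# To run the query itself:
# fetch_codon_mapping("ENSP00000288602", 100)
```

Looping this helper over a list of protein IDs and codon positions would be one way to check many codons at once.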

The result is a JSON formatted string like this:
{"mappings":[{"assembly_name":"GRCh38","end":140834815,"seq_region_name":"7","gap":0,"strand":-1,"coord_system":"chromosome","rank":0,"start":140834813}]}

This indicates that codon 100 of the BRAF gene (for this protein transcript) is located at chr7:140834813-140834815. Ensembl uses GRCh38. If you need other builds of the genome, use liftOver for converting.

There are probably more automated ways to do this, but this worked for the small subset of codons I needed to check in the design panel. If you have a better way to do this, please share in the comments section.

Thursday, May 7, 2015

Break Age Variable into Age Groups in R

I recently found that the cut function in R is a handy way to divide a continuous variable into groups. It is especially useful for dividing age into age groups, but it works for many other numeric variables as well.
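
Here is a minimal sketch of the idea. The ages, break points, and labels are made-up values for illustration. Setting right = FALSE makes each interval closed on the left, so an 18-year-old falls in the 18-34 group rather than the 0-17 group.

```r
# Made-up ages for illustration
age <- c(5, 17, 18, 34, 35, 64, 65, 90)

# Break points define the groups [0,18), [18,35), [35,65), [65,Inf)
age_group <- cut(age,
                 breaks = c(0, 18, 35, 65, Inf),
                 labels = c("0-17", "18-34", "35-64", "65+"),
                 right  = FALSE)

print(data.frame(age, age_group))
print(table(age_group))   # two people in each group
```

The result is a factor, so it can go straight into table, tapply, or a regression model as a categorical variable.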

Friday, November 21, 2014

Convert 2 by 2 Contingency Table to Data Frame in R

Contingency tables are a useful shorthand for entering and visualizing data. I have yet to find an easy way to convert between contingency tables and data frames in R. Below is a short script in which I input a contingency table, create a function to convert the 2 by 2 table to a data frame, and convert the data frame back to a table. These conversions are useful because some statistical functions work only on tables and others work only on data frames. I hope the example below is useful.
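
One way to sketch this round trip is shown here. It is an illustration rather than the original script: the counts, the dimension names, and the function name table_to_df are all made up. The idea is to turn the table into one row per cell with as.data.frame, then repeat each row as many times as its count.

```r
# Made-up 2 by 2 table: rows are exposure, columns are outcome
tab <- as.table(matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE,
                       dimnames = list(exposure = c("yes", "no"),
                                       outcome  = c("case", "control"))))

# Hypothetical helper: expand a table into one data frame row per observation
table_to_df <- function(tab) {
  counts <- as.data.frame(tab)                        # one row per cell, with a Freq column
  rows   <- rep(seq_len(nrow(counts)), times = counts$Freq)
  df     <- counts[rows, names(counts) != "Freq", drop = FALSE]
  rownames(df) <- NULL
  df
}

df   <- table_to_df(tab)   # 100 rows, columns exposure and outcome
tab2 <- table(df)          # back to a 2 by 2 table

print(head(df))
print(tab2)
```

Because as.data.frame keeps the factor levels in their original order, the rebuilt table has the same row and column order as the input.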

Tuesday, November 4, 2014

Fix Origin of R Plot Axes at Zero

Standard R plots do not place the origin of the x and y axes at zero, because by default R pads the data range by 4 percent at each end. To fix the origin of the plot at (0,0), set the xaxs and yaxs parameters to "i" and start the axis limits at zero. Here is the description of xaxs from the R graphical parameters help page (yaxs works the same way for the y-axis).

The style of axis interval calculation to be used for the x-axis. Possible values are "r", "i", "e", "s", "d". The styles are generally controlled by the range of data or xlim, if given.
Style "r" (regular) first extends the data range by 4 percent at each end and then finds an axis with pretty labels that fits within the extended range.
Style "i" (internal) just finds an axis with pretty labels that fits within the original data range.
Style "s" (standard) finds an axis with pretty labels within which the original data range fits.
Style "e" (extended) is like style "s", except that it also ensures that there is room for plotting symbols within the bounding box.
Style "d" (direct) specifies that the current axis should be used on subsequent plots.
(Only "r" and "i" styles have been implemented in R.)

Finally, here is a brief example code snippet to demonstrate how the syntax works.
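
A minimal sketch, using made-up data, looks like this. The key pieces are xaxs = "i" and yaxs = "i" together with axis limits that start at zero.

```r
# Made-up data for illustration
x <- 0:10
y <- x^2

# Style "i" removes the default 4 percent padding, so the axes meet at (0,0)
plot(x, y, xlim = c(0, 10), ylim = c(0, 100), xaxs = "i", yaxs = "i")

# The plotting region now matches the limits exactly
usr <- par("usr")
print(usr)   # 0 10 0 100
```

Without the two style arguments, par("usr") would extend slightly below zero on both axes.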

R Syntax for a Simple For Loop

For loops in R are a useful way to keep track of a counter, iterate through the elements of a variable, or repeat an operation on a subset of variables. Below is an example R script showing the code and syntax needed to do some simple tasks with R for loops.
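
The three uses mentioned above can be sketched as follows. The data and variable names are made up for illustration.

```r
# 1. Keep track of a counter: sum the numbers 1 to 5
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)   # 15

# 2. Iterate through a variable by index
fruits <- c("apple", "banana", "cherry")
lens <- integer(length(fruits))
for (i in seq_along(fruits)) {
  lens[i] <- nchar(fruits[i])
}
print(lens)    # 5 6 6

# 3. Apply an operation to a subset of the columns in a data frame
dat <- data.frame(a = 1:3, b = 4:6, c = 7:9)
means <- numeric(0)
for (col in c("a", "b")) {
  means[col] <- mean(dat[[col]])
}
print(means)   # a = 2, b = 5
```

Using seq_along(fruits) rather than 1:length(fruits) avoids a surprise when the vector is empty, since 1:0 counts down from 1 to 0 instead of skipping the loop.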