Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Showing posts with label plot. Show all posts

Friday, January 29, 2016

Normal Distribution Functions in R

I always need to look up how to use the distribution functions in R. Rather than guessing every time, I made a quick primer with example plots showing what each command means and what it actually returns.

The examples below assume a standard normal distribution, N(0,1). A link to the R manual is here.

dnorm - returns the height of the normal curve (the density) at a specified value along the x-axis
pnorm - the cumulative distribution function (CDF), which returns the area to the left of a specified value
qnorm - returns quantiles or "critical values" (the inverse of pnorm)
rnorm - generates random numbers from the normal distribution
These are examples for the normal distribution, but as you might imagine, R has the same set of commands for numerous other distributions such as the chi-square, beta, uniform, and Poisson.
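As a quick check of the four definitions above, here is a minimal sketch for N(0,1). The seed and the specific input values are arbitrary choices for illustration.

```r
# Standard normal, N(0,1), throughout (mean = 0 and sd = 1 are the defaults)
h <- dnorm(0)        # height of the curve at x = 0, i.e. 1/sqrt(2*pi), about 0.399
a <- pnorm(1.96)     # area to the left of 1.96, about 0.975
q <- qnorm(0.975)    # value with 97.5% of the area to its left, about 1.96

set.seed(1)          # arbitrary seed so the random draws are reproducible
x <- rnorm(5)        # five random draws from N(0,1)

# pnorm and qnorm are inverses of each other
p <- pnorm(qnorm(0.3))       # 0.3, up to floating-point error

# The same d/p/q/r prefixes exist for other distributions, e.g. chi-square
med <- qchisq(0.5, df = 1)   # median of a chi-square with 1 degree of freedom
```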

If you are curious, here are the commands I used to plot these figures:

Tuesday, November 4, 2014

Fix Origin of R Plot Axes at Zero

By default, R plots do not place the origin of the x and y axes at zero, because the axis range is padded slightly beyond the data. To fix the origin of the plot at 0,0, simply use the xaxs and yaxs parameters. Here is a description from the R graphical parameters help page.

The style of axis interval calculation to be used for the x-axis. Possible values are "r", "i", "e", "s", "d". The styles are generally controlled by the range of data or xlim, if given.
Style "r" (regular) first extends the data range by 4 percent at each end and then finds an axis with pretty labels that fits within the extended range.
Style "i" (internal) just finds an axis with pretty labels that fits within the original data range.
Style "s" (standard) finds an axis with pretty labels within which the original data range fits.
Style "e" (extended) is like style "s", except that it is also ensures that there is room for plotting symbols within the bounding box.
Style "d" (direct) specifies that the current axis should be used on subsequent plots.
(Only "r" and "i" styles have been implemented in R.)

Finally, here is a brief example code snippet to demonstrate how the syntax works.
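A minimal sketch of the syntax, using made-up data, might look like the following. Setting xlim and ylim to start at zero together with xaxs="i" and yaxs="i" makes the plot region begin exactly at 0,0.

```r
# Made-up data for illustration
x <- 0:10
y <- x^2

op <- par(mfrow = c(1, 2))   # two panels side by side; save old settings

# Default style "r": the axis range is padded by 4 percent at each end
plot(x, y, main = 'Default ("r")')

# Style "i": the plot region matches xlim/ylim exactly, so the axes meet at 0,0
plot(x, y, xaxs = "i", yaxs = "i", xlim = c(0, 10), ylim = c(0, 100),
     main = 'Internal ("i")')
usr_i <- par("usr")          # plot region limits of the second panel

par(op)                      # restore the previous settings
```

Note that points lying exactly on the axes may be partly clipped when style "i" is used.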

Friday, August 22, 2014

Create Multiple Y Axes for an R Plot

R base graphics is a powerful tool for plotting data.  Sometimes it is convenient to visualize combinations of variables on the same plot.  Often different variables require different scales.  This can be facilitated by using different Y axes, such as a plot with a second Y axis on the right hand side.  In R adding a Y axis is very easy to do.  Here are some simple example scripts you can build on.  The first two examples are two different ways of showing different scales for the same variable (i.e. temperature in Fahrenheit and Celsius).  The third example is two different variables overlaid on the same R plot with two Y axes used to show the scales of each variable.
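As a rough sketch of the idea (not the original scripts), the code below shows one way to add a right-hand axis for the same variable on a second scale, and one way to overlay a second variable with its own axis. All data values and variable names here are made up for illustration.

```r
# Made-up monthly values for illustration
month  <- 1:12
temp_f <- c(31, 35, 44, 55, 65, 74, 79, 77, 70, 58, 47, 36)
rain   <- c(3.1, 2.8, 3.6, 3.3, 4.0, 3.7, 3.9, 3.2, 3.5, 3.2, 3.3, 3.2)

f_to_c <- function(f) (f - 32) * 5 / 9   # Fahrenheit to Celsius

op <- par(mar = c(5, 4, 4, 4) + 0.1)     # widen the right margin for the second axis

# Same variable, two scales: place Celsius tick labels at their Fahrenheit positions
plot(month, temp_f, type = "b", xlab = "Month", ylab = "Temperature (F)")
c_ticks <- seq(-10, 30, by = 10)
axis(4, at = c_ticks * 9 / 5 + 32, labels = c_ticks)
mtext("Temperature (C)", side = 4, line = 3)

# Two variables: draw a second plot on top without axes, then add a right-hand axis
plot(month, temp_f, type = "b", xlab = "Month", ylab = "Temperature (F)")
par(new = TRUE)
plot(month, rain, type = "l", lty = 2, axes = FALSE, xlab = "", ylab = "")
axis(4)
mtext("Rainfall (in)", side = 4, line = 3)

par(op)                                  # restore the previous settings
```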


Monday, July 14, 2014

Add Custom BedGraph Track to UCSC Genome Browser

Previously, I made a post on adding custom tracks to the UCSC Genome Browser and have even expanded on coloring these tracks.  For most of my applications, I have simply used standard .bed files that plot features of interest in the Genome Browser in relation to other UCSC tracks.  Today I wanted to plot data from overlapping features in a depth plot similar to what one would see after next-generation sequencing.  I found the easiest way to do this was by importing a BedGraph file into the Genome Browser.  Following steps similar to the custom tracks post, you go to My Data -> Custom Tracks -> Add Custom Tracks and then upload the BedGraph file.  Below is an example of what the header and the first few lines need to look like.
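For reference, here is a minimal sketch in R that writes a small BedGraph file with a track line followed by four tab-separated columns (chromosome, start, end, value). The coordinates, values, track name, and output file name are all made up for illustration.

```r
# Hypothetical depth values; BedGraph coordinates are 0-based, with the end not included
bg <- data.frame(
  chrom = "chr1",
  start = c(10000L, 15000L, 20000L),   # integers avoid scientific notation on output
  end   = c(15000L, 20000L, 25000L),
  value = c(2, 5, 3)
)

out <- file.path(tempdir(), "features.bedGraph")   # hypothetical file name

# The first line is the track definition; type=bedGraph is required
writeLines('track type=bedGraph name="Feature depth" description="Overlapping feature depth"', out)

# Data lines follow the track line
write.table(bg, out, append = TRUE, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

lines_out <- readLines(out)   # track line plus three data lines
```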


More details and options on the BedGraph track format can be found here on the UCSC webpage.

Wednesday, June 4, 2014

Easy Forest Plots in R


Forest plots are great ways to visualize individual group estimates as well as investigate heterogeneity of effect.  Most forest plot programs will display combined effect estimates and give you an indicator of whether there is evidence for heterogeneity among subgroups.  Fortunately, the R metafor package makes meta-analysis and plotting forest plots relatively easy.  Below is a sample R script that will combine beta and standard error estimates from an input file (input.txt) and create simple forest plots with an overall estimate as well as p-values for association and heterogeneity.

In general, the input.txt file should either

(1) Have the columns:
group - name for the study or group
beta - the log odds ratio for the effect of interest
se - the standard error for the log odds ratio estimate

or

(2) Have the columns:
group - name for the study or group
OR - odds ratio for the effect of interest
LCL - lower 95% confidence limit for the odds ratio
UCL - upper 95% confidence limit for the odds ratio

For the second case, where you have an odds ratio and 95% confidence limits, beta and se need to be estimated from them.  This is done by uncommenting lines 8 and 9 of the script.  Of note: due to rounding error the final forest plot may have 95% CI limits that are off by one in the last digit.
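To make the conversion concrete, here is a hedged sketch (not the original script) that recovers beta and se from an odds ratio and its 95% limits, then computes a fixed-effect inverse-variance estimate and Cochran's Q heterogeneity p-value in base R. The study values are made up, and the metafor call at the end runs only if the package is installed.

```r
# Made-up odds ratios and 95% confidence limits for three groups
dat <- data.frame(
  group = c("Study 1", "Study 2", "Study 3"),
  OR    = c(1.20, 1.35, 1.10),
  LCL   = c(0.90, 1.05, 0.80),
  UCL   = c(1.60, 1.74, 1.51)
)

# Recover the log odds ratio and its standard error from the 95% limits
z        <- qnorm(0.975)
dat$beta <- log(dat$OR)
dat$se   <- (log(dat$UCL) - log(dat$LCL)) / (2 * z)

# Fixed-effect (inverse-variance) pooled estimate
w           <- 1 / dat$se^2
pooled_beta <- sum(w * dat$beta) / sum(w)
pooled_se   <- sqrt(1 / sum(w))
p_assoc     <- 2 * pnorm(-abs(pooled_beta / pooled_se))

# Cochran's Q test for heterogeneity
Q     <- sum(w * (dat$beta - pooled_beta)^2)
p_het <- pchisq(Q, df = nrow(dat) - 1, lower.tail = FALSE)

# The same model and a forest plot via metafor, if it is available
if (requireNamespace("metafor", quietly = TRUE)) {
  res <- metafor::rma(yi = beta, sei = se, data = dat, slab = group, method = "FE")
  metafor::forest(res, atransf = exp)   # show the axis on the odds ratio scale
}
```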

The R script that uses the metafor package as well as an example input.txt are below.




Monday, June 2, 2014

Make Venn Diagram in R with Correctly Weighted Areas

Venn diagrams are incredibly intuitive plots that visually display the overlap between groups.  There are a host of programs out there that make Venn diagrams, but very few actually weight the areas correctly to scale.  In my opinion, this is unacceptable in the 21st century.  I did stumble across one useful R package that calculates the appropriate area for intersections.  It is the Vennerable package.  While documentation and options are rather light for this package, it does what it needs to do: correctly size overlapping regions.

It's a little tricky to install this Venn diagram package.  To do so, follow the steps below:
(1) Type setRepositories() in the R command console.
(2) Select R-forge, rforge.net, CRAN extras, BioC software, and BioC extras in the pop-up window and then press OK.  I just have all of them selected.
(3) Install Vennerable package by typing install.packages("Vennerable", repos="http://R-Forge.R-project.org")
(4) Install the dependencies by typing install.packages(c("graph", "RBGL"), dependencies=TRUE).

This should have you up and running.  The R Vennerable help page can be accessed here, but it's really not that useful.  Below are some examples you can run as a bit of a primer.  The trickiest thing to learn is how the weights are assigned in the Venn plot.  As a general rule of thumb, if SetNames=c("A","B","C"), then Weight=c(notABC, A, B, AB, C, AC, BC, ABC).  It's a bit frustrating to have no control over general plotting details such as color, label location, and rotation, but I guess I can live with these things for now.  Also of note, the plotting algorithm tries its best to converge on a Venn diagram that fits circle areas to your data, but if for some reason it can't, there is no guarantee the plot will match your data, so double check things thoroughly!  Here are some examples below.  If you know of some new or better options out there, please let me know in the comments section.  Enjoy.
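To illustrate the weight ordering described above, here is a small sketch. The counts are made up, and the Vennerable call is guarded so it only runs when the package is installed; I am assuming the Venn() constructor and the doWeights plotting option behave as in the package's documented examples.

```r
# Membership codes in the order A, B, C, with A varying fastest
combos <- expand.grid(A = 0:1, B = 0:1, C = 0:1)
codes  <- paste0(combos$A, combos$B, combos$C)
# "000" = in none of the sets, "100" = A only, "110" = A and B only, "111" = all three

# Made-up counts in the order notABC, A, B, AB, C, AC, BC, ABC
w <- c(0, 30, 20, 10, 15, 5, 4, 2)
names(w) <- codes   # names are only for readability here

if (requireNamespace("Vennerable", quietly = TRUE)) {
  library(Vennerable)
  v <- Venn(SetNames = c("A", "B", "C"), Weight = unname(w))
  plot(v, doWeights = TRUE)   # ask for areas scaled to the weights
}
```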



The output looks like this:


Wednesday, January 29, 2014

Bar Plot with 95% Confidence Interval Error Bars in R

R is a great plotting program for making publication quality bar plots of your data.  I often need to make quick bar plots to help collaborators quickly visualize data, so I thought I would put together a very generalized script that can be modified and built on as a template to make future bar plots in R.

In this script you manually enter your data (as you would see it in a 2x2 table) and then calculate the estimated frequency and the 95% CI around that frequency using the binom.confint function in the binom package.  Next, parametric and non-parametric p-values are calculated with the binom.test and fisher.test commands, respectively.  These statistics are then plotted using R's barplot function.  The example code is below along with what the plot will look like.  P-values and N's are filled in automatically, and the y limits are calculated to ensure the graph includes all the plotted details.
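Here is a simplified base R sketch of the same workflow, not the original script. To avoid a package dependency it swaps in the exact (Clopper-Pearson) interval returned by binom.test in place of binom.confint, and the 2x2 counts are made up.

```r
# Made-up 2x2 table: rows are groups, columns are event yes/no
tab <- matrix(c(30, 70,
                15, 85), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("Cases", "Controls"),
                              event = c("Yes", "No")))

n    <- rowSums(tab)          # group sizes
freq <- tab[, "Yes"] / n      # estimated frequencies

# Exact 95% CI per group: row 1 holds lower limits, row 2 upper limits
ci <- sapply(seq_len(nrow(tab)),
             function(i) binom.test(tab[i, "Yes"], n[i])$conf.int)

p_fisher <- fisher.test(tab)$p.value

# y limit leaves room above the tallest error bar for the p-value label
ymax <- max(ci) * 1.25
mids <- barplot(freq, ylim = c(0, ymax), ylab = "Frequency",
                names.arg = paste0(rownames(tab), " (N=", n, ")"))

# Error bars: flat-ended arrows from lower to upper limit at each bar midpoint
arrows(mids, ci[1, ], mids, ci[2, ], angle = 90, code = 3, length = 0.1)
text(mean(mids), ymax * 0.95, paste("Fisher p =", signif(p_fisher, 2)))
```

barplot returns the bar midpoints, which is what lets the error bars and labels line up with the bars.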



Wednesday, January 15, 2014

Add an Overall Title to an Array of R Plots

Adding a top-level title to a group of R plots is relatively easy...as long as you know the correct commands and sequence to put them in.  Here is some quick code to place a title on the top of multiple plots in R:
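One common way to do this, sketched here with made-up data, is to reserve an outer margin with par(oma=...) and write the title into it with mtext(..., outer=TRUE) once the panels have been drawn.

```r
# 2 x 2 array of plots with 3 lines of outer margin reserved at the top
op <- par(mfrow = c(2, 2), oma = c(0, 0, 3, 0))

set.seed(1)   # arbitrary seed for the made-up data
for (i in 1:4) {
  plot(rnorm(20), main = paste("Panel", i))
}

# Overall title goes in the outer margin, after the panels are drawn
mtext("Overall title", outer = TRUE, cex = 1.5, line = 1)

oma_used <- par("oma")   # outer margins in effect for this array
par(op)                  # restore the previous settings
```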



Note: The command to include an overall array title needs to be at the end of the code after the other plots have been generated.

Thursday, December 12, 2013

Generate Coverage Plot in R from Depth Data

Here is a simple way to take coverage data (coverage.depth) and make a plot in R to visualize it.  All you need is a column with base pair coordinates (V1) and a column with the respective depth (V2), and you are all set to plug it into the R code below.
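A minimal sketch of this kind of plot follows. So that it runs on its own, it first writes a made-up coverage.depth file to a temporary directory; with real data you would skip that step and point read.table at your own file.

```r
# Write a made-up two-column depth file (position, depth) so the example is self-contained
depth_file <- file.path(tempdir(), "coverage.depth")
set.seed(1)   # arbitrary seed for the made-up depths
write.table(data.frame(seq(1000L, 50000L, by = 1000L), rpois(50, 30)),
            depth_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# read.table names unlabeled columns V1, V2, ...
depth <- read.table(depth_file)

plot(depth$V1, depth$V2, type = "l",
     xlab = "Position (bp)", ylab = "Depth", main = "Coverage")
abline(h = mean(depth$V2), lty = 2)   # dashed line at the mean depth
```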

Sum Overlapping Base Pairs of Features from Chromosomal BED File

I had a .bed file of genomic features on a chromosome, and I wanted to measure how much the features overlap in order to investigate commonly covered genes as well as positions where features were likely to form.  I wanted to generate a plot similar to a coverage depth plot from next-generation sequencing reads.  I am sure more efficient methods exist, but here is some Python code that takes in a .bed file of features (features.bed) and creates an output file (features.depth) with the feature overlap "depth" every 5,000 base pairs across the areas which contain features in your chromosomal .bed file.
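The original code is in Python; to stay consistent with the other examples on this page, here is a rough sketch of the same idea in R instead. It samples a position every 5,000 base pairs and counts how many features cover it. The three features are made up, and I am assuming standard BED coordinates (0-based start, end not included).

```r
# Made-up features on one chromosome (BED: 0-based start, end not included)
bed <- data.frame(
  chrom = "chr1",
  start = c(0L, 4000L, 12000L),
  end   = c(11000L, 16000L, 20000L)
)

step <- 5000L
pos  <- seq(min(bed$start), max(bed$end), by = step)   # sampled positions

# Depth at a position = number of features whose interval contains it
depth <- sapply(pos, function(p) sum(bed$start <= p & p < bed$end))

out <- data.frame(pos = pos, depth = depth)
write.table(out, file.path(tempdir(), "features.depth"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The loop over positions is simple rather than fast; for genome-scale files a dedicated tool for interval arithmetic would likely be a better fit.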



Create Triangle Plot from Inferred Genetic Ancestry

I previously posted on how to infer ancestry for a group of study participants using SNP genotypes.  Today, I want to visually plot some of the output in R.  Two informative plots that can be generated from the output are a standard plot of the two eigenvectors with percent ancestry overlaid and a triangle (or ternary) plot with each axis representing percentage of one of the three ancestral populations (ex: European, African, and Asian).

Here is some simple code to plot this in R.  There is no base package to plot the triangle plot, so the plotrix package will need to first be installed.  The ancestry.txt file is the output file from SNPWEIGHTS, but other output could be formatted to work as well.
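As a hedged sketch of the triangle plot, the code below uses triax.plot from plotrix on made-up ancestry proportions. The column names are hypothetical and do not reflect the actual SNPWEIGHTS output layout, so real output would need to be reshaped into three proportion columns first. The plotting call runs only if plotrix is installed.

```r
# Made-up ancestry proportions for four samples (hypothetical column names)
anc <- data.frame(
  EUR = c(0.90, 0.10, 0.05, 0.40),
  AFR = c(0.05, 0.85, 0.05, 0.35),
  ASN = c(0.05, 0.05, 0.90, 0.25)
)

# Each row should sum to 1 for a ternary plot
row_totals <- rowSums(anc)

if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::triax.plot(as.matrix(anc),
                      main = "Inferred ancestry",
                      axis.labels = c("European", "African", "Asian"),
                      pch = 16, show.grid = TRUE)
}
```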


The output should look similar to the plots below.