Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Showing posts with label coverage. Show all posts
Showing posts with label coverage. Show all posts

Thursday, February 20, 2014

Zero Fill Coverage Gaps in Samtools Depth Output

Merging depth output from multiple .bam files can be difficult since Samtools only outputs depth counts for coordinates with non-zero coverage.  If you want to merge depth output from these .bam files you first need to fill in the base pair positions of no coverage with zero values so the depth output for all .bam files is the same length.  Then using a simple UNIX cat you can merge multiple .bam file depth output into one file for comparison and analysis.

Here is a simple Python script to zero fill Samtools depth output:



And below is an example of how to use it:

Thursday, December 12, 2013

Generate Coverage Plot in R from Depth Data

Here is a simple way to take coverage data (coverage.depth) and make a plot in R to visualize it.  All you need is a column with base pair coordinates (V1) and a column with respective depth (V2) and you are all set to plug it in the below R code.

Sum Overlapping Base Pairs of Features from Chromosomal BED File

I had a .bed file of genomic features on a chromosome that I wanted to figure out the extent of overlap of the features to investigate commonly covered genes as well as positions where features were likely to form.  I wanted to generate a plot similar to a coverage depth plot from next-generation sequencing reads.  I am sure more efficient methods exist, but here is some Python code that takes in a .bed file of features (features.bed) and creates an output file (features.depth) with the feature overlap "depth" every 5,000 base pairs across the areas which contain features in your chromosomal .bed file.



Friday, November 22, 2013

Find SNP Overlap from Different SNP Arrays

If you have .bed files for array manifests it is relatively easy to use UNIX to compare the overlap between SNP positions in the two files.  Here is a quick one line of code to do so:


This essentially finds all rows that are the same in the two files.  It requires all unique rows in each file and will only work for one base pair long items.

Make BED File from Illumina Manifest

To compare coverage across SNP arrays, it is handy to convert Illumina manifest files into .bed files for easy manipulation.  To do this, the first step is to download the desired manifest file from the Illumina website.  Here is a site that lists the downloads available for each array they support.  Once you find the array select the manifest file (I usually choose the CSV file) and download from the browser or copy the link and use wget.  Here is an example for the Methylation450 array:


Next, look at the manifest file and determine what columns are for the chromosome and the base pair coordinates (usually some name like Chr and Coordinate).  Put these column numbers into the below line of UNIX code and you will have a .bed file from the manifest file.  Here I am making a file only for the X chromosome, but this can be modified to fit your respective needs as well.


You should be all set with a .bed file from your manifest file.  Just ensure that when you are making comparisons you are using the correct genome build.

Sunday, May 5, 2013

Samtools Coverage Depth for Multiple Regions of Interest

Here's some code to use Samtools to extract the sequencing coverage depth from a BAM file for multiple regions of interest as specified in a BED file.  The resulting filename_depth.txt gives coverage data for each base in the region that is in the BAM file.