Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Thursday, June 20, 2013

Clean BAM Files Generated by CGATools

Tired of seeing the error "samtools: bam_pileup.c:112: resolve_cigar2: Assertion `s->k < c->n_cigar' failed." when running Samtools on .bam files you generated using CGATools? Here's a way to remove those pesky lines that Samtools does not like. Basically, we are going to remove any line where the CIGAR string matches any of the following:

starts with \d+N\dD
starts with \d+P
starts with \d+I
ends with \d+P

where \d+ is the Perl regular expression for one or more digits.  Here is a pipeline that uses Samtools and AWK to do this for us.
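A minimal sketch of such a pipeline follows. The function name filter_cigar and the file names input.bam and clean.bam are placeholders of mine, not part of CGATools or Samtools. AWK has no \d, so the patterns use [0-9] instead, and the first pattern is read here as a skip followed by a deletion of any length, which is an assumption.

```shell
#!/bin/sh
# Sketch: drop SAM records whose CIGAR string (column 6) matches one of the
# four patterns listed above. filter_cigar is a hypothetical helper name.

# Reads SAM text on stdin, writes the surviving lines to stdout.
filter_cigar() {
    awk -F '\t' '
        /^@/                   { print; next }  # header lines pass through untouched
        $6 ~ /^[0-9]+N[0-9]+D/ { next }         # starts with N followed by D (assumed: any length D)
        $6 ~ /^[0-9]+P/        { next }         # starts with padding
        $6 ~ /^[0-9]+I/        { next }         # starts with an insertion
        $6 ~ /[0-9]+P$/        { next }         # ends with padding
                               { print }        # everything else is kept
    '
}

# Example usage (input.bam and clean.bam are placeholder file names):
#   samtools view -h input.bam | filter_cigar | samtools view -bS - > clean.bam
```

The -h flag keeps the header so the output can be turned back into a valid BAM, and -bS tells the second samtools call to read SAM and write BAM. Newer Samtools releases detect the input format on their own, so -S is only needed for older versions.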

Enjoy nice, clean (and hopefully problem-free) .bam files.
