Thursday, June 20, 2013

Clean BAM Files Generated by CGATools

Tired of seeing the error "samtools: bam_pileup.c:112: resolve_cigar2: Assertion `s->k < c->n_cigar' failed." when you run Samtools on those .bam files you generated using CGATools? Here's a way to remove the pesky alignment lines that Samtools does not like. Basically, we are going to remove any line whose CIGAR string does any of the following:

starts with \d+N\dD
starts with \d+P
starts with \d+I
ends with \d+P

where \d+ is the Perl regular expression for one or more digits, i.e. an integer of any length. Here is a pipeline that uses Samtools and AWK to do this for us.
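
A minimal sketch of such a pipeline might look like the following. It is reconstructed from the four rules above and is not necessarily the exact command originally posted. The function names clean_sam and clean_bam are made up for illustration, and the first rule is transcribed literally, with a single digit before the D. It assumes the CIGAR string sits in column 6 of tab-separated SAM text, as the SAM format specifies.

```shell
# Filter SAM text on stdin: keep header lines and every alignment whose
# CIGAR string (column 6) matches none of the four patterns listed above.
clean_sam() {
    awk -F'\t' '
        /^@/                  { print; next }  # header lines pass through
        $6 ~ /^[0-9]+N[0-9]D/ { next }         # starts with \d+N\dD
        $6 ~ /^[0-9]+P/       { next }         # starts with \d+P
        $6 ~ /^[0-9]+I/       { next }         # starts with \d+I
        $6 ~ /[0-9]+P$/       { next }         # ends with \d+P
                              { print }        # everything else is kept
    '
}

# Hypothetical wrapper: decode BAM to SAM (with header), filter, re-encode.
# $1 is the input .bam, $2 is the cleaned output .bam.
clean_bam() {
    samtools view -h "$1" | clean_sam | samtools view -bS - > "$2"
}
```

Usage would then be something like `clean_bam input.bam cleaned.bam`, where both file names are placeholders. The -S flag is there for older Samtools releases that need to be told the input is SAM; newer releases detect the format on their own and accept the flag for compatibility.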

Enjoy nice, clean (and hopefully problem free) .bam files.

