Wednesday, June 26, 2013

Remove a List of Reads from a BAM File

Sometimes it is necessary to remove a subset of reads from a .bam file. In my case, I wanted to remove a few chimeric reads in which reads from different amplicons appeared to have fused together before entering the sequencer. Here is a line of code where I use Samtools and grep to remove a list of read IDs from the original .bam file and create a new, filtered .bam file. I hope it is useful for other applications as well.

Note the trailing hyphen at the end.
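
For readers who want something to adapt, a pipeline of this kind can be sketched as below. This is a minimal sketch, not necessarily the exact command used for this post: the function name `filter_bam`, its argument order, and the grep flags `-F` (fixed strings) and `-w` (whole-word match, so one ID cannot match a longer ID that merely starts with it) are assumptions of the sketch. It expects a plain-text file with one read ID per line. The trailing hyphen tells the final `samtools view` to read its input from standard input, that is, from the pipe, rather than from a named file.

```shell
# Hypothetical helper (not from the original post).
# Usage: filter_bam in.bam ids.txt out.bam
#   in.bam   BAM file to filter
#   ids.txt  read IDs to remove, one per line
#   out.bam  filtered BAM file to write
filter_bam() {
    # 1. Decode the BAM to SAM text, keeping the header (-h).
    # 2. Drop every line containing a listed ID (-v inverts the match,
    #    -f reads the patterns from the ID file).
    # 3. Re-encode to BAM (-b); -S marks the input as SAM for older
    #    samtools releases, and the trailing "-" means "read stdin".
    samtools view -h "$1" \
        | grep -v -w -F -f "$2" \
        | samtools view -bS -o "$3" -
}
```

With the function defined, a call such as `filter_bam sample1.bam remove_ids.txt sample1_filter.bam` would write the filtered file; the file names are placeholders. Because grep scans the whole SAM line rather than only the read-name column, it is worth comparing read counts before and after, for example with `samtools view -c`.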


  1. Thank you so much. I learned that downstream tools won't work when we manually remove lines from SAM files and then convert them to BAM and index them. It's much better to use commands for tasks like this.

    1. HeMan, glad you found the post helpful!

  2. Hello!
    I am trying to use this script for my RNA-seq project.
    I have an accepted_hits.bam file from TopHat, and to create a count table with HTSeq I need to remove three reads from this BAM file with the following headers:
    1) HWI-ST538:357:D2BKUACXX:1:1105:13318:13823.
    2) HWI-ST538:357:D2BKUACXX:1:1107:19710:10717
    3) HWI-ST538:357:D2BKUACXX:1:2314:13745:61117
    After I run the aforementioned script, samtools creates a 0-byte sample1_filter.bam file.
    How can I fix this issue?
    Thank you for your help.

  3. May I ask what's the purpose of the trailing hyphen at the end?

  4. To elaborate, the command runs without errors, but the new output file still contains all the reads I set up to delete.