starts with \d+N\d+D
starts with \d+P
starts with \d+I
ends with \d+P
where \d+ is the Perl regular-expression notation for one or more digits (i.e., an integer of any length). Here is a pipeline that uses Samtools and AWK to drop every read whose CIGAR string matches one of these patterns.
samtools view -h chr1.bam | awk -F'\t' '/^@/ || $6 !~ /^[[:digit:]]+N[[:digit:]]+D|^[[:digit:]]+P|[[:digit:]]+P$|^[[:digit:]]+I/' | samtools view -bhS - > chr1_clean.bam
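To sanity-check the regular expression before running it on a whole BAM file, you can feed it a handful of made-up CIGAR strings. This is a minimal sketch: the helper name keep_clean_cigars is hypothetical, and it uses [0-9] in place of [[:digit:]] because some older awk builds do not support POSIX bracket classes. Note that the end-of-string anchor must be a bare $; an escaped \$ inside the single-quoted awk program matches a literal dollar sign, so a CIGAR such as 95M5P would slip through.

```shell
#!/bin/sh
# Hypothetical helper: print only lines whose first field (a CIGAR string)
# matches none of the four problem patterns:
#   starts with \d+N\d+D, starts with \d+P, ends with \d+P, starts with \d+I
keep_clean_cigars() {
  awk '$1 !~ /^[0-9]+N[0-9]+D|^[0-9]+P|[0-9]+P$|^[0-9]+I/'
}

# Made-up CIGAR strings: the first four should be dropped,
# the last three (N, I and P only in allowed positions) should be kept.
printf '%s\n' 10N5D90M 3P97M 5I95M 95M5P 50M100N50M 90M10I 100M | keep_clean_cigars
```

Only 50M100N50M, 90M10I and 100M should be printed. After running the real pipeline, comparing `samtools view -c chr1.bam` with `samtools view -c chr1_clean.bam` shows how many reads were removed.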
Enjoy nice, clean (and hopefully problem-free) .bam files.