Today I needed to calculate minor allele frequencies (MAFs) for sequence variants called in a .vcf file. I couldn't find any programs that would do this for me, so I wrote a quick script to do it in Python.
This can be run in python from the command prompt by typing:
where project.vcf is the vcf file you want to calculate MAFs for. It will return a project.txt file that contains the calculated MAF values. This script will only work for SNPs and does not work on insertions and deletions.
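For readers who want to see the idea, here is a minimal sketch of one way to compute MAFs from a VCF file. It is not the original script. The function name `maf_from_vcf_lines` is made up for illustration, and the code assumes a tab-delimited VCF with a GT entry in the FORMAT column. Like the script described above, it only handles biallelic SNPs and skips indels and multi-allelic sites.

```python
BASES = {"A", "C", "G", "T"}


def maf_from_vcf_lines(lines):
    """Yield (chrom, pos, maf) for each biallelic SNP in an iterable of VCF lines.

    Hypothetical helper: assumes tab-delimited records with a GT field.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta/header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotypes to count
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        # Keep only SNPs with a single-base REF and a single single-base ALT.
        if ref.upper() not in BASES or alt.upper() not in BASES:
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        counts = {"0": 0, "1": 0}  # "0" = reference allele, "1" = alternate allele
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_idx >= len(parts):
                continue
            # Treat phased (|) and unphased (/) genotypes the same way.
            for allele in parts[gt_idx].replace("|", "/").split("/"):
                if allele in counts:  # missing calls (".") are ignored
                    counts[allele] += 1
        total = counts["0"] + counts["1"]
        if total == 0:
            continue  # every genotype was missing
        alt_freq = counts["1"] / total
        yield chrom, int(pos), min(alt_freq, 1.0 - alt_freq)


# Example usage (file names are placeholders):
# with open("project.vcf") as vcf, open("project.txt", "w") as out:
#     for chrom, pos, maf in maf_from_vcf_lines(vcf):
#         out.write(f"{chrom}\t{pos}\t{maf:.4f}\n")
```

The MAF is taken as the smaller of the alternate and reference allele frequencies, counted over called alleles only, so samples with missing genotypes do not lower the estimate.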
Alternatively, if Python scares you, there is a slightly roundabout way to do this too. First use vcftools (with its --plink option) to convert your .vcf file into Plink-compatible .ped and .map files.
Then, open Plink and run the --freq option on the newly created .ped file.
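Plink writes the frequencies to a whitespace-delimited .frq file. If you later want those values back in Python, a small reader along these lines should do. This is a sketch under assumptions: the function name `read_plink_frq` is hypothetical, and it assumes the usual Plink 1.x header with SNP and MAF columns (CHR, SNP, A1, A2, MAF, NCHROBS).

```python
def read_plink_frq(lines):
    """Return {snp_id: maf} from the lines of a Plink .frq file.

    Hypothetical helper: assumes a header row containing SNP and MAF columns.
    MAF values reported as NA are stored as None.
    """
    mafs = {}
    snp_i = maf_i = None
    for line in lines:
        cols = line.split()  # columns are padded with a variable number of spaces
        if not cols:
            continue
        if snp_i is None:
            # The first non-blank line is the header; locate the columns by name.
            snp_i, maf_i = cols.index("SNP"), cols.index("MAF")
            continue
        value = cols[maf_i]
        mafs[cols[snp_i]] = None if value == "NA" else float(value)
    return mafs


# Example usage (file name is a placeholder):
# with open("plink.frq") as frq:
#     mafs = read_plink_frq(frq)
```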
**UPDATE**
Today I found a way to use vcftools to calculate the allele frequencies directly, without going through Plink. It just takes the simple --freq option, for example: vcftools --vcf project.vcf --freq
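The --freq run writes a tab-delimited .frq file that lists a frequency for each allele at a site (columns CHROM, POS, N_ALLELES, N_CHR, then one ALLELE:FREQ entry per allele), so the minor allele frequency is the smaller of the two values at a biallelic site. A minimal sketch for extracting it in Python follows; the function name `read_vcftools_frq` is made up for illustration, and the column layout is assumed to match the standard vcftools output.

```python
def read_vcftools_frq(lines):
    """Yield (chrom, pos, maf) for biallelic sites in a vcftools .frq file.

    Hypothetical helper: assumes columns CHROM, POS, N_ALLELES, N_CHR,
    followed by one ALLELE:FREQ entry per allele.
    """
    for line in lines:
        if line.startswith("CHROM") or not line.strip():
            continue  # skip the header row and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos = cols[0], int(cols[1])
        n_alleles, n_chr = int(cols[2]), int(cols[3])
        if n_alleles != 2 or n_chr == 0:
            continue  # skip multi-allelic sites and sites with no called alleles
        # Each entry looks like "A:0.9"; keep the number after the last colon.
        freqs = [float(entry.rsplit(":", 1)[1]) for entry in cols[4:4 + n_alleles]]
        yield chrom, pos, min(freqs)


# Example usage (file name is a placeholder):
# with open("out.frq") as frq:
#     for chrom, pos, maf in read_vcftools_frq(frq):
#         print(chrom, pos, maf)
```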
I would like to ask if this works with the VarScan2 output VCF file, which is in VCF 4.1 format. I tried the script, but it does not seem to work; it reports an error at the first variant line, which should be for the format. Do you have any other way I can calculate the MAF from my VCF file?
Helps. Thank you.
Does it calculate the MAF from the DP4 fields of a VCF file? I tried vcftools --vcf file.vcf --freq.